53869 Commits

Author SHA1 Message Date
Li RongQing
45cf7959c3 net: slightly optimize eth_type_trans
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.

After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparision

Before:
1.03%  [kernel]          [k] eth_type_trans

After:
0.78%  [kernel]          [k] eth_type_trans

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 15:10:59 -08:00
Li RongQing
982c17b9e3 net: remove BUG_ON from __pskb_pull_tail
if list is NULL pointer, and the following access of list
will trigger panic, which is same as BUG_ON

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 15:07:50 -08:00
Eric Dumazet
08e14fe429 net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
When EDT conversion happened, fq lost the ability to enfore a maxrate
for all flows. It kept it for non EDT flows.

This commit restores the functionality.

Tested:

tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75

Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:42:12 -08:00
Amritha Nambiar
5c72299fba net: sched: cls_flower: Classify packets using port ranges
Added support in tc flower for filtering based on port ranges.

Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 85 sec used 3 sec
        Action statistics:
        Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1 installed 58 sec used 2 sec
        Action statistics:
        Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

v4:
1. Added condition before setting port key.
2. Organized setting and dumping port range keys into functions
   and added validation of input range.

v3:
1. Moved new fields in UAPI enum to the end of enum.
2. Removed couple of empty lines.

v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
  lookup.
4. Cleaned up fl_lookup function.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:38:23 -08:00
Cong Wang
7fe50ac83f net: dump more useful information in netdev_rx_csum_fault()
Currently netdev_rx_csum_fault() only shows a device name,
we need more information about the skb for debugging csum
failures.

Sample output:

 ens3: hw csum failure
 dev features: 0x0000000000014b89
 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0

Note, I use pr_err() just to be consistent with the existing one.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:37:04 -08:00
David Howells
7150ceaacb rxrpc: Fix life check
The life-checking function, which is used by kAFS to make sure that a call
is still live in the event of a pending signal, only samples the received
packet serial number counter; it doesn't actually provoke a change in the
counter, rather relying on the server to happen to give us a packet in the
time window.

Fix this by adding a function to force a ping to be transmitted.

kAFS then keeps track of whether there's been a stall, and if so, uses the
new function to ping the server, resetting the timeout to allow the reply
to come back.

If there's a stall, a ping and the call is *still* stalled in the same
place after another period, then the call will be aborted.

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Fixes: f4d15fb6f99a ("rxrpc: Provide functions for allowing cleaner handling of signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:35:40 -08:00
Linus Torvalds
94ca5c18e1 NFS client bugfixes for Linux 4.20
Highlights include:
 
 Stable fixes:
 - Don't exit the NFSv4 state manager without clearing NFS4CLNT_MANAGER_RUNNING
 
 Bugfixes:
 - Fix an Oops when destroying the RPCSEC_GSS credential cache
 - Fix an Oops during delegation callbacks
 - Ensure that the NFSv4 state manager exits the loop on SIGKILL
 - Fix a bogus get/put in generic_key_to_expire()
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb7Lh0AAoJEA4mA3inWBJc8uAQAIkrGChs3AFuEQ3G3H9RlxDX
 WFsPghRGmDwXf2sD+nWjl0r60v0v5fQaUhW/7EPe2kbVTF/rnjieXNeFOw33ZMFk
 MDq03nL1/I25DoNK/qg5GZ2NIltZ9oKKbwaN+0LxXKz69X5qIXYnDzYPHDR/PNTg
 Go7PvG8rU31Wd67E2pquwC6zZ6rCPf2BtQjZdzouLAEUWXAMHyJmszpFUxhLMJoz
 k6dZouphj8fkMse3cfKLnGDqbQ2bE6+Yb0B6Hi0p5nShYgZTaQNZ9KxrEJF7J05i
 cxH6IvLEawEMWXYzGEwr1LUDDrpwveuNTt/OroTgOcSsVpZx1DE0sOZkQ4pt/uTe
 c5NzZYKjEOb2DWxoGR2GEDkRasKVBkWvR5MegvyDgyAcXkAjN/6CgYXiniNYDxl6
 qk7sIqkJfug7fv+VW5YHwORKnvRIEDlFcwy5yZ0ij/Qa0dqUR3aczINGLwS6kcfn
 u7M42UR17FUo2zaI9pZhuijwntbtkXMIETWHGRctK7Mum6u37QSVySNCO2A4knBE
 jEy+oYPFCIUqH+ESpNp73otrVt1CTexScIJNsEi1naLmOhjQRW7YjUPEH1Xjg0Ss
 OGyqIjOf6ToF6ma39/XZI9miJe08k6x8b0aGUdG29Cko9UvjLH86ODEausSRAyFA
 OyZFFuHHAau5FGpNvZfj
 =AstN
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

 Stable fixes:

   - Don't exit the NFSv4 state manager without clearing
     NFS4CLNT_MANAGER_RUNNING

  Bugfixes:

   - Fix an Oops when destroying the RPCSEC_GSS credential cache

   - Fix an Oops during delegation callbacks

   - Ensure that the NFSv4 state manager exits the loop on SIGKILL

   - Fix a bogus get/put in generic_key_to_expire()"

* tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix an Oops during delegation callbacks
  SUNRPC: Fix a bogus get/put in generic_key_to_expire()
  SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache
  NFSv4: Ensure that the state manager exits the loop on SIGKILL
  NFSv4: Don't exit the state manager without clearing NFS4CLNT_MANAGER_RUNNING
2018-11-15 10:59:37 -06:00
Xin Long
f8504f4ca0 l2tp: fix a sock refcnt leak in l2tp_tunnel_register
This issue happens when trying to add an existent tunnel. It
doesn't call sock_put() before returning -EEXIST to release
the sock refcnt that was held by calling sock_hold() before
the existence check.

This patch is to fix it by holding the sock after doing the
existence check.

Fixes: f6cd651b056f ("l2tp: fix race in duplicate tunnel detection")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 22:49:31 -08:00
Linus Torvalds
4e4490d438 Three nfsd bugfixes. None are new bugs, but they all take a little
effort to hit, which might explain why they weren't found sooner.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb6zwdAAoJECebzXlCjuG+SLMP/AlpI+vPV7DdCLRWGCY1ZMjk
 5pxIS+74mD2EopBYgZY58L1fxWgv2bLOiAs/baAlNpkjTNX3wlxXGTu9IzVdPOn7
 3n+W2Rb+mXEFaag7mP8RFpOvt7Yb3p4DObGpg7TKWJZ6r/8xcxQWQO+e0iiS5+XK
 EOiaFcGmYlOC1JtrRIL2fr16trXUhT1gz7qAZgKBzebbEdn4FfdsdwHm7nUyRB3I
 LhCMV35RfzOBC2C/kQzlHaHYlo0dx5lKMtVzvtgMdpgXr4QXE/7Ke/ANQ7oGfhhO
 9uX0Uf18HmeGRejK9QoMha7VWuwh5pyHBq0ppMpGL2jb11BD/l9iXgS+vTxpA2B0
 YIiSOnaiDFsEk6hMsFqueVIdaTrarcjg/S2mh2QDjtkXKS3L0W6/7v97JJHu9J4l
 6zxiT6Crq2p8pMZ5gY3RI1AYllW/K+TRoccLhO+q19g3q1HWxP6DyeFBgNF66/ha
 NtmQP+94IkaCS70zirpEu/OeUMviQgX2x77OReyibHLA4+R+hNHwtR67BLl+xG0G
 jmKHfqqX7offFaHmsoD8kK3gpKtit0/py9Hp7gXQg4vU5iL512gI83ICEOEkZMXn
 Ppsrl1HyoO/ohY/USpMvRqYHjM1ZGew19ZzD7SId6vUVaYjQIEsjQVnycK3h+gSb
 otk5pc3bWPCwa8csOWPs
 =+1Ub
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Three nfsd bugfixes.

  None are new bugs, but they all take a little effort to hit, which
  might explain why they weren't found sooner"

* tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer()
  nfsd: COPY and CLONE operations require the saved filehandle to be set
  sunrpc: correct the computation for page_ptr when truncating
2018-11-14 15:31:15 -06:00
Jakub Kicinski
c0b7490b19 net: sched: red: notify drivers about RED's limit parameter
RED qdisc's limit parameter changes the behaviour of the qdisc,
for instance if it's set to 0 qdisc will drop all the packets.

When replace operation happens and parameter is set to non-0
a new fifo qdisc will be instantiated and replace the old child
qdisc which will be destroyed.

Drivers need to know the parameter, even if they don't impose
the actual limit to be able to reliably reconstruct the Qdisc
hierarchy.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
d577a3d279 net: sched: mq: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from MQ.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
bf2a752bea net: sched: red: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from RED.  The drivers
are expected to simply stop offloading the Qdisc, if a non-standard
child is ever grafted onto it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:28 -08:00
Jakub Kicinski
98b0e5f684 net: sched: provide notification for graft on root
Drivers are currently not notified when a Qdisc is grafted as root.
This requires special casing Qdiscs added with parent = TC_H_ROOT in
the driver.  Also there is no notification sent to the driver when
an existing Qdisc is grafted as root.

Add this very simple notifications, drivers should now be able to
track their Qdisc tree fully.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14 08:51:27 -08:00
David S. Miller
11123ab9d9 linux-can-fixes-for-4.20-20181109
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEENrCndlB/VnAEWuH5k9IU1zQoZfEFAlvlt0gTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCT0hTXNChl8bMDB/9ElLCS/uh3CznHeX8w24t/LldHoy0q
 eposGQ6+uWV/R7lUfNNUtIAcoSxzuOyXSMh9skz8NdExdQ0/9osnvNWemKTGrfhm
 ndCVmMd7dMoWX2m1VTJ2jrij3MKPe8HmUei+kB9PrhHFNwofNSOvw2dEVjJDSwUW
 gAvs6K/KrHh5ncd9O3JfaXqc9Cs95o0dz4U4AGZ68UjUemx1AmDse2q3JVPQcxn0
 muXoWWFXBbKob/0qpFG0xP9ssdq75AL58dlEqRV+64EMgqWcgvdoPxGGIBbP4t0x
 zMwE3hCaoC7Uogr28tnQrf4kSm5IC33AiMQDKmBQRtzFLxtCI1wE71M4
 =eM20
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.20-20181109' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2018-11-09

this is a pull request of 20 patches for net/master.

First we have a patch by Oliver Hartkopp which changes the raw socket's
raw_sendmsg() to return an error value if the user tries to send a CANFD
frame to a CAN-2.0 device.

The next two patches are by Jimmy Assarsson and fix potential problems
in the kvaser_usb driver.

YueHaibing's patches for the ucan driver fix a compile time warning and
remove a duplicate include.

Eugeniu Rosca patch adds more binding documentation to the rcar_can
driver bindings. The next two patches are by Fabrizio Castro for the
rcar_can driver and fixes a problem in the driver's probe function and
document the r8a774a1 binding.

Lukas Wunner's patch fixes a recpetion problem in hi311x driver by
switching from edge to level triggered interruts.

The next three patches all target the flexcan driver. Pankaj Bansal's
patch unconditionally unlocks the last mailbox used for RX. Alexander
Stein provides a better workaround for a hardware limitation when
sending RTR frames, by using the last mailbox for TX, resulting in fewer
lost frames. The patch by me simplyfies the driver, by making a runtime
value a compile time constant.

The following 4 patches are by me and provide the groundwork for the
next patches by Oleksij Rempel. To avoid code duplication common code in
the common CAN driver infrastructure is factured out and error handling
is cleaned up.

The next 4 patches are by Oleksij Rempel and fix the problem in the
flexcan driver that other processes see TX frames arrive out of order
with ragards to a RX'ed frame (which are send by a different system on
the CAN bus as the result of our TX frame).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-13 08:43:05 -08:00
Trond Myklebust
e3d5e573a5 SUNRPC: Fix a bogus get/put in generic_key_to_expire()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12 16:39:13 -05:00
Trond Myklebust
a652a4bc21 SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache
Commit 07d02a67b7fa causes a use-after free in the RPCSEC_GSS credential
destroy code, because the call to get_rpccred() in gss_destroying_context()
will now always fail to increment the refcount.

While we could just replace the get_rpccred() with a refcount_set(), that
would have the unfortunate consequence of resurrecting a credential in
the credential cache for which we are in the process of destroying the
RPCSEC_GSS context. Rather than do this, we choose to make a copy that
is never added to the cache and use that to destroy the context.

Fixes: 07d02a67b7fa ("SUNRPC: Simplify lookup code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12 16:39:13 -05:00
Xin Long
6ba8457402 sctp: process sk_reuseport in sctp_get_port_local
When socks' sk_reuseport is set, the same port and address are allowed
to be bound into these socks who have the same uid.

Note that the difference from sk_reuse is that it allows multiple socks
to listen on the same port and address.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00
Xin Long
76c6d988ae sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
This is a part of sk_reuseport support for sctp. It defines a helper
sctp_bind_addrs_check() to check if the bind_addrs in two socks are
matched. It will add sock_reuseport if they are completely matched,
and return err if they are partly matched, and alloc sock_reuseport
if all socks are not matched at all.

It will work until sk_reuseport support is added in
sctp_get_port_local() in the next patch.

v1->v2:
  - use 'laddr->valid && laddr2->valid' check instead as Marcelo
    pointed in sctp_bind_addrs_check().

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00
Xin Long
532ae2f10e sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
This is a part of sk_reuseport support for sctp, and it selects a
sock by the hashkey of lport, paddr and dport by default. It will
work until sk_reuseport support is added in sctp_get_port_local()
in the next patch.

v1->v2:
  - define lport as __be16 instead of __be32 as Marcelo pointed in
    __sctp_rcv_lookup_endpoint().

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-12 09:09:51 -08:00
Linus Lüssing
016fd28568 batman-adv: enable MCAST by default at compile time
Thanks to rigorous testing in wireless community mesh networks several
issues with multicast entries in the translation table were found and
fixed in the last 1.5 years. Now we see the first larger networks
(a few hundred nodes) with a batman-adv version with multicast
optimizations enabled arising, with no TT / multicast optimization
related issues so far.

Therefore it seems safe to enable multicast optimizations by default.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
fb939135a6 batman-adv: Move CRC16 dependency to BATMAN_ADV_BLA
The commit ced72933a5e8 ("batman-adv: use CRC32C instead of CRC16 in TT
code") switched the translation table code from crc16 to crc32c. The
(optional) bridge loop avoidance code is the only user of this function.

batman-adv should only select CRC16 when it is actually using it.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
d2d489b7d8 batman-adv: Add inconsistent multicast netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. The already existing
generation sequence counter from the hash helper can be used for this
simple hash.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
6b7b40aad5 batman-adv: Add inconsistent local TT netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. The already existing
generation sequence counter from the hash helper can be used for this
simple hash.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
6f81652a47 batman-adv: Add inconsistent dat netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. The already existing
generation sequence counter from the hash helper can be used for this
simple hash.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
24d71b9232 batman-adv: Add inconsistent claim netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. The already existing
generation sequence counter from the hash helper can be used for this
simple hash.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
b00d0e6a2c batman-adv: Add inconsistent backbone netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. The already existing
generation sequence counter from the hash helper can be used for this
simple hash.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
05abd7bcc9 batman-adv: Store modification counter via hash helpers
Multiple datastructures use the hash helper functions to add and remove
entries from the simple hlist based hashes. These are often also dumped to
userspace via netlink and thus should have a generation sequence counter.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
fb69be6979 batman-adv: Add inconsistent hardif netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. And an external generation
sequence counter is introduced which tracks all modifications of the list.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
9264c85c8b batman-adv: Add inconsistent gateway netlink dump detection
The netlink dump functionality transfers a large number of entries from the
kernel to userspace. It is rather likely that the transfer has to
interrupted and later continued. During that time, it can happen that
either new entries are added or removed. The userspace could than either
receive some entries multiple times or miss entries.

Commit 670dc2833d14 ("netlink: advertise incomplete dumps") introduced a
mechanism to inform userspace about this problem. Userspace can then decide
whether it is necessary or not to retry dumping the information again.

The netlink dump functions have to be switched to exclusive locks to avoid
changes while the current message is prepared. And an external generation
sequence counter is introduced which tracks all modifications of the list.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:51 +01:00
Sven Eckelmann
694127c1dd batman-adv: Fix description for BATMAN_ADV_DEBUG
The debug messages of batman-adv are not printed to the kernel log at all
but can be stored (depending on the compile setting) in the tracing buffer
or the batadv specific log buffer. There is also no debug module parameter
but a batadv netdev specific log_level setting to enable/disable different
classes of debug messages at runtime.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Sven Eckelmann
0dacc7fab6 batman-adv: Allow to use BATMAN_ADV_DEBUG without BATMAN_ADV_DEBUGFS
The BATMAN_ADV_DEBUGFS portion of batman-adv is marked as deprecated. Thus
all required functionality should be available without it. The debug log
was already modified to also output via the kernel tracing function but
still retained its BATMAN_ADV_DEBUGFS functionality.

Separate the entry point for the debug log from the debugfs portions to
make it possible to build with BATMAN_ADV_DEBUG and without
BATMAN_ADV_DEBUGFS.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Sven Eckelmann
95d8f85c91 batman-adv: Improve includes for trace functionality
The batadv_dbg trace event uses different functionality and datastructures
which are not directly associated with the trace infrastructure. It should
not be expected that the trace headers indirectly provide them and instead
include the required headers directly.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Sven Eckelmann
a5dac4da72 batman-adv: Add includes for deprecation warning
The commit 00caf6a2b318 ("batman-adv: Mark debugfs functionality as
deprecated") introduced various messages to inform the user about the
deprecation of the debugfs based functionality. The messages also include
the context/task in which this problem was observed.

The datastructures and functions to access this information require special
headers. These should be included directly instead of depending on a more
complex and fragile include chain.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Sven Eckelmann
01468225f3 batman-adv: Drop unused lockdep include
The commit dee222c7b20c ("batman-adv: Move OGM rebroadcast stats to
orig_ifinfo") removed all used functionality of the include linux/lockdep.h
from batadv_iv_ogm.c.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Simon Wunderlich
3987b6a4cc batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:50 +01:00
Sven Eckelmann
d7d8bbb40a batman-adv: Expand merged fragment buffer for full packet
The complete size ("total_size") of the fragmented packet is stored in the
fragment header and in the size of the fragment chain. When the fragments
are ready for merge, the skbuff's tail of the first fragment is expanded to
have enough room after the data pointer for at least total_size. This means
that it gets expanded by total_size - first_skb->len.

But this is ignoring the fact that after expanding the buffer, the fragment
header is pulled by from this buffer. Assuming that the tailroom of the
buffer was already 0, the buffer after the data pointer of the skbuff is
now only total_size - len(fragment_header) large. When the merge function
is then processing the remaining fragments, the code to copy the data over
to the merged skbuff will cause an skb_over_panic when it tries to actually
put enough data to fill the total_size bytes of the packet.

The size of the skb_pull must therefore also be taken into account when the
buffer's tailroom is expanded.

Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge")
Reported-by: Martin Weinelt <martin@darmstadt.freifunk.net>
Co-authored-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:29 +01:00
Sven Eckelmann
f4156f9656 batman-adv: Use explicit tvlv padding for ELP packets
The announcement messages of batman-adv COMPAT_VERSION 15 have the
possibility to announce additional information via a dynamic TVLV part.
This part is optional for the ELP packets and currently not parsed by the
Linux implementation. Still out-of-tree versions are using it to transport
things like neighbor hashes to optimize the rebroadcast behavior.

Since the ELP broadcast packets are smaller than the minimal ethernet
packet, it often has to be padded. This is often done (as specified in
RFC894) with octets of zero and thus work perfectly fine with the TVLV
part (making it a zero length and thus empty). But not all ethernet
compatible hardware seems to follow this advice. To avoid ambiguous
situations when parsing the TVLV header, just force the 4 bytes (TVLV
length + padding) after the required ELP header to zero.

Fixes: d6f94d91f766 ("batman-adv: ELP - adding basic infrastructure")
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12 10:41:29 +01:00
David S. Miller
2b9b7502df Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-11 17:57:54 -08:00
Eric Dumazet
48872c11b7 net_sched: sch_fq: add dctcp-like marking
Similar to 80ba92fa1a92 ("codel: add ce_threshold attribute")

After EDT adoption, it became easier to implement DCTCP-like CE marking.

In many cases, queues are not building in the network fabric but on
the hosts themselves.

If packets leaving fq missed their Earliest Departure Time by XXX usec,
we mark them with ECN CE. This gives a feedback (after one RTT) to
the sender to slow down and find better operating mode.

Example :

tc qd replace dev eth0 root fq ce_threshold 2.5ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:59:21 -08:00
Eric Dumazet
c73e5807e4 tcp: tsq: no longer use limit_output_bytes for paced flows
FQ pacing guarantees that paced packets queued by one flow do not
add head-of-line blocking for other flows.

After TCP GSO conversion, increasing limit_output_bytes to 1 MB is safe,
since this maps to 16 skbs at most in qdisc or device queues.
(or slightly more if some drivers lower {gso_max_segs|size})

We still can queue at most 1 ms worth of traffic (this can be scaled
by wifi drivers if they need to)

Tested:

# ethtool -c eth0 | egrep "tx-usecs:|tx-frames:" # 40 Gbit mlx4 NIC
tx-usecs: 16
tx-frames: 16
# tc qdisc replace dev eth0 root fq
# for f in {1..10};do netperf -P0 -H lpaa24,6 -o THROUGHPUT;done

Before patch:
27711
26118
27107
27377
27712
27388
27340
27117
27278
27509

After patch:
37434
36949
36658
36998
37711
37291
37605
36659
36544
37349

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:57:03 -08:00
Eric Dumazet
a682850a11 tcp: get rid of tcp_tso_should_defer() dependency on HZ/jiffies
tcp_tso_should_defer() first heuristic is to not defer
if last send is "old enough".

Its current implementation uses jiffies and its low granularity.

TSO autodefer performance should not rely on kernel HZ :/

After EDT conversion, we have state variables in nanoseconds that
can allow us to properly implement the heuristic.

This patch increases TSO chunk sizes on medium rate flows,
especially when receivers do not use GRO or similar aggregation.

It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP
behavior more uniform.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:54:53 -08:00
Eric Dumazet
f1c6ea3827 tcp: refine tcp_tso_should_defer() after EDT adoption
tcp_tso_should_defer() last step tries to check if the probable
next ACK packet is coming in less than half rtt.

Problem is that the head->tstamp might be in the future,
so we need to use signed arithmetics to avoid overflows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:54:53 -08:00
Eric Dumazet
1c09f7d073 tcp: do not try to defer skbs with eor mark (MSG_EOR)
Applications using MSG_EOR are giving a strong hint to TCP stack :

Subsequent sendmsg() can not append more bytes to skbs having
the EOR mark.

Do not try to TSO defer suchs skbs, there is really no hope.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 13:54:53 -08:00
Yafang Shao
5e13a0d3f5 tcp: minor optimization in tcp ack fast path processing
Bitwise operation is a little faster.
So I replace after() with using the flag FLAG_SND_UNA_ADVANCED as it is
already set before.

In addtion, there's another similar improvement in tcp_cwnd_reduction().

Cc: Joe Perches <joe@perches.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 10:24:18 -08:00
Eric Dumazet
7236ead1b1 act_mirred: clear skb->tstamp on redirect
If sch_fq is used at ingress, skbs that might have been
timestamped by net_timestamp_set() if a packet capture
is requesting timestamps could be delayed by arbitrary
amount of time, since sch_fq time base is MONOTONIC.

Fix this problem by moving code from sch_netem.c to act_mirred.c.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 10:21:31 -08:00
Jon Maloy
7ab412d33b tipc: fix link re-establish failure
When a link failure is detected locally, the link is reset, the flag
link->in_session is set to false, and a RESET_MSG with the 'stopping'
bit set is sent to the peer.

The purpose of this bit is to inform the peer that this endpoint just
is going down, and that the peer should handle the reception of this
particular RESET message as a local failure. This forces the peer to
accept another RESET or ACTIVATE message from this endpoint before it
can re-establish the link. This again is necessary to ensure that
link session numbers are properly exchanged before the link comes up
again.

If a failure is detected locally at the same time at the peer endpoint
this will do the same, which is also a correct behavior.

However, when receiving such messages, the endpoints will not
distinguish between 'stopping' RESETs and ordinary ones when it comes
to updating session numbers. Both endpoints will copy the received
session number and set their 'in_session' flags to true at the
reception, while they are still expecting another RESET from the
peer before they can go ahead and re-establish. This is contradictory,
since, after applying the validation check referred to below, the
'in_session' flag will cause rejection of all such messages, and the
link will never come up again.

We now fix this by not only handling received RESET/STOPPING messages
as a local failure, but also by omitting to set a new session number
and the 'in_session' flag in such cases.

Fixes: 7ea817f4e832 ("tipc: check session number before accepting link protocol messages")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 10:03:38 -08:00
LUU Duc Canh
31c4f4cc32 tipc: improve broadcast retransmission algorithm
Currently, the broadcast retransmission algorithm is using the
'prev_retr' field in struct tipc_link to time stamp the latest broadcast
retransmission occasion. This helps to restrict retransmission of
individual broadcast packets to max once per 10 milliseconds, even
though all other criteria for retransmission are met.

We now move this time stamp to the control block of each individual
packet, and remove other limiting criteria. This simplifies the
retransmission algorithm, and eliminates any risk of logical errors
in selecting which packets can be retransmitted.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 09:57:46 -08:00
John Hurley
7f76fa3675 net: sched: register callbacks for indirect tc block binds
Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.

Introduce indirect block devs which allows drivers to register callbacks
for block binds on other devices. The callback is triggered when the
device is bound to a block, allowing the driver to register for rules
applied to that block using already available functions.

Freeing an indirect block callback will trigger an unbind event (if
necessary) to direct the driver to remove any offloaded rules and unreg
any block rule callbacks. It is the responsibility of the implementing
driver to clean any registered indirect block callbacks before exiting,
if the block it still active at such a time.

Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11 09:54:52 -08:00
David S. Miller
e15e067d06 sctp: Fix SKB list traversal in sctp_intl_store_ordered().
Same change as made to sctp_intl_store_reasm().

To be fully correct, an iterator has an undefined value when something
like skb_queue_walk() naturally terminates.

This will actually matter when SKB queues are converted over to
list_head.

Formalize what this code ends up doing with the current
implementation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-10 19:32:23 -08:00
David S. Miller
348bbc25c4 sctp: Fix SKB list traversal in sctp_intl_store_reasm().
To be fully correct, an iterator has an undefined value when something
like skb_queue_walk() naturally terminates.

This will actually matter when SKB queues are converted over to
list_head.

Formalize what this code ends up doing with the current
implementation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-10 19:28:27 -08:00