62755 Commits

Author SHA1 Message Date
Anna Schumaker
e6ac0accb2 SUNRPC: Add an xdr_align_data() function
For now, this function simply aligns the data at the beginning of the
pages. This can eventually be expanded to shift data to the correct
offsets when we're ready.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:40 -04:00
Anna Schumaker
84ce182ab8 SUNRPC: Add the ability to expand holes in data pages
This patch adds the ability to "read a hole" into a set of XDR data
pages by taking the following steps:

1) Shift all data after the current xdr->p to the right, possibly into
   the tail,
2) Zero the specified range, and
3) Update xdr->p to point beyond the hole.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker
43f0f0816c SUNRPC: Split out _shift_data_right_tail()
xdr_shrink_pagelen() is very similar to what we need for hole expansion,
so split out the common code into its own function that can be used by
both functions.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker
06216ecbd9 SUNRPC: Split out xdr_realign_pages() from xdr_align_pages()
I don't need the entire align pages code for READ_PLUS, so split out the
part I do need so I don't need to reimplement anything.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker
cf1f08cac3 SUNRPC: Implement a xdr_page_pos() function
I'll need this for READ_PLUS to help figure out the offset where page
data is stored at, but it might also be useful for other things.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Anna Schumaker
f7d61ee414 SUNRPC: Split out a function for setting current page
I'm going to need this bit of code in a few places for READ_PLUS
decoding, so let's make it a helper function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-07 14:28:39 -04:00
Vincent Mailhol
eb88531bdb can: raw: add missing error queue support
Error queue are not yet implemented in CAN-raw sockets.

The problem: a userland call to recvmsg(soc, msg, MSG_ERRQUEUE) on a
CAN-raw socket would unqueue messages from the normal queue without
any kind of error or warning. As such, it prevented CAN drivers from
using the functionalities that relies on the error queue such as
skb_tx_timestamp().

SCM_CAN_RAW_ERRQUEUE is defined as the type for the CAN raw error
queue. SCM stands for "Socket control messages". The name is inspired
from SCM_J1939_ERRQUEUE of include/uapi/linux/can/j1939.h.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20200926162527.270030-1-mailhol.vincent@wanadoo.fr
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-06 22:44:27 +02:00
Marc Kleine-Budde
80ede649ea can: af_can: can_rcv_list_find(): fix kernel doc after variable renaming
This patch fixes the kernel doc for can_rcv_list_find() which was broken in commit:

    3ee6d2bebef8 ("can: af_can: rename find_rcv_list() to can_rcv_list_find()")

while renaming a variable, but forgetting to rename the kernel doc, too.

Link: http://lore.kernel.org/r/20201006203748.1750156-2-mkl@pengutronix.de
Fixes: 3ee6d2bebef8 ("can: af_can: rename find_rcv_list() to can_rcv_list_find()")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-06 22:42:07 +02:00
Jakub Kicinski
a0de1cd356 ethtool: specify which header flags are supported per command
Perform header flags validation through the policy.

Only pause command supports ETHTOOL_FLAG_STATS. Create a separate
policy to be able to express that in policy dumps to user space.

Note that even though the core will validate the header policy,
it cannot record multiple layers of attributes and we have to
re-parse header sub-attrs. When doing so we could skip attribute
validation, or use most permissive policy. Opt for the former.

We will no longer return the extack cookie for flags but since
we only added first new flag in this release it's not expected
that any user space had a chance to make use of it.

v2: - remove the re-validation in ethnl_parse_header_dev_get()

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Jakub Kicinski
bdbb4e29df netlink: add mask validation
We don't have good validation policy for existing unsigned int attrs
which serve as flags (for new ones we could use NLA_BITFIELD32).
With increased use of policy dumping having the validation be
expressed as part of the policy is important. Add validation
policy in form of a mask of supported/valid bits.

Support u64 in the uAPI to be future-proof, but really for now
the embedded mask member can only hold 32 bits, so anything with
bit 32+ set will always fail validation.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Jakub Kicinski
329d9c333e ethtool: link up ethnl_header_policy as a nested policy
To get the most out of parsing by the core, and to allow dumping
full policies we need to specify which policy applies to nested
attrs. For headers it's ethnl_header_policy.

$ sed -i 's@\(ETHTOOL_A_.*HEADER\].*=\) { .type = NLA_NESTED },@\1\n\t\tNLA_POLICY_NESTED(ethnl_header_policy),@' net/ethtool/*

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Jakub Kicinski
ff419afa43 ethtool: trim policy tables
Since ethtool uses strict attribute validation there's no need
to initialize all attributes in policy tables. 0 is NLA_UNSPEC
which is going to be rejected. Remove the NLA_REJECTs.

Similarly attributes above maxattrs are rejected, so there's
no need to always size the policy tables to ETHTOOL_A_..._MAX.

v2: - new patch

Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Jakub Kicinski
5028588b62 ethtool: wire up set policies to ops
Similarly to get commands wire up the policies of set commands
to get parsing by the core and policy dumps.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Jakub Kicinski
4f30974feb ethtool: wire up get policies to ops
Wire up policies for get commands in struct nla_policy of the ethtool
family. Make use of genetlink code attr validation and parsing, as well
as allow dumping policies to user space.

For every ETHTOOL_MSG_*_GET:
 - add 'ethnl_' prefix to policy name
 - add extern declaration in net/ethtool/netlink.h
 - wire up the policy & attr in ethtool_genl_ops[].
 - remove .request_policy and .max_attr from ethnl_request_ops.

Obviously core only records the first "layer" of parsed attrs
so we still need to parse the sub-attrs of the nested header
attribute.

v2:
 - merge of patches 1 and 2 from v1
 - remove stray empty lines in ops
 - also remove .max_attr

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:25:55 -07:00
Fabian Frederick
560b50cf6c ipv4: use dev_sw_netstats_rx_add()
use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:23:21 -07:00
Fabian Frederick
e40b3727f9 net: openvswitch: use dev_sw_netstats_rx_add()
use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:23:21 -07:00
Fabian Frederick
c852162ea9 xfrm: use dev_sw_netstats_rx_add()
use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:23:21 -07:00
Fabian Frederick
5711eb0502 ipv6: use dev_sw_netstats_rx_add()
use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:23:21 -07:00
David S. Miller
d91dc434f2 rxrpc fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl97RWEACgkQ+7dXa6fL
 C2sxNBAAhr1dnVfGHAV7mUVAv8BtNwY6B+mczIo48k53oiy0+Ngh83yrcdt2EkmY
 s3JdbWq1rVlCps6zOOefKYfXG8FS2guFVDjKl9SaC6nYmxdEPnRmbW9mlhiFg/Na
 xLnYVcJnuHw2ymisaRkARQn4w6F4CfEYBI9pbRpiw2d7vfD+Rziu49JMqVbTc2mF
 g8tY0KPt81TouPlc//5BrY0dFat06gRbBsYcLmL/x/9aNofWg6F8dse9Evixgl3y
 sY+ZwQkIxipYVyfuS9Z2UVhFTcYSvbTKWgvE08f9AK7iO6Y35hI4HIkZckIepgU0
 rRNZY5AAq6Qb/kbGwIN27GDD/Ef8SqrW5NFdyRQykr8h1DIxGi5BlWRpVcpH1d9x
 JI4fAp9dAcySOtusETrOBMvczz9wxB1HSe0tmrUP3lx0DLA484zdR8M+rQNPcEOK
 M/x83hmIkMnmd3dH/eVNx0OwA35KVQ/eW79QsfDhnG2JVms4jwzqe/QfGpwXl2q9
 SYNrlJZe6HjypNdWwMPZLswKzKe+7v9zKxY69TvsdKmqycQf2hVwsIxRmAr1GHEc
 dQX3ag+LzS8elgqWRZ/NC4y8ojUgO73BhgL1DCrSgvu1UIzMC9bNSxrsdN+d3VSt
 ZKzaFGQ9E9GDGSvfVJt/yRAb7kjQdeXchowWSGg804fPEzlGmds=
 =dmWc
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-fixes-20201005' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Miscellaneous fixes

Here are some miscellaneous rxrpc fixes:

 (1) Fix the xdr encoding of the contents read from an rxrpc key.

 (2) Fix a BUG() for a unsupported encoding type.

 (3) Fix missing _bh lock annotations.

 (4) Fix acceptance handling for an incoming call where the incoming call
     is encrypted.

 (5) The server token keyring isn't network namespaced - it belongs to the
     server, so there's no need.  Namespacing it means that request_key()
     fails to find it.

 (6) Fix a leak of the server keyring.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:18:20 -07:00
Igor Russkikh
c6db31ffe2 ethtool: allow netdev driver to define phy tunables
Define get/set phy tunable callbacks in ethtool ops.
This will allow MAC drivers with integrated PHY still to implement
these tunables.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:16:01 -07:00
Vladimir Oltean
302af7c604 net: always dump full packets with skb_dump
Currently skb_dump has a restriction to only dump full packet for the
first 5 socket buffers, then only headers will be printed. Remove this
arbitrary and confusing restriction, which is only documented vaguely
("up to") in the comments above the prototype.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:14:02 -07:00
Eric Dumazet
86bccd0367 tcp: fix receive window update in tcp_add_backlog()
We got reports from GKE customers flows being reset by netfilter
conntrack unless nf_conntrack_tcp_be_liberal is set to 1.

Traces seemed to suggest ACK packet being dropped by the
packet capture, or more likely that ACK were received in the
wrong order.

 wscale=7, SYN and SYNACK not shown here.

 This ACK allows the sender to send 1871*128 bytes from seq 51359321 :
 New right edge of the window -> 51359321+1871*128=51598809

 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0

 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384

 Receiver now allows to send 606*128=77568 from seq 51521241 :
 New right edge of the window -> 51521241+606*128=51598809

 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0

 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384

 It seems the sender exceeds RWIN allowance, since 51611353 > 51598809

 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040

 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0

 netfilter conntrack is not happy and sends RST

 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0
 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0

 Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet.
 New right edge of the window -> 51521241+859*128=51631193

Normally TCP stack handles OOO packets just fine, but it
turns out tcp_add_backlog() does not. It can update the window
field of the aggregated packet even if the ACK sequence
of the last received packet is too old.

Many thanks to Alexandre Ferrieux for independently reporting the issue
and suggesting a fix.

Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:11:58 -07:00
Paolo Abeni
717f203416 mptcp: don't skip needed ack
Currently we skip calling tcp_cleanup_rbuf() when packets
are moved into the OoO queue or simply dropped. In both
cases we still increment tp->copied_seq, and we should
ask the TCP stack to check for ack.

Fixes: c76c6956566f ("mptcp: call tcp_cleanup_rbuf on subflows")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:08:06 -07:00
Paolo Abeni
017512a07e mptcp: more DATA FIN fixes
Currently data fin on data packet are not handled properly:
the 'rcv_data_fin_seq' field is interpreted as the last
sequence number carrying a valid data, but for data fin
packet with valid maps we currently store map_seq + map_len,
that is, the next value.

The 'write_seq' fields carries instead the value subseguent
to the last valid byte, so in mptcp_write_data_fin() we
never detect correctly the last DSS map.

Fixes: 7279da6145bb ("mptcp: Use MPTCP-level flag for sending DATA_FIN")
Fixes: 1a49b2c2a501 ("mptcp: Handle incoming 32-bit DATA_FIN values")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:06:59 -07:00
Manivannan Sadhasivam
082bb94fe1 net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
The rcu_read_lock() is not supposed to lock the kernel_sendmsg() API
since it has the lock_sock() in qrtr_sendmsg() which will sleep. Hence,
fix it by excluding the locking for kernel_sendmsg().

While at it, let's also use radix_tree_deref_retry() to confirm the
validity of the pointer returned by radix_tree_deref_slot() and use
radix_tree_iter_resume() to resume iterating the tree properly before
releasing the lock as suggested by Doug.

Fixes: a7809ff90ce6 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks")
Reported-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:01:35 -07:00
David S. Miller
8b0308fe31 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 18:40:01 -07:00
Linus Torvalds
165563c050 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Make sure SKB control block is in the proper state during IPSEC
    ESP-in-TCP encapsulation. From Sabrina Dubroca.

 2) Various kinds of attributes were not being cloned properly when we
    build new xfrm_state objects from existing ones. Fix from Antony
    Antony.

 3) Make sure to keep BTF sections, from Tony Ambardar.

 4) TX DMA channels need proper locking in lantiq driver, from Hauke
    Mehrtens.

 5) Honour route MTU during forwarding, always. From Maciej
    Żenczykowski.

 6) Fix races in kTLS which can result in crashes, from Rohit
    Maheshwari.

 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
    Jha.

 8) Use correct address family in xfrm state lookups, from Herbert Xu.

 9) A bridge FDB flush should not clear out user managed fdb entries
    with the ext_learn flag set, from Nikolay Aleksandrov.

10) Fix nested locking of netdev address lists, from Taehee Yoo.

11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.

12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.

13) Don't free command entries in mlx5 while comp handler could still be
    running, from Eran Ben Elisha.

14) Error flow of request_irq() in mlx5 is busted, due to an off by one
    we try to free and IRQ never allocated. From Maor Gottlieb.

15) Fix leak when dumping netlink policies, from Johannes Berg.

16) Sendpage cannot be performed when a page is a slab page, or the page
    count is < 1. Some subsystems such as nvme were doing so. Create a
    "sendpage_ok()" helper and use it as needed, from Coly Li.

17) Don't leak request socket when using syncookes with mptcp, from
    Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net/core: check length before updating Ethertype in skb_mpls_{push,pop}
  net: mvneta: fix double free of txq->buf
  net_sched: check error pointer in tcf_dump_walker()
  net: team: fix memory leak in __team_options_register
  net: typhoon: Fix a typo Typoon --> Typhoon
  net: hinic: fix DEVLINK build errors
  net: stmmac: Modify configuration method of EEE timers
  tcp: fix syn cookied MPTCP request socket leak
  libceph: use sendpage_ok() in ceph_tcp_sendpage()
  scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
  drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
  tcp: use sendpage_ok() to detect misused .sendpage
  nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
  net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
  net: introduce helper sendpage_ok() in include/linux/net.h
  net: usb: pegasus: Proper error handing when setting pegasus' MAC address
  net: core: document two new elements of struct net_device
  netlink: fix policy dump leak
  net/mlx5e: Fix race condition on nhe->n pointer in neigh update
  net/mlx5e: Fix VLAN create flow
  ...
2020-10-05 11:27:14 -07:00
David Howells
38b1dc47a3 rxrpc: Fix server keyring leak
If someone calls setsockopt() twice to set a server key keyring, the first
keyring is leaked.

Fix it to return an error instead if the server key keyring is already set.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 17:09:22 +01:00
David Howells
fea9911124 rxrpc: The server keyring isn't network-namespaced
The keyring containing the server's tokens isn't network-namespaced, so it
shouldn't be looked up with a network namespace.  It is expected to be
owned specifically by the server, so namespacing is unnecessary.

Fixes: a58946c158a0 ("keys: Pass the network namespace into request_key mechanism")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:36:06 +01:00
David Howells
2d914c1bf0 rxrpc: Fix accept on a connection that need securing
When a new incoming call arrives at an userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket.  This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref.  This means that the call runs out of refs too
early and the kernel oopses.

By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.

Change the mode of operation for userspace rxrpc server sockets to work
like this too.  Although this is a UAPI change, server sockets aren't
currently functional.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:35:57 +01:00
David Howells
fa1d113a0f rxrpc: Fix some missing _bh annotations on locking conn->state_lock
conn->state_lock may be taken in softirq mode, but a previous patch
replaced an outer lock in the response-packet event handling code, and lost
the _bh from that when doing so.

Fix this by applying the _bh annotation to the state_lock locking.

Fixes: a1399f8bb033 ("rxrpc: Call channels should have separate call number spaces")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:34:32 +01:00
David Howells
9a059cd5ca rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
If rxrpc_read() (which allows KEYCTL_READ to read a key), sees a token of a
type it doesn't recognise, it can BUG in a couple of places, which is
unnecessary as it can easily get back to userspace.

Fix this to print an error message instead.

Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:33:37 +01:00
Marc Dionne
56305118e0 rxrpc: Fix rxkad token xdr encoding
The session key should be encoded with just the 8 data bytes and
no length; ENCODE_DATA precedes it with a 4 byte length, which
confuses some existing tools that try to parse this format.

Add an ENCODE_BYTES macro that does not include a length, and use
it for the key.  Also adjust the expected length.

Note that commit 774521f353e1d ("rxrpc: Fix an assertion in
rxrpc_read()") had fixed a BUG by changing the length rather than
fixing the encoding.  The original length was correct.

Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:33:28 +01:00
Björn Töpel
b75597d894 xsk: Remove internal DMA headers
Christoph Hellwig correctly pointed out [1] that the AF_XDP core was
pointlessly including internal headers. Let us remove those includes.

  [1] https://lore.kernel.org/bpf/20201005084341.GA3224@infradead.org/

Fixes: 1c1efc2af158 ("xsk: Create and free buffer pool independently from umem")
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20201005090525.116689-1-bjorn.topel@gmail.com
2020-10-05 15:48:00 +02:00
Vladimir Oltean
2e554a7a5d net: dsa: propagate switchdev vlan_filtering prepare phase to drivers
A driver may refuse to enable VLAN filtering for any reason beyond what
the DSA framework cares about, such as:
- having tc-flower rules that rely on the switch being VLAN-aware
- the particular switch does not support VLAN, even if the driver does
  (the DSA framework just checks for the presence of the .port_vlan_add
  and .port_vlan_del pointers)
- simply not supporting this configuration to be toggled at runtime

Currently, when a driver rejects a configuration it cannot support, it
does this from the commit phase, which triggers various warnings in
switchdev.

So propagate the prepare phase to drivers, to give them the ability to
refuse invalid configurations cleanly and avoid the warnings.

Since we need to modify all function prototypes and check for the
prepare phase from within the drivers, take that opportunity and move
the existing driver restrictions within the prepare phase where that is
possible and easy.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Landen Chao <Landen.Chao@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 05:56:48 -07:00
Rikard Falkeborn
b980b313e5 net: openvswitch: Constify static struct genl_small_ops
The only usage of these is to assign their address to the small_ops field
in the genl_family struct, which is a const pointer, and applying
ARRAY_SIZE() on them. Make them const to allow the compiler to put them
in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 21:13:36 -07:00
Rikard Falkeborn
674d3ab949 mptcp: Constify mptcp_pm_ops
The only usages of mptcp_pm_ops is to assign its address to the small_ops
field of the genl_family struct, which is a const pointer, and applying
ARRAY_SIZE() on it. Make it const to allow the compiler to put it in
read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 21:13:36 -07:00
Guillaume Nault
4296adc3e3 net/core: check length before updating Ethertype in skb_mpls_{push,pop}
Openvswitch allows to drop a packet's Ethernet header, therefore
skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
doesn't point to an Ethernet header and the new Ethertype is written at
unexpected locations.

Fix this by verifying that mac_len is big enough to contain an Ethernet
header.

Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 15:09:26 -07:00
Cong Wang
580e4273d7 net_sched: check error pointer in tcf_dump_walker()
Although we take RTNL on dump path, it is possible to
skip RTNL on insertion path. So the following race condition
is possible:

rtnl_lock()		// no rtnl lock
			mutex_lock(&idrinfo->lock);
			// insert ERR_PTR(-EBUSY)
			mutex_unlock(&idrinfo->lock);
tc_dump_action()
rtnl_unlock()

So we have to skip those temporary -EBUSY entries on dump path
too.

Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:53:06 -07:00
Andrew Lunn
08156ba430 net: dsa: Add devlink port regions support to DSA
Allow DSA drivers to make use of devlink port regions, via simple
wrappers.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:53 -07:00
Andrew Lunn
544e7c33ec net: devlink: Add support for port regions
Allow regions to be registered to a devlink port. The same netlink API
is used, but the port index is provided to indicate when a region is a
port region as opposed to a device region.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:53 -07:00
Andrew Lunn
3122433eb5 net: dsa: Register devlink ports before calling DSA driver setup()
DSA drivers want to create regions on devlink ports as well as the
devlink device instance, in order to export registers and other tables
per port. To keep all this code together in the drivers, have the
devlink ports registered early, so the setup() method can setup both
device and port devlink regions.

v3:
Remove dp->setup
Move common code out of switch statement.
Fix wrong goto

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:53 -07:00
Andrew Lunn
f15ec13a96 net: dsa: Make use of devlink port flavour unused
If a port is unused, still create a devlink port for it, but set the
flavour to unused. This allows us to attach devlink regions to the
port, etc.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:52 -07:00
Andrew Lunn
cf1166349c net: devlink: Add unused port flavour
Not all ports of a switch need to be used, particularly in embedded
systems. Add a port flavour for ports which physically exist in the
switch, but are not connected to the front panel etc, and so are
unused. By having unused ports present in devlink, it gives a more
accurate representation of the hardware. It also allows regions to be
associated to such ports, so allowing, for example, to determine
unused ports are correctly powered off, or to compare probable reset
defaults of unused ports to used ports experiences issues.

Actually registering unused ports and setting the flavour to unused is
optional. The DSA core will register all such switch ports, but such
ports are expected to be limited in number. Bigger ASICs may decide
not to list unused ports.

v2:
Expand the description about why it is useful

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:38:52 -07:00
David S. Miller
321e921daa Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Rename 'searched' column to 'clashres' in conntrack /proc/ stats
   to amend a recent patch, from Florian Westphal.

2) Remove unused nft_data_debug(), from YueHaibing.

3) Remove unused definitions in IPVS, also from YueHaibing.

4) Fix user data memleak in tables and objects, this is also amending
   a recent patch, from Jose M. Guisado.

5) Use nla_memdup() to allocate user data in table and objects, also
   from Jose M. Guisado

6) User data support for chains, from Jose M. Guisado

7) Remove unused definition in nf_tables_offload, from YueHaibing.

8) Use kvzalloc() in ip_set_alloc(), from Vasily Averin.

9) Fix false positive reported by lockdep in nfnetlink mutexes,
   from Florian Westphal.

10) Extend fast variant of cmp for neq operation, from Phil Sutter.

11) Implement fast bitwise variant, also from Phil Sutter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 14:35:53 -07:00
Phil Sutter
10fdd6d80e netfilter: nf_tables: Implement fast bitwise expression
A typical use of bitwise expression is to mask out parts of an IP
address when matching on the network part only. Optimize for this common
use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
on 32bit-sized values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:33 +02:00
Phil Sutter
5f48846daf netfilter: nf_tables: Enable fast nft_cmp for inverted matches
Add a boolean indicating NFT_CMP_NEQ. To include it into the match
decision, it is sufficient to XOR it with the data comparison's result.

While being at it, store the mask that is calculated during expression
init and free the eval routine from having to recalculate it each time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:32 +02:00
Florian Westphal
ab6c41eefd netfilter: nfnetlink: place subsys mutexes in distinct lockdep classes
From time to time there are lockdep reports similar to this one:

 WARNING: possible circular locking dependency detected
 ------------------------------------------------------
 000000004f61aa56 (&table[i].mutex){+.+.}, at: nfnl_lock [nfnetlink]
 but task is already holding lock:
 [..] (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid [nf_tables]
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #1 (&net->nft.commit_mutex){+.+.}:
 [..]
        nf_tables_valid_genid+0x18/0x60 [nf_tables]
        nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
        nfnetlink_rcv+0x110/0x140 [nfnetlink]
        netlink_unicast+0x12c/0x1e0
 [..]
        sys_sendmsg+0x18/0x40
        linux_sparc_syscall+0x34/0x44
 -> #0 (&table[i].mutex){+.+.}:
 [..]
        nfnl_lock+0x24/0x40 [nfnetlink]
        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
        set_match_v1_checkentry+0x14/0xc0 [xt_set]
        xt_check_match+0x238/0x260 [x_tables]
        __nft_match_init+0x160/0x180 [nft_compat]
 [..]
        sys_sendmsg+0x18/0x40
        linux_sparc_syscall+0x34/0x44
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&net->nft.commit_mutex);
                                lock(&table[i].mutex);
                                lock(&net->nft.commit_mutex);
   lock(&table[i].mutex);

Lockdep considers this an ABBA deadlock because the different nfnl subsys
mutexes reside in the same lockdep class, but this is a false positive.

CPU1 table[i] refers to the nftables subsys mutex, whereas CPU1 locks
the ipset subsys mutex.

Yi Che reported a similar lockdep splat, this time between ipset and
ctnetlink subsys mutexes.

Time to place them in distinct classes to avoid these warnings.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:32 +02:00
Vasily Averin
9446ab34ac netfilter: ipset: enable memory accounting for ipset allocations
Currently netadmin inside non-trusted container can quickly allocate
whole node's memory via request of huge ipset hashtable.
Other ipset-related memory allocations should be restricted too.

v2: fixed typo ALLOC -> ACCOUNT

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:25 +02:00
YueHaibing
82ec6630f9 netfilter: nf_tables_offload: Remove unused macro FLOW_SETUP_BLOCK
commit 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call")
left behind this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:07:55 +02:00