1185890 Commits

Author SHA1 Message Date
Russell King (Oracle)
dc7a51411e net: phylink: remove duplicated linkmode pause resolution
Phylink had two chunks of code virtually the same for resolving the
negotiated pause modes. Factor this down to one function.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
Russell King (Oracle)
e9261467ae net: mdio: add clause 73 to ethtool conversion helper
Add a helper to convert a clause 73 advertisement to an ethtool bitmap.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 09:13:22 -07:00
David S. Miller
41a45ea49d Merge branch 'devlink-port_del-new-cleanup'
Jiri Pirko says:

====================
devlink: small port_new/del() cleanup

This patchset cleans up couple of leftovers after recent devlink locking
changes. Previously, both port_new/dev() commands were called without
holding instance lock. Currently all devlink commands are called with
instance lock held.

The first patch just removes redundant port notification.
The second one removes couple of outdated comments.
The last patch changes port_dev() to have devlink_port pointer as an arg
instead of port_index, which makes it similar to the rest of port
related ops.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
Jiri Pirko
9277649c66 devlink: pass devlink_port pointer to ops->port_del() instead of index
Historically there was a reason why port_dev() along with for example
port_split() did get port_index instead of the devlink_port pointer.
With the locking changes that were done which ensured devlink instance
mutex is hold for every command, the port ops could get devlink_port
pointer directly. Change the forgotten port_dev() op to be as others
and pass devlink_port pointer instead of port_index.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
Jiri Pirko
1bb1b57898 devlink: remove no longer true locking comment from port_new/del()
All commands are called holding instance lock. Remove the outdated
comment that says otherwise.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
Jiri Pirko
c496daeb86 devlink: remove duplicate port notification
The notification about created port is send from devl_port_register()
function called from ops->port_new(). No need to send it again here,
so remove the call and the helper function.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 10:34:26 +01:00
David S. Miller
47469d2d59 Merge branch 'tools-ynl-byteorder'
Donald Hunter says:

====================
tools: ynl: Add byte-order support for struct members

This patchset adds support to ynl for handling byte-order in struct
members. The first patch is a refactor to use predefined Struct() objects
instead of generating byte-order specific formats on the fly. The second
patch adds byte-order handling for struct members.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:46:54 +01:00
Donald Hunter
bddd2e561b tools: ynl: Handle byte-order in struct members
Add support for byte-order in struct members in the genetlink-legacy
spec.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:46:54 +01:00
Donald Hunter
7c2435ef76 tools: ynl: Use dict of predefined Structs to decode scalar types
Use a dict of predefined Struct() objects to decode scalar types in native,
big or little endian format. This removes the repetitive code for the
scalar variants and ensures all the signed variants are supported.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:46:54 +01:00
Russell King (Oracle)
59088b5a94 net: phy: avoid kernel warning dump when stopping an errored PHY
When taking a network interface down (or removing a SFP module) after
the PHY has encountered an error, phy_stop() complains incorrectly
that it was called from HALTED state.

The reason this is incorrect is that the network driver will have
called phy_start() when the interface was brought up, and the fact
that the PHY has a problem bears no relationship to the administrative
state of the interface. Taking the interface administratively down
(which calls phy_stop()) is always the right thing to do after a
successful phy_start() call, whether or not the PHY has encountered
an error.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:28:27 +01:00
David S. Miller
18731fe01d Merge branch 'RTO_ONLINK'
Guillaume Nault says:

====================
ipv4: Remove RTO_ONLINK from udp, ping and raw sockets.

udp_sendmsg(), ping_v4_sendmsg() and raw_sendmsg() use similar patterns
for restricting their route lookup to on-link hosts. Although they use
slightly different code, they all use RTO_ONLINK to override the least
significant bit of their tos value.

RTO_ONLINK is used to restrict the route scope even when the scope is
set to RT_SCOPE_UNIVERSE. Therefore it isn't necessary: we can properly
set the scope to RT_SCOPE_LINK instead.

Removing RTO_ONLINK will allow to convert .flowi4_tos to dscp_t in the
future, thus allowing to properly separate the DSCP from the ECN bits
in the networking stack.

This patch series defines a common helper to figure out what's the
scope of the route lookup. This unifies the way udp, ping and raw
sockets get their routing scope and removes their dependency on
RTO_ONLINK.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:22:06 +01:00
Guillaume Nault
0e26371db5 udp: Stop using RTO_ONLINK.
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.

Now that the scope is determined by ip_sendmsg_scope(), we need to
check its result to set the 'connected' variable.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:22:06 +01:00
Guillaume Nault
c85be08fc4 raw: Stop using RTO_ONLINK.
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.

The MSG_DONTROUTE and SOCK_LOCALROUTE cases were already handled by
raw_sendmsg() (SOCK_LOCALROUTE was handled by the RT_CONN_FLAGS*()
macros called by get_rtconn_flags()). However, opt.is_strictroute
wasn't taken into account. Therefore, a side effect of this patch is to
now honour opt.is_strictroute, and thus align raw_sendmsg() with
ping_v4_sendmsg() and udp_sendmsg().

Since raw_sendmsg() was the only user of get_rtconn_flags(), we can now
remove this function.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:22:06 +01:00
Guillaume Nault
726de790f6 ping: Stop using RTO_ONLINK.
Define a new helper to figure out the correct route scope to use on TX,
depending on socket configuration, ancillary data and send flags.

Use this new helper to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag.

The objective is to eventually remove RTO_ONLINK, which will allow
converting .flowi4_tos to dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24 08:22:06 +01:00
Coco Li
a695641c8e gve: Support IPv6 Big TCP on DQ
Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve the
throughput and CPU usage.

Perf test result:
ip -d link show $DEV
gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000

For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.

In single streams when line rate is not saturated, we expect throughput improvements.
When the networking is performing at line rate, we expect cpu usage improvements.

Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb,  T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy:  throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb,  T=1, F=1, yes zerocopy:  throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25

skb=180kb, T=20, F=100, no zerocopy:  throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb,  T=20, F=100, no zerocopy:  throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb,  T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 21:11:35 -07:00
Jakub Kicinski
51c78a4d53 Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'
David Howells says:

====================
splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1

Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.

This will allow splice to pass multiple pages in a single call and allow
certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
message in one go rather than having to send them piecemeal.  This should
also make it easier to handle the splicing of multipage folios.

A helper, skb_splice_from_iter() is provided to do the work of splicing or
copying data from an iterator.  If a page is determined to be unspliceable
(such as being in the slab), then the helper will give an error.

Note that this facility is not made available to userspace and does not
provide any sort of callback.

This set consists of the following parts:

 (1) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being
     able to set it.

 (2) Add an extra argument to skb_append_pagefrags() so that something
     other than MAX_SKB_FRAGS can be used (sysctl_max_skb_frags for
     example).

 (3) Add the skb_splice_from_iter() helper to handle splicing pages into
     skbuffs for MSG_SPLICE_PAGES that can be shared by TCP, IP/UDP and
     AF_UNIX.

 (4) Implement MSG_SPLICE_PAGES support in TCP.

 (5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it in to its
     various callers.

 (6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just
     a wrapper around sendmsg().

 (7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.

 (8) Implement MSG_SPLICE_PAGES support in AF_UNIX.

 (9) Make AF_UNIX copy unspliceable pages.

Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/ # v3
Link: https://lore.kernel.org/r/20230405165339.3468808-1-dhowells@redhat.com/ # v4
Link: https://lore.kernel.org/r/20230406094245.3633290-1-dhowells@redhat.com/ # v5
Link: https://lore.kernel.org/r/20230411160902.4134381-1-dhowells@redhat.com/ # v6
Link: https://lore.kernel.org/r/20230515093345.396978-1-dhowells@redhat.com/ # v7
Link: https://lore.kernel.org/r/20230518113453.1350757-1-dhowells@redhat.com/ # v8
====================

Link: https://lore.kernel.org/r/20230522121125.2595254-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:53 -07:00
David Howells
57d44a354a unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES
Convert unix_stream_sendpage() to use sendmsg() with MSG_SPLICE_PAGES
rather than directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:28 -07:00
David Howells
a0dbf5f818 af_unix: Support MSG_SPLICE_PAGES
Make AF_UNIX sendmsg() support MSG_SPLICE_PAGES, splicing in pages from the
source iterator if possible and copying the data in otherwise.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
c49cf26632 ip: Remove ip_append_page()
ip_append_page() is no longer used with the removal of udp_sendpage(), so
remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
7ac7c98785 udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
Convert udp_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
6d8192bd69 ip6, udp6: Support MSG_SPLICE_PAGES
Make IP6/UDP6 sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible, copying the data if not.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
7da0dde684 ip, udp: Support MSG_SPLICE_PAGES
Make IP/UDP sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
5367f9bbb8 tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
c2ff29e99a siw: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
e117dcfd64 tls: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
7f8816ab4b espintcp: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
ebf2e8860e tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it.  This is part of replacing ->sendpage() with a call to
sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
c5c37af6ec tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.  do_tcp_sendpages() can then be
inlined in subsequent patches into its callers.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
270a1c3de4 tcp: Support MSG_SPLICE_PAGES
Make TCP's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced or copied (if it cannot be spliced) from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
2e910b9532 net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
Add a function to handle MSG_SPLICE_PAGES being passed internally to
sendmsg().  Pages are spliced into the given socket buffer if possible and
copied in if not (e.g. they're slab pages or have a zero refcount).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
96449f9024 net: Pass max frags into skb_append_pagefrags()
Pass the maximum number of fragments into skb_append_pagefrags() rather
than using MAX_SKB_FRAGS so that it can be used from code that wants to
specify sysctl_max_skb_frags.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
David Howells
b841b901c4 net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can.  This flag is added to a list that
is cleared by sendmsg syscalls on entry.

This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:48:27 -07:00
Jaco Coetzee
57910a47ff nfp: add L4 RSS hashing on UDP traffic
Add layer 4 RSS hashing on UDP traffic to allow for the
utilization of multiple queues for multiple connections on
the same IP address.

Previously, since the introduction of the driver, RSS hashing
was only performed on the source and destination IP addresses
of UDP packets thereby limiting UDP traffic to a single queue
for multiple connections on the same IP address. The transport
layer is now included in RSS hashing for UDP traffic, which
was not previously the case. The reason behind the previous
limitation is unclear - either a historic limitation of the
NFP device, or an oversight.

Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20230522141335.22536-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:36:43 -07:00
Josua Mayer
ac2e8e3cfe net: sfp: add support for HXSX-ATRI-1 copper SFP+ module
Walsun offers commercial ("C") and industrial ("I") variants of
multi-rate copper SFP+ modules.

Add quirk for HXSX-ATRI-1 using same parameters as the already supported
commercial variant HXSX-ATRC-1.

Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230522145242.30192-2-josua@solid-run.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 20:34:22 -07:00
Ratheesh Kannoth
b2e3406a38 octeontx2-pf: Add support for page pool
Page pool for each rx queue enhance rx side performance
by reclaiming buffers back to each queue specific pool. DMA
mapping is done only for first allocation of buffers.
As subsequent buffers allocation avoid DMA mapping,
it results in performance improvement.

Image        |  Performance
------------ | ------------
Vannila      |   3Mpps
             |
with this    |   42Mpps
change	     |
---------------------------

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://lore.kernel.org/r/20230522020404.152020-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23 10:47:50 +02:00
Jakub Kicinski
62a41dc716 mlx5-updates-2023-05-19
mlx5 misc changes and code clean up:
 
 The following series contains general changes for improving
 E-Switch driver behavior.
 
 1) improving condition checking
 2) Code clean up
 3) Using metadata matching on send-to-vport rules.
 4) Using RoCE v2 instead of v1 for loopback rules.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmRntucACgkQSD+KveBX
 +j4Bigf+O2dpTClTifJ/5PbUi+dGuC7wEZcgwi75BZnoQVirxj8yizaPMgBb+daw
 nW9YGV820eDB51ES7kiz6i9pYRO/LApbXGKvPmNaQw5OHqF7oQqT/dY4RJz2gyBN
 QnTbZ8R8quG8KXa7ftt58ZeNAUxaPe+XxYR/TmSYl6IDUNtqy0pSu3llx2jxQfZS
 ZUp5Vnd2wRwfIsOhI29eGZAzN0ND8NehdCw3tiH3nAcUP9Jv+W4KgYFJj9mkAjbv
 m4f4R3orbMN+4eExeaHygDY2wJ+75YP3hAhA4Wdbm3RfLf3yCMVAIDFomaCnypXd
 ZNeVaijnOrePqJ81nOjRizKJd+vc5Q==
 =YKxP
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-05-19

mlx5 misc changes and code clean up:

The following series contains general changes for improving
E-Switch driver behavior.

1) improving condition checking
2) Code clean up
3) Using metadata matching on send-to-vport rules.
4) Using RoCE v2 instead of v1 for loopback rules.

* tag 'mlx5-updates-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: E-Switch, Initialize E-Switch for eswitch manager
  net/mlx5: devlink, Only show PF related devlink warning when needed
  net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule
  net/mlx5: E-Switch, Use RoCE version 2 for loopback traffic
  net/mlx5e: E-Switch, Add a check that log_max_l2_table is valid
  net/mlx5e: E-Switch: move debug print of adding mac to correct place
  net/mlx5e: E-Switch, Check device is PF when stopping esw offloads
  net/mlx5: Remove redundant vport_group_manager cap check
  net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules
  net/mlx5e: E-Switch, Allow get vport api if esw exists
  net/mlx5e: E-Switch, Update when to set other vport context
  net/mlx5e: Remove redundant __func__ arg from fs_err() calls
  net/mlx5e: E-Switch, Remove flow_source check for metadata matching
  net/mlx5: E-Switch, Remove redundant check
  net/mlx5: Remove redundant esw multiport validate function
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230519175557.15683-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-22 19:12:31 -07:00
Russell King (Oracle)
de5c9bf40c net: phylink: require supported_interfaces to be filled
We have been requiring the supported_interfaces bitmap to be filled in
by MAC drivers that have a mac_select_pcs() method. Now that all MAC
drivers fill in the supported_interfaces bitmap, it is time to enforce
this. We have already required supported_interfaces to be set in order
for optical SFPs to be configured in commit f81fa96d8a6c ("net: phylink:
use phy_interface_t bitmaps for optical modules").

Refuse phylink creation if supported_interfaces is empty, and remove
code to deal with cases where this mask is empty.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1q0K1u-006EIP-ET@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-22 19:07:42 -07:00
Russell King (Oracle)
5859a99b52 net: sfp: add support for a couple of copper multi-rate modules
Add support for the Fiberstore SFP-10G-T and Walsun HXSX-ATRC-1
modules. Internally, the PCB silkscreen has what seems to be a part
number of WT_502. Fiberstore use v2.2 whereas Walsun use v2.6.

These modules contain a Marvell AQrate AQR113C PHY, accessible through
I2C 0x51 using the "rollball" protocol. In both cases, the PHY is
programmed to use 10GBASE-R with pause-mode rate adaption.

Unlike the other rollball modules, these only need a four second delay
before we can talk to the PHY.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1q0JfS-006Dqc-8t@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-22 19:07:29 -07:00
David S. Miller
d49b9b0772 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ice: allow matching on meta data

Michal Swiatkowski says:

This patchset is intended to improve the usability of the switchdev
slow path. Without matching on a meta data values slow path works
based on VF's MAC addresses. It causes a problem when the VF wants
to use more than one MAC address (e.g. when it is in trusted mode).

Parse all meta data in the same place where protocol type fields are
parsed. Add description for the currently implemented meta data. It is
important to note that depending on DDP not all described meta data can
be available. Using not available meta data leads to error returned by
function which is looking for correct words in profiles read from DDP.

There is also one small improvement, remove of rx field in rule info
structure (patch 2). It is redundant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 12:44:44 +01:00
Uwe Kleine-König
efc3001f8b nfc: Switch i2c drivers back to use .probe()
After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498b5 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:41:57 +01:00
Pavel Begunkov
fe79bd65c8 net/tcp: refactor tcp_inet6_sk()
Don't keep hand coded offset caluclations and replace it with
container_of(). It should be type safer and a bit less confusing.

It also makes it with a macro instead of inline function to preserve
constness, which was previously casted out like in case of
tcp_v6_send_synack().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:22:58 +01:00
Russell King
4b159f5048 net: phy: add helpers for comparing phy IDs
There are several places which open code comparing PHY IDs. Provide a
couple of helpers to assist with this, using a slightly simpler test
than the original:

- phy_id_compare() compares two arbitary PHY IDs and a mask of the
  significant bits in the ID.
- phydev_id_compare() compares the bound phydev with the specified
  PHY ID, using the bound driver's mask.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:21:48 +01:00
Russell King (Oracle)
8b6b7c1190 net: altera: tse: remove mac_an_restart() function
The mac_an_restart() method will only be called if the driver sets
legacy_pre_march2020, which the altera tse driver does not do.
Therefore, providing a stub is unnecessary.

Fixes: fef2998203e1 ("net: altera: tse: convert to phylink")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:20:35 +01:00
Arnd Bergmann
dbb99d7852 net: ipconfig: move ic_nameservers_fallback into #ifdef block
The new variable is only used when IPCONFIG_BOOTP is defined and otherwise
causes a warning:

net/ipv4/ipconfig.c:177:12: error: 'ic_nameservers_fallback' defined but not used [-Werror=unused-variable]

Move it next to the user.

Fixes: 81ac2722fa19 ("net: ipconfig: Allow DNS to be overwritten by DHCPACK")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:17:55 +01:00
Wei Fang
2ae9c66b04 net: fec: remove useless fec_enet_reset_skb()
This patch is a cleanup for fec driver. The fec_enet_reset_skb()
is used to free skb buffers for tx queues and is only invoked in
fec_restart(). However, fec_enet_bd_init() also resets skb buffers
and is invoked in fec_restart() too. So fec_enet_reset_skb() is
redundant and useless.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:10:49 +01:00
Wei Fang
e4ac7cc6e5 net: fec: turn on XDP features
The XDP features are supported since the commit 66c0e13ad236
("drivers: net: turn on XDP features"). Currently, the fec
driver supports NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT
and NETDEV_XDP_ACT_NDO_XMIT. So turn on these XDP features
for fec driver.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-22 11:09:53 +01:00
Jakub Kicinski
dcbe4ea198 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-05-18 (igc, igb, e1000e)

This series contains updates to igc, igb, and e1000e drivers.

Kurt Kanzenbach adds calls to txq_trans_cond_update() for XDP transmit
on igc.

Tom Rix makes definition of igb_pm_ops conditional on CONFIG_PM for igb.

Baozhu Ni adds a missing kdoc description on e1000e.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000e: Add @adapter description to kdoc
  igb: Define igb_pm_ops conditionally on CONFIG_PM
  igc: Avoid transmit queue timeout for XDP
====================

Link: https://lore.kernel.org/r/20230518170942.418109-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-19 21:52:18 -07:00
Roi Dayan
f5d87b47a1 net/mlx5e: E-Switch, Initialize E-Switch for eswitch manager
Initialize eswitch instance for a function which is eswitch manager
but not a vport group manager.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19 10:50:31 -07:00
Roi Dayan
0279b5454c net/mlx5: devlink, Only show PF related devlink warning when needed
Limit the PF related warning to show if device is actually a PF.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19 10:50:31 -07:00
Roi Dayan
7eb197fd83 net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule
Use metadata matching for RoCE loopback rule if device is configured
to use metadata for source port matching.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19 10:50:31 -07:00