998831 Commits

Author SHA1 Message Date
Chris Mi
f94d6389f6 net/mlx5e: TC, Add support to offload sample action
The following diagram illustrates the hardware model for tc sample action:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:05 -07:00
Chris Mi
be9dc00474 net/mlx5e: TC, Handle sampled packets
Mark the sampled packets with a sample restore object. Send sampled
packets using the psample api.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi
7319a1cc3c net/mlx5e: TC, Refactor tc update skb function
As a pre-step to process sampled packet in this function.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi
36a3196256 net/mlx5e: TC, Add sampler restore handle API
Use common object pool to create an object ID to map sample parameters.
Allocate a modify header action to write the object ID to reg_c0 lower
16 bits. Create a restore rule to pass the object ID to software. So
software can identify sampled packets via the object ID and send it to
userspace.

Aggregate the modify header action, restore rule and object ID to a
sample restore handle. Re-use identical sample restore handle for
the same object ID.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi
11ecd6c60b net/mlx5e: TC, Add sampler object API
In order to offload sample action, HW introduces sampler object. The
sampler object samples packets according to the provided sample ratio.
Sampled packets are duplicated. One copy is processed by a termination
table, named the sample table, which sends the packet up to software.
The second copy is processed by the default table.

Instantiate sampler object. Re-use identical sampler object for
the same sample ratio, sample table and default table as a prestep for
offloading tc sample actions.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi
2a9ab10a56 net/mlx5e: TC, Add sampler termination table API
Sampled packets are sent to software using termination tables. There
is only one rule in that table that is to forward sampled packets to
the e-switch management vport.

Create a sampler termination table and rule for each eswitch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi
41c2fd9498 net/mlx5e: TC, Parse sample action
Parse TC sample action and save sample parameters in flow attribute
data structure.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi
c935568271 net/mlx5: Instantiate separate mapping objects for FDB and NIC tables
Currently, the u32 chain id is mapped to u16 value which is stored on
the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The
mapping is internally maintained by the chains object. However, with
the introduction of reg_c0 objects the fdb may store more than just
the chain id on reg_c0. This is not relevant for NIC tables.

Separate the chains mapping instantiation for FDB and NIC tables.
Remove the mapping from the chains object. For FDB tables, create
the mapping per eswitch. For NIC tables, create the mapping per tc
table. Pass the corresponding mapping pointer when creating the
chains object.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi
a91d98a0a2 net/mlx5: Map register values to restore objects
Currently reg_c0 lower 16 bits and reg_b are used to store the chain
id that missed in FDB and NIC tables accordingly. However, the
registers' values may index a restore object, rather than a single u32
value. Different object types can be used to restore mutually exclusive
contexts such as chain id and sample group id.

Use the mapping object to associate an index with a restore object
as a prestep for supporting additional restore types.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi
c1904360dd net/mlx5: E-switch, Set per vport table default group number
Different per voprt table is created using a different per vport table
namespace. Because we can't use variable to set the namespace member
value.  If max group number is 0 in the namespace, use the eswitch
default max group number.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi
c796bb7cd2 net/mlx5: E-switch, Generalize per vport table API
Currently, per vport table was used only for port mirroring actions.
However, sample action will also require a per vport table instance.

Generalize the vport table API to work with multiple namespaces where
each namespace manages its own vport table instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Chris Mi
0a9e230787 net/mlx5: E-switch, Rename functions to follow naming convention.
Public api starts with mlx5 and remove mlx5 for non-public api.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Chris Mi
4c7f40287a net/mlx5: E-switch, Move vport table functions to a new file
Currently, the vport table functions are in common eswitch offload
file. This file is too big. Move the vport table create, delete and
lookup functions to a separate file. Put the file in esw directory.

Pre-step for generalizing its functionality for serving both the
mirroring and the sample features.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Jakub Kicinski
0b35e0deb5 docs: ethtool: correct quotes
Quotes to backticks. All commands use backticks since the names
are constants.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:56:58 -07:00
Peng Zhang
631a44ed25 nfp: flower: add support for packet-per-second policing
Allow hardware offload of a policer action attached to a matchall filter
which enforces a packets-per-second rate-limit.

e.g.
tc filter add dev tap1 parent ffff: u32 match \
        u32 0 0 police pkts_rate 3000 pkts_burst 1000

Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:47:46 -07:00
David S. Miller
5106efe6ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter/IPVS updates for your net-next tree:

1) Simplify log infrastructure modularity: Merge ipv4, ipv6, bridge,
   netdev and ARP families to nf_log_syslog.c. Add module softdeps.
   This fixes a rare deadlock condition that might occur when log
   module autoload is required. From Florian Westphal.

2) Moves part of netfilter related pernet data from struct net to
   net_generic() infrastructure. All of these users can be modules,
   so if they are not loaded there is no need to waste space. Size
   reduction is 7 cachelines on x86_64, also from Florian.

2) Update nftables audit support to report events once per table,
   to get it aligned with iptables. From Richard Guy Briggs.

3) Check for stale routes from the flowtable garbage collector path.
   This is fixing IPv6 which breaks due missing check for the dst_cookie.

4) Add a nfnl_fill_hdr() function to simplify netlink + nfnetlink
   headers setup.

5) Remove documentation on several statified functions.

6) Remove printk on netns creation for the FTP IPVS tracker,
   from Florian Westphal.

7) Remove unnecessary nf_tables_destroy_list_lock spinlock
   initialization, from Yang Yingliang.

7) Remove a duplicated forward declaration in ipset,
   from Wan Jiabing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:36:41 -07:00
Andy Shevchenko
a460513ed4 time64.h: Consolidated PSEC_PER_SEC definition
We have currently three users of the PSEC_PER_SEC each of them defining it
individually. Instead, move it to time64.h to be available for everyone.

There is a new user coming with the same constant in use. It will also
make its life easier.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:32:17 -07:00
Andy Shevchenko
3036ec035c stmmac: intel: Drop duplicate ID in the list of PCI device IDs
The PCI device IDs are defined with a prefix PCI_DEVICE_ID.
There is no need to repeat the ID part at the end of each definition.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:31:18 -07:00
Qiheng Lin
be107538c5 netdevsim: remove unneeded semicolon
Eliminate the following coccicheck warning:
 drivers/net/netdevsim/fib.c:569:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:27:33 -07:00
Qiheng Lin
3b2c32f96e net: ethernet: mtk_eth_soc: remove unneeded semicolon
Eliminate the following coccicheck warning:
 drivers/net/ethernet/mediatek/mtk_ppe.c:270:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:27:00 -07:00
wengjianfeng
b58c4649d9 nfc: s3fwrn5: remove unnecessary label
In function s3fwrn5_nci_post_setup, the variable ret is assigned then
goto out label, which just return ret, so we use return to replace it.
Other goto sentences are similar, we use return sentences to replace
goto sentences and delete out label.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:24:20 -07:00
David S. Miller
b3feb439f2 Merge branch 'usbnet-speed'
Grant Grundler says:

====================
usbnet: speed reporting for devices without MDIO

This series introduces support for USB network devices that report
speed as a part of their protocol, not emulating an MII to be accessed
over MDIO.

v2: rebased on recent upstream changes
v3: incorporated hints on naming and comments
v4: fix misplaced hunks; reword some commit messages;
    add same change for cdc_ether
v4-repost: added "net-next" to subject and Andrew Lunn's Reviewed-by

I'm reposting Oliver Neukum's <oneukum@suse.com> patch series with
fix ups for "misplaced hunks" (landed in the wrong patches).
Please fixup the "author" if "git am" fails to attribute the
patches 1-3 (of 4) to Oliver.

I've tested v4 series with "5.12-rc3+" kernel on Intel NUC6i5SYB
and + Sabrent NT-S25G. Google Pixelbook Go (chromeos-4.4 kernel)
+ Alpha Network AUE2500C were connected directly to the NT-S25G
to get 2.5Gbps link rate:
Settings for enx002427880815:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 2500Mb/s
        Duplex: Half
        Auto-negotiation: off
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        MDI-X: Unknown
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

"Duplex" is a lie since we get no information about it.

I expect "Auto-Negotiation" is always true for cdc_ncm and
cdc_ether devices and perhaps someone knows offhand how
to have ethtool report "true" instead.

But this is good step in the right direction.

base-commit: 1c273e10bc0cc7efb933e0ca10e260cdfc9f0b8c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:22:37 -07:00
Grant Grundler
d42ebcbb63 net: cdc_ether: record speed in status method
Until very recently, the usbnet framework only had support functions
for devices which reported the link speed by explicitly querying the
PHY over a MDIO interface. However, the cdc_ether devices send
notifications when the link state or link speeds change and do not
expose the PHY (or modem) directly.

Support funtions (e.g. usbnet_get_link_ksettings_internal()) to directly
query state recorded by the cdc_ether driver were added in a previous patch.

Instead of cdc_ether spewing the link speed into the dmesg buffer,
record the link speed encoded in these notifications and tell the
usbnet framework to use the new functions to get link speed/state.

User space can now get the most recent link speed/state using ethtool.

v4: added to series since cdc_ether uses same notifications
    as cdc_ncm driver.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:22:37 -07:00
Oliver Neukum
eb47c274d8 net: cdc_ncm: record speed in status method
Until very recently, the usbnet framework only had support functions
for devices which reported the link speed by explicitly querying the
PHY over a MDIO interface. However, the cdc_ncm devices send
notifications when the link state or link speeds change and do not
expose the PHY (or modem) directly.

Support funtions (e.g. usbnet_get_link_ksettings_internal()) to directly
query state recorded by the cdc_ncm driver were added in a previous patch.

So instead of cdc_ncm spewing the link speed into the dmesg buffer,
record the link speed encoded in these notifications and tell the
usbnet framework to use the new functions to get link speed/state.
Link speed/state is now available via ethtool.

This is especially useful given all current RTL8156 devices emit
a connection/speed status notification every 32ms and this would
fill the dmesg buffer. This implementation replaces the one
recently submitted in de658a195ee23ca6aaffe197d1d2ea040beea0a2 :
   "net: usb: cdc_ncm: don't spew notifications"

v2: rebased on upstream
v3: changed variable names
v4: rewrote commit message

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:22:36 -07:00
Oliver Neukum
956baa9957 usbnet: add method for reporting speed without MII
The old method for reporting link speed assumed a driver uses the
generic phy (mii) MDIO read/write functions. CDC devices don't
expose the phy.

Add a primitive internal version reporting back directly what
the CDC notification/status operations recorded.

v2: rebased on upstream
v3: changed names and made clear which units are used
v4: moved hunks to correct patch; rewrote commmit messages

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:22:36 -07:00
Oliver Neukum
77651900ce usbnet: add _mii suffix to usbnet_set/get_link_ksettings
The generic functions assumed devices provided an MDIO interface (accessed
via older mii code, not phylib). This is true only for genuine ethernet.

Devices with a higher level of abstraction or based on different
technologies do not have MDIO. To support this case, first rename
the existing functions with _mii suffix.

v2: rebased on changed upstream
v3: changed names to clearly say that this does NOT use phylib
v4: moved hunks to correct patch; reworded commmit messages

Signed-off-by : Oliver Neukum <oneukum@suse.com>
Tested-by: Roland Dreier <roland@kernel.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:22:36 -07:00
Manoj Basapathi
e880f8b3a2 tcp: Reset tcp connections in SYN-SENT state
Userspace sends tcp connection (sock) destroy on network switch
i.e switching the default network of the device between multiple
networks(Cellular/Wifi/Ethernet).

Kernel though doesn't send reset for the connections in SYN-SENT state
and these connections continue to remain.
Even as per RFC 793, there is no hard rule to not send RST on ABORT in
this state.

Modify tcp_abort and tcp_disconnect behavior to send RST for connections
in syn-sent state to avoid lingering connections on network switch.

Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: Sauvik Saha <ssaha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:17:20 -07:00
Florian Westphal
db3685b404 net: remove obsolete members from struct net
all have been moved to generic_net infra. On x86_64, this reduces
struct net size from 70 to 63 cache lines (4480 to 4032 byte).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:53 +02:00
Florian Westphal
1379940bf8 netfilter: conntrack: move ecache dwork to net_generic infra
dwork struct is large (>128 byte) and not needed when conntrack module
is not loaded.

Place it in net_generic data instead.  The struct net dwork member is now
obsolete and will be removed in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:53 +02:00
Florian Westphal
7b5974709f netfilter: conntrack: move sysctl pointer to net_generic infra
No need to keep this in struct net, place it in the net_generic data.
The sysctl pointer is removed from struct net in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:53 +02:00
Florian Westphal
1d610d4d31 netfilter: x_tables: move known table lists to net_generic infra
Will reduce struct net size by 208 bytes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
0854db2aae netfilter: nf_tables: use net_generic infra for transaction data
This moves all nf_tables pernet data from struct net to a net_generic
extension, with the exception of the gencursor.

The latter is used in the data path and also outside of the nf_tables
core. All others are only used from the configuration plane.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
5b53951cfc netfilter: ebtables: use net_generic infra
ebtables currently uses net->xt.tables[BRIDGE], but upcoming
patch will move net->xt.tables away from struct net.

To avoid exposing x_tables internals to ebtables, use a private list
instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
7b1957b049 netfilter: nf_defrag_ipv4: use net_generic infra
This allows followup patch to remove the defrag_ipv4 member from struct
net.  It also allows to auto-remove the hooks later on by adding a
_disable() function.  This will be done later in a follow patch series.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
8b0adbe3e3 netfilter: nf_defrag_ipv6: use net_generic infra
This allows followup patch to remove these members from struct net.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Florian Westphal
ebfbe67568 netfilter: cttimeout: use net_generic infra
reduce size of struct net and make this self-contained.
The member in struct net is kept to minimize changes to struct net
layout, it will be removed in a separate patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Florian Westphal
1be05ea766 netfilter: nfnetlink: use net_generic infra
No need to place it in struct net, nfnetlink is a module and usage
doesn't occur in fastpath.

Also remove rcu usage:

Not a single reader of net->nfnl uses rcu accessors.

When exit_batch callbacks are executed the net namespace is already dead
so no calls to these functions are possible anymore (else we'd get NULL
deref crash too).

If the module is removed, then modules that call any of those functions
have been removed too so no calls to nfnl functions are possible either.

The nfnl and nfl_stash pointers in struct net are no longer used, they
will be removed in a followup patch to minimize changes to struct net
(causes rebuild for entire network stack).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Florian Westphal
237c609f87 netfilter: nfnetlink: add and use nfnetlink_broadcast
This removes the only reference of net->nfnl outside of the nfnetlink
module.  This allows to move net->nfnl to net_generic infra.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Krzysztof Kozlowski
cc0626c2aa net: smsc911x: skip acpi_device_id table when !CONFIG_ACPI
The driver can match via multiple methods.  Its acpi_device_id table is
referenced via ACPI_PTR() so it will be unused for !CONFIG_ACPI builds:

  drivers/net/ethernet/smsc/smsc911x.c:2652:36: warning:
    ‘smsc911x_acpi_match’ defined but not used [-Wunused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:05:02 -07:00
Salil Mehta
d392ecd1bc net: hns3: Limiting the scope of vector_ring_chain variable
Limiting the scope of the variable vector_ring_chain to the block where it
is used.

Fixes: 424eb834a9be ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:04:09 -07:00
Andrei Vagin
eeb85a14ee net: Allow to specify ifindex when device is moved to another namespace
Currently, we can specify ifindex on link creation. This change allows
to specify ifindex when a device is moved to another network namespace.

Even now, a device ifindex can be changed if there is another device
with the same ifindex in the target namespace. So this change doesn't
introduce completely new behavior, it adds more control to the process.

CRIU users want to restore containers with pre-created network devices.
A user will provide network devices and instructions where they have to
be restored, then CRIU will restore network namespaces and move devices
into them. The problem is that devices have to be restored with the same
indexes that they have before C/R.

Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 14:49:40 -07:00
Zheng Yongjun
d3295869c4 net: nfc: Fix spelling errors in net/nfc module
These patches fix a series of spelling errors in net/nfc module.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 12:58:31 -07:00
Wu XiangCheng
85d091a794 tipc: Fix a kernel-doc warning in name_table.c
Fix kernel-doc warning:

Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c
  :558: WARNING: Unexpected indentation.
Documentation/networking/tipc:66: /home/sfr/next/next/net/tipc/name_table.c
  :559: WARNING: Block quote ends without a blank line; unexpected unindent.

Due to blank line missing.

Fixes: 908148bc5046 ("tipc: refactor tipc_sendmsg() and tipc_lookup_anycast()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/netdev/20210318172255.63185609@canb.auug.org.au/
Signed-off-by: Wu XiangCheng <bobwxc@email.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 12:51:34 -07:00
Taehee Yoo
4b4b84468a mld: change lockdep annotation for ip6_sf_socklist and ipv6_mc_socklist
struct ip6_sf_socklist and ipv6_mc_socklist are per-socket MLD data.
These data are protected by rtnl lock, socket lock, and RCU.
So, when these are used, it verifies whether rtnl lock is acquired or not.

ip6_mc_msfget() is called by do_ipv6_getsockopt().
But caller doesn't acquire rtnl lock.
So, when these data are used in the ip6_mc_msfget() lockdep warns about it.
But accessing these is actually safe because socket lock was acquired by
do_ipv6_getsockopt().

So, it changes lockdep annotation from rtnl lock to socket lock.
(rtnl_dereference -> sock_dereference)

Locking graph for mld data is like below:

When writing mld data:
do_ipv6_setsockopt()
    rtnl_lock
    lock_sock
    (mld functions)
        idev->mc_lock(if per-interface mld data is modified)

When reading mld data:
do_ipv6_getsockopt()
    lock_sock
    ip6_mc_msfget()

Splat looks like:
=============================
WARNING: suspicious RCU usage
5.12.0-rc4+ #503 Not tainted
-----------------------------
net/ipv6/mcast.c:610 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by mcast-listener-/923:
 #0: ffff888007958a70 (sk_lock-AF_INET6){+.+.}-{0:0}, at:
ipv6_get_msfilter+0xaf/0x190

stack backtrace:
CPU: 1 PID: 923 Comm: mcast-listener- Not tainted 5.12.0-rc4+ #503
Call Trace:
 dump_stack+0xa4/0xe5
 ip6_mc_msfget+0x553/0x6c0
 ? ipv6_sock_mc_join_ssm+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? mark_held_locks+0xb7/0x120
 ? lockdep_hardirqs_on_prepare+0x27c/0x3e0
 ? __local_bh_enable_ip+0xa5/0xf0
 ? lock_sock_nested+0x82/0xf0
 ipv6_get_msfilter+0xc3/0x190
 ? compat_ipv6_get_msfilter+0x300/0x300
 ? lock_downgrade+0x690/0x690
 do_ipv6_getsockopt.isra.6.constprop.13+0x1809/0x29e0
 ? do_ipv6_mcast_group_source+0x150/0x150
 ? register_lock_class+0x1750/0x1750
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? find_held_lock+0x3a/0x1c0
 ? lock_downgrade+0x690/0x690
 ? ipv6_getsockopt+0xdb/0x1b0
 ipv6_getsockopt+0xdb/0x1b0
[ ... ]

Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 12:50:04 -07:00
Christophe JAILLET
7190e9d8e1 qede: Use 'skb_add_rx_frag()' instead of hand coding it
Some lines of code can be merged into an equivalent 'skb_add_rx_frag()'
call which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:59:33 -07:00
Christophe JAILLET
1ec3d02f9c qede: Remove a erroneous ++ in 'qede_rx_build_jumbo()'
This ++ is confusing. It looks duplicated with the one already performed in
'skb_fill_page_desc()'.

In fact, it is harmless. 'nr_frags' is written twice with the same value.
Once, because of the nr_frags++, and once because of the 'nr_frags = i + 1'
in 'skb_fill_page_desc()'.

So axe this post-increment to avoid confusion.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:59:33 -07:00
Christophe JAILLET
c438a801e0 sfc: Use 'skb_add_rx_frag()' instead of hand coding it
Some lines of code can be merged into an equivalent 'skb_add_rx_frag()'
call which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:55:44 -07:00
Christophe JAILLET
c3105f8485 ibmvnic: Use 'skb_frag_address()' instead of hand coding it
'page_address(skb_frag_page()) + skb_frag_off()' can be replaced by an
equivalent 'skb_frag_address()' which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:55:44 -07:00
Christophe JAILLET
0282bc6ae8 net: ag71xx: Slightly simplify 'ag71xx_rx_packets()'
There is no need to use 'list_for_each_entry_safe' here, as nothing is
removed from the list in the 'for' loop.
Use 'list_for_each_entry' instead, it is slightly less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:53:44 -07:00
Xie He
514e1150da net: x25: Queue received packets in the drivers instead of per-CPU queues
X.25 Layer 3 (the Packet Layer) expects layer 2 to provide a reliable
datalink service such that no packets are reordered or dropped. And
X.25 Layer 2 (the LAPB layer) is indeed designed to provide such service.

However, this reliability is not preserved when a driver calls "netif_rx"
to deliver the received packets to layer 3, because "netif_rx" will put
the packets into per-CPU queues before they are delivered to layer 3.
If there are multiple CPUs, the order of the packets may not be preserved.
The per-CPU queues may also drop packets if there are too many.

Therefore, we should not call "netif_rx" to let it queue the packets.
Instead, we should use our own queue that won't reorder or drop packets.

This patch changes all X.25 drivers to use their own queues instead of
calling "netif_rx". The patch also documents this requirement in the
"x25-iface" documentation.

Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:42:12 -07:00