Commit Graph

1075916 Commits

Author SHA1 Message Date
Matthieu Baerts
e59300ce3f selftests: mptcp: join: reset failing links
Best to always reset this env var before each test to avoid surprising
behaviour depending on the order tests are running.

Also clearly set it for the last failing links test is also needed when
only this test is executed.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 12:29:58 -08:00
Matthieu Baerts
3afd0280e7 selftests: mptcp: join: define tests groups once
When adding a new tests group, it has to be defined in multiple places:

- in the all_tests() function
- in the 'usage()' function
- in the getopts: short option + what to do when the option is used

Because it is easy to forget one of them, it is useful to have to define
them only once.

Note: only using an associative array would simplify the code but the
entries are stored in a hashtable and iterating over the different items
doesn't give the same order as the one used in the declaration of this
array. Because we want to run these tests in the same order as before, a
"simple" array is used first.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 12:29:57 -08:00
Geliang Tang
3c082695e7 selftests: mptcp: drop msg argument of chk_csum_nr
This patch dropped the msg argument of chk_csum_nr, to unify chk_csum_nr
with other chk_*_nr functions.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 12:29:57 -08:00
Luiz Angelo Daros de Luca
3126b731ce net: dsa: tag_rtl8_4: fix typo in modalias name
DSA_TAG_PROTO_RTL8_4L is not defined. It should be
DSA_TAG_PROTO_RTL8_4T.

Fixes: cd87fecded ("net: dsa: tag_rtl8_4: add rtl8_4t trailing variant")
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220309175641.12943-1-luizluca@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:36:24 -08:00
Robert Hancock
6c7e7da2e0 net: axienet: Use napi_alloc_skb when refilling RX ring
Use napi_alloc_skb to allocate memory when refilling the RX ring
in axienet_poll for more efficiency. napi_alloc_skb() can reuse
softirq-local cache of freed skbs which may still be cache-warm
and skipping allocator calls.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220308211013.1530955-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:19:16 -08:00
Eric Dumazet
65466904b0 tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.

The simple heuristic of using sk_pacing_rate/1024 has worked
well, but can lead to too small packets for hosts in the same
rack/cluster, when thousands of flows compete for the bottleneck.

Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt()

Indeed, for local flows, sending bigger bursts is better
to reduce cpu costs, as occasional losses can be repaired
quite fast.

This patch is based on Neal Cardwell implementation
done more than two years ago.
bbr is adjusting max_pacing_rate based on measured bandwidth,
while cubic would over estimate max_pacing_rate.

/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.

Tested:

100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.

Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)

~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         65,945.29 msec task-clock                #    2.845 CPUs utilized
         1,314,632      context-switches          # 19935.279 M/sec
             5,292      cpu-migrations            #   80.249 M/sec
           940,641      page-faults               # 14264.023 M/sec
   201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
    17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
   136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
    53,809,530,436      instructions              #    0.27  insn per cycle
                                                  #    2.54  stalled cycles per insn  (83.36%)
     9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
       153,008,621      branch-misses             #    1.69% of all branches          (83.32%)

      23.182970846 seconds time elapsed

TcpInSegs                       15648792           0.0
TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered              58654791           0.0
TcpExtTCPDeliveredCE            19                 0.0

After patch:

~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,982.58 msec task-clock                #    2.104 CPUs utilized
           186,014      context-switches          # 3797.599 M/sec
             3,109      cpu-migrations            #   63.472 M/sec
           941,180      page-faults               # 19214.814 M/sec
   153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
    12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
   120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
    36,803,672,106      instructions              #    0.24  insn per cycle
                                                  #    3.27  stalled cycles per insn  (83.18%)
     5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
        87,984,616      branch-misses             #    1.48% of all branches          (83.43%)

      23.281200256 seconds time elapsed

TcpInSegs                       1434706            0.0
TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered              58878971           0.0
TcpExtTCPDeliveredCE            9664               0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:05:44 -08:00
Eric Dumazet
b0de0cf4f5 tcp: autocork: take MSG_EOR hint into consideration
tcp_should_autocork() is evaluating if it makes senses
to not immediately send current skb, hoping that
user space will add more payload on it by the
time TCP stack reacts to upcoming TX completions.

If current skb got MSG_EOR mark, then we know
that no further data will be added, it is therefore
futile to wait.

SOF_TIMESTAMPING_TX_ACK will become a bit more accurate,
if prior packets are still in qdisc/device queues.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20220309054706.2857266-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:05:20 -08:00
Michael Sit Wei Hong
30c5601fbf stmmac: intel: Add ADL-N PCI ID
Add PCI ID for Ethernet TSN Controller on ADL-N.

Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Link: https://lore.kernel.org/r/20220309033415.3370250-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:04:53 -08:00
Dust Li
d9f5099159 net/smc: fix -Wmissing-prototypes warning when CONFIG_SYSCTL not set
when CONFIG_SYSCTL not set, smc_sysctl_net_init/exit
need to be static inline to avoid missing-prototypes
if compile with W=1.

Since __net_exit has noinline annotation when CONFIG_NET_NS
not set, it should not be used with static inline.
So remove the __net_init/exit when CONFIG_SYSCTL not set.

Fixes: 7de8eb0d90 ("net/smc: fix compile warning for smc_sysctl")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220309033051.41893-1-dust.li@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:02:35 -08:00
Jakub Kicinski
c01e605904 Merge branch 'net-fungible-fix-errors-when-config_tls_device-n'
Dimitris Michailidis says:

====================
net/fungible: fix errors when CONFIG_TLS_DEVICE=n

This pair of patches fix compile errors in funeth when
CONFIG_TLS_DEVICE=n. The errors are due to symbols that are not defined
in this config but are used in code guarded by
"if (IS_ENABLED(CONFIG_TLS_DEVICE) ..."

One option is to place this code under preprocessor guards that will
keep the compiler from looking at the code. The option adopted here is
to define the offending symbols also when CONFIG_TLS_DEVICE=n.

The first patch does this for two functions in tls.h.
The second does the same for driver symbols and makes tls.h inclusion
unconditional.
====================

Link: https://lore.kernel.org/r/20220309034032.405212-1-dmichail@fungible.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:01:20 -08:00
Dimitris Michailidis
b23f923919 net/fungible: fix errors when CONFIG_TLS_DEVICE=n
Include the TLS headers unconditionally and define driver TLS symbols
used in code compiled also when CONFIG_TLS_DEVICE=n to fix the
following errors:

../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘write_pkt_desc’:
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:13: error: implicit declaration of function ‘tls_driver_ctx’ [-Werror=implicit-function-declaration]
  244 |   tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX);
      |             ^~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: error: ‘TLS_OFFLOAD_CTX_DIR_TX’ undeclared (first use in this function)
  244 |   tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX);
      |                                     ^~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: note: each undeclared identifier is reported only once for each function it appears in
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:245:23: error: dereferencing pointer to incomplete type ‘struct fun_ktls_tx_ctx’
  245 |   tls->tlsid = tls_ctx->tlsid;
      |                       ^~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘fun_start_xmit’:
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:310:6: error: implicit declaration of function ‘tls_is_sk_tx_device_offloaded’ [-Werror=implicit-function-declaration]
  310 |      tls_is_sk_tx_device_offloaded(skb->sk)) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:9: error: implicit declaration of function ‘fun_tls_tx’; did you mean ‘fun_xdp_tx’? [-Werror=implicit-function-declaration]
  311 |   skb = fun_tls_tx(skb, q, &tls_len);
      |         ^~~~~~~~~~
      |         fun_xdp_tx
../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:7: warning: assignment to ‘struct sk_buff *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
  311 |   skb = fun_tls_tx(skb, q, &tls_len);
      |       ^

Fixes: db37bc177d ("net/funeth: add the data path")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:01:19 -08:00
Dimitris Michailidis
77f09e66f6 net/tls: Provide {__,}tls_driver_ctx() unconditionally
Having the definitions of {__,}tls_driver_ctx() under an #if
guard means code referencing them also needs to rely on the
preprocessor. The protection doesn't appear needed so make the
definitions unconditional.

Fixes: db37bc177d ("net/funeth: add the data path")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:01:14 -08:00
Jakub Kicinski
4a5eaa2fde bnxt: revert hastily merged uAPI aberrations
This reverts:
 commit 02acd39953 ("bnxt_en: parse result field when NVRAM package install fails")
 commit 22f5dba506 ("bnxt_en: add an nvm test for hw diagnose")
 commit bafed3f231 ("bnxt_en: implement hw health reporter")

These patches are still under discussion / I don't think they
are right, and since the authors don't reply promptly let me
lessen my load of "things I need to resolve before next release"
and revert them.

Acked-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220308173659.304915-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:55:00 -08:00
Heiner Kallweit
1a21277190 net: stmmac: switch no PTP HW support message to info level
If HW doesn't support PTP, then it doesn't support it. This is neither
a problem nor can the user do something about it. Therefore change the
message level to info.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/ee685745-f1ab-e9bf-f20e-077d55dff441@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:54:32 -08:00
Chen Yu
91ec779247 e1000e: Print PHY register address when MDI read/write fails
There is occasional suspend error from e1000e which blocks the
system from further suspending. And the issue was found on
a WhiskeyLake-U platform with I219-V:

[   20.078957] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x780 [e1000e] returns -2
[   20.078970] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -2
[   20.078974] e1000e 0000:00:1f.6: PM: pci_pm_suspend+0x0/0x170 returned -2 after 371012 usecs
[   20.078978] e1000e 0000:00:1f.6: PM: failed to suspend async: error -2

According to the code flow, this might be caused by broken MDI read/write
to PHY registers. However currently the code does not tell us which
register is broken. Thus enhance the debug information to print the
offender PHY register. So the next the issue is reproduced, this
information could be used for narrow down.

Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220308172030.451566-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:53:03 -08:00
Min Li
013a3e7c79 ptp: idt82p33: use rsmu driver to access i2c/spi bus
rsmu (Renesas Synchronization Management Unit ) driver is located in
drivers/mfd and responsible for creating multiple devices including
idt82p33 phc, which will then use the exposed regmap and mutex
handle to access i2c/spi bus.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/1646748651-16811-1-git-send-email-min.li.xe@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:50:57 -08:00
Oleksij Rempel
e18058ea99 net: dsa: microchip: ksz9477: implement MTU configuration
This chips supports two ways to configure max MTU size:
- by setting SW_LEGAL_PACKET_DISABLE bit: if this bit is 0 allowed packed size
  will be between 64 and bytes 1518. If this bit is 1, it will accept
  packets up to 2000 bytes.
- by setting SW_JUMBO_PACKET bit. If this bit is set, the chip will
  ignore SW_LEGAL_PACKET_DISABLE value and use REG_SW_MTU__2 register to
  configure MTU size.

Current driver has disabled SW_JUMBO_PACKET bit and activates
SW_LEGAL_PACKET_DISABLE. So the switch will pass all packets up to 2000 without
any way to configure it.

By providing port_change_mtu we are switch to SW_JUMBO_PACKET way and will
be able to configure MTU up to ~9000.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220308135857.1119028-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:47:18 -08:00
Guo Zhengkui
e58bc86463 drivers: vxlan: fix returnvar.cocci warning
Fix the following coccicheck warning:

drivers/net/vxlan/vxlan_core.c:2995:5-8:
Unneeded variable: "ret". Return "0" on line 3004.

Fixes: f9c4bb0b24 ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Link: https://lore.kernel.org/r/20220308134321.29862-1-guozhengkui@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 19:40:36 -08:00
Vladimir Oltean
24055bb879 net: tcp: fix shim definition of tcp_inbound_md5_hash
When CONFIG_TCP_MD5SIG isn't enabled, there is a compilation bug due to
the fact that the static inline definition of tcp_inbound_md5_hash() has
an unexpected semicolon. Remove it.

Fixes: 1330b6ef33 ("skb: make drop reason booleanable")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220309122012.668986-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 08:44:40 -08:00
Lukas Bulwahn
7f415828f9 MAINTAINERS: rectify entry for REALTEK RTL83xx SMI DSA ROUTER CHIPS
Commit 429c83c78a ("dt-bindings: net: dsa: realtek: convert to YAML
schema, add MDIO") converts realtek-smi.txt to realtek.yaml, but missed to
adjust its reference in MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.

Repair this file reference in REALTEK RTL83xx SMI DSA ROUTER CHIPS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 15:02:04 +00:00
Horatiu Vultur
0dbdf819f4 net: lan966x: Add spinlock for frame transmission from CPU.
The registers used to inject a frame to one of the ports is shared
between all the net devices. Therefore, there can be race conditions for
accessing the registers when two processes send frames at the same time
on different ports.

To fix this, add a spinlock around the function
'lan966x_port_ifh_xmit()'.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 14:59:14 +00:00
Changcheng Deng
2c9ec169f7 net: ethernet: sun: use min_t() to make code cleaner
Use min_t() in order to make code cleaner.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 14:58:06 +00:00
Dimitris Michailidis
40bb09c87f net/fungible: CONFIG_FUN_CORE needs SBITMAP
fun_core.ko uses sbitmaps and needs to select SBITMAP.
Fixes below errors:

ERROR: modpost: "__sbitmap_queue_get"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!
ERROR: modpost: "sbitmap_finish_wait"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!
ERROR: modpost: "sbitmap_queue_clear"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!
ERROR: modpost: "sbitmap_prepare_to_wait"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!
ERROR: modpost: "sbitmap_queue_init_node"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!
ERROR: modpost: "sbitmap_queue_wake_all"
[drivers/net/ethernet/fungible/funcore/funcore.ko] undefined!

v2: correct "Fixes" SHA

Fixes: 749efb1e6d ("net/fungible: Kconfig, Makefiles, and MAINTAINERS")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:26:44 +00:00
Dimitris Michailidis
cdba24904e net/fungible: Fix local_memory_node error
Stephen Rothwell reported the following failure on powerpc:

ERROR: modpost: ".local_memory_node"
[drivers/net/ethernet/fungible/funeth/funeth.ko] undefined!

AFAICS this is because local_memory_node() is a non-inline non-exported
function when CONFIG_HAVE_MEMORYLESS_NODES=y. It is also the wrong API
to get a CPU's memory node. Use cpu_to_mem() in the two spots it's used.

Fixes: ee6373ddf3 ("net/funeth: probing and netdev ops")
Fixes: db37bc177d ("net/funeth: add the data path")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:26:07 +00:00
Jakub Kicinski
1330b6ef33 skb: make drop reason booleanable
We have a number of cases where function returns drop/no drop
decision as a boolean. Now that we want to report the reason
code as well we have to pass extra output arguments.

We can make the reason code evaluate correctly as bool.

I believe we're good to reorder the reasons as they are
reported to user space as strings.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:22:58 +00:00
David S. Miller
1163319993 Merge branch 'dsa-next-fixups'
Vladimir Oltean says:

====================
Incremental fixups for DSA unicast filtering

There are some bugs I've discovered in the recently merged "DSA unicast
filtering" series:
https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/

First bug is the dereference of an uninitialized list (dp->fdbs) when
the "initial" tag protocol is placed in the device tree for the Felix
switch driver. This is a scenario I hadn't tested. It is handled by
patches 1-3.

Second bug is actually a sum of bugs that canceled each other out during
my previous testing. The MAC address change of a DSA slave interface
breaks termination for the other slave interfaces. But this actually
does not happen if the slave interface whose address is changing is
down. And even when up, traffic termination is still not broken because
we fail to properly disable host flooding. Patches 4-6 handle this for
the Felix driver (the only one benefiting from unicast filtering so far).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:10 +00:00
Vladimir Oltean
7e580490ac net: dsa: felix: avoid early deletion of host FDB entries
The Felix driver declares FDB isolation but puts all standalone ports in
VID 0. This is mostly problem-free as discussed with Alvin here:
https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870

however there is one catch. DSA still thinks that FDB entries are
installed on the CPU port as many times as there are user ports, and
this is problematic when multiple user ports share the same MAC address.

Consider the default case where all user ports inherit their MAC address
from the DSA master, and then the user runs:

ip link set swp0 address 00:01:02:03:04:05

The above will make dsa_slave_set_mac_address() call
dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's
standalone database, and dsa_port_standalone_host_fdb_del() for the old
address of swp0, again in swp0's standalone database.

Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down
to the felix driver, which will end up deleting the old MAC address from
the CPU port. But this is still in use by other user ports, so we end up
breaking unicast termination for them.

There isn't a problem in the fact that DSA keeps track of host
standalone addresses in the individual database of each user port: some
drivers like sja1105 need this. There also isn't a problem in the fact
that some drivers choose the same VID/FID for all standalone ports.
It is just that the deletion of these host addresses must be delayed
until they are known to not be in use any longer, and only the driver
has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and
&cpu_db->mdbs, it is just a matter of walking over those lists and see
whether the same MAC address is present on the CPU port in the port db
of another user port.

I have considered reusing the generic dsa_port_walk_fdbs() and
dsa_port_walk_mdbs() schemes for this, but locking makes it difficult.
In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but
dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that
we introduce an unlocked variant of the address iterator, we'd still
need some relatively complex data structures, and a void *ctx in the
dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are
able to figure out, after iterating, whether the same MAC address is or
isn't present in the port db of another port.

All the above, plus the fact that I expect other drivers to follow the
same model as felix where all standalone ports use the same FID, made me
conclude that a generic method provided by DSA is necessary:
dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this
from the ->port_fdb_del() handler for the CPU port, when the database
was classified to either a port db, or a LAG db.

For symmetry, we also call this from ->port_fdb_add(), because if the
address was installed once, then installing it a second time serves no
purpose: it's already in hardware in VID 0 and it affects all standalone
ports.

This change moves dsa_db_equal() from switch.c to dsa.c, since it now
has one more caller.

Fixes: 54c3198460 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:10 +00:00
Vladimir Oltean
f2e2662ccf net: dsa: felix: actually disable flooding towards NPI port
The two blamed commits were written/tested individually but not
together.

When put together, commit 90897569be ("net: dsa: felix: start off with
flooding disabled on the CPU port"), which deletes a reinitialization of
PGID_UC/PGID_MC/PGID_BC, is no longer sufficient to ensure that these
port masks don't contain the CPU port module.

This is because commit b903a6bd2e ("net: dsa: felix: migrate flood
settings from NPI to tag_8021q CPU port") overwrites the hardware
default settings towards the CPU port module with the settings that used
to be present on the NPI port treated as a regular port. There, flooding
is enabled, so flooding would get enabled on the CPU port module too.

Adding conditional logic somewhere within felix_setup_tag_npi() to
configure either the default no-flood policy or the flood policy
inherited from the tag_8021q CPU port from a previous call to
dsa_port_manage_cpu_flood() is getting complicated. So just let the
migration logic do its thing during initial setup (which will
temporarily turn on flooding), then turn flooding off for the NPI port
after felix_set_tag_protocol() finishes. Here we are in felix_setup(),
so the DSA slave interfaces are not yet created, and this doesn't affect
traffic in any way.

Fixes: 90897569be ("net: dsa: felix: start off with flooding disabled on the CPU port")
Fixes: b903a6bd2e ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:10 +00:00
Vladimir Oltean
e2d0576f0c net: dsa: be mostly no-op in dsa_slave_set_mac_address when down
Since the slave unicast address is synced to hardware and to the DSA
master during dsa_slave_open(), this means that a call to
dsa_slave_set_mac_address() while the slave interface is down will
result to a call to dsa_port_standalone_host_fdb_del() and to
dev_uc_del() for the MAC address while there was no previous
dsa_port_standalone_host_fdb_add() or dev_uc_add().

This is a partial revert of the blamed commit below, which was too
aggressive.

Fixes: 35aae5ab91 ("net: dsa: remove workarounds for changing master promisc/allmulti only while up")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:09 +00:00
Vladimir Oltean
c69f40ac60 net: dsa: felix: drop "bool change" from felix_set_tag_protocol
We no longer need the workaround in the felix driver to avoid calling
dsa_port_walk_fdbs() when &dp->fdbs is an uninitialized list, because
that list is now initialized from all call paths of felix_set_tag_protocol().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:09 +00:00
Vladimir Oltean
fe95784fb1 net: dsa: move port lists initialization to dsa_port_touch
&cpu_db->fdbs and &cpu_db->mdbs may be uninitialized lists during some
call paths of felix_set_tag_protocol().

There was an attempt to avoid calling dsa_port_walk_fdbs() during setup
by using a "bool change" in the felix driver, but this doesn't work when
the tagging protocol is defined in the device tree, and a change is
triggered by DSA at pseudo-runtime:

dsa_tree_setup_switches
-> dsa_switch_setup
   -> dsa_switch_setup_tag_protocol
      -> ds->ops->change_tag_protocol
dsa_tree_setup_ports
-> dsa_port_setup
   -> &dp->fdbs and &db->mdbs only get initialized here

So it seems like the only way to fix this is to move the initialization
of these lists earlier.

dsa_port_touch() is called from dsa_switch_touch_ports() which is called
from dsa_switch_parse_of(), and this runs completely before
dsa_tree_setup(). Similarly, dsa_switch_release_ports() runs after
dsa_tree_teardown().

Fixes: f9cef64fa2 ("net: dsa: felix: migrate host FDB and MDB entries when changing tag proto")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:09 +00:00
Vladimir Oltean
0832cd9f1f net: dsa: warn if port lists aren't empty in dsa_port_teardown
There has been recent work towards matching each switchdev object
addition with a corresponding deletion.

Therefore, having elements in the fdbs, mdbs, vlans lists at the time of
a shared (DSA, CPU) port's teardown is indicative of a bug somewhere
else, and not something that is to be expected.

We shouldn't try to silently paper over that. Instead, print a warning
and a stack trace.

This change is a prerequisite for moving the initialization/teardown of
these lists. Make it clear that clearing the lists isn't needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 11:12:09 +00:00
David S. Miller
ce7ec1b8ec Merge branch 'ptrp-ocp-next'
Jonathan Lemon says:

====================
ptp: ocp: update devlink information

Both of these patches update the information displayed via devlink.

v1 -> v2: remove board.manufacture information
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 10:57:07 +00:00
Jonathan Lemon
b0ca789ade ptp: ocp: Update devlink firmware display path.
Cache the firmware version when the card is initialized,
and use this field to populate the devlink firmware information.

The cached firmware version will be used for feature gating in
upcoming patches.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 10:57:07 +00:00
Jonathan Lemon
0cfcdd1ebc ptp: ocp: add nvmem interface for accessing eeprom
Add the at24 drivers for the eeprom, and use the accessors
via the nvmem API instead of direct i2c accesses.  This makes
things cleaner.

Add an eeprom map table which specifies where the pre-defined
information is located.  Retrieve the information and and export
it via the devlink interface.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 10:57:07 +00:00
David S. Miller
b57b44f749 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next
-queue

Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-03-08

This series contains updates to ixgbe and ixgbevf drivers.

Slawomir adds an implementation for ndo_set_vf_link_state() to allow
for disabling of VF link state as well a mailbox implementation so
the VF can query the state. Additionally, for 82599, the option to
disable a VF after receiving several malicious driver detection (MDD)
events are encountered is added. For ixgbevf, the corresponding
implementation to query and report a disabled state is added.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 10:39:29 +00:00
Colin Ian King
d82a6c5ef9 net: prestera: acl: make read-only array client_map static const
Don't populate the read-only array client_map  on the stack but
instead make it static const. Also makes the object code a little
smaller.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220307221349.164585-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:22:36 -08:00
Jonathan Lemon
4587369b6c ptp: ocp: correct label for error path
When devlink_register() was removed from the error path, the
corresponding label was not updated.   Rename the label for
readability puposes, no functional change.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220308000458.2166-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:14:53 -08:00
Samuel Thibault
869420a8be SO_ZEROCOPY should return -EOPNOTSUPP rather than -ENOTSUPP
ENOTSUPP is documented as "should never be seen by user programs",
and thus not exposed in <errno.h>, and thus applications cannot safely
check against it (they get "Unknown error 524" as strerror). We should
rather return the well-known -EOPNOTSUPP.

This is similar to 2230a7ef51 ("drop_monitor: Use correct error
code") and 4a5cdc604b ("net/tls: Fix return values to avoid
ENOTSUPP"), which did not seem to cause problems.

Signed-off-by: Samuel Thibault <samuel.thibault@labri.fr>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220307223126.djzvg44v2o2jkjsx@begin
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:14:49 -08:00
Jakub Kicinski
964efdab03 Merge branch 'mptcp-advertisement-reliability-improvement-and-misc-updates'
Mat Martineau says:

====================
mptcp: Advertisement reliability improvement and misc. updates

Patch 1 adds a helpful debug tracepoint for outgoing MPTCP packets.

Patch 2 is a small "magic number" refactor.

Patches 3 & 4 refactor parts of the mptcp_join.sh selftest. No change in
test coverage.

Patch 5 ensures only advertised address IDs are un-advertised.

Patches 6-8 improve handling of an edge case where endpoint IDs need to
be created on-the-fly when adding subflows. Includes selftest coverage.

Patch 9 adds validation of the fullmesh flag in a MPTCP netlink command,
which was overlooked when this flag was introduced for 5.18.
====================

Link: https://lore.kernel.org/r/20220307204439.65164-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:15 -08:00
Geliang Tang
0dc626e5e8 mptcp: add fullmesh flag check for adding address
The fullmesh flag mustn't be used with the signal flag when adding an
address. This patch added the necessary flags check for this case.

Fixes: 73c762c1f0 ("mptcp: set fullmesh flag in pm_netlink")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:12 -08:00
Paolo Abeni
69c6ce7b6e selftests: mptcp: add implicit endpoint test case
Ensure implicit endpoint are created when expected and
that the user-space can update them

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:12 -08:00
Paolo Abeni
4cf86ae84c mptcp: strict local address ID selection
The address ID selection for MPJ subflows created in response
to incoming ADD_ADDR option is currently unreliable: it happens
at MPJ socket creation time, when the local address could be
unknown.

Additionally, if the no local endpoint is available for the local
address, a new dummy endpoint is created, confusing the user-land.

This change refactor the code to move the address ID selection inside
the rebuild_header() helper, when the local address eventually
selected by the route lookup is finally known. If the address used
is not mapped by any endpoint - and thus can't be advertised/removed
pick the id 0 instead of allocate a new endpoint.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:12 -08:00
Paolo Abeni
d045b9eb95 mptcp: introduce implicit endpoints
In some edge scenarios, an MPTCP subflows can use a local address
mapped by a "implicit" endpoint created by the in-kernel path manager.

Such endpoints presence can be confusing, as it's creation is hard
to track and will prevent the later endpoint creation from the user-space
using the same address.

Define a new endpoint flag to mark implicit endpoints and allow the
user-space to replace implicit them with user-provided data at endpoint
creation time.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:11 -08:00
Paolo Abeni
6fa0174a7c mptcp: more careful RM_ADDR generation
The in-kernel MPTCP path manager, when processing the MPTCP_PM_CMD_FLUSH_ADDR
command, generates RM_ADDR events for each known local address. While that
is allowed by the RFC, it makes unpredictable the exact number of RM_ADDR
generated when both ends flush the PM addresses.

This change restricts the RM_ADDR generation to previously explicitly
announced addresses, and adjust the expected results in a bunch of related
self-tests.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:11 -08:00
Mat Martineau
f98c2bca7b selftests: mptcp: Rename wait function
The "selftests: mptcp: improve 'fair usage on close' stability" commit
changed that self test to check the TcpAttemptFails MIB instead of
looking for TW sockets. The associated bash function wasn't renamed in
that commit because of the merge conflicts it would cause, so this
commit updates the function name as Paolo originally intended.

Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:11 -08:00
Matthieu Baerts
826d7bdca8 selftests: mptcp: join: allow running -cCi
Without this patch, no tests would be ran when launching:

  mptcp_join.sh -cCi

In any order or a combination with 2 of these letters.

The recommended way with getopt is first parse all options and then act.

This allows to do some actions in priority, e.g. display the help menu
and stop.

But also some global variables changing the behaviour of this selftests
 -- like the ones behind -cCi options -- can be set before running the
different tests. By doing that, we can also avoid long and unreadable
regex.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:11 -08:00
Geliang Tang
ea56dcb43c mptcp: use MPTCP_SUBFLOW_NODATA
Set subflow->data_avail with the enum value MPTCP_SUBFLOW_NODATA, instead
of using 0 directly.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:11 -08:00
Geliang Tang
0eb4e7ee16 mptcp: add tracepoint in mptcp_sendmsg_frag
The tracepoint in get_mapping_status() only dumped the incoming mpext
fields. This patch added a new tracepoint in mptcp_sendmsg_frag() to dump
the outgoing mpext too.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:06:10 -08:00
Slawomir Mrozowicz
443ebdd68b ixgbevf: add disable link state
Add possibility to disable link state if it is administratively
disabled in PF.

It is part of the general functionality that allows the PF driver
to control the state of the virtual link VF devices.

Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08 07:41:18 -08:00