1090037 Commits

Author SHA1 Message Date
Pavel Pisa
fb23e43a0a dt-bindings: vendor-prefix: add prefix for the Czech Technical University in Prague.
The Czech Technical University in Prague (CTU) is one of
the biggest and oldest (founded 1707) technical universities
in Europe. The abbreviation in Czech language is ČVUT according
to official name in Czech language

  České vysoké učení technické v Praze

The English translation

  The Czech Technical University in Prague

The university pages in English

  https://www.cvut.cz/en

Link: https://lore.kernel.org/all/ff3a7216114fcd83530e70b994ef0e4277ddf000.1647904780.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:14 +02:00
Marc Kleine-Budde
c6f2a617a0 can: mcp251xfd: add support for mcp251863
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an
integrated transceiver (ATA6563). This patch add support for the new
device.

Link: https://lore.kernel.org/all/20220419072805.2840340-3-mkl@pengutronix.de
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:13 +02:00
Marc Kleine-Budde
6211197648 dt-binding: can: mcp251xfd: add binding information for mcp251863
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an
integrated Transceiver (ATA6563). Add the microchip,mcp251863 as a new
compatible to the binding.

Link: https://lore.kernel.org/all/20220419072805.2840340-2-mkl@pengutronix.de
Cc: devicetree@vger.kernel.org
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:13 +02:00
Wolfram Sang
44b6b105dd dt-bindings: can: renesas,rcar-canfd: document r8a77961 support
This patch adds documentation for the r8a77961 to the
renesas,rcar-canfd binding.

Link: https://lore.kernel.org/all/20220401153743.77871-1-wsa+renesas@sang-engineering.com
Cc: devicetree@vger.kernel.org
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:13 +02:00
Marc Kleine-Budde
ae38fda029 can: xilinx_can: mark bit timing constants as const
This patch marks the bit timing constants as const.

Fixes: c223da689324 ("can: xilinx_can: Add support for CANFD FD frames")
Link: https://lore.kernel.org/all/20220317203119.792552-1-mkl@pengutronix.de
Cc: Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com>
Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:13 +02:00
Lukas Bulwahn
badea4fc70 MAINTAINERS: rectify entry for XILINX CAN DRIVER
Commit 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN
binding to YAML") converts xilinx_can.txt to xilinx,can.yaml, but
missed to adjust its reference in MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains
about a broken reference.

Repair this file reference in XILINX CAN DRIVER.

Fixes: 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML")
Link: https://lore.kernel.org/all/20220321122840.17841-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:13 +02:00
Minghao Chi
e6ec837905 can: flexcan: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Link: https://lore.kernel.org/all/20220419081449.2574026-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:12 +02:00
Christophe Leroy
bb75e352d7 can: mscan: mpc5xxx_can: Prepare cleanup of powerpc's asm/prom.h
powerpc's asm/prom.h brings some headers that it doesn't need itself.

In order to clean it up, first add missing headers in users of
asm/prom.h

Link: https://lore.kernel.org/all/878888f9057ad2f66ca0621a0007472bf57f3e3d.1648833432.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:12 +02:00
Kris Bahnsen
20c7258980 can: Fix Links to Technologic Systems web resources
Technologic Systems has rebranded as embeddedTS with the current
domain eventually going offline. Update web/doc URLs to correct
resource locations.

Link: https://lore.kernel.org/all/20220329201229.16279-1-kris@embeddedTS.com
Signed-off-by: Kris Bahnsen <kris@embeddedTS.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:12 +02:00
Marc Kleine-Budde
85d4eb2a3d can: bittiming: can_calc_bittiming(): prefer small bit rate pre-scalers over larger ones
The CiA (CAN in Automation) lists in their Newsletter 1/2018 in the
"Recommendation for the CAN FD bit-timing" [1] article several
recommendations, one of them is:

| Recommendation 3: Choose BRPA and BRPD as low as possible

[1] https://can-newsletter.org/uploads/media/raw/f6a36d1461371a2f86ef0011a513712c.pdf

With the current bit timing algorithm Srinivas Neeli noticed that on
the Xilinx Versal ACAP board the CAN data bit timing parameters are
not calculated optimally. For most bit rates, the bit rate
prescaler (BRP) is != 1, although it's possible to configure the
requested with a bit rate with a prescaler of 1:

| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'v4.8'
|  nominal                                  real  Bitrt    nom   real  SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP  Error
| 12000000     12   2    2    2   1   1 11428571   4.8%  75.0%  71.4%   4.8%
| 10000000     25   1    1    1   1   2  9999999   0.0%  75.0%  75.0%   0.0%
|  8000000     12   3    3    3   1   1  7999999   0.0%  75.0%  70.0%   6.7%
|  5000000     50   1    1    1   1   4  4999999   0.0%  75.0%  75.0%   0.0%
|  4000000     62   1    1    1   1   5  3999999   0.0%  75.0%  75.0%   0.0%
|  2000000    125   1    1    1   1  10  1999999   0.0%  75.0%  75.0%   0.0%
|  1000000    250   1    1    1   1  20   999999   0.0%  75.0%  75.0%   0.0%

The bit timing parameter calculation algorithm iterates effectively
from low to high BRP values. It selects a new best parameter set, if
the sample point error of the current parameter set is equal or less
to old best parameter set.

If the given hardware constraints (clock rate and bit timing parameter
constants) don't allow a sample point error of 0, the algorithm will
first find a valid bit timing parameter set with a low BRP, but then
will accept parameter sets with higher BRPs that have the same sample
point error.

This patch changes the algorithm to only accept a new parameter set,
if the resulting sample point error is lower. This leads to the
following data bit timing parameter for the Versal ACAP board:

| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'can-next'
|  nominal                                  real  Bitrt    nom   real  SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP  Error
| 12000000     12   2    2    2   1   1 11428571   4.8%  75.0%  71.4%   4.8%
| 10000000     12   2    3    2   1   1  9999999   0.0%  75.0%  75.0%   0.0%
|  8000000     12   3    3    3   1   1  7999999   0.0%  75.0%  70.0%   6.7%
|  5000000     12   5    6    4   1   1  4999999   0.0%  75.0%  75.0%   0.0%
|  4000000     12   7    7    5   1   1  3999999   0.0%  75.0%  75.0%   0.0%
|  2000000     12  14   15   10   1   1  1999999   0.0%  75.0%  75.0%   0.0%
|  1000000     25  14   15   10   1   2   999999   0.0%  75.0%  75.0%   0.0%

Note: Due to HW constraints a data bit rate of 1 MBit/s with BRP = 1 is not possible.

Link: https://lore.kernel.org/all/20220318144913.873614-1-mkl@pengutronix.de
Link: https://lore.kernel.org/all/20220113203004.jf2rqj2pirhgx72i@pengutronix.de
Cc: Srinivas Neeli <sneeli@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 17:12:05 +02:00
Marc Kleine-Budde
eb38c2053b can: rx-offload: rename can_rx_offload_queue_sorted() -> can_rx_offload_queue_timestamp()
This patch renames the function can_rx_offload_queue_sorted() to
can_rx_offload_queue_timestamp(). This better describes what the
function does, it adds a newly RX'ed skb to the sorted queue by its
timestamp.

Link: https://lore.kernel.org/all/20220417194327.2699059-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19 16:58:04 +02:00
Eric Dumazet
99c07327ae netlink: reset network and mac headers in netlink_dump()
netlink_dump() is allocating an skb, reserves space in it
but forgets to reset network header.

This allows a BPF program, invoked later from sk_filter()
to access uninitialized kernel memory from the reserved
space.

Theorically mac header reset could be omitted, because
it is set to a special initial value.
bpf_internal_load_pointer_neg_helper calls skb_mac_header()
without checking skb_mac_header_was_set().
Relying on skb->len not being too big seems fragile.
We also could add a sanity check in bpf_internal_load_pointer_neg_helper()
to avoid surprises in the future.

syzbot report was:

BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
 ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
 __bpf_prog_run include/linux/filter.h:626 [inline]
 bpf_prog_run include/linux/filter.h:633 [inline]
 __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
 bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
 sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
 sk_filter include/linux/filter.h:905 [inline]
 netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_read_iter+0x5a9/0x630 net/socket.c:1039
 do_iter_readv_writev+0xa7f/0xc70
 do_iter_read+0x52c/0x14c0 fs/read_write.c:786
 vfs_readv fs/read_write.c:906 [inline]
 do_readv+0x432/0x800 fs/read_write.c:943
 __do_sys_readv fs/read_write.c:1034 [inline]
 __se_sys_readv fs/read_write.c:1031 [inline]
 __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was stored to memory at:
 ___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
 __bpf_prog_run include/linux/filter.h:626 [inline]
 bpf_prog_run include/linux/filter.h:633 [inline]
 __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
 bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
 sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
 sk_filter include/linux/filter.h:905 [inline]
 netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_read_iter+0x5a9/0x630 net/socket.c:1039
 do_iter_readv_writev+0xa7f/0xc70
 do_iter_read+0x52c/0x14c0 fs/read_write.c:786
 vfs_readv fs/read_write.c:906 [inline]
 do_readv+0x432/0x800 fs/read_write.c:943
 __do_sys_readv fs/read_write.c:1034 [inline]
 __se_sys_readv fs/read_write.c:1031 [inline]
 __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:737 [inline]
 slab_alloc_node mm/slub.c:3244 [inline]
 __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
 kmalloc_reserve net/core/skbuff.c:354 [inline]
 __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
 alloc_skb include/linux/skbuff.h:1158 [inline]
 netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_read_iter+0x5a9/0x630 net/socket.c:1039
 do_iter_readv_writev+0xa7f/0xc70
 do_iter_read+0x52c/0x14c0 fs/read_write.c:786
 vfs_readv fs/read_write.c:906 [inline]
 do_readv+0x432/0x800 fs/read_write.c:943
 __do_sys_readv fs/read_write.c:1034 [inline]
 __se_sys_readv fs/read_write.c:1031 [inline]
 __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x44/0xae

CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: db65a3aaf29e ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 15:05:03 +02:00
Paolo Abeni
cc4bdef26e Merge branch 'rtnetlink-improve-alt_ifname-config-and-fix-dangerous-group-usage'
Florent Fourcot says:

====================
rtnetlink: improve ALT_IFNAME config and fix dangerous GROUP usage

First commit forbids dangerous calls when both IFNAME and GROUP are
given, since it can introduce unexpected behaviour when IFNAME does not
match any interface.

Second patch achieves primary goal of this patchset to fix/improve
IFLA_ALT_IFNAME attribute, since previous code was never working for
newlink/setlink. ip-link command is probably getting interface index
before, and was not using this feature.

Last two patches are improving error code on corner cases.

Changes in v2:
  * Remove ifname argument in rtnl_dev_get/do_setlink
    functions (simplify code)
  * Use a boolean to avoid condition duplication in __rtnl_newlink

Changes in v3:
  * Simplify rtnl_dev_get signature

Changes in v4:
  * Rename link_lookup to link_specified

Changes in v5:
  * Re-order patches
====================

Link: https://lore.kernel.org/r/20220415165330.10497-1-florent.fourcot@wifirst.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 13:39:01 +02:00
Florent Fourcot
b6177d3240 rtnetlink: return EINVAL when request cannot succeed
A request without interface name/interface index/interface group cannot
work. We should return EINVAL

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 13:38:55 +02:00
Florent Fourcot
dee04163e9 rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellink
If IFLA_ALT_IFNAME is set and given interface is not found,
we should return ENODEV and be consistent with IFLA_IFNAME
behaviour
This commit extends feature of commit 76c9ac0ee878,
"net: rtnetlink: add possibility to use alternative names as message handle"

CC: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 13:38:50 +02:00
Florent Fourcot
5ea08b5286 rtnetlink: enable alt_ifname for setlink/newlink
buffer called "ifname" given in function rtnl_dev_get
is always valid when called by setlink/newlink,
but contains only empty string when IFLA_IFNAME is not given. So
IFLA_ALT_IFNAME is always ignored

This patch fixes rtnl_dev_get function with a remove of ifname argument,
and move ifname copy in do_setlink when required.

It extends feature of commit 76c9ac0ee878,
"net: rtnetlink: add possibility to use alternative names as message
handle""

CC: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 13:38:43 +02:00
Florent Fourcot
ef2a7c9065 rtnetlink: return ENODEV when ifname does not exist and group is given
When the interface does not exist, and a group is given, the given
parameters are being set to all interfaces of the given group. The given
IFNAME/ALT_IF_NAME are being ignored in that case.

That can be dangerous since a typo (or a deleted interface) can produce
weird side effects for caller:

Case 1:

 IFLA_IFNAME=valid_interface
 IFLA_GROUP=1
 MTU=1234

Case 1 will update MTU and group of the given interface "valid_interface".

Case 2:

 IFLA_IFNAME=doesnotexist
 IFLA_GROUP=1
 MTU=1234

Case 2 will update MTU of all interfaces in group 1. IFLA_IFNAME is
ignored in this case

This behaviour is not consistent and dangerous. In order to fix this issue,
we now return ENODEV when the given IFNAME does not exist.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 13:38:31 +02:00
Paolo Abeni
8b11c35d97 Merge branch 'net-sched-allow-user-to-select-txqueue'
Tonghao Zhang says:

====================
net: sched: allow user to select txqueue

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Patch 1 allow user to select txqueue in clsact hook.
Patch 2 support skbhash to select txqueue.
====================

Link: https://lore.kernel.org/r/20220415164046.26636-1-xiangxia.m.yue@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 12:20:48 +02:00
Tonghao Zhang
38a6f08657 net: sched: support hash selecting tx queue
This patch allows users to pick queue_mapping, range
from A to B. Then we can load balance packets from A
to B tx queue. The range is an unsigned 16bit value
in decimal format.

$ tc filter ... action skbedit queue_mapping skbhash A B

"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit")
is enhanced with flags: SKBEDIT_F_TXQ_SKBHASH

  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | qn        | qm
    v           v           v
  HTB/FQ       FIFO   ...  FIFO

For example:
If P1 sends out packets to different Pods on other host, and
we want distribute flows from qn - qm. Then we can use skb->hash
as hash.

setup commands:
$ NETDEV=eth0
$ ip netns add n1
$ ip link add ipv1 link $NETDEV type ipvlan mode l2
$ ip link set ipv1 netns n1
$ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up

$ tc qdisc add dev $NETDEV clsact
$ tc filter add dev $NETDEV egress protocol ip prio 1 \
        flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6
$ tc qdisc add dev $NETDEV handle 1: root mq
$ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
$ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
$ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit
$ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
$ tc qdisc add dev $NETDEV parent 1:3 pfifo
$ tc qdisc add dev $NETDEV parent 1:4 pfifo
$ tc qdisc add dev $NETDEV parent 1:5 pfifo
$ tc qdisc add dev $NETDEV parent 1:6 pfifo
$ tc qdisc add dev $NETDEV parent 1:7 pfifo

$ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10

pick txqueue from 2 - 6:
$ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes
     tx_queue_0_bytes: 42
     tx_queue_1_bytes: 0
     tx_queue_2_bytes: 11442586444
     tx_queue_3_bytes: 7383615334
     tx_queue_4_bytes: 3981365579
     tx_queue_5_bytes: 3983235051
     tx_queue_6_bytes: 6706236461
     tx_queue_7_bytes: 42
     tx_queue_8_bytes: 0
     tx_queue_9_bytes: 0

txqueues 2 - 6 are mapped to classid 1:3 - 1:7
$ tc -s class show dev $NETDEV
...
class mq 1:3 root leaf 8002:
 Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:4 root leaf 8003:
 Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:5 root leaf 8004:
 Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:6 root leaf 8005:
 Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:7 root leaf 8006:
 Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
...

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 12:20:45 +02:00
Tonghao Zhang
2f1e85b1ae net: sched: use queue_mapping to pick tx queue
This patch fixes issue:
* If we install tc filters with act_skbedit in clsact hook.
  It doesn't work, because netdev_core_pick_tx() overwrites
  queue_mapping.

  $ tc filter ... action skbedit queue_mapping 1

And this patch is useful:
* We can use FQ + EDT to implement efficient policies. Tx queues
  are picked by xps, ndo_select_queue of netdev driver, or skb hash
  in netdev_core_pick_tx(). In fact, the netdev driver, and skb
  hash are _not_ under control. xps uses the CPUs map to select Tx
  queues, but we can't figure out which task_struct of pod/containter
  running on this cpu in most case. We can use clsact filters to classify
  one pod/container traffic to one Tx queue. Why ?

  In containter networking environment, there are two kinds of pod/
  containter/net-namespace. One kind (e.g. P1, P2), the high throughput
  is key in these applications. But avoid running out of network resource,
  the outbound traffic of these pods is limited, using or sharing one
  dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods
  (e.g. Pn), the low latency of data access is key. And the traffic is not
  limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc.
  This choice provides two benefits. First, contention on the HTB/FQ Qdisc
  lock is significantly reduced since fewer CPUs contend for the same queue.
  More importantly, Qdisc contention can be eliminated completely if each
  CPU has its own FIFO Qdisc for the second kind of pods.

  There must be a mechanism in place to support classifying traffic based on
  pods/container to different Tx queues. Note that clsact is outside of Qdisc
  while Qdisc can run a classifier to select a sub-queue under the lock.

  In general recording the decision in the skb seems a little heavy handed.
  This patch introduces a per-CPU variable, suggested by Eric.

  The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit().
  - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag
    is set in qdisc->enqueue() though tx queue has been selected in
    netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared
    firstly in __dev_queue_xmit(), is useful:
  - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev
    in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy:
    For example, eth0, macvlan in pod, which root Qdisc install skbedit
    queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of
    eth0.3, clear the flag, does not select tx queue according to skb->queue_mapping
    because there is no filters in clsact or tx Qdisc of this netdev.
    Same action taked in eth0, ixgbe in Host.
  - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue
    in tx Qdisc (qdisc->enqueue()), the proper way to clear it is clearing it
    in __dev_queue_xmit when processing next packets.

  For performance reasons, use the static key. If user does not config the NET_EGRESS,
  the patch will not be compiled.

  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | q1        | qn
    v           v           v
  HTB/FQ      HTB/FQ  ...  FIFO

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 12:20:45 +02:00
Vladimir Oltean
4cf35a2b62 net: mscc: ocelot: fix broken IP multicast flooding
When the user runs:
bridge link set dev $br_port mcast_flood on

this command should affect not only L2 multicast, but also IPv4 and IPv6
multicast.

In the Ocelot switch, unknown multicast gets flooded according to
different PGIDs according to its type, and PGID_MC only handles L2
multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their
default value of 0, unknown IP multicast traffic is never flooded.

Fixes: 421741ea5672 ("net: mscc: ocelot: offload bridge port flags to device")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 10:33:33 +02:00
Kurt Kanzenbach
0763120b09 net: dsa: hellcreek: Calculate checksums in tagger
In case the checksum calculation is offloaded to the DSA master network
interface, it will include the switch trailing tag. As soon as the switch strips
that tag on egress, the calculated checksum is wrong.

Therefore, add the checksum calculation to the tagger (if required) before
adding the switch tag. This way, the hellcreek code works with all DSA master
interfaces regardless of their declared feature set.

Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19 09:49:38 +02:00
Manuel Ullmann
cbe6c3a8f8 net: atlantic: invert deep par in pm functions, preventing null derefs
This will reset deeply on freeze and thaw instead of suspend and
resume and prevent null pointer dereferences of the uninitialized ring
0 buffer while thawing.

The impact is an indefinitely hanging kernel. You can't switch
consoles after this and the only possible user interaction is SysRq.

BUG: kernel NULL pointer dereference
RIP: 0010:aq_ring_rx_fill+0xcf/0x210 [atlantic]
aq_vec_init+0x85/0xe0 [atlantic]
aq_nic_init+0xf7/0x1d0 [atlantic]
atl_resume_common+0x4f/0x100 [atlantic]
pci_pm_thaw+0x42/0xa0

resolves in aq_ring.o to

```
0000000000000ae0 <aq_ring_rx_fill>:
{
/* ... */
 baf:	48 8b 43 08          	mov    0x8(%rbx),%rax
 		buff->flags = 0U; /* buff is NULL */
```

The bug has been present since the introduction of the new pm code in
8aaa112a57c1 ("net: atlantic: refactoring pm logic") and was hidden
until 8ce84271697a ("net: atlantic: changes for multi-TC support"),
which refactored the aq_vec_{free,alloc} functions into
aq_vec_{,ring}_{free,alloc}, but is technically not wrong. The
original functions just always reinitialized the buffers on S3/S4. If
the interface is down before freezing, the bug does not occur. It does
not matter, whether the initrd contains and loads the module before
thawing.

So the fix is to invert the boolean parameter deep in all pm function
calls, which was clearly intended to be set like that.

First report was on Github [1], which you have to guess from the
resume logs in the posted dmesg snippet. Recently I posted one on
Bugzilla [2], since I did not have an AQC device so far.

#regzbot introduced: 8ce84271697a
#regzbot from: koo5 <kolman.jindrich@gmail.com>
#regzbot monitor: https://github.com/Aquantia/AQtion/issues/32

Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic")
Link: https://github.com/Aquantia/AQtion/issues/32 [1]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215798 [2]
Cc: stable@vger.kernel.org
Reported-by: koo5 <kolman.jindrich@gmail.com>
Signed-off-by: Manuel Ullmann <labre@posteo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 13:34:36 +01:00
Luiz Angelo Daros de Luca
a997157e42 docs: net: dsa: describe issues with checksum offload
DSA tags before IP header (categories 1 and 2) or after the payload (3)
might introduce offload checksum issues.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 13:29:02 +01:00
David S. Miller
2a38de067b Merge branch 'mlxsw-line-card'
Ido Schimmel says:

====================
mlxsw: Introduce line card support for modular switch

Jiri says:

This patchset introduces support for modular switch systems and also
introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch.
It contains 8 slots to accommodate line cards - replaceable PHY modules
which may contain gearboxes.
Currently supported line card:
16X 100GbE (QSFP28)
Other line cards that are going to be supported:
8X 200GbE (QSFP56)
4X 400GbE (QSFP-DD)
There may be other types of line cards added in the future.

To be consistent with the port split configuration (splitter cabels),
the line card entities are treated in the similar way. The nature of
a line card is not "a pluggable device", but "a pluggable PHY module".

A concept of "provisioning" is introduced. The user may "provision"
certain slot with a line card type. Driver then creates all instances
(devlink ports, netdevices, etc) related to this line card type. It does
not matter if the line card is plugged-in at the time. User is able to
configure netdevices, devlink ports, setup port splitters, etc. From the
perspective of the switch ASIC, all is present and can be configured.

The carrier of netdevices stays down if the line card is not plugged-in.
Once the line card is inserted and activated, the carrier of
the related netdevices is then reflecting the physical line state,
same as for an ordinary fixed port.

Once user does not want to use the line card related instances
anymore, he can "unprovision" the slot. Driver then removes the
instances.

Patches 1-4 are extending devlink driver API and UAPI in order to
register, show, dump, provision and activate the line card.
Patches 5-17 are implementing the introduced API in mlxsw.
The last patch adds a selftest for mlxsw line cards.

Example:
$ devlink port # No ports are listed
$ devlink lc
pci/0000:01:00.0:
  lc 1 state unprovisioned
    supported_types:
       16x100G
  lc 2 state unprovisioned
    supported_types:
       16x100G
  lc 3 state unprovisioned
    supported_types:
       16x100G
  lc 4 state unprovisioned
    supported_types:
       16x100G
  lc 5 state unprovisioned
    supported_types:
       16x100G
  lc 6 state unprovisioned
    supported_types:
       16x100G
  lc 7 state unprovisioned
    supported_types:
       16x100G
  lc 8 state unprovisioned
    supported_types:
       16x100G

Note that driver exposes list supported line card types. Currently
there is only one: "16x100G".

To provision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
       16x100G
$ devlink port
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4

To uprovision the slot #8:

$ devlink lc set pci/0000:01:00.0 lc 8 notype
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
e1fad9517f selftests: mlxsw: Introduce devlink line card provision/unprovision/activation tests
Introduce basic line card manipulation which consists of provisioning,
unprovisioning and activation of a line card.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
6445eef0f6 mlxsw: spectrum: Add port to linecard mapping
For each port get slot_index using PMLP register. For ports residing
on a linecard, identify it with the linecard by setting mapping
using devlink_port_linecard_set() helper. Use linecard slot index for
PMTDB register queries.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
45bf3b7267 mlxsw: core: Extend driver ops by remove selected ports op
In case of line card implementation, the core has to have a way to
remove relevant ports manually. Extend the Spectrum driver ops by an op
that implements port removal of selected ports upon request.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
ee7a70fa67 mlxsw: core_linecards: Implement line card activation process
Allow to process events generated upon line card getting "ready" and
"active".

When DSDSC event with "ready" bit set is delivered, that means the
line card is powered up. Use MDDC register to push the line card to
active state. Once FW is done with that, the DSDSC event with "active"
bit set is delivered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
b217127e5e mlxsw: core_linecards: Add line card objects and implement provisioning
Introduce objects for line cards and an infrastructure around that.
Use devlink_linecard_create/destroy() to register the line card with
devlink core. Implement provisioning ops with a list of supported
line cards.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
5bade5aa4a mlxsw: reg: Add Management Binary Code Transfer Register
The MBCT register allows to transfer binary INI codes from the host to
the management FW by transferring it by chunks of maximum 1KB.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
5290a8ff2e mlxsw: reg: Add Management DownStream Device Control Register
The MDDC register allows to control downstream devices and line cards.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
505f524dc6 mlxsw: reg: Add Management DownStream Device Query Register
The MDDQ register allows to query the DownStream device properties.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
b0ec003e9a mlxsw: spectrum: Introduce port mapping change event processing
Register PMLPE trap and process the port mapping changes delivered
by it by creating related ports. Note that this happens after
provisioning. The INI of the linecard is processed and merged by FW.
PMLPE is generated for each port. Process this mapping change.

Layout of PMLPE is the same as layout of PMLP.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:19 +01:00
Jiri Pirko
adc6462376 mlxsw: Narrow the critical section of devl_lock during ports creation/removal
No need to hold the lock for alloc and freecpu. So narrow the critical
section. Follow-up patch is going to benefit from this by adding more
code to the functions which will be out of the critical as well.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
ebf0c53417 mlxsw: reg: Add Ports Mapping Event Configuration Register
The PMECR register is used to enable/disable event triggering
in case of local port mapping change.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
d3ad2d8820 mlxsw: spectrum: Allocate port mapping array of structs instead of pointers
Instead of array of pointers to port mapping structures, allocate the
array of structures directly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
bac62191a3 mlxsw: spectrum: Allow lane to start from non-zero index
So far, the lane index always started from zero. That is not true for
modular systems with gearbox-equipped linecards. Loose the check so the
lanes can start from non-zero index.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
b837585985 devlink: add port to line card relationship set
In order to properly inform user about relationship between port and
line card, introduce a driver API to set line card for a port. Use this
information to extend port devlink netlink message by line card index
and also include the line card index into phys_port_name and by that
into a netdevice name.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
fc9f50d5b3 devlink: implement line card active state
Allow driver to mark a line card as active. Expose this state to the
userspace over devlink netlink interface with proper notifications.
'active' state means that line card was plugged in after
being provisioned.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
fcdc8ce23a devlink: implement line card provisioning
In order to be able to configure all needed stuff on a port/netdevice
of a line card without the line card being present, introduce line card
provisioning. Basically by setting a type, provisioning process will
start and driver is supposed to create a placeholder for instances
(ports/netdevices) for a line card type.

Allow the user to query the supported line card types over line card
get command. Then implement two netlink command SET to allow user to
set/unset the card type.

On the driver API side, add provision/unprovision ops and supported
types array to be advertised. Upon provision op call, the driver should
take care of creating the instances for the particular line card type.
Introduce provision_set/clear() functions to be called by the driver
once the provisioning/unprovisioning is done on its side. These helpers
are not to be called directly due to the async nature of provisioning.

Example:
$ devlink port # No ports are listed
$ devlink lc
pci/0000:01:00.0:
  lc 1 state unprovisioned
    supported_types:
       16x100G
  lc 2 state unprovisioned
    supported_types:
       16x100G
  lc 3 state unprovisioned
    supported_types:
       16x100G
  lc 4 state unprovisioned
    supported_types:
       16x100G
  lc 5 state unprovisioned
    supported_types:
       16x100G
  lc 6 state unprovisioned
    supported_types:
       16x100G
  lc 7 state unprovisioned
    supported_types:
       16x100G
  lc 8 state unprovisioned
    supported_types:
       16x100G

$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
       16x100G
$ devlink port
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4

$ devlink lc set pci/0000:01:00.0 lc 8 notype

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Jiri Pirko
c246f9b5fd devlink: add support to create line card and expose to user
Extend the devlink API so the driver is going to be able to create and
destroy linecard instances. There can be multiple line cards per devlink
device. Expose this new type of object over devlink netlink API to the
userspace, with notifications.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 11:00:18 +01:00
Eric Dumazet
843f77407e tcp: fix signed/unsigned comparison
Kernel test robot reported:

smatch warnings:
net/ipv4/tcp_input.c:5966 tcp_rcv_established() warn: unsigned 'reason' is never less than zero.

I actually had one packetdrill failing because of this bug,
and was about to send the fix :)

v2: Andreas Schwab also pointed out that @reason needs to be negated
    before we reach tcp_drop_reason()

Fixes: 4b506af9c5b8 ("tcp: add two drop reasons for tcp_ack()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 10:28:40 +01:00
David S. Miller
d94ef51d5b linux-can-fixes-for-5.18-20220417
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmJcMUsTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXf+UB/9DyZD5tzOgeHEj/BHyN15TQ/KyABou
 fifW4Al5awFaoNG4gAmKfBiSqI0H4pejhIIB76t5xCfdftEiW6VbLDQz048oOezK
 olAW71Te0ESHP57ZOPsGPb2Im/qIl0b1nymatgse6XSsDzMupQiDT+BD79BTq8re
 sO/IzAky6L1nnaxfG2GrJMHMqsttyKQtYLJReDRTMaskI11ya9TjaJjI/uYOpm6F
 jCNTv0M/rEjDal85rN4m1E6NIh5E0qhaPAR5O+olKCfuVmneIl5YESnOIovA/19M
 0KPcOAT1rCoKaH7K94IHHJ9XG2bO1P5GTtHu3eUkjLwtEusMpHjfOMME
 =Andf
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.18-20220417' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-04-17

this is a pull request of 1 patch for net/master.

The patch is by Oliver Hartkopp and fixes a timeout monitoring problem
in the ISO TP protocol found by the syzbot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 10:22:22 +01:00
Linus Torvalds
b2d229d4dd Linux 5.18-rc3 v5.18-rc3 2022-04-17 13:57:31 -07:00
Linus Torvalds
a1901b464e xen: branch for v5.18-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYlvMdwAKCRCAXGG7T9hj
 vueqAQCeJiUt0Adhs7ACqzvTsBc1TGXD44J6AAfwedMgtdgtvAD+LvWXLhTcgiCb
 DT03AIpI1Z/40QgPYuJ3o4yAZN7eUg4=
 =dV8F
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixlet from Juergen Gross:
 "A single cleanup patch for the Xen balloon driver"

* tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: don't use PV mode extra memory for zone device allocations
2022-04-17 10:29:10 -07:00
Linus Torvalds
3a69a44278 Two x86 fixes related to TSX:
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX to
     cover all CPUs which allow to disable it.
 
   - Disable TSX development mode at boot so that a microcode update which
     provides TSX development mode does not suddenly make the system
     vulnerable to TSX Asynchronous Abort.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb5LYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVVbD/9cxZWkFctCiymedUZqLabkfpYSki65
 MngdpCPzCNaaIdlp44lwCido5+gJsY9unXdm3OAUzLjv6SsxxpDr5njz1/C6TM1l
 XmWjlkLEbG2QDPd1Ybd/lpYQORBmiukyo8v8x0yFT7ZzwvSddoDZAbeUtkQBrIin
 sDTeExsewKzL2X5qXhttrHLHu1PYgurn4ThIrrG+eg2e4FNk6UUFUS3TOyMvzJDg
 NWJ7N5pGy9YkR7CISq1q+qdnH55pGaUrgonDi2qBTt3EaH0fQtZP2ZtIOYr3O4nI
 YCx6isrIiGUB6kSygofxmk4B+22CaUJXd2OcUxMZ/Th/a2aCK+35BtGVPXQGi6nU
 d7m+ZWB7dShOiejFygS59ty+5L5kliKXYZfUASsq1CLoXH8K1xUwBMkbY5FQ2WH1
 Ue4KUvjguNqsgSRAfeHdOi6B36oot0Xf9JO013Wm3V/r9hsGPtSOjWwFuVvT/euw
 a9iFtruATxDssBxH/l0djCKnwwm5yuOt1OpyizcIMFnlCgRD06h/6zgAvsJK7c8d
 dh6lC4D2mXP1e2wtEyZelve1tmRJ/FeReyG2V5FNU7m1mWYGm1rJZ4AEvnbrzcbC
 ePwFva0lPu8GVKG6HRgHfR8PjuQ7TFmKPKytT7fboIqQpTIY+1Q75wYD4eXkSu8Q
 /ltzXQz/8lz7bA==
 =UQaW
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two x86 fixes related to TSX:

   - Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX
     to cover all CPUs which allow to disable it.

   - Disable TSX development mode at boot so that a microcode update
     which provides TSX development mode does not suddenly make the
     system vulnerable to TSX Asynchronous Abort"

* tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsx: Disable TSX development mode at boot
  x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
2022-04-17 09:55:59 -07:00
Linus Torvalds
fbb9c58e56 A small set of fixes for the timers core:
- Fix the warning condition in __run_timers() which does not take into
     account, that a CPU base (especially the deferrable base) has never a
     timer armed on it and therefore the next_expiry value can become stale.
 
   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to prevent
     endless spam in dmesg.
 
   - Remove the double star from a comment which is not meant to be in
     kernel-doc format.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb44gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXptD/4tKI9W4q0CAZnXw/G9cJdUK2u2VQGb
 QfB0DMABuDLMlHUSO92T49IcfjQUR7hz36kHXY0OxwALaEB8ftap6yZLfI1864mZ
 RFatWcTl71ZYY0SSio0rMnbDawTeN9N69WLPR+A6fDH0ahBqTwpeJoa1zj5w99LF
 kFoEtI5vEBtf+m94PljBTclxOKTXmRRCzEZRpJ6L4Ze9SHR7aMKtBgp84M7a3IAw
 IzXq0Kw0MRIm4P26uOpsVzi2KOq6+F1O1zy4L8cHVeU+Y6HC1qWLPXsyhipb0liE
 6STFs1L7ANX+dpMW8kmorXc3RcN4xCuCY4Sb17W6Hqiq3syQ+Ho2G0CTheL0yOpM
 GVqf7SVes3EGAJwWyKaFmhpgWGKcS93p18XGUrf5V+rVIU+ggQcc2vM9E8FiM03d
 TA6T9+1u0TvtPlmd/KQ3OPLMEtu6/3R0fOLEXnv+Cb/PYIYMvPpb9cu6LjJkNqGv
 6R+3dWORI7mYLZD+vS3Y5mrmB5OECGXTPNJ+aBr46PfdvSVLxvmOhFlNR7eQKHZ8
 wFwOC6n4wkD5Y1VOVGEn0GVGWVkZXdlQzZOQGqBhJPjsfPfOaGacyLqCcsZH4V6w
 kitKI8z7A7evLozZMQqKZskwLG1BTQ13YeElb5jJMpG3aTG7lSmPk+iGgLFMDWkc
 Z2k4YD0Uw+92Tw==
 =SffK
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A small set of fixes for the timers core:

   - Fix the warning condition in __run_timers() which does not take
     into account that a CPU base (especially the deferrable base) never
     has a timer armed on it and therefore the next_expiry value can
     become stale.

   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to
     prevent endless spam in dmesg.

   - Remove the double star from a comment which is not meant to be in
     kernel-doc format"

* tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/sched: Fix non-kernel-doc comment
  tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
  timers: Fix warning condition in __run_timers()
2022-04-17 09:53:01 -07:00
Linus Torvalds
0e59732ed6 Two fixes for the SMP core:
- Make the warning condition in flush_smp_call_function_queue() correct,
    which checks a just emptied list head for being empty instead of
    validating that there was no pending entry on the offlined CPU at all.
 
  - The @cpu member of struct cpuhp_cpu_state is initialized when the CPU
    hotplug thread for the upcoming CPU is created. That's too late because
    the creation of the thread can fail and then the following rollback
    operates on CPU0. Get rid of the CPU member and hand the CPU number to
    the involved functions directly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4skTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUYaD/9SWzwEqwKICAhWWSMtRg6cvtVTAKGL
 xWXNb5ZxLA8MdG9CYu7WaUF7mU3E8o7kF/dItvkdj0/9mg/IqcVnP1CEOfp0ioHc
 1cbAZShORfFlOowO1WR9Xe3MSZB2mXPHsjh3FWPeGTDxnSByMDXROMFyotxji+7q
 g1ZsyVbq3+hSVTOhnhpxrh8MS3qcCAsgYtHKRnPjOE/tqbNXSmbhmeuN7QBKLVp6
 AD6DaWj8jGtZ8yzbpm0Ve3ZsjH+1SdZAzm4yhh3FHMKsSOwB8yuNwq1oNbbEjxWu
 mTg+LCBBMGYybSTa9sUpDG5Xfc3/dyWnzkGnmp7M8bfElrI3x4d+sMdVhfFRjEXB
 xlXmNqRoExgZwifyoEJbmSbG2SgLiWqyLqgpUvLS/yHud6IurFbu36TrByYRMMsG
 CyaiLMdWIlBibo+xGkVgnLyc2io98KeoLJoc35HM7VYyn9oNI4hUU9OJDIkG5D2i
 lyS+qMTFInFSOm7L5u/SUTYe4I0sXYJ6uEXxhR/Gi3ITXb7aLOJyxhNtq//u1dmg
 6vZHgs8HZWylG3LxVn1QERWotVlPuNPRilsUGCVMsCHjOFnoPy+eM3ANtR2x/uaE
 XpQ58jnhkhPXZfmnsjDEILkWB67J0/9FujbcPvixH8SRvUgBB7xRRbuo9zXPchhw
 VDKy1BCIcZ93gA==
 =XNLP
 -----END PGP SIGNATURE-----

Merge tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP fixes from Thomas Gleixner:
 "Two fixes for the SMP core:

   - Make the warning condition in flush_smp_call_function_queue()
     correct, which checked a just emptied list head for being empty
     instead of validating that there was no pending entry on the
     offlined CPU at all.

   - The @cpu member of struct cpuhp_cpu_state is initialized when the
     CPU hotplug thread for the upcoming CPU is created. That's too late
     because the creation of the thread can fail and then the following
     rollback operates on CPU0. Get rid of the CPU member and hand the
     CPU number to the involved functions directly"

* tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
  smp: Fix offline cpu check in flush_smp_call_function_queue()
2022-04-17 09:46:15 -07:00
Linus Torvalds
7e1777f5ec A single fix for the interrupt affinity spreading logic to take into
account that there can be an imbalance between present and possible CPUs,
 which causes already assigned bits to be overwritten.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4cETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWrDD/wOd9X4zQz6cZci2Xl091IhMlS2/sBm
 afY98mTK56rX/BEgybMhzXZtBOSdDL2UDnBYznuYiPWof09+mqJGp4cLJAtVvK7E
 cDoLdpqmFuQzO8Ka1lUgdgwR+d32pFmaBASbl/0COdl8qgNskQp+jD6BaztNxYbZ
 HjkN2chMXqRlCOkKCthNajPEB3thmW2H0gDiZT/VEGYkQfp+HdoH7QhpIXt8HkWd
 lb1g1bpDBUwsGrVJ2PlI+XvwmYdW/WOyebIq2D1XBhSS1RXVTt5xfXh/ZfhULzgn
 2bnhvXU7XwDZRJNte1r8EFHSufXZUwhZIa9DRKHoy1fxNaxufRy0VJLEqcRHbcBO
 xGIvXFfjmn+vmO6hwC1/DeP7Dz6WiTYoxTwJVpvgURqcEdr1HvnSy1CdMEOsAMUa
 ZQDJzudV47a2qj80MRO1TSovW2gtDPdSWEUp5oOx2nYUZbqME146ubglZEcXLzq7
 Y7BeMaO1VzXkfyk6ImtrBYY+qrXaT1ZQvCOGPrOltnlU0cBLkQMSUpjTstIcaOAp
 iIvPLjVR1fDdgEBRck+ZAsM9snefz160wEbx23PzGuh2BenLpQfKZHIYhDO8wo2l
 4uUmnp7VY0FS5I38f+rWNvqhFfJV1oda0dQfprP+s3Usz5oSegl764as5hjGcu+y
 0Z/mCvaPODGctg==
 =wSdp
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the interrupt affinity spreading logic to take into
  account that there can be an imbalance between present and possible
  CPUs, which causes already assigned bits to be overwritten"

* tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Consider that CPUs on nodes can be unbalanced
2022-04-17 09:42:03 -07:00