1186921 Commits

Author SHA1 Message Date
Robert Hancock
26dd2974c5 net: phy: micrel: Move KSZ9477 errata fixes to PHY driver
The ksz9477 DSA switch driver is currently updating some MMD registers
on the internal port PHYs to address some chip errata. However, these
errata are really a property of the PHY itself, not the switch they are
part of, so this is kind of a layering violation. It makes more sense for
these writes to be done inside the driver which binds to the PHY and not
the driver for the containing device.

This also addresses some issues where the ordering of when these writes
are done may have been incorrect, causing the link to erratically fail to
come up at the proper speed or at all. Doing this in the PHY driver
during config_init ensures that they happen before anything else tries to
change the state of the PHY on the port.

The new code also ensures that autonegotiation is disabled during the
register writes and re-enabled afterwards, as indicated by the latest
version of the errata documentation from Microchip.
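
As an illustration of the ordering described above, here is a minimal user-space
sketch, not the driver code. The structure fake_phy, the helper names, the
register numbers and the values are all hypothetical; only the BMCR_ANENABLE
bit value (0x1000) follows the standard MII control register layout. It shows
autonegotiation being switched off for the duration of the writes and restored
afterwards:

```c
#include <stdint.h>

#define BMCR_ANENABLE 0x1000u	/* autonegotiation enable bit of the basic control register */
#define FAKE_MMD_REGS 8		/* size of the simulated register file (hypothetical) */

/* Hypothetical stand-in for a port PHY; not a real kernel structure. */
struct fake_phy {
	uint16_t bmcr;
	uint16_t mmd[FAKE_MMD_REGS];
	unsigned int writes_with_aneg_on;	/* writes made while autoneg was enabled */
};

/* One register fixup; the values used by callers are arbitrary examples. */
struct fake_fixup {
	unsigned int reg;
	uint16_t val;
};

static void fake_mmd_write(struct fake_phy *phy, unsigned int reg, uint16_t val)
{
	if (phy->bmcr & BMCR_ANENABLE)
		phy->writes_with_aneg_on++;
	phy->mmd[reg] = val;
}

/* Apply all fixups with autonegotiation off, then restore its previous state. */
static int fake_apply_fixups(struct fake_phy *phy, const struct fake_fixup *fix,
			     unsigned int count)
{
	uint16_t aneg = phy->bmcr & BMCR_ANENABLE;
	unsigned int i;

	/* Validate first so that a bad entry cannot leave autoneg disabled. */
	for (i = 0; i < count; i++)
		if (fix[i].reg >= FAKE_MMD_REGS)
			return -1;

	phy->bmcr &= (uint16_t)~BMCR_ANENABLE;
	for (i = 0; i < count; i++)
		fake_mmd_write(phy, fix[i].reg, fix[i].val);
	phy->bmcr |= aneg;

	return 0;
}
```

A caller would fill a fake_phy, pass a table of fixups to fake_apply_fixups()
and can then check that writes_with_aneg_on stayed at zero.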

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:08:37 -07:00
Jakub Kicinski
2dc476404e Merge branch 'tools-ynl-user-space-c'
Jakub Kicinski says:

====================
tools: ynl: user space C

Use the code gen which is already in tree to generate a user space
library for a handful of simple families. I find YNL C quite useful
in some WIP projects, and I think others may find it useful, too.
I was hoping someone will pick this work up and finish it...
but it seems that Python YNL has largely stolen the thunder.
Python may not be great for selftest, tho, and actually this lib
is more fully-featured. The Python script was meant as a quick demo,
funny how those things go.

v2: https://lore.kernel.org/all/20230604175843.662084-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20230603052547.631384-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230605190108.809439-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:34 -07:00
Jakub Kicinski
ee0202e2e7 tools: ynl: add sample for netdev
Add a sample application using the C library.
My main goal is to make writing selftests easier but until
I have some of those ready I think it's useful to show off
the functionality and let people poke and tinker.

Sample outputs - dump:

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 0
      lo[1]	0:
  enp1s0[2]	23: basic redirect rx-sg

Notifications (watching veth pair getting added and deleted):

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): -2
[53]	0: (ntf: dev-add-ntf)
[54]	0: (ntf: dev-add-ntf)
[54]	23: basic redirect rx-sg (ntf: dev-change-ntf)
[53]	23: basic redirect rx-sg (ntf: dev-change-ntf)
[53]	23: basic redirect rx-sg (ntf: dev-del-ntf)
[54]	23: basic redirect rx-sg (ntf: dev-del-ntf)

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:32 -07:00
Jakub Kicinski
d75fdfbc6f tools: ynl: support fou and netdev in C
Generate the code for netdev and fou families. They are simple
and already supported by the code gen.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:32 -07:00
Jakub Kicinski
86878f14d7 tools: ynl: user space helpers
Add the "fixed" part of the user space Netlink Spec-based library.
This will get linked with the protocol implementations to form
a full API.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:31 -07:00
Jakub Kicinski
a99bfdf647 tools: ynl-gen: clean up stray new lines at the end of reply-less requests
Do not print empty lines before closing brackets.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:31 -07:00
Lukas Bulwahn
ae91f7e436 net/pppoe: fix a typo for the PPPOE_HASH_BITS_1 definition
Although it intended to define PPPOE_HASH_BITS_1, commit 96ba44c637b0
("net/pppoe: make number of hash bits configurable") actually defined
config PPPOE_HASH_BITS_2 twice in ppp's Kconfig file due to a typo in
the numbers.

Fix the typo and define PPPOE_HASH_BITS_1.

Fixes: 96ba44c637b0 ("net/pppoe: make number of hash bits configurable")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jaco Kroon <jaco@uls.co.za>
Link: https://lore.kernel.org/r/20230605072743.11247-1-lukas.bulwahn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:28:30 +02:00
Andy Shevchenko
8d2b2281ae mac_pton: Clean up the header inclusions
Since hex_to_bin() is provided by hex.h there is no need to require
kernel.h. Replace the latter by the former and add missing export.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230604132858.6650-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:18:32 +02:00
Richard Gobert
7b355b76e2 gro: decrease size of CB
The GRO control block (NAPI_GRO_CB) is currently at its maximum size.
This commit reduces its size by putting two groups of fields that are
used only at different times into a union.

Specifically, the fields frag0 and frag0_len are the fields that make up
the frag0 optimisation mechanism, which is used during the initial
parsing of the SKB.

The fields last and age are used after the initial parsing, while the
SKB is stored in the GRO list, waiting for other packets to arrive.

There was one location in dev_gro_receive that modified the frag0 fields
after setting last and age. I changed this accordingly without altering
the code behaviour.
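
The size saving can be sketched with simplified layouts. The structs below are
hypothetical and are not the real NAPI_GRO_CB definition; they only reuse the
four field names mentioned above and assume C11 anonymous structs and unions:

```c
#include <stddef.h>

/* Hypothetical flat layout: both groups of fields occupy space at all times. */
struct demo_cb_flat {
	void *frag0;
	unsigned int frag0_len;
	void *last;
	unsigned long age;
};

/* Hypothetical union layout: the two groups share the same storage. */
struct demo_cb_union {
	union {
		struct {	/* used only during the initial parsing */
			void *frag0;
			unsigned int frag0_len;
		};
		struct {	/* used only while queued in the GRO list */
			void *last;
			unsigned long age;
		};
	};
};

/* Number of bytes saved by overlapping the two groups. */
static size_t demo_cb_saved_bytes(void)
{
	return sizeof(struct demo_cb_flat) - sizeof(struct demo_cb_union);
}
```

The overlap is only safe because the two groups are never live at the same
time, which is why the one place that touched frag0 after setting last and age
had to be reordered.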

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230601161407.GA9253@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 11:12:20 +02:00
Jakub Kicinski
ddb8701dcb Merge branch 'splice-net-handle-msg_splice_pages-in-af_kcm'
David Howells says:

====================
splice, net: Handle MSG_SPLICE_PAGES in AF_KCM

Here are patches to make AF_KCM handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  Its sendpage
implementation is then turned into a wrapper around that.

Does anyone actually use AF_KCM?  Upstream it has some issues.  It doesn't
seem able to handle a "message" longer than 113920 bytes without jamming
and doesn't handle the client termination once it is jammed.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524144923.3623536-1-dhowells@redhat.com/ # v1
====================

Link: https://lore.kernel.org/r/20230531110423.643196-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:58 -07:00
David Howells
5bb3a5cb3e kcm: Convert kcm_sendpage() to use MSG_SPLICE_PAGES
Convert kcm_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:56 -07:00
David Howells
2b03bcae66 kcm: Support MSG_SPLICE_PAGES
Make AF_KCM sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:56 -07:00
Jakub Kicinski
28cfea989d mlx5-updates-2023-05-31

Merge tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-05-31

net/mlx5: Support 4 ports VF LAG, part 1/2

This series continues the series[1] "Support 4 ports HCAs LAG mode"
by Mark Bloch. This series adds support for 4 ports VF LAG (single FDB
E-Switch).

This series of patches focuses on refactoring different sections of the
code that make assumptions about VF LAG supporting only two ports. For
instance, it assumes that each device can only have one peer.

Patches 1-5:
- Refactor ETH handling of TC rules of eswitches with peers.
Patch 6:
- Refactors peer miss group table.
Patches 7-9:
- Refactor single FDB E-Switch creation.
Patch 10:
- Refactor the DR layer.
Patches 11-14:
- Refactors devcom layer.

Next series will refactor LAG layer and enable 4 ports VF LAG.
This series specifically allows HCAs with 4 ports to create a VF LAG
with only 4 ports. It is not possible to create a VF LAG with 2 or 3
ports using HCAs that have 4 ports.

Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
However, upcoming patches will introduce support for HCAs with 4 ports.

In order to activate VF LAG a user can execute:

devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
devlink dev eswitch set pci/0000:08:00.2 mode switchdev
devlink dev eswitch set pci/0000:08:00.3 mode switchdev
ip link add name bond0 type bond
ip link set dev bond0 type bond mode 802.3ad
ip link set dev eth2 master bond0
ip link set dev eth3 master bond0
ip link set dev eth4 master bond0
ip link set dev eth5 master bond0

Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0,
pci/0000:08:00.1, pci/0000:08:00.2 and pci/0000:08:00.3 respectively.

User can verify LAG state and type via debugfs:
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type

[1]
https://lore.kernel.org/netdev/20220510055743.118828-1-saeedm@nvidia.com/

* tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices
  net/mlx5: Devcom, introduce devcom_for_each_peer_entry
  net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired
  net/mlx5: Devcom, Rename paired to ready
  net/mlx5: DR, handle more than one peer domain
  net/mlx5: E-switch, generalize shared FDB creation
  net/mlx5: E-switch, Handle multiple master egress rules
  net/mlx5: E-switch, refactor FDB miss rule add/remove
  net/mlx5: E-switch, enlarge peer miss group table
  net/mlx5e: Handle offloads flows per peer
  net/mlx5e: en_tc, re-factor query route port
  net/mlx5e: rep, store send to vport rules per peer
  net/mlx5e: tc, Refactor peer add/del flow
  net/mlx5e: en_tc, Extend peer flows to a list
====================

Link: https://lore.kernel.org/r/20230602191301.47004-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:42:22 -07:00
Jakub Kicinski
c422ac94e6 Merge branch 'drm-i915-use-ref_tracker-library-for-tracking-wakerefs'
Andrzej Hajda says:

====================
drm/i915: use ref_tracker library for tracking wakerefs

This is a reviewed series of ref_tracker patches, ready to merge
via network tree, rebased on net-next/main.
i915 patches will be merged later via intel-gfx tree.
====================

Merge on top of an -rc tag in case it's needed in another tree.

Link: https://lore.kernel.org/r/20230224-track_gt-v9-0-5b47a33f55d1@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:45 -07:00
Andrzej Hajda
acd8f0e5d7 lib/ref_tracker: remove warnings in case of allocation failure
The library can handle allocation failures. To avoid allocation warnings,
__GFP_NOWARN has been added everywhere. Moreover, GFP_ATOMIC has been
replaced with GFP_NOWAIT in case of stack allocation on the tracker free
call.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
227c6c8323 lib/ref_tracker: add printing to memory buffer
Similar to stack_(depot|trace)_snprint, the patch
adds a helper for printing stats to a memory buffer.
It will be helpful in case of debugfs.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
b6d7c0eb2d lib/ref_tracker: improve printing stats
In case the library is tracking a busy subsystem, simply
printing the stack for every active reference will spam the log
with long, hard to read, redundant stack traces. To improve
readability, the following changes have been made:
- reports are printed per stack_handle - log is more compact,
- added display name for ref_tracker_dir - it will differentiate
  multiple subsystems,
- stack trace is printed indented, in the same printk call,
- info about dropped references is printed as well.
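
The first item in the list, one report per stack handle instead of one per
reference, can be sketched as follows. This is a simplified user-space
illustration, not the ref_tracker code: struct demo_ref, the function name and
the output format are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tracked reference, identified only by its stack handle. */
struct demo_ref {
	unsigned int stack_handle;
};

/*
 * Write one line per distinct stack handle into buf, with the number of
 * references sharing that handle. Returns the number of distinct handles.
 */
static int demo_report_per_handle(const struct demo_ref *refs, size_t n,
				  char *buf, size_t size)
{
	size_t i, j, used = 0;
	int lines = 0;

	if (size)
		buf[0] = '\0';

	for (i = 0; i < n; i++) {
		unsigned int count = 0;
		int seen = 0;

		/* Skip handles that were already reported. */
		for (j = 0; j < i; j++)
			if (refs[j].stack_handle == refs[i].stack_handle)
				seen = 1;
		if (seen)
			continue;

		for (j = i; j < n; j++)
			if (refs[j].stack_handle == refs[i].stack_handle)
				count++;

		if (used < size) {
			int w = snprintf(buf + used, size - used,
					 "handle %u: %u refs\n",
					 refs[i].stack_handle, count);
			if (w > 0)
				used += (size_t)w;
		}
		lines++;
	}

	return lines;
}
```

With four references of which three share a handle, the log shrinks from four
stack dumps to two summary lines.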

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
7a113ff635 lib/ref_tracker: add unlocked leak print helper
To have reliable detection of leaks, the caller must be able to check both
the tracked counter and the leaks under the same lock. dir.lock is a natural
candidate for such a lock, and the unlocked print helper can be called with
this lock taken.
As a bonus we can reuse this helper in ref_tracker_dir_exit.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
David S. Miller
69da40ac34 Merge branch 'mlxsw-selftests-cleanups'
Petr Machata says:

====================
mlxsw, selftests: Cleanups

This patchset consolidates a number of disparate items that can all be
considered cleanups. They are all related to mlxsw in that they are
directly in mlxsw code, or in selftests that mlxsw heavily uses.

- patch #1 fixes a comment, patch #2 propagates an extack

- patches #3 and #4 tweak several loops to query a resource once and cache
  in a local variable instead of querying on each iteration

- patches #5 and #6 fix selftest diagrams, and #7 adds a missing diagram
  into an existing test

- patch #8 disables a PVID on a bridge in a selftest that should not need
  said PVID
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
f5136877f4 selftests: router_bridge_vlan: Set vlan_default_pvid 0 on the bridge
When everything is configured, VLAN membership on the bridge in this
selftest is as follows:

    # bridge vlan show
    port              vlan-id
    swp2              1 PVID Egress Untagged
                      555
    br1               1 Egress Untagged
                      555 PVID Egress Untagged

Note that it is possible for untagged traffic to just flow through as VLAN
1, instead of using VLAN 555 as intended by the test. This configuration
seems too close to "works by accident", and it would be better to just shut
out VLAN 1 altogether.

To that end, configure vlan_default_pvid of 0:

    # bridge vlan show
    port              vlan-id
    swp2              555
    br1               555 PVID Egress Untagged

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
812de4dfab selftests: router_bridge_vlan: Add a diagram
Add a topology diagram to this selftest to make the configuration easier to
understand.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
34ad708d1b selftests: mlxsw: egress_vid_classification: Fix the diagram
The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
204cc3d04f selftests: mlxsw: ingress_rif_conf_1d: Fix the diagram
The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
75426cc0b3 mlxsw: spectrum_router: Do not query MAX_VRS on each iteration
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.
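
The change amounts to hoisting the query out of the loop condition. A minimal
sketch, using a hypothetical demo_query_max_vrs() with a call counter in place
of MLXSW_CORE_RES_GET and an arbitrary limit of 4:

```c
/* Hypothetical stand-in for a resource query that is costly to call. */
static unsigned int demo_query_calls;

static unsigned int demo_query_max_vrs(void)
{
	demo_query_calls++;
	return 4;
}

/* Before: the loop condition re-queries the limit on every iteration. */
static unsigned int demo_count_used_requery(const int *in_use)
{
	unsigned int i, used = 0;

	for (i = 0; i < demo_query_max_vrs(); i++)
		used += in_use[i] ? 1 : 0;
	return used;
}

/* After: query once up front and keep the result in a local variable. */
static unsigned int demo_count_used_cached(const int *in_use)
{
	unsigned int max_vrs = demo_query_max_vrs();
	unsigned int i, used = 0;

	for (i = 0; i < max_vrs; i++)
		used += in_use[i] ? 1 : 0;
	return used;
}
```

Both variants return the same count; only the number of calls into the query
helper differs.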

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
3903249ee1 mlxsw: spectrum_router: Do not query MAX_RIFS on each iteration
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:48 +01:00
Petr Machata
5afef6748c mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()
In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack
further"), the mlxsw_sp_rif_ops.configure callback got a new argument,
extack. However the callbacks that deal with tunnel configuration,
mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(),
were never updated to pass the parameter further. Do that now.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:48 +01:00
Petr Machata
be35db17c8 mlxsw: spectrum_router: Clarify a comment
"Reserved for X" usually means that only X is supposed to use a given
object. Here, it is used in the sense that X should consider the object
"reserved", as in "restricted".

Replace the comment simply by "X", with the implication that that's where
the field is used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:48 +01:00
David S. Miller
3db0577603 Merge branch 'sja1105-cleanups'
Russell King says:

====================
convert sja1105 xpcs creation and remove xpcs_create

This series of three patches converts sja1105 to use the newly
provided xpcs_create_mdiodev(), and as there become no users of
xpcs_create(), removes this function from the global namespace to
discourage future direct use.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:26:03 +01:00
Russell King (Oracle)
4739b9f3d2 net: pcs: xpcs: remove xpcs_create() from public view
There are now no callers of xpcs_create(), so let's remove it from
public view to discourage future direct usage.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:26:02 +01:00
Russell King (Oracle)
bf9a17b04c net: dsa: sja1105: use xpcs_create_mdiodev()
Use the new xpcs_create_mdiodev() creator, which simplifies the
creation and destruction of the mdio device associated with xpcs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:26:02 +01:00
Russell King (Oracle)
9607eaadba net: dsa: sja1105: allow XPCS to handle mdiodev lifetime
Put the mdiodev after xpcs_create() so that the XPCS driver can manage
the lifetime of the mdiodev it's using.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:26:02 +01:00
David S. Miller
f91e32dea6 Merge branch 'regmap-TSE-PCS'
Maxime Chevallier says:

====================
net: add a regmap-based mdio driver and drop TSE PCS

This is the V4 of a series that follows-up on the work [1] aiming to drop the
altera TSE PCS driver, as it turns out to be a version of the Lynx PCS exposed
as a memory-mapped block, instead of living on an MDIO bus.

One step of this removal involved creating a regmap-based mdio driver
that translates MDIO accesses into the actual underlying bus that
exposes the register. The register layout must of course match the
standard MDIO layout, but we can now account for differences in stride
with recent work on the regmap subsystem [2].

Sorry for repeating this, but I didn't hear anything on this matter in previous
iterations, Mark, Net maintainers, this series depends on the patch
e12ff2876493 that was recently merged into the regmap tree [3].

For this series to be usable in net-next, this patch must be applied
beforehand. Should Mark create a tag that would then be merged into
net-next ? Or should we just wait for the next release to merge this
into net-next ?

This series introduces a new MDIO driver, and uses it to convert Altera
TSE from the actual TSE PCS driver to Lynx PCS.

Since it turns out dwmac_socfpga also uses a TSE PCS block, port that
driver to Lynx as well.

Changes in V4 :
 - Use new pcs_lynx_create/destroy helpers added by Russell
 - Rework the cleanup sequence to avoid leaking data
 - Rework a bit KConfig to properly select dependencies
 - Fix a few hiccups with misplaced hunks in 2 commits

Changes in V3 :
 - Use a dedicated struct for the mii bus's priv data, to avoid
   duplicating the whole struct mdio_regmap_config, from which 2 fields
   only are necessary after init, as suggested by Russell
 - Use ~0 instead of ~0UL for the no-scan bitmask, following Simon's
   review.

Changes in V2 :
 - Use phy_mask to avoid unnecessarily scanning the whole mdio bus
 - Go one step further and completely disable scanning if users
   set the .autoscan flag to false, in case the mdiodevice isn't an
   actual PHY (a PCS for example).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 09:56:36 +01:00
Maxime Chevallier
5d1f3fe7d2 net: stmmac: dwmac-socfpga: use the lynx pcs driver
dwmac_socfpga re-implements support for the TSE PCS, which is identical
to the already existing TSE PCS, which in turn is the same as the Lynx
PCS. Drop the existing TSE re-implementation and use the Lynx PCS
instead, relying on the regmap-mdio driver to translate MDIO accesses
into mmio accesses.

Add a lynx_pcs reference in the stmmac's internal structure, and use
.mac_select_pcs() to return the relevant PCS to be used.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 09:56:36 +01:00
Maxime Chevallier
196eec4062 net: pcs: Drop the TSE PCS driver
Now that we can easily create a mdio-device that represents a
memory-mapped device that exposes an MDIO-like register layout, we don't
need the Altera TSE PCS anymore, since we can use the Lynx PCS instead.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 09:56:36 +01:00
Maxime Chevallier
db48abbaa1 net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx
The newly introduced regmap-based MDIO driver allows for an easy mapping
of an mdiodevice onto the memory-mapped TSE PCS, which is actually a
Lynx PCS.

Convert Altera TSE to use this PCS instead of the pcs-altera-tse, which
is nothing more than a memory-mapped Lynx PCS.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 09:56:36 +01:00
Maxime Chevallier
642af0f92c net: mdio: Introduce a regmap-based mdio driver
There exist several examples today of devices that embed an ethernet
PHY or PCS directly inside an SoC. In this situation, either the device
is controlled through a vendor-specific register set, or sometimes
exposes the standard 802.3 registers that are typically accessed over
MDIO.

As phylib and phylink are designed to use mdiodevices, this driver
allows creating a virtual MDIO bus, that translates mdiodev register
accesses to regmap accesses.

We use regmap because at least 3 such devices are known today. Two of
them are Altera TSE PCSs, memory-mapped, exposed with a 4-byte stride in
stmmac's dwmac-socfpga variant and a 2-byte stride in altera-tse. The
other one (nxp,sja1110-base-tx-mdio) is exposed over SPI.
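
The role of the stride can be sketched in user space as follows. This is not
the mdio-regmap driver: struct fake_regmap and both helpers are hypothetical,
and a plain byte buffer stands in for the memory-mapped block. The only point
shown is that MDIO register number N lands at byte offset N * stride:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backing store exposing 16-bit MDIO-style registers. */
struct fake_regmap {
	uint8_t *base;		/* start of the register block */
	size_t size;		/* size of the block in bytes */
	unsigned int stride;	/* bytes between consecutive registers */
};

/* Translate an MDIO register read into an access at regnum * stride. */
static int fake_mdio_read(const struct fake_regmap *map, unsigned int regnum,
			  uint16_t *val)
{
	size_t off = (size_t)regnum * map->stride;

	if (off + sizeof(*val) > map->size)
		return -1;
	memcpy(val, map->base + off, sizeof(*val));
	return 0;
}

/* Translate an MDIO register write the same way. */
static int fake_mdio_write(const struct fake_regmap *map, unsigned int regnum,
			   uint16_t val)
{
	size_t off = (size_t)regnum * map->stride;

	if (off + sizeof(val) > map->size)
		return -1;
	memcpy(map->base + off, &val, sizeof(val));
	return 0;
}
```

The same two helpers would serve a 2-byte and a 4-byte stride device by
changing only the stride field, which is the kind of difference the commit
message says the regmap layer absorbs.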

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 09:56:36 +01:00
Matthieu Baerts
f69de8aa47 ipv6: lower "link become ready"'s level message
The following message is printed in the console each time a network
device configured with an IPv6 address is ready to be used:

  ADDRCONF(NETDEV_CHANGE): <iface>: link becomes ready

When netns are being extensively used -- e.g. by re-creating netns with
veth pairs that talk to each other for testing purposes, as the mptcp_join.sh
selftest does -- it generates a lot of messages like that: more than 700
when executing mptcp_join.sh with the latest version.

It looks like this message is not that helpful after all: maybe it can
be used as a sign to know if there is something wrong, e.g. if a device
is being regularly reconfigured by accident? But even then, there are
better ways to monitor and diagnose such issues.

When looking at commit 3c21edbd1137 ("[IPV6]: Defer IPv6 device
initialization until the link becomes ready.") which introduces this new
message, it seems it had been added to verify that the new feature was
working as expected. It could have then used a lower level than "info"
from the beginning but it was fine like that back then: 17 years ago.

It seems then OK today to simply lower its level, similar to commit
7c62b8dd5ca8 ("net/ipv6: lower the level of "link is not ready" messages")
and as suggested by Mat [1], Stephen and David [2].

Link: https://lore.kernel.org/mptcp/614e76ac-184e-c553-af72-084f792e60b0@kernel.org/T/ [1]
Link: https://lore.kernel.org/netdev/68035bad-b53e-91cb-0e4a-007f27d62b05@tessares.net/T/ [2]
Suggested-by: Mat Martineau <martineau@kernel.org>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04 15:39:10 +01:00
Russell King (Oracle)
4ec7329517 net: phylib: fix phy_read*_poll_timeout()
Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
reports that:

"It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch is
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into an unsigned type. It would
save Dan a lot of work."

Let's make phy_read*_poll_timeout() immune to further issues when "val"
is an unsigned type by storing the read function's result in a signed
int as well as "val", and using the signed variable both to check for
an error and for propagating that error to the caller.

The advantage of this method is we don't change where the cast from
the signed return code to the user's variable occurs - so users will
see no change.

Previously Heiner changed phy_read_poll_timeout() to check for an error
before evaluating the user supplied condition, but didn't update
phy_read_mmd_poll_timeout(). Make that change there too.
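
The idea of keeping a signed copy of the read result can be sketched outside
the kernel. The helpers below are hypothetical and are not the
phy_read*_poll_timeout() macros; demo_phy_read() simply returns either a
register value or a negative errno:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical read function: a 16-bit register value, or a negative errno. */
static int demo_phy_read(int fail)
{
	return fail ? -EIO : 0x0004;
}

/*
 * Store the result both in a signed int and in the caller's variable, which
 * may be unsigned. The error check and the returned error use the signed copy.
 */
static int demo_read_checked(int fail, uint16_t *val)
{
	int ret = demo_phy_read(fail);

	*val = (uint16_t)ret;
	return ret < 0 ? ret : 0;
}
```

After a failed read the caller's unsigned variable holds a large positive
number, so a test such as "val < 0" on it could never fire; the signed copy is
what keeps the error visible.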

Link: https://lore.kernel.org/r/d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q4kX6-00BNuM-Mx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 23:20:31 -07:00
Jakub Kicinski
7fa217d4be Merge branch 'tools-ynl-gen-dust-off-the-user-space-code'
Jakub Kicinski says:

====================
tools: ynl-gen: dust off the user space code

Every now and then I wish I had finished the user space part of
the netlink specs. Python scripts kind of stole the show, but
C is useful for selftests and stuff which needs to be fast.
Recently someone asked me how to access devlink and ethtool
from C++ which pushed me over the edge.

Fix things which bit rotted and finish notification handling.
This series contains code gen changes only. I'll follow up
with the fixed component, samples and docs as soon as it's
merged.
====================

Link: https://lore.kernel.org/r/20230602023548.463441-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:49 -07:00
Jakub Kicinski
59d814f0f2 tools: ynl-gen: generate static descriptions of notifications
Notifications may come in at any time. The family must always be
ready to parse a random incoming notification. Generate a notification
table for parsing and tell YNL which request we're processing
to distinguish responses from notifications.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
8cb6afb335 tools: ynl-gen: switch to family struct
We'll want to store static info about the family soon.
Generate a struct. This changes creation from, e.g.:

	 ys = ynl_sock_create("netdev", &yerr);
to:
	 ys = ynl_sock_create(&ynl_netdev_family, &yerr);

on user's side.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
5d58f911c7 tools: ynl-gen: generate alloc and free helpers for req
We expect the user to allocate requests with calloc().
Make things a bit more consistent and provide helpers.
Generate free calls, too.
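
For illustration only, helpers of this kind might look like the sketch
below. The struct and the demo_* names are hypothetical and are not the
names emitted by the code generator:

```c
#include <stdlib.h>

/* Hypothetical request with one heap-allocated member. */
struct demo_get_req {
	unsigned int ifindex;
	char *name;
};

/* Zero-initialised allocation, matching what a calloc() call would give. */
struct demo_get_req *demo_get_req_alloc(void)
{
	return calloc(1, sizeof(struct demo_get_req));
}

/* Free the members first, then the request itself. NULL is accepted. */
void demo_get_req_free(struct demo_get_req *req)
{
	if (!req)
		return;
	free(req->name);
	free(req);
}
```

Pairing the two helpers keeps allocation and cleanup symmetric, so the
caller does not need to know which members are heap-allocated.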

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
dc0956c98f tools: ynl-gen: move the response reading logic into YNL
We generate send() and recv() calls and all msg handling for
each operation. It's a lot of repeated code and will only grow
with notification handling. Call back to a helper YNL lib instead.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
21b6e30278 tools: ynl-gen: generate enum-to-string helpers
It's sometimes useful to print the name of an enum value,
flag or name of the op. Python can do it, add C helper
code gen for getting names of things.

Example:

  static const char * const netdev_xdp_act_strmap[] = {
	[0] = "basic",
	[1] = "redirect",
	[2] = "ndo-xmit",
	[3] = "xsk-zerocopy",
	[4] = "hw-offload",
	[5] = "rx-sg",
	[6] = "ndo-xmit-sg",
  };

  const char *netdev_xdp_act_str(enum netdev_xdp_act value)
  {
	value = ffs(value) - 1;
	if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xdp_act_strmap))
		return NULL;
	return netdev_xdp_act_strmap[value];
  }
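
A self-contained variant of the same idea, for experimenting outside the
tree, could look like this. The demo_* enum, its bit-flag values and the
local array-size macro are assumptions, and a small loop stands in for
ffs(value) - 1:

```c
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical flags enum: each value is a single bit. */
enum demo_xdp_act {
	DEMO_XDP_ACT_BASIC = 1 << 0,
	DEMO_XDP_ACT_REDIRECT = 1 << 1,
	DEMO_XDP_ACT_NDO_XMIT = 1 << 2,
};

static const char * const demo_xdp_act_strmap[] = {
	[0] = "basic",
	[1] = "redirect",
	[2] = "ndo-xmit",
};

/* Index of the lowest set bit, or -1 if no bit is set. */
static int demo_lowest_bit(unsigned int v)
{
	int idx = 0;

	if (!v)
		return -1;
	while (!(v & 1u)) {
		v >>= 1;
		idx++;
	}
	return idx;
}

/* Map a flag to its name; NULL for zero or out-of-range values. */
const char *demo_xdp_act_str(enum demo_xdp_act value)
{
	int idx = demo_lowest_bit((unsigned int)value);

	if (idx < 0 || idx >= (int)DEMO_ARRAY_SIZE(demo_xdp_act_strmap))
		return NULL;
	return demo_xdp_act_strmap[idx];
}
```

Callers should check for NULL before printing. In this sketch a value
with several bits set resolves to the name of its lowest set bit.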

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
eef9b794ea tools: ynl-gen: add error checking for nested structs
Parsing nested types may return an error, propagate it.
Not marking as a fix, because nothing uses YNL upstream.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
5605f10237 tools: ynl-gen: loosen type consistency check for events
Both event and notify types are always consistent. Rewrite
the condition checking if we can reuse reply types to be
less picky and let notify through.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:47 -07:00
Jakub Kicinski
67c65ce762 tools: ynl-gen: don't override pure nested struct
For pure structs (parsed nested attributes) we track what
forms of the struct exist in request and reply directions.
Make sure we don't overwrite the recorded struct each time,
otherwise the information is lost.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:46 -07:00
Jakub Kicinski
6ad49839ba tools: ynl-gen: fix unused / pad attribute handling
Unused and Pad attributes don't carry information.
Unused should never exist, and should be rejected.
Pad should be silently skipped.
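
A minimal sketch of that policy, using a made-up flat attribute array
rather than real netlink parsing (all demo_* names are hypothetical, and
ignoring unknown attribute types is an assumption of this sketch):

```c
#include <stddef.h>

enum demo_attr_type {
	DEMO_A_UNUSED,
	DEMO_A_PAD,
	DEMO_A_IFINDEX,
};

struct demo_attr {
	int type;
	unsigned int value;
};

/* Returns 0 on success, -1 if an unused attribute is present. */
int demo_parse(const struct demo_attr *attrs, size_t n, unsigned int *ifindex)
{
	size_t i;

	for (i = 0; i < n; i++) {
		switch (attrs[i].type) {
		case DEMO_A_UNUSED:
			return -1;	/* must never appear: reject */
		case DEMO_A_PAD:
			continue;	/* carries no information: skip silently */
		case DEMO_A_IFINDEX:
			*ifindex = attrs[i].value;
			break;
		default:
			break;		/* unknown types ignored in this sketch */
		}
	}
	return 0;
}
```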

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:46 -07:00
Jakub Kicinski
91dfaef243 tools: ynl-gen: add extra headers for user space
Make sure all relevant headers are included. We allocate memory,
use memcpy() and Linux types without including the headers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-02 22:10:46 -07:00
Shay Drory
e2a82bf8a4 net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices
mlx5_devcom_send_event is used to send an event from one eswitch to the
other. In other words, only one event is sent, which means no error
mechanism is needed.
However, in case devcom has more than two eswitches, a proper error
mechanism is needed. Hence, in case of error, devcom will perform the
error unwind, since devcom knows how many events were successful.
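
The unwind idea can be sketched roughly as follows. This is not the
driver code: the demo_* types, the event numbers and the separate
rollback event parameter are assumptions made for illustration:

```c
#include <stddef.h>

enum { DEMO_EVENT_PAIR = 1, DEMO_EVENT_UNPAIR = 2 };

struct demo_peer {
	int fail;	/* make the handler fail for this peer */
	int paired;	/* state changed by the events */
};

/* Hypothetical per-peer handler: 0 on success, -1 on failure. */
static int demo_handle_event(struct demo_peer *peer, int event)
{
	if (event == DEMO_EVENT_PAIR) {
		if (peer->fail)
			return -1;
		peer->paired = 1;
	} else if (event == DEMO_EVENT_UNPAIR) {
		peer->paired = 0;
	}
	return 0;
}

/* Send "event" to every peer; on failure, undo the ones that succeeded. */
int demo_send_event(struct demo_peer *peers, size_t n,
		    int event, int rollback_event)
{
	size_t i;
	int err = 0;

	for (i = 0; i < n; i++) {
		err = demo_handle_event(&peers[i], event);
		if (err)
			goto rollback;
	}
	return 0;

rollback:
	/* Only peers [0, i) handled the event; undo them in reverse order. */
	while (i--)
		demo_handle_event(&peers[i], rollback_event);
	return err;
}
```

Because the sender tracks the index of the failing peer, the callers do
not have to work out which peers need to be rolled back.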

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02 12:10:49 -07:00