IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Drop specific PHY ID check for cable test functions for at803x. This is
done to make functions more generic. While at it better describe what
the functions does by using more symbolic function names.
PHYs that requires to set additional reg are moved to specific function
calling the more generic one.
cdt_start and cdt_wait_for_completion are changed to take an additional
arg to pass specific values specific to the PHY.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move at8035 specific DT parse for clock out frequency to dedicated probe
to make at803x probe function more generic.
This is to tidy code and no behaviour change are intended.
Detection logic is changed, we check if the clk 25m mask is set and if
it's not zero, we assume the qca,clk-out-frequency property is set.
The property is checked in the generic at803x_parse_dt called by
at803x_probe.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move at8031 functions in dedicated section with dedicated at8031
parse_dt and probe.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename at8031 related DT function name to a more specific name
referencing they are only related to at8031 and not to the generic
at803x PHY family.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move specific at8031 config_intr bits to dedicated function to make
at803x_config_initr more generic.
This is needed in preparation for PHY driver split as qca8081 share the
same function to setup interrupts.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move specific at8031 WOL enable/disable to dedicated function to make
at803x_set_wol more generic.
This is needed in preparation for PHY driver split as qca8081 share the
same function to toggle WOL settings.
In this new implementation WOL module in at8031 is enabled after the
generic interrupt is setup. This should not cause any problem as the
WOL_INT has a separate implementation and only relay on MAC bits.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move specific at8031 config_init to dedicated function to make
at803x_config_init more generic and tidy things up.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move specific at8031 probe mode check to dedicated probe to make
at803x_probe more generic and keep code tidy.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move specific DT options for at8031 to specific probe to tidy things up
and make at803x_parse_dt more generic.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework qca83xx specific check to dedicated function to tidy things up
and drop useless phy_id check.
Also drop an useless link_change_notify for QCA8337 as it did nothing an
returned early.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function and the struct related to hw_stats were specific to qca83xx
PHY but were called following the convention in the driver of calling
everything with at803x prefix.
To better organize the code, rename these function a more specific name
to better describe that they are specific to 83xx PHY family.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the WOL disable call to specific at8031 probe to make at803x_probe
more generic and drop extra check for PHY ID.
Keep the same previous behaviour by first calling at803x_probe and then
disabling WOL.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix passing the wrong reference for config_initr on passing the function
pointer, drop the wrong & from at803x_config_intr in the PHY struct.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mode supported is currently reported to the user exactly the same, as
the current mode. That's because mode changing is not implemented.
Remove the leftover mode_supported() op and use mode_get() to fill up
the supported mode exposed to user.
One, if even, mode changing is going to be introduced, this could be
very easily taken back. In the meantime, prevent drivers form
implementing this in wrong way (as for example recent netdevsim
implementation attempt intended to do).
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Update for net-next
The first 4 patches in the series fix issues in the net-next tree
introduced in the last 4 weeks. The first 3 patches fix ring accounting
and indexing logic. The 4th patch fix TX timeout when the TX ring is
very small.
The next 7 patches add new features on the P7 chips, including TX
coalesced completions, VXLAN GPE and UDP GSO stateless offload, a
new rx_filter_miss counters, and more QP backing store memory for
RoCE.
The last 2 patches are PTP improvements.
====================
Link: https://lore.kernel.org/r/20231212005122.2401-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a busy network, especially with flow control enabled, we may
experience timestamp query failures fairly regularly. After a while,
dmesg may be flooded with timestamp query failure error messages.
Silence the error message from the low level hwrm function that
sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
if this FW call ever fails.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-14-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We don't have to close and open the nic to make sure we have
valid rx timestamps. Once we have the timestamp filter applied to
the HW and the timestamp_fld_format bit is cleared in the rx
completion and the timestamp is non-zero, we can be sure that rx
timestamp is valid data.
Skip close/open when we set any timestamp filter.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-13-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new 5760X chips supports UDP GSO. Tested using udpgso_bench_tx.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rx_filter_miss counter is newly added to the rx_port_stats_ext
stats structure for newer chips. Newer firmware will return the
structure size that includes this counter. Add this entry to
the bnxt_port_stats_ext_arr array and the ethtool -S code will
pick up this counter if it is supported.
Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On the new P7 chips, TPA for tunnel packets can be independently
enabled for each VNIC. The default TPA configuration should not
include UDP tunnels because the UDP ports for these tunnels are not
known yet. The chip should not aggregate these UDP tunneled packets
using default UDP ports until the ports are known.
Add a new function bnxt_hwrm_vnic_update_tunl_tpa() to enable VXLAN
and Geneve TPA if the corresponding UDP ports are known.
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new bnxt_udp_tunnels_p7 struct to support the new P7 chips that
can parse VXLAN GPE packets. Add VXLAN GPE tunnel type handling to
the .set_port() and .unset_port() functions. .ndo_features_check()
is also enhanced to support VXLAN GPE which may encapsulate inner
IP packets instead of ethernet packets.
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnxt_udp_tunnel_set_port(), use the proper ALLOC commands instead
of the FREE commands for correctness. The ALLOC and FREE commands
happen to be identical so this is just a cosmetic fix for correctness.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Fast QP modify destroy RoCE feature requires additional QP entries
in QP context backing store. FW reports the extra count to be
allocated during backing store query. Use this value and allocate extra
memory. Note that this works for both the V1 and V1 backing store
FW APIs.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TX coalesced completions are supported on newer chips to provide
one TX completion record for multiple TX packets up to the
sq_cons_idx in the completion record. This method saves PCIe
bandwidth by reducing the number of TX completions.
Only very minor changes are now required to support this mode
with the new framework that handles TX completions based on
the consumer indices.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If xmit_more condition is true, the driver may set the
TX_BD_FLAGS_NO_CMPL flag. If after this packet, the TX ring can no
longer hold a packet with maximum fragments, we will stop the TX
queue. When this happens, we must clear the TX_BD_FLAGS_NO_CMPL flag
on the last packet or there will be no completion and cause TX
timeout.
Fixes: c1056a59ae ("bnxt_en: Optimize xmit_more TX path")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two spots were missed when modifying the TX ring indexing logic.
The use of unmasked TX index in bnxt_tx_int() will cause unnecessary
__bnxt_tx_int() calls. The same issue in bnxt_tx_int_xdp() can
result in illegal array index.
Fixes: 6d1add9553 ("bnxt_en: Modify TX ring indexing logic.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
_bnxt_get_max_rings() that is invoked in bnxt_check_rings() already
accounts for the AGG ring(s) and gives a max value based on that.
Increasing for AGG rings before calling _bnxt_get_max_rings() will
result in checking for twice the number of rings than required and
it can fail. Fix it by adjusting for AGG rings after calling
_bnxt_get_max_rings().
Fixes: f5b29c6afe ("bnxt_en: Add helper to get the number of CP rings required for TX rings")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The recent commit to trim the RX and TX rings on P5 chips by assigning
each with max CP rings divided by 2 is not correct. Max CP rings
divided by 2 may be bigger than the original RX or TX and would
lead to failure. In other words, we may be checking for increased
RX/TX rings than required and it may fail.
Fix it by calling __bnxt_trim_rings() instead that would properly
trim RX and TX without the possibility of increasing their values.
Fixes: f5b29c6afe ("bnxt_en: Add helper to get the number of CP rings required for TX rings")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231212005122.2401-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmV42XAACgkQQRCzN7AZ
XXNFUw/9FsuuGYEhZ10wTvyosu7rEQVngfXORURLy4KZuWmoq5/fhYv7/x4Qiyv1
r5xaQ5yGBKpZyGq3V1XGdPrHMnBXQFnR/wPbepyXchPWH6CU6dhV04D1gjDplEdL
8wBYG9ZFuvMN7NaEczi2+IizTqqLo0oxcYH4rEN2Mf14L9jVkZvRsU8jo5Nsn4OR
6oPWMww2e0jCo50vcA3QdpBBc0AyEKfytac3ka0VxzH2w6f5chmkMDiVRkdVhG9l
4bL0K6RYnImWMmj2nomiey+XjCQAQYJ6LzXj5kZU+5G+PImHY0Q/E5TGoF1tEqm+
GCyuwr73f0kJJMsUuCIeIGlL7XT4UMac+Pbii7qgNNGBD/AeRm5S+V4ZHLT1OOOe
IY/4+hTa/k5mXd+YaURy7QB9dvu+vf2zqWW8tf4bnEQW687nvk0hS8ek8Ddo+CTU
Q0KiNcDhWd7csZPCVN+VEy259i3/hwUZOzwscCBrj8RORoc/YjTQbGkyfFvmuanO
1fLiJrh2G0LFcOkImmHO+Q/sVuyHKp8afFwxiYdaZQedJ+6WT0lYnV9PFuxCkrd3
cZz+KlMYSOzIOogfVCqqpZysf9zuVnuVVv7Qan5udDnMpwP6wOaylbDwiCFhr7U3
iZBgd7bZKZh+AJtIBStw30Al8vNTDIustSXphts95sBGieiBJio=
=MQyi
-----END PGP SIGNATURE-----
Merge tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Linus Walleij says:
====================
Immutable tag for the PEF2256 framer
* tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
MAINTAINERS: Add the Lantiq PEF2256 driver entry
pinctrl: Add support for the Lantic PEF2256 pinmux
net: wan: framer: Add support for the Lantiq PEF2256 framer
dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framer
net: wan: Add framer framework support
====================
Link: https://lore.kernel.org/all/CACRpkdYT1J7noFUhObFgfA60XQAfL4rb=knEmWS__TKKtCMh7Q@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After contributing the driver, add myself as the maintainer for the
Lantiq PEF2256 driver.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20231128132534.258459-6-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
This kind of component can be found in old telecommunication system.
It was used to digital transmission of many simultaneous telephone calls
by time-division multiplexing. Also using HDLC protocol, WAN networks
can be reached through the framer.
This pinmux support handles the pin muxing part (pins RP(A..D) and pins
XP(A..D)) of the PEF2256.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231128132534.258459-5-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231128132534.258459-4-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20231128132534.258459-3-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
A framer is a component in charge of an E1/T1 line interface.
Connected usually to a TDM bus, it converts TDM frames to/from E1/T1
frames. It also provides information related to the E1/T1 line.
The framer framework provides a set of APIs for the framer drivers
(framer provider) to create/destroy a framer and APIs for the framer
users (framer consumer) to obtain a reference to the framer, and
use the framer.
This basic implementation provides a framer abstraction for:
- power on/off the framer
- get the framer status (line state)
- be notified on framer status changes
- get/set the framer configuration
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231128132534.258459-2-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When compiling with gcc version 14.0.0 20231129 (experimental) and
CONFIG_FORTIFY_SOURCE=y, I've noticed the following warning:
...
In function 'fortify_memcpy_chk',
inlined from 'ax88796c_tx_fixup' at drivers/net/ethernet/asix/ax88796c_main.c:287:2:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
This call to 'memcpy()' is interpreted as an attempt to copy TX_OVERHEAD
(which is 8) bytes from 4-byte 'sop' field of 'struct tx_pkt_info' and
thus overread warning is issued. Since we actually want to copy both
'sop' and 'seg' fields at once, use the convenient 'struct_group()' here.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20231211090535.9730-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Walleij says:
====================
net: dsa: realtek: Two RTL8366RB fixes
These minor fixes were found while digging into other
issues: a weirdly named variable and bogus MTU handling.
Fix it up.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
====================
Link: https://lore.kernel.org/r/20231209-rtl8366rb-mtu-fix-v1-0-df863e2b2b2a@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The MTU callbacks are in layer 1 size, so for example 1500
bytes is a normal setting. Cache this size, and only add
the layer 2 framing right before choosing the setting. On
the CPU port this will however include the DSA tag since
this is transmitted from the parent ethernet interface!
Add the layer 2 overhead such as ethernet and VLAN framing
and FCS before selecting the size in the register.
This will make the code easier to understand.
The rtl8366rb_max_mtu() callback returns a bogus MTU
just subtracting the CPU tag, which is the only thing
we should NOT subtract. Return the correct layer 1
max MTU after removing headers and checksum.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The driver is using iowriteXX()/ioreadXX() APIs which are LE IO
accessors simplified as
1. Convert given value _from_ CPU _to_ LE
2. Write it to the device as is
The dev_addr is a byte stream, but because the driver uses 16-bit
IO accessors, it wants to perform double conversion on BE CPUs,
but it took it wrong, as it effectivelly does two times _from_ CPU
_to_ LE. What it has to do is to consider dev_addr as an array of
LE16 and hence do _from_ LE _to_ CPU conversion, followed by implied
_from_ CPU _to_ LE in the iowrite16().
To achieve that, use get_unaligned_le16(). This will make it correct
and allows to avoid sparse warning as reported by LKP.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312030058.hfZPTXd7-lkp@intel.com/
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20231208153327.3306798-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela says:
====================
net/sched: conditional notification of events for cls and act
This is an optimization we have been leveraging on P4TC but we believe
it will benefit rtnl users in general.
It's common to allocate an skb, build a notification message and then
broadcast an event. In the absence of any user space listeners, these
resources (cpu and memory operations) are wasted. In cases where the subsystem
is lockless (such as in tc-flower) this waste is more prominent. For the
scenarios where the rtnl_lock is held it is not as prominent.
The idea is simple. Build and send the notification iif:
- The user requests via NLM_F_ECHO or
- Someone is listening to the rtnl group (tc mon)
On a simple test with tc-flower adding 1M entries, using just a single core,
there's already a noticeable difference in the cycles spent in tc_new_tfilter
with this patchset.
before:
- 43.68% tc_new_tfilter
+ 31.73% fl_change
+ 6.35% tfilter_notify
+ 1.62% nlmsg_notify
0.66% __tcf_qdisc_find.part.0
0.64% __tcf_chain_get
0.54% fl_get
+ 0.53% tcf_proto_lookup_ops
after:
- 39.20% tc_new_tfilter
+ 34.58% fl_change
0.69% __tcf_qdisc_find.part.0
0.67% __tcf_chain_get
+ 0.61% tcf_proto_lookup_ops
Note, the above test is using iproute2:tc which execs a shell.
We expect people using netlink directly to observe even greater
reductions.
The qdisc side needs some refactoring of the notification routines to fit in
this new model, so they will be sent in a later patchset.
====================
Link: https://lore.kernel.org/r/20231208192847.714940-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today tc-filter/chain events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed. This will help to alleviate
system pressure when filters are concurrently added without the rtnl
lock as in tc-flower.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-8-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This argument is never called while set to true, so remove it as there's
no need for it.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-7-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today tc-action events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use max() in a couple of places that are open coding it with the
ternary operator.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a convenience helper for routines handling conditional rtnl
events, that is code that might send a notification depending on
rtnl_has_listeners/rtnl_notify_needed.
Instead of:
if (skb)
rtnetlink_send(...)
Use:
rtnetlink_maybe_send(...)
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Building on the rtnl_has_listeners helper, add the rtnl_notify_needed
helper to check if we can bail out early in the notification routines.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of today, rtnl code creates a new skb and unconditionally fills and
broadcasts it to the relevant group. For most operations this is okay
and doesn't waste resources in general.
When operations are done without the rtnl_lock, as in tc-flower, such
skb allocation, message fill and no-op broadcasting can happen in all
cores of the system, which contributes to system pressure and wastes
precious cpu cycles when no one will receive the built message.
Introduce this helper so rtnetlink operations can simply check if someone
is listening and then proceed if necessary.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231208192847.714940-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>