1268642 Commits

Author SHA1 Message Date
Daniel Jurgens
c39add9b24 virtio_net: Add TX stopped and wake counters
Add TX queue stop and wake counters; they are useful for debugging.

$ ./tools/net/ynl/cli.py --spec netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue"}'
...
 {'ifindex': 13,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 14756682850,
  'tx-packets': 226465,
  'tx-stop': 113208,
  'tx-wake': 113208},
 {'ifindex': 13,
  'queue-id': 1,
  'queue-type': 'tx',
  'tx-bytes': 18167675008,
  'tx-packets': 278660,
  'tx-stop': 8632,
  'tx-wake': 8632}]
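
One way to use these counters when debugging is sketched below: given per-queue entries shaped like the dump above, it lists the TX queues whose stop count exceeds their wake count, which would suggest the queue is currently stopped. The helper name `stopped_queues` is hypothetical; the sample data is copied from the dump above.

```python
# Minimal sketch: find TX queues whose stop count exceeds their wake count.
# `stopped_queues` is a hypothetical helper, not part of ynl or the kernel.

def stopped_queues(qstats):
    """Return (ifindex, queue-id) pairs where tx-stop > tx-wake."""
    stuck = []
    for q in qstats:
        if q.get("queue-type") != "tx":
            continue
        # Counters a driver does not report are treated as zero here.
        if q.get("tx-stop", 0) > q.get("tx-wake", 0):
            stuck.append((q["ifindex"], q["queue-id"]))
    return stuck

# Sample entries taken from the qstats-get dump above.
sample = [
    {"ifindex": 13, "queue-id": 0, "queue-type": "tx",
     "tx-bytes": 14756682850, "tx-packets": 226465,
     "tx-stop": 113208, "tx-wake": 113208},
    {"ifindex": 13, "queue-id": 1, "queue-type": "tx",
     "tx-bytes": 18167675008, "tx-packets": 278660,
     "tx-stop": 8632, "tx-wake": 8632},
]

print(stopped_queues(sample))  # prints [] because every stop was matched by a wake
```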

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-3-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:58:36 -07:00
Daniel Jurgens
b56035101e netdev: Add queue stats for TX stop and wake
TX queue stop and wake are counted by some drivers.
Support reporting these via netdev-genl queue stats.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:58:36 -07:00
Matthieu Baerts (NGI0)
c084ebd77a tcp: socket option to check for MPTCP fallback to TCP
A way for an application to know if an MPTCP connection fell back to TCP
is to use getsockopt(MPTCP_INFO) and look for errors. The issue with
this technique is that the same errors -- EOPNOTSUPP (IPv4) and
ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the
kernel doesn't support this socket option. The userspace then has to
look at the kernel version to understand what the errors mean.

It is not clean, and it doesn't take into account older kernels where
the socket option has been backported. A cleaner way would be to expose
this info at the TCP socket level. For an MPTCP socket where no
fallback happened, the socket options for the TCP level will be handled
in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in
TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value
1, while TCP has to set 0.

If the socket option is not supported, one of these two errors will be
reported:
- EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets
- ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the
  socket received after an 'accept()', when the client didn't request to
  use MPTCP: this socket will be a TCP one, even if the listen socket
  was an MPTCP one.

With this new option, the kernel can return a clear answer to both "Is
this kernel new enough to tell me the fallback status?" and "If it is
new enough, is it currently a TCP or MPTCP socket?" questions, while not
breaking the previous method.
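
The decision logic described above could be sketched from userspace as follows. This is an illustration under stated assumptions, not the patch itself: the option name `TCP_IS_MPTCP` is inferred from the series link, the numeric fallback 43 is an assumption to verify against linux/tcp.h, and the helper names `interpret` and `mptcp_status` are hypothetical.

```python
import errno
import socket

# Assumption: the option is exposed to userspace as TCP_IS_MPTCP. The numeric
# fallback (43) is also an assumption; check linux/tcp.h on the target system.
TCP_IS_MPTCP = getattr(socket, "TCP_IS_MPTCP", 43)

def interpret(value=None, err=None):
    """Map a getsockopt() value or errno to a readable status."""
    if err is None:
        # MPTCP code sets 1, plain TCP code sets 0.
        return "mptcp" if value == 1 else "tcp (fallback or plain TCP)"
    if err in (errno.EOPNOTSUPP, errno.ENOPROTOOPT):
        # Older kernel: the option is unknown at this socket level.
        return "unknown (kernel lacks the option)"
    raise OSError(err, "unexpected getsockopt() failure")

def mptcp_status(sock):
    """Query a connected socket and classify the answer."""
    try:
        value = sock.getsockopt(socket.IPPROTO_TCP, TCP_IS_MPTCP)
    except OSError as exc:
        return interpret(err=exc.errno)
    return interpret(value=value)
```

Keeping `interpret` separate from the socket call makes the three outcomes easy to check without an MPTCP-capable kernel.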

Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:48:04 -07:00
Jakub Kicinski
e6e43570fd Merge branch 'net-gro-remove-network_header-use-move-p-flush-flush_id-calculations-to-l4'
Richard Gobert says:

====================
net: gro: remove network_header use, move p->{flush/flush_id} calculations to L4

The cb fields network_offset and inner_network_offset are used instead of
skb->network_header throughout GRO.

These fields are then leveraged in the next commit to remove flush_id state
from napi_gro_cb, and stateful code in {ipv6,inet}_gro_receive which may be
unnecessarily complicated due to encapsulation support in GRO. These fields
are checked in L4 instead.

The third patch adds tests for different flush_id flows in GRO.
====================

Link: https://lore.kernel.org/r/20240509190819.2985-1-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:44:13 -07:00
Richard Gobert
bc21faefbe selftests/net: add flush id selftests
Add flush id selftests covering the cases where the DF flag is set or
unset and the id value changes in the following packets. All cases where
the packets should or should not coalesce are tested.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-4-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:44:06 -07:00
Richard Gobert
4b0ebbca3e net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
iph->id, ...) against all packets in a loop. These flush checks are used in
all merging UDP and TCP flows.

These checks need to be done only once and only against the found p skb,
since they only affect flush and not same_flow.

This patch leverages correct network header offsets from the cb for both
outer and inner network headers - allowing these checks to be done only
once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
more declarative and contained in inet_gro_flush, thus removing the need
for flush_id in napi_gro_cb.

This results in less parsing code for non-loop flush tests for TCP and UDP
flows.

To make sure results are not within noise range - I've made netfilter drop
all TCP packets, and measured CPU performance in GRO (in this case GRO is
responsible for about 50% of the CPU utilization).

perf top while replaying 64 parallel IP/TCP streams merging in GRO:
(gro_receive_network_flush is compiled inline to tcp_gro_receive)
net-next:
        6.94% [kernel] [k] inet_gro_receive
        3.02% [kernel] [k] tcp_gro_receive

patch applied:
        4.27% [kernel] [k] tcp_gro_receive
        4.22% [kernel] [k] inet_gro_receive

perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
results for any encapsulation, in this case inet_gro_receive is top
offender in net-next)
net-next:
        10.09% [kernel] [k] inet_gro_receive
        2.08% [kernel] [k] tcp_gro_receive

patch applied:
        6.97% [kernel] [k] inet_gro_receive
        3.68% [kernel] [k] tcp_gro_receive
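
Because the patch moves work from inet_gro_receive into tcp_gro_receive, the two functions are easier to compare as a combined share. The sketch below only adds up the perf top percentages quoted above; the helper name `combined` is hypothetical.

```python
# Combined share of the two GRO receive functions, using the perf top
# percentages quoted above: (inet_gro_receive, tcp_gro_receive).
results = {
    "IP/TCP":    {"net-next": (6.94, 3.02),  "patched": (4.22, 4.27)},
    "IP/IP/TCP": {"net-next": (10.09, 2.08), "patched": (6.97, 3.68)},
}

def combined(pair):
    """Sum both shares, rounded to two decimal places."""
    return round(sum(pair), 2)

for flow, r in results.items():
    before = combined(r["net-next"])
    after = combined(r["patched"])
    print(f"{flow}: {before}% -> {after}% ({round(before - after, 2)} points lower)")
```

For the quoted numbers this gives 9.96% versus 8.49% for IP/TCP and 12.17% versus 10.65% for IP/IP/TCP.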

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-3-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:44:06 -07:00
Richard Gobert
186b1ea73a net: gro: use cb instead of skb->network_header
This patch converts references of skb->network_header to napi_gro_cb's
network_offset and inner_network_offset.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-2-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:44:06 -07:00
Jakub Kicinski
9af9b891fc Merge branch 'ena-driver-changes-may-2024'
David Arinzon says:

====================
ENA driver changes May 2024

This patchset contains several misc and minor
changes to the ENA driver.
====================

Link: https://lore.kernel.org/r/20240512134637.25299-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:07 -07:00
David Arinzon
1cc0a47daa net: ena: Change initial rx_usec interval
To obtain better CPU utilization, the minimum rx moderation
interval is set to 20 usec.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-6-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
97776caf6c net: ena: Changes around strscpy calls
strscpy copies as much of the string as possible,
meaning that the destination string will be truncated
if there is not enough space. As this is a non-critical error
in our case, add a debug-level print to indicate it.

This patch also removes a -1 which was added to ensure
enough space for NUL, but the strscpy destination string is
guaranteed to be NUL-terminated, therefore the -1 is
not needed.
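
The reasoning about the -1 can be illustrated with a small model of the strscpy() contract. This is a Python model of the documented semantics, not the kernel implementation; `strscpy_model` is a hypothetical name, and the return convention (copied length, or -E2BIG on truncation) follows the kernel documentation.

```python
E2BIG = 7  # errno value the kernel returns (negated) on truncation

def strscpy_model(src: bytes, size: int):
    """Model strscpy(): return (bytes written to dest, return value).

    `src` is the source string without its terminator and `size` is the
    full size of the destination buffer.
    """
    if size == 0:
        return b"", -E2BIG
    copied = src[: size - 1]      # at most size-1 payload bytes
    dest = copied + b"\0"         # the terminator always fits within `size`
    ret = len(copied) if len(src) < size else -E2BIG
    return dest, ret

# Passing the full buffer size is enough: the result is NUL-terminated
# even when the source is too long, so no "size - 1" is needed.
print(strscpy_model(b"ena", 8))
print(strscpy_model(b"ena_adapter", 8))
```

A negative return value is the condition under which a caller could emit the debug-level print mentioned above.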

Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
b37b98a3a0 net: ena: Add validation for completion descriptors consistency
Validate that the `first` flag is set only for the first
descriptor in multi-buffer packets.
In case of an invalid descriptor, a reset will occur.
A new reset reason for RX data corruption has been added.
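
The consistency rule can be sketched as a simple predicate. This is an illustration only: `first_flags_consistent` and its list-of-booleans input are hypothetical and do not reflect the driver's actual data structures.

```python
def first_flags_consistent(first_flags):
    """Return True if only the first descriptor of a packet has `first` set.

    `first_flags` is a hypothetical list of booleans, one per completion
    descriptor of a single multi-buffer packet, in arrival order.
    """
    if not first_flags:
        return False  # a packet needs at least one descriptor
    return bool(first_flags[0]) and not any(first_flags[1:])

# A caller would treat a False result as RX data corruption and
# trigger a device reset.
print(first_flags_consistent([True, False, False]))
print(first_flags_consistent([True, False, True]))
```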

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
48673ef444 net: ena: Reduce holes in ena_com structures
This patch makes two changes in order to fill holes and
reduce the overall size of the structures ena_com_dev
and ena_com_rx_ctx.

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
62a261f6c1 net: ena: Add a counter for driver's reset failures
This patch adds a counter to the ena_adapter struct in
order to keep track of reset failures.
The counter is incremented every time either ena_restore_device()
or ena_destroy_device() fails.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
Florian Westphal
5fcc17dfe0 selftests: netfilter: nft_flowtable.sh: bump socat timeout to 1m
Now that this test runs in netdev CI it looks like 10s isn't enough
for debug kernels:
  selftests: net/netfilter: nft_flowtable.sh
  2024/05/10 20:33:08 socat[12204] E write(7, 0x563feb16a000, 8192): Broken pipe
  FAIL: file mismatch for ns1 -> ns2
  -rw------- 1 root root 37345280 May 10 20:32 /tmp/tmp.Am0yEHhNqI
 ...

Looks like socat gets zapped too quickly, so increase timeout to 1m.

Could also reduce tx file size for KSFT_MACHINE_SLOW, but it's preferable
to have the same test for both debug and nondebug.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240511064814.561525-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:36:26 -07:00
Vladimir Oltean
cfc2eefd40 selftests: net: use upstream mtools
Joachim kindly merged the IPv6 support in
https://github.com/troglobit/mtools/pull/2, so we can just use his
version now. A few more fixes subsequently came in for IPv6, so even
better.

Check that the deployed mtools version is 3.0 or above. Note that the
version check breaks compatibility with my fork where I didn't bump the
version, but I assume that won't be a problem.
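
The kind of check described above amounts to a dotted-version comparison. The sketch below shows the idea in Python; the selftest itself is a shell script, and both the helper name `version_at_least` and the assumption that the version arrives as a plain dotted string are hypothetical.

```python
def version_at_least(version: str, minimum=(3, 0)) -> bool:
    """Compare a dotted version string such as "3.0" against a minimum tuple.

    Assumes purely numeric components; anything else raises ValueError.
    """
    parts = tuple(int(p) for p in version.strip().split("."))
    # Pad short versions ("3" -> (3, 0)) so tuple comparison behaves as expected.
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum

print(version_at_least("3.0"))
print(version_at_least("2.3"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "10.0" would sort before "3.0".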

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240510112856.1262901-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 13:54:33 -07:00
Colin Ian King
f37dc28ac6 selftest: epoll_busy_poll: Fix spelling mistake "couldnt" -> "couldn't"
There is a spelling mistake in a TH_LOG message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510084811.3299685-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 13:53:53 -07:00
Daniel Golle
87bfdbbb19 net: phy: air_en8811h: reset netdev rules when LED is set manually
Setting LED_OFF via brightness_set should deactivate hw control, so make
sure netdev trigger rules also get cleared in that case.
This fixes unwanted restoration of the default netdev trigger rules and
matches the behaviour when using the 'netdev' trigger without any
hardware offloading.

Fixes: 71e79430117d ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/5ed8ea615890a91fa4df59a7ae8311bbdf63cdcf.1715248281.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 13:51:17 -07:00
Jakub Kicinski
c85e41bfe7 netfilter pull request 24-05-12
Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 skips transaction if object type provides no .update interface.

Patch #2 skips NETDEV_CHANGENAME which is unused.

Patch #3 enables conntrack to handle Multicast Router Advertisements and
	 Multicast Router Solicitations from the Multicast Router Discovery
	 protocol (RFC4286) as untracked, as opposed to invalid, packets.
	 From Linus Luessing.

Patch #4 updates DCCP conntracker to mark invalid packets as invalid,
	 instead of dropping them, from Jason Xing.

Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
	 also from Jason.

Patch #6 removes reference in netfilter's sysctl documentation on pickup
	 entries which were already removed by Florian Westphal.

Patch #7 removes check for IPS_OFFLOAD flag to disable early drop, which
	 allows evicting entries from the conntrack table,
	 also from Florian.

Patches #8 to #16 update nf_tables pipapo set backend to allocate
	 the data structure copy on-demand from preparation phase,
	 to better deal with OOM situations where .commit step is too late
	 to fail. Series from Florian Westphal.

Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
	 transitions, also from Florian.

Patch #18 uses GFP_KERNEL to clone elements from control plane to avoid
	 quick atomic reserves exhaustion with large sets; the reporter
	 refers to sets on the order of a million entries.

* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: allow clone callbacks to sleep
  selftests: netfilter: add packetdrill based conntrack tests
  netfilter: nft_set_pipapo: remove dirty flag
  netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  netfilter: nft_set_pipapo: merge deactivate helper into caller
  netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  netfilter: nft_set_pipapo: move prove_locking helper around
  netfilter: conntrack: remove flowtable early-drop test
  netfilter: conntrack: documentation: remove reference to non-existent sysctl
  netfilter: use NF_DROP instead of -NF_DROP
  netfilter: conntrack: dccp: try not to drop skb in conntrack
  netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
  netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
  netfilter: nf_tables: skip transaction if update object is not implemented
====================

Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 13:12:35 -07:00
Jakub Kicinski
cddd2dc639 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-05-08 (most Intel drivers)

This series contains updates to i40e, iavf, ice, igb, igc, e1000e, and ixgbe
drivers.

Asbjørn Sloth Tønnesen adds checks against supported flower control flags
for i40e, iavf, ice, and igb drivers.

Michal corrects filters removed during eswitch release for ice.

Corinna Vinschen defers PTP initialization to later in probe so that
netdev log entry is initialized on igc.

Ilpo Järvinen removes a couple of unused, duplicate defines on
e1000e and ixgbe.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
  igc: fix a log entry using uninitialized netdev
  ice: remove correct filters during eswitch release
  igb: flower: validate control flags
  ice: flower: validate control flags
  iavf: flower: validate control flags
  i40e: flower: validate control flags
====================

Link: https://lore.kernel.org/r/20240508173342.2760994-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:33:52 -07:00
Jakub Kicinski
24e28b60b0 Merge branch 'net-qede-convert-filter-code-to-use-extack'
Asbjørn Sloth Tønnesen says:

====================
net: qede: convert filter code to use extack

This series converts the filter code in the qede driver
to use NL_SET_ERR_MSG_*(extack, ...) for error handling.

Patches 1-12 convert qede_parse_flow_attr() to use extack,
along with all its static helper functions.

qede_parse_flow_attr() is used in two places:
- qede_add_tc_flower_fltr()
- qede_flow_spec_to_rule()

In the latter call site extack is faked in the same way as
is done in mlxsw (patch 12).

While the conversion is going on, some error messages are silenced
in between patches 1-12. If wanted, I could squash patches 1-12 in a v3,
but I felt that it would be easier to review as 12 more trivial patches.

Patches 13 and 14 finish up by converting qede_parse_actions()
and ensuring that extack is propagated to it in both call contexts.

v1: https://lore.kernel.org/netdev/20240507104421.1628139-1-ast@fiberby.net/
====================

Link: https://lore.kernel.org/r/20240508143404.95901-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:02 -07:00
Asbjørn Sloth Tønnesen
841548793b net: qede: use extack in qede_parse_actions()
Convert DP_NOTICE/DP_INFO to NL_SET_ERR_MSG_MOD.

Keep edev around for use with QEDE_RSS_COUNT().

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-15-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
d2a437efd0 net: qede: propagate extack through qede_flow_spec_validate()
Pass extack to qede_flow_spec_validate() when called in
qede_flow_spec_to_rule().

Pass extack to qede_parse_actions().

Not converting qede_flow_spec_validate() to use extack for
errors, as it's only called from qede_flow_spec_to_rule(),
where extack is faked into a DP_NOTICE anyway, so opting to
keep DP_VERBOSE/DP_NOTICE usage.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-14-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
eb705d7345 net: qede: use faked extack in qede_flow_spec_to_rule()
Since qede_parse_flow_attr() now reports errors through
extack, give it a fake extack and extract the
error message afterwards if one was set.

The extracted error message is then passed on through
DP_NOTICE(), including messages that were earlier issued
with DP_INFO().

This fake extack approach is already used by
mlxsw_env_linecard_modules_power_mode_apply() in
drivers/net/ethernet/mellanox/mlxsw/core_env.c

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-13-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
d6883bceb2 net: qede: use extack in qede_parse_flow_attr()
Convert qede_parse_flow_attr() to take extack,
and drop the edev argument.

Convert DP_NOTICE calls to use NL_SET_ERR_MSG_* instead.

Pass extack in calls to qede_flow_parse_{tcp,udp}_v{4,6}().

In calls to qede_parse_flow_attr(), if extack is
unavailable, then use NULL for now, until a
subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-12-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
f833a6555e net: qede: add extack in qede_add_tc_flower_fltr()
Define extack locally, to reduce line lengths and aid future users.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-11-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
9c8f5ed884 net: qede: use extack in qede_flow_parse_udp_v4()
Convert qede_flow_parse_udp_v4() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v4_common().

In call to qede_flow_parse_udp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-10-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
b73ad5c7a7 net: qede: use extack in qede_flow_parse_udp_v6()
Convert qede_flow_parse_udp_v6() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v6_common().

In call to qede_flow_parse_udp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-9-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
f84d52776c net: qede: use extack in qede_flow_parse_tcp_v4()
Convert qede_flow_parse_tcp_v4() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v4_common().

In call to qede_flow_parse_tcp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-8-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
b1a18d5781 net: qede: use extack in qede_flow_parse_tcp_v6()
Convert qede_flow_parse_tcp_v6() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v6_common().

In call to qede_flow_parse_tcp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-7-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
f2f993835b net: qede: use extack in qede_flow_parse_v4_common()
Convert qede_flow_parse_v4_common() to take extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

Pass extack in calls to qede_flow_parse_ports() and
qede_set_v4_tuple_to_profile().

In calls to qede_flow_parse_v4_common(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-6-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
a62944d11a net: qede: use extack in qede_flow_parse_v6_common()
Convert qede_flow_parse_v6_common() to take extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

Pass extack in calls to qede_flow_parse_ports() and
qede_set_v6_tuple_to_profile().

In calls to qede_flow_parse_v6_common(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-5-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
f63a9dc507 net: qede: use extack in qede_set_v4_tuple_to_profile()
Convert qede_set_v4_tuple_to_profile() to take extack,
and drop the edev argument.

Convert DP_INFO call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_set_v4_tuple_to_profile(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
6f88f1257a net: qede: use extack in qede_set_v6_tuple_to_profile()
Convert qede_set_v6_tuple_to_profile() to take extack,
and drop the edev argument.

Convert DP_INFO call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_set_v6_tuple_to_profile(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
a7c9540e96 net: qede: use extack in qede_flow_parse_ports()
Convert qede_flow_parse_ports to use extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_flow_parse_ports(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Eric Dumazet
d50729f1d6 net: usb: smsc95xx: stop lying about skb->truesize
Some usb drivers try to set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize overrides.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a:
stop lying about skb->truesize")

v3: also fix a sparse error ( https://lore.kernel.org/oe-kbuild-all/202405091310.KvncIecx-lkp@intel.com/ )
v2: leave the skb_trim() game because smsc95xx_rx_csum_offload()
    needs the csum part. (Jakub)
    While we are at it, use get_unaligned() in smsc95xx_rx_csum_offload().

Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Cc: UNGLinuxDriver@microchip.com
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509083313.2113832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:22:53 -07:00
Colin Ian King
089507a679 net: dsa: microchip: Fix spellig mistake "configur" -> "configure"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509065023.3033397-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:21:22 -07:00
Kuniyuki Iwashima
7172dc93d6 af_unix: Add dead flag to struct scm_fp_list.
Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges()
during GC.") fixed a use-after-free by avoiding access to edge->successor
while GC is in progress.

However, there could be a small race window where another process could
call unix_del_edges() while gc_in_progress is true and __skb_queue_purge()
is on the way.

So, we need another marker for struct scm_fp_list which indicates if the
skb is garbage-collected.

This patch adds a dead flag to struct scm_fp_list and sets it to true
before calling __skb_queue_purge().

Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:52:45 -07:00
Andy Shevchenko
84c8b7ad5e net: ethernet: adi: adin1110: Replace linux/gpio.h by proper one
linux/gpio.h is deprecated and subject to removal.
The driver doesn't use it directly, so replace it
with what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508114519.972082-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:51:34 -07:00
Hariprasad Kelam
04fb71cc5f octeontx2-pf: Reuse Transmit queue/Send queue index of HTB class
The real number of transmit queues is incremented when the user enables
an HTB class and decremented when a class is deleted. Depending on SKB
priority, the driver returns a transmit queue (Txq). Transmit queues and
send queues are mapped one-to-one.

In a few scenarios, the driver returns a transmit queue value which is
greater than the real number of transmit queues; the stack detects this
as an error and overwrites the transmit queue value.

For example, the user has added two classes and the real number of
queues is incremented accordingly:
- tc class add dev eth1 parent 1: classid 1:1 htb
      rate 100Mbit ceil 100Mbit prio 1 quantum 1024
- tc class add dev eth1 parent 1: classid 1:2 htb
      rate 100Mbit ceil 200Mbit prio 7 quantum 1024

Now if the user deletes the class with id 1:1, the driver decrements the
real number of queues:
- tc class del dev eth1 classid 1:1

But for the class with id 1:2, the driver returns a transmit queue
value which is higher than the real number of transmit queues, leading
to the error below:

eth1 selects TX queue x, but real number of TX queues is x

This patch solves the problem by assigning the deleted class's transmit
queue/send queue to an active class.
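
The index reuse described above can be sketched with a toy model. This is
not the driver's code: the HtbQueues class, its fields and the choice of
which active class inherits the freed index are assumptions made purely
for illustration.

```python
class HtbQueues:
    """Toy model (hypothetical): HTB classes mapped to transmit queue indices."""

    def __init__(self, base_queues):
        self.base = base_queues      # queues that exist without any HTB class
        self.class_to_txq = {}       # classid -> transmit queue index

    @property
    def real_num_tx_queues(self):
        return self.base + len(self.class_to_txq)

    def add_class(self, classid):
        # The right-hand side is evaluated before the insert, so the new
        # class takes the first index past the current real queue count.
        self.class_to_txq[classid] = self.real_num_tx_queues

    def del_class(self, classid):
        freed = self.class_to_txq.pop(classid)
        # The real queue count just dropped by one, so the index equal to
        # the new count is no longer valid. Move its owner (if any) into
        # the slot freed by the deleted class.
        stale = self.real_num_tx_queues
        for cid, txq in self.class_to_txq.items():
            if txq == stale:
                self.class_to_txq[cid] = freed
                break
        return freed


queues = HtbQueues(base_queues=4)
queues.add_class("1:1")
queues.add_class("1:2")
queues.del_class("1:1")
print(queues.class_to_txq, queues.real_num_tx_queues)
```

In this model every remaining class keeps an index below the real queue
count after a delete, which is the condition the stack checks.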

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508070935.11501-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:49:15 -07:00
Jakub Kicinski
9c1bbc7ea1 Merge branch 'gve-minor-cleanups'
Simon Horman says:

====================
gve: Minor cleanups

This short patchset provides two minor cleanups for the gve driver.

These were found by tooling as mentioned in each patch,
and otherwise by inspection.

No change in run time behaviour is intended.
Each patch is compile tested only.

v1: https://lore.kernel.org/r/20240503-gve-comma-v1-0-b50f965694ef@kernel.org
====================

Link: https://lore.kernel.org/r/20240508-gve-comma-v2-0-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:52 -07:00
Simon Horman
ba8bcb012b gve: Use ethtool_sprintf/puts() to fill stats strings
Make use of standard helpers to simplify filling in stats strings.

The first two ethtool_puts() changes address the following fortification
warnings flagged by W=1 builds with clang-18. (The last ethtool_puts
change does not because the warning relates to writing beyond the first
element of an array, and gve_gstrings_priv_flags only has one element.)

.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  562 |                         __read_overflow2_field(q_size_field, size);
      |                         ^
.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]

Likewise, the same changes resolve the same problems flagged by Smatch.

.../gve_ethtool.c:100 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_main_stats' too small (32 vs 576)
.../gve_ethtool.c:120 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_adminq_stats' too small (32 vs 512)

Compile tested only.

Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-2-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:49 -07:00
Simon Horman
ebb8308eac gve: Avoid unnecessary use of comma operator
Although it does not seem to have any untoward side-effects,
the use of ';' to separate two assignments seems more appropriate than ','.

Flagged by clang-18 -Wcomma

No functional change intended.
Compile tested only.

Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-1-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:49 -07:00
Jakub Kicinski
b9d5f5711d selftests: net: increase the delay for relative cmsg_time.sh test
Slow machines can delay scheduling of the packets for milliseconds.
Increase the delay to 8ms if KSFT_MACHINE_SLOW. Try to limit the
variability by moving setsockopts earlier (before we read time).

This fixes the "TXTIME rel" failures on debug kernels, like:

  Case ICMPv4  - TXTIME rel returned '', expected 'OK'

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240510005705.43069-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:22:10 -07:00
Jakub Kicinski
2d3b8dfd82 selftests: net: fix timestamp not arriving in cmsg_time.sh
On slow machines the SND timestamp sometimes doesn't arrive before
we quit. The test only waits as long as the packet delay, so it's
easy for a race condition to happen.

Double the wait but do a bit of polling; once the SND timestamp
arrives there's no point in waiting any longer.
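
The wait-with-polling pattern can be sketched as below. This is a generic
illustration, not the selftest's code: wait_for(), its parameters and the
polling interval are hypothetical names and values.

```python
import time


def wait_for(predicate, timeout_s, poll_interval_s=0.001):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True as soon as the predicate holds, so a longer timeout
    costs nothing in the common case where the event arrives early.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Hypothetical usage: the "timestamp" shows up on the third poll.
polls = {"count": 0}

def timestamp_arrived():
    polls["count"] += 1
    return polls["count"] >= 3

print(wait_for(timestamp_arrived, timeout_s=1.0))
```

Doubling timeout_s only raises the upper bound on the wait; the early
return keeps the fast path as fast as before.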

This fixes the "TXTIME abs" failures on debug kernels, like:

   Case ICMPv4  - TXTIME abs returned '', expected 'OK'

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240510005705.43069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:22:00 -07:00
Daniel Jurgens
b49bd37f0b virtio_net: Fix memory leak in virtnet_rx_mod_work
The pointer declaration was missing the __free(kfree).

Fixes: ff7c7d9f5261 ("virtio_net: Remove command data from control_buf")
Reported-by: Jens Axboe <axboe@kernel.dk>
Closes: https://lore.kernel.org/netdev/0674ca1b-020f-4f93-94d0-104964566e3f@kernel.dk/
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240509183634.143273-1-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:20:07 -07:00
Vadim Fedorenko
38155539a1 bnxt_en: silence clang build warning
Clang build brings a warning:

    ../drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:133:12: warning:
    comparison of distinct pointer types ('typeof (tmo_us) *' (aka 'unsigned
    int *') and 'typeof (65535) *' (aka 'int *'))
    [-Wcompare-distinct-pointer-types]
      133 |                 tmo_us = min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US);
          |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix it by specifying proper type for BNXT_PTP_QTS_MAX_TMO_US.

Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240509151833.12579-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:16:35 -07:00
David S. Miller
f8beae078c gtp pull request 24-05-07
Merge tag 'gtp-24-05-07' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp
Pablo Neira Ayuso says:

====================
gtp pull request 24-05-07

This v3 includes:
- fix for clang uninitialized variable per Jakub.
- address Smatch and Coccinelle reports per Simon
- remove inline in new IPv6 support per Simon
- fix memleaks in netlink control plane per Simon
-o-

The following patchset contains IPv6 GTP driver support for net-next,
this also includes IPv6 over IPv4 and vice-versa:

Patch #1 removes an unnecessary stack variable initialization in the
         socket routine.

Patch #2 deals with GTP extension headers. It parses the variable-length
         extension header to decapsulate packets accordingly. Otherwise,
         packets are dropped when these extension headers are present, which
         breaks interoperation with other non-Linux based GTP implementations.

Patch #3 prepares for IPv6 support by moving IPv4 specific fields in PDP
         context objects to a union.

Patch #4 adds IPv6 support while retaining backward compatibility.
         Three new attributes allow declaring an IPv6 GTP tunnel:
         GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6, as well as
         IFLA_GTP_LOCAL6 to declare the IPv6 GTP UDP socket. Up to this
         patch, only IPv6 outer in IPv6 inner is supported.

Patch #5 uses the IPv6 address /64 prefix for UE/MS in the inner headers.
         Unlike IPv4, which provides a 1:1 mapping between UE/MS,
         the IPv6 tunnel encapsulates traffic for a /64 prefix as specified
         by 3GPP TS. This patch has been split from Patch #4 to highlight
         this behaviour.

Patch #6 passes up IPv6 link-local traffic, such as IPv6 SLAAC, for
         handling to userspace so they are handled as control packets.

Patch #7 prepares to allow for GTP IPv4 over IPv6 and vice-versa by
         moving IP specific debugging out of the function to build
         IPv4 and IPv6 GTP packets.

Patch #8 generalizes TOS/DSCP handling following similar approach as
         in the existing iptunnel infrastructure.

Patch #9 adds a helper function to build an IPv4 GTP packet in the outer
         header.

Patch #10 adds a helper function to build an IPv6 GTP packet in the outer
          header.

Patch #11 adds support for GTP IPv4-over-IPv6 and vice-versa.

Patch #12 allows using the same TID/TEID (tunnel identifier) for inner
          IPv4 and IPv6 packets for better UE/MS dual stack integration.
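
The /64 matching described for Patch #5 can be sketched as follows. This
is only an illustration of prefix-based lookup: ms_prefix_key(), the
pdp_contexts table and the documentation-range addresses are hypothetical
and do not reflect the driver's data structures.

```python
import ipaddress


def ms_prefix_key(addr):
    """Return the /64 network that contains addr, used here as a lookup key."""
    return ipaddress.ip_network(f"{addr}/64", strict=False)


# Hypothetical PDP context table keyed by the UE/MS /64 prefix.
pdp_contexts = {ms_prefix_key("2001:db8:1:2::1"): "ctx-A"}


def lookup(addr):
    # Any address inside the same /64 resolves to the same context.
    return pdp_contexts.get(ms_prefix_key(addr))


print(lookup("2001:db8:1:2:aaaa:bbbb:cccc:dddd"))  # same /64 as ctx-A
print(lookup("2001:db8:1:3::1"))                   # different /64
```

With IPv4 the equivalent key would be the full address, which is the 1:1
mapping the cover letter contrasts against.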

This series integrates with the osmocom.org project CI and TTCN-3 test
infrastructure (Oliver Smith) as well as the userspace libgtpnl library.

Thanks to Harald Welte, Oliver Smith and Pau Espin for reviewing and
providing feedback through the osmocom.org redmine platform to make this
happen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-10 13:59:27 +01:00
Florian Westphal
fa23e0d4b7 netfilter: nf_tables: allow clone callbacks to sleep
Sven Auhagen reports transaction failures with the following error:
  ./main.nft:13:1-26: Error: Could not process rule: Cannot allocate memory
  percpu: allocation failed, size=16 align=8 atomic=1, atomic alloc failed, no space left

This points to failing pcpu allocation with GFP_ATOMIC flag.
However, transactions happen from user context and are allowed to sleep.

One case where we can call into the percpu allocator with GFP_ATOMIC is
the nft_counter expression.

Normally this happens from the control plane, so this could use GFP_KERNEL
instead.  But one use case, element insertion from the packet path,
needs to use GFP_ATOMIC allocations (nft_dynset expression).

At this time, .clone callbacks always use GFP_ATOMIC for this reason.

Add a gfp_t argument to the .clone function and pass the GFP_KERNEL or
GFP_ATOMIC flag depending on context. This allows all clone memory
allocations to sleep for the normal (transaction) case.
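
The idea of letting the caller choose the allocation mode can be sketched
with a toy model. Nothing here is kernel code: ToyPercpuAllocator, its
atomic_reserve field and the rule that sleepable allocations always
succeed are simplifying assumptions for illustration.

```python
GFP_KERNEL = "GFP_KERNEL"   # caller may sleep while memory is found
GFP_ATOMIC = "GFP_ATOMIC"   # caller must not sleep


class ToyPercpuAllocator:
    """Toy allocator: atomic requests draw from a small fixed reserve."""

    def __init__(self, atomic_reserve):
        self.atomic_reserve = atomic_reserve

    def alloc(self, gfp):
        if gfp == GFP_ATOMIC:
            if self.atomic_reserve == 0:
                raise MemoryError("atomic alloc failed, no space left")
            self.atomic_reserve -= 1
        # Simplification: a sleepable request always succeeds here.
        return {"packets": 0, "bytes": 0}


def counter_clone(src, allocator, gfp):
    """Clone callback that allocates with whatever mode the caller passes."""
    new = allocator.alloc(gfp)
    new.update(src)
    return new


def clone_from_transaction(src, allocator):
    return counter_clone(src, allocator, GFP_KERNEL)   # user context


def clone_from_packet_path(src, allocator):
    return counter_clone(src, allocator, GFP_ATOMIC)   # cannot sleep


exhausted = ToyPercpuAllocator(atomic_reserve=0)
print(clone_from_transaction({"packets": 7, "bytes": 420}, exhausted))
```

In this model the transaction path no longer depends on the atomic
reserve, while the packet path keeps its non-sleeping behaviour.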

Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00
Florian Westphal
a8a388c2aa selftests: netfilter: add packetdrill based conntrack tests
Add a new test script that uses packetdrill tool to exercise conntrack
state machine.

Needs ip/ip6tables and conntrack tool (to check if we have an entry in
the expected state).

Test cases added here cover the following scenarios:
1. already-acked (retransmitted) packets are not tagged as INVALID
2. RST packet coming when conntrack is already closing (FIN/CLOSE_WAIT)
   transitions conntrack to CLOSE even if the RST is not an exact match
3. RST packets with out-of-window sequence numbers are marked as INVALID
4. SYN+Challenge ACK: check that challenge ack is allowed to pass
5. Old SYN/ACK: check conntrack handles the case where SYN is answered
   with SYN/ACK for an old, previous connection attempt
6. Check SYN reception while in ESTABLISHED state generates a challenge
   ack, RST response clears 'outdated' state + next SYN retransmit gets
   us into 'SYN_RECV' conntrack state.

Tests get run twice, once with ipv4 and once with ipv6.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00
Florian Westphal
532aec7e87 netfilter: nft_set_pipapo: remove dirty flag
After previous change:
 ->clone exists: ->dirty is always true
 ->clone == NULL ->dirty is always false

So remove this flag.
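
Why the flag is redundant can be sketched with a toy model in which the
"dirty" state is derived from whether a working copy exists. ToySet and
its methods are hypothetical and are not the pipapo set implementation.

```python
class ToySet:
    """Toy set: updates go to a clone that replaces live data on commit."""

    def __init__(self):
        self.match = {}     # live data seen by lookups
        self.clone = None   # working copy, present only while changes are pending

    def insert(self, key):
        if self.clone is None:
            self.clone = dict(self.match)   # create the working copy lazily
        self.clone[key] = True

    @property
    def dirty(self):
        # No separate flag is stored: pending changes exist exactly
        # when a clone exists.
        return self.clone is not None

    def commit(self):
        if self.clone is None:
            return          # nothing pending
        self.match = self.clone
        self.clone = None


s = ToySet()
s.insert("10.0.0.1")
print(s.dirty)   # True
s.commit()
print(s.dirty)   # False
```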

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00