6564 Commits

Author SHA1 Message Date
Jakub Kicinski
ec4c20ca09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mac80211/rx.c
  91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames")
  6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value")

Adjacent changes:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c
  61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void")
  d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF")

net/vmw_vsock/virtio_transport.c
  64c99d2d6ada ("vsock/virtio: support to send non-linear skb")
  53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 13:46:28 -07:00
Paolo Abeni
3967336126 netfilter pull request 23-10-25
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmU5gvgACgkQ1V2XiooU
 IOS9eA//QyIqcGRzr+tX1ZPwikkicmSb7w8vkrY7jXMNWNiye54tA5fSJsxqpIMy
 9J9k8eB+fI6AJV44tOq4K8XsYCcI4ZEst2mftumvuq8igX27ulz46uLvoVwnqhwc
 NlPO06RwSHQHR3S5tKRRwwYfUwpPjDRCW15c14pHw4EkHAaL+dItLwATrrJhPv93
 PUZNGbB+i+55QrLJdMMshvpoPAhLmo57cDvDcerhOWygaoxiaKIaR0bRQ40eM3Zl
 j9veG2oiuehv/RHqFJ5MBCiqrIYRHU8kTflqaVA+ODfgUSbijCcv/RQxaILnwnZd
 57vdLSBTgVFh2uiPTYGAxfUwv3BpA7g9uzmeBgMbl/t71HhYoPe/QhjvoVhlzNl/
 1JrCFbrQVaCdhtZbDt7f359mUtXv9yh1+9pBytEkdbxRfsDmzRVsFrUeLo0P9Ho8
 4jKpnaqPTdnzhfoQocZjL7+M22/zk6jZCu1Pcs318yqpTkhJwiTtByo4iaQ2f0F7
 xgA/auZ/33mmirFXnLzMoA4b0TbJNm+Jjye3tdbu0FlY2sKb943RW2kXg9rdfyOL
 OUvSSux7Fezyq5y55+KV4FFrKgYhrZ5tWR8mVBg1KHrYs9r9p7SmXL3KzbSPxBvn
 QrPdCQsQ187QNIkwli/ChYQcrahwSiEqFSJbdDmzIA7AFNLHVaY=
 =ZG+P
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next. Mostly
nf_tables updates with two patches for connlabel and br_netfilter.

1) Rename function name to perform on-demand GC for rbtree elements,
   and replace async GC in rbtree by sync GC. Patches from Florian Westphal.

2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two
   concurrent threads invoking this command do not underrun stateful
   objects. Patches from Phil Sutter.

3) Use single hook to deal with IP and ARP packets in br_netfilter.
   Patch from Florian Westphal.

4) Use atomic_t in netns->connlabel use counter instead of using a
   spinlock, also patch from Florian.

5) Cleanups for stateful objects infrastructure in nf_tables.
   Patches from Phil Sutter.

6) Flush path uses opaque set element offered by the iterator, instead of
   calling pipapo_deactivate() which looks up for it again.

7) Set backend .flush interface always succeeds, make it return void
   instead.

8) Add struct nft_elem_priv placeholder structure and use it by replacing
   void * to pass opaque set element representation from backend to frontend
   which defeats compiler type checks.

9) Shrink memory consumption of set element transactions, by reducing
   struct nft_trans_elem object size and reducing stack memory usage.

10) Use struct nft_elem_priv also for set backend .insert operation too.

11) Carry reset flag in nft_set_dump_ctx structure, instead of passing it
    as a function argument, from Phil Sutter.

netfilter pull request 23-10-25

* tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
  netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
  netfilter: nf_tables: shrink memory consumption of set elements
  netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
  netfilter: nf_tables: set backend .flush always succeeds
  netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
  netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
  netfilter: nf_tables: nft_obj_filter fits into cb->ctx
  netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
  netfilter: nf_tables: A better name for nft_obj_filter
  netfilter: nf_tables: Unconditionally allocate nft_obj_filter
  netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
  netfilter: conntrack: switch connlabels to atomic_t
  br_netfilter: use single forward hook for ip and arp
  netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
  netfilter: nf_tables: Introduce nf_tables_getrule_single()
  netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()
  netfilter: nft_set_rbtree: prefer sync gc to async worker
  netfilter: nft_set_rbtree: rename gc deactivate+erase function
====================

Link: https://lore.kernel.org/r/20231025212555.132775-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-26 12:20:35 +02:00
Pablo Neira Ayuso
735795f68b netfilter: flowtable: GC pushes back packets to classic path
Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded
unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY
back to classic path in every run, ie. every second. This is because of
a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct.

In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on
and IPS_SEEN_REPLY is unreliable since users decide when to offload the
flow before, such bit might be set on at a later stage.

Fix it by adding a custom .gc handler that sched/act_ct can use to
deal with its NF_FLOW_HW_ESTABLISHED bit.

Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-25 11:35:46 +02:00
Phil Sutter
9cdee06347 netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
Relieve the dump callback from having to check nlmsg_type upon each
call. Prep work for set element reset locking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 15:48:30 +02:00
Pablo Neira Ayuso
078996fcd6 netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
Return struct nft_elem_priv instead of struct nft_set_ext for
consistency with ("netfilter: nf_tables: expose opaque set element as
struct nft_elem_priv") and to prepare the introduction of element
timeout updates from control path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:37:46 +02:00
Pablo Neira Ayuso
0e1ea651c9 netfilter: nf_tables: shrink memory consumption of set elements
Instead of copying struct nft_set_elem into struct nft_trans_elem, store
the pointer to the opaque set element object in the transaction. Adapt
set backend API (and set backend implementations) to take the pointer to
opaque set element representation whenever required.

This patch deconstifies .remove() and .activate() set backend API since
these modify the set element opaque object. And it also constify
nft_set_elem_ext() this provides access to the nft_set_ext struct
without updating the object.

According to pahole on x86_64, this patch shrinks struct nft_trans_elem
size from 216 to 24 bytes.

This patch also reduces stack memory consumption by removing the
template struct nft_set_elem object, using the opaque set element object
instead such as from the set iterator API, catchall elements and the get
element command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:37:42 +02:00
Pablo Neira Ayuso
9dad402b89 netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
Add placeholder structure and place it at the beginning of each struct
nft_*_elem for each existing set backend, instead of exposing elements
as void type to the frontend which defeats compiler type checks. Use
this pointer to this new type to replace void *.

This patch updates the following set backend API to use this new struct
nft_elem_priv placeholder structure:

- update
- deactivate
- flush
- get

as well as the following helper functions:

- nft_set_elem_ext()
- nft_set_elem_init()
- nft_set_elem_destroy()
- nf_tables_set_elem_destroy()

This patch adds nft_elem_priv_cast() to cast struct nft_elem_priv to
native element representation from the corresponding set backend.
BUILD_BUG_ON() makes sure this .priv placeholder is always at the top
of the opaque set element representation.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Pablo Neira Ayuso
6509a2e410 netfilter: nf_tables: set backend .flush always succeeds
.flush is always successful since this results from iterating over the
set elements to toggle mark the element as inactive in the next
generation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Pablo Neira Ayuso
26cec9d414 netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
Use the element object that is already offered instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
a552339063 netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
Relieve the dump callback from having to inspect nlmsg_type upon each
call, just do it once at start of the dump.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
5a893b9cdf netfilter: nf_tables: nft_obj_filter fits into cb->ctx
No need to allocate it if one may just use struct netlink_callback's
scratch area for it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
2eda95cfa2 netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
Prep work for moving the context into struct netlink_callback scratch
area.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
ecf49cad80 netfilter: nf_tables: A better name for nft_obj_filter
Name it for what it is supposed to become, a real nft_obj_dump_ctx. No
functional change intended.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
4279cc60b3 netfilter: nf_tables: Unconditionally allocate nft_obj_filter
Prep work for moving the filter into struct netlink_callback's scratch
area.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
ff16111cc1 netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
The code does not make use of cb->args fields past the first one, no
need to zero them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Florian Westphal
643d126036 netfilter: conntrack: switch connlabels to atomic_t
The spinlock is back from the day when connabels did not have
a fixed size and reallocation had to be supported.

Remove it.  This change also allows to call the helpers from
softirq or timers without deadlocks.

Also add WARN()s to catch refcounting imbalances.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Phil Sutter
3cb03edb4d netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
Rule reset is not concurrency-safe per-se, so multiple CPUs may reset
the same rule at the same time. At least counter and quota expressions
will suffer from value underruns in this case.

Prevent this by introducing dedicated locking callbacks for nfnetlink
and the asynchronous dump handling to serialize access.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:29 +02:00
Phil Sutter
1578c32877 netfilter: nf_tables: Introduce nf_tables_getrule_single()
Outsource the reply skb preparation for non-dump getrule requests into a
distinct function. Prep work for rule reset locking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:29 +02:00
Phil Sutter
8877393029 netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()
The table lookup will be dropped from that function, so remove that
dependency from audit logging code. Using whatever is in
nla[NFTA_RULE_TABLE] is sufficient as long as the previous rule info
filling succeded.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:29 +02:00
Florian Westphal
7d259f021a netfilter: nft_set_rbtree: prefer sync gc to async worker
There is no need for asynchronous garbage collection, rbtree inserts
can only happen from the netlink control plane.

We already perform on-demand gc on insertion, in the area of the
tree where the insertion takes place, but we don't do a full tree
walk there for performance reasons.

Do a full gc walk at the end of the transaction instead and
remove the async worker.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:29 +02:00
Florian Westphal
8079fc30f7 netfilter: nft_set_rbtree: rename gc deactivate+erase function
Next patch adds a cllaer that doesn't hold the priv->write lock and
will need a similar function.

Rename the existing function to make it clear that it can only
be used for opportunistic gc during insertion.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:29 +02:00
Eric Dumazet
2a7c8d291f tcp: introduce tcp_clock_ms()
It delivers current TCP time stamp in ms unit, and is used
in place of confusing tcp_time_stamp_raw()

It is the same family than tcp_clock_ns() and tcp_clock_ms().

tcp_time_stamp_raw() will be replaced later for TSval
contexts with a more descriptive name.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23 09:35:01 +01:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2fb ("wifi: mac80211: fix error path key leak")
  2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
  7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Pablo Neira Ayuso
f86fb94011 netfilter: nf_tables: revert do not remove elements if set backend implements .abort
nf_tables_abort_release() path calls nft_set_elem_destroy() for
NFT_MSG_NEWSETELEM which releases the element, however, a reference to
the element still remains in the working copy.

Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:32 +02:00
Pablo Neira Ayuso
d111692a59 netfilter: nft_set_rbtree: .deactivate fails if element has expired
This allows to remove an expired element which is not possible in other
existing set backends, this is more noticeable if gc-interval is high so
expired elements remain in the tree. On-demand gc also does not help in
this case, because this is delete element path. Return NULL if element
has expired.

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:32 +02:00
Phil Sutter
1baf0152f7 netfilter: nf_tables: audit log object reset once per table
When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:43:40 +02:00
Florian Westphal
2560016721 netfilter: nf_tables: de-constify set commit ops function argument
The set backend using this already has to work around this via ugly
cast, don't spread this pattern.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 10:26:43 +02:00
Florian Westphal
e0d4593140 netfilter: make nftables drops visible in net dropmonitor
net_dropmonitor blames core.c:nf_hook_slow.
Add NF_DROP_REASON() helper and use it in nft_do_chain().

The helper releases the skb, so exact drop location becomes
available. Calling code will observe the NF_STOLEN verdict
instead.

Adjust nf_hook_slow so we can embed an erro value wih
NF_STOLEN verdicts, just like we do for NF_DROP.

After this, drop in nftables can be pinpointed to a drop due
to a rule or the chain policy.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 10:26:43 +02:00
Florian Westphal
35c038b0a4 netfilter: nf_nat: mask out non-verdict bits when checking return value
Same as previous change: we need to mask out the non-verdict bits, as
upcoming patches may embed an errno value in NF_STOLEN verdicts too.

NF_DROP could already do this, but not all called functions do this.

Checks that only test ret vs NF_ACCEPT are fine, the 'errno parts'
are always 0 for those.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 10:26:43 +02:00
Florian Westphal
6291b3a67a netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts
This function calls helpers that can return nf-verdicts, but then
those get converted to -1/0 as thats what the caller expects.

Theoretically NF_DROP could have an errno number set in the upper 24
bits of the return value. Or any of those helpers could return
NF_STOLEN, which would result in use-after-free.

This is fine as-is, the called functions don't do this yet.

But its better to avoid possible future problems if the upcoming
patchset to add NF_DROP_REASON() support gains further users, so remove
the 0/-1 translation from the picture and pass the verdicts down to
the caller.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 10:26:43 +02:00
Florian Westphal
4d26ab0086 netfilter: nf_tables: mask out non-verdict bits when checking return value
nftables trace infra must mask out the non-verdict bit parts of the
return value, else followup changes that 'return errno << 8 | NF_STOLEN'
will cause breakage.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 10:26:43 +02:00
Florian Westphal
d351c1ea2d netfilter: nft_payload: fix wrong mac header matching
mcast packets get looped back to the local machine.
Such packets have a 0-length mac header, we should treat
this like "mac header not set" and abort rule evaluation.

As-is, we just copy data from the network header instead.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Blažej Krajňák <krajnak@levonet.sk>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Xingyuan Mo
505ce0630a nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()
We should check whether the NFTA_EXPR_NAME netlink attribute is present
before accessing it, otherwise a null pointer deference error will occur.

Call Trace:
 <TASK>
 dump_stack_lvl+0x4f/0x90
 print_report+0x3f0/0x620
 kasan_report+0xcd/0x110
 __asan_load2+0x7d/0xa0
 nla_strcmp+0x2f/0x90
 __nft_expr_type_get+0x41/0xb0
 nft_expr_inner_parse+0xe3/0x200
 nft_inner_init+0x1be/0x2e0
 nf_tables_newrule+0x813/0x1230
 nfnetlink_rcv_batch+0xec3/0x1170
 nfnetlink_rcv+0x1e4/0x220
 netlink_unicast+0x34e/0x4b0
 netlink_sendmsg+0x45c/0x7e0
 __sys_sendto+0x355/0x370
 __x64_sys_sendto+0x84/0xa0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Xingyuan Mo
52177bbf19 nf_tables: fix NULL pointer dereference in nft_inner_init()
We should check whether the NFTA_INNER_NUM netlink attribute is present
before accessing it, otherwise a null pointer deference error will occur.

Call Trace:
 dump_stack_lvl+0x4f/0x90
 print_report+0x3f0/0x620
 kasan_report+0xcd/0x110
 __asan_load4+0x84/0xa0
 nft_inner_init+0x128/0x2e0
 nf_tables_newrule+0x813/0x1230
 nfnetlink_rcv_batch+0xec3/0x1170
 nfnetlink_rcv+0x1e4/0x220
 netlink_unicast+0x34e/0x4b0
 netlink_sendmsg+0x45c/0x7e0
 __sys_sendto+0x355/0x370
 __x64_sys_sendto+0x84/0xa0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso
4c90bba60c netfilter: nf_tables: do not refresh timeout when resetting element
The dump and reset command should not refresh the timeout, this command
is intended to allow users to list existing stateful objects and reset
them, element expiration should be refresh via transaction instead with
a specific command to achieve this, otherwise this is entering combo
semantics that will be hard to be undone later (eg. a user asking to
retrieve counters but _not_ requiring to refresh expiration).

Fixes: 079cd633219d ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Kees Cook
d51c42cdef netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct nft_pipapo_match.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Florian Westphal
2e1d175410 netfilter: nfnetlink_log: silence bogus compiler warning
net/netfilter/nfnetlink_log.c:800:18: warning: variable 'ctinfo' is uninitialized

The warning is bogus, the variable is only used if ct is non-NULL and
always initialised in that case.  Init to 0 too to silence this.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309100514.ndBFebXN-lkp@intel.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso
ebd032fa88 netfilter: nf_tables: do not remove elements if set backend implements .abort
pipapo set backend maintains two copies of the datastructure, removing
the elements from the copy that is going to be discarded slows down
the abort path significantly, from several minutes to few seconds after
this patch.

Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-12 10:28:45 +02:00
Florian Westphal
6ac9c51eeb netfilter: conntrack: prefer tcp_error_log to pr_debug
pr_debug doesn't provide any information other than that a packet
did not match existing state but also was found to not create a new
connection.

Replaces this with tcp_error_log, which will also dump packets'
content so one can see if this is a stray FIN or RST.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Florian Westphal
8a23f4ab92 netfilter: conntrack: simplify nf_conntrack_alter_reply
nf_conntrack_alter_reply doesn't do helper reassignment anymore.
Remove the comments that make this claim.

Furthermore, remove dead code from the function and place ot
in nf_conntrack.h.

Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
99ab9f84b8 netfilter: nf_tables: Don't allocate nft_rule_dump_ctx
Since struct netlink_callback::args is not used by rule dumpers anymore,
use it to hold nft_rule_dump_ctx. Add a build-time check to make sure it
won't ever exceed the available space.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
8194d599bc netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx
In order to move the context into struct netlink_callback's scratch
area, the latter must be unused first.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:28 +02:00
Phil Sutter
405c8fd62d netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx
This relieves the dump callback from having to check nlmsg_type upon
each call and instead performs the check once in .start callback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:27 +02:00
Phil Sutter
30fa41a0f6 netfilter: nf_tables: Drop pointless memset when dumping rules
None of the dump callbacks uses netlink_callback::args beyond the first
element, no need to zero the data.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:34:20 +02:00
Phil Sutter
afed2b54c5 netfilter: nf_tables: Always allocate nft_rule_dump_ctx
It will move into struct netlink_callback's scratch area later, just put
nf_tables_dump_rules_start in shape to reduce churn later.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-10 16:01:42 +02:00
Jakub Kicinski
2606cf059c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts (or adjacent changes of note).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-05 13:16:47 -07:00
Jakub Kicinski
07cf7974a2 netfilter pull request 2023-09-28
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUVjwUNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AKneEACzrKtIC0j0DyhgVW4Kb57T8Y7cD5wQCv7oz1Cx
 8A3UJ1pSLYhRnz94zY453GIenK+zx/KKIetDhyWnjA9gjk95HkUN+OwuuiKnUAgu
 7KPGbIYat7hERwoZpR88nrbTYXcDZfcZGTqWA++3yL2vn4Lu4lsuowqXYKBf/axk
 5gEwEtwn2mVsdo0qTVJcXkHqnf5CCdqd26ixF4yB1rz/P6kISi4I9q7ul43paFJW
 +/ifacdG+7raQkGlUlYiDNMVd0uO01HHaAcWfYa+FOMK+GSn+89zzTs906CU0g2O
 GRJSWjNTgfDtM2AHN7peUnf/G9XHSK2Y7Re8FzauKzwWSl5N9w5610nbQnT+ME5O
 uOZE1P/lhnidOwCEV8zU4yhs6fBrCMCHz+S5Yh8C8PCUhi12IEEYRHyGCoUVMOwY
 1LINjdn4HddL57QUGumy0VqVBlxQru8VXnlzm0eIyhsbZ3/mVXQWIHX4u1G36UUQ
 zSkm4/qP4kna/tV86mETNX1MUcJsQ1vQ842abcUbxudKei/uT9av6YHlz/aBOcQZ
 NDMrGVO6mjh7/HnYUr7+zbQfhLZdg424SpGEoiuS7dDcTpGlcT3pnWBJDGEHsy+4
 0VnLI8/GPT1/jQCCYTVLu+tn0XmfZF18j2bvGhz1hM9J/HXaRpuqjGF6thLgYl63
 CZf5Yg==
 =ALU2
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

First patch, from myself, is a bug fix. The issue (connect timeout) is
ancient, so I think its safe to give this more soak time given the esoteric
conditions needed to trigger this.
Also updates the existing selftest to cover this.

Add netlink extacks when an update references a non-existent
table/chain/set.  This allows userspace to provide much better
errors to the user, from Pablo Neira Ayuso.

Last patch adds more policy checks to nf_tables as a better
alternative to the existing runtime checks, from Phil Sutter.

* tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY
  netfilter: nf_tables: missing extended netlink error in lookup functions
  selftests: netfilter: test nat source port clash resolution interaction with tcp early demux
  netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash
====================

Link: https://lore.kernel.org/r/20230928144916.18339-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04 14:25:37 -07:00
Florian Westphal
087388278e netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
nft_rbtree_gc_elem() walks back and removes the end interval element that
comes before the expired element.

There is a small chance that we've cached this element as 'rbe_ge'.
If this happens, we hold and test a pointer that has been queued for
freeing.

It also causes spurious insertion failures:

$ cat test-testcases-sets-0044interval_overlap_0.1/testout.log
Error: Could not process rule: File exists
add element t s {  0 -  2 }
                   ^^^^^^
Failed to insert  0 -  2 given:
table ip t {
        set s {
                type inet_service
                flags interval,timeout
                timeout 2s
                gc-interval 2s
        }
}

The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.

Reason is that when we try to insert, the tree may hold an expired
element that collides with the range we're adding.
While we do evict/erase this element, we can trip over this check:

if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
      return -ENOTEMPTY;

rbe_ge was erased by the synchronous gc, we should not have done this
check.  Next attempt won't find it, so retry results in successful
insertion.

Restart in-kernel to avoid such spurious errors.

Such restart are rare, unless userspace intentionally adds very large
numbers of elements with very short timeouts while setting a huge
gc interval.

Even in this case, this cannot loop forever, on each retry an existing
element has been removed.

As the caller is holding the transaction mutex, its impossible
for a second entity to add more expiring elements to the tree.

After this it also becomes feasible to remove the async gc worker
and perform all garbage collection from the commit path.

Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04 15:57:28 +02:00
Phil Sutter
0d880dc6f0 netfilter: nf_tables: Deduplicate nft_register_obj audit logs
When adding/updating an object, the transaction handler emits suitable
audit log entries already, the one in nft_obj_notify() is redundant. To
fix that (and retain the audit logging from objects' 'update' callback),
Introduce an "audit log free" variant for internal use.

Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04 15:57:06 +02:00
Xin Long
8e56b063c8 netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
In Scenario A and B below, as the delayed INIT_ACK always changes the peer
vtag, SCTP ct with the incorrect vtag may cause packet loss.

Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK

  192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772]
    192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151]
    192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772]
  192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] *
  192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
    192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
    192.168.1.2 > 192.168.1.1: [COOKIE ACK]

Scenario B: INIT_ACK is delayed until the peer completes its own handshake

  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
    192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
    192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
  192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] *

This patch fixes it as below:

In SCTP_CID_INIT processing:
- clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
  ct->proto.sctp.init[!dir]. (Scenario E)
- set ct->proto.sctp.init[dir].

In SCTP_CID_INIT_ACK processing:
- drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
  ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
- drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
  ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)

In SCTP_CID_COOKIE_ACK processing:
- clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir].
  (Scenario D)

Also, it's important to allow the ct state to move forward with cookie_echo
and cookie_ack from the opposite dir for the collision scenarios.

There are also other Scenarios where it should allow the packet through,
addressed by the processing above:

Scenario C: new CT is created by INIT_ACK.

Scenario D: start INIT on the existing ESTABLISHED ct.

Scenario E: start INIT after the old collision on the existing ESTABLISHED
ct.

  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
  192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
  (both side are stopped, then start new connection again in hours)
  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742]

Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04 14:12:01 +02:00