IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
NF_REPEAT places the packet at the beginning of the iptables chain
instead of accepting or rejecting it right away. The packet however will
reach the end of the chain and continue to the end of iptables
eventually, so it needs the same handling as NF_ACCEPT and NF_DROP.
Fixes: 368982cd7d ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks")
Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Compiler did not catch incorrect typing in the rcu hook assignment.
% nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10
% iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp
dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000
The CT target bails out with incorrect layer 3 protocol number.
Fixes: 6c1fd7dc48 ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object")
Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that cttimeout support for nft_ct is in place, these should depend
on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the
policy if this option is not enabled.
[ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[...]
[ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246
[...]
[ 71.600188] Call Trace:
[ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Doug Smythies says:
Sometimes it is desirable to temporarily disable, or clear,
the iptables rule set on a computer being controlled via a
secure shell session (SSH). While unwise on an internet facing
computer, I also do it often on non-internet accessible computers
while testing. Recently, this has become problematic, with the
SSH session being dropped upon re-load of the rule set.
The problem is that when all rules are deleted, conntrack hooks get
unregistered.
In case the rules are re-added later, its possible that tcp window
has moved far enough so that all packets are considered invalid (out of
window) until entry expires (which can take forever, default
established timeout is 5 days).
Fix this by clearing maxwin of existing tcp connections on register.
v2: don't touch entries on hook removal.
v3: remove obsolete expiry check.
Reported-by: Doug Smythies <dsmythies@telus.net>
Fixes: 4d3a57f23d ("netfilter: conntrack: do not enable connection tracking unless needed")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NF_TABLES_IPV4 is now boolean so it is possible to set
NF_TABLES=m
NF_TABLES_IPV4=y
NFT_CHAIN_NAT_IPV4=y
which causes:
nft_chain_nat_ipv4.c:(.text+0x6d): undefined reference to `nft_do_chain'
Wrap NFT_CHAIN_NAT_IPV4 and related nat expressions with NF_TABLES to
restore the dependency.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 02c7b25e5f ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Using a private template is problematic:
1. We can't assign both a zone and a timeout policy
(zone assigns a conntrack template, so we hit problem 1)
2. Using a template needs to take care of ct refcount, else we'll
eventually free the private template due to ->use underflow.
This patch reworks template policy to instead work with existing conntrack.
As long as such conntrack has not yet been placed into the hash table
(unconfirmed) we can still add the timeout extension.
The only caveat is that we now need to update/correct ct->timeout to
reflect the initial/new state, otherwise the conntrack entry retains the
default 'new' timeout.
Side effect of this change is that setting the policy must
now occur from chains that are evaluated *after* the conntrack lookup
has taken place.
No released kernel contains the timeout policy feature yet, so this change
should be ok.
Changes since v2:
- don't handle 'ct is confirmed case'
- after previous patch, no need to special-case tcp/dccp/sctp timeout
anymore
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tcp, sctp and dccp trackers re-use the userspace ctnetlink states
to index their timeout arrays, which means timeout[0] is never
used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well
so external users can simply read it off timeouts[0] without need to
differentiate dccp/sctp/tcp and udp/icmp/gre/generic.
The alternative is to map all array accesses to 'i - 1', but that
is a much more intrusive change.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Satish Patel reports a skb_warn_bad_offload() splat caused
by -j CHECKSUM rules:
-A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM
The CHECKSUM target has never worked with GSO skbs, and the above rule
makes no sense as kernel will handle checksum updates on transmit.
Unfortunately, there are 3rd party tools that install such rules, so we
cannot reject this from the config plane without potential breakage.
Amend Kconfig text to clarify that the CHECKSUM target is only useful
in virtualized environments, where old dhcp clients that use AF_PACKET
used to discard UDP packets with a 'bad' header checksum and add a
one-time warning in case such rule isn't restricted to UDP.
v2: check IP6T_F_PROTO flag before cmp (Michal Kubecek)
Reported-by: Satish Patel <satish.txt@gmail.com>
Reported-by: Markos Chandras <markos.chandras@suse.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The cluster match requires conntrack for matching packets. If the
netns does not have conntrack hooks registered, the match does not
work at all.
Implicitly load the conntrack hook for the family, exactly as many
other extensions do. This ensures that the match works even if the
hooks have not been registered by other means.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 6edb3c96a5 ("net/ipv6: Defer initialization of dst to data path")
forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
Fix it by setting anycast rt->dst.input back to ip6_input.
Fixes: 6edb3c96a5 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit fixes a corner case where TCP BBR would enter PROBE_RTT
mode but not reduce its cwnd. If a TCP receiver ACKed less than one
full segment, the number of delivered/acked packets was 0, so that
bbr_set_cwnd() would short-circuit and exit early, without cutting
cwnd to the value we want for PROBE_RTT.
The fix is to instead make sure that even when 0 full packets are
ACKed, we do apply all the appropriate caps, including the cap that
applies in PROBE_RTT mode.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix the case where BBR does not exit PROBE_RTT mode when
it restarts from idle. When BBR restarts from idle and if BBR is in
PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If
yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its
full value.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add a helper function bbr_check_probe_rtt_done() to
1. check the condition to see if bbr should exit probe_rtt mode;
2. process the logic of exiting probe_rtt mode.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.
1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.
Geoff Alexander reported this could be used to build off-path attacks.
These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.
We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Geoff Alexander <alexandg@cs.unm.edu>
Tested-by: Geoff Alexander <alexandg@cs.unm.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the 3 callers of addrconf_add_mroute() assert RTNL
lock, they don't take any additional lock either, so
it is safe to convert it to GFP_KERNEL.
Same for sit_add_v4_addrs().
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TC filter flow mapping override completely skipped the call to
cake_hash(); however that meant that the internal state was not being
updated, which ultimately leads to deadlocks in some configurations. Fix
that by passing the overridden flow ID into cake_hash() instead so it can
react appropriately.
In addition, the major number of the class ID can now be set to override
the host mapping in host isolation mode. If both host and flow are
overridden (or if the respective modes are disabled), flow dissection and
hashing will be skipped entirely; otherwise, the hashing will be kept for
the portions that are not set by the filter.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI
flag, causing additional package information after the first to be lost.
Also fixup a sanity check in ncsi_write_package_info() to reject out of
range package IDs.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!
Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.
And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.
Fixes: 817e9f2c5c ("act_ife: acquire ife_mod_lock before reading ifeoplist")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().
This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Vlad Buslov <vladbu@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 42c625a486 ("net: sched: act_ife: disable bh
when taking ife_mod_lock"), because what ife_mod_lock protects
is absolutely not touched in rate est timer BH context, they have
no race.
A better fix is following up.
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 90b73b77d0, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.
Fixes: 90b73b77d0 ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_idr_check() is replaced by tcf_idr_check_alloc(),
and __tcf_idr_check() now can be folded into tcf_idr_search().
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.
More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().
So it can be just removed.
Fixes: b409074e66 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_action_put_many() is mostly called to clean up actions on
failure path, but tcf_action_put_many(&actions[acts_deleted]) is
used in the ugliest way: it passes a slice of the array and
uses an additional NULL at the end to avoid out-of-bound
access.
acts_deleted is completely unnecessary since we can teach
tcf_action_put_many() scan the whole array and checks against
NULL pointer. Which also means tcf_action_delete() should
set deleted action pointers to NULL to avoid double free.
Fixes: 90b73b77d0 ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove including <linux/version.h> that don't need it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to the introduction of fib6_info lwtstate was managed by the dst
code. With fib6_info releasing lwtstate needs to be done when the struct
is freed.
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix races in IPVS, from Tan Hu.
2) Missing unbind in matchall classifier, from Hangbin Liu.
3) Missing act_ife action release, from Vlad Buslov.
4) Cure lockdep splats in ila, from Cong Wang.
5) veth queue leak on link delete, from Toshiaki Makita.
6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From
Kees Cook.
7) RCU usage fixup in XDP, from Tariq Toukan.
8) Two TCP ULP fixes from Daniel Borkmann.
9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
Kallweit.
10) Always take tcf_lock with BH disabled, otherwise we can deadlock
with rate estimator code paths. From Vlad Buslov.
11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly.
From Jian-Hong Pan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ip6_vti: fix creating fallback tunnel device for vti6
ip_vti: fix a null pointer deferrence when create vti fallback tunnel
r8169: don't use MSI-X on RTL8106e
net: lan743x_ptp: convert to ktime_get_clocktai_ts64
net: sched: always disable bh when taking tcf_lock
ip6_vti: simplify stats handling in vti6_xmit
bpf: fix redirect to map under tail calls
r8169: add missing Kconfig dependency
tools/bpf: fix bpf selftest test_cgroup_storage failure
bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
bpf, sockmap: fix map elem deletion race with smap_stop_sock
bpf, sockmap: fix leakage of smap_psock_map_entry
tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
tcp, ulp: add alias for all ulp modules
bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
net/xdp: Fix suspicious RCU usage warning
net/mlx5e: Delete unneeded function argument
Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
isdn: Disable IIOCDBGVAR
...
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel
device for vti6 when a new namespace is created.
Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit
code path.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-18
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
restrictions, from Yonghong.
2) Fix a suspicious RCU rcu_dereference_check() warning triggered
from removing a device's XDP memory allocator by using the correct
rhashtable lookup function, from Tariq.
3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races
as well as enforcing module aliases for ULPs. Another fix for BPF
map redirect to make them work again with tail calls, from Daniel.
4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Infinite loop in IPVS when net namespace is released, from
Tan Hu.
2) Do not show negative timeouts in ip_vs_conn by using the new
jiffies_delta_to_msecs(), patches from Matteo Croce.
3) Set F_IFACE flag for linklocal addresses in ip6t_rpfilter,
from Florian Westphal.
4) Fix overflow in set size allocation, from Taehee Yoo.
5) Use netlink_dump_start() from ctnetlink to fix memleak from
the error path, again from Florian.
6) Register nfnetlink_subsys in last place, otherwise netns
init path may lose race and see net->nft uninitialized data.
This also reverts previous attempt to fix this by increase
netns refcount, patches from Florian.
7) Remove conntrack entries on layer 4 protocol tracker module
removal, from Florian.
8) Use GFP_KERNEL_ACCOUNT for xtables blob allocation, from
Michal Hocko.
9) Get tproxy documentation in sync with existing codebase,
from Mate Eckl.
10) Honor preset layer 3 protocol via ctx->family in the new nft_ct
timeout infrastructure, from Harsha Sharma.
11) Let uapi nfnetlink_osf.h compile standalone with no errors,
from Dmitry V. Levin.
12) Missing braces compilation warning in nft_tproxy, patch from
Mate Eclk.
13) Disregard bogus check to bail out on non-anonymous sets from
the dynamic set update extension.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This tag is the same as 9p-for-4.19 without the two MAINTAINERS patches
Contains mostly fixes (6 to be backported to stable) and a few changes,
here is the breakdown:
* Rework how fids are attributed by replacing some custom tracking in a
list by an idr (f28cdf0430)
* For packet-based transports (virtio/rdma) validate that the packet
length matches what the header says (f984579a01)
* A few race condition fixes found by syzkaller (9f476d7c54,
430ac66eb4)
* Missing argument check when NULL device is passed in sys_mount
(10aa14527f)
* A few virtio fixes (23cba9cbde, 31934da810, d28c756cae)
* Some spelling and style fixes
----------------------------------------------------------------
Chirantan Ekbote (1):
9p/net: Fix zero-copy path in the 9p virtio transport
Colin Ian King (1):
fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown"
Jean-Philippe Brucker (1):
net/9p: fix error path of p9_virtio_probe
Matthew Wilcox (4):
9p: Fix comment on smp_wmb
9p: Change p9_fid_create calling convention
9p: Replace the fidlist with an IDR
9p: Embed wait_queue_head into p9_req_t
Souptick Joarder (1):
fs/9p/vfs_file.c: use new return type vm_fault_t
Stephen Hemminger (1):
9p: fix whitespace issues
Tomas Bortoli (5):
net/9p/client.c: version pointer uninitialized
net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()
net/9p/trans_fd.c: fix race by holding the lock
9p: validate PDU length
9p: fix multiple NULL-pointer-dereferences
jiangyiwen (2):
net/9p/virtio: Fix hard lockup in req_done
9p/virtio: fix off-by-one error in sg list bounds check
piaojun (5):
net/9p/client.c: add missing '\n' at the end of p9_debug()
9p/net/protocol.c: return -ENOMEM when kmalloc() failed
net/9p/trans_virtio.c: fix some spell mistakes in comments
fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
net/9p/trans_virtio.c: add null terminal for mount tag
fs/9p/v9fs.c | 2 +-
fs/9p/vfs_file.c | 2 +-
fs/9p/xattr.c | 6 ++++--
include/net/9p/client.h | 11 ++++-------
net/9p/client.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------------
net/9p/protocol.c | 2 +-
net/9p/trans_fd.c | 22 +++++++++++++++-------
net/9p/trans_rdma.c | 4 ++++
net/9p/trans_virtio.c | 66 +++++++++++++++++++++++++++++++++++++---------------------------
net/9p/trans_xen.c | 3 +++
net/9p/util.c | 1 -
12 files changed, 122 insertions(+), 116 deletions(-)
-----BEGIN PGP SIGNATURE-----
iF0EABECAB0WIQQ8idm2ZSicIMLgzKqoqIItDqvwPAUCW3ElNwAKCRCoqIItDqvw
PMzfAKCkCYFyNC89vcpxcCNsK7rFQ1qKlwCgoaBpZDdegOu0jMB7cyKwAWrB0LM=
=h3T0
-----END PGP SIGNATURE-----
Merge tag '9p-for-4.19-2' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"This contains mostly fixes (6 to be backported to stable) and a few
changes, here is the breakdown:
- rework how fids are attributed by replacing some custom tracking in
a list by an idr
- for packet-based transports (virtio/rdma) validate that the packet
length matches what the header says
- a few race condition fixes found by syzkaller
- missing argument check when NULL device is passed in sys_mount
- a few virtio fixes
- some spelling and style fixes"
* tag '9p-for-4.19-2' of git://github.com/martinetd/linux: (21 commits)
net/9p/trans_virtio.c: add null terminal for mount tag
9p/virtio: fix off-by-one error in sg list bounds check
9p: fix whitespace issues
9p: fix multiple NULL-pointer-dereferences
fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
9p: validate PDU length
net/9p/trans_fd.c: fix race by holding the lock
net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()
net/9p/virtio: Fix hard lockup in req_done
net/9p/trans_virtio.c: fix some spell mistakes in comments
9p/net: Fix zero-copy path in the 9p virtio transport
9p: Embed wait_queue_head into p9_req_t
9p: Replace the fidlist with an IDR
9p: Change p9_fid_create calling convention
9p: Fix comment on smp_wmb
net/9p/client.c: version pointer uninitialized
fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown"
net/9p: fix error path of p9_virtio_probe
9p/net/protocol.c: return -ENOMEM when kmalloc() failed
net/9p/client.c: add missing '\n' at the end of p9_debug()
...
Commits 109980b894 ("bpf: don't select potentially stale ri->map
from buggy xdp progs") and 7c30013133 ("bpf: fix ri->map_owner
pointer on bpf_prog_realloc") tried to mitigate that buggy programs
using bpf_redirect_map() helper call do not leave stale maps behind.
Idea was to add a map_owner cookie into the per CPU struct redirect_info
which was set to prog->aux by the prog making the helper call as a
proof that the map is not stale since the prog is implicitly holding
a reference to it. This owner cookie could later on get compared with
the program calling into BPF whether they match and therefore the
redirect could proceed with processing the map safely.
In (obvious) hindsight, this approach breaks down when tail calls are
involved since the original caller's prog->aux pointer does not have
to match the one from one of the progs out of the tail call chain,
and therefore the xdp buffer will be dropped instead of redirected.
A way around that would be to fix the issue differently (which also
allows to remove related work in fast path at the same time): once
the life-time of a redirect map has come to its end we use it's map
free callback where we need to wait on synchronize_rcu() for current
outstanding xdp buffers and remove such a map pointer from the
redirect info if found to be present. At that time no program is
using this map anymore so we simply invalidate the map pointers to
NULL iff they previously pointed to that instance while making sure
that the redirect path only reads out the map once.
Fixes: 97f91a7cf0 ("bpf: add bpf_redirect_map helper routine")
Fixes: 109980b894 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
Reported-by: Sebastiano Miano <sebastiano.miano@polito.it>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I found that in BPF sockmap programs once we either delete a socket
from the map or we updated a map slot and the old socket was purged
from the map that these socket can never get reattached into a map
even though their related psock has been dropped entirely at that
point.
Reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops
intact, so that on the next tcp_set_ulp_id() the kernel returns an
-EEXIST thinking there is still some active ULP attached.
BPF sockmap is the only one that has this issue as the other user,
kTLS, only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock() whereas
sockmap semantics allow dropping the socket from the map with all
related psock state being cleaned up.
Fixes: 1aa12bdf1b ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lets not turn the TCP ULP lookup into an arbitrary module loader as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:
[root@bar]# cat foo.c
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tcp.h>
#include <linux/in.h>
int main(void)
{
int sock = socket(PF_INET, SOCK_STREAM, 0);
setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
return 0;
}
[root@bar]# gcc foo.c -O2 -Wall
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
sctp 1077248 4
libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
[root@bar]#
Fix it by adding module alias to TCP ULP modules, so probing module
via request_module() will be limited to tcp-ulp-[name]. The existing
modules like kTLS will load fine given tcp-ulp-tls alias, but others
will fail to load:
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
[root@bar]#
Sockmap is not affected from this since it's either built-in or not.
Fixes: 734942cc4e ("tcp: ULP infrastructure")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
rdma.git merge resolution for the 4.19 merge window
Conflicts:
drivers/infiniband/core/rdma_core.c
- Use the rdma code and revise with the new spelling for
atomic_fetch_add_unless
drivers/nvme/host/rdma.c
- Replace max_sge with max_send_sge in new blk code
drivers/nvme/target/rdma.c
- Use the blk code and revise to use NULL for ib_post_recv when
appropriate
- Replace max_sge with max_recv_sge in new blk code
net/rds/ib_send.c
- Use the net code and revise to use NULL for ib_post_recv when
appropriate
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This reverts commit ddb457c699.
The include rdma/ib_cache.h is kept, and we have to add a memset
to the compat wrapper to avoid compiler warnings in gcc-7
This revert is done to avoid extensive merge conflicts with SMC
changes in netdev during the 4.19 merge window.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Action init API was changed to always take reference to action, even when
overwriting existing action. Substitute conditional action release, which
was executed only if action is newly created, with unconditional release in
tcf_ife_init() error handling code to prevent double free or memory leak in
case of overwrite.
Fixes: 4e8ddd7f17 ("net: sched: don't release reference on action overwrite")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0
mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0
qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn
zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u
vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm
wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6
L47IdXo=
=sJkt
-----END PGP SIGNATURE-----
Merge tag 'v4.18' into rdma.git for-next
Resolve merge conflicts from the -rc cycle against the rdma.git tree:
Conflicts:
drivers/infiniband/core/uverbs_cmd.c
- New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
- Merge removal of file->ucontext in for-next with new code in -rc
drivers/infiniband/core/uverbs_main.c
- for-next removed code from ib_uverbs_write() that was modified
in for-rc
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix tcf_unbind_filter missing in cls_matchall as this will trigger
WARN_ON() in cbq_destroy_class().
Fixes: fd62d9f5c5 ("net/sched: matchall: Fix configuration race")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a warning reported by the kbuild test robot (from linux-next
tree):
net/netfilter/nft_tproxy.c: In function 'nft_tproxy_eval_v6':
>> net/netfilter/nft_tproxy.c:85:9: warning: missing braces around initializer [-Wmissing-braces]
struct in6_addr taddr = {0};
^
net/netfilter/nft_tproxy.c:85:9: warning: (near initialization for 'taddr.in6_u') [-Wmissing-braces]
This warning is actually caused by a gcc bug already resolved in newer
versions (kbuild used 4.9) so this kind of initialization is omitted and
memset is used instead.
Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>