IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In the quest to remove all stack VLA usage removed from the kernel[1],
just use XFRM_MAX_DEPTH as already done for the "class" array. In one
case, it'll do this loop up to 5, the other caller up to 6.
[1] https://lkml.org/lkml/2018/3/7/621
Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The addrconf_ifdown() evaluates keep_addr_on_down state twice. There
is no need to do it.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
ECMP (equal-cost multipath) hashes are typically computed on the packets'
5-tuple(src IP, dst IP, src port, dst port, L4 proto).
For encapsulated packets, the L4 data is not readily available and ECMP
hashing will often revert to (src IP, dst IP). This will lead to traffic
polarization on a single ECMP path, causing congestion and waste of network
capacity.
In IPv6, the 20-bit flow label field is also used as part of the ECMP hash.
In the lack of L4 data, the hashing will be on (src IP, dst IP, flow
label). Having a non-zero flow label is thus important for proper traffic
load balancing when L4 data is unavailable (i.e., when packets are
encapsulated).
Currently, the seg6_do_srh_encap() function extracts the original packet's
flow label and set it as the outer IPv6 flow label. There are two issues
with this behaviour:
a) There is no guarantee that the inner flow label is set by the source.
b) If the original packet is not IPv6, the flow label will be set to
zero (e.g., IPv4 or L2 encap).
This patch adds a function, named seg6_make_flowlabel(), that computes a
flow label from a given skb. It supports IPv6, IPv4 and L2 payloads, and
leverages the per namespace 'seg6_flowlabel" sysctl value.
The currently support behaviours are as follows:
-1 set flowlabel to zero.
0 copy flowlabel from Inner paceket in case of Inner IPv6
(Set flowlabel to 0 in case IPv4/L2)
1 Compute the flowlabel using seg6_make_flowlabel()
This patch has been tested for IPv6, IPv4, and L2 traffic.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt6_remove_exception_rt() is called under rcu_read_lock() only.
We lock rt6_exception_lock a bit later, so we do not hold
rt6_exception_lock yet.
Fixes: 8a14e46f1402 ("net/ipv6: Fix missing rcu dereferences on from")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the check_target, ip6t_get_target is called twice.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)
Currently DNAT only works for single port or identical port ranges. (i.e.
ports 5000-5100 on WAN interface redirected to a LAN host while original
destination port is not altered) When different port ranges are configured,
either 'random' mode should be used, or else all incoming connections are
mapped onto the first port in the redirect range. (in described example
WAN:5000-5100 will all be mapped to 192.168.1.5:2000)
This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).
In described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value 4 which means that the
NAT'ed stream will be using destination port 2004.
Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
51xx)
This patch does not change any current behavior. It just adds new NAT proto
range functionality which must be selected via the specific flag when intended
to use.
A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reduces duplication of .gc and .params in flowtable type definitions and
makes the API clearer
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Useful as preparation for adding iptables support for offload.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:
1) Fix SIP conntrack with phones sending session descriptions for different
media types but same port numbers, from Florian Westphal.
2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian
Anastasov.
3) Skip compat array allocation in ebtables if there is no entries, also
from Florian.
4) Do not lose left/right bits when shifting marks from xt_connmark, from
Jack Ma.
5) Silence false positive memleak in conntrack extensions, from Cong Wang.
6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann.
7) Cannot kfree rule that is already in list in nf_tables, switch order
so this error handling is not required, from Florian Westphal.
8) Release set name in error path, from Florian.
9) include kmemleak.h in nf_conntrack_extend.c, from Stepheh Rothwell.
10) NAT chain and extensions depend on NF_TABLES.
11) Out of bound access when renaming chains, from Taehee Yoo.
12) Incorrect casting in xt_connmark leads to wrong bitshifting.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
kbuild test robot reported 2 uses of rt->from not properly accessed
using rcu_dereference:
1. add rcu_dereference_protected to rt6_remove_exception_rt and make
sure it is always called with rcu lock held.
2. change rt6_do_redirect to take a reference on 'from' when accessed
the first time so it can be used the sceond time outside of the lock
Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN reported use of uninit-value that I tracked to lack
of proper size check on RTA_TABLE attribute.
I also believe RTA_PREFSRC lacks a similar check.
Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config")
Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan reported an imbalance in fib6_check on use of f6i and checking
whether it is null. Since fib6_check is only called if f6i is non-null,
remove the unnecessary check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a dst entry is created from a fib entry, the 'from' in rt6_info
is set to the fib entry. The 'from' reference is used most notably for
cookie checking - making sure stale dst entries are updated if the
fib entry is changed.
When a fib entry is deleted, the pcpu routes on it are walked releasing
the fib6_info reference. This is needed for the fib6_info cleanup to
happen and to make sure all device references are released in a timely
manner.
There is a race window when a FIB entry is deleted and the 'from' on the
pcpu route is dropped and the pcpu route hits a cookie check. Handle
this race using rcu on from.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A later patch protects 'from' in rt6_info and this simplifies the
locking needed by it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A later patch protects 'from' in rt6_info and this simplifies the
locking needed by it.
With the move, the fib6_info_hold for the uncached_rt is no longer
needed since the rcu_lock is still held.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt6_get_cookie_safe takes a fib6_info and checks the sernum of
the node. Update the name to reflect its purpose.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt6_clean_expires and rt6_set_expires are no longer used. Removed them.
rt6_update_expires has 1 caller in route.c, so move it from the header.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduces the number of cache lines touched in the offload forwarding
path. This is safe because PMTU limits are bypassed for the forwarding
path (see commit f87c10a8aa1e for more details).
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just like ip_dst_mtu_maybe_forward(), to avoid a dependency with ipv6.ko.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so
the gfp argument to addrconf_prefix_route can not be GFP_KERNEL.
While scrubbing other calls I noticed addrconf_addr_gen has one
place with GFP_ATOMIC that can be GFP_KERNEL.
Fixes: acb54e3cba404 ("net/ipv6: Add gfp_flags to route add functions")
Reported-by: syzbot+2add39b05179b31f912f@syzkaller.appspotmail.com
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib6_idev can be obtained from __in6_dev_get on the nexthop device
rather than caching it in the fib6_info. Remove it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.
Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change a couple of device checks during route lookups
are no longer needed. Remove them.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aca_idev has only 1 user - inet6_fill_ifacaddr - and it only
wants the device index which can be extracted from the fib6_info
nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf_dst_alloc now returns a fib6_info. Update the name
and its users to reflect the change.
Rename only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.
Rename only; not functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move these options inside the scope of the 'if' NF_TABLES and
NF_TABLES_IPV6 dependencies. This patch fixes:
net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_nat_do_chain':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:37: undefined reference to `nft_do_chain'
net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_exit':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:94: undefined reference to `nft_unregister_chain_type'
net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_init':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:87: undefined reference to `nft_register_chain_type'
that happens with:
CONFIG_NF_TABLES=m
CONFIG_NFT_CHAIN_NAT_IPV6=y
Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.
It complains because of following chains :
1) sch_direct_xmit() (lock txq->_xmit_lock)
dev_hard_start_xmit()
xmit_one()
dev_queue_xmit_nit()
packet_rcv_fanout()
ip_check_defrag()
ip_defrag()
spin_lock() (lock frag queue spinlock)
2) ip6_input_finish()
ipv6_frag_rcv() (lock frag queue spinlock)
ip6_frag_queue()
icmpv6_param_prob() (lock txq->_xmit_lock at some point)
We could add lockdep annotations, but we also can make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes frag queue spinlock a leaf in lock hierarchy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop unneeded elements from rt6_info struct and rearrange layout to
something more relevant for the data path.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert all code paths referencing a FIB entry from
rt6_info to fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last step before flipping the data type for FIB entries:
- use fib6_info_alloc to create FIB entries in ip6_route_info_create
and addrconf_dst_alloc
- use fib6_info_release in place of dst_release, ip6_rt_put and
rt6_release
- remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
- when purging routes, drop per-cpu routes
- replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
- use rt->from since it points to the FIB entry
- drop references to exception bucket, fib6_metrics and per-cpu from
dst entries (those are relevant for fib entries only)
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add fib6_info struct and alloc, destroy, hold and release helpers.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 FIB will only contain FIB entries with exception routes added to
the FIB entry. Once this transformation is complete, FIB lookups will
return a fib6_info with the lookup functions still returning a dst
based rt6_info. The current code uses rt6_info for both paths and
overloads the rt6_info variable usually called 'rt'.
This patch introduces a new 'f6i' variable name for the result of the FIB
lookup and keeps 'rt' as the dst based return variable. 'f6i' becomes a
fib6_info in a later patch which is why it is introduced as f6i now;
avoids the additional churn in the later patch.
In addition, remove RTF_CACHE and dst checks from fib6 add and delete
since they can not happen now and will never happen after the data
type flip.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most FIB entries can be added using memory allocated with GFP_KERNEL.
Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that
can be reached from the packet path (e.g., ndisc and autoconfig) or
atomic notifiers use GFP_ATOMIC; paths from user context (adding
addresses and routes) use GFP_KERNEL.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The router discovery code has a FIB entry and wants to validate the
gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
for IPv6 and create a new function that takes the gateway and device
and returns a neighbor entry. Use the new function in
ndisc_router_discovery to validate the gateway.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Continuing to wean FIB paths off of dst_entry, use a bool to hold
requests for certain dst settings. Add a helper to convert the
flags to DST flags when a FIB entry is converted to a dst_entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_pol_route_lookup is the lookup function for ip6_route_lookup and
rt6_lookup. At the moment it returns either a reference to a FIB entry
or a cached exception. To move FIB entries to a separate struct, this
lookup function needs to convert FIB entries to an rt6_info that is
returned to the caller.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_null_entry will stay a dst based return for lookups that fail to
match an entry.
Add a new fib6_null_entry which constitutes the root node and leafs
for fibs. Replace existing references to ip6_null_entry with the
new fib6_null_entry when dealing with FIBs.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add expires to rt6_info for FIB entries, and add fib6 helpers to
manage it. Data path use of dst.expires remains.
The transition is fairly straightforward: when working with fib entries,
rt->dst.expires is just rt->expires, rt6_clean_expires is replaced with
fib6_clean_expires, rt6_set_expires becomes fib6_set_expires, and
rt6_check_expired becomes fib6_check_expired, where the fib6 versions
are added by this patch.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to IPv4, add fib metrics to the fib struct, which at the moment
is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics
into dst by reference using refcount.
To make the transition:
- add dst_metrics to rt6_info. Default to dst_default_metrics if no
metrics are passed during route add. No need for a separate pmtu
entry; it can reference the MTU slot in fib6_metrics
- ip6_convert_metrics allocates memory in the FIB entry and uses
ip_metrics_convert to copy from netlink attribute to metrics entry
- the convert metrics call is done in ip6_route_info_create simplifying
the route add path
+ fib6_commit_metrics and fib6_copy_metrics and the temporary
mx6_config are no longer needed
- add fib6_metric_set helper to change the value of a metric in the
fib entry since dst_metric_set can no longer be used
- cow_metrics for IPv6 can drop to dst_cow_metrics_generic
- rt6_dst_from_metrics_check is no longer needed
- rt6_fill_node needs the FIB entry and dst as separate arguments to
keep compatibility with existing output. Current dst address is
renamed to dest.
(to be consistent with IPv4 rt6_fill_node really should be split
into 2 functions similar to fib_dump_info and rt_fill_info)
- rt6_fill_node no longer needs the temporary metrics variable
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defer setting dst input, output and error until fib entry is copied.
The reject path from ip6_route_info_create is moved to a new function
ip6_rt_init_dst_reject with a helper doing the conversion from fib6_type
to dst error.
The remainder of the new ip6_rt_init_dst is an amalgamtion of dst code
from addrconf_dst_alloc and the non-reject path of ip6_route_info_create.
The dst output function is always ip6_output and the input function is
either ip6_input (local routes), ip6_mc_input (multicast routes) or
ip6_forward (anything else).
A couple of places using dst.error are updated to look at rt6i_flags.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce fib6_nh structure and move nexthop related data from
rt6_info and rt6_info.dst to fib6_nh. References to dev, gateway or
lwtstate from a FIB lookup perspective are converted to use fib6_nh;
datapath references to dst version are left as is.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTN_ type for IPv6 FIB entries is currently embedded in rt6i_flags
and dst.error. Since dst is going to be removed, it can no longer be
relied on for FIB dumps so save the route type as fib6_type.
fc_type is set in current users based on the algorithm in rt6_fill_node:
- rt6i_flags contains RTF_LOCAL: fc_type = RTN_LOCAL
- rt6i_flags contains RTF_ANYCAST: fc_type = RTN_ANYCAST
- else fc_type = RTN_UNICAST
Similarly, fib6_type is set in the rt6_info templates based on the
RTF_REJECT section of rt6_fill_node converting dst.error to RTN type.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass network namespace reference into route add, delete and get
functions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>