IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Direct call of tcp_set_keepalive() function from protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. And newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.
The icmp_global_allow() call states it must be called with BH
disabled. This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock. Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch split the global and per (inet)peer ICMP-reply limiter
code, and moves the global limit check to earlier in the packet
processing path. Thus, avoid spending cycles on ICMP replies that
gets limited/suppressed anyhow.
The global ICMP rate limiter icmp_global_allow() is a good solution,
it just happens too late in the process. The kernel goes through the
full route lookup (return path) for the ICMP message, before taking
the rate limit decision of not sending the ICMP reply.
Details: The kernels global rate limiter for ICMP messages got added
in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is
a token bucket limiter with a global lock. It brilliantly avoids
locking congestion by only updating when 20ms (HZ/50) were elapsed. It
can then avoids taking lock when credit is exhausted (when under
pressure) and time constraint for refill is not yet meet.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.
The speedup should be similar -- and even more impressive -- to the
speedup from the sequence number fix in this series.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, RST packets generated by the TCP stack for non-existing
sockets always have mark 0.
The mark from the original packet is assigned to the netns_ipv4/6
socket used to send the response so that it can get copied into the
response skb when the socket sends it.
Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.
Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In matches and targets that define a kernel-only tail to their
xt_match and xt_target data structs, add a field .usersize that
specifies up to where data is to be shared with userspace.
Performed a search for comment "Used internally by the kernel" to find
relevant matches and targets. Manually inspected the structs to derive
a valid offsetof.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Convert ip6tables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
vti6 interface is registered before the rtnl_link_ops block
is attached. As a result the resulting RTM_NEWLINK is missing
IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix.
Signed-off-by: Dave Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTM_NEWADDR notification is sent when IFA_F_TENTATIVE is cleared from
the address. So if the address is added and deleted before DAD probes
completes, the RTM_DELADDR will be sent for which there was no
RTM_NEWADDR causing asymmetry in notification. However if the same
logic is used while sending RTM_DELADDR notification, this asymmetry
can be avoided.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
does not match it. e.g.,
ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument
Relax the check in setsockopt to allow setting mc_index to an L3 slave if
sk_bound_dev_if points to an L3 master.
Make a similar change for IPv6. In this case change the device lookup to
take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
the lookup of a potential L3 master device.
This really only silences a setsockopt failure since uses of mc_index are
secondary to sk_bound_dev_if if it is set. In both cases, if either index
is an L3 slave or master, lookups are directed to the same FIB table so
relaxing the check at setsockopt time causes no harm.
Patch is based on a suggested change by Darwin for a problem noted in
their code base.
Suggested-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
np is already assigned in the variable declaration of ping_v6_sendmsg.
At this point, we have already dereferenced np several times, so the
NULL check is also redundant.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an inconsistent conditional judgement between __ip6_append_data
and ip6_finish_output functions, the variable length in __ip6_append_data
just include the length of application's payload and udp6 header, don't
include the length of ipv6 header, but in ip6_finish_output use
(skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ipv6 header.
That causes some particular application's udp6 payloads whose length are
between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
though the rst->dev support UFO feature.
Add the length of ipv6 header to length in __ip6_append_data to keep
consistent conditional judgement as ip6_finish_output for ip6 fragment.
Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different namespace application might require fast recycling
TIME-WAIT sockets independently of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No point in going through loops and hoops instead of just comparing the
values.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).
This can be verified using `ip -6 route del <prefix> proto something`.
Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A user may call listen with binding an explicit port with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent a creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unreleated port
number that was already created with soreuseport.
This patch adds a boolean parameter to inet_bind_conflict that indicates
rather soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if port scan is done.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains a large Netfilter update for net-next,
to summarise:
1) Add support for stateful objects. This series provides a nf_tables
native alternative to the extended accounting infrastructure for
nf_tables. Two initial stateful objects are supported: counters and
quotas. Objects are identified by a user-defined name, you can fetch
and reset them anytime. You can also use a maps to allow fast lookups
using any arbitrary key combination. More info at:
http://marc.info/?l=netfilter-devel&m=148029128323837&w=2
2) On-demand registration of nf_conntrack and defrag hooks per netns.
Register nf_conntrack hooks if we have a stateful ruleset, ie.
state-based filtering or NAT. The new nf_conntrack_default_on sysctl
enables this from newly created netnamespaces. Default behaviour is not
modified. Patches from Florian Westphal.
3) Allocate 4k chunks and then use these for x_tables counter allocation
requests, this improves ruleset load time and also datapath ruleset
evaluation, patches from Florian Westphal.
4) Add support for ebpf to the existing x_tables bpf extension.
From Willem de Bruijn.
5) Update layer 4 checksum if any of the pseudoheader fields is updated.
This provides a limited form of 1:1 stateless NAT that make sense in
specific scenario, eg. load balancing.
6) Add support to flush sets in nf_tables. This series comes with a new
set->ops->deactivate_one() indirection given that we have to walk
over the list of set elements, then deactivate them one by one.
The existing set->ops->deactivate() performs an element lookup that
we don't need.
7) Two patches to avoid cloning packets, thus speed up packet forwarding
via nft_fwd from ingress. From Florian Westphal.
8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
prevent infinite loops, patch from Dwip Banerjee. And one minor
refactoring from Gao feng.
9) Revisit recent log support for nf_tables netdev families: One patch
to ensure that we correctly handle non-ethernet packets. Another
patch to add missing logger definition for netdev. Patches from
Liping Zhang.
10) Three patches for nft_fib, one to address insufficient register
initialization and another to solve incorrect (although harmless)
byteswap operation. Moreover update xt_rpfilter and nft_fib to match
lbcast packets with zeronet as source, eg. DHCP Discover packets
(0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.
11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
been broken in many-cast mode for some little time, let's give them a
chance by placing them at the same level as other existing protocols.
Thus, users don't explicitly have to modprobe support for this and
NAT rules work for them. Some people point to the lack of support in
SOHO Linux-based routers that make deployment of new protocols harder.
I guess other middleboxes outthere on the Internet are also to blame.
Anyway, let's see if this has any impact in the midrun.
12) Skip software SCTP software checksum calculation if the NIC comes
with SCTP checksum offload support. From Davide Caratti.
13) Initial core factoring to prepare conversion to hook array. Three
patches from Aaron Conole.
14) Gao Feng made a wrong conversion to switch in the xt_multiport
extension in a patch coming in the previous batch. Fix it in this
batch.
15) Get vmalloc call in sync with kmalloc flags to avoid a warning
and likely OOM killer intervention from x_tables. From Marcelo
Ricardo Leitner.
16) Update Arturo Borrero's email address in all source code headers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Acctually ntohl and htonl are identical, so this doesn't affect
anything, but it is conceptually wrong.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.
This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.
As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Keeps some noise away from a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On SMP we overload the packet counter (unsigned long) to contain
percpu offset. Hide this from callers and pass xt_counters address
instead.
Preparation patch to allocate the percpu counters in page-sized batch
chunks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_defrag modules for ipv4 and ipv6 export an empty stub function.
Any module that needs the defragmentation hooks registered simply 'calls'
this empty function to create a phony module dependency -- modprobe will
then load the defrag module too.
This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook
registration until the functionality is requested within a network namespace
instead of module load time for all namespaces.
Hooks are only un-registered on module unload or when a namespace that used
such defrag functionality exits.
We have to use struct net for this as the register hooks can be called
before netns initialization here from the ipv4/ipv6 conntrack module
init path.
There is no unregister functionality support, defrag will always be
active once it was requested inside a net namespace.
The reason is that defrag has impact on nft and iptables rulesets
(without defrag we might see framents).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
It is possible to configure IP interfaces with IPv4-mapped addresses, and
one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
to this fix the kernel returned an EINVAL when attempting to add an IPv6
route with an IPv4-mapped address as a nexthop/gateway.
RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
thus in order to support that type of address configuration the kernel
needs to allow IPv4-mapped addresses as nexthops.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tsq_flags being in the same cache line than sk_wmem_alloc
makes a lot of sense. Both fields are changed from tcp_wfree()
and more generally by various TSQ related functions.
Prior patch made room in struct sock and added sk_tsq_flags,
this patch deletes tsq_flags from struct tcp_sock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes use of nf_ct_netns_get/put added in previous patch.
We add get/put functions to nf_conntrack_l3proto structure, ipv4 and ipv6
then implement use-count to track how many users (nft or xtables modules)
have a dependency on ipv4 and/or ipv6 connection tracking functionality.
When count reaches zero, the hooks are unregistered.
This delays activation of connection tracking inside a namespace until
stateful firewall rule or nat rule gets added.
This patch breaks backwards compatibility in the sense that connection
tracking won't be active anymore when the protocol tracker module is
loaded. This breaks e.g. setups that ctnetlink for flow accounting and
the like, without any '-m conntrack' packet filter rules.
Followup patch restores old behavour and makes new delayed scheme
optional via sysctl.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
so that conntrack core will add the needed hooks in this namespace.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
currently aliased to try_module_get/_put.
Will be changed in next patch when we add functions to make use of ->net
argument to store usercount per l3proto tracker.
This is needed to avoid registering the conntrack hooks in all netns and
later only enable connection tracking in those that need conntrack.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_NF_CT_PROTO_UDPLITE is no more a tristate. When set to y,
connection tracking support for UDPlite protocol is built-in into
nf_conntrack.ko.
footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko
(builtin)|| udplite| ipv4 | ipv6 |nf_conntrack
---------++--------+--------+--------+--------------
none || 432538 | 828755 | 828676 | 6141434
UDPlite || - | 829649 | 829362 | 6498204
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_NF_CT_PROTO_SCTP is no more a tristate. When set to y, connection
tracking support for SCTP protocol is built-in into nf_conntrack.ko.
footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko
(builtin)|| sctp | ipv4 | ipv6 | nf_conntrack
---------++--------+--------+--------+--------------
none || 498243 | 828755 | 828676 | 6141434
SCTP || - | 829254 | 829175 | 6547872
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection
tracking support for DCCP protocol is built-in into nf_conntrack.ko.
footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
net/ipv4/netfilter/nf_conntrack_ipv4.ko \
net/ipv6/netfilter/nf_conntrack_ipv6.ko
(builtin)|| dccp | ipv4 | ipv6 | nf_conntrack
---------++--------+--------+--------+--------------
none || 469140 | 828755 | 828676 | 6141434
DCCP || - | 830566 | 829935 | 6533526
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The email address has changed, let's update the copyright statements.
Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Couple conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.
This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.
commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).
So only two items are left:
- add a tsoffset for request sockets
- extend the tcp isn generator to also return another 32bit number
in addition to the ISN.
Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.
Includes fixes from Eric Dumazet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When xfrm is applied to TSO/GSO packets, it follows this path:
xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
where skb_gso_segment() relies on skb->protocol to function properly.
This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2016-12-01
1) Change the error value when someone tries to run 32bit
userspace on a 64bit host from -ENOTSUPP to the userspace
exported -EOPNOTSUPP. Fix from Yi Zhao.
2) On inbound, ESN sequence numbers are already in network
byte order. So don't try to convert it again, this fixes
integrity verification for ESN. Fixes from Tobias Brunner.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is a large batch of Netfilter fixes for net, they are:
1) Three patches to fix NAT conversion to rhashtable: Switch to rhlist
structure that allows to have several objects with the same key.
Moreover, fix wrong comparison logic in nf_nat_bysource_cmp() as this is
expecting a return value similar to memcmp(). Change location of
the nat_bysource field in the nf_conn structure to avoid zeroing
this as it breaks interaction with SLAB_DESTROY_BY_RCU and lead us
to crashes. From Florian Westphal.
2) Don't allow malformed fragments go through in IPv6, drop them,
otherwise we hit GPF, patch from Florian Westphal.
3) Fix crash if attributes are missing in nft_range, from Liping Zhang.
4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia.
5) Two patches from David Ahern to fix netfilter interaction with vrf.
From David Ahern.
6) Fix element timeout calculation in nf_tables, we take milliseconds
from userspace, but we use jiffies from kernelspace. Patch from
Anders K. Pedersen.
7) Missing validation length netlink attribute for nft_hash, from
Laura Garcia.
8) Fix nf_conntrack_helper documentation, we don't default to off
anymore for a bit of time so let's get this in sync with the code.
I know is late but I think these are important, specifically the NAT
bits, as they are mostly addressing fallout from recent changes. I also
read there are chances to have -rc8, if that is the case, that would
also give us a bit more time to test this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.
This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.
Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>