IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We can simply use the %pISc format specifier that was recently added
and thus remove some code that distinguishes between IPv4 and IPv6.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_probe currently only supports analysis of IPv4 connections.
Therefore, it would be nice to have IPv6 supported as well. Since we
have the recently added %pISpc specifier that is IPv4/IPv6 generic,
build related sockaddress structures from the flow information and
pass this to our format string. Tested with SSH and HTTP sessions
on IPv4 and IPv6.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches fixes a rather unproblematic function signature mismatch
as the const specifier was missing for the th variable; and next to
that it adds a build-time assertion so that future function signature
mismatches for kprobes will not end badly, similarly as commit 22222997
("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack")
did it for SCTP.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is helpful to sometimes know the TCP window sizes of an established
socket e.g. to confirm that window scaling is working or to tweak the
window size to improve high-latency connections, etc etc. Currently the
TCP snooper only exports the send window size, but not the receive window
size. Therefore, also add the receive window size to the end of the
output line.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Some constifications, from Mathias Krause.
2) Catch bugs if a hold timer is still active when xfrm_policy_destroy()
is called, from Fan Du.
3) Remove a redundant address family checking, from Fan Du.
4) Make xfrm_state timer monotonic to be independent of system clock changes,
from Fan Du.
5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(),
from Rami Rosen.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The stack currently detects reordering and avoid spurious
retransmission very well. However the throughput is sub-optimal under
high reordering because cwnd is increased only if the data is deliverd
in order. I.e., FLAG_DATA_ACKED check in tcp_ack(). The more packet
are reordered the worse the throughput is.
Therefore when reordering is proven high, cwnd should advance whenever
the data is delivered regardless of its ordering. If reordering is low,
conservatively advance cwnd only on ordered deliveries in Open state,
and retain cwnd in Disordered state (RFC5681).
Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms
to 55ms (for reordering effect). This change increases TCP throughput
by 20 - 25% to near bottleneck BW.
A special case is the stretched ACK with new SACK and/or ECE mark.
For example, a receiver may receive an out of order or ECN packet with
unacked data buffered because of LRO or delayed ACK. The principle on
such an ACK is to advance cwnd on the cummulative acked part first,
then reduce cwnd in tcp_fastretrans_alert().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of hard-coding length values, use a define to make it clear
where those lengths come from.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the functions mld_gq_start_timer(), mld_ifc_start_timer(),
and mld_dad_start_timer(), rather use unsigned long than int
as we operate only on unsigned values anyway. This seems more
appropriate as there is no good reason to do type conversions
to int, that could lead to future errors.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proper API functions to calculate jiffies from milliseconds and
not the crude method of dividing HZ by a value. This ensures more
accurate values even in the case of strange HZ values. While at it,
also simplify code in the mlh2 case by using max().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an Xin6 tunnel is set up, we check other netdevices to inherit the link-
local address. If none is available, the interface will not have any link-local
address. RFC4862 expects that each interface has a link local address.
Now than this kind of tunnels supports x-netns, it's easy to fall in this case
(by creating the tunnel in a netns where ethernet interfaces stand and then
moving it to a other netns where no ethernet interface is available).
RFC4291, Appendix A suggests two methods: the first is the one currently
implemented, the second is to generate a unique identifier, so that we can
always generate the link-local address. Let's use eth_random_addr() to generate
this interface indentifier.
I remove completly the previous method, hence for the whole life of the
interface, the link-local address remains the same (previously, it depends on
which ethernet interfaces were up when the tunnel interface was set up).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit df8372ca74.
These changes are buggy and make unintended semantic changes
to ip6_tnl_add_linklocal().
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to move the derefernce after the IS_ERR() check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discussed last year [1], there is no compelling reason
to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF
[1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2
Willem raised this issue again because some of our internal
regression tests broke after lo mtu being set to 65536.
IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of
mtu + 1 bytes, expecting the send() to fail, but it does not.
Alexey raised interesting points about TCP MSS, that should be addressed
in follow-up patches in TCP stack if needed, as someone could also set
an odd mtu anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nocache-argument was used in tcp_v4_send_synack as an argument to
inet_csk_route_req. However, since ba3f7f04ef (ipv4: Kill
FLOWI_FLAG_RT_NOCACHE and associated code.) this is no more used.
This patch removes the unsued argument from tcp_v4_send_synack.
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: code indent should use tabs where possible: fix 2.
ERROR: do not use assignment in if condition: fix 5.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just follow the Joe Perches's opinion, it is a better way to fix the
style errors.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/netfilter/nf_conntrack_proto_tcp.c
The conflict had to do with overlapping changes dealing with
fixing the use of an "s32" to hold the value returned by
NAT_OFFSET().
Pablo Neira Ayuso says:
====================
The following batch contains Netfilter/IPVS updates for your net-next tree.
More specifically, they are:
* Trivial typo fix in xt_addrtype, from Phil Oester.
* Remove net_ratelimit in the conntrack logging for consistency with other
logging subsystem, from Patrick McHardy.
* Remove unneeded includes from the recently added xt_connlabel support, from
Florian Westphal.
* Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
this, from Florian Westphal.
* Remove tproxy core, now that we have socket early demux, from Florian
Westphal.
* A couple of patches to refactor conntrack event reporting to save a good
bunch of lines, from Florian Westphal.
* Fix missing locking in NAT sequence adjustment, it did not manifested in
any known bug so far, from Patrick McHardy.
* Change sequence number adjustment variable to 32 bits, to delay the
possible early overflow in long standing connections, also from Patrick.
* Comestic cleanups for IPVS, from Dragos Foianu.
* Fix possible null dereference in IPVS in the SH scheduler, from Daniel
Borkmann.
* Allow to attach conntrack expectations via nfqueue. Before this patch, you
had to use ctnetlink instead, thus, we save the conntrack lookup.
* Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle context based address when an unspecified address is given.
For other context based address we print a warning and drop the packet
because we don't support it right now.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch drops the pre and postcount calculation from the
lowpan_uncompress_addr function.We use instead a switch/case
over address_mode value. The original implementation has several
bugs in this function and it was hard to decrypt how it works.
To make it maintainable and fix these bugs this patch basically
reimplements lowpan_uncompress_addr from scratch.
A list of bugs we found in the current implementation:
1) Properly support uncompression of short-address based IPv6 addresses
(instead of basically copying garbage)
2) Fix use and uncompression of long-addresses based IPv6 addresses
3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add function to uncompress multicast address.
This function split the uncompress function for a multicast address
in a seperate function.
To uncompress a multicast address is different than a other
non-multicasts addresses according to rfc6282.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a helper function to parse the ipv6 header to a
6lowpan header in stream.
This function checks first if we can pull data with a specific
length from a skb. If this seems to be okay, we copy skb data to
a destination pointer and run skb_pull.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new 6lowpan fragment is received, a skbuff is allocated for
the reassembled packet. However when a 6lowpan packet compresses
link-local addresses based on link-layer addresses, the processing
function relies on the skb mac control block to find the related
link-layer address.
This patch copies the control block from the first fragment into
the newly allocated skb to keep a trace of the link-layer addresses
in case of a link-local compressed address.
Edit: small changes on comment issue
Signed-off-by: David Hauweele <david@hauweele.net>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch simplify the handling to set fields inside of struct ipv6hdr
to zero. Instead of setting some memory regions with memset to zero we
initialize the whole ipv6hdr to zero.
This is a simplification for parsing the 6lowpan header for the upcomming
patches.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch adds vxlan vport type for openvswitch using
vxlan api. So now there is vxlan dependency for openvswitch.
CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a comment in xfrm_input() which became irrelevant
due to commit 2774c13, "xfrm: Handle blackhole route creation via afinfo".
That commit removed returning -EREMOTE in the xfrm_lookup() method when the
packet should be discarded and also removed the correspoinding -EREMOTE
handlers. This was replaced by calling the make_blackhole() method. Therefore
the comment about -EREMOTE is not relevant anymore.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull networking fixes from David Miller:
1) Fix SKB leak in 8139cp, from Dave Jones.
2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar.
3) RCU conversion of macvtap introduced two races, fixes by Eric
Dumazet
4) Synchronize statistic flows in bnx2x driver to prevent corruption,
from Dmitry Kravkov
5) Undo optimization in IP tunneling, we were using the inner IP header
in some cases to inherit the IP ID, but that isn't correct in some
circumstances. From Pravin B Shelar
6) Use correct struct size when parsing netlink attributes in
rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen
7) Length verifications in tun_get_user() are bogus, from Weiping Pan
and Dan Carpenter
8) Fix bad merge resolution during 3.11 networking development in
openvswitch, albeit a harmless one which added some unreachable
code. From Jesse Gross
9) Wrong size used in flexible array allocation in openvswitch, from
Pravin B Shelar
10) Clear out firmware capability flags the be2net driver isn't ready to
handle yet, from Sarveshwar Bandi
11) Revert DMA mapping error checking addition to cxgb3 driver, it's
buggy. From Alexey Kardashevskiy
12) Fix regression in packet scheduler rate limiting when working with a
link layer of ATM. From Jesper Dangaard Brouer
13) Fix several errors in TCP Cubic congestion control, in particular
overflow errors in timestamp calculations. From Eric Dumazet and
Van Jacobson
14) In ipv6 routing lookups, we need to backtrack if subtree traversal
don't result in a match. From Hannes Frederic Sowa
15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs
16) Get "low latency" out of the new MIB counter names. From Eliezer
Tamir
17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar
Samudrala
18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung
Cheng
19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean
20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong
21) call_rcu() call to destroy SCTP transport is done too early and
might result in an oops. From Daniel Borkmann
22) Fix races in genetlink family dumps, from Johannes Berg
23) Flags passed into macvlan by the user need to be validated properly,
from Michael S Tsirkin
24) Fix skge build on 32-bit, from Stephen Hemminger
25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira
Ayuso
26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay
Aleksandrov
27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel
Borkmann
28) neigh_parms need to be setup before calling the ->ndo_neigh_setup()
method. From Veaceslav Falico
29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet
30) Don't dereference MLD query message if the length isn't value in the
bridge multicast code, from Linus Lüssing
31) Fix VXLAN IGMP join regression due to an inverted check, from Cong
Wang
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes
tun: signedness bug in tun_get_user()
qlcnic: Fix diagnostic interrupt test for 83xx adapters
qlcnic: Fix beacon state return status handling
qlcnic: Fix set driver version command
net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset
net_sched: restore "linklayer atm" handling
drivers/net/ethernet/via/via-velocity.c: update napi implementation
Revert "cxgb3: Check and handle the dma mapping errors"
be2net: Clear any capability flags that driver is not interested in.
openvswitch: Reset tunnel key between input and output.
openvswitch: Use correct type while allocating flex array.
openvswitch: Fix bad merge resolution.
tun: compare with 0 instead of total_len
rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header
ethernet/arc/arc_emac - fix NAPI "work > weight" warning
ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
bnx2x: prevent crash in shutdown flow with CNIC
bnx2x: fix PTE write access error
bnx2x: fix memory leak in VF
...
xfrm_state timer should be independent of system clock change,
so switch to CLOCK_BOOTTIME base which is not only monotonic but
also counting suspend time.
Thus issue reported in commit: 9e0d57fd6d
("xfrm: SAD entries do not expire correctly after suspend-resume")
could ALSO be avoided.
v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Following patch stores struct netlink_callback in netlink_sock
to avoid allocating and freeing it on every netlink dump msg.
Only one dump operation is allowed for a given socket at a time
therefore we can safely convert cb pointer to cb struct inside
netlink_sock.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UIDs are printed in the proc_fs as signed int, whereas
they are unsigned int.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.
tc class add ... htb rate X ceil Y linklayer atm
The linklayer setting is implemented by modifying the rate table
which is send to the kernel. No direct parameter were
transferred to the kernel indicating the linklayer setting.
The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.
To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table. It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.
Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect. Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
Three bug fixes that are fairly small either way but resolve obviously
incorrect code. For net/3.11.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We know that "dev" is a valid pointer at this point, so we can remove
the test and clean up a little.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's better to use available helpers for these tests.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans()
set pkt_type.
In fact, we should force pkt_type to PACKET_HOST, so move the call after
eth_type_trans().
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't make sense to output a tunnel packet using the same
parameters that it was received with since that will generally
just result in the packet going back to us. As a result, userspace
assumes that the tunnel key is cleared when transitioning through
the switch. In the majority of cases this doesn't matter since a
packet is either going to a tunnel port (in which the key is
overwritten with new values) or to a non-tunnel port (in which
case the key is ignored). However, it's theoreticaly possible that
userspace could rely on the documented behavior, so this corrects
it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flex array is used to allocate hash buckets which is type struct
hlist_head, but we use `struct hlist_head *` to calculate
array size. Since hlist_head is of size pointer it works fine.
Following patch use correct type.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Fix the iproute2 command `bridge vlan show`, after switching from
rtgenmsg to ifinfomsg.
Let's start with a little history:
Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in
the 3.9 merge window.
In the kernel commit 6cbdceeb, he added attribute support to
bridge GETLINK requests sent with rtgenmsg.
Mar 6th: Vlad got this iproute2 reference implementation of the bridge
vlan netlink interface accepted (iproute2 9eff0e5c)
Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
http://patchwork.ozlabs.org/patch/239602/http://marc.info/?t=136680900700007
Apr 28th: Linus released 3.9
Apr 30th: Stephen released iproute2 3.9.0
The `bridge vlan show` command haven't been working since the switch to
ifinfomsg, or in a released version of iproute2. Since the kernel side
only supports rtgenmsg, which iproute2 switched away from just prior to
the iproute2 3.9.0 release.
I haven't been able to find any documentation, about neither rtgenmsg
nor ifinfomsg, and in which situation to use which, but kernel commit
88c5b5ce seams to suggest that ifinfomsg should be used.
Fixing this in kernel will break compatibility, but I doubt that anybody
have been using it due to this bug in the user space reference
implementation, at least not without noticing this bug. That said the
functionality is still fully functional in 3.9, when reversing iproute2
commit 63338dca.
This could also be fixed in iproute2, but thats an ugly patch that would
reintroduce rtgenmsg in iproute2, and from searching in netdev it seams
like rtgenmsg usage is discouraged. I'm assuming that the only reason
that Vlad implemented the kernel side to use rtgenmsg, was because
iproute2 was using it at the time.
Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net>
Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit cab70040df ("net: igmp:
Reduce Unsolicited report interval to 1s when using IGMPv3") and
2690048c01 ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval") by William Manley made
igmp unsolicited report intervals configurable per interface and corrected
the interval of unsolicited igmpv3 report messages resendings to 1s.
Same needs to be done for IPv6:
MLDv1 (RFC2710 7.10.): 10 seconds
MLDv2 (RFC3810 9.11.): 1 second
Both intervals are configurable via new procfs knobs
mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
(also added .force_mld_version to ipv6_devconf_dflt to bring structs in
line without semantic changes)
v2:
a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
unsolicited_report_interval procfs knobs.
b) incorporate stylistic feedback from William Manley
v3:
a) add new DEVCONF_* values to the end of the enum (thanks to David
Miller)
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: William Manley <william.manley@youview.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.
Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)
CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On timeout the TCP sender unconditionally resets the estimated degree
of network reordering (tp->reordering). The idea behind this is that
the estimate is too large to trigger fast recovery (e.g., due to a IP
path change).
But for example if the sender only had 2 packets outstanding, then a
timeout doesn't tell much about reordering. A sender that learns about
reordering on big writes and loses packets on small writes will end up
falsely retransmitting again and again, especially when reordering is
more likely on big writes.
Therefore the sender should only suspect that tp->reordering is too
high if it could have gone into fast recovery with the (lower) default
estimate.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
This is a batch of updates intended for 3.12. It is mostly driver
stuff, although Johannes Berg and Simon Wunderlich make a good
showing with mac80211 bits (particularly some work on 5/10 MHz
channel support).
The usual suspects are mostly represented. There are lots of updates
to iwlwifi, ath9k, ath10k, mwifiex, rt2x00, wil6210, as usual.
The bcma bus gets some love this time, as do cw1200, iwl4965, and a
few other bits here and there. I don't think there is much unusual
here, FWIW.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the capability to attach expectations via nfnetlink_queue.
This is required by conntrack helpers that trigger expectations based on
the first packet seen like the TFTP and the DHCPv6 user-space helpers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch refactors ctnetlink_create_expect by spliting it in two
chunks. As a result, we have a new function ctnetlink_alloc_expect
to allocate and to setup the expectation from ctnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When dumping generic netlink families, only the first dump call
is locked with genl_lock(), which protects the list of families,
and thus subsequent calls can access the data without locking,
racing against family addition/removal. This can cause a crash.
Fix it - the locking needs to be conditional because the first
time around it's already locked.
A similar bug was reported to me on an old kernel (3.4.47) but
the exact scenario that happened there is no longer possible,
on those kernels the first round wasn't locked either. Looking
at the current code I found the race described above, which had
also existed on the old kernel.
Cc: stable@vger.kernel.org
Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Probably this one is quite unlikely to be triggered, but it's more safe
to do the call_rcu() at the end after we have dropped the reference on
the asoc and freed sctp packet chunks. The reason why is because in
sctp_transport_destroy_rcu() the transport is being kfree()'d, and if
we're unlucky enough we could run into corrupted pointers. Probably
that's more of theoretical nature, but it's safer to have this simple fix.
Introduced by commit 8c98653f ("sctp: sctp_close: fix release of bindings
for deferred call_rcu's"). I also did the 8c98653f regression test and
it's fine that way.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>