linux

iv/linux

Author	SHA1	Message	Date
Eric W. Biederman	06198b34a3	netfilter: Pass priv instead of nf_hook_ops to netfilter hooks Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:16 +02:00
Eric W. Biederman	a31f1adc09	netfilter: nf_conntrack: Add a struct net parameter to l4_pkt_to_tuple As gre does not have the srckey in the packet gre_pkt_to_tuple needs to perform a lookup in it's per network namespace tables. Pass in the proper network namespace to all pkt_to_tuple implementations to ensure gre (and any similar protocols) can get this right. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:04 +02:00
Eric W. Biederman	206e8c0075	netfilter: Pass net to nf_dup_ipv4 and nf_dup_ipv6 This allows them to stop guessing the network namespace with pick_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:59:11 +02:00
Eric W. Biederman	686c9b5080	netfilter: x_tables: Use par->net instead of computing from the passed net devices Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:25 +02:00
Eric W. Biederman	156c196f60	netfilter: x_tables: Pass struct net in xt_action_param As xt_action_param lives on the stack this does not bloat any persistent data structures. This is a first step in making netfilter code that needs to know which network namespace it is executing in simpler. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:14 +02:00
Eric W. Biederman	6aa187f21c	netfilter: nf_tables: kill nft_pktinfo.ops - Add nft_pktinfo.pf to replace ops->pf - Add nft_pktinfo.hook to replace ops->hooknum This simplifies the code, makes it more readable, and likely reduces cache line misses. Maintainability is enhanced as the details of nft_hook_ops are of no concern to the recpients of nft_pktinfo. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:01 +02:00
Eric W. Biederman	082a758f04	inet netfilter: Prefer state->hook to ops->hooknum The values of nf_hook_state.hook and nf_hook_ops.hooknum must be the same by definition. We are more likely to access the fields in nf_hook_state over the fields in nf_hook_ops so with a little luck this results in fewer cache line misses, and slightly more consistent code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:57:51 +02:00
Eric W. Biederman	6cb8ff3f1a	inet netfilter: Remove hook from ip6t_do_table, arp_do_table, ipt_do_table The values of ops->hooknum and state->hook are guaraneted to be equal making the hook argument to ip6t_do_table, arp_do_table, and ipt_do_table is unnecessary. Remove the unnecessary hook argument. In the callers use state->hook instead of ops->hooknum for clarity and to reduce the number of cachelines the callers touch. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:57:43 +02:00
Eric Dumazet	c2e7204d18	tcp_cubic: do not set epoch_start in the future Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start is normally set at ACK processing time, not at send time. Doing a proper fix would need to add an additional state variable, and does not seem worth the trouble, given CUBIC bug has been there forever before Jana noticed it. Let's simply not set epoch_start in the future, otherwise bictcp_update() could overflow and CUBIC would again grow cwnd too fast. This was detected thanks to a packetdrill test Neal wrote that was flaky before applying this fix. Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Jana Iyengar <jri@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 22:35:07 -07:00
David Ahern	bde6f9ded1	net: Initialize table in fib result Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard: [ 0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056 [ 0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00 [ 0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 [ 0.877597] Oops: 0000 [#1] SMP [ 0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio [ 0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 [ 0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000 [ 0.877597] RIP: 0010:[<ffffffff8155b5e2>] [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00 [ 0.877597] RSP: 0018:ffff88003ed03ba0 EFLAGS: 00010202 [ 0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020 [ 0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8 [ 0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000 [ 0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00 [ 0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600 [ 0.877597] FS: 00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000 [ 0.877597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0 [ 0.877597] Stack: [ 0.877597] 0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0 [ 0.877597] ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00 [ 0.877597] 0000000000000000 0000000000000046 0000000000000000 0000000400000000 [ 0.877597] Call Trace: [ 0.877597] <IRQ> [ 0.877597] [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40 [ 0.877597] [<ffffffff8158e13c>] arp_process+0x39c/0x690 [ 0.877597] [<ffffffff8158e57e>] arp_rcv+0x13e/0x170 [ 0.877597] [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00 [ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100 [ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100 [ 0.877597] [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70 [ 0.877597] [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90 [ 0.877597] [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0 [ 0.877597] [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net] [ 0.877597] [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net] [ 0.877597] [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0 [ 0.877597] [<ffffffff81053228>] __do_softirq+0x98/0x260 [ 0.877597] [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30 The root cause is use of res.table uninitialized. Thanks to Nikolay for noticing the uninitialized use amongst the maze of gotos. As Nikolay pointed out the second initialization is not required to fix the oops, but rather to fix a related problem where a valid lookup should be invalidated before creating the rth entry. Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable") Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reported-by: Richard Alpe <richard.alpe@ericsson.com> Reported-by: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:34:08 -07:00
Junwei Zhang	f6c53334d6	net: only check perm protocol when register proto The permanent protocol nodes are at the head of the list, So only need check all these nodes. No matter the new node is permanent or not, insert the new node after the last permanent protocol node, If the new node conflicts with existing permanent node, return error. Signed-off-by: Martin Zhang <martinbj2008@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:02:59 -07:00
Eric Dumazet	58d607d3e5	tcp: provide skb->hash to synack packets In commit b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit"), Tom provided a l4 hash to most outgoing TCP packets. We'd like to provide one as well for SYNACK packets, so that all packets of a given flow share same txhash, to later enable bonding driver to also use skb->hash to perform slave selection. Note that a SYNACK retransmit shuffles the tx hash, as Tom did in commit 265f94ff54d62 ("net: Recompute sk_txhash on negative routing advice") for established sockets. This has nice effect making TCP flows resilient to some kind of black holes, even at connection establish phase. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:01:04 -07:00
Eric W. Biederman	be10de0a32	netfilter: Add blank lines in callers of netfilter hooks In code review it was noticed that I had failed to add some blank lines in places where they are customarily used. Taking a second look at the code I have to agree blank lines would be nice so I have added them here. Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	0c4b51f005	netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	9dff2c966a	netfilter: Use nf_hook_state.net Instead of saying "net = dev_net(state->in?state->in:state->out)" just say "state->net". As that information is now availabe, much less confusing and much less error prone. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	29a26a5680	netfilter: Pass struct net into the netfilter hooks Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	f9e4306fd8	arp: Introduce arp_xmit_finish The function dev_queue_xmit_skb_sk is unncessary and very confusing. Introduce arp_xmit_finish to remove the need for dev_queue_xmit_skb_sk, and have arp_xmit_finish call dev_queue_xmit. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:35 -07:00
Eric W. Biederman	758ccac8e7	ipv4: Only compute net once in ipmr_forward_finish Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:34 -07:00
Eric W. Biederman	38184b3b07	ipv4: Only compute net once in ip_rcv_finish Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:34 -07:00
Eric W. Biederman	4ba1bf4292	ipv4: Only compute net once in ip_finish_output2 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:34 -07:00
Eric W. Biederman	9479b0af48	ipv4: Explicitly compute net in ip_fragment Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:34 -07:00
Eric W. Biederman	26a949dbd5	ipv4: Only compute net once in ip_do_fragment Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:34 -07:00
Eric W. Biederman	cc4c851e4b	ipv4: Don't recompute net in ipmr_queue_xmit Calling dev_net(dev) for is just silly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:33 -07:00
Eric W. Biederman	88f5cc2458	ipv4: Remember the net in ip_output and ip_mc_output This is a prepatory patch to passing net int the netfilter hooks, where net will be used again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:33 -07:00
Eric W. Biederman	e707766ce0	ipv4: Compute net once in ip_rcv Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:33 -07:00
Eric W. Biederman	f8e1ac7912	ipv4: Compute net once in ip_forward_finish Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:33 -07:00
Eric W. Biederman	fcad0ac2da	ipv4: Compute net once in ip_forward Compute struct net from the input device in ip_forward before it is used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:33 -07:00
Eric W. Biederman	5a70649e0d	net: Merge dst_output and dst_output_sk Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:32 -07:00
David Ahern	58189ca7b2	net: Fix vti use case with oif in dst lookups Steffen reported that the recent change to add oif to dst lookups breaks the VTI use case. The problem is that with the oif set in the flow struct the comparison to the nh_oif is triggered. Fix by splitting the FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare nh oif (FLOWI_FLAG_SKIP_NH_OIF). Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 16:36:34 -07:00
Sowmini Varadhan	d5566fd72e	rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats Many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the the AF_INET6 statististics that are always returned by default from rtnl_fill_ifinfo(). Computing the statistics can be an expensive operation that impacts scaling, so it is desirable to avoid this if the information is not needed. This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that can be passed with netlink_request() to avoid statistics computation for the ifinfo path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 15:25:02 -07:00
David Ahern	c36ba6603a	net: Allow user to get table id from route lookup rt_fill_info which is called for 'route get' requests hardcodes the table id as RT_TABLE_MAIN which is not correct when multiple tables are used. Use the newly added table id in the rtable to send back the correct table similar to what is done for IPv6. To maintain current ABI a new request flag, RTM_F_LOOKUP_TABLE, is added to indicate the actual table is wanted versus the hardcoded response. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 12:01:41 -07:00
David Ahern	b7503e0cdb	net: Add FIB table id to rtable Add the FIB table id to rtable to make the information available for IPv4 as it is for IPv6. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 12:01:41 -07:00
David Ahern	d08c4f3554	net: Refactor rtable initialization All callers to rt_dst_alloc have nearly the same initialization following a successful allocation. Consolidate it into rt_dst_alloc. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 12:01:40 -07:00
Eric Dumazet	30927520db	tcp_cubic: better follow cubic curve after idle period Jana Iyengar found an interesting issue on CUBIC : The epoch is only updated/reset initially and when experiencing losses. The delta "t" of now - epoch_start can be arbitrary large after app idle as well as the bic_target. Consequentially the slope (inverse of ca->cnt) would be really large, and eventually ca->cnt would be lower-bounded in the end to 2 to have delayed-ACK slow-start behavior. This particularly shows up when slow_start_after_idle is disabled as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle time. Jana initial fix was to reset epoch_start if app limited, but Neal pointed out it would ask the CUBIC algorithm to recalculate the curve so that we again start growing steeply upward from where cwnd is now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth curve to be the same shape, just shifted later in time by the amount of the idle period. Reported-by: Jana Iyengar <jri@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Sangtae Ha <sangtae.ha@gmail.com> Cc: Lawrence Brakmo <lawrence@brakmo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-10 10:58:33 -07:00
Neal Cardwell	05c5a46d71	tcp: generate CA_EVENT_TX_START on data frames Issuing a CC TX_START event on control frames like pure ACK is a waste of time, as a CC should not care. Following patch needs this change, as we want CUBIC to properly track idle time at a low cost, with a single TX_START being generated. Yuchung might slightly refine the condition triggering TX_START on a followup patch. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Jana Iyengar <jri@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Sangtae Ha <sangtae.ha@gmail.com> Cc: Lawrence Brakmo <lawrence@brakmo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-10 10:58:33 -07:00
Phil Sutter	f53de1e9a4	net: ipv6: use common fib_default_rule_pref This switches IPv6 policy routing to use the shared fib_default_rule_pref() function of IPv4 and DECnet. It is also used in multicast routing for IPv4 as well as IPv6. The motivation for this patch is a complaint about iproute2 behaving inconsistent between IPv4 and IPv6 when adding policy rules: Formerly, IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the assigned priority value was decreased with each rule added. Since then all users of the default_pref field have been converted to assign the generic function fib_default_rule_pref(), fib_nl_newrule() may just use it directly instead. Therefore get rid of the function pointer altogether and make fib_default_rule_pref() static, as it's not used outside fib_rules.c anymore. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-09 14:19:50 -07:00
Daniel Borkmann	a82b0e6391	netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled While testing various Kconfig options on another issue, I found that the following one triggers as well on allmodconfig and nf_conntrack disabled: net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’: net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function) if (this_cpu_read(nf_skb_duplicated)) [...] net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’: net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function) if (this_cpu_read(nf_skb_duplicated)) Fix it by including directly the header where it is defined. Fixes: bbde9fc1824a ("netfilter: factor out packet duplication for IPv4/IPv6") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-02 16:28:06 -07:00
David Ahern	9b8ff51822	net: Make table id type u32 A number of VRF patches used 'int' for table id. It should be u32 to be consistent with the rest of the stack. Fixes: 4e3c89920cd3a ("net: Introduce VRF related flags and helpers") 15be405eb2ea9 ("net: Add inet_addr lookup by table") 30bbaa1950055 ("net: Fix up inet_addr_type checks") 021dd3b8a142d ("net: Add routes to the table associated with the device") dc028da54ed35 ("inet: Move VRF table lookup to inlined function") f6d3c19274c74 ("net: FIB tracepoints") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 14:32:44 -07:00
Daniel Borkmann	c3a8d94746	tcp: use dctcp if enabled on the route to the initiator Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Daniel Borkmann	b8d3e4163a	fib, fib6: reject invalid feature bits Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Florian Westphal	6cf9dfd3bd	net: fib: move metrics parsing to a helper fib_create_info() is already quite large, so before adding more code to the metrics section move that to a helper, similar to ip6_convert_metrics. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Pravin B Shelar	4c22279848	ip-tunnel: Use API to access tunnel metadata options. Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:28:56 -07:00
Madalin Bucur	d1bfc62591	ipv4: fix 32b build Address remaining issue after 80ec192. Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 11:32:41 -07:00
David S. Miller	80ec1927b1	ipv4: Fix 32-bit build. net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64': >> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function) v = (((u64 )bhptr) + offt); ^ net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in net/ipv4/af_inet.c: In function 'snmp_fold_field64': >> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function) res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ >> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field' res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ net/ipv4/af_inet.c:1455:5: note: declared here u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt) ^ Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-30 22:40:44 -07:00
Raghavendra K T	c4c6bc3146	net: Introduce helper functions to get the per cpu data Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-30 21:48:58 -07:00
Jiri Benc	b9b6695cf0	fou: reject IPv6 config fou does not really support IPv6 encapsulation. After an UDP socket is created in fou_create, the encap_rcv callback is set either to fou_udp_recv or to gue_udp_recv. Both of those unconditionally assume that the received packet has an IPv4 header and access the data at network_header as it was an IPv4 header. This leads to IPv6 flow label being interpreted as IP packet length, etc. Disallow fou tunnel to be configured as IPv6 until real IPv6 support is added to fou. CC: Tom Herbert <tom@herbertland.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
Jiri Benc	7f9562a1f4	ip_tunnels: record IP version in tunnel info There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
Jiri Benc	46fa062ad6	ip_tunnels: convert the mode field of ip_tunnel_info to flags The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
David Ahern	f6d3c19274	net: FIB tracepoints A few useful tracepoints developing VRF driver. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:05:16 -07:00
David S. Miller	581a5f2a61	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. In sum, patches to address fallout from the previous round plus updates from the IPVS folks via Simon Horman, they are: 1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight. From Raducu Deaconu. 2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch from Julian Anastasov. 3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From Julian Anastasov. 4) Enhance multicast configuration for the IPVS state sync daemon. Also from Julian. 5) Resolve sparse warnings in the nf_dup modules. 6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set. 7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative subsets of code 1. From Andreas Herz. 8) Revert the jumpstack size calculation from mark_source_chains due to chain depth miscalculations, from Florian Westphal. 9) Calm down more sparse warning around the Netfilter tree, again from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 16:29:59 -07:00

1 2 3 4 5 ...

7046 Commits