iv/linux

Author	SHA1	Message	Date
Eric Dumazet	18aafc622a	net: splice: fix __splice_segment() commit `9ca1b22d6d` (net: splice: avoid high order page splitting) forgot that skb->head could need a copy into several page frags. This could be the case for loopback traffic mostly. Also remove now useless skb argument from linear_to_page() and __splice_segment() prototypes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-11 16:48:08 -08:00
Rami Rosen	28a28283f8	ipv4: fib: fix a comment. In fib_frontend.c, there is a confusing comment; NETLINK_CB(skb).portid does not refer to a pid of sending process, but rather to a netlink portid. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-11 15:58:08 -08:00
Stanislaw Gruszka	d07d7507bf	net, wireless: overwrite default_ethtool_ops Since: commit `2c60db0370` Author: Eric Dumazet <edumazet@google.com> Date: Sun Sep 16 09:17:26 2012 +0000 net: provide a default dev->ethtool_ops wireless core does not correctly assign ethtool_ops. After alloc_netdev*() call, some cfg80211 drivers provide their own ethtool_ops, but some do not. For them, wireless core provides generic cfg80211_ethtool_ops, which is assigned in NETDEV_REGISTER notify call: if (!dev->ethtool_ops) dev->ethtool_ops = &cfg80211_ethtool_ops; But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211 drivers without custom ethtool_ops), but points to &default_ethtool_ops. In order to fix the problem, provide a function which will overwrite default_ethtool_ops and use it in the wireless core. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-11 15:55:48 -08:00
Alexander Duyck	87696f9234	net: Export __netdev_pick_tx so that it can be used in modules When testing with FCoE enabled we discovered that I had not exported __netdev_pick_tx. As a result ixgbe doesn't build with the RFC patches applied because ixgbe_select_queue was calling the function. This change corrects that build issue by correctly exporting __netdev_pick_tx so it can be used by modules. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-11 15:47:27 -08:00
Linus Torvalds	93ccb3910a	NFS client bugfix for Linux 3.8 - Fix a socket lock leak in net/sunrpc/xprt.c Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfix from Trond Myklebust: - Fix a socket lock leak in net/sunrpc/xprt.c * tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Ensure we release the socket write lock if the rpc_task exits early	2013-01-11 12:09:04 -08:00
Eric Dumazet	7b514a886b	tcp: accept RST without ACK flag commit `c3ae62af8e` (tcp: should drop incoming frames without ACK flag set) added a regression on the handling of RST messages. RST should be allowed to come even without ACK bit set. We validate the RST by checking the exact sequence, as requested by RFC 793 and 5961 3.2, in tcp_validate_incoming() Reported-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:49:30 -08:00
Alexander Duyck	024e9679a2	net: Add support for XPS without sysfs being defined This patch makes it so that we can support transmit packet steering without sysfs needing to be enabled. The reason for making this change is to make it so that a driver can make use of the XPS even while the sysfs portion of the interface is not present. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:47:04 -08:00
Alexander Duyck	01c5f864e6	net: Rewrite netif_set_xps_queues to address several issues This change is meant to address several issues I found within the netif_set_xps_queues function. If the allocation of one of the maps to be assigned to new_dev_maps failed we could end up with the device map in an inconsistent state since we had already worked through a number of CPUs and removed or added the queue. To address that I split the process into several steps. The first of which is just the allocation of updated maps for CPUs that will need larger maps to store the queue. By doing this we can fail gracefully without actually altering the contents of the current device map. The second issue I found was the fact that we were always allocating a new device map even if we were not adding any queues. I have updated the code so that we only allocate a new device map if we are adding queues, otherwise if we are not adding any queues to CPUs we just skip to the removal process. The last change I made was to reuse the code from remove_xps_queue to remove the queue from the CPU. By making this change we can be consistent in how we go about adding and removing the queues from the CPUs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:47:04 -08:00
Alexander Duyck	10cdc3f3cd	net: Rewrite netif_reset_xps_queue to allow for better code reuse This patch does a minor refactor on netif_reset_xps_queue to address a few items I noticed. First is the fact that we are doing removal of queues in both netif_reset_xps_queue and netif_set_xps_queue. Since there is no need to have the code in two places I am pushing it out into a separate function and will come back in another patch and reuse the code in netif_set_xps_queue. The second item this change addresses is the fact that the Tx queues were not getting their numa_node value cleared as a part of the XPS queue reset. This patch resolves that by resetting the numa_node value if the dev_maps value is set. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:47:04 -08:00
Alexander Duyck	537c00de1c	net: Add functions netif_reset_xps_queue and netif_set_xps_queue This patch adds two functions, netif_reset_xps_queue and netif_set_xps_queue. The main idea behind these two functions is to provide a mechanism through which drivers can update their defaults in regards to XPS. Currently no such mechanism exists and as a result we cannot use XPS for things such as ATR which would require a basic configuration to start in which the Tx queues are mapped to CPUs via a 1:1 mapping. With this change I am making it possible for drivers such as ixgbe to be able to use the XPS feature by controlling the default configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:47:03 -08:00
Alexander Duyck	416186fbf8	net: Split core bits of netdev_pick_tx into __netdev_pick_tx This change splits the core bits of netdev_pick_tx into a separate function. The main idea behind this is to make this code accessible to select queue functions when they decide to process the standard path instead of their own custom path in their select queue routine. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 22:47:03 -08:00
Eric Dumazet	1def9238d4	net_sched: more precise pkt_len computation One long standing problem with TSO/GSO/GRO packets is that skb->len doesn't represent a precise amount of bytes on wire. Headers are only accounted for the first segment. For TCP, that's typically 66 bytes missing per 1448-byte segment, an error of 4.5 % for normal MSS value. As consequences: 1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits. 2) Device stats are slightly underestimated as well. Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len computation. Packet schedulers should use qdisc pkt_len instead of skb->len for their bandwidth limitations, and TSO enabled device drivers could use pkt_len if their statistics are not hardware assisted, and if they don't scratch skb->cb[] first word. Both egress and ingress paths work, thanks to commit `fda55eca5a` (net: introduce skb_transport_header_was_set()): If GRO built a GSO packet, it also set the transport header for us. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Paolo Valente <paolo.valente@unimore.it> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:58:13 -08:00
Randy Dunlap	7144bca681	nfs: fix sunrpc/clnt.c kernel-doc warnings Fix new kernel-doc warnings in clnt.c: Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor' Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-10 14:35:23 -08:00
Romain Kuntz	21caa6622b	ipv6: use addrconf_get_prefix_route for prefix route lookup [v2] Replace ip6_route_lookup() with addrconf_get_prefix_route() when looking up a prefix route. This ensures that the connected prefix is looked up in the main table, and avoids the selection of other matching routes located in different tables as well as blackhole or prohibited entries. In addition, this fixes an Oops introduced by commit `64c6d08e` (ipv6: del unreachable route when an addr is deleted on lo), that would occur when a blackhole or prohibited entry is selected by ip6_route_lookup(). Such entries have a NULL rt6i_table argument, which is accessed by __ip6_del_rt() when trying to lock rt6i_table->tb6_lock. The function addrconf_is_prefix_route() is not used anymore and is removed. [v2] Minor indentation cleanup and log updates. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:22:54 -08:00
Romain Kuntz	85da53bf1c	ipv6: fix the noflags test in addrconf_get_prefix_route The tests on the flags in addrconf_get_prefix_route() do not make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:13:33 -08:00
Eric Dumazet	f26845b43c	tcp: fix splice() and tcp collapsing interaction Under unusual circumstances, TCP collapse can split a big GRO TCP packet while it's being used in a splice(socket->pipe) operation. skb_splice_bits() releases the socket lock before calling splice_to_pipe(). [ 1081.353685] WARNING: at net/ipv4/tcp.c:1330 tcp_cleanup_rbuf+0x4d/0xfc() [ 1081.371956] Hardware name: System x3690 X5 -[7148Z68]- [ 1081.391820] cleanup rbuf bug: copied AD3BCF1 seq AD370AF rcvnxt AD3CF13 To fix this problem, we must eat skbs in tcp_recv_skb(). Remove the inline keyword from tcp_recv_skb() definition since it has three call sites. Reported-by: Christian Becker <c.becker@traviangames.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:09:57 -08:00
Eric Dumazet	ff905b1e4a	tcp: splice: fix an infinite loop in tcp_read_sock() commit `02275a2ee7` (tcp: don't abort splice() after small transfers) added a regression. [ 83.843570] INFO: rcu_sched self-detected stall on CPU [ 83.844575] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=21002 jiffies, g=4457, c=4456, q=13132) [ 83.844582] Task dump for CPU 6: [ 83.844584] netperf R running task 0 8966 8952 0x0000000c [ 83.844587] 0000000000000000 0000000000000006 0000000000006c6c 0000000000000000 [ 83.844589] 000000000000006c 0000000000000096 ffffffff819ce2bc ffffffffffffff10 [ 83.844592] ffffffff81088679 0000000000000010 0000000000000246 ffff880c4b9ddcd8 [ 83.844594] Call Trace: [ 83.844596] [<ffffffff81088679>] ? vprintk_emit+0x1c9/0x4c0 [ 83.844601] [<ffffffff815ad449>] ? schedule+0x29/0x70 [ 83.844606] [<ffffffff81537bd2>] ? tcp_splice_data_recv+0x42/0x50 [ 83.844610] [<ffffffff8153beaa>] ? tcp_read_sock+0xda/0x260 [ 83.844613] [<ffffffff81537b90>] ? tcp_prequeue_process+0xb0/0xb0 [ 83.844615] [<ffffffff8153c0f0>] ? tcp_splice_read+0xc0/0x250 [ 83.844618] [<ffffffff814dc0c2>] ? sock_splice_read+0x22/0x30 [ 83.844622] [<ffffffff811b820b>] ? do_splice_to+0x7b/0xa0 [ 83.844627] [<ffffffff811ba4bc>] ? sys_splice+0x59c/0x5d0 [ 83.844630] [<ffffffff8119745b>] ? putname+0x2b/0x40 [ 83.844633] [<ffffffff8118bcb4>] ? do_sys_open+0x174/0x1e0 [ 83.844636] [<ffffffff815b6202>] ? system_call_fastpath+0x16/0x1b If recv_actor() returns 0, we should stop immediately, because looping won't give a chance to drain the pipe. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:07:19 -08:00
Linus Torvalds	7be72c3954	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 patches from Martin Schwidefsky: "Add the finit_module system call, fix the irq statistics in /proc/stat, fix a s390dbf lockdep problem, a patch revert for a problem that is not 100% understood yet, and a few patches to fix warnings." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: define read*_relaxed functions s390/topology: export cpu_topology s390/pm: export pm_power_off s390/pci: define isa_dma_bridge_buggy s390/3215: partially revert tty close handling fix s390/irq: count cpu restart events s390/irq: remove split irq fields from /proc/stat s390/irq: enable irq sum accounting for /proc/stat again s390/syscalls: wire up finit_module syscall s390/pci: remove dead code s390/smp: fix section mismatch for smp_add_present_cpu() s390/debug: Fix s390dbf lockdep problem in debug_(un)register_view()	2013-01-10 08:20:15 -08:00
Pablo Neira Ayuso	4610476d89	netfilter: xt_CT: fix unset return value if conntrack zone are disabled net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’: net/netfilter/xt_CT.c:250:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v0’: net/netfilter/xt_CT.c:112:6: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] Reported-by: Borislav Petkov <bp@alien8.de> Acked-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-10 13:11:00 +01:00
YOSHIFUJI Hideaki / 吉藤英明	6c40d100ce	ipv6: Use container_of macro instead of magic number to get ipv6 header. In ipv6_recv_error(), addr_offset points to the daddr field of the ip header. To get the ipv6 header, use the container_of() macro instead of subtracting a magic number (24). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:59:53 -08:00
YOSHIFUJI Hideaki / 吉藤英明	b4fff5f8bf	unix: Use FIELD_SIZEOF() in af_unix_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明	ce6654cfc1	rxrpc: Use FIELD_SIZEOF() in af_rxrpc_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明	3523b29bd2	openvswitch: Use FIELD_SIZEOF() in dp_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:24 -08:00
YOSHIFUJI Hideaki / 吉藤英明	fab2574591	netlink: Use FIELD_SIZEOF() in netlink_proto_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:23 -08:00
YOSHIFUJI Hideaki / 吉藤英明	ba96bcbcd2	ipv6: Use FIELD_SIZEOF() in inet6_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:23 -08:00
YOSHIFUJI Hideaki / 吉藤英明	95c7e0e4d4	ipv4: Use FIELD_SIZEOF() in inet_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:23 -08:00
John W. Linville	a9b8a894ad	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-01-09 11:01:37 -05:00
Jiri Pirko	948b337e62	net: init perm_addr in register_netdevice() Benefit from the fact that dev->addr_assign_type is set to NET_ADDR_PERM in case the device has permanent address. This also fixes the problem that many drivers do not set perm_addr at all. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 18:00:47 -08:00
Cong Wang	c9be4a5c49	net: prevent setting ttl=0 via IP_TTL A regression is introduced by the following commit: commit `4d52cfbef6` Author: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue Jun 2 00:42:16 2009 -0700 net: ipv4/ip_sockglue.c cleanups Pure cleanups but it is not a pure cleanup... - if (val != -1 && (val < 1 || val>255)) + if (val != -1 && (val < 0 || val > 255)) Since there is no reason provided to allow ttl=0, change it back. Reported-by: nitin padalia <padalia.nitin@gmail.com> Cc: nitin padalia <padalia.nitin@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:57:10 -08:00
Cong Wang	b3d936f3ea	netpoll: add IPv6 support Currently, netpoll only supports IPv4. This patch adds IPv6 support to netpoll so that we can run netconsole over IPv6 network. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:56:10 -08:00
Cong Wang	acb3e04119	ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library As suggested by David, udp6_csum_init() is too big to be inlined, move it to ipv6 static library, net/ipv6/ip6_checksum.c. And the generic csum_ipv6_magic() too. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:56:10 -08:00
Cong Wang	b7394d2429	netpoll: prepare for ipv6 This patch adjusts some struct and functions, to prepare for supporting IPv6. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:56:09 -08:00
Eric Dumazet	fda55eca5a	net: introduce skb_transport_header_was_set() We have skb_mac_header_was_set() helper to tell if mac_header was set on a skb. We would like the same for transport_header. __netif_receive_skb() doesn't reset the transport header if already set by GRO layer. Note that network stacks usually reset the transport header anyway, after pulling the network header, so this change only allows a followup patch to have more precise qdisc pkt_len computation for GSO packets at ingress side. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:51:54 -08:00
Trond Myklebust	87ed50036b	SUNRPC: Ensure we release the socket write lock if the rpc_task exits early If the rpc_task exits while holding the socket write lock before it has allocated an rpc slot, then the usual mechanism for releasing the write lock in xprt_release() is defeated. The problem occurs if the call to xprt_lock_write() initially fails, so that the rpc_task is put on the xprt->sending wait queue. If the task exits after being assigned the lock by __xprt_lock_write_func, but before it has retried the call to xprt_lock_and_alloc_slot(), then it calls xprt_release() while holding the write lock, but will immediately exit due to the test for task->tk_rqstp != NULL. Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.1]	2013-01-08 14:30:43 -05:00
Linus Torvalds	5c33d9b248	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) New sysctl ndisc_notify needs some documentation, from Hannes Frederic Sowa. 2) Netfilter REJECT target doesn't set transport header of SKB correctly, from Mukund Jampala. 3) Forcedeth driver needs to check for DMA mapping failures, from Larry Finger. 4) brcmsmac driver can't use usleep_range while holding locks, use udelay instead. From Niels Ole Salscheider. 5) Fix unregister of netlink bridge multicast database handlers, from Vlad Yasevich and Rami Rosen. 6) Fix checksum calculations in netfilter's ipv6 network prefix translation module. 7) Fix high order page allocation failures in netfilter xt_recent, from Eric Dumazet. 8) mac802154 needs to use netif_rx_ni() instead of netif_rx() because mac802154_process_data() can execute in process rather than interrupt context. From Alexander Aring. 9) Fix splice handling of MSG_SENDPAGE_NOTLAST, otherwise we elide one tcp_push() too many. From Eric Dumazet and Willy Tarreau. 10) Fix skb->truesize tracking in XEN netfront driver, from Ian Campbell. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) xen/netfront: improve truesize tracking ipv4: fix NULL checking in devinet_ioctl() tcp: fix MSG_SENDPAGE_NOTLAST logic net/ipv4/ipconfig: really display the BOOTP/DHCP server's address. ip-sysctl: fix spelling errors mac802154: fix NOHZ local_softirq_pending 08 warning ipv6: document ndisc_notify in networking/ip-sysctl.txt ath9k: Fix Kconfig for ATH9K_HTC netfilter: xt_recent: avoid high order page allocations netfilter: fix missing dependencies for the NOTRACK target netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation bridge: add empty br_mdb_init() and br_mdb_uninit() definitions. vxlan: allow live mac address change bridge: Correctly unregister MDB rtnetlink handlers brcmfmac: fix parsing rsn ie for ap mode. brcmsmac: add copyright information for Canonical rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call ...	2013-01-08 07:31:49 -08:00
Heiko Carstens	420f42ecf4	s390/irq: remove split irq fields from /proc/stat Now that irq sum accounting for /proc/stat's "intr" line works again we have the oddity that the sum field (first field) contains only the sum of the second (external irqs) and third field (I/O interrupts). The reason for that is that these two fields are already sums of all other fields. So if we would sum up everything we would count every interrupt twice. This is broken since the split interrupt accounting was merged two years ago: `052ff461c8` "[S390] irq: have detailed statistics for interrupt types". To fix this remove the split interrupt fields from /proc/stat's "intr" line again and only have them in /proc/interrupts. This restores the old behaviour, seems to be the only sane fix and mimics a behaviour from other architectures where /proc/interrupts also contains more than /proc/stat's "intr" line does. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-01-08 10:57:07 +01:00
Jiri Pirko	c03a14e8db	ethtool: consolidate work with ethtool_ops No need to check if ethtool_ops == NULL since it can't be. Use local variable "ops" in functions where it is present instead of dev->ethtool_ops. Introduce local variable "ops" in functions where dev->ethtool_ops is used many times. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-07 19:54:19 -08:00
David S. Miller	32fa10b24e	Merge branch 'master' of git://1984.lsi.us.es/nf Pablo Neira Ayuso says: ==================== The following batch contains Netfilter fixes for 3.8-rc2, they are: * Fix IPv6 stateless network/port translation (NPT) checksum calculation, from Ulrich Weber. * Fix for xt_recent to avoid memory allocation failures if large hashtables are used, from Eric Dumazet. * Fix missing dependencies in Kconfig for the deprecated NOTRACK, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-07 19:25:13 -08:00
Alex Elder	36a25de233	sctp: fix Kconfig bug in default cookie hmac selection Commit `0d0863b020` ("sctp: Change defaults on cookie hmac selection") added a "choice" to the sctp Kconfig file. It introduced a bug which led to an infinite loop while running "make oldconfig". The problem is that the wrong symbol was defined as the default value for the choice. Using the correct value gets rid of the infinite loop. Note: if CONFIG_SCTP_COOKIE_HMAC_SHA1=y was present in the input config file, both that and CONFIG_SCTP_COOKIE_HMAC_MD5=y will be present in the generated config file. Signed-off-by: Alex Elder <elder@inktank.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-07 09:27:06 -08:00
Xi Wang	c7e2e1d72e	ipv4: fix NULL checking in devinet_ioctl() The NULL pointer check `!ifa' should come before its first use. [ Bug origin : commit `fd23c3b311` (ipv4: Add hash table of interface addresses) in linux-2.6.39 ] Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:11:18 -08:00
Hannes Frederic Sowa	5d134f1c1f	tcp: make sysctl_tcp_ecn namespace aware As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl namespace aware. The reason behind this patch is to ease the testing of ecn problems on the internet and allows applications to tune their own use of ecn. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:09:56 -08:00
YOSHIFUJI Hideaki / 吉藤英明	71bcdba06d	ndisc: Use struct rd_msg for redirect message. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:08:38 -08:00
Eric Dumazet	9ca1b22d6d	net: splice: avoid high order page splitting splice() can handle pages of any order, but network code tries hard to split them in PAGE_SIZE units. Not quite successfully anyway, as __splice_segment() assumed poff < PAGE_SIZE. This is true for the skb->data part, not necessarily for the fragments. This patch removes this logic to give the pages as they are in the skb. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:07:24 -08:00
Jiri Pirko	7826d43f2d	ethtool: fix drvinfo strings set in drivers Use strlcpy where possible to ensure the string is \0 terminated. Always use sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprintf. Remove unnecessary inits of ->fw_version. Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:06:31 -08:00
Jiri Pirko	2afb9b5334	ethtool: set addr_assign_type to NET_ADDR_SET when addr is passed on create In case user passed address via netlink during create, NET_ADDR_PERM was set. That is not correct so fix this by setting NET_ADDR_SET. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:05:02 -08:00
YOSHIFUJI Hideaki / 吉藤英明	b7dc8c3959	ndisc: Remove unused space at tail of skb for ndisc messages. (TAKE 3) Currently, the size of skb allocated for NDISC is MAX_HEADER + LL_RESERVED_SPACE(dev) + packet length + dev->needed_tailroom, but only LL_RESERVED_SPACE(dev) bytes are "reserved" for headers. As a result, the skb looks like this (after construction of the message), listed from head to end, with data and tail marking the start and end of the ipv6 packet: [hlen = LL_RESERVED_SPACE(dev)] [ipv6 packet] [tlen = dev->needed_tailroom] [MAX_HEADER]. As the name implies, "MAX_HEADER" is used for headers, and should be "reserved" prior to packet construction. Or, if some space is really required at the tail of the skb, it should be explicitly documented. We have several options after construction of NDISC message: Option 1: [hlen = LL_RESERVED_SPACE(dev)] [ipv6 packet] [tlen = dev->needed_tailroom]. Option 2: [MAX_HEADER] [ipv6 packet] [tlen = dev->needed_tailroom]. Option 3: [MAX_HEADER] [hlen = LL_RESERVED_SPACE(dev)] [ipv6 packet] [tlen = dev->needed_tailroom]. Our tunnel drivers try expanding headroom and the space for tunnel encapsulation was not a mandatory space -- so we are not seeing bugs here --, but just for optimization for performance critical situations. Since NDISC messages are not performance critical unlike TCP, and as we know outgoing device, LL_RESERVED_SPACE(dev) should be just enough for the device in most (if not all) cases: LL_RESERVED_SPACE(dev) <= LL_MAX_HEADER <= MAX_HEADER Note that LL_RESERVED_SPACE(dev) is also enough for NDISC over SIT (e.g., ISATAP). So, I think Option 1 is just fine here. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 15:16:44 -08:00
Philippe De Muyter	9dd4a13a89	net/ipv4/ipconfig: really display the BOOTP/DHCP server's address. Up to now, the debug and info messages from the ipconfig subsystem claim to display the IP address of the DHCP/BOOTP server but display instead the IP address of the bootserver. Fix that. Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 15:14:14 -08:00
Alexander Aring	5ff3fec6d3	mac802154: fix NOHZ local_softirq_pending 08 warning When using nanosleep() in a userspace application we get a ratelimit warning NOHZ: local_softirq_pending 08 for 10 times. This patch replaces netif_rx() with netif_rx_ni() which has to be used from process/softirq context. The process/softirq context will be called from fakelb driver. See linux-kernel commit `481a819` for similar fix. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:47:21 -08:00
Jiri Pirko	8b98a70c28	net: remove no longer used netdev_set_bond_master() and netdev_set_master() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Jiri Pirko	471cb5a33d	bonding: remove usage of dev->master Benefit from new upper dev list and free bonding from dev->master usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Jiri Pirko	1cdfd72f79	vlan: remove usage of dev->master in __vlan_find_dev_deep() Also, since all users call __vlan_find_dev_deep() with rcu_read_lock, make no possibility to call this with rtnl mutex held only. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Jiri Pirko	49bd8fb0b1	netpoll: remove usage of dev->master Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Jiri Pirko	74fdd93fbc	bridge: remove usage of netdev_set_master() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Jiri Pirko	898e506171	rtnetlink: remove usage of dev->master Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:49 -08:00
Jiri Pirko	126d6c236b	vlan: add link to upper device Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:49 -08:00
Jiri Pirko	9ff162a8b9	net: introduce upper device lists These lists are supposed to serve for storing pointers to all upper devices. Eventually they will replace dev->master pointer which is used for bonding, bridge, team but it cannot be used for vlan, macvlan where there might be multiple upper devices present. In case the upper link is replacement for dev->master, it is marked with "master" flag. New upper device list resolves this limitation. Also, the information stored in lists is used for preventing looping setups like "bond->somethingelse->samebond" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:49 -08:00
Eric Dumazet	2727de7604	netfilter: xt_recent: avoid high order page allocations xt_recent can try high order page allocations and this can fail. iptables: page allocation failure: order:9, mode:0xc0d0 It also wastes about half the allocated space because of kmalloc() power-of-two roundups and struct recent_table layout. Use vmalloc() instead to save space and be less prone to allocation errors when memory is fragmented. Reported-by: Miroslav Kratochvil <exa.exa@gmail.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Harald Reindl <h.reindl@thelounge.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-04 20:14:42 +01:00
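The space-waste remark in the xt_recent commit above can be illustrated with a small user-space sketch. It assumes a simplified allocator model that rounds every request up to the next power of two; the helper names `next_pow2` and `pow2_waste` are hypothetical and are not kernel functions.

```c
#include <stddef.h>

/* Smallest power of two >= n.
 * Assumes n > 0 and small enough that the shift cannot overflow. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes left unused when a request of n bytes is rounded up
 * to the next power of two (simplified model, see lead-in). */
static size_t pow2_waste(size_t n)
{
    return next_pow2(n) - n;
}
```

Under this model, a request of 1 MiB plus 64 bytes reserves 2 MiB and leaves 1048512 bytes unused, which is close to half of the allocation.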
Pablo Neira Ayuso	757ae316fb	netfilter: fix missing dependencies for the NOTRACK target warning: (NETFILTER_XT_TARGET_NOTRACK) selects NETFILTER_XT_TARGET_CT which has unmet direct +dependencies (NET && INET && NETFILTER && NETFILTER_XTABLES && NF_CONNTRACK && (IP_NF_RAW \|\| +IP6_NF_RAW) && NETFILTER_ADVANCED) Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-04 20:14:38 +01:00
Ulrich Weber	429da4c0b1	netfilter: ip6t_NPT: fix IPv6 NPT checksum calculation csum16_add() has a broken carry detection, should be: sum += sum < (__force u16)b; Instead of fixing csum16_add, remove the custom checksum functions and use the generic csum_add/csum_sub ones. Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-04 20:03:02 +01:00
Trond Myklebust	360e1a5349	SUNRPC: Partial revert of commit `168e4b39d1` Partially revert commit (SUNRPC: add WARN_ON_ONCE for potential deadlock). The looping behaviour has been tracked down to a known issue with workqueues, and a workaround has now been implemented. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org [>= 3.7]	2013-01-04 12:59:30 -05:00
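The carry test quoted in the ip6t_NPT commit above (`sum += sum < (__force u16)b;`) can be tried out in isolation. The following is a user-space sketch only: `csum16_add_sketch` is a hypothetical name, the kernel's `__force` annotation is omitted, and the commit itself removed the custom helper in favour of the generic csum_add/csum_sub functions.

```c
#include <stdint.h>

/* 16-bit one's-complement addition with end-around carry.
 * The truncated sum is smaller than an operand exactly when the
 * addition wrapped, so the comparison yields the carry bit. */
static uint16_t csum16_add_sketch(uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b);

    sum += sum < b;   /* add the carry back in */
    return sum;
}
```

Adding 0xFFFF and 0x0001 wraps to 0 and the carry is folded back in, giving 0x0001; an addition that does not wrap is left unchanged.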
Trond Myklebust	c6567ed140	SUNRPC: Ensure that we free the rpc_task after cleanups are done This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org	2013-01-04 12:53:59 -05:00
Jiri Pirko	15c6ff3bc0	net: remove unnecessary NET_ADDR_RANDOM "bitclean" NET_ADDR_SET is set in dev_set_mac_address() no need to alter dev->addr_assign_type value in drivers. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 22:37:36 -08:00
Jiri Pirko	fbdeca2d77	net: add address assign type "SET" This is the way to indicate that mac address of a device has been set by dev_set_mac_address() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 22:37:36 -08:00
Jiri Pirko	f652151640	net: call add_device_randomness() only after successful mac change Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 22:37:35 -08:00
Jiri Pirko	e7c3273ec2	rtnl: use dev_set_mac_address() instead of plain ndo_ Benefit from existence of dev_set_mac_address() and remove duplicate code. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 22:37:35 -08:00
Chaitanya	09b1426e7f	mac80211: fix maximum MTU The maximum MTU shouldn't take the headers into account, the maximum MSDU size is exactly the maximum MTU. Signed-off-by: T Krishna Chaitanya <chaitanyatk@posedge.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 13:00:01 +01:00
Johannes Berg	826262c3d2	mac80211: fix dtim_period in hidden SSID AP association When AP's SSID is hidden the BSS can appear several times in cfg80211's BSS list: once with a zero-length SSID that comes from the beacon, and once for each SSID from probe responses. Since the mac80211 stores its data in ieee80211_bss which is embedded into cfg80211_bss, mac80211's data will be duplicated too. This becomes a problem when a driver needs the dtim_period since this data exists only in the beacon's instance in cfg80211 bss table which isn't the instance that is used when associating. Remove the DTIM period from the BSS table and track it explicitly to avoid this problem. Cc: stable@vger.kernel.org Tested-by: Efi Tubul <efi.tubul@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 13:00:00 +01:00
Johannes Berg	a56f992cda	mac80211: use del_timer_sync for final sta cleanup timer deletion This is a very old bug, but there's nothing that prevents the timer from running while the module is being removed when we only do del_timer() instead of del_timer_sync(). The timer should normally not be running at this point, but it's not clearly impossible (or we could just remove this.) Cc: stable@vger.kernel.org Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 13:00:00 +01:00
Johannes Berg	97f97b1f5f	mac80211: fix station destruction in AP/mesh modes Unfortunately, commit `b22cfcfcae`, intended to speed up roaming by avoiding the synchronize_rcu() broke AP/mesh modes as it moved some code into that work item that will still call into the driver at a time where it's no longer expected to handle this: after the AP or mesh has been stopped. To fix this problem remove the per-station work struct, maintain a station cleanup list instead and flush this list when stations are flushed. To keep this patch smaller for stable, do this when the stations are flushed (sta_info_flush()). This unfortunately brings back the original roaming delay; I'll fix that again in a separate patch. Also, Ben reported that the original commit could sometimes (with many interfaces) cause long delays when an interface is set down, due to blocking on flush_workqueue(). Since we now maintain the cleanup list, this particular change of the original patch can be reverted. Cc: stable@vger.kernel.org [3.7] Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 12:59:59 +01:00
Thomas Pedersen	b7cfcd113a	mac80211: RMC buckets are just list heads The array of rmc_entrys is redundant since only the list_head is used. Make this an array of list_heads instead and save ~6k per vif at runtime :D Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 12:59:59 +01:00
Johannes Berg	4d76d21bd7	mac80211: assign VLAN channel contexts Make AP_VLAN type interfaces track the AP master channel context so they have one assigned for the various lookups. Don't give them their own refcount etc. since they're just slaves to the AP master. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 12:59:58 +01:00
Felix Fietkau	2d4072a547	mac80211: flush AP_VLAN stations when tearing down the BSS AP Signed-off-by: Felix Fietkau <nbd@openwrt.org> [change to flush stations with AP flush in second loop] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 12:59:58 +01:00
Stanislaw Gruszka	34bcf71502	mac80211: fix ibss scanning Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this can trigger Microcode errors on iwlwifi and iwlegacy drivers. Also rename ieee80211_request_internal_scan() function since it is only used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss(). This patch should address: https://bugzilla.redhat.com/show_bug.cgi?id=883414 https://bugzilla.kernel.org/show_bug.cgi?id=49411 Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm> Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-03 12:59:57 +01:00
Rami Rosen	fdb184d146	bridge: add empty br_mdb_init() and br_mdb_uninit() definitions. This patch adds empty br_mdb_init() and br_mdb_uninit() definitions in br_private.h to avoid build failure when CONFIG_BRIDGE_IGMP_SNOOPING is not set. These methods were moved from br_multicast.c to br_netlink.c by commit `3ec8e9f085` Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 03:35:22 -08:00
Vlad Yasevich	3ec8e9f085	bridge: Correctly unregister MDB rtnetlink handlers Commit `63233159fd`: bridge: Do not unregister all PF_BRIDGE rtnl operations introduced a bug where a removal of a single bridge from a multi-bridge system would remove MDB netlink handlers. The handlers should only be removed once all bridges are gone, but since we don't keep track of the number of bridge interfaces, it's simpler to do it when the bridge module is unloaded. To make it consistent, move the registration code into module initialization code path. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-03 01:56:11 -08:00
Linus Torvalds	58890c0669	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Two of Alex's patches deal with a race when resetting server connections for open RBD images, one demotes some non-fatal BUGs to WARNs, and my patch fixes a protocol feature bit failure path." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix protocol feature mismatch failure path libceph: WARN, don't BUG on unexpected connection states libceph: always reset osds when kicking libceph: move linger requests sooner in kick_requests()	2013-01-02 17:32:49 -08:00
stephen hemminger	576eb62598	bridge: respect RFC2863 operational state The bridge link detection should follow the operational state of the lower device, rather than the carrier bit. This allows devices like tunnels that are controlled by userspace control plane to work with bridge STP link management. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-30 02:31:43 -08:00
Daniel Borkmann	aa1113d9f8	net: filter: return -EINVAL if BPF_S_ANC* operation is not supported Currently, we return -EINVAL for malformed or wrong BPF filters. However, this is not done for BPF_S_ANC* operations, which makes it more difficult to detect if it's actually supported or not by the BPF machine. Therefore, we should also return -EINVAL if K is within the SKF_AD_OFF universe and the ancillary operation did not match. Why exactly is it needed? If tools such as libpcap/tcpdump want to make use of new ancillary operations (like filtering VLAN in kernel space), there is currently no sane way to test if this feature / BPF_S_ANC* op is present or not, since no error is returned. This patch will make life easier for that and allow for a proper usage for user space applications. There was concern, if this patch will break userland. Short answer: Yes and no. Long answer: It will "break" only for code that calls ... { BPF_LD \| BPF_(W\|H\|B) \| BPF_ABS, 0, 0, <K> }, ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is not an ancillary. And here comes the BUT: assuming some old code will have such an instruction where <K> is between [0xfffff000, 0xffffffff] and it doesn't know ancillary operations, then this will give a non-expected / unwanted behavior as well (since we do not return the BPF machine with 0 after a failed load_pointer(), which was the case before introducing ancillary operations, but load sth. into the accumulator instead, and continue with the next instruction, for instance). Thus, user space code would already have been broken by introducing ancillary operations into the BPF machine per se. Code that does such a direct load, e.g. "load word at packet offset 0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken, isn't it? The whole assumption of ancillary operations is that no-one intentionally calls things like "ld [0xffffffff]" and expect this word to be loaded from such a packet offset. 
Hence, we can also safely make use of this feature testing patch and facilitate application development. Therefore, at least from this patch onwards, we have for sure a check whether current or in future implemented BPF_S_ANC* ops are supported in the kernel. Patch was tested on x86_64. (Thanks to Eric for the previous review.) Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Ani Sinha <ani@aristanetworks.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-30 02:30:28 -08:00
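The feature-test behaviour described in the BPF commit above can be sketched as a small classifier for absolute-load offsets. This is an illustration under stated assumptions, not the kernel's filter checker: the range start is taken from the `[0xfffff000, 0xffffffff]` interval quoted in the commit, while `KNOWN_ANC_MAX` and `check_abs_load` are hypothetical names chosen for the example.

```c
#include <errno.h>
#include <stdint.h>

#define ANC_BASE      0xfffff000u /* start of the ancillary range quoted in the commit */
#define KNOWN_ANC_MAX 64u         /* hypothetical: extent of implemented ancillary offsets */

/* Returns 0 for an ordinary packet offset, 1 for an implemented
 * ancillary operation, and -EINVAL for an offset that lies in the
 * ancillary range but is not implemented. */
static int check_abs_load(uint32_t k)
{
    if (k < ANC_BASE)
        return 0;
    if (k - ANC_BASE < KNOWN_ANC_MAX)
        return 1;
    return -EINVAL;
}
```

With a check of this shape, a user-space tool could attach a probe filter that uses a new ancillary offset and treat an -EINVAL result as "not supported by this kernel".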
stephen hemminger	61c5e88aec	skbuff: make __kmalloc_reserve static Sparse detected case where this local function should be static. It may even allow some compiler optimizations. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 20:32:36 -08:00
stephen hemminger	bb717d7649	tcp: make proc_tcp_fastopen_key static Detected by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 20:32:36 -08:00
stephen hemminger	bd2a13e2eb	sctp: make sctp_addr_wq_timeout_handler static Fix sparse warning about local function that should be static. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 20:32:36 -08:00
Eric Dumazet	b2111724a6	net: use per task frag allocator in skb_append_datato_frags Use the new per task frag allocator in skb_append_datato_frags(), to reduce number of frags and page allocator overhead. Tested: ifconfig lo mtu 16436 perf record netperf -t UDP_STREAM ; perf report before : Throughput: 32928 Mbit/s 51.79% netperf [kernel.kallsyms] [k] copy_user_generic_string 5.98% netperf [kernel.kallsyms] [k] __alloc_pages_nodemask 5.58% netperf [kernel.kallsyms] [k] get_page_from_freelist 5.01% netperf [kernel.kallsyms] [k] __rmqueue 3.74% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.87% netperf [kernel.kallsyms] [k] prep_new_page 1.42% netperf [kernel.kallsyms] [k] next_zones_zonelist 1.28% netperf [kernel.kallsyms] [k] __inc_zone_state 1.26% netperf [kernel.kallsyms] [k] alloc_pages_current 0.78% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.74% netperf [kernel.kallsyms] [k] udp_sendmsg 0.72% netperf [kernel.kallsyms] [k] zone_watermark_ok 0.68% netperf [kernel.kallsyms] [k] __cpuset_node_allowed_softwall 0.67% netperf [kernel.kallsyms] [k] fib_table_lookup 0.60% netperf [kernel.kallsyms] [k] memcpy_fromiovecend 0.55% netperf [kernel.kallsyms] [k] __udp4_lib_lookup after: Throughput: 47185 Mbit/s 61.74% netperf [kernel.kallsyms] [k] copy_user_generic_string 2.07% netperf [kernel.kallsyms] [k] prep_new_page 1.98% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.02% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.97% netperf [kernel.kallsyms] [k] enqueue_task_fair 0.97% netperf [kernel.kallsyms] [k] udp_sendmsg 0.91% netperf [kernel.kallsyms] [k] __ip_route_output_key 0.88% netperf [kernel.kallsyms] [k] __netif_receive_skb 0.87% netperf [kernel.kallsyms] [k] fib_table_lookup 0.85% netperf [kernel.kallsyms] [k] resched_task 0.78% netperf [kernel.kallsyms] [k] __udp4_lib_lookup 0.77% netperf [kernel.kallsyms] [k] _raw_spin_lock_irqsave Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. 
Miller <davem@davemloft.net>	2012-12-28 15:25:19 -08:00
Jiri Pirko	9a57247f31	rtnl: expose carrier value with possibility to set it Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 15:24:18 -08:00
Jiri Pirko	fdae0fde53	net: allow to change carrier via sysfs Make carrier writable Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 15:24:18 -08:00
Jiri Pirko	4bf84c35c6	net: add change_carrier netdev op This allows a driver to register change_carrier callback which will be called whenever the user wants to change carrier state. This is useful for devices like dummy, gre, team and so on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 15:24:18 -08:00
David S. Miller	ac196f8c92	Merge branch 'master' of git://1984.lsi.us.es/nf Pablo Neira Ayuso says: ==================== The following batch contains Netfilter fixes for 3.8-rc1. They are a mixture of old bugs that have passed unnoticed (I'll pass these to stable) and more fresh ones from the previous merge window, they are: * Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd showing up wrong address, from Bob Hockney. * Fix a comment in nf_conntrack_ipv6, from Florent Fourcot. * Fix a leak in an error path in ctnetlink while creating an expectation, from Jesper Juhl. * Fix missing ICMP time exceeded in the IPv6 defragmentation code, from Haibo Xi. * Fix inconsistent handling of routing changes in MASQUERADE for the new connections case, from Andrew Collins. * Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to crashes in the ixgbe driver (since it seems to access the transport header with TSO enabled), from Mukund Jampala. * Recover obsoleted NOTRACK target by including it into the CT and spot a warning via printk about being obsoleted. Many people don't check the scheduled-for-removal file under Documentation, so we follow some less aggressive approach to kill this in a year or so. Spotted by Florian Westphal, patch from myself. * Fix race condition in xt_hashlimit that allows to create two or more entries, from myself. * Fix crash if the CT is used due to the recently added facilities to consult the dying and unconfirmed conntrack lists, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 14:28:17 -08:00
Sage Weil	0fa6ebc600	libceph: fix protocol feature mismatch failure path We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	122070a2ff	libceph: WARN, don't BUG on unexpected connection states A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	e6d50f67a6	libceph: always reset osds when kicking When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
Alex Elder	ab60b16d3c	libceph: move linger requests sooner in kick_requests() The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is as a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for completed linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request before it is registered as a linger request. 
This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and that will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-27 20:27:04 -06:00
Isaku Yamahata	ae782bb16c	ipv6/ip6_gre: set transport header correctly ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:19:56 -08:00
Isaku Yamahata	861aa6d56d	ipv4/ip_gre: set transport header correctly to gre header ipgre_tunnel_xmit() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. So set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:19:56 -08:00
Marciniszyn, Mike	a49675988c	IB/rds: suppress incompatible protocol when version is known Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:17:37 -08:00
Marciniszyn, Mike	f2e9bd7032	IB/rds: Correct ib_api use with sg_dma_address/sg_dma_len `0b088e00` ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address() respectively since some HCAs overload ib_sg_dma* operations. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:17:37 -08:00
Eric Dumazet	c3ae62af8e	tcp: should drop incoming frames without ACK flag set In commit `96e0bf4b51` (tcp: Discard segments that ack data not yet sent) John Dykstra enforced a check against ack sequences. In commit `354e4aa391` (tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation) I added more safety tests. But we missed the fact that these tests are not performed if ACK bit is not set. RFC 793 3.9 mandates TCP should drop a frame without ACK flag set. " fifth check the ACK field, if the ACK bit is off drop the segment and return" Not doing so permits an attacker to only guess an acceptable sequence number, evading stronger checks. Many thanks to Zhiyun Qian for bringing this issue to our attention. See : http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf Reported-by: Zhiyun Qian <zhiyunq@umich.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:08:55 -08:00
Akinobu Mita	143cdd8f33	batman-adv: fix random jitter calculation batadv_iv_ogm_emit_send_time() attempts to calculate a random integer in the range of 'orig_interval +- BATADV_JITTER' by the below lines. msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER; msecs += (random32() % 2 * BATADV_JITTER); But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER' because '%' and '*' have the same precedence and associativity is left-to-right. This adds the parentheses at the appropriate position so that it matches the original intention. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Cc: Marek Lindner <lindner_marek@yahoo.de> Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Cc: Antonio Quartulli <ordex@autistici.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 14:13:23 -08:00
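The precedence problem described in the batman-adv commit above can be reproduced outside the kernel. In this sketch the random value is passed in as a parameter so the result is deterministic; `JITTER` is an assumed stand-in for BATADV_JITTER (the value 20 is chosen for illustration), and `jitter_buggy` / `jitter_fixed` are hypothetical names.

```c
#include <stdint.h>

#define JITTER 20u /* assumed stand-in for BATADV_JITTER */

/* Original expression: parsed as (rnd % 2) * JITTER, so the result is
 * either interval - JITTER or interval, never anything in between. */
static uint32_t jitter_buggy(uint32_t interval, uint32_t rnd)
{
    uint32_t msecs = interval - JITTER;

    msecs += (rnd % 2 * JITTER);
    return msecs;
}

/* Parenthesized form: rnd % (2 * JITTER) spreads the result over
 * [interval - JITTER, interval + JITTER - 1]. */
static uint32_t jitter_fixed(uint32_t interval, uint32_t rnd)
{
    uint32_t msecs = interval - JITTER;

    msecs += rnd % (2 * JITTER);
    return msecs;
}
```

With an interval of 1000 and a random value of 7, the first form yields 1000 while the parenthesized form yields 987.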
Jesper Juhl	1310b955c8	netfilter: ctnetlink: fix leak in error path of ctnetlink_create_expect This patch fixes a leak in one of the error paths of ctnetlink_create_expect if no helper and no timeout is specified. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-26 23:02:09 +01:00
Vitaly E. Lavrov	32263dd1b4	netfilter: xt_hashlimit: fix namespace destroy path recent_net_exit() is called before recent_mt_destroy() in the destroy path of network namespaces. Make sure there are no entries in the parent proc entry xt_recent before removing it. Signed-off-by: Vitaly E. Lavrov <lve@guap.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-26 18:14:48 +01:00
Vitaly E. Lavrov	665e205c16	netfilter: xt_recent: fix namespace destroy path recent_net_exit() is called before recent_mt_destroy() in the destroy path of network namespaces. Make sure there are no entries in the parent proc entry xt_recent before removing it. Signed-off-by: Vitaly E. Lavrov <lve@guap.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-26 18:14:48 +01:00
Pablo Neira Ayuso	09181842b0	netfilter: xt_hashlimit: fix race that results in duplicated entries Two packets may race to create the same entry in the hashtable, double check if this packet lost race. This double checking only happens in the path of the packet that creates the hashtable for first time. Note that, with this patch, no packet drops occur if the race happens. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-26 18:14:44 +01:00

26314 Commits