linux

iv/linux

Author	SHA1	Message	Date
Pablo Neira Ayuso	1cdb09056b	netfilter: nfnetlink_queue: use xor hash function to distribute instances Thanks to Eric Dumazet for suggesting this during the NFWS. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 12:38:40 +01:00
Pablo Neira Ayuso	bae99f7a1d	netfilter: nfnetlink_queue: fix incorrect initialization of copy range field 2^16 = 0xffff, not 0xfffff (note the extra 'f'). Not dangerous since you adjust it to min_t(data_len, skb->len) just after on. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 12:35:49 +01:00
Gao feng	0d98da5d84	netfilter: nf_conntrack: register pernet subsystem before register L4 proto In (c296bb4 netfilter: nf_conntrack: refactor l4proto support for netns) the l4proto gre/dccp/udplite/sctp registration happened before the pernet subsystem, which is wrong. Register pernet subsystem before register L4proto since after register L4proto, init_conntrack may try to access the resources which allocated in register_pernet_subsys. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 12:29:25 +01:00
Gao feng	fa900b9cf5	netfilter: ebt_ulog: remove unnecessary spin lock protection No need for spinlock to protect the netlink skb in the ebt_ulog_fini path. We are sure there is noone using it at that stage. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 11:56:09 +01:00
Hannes Frederic Sowa	d00bd3d4fb	netfilter: nf_ct_ipv6: use ipv6_iface_scope_id in conntrack to return scope id As in (842df07 ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id). Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 11:48:43 +01:00
Silviu-Mihai Popescu	5eb358d029	bridge: netfilter: use PTR_RET instead of IS_ERR + PTR_ERR This uses PTR_RET instead of IS_ERR and PTR_ERR in order to increase readability. Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 11:03:56 +01:00
Silviu-Mihai Popescu	015ba03c1a	ipv4: netfilter: use PTR_RET instead of IS_ERR + PTR_ERR This uses PTR_RET instead of IS_ERR and PTR_ERR in order to increase readability. Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 11:02:14 +01:00
YOSHIFUJI Hideaki	2d2fd8c50a	netfilter: ip6t_NPT: Use csum_partial() [ Some fixes went into mainstream before this patch, so I needed to rebase it upon the current tree, that's why it's different from the original one posted on the list --pablo ] Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-15 11:02:04 +01:00
Vinicius Costa Gomes	eb20ff9c91	Bluetooth: Fix not closing SCO sockets in the BT_CONNECT2 state With deferred setup for SCO, it is possible that userspace closes the socket when it is in the BT_CONNECT2 state, after the Connect Request is received but before the Accept Synchonous Connection is sent. If this happens the following crash was observed, when the connection is terminated: [ +0.000003] hci_sync_conn_complete_evt: hci0 status 0x10 [ +0.000005] sco_connect_cfm: hcon ffff88003d1bd800 bdaddr 40:98:4e:32:d7:39 status 16 [ +0.000003] sco_conn_del: hcon ffff88003d1bd800 conn ffff88003cc8e300, err 110 [ +0.000015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000199 [ +0.000906] IP: [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] PGD 3d21f067 PUD 3d291067 PMD 0 [ +0.000000] Oops: 0002 [#1] SMP [ +0.000000] Modules linked in: rfcomm bnep btusb bluetooth [ +0.000000] CPU 0 [ +0.000000] Pid: 1481, comm: kworker/u:2H Not tainted 3.9.0-rc1-25019-gad82cdd #1 Bochs Bochs [ +0.000000] RIP: 0010:[<ffffffff810620dd>] [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] RSP: 0018:ffff88003c3c19d8 EFLAGS: 00010002 [ +0.000000] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000 [ +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003d1be868 [ +0.000000] RBP: ffff88003c3c1a98 R08: 0000000000000002 R09: 0000000000000000 [ +0.000000] R10: ffff88003d1be868 R11: ffff88003e20b000 R12: 0000000000000002 [ +0.000000] R13: ffff88003aaa8000 R14: 000000000000006e R15: ffff88003d1be850 [ +0.000000] FS: 0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000 [ +0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ +0.000000] CR2: 0000000000000199 CR3: 000000003c1cb000 CR4: 00000000000006b0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ +0.000000] Process kworker/u:2H (pid: 1481, threadinfo ffff88003c3c0000, task ffff88003aaa8000) [ +0.000000] Stack: [ +0.000000] ffffffff81b16342 0000000000000000 0000000000000000 ffff88003d1be868 [ +0.000000] ffffffff00000000 00018c0c7863e367 000000003c3c1a28 ffffffff8101efbd [ +0.000000] 0000000000000000 ffff88003e3d2400 ffff88003c3c1a38 ffffffff81007c7a [ +0.000000] Call Trace: [ +0.000000] [<ffffffff8101efbd>] ? kvm_clock_read+0x34/0x3b [ +0.000000] [<ffffffff81007c7a>] ? paravirt_sched_clock+0x9/0xd [ +0.000000] [<ffffffff81007fd4>] ? sched_clock+0x9/0xb [ +0.000000] [<ffffffff8104fd7a>] ? sched_clock_local+0x12/0x75 [ +0.000000] [<ffffffff810632d1>] lock_acquire+0x93/0xb1 [ +0.000000] [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffff8105f3d8>] ? lock_release_holdtime.part.22+0x4e/0x55 [ +0.000000] [<ffffffff814f6038>] _raw_spin_lock+0x40/0x74 [ +0.000000] [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffff814f6936>] ? _raw_spin_unlock+0x23/0x36 [ +0.000000] [<ffffffffa0022339>] spin_lock+0x9/0xb [bluetooth] [ +0.000000] [<ffffffffa00230cc>] sco_conn_del+0x76/0xbb [bluetooth] [ +0.000000] [<ffffffffa002391d>] sco_connect_cfm+0x2da/0x2e9 [bluetooth] [ +0.000000] [<ffffffffa000862a>] hci_proto_connect_cfm+0x38/0x65 [bluetooth] [ +0.000000] [<ffffffffa0008d30>] hci_sync_conn_complete_evt.isra.79+0x11a/0x13e [bluetooth] [ +0.000000] [<ffffffffa000cd96>] hci_event_packet+0x153b/0x239d [bluetooth] [ +0.000000] [<ffffffff814f68ff>] ? _raw_spin_unlock_irqrestore+0x48/0x5c [ +0.000000] [<ffffffffa00025f6>] hci_rx_work+0xf3/0x2e3 [bluetooth] [ +0.000000] [<ffffffff8103efed>] process_one_work+0x1dc/0x30b [ +0.000000] [<ffffffff8103ef83>] ? process_one_work+0x172/0x30b [ +0.000000] [<ffffffff8103e07f>] ? spin_lock_irq+0x9/0xb [ +0.000000] [<ffffffff8103fc8d>] worker_thread+0x123/0x1d2 [ +0.000000] [<ffffffff8103fb6a>] ? manage_workers+0x240/0x240 [ +0.000000] [<ffffffff81044211>] kthread+0x9d/0xa5 [ +0.000000] [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60 [ +0.000000] [<ffffffff814f75bc>] ret_from_fork+0x7c/0xb0 [ +0.000000] [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60 [ +0.000000] Code: d7 44 89 8d 50 ff ff ff 4c 89 95 58 ff ff ff e8 44 fc ff ff 44 8b 8d 50 ff ff ff 48 85 c0 4c 8b 95 58 ff ff ff 0f 84 7a 04 00 00 <f0> ff 80 98 01 00 00 83 3d 25 41 a7 00 00 45 8b b5 e8 05 00 00 [ +0.000000] RIP [<ffffffff810620dd>] __lock_acquire+0xed/0xe82 [ +0.000000] RSP <ffff88003c3c19d8> [ +0.000000] CR2: 0000000000000199 [ +0.000000] ---[ end trace e73cd3b52352dd34 ]--- Cc: stable@vger.kernel.org [3.8] Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Tested-by: Frederic Dalleau <frederic.dalleau@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-03-14 13:14:21 -03:00
Eric Dumazet	16fad69cfe	tcp: fix skb_availroom() Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack : https://code.google.com/p/chromium/issues/detail?id=182056 commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx path) did a poor choice adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit 22b4a4f22da (tcp: fix retransmit of partially acked frames) and this commit. Crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-14 11:49:45 -04:00
Linus Torvalds	aea8b5d1e5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace bugfixes from Eric Biederman: "This tree includes a partial revert for "fs: Limit sys_mount to only request filesystem modules." When I added the new style module aliases to the filesystems I deleted the old ones. A bad move. It turns out that distributions like Arch linux use module aliases when constructing ramdisks. Which meant ultimately that an ext3 filesystem mounted with ext4 would not result in the ext4 module being put into the ramdisk. The other change in this tree adds a handful of filesystem module alias I simply failed to add the first time. Which inconvinienced a few folks using cifs. I don't want to inconvinience folks any longer than I have to so here are these trivial fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fs: Readd the fs module aliases. fs: Limit sys_mount to only request filesystem modules. (Part 3)	2013-03-13 15:47:50 -07:00
Martin Hundebøll	2df5278b02	batman-adv: network coding - receive coded packets and decode them When receiving a network coded packet, the decoding buffer is searched for a packet to use for decoding. The source, destination, and crc32 from the coded packet is used to identify the wanted packet. The decoded packet is passed to the usual unicast receiver function, as had it never been network coded. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:51 +01:00
Martin Hundebøll	612d2b4fe0	batman-adv: network coding - save overheard and tx packets for decoding To be able to decode a network coded packet, a node must already know one of the two coded packets. This is done by buffering skbs before transmission and buffering packets sniffed with promiscuous mode from other hosts. Packets are kept in a buffer similar to the one with forward-skbs: A hash table, where each entry, which corresponds to a src-dst pair, has a linked list packets. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:50 +01:00
Martin Hundebøll	3c12de9a5c	batman-adv: network coding - code and transmit packets if possible Before adding forward-skbs to the coding buffer, the buffer is searched for a potential coding opportunity. If one is found, the two packets are network coded and transmitted right away. If not, the forward-skb is added to the buffer. Network coded packets are transmitted with information about the two receivers and the two coded packets. The first receiver is given by the MAC header, while the second is given in the payload/bat-header. The second receiver uses promiscuous mode to receive the packet and check the second destination. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:50 +01:00
Martin Hundebøll	953324776d	batman-adv: network coding - buffer unicast packets before forward Two be able to network code two packets, one packet must be buffered until the next is available. This is done in a "coding buffer", which is essentially a hash table with lists of packets. Each entry in the hash table corresponds to a specific src-dst pair, which has a linked list of packets that are buffered. This patch adds skbs to the buffer just before forwarding them. The buffer is traversed every 10 ms, where timed skbs are removed from the buffer and transmitted. To allow experiments with the network coding scheme, the timeout is tunable through a file in debugfs. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:49 +01:00
Martin Hundebøll	d56b1705e2	batman-adv: network coding - detect coding nodes and remove these after timeout To use network coding efficiently, a relay must know when neighbor nodes are likely to have enough information to be able to decode a network coded packet. This is detected by using OGMs from batman-adv to discover when one neighbor is in range of another neighbor. The relay check the TLL to detect when an OGM is forwarded from one neighbor by another neighbor, and thereby knows that the two neighbors are in range and thus overhear packets sent by each other. This information is saved in the orig_node struct to be used when searching for coding opportunities. Two lists are added to the orig_node struct: One for neighbors that can hear the orig_node (outgoing nc_nodes) and one for neighbors that the orig_node can hear (incoming nc_nodes). Information about nc_nodes is kept for 10 seconds and is available through debugfs in batman_adv/nc_nodes to use when debugging network coding. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:49 +01:00
Martin Hundebøll	d353d8d4d9	batman-adv: network coding - add the initial infrastructure code Network coding exploits the 802.11 shared medium to allow multiple packets to be sent in a single transmission. In brief, a relay can XOR two packets, and send the coded packet to two destinations. The receivers can decode one of the original packets by XOR'ing the coded packet with the other original packet. This will lead to increased throughput in topologies where two packets cross one relay. In a simple topology with three nodes, it takes four transmissions without network coding to get one packet from Node A to Node B and one from Node B to Node A: 1. Node A ---- p1 ---> Node R Node B 2. Node A Node R <--- p2 ---- Node B 3. Node A <--- p2 ---- Node R Node B 4. Node A Node R ---- p1 ---> Node B With network coding, the relay only needs one transmission, which saves us one slot of valuable airtime: 1. Node A ---- p1 ---> Node R Node B 2. Node A Node R <--- p2 ---- Node B 3. Node A <- p1 x p2 - Node R - p1 x p2 -> Node B The same principle holds for a topology including five nodes. Here the packets from Node A and Node B are overheard by Node C and Node D, respectively. This allows Node R to send a network coded packet to save one transmission: Node A Node B \| \ / \| \| p1 p2 \| \| \ / \| p1 > Node R < p2 \| \| \| / \ \| \| p1 x p2 p1 x p2 \| v / \ v / \ Node C < > Node D More information is available on the open-mesh.org wiki[1]. This patch adds the initial code to support network coding in batman-adv. It sets up a worker thread to do house keeping and adds a sysfs file to enable/disable network coding. The feature is disabled by default, as it requires a wifi-driver with working promiscuous mode, and also because it adds a small delay at each hop. [1] http://www.open-mesh.org/projects/batman-adv/wiki/Catwoman Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:48 +01:00
Antonio Quartulli	c1d07431b9	batman-adv: don't use !! in bool conversion In C standard any expression different from 0 will be converted to 'true' when casting to bool (whatever is the length of the value). Therefore all the "!!" conversions can be removed. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2013-03-13 22:53:48 +01:00
Martin Hundebøll	f86ce0ad10	batman-adv: Return reason for failure in batadv_check_unicast_packet() batadv_check_unicast_packet() is changed to return a value based on the reason to drop the packet, which will be useful information for future users of batadv_check_unicast_packet(). Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:47 +01:00
Marek Lindner	736292c2e8	batman-adv: replace redundant primary_if_get calls The batadv_priv struct carries a pointer to its own interface struct. Therefore, it is not necessary to retrieve the soft_iface via the primary interface. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-13 22:53:47 +01:00
Xufeng Zhang	2317f449af	sctp: don't break the loop while meeting the active_path so as to find the matched transport sctp_assoc_lookup_tsn() function searchs which transport a certain TSN was sent on, if not found in the active_path transport, then go search all the other transports in the peer's transport_addr_list, however, we should continue to the next entry rather than break the loop when meet the active_path transport. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-13 10:09:55 -04:00
Vlad Yasevich	f281563350	sctp: Use correct sideffect command in duplicate cookie handling When SCTP is done processing a duplicate cookie chunk, it tries to delete a newly created association. For that, it has to set the right association for the side-effect processing to work. However, when it uses the SCTP_CMD_NEW_ASOC command, that performs more work then really needed (like hashing the associationa and assigning it an id) and there is no point to do that only to delete the association as a next step. In fact, it also creates an impossible condition where an association may be found by the getsockopt() call, and that association is empty. This causes a crash in some sctp getsockopts. The solution is rather simple. We simply use SCTP_CMD_SET_ASOC command that doesn't have all the overhead and does exactly what we need. Reported-by: Karl Heiss <kheiss@gmail.com> Tested-by: Karl Heiss <kheiss@gmail.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-13 09:59:21 -04:00
Eric W. Biederman	fa7614ddd6	fs: Readd the fs module aliases. I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexising aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be delete these aliases without problems. However that day is not today. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-12 18:55:21 -07:00
Linus Torvalds	368edaadc0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This fixes a bug in the new message decoding that just went in during the last window." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix decoding of pgids	2013-03-12 09:22:42 -07:00
Linus Torvalds	5b22b1848b	Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfixes from Bruce Fields: "Some minor fallout from the user-namespace work broke most krb5 mounts to nfsd, and I screwed up a change to the AF_LOCAL rpc code." * 'for-3.9' of git://linux-nfs.org/~bfields/linux: sunrpc: don't attempt to cancel unitialized work nfsd: fix krb5 handling of anonymous principals	2013-03-12 09:20:58 -07:00
Li RongQing	c80a8512ee	net/core: move vlan_depth out of while loop in skb_network_protocol() [ Bug added added in commit 05e8ef4ab2d8087d (net: factor out skb_mac_gso_segment() from skb_gso_segment() ) ] move vlan_depth out of while loop, or else vlan_depth always is ETH_HLEN, can not be increased, and lead to infinite loop when frame has two vlan headers. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 11:47:40 -04:00
Nandita Dukkipati	9b717a8d24	tcp: TLP loss detection. This is the second of the TLP patch series; it augments the basic TLP algorithm with a loss detection scheme. This patch implements a mechanism for loss detection when a Tail loss probe retransmission plugs a hole thereby masking packet loss from the sender. The loss detection algorithm relies on counting TLP dupacks as outlined in Sec. 3 of: http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 The basic idea is: Sender keeps track of TLP "episode" upon retransmission of a TLP packet. An episode ends when the sender receives an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the episode. We want to make sure that before the episode ends the sender receives a "TLP dupack", indicating that the TLP retransmission was unnecessary, so there was no loss/hole that needed plugging. If the sender gets no TLP dupack before the end of the episode, then it reduces ssthresh and the congestion window, because the TLP packet arriving at the receiver probably plugged a hole. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:30:34 -04:00
Nandita Dukkipati	6ba8a3b19e	tcp: Tail loss probe (TLP) This patch series implement the Tail loss probe (TLP) algorithm described in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The first patch implements the basic algorithm. TLP's goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occuring due to tail losses (losses at end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly RTO. In the absence of loss, there is no change in the connection state. PTO stands for probe timeout. It is a timer event indicating that an ACK is overdue and triggers a loss probe packet. The PTO value is set to max(2SRTT, 10ms) and is adjusted to account for delayed ACK timer when there is only one oustanding packet. TLP Algorithm On transmission of new data in Open state: -> packets_out > 1: schedule PTO in max(2SRTT, 10ms). -> packets_out == 1: schedule PTO in max(2RTT, 1.5RTT + 200ms) -> PTO = min(PTO, RTO) Conditions for scheduling PTO: -> Connection is in Open state. -> Connection is either cwnd limited or no new data to send. -> Number of probes per tail loss episode is limited to one. -> Connection is SACK enabled. When PTO fires: new_segment_exists: -> transmit new segment. -> packets_out++. cwnd remains same. no_new_packet: -> retransmit the last segment. Its ACK triggers FACK or early retransmit based recovery. ACK path: -> rearm RTO at start of ACK processing. -> reschedule PTO if need be. In addition, the patch includes a small variation to the Early Retransmit (ER) algorithm, such that ER and TLP together can in principle recover any N-degree of tail loss through fast recovery. TLP is controlled by the same sysctl as ER, tcp_early_retrans sysctl. tcp_early_retrans==0; disables TLP and ER. ==1; enables RFC5827 ER. ==2; delayed ER. ==3; TLP and delayed ER. [DEFAULT] ==4; TLP only. The TLP patch series have been extensively tested on Google Web servers. It is most effective for short Web trasactions, where it reduced RTOs by 15% and improved HTTP response time (average by 6%, 99th percentile by 10%). The transmitted probes account for <0.5% of the overall transmissions. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:30:34 -04:00
Wei Yongjun	74694e7bd0	bridge: using for_each_set_bit to simplify the code Using for_each_set_bit() to simplify the code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:04:09 -04:00
Wei Yongjun	5096e3c4b2	bridge: using for_each_set_bit_from to simplify the code Using for_each_set_bit_from() to simplify the code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:04:08 -04:00
David S. Miller	e5f2ef7ab4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/e1000e/netdev.c Minor conflict in e1000e, a line that got fixed in 'net' has been removed in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 05:52:22 -04:00
stephen hemminger	3da889b616	bridge: reserve space for IFLA_BRPORT_FAST_LEAVE The bridge multicast fast leave feature was added sufficient space was not reserved in the netlink message. This means the flag may be lost in netlink events and results of queries. Found by observation while looking up some netlink stuff for discussion with Vlad. Problem introduced by commit c2d3babfafbb9f6629cfb47139758e59a5eb0d80 Author: David S. Miller <davem@davemloft.net> Date: Wed Dec 5 16:24:45 2012 -0500 bridge: implement multicast fast leave Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 05:38:29 -04:00
David S. Miller	2230e0c193	Included changes ares: - fix packet parsing routine to avoid to read beyond the packet boundary -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABCAAGBQJRPlQRAAoJEADl0hg6qKeOP44QAI20iYAaMmA40lrsELAAJMVJ G/hHyKYcup0nxrXnlj9yOft6bYx0232/TjhGKGQ7eYcl+ri+Wyu96kC1hJG9rr/Z +WCU4CimTY5MRVzFKwNriaiqyAsW2cw2T1k1KfZD9Wb9t6hEdvd8f+4DbXYrYxHG nSZQKKDD0cxs1ARScOEGbf7KF8sw6RcGWj0m4xM00Wo/fai+CZZX/HLcUnHQrQxx 4w9safvaIVuQV3mANTpSoerfkraNzaX14i2ZU5SGi2/mhR9PC4JyGz5FIge+fuvp rP/E40GdCYpcuDL7UAyd+IBaOoiP6llDUJA/LqbZLyEZgkMtt8rgQwBsmcYDtiTt zmqCgwjp2/mTs44LfuxtxvLcIDRsQh52I0ceZaAzflG3m9t5eYs6L7oyEEUtOSCm wwY+RmBdMrArr8dohkxopjxAJtCLuHxC8e9AfXwzqt8FYZIQG/oayBrgtEoxCgzf PnJWX0uw4m6WisvMN5Ko8bNeacVRyceTqTOpIWxbdF0wku2evCbkxkK6PfvRDAca UKyrLfbDH59OObhq3fEov7wiNjLJo92bV6dLqTGgQp/GQBTswttb+9WwjT6+PE8b dKRM1eKlCBDMT5q4tGSzKvoH6cfC+h7GIPmpNDdcG+ByJI4bLyzDxLs6ZzYoyYP9 plB7HO/1r1pnOE/vtqXE =X3VY -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included changes ares: - fix packet parsing routine to avoid to read beyond the packet boundary Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 05:36:52 -04:00
David Ward	4660c7f498	net/ipv4: Ensure that location of timestamp option is stored This is needed in order to detect if the timestamp option appears more than once in a packet, to remove the option if the packet is fragmented, etc. My previous change neglected to store the option location when the router addresses were prespecified and Pointer > Length. But now the option location is also stored when Flag is an unrecognized value, to ensure these option handling behaviors are still performed. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 05:35:39 -04:00
Michael Dalton	e1733de224	flow_dissector: support L2 GRE Add support for L2 GRE tunnels, so that RPS can be more effective. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 05:14:06 -04:00
Marek Lindner	b47506d912	batman-adv: verify tt len does not exceed packet len batadv_iv_ogm_process() accesses the packet using the tt_num_changes attribute regardless of the real packet len (assuming the length check was done before). Therefore a length check is needed to avoid reading random memory. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-03-11 22:59:47 +01:00
Sage Weil	d6c0dd6b0c	libceph: fix decoding of pgids In 4f6a7e5ee1393ec4b243b39dac9f36992d161540 we effectively dropped support for the legacy encoding for the OSDMap and incremental. However, we didn't fix the decoding for the pgid. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2013-03-11 14:31:00 -07:00
Linus Torvalds	0cb7750825	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Missing cancel of work items in mac80211 MLME, from Ben Greear. 2) Fix DMA mapping handling in iwlwifi by using coherent DMA for command headers, from Johannes Berg. 3) Decrease the amount of pressure on the page allocator by using order 1 pages less in iwlwifi, from Emmanuel Grumbach. 4) Fix mesh PS broadcast OOPS in mac80211, from Marco Porsch. 5) Don't forget to recalculate idle state in mac80211 monitor interface, from Felix Fietkau. 6) Fix varargs in netfilter conntrack handler, from Joe Perches. 7) Need to reset entire chip when command queue fills up in iwlwifi, from Emmanuel Grumbach. 8) The TX antenna value must be valid when calibrations are performed in iwlwifi, fix from Dor Shaish. 9) Don't generate netfilter audit log entries when audit is disabled, from Gao Feng. 10) Deal with DMA unit hang on e1000e during power state transitions, from Bruce Allan. 11) Remove BUILD_BUG_ON check from igb driver, from Alexander Duyck. 12) Fix lockdep warning on i2c handling of igb driver, from Carolyn Wyborny. 13) Fix several TTY handling issues in IRDA ircomm tty driver, from Peter Hurley. 14) Several QFQ packet scheduler fixes from Paolo Valente. 15) When VXLAN encapsulates on transmit, we have to reset the netfilter state. From Zang MingJie. 16) Fix jiffie check in net_rx_action() so that we really cap the processing at 2HZ. From Eric Dumazet. 17) Fix erroneous trigger of IP option space exhaustion, when routers are pre-specified and we are looking to see if we can insert a timestamp, we will have the space. From David Ward. 18) Fix various issues in benet driver wrt waiting for firmware to finish POST after resets or errors. From Gavin Shan and Sathya Perla. 19) Fix TX locking in SFC driver, from Ben Hutchings. 20) Like the VXLAN fix above, when we encap in a TUN device we have to reset the netfilter state. This should fix several strange crashes reported by Dave Jones and others. From Eric Dumazet. 21) Don't forget to clean up MAC address resources when shutting down a port in mlx4 driver, from Yan Burman. 22) Fix divide by zero in vmxnet3 driver, from Bhavesh Davda. 23) Fix device statistic regression in tg3 when the driver is using phylib, from Nithin Sujir. 24) Fix info leak in several netlink handlers, from Mathias Krause. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) 6lowpan: Fix endianness issue in is_addr_link_local(). rrunner.c: fix possible memory leak in rr_init_one() dcbnl: fix various netlink info leaks rtnl: fix info leak on RTM_GETLINK request for VF devices bridge: fix mdb info leaks tg3: Update link_up flag for phylib devices ipv6: stop multicast forwarding to process interface scoped addresses bridging: fix rx_handlers return code netlabel: fix build problems when CONFIG_IPV6=n drivers/isdn: checkng length to be sure not memory overflow net/rds: zero last byte for strncpy bnx2x: Fix SFP+ misconfiguration in iSCSI boot scenario bnx2x: Fix intermittent long KR2 link up time macvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode. team: unsyc the devices addresses when port is removed bridge: add missing vid to br_mdb_get() Fix: sparse warning in inet_csk_prepare_forced_close afkey: fix a typo MAINTAINERS: Update qlcnic maintainers list netlabel: correctly list all the static label mappings ...	2013-03-11 07:51:59 -07:00
Valentin Ilie	f4f3efdaf9	net: can: af_can.c: Fix checkpatch warnings Replace printk(KERN_ERR with pr_err Add space before { Removed OOM messages Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-11 07:40:17 -04:00
Thierry Escande	40213fa851	NFC: llcp: Add cleanup support for unreplied SNL requests If the remote LLC doesn't reply in time to our SNL requests we remove them from the list of pending requests. The timeout is fixed to an arbitrary value of 3 times remote_lto. When not replied, the local LLC broadcasts NFC_EVENT_LLC_SDRES nl events for the concerned uris with sap values set to LLCP_SDP_UNBOUND (which is 65). Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 23:16:41 +01:00
Thierry Escande	d9b8d8e19b	NFC: llcp: Service Name Lookup netlink interface This adds a netlink interface for service name lookup support. Multiple URIs can be passed nested into the NFC_ATTR_LLC_SDP attribute using the NFC_CMD_LLC_SDREQ netlink command. When the SNL reply is received, a NFC_EVENT_LLC_SDRES event is sent to the user space. URI and SAP tuples are passed back, nested into NFC_ATTR_LLC_SDP attribute. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 23:14:54 +01:00
Thierry Escande	e0ae7bac06	NFC: llcp: Service Name Lookup SDRES aggregation This modifies the way SDRES PDUs are sent back. If multiple SDREQs are received within a single SNL PDU, all SDRES replies are sent packed in one SNL PDU too. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 23:10:55 +01:00
Thierry Escande	8af362d124	NFC: Add missing type policies for netlink attributes Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 22:20:05 +01:00
Samuel Ortiz	8808edb1ec	NFC: llcp: Remove redundant printk We already have a pr_debug for that. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 22:20:05 +01:00
Samuel Ortiz	06d44f806a	NFC: llcp: Use socket specific link parameters before the local ones If the socket link options are set, use them before the local one. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 22:20:05 +01:00
Samuel Ortiz	26fd76cab2	NFC: llcp: Implement socket options Some LLCP services (e.g. the validation ones) require some control over the LLCP link parameters like the receive window (RW) or the MIU extension (MIUX). This can only be done through socket options. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 22:20:05 +01:00
Samuel Ortiz	e4306bec47	NFC: llcp: Rename socket rw and miu fields They really are remote peer parameters, and we need to distinguish them from the local ones as we'll modify the latter with socket options. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2013-03-10 22:20:05 +01:00
Cong Wang	e8f72ea4a1	ipv6: introduce ip6tunnel_xmit() helper Similar to iptunnel_xmit(), group these operations into a helper function. This by the way fixes the missing u64_stats_update_begin() and u64_stats_update_end() for 32 bit arch. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-10 16:53:34 -04:00
YOSHIFUJI Hideaki / 吉藤英明	9026c49272	6lowpan: Fix endianness issue in is_addr_link_local(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-10 16:49:35 -04:00
Mathias Krause	22c352195e	ipv6: remove superfluous nla_data() NULL pointer checks nla_data() cannot return NULL, so these NULL pointer checks are superfluous. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-10 16:46:09 -04:00

... 2 3 4 5 6 ...

27433 Commits