63131 Commits

Author SHA1 Message Date
Felix Fietkau
a7022e65c6 mac80211: add helper functions for tracking P2P NoA state
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-19 13:37:46 +01:00
Nicholas Bellinger
95cadace8f target/file: Update hw_max_sectors based on current block_size
This patch allows FILEIO to update hw_max_sectors based on the current
max_bytes_per_io.  This is required because vfs_[writev,readv]() can accept
a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really
needs to be calculated based on block_size.

This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M
sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for
the block_size=4096 case.

(v2: Use max_bytes_per_io instead of ->update_hw_max_sectors)

Reported-by: Henrik Goldman <hg@x-formation.com>
Cc: <stable@vger.kernel.org> #3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19 00:18:54 -08:00
Francesco Fusco
237217546d lib: hash: follow-up fixups for arch hash
This patch adds the include file to pull in __read_mostly on some
architectures e.g. ppc and also fixes up signatures in generic
asm.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 00:14:53 -05:00
Linus Torvalds
86fbf1617a Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
 "23 fixes and a MAINTAINERS update"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  mm/hugetlb: check for pte NULL pointer in __page_check_address()
  fix build with make 3.80
  mm/mempolicy: fix !vma in new_vma_page()
  MAINTAINERS: add Davidlohr as GPT maintainer
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  mm/compaction: respect ignore_skip_hint in update_pageblock_skip
  mm/mempolicy: correct putback method for isolate pages if failed
  mm: add missing dependency in Kconfig
  sh: always link in helper functions extracted from libgcc
  mm: page_alloc: exclude unreclaimable allocations from zone fairness policy
  mm: numa: defer TLB flush for THP migration as long as possible
  mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
  mm: fix TLB flush race between migration, and change_protection_range
  mm: numa: avoid unnecessary disruption of NUMA hinting during migration
  mm: numa: clear numa hinting information on mprotect
  sched: numa: skip inaccessible VMAs
  mm: numa: avoid unnecessary work on the failure path
  mm: numa: ensure anon_vma is locked to prevent parallel THP splits
  mm: numa: do not clear PTE for pte_numa update
  mm: numa: do not clear PMD during PTE update scan
  ...
2013-12-18 19:05:00 -08:00
Mel Gorman
af2c1401e6 mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending
can be visible after a page table update.  As per revised documentation,
this patch adds a smp_mb__before_spinlock to guarantee the correct
ordering.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:51 -08:00
Rik van Riel
2084140594 mm: fix TLB flush race between migration, and change_protection_range
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:51 -08:00
Mel Gorman
de466bd628 mm: numa: avoid unnecessary disruption of NUMA hinting during migration
do_huge_pmd_numa_page() handles the case where there is parallel THP
migration.  However, by the time it is checked the NUMA hinting
information has already been disrupted.  This patch adds an earlier
check with some helpers.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:51 -08:00
Vivek Goyal
c97102ba96 kexec: migrate to reboot cpu
Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
kernel") moved reboot= handling to generic code.  In the process it also
removed the code in native_machine_shutdown() which are moving reboot
process to reboot_cpu/cpu0.

I guess that thought must have been that all reboot paths are calling
migrate_to_reboot_cpu(), so we don't need this special handling.  But
kexec reboot path (kernel_kexec()) is not calling
migrate_to_reboot_cpu() so above change broke kexec.  Now reboot can
happen on non-boot cpu and when INIT is sent in second kerneo to bring
up BP, it brings down the machine.

So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid
this problem.

Bisected by WANG Chao.

Reported-by: Matthew Whitehead <mwhitehe@redhat.com>
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:50 -08:00
Timo Teräs
0e3da5bb8d ip_gre: fix msg_name parsing for recvfrom/recvmsg
ipgre_header_parse() needs to parse the tunnel's ip header and it
uses mac_header to locate the iphdr. This got broken when gre tunneling
was refactored as mac_header is no longer updated to point to iphdr.
Introduce skb_pop_mac_header() helper to do the mac_header assignment
and use it in ipgre_rcv() to fix msg_name parsing.

Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.)

Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 17:44:33 -05:00
Hannes Frederic Sowa
93b36cf342 ipv6: support IPV6_PMTU_INTERFACE on sockets
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.

The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 17:37:05 -05:00
Hannes Frederic Sowa
974eda11c5 inet: make no_pmtu_disc per namespace and kill ipv4_config
The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:58:20 -05:00
David S. Miller
143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
WANG Cong
3627287463 net_sched: convert tcf_proto_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:08 -05:00
WANG Cong
1f747c26c4 net_sched: convert tc_action_ops to use struct list_head
We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
89819dc01f net_sched: convert tcf_hashinfo to hlist and use spinlock
So that we don't need to play with singly linked list,
and since the code is not on hot path, we can use spinlock
instead of rwlock.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
369ba56787 net_sched: init struct tcf_hashinfo at register time
It looks weird to store the lock out of the struct but
still points to a static variable. Just move them into the struct.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
5da57f422d net_sched: cls: refactor out struct tcf_ext_map
These information can be saved in tcf_exts, and this will
simplify the code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
33be627159 net_sched: act: use standard struct list_head
Currently actions are chained by a singly linked list,
therefore it is a bit hard to add and remove a specific
entry. Convert it to struct list_head so that in the
latter patch we can remove an action without finding
its head.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
WANG Cong
d84231d3a2 net_sched: remove get_stats from tc_action_ops
It is not used.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 12:52:07 -05:00
Atzm Watanabe
a0cdfcf393 packet: deliver VLAN TPID to userspace
This enables userspace to get VLAN TPID as well as the VLAN TCI.

Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 00:36:16 -05:00
Atzm Watanabe
e4d26f4b08 packet: fill the gap of TPACKET_ALIGNMENT with zeros
struct tpacket{2,3}_hdr is aligned to a multiple of TPACKET_ALIGNMENT.
Explicitly defining and zeroing the gap of this makes additional changes
easier.

Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 00:36:16 -05:00
John Fastabend
85328240c6 net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock held
It is useful to be able to walk all upper devices when bringing
a device online where the RTNL lock is held. In this case it
is safe to walk the all_adj_list because the RTNL lock is used
to protect the write side as well.

This patch adds a check to see if the rtnl lock is held before
throwing a warning in netdev_all_upper_get_next_dev_rcu().

Also because we now have a call site for lockdep_rtnl_is_held()
outside COFIG_LOCK_PROVING an inline definition returning 1 is
needed. Similar to the rcu_read_lock_is_held().

Fixes: 2a47fa45d4df ("ixgbe: enable l2 forwarding acceleration for macvlans")
CC: Veaceslav Falico <vfalico@redhat.com>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-12-17 21:19:08 -08:00
Linus Torvalds
35eecf0522 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Definitely seems quieter this week,

  Radeon, intel, intel broadwell, vmwgfx, ttm, armada, and a couple of
  core fixes, one revert in radeon

  Most of these are either going to stable or fixes for things
  introduced in the merge window"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (30 commits)
  drm/edid: add quirk for BPC in Samsung NP700G7A-S01PL notebook
  drm/ttm: Fix accesses through vmas with only partial coverage
  drm/nouveau: only runtime suspend by default in optimus configuration
  drm: don't double-free on driver load error
  Revert "drm/radeon: Implement radeon_pci_shutdown"
  drm/radeon: add missing display tiling setup for oland
  drm/radeon: fix typo in cik_copy_dma
  drm/radeon/cik: plug in missing blit callback
  drm/radeon/dpm: Fix hwmon crash
  drm/radeon: Fix sideport problems on certain RS690 boards
  drm/i915: don't update the dri1 breadcrumb with modesetting
  DRM: Armada: prime refcounting bug fix
  DRM: Armada: fix printing of phys_addr_t/dma_addr_t
  DRM: Armada: destroy framebuffer after helper
  DRM: Armada: implement lastclose() for fbhelper
  drm/i915: Repeat eviction search after idling the GPU
  drm/vmwgfx: Add max surface memory param
  drm/i915: Fix use-after-free in do_switch
  drm/i915: fix pm init ordering
  drm/i915: Hold mutex across i915_gem_release
  ...
2013-12-17 16:59:59 -08:00
Tom Herbert
3df7a74e79 net: Add utility function to copy skb hash
Adds skb_copy_hash to copy rxhash and l4_rxhash from one skb to another.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:22 -05:00
Tom Herbert
09323cc479 net: Add function to set the rxhash
The function skb_set_rxash was added for drivers to call to set
the rxhash in an skb. The type of hash is also specified as
a parameter (L2, L3, L4, or unknown type).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:22 -05:00
Tom Herbert
7539fadcb8 net: Add utility functions to clear rxhash
In several places 'skb->rxhash = 0' is being done to clear the
rxhash value in an skb.  This does not clear l4_rxhash which could
still be set so that the rxhash wouldn't be recalculated on subsequent
call to skb_get_rxhash.  This patch adds an explict function to clear
all the rxhash related information in the skb properly.

skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
l4_rxhash.

Fixed up places where 'skb->rxhash = 0' was being called.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:21 -05:00
Tom Herbert
3958afa1b2 net: Change skb_get_rxhash to skb_get_hash
Changing name of function as part of making the hash in skbuff to be
generic property, not just for receive path.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:21 -05:00
sfeldma@cumulusnetworks.com
d8838de70a bonding: add resend_igmp attribute netlink support
Add IFLA_BOND_RESEND_IGMP to allow get/set of bonding parameter
resend_igmp via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:08:45 -05:00
sfeldma@cumulusnetworks.com
f70161c672 bonding: add xmit_hash_policy attribute netlink support
Add IFLA_BOND_XMIT_HASH_POLICY to allow get/set of bonding parameter
xmit_hash_policy via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:08:45 -05:00
sfeldma@cumulusnetworks.com
89901972de bonding: add fail_over_mac attribute netlink support
Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter
fail_over_mac via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:08:45 -05:00
sfeldma@cumulusnetworks.com
8a41ae4496 bonding: add primary_select attribute netlink support
Add IFLA_BOND_PRIMARY_SELECT to allow get/set of bonding parameter
primary_select via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:08:45 -05:00
sfeldma@cumulusnetworks.com
0a98a0d12c bonding: add primary attribute netlink support
Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter
primary via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:08:45 -05:00
Linus Torvalds
dd0508093b Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Three fixes for scheduler crashes, each triggers in relatively rare,
  hardware environment dependent situations"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Rework sched_fair time accounting
  math64: Add mul_u64_u32_shr()
  sched: Remove PREEMPT_NEED_RESCHED from generic code
  sched: Initialize power_orig for overlapping groups
2013-12-17 12:35:54 -08:00
Eric Dumazet
d4589926d7 tcp: refine TSO splits
While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set,
or if no small packet is in flight.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>

We can avoid this split by including the Nagle tests at the right place.

Note : If some NIC had trouble sending TSO packets with a partial
last segment, we would have hit the problem in GRO/forwarding workload already.

tcp_minshall_update() is moved to tcp_output.c and is updated as we might
feed a TSO packet with a partial last segment.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
scheduler.

Many thanks to Neal for review and ideas.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:15:25 -05:00
stephen hemminger
477bb93320 net: remove dead code for add/del multiple
These function to manipulate multiple addresses are not used anywhere
in current net-next tree. Some out of tree code maybe using these but
too bad; they should submit their code upstream..

Also, make __hw_addr_flush local since only used by dev_addr_lists.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:14:04 -05:00
David S. Miller
6ea09d8a09 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates for the 3.14 stream...

For the Bluetooth bits, Gustavo says:

"This is the first batch of patches intended for 3.14. There is
nothing big here.  Most of the code are refactors, clean up, small
fixes, plus some new device id support."

And...

"More patches to 3.14. Here we have the support for Low Energy
Connection Oriented Channels (LE CoC). Basically, as the name says,
this adds supports for connection oriented channels in the same way
we already have them for BR/EDR connections so profiles/protocols
that work on top of BR/EDR can now work on LE plus a plenty of new
possibilities for LE."

For the ath10k bits, Kalle says:

"Janusz and Marek implemented DFS support to ath10k, but the code is
not enabled yet due to missing cfg80211/mac80211 patches (it will be
enabled in the next pull request). Michal did some device reset fixes
and made it possible for ath10k to share an interrupt with another
device. And lots of smaller fixes from different people."

For the iwlwifi bits, Emmanuel says:

"I have here a big rework of the rate control by Eyal. This is obviously
the biggest part of this batch.
I also have enhancement of protection flags by Avri and a few bits for
WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few
fixes. I provided a few things for Bluetooth coexistence.
Besides this we have an implementation for low priority scan."

Along with all that, there are big batches of updates to mwifiex and
ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other
bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:08:17 -05:00
Sebastian Hesselbarth
481b5d938b net: phy: provide phy_resume/phy_suspend helpers
This adds helper functions to resume and suspend a given phy_device
by calling the corresponding driver callbacks if available.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:44 -05:00
wangweidong
be78cfcb25 sctp: Reorder 'struc association' members to reduce its size
Members of 'struct association' are not in appropriate order to
reuse compiler added padding on 64bit architectures. In this patch
we reorder those struct members and help reduce the size of the
structure from 2776 bytes to 2720 bytes on 64 bit architectures.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:32:43 -05:00
Francesco Fusco
71ae8aac3e lib: introduce arch optimized hash library
We introduce a new hashing library that is meant to be used in
the contexts where speed is more important than uniformity of the
hashed values. The hash library leverages architecture specific
implementation to achieve high performance and fall backs to
jhash() for the generic case.

On Intel-based x86 architectures, the library can exploit the crc32l
instruction, part of the Intel SSE4.2 instruction set, if the
instruction is supported by the processor. This implementation
is twice as fast as the jhash() implementation on an i7 processor.

Additional architectures, such as Arm64 provide instructions for
accelerating the computation of CRC, so they could be added as well
in follow-up work.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:17 -05:00
Vince Weaver
189b84fb54 perf: Document the new transaction sample type
Commit fdfbbd07e91f8fe3871 ("perf: Add generic transaction flags")
added support for PERF_SAMPLE_TRANSACTION but forgot to add documentation
for the sample type to include/uapi/linux/perf_event.h

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1312131548450.10372@pianoman.cluster.toy
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:04:01 +01:00
Tomasz Bursztyka
d8bcc768c8 netfilter: nf_tables: Expose the table usage counter via netlink
Userspace can therefore know whether a table is in use or not, and
by how many chains. Suggested by Pablo Neira Ayuso.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-17 14:28:05 +01:00
Marc Carino
f78dea064c libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs
Certain drives cannot handle queued TRIM commands properly, even
though support is indicated in the IDENTIFY DEVICE buffer.  This patch
allows for disabling the commands for the affected drives and apply it
to the Micron/Crucial M500 SSDs which exhibit incorrect protocol
behavior when issued queued TRIM commands, which could lead to silent
data corruption.

tj: Merged two unnecessarily split patches and made minor edits
    including shortening horkage name.

Signed-off-by: Marc Carino <marc.ceeeee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/g/1387246554-7311-1-git-send-email-marc.ceeeee@gmail.com
Cc: stable@vger.kernel.org # 3.12+
2013-12-17 07:03:14 -05:00
Yann Droneaud
309243ec14 IB/core: const'ify inbuf in struct ib_udata
Userspace input buffer is not modified by kernel, so it can be 'const'.

This is also a prerequisite to remove the implicit cast
from INIT_UDATA().

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-16 10:38:28 -08:00
Janusz Dziedzic
204e35a91c nl80211: add VHT support for set_bitrate_mask
Add VHT MCS/NSS set support for nl80211_set_tx_bitrate_mask().
This should be used mainly for test purpose, to check
different MCS/NSS VHT combinations.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 16:05:17 +01:00
Fan Du
776e9dd90c xfrm: export verify_userspi_info for pkfey and netlink interface
In order to check against valid IPcomp spi range, export verify_userspi_info
for both pfkey and netlink interface.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-12-16 12:54:02 +01:00
Felix Fietkau
70dabeb74e mac80211: let the driver reserve extra tailroom in beacons
Can be used to add extra IEs (such as P2P NoA) without having to
reallocate the buffer.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 12:14:07 +01:00
Johannes Berg
6a9d1b91f3 mac80211: add pre-RCU-sync sta removal driver operation
Currently, mac80211 allows drivers to keep RCU-protected station
references that are cleared when the station is removed from the
driver and consequently needs to synchronize twice, once before
removing the station from the driver (so it can guarantee that
the station is no longer used in TX towards the driver) and once
after the station is removed from the driver.

Add a new pre-RCU-synchronisation station removal operation to
the API to allow drivers to clear/invalidate their RCU-protected
station pointers before the RCU synchronisation.

This will allow removing the second synchronisation by changing
the driver API so that the driver may no longer assume a valid
RCU-protected pointer after sta_remove/sta_state returns.

The alternative to this would be to synchronize_rcu() in all the
drivers that currently rely on this behaviour (only iwlmvm) but
that would defeat the purpose.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-16 11:29:44 +01:00
Johannes Berg
c4de673b77 Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2013-12-16 11:23:45 +01:00
Rafał Miłecki
2a4d81547b Input: define KEY_WWAN for Wireless WAN
Some devices with support for mobile networks may have buttons for
enabling/disabling such connection. An example can be Linksys router 54G3G.
We already have KEY_BLUETOOTH, KEY_WLAN and KEY_UWB so it makes sense to
add KEY_WWAN as well.  As we already have KEY_WIMAX, use it's value for
KEY_WWAN and make it an alias.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-12-16 02:20:40 -08:00
Linus Torvalds
4a251dd29c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
    figure out why it breaks things.

 2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
    was basically doing "x == x", from Dave Jones.

 3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
    Sebastian Siewior.

 4) Blackhole and prohibit routes in ipv6 were not doing the right thing
    because their ->input and ->output methods were not being assigned
    correctly.  Now they behave properly like their ipv4 counterparts.
    From Kamala R.

 5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
    really do not support this feature and will send garbage packets if
    fed fraglist SKBs.  From Eric Dumazet.

 6) Fix long standing user triggerable BUG_ON over loopback in RDS
    protocol stack, from Venkat Venkatsubra.

 7) Several not so common code paths can potentially try to invoke
    packet scheduler actions that might be NULL without checking.  Shore
    things up by either 1) defining a method as mandatory and erroring
    on registration if that method is NULL 2) defininig a method as
    optional and the registration function hooks up a default
    implementation when NULL is seen.  From Jamal Hadi Salim.

 8) Fix fragment detection in xen-natback driver, from Paul Durrant.

 9) Kill dangling enter_memory_pressure method in cg_proto ops, from
    Eric W Biederman.

10) SKBs that traverse namespaces should have their local_df cleared,
    from Hannes Frederic Sowa.

11) IOCB file position is not being updated by macvtap_aio_read() and
    tun_chr_aio_read().  From Zhi Yong Wu.

12) Don't free virtio_net netdev before releasing all of the NAPI
    instances.  From Andrey Vagin.

13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.

14) IPv6 routes that are no cached routes should not count against the
    garbage collection limits.  We had this almost right, but were
    missing handling addrconf generated routes properly.  From Hannes
    Frederic Sowa.

15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
    route info when they are called, from Stefan Tomanek.

16) TUN and MACVTAP have had truncated packet signalling for some time,
    fix from Jason Wang.

17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet.

18) xen-netback does not interpret the NAPI budget properly for TX work,
    fix from Paul Durrant.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
  igb: Fix for issue where values could be too high for udelay function.
  i40e: fix null dereference
  xen-netback: fix gso_prefix check
  net: make neigh_priv_len in struct net_device 16bit instead of 8bit
  drivers: net: cpsw: fix for cpsw crash when build as modules
  xen-netback: napi: don't prematurely request a tx event
  xen-netback: napi: fix abuse of budget
  sch_tbf: use do_div() for 64-bit divide
  udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
  net:fec: remove duplicate lines in comment about errata ERR006358
  Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
  8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
  xen-netback: make sure skb linear area covers checksum field
  net: smc91x: Fix device tree based configuration so it's usable
  udp: ipv4: fix potential use after free in udp_v4_early_demux()
  macvtap: signal truncated packets
  tun: unbreak truncated packet signalling
  net: sched: htb: fix the calculation of quantum
  net: sched: tbf: fix the calculation of max_size
  micrel: add support for KSZ8041RNLI
  ...
2013-12-15 11:56:47 -08:00