24518 Commits

Author SHA1 Message Date
Sage Weil
3b5ede07b5 libceph: fix fault locking; close socket on lossy fault
If we fault on a lossy connection, we should still close the socket
immediately, and do so under the con mutex.

We should also take the con mutex before printing out the state bits in
the debug output.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:15:55 -07:00
Jiaju Zhang
048a9d2d06 libceph: trivial fix for the incorrect debug output
This is a trivial fix for the debug output, as it is inconsistent
with the function name so may confuse people when debugging.

[elder@inktank.com: switched to use __func__]

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:15:36 -07:00
Sage Weil
85effe183d libceph: reset connection retry on successfully negotiation
We exponentially back off when we encounter connection errors.  If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.

Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node.  Fixes ceph issue #2802.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:15:34 -07:00
Sage Weil
5469155f2b libceph: protect ceph_con_open() with mutex
Take the con mutex while we are initiating a ceph open.  This is necessary
because the may have previously been in use and then closed, which could
result in a racing workqueue running con_work().

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:15:33 -07:00
Sage Weil
a410702697 libceph: (re)initialize bio_iter on start of message receive
Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path.  The problem
is that a sequence like:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:31 -07:00
Sage Weil
6194ea895e libceph: resubmit linger ops when pg mapping changes
The linger op registration (i.e., watch) modifies the object state.  As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.

To fix this, always resubmit the linger op with a new tid.  We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered.  Then the second loop will treat this just like a normal
case of re-registering.

This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:31 -07:00
Sage Weil
8c50c81756 libceph: fix mutex coverage for ceph_con_close
Hold the mutex while twiddling all of the state bits to avoid possible
races.  While we're here, make not of why we cannot close the socket
directly.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:30 -07:00
Sage Weil
3a140a0d5c libceph: report socket read/write error message
We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}().  This is more informative than

libceph: osd0 192.168.106.220:6801 (null)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:29 -07:00
Sage Weil
546f04ef71 libceph: support crush tunables
The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.

Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:23 -07:00
Stanislav Kinsbursky
caea33da89 SUNRPC: return negative value in case rpcbind client creation error
Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.0]
2012-07-30 20:39:05 -04:00
Sage Weil
1fe60e51a3 libceph: move feature bits to separate header
This is simply cleanup that will keep things more closely synced with the
userland code.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 16:23:22 -07:00
Jeff Layton
5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
Jeff Layton
506026c3ec sunrpc: clarify comments on rpc_make_runnable
rpc_make_runnable is not generally called with the queue lock held, unless
it's waking up a task that has been sitting on a waitqueue. This is safe
when the task has not entered the FSM yet, but the comments don't really
spell this out.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:55:08 -04:00
Joe Perches
cac5d07e3c sunrpc: clnt: Add missing braces
Add a missing set of braces that commit 4e0038b6b24
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
2012-07-30 17:58:08 -04:00
stephen hemminger
5a0d513b62 bridge: make port attributes const
Simple table that can be marked const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
0c7462a235 ipv4: remove rt_cache_rebuild_count
After IP route cache removal, rt_cache_rebuild_count is no longer
used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
404e0a8b6a net: ipv4: fix RCU races on dst refcounts
commit c6cffba4ffa2 (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.

crashes happen on tcp workloads if routes are added/deleted at the same
time.

The dst_free() calls from free_fib_info_rcu() are clearly racy.

We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :

Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst

Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we dont increase a zero refcount (On a dst currently
waiting an rcu grace period before destruction)

rt_cache_route() must take a reference on the new cached route, and
release it if was not able to install it.

With this patch, my machines survive various benchmarks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
cca32e4bf9 net: TCP early demux cleanup
early_demux() handlers should be called in RCU context, and as we
use skb_dst_set_noref(skb, dst), caller must not exit from RCU context
before dst use (skb_dst(skb)) or release (skb_drop(dst))

Therefore, rcu_read_lock()/rcu_read_unlock() pairs around
->early_demux() are confusing and not needed :

Protocol handlers are already in an RCU read lock section.
(__netif_receive_skb() does the rcu_read_lock() )

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:21 -07:00
Guanjun He
a2a3258417 libceph: prevent the race of incoming work during teardown
Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.

Signed-off-by: Guanjun He <gjhe@suse.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:53 -07:00
Sage Weil
a16cb1f707 libceph: fix messenger retry
In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time.  This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:52 -07:00
Sage Weil
cd43045c2d libceph: initialize rb, list nodes in ceph_osd_request
These don't strictly need to be initialized based on how they are used, but
it is good practice to do so.

Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:51 -07:00
Sage Weil
d50b409fb8 libceph: initialize msgpool message types
Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.

Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:50 -07:00
Eliad Peller
ba846a502c mac80211: don't clear sched_scan_sdata on sched scan stop request
ieee80211_request_sched_scan_stop() cleared
local->sched_scan_sdata. However, sched_scan_sdata
should be cleared only after the driver calls
ieee80211_sched_scan_stopped() (like with normal hw scan).

Clearing sched_scan_sdata too early caused
ieee80211_sched_scan_stopped_work to exit prematurely
without properly cleaning all the sched scan resources
and without calling cfg80211_sched_scan_stopped (so
userspace wasn't notified about sched scan completion).

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-30 09:14:36 +02:00
Johannes Berg
fcb06702f0 Merge remote-tracking branch 'wireless/master' into mac80211 2012-07-30 09:13:03 +02:00
Li Wei
8253947e2c ipv6: fix incorrect route 'expires' value passed to userspace
When userspace use RTM_GETROUTE to dump route table, with an already
expired route entry, we always got an 'expires' value(2147157)
calculated base on INT_MAX.

The reason of this problem is in the following satement:
	rt->dst.expires - jiffies < INT_MAX
gcc promoted the type of both sides of '<' to unsigned long, thus
a small negative value would be considered greater than INT_MAX.

With the help of Eric Dumazet, do the out of bound checks in
rtnl_put_cacheinfo(), _after_ conversion to clock_t.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29 23:18:31 -07:00
Lin Ming
61648d91fc ipv4: clean up put_child
The first parameter struct trie *t is not used anymore.
Remove it.

Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29 23:18:30 -07:00
Lin Ming
4ea4bf7ebc ipv4: fix debug info in tnode_new
It should print size of struct rt_trie_node * allocated instead of size
of struct rt_trie_node.

Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-29 23:18:30 -07:00
Al Viro
faf0201029 clean unix_bind() up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:15 +04:00
Al Viro
a8104a9fcd pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp.
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now.  Makes more sense, POSIX allows, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:15 +04:00
Al Viro
921a1650de new helper: done_path_create()
releases what needs to be released after {kern,user}_path_create()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:13 +04:00
Jiri Kosina
59ea33a68a tcp: perform DMA to userspace only if there is a task waiting for it
Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-27 13:45:51 -07:00
Jesse Gross
6081030769 Revert "openvswitch: potential NULL deref in sample()"
This reverts commit 5b3e7e6cb5771bedda51cdb6f715d1da8cd9e644.

The problem that the original commit was attempting to fix can
never happen in practice because validation is done one a per-flow
basis rather than a per-packet basis.  Adding additional checks at
runtime is unnecessary and inconsistent with the rest of the code.

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-27 13:45:51 -07:00
Eric Dumazet
505fbcf035 ipv4: fix TCP early demux
commit 92101b3b2e317 (ipv4: Prepare for change of rt->rt_iif encoding.)
invalidated TCP early demux, because rx_dst_ifindex is not properly
initialized and checked.

Also remove the use of inet_iif(skb) in favor or skb->skb_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-27 13:45:51 -07:00
Jiri Benc
b1beb681cb net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.

This can be easily trigerred by doing:

tcpdump -i eth0 &
ip l s up eth0

ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-27 13:45:50 -07:00
Hangbin Liu
4249357010 tcp: Add TCP_USER_TIMEOUT negative value check
TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
values. If a user assign -1 to it, the socket will set successfully and wait
for 4294967295 miliseconds. This patch add a negative value check to avoid
this issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-27 13:45:50 -07:00
Linus Torvalds
aa0b3b2bee Merge branch 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED subsystem update from Bryan Wu.

* 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: (50 commits)
  leds-lp8788: forgotten unlock at lp8788_led_work
  LEDS: propagate error codes in blinkm_detect()
  LEDS: memory leak in blinkm_led_common_set()
  leds: add new lp8788 led driver
  LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS
  leds: max8997: Simplify max8997_led_set_mode implementation
  leds/leds-s3c24xx: use devm_gpio_request
  leds: convert Network Space v2 LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert DAC124S085 LED driver to devm_kzalloc()
  leds: convert LM3530 LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert TCA6507 LED driver to devm_kzalloc()
  leds: convert Freescale MC13783 LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert ADP5520 LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert PCA955x LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert Sun Fire LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert PCA9532 LED driver to devm_kzalloc()
  leds: convert LT3593 LED driver to devm_kzalloc()
  leds: convert Renesas TPU LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert LP5523 LED driver to devm_kzalloc() and cleanup error exit path
  leds: convert PCA9633 LED driver to devm_kzalloc()
  ...
2012-07-26 20:26:27 -07:00
Linus Torvalds
1e30c1b386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates and fixes from David Miller:

1) Reinstate the no-ref optimization for input route lookups in ipv4 to
   fix some routing cache removal perf regressions.

2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.

3) Get RX hash value from correct place in be2net driver, from
   Sarveshwar Bandi.

4) Validation of FIB cached routes missing critical check, from Eric
   Dumazet.

5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  ipv6: Early TCP socket demux
  ipv4: Fix input route performance regression.
  pch_gbe: vlan skb len fix
  pch_gbe: add extra clean tx
  pch_gbe: fix transmit watchdog timeout
  ixgbe: fix panic while dumping packets on Tx hang with IOMMU
  be2net: Fix to parse RSS hash from Receive completions correctly.
  net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
  hyperv: Add error handling to rndis_filter_device_add()
  hyperv: Add a check for ring_size value
  ipv4: rt_cache_valid must check expired routes
  net/pch_gpe: Cannot disable ethernet autonegation
  qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
  netiucv: cleanup attribute usage
  net: wiznet add missing HAS_IOMEM dependency
  be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
  mlx4: Add support for EEH error recovery
  cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
  wanmain: comparing array with NULL
  caif: fix NULL pointer check
  ...
2012-07-26 18:09:01 -07:00
Eric Dumazet
c7109986db ipv6: Early TCP socket demux
This is the IPv6 missing bits for infrastructure added in commit
41063e9dd1195 (ipv4: Early TCP socket demux.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 15:50:39 -07:00
David S. Miller
c6cffba4ff ipv4: Fix input route performance regression.
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.

Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 15:50:39 -07:00
Eric Dumazet
4331debc51 ipv4: rt_cache_valid must check expired routes
commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced rt_cache_valid() helper. It unfortunately doesn't check if
route is expired before caching it.

I noticed sk_setup_caps() was constantly called on a tcp workload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25 15:24:14 -07:00
Stanislaw Gruszka
5e31fc0815 wireless: reg: restore previous behaviour of chan->max_power calculations
commit eccc068e8e84c8fe997115629925e0422a98e4de
Author: Hong Wu <Hong.Wu@dspg.com>
Date:   Wed Jan 11 20:33:39 2012 +0200

    wireless: Save original maximum regulatory transmission power for the calucation of the local maximum transmit pow

changed the way we calculate chan->max_power as min(chan->max_power,
chan->max_reg_power). That broke rt2x00 (and perhaps some other
drivers) that do not set chan->max_power. It is not so easy to fix this
problem correctly in rt2x00.

According to commit eccc068e8 changelog, change claim only to save
maximum regulatory power - changing setting of chan->max_power was side
effect. This patch restore previous calculations of chan->max_power and
do not touch chan->max_reg_power.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-25 16:11:12 +02:00
Alan Cox
8b72ff6484 wanmain: comparing array with NULL
gcc really should warn about these !

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24 13:55:21 -07:00
Eric Dumazet
9cb429d692 tcp: early_demux fixes
1) Remove a non needed pskb_may_pull() in tcp_v4_early_demux()
   and fix a potential bug if skb->head was reallocated
   (iph & th pointers were not reloaded)

TCP stack will pull/check headers anyway.

2) must reload iph in ip_rcv_finish() after early_demux()
 call since skb->head might have changed.

3) skb->dev->ifindex can be now replaced by skb->skb_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24 13:54:15 -07:00
Linus Torvalds
d14b7a419a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Trivial updates all over the place as usual."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits)
  Fix typo in include/linux/clk.h .
  pci: hotplug: Fix typo in pci
  iommu: Fix typo in iommu
  video: Fix typo in drivers/video
  Documentation: Add newline at end-of-file to files lacking one
  arm,unicore32: Remove obsolete "select MISC_DEVICES"
  module.c: spelling s/postition/position/g
  cpufreq: Fix typo in cpufreq driver
  trivial: typo in comment in mksysmap
  mach-omap2: Fix typo in debug message and comment
  scsi: aha152x: Fix sparse warning and make printing pointer address more portable.
  Change email address for Steve Glendinning
  Btrfs: fix typo in convert_extent_bit
  via: Remove bogus if check
  netprio_cgroup.c: fix comment typo
  backlight: fix memory leak on obscure error path
  Documentation: asus-laptop.txt references an obsolete Kconfig item
  Documentation: ManagementStyle: fixed typo
  mm/vmscan: cleanup comment error in balance_pgdat
  mm: cleanup on the comments of zone_reclaim_stat
  ...
2012-07-24 13:34:56 -07:00
Linus Torvalds
3c4cfadef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David S Miller:

 1) Remove the ipv4 routing cache.  Now lookups go directly into the FIB
    trie and use prebuilt routes cached there.

    No more garbage collection, no more rDOS attacks on the routing
    cache.  Instead we now get predictable and consistent performance,
    no matter what the pattern of traffic we service.

    This has been almost 2 years in the making.  Special thanks to
    Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
    have helped along the way.

    I'm sure that with a change of this magnitude there will be some
    kind of fallout, but such things ought the be simple to fix at this
    point.  Luckily I'm not European so I'll be around all of August to
    fix things :-)

    The major stages of this work here are each fronted by a forced
    merge commit whose commit message contains a top-level description
    of the motivations and implementation issues.

 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
    input.

 3) TCP SYN/ACK performance tweaks from Eric Dumazet.

 4) Add namespace support for netfilter L4 conntrack helpers, from Gao
    Feng.

 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
    Yuval Mintz.

 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.

 7) Support for connection tracker helpers in userspace, from Pablo
    Neira Ayuso.

 8) Allow userspace driven TX load balancing functions in TEAM driver,
    from Jiri Pirko.

 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
    embedded gotos.

10) TCP Small Queues, essentially minimize the amount of TCP data queued
    up in the packet scheduler layer.  Whereas the existing BQL (Byte
    Queue Limits) limits the pkt_sched --> netdevice queuing levels,
    this controls the TCP --> pkt_sched queueing levels.

    From Eric Dumazet.

11) Reduce the number of get_page/put_page ops done on SKB fragments,
    from Alexander Duyck.

12) Implement protection against blind resets in TCP (RFC 5961), from
    Eric Dumazet.

13) Support the client side of TCP Fast Open, basically the ability to
    send data in the SYN exchange, from Yuchung Cheng.

    Basically, the sender queues up data with a sendmsg() call using
    MSG_FASTOPEN, then they do the connect() which emits the queued up
    fastopen data.

14) Avoid all the problems we get into in TCP when timers or PMTU events
    hit a locked socket.  The TCP Small Queues changes added a
    tcp_release_cb() that allows us to queue work up to the
    release_sock() caller, and that's what we use here too.  From Eric
    Dumazet.

15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
  genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
  r8169: revert "add byte queue limit support".
  ipv4: Change rt->rt_iif encoding.
  net: Make skb->skb_iif always track skb->dev
  ipv4: Prepare for change of rt->rt_iif encoding.
  ipv4: Remove all RTCF_DIRECTSRC handliing.
  ipv4: Really ignore ICMP address requests/replies.
  decnet: Don't set RTCF_DIRECTSRC.
  net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
  ipv4: Remove redundant assignment
  rds: set correct msg_namelen
  openvswitch: potential NULL deref in sample()
  tcp: dont drop MTU reduction indications
  bnx2x: Add new 57840 device IDs
  tcp: avoid oops in tcp_metrics and reset tcpm_stamp
  niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
  niu: Fix to check for dma mapping errors.
  net: Fix references to out-of-scope variables in put_cmsg_compat()
  net: ethernet: davinci_emac: add pm_runtime support
  net: ethernet: davinci_emac: Remove unnecessary #include
  ...
2012-07-24 10:01:50 -07:00
Johannes Berg
3aa569c3fe mac80211: fix scan_sdata assignment
We need to use RCU to assign scan_sdata.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-24 16:54:11 +02:00
WANG Cong
320f5ea0ce genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
lockdep_is_held() is defined when CONFIG_LOCKDEP, not CONFIG_PROVE_LOCKING.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24 00:01:30 -07:00
Shuah Khan
19cd67e2d5 leds: Rename led_brightness_set() to led_set_brightness()
Rename leds external interface led_brightness_set() to led_set_brightness().
This is the second phase of the change to reduce confusion between the
leds internal and external interfaces that set brightness. With this change,
now the external interface is led_set_brightness(). The first phase renamed
the internal interface led_set_brightness() to __led_set_brightness().
There are no changes to the interface implementations.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
2012-07-24 07:52:34 +08:00
David S. Miller
13378cad02 ipv4: Change rt->rt_iif encoding.
On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.

Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation have to adjust.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 16:36:27 -07:00
David S. Miller
b68581778c net: Make skb->skb_iif always track skb->dev
Make it follow device decapsulation, from things such as VLAN and
bonding.

The stuff that actually cares about pre-demuxed device pointers, is
handled by the "orig_dev" variable in __netif_receive_skb().  And
the only consumer of that is the po->origdev feature of AF_PACKET
sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 16:36:27 -07:00