12809 Commits

Author SHA1 Message Date
Linus Torvalds
df36b439c5 Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits)
  nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh()
  nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
  nfs: remove unnecessary NFS_INO_INVALID_ACL checks
  NFS: More "sloppy" parsing problems
  NFS: Invalid mount option values should always fail, even with "sloppy"
  NFS: Remove unused XDR decoder functions
  NFS: Update MNT and MNT3 reply decoding functions
  NFS: add XDR decoder for mountd version 3 auth-flavor lists
  NFS: add new file handle decoders to in-kernel mountd client
  NFS: Add separate mountd status code decoders for each mountd version
  NFS: remove unused function in fs/nfs/mount_clnt.c
  NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
  NFS: Clean up MNT program definitions
  lockd: Don't bother with RPC ping for NSM upcalls
  lockd: Update NSM state from SM_MON replies
  NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
  NFS: Return error code from nfs_callback_up() to user space
  NFS: Do not display the setting of the "intr" mount option
  NFS: add support for splice writes
  nfs41: Backchannel: CB_SEQUENCE validation
  ...
2009-06-22 12:53:06 -07:00
Linus Torvalds
5165aece0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (43 commits)
  via-velocity: Fix velocity driver unmapping incorrect size.
  mlx4_en: Remove redundant refill code on RX
  mlx4_en: Removed redundant check on lso header size
  mlx4_en: Cancel port_up check in transmit function
  mlx4_en: using stop/start_all_queues
  mlx4_en: Removed redundant skb->len check
  mlx4_en: Counting all the dropped packets on the TX side
  usbnet cdc_subset: fix issues talking to PXA gadgets
  Net: qla3xxx, remove sleeping in atomic
  ipv4: fix NULL pointer + success return in route lookup path
  isdn: clean up documentation index
  cfg80211: validate station settings
  cfg80211: allow setting station parameters in mesh
  cfg80211: allow adding/deleting stations on mesh
  ath5k: fix beacon_int handling
  MAINTAINERS: Fix Atheros pattern paths
  ath9k: restore PS mode, before we put the chip into FULL SLEEP state.
  ath9k: wait for beacon frame along with CAB
  acer-wmi: fix rfkill conversion
  ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling
  ...
2009-06-22 11:57:09 -07:00
Patrick McHardy
4d900f9df5 netfilter: xt_rateest: fix comparison with self
As noticed by Trk Edwin <edwintorok@gmail.com>:

Compiling the kernel with clang has shown this warning:

net/netfilter/xt_rateest.c:69:16: warning: self-comparison always results in a
constant value
                        ret &= pps2 == pps2;
                                    ^
Looking at the code:
if (info->flags & XT_RATEEST_MATCH_BPS)
            ret &= bps1 == bps2;
        if (info->flags & XT_RATEEST_MATCH_PPS)
            ret &= pps2 == pps2;

Judging from the MATCH_BPS case it seems to be a typo, with the intention of
comparing pps1 with pps2.

http://bugzilla.kernel.org/show_bug.cgi?id=13535

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:17:12 +02:00
Jan Engelhardt
6d62182fea netfilter: xt_quota: fix incomplete initialization
Commit v2.6.29-rc5-872-gacc738f ("xtables: avoid pointer to self")
forgot to copy the initial quota value supplied by iptables into the
private structure, thus counting from whatever was in the memory
kmalloc returned.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:16:45 +02:00
Patrick McHardy
2495561928 netfilter: nf_log: fix direct userspace memory access in proc handler
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:15:30 +02:00
Patrick McHardy
f9ffc31251 netfilter: fix some sparse endianess warnings
net/netfilter/xt_NFQUEUE.c:46:9: warning: incorrect type in assignment (different base types)
net/netfilter/xt_NFQUEUE.c:46:9:    expected unsigned int [unsigned] [usertype] ipaddr
net/netfilter/xt_NFQUEUE.c:46:9:    got restricted unsigned int
net/netfilter/xt_NFQUEUE.c:68:10: warning: incorrect type in assignment (different base types)
net/netfilter/xt_NFQUEUE.c:68:10:    expected unsigned int [unsigned] <noident>
net/netfilter/xt_NFQUEUE.c:68:10:    got restricted unsigned int
net/netfilter/xt_NFQUEUE.c:69:10: warning: incorrect type in assignment (different base types)
net/netfilter/xt_NFQUEUE.c:69:10:    expected unsigned int [unsigned] <noident>
net/netfilter/xt_NFQUEUE.c:69:10:    got restricted unsigned int
net/netfilter/xt_NFQUEUE.c:70:10: warning: incorrect type in assignment (different base types)
net/netfilter/xt_NFQUEUE.c:70:10:    expected unsigned int [unsigned] <noident>
net/netfilter/xt_NFQUEUE.c:70:10:    got restricted unsigned int
net/netfilter/xt_NFQUEUE.c:71:10: warning: incorrect type in assignment (different base types)
net/netfilter/xt_NFQUEUE.c:71:10:    expected unsigned int [unsigned] <noident>
net/netfilter/xt_NFQUEUE.c:71:10:    got restricted unsigned int

net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types)
net/netfilter/xt_cluster.c:20:55:    expected unsigned int
net/netfilter/xt_cluster.c:20:55:    got restricted unsigned int const [usertype] ip
net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types)
net/netfilter/xt_cluster.c:20:55:    expected unsigned int
net/netfilter/xt_cluster.c:20:55:    got restricted unsigned int const [usertype] ip

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:15:02 +02:00
Patrick McHardy
8d8890b775 netfilter: nf_conntrack: fix conntrack lookup race
The RCU protected conntrack hash lookup only checks whether the entry
has a refcount of zero to decide whether it is stale. This is not
sufficient, entries are explicitly removed while there is at least
one reference left, possibly more. Explicitly check whether the entry
has been marked as dying to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:14:41 +02:00
Patrick McHardy
5c8ec910e7 netfilter: nf_conntrack: fix confirmation race condition
New connection tracking entries are inserted into the hash before they
are fully set up, namely the CONFIRMED bit is not set and the timer not
started yet. This can theoretically lead to a race with timer, which
would set the timeout value to a relative value, most likely already in
the past.

Perform hash insertion as the final step to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:14:16 +02:00
Eric Dumazet
8cc20198cf netfilter: nf_conntrack: death_by_timeout() fix
death_by_timeout() might delete a conntrack from hash list
and insert it in dying list.

 nf_ct_delete_from_lists(ct);
 nf_ct_insert_dying_list(ct);

I believe a (lockless) reader could *catch* ct while doing a lookup
and miss the end of its chain.
(nulls lookup algo must check the null value at the end of lookup and
should restart if the null value is not the expected one.
cf Documentation/RCU/rculist_nulls.txt for details)

We need to change nf_conntrack_init_net() and use a different "null" value,
guaranteed not being used in regular lists. Choose very large values, since
hash table uses [0..size-1] null values.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-22 14:13:55 +02:00
Ricardo Labiaga
e9f0298558 nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh()
xprt_alloc_bc_request() is always called in soft interrupt context.
Grab the spin_lock instead of the bottom half spin_lock.  Softirqs
do not preempt other softirqs running on the same processor, so there
is no need to disable bottom halves.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-20 14:55:39 -04:00
David S. Miller
c3da63f357 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-06-20 01:16:40 -07:00
Neil Horman
73e42897e8 ipv4: fix NULL pointer + success return in route lookup path
Don't drop route if we're not caching	

	I recently got a report of an oops on a route lookup.  Maxime was
testing what would happen if route caching was turned off (doing so by setting
making rt_caching always return 0), and found that it triggered an oops.  I
looked at it and found that the problem stemmed from the fact that the route
lookup routines were returning success from their lookup paths (which is good),
but never set the **rp pointer to anything (which is bad).  This happens because
in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
This almost emulates slient success.  What we should be doing is assigning *rp =
rt and _not_ dropping the route.  This way, during slow path lookups, when we
create a new route cache entry, we don't immediately discard it, rather we just
don't add it into the cache hash table, but we let this one lookup use it for
the purpose of this route request.  Maxime has tested and reports it prevents
the oops.  There is still a subsequent routing issue that I'm looking into
further, but I'm confident that, even if its related to this same path, this
patch makes sense to take.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-20 01:15:16 -07:00
Johannes Berg
a97f4424fb cfg80211: validate station settings
When I disallowed interfering with stations on non-AP interfaces,
I not only forget mesh but also managed interfaces which need
this for the authorized flag. Let's actually validate everything
properly.

This fixes an nl80211 regression introduced by the interfering,
under which wpa_supplicant -Dnl80211 could not properly connect.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:24 -04:00
Andrey Yurovsky
9a5e8bbc8f cfg80211: allow setting station parameters in mesh
Mesh Point interfaces can also set parameters, for example plink_open is
used to manually establish peer links from user-space (currently via
iw).  Add Mesh Point to the check in nl80211_set_station.

Signed-off-by: Andrey Yurovsky <andrey@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:24 -04:00
Andrey Yurovsky
155cc9e4b1 cfg80211: allow adding/deleting stations on mesh
Commit b2a151a288 added a check that prevents adding or deleting
stations on non-AP interfaces.  Adding and deleting stations is
supported for Mesh Point interfaces, so add Mesh Point to that check as
well.

Signed-off-by: Andrey Yurovsky <andrey@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:23 -04:00
Alan Jenkins
464902e812 rfkill: export persistent attribute in sysfs
This information allows userspace to implement a hybrid policy where
it can store the rfkill soft-blocked state in platform non-volatile
storage if available, and if not then file-based storage can be used.

Some users prefer platform non-volatile storage because of the behaviour
when dual-booting multiple versions of Linux, or if the rfkill setting
is changed in the BIOS setting screens, or if the BIOS responds to
wireless-toggle hotkeys itself before the relevant platform driver has
been loaded.

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:18 -04:00
Alan Jenkins
06d5caf47e rfkill: don't restore software blocked state on persistent devices
The setting of the "persistent" flag is also made more explicit using
a new rfkill_init_sw_state() function, instead of special-casing
rfkill_set_sw_state() when it is called before registration.

Suspend is a bit of a corner case so we try to get away without adding
another hack to rfkill-input - it's going to be removed soon.
If the state does change over suspend, users will simply have to prod
rfkill-input twice in order to toggle the state.

Userspace policy agents will be able to implement a more consistent user
experience.  For example, they can avoid the above problem if they
toggle devices individually.  Then there would be no "global state"
to get out of sync.

Currently there are only two rfkill drivers with persistent soft-blocked
state.  thinkpad-acpi already checks the software state on resume.
eeepc-laptop will require modification.

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
CC: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:17 -04:00
Alan Jenkins
7fa20a7f60 rfkill: rfkill_set_block() when suspended nitpick
If we return after fiddling with the state, userspace will see the
wrong state and rfkill_set_sw_state() won't work until the next call to
rfkill_set_block().  At the moment rfkill_set_block() will always be
called from rfkill_resume(), but this will change in future.

Also, presumably the point of this test is to avoid bothering devices
which may be suspended.  If we don't want to call set_block(), we
probably don't want to call query() either :-).

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-06-19 11:50:17 -04:00
Dmitry Baryshkov
25502bda07 ieee802154: use standard routine for printing dumps
Use print_hex_dump_bytes instead of self-written dumping function
for outputting packet dumps.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19 00:18:43 -07:00
Hendrik Brueckner
0ea920d211 af_iucv: Return -EAGAIN if iucv msg limit is exceeded
If the iucv message limit for a communication path is exceeded,
sendmsg() returns -EAGAIN instead of -EPIPE.
The calling application can then handle this error situtation,
e.g. to try again after waiting some time.

For blocking sockets, sendmsg() waits up to the socket timeout
before returning -EAGAIN. For the new wait condition, a macro
has been introduced and the iucv_sock_wait_state() has been
refactored to this macro.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19 00:10:40 -07:00
Hendrik Brueckner
bb664f49f8 af_iucv: Change if condition in sendmsg() for more readability
Change the if condition to exit sendmsg() if the socket in not connected.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-19 00:10:39 -07:00
Trond Myklebust
47fcb03fef SUNRPC: Fix the TCP server's send buffer accounting
Currently, the sunrpc server is refusing to allow us to process new RPC
calls if the TCP send buffer is 2/3 full, even if we do actually have
enough free space to guarantee that we can send another request.
The following patch fixes svc_tcp_has_wspace() so that we only stop
processing requests if we know that the socket buffer cannot possibly fit
another reply.

It also fixes the tcp write_space() callback so that we only clear the
SOCK_NOSPACE flag when the TCP send buffer is less than 2/3 full.
This should ensure that the send window will grow as per the standard TCP
socket code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 19:58:51 -07:00
Trond Myklebust
1f84603c09 Merge branch 'devel-for-2.6.31' into for-2.6.31
Conflicts:
	fs/nfs/client.c
	fs/nfs/super.c
2009-06-18 18:13:44 -07:00
Linus Torvalds
d2aa455037 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...

Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
2009-06-18 14:07:15 -07:00
Eric Dumazet
81e2a3d5b7 atm: sk_wmem_alloc initial value is one
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

This broke net/atm since this protocol assumed a null
initial value. This patch makes necessary changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:12 -07:00
Eric Dumazet
31e6d363ab net: correct off-by-one write allocations reports
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:12 -07:00
Jiri Pirko
31278e7147 net: group address list and its count
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
   on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c              |    4 +-
 drivers/net/e1000/e1000_main.c  |    4 +-
 drivers/net/ixgbe/ixgbe_main.c  |    6 +-
 drivers/net/mv643xx_eth.c       |    2 +-
 drivers/net/niu.c               |    4 +-
 drivers/net/virtio_net.c        |   10 ++--
 drivers/s390/net/qeth_l2_main.c |    2 +-
 include/linux/netdevice.h       |   17 +++--
 net/core/dev.c                  |  130 ++++++++++++++++++--------------------
 9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:08 -07:00
Jarek Poplawski
7b85576d15 ipv4: Fix fib_trie rebalancing, part 2
My previous patch, which explicitly delays freeing of tnodes by adding
them to the list to flush them after the update is finished, isn't
strict enough. It treats exceptionally tnodes without parent, assuming
they are newly created, so "invisible" for the read side yet.

But the top tnode doesn't have parent as well, so we have to exclude
all exceptions (at least until a better way is found). Additionally we
need to move rcu assignment of this node before flushing, so the
return type of the trie_rebalance() function is changed.

Reported-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:28:51 -07:00
Jarek Poplawski
b964758050 pkt_sched: Update drops stats in act_police
Action police statistics could be misleading because drops are not
shown when expected.

With feedback from: Jamal Hadi Salim <hadi@cyberus.ca>

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 18:56:45 -07:00
Stephen Hemminger
603a8bbe62 skbuff: don't corrupt mac_header on skb expansion
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
value. The places where skb is expanded add an offset which would
change this flag into an invalid pointer (or offset).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 18:46:41 -07:00
Stephen Hemminger
19633e129c skbuff: skb_mac_header_was_set is always true on >32 bit
Looking at the crash in log_martians(), one suspect is that the check for
mac header being set is not correct.  The value of mac_header defaults to
0 on allocation, therefore skb_mac_header_was_set will always be true on
platforms using NET_SKBUFF_USES_OFFSET.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 18:46:03 -07:00
Trond Myklebust
301933a0ac Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31 2009-06-17 17:59:58 -07:00
Ricardo Labiaga
dd2b63d049 nfs41: Rename rq_received to rq_reply_bytes_recvd
The 'rq_received' member of 'struct rpc_rqst' is used to track when we
have received a reply to our request.  With v4.1, the backchannel
can now accept callback requests over the existing connection.  Rename
this field to make it clear that it is only used for tracking reply bytes
and not all bytes received on the connection.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:40 -07:00
Rahul Iyer
343952fa5a nfs41: Get the rpc_xprt * from the rpc_rqst instead of the rpc_clnt.
Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies
can both use the same code path.  A client needs the rpc_xprt in order
to reply to a callback.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:34 -07:00
Benny Halevy
8f97524235 nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:31 -07:00
Andy Adamson
9c9f3f5fa6 nfs41: sunrpc: add a struct svc_xprt pointer to struct svc_serv for backchannel use
This svc_xprt is passed on to the callback service thread to be later used
to processes incoming svc_rqst's

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:31 -07:00
Benny Halevy
7652e5a09b nfs41: sunrpc: provide functions to create and destroy a svc_xprt for backchannel use
For nfs41 callbacks we need an svc_xprt to process requests coming up the
backchannel socket as rpc_rqst's that are transformed into svc_rqst's that
need a rq_xprt to be processed.

The svc_{udp,tcp}_create methods are too heavy for this job as svc_create_socket
creates an actual socket to listen on while for nfs41 we're "reusing" the
fore channel's socket.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:30 -07:00
Ricardo Labiaga
4d6bbb6233 nfs41: Backchannel bc_svc_process()
Implement the NFSv4.1 backchannel service.  Invokes the common callback
processing logic svc_process_common() to authenticate the call and
dispatch the appropriate NFSv4.1 XDR decoder and operation procedure.
It then invokes bc_send() to send the reply over the same connection.
bc_send() is implemented in a separate patch.

At this time there is no slot validation or reply cache handling.

[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Move bc_svc_process() declaration to correct patch]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:29 -07:00
Ricardo Labiaga
1cad7ea6fe nfs41: Refactor svc_process()
net/sunrpc/svc.c:svc_process() is used by the NFSv4 callback service
to process RPC requests arriving over connections initiated by the
server.  NFSv4.1 supports callbacks over the backchannel on connections
initiated by the client.  This patch refactors svc_process() so that
common code can also be used by the backchannel.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:28 -07:00
Ricardo Labiaga
0d90ba1cd4 nfs41: Backchannel callback service helper routines
Executes the backchannel task on the RPC state machine using
the existing open connection previously established by the client.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>

nfs41: Add bc_svc.o to sunrpc Makefile.

[nfs41: bc_send() does not need to be exported outside RPC module]
[nfs41: xprt_free_bc_request() need not be exported outside RPC module]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:28 -07:00
Ricardo Labiaga
55ae1aabfb nfs41: Add backchannel processing support to RPC state machine
Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests.  It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.

It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.

Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.

Introduce a new transmit state for the client to reply on to backchannel
requests.  This new state simply reserves the transport and issues the
reply.  In case of a connection related error, disconnects the transport and
drops the reply.  It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".

Note: There is no need to loop attempting to reserve the transport.  If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit.  rpc_execute() will invoke it again
after the task is taken off the sleep queue.

[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:24 -07:00
Trond Myklebust
88b5ed73bc SUNRPC: Fix a missing "break" option in xs_tcp_setup_socket()
In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want
to get rid of the existing socket and retry immediately, just as the
comment says. Currently we end up sleeping for a minute, due to the missing
"break" statement.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:57 -07:00
Ricardo Labiaga
44b98efdd0 nfs41: New xs_tcp_read_data()
Handles RPC replies and backchannel callbacks.  Traditionally the NFS
client has expected only RPC replies on its open connections.  With
NFSv4.1, callbacks can arrive over an existing open connection.

This patch refactors the old xs_tcp_read_request() into an RPC reply handler:
xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(),
and a common routine to read the data off the transport: xs_tcp_read_common().
The new xs_tcp_read_callback() queues callback requests onto a queue where
the callback service (a separate thread) is listening for the processing.

This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com)
and Benny Halevy (bhalevy@panasas.com).

xs_tcp_read_callback() drops the connection when the number of expected
callbacks is exceeded.  Use xprt_force_disconnect(), ensuring tasks on
the pending queue are awaken on disconnect.

[nfs41: Keep track of RPC call/reply direction with a flag]
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Moves embedded #ifdefs into #ifdef function blocks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:16 -07:00
Ricardo Labiaga
fb7a0b9add nfs41: New backchannel helper routines
This patch introduces support to setup the callback xprt on the client side.
It allocates/ destroys the preallocated memory structures used to process
backchannel requests.

At setup time, xprt_setup_backchannel() is invoked to allocate one or
more rpc_rqst structures and substructures.  This ensures that they
are available when an RPC callback arrives.  The rpc_rqst structures
are maintained in a linked list attached to the rpc_xprt structure.
We keep track of the number of allocations so that they can be correctly
removed when the channel is destroyed.

When an RPC callback arrives, xprt_alloc_bc_request() is invoked to
obtain a preallocated rpc_rqst structure.  An rpc_xprt structure is
returned, and its RPC_BC_PREALLOC_IN_USE bit is set in
rpc_xprt->bc_flags.  The structure is removed from the the list
since it is now in use, and it will be later added back when its
user is done with it.

After the RPC callback replies, the rpc_rqst structure is returned
by invoking xprt_free_bc_request().  This clears the
RPC_BC_PREALLOC_IN_USE bit and adds it back to the list, allowing it
to be reused by a subsequent RPC callback request.

To be consistent with the reception of RPC messages, the backchannel requests
should be placed into the 'struct rpc_rqst' rq_rcv_buf, which is then in turn
copied to the 'struct rpc_rqst' rq_private_buf.

[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright notice and explain page allocation]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:14 -07:00
Ricardo Labiaga
f9acac1a47 nfs41: Initialize new rpc_xprt callback related fields
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:14 -07:00
Ricardo Labiaga
f4a2e418bf nfs41: Process the RPC call direction
Reading and storing the RPC direction is a three step process.

1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it
in the XDR buffer since the 'struct rpc_rqst' is not yet available.

2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state.
This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR.
For example, we may be reading a continuation packet to a large reply.
Therefore, we can't simply obtain the 'struct rpc_rqst' during the
TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA.

This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to
read the RPC direction.  It then uses TCP_RCV_COPY_CALLDIR to indicate the
RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated.

3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper
functions.  xs_tcp_read_common() then saves the RPC direction in the XDR
buffer if TCP_RCV_COPY_CALLDIR is set.  This will happen when we're reading
the data immediately after the direction was read.  xs_tcp_read_common()
then clears this flag.

[was nfs41: Skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Add RPC direction back into the XDR buffer]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Don't skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:46 -07:00
Ricardo Labiaga
18dca02aeb nfs41: Add ability to read RPC call direction on TCP stream.
NFSv4.1 callbacks can arrive over an existing connection. This patch adds
the logic to read the RPC call direction (call or reply). It does this by
updating the state machine to look for the call direction invoking
xs_tcp_read_calldir(...) after reading the XID.

[nfs41: Keep track of RPC call/reply direction with a flag]

As per 11/14/08 review of RFC 53/85.

Add a new flag to track whether the incoming message is an RPC call or an
RPC reply.  TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in
xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel.
It is cleared if the message is an RPC request sent on the back channel.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:45 -07:00
Andy Adamson
aae2006e9b nfs41: sunrpc: Export the call prepare state for session reset
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:07 -07:00
Eric Dumazet
c564039fd8 net: sk_wmem_alloc has initial value of one, not zero
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.

Reported by Ingo Molnar, and full diagnostic from David Miller.

This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-17 04:31:25 -07:00
David Howells
519d25679e RxRPC: Don't attempt to reuse aborted connections
Connections that have seen a connection-level abort should not be reused
as the far end will just abort them again; instead a new connection
should be made.

Connection-level aborts occur due to such things as authentication
failures.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 21:20:14 -07:00