10794 Commits

Author SHA1 Message Date
Tom Tucker
45abdf1c7b p9: Fix leak of waitqueue in request allocation path
If a T or R fcall cannot be allocated, the function returns an error
but neglects to free the wait queue that was successfully allocated.

If it comes through again a second time this wq will be overwritten
with a new allocation and the old allocation will be leaked.

Also, if the client is subsequently closed, the close path will
attempt to clean up these allocations, so set the req fields to
NULL to avoid duplicate free.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05 13:19:06 -06:00
Tom Tucker
82b189eaaf 9p: Remove unneeded free of fcall for Flush
T and R fcall are reused until the client is destroyed. There does
not need to be a special case for Flush

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05 13:19:06 -06:00
Tom Tucker
cac23d6505 9p: Make all client spin locks IRQ safe
The client lock must be IRQ safe. Some of the lock acquisition paths
took regular spin locks.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05 13:19:06 -06:00
Tom Tucker
517ac45af4 9p: rdma: Set trans prior to requesting async connection ops
The RDMA connection manager is fundamentally asynchronous.
Since the async callback context is the client pointer, the
transport in the client struct needs to be set prior to calling
the first async op.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05 13:19:06 -06:00
David S. Miller
518a09ef11 tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.
Vito Caputo noticed that tcp_recvmsg() returns immediately from
partial reads when MSG_PEEK is used.  In particular, this means that
SO_RCVLOWAT is not respected.

Simply remove the test.  And this matches the behavior of several
other systems, including BSD.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 03:36:01 -08:00
Alexey Dobriyan
efb9a8c28c netfilter: netns ct: walk netns list under RTNL
netns list (just list) is under RTNL. But helper and proto unregistration
happen during rmmod when RTNL is not held, and that's how it was tested:
modprobe/rmmod vs clone(CLONE_NEWNET)/exit.

BUG: unable to handle kernel paging request at 0000000000100100	<===
IP: [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
PGD 15e300067 PUD 15e1d8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0
Modules linked in: nf_conntrack_proto_sctp(-) nf_conntrack_proto_dccp(-) af_packet iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom [last unloaded: nf_conntrack_proto_sctp]
Pid: 16758, comm: rmmod Not tainted 2.6.28-rc2-netns-xfrm #3
RIP: 0010:[<ffffffffa009890f>]  [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
RSP: 0018:ffff88015dc1fec8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: 00000000001000f8 RCX: 0000000000000000
RDX: ffffffffa009575c RSI: 0000000000000003 RDI: ffffffffa00956b5
RBP: ffff88015dc1fed8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88015dc1fe48 R12: ffffffffa0458f60
R13: 0000000000000880 R14: 00007fff4c361d30 R15: 0000000000000880
FS:  00007f624435a6f0(0000) GS:ffffffff80521580(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000100100 CR3: 0000000168969000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 16758, threadinfo ffff88015dc1e000, task ffff880179864218)
Stack:
 ffffffffa0459100 0000000000000000 ffff88015dc1fee8 ffffffffa0457934
 ffff88015dc1ff78 ffffffff80253fef 746e6e6f635f666e 6f72705f6b636172
 00707463735f6f74 ffffffff8024cb30 00000000023b8010 0000000000000000
Call Trace:
 [<ffffffffa0457934>] nf_conntrack_proto_sctp_fini+0x10/0x1e [nf_conntrack_proto_sctp]
 [<ffffffff80253fef>] sys_delete_module+0x19f/0x1fe
 [<ffffffff8024cb30>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff803ea9b2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8020b52b>] system_call_fastpath+0x16/0x1b
Code: 13 35 e0 e8 c4 6c 1a e0 48 8b 1d 6d c6 46 e0 eb 16 48 89 df 4c 89 e2 48 c7 c6 fc 85 09 a0 e8 61 cd ff ff 48 8b 5b 08 48 83 eb 08 <48> 8b 43 08 0f 18 08 48 8d 43 08 48 3d 60 4f 50 80 75 d3 5b 41
RIP  [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack]
 RSP <ffff88015dc1fec8>
CR2: 0000000000100100
---[ end trace bde8ac82debf7192 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 03:03:18 -08:00
Benjamin Thery
e3ec6cfc26 ipv6: fix run pending DAD when interface becomes ready
With some net devices types, an IPv6 address configured while the
interface was down can stay 'tentative' forever, even after the interface
is set up. In some case, pending IPv6 DADs are not executed when the
device becomes ready.

I observed this while doing some tests with kvm. If I assign an IPv6 
address to my interface eth0 (kvm driver rtl8139) when it is still down
then the address is flagged tentative (IFA_F_TENTATIVE). Then, I set
eth0 up, and to my surprise, the address stays 'tentative', no DAD is
executed and the address can't be pinged.

I also observed the same behaviour, without kvm, with virtual interfaces
types macvlan and veth.

Some easy steps to reproduce the issue with macvlan:

1. ip link add link eth0 type macvlan
2. ip -6 addr add 2003::ab32/64 dev macvlan0
3. ip addr show dev macvlan0
   ... 
   inet6 2003::ab32/64 scope global tentative
   ...
4. ip link set macvlan0 up
5. ip addr show dev macvlan0
   ...
   inet6 2003::ab32/64 scope global tentative
   ...
   Address is still tentative

I think there's a bug in net/ipv6/addrconf.c, addrconf_notify():
addrconf_dad_run() is not always run when the interface is flagged IF_READY.
Currently it is only run when receiving NETDEV_CHANGE event. Looks like
some (virtual) devices doesn't send this event when becoming up.

For both NETDEV_UP and NETDEV_CHANGE events, when the interface becomes
ready, run_pending should be set to 1. Patch below.

'run_pending = 1' could be moved below the if/else block but it makes 
the code less readable.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:43:57 -08:00
Eric Dumazet
270acefafe net: sk_free_datagram() should use sk_mem_reclaim_partial()
I noticed a contention on udp_memory_allocated on regular UDP applications.

While tcp_memory_allocated is seldom used, it appears each incoming UDP frame
is currently touching udp_memory_allocated when queued, and when received by
application.

One possible solution is to use sk_mem_reclaim_partial() instead of
sk_mem_reclaim(), so that we keep a small reserve (less than one page)
of memory for each UDP socket.

We did something very similar on TCP side in commit
9993e7d313e80bdc005d09c7def91903e0068f07
([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer())

A more complex solution would need to convert prot->memory_allocated to
use a percpu_counter with batches of 64 or 128 pages.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:38:06 -08:00
Randy Dunlap
b22cecdd8f net/9p: fix printk format warnings
Fix printk format warnings in net/9p.
Built cleanly on 7 arches.

net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05 01:35:55 -08:00
Gerrit Renker
d99a7bd210 dccp: Cleanup routines for feature negotiation
This inserts the required de-allocation routines for memory allocated
by feature negotiation in the socket destructors, replacing
dccp_feat_clean() in one instance.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:56:30 -08:00
Gerrit Renker
ac75773c27 dccp: Per-socket initialisation of feature negotiation
This provides feature-negotiation initialisation for both DCCP sockets
and DCCP request_sockets, to support feature negotiation during
connection setup.

It also resolves a FIXME regarding the congestion control
initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:55:49 -08:00
Gerrit Renker
61e6473efb dccp: List management for new feature negotiation
This adds list initial fields and list management functions for the
new feature negotiation implementation.

Thanks to Arnaldo for suggestions and improvements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:54:04 -08:00
Gerrit Renker
7d43d1a0f2 dccp: Implement lookup table for feature-negotiation information
A lookup table for feature-negotiation information, extracted from RFC
4340/42, is provided by this patch. All currently known features can
be found in this table, along with their feature location, their
default value, and type.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:43:47 -08:00
Gerrit Renker
bd012f2e7b dccp: Basic data structure for feature negotiation
This patch prepares for the new and extended feature-negotiation
routines.

The following feature-negotiation data structures are provided:
	* a container for the various (SP or NN) values,
	* symbolic state names to track feature states,
	* an entry struct which holds all current information together,
	* elementary functions to fill in and process these structures.

Entry structs are arranged as FIFO for the following reason: RFC 4340
specifies that if multiple options of the same type are present, they
are processed in the order of their appearance in the packet; which
means that this order needs to be preserved in the local data
structure (the later insertion code also respects this order).

The struct list_head has been chosen for the following reasons: the most
frequent operations are

 * add new entry at tail (when receiving Change or setting socket
   options);
 * delete entry (when Confirm has been received);
 * deep copy of entire list (cloning from listening socket onto
   request socket).

The NN value has been set to 64 bit, which is a currently sufficient
upper limit (Sequence Window feature has 48 bit).

Thanks to Arnaldo, who contributed the streamlined layout of the entry
struct.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 23:38:20 -08:00
Patrick McHardy
9b22ea5609 net: fix packet socket delivery in rx irq handler
The changes to deliver hardware accelerated VLAN packets to packet
sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
The __vlan_hwaccel_rx() function is called directly from the drivers
RX function, for non-NAPI drivers that means its still in RX IRQ
context:

[   27.779463] ------------[ cut here ]------------
[   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
...
[   27.782520]  [<c0264755>] netif_nit_deliver+0x5b/0x75
[   27.782590]  [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
[   27.782664]  [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
[   27.782738]  [<c0155b17>] handle_IRQ_event+0x23/0x51
[   27.782808]  [<c015692e>] handle_edge_irq+0xc2/0x102
[   27.782878]  [<c0105fd5>] do_IRQ+0x4d/0x64

Split hardware accelerated VLAN reception into two parts to fix this:

- __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
  device lookup, then calls netif_receive_skb()/netif_rx()

- vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
  in softirq context, performs the real reception and delivery to
  packet sockets.

Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04 14:49:57 -08:00
Andreas Steffen
79654a7698 xfrm: Have af-specific init_tempsel() initialize family field of temporary selector
While adding MIGRATE support to strongSwan, Andreas Steffen noticed that
the selectors provided in XFRM_MSG_ACQUIRE have their family field
uninitialized (those in MIGRATE do have their family set).

Looking at the code, this is because the af-specific init_tempsel()
(called via afinfo->init_tempsel() in xfrm_init_tempsel()) do not set
the value.

Reported-by: Andreas Steffen <andreas.steffen@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
2008-11-04 14:49:19 -08:00
Linus Torvalds
75fa67706c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  xfrm: Fix xfrm_policy_gc_lock handling.
  niu: Use pci_ioremap_bar().
  bnx2x: Version Update
  bnx2x: Calling netif_carrier_off at the end of the probe
  bnx2x: PCI configuration bug on big-endian
  bnx2x: Removing the PMF indication when unloading
  mv643xx_eth: fix SMI bus access timeouts
  net: kconfig cleanup
  fs_enet: fix polling
  XFRM: copy_to_user_kmaddress() reports local address twice
  SMC91x: Fix compilation on some platforms.
  udp: Fix the SNMP counter of UDP_MIB_INERRORS
  udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS
  drivers/net/smc911x.c: Fix lockdep warning on xmit.
2008-11-04 08:30:12 -08:00
David S. Miller
d2ad3ca88d net/: Kill now superfluous ->last_rx stores.
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 22:01:07 -08:00
Stephen Hemminger
265eb67fb4 netem: eliminate unneeded return values
All these individual parsing functions never return an error,
so they can be void.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 21:13:26 -08:00
Alexey Dobriyan
bbb770e7ab xfrm: Fix xfrm_policy_gc_lock handling.
From: Alexey Dobriyan <adobriyan@gmail.com>

Based upon a lockdep trace by Simon Arlott.

xfrm_policy_kill() can be called from both BH and
non-BH contexts, so we have to grab xfrm_policy_gc_lock
with BH disabling.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 19:11:29 -08:00
Jianjun Kong
ab29109210 net: remove two duplicated #include
Removed duplicated #include <rdma/ib_verbs.h> in net/9p/trans_rdma.c
		and  #include <linux/thread_info.h> in net/socket.c

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 18:23:09 -08:00
Alexey Dobriyan
6d9f239a1e net: '&' redux
I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.

So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 18:21:05 -08:00
Stephen Hemminger
24f8b2385e net: increase receive packet quantum
This patch gets about 1.25% back on tbench regression.

My change to NAPI for multiqueue support changed the time limit on
network receive processing.  Under sustained loads like tbench, this
can cause the receiver to reschedule prematurely. 

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 17:14:38 -08:00
Julius Volz
48148938b4 IPVS: Remove supports_ipv6 scheduler flag
Remove the 'supports_ipv6' scheduler flag since all schedulers now
support IPv6.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 17:08:56 -08:00
Julius Volz
445483758e IPVS: Add IPv6 support to LBLC/LBLCR schedulers
Add IPv6 support to LBLC and LBLCR schedulers. These were the last
schedulers without IPv6 support, but we might want to keep the
supports_ipv6 flag in the case of future schedulers without IPv6
support.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 17:08:28 -08:00
Jarek Poplawski
67305ebc99 pkt_sched: sch_generic: Kfree gso_skb in qdisc_reset()
Since gso_skb is re-used for qdisc_peek_dequeued(), and this skb is
counted in the qdisc->q.qlen, it has to be kfreed during qdisc_reset()
when qlen is zeroed.

With help from David S. Miller <davem@davemloft.net>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 02:52:50 -08:00
Jianjun Kong
5799de0b12 net: clean up net/ipv4/tcp_ipv4.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 02:49:10 -08:00
Jianjun Kong
539afedfcc net: clean up net/ipv4/devinet.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 02:48:48 -08:00
Jianjun Kong
f4cca7ffb2 net: clean up net/ipv4/pararp.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 02:48:14 -08:00
Jianjun Kong
fd3f8c4cb6 net: clean up net/ipv4/ip_fragment.c tcp_timer.c ip_input.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 02:47:38 -08:00
Arnaud Ebalard
a1caa32295 XFRM: copy_to_user_kmaddress() reports local address twice
While adding support for MIGRATE/KMADDRESS in strongSwan (as specified
in draft-ebalard-mext-pfkey-enhanced-migrate-00), Andreas Steffen
noticed that XFRMA_KMADDRESS attribute passed to userland contains the
local address twice (remote provides local address instead of remote
one).

This bug in copy_to_user_kmaddress() affects only key managers that use
native XFRM interface (key managers that use PF_KEY are not affected).

For the record, the bug was in the initial changeset I posted which
added support for KMADDRESS (13c1d18931ebb5cf407cb348ef2cd6284d68902d
'xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)').

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Reported-by: Andreas Steffen <andreas.steffen@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 01:30:23 -08:00
Jianjun Kong
c354e12463 net: clean up net/ipv4/ipmr.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:28:02 -08:00
Jianjun Kong
09cb105ea7 net: clean up net/ipv4/ip_sockglue.c tcp_output.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:27:11 -08:00
Jianjun Kong
a7e9ff735b net: clean up net/ipv4/igmp.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:26:09 -08:00
Jianjun Kong
6ed2533e55 net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:25:16 -08:00
Jianjun Kong
5a5f3a8db9 net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:24:34 -08:00
Jianjun Kong
d9319100c1 net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03 00:23:42 -08:00
David S. Miller
e0db4a786b sunrpc: Fix build warning due to typo in %pI4 format changes.
Noticed by Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 23:57:06 -08:00
Wei Yongjun
0856f93958 udp: Fix the SNMP counter of UDP_MIB_INERRORS
UDP packets received in udpv6_recvmsg() are not only IPv6 UDP packets, but
also have IPv4 UDP packets, so when do the counter of UDP_MIB_INERRORS in
udpv6_recvmsg(), we should check whether the packet is a IPv6 UDP packet
or a IPv4 UDP packet.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 23:52:46 -08:00
Wei Yongjun
f26ba17511 udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS
If UDP echo is sent to xinetd/echo-dgram, the UDP reply will be received
at the sender. But the SNMP counter of UDP_MIB_INDATAGRAMS will be not
increased, UDP6_MIB_INDATAGRAMS will be increased instead.

  Endpoint A                      Endpoint B
  UDP Echo request ----------->
  (IPv4, Dst port=7)
                   <----------    UDP Echo Reply
                                  (IPv4, Src port=7)

This bug is come from this patch cb75994ec311b2cd50e5205efdcc0696abd6675d.

It do counter UDP[6]_MIB_INDATAGRAMS until udp[v6]_recvmsg. Because
xinetd used IPv6 socket to receive UDP messages, thus, when received
UDP packet, the UDP6_MIB_INDATAGRAMS will be increased in function
udpv6_recvmsg() even if the packet is a IPv4 UDP packet.

This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 23:52:45 -08:00
Julius Volz
20971a0afb IPVS: Add IPv6 support to SH and DH schedulers
Add IPv6 support to SH and DH schedulers. I hope this simple IPv6 address
hashing is good enough. The 128 bit are just XORed into 32 before hashing
them like an IPv4 address.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 23:52:01 -08:00
Linus Torvalds
391e572cd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  af_unix: netns: fix problem of return value
  IRDA: remove double inclusion of module.h
  udp: multicast packets need to check namespace
  net: add documentation for skb recycling
  key: fix setkey(8) policy set breakage
  bpa10x: free sk_buff with kfree_skb
  xfrm: do not leak ESRCH to user space
  net: Really remove all of LOOPBACK_TSO code.
  netfilter: nf_conntrack_proto_gre: switch to register_pernet_gen_subsys()
  netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsys
  net: delete excess kernel-doc notation
  pppoe: Fix socket leak.
  gianfar: Don't reset TBI<->SerDes link if it's already up
  gianfar: Fix race in TBI/SerDes configuration
  at91_ether: request/free GPIO for PHY interrupt
  amd8111e: fix dma_free_coherent context
  atl1: fix vlan tag regression
  SMC91x: delete unused local variable "lp"
  myri10ge: fix stop/go mmio ordering
  bonding: fix panic when taking bond interface down before removing module
  ...
2008-11-02 10:15:52 -08:00
Jarek Poplawski
8ba25dad0a sch_netem: Replace ->requeue() method with open code
After removing netem classful functionality we are sure its inner
qdisc is tfifo, so we can replace qdisc->ops->requeue() method with
open code. After this patch there are no more ops->requeue() users.

The idea of this patch is by Patrick McHardy.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 00:36:03 -07:00
Jarek Poplawski
0220146411 sch_netem: Remove classful functionality
Patrick McHardy noticed that: "a lot of the functionality of netem
requires the inner tfifo anyways and rate-limiting is usually done
on top of netem. So I would suggest so either hard-wire the tfifo
qdisc or at least make the assumption that inner qdiscs are
work-conserving.", and later: "- a lot of other qdiscs still don't
work as inner qdiscs of netem [...]".

So, according to his suggestion, this patch removes classful options
of netem. The main reason of this change is to remove ops->requeue()
method, which is currently used only by netem.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 00:35:24 -07:00
Sangtae Ha
ae27e98a51 [TCP] CUBIC v2.3
Signed-off-by: Sangtae Ha <sha2@ncsu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 00:28:10 -07:00
Jianjun Kong
e27dfcea48 af_unix: clean up net/unix/af_unix.c garbage.c sysctl_net_unix.c
clean up net/unix/af_unix.c garbage.c sysctl_net_unix.c

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01 21:38:31 -07:00
Jianjun Kong
48dcc33e5e af_unix: netns: fix problem of return value
fix problem of return value

net/unix/af_unix.c: unix_net_init()
when error appears, it should return 'error', not always return 0.

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01 21:37:27 -07:00
Eric Dumazet
920a46115c udp: multicast packets need to check namespace
Current UDP multicast delivery is not namespace aware.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01 21:22:23 -07:00
Eric Dumazet
c37ccc0d4e udp: add a missing smp_wmb() in udp_lib_get_port()
Corey Minyard spotted a missing memory barrier in udp_lib_get_port()

We need to make sure a reader cannot read the new 'sk->sk_next' value
and previous value of 'sk->sk_hash'. Or else, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.

This patch is temporary, since we expect an upcoming patch
to introduce another way of handling the problem.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01 21:19:18 -07:00
Nicolas Dichtel
7e3a42a12c xfrm6: handling fragment
RFC4301 Section 7.1 says:

"7.1.  Tunnel Mode SAs that Carry Initial and Non-Initial Fragments

     All implementations MUST support tunnel mode SAs that are configured
     to pass traffic without regard to port field (or ICMP type/code or
     Mobility Header type) values.  If the SA will carry traffic for
     specified protocols, the selector set for the SA MUST specify the
     port fields (or ICMP type/code or Mobility Header type) as ANY.  An
     SA defined in this fashion will carry all traffic including initial
     and non-initial fragments for the indicated Local/Remote addresses
     and specified Next Layer protocol(s)."

But for IPv6, fragment is treated as a protocol.  This change catches
protocol transported in fragmented packet.  In IPv4, there is no
problem.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-01 21:12:07 -07:00