Commit Graph

221782 Commits

Author SHA1 Message Date
KOVACS Krisztian
ae90bdeaea netfilter: fix compilation when conntrack is disabled but tproxy is enabled
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-12-15 23:53:41 +01:00
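The fix above is about which configuration symbol guards a field. Below is a minimal sketch of that pattern. It uses hypothetical names (`DEMO_CONFIG_DEFRAG_IPV6`, `DEMO_CONFIG_CONNTRACK`, `struct demo_skb`) in place of the real kernel symbols. The reassembly field is guarded by the defragmentation option instead of the conntrack option, so it exists whenever defragmentation is built, with or without conntrack.

```c
/* Hypothetical configuration: defragmentation is built (as tproxy
 * requires) but connection tracking is not. */
#define DEMO_CONFIG_DEFRAG_IPV6 1
/* #define DEMO_CONFIG_CONNTRACK 1 */

struct demo_skb {
	unsigned int len;
#ifdef DEMO_CONFIG_CONNTRACK
	void *nfct;              /* conntrack-only state */
#endif
#ifdef DEMO_CONFIG_DEFRAG_IPV6
	struct demo_skb *reasm;  /* reassembled packet; needed by defrag
	                          * even when conntrack is disabled */
#endif
};

/* Code that touches the field must use the same guard as the field. */
static inline void demo_skb_set_reasm(struct demo_skb *skb,
				      struct demo_skb *reasm)
{
#ifdef DEMO_CONFIG_DEFRAG_IPV6
	skb->reasm = reasm;
#else
	(void)skb;
	(void)reasm;
#endif
}
```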
Jan Engelhardt
f1c722295e netfilter: xtables: use guarded types
We are supposed to use the kernel's own types in userspace exports.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-12-15 22:58:53 +01:00
Hans Schillstrom
b880c1f077 IPVS: Backup, adding version 0 sending capabilities
This patch adds a sysctl, net.ipv4.vs.sync_version,
that can be used to send sync messages in version 0 or version 1 format.

sync_version value is logical:
     1 (default) New version
     0           Plain old version

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
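Here is a minimal sketch of how a sender might choose the wire format from the sysctl. The helper name is hypothetical. Treating any non-zero value as version 1 is an assumption based on "sync_version value is logical". On a running system, the value itself would be set with something like `sysctl -w net.ipv4.vs.sync_version=0`.

```c
/* Wire formats for IPVS sync messages, as listed in the commit. */
enum demo_sync_format {
	DEMO_SYNC_V0 = 0, /* plain old version */
	DEMO_SYNC_V1 = 1, /* new version (default) */
};

/* Hypothetical helper: map the sysctl value to the sending format.
 * Assumption: 0 selects version 0, anything else selects version 1. */
static enum demo_sync_format demo_sync_format_from_sysctl(int sync_version)
{
	return sync_version == 0 ? DEMO_SYNC_V0 : DEMO_SYNC_V1;
}
```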
Hans Schillstrom
986a075795 IPVS: Backup, Change sending to Version 1 format
Enable sending in version 1 format and remove version 0 sending.
Affected functions:

ip_vs_sync_buff_create()
ip_vs_sync_conn()

ip_vs_core.c: removal of IPv4 check.

*v5
 Just check cp->pe_data_len in ip_vs_sync_conn
 Check if padding needed before adding a new sync_conn
 to the buffer, i.e. avoid sending padding at the end.

*v4
 moved sanity check and pe_name_len after sloop.
 use cp->pe instead of cp->dest->svc->pe
 real length in each sync_conn, not padded length;
 however, the total size of a sync_msg includes padding.

*v3
 Sending ip_vs_sync_conn_options in network order.
 Sending Templates for ONE_PACKET conn.
 Renaming of ip_vs_sync_mesg to ip_vs_sync_mesg_v0

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
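The v4/v5 notes describe two rules. Each entry records its real length, and padding goes in only before a new entry, never after the last one. Below is a minimal sketch of that size calculation. It assumes 4-byte alignment, which the commit does not state, and the function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of 4 (assumed alignment). */
static size_t demo_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Total size of a sync message: the header, then each connection entry
 * at its real (unpadded) length, with padding inserted only before a
 * new entry so nothing is padded after the last one. */
static size_t demo_sync_msg_size(size_t header_len,
				 const size_t *conn_lens, size_t n)
{
	size_t size = header_len;

	for (size_t i = 0; i < n; i++) {
		size = demo_align4(size); /* pad before adding the entry */
		size += conn_lens[i];
	}
	return size;
}
```

With an 8-byte header and entries of 10 and 12 bytes, the second entry starts at offset 20, and the message ends at 32 rather than at a padded 36.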
Hans Schillstrom
fe5e7a1efb IPVS: Backup, Adding Version 1 receive capability
Functionality improvements
 * flags changed from 16 to 32 bits
 * fwmark added (32 bits)
 * timeout in sec. added (32 bits)
 * pe data added (Variable length)
 * IPv6 capabilities (3x16 bytes for addr.)
 * Version and type in every conn msg.

ip_vs_process_message() now handles Version 1 messages
and will call ip_vs_process_message_v0() for version 0 messages.

ip_vs_proc_conn() is common to both versions and handles the update of the
connection hash.

ip_vs_conn_fill_param_sync()    - Version 1 messages only
ip_vs_conn_fill_param_sync_v0() - Version 0 messages only

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
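The receive side has to tell the two formats apart before it dispatches. Below is a simplified sketch of such a dispatcher. The header layout is an assumption: byte 0 is a reserved byte that is always 0 in version 1 and is a non-zero connection count in version 0, and byte 4 carries the version. The names are hypothetical and not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SYNC_PROTO_VER 1 /* assumed version 1 marker */

enum demo_handler {
	DEMO_BAD_MSG = -1,
	DEMO_HANDLE_V0 = 0, /* would call the ..._v0() path */
	DEMO_HANDLE_V1 = 1,
};

/* Hypothetical layout: v0 header is at least 4 bytes, v1 header 8. */
static enum demo_handler demo_pick_handler(const uint8_t *buf, size_t len)
{
	if (len < 4)
		return DEMO_BAD_MSG;
	/* Reserved byte is zero and the version field matches: version 1. */
	if (len >= 8 && buf[0] == 0 && buf[4] == DEMO_SYNC_PROTO_VER)
		return DEMO_HANDLE_V1;
	return DEMO_HANDLE_V0;
}
```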
Hans Schillstrom
2981bc9a63 IPVS: Backup, Adding structs for new sync format
New structs defined for version 1 of sync.

 * ip_vs_sync_v4       IPv4 base format struct
 * ip_vs_sync_v6       IPv6 base format struct

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
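Below is an illustrative layout for the two base structs. The field set follows the version 1 feature list given earlier: 32-bit flags, fwmark and timeout, a type byte in every entry, and 3x4 or 3x16 bytes of addresses. Field order and names are assumptions, not the kernel's definitions. The real messages carry these fields in network byte order, whereas this sketch uses plain host integers.

```c
#include <stdint.h>

/* Illustrative only; not the kernel's ip_vs_sync_v4/ip_vs_sync_v6. */
struct demo_sync_v4 {
	uint8_t  type;                /* version/type in every conn msg */
	uint8_t  protocol;
	uint16_t ver_size;            /* version and size of this entry */
	uint32_t flags;               /* widened from 16 to 32 bits */
	uint16_t state;
	uint16_t cport, vport, dport;
	uint32_t fwmark;
	uint32_t timeout;             /* seconds */
	uint32_t caddr, vaddr, daddr; /* 3 x 4 bytes */
};

struct demo_sync_v6 {
	uint8_t  type;
	uint8_t  protocol;
	uint16_t ver_size;
	uint32_t flags;
	uint16_t state;
	uint16_t cport, vport, dport;
	uint32_t fwmark;
	uint32_t timeout;
	uint8_t  caddr[16], vaddr[16], daddr[16]; /* 3 x 16 bytes */
};
```

Both layouts are naturally aligned, so on common ABIs they have no internal padding: 36 bytes for IPv4 and 72 bytes for IPv6.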
Hans Schillstrom
a5959d53d6 IPVS: Handle Scheduling errors.
If ip_vs_conn_fill_param_persist() returns an error to ip_vs_sched_persist(),
this error must propagate as *ignored = -1 to ip_vs_schedule().
Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule()
should also set *ignored = -1.

This patch just relies on the fact that ignored is 1 before calling
ip_vs_sched_persist().

Sent from Julian:
  "The new case when ip_vs_conn_fill_param_persist fails
   should set *ignored = -1, so that we can use NF_DROP,
   see below. *ignored = -1 should be also used for ip_vs_conn_new
   failure in ip_vs_sched_persist() and ip_vs_schedule().
   The new negative value should be handled in tcp,udp,sctp"

"To summarize:

- *ignored = 1:
      protocol tried to schedule (eg. on SYN), found svc but the
      svc/scheduler decides that this packet should be accepted with
      NF_ACCEPT because it must not be scheduled.

- *ignored = 0:
      scheduler can not find destination, so try bypass or
      return ICMP and then NF_DROP (ip_vs_leave).

- *ignored = -1:
      scheduler tried to schedule but fatal error occurred, eg.
      ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param
      failure such as missing Call-ID, ENOMEM on skb_linearize
      or pe_data. In this case we should return NF_DROP without
      any attempts to send ICMP with ip_vs_leave."

Nearly all of the ideas and input for this patch are the work of
Julian Anastasov.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
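Julian's summary maps the three values of *ignored to three outcomes. Below is a minimal sketch of that mapping. The enum and function are hypothetical stand-ins for NF_ACCEPT, the ip_vs_leave() path and NF_DROP. They are not the protocol handlers' actual code.

```c
/* Hypothetical verdicts standing in for the outcomes described above. */
enum demo_verdict {
	DEMO_ACCEPT, /* *ignored == 1: must not be scheduled, NF_ACCEPT */
	DEMO_LEAVE,  /* *ignored == 0: no destination, bypass or ICMP,
	              * then NF_DROP (ip_vs_leave) */
	DEMO_DROP,   /* *ignored == -1: fatal error, NF_DROP, no ICMP */
};

static enum demo_verdict demo_verdict_for(int ignored)
{
	if (ignored > 0)
		return DEMO_ACCEPT;
	if (ignored == 0)
		return DEMO_LEAVE;
	/* e.g. ENOMEM in ip_vs_conn_new() or a missing SIP Call-ID */
	return DEMO_DROP;
}
```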
Hans Schillstrom
3716522653 IPVS: skb defrag in L7 helpers
L7 helpers like sip need skb defrag,
since L7 data can be fragmented.

This patch requires "IPVS Break ports-2 into src_port and dst_port" patch

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
Hans Schillstrom
ce144f249f IPVS: Split ports[2] into src_port and dst_port
Avoid passing an invalid pointer after a skb_linearize() call.
This patch prepares for the next patch, of which skb_linearize is a part.

In the ip_vs_sched_persist() parameters, the ports pointer is replaced by
the src and dst ports.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
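The reason for the split is that a pointer into packet data goes stale if skb_linearize() reallocates that data, while ports copied out by value stay valid. Below is a minimal sketch of the copy-by-value approach. The struct and helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical parameter block: ports held by value, not as a pointer
 * into packet data that a later linearize may move. */
struct demo_param {
	uint16_t src_port;
	uint16_t dst_port;
};

/* Copy the two big-endian ports at the start of a transport header. */
static void demo_fill_ports(struct demo_param *p, const uint8_t *th)
{
	p->src_port = (uint16_t)((th[0] << 8) | th[1]);
	p->dst_port = (uint16_t)((th[2] << 8) | th[3]);
}
```

After `demo_fill_ports()` returns, the header buffer can be freed, moved or overwritten without affecting `p`.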
Hans Schillstrom
0e051e683b IPVS: Backup, Prepare for transferring firewall marks (fwmark) to the backup daemon.
One struct will have fwmark added:
 * ip_vs_conn

ip_vs_conn_new() and ip_vs_find_dest()
will have an extra param, fwmark.
The effects of that change are included in this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
Patrick McHardy
2c2bf08614 Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6 2010-11-16 10:21:27 +01:00
Eric Dumazet
3bfd45f93c netfilter: nf_conntrack: one less atomic op in nf_ct_expect_insert()
Instead of doing atomic_inc(&exp->use) twice,
call atomic_add(2, &exp->use);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-16 10:19:18 +01:00
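The change replaces two atomic read-modify-write operations with one. Below is a userspace sketch of the same idea, using C11 `<stdatomic.h>` as a stand-in for the kernel's atomic_add(). The struct and function names are hypothetical.

```c
#include <stdatomic.h>

struct demo_expect {
	atomic_int use; /* reference count */
};

/* Take two references with a single atomic add instead of two
 * separate increments. */
static void demo_expect_get2(struct demo_expect *exp)
{
	atomic_fetch_add(&exp->use, 2);
}
```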
Simon Horman
8f1b03a4c1 ipvs: allow transmit of GRO aggregated skbs
Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
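The idea is that an over-MTU packet only counts as "too big" if GSO cannot re-segment it on the way out. Below is a minimal sketch of that test. The function name is hypothetical, and `is_gso` stands in for a check such as the kernel's skb_is_gso().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transmit check: reject (e.g. with ICMP "fragmentation
 * needed") only if the packet exceeds the MTU and is not a GSO packet. */
static bool demo_too_big(size_t len, size_t mtu, bool is_gso)
{
	return len > mtu && !is_gso;
}
```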
Eric Dumazet
a333e2ec05 ipvs: remove shadow rt variable
Remove a sparse warning about rt variable.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Eric Dumazet
4ecd29447e ipvs: add static and read_mostly attributes
ip_vs_conn_tab_bits & ip_vs_conn_tab_mask are static to
ipvs/ip_vs_conn.c

ip_vs_conn_tab_size, ip_vs_conn_tab_mask, ip_vs_conn_tab [the pointer],
ip_vs_conn_rnd are mostly read.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
8aadf93c9c IPVS: buffer argument to ip_vs_process_message() should not be const
It is assigned to a non-const variable and its contents are modified.

Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
7ae246a15a IPVS: Remove useless { } block from ip_vs_process_message()
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
d494262b8a IPVS: Make the cp argument to ip_vs_sync_conn() static
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
Simon Horman
ea2c73afc2 IPVS: Only match pe_data created by the same pe
Only match persistence engine data if it was
created by the same persistence engine.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
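Below is a minimal sketch of the matching rule. Persistence engine data only matches if it was created by the same engine, and the engine pointer is stored in the connection entry itself, which the next commit also does. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_pe {
	const char *name;
};

struct demo_conn {
	const struct demo_pe *pe; /* engine that created pe_data */
	const char *pe_data;
	size_t pe_data_len;
};

/* Equal data from different engines is not a match. */
static bool demo_pe_data_match(const struct demo_conn *a,
			       const struct demo_conn *b)
{
	return a->pe == b->pe &&
	       a->pe_data_len == b->pe_data_len &&
	       (a->pe_data_len == 0 ||
		memcmp(a->pe_data, b->pe_data, a->pe_data_len) == 0);
}
```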
Simon Horman
e9e5eee873 IPVS: Add persistence engine to connection entry
The dest of a connection may not exist if the connection was created as the
result of connection synchronisation. But connection entries for templates
with persistence engine data, created through connection synchronisation,
need access to the persistence engine pointer in order to be valid. So add
the persistence engine to the connection itself.

Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
Eric Dumazet
c5d277d29a netfilter: rcu sparse cleanups
Use RCU helpers to reduce the number of sparse warnings
(CONFIG_SPARSE_RCU_POINTER=y), and add lockdep checks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 19:45:13 +01:00
Eric Dumazet
ab0cba2512 netfilter: nf_nat_amanda: rename a variable
Avoid a sparse warning about 'ret' variable shadowing

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 18:45:12 +01:00
Eric Dumazet
eb733162ae netfilter: add __rcu annotations
Use helpers to reduce number of sparse warnings
(CONFIG_SPARSE_RCU_POINTER=y)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 18:43:59 +01:00
Eric Dumazet
be9e9163af netfilter: nf_ct_frag6_sysctl_table is static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 18:18:29 +01:00
Eric Dumazet
0e60ebe04c netfilter: add __rcu annotations
Add some __rcu annotations and use helpers to reduce number of sparse
warnings (CONFIG_SPARSE_RCU_POINTER=y)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 18:17:21 +01:00
Frédéric Leroy
9811600f7c netfilter: xt_CLASSIFY: add ARP support, allow CLASSIFY target on any table
Signed-off-by: Frédéric Leroy <fredo@starox.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 13:57:56 +01:00
Changli Gao
03c0e5bb34 netfilter: nf_nat: define nat_pptp_info as needed
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 12:27:27 +01:00
Changli Gao
e0e76c83be netfilter: ct_extend: define NF_CT_EXT_* as needed
Fewer IDs make nf_ct_ext smaller.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 12:23:24 +01:00
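Below is a minimal sketch of why defining IDs "as needed" shrinks the extension header. Only configured features get an enum value, so the count and every array sized by it get smaller. The enum names and DEMO_CONFIG_* macros are hypothetical, and neither macro is defined here.

```c
#include <stdint.h>

enum demo_ext_id {
	DEMO_EXT_HELPER,
#ifdef DEMO_CONFIG_NAT
	DEMO_EXT_NAT,
#endif
	DEMO_EXT_ACCT,
#ifdef DEMO_CONFIG_EVENTS
	DEMO_EXT_ECACHE,
#endif
	DEMO_EXT_NUM, /* counts only the configured IDs */
};

struct demo_ext {
	uint8_t offset[DEMO_EXT_NUM]; /* one slot per configured extension */
	uint8_t len;
};
```

With both optional features compiled out, the header needs two offset slots instead of four.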
Changli Gao
76a2d3bcfc netfilter: nf_nat: don't use atomic bit operation
As we own the conntrack and others can't see it until we confirm it,
we don't need to use atomic bit operations on ct->status.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 11:59:03 +01:00
Changli Gao
0f8e80044b netfilter: nf_conntrack: define ct_*_info as needed
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 11:51:06 +01:00
Changli Gao
3b23688069 netfilter: ct_extend: fix the wrong alloc_size
In function update_alloc_size(), sizeof(struct nf_ct_ext) is wrongly
added twice.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 11:47:52 +01:00
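Below is a minimal sketch of the intended computation: the header is counted exactly once, and then each extension is placed at its alignment. It is not update_alloc_size() itself. The types and the DEMO_ALIGN macro are hypothetical.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a > 0). */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))

struct demo_ext_type {
	size_t len;
	size_t align;
};

static size_t demo_ext_alloc_size(size_t hdr_size,
				  const struct demo_ext_type *t, size_t n)
{
	size_t size = hdr_size; /* header size added once, not twice */

	for (size_t i = 0; i < n; i++)
		size = DEMO_ALIGN(size, t[i].align) + t[i].len;
	return size;
}
```

For a 4-byte header followed by extensions of {len 6, align 4} and {len 8, align 8}, the result is 24. Adding the header a second time would inflate every allocation.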
Jan Engelhardt
b468645d72 netfilter: xt_LOG: do print MAC header on FORWARD
I am observing consistent behavior even with bridges, so let's unlock
this. xt_mac is already usable in FORWARD, too. Section 9 of
http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section9 says
the MAC source address is changed, but my observation does not match
that claim -- the MAC header is retained.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
[Patrick; code inspection seems to confirm this]
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 11:23:06 +01:00
Changli Gao
ca36181050 netfilter: xt_NFQUEUE: remove modulo operations
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-12 17:34:17 +01:00
Changli Gao
e5fc9e7a66 netfilter: nf_conntrack: don't always initialize ct->proto
ct->proto is big (60 bytes) due to structure ip_ct_tcp, and we don't need
to initialize all of it for every other protocol. This patch moves proto
to the end of structure nf_conn, and pushes the initialization down to
the individual protocols.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-12 17:33:17 +01:00
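Moving the big union to the end lets generic code zero everything before it with one memset and leave the rest to each protocol. Below is a minimal sketch with a hypothetical layout, not struct nf_conn.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: the large per-protocol union is the last member. */
struct demo_ct {
	unsigned int status;
	unsigned long timeout;
	union {
		unsigned char tcp[60]; /* large TCP state */
		unsigned char udp[4];
	} proto;
};

/* Generic init: zero only the part before proto. */
static void demo_ct_init(struct demo_ct *ct)
{
	memset(ct, 0, offsetof(struct demo_ct, proto));
}

/* Protocol-specific init: only TCP pays for clearing its state. */
static void demo_tcp_init(struct demo_ct *ct)
{
	memset(ct->proto.tcp, 0, sizeof(ct->proto.tcp));
}
```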
David S. Miller
c753796769 ipv4: Make rt->fl.iif tests less obscure.
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.

Make that explicit with some helper functions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 17:07:48 -08:00
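Below is a sketch of helpers of the kind described, which name the meaning of the `iif` test instead of comparing against zero inline. The struct and helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical route entry: iif is the input interface index,
 * 0 for output (locally generated) routes. */
struct demo_rtable {
	int iif;
};

static inline bool demo_rt_is_input_route(const struct demo_rtable *rt)
{
	return rt->iif != 0;
}

static inline bool demo_rt_is_output_route(const struct demo_rtable *rt)
{
	return rt->iif == 0;
}
```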
David S. Miller
ed1deb7021 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-11-11 10:43:30 -08:00
Eric Dumazet
72cdd1d971 net: get rid of rtable->idev
It seems the idev field in struct rtable serves no special purpose but
adds extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in the current kernel).

The InfiniBand case is solved by using dst.dev instead of idev->dev.

Removing this field means routing without the route cache now uses
shared data and percpu data, and the only potential contention is a pair
of atomic ops on struct neighbour per forwarded packet.

About a 5% speedup on a routing test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
Eric Dumazet
46b13fc5c0 neigh: reorder struct neighbour
It is important to move nud_state outside of the often modified cache
line (because of refcnt), to reduce false sharing in neigh_event_send()

This is a followup of commit 0ed8ddf404 (neigh: Protect neigh->ha[]
with a seqlock)

This gives a 7% speedup on routing test with IP route cache disabled.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
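The commit reorders the fields of struct neighbour. The sketch below makes the same separation visible with a different technique, explicit C11 alignment: the frequently written refcount sits on its own cache line, away from the read-mostly nud_state. The 64-byte line size and all names are assumptions.

```c
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64 /* assumed cache-line size */

/* Hypothetical layout: writes to refcnt on one CPU no longer invalidate
 * the line that other CPUs read nud_state from (no false sharing). */
struct demo_neighbour {
	unsigned char nud_state;            /* read often in the send path */
	unsigned long confirmed;
	alignas(DEMO_CACHELINE) int refcnt; /* written on every get/put */
};
```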
Jon Mason
c0c04c2a89 vxge: update driver version
Update vxge driver version

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:24 -08:00
Jon Mason
2c91308f44 vxge: sparse and other clean-ups
Correct issues found by running sparse on the vxge driver, as well as
other miscellaneous cleanups.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:24 -08:00
Jon Mason
1901d042ab vxge: update Kconfig
Update Kconfig to reflect Exar's purchase of Neterion (formerly S2IO).

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:23 -08:00
Jon Mason
ca3e3b8fae vxge: correct multi-function detection
The values used to determine whether the adapter is running in single- or
multi-function mode were previously changed to the values required by the
VXGE_HW_FW_API_GET_FUNC_MODE firmware call. However, the firmware call
itself was not updated. As a result, on probe the driver reported
multi-function mode when the adapter was in single-function mode, and
vice versa.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:23 -08:00
Jon Mason
e7935c9669 vxge: Titan1A detection
Detect if the adapter is Titan or Titan1A, and tune the driver for this
hardware.  Also, remove unnecessary function __vxge_hw_device_id_get.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:22 -08:00
Jon Mason
c3150eac9f vxge: Handle errors in vxge_hw_vpath_fw_api
Propagate the return code of the call to vxge_hw_vpath_fw_api and
__vxge_hw_vpath_pci_func_mode_get.  This enables the proper handling of
error conditions when querying the function mode of the device during
probe.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:22 -08:00
Jon Mason
b81b373384 vxge: add receive hardware timestamping
Add support for enabling/disabling hardware timestamping on received
packets via an ioctl call. When enabled, the hardware timestamp replaces
the FCS in the payload.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:21 -08:00
Jon Mason
e8ac175615 vxge: add support for ethtool firmware flashing
Add the ability in the vxge driver to flash firmware via ethtool.

Updated to include comments from Ben Hutchings.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:21 -08:00
Jon Mason
8424e00dfd vxge: serialize access to steering control register
It is possible for multiple callers to access the firmware interface for
the same vpath simultaneously, resulting in uncertain output.  Add locks
to serialize access.  Also, make functions only accessed locally static,
thus requiring some movement of code blocks.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:20 -08:00
Jon Mason
ddd62726e0 vxge: cleanup debug printing and asserts
Remove all of the unnecessary debug printk indirection and temporary
variables for vxge_debug_ll and vxge_assert.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:20 -08:00
Jon Mason
4d2a5b406c vxge: Wait for Rx to become idle before resetting or closing
Wait for the receive traffic to become idle before attempting to close
or reset the adapter. To allow packets to be processed while waiting for
receive idle, move the clearing of the __VXGE_STATE_CARD_UP bit in
vxge_close to after that wait. Also, have the ISR return IRQ_HANDLED
when the adapter is down; otherwise the device's interrupts are reported
as unhandled.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:19 -08:00
Jon Mason
47f01db44b vxge: enable rxhash
Enable RSS hashing and add the ability to pass the adapter-calculated rx
hash up the network stack (if the feature is available). Add the ability
to enable/disable the feature via ethtool, which requires that the adapter
not be running at the time. Other miscellaneous cleanups and fixes were
required to get RSS working.

Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 09:30:18 -08:00