/*
 *	Linux INET6 implementation
 *	Forwarding Information Database
 *
 *	Authors:
 *	Pedro Roque		<roque@di.fc.ul.pt>
 *
 *	This program is free software; you can redistribute it and/or
 *	modify it under the terms of the GNU General Public License
 *	as published by the Free Software Foundation; either version
 *	2 of the License, or (at your option) any later version.
 *
 *	Changes:
 *	Yuji SEKIYA @USAGI:	Support default route on router node;
 *				remove ip6_null_entry from the top of
 *				routing table.
 *	Ville Nuorvala:		Fixed routing subtrees.
 */

#define pr_fmt(fmt) "IPv6: " fmt

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/net.h>
#include <linux/route.h>
#include <linux/netdevice.h>
#include <linux/in6.h>
#include <linux/init.h>
#include <linux/list.h>
#include <linux/slab.h>

#include <net/ipv6.h>
#include <net/ndisc.h>
#include <net/addrconf.h>
#include <net/lwtunnel.h>

#include <net/ip6_fib.h>
#include <net/ip6_route.h>

#define RT6_DEBUG 2

#if RT6_DEBUG >= 3
#define RT6_TRACE(x...) pr_debug(x)
#else
#define RT6_TRACE(x...) do { ; } while (0)
#endif

static struct kmem_cache *fib6_node_kmem __read_mostly;

struct fib6_cleaner {
	struct fib6_walker w;
	struct net *net;
	int (*func)(struct rt6_info *, void *arg);
	int sernum;
	void *arg;
};

#ifdef CONFIG_IPV6_SUBTREES
#define FWS_INIT FWS_S
#else
#define FWS_INIT FWS_L
#endif

static void fib6_prune_clones(struct net *net, struct fib6_node *fn);
static struct rt6_info *fib6_find_prefix(struct net *net, struct fib6_node *fn);
static struct fib6_node *fib6_repair_tree(struct net *net, struct fib6_node *fn);
static int fib6_walk(struct net *net, struct fib6_walker *w);
static int fib6_walk_continue(struct fib6_walker *w);

/*
 *	A routing update causes an increase of the serial number on the
 *	affected subtree. This allows for cached routes to be asynchronously
 *	tested when modifications are made to the destination cache as a
 *	result of redirects, path MTU changes, etc.
 */

static void fib6_gc_timer_cb(unsigned long arg);

#define FOR_WALKERS(net, w) \
	list_for_each_entry(w, &(net)->ipv6.fib6_walkers, lh)

static void fib6_walker_link(struct net *net, struct fib6_walker *w)
{
	write_lock_bh(&net->ipv6.fib6_walker_lock);
	list_add(&w->lh, &net->ipv6.fib6_walkers);
	write_unlock_bh(&net->ipv6.fib6_walker_lock);
}

static void fib6_walker_unlink(struct net *net, struct fib6_walker *w)
{
	write_lock_bh(&net->ipv6.fib6_walker_lock);
	list_del(&w->lh);
	write_unlock_bh(&net->ipv6.fib6_walker_lock);
}

static int fib6_new_sernum(struct net *net)
{
	int new, old;

	do {
		old = atomic_read(&net->ipv6.fib6_sernum);
		new = old < INT_MAX ? old + 1 : 1;
	} while (atomic_cmpxchg(&net->ipv6.fib6_sernum,
				old, new) != old);
	return new;
}
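
/*
 * Illustration only, not part of the kernel source: a minimal userspace
 * sketch of the compare-and-swap retry loop used by fib6_new_sernum(),
 * assuming C11 <stdatomic.h> in place of the kernel's atomic_t API.
 * The name demo_new_sernum is hypothetical. As in the function above,
 * the counter wraps from INT_MAX back to 1, so 0 is never handed out.
 */

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Advance *counter and return the new value (always in 1..INT_MAX). */
static int demo_new_sernum(atomic_int *counter)
{
	int old, new;

	old = atomic_load(counter);
	do {
		new = old < INT_MAX ? old + 1 : 1;
		/* On failure the call stores the current value into old. */
	} while (!atomic_compare_exchange_weak(counter, &old, new));

	assert(new > 0);
	return new;
}
```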

enum {
	FIB6_NO_SERNUM_CHANGE = 0,
};

/*
 *	Auxiliary address test functions for the radix tree.
 *
 *	These assume a 32bit processor (although they will work on
 *	64bit processors)
 */

/*
 *	test bit
 */
#if defined(__LITTLE_ENDIAN)
# define BITOP_BE32_SWIZZLE	(0x1F & ~7)
#else
# define BITOP_BE32_SWIZZLE	0
#endif

static __be32 addr_bit_set(const void *token, int fn_bit)
{
	const __be32 *addr = token;
	/*
	 * Here,
	 *	1 << ((~fn_bit ^ BITOP_BE32_SWIZZLE) & 0x1f)
	 * is an optimized version of
	 *	htonl(1 << ((~fn_bit)&0x1F))
	 * See include/asm-generic/bitops/le.h.
	 */
	return (__force __be32)(1 << ((~fn_bit ^ BITOP_BE32_SWIZZLE) & 0x1f)) &
	       addr[fn_bit >> 5];
}
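
/*
 * Illustration only, not part of the kernel source: a byte-wise sketch of
 * the same bit test, assuming the address is a 16-byte array in network
 * byte order with bit 0 being the most significant bit of the first byte.
 * The name demo_addr_bit_set is hypothetical. Unlike addr_bit_set(), which
 * returns a nonzero __be32 mask, this version returns exactly 0 or 1 and
 * needs no endianness-dependent swizzle.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if bit `bit` (0 = most significant) of addr is set, else 0. */
static int demo_addr_bit_set(const uint8_t addr[16], int bit)
{
	assert(bit >= 0 && bit < 128);
	return (addr[bit >> 3] >> (7 - (bit & 7))) & 1;
}
```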

static struct fib6_node *node_alloc(void)
{
	struct fib6_node *fn;

	fn = kmem_cache_zalloc(fib6_node_kmem, GFP_ATOMIC);

	return fn;
}

static void node_free(struct fib6_node *fn)
{
	kmem_cache_free(fib6_node_kmem, fn);
}

static void rt6_rcu_free(struct rt6_info *rt)
{
	call_rcu(&rt->dst.rcu_head, dst_rcu_free);
}

static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
{
	int cpu;

	if (!non_pcpu_rt->rt6i_pcpu)
		return;

	for_each_possible_cpu(cpu) {
		struct rt6_info **ppcpu_rt;
		struct rt6_info *pcpu_rt;

		ppcpu_rt = per_cpu_ptr(non_pcpu_rt->rt6i_pcpu, cpu);
		pcpu_rt = *ppcpu_rt;
		if (pcpu_rt) {
			rt6_rcu_free(pcpu_rt);
			*ppcpu_rt = NULL;
		}
	}

	free_percpu(non_pcpu_rt->rt6i_pcpu);
	non_pcpu_rt->rt6i_pcpu = NULL;
}

static void rt6_release(struct rt6_info *rt)
{
	if (atomic_dec_and_test(&rt->rt6i_ref)) {
		rt6_free_pcpu(rt);
		rt6_rcu_free(rt);
	}
}
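
/*
 * Illustration only, not part of the kernel source: a userspace sketch of
 * the "decrement and free on last reference" pattern in rt6_release(),
 * assuming C11 atomics. struct demo_route and both helpers are
 * hypothetical. Note one deliberate simplification: the kernel defers the
 * actual free through call_rcu(), whereas this sketch frees immediately.
 */

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_route {
	atomic_int ref;
};

/* Allocate a route holding a single reference; NULL on failure. */
static struct demo_route *demo_route_alloc(void)
{
	struct demo_route *rt = malloc(sizeof(*rt));

	if (rt)
		atomic_init(&rt->ref, 1);
	return rt;
}

/* Drop one reference. Returns 1 if it was the last one and rt was freed. */
static int demo_route_release(struct demo_route *rt)
{
	assert(atomic_load(&rt->ref) > 0);
	/* atomic_fetch_sub returns the value held before the decrement. */
	if (atomic_fetch_sub(&rt->ref, 1) == 1) {
		free(rt);
		return 1;
	}
	return 0;
}
```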

static void fib6_link_table(struct net *net, struct fib6_table *tb)
{
	unsigned int h;

	/*
	 * Initialize table lock at a single place to give lockdep a key,
	 * tables aren't visible prior to being linked to the list.
	 */
	rwlock_init(&tb->tb6_lock);

	h = tb->tb6_id & (FIB6_TABLE_HASHSZ - 1);

	/*
	 * No protection necessary, this is the only list mutation
	 * operation, tables never disappear once they exist.
	 */
	hlist_add_head_rcu(&tb->tb6_hlist, &net->ipv6.fib_table_hash[h]);
}
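
/*
 * Illustration only, not part of the kernel source: a sketch of the bucket
 * selection used above. Masking with (size - 1) equals "id modulo size"
 * only when the size is a power of two. DEMO_HASHSZ and demo_bucket are
 * hypothetical names, and 256 is an assumed size chosen for the example,
 * not necessarily the value of FIB6_TABLE_HASHSZ.
 */

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HASHSZ 256u	/* assumed; must be a power of two */

/* Map a table id to a bucket index in [0, DEMO_HASHSZ). */
static unsigned int demo_bucket(uint32_t id)
{
	return id & (DEMO_HASHSZ - 1);
}
```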

#ifdef CONFIG_IPV6_MULTIPLE_TABLES

static struct fib6_table *fib6_alloc_table(struct net *net, u32 id)
{
	struct fib6_table *table;

	table = kzalloc(sizeof(*table), GFP_ATOMIC);
	if (table) {
		table->tb6_id = id;
		table->tb6_root.leaf = net->ipv6.ip6_null_entry;
		table->tb6_root.fn_flags = RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
		inet_peer_base_init(&table->tb6_peers);
	}

	return table;
}

struct fib6_table *fib6_new_table(struct net *net, u32 id)
{
	struct fib6_table *tb;

	if (id == 0)
		id = RT6_TABLE_MAIN;
	tb = fib6_get_table(net, id);
	if (tb)
		return tb;

	tb = fib6_alloc_table(net, id);
	if (tb)
		fib6_link_table(net, tb);

	return tb;
}
EXPORT_SYMBOL_GPL(fib6_new_table);

struct fib6_table *fib6_get_table(struct net *net, u32 id)
{
	struct fib6_table *tb;
	struct hlist_head *head;
	unsigned int h;

	if (id == 0)
		id = RT6_TABLE_MAIN;
	h = id & (FIB6_TABLE_HASHSZ - 1);
	rcu_read_lock();
	head = &net->ipv6.fib_table_hash[h];
	hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
		if (tb->tb6_id == id) {
			rcu_read_unlock();
			return tb;
		}
	}
	rcu_read_unlock();

	return NULL;
}
EXPORT_SYMBOL_GPL(fib6_get_table);

static void __net_init fib6_tables_init(struct net *net)
{
	fib6_link_table(net, net->ipv6.fib6_main_tbl);
	fib6_link_table(net, net->ipv6.fib6_local_tbl);
}
#else

struct fib6_table *fib6_new_table(struct net *net, u32 id)
{
	return fib6_get_table(net, id);
}

struct fib6_table *fib6_get_table(struct net *net, u32 id)
{
	return net->ipv6.fib6_main_tbl;
}

struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
				   int flags, pol_lookup_t lookup)
{
	struct rt6_info *rt;

	rt = lookup(net, net->ipv6.fib6_main_tbl, fl6, flags);
	if (rt->rt6i_flags & RTF_REJECT &&
	    rt->dst.error == -EAGAIN) {
		ip6_rt_put(rt);
		rt = net->ipv6.ip6_null_entry;
		dst_hold(&rt->dst);
	}

	return &rt->dst;
}

static void __net_init fib6_tables_init(struct net *net)
{
	fib6_link_table(net, net->ipv6.fib6_main_tbl);
}

#endif

static int fib6_dump_node(struct fib6_walker *w)
{
	int res;
	struct rt6_info *rt;

	for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
		res = rt6_dump_route(rt, w->args);
		if (res < 0) {
			/* Frame is full, suspend walking */
			w->leaf = rt;
			return 1;
		}
	}
	w->leaf = NULL;
	return 0;
}
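
/*
 * Illustration only, not part of the kernel source: a userspace sketch of
 * the suspend/resume convention used by fib6_dump_node(). When the output
 * buffer fills up, the walker remembers the item that did not fit and
 * returns 1; the next call retries from that item. All demo_* names are
 * hypothetical, and the two-slot buffer is an assumption that stands in
 * for a netlink frame.
 */

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
	int value;
	struct demo_item *next;
};

struct demo_walker {
	struct demo_item *leaf;	/* where to resume; NULL once finished */
};

struct demo_buf {
	int out[2];		/* deliberately tiny "frame" */
	size_t len;
};

/* Append v to the buffer; returns -1 if the buffer is already full. */
static int demo_emit(struct demo_buf *b, int v)
{
	if (b->len == sizeof(b->out) / sizeof(b->out[0]))
		return -1;
	b->out[b->len++] = v;
	return 0;
}

/* Returns 1 if suspended (buffer full), 0 once the list is fully dumped. */
static int demo_dump(struct demo_walker *w, struct demo_buf *b)
{
	struct demo_item *it;

	assert(w && b);
	for (it = w->leaf; it; it = it->next) {
		if (demo_emit(b, it->value) < 0) {
			w->leaf = it;	/* retry this item on the next call */
			return 1;
		}
	}
	w->leaf = NULL;
	return 0;
}
```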

static void fib6_dump_end(struct netlink_callback *cb)
{
	struct net *net = sock_net(cb->skb->sk);
	struct fib6_walker *w = (void *)cb->args[2];

	if (w) {
		if (cb->args[4]) {
			cb->args[4] = 0;
			fib6_walker_unlink(net, w);
		}
		cb->args[2] = 0;
		kfree(w);
	}
	cb->done = (void *)cb->args[3];
	cb->args[1] = 3;
}

static int fib6_dump_done(struct netlink_callback *cb)
{
	fib6_dump_end(cb);
	return cb->done ? cb->done(cb) : 0;
}

static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
			   struct netlink_callback *cb)
{
	struct net *net = sock_net(skb->sk);
	struct fib6_walker *w;
	int res;

	w = (void *)cb->args[2];
	w->root = &table->tb6_root;

	if (cb->args[4] == 0) {
		w->count = 0;
		w->skip = 0;

		read_lock_bh(&table->tb6_lock);
		res = fib6_walk(net, w);
		read_unlock_bh(&table->tb6_lock);
		if (res > 0) {
			cb->args[4] = 1;
			cb->args[5] = w->root->fn_sernum;
		}
	} else {
		if (cb->args[5] != w->root->fn_sernum) {
			/* Begin at the root if the tree changed */
			cb->args[5] = w->root->fn_sernum;
			w->state = FWS_INIT;
			w->node = w->root;
			w->skip = w->count;
		} else
			w->skip = 0;

		read_lock_bh(&table->tb6_lock);
		res = fib6_walk_continue(w);
		read_unlock_bh(&table->tb6_lock);
		if (res <= 0) {
			fib6_walker_unlink(net, w);
			cb->args[4] = 0;
		}
	}

	return res;
}

static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct net *net = sock_net(skb->sk);
	unsigned int h, s_h;
	unsigned int e = 0, s_e;
	struct rt6_rtnl_dump_arg arg;
	struct fib6_walker *w;
	struct fib6_table *tb;
	struct hlist_head *head;
	int res = 0;

	s_h = cb->args[0];
	s_e = cb->args[1];

	w = (void *)cb->args[2];
	if (!w) {
		/* New dump:
		 *
		 * 1. hook callback destructor.
		 */
		cb->args[3] = (long)cb->done;
		cb->done = fib6_dump_done;

		/*
		 * 2. allocate and initialize walker.
		 */
		w = kzalloc(sizeof(*w), GFP_ATOMIC);
		if (!w)
			return -ENOMEM;
		w->func = fib6_dump_node;
		cb->args[2] = (long)w;
	}

	arg.skb = skb;
	arg.cb = cb;
	arg.net = net;
	w->args = &arg;

	rcu_read_lock();
	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
		e = 0;
		head = &net->ipv6.fib_table_hash[h];
		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
			if (e < s_e)
				goto next;
			res = fib6_dump_table(tb, skb, cb);
			if (res != 0)
				goto out;
next:
			e++;
		}
	}
out:
	rcu_read_unlock();
	cb->args[1] = e;
	cb->args[0] = h;

	res = res < 0 ? res : skb->len;
	if (res <= 0)
		fib6_dump_end(cb);
	return res;
}

/*
 *	Routing Table
 *
 *	return the appropriate node for a routing tree "add" operation
 *	by either creating and inserting or by returning an existing
 *	node.
 */

static struct fib6_node *fib6_add_1(struct fib6_node *root,
				     struct in6_addr *addr, int plen,
				     int offset, int allow_create,
				     int replace_required, int sernum)
{
	struct fib6_node *fn, *in, *ln;
	struct fib6_node *pn = NULL;
	struct rt6key *key;
	int	bit;
	__be32	dir = 0;

	RT6_TRACE("fib6_add_1\n");

	/* insert node in tree */

	fn = root;

	do {
		key = (struct rt6key *)((u8 *)fn->leaf + offset);

		/*
		 *	Prefix match
		 */
		if (plen < fn->fn_bit ||
		    !ipv6_prefix_equal(&key->addr, addr, fn->fn_bit)) {
			if (!allow_create) {
				if (replace_required) {
					pr_warn("Can't replace route, no match found\n");
					return ERR_PTR(-ENOENT);
				}
				pr_warn("NLM_F_CREATE should be set when creating new route\n");
			}
			goto insert_above;
		}

		/*
		 *	Exact match ?
		 */

		if (plen == fn->fn_bit) {
			/* clean up an intermediate node */
			if (!(fn->fn_flags & RTN_RTINFO)) {
				rt6_release(fn->leaf);
				fn->leaf = NULL;
			}

			fn->fn_sernum = sernum;

			return fn;
		}

		/*
		 *	We have more bits to go
		 */

		/* Try to walk down on tree. */
		fn->fn_sernum = sernum;
		dir = addr_bit_set(addr, fn->fn_bit);
		pn = fn;
		fn = dir ? fn->right : fn->left;
	} while (fn);

	if (!allow_create) {
		/* We should not create new node because
		 * NLM_F_REPLACE was specified without NLM_F_CREATE
		 * I assume it is safe to require NLM_F_CREATE when
		 * REPLACE flag is used! Later we may want to remove the
		 * check for replace_required, because according
		 * to netlink specification, NLM_F_CREATE
		 * MUST be specified if new route is created.
		 * That would keep IPv6 consistent with IPv4
		 */
		if (replace_required) {
			pr_warn("Can't replace route, no match found\n");
			return ERR_PTR(-ENOENT);
		}
		pr_warn("NLM_F_CREATE should be set when creating new route\n");
	}
	/*
	 *	We walked to the bottom of tree.
	 *	Create new leaf node without children.
	 */

	ln = node_alloc();

	if (!ln)
		return ERR_PTR(-ENOMEM);
	ln->fn_bit = plen;

	ln->parent = pn;
	ln->fn_sernum = sernum;

	if (dir)
		pn->right = ln;
	else
		pn->left  = ln;

	return ln;
insert_above :
/*
2007-02-09 23:24:49 +09:00
* split since we don ' t have a common prefix anymore or
2005-04-16 15:20:36 -07:00
* we have a less significant route .
* we ' ve to insert an intermediate node on the list
* this new node will point to the one we need to create
* and the current
*/
pn = fn - > parent ;
/* find 1st bit in difference between the 2 addrs.
2005-11-08 09:37:56 -08:00
See comment in __ipv6_addr_diff : bit may be an invalid value ,
2005-04-16 15:20:36 -07:00
but if it is > = plen , the value is ignored in any case .
*/
2007-02-09 23:24:49 +09:00
2013-07-22 14:21:09 +08:00
bit = __ipv6_addr_diff ( addr , & key - > addr , sizeof ( * addr ) ) ;
2005-04-16 15:20:36 -07:00
2007-02-09 23:24:49 +09:00
/*
* ( intermediate ) [ in ]
2005-04-16 15:20:36 -07:00
* / \
* ( new leaf node ) [ ln ] ( old node ) [ fn ]
*/
if ( plen > bit ) {
in = node_alloc ( ) ;
ln = node_alloc ( ) ;
2007-02-09 23:24:49 +09:00
2011-12-03 17:50:45 -05:00
if ( ! in | | ! ln ) {
2005-04-16 15:20:36 -07:00
if ( in )
node_free ( in ) ;
if ( ln )
node_free ( ln ) ;
2012-09-25 15:17:07 +00:00
return ERR_PTR ( - ENOMEM ) ;
2005-04-16 15:20:36 -07:00
}
2007-02-09 23:24:49 +09:00
/*
* new intermediate node .
2005-04-16 15:20:36 -07:00
* RTN_RTINFO will
* be off since that an address that chooses one of
* the branches would not match less specific routes
* in the other branch
*/
in - > fn_bit = bit ;
in - > parent = pn ;
in - > leaf = fn - > leaf ;
atomic_inc ( & in - > leaf - > rt6i_ref ) ;
in - > fn_sernum = sernum ;
/* update parent pointer */
if ( dir )
pn - > right = in ;
else
pn - > left = in ;
ln - > fn_bit = plen ;
ln - > parent = in ;
fn - > parent = in ;
ln - > fn_sernum = sernum ;
if ( addr_bit_set ( addr , bit ) ) {
in - > right = ln ;
in - > left = fn ;
} else {
in - > left = ln ;
in - > right = fn ;
}
} else { /* plen <= bit */
2007-02-09 23:24:49 +09:00
/*
2005-04-16 15:20:36 -07:00
* ( new leaf node ) [ ln ]
* / \
* ( old node ) [ fn ] NULL
*/
ln = node_alloc ( ) ;
2011-12-03 17:50:45 -05:00
if ( ! ln )
2012-09-25 15:17:07 +00:00
return ERR_PTR ( - ENOMEM ) ;
2005-04-16 15:20:36 -07:00
ln - > fn_bit = plen ;
ln - > parent = pn ;
ln - > fn_sernum = sernum ;
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
if ( dir )
pn - > right = ln ;
else
pn - > left = ln ;
if ( addr_bit_set ( & key - > addr , plen ) )
ln - > right = fn ;
else
ln - > left = fn ;
fn - > parent = ln ;
}
return ln ;
}
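/*
 * Illustrative sketch, not part of the kernel source: the split logic
 * above depends on two primitives, "first differing bit" and "is bit N
 * set", both counted from the most significant bit. The names addr128,
 * ex_bit_is_set, ex_first_diff_bit and ex_split_needs_intermediate are
 * hypothetical stand-ins, and the byte-wise loop is a simplification of
 * whatever __ipv6_addr_diff and addr_bit_set actually do.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct in6_addr: 16 bytes in network order. */
struct addr128 {
	uint8_t b[16];
};

/* Return 1 if bit number 'bit' is set; bit 0 is the most significant bit. */
static int ex_bit_is_set(const struct addr128 *a, unsigned int bit)
{
	return (a->b[bit >> 3] >> (7 - (bit & 7))) & 1;
}

/* Return the index of the first differing bit, or 128 if x and y are equal. */
static unsigned int ex_first_diff_bit(const struct addr128 *x,
				      const struct addr128 *y)
{
	size_t i;

	for (i = 0; i < sizeof(x->b); i++) {
		uint8_t d = x->b[i] ^ y->b[i];
		unsigned int bit = 0;

		if (!d)
			continue;
		/* Count the leading zero bits of the first non-zero byte. */
		while (!(d & 0x80)) {
			d <<= 1;
			bit++;
		}
		return (unsigned int)i * 8 + bit;
	}
	return 128;
}

/*
 * Mirrors the "plen > bit" test above: an intermediate node is needed
 * only when the new prefix extends past the first differing bit.
 */
static int ex_split_needs_intermediate(unsigned int plen, unsigned int bit)
{
	return plen > bit;
}
```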
2014-10-06 19:58:34 +02:00
static bool rt6_qualify_for_ecmp ( struct rt6_info * rt )
2013-07-12 23:46:33 +02:00
{
return ( rt - > rt6i_flags & ( RTF_GATEWAY | RTF_ADDRCONF | RTF_DYNAMIC ) ) = =
RTF_GATEWAY ;
}
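/*
 * Illustrative sketch, not part of the kernel source: the mask-and-compare
 * above accepts a route only if the gateway flag is set and neither the
 * autoconfiguration nor the dynamic flag is. Other flags are ignored.
 * The EX_* constants are assumed values chosen for this example; consult
 * the real headers for the authoritative definitions.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define EX_RTF_UP	0x0001u
#define EX_RTF_GATEWAY	0x0002u
#define EX_RTF_DYNAMIC	0x0010u
#define EX_RTF_ADDRCONF	0x00040000u

/* True when GATEWAY is set and both ADDRCONF and DYNAMIC are clear. */
static bool ex_qualify_for_ecmp(uint32_t flags)
{
	const uint32_t mask = EX_RTF_GATEWAY | EX_RTF_ADDRCONF | EX_RTF_DYNAMIC;

	return (flags & mask) == EX_RTF_GATEWAY;
}
```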
2015-01-05 23:57:44 +01:00
static void fib6_copy_metrics ( u32 * mp , const struct mx6_config * mxc )
2014-03-27 13:04:08 +01:00
{
2015-01-05 23:57:44 +01:00
int i ;
for ( i = 0 ; i < RTAX_MAX ; i + + ) {
if ( test_bit ( i , mxc - > mx_valid ) )
mp [ i ] = mxc - > mx [ i ] ;
}
}
static int fib6_commit_metrics ( struct dst_entry * dst , struct mx6_config * mxc )
{
if ( ! mxc - > mx )
return 0 ;
if ( dst - > flags & DST_HOST ) {
u32 * mp = dst_metrics_write_ptr ( dst ) ;
if ( unlikely ( ! mp ) )
return - ENOMEM ;
fib6_copy_metrics ( mp , mxc ) ;
} else {
dst_init_metrics ( dst , mxc - > mx , false ) ;
/* We've stolen mx now. */
mxc - > mx = NULL ;
2014-03-27 13:04:08 +01:00
}
2015-01-05 23:57:44 +01:00
2014-03-27 13:04:08 +01:00
return 0 ;
}
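/*
 * Illustrative sketch, not part of the kernel source: the two functions
 * above either copy only the metrics marked valid into a writable array,
 * or take ownership of the whole array and clear the caller's pointer.
 * The types ex_metric_cfg and ex_dst, and the single-word bitmap, are
 * hypothetical simplifications of mx6_config and dst_entry.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_METRIC_MAX 16

struct ex_metric_cfg {
	uint32_t *values;	/* candidate metrics, or NULL if none */
	unsigned long valid;	/* bit i set => values[i] is meaningful */
};

struct ex_dst {
	int is_host;			/* host routes keep a private copy */
	uint32_t own[EX_METRIC_MAX];	/* private metrics (host case) */
	const uint32_t *shared;		/* adopted array (non-host case) */
};

/* Copy only the entries whose bit is set in the validity bitmap. */
static void ex_copy_metrics(uint32_t *dst, const struct ex_metric_cfg *cfg)
{
	int i;

	for (i = 0; i < EX_METRIC_MAX; i++) {
		if (cfg->valid & (1UL << i))
			dst[i] = cfg->values[i];
	}
}

/* Either copy selectively or adopt the array and clear the source pointer. */
static int ex_commit_metrics(struct ex_dst *dst, struct ex_metric_cfg *cfg)
{
	if (!cfg->values)
		return 0;

	if (dst->is_host) {
		ex_copy_metrics(dst->own, cfg);
	} else {
		dst->shared = cfg->values;
		cfg->values = NULL;	/* ownership moved to dst */
	}
	return 0;
}
```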
2015-01-26 15:11:17 +01:00
static void fib6_purge_rt ( struct rt6_info * rt , struct fib6_node * fn ,
struct net * net )
{
if ( atomic_read ( & rt - > rt6i_ref ) ! = 1 ) {
/* This route is used as a dummy address holder in some split
 * nodes. It is not leaked, but it still holds other resources,
 * which must be released in time. So, scan ancestor nodes
 * and replace dummy references to this route with references
 * to routes that are still alive.
 */
while ( fn ) {
if ( ! ( fn - > fn_flags & RTN_RTINFO ) & & fn - > leaf = = rt ) {
fn - > leaf = fib6_find_prefix ( net , fn ) ;
atomic_inc ( & fn - > leaf - > rt6i_ref ) ;
rt6_release ( rt ) ;
}
fn = fn - > parent ;
}
/* No more references are possible at this point. */
BUG_ON ( atomic_read ( & rt - > rt6i_ref ) ! = 1 ) ;
}
}
2005-04-16 15:20:36 -07:00
/*
* Insert routing information in a node .
*/
static int fib6_add_rt2node ( struct fib6_node * fn , struct rt6_info * rt ,
2015-01-05 23:57:44 +01:00
struct nl_info * info , struct mx6_config * mxc )
2005-04-16 15:20:36 -07:00
{
struct rt6_info * iter = NULL ;
struct rt6_info * * ins ;
2015-05-18 20:54:00 +02:00
struct rt6_info * * fallback_ins = NULL ;
2011-12-03 17:50:45 -05:00
int replace = ( info - > nlh & &
( info - > nlh - > nlmsg_flags & NLM_F_REPLACE ) ) ;
int add = ( ! info - > nlh | |
( info - > nlh - > nlmsg_flags & NLM_F_CREATE ) ) ;
2011-11-14 00:15:14 +00:00
int found = 0 ;
2013-07-12 23:46:33 +02:00
bool rt_can_ecmp = rt6_qualify_for_ecmp ( rt ) ;
2016-09-07 17:21:40 +02:00
u16 nlflags = NLM_F_EXCL ;
2014-03-27 13:04:08 +01:00
int err ;
2005-04-16 15:20:36 -07:00
ins = & fn - > leaf ;
2011-12-03 17:50:45 -05:00
for ( iter = fn - > leaf ; iter ; iter = iter - > dst . rt6_next ) {
2005-04-16 15:20:36 -07:00
/*
* Search for duplicates
*/
if ( iter - > rt6i_metric = = rt - > rt6i_metric ) {
/*
* Same priority level
*/
2011-12-03 17:50:45 -05:00
if ( info - > nlh & &
( info - > nlh - > nlmsg_flags & NLM_F_EXCL ) )
2011-11-14 00:15:14 +00:00
return - EEXIST ;
2016-09-07 17:21:40 +02:00
nlflags & = ~ NLM_F_EXCL ;
2011-11-14 00:15:14 +00:00
if ( replace ) {
2015-05-18 20:54:00 +02:00
if ( rt_can_ecmp = = rt6_qualify_for_ecmp ( iter ) ) {
found + + ;
break ;
}
if ( rt_can_ecmp )
fallback_ins = fallback_ins ? : ins ;
goto next_iter ;
2011-11-14 00:15:14 +00:00
}
2005-04-16 15:20:36 -07:00
2011-12-28 20:19:20 -05:00
if ( iter - > dst . dev = = rt - > dst . dev & &
2005-04-16 15:20:36 -07:00
iter - > rt6i_idev = = rt - > rt6i_idev & &
ipv6_addr_equal ( & iter - > rt6i_gateway ,
& rt - > rt6i_gateway ) ) {
2012-10-22 03:42:09 +00:00
if ( rt - > rt6i_nsiblings )
rt - > rt6i_nsiblings = 0 ;
2011-12-03 17:50:45 -05:00
if ( ! ( iter - > rt6i_flags & RTF_EXPIRES ) )
2005-04-16 15:20:36 -07:00
return - EEXIST ;
2012-04-06 00:13:10 +00:00
if ( ! ( rt - > rt6i_flags & RTF_EXPIRES ) )
rt6_clean_expires ( iter ) ;
else
rt6_set_expires ( iter , rt - > dst . expires ) ;
2015-05-22 20:56:00 -07:00
iter - > rt6i_pmtu = rt - > rt6i_pmtu ;
2005-04-16 15:20:36 -07:00
return - EEXIST ;
}
2012-10-22 03:42:09 +00:00
/* If we have the same destination and the same metric,
 * but not the same gateway, then the route we are trying to
 * add is a sibling of this route. Increment our counter
 * of siblings; later we will add our route to the
 * list.
 * Only static routes (which don't have the flag
 * RTF_EXPIRES) are used for ECMPv6.
 *
 * To avoid a long list, we only add siblings if the
 * route has a gateway.
 */
2013-07-12 23:46:33 +02:00
if ( rt_can_ecmp & &
rt6_qualify_for_ecmp ( iter ) )
2012-10-22 03:42:09 +00:00
rt - > rt6i_nsiblings + + ;
2005-04-16 15:20:36 -07:00
}
if ( iter - > rt6i_metric > rt - > rt6i_metric )
break ;
2015-05-18 20:54:00 +02:00
next_iter :
2010-06-10 23:31:35 -07:00
ins = & iter - > dst . rt6_next ;
2005-04-16 15:20:36 -07:00
}
2015-05-18 20:54:00 +02:00
if ( fallback_ins & & ! found ) {
/* No ECMP-able route found, replace first non-ECMP one */
ins = fallback_ins ;
iter = * ins ;
found + + ;
}
2007-03-24 20:36:25 -07:00
/* Reset round-robin state, if necessary */
if ( ins = = & fn - > leaf )
fn - > rr_ptr = NULL ;
2012-10-22 03:42:09 +00:00
/* Link this route to the other routes for the same destination. */
if ( rt - > rt6i_nsiblings ) {
unsigned int rt6i_nsiblings ;
struct rt6_info * sibling , * temp_sibling ;
/* Find the first route that has the same metric */
sibling = fn - > leaf ;
while ( sibling ) {
2013-07-12 23:46:33 +02:00
if ( sibling - > rt6i_metric = = rt - > rt6i_metric & &
rt6_qualify_for_ecmp ( sibling ) ) {
2012-10-22 03:42:09 +00:00
list_add_tail ( & rt - > rt6i_siblings ,
& sibling - > rt6i_siblings ) ;
break ;
}
sibling = sibling - > dst . rt6_next ;
}
/* For each sibling in the list, increment the counter of
 * siblings. BUG() if the counters do not match: the list of siblings
 * is broken!
 */
rt6i_nsiblings = 0 ;
list_for_each_entry_safe ( sibling , temp_sibling ,
& rt - > rt6i_siblings , rt6i_siblings ) {
sibling - > rt6i_nsiblings + + ;
BUG_ON ( sibling - > rt6i_nsiblings ! = rt - > rt6i_nsiblings ) ;
rt6i_nsiblings + + ;
}
BUG_ON ( rt6i_nsiblings ! = rt - > rt6i_nsiblings ) ;
}
2005-04-16 15:20:36 -07:00
/*
* insert node
*/
2011-11-14 00:15:14 +00:00
if ( ! replace ) {
if ( ! add )
2012-05-15 14:11:53 +00:00
pr_warn ( " NLM_F_CREATE should be set when creating new route \n " ) ;
2011-11-14 00:15:14 +00:00
add :
2016-09-07 17:21:40 +02:00
nlflags | = NLM_F_CREATE ;
2015-01-05 23:57:44 +01:00
err = fib6_commit_metrics ( & rt - > dst , mxc ) ;
if ( err )
return err ;
2011-11-14 00:15:14 +00:00
rt - > dst . rt6_next = iter ;
* ins = rt ;
rt - > rt6i_node = fn ;
atomic_inc ( & rt - > rt6i_ref ) ;
2016-09-07 17:21:40 +02:00
inet6_rt_notify ( RTM_NEWROUTE , rt , info , nlflags ) ;
2011-11-14 00:15:14 +00:00
info - > nl_net - > ipv6 . rt6_stats - > fib_rt_entries + + ;
2011-12-03 17:50:45 -05:00
if ( ! ( fn - > fn_flags & RTN_RTINFO ) ) {
2011-11-14 00:15:14 +00:00
info - > nl_net - > ipv6 . rt6_stats - > fib_route_nodes + + ;
fn - > fn_flags | = RTN_RTINFO ;
}
2005-04-16 15:20:36 -07:00
2011-11-14 00:15:14 +00:00
} else {
2015-05-18 20:54:00 +02:00
int nsiblings ;
2011-11-14 00:15:14 +00:00
if ( ! found ) {
if ( add )
goto add ;
2012-05-15 14:11:53 +00:00
pr_warn ( " NLM_F_REPLACE set, but no existing node found! \n " ) ;
2011-11-14 00:15:14 +00:00
return - ENOENT ;
}
2015-01-05 23:57:44 +01:00
err = fib6_commit_metrics ( & rt - > dst , mxc ) ;
if ( err )
return err ;
2011-11-14 00:15:14 +00:00
* ins = rt ;
rt - > rt6i_node = fn ;
rt - > dst . rt6_next = iter - > dst . rt6_next ;
atomic_inc ( & rt - > rt6i_ref ) ;
2015-09-13 10:18:33 -07:00
inet6_rt_notify ( RTM_NEWROUTE , rt , info , NLM_F_REPLACE ) ;
2011-12-03 17:50:45 -05:00
if ( ! ( fn - > fn_flags & RTN_RTINFO ) ) {
2011-11-14 00:15:14 +00:00
info - > nl_net - > ipv6 . rt6_stats - > fib_route_nodes + + ;
fn - > fn_flags | = RTN_RTINFO ;
}
2015-05-18 20:54:00 +02:00
nsiblings = iter - > rt6i_nsiblings ;
2015-01-26 15:11:17 +01:00
fib6_purge_rt ( iter , fn , info - > nl_net ) ;
rt6_release ( iter ) ;
2015-05-18 20:54:00 +02:00
if ( nsiblings ) {
/* Replacing an ECMP route, remove all siblings */
ins = & rt - > dst . rt6_next ;
iter = * ins ;
while ( iter ) {
if ( rt6_qualify_for_ecmp ( iter ) ) {
* ins = iter - > dst . rt6_next ;
fib6_purge_rt ( iter , fn , info - > nl_net ) ;
rt6_release ( iter ) ;
nsiblings - - ;
} else {
ins = & iter - > dst . rt6_next ;
}
iter = * ins ;
}
WARN_ON ( nsiblings ! = 0 ) ;
}
2005-04-16 15:20:36 -07:00
}
return 0 ;
}
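/*
 * Illustrative sketch, not part of the kernel source: the insertion loop
 * above keeps a pointer to the link that should point at the new route
 * ("ins"), so the head and mid-list cases need no special handling.
 * Routes stay sorted by ascending metric, and a new route goes after
 * existing routes with the same metric. struct ex_route and
 * ex_insert_by_metric are hypothetical names; duplicate detection,
 * ECMP siblings and replace handling are deliberately omitted.
 */

```c
#include <assert.h>
#include <stddef.h>

struct ex_route {
	int metric;
	struct ex_route *next;
};

/* Insert rt into the list at *head, keeping ascending metric order. */
static void ex_insert_by_metric(struct ex_route **head, struct ex_route *rt)
{
	struct ex_route **ins = head;	/* link that will point at rt */
	struct ex_route *iter;

	for (iter = *head; iter; iter = iter->next) {
		if (iter->metric > rt->metric)
			break;
		ins = &iter->next;
	}

	rt->next = iter;
	*ins = rt;
}
```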
2014-10-06 19:58:34 +02:00
static void fib6_start_gc ( struct net * net , struct rt6_info * rt )
2005-04-16 15:20:36 -07:00
{
2008-07-22 14:33:45 -07:00
if ( ! timer_pending ( & net - > ipv6 . ip6_fib_timer ) & &
2011-12-03 17:50:45 -05:00
( rt - > rt6i_flags & ( RTF_EXPIRES | RTF_CACHE ) ) )
2008-07-22 14:33:45 -07:00
mod_timer ( & net - > ipv6 . ip6_fib_timer ,
2008-07-21 13:21:35 -07:00
jiffies + net - > ipv6 . sysctl . ip6_rt_gc_interval ) ;
2005-04-16 15:20:36 -07:00
}
2008-03-03 23:31:11 -08:00
void fib6_force_start_gc ( struct net * net )
2005-04-16 15:20:36 -07:00
{
2008-07-22 14:33:45 -07:00
if ( ! timer_pending ( & net - > ipv6 . ip6_fib_timer ) )
mod_timer ( & net - > ipv6 . ip6_fib_timer ,
2008-07-21 13:21:35 -07:00
jiffies + net - > ipv6 . sysctl . ip6_rt_gc_interval ) ;
2005-04-16 15:20:36 -07:00
}
/*
* Add routing information to the routing tree .
* < destination addr > / < source addr >
* with source addr info in sub - trees
*/
2015-01-05 23:57:44 +01:00
int fib6_add ( struct fib6_node * root , struct rt6_info * rt ,
struct nl_info * info , struct mx6_config * mxc )
2005-04-16 15:20:36 -07:00
{
2006-08-23 17:20:34 -07:00
struct fib6_node * fn , * pn = NULL ;
2005-04-16 15:20:36 -07:00
int err = - ENOMEM ;
2011-11-14 00:15:14 +00:00
int allow_create = 1 ;
int replace_required = 0 ;
2014-10-06 19:58:37 +02:00
int sernum = fib6_new_sernum ( info - > nl_net ) ;
2011-12-03 17:50:45 -05:00
2015-09-15 14:30:08 -07:00
if ( WARN_ON_ONCE ( ( rt - > dst . flags & DST_NOCACHE ) & &
! atomic_read ( & rt - > dst . __refcnt ) ) )
return - EINVAL ;
2011-12-03 17:50:45 -05:00
if ( info - > nlh ) {
if ( ! ( info - > nlh - > nlmsg_flags & NLM_F_CREATE ) )
2011-11-14 00:15:14 +00:00
allow_create = 0 ;
2011-12-03 17:50:45 -05:00
if ( info - > nlh - > nlmsg_flags & NLM_F_REPLACE )
2011-11-14 00:15:14 +00:00
replace_required = 1 ;
}
if ( ! allow_create & & ! replace_required )
2012-05-15 14:11:53 +00:00
pr_warn ( " RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE \n " ) ;
2005-04-16 15:20:36 -07:00
2013-07-22 14:21:09 +08:00
fn = fib6_add_1 ( root , & rt - > rt6i_dst . addr , rt - > rt6i_dst . plen ,
offsetof ( struct rt6_info , rt6i_dst ) , allow_create ,
2014-10-06 19:58:36 +02:00
replace_required , sernum ) ;
2011-11-14 00:15:14 +00:00
if ( IS_ERR ( fn ) ) {
err = PTR_ERR ( fn ) ;
2013-09-07 15:13:20 +02:00
fn = NULL ;
2005-04-16 15:20:36 -07:00
goto out ;
2012-09-25 15:17:07 +00:00
}
2005-04-16 15:20:36 -07:00
2006-08-23 17:20:34 -07:00
pn = fn ;
2005-04-16 15:20:36 -07:00
# ifdef CONFIG_IPV6_SUBTREES
if ( rt - > rt6i_src . plen ) {
struct fib6_node * sn ;
2011-12-03 17:50:45 -05:00
if ( ! fn - > subtree ) {
2005-04-16 15:20:36 -07:00
struct fib6_node * sfn ;
/*
* Create subtree .
*
* fn [ main tree ]
* |
* sfn [ subtree root ]
* \
* sn [ new leaf node ]
*/
/* Create subtree root node */
sfn = node_alloc ( ) ;
2011-12-03 17:50:45 -05:00
if ( ! sfn )
2005-04-16 15:20:36 -07:00
goto st_failure ;
2008-03-04 13:48:30 -08:00
sfn - > leaf = info - > nl_net - > ipv6 . ip6_null_entry ;
atomic_inc ( & info - > nl_net - > ipv6 . ip6_null_entry - > rt6i_ref ) ;
2005-04-16 15:20:36 -07:00
sfn - > fn_flags = RTN_ROOT ;
2014-10-06 19:58:36 +02:00
sfn - > fn_sernum = sernum ;
2005-04-16 15:20:36 -07:00
/* Now add the first leaf node to new subtree */
sn = fib6_add_1 ( sfn , & rt - > rt6i_src . addr ,
2013-07-22 14:21:09 +08:00
rt - > rt6i_src . plen ,
2011-11-14 00:15:14 +00:00
offsetof ( struct rt6_info , rt6i_src ) ,
2014-10-06 19:58:36 +02:00
allow_create , replace_required , sernum ) ;
2005-04-16 15:20:36 -07:00
2012-09-20 18:29:56 +00:00
if ( IS_ERR ( sn ) ) {
2005-04-16 15:20:36 -07:00
/* If it failed, discard the just-allocated
   root, and then (in st_failure) the stale node
   in the main tree.
 */
node_free ( sfn ) ;
2012-09-25 15:17:07 +00:00
err = PTR_ERR ( sn ) ;
2005-04-16 15:20:36 -07:00
goto st_failure ;
}
/* Now link new subtree to main tree */
sfn - > parent = fn ;
fn - > subtree = sfn ;
} else {
sn = fib6_add_1 ( fn - > subtree , & rt - > rt6i_src . addr ,
2013-07-22 14:21:09 +08:00
rt - > rt6i_src . plen ,
2011-11-14 00:15:14 +00:00
offsetof ( struct rt6_info , rt6i_src ) ,
2014-10-06 19:58:36 +02:00
allow_create , replace_required , sernum ) ;
2005-04-16 15:20:36 -07:00
2011-11-14 00:15:14 +00:00
if ( IS_ERR ( sn ) ) {
err = PTR_ERR ( sn ) ;
2005-04-16 15:20:36 -07:00
goto st_failure ;
2012-09-25 15:17:07 +00:00
}
2005-04-16 15:20:36 -07:00
}
2011-12-03 17:50:45 -05:00
if ( ! fn - > leaf ) {
2006-08-23 17:20:34 -07:00
fn - > leaf = rt ;
atomic_inc ( & rt - > rt6i_ref ) ;
}
2005-04-16 15:20:36 -07:00
fn = sn ;
}
# endif
2015-01-05 23:57:44 +01:00
err = fib6_add_rt2node ( fn , rt , info , mxc ) ;
2011-12-03 17:50:45 -05:00
if ( ! err ) {
2008-03-03 23:31:11 -08:00
fib6_start_gc ( info - > nl_net , rt ) ;
2011-12-03 17:50:45 -05:00
if ( ! ( rt - > rt6i_flags & RTF_CACHE ) )
2014-05-09 13:31:43 +08:00
fib6_prune_clones ( info - > nl_net , pn ) ;
2015-09-15 14:30:08 -07:00
rt - > dst . flags & = ~ DST_NOCACHE ;
2005-04-16 15:20:36 -07:00
}
out :
2006-08-23 17:20:34 -07:00
if ( err ) {
# ifdef CONFIG_IPV6_SUBTREES
/*
* If fib6_add_1 has cleared the old leaf pointer in the
* super - tree leaf node we have to find a new one for it .
*/
2008-04-18 01:46:19 -07:00
if ( pn ! = fn & & pn - > leaf = = rt ) {
pn - > leaf = NULL ;
atomic_dec ( & rt - > rt6i_ref ) ;
}
2006-08-23 17:20:34 -07:00
if ( pn ! = fn & & ! pn - > leaf & & ! ( pn - > fn_flags & RTN_RTINFO ) ) {
2008-03-04 13:48:30 -08:00
pn - > leaf = fib6_find_prefix ( info - > nl_net , pn ) ;
2006-08-23 17:20:34 -07:00
# if RT6_DEBUG >= 2
if ( ! pn - > leaf ) {
2008-07-25 21:43:18 -07:00
WARN_ON ( pn - > leaf = = NULL ) ;
2008-03-04 13:48:30 -08:00
pn - > leaf = info - > nl_net - > ipv6 . ip6_null_entry ;
2006-08-23 17:20:34 -07:00
}
# endif
atomic_inc ( & pn - > leaf - > rt6i_ref ) ;
}
# endif
2015-09-15 14:30:08 -07:00
if ( ! ( rt - > dst . flags & DST_NOCACHE ) )
dst_free ( & rt - > dst ) ;
2006-08-23 17:20:34 -07:00
}
2005-04-16 15:20:36 -07:00
return err ;
# ifdef CONFIG_IPV6_SUBTREES
/* Subtree creation failed; the main tree node is probably
   an orphan. If it is, shoot it.
 */
st_failure :
if ( fn & & ! ( fn - > fn_flags & ( RTN_RTINFO | RTN_ROOT ) ) )
2008-03-04 13:48:30 -08:00
fib6_repair_tree ( info - > nl_net , fn ) ;
2015-09-15 14:30:08 -07:00
if ( ! ( rt - > dst . flags & DST_NOCACHE ) )
dst_free ( & rt - > dst ) ;
2005-04-16 15:20:36 -07:00
return err ;
# endif
}
/*
* Routing tree lookup
*
*/
struct lookup_args {
2011-12-03 17:50:45 -05:00
int offset ; /* key offset on rt6_info */
2011-04-22 04:53:02 +00:00
const struct in6_addr * addr ; /* search key */
2005-04-16 15:20:36 -07:00
} ;
2014-03-28 12:07:04 +08:00
static struct fib6_node * fib6_lookup_1 ( struct fib6_node * root ,
struct lookup_args * args )
2005-04-16 15:20:36 -07:00
{
struct fib6_node * fn ;
2006-11-14 20:56:00 -08:00
__be32 dir ;
2005-04-16 15:20:36 -07:00
2006-08-23 17:21:29 -07:00
if ( unlikely ( args - > offset = = 0 ) )
return NULL ;
2005-04-16 15:20:36 -07:00
/*
* Descend on a tree
*/
fn = root ;
for ( ; ; ) {
struct fib6_node * next ;
dir = addr_bit_set ( args - > addr , fn - > fn_bit ) ;
next = dir ? fn - > right : fn - > left ;
if ( next ) {
fn = next ;
continue ;
}
break ;
}
2011-12-03 17:50:45 -05:00
while ( fn ) {
2006-08-23 17:22:24 -07:00
if ( FIB6_SUBTREE ( fn ) | | fn - > fn_flags & RTN_RTINFO ) {
2005-04-16 15:20:36 -07:00
struct rt6key * key ;
key = ( struct rt6key * ) ( ( u8 * ) fn - > leaf +
args - > offset ) ;
2006-08-23 17:21:12 -07:00
if ( ipv6_prefix_equal ( & key - > addr , args - > addr , key - > plen ) ) {
# ifdef CONFIG_IPV6_SUBTREES
2013-08-07 02:34:31 +02:00
if ( fn - > subtree ) {
struct fib6_node * sfn ;
sfn = fib6_lookup_1 ( fn - > subtree ,
args + 1 ) ;
if ( ! sfn )
goto backtrack ;
fn = sfn ;
}
2006-08-23 17:21:12 -07:00
# endif
2013-08-07 02:34:31 +02:00
if ( fn - > fn_flags & RTN_RTINFO )
2006-08-23 17:21:12 -07:00
return fn ;
}
2005-04-16 15:20:36 -07:00
}
2013-08-07 02:34:31 +02:00
# ifdef CONFIG_IPV6_SUBTREES
backtrack :
# endif
2006-08-23 17:21:12 -07:00
if ( fn - > fn_flags & RTN_ROOT )
break ;
2005-04-16 15:20:36 -07:00
fn = fn - > parent ;
}
return NULL ;
}
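/*
 * Illustrative sketch, not part of the kernel source: the lookup above
 * first descends as far as the address bits allow, then walks back up
 * through parents until a node that carries a route also matches the
 * address prefix. This yields the longest matching prefix. The example
 * uses 32-bit keys and the hypothetical names ex_node, ex_bit_set,
 * ex_prefix_equal and ex_lookup; subtrees and root flags are omitted.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_node {
	struct ex_node *parent, *left, *right;
	uint32_t prefix;	/* key bits, host byte order */
	unsigned int bit;	/* prefix length and bit tested at this node */
	int has_route;		/* plays the role of RTN_RTINFO */
};

/* Bit 0 is the most significant bit; out-of-range bits read as 0. */
static int ex_bit_set(uint32_t addr, unsigned int bit)
{
	return bit < 32 && ((addr >> (31 - bit)) & 1);
}

/* True if the first plen bits of a and b are identical. */
static int ex_prefix_equal(uint32_t a, uint32_t b, unsigned int plen)
{
	if (plen == 0)
		return 1;
	return ((a ^ b) >> (32 - plen)) == 0;
}

static const struct ex_node *ex_lookup(const struct ex_node *root,
				       uint32_t addr)
{
	const struct ex_node *fn = root;

	/* Descend while a child exists in the direction of the tested bit. */
	for (;;) {
		const struct ex_node *next =
			ex_bit_set(addr, fn->bit) ? fn->right : fn->left;

		if (!next)
			break;
		fn = next;
	}

	/* Backtrack towards the root until a routed node matches. */
	while (fn) {
		if (fn->has_route &&
		    ex_prefix_equal(fn->prefix, addr, fn->bit))
			return fn;
		fn = fn->parent;
	}
	return NULL;
}
```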
2014-03-28 12:07:04 +08:00
struct fib6_node * fib6_lookup ( struct fib6_node * root , const struct in6_addr * daddr ,
const struct in6_addr * saddr )
2005-04-16 15:20:36 -07:00
{
struct fib6_node * fn ;
2006-08-23 17:21:29 -07:00
struct lookup_args args [ ] = {
{
. offset = offsetof ( struct rt6_info , rt6i_dst ) ,
. addr = daddr ,
} ,
2005-04-16 15:20:36 -07:00
# ifdef CONFIG_IPV6_SUBTREES
2006-08-23 17:21:29 -07:00
{
. offset = offsetof ( struct rt6_info , rt6i_src ) ,
. addr = saddr ,
} ,
2005-04-16 15:20:36 -07:00
# endif
2006-08-23 17:21:29 -07:00
{
. offset = 0 , /* sentinel */
}
} ;
2005-04-16 15:20:36 -07:00
2006-08-23 17:21:50 -07:00
fn = fib6_lookup_1 ( root , daddr ? args : args + 1 ) ;
2011-12-03 17:50:45 -05:00
if ( ! fn | | fn - > fn_flags & RTN_TL_ROOT )
2005-04-16 15:20:36 -07:00
fn = root ;
return fn ;
}
/*
* Get node with specified destination prefix ( and source prefix ,
* if subtrees are used )
*/
2014-03-28 12:07:04 +08:00
static struct fib6_node * fib6_locate_1 ( struct fib6_node * root ,
const struct in6_addr * addr ,
int plen , int offset )
2005-04-16 15:20:36 -07:00
{
struct fib6_node * fn ;
for ( fn = root ; fn ; ) {
struct rt6key * key = ( struct rt6key * ) ( ( u8 * ) fn - > leaf + offset ) ;
/*
* Prefix match
*/
if ( plen < fn - > fn_bit | |
! ipv6_prefix_equal ( & key - > addr , addr , fn - > fn_bit ) )
return NULL ;
if ( plen = = fn - > fn_bit )
return fn ;
/*
* We have more bits to go
*/
if ( addr_bit_set ( addr , fn - > fn_bit ) )
fn = fn - > right ;
else
fn = fn - > left ;
}
return NULL ;
}
2014-03-28 12:07:04 +08:00
struct fib6_node * fib6_locate ( struct fib6_node * root ,
const struct in6_addr * daddr , int dst_len ,
const struct in6_addr * saddr , int src_len )
2005-04-16 15:20:36 -07:00
{
struct fib6_node * fn ;
fn = fib6_locate_1 ( root , daddr , dst_len ,
offsetof ( struct rt6_info , rt6i_dst ) ) ;
# ifdef CONFIG_IPV6_SUBTREES
if ( src_len ) {
2008-07-25 21:43:18 -07:00
WARN_ON ( saddr = = NULL ) ;
2006-08-23 17:21:12 -07:00
if ( fn & & fn - > subtree )
fn = fib6_locate_1 ( fn - > subtree , saddr , src_len ,
2005-04-16 15:20:36 -07:00
offsetof ( struct rt6_info , rt6i_src ) ) ;
}
# endif
2011-12-03 17:50:45 -05:00
if ( fn & & fn - > fn_flags & RTN_RTINFO )
2005-04-16 15:20:36 -07:00
return fn ;
return NULL ;
}
/*
* Deletion
*
*/
2008-03-04 13:48:30 -08:00
static struct rt6_info * fib6_find_prefix ( struct net * net , struct fib6_node * fn )
2005-04-16 15:20:36 -07:00
{
2011-12-03 17:50:45 -05:00
if ( fn - > fn_flags & RTN_ROOT )
2008-03-04 13:48:30 -08:00
return net - > ipv6 . ip6_null_entry ;
2005-04-16 15:20:36 -07:00
2011-12-03 17:50:45 -05:00
while ( fn ) {
if ( fn - > left )
2005-04-16 15:20:36 -07:00
return fn - > left - > leaf ;
2011-12-03 17:50:45 -05:00
if ( fn - > right )
2005-04-16 15:20:36 -07:00
return fn - > right - > leaf ;
2006-08-23 17:22:24 -07:00
fn = FIB6_SUBTREE ( fn ) ;
2005-04-16 15:20:36 -07:00
}
return NULL ;
}
/*
* Called to trim the tree of intermediate nodes when possible . " fn "
* is the node we want to try and remove .
*/
2008-03-04 13:48:30 -08:00
static struct fib6_node * fib6_repair_tree ( struct net * net ,
struct fib6_node * fn )
2005-04-16 15:20:36 -07:00
{
int children ;
int nstate ;
struct fib6_node * child , * pn ;
2014-10-06 19:58:34 +02:00
struct fib6_walker * w ;
2005-04-16 15:20:36 -07:00
int iter = 0 ;
for ( ; ; ) {
RT6_TRACE ( " fixing tree: plen=%d iter=%d \n " , fn - > fn_bit , iter ) ;
iter + + ;
2008-07-25 21:43:18 -07:00
WARN_ON ( fn - > fn_flags & RTN_RTINFO ) ;
WARN_ON ( fn - > fn_flags & RTN_TL_ROOT ) ;
2015-03-29 14:00:05 +01:00
WARN_ON ( fn - > leaf ) ;
2005-04-16 15:20:36 -07:00
children = 0 ;
child = NULL ;
2014-03-28 12:07:03 +08:00
if ( fn - > right )
child = fn - > right , children | = 1 ;
if ( fn - > left )
child = fn - > left , children | = 2 ;
2005-04-16 15:20:36 -07:00
2006-08-23 17:22:24 -07:00
if ( children = = 3 | | FIB6_SUBTREE ( fn )
2005-04-16 15:20:36 -07:00
# ifdef CONFIG_IPV6_SUBTREES
/* Subtree root (i.e. fn) may have one child */
2011-12-03 17:50:45 -05:00
| | ( children & & fn - > fn_flags & RTN_ROOT )
2005-04-16 15:20:36 -07:00
# endif
) {
2008-03-04 13:48:30 -08:00
fn - > leaf = fib6_find_prefix ( net , fn ) ;
2005-04-16 15:20:36 -07:00
# if RT6_DEBUG >= 2
2011-12-03 17:50:45 -05:00
if ( ! fn - > leaf ) {
2008-07-25 21:43:18 -07:00
WARN_ON ( ! fn - > leaf ) ;
2008-03-04 13:48:30 -08:00
fn - > leaf = net - > ipv6 . ip6_null_entry ;
2005-04-16 15:20:36 -07:00
}
# endif
atomic_inc ( & fn - > leaf - > rt6i_ref ) ;
return fn - > parent ;
}
pn = fn - > parent ;
# ifdef CONFIG_IPV6_SUBTREES
2006-08-23 17:22:24 -07:00
if ( FIB6_SUBTREE ( pn ) = = fn ) {
2008-07-25 21:43:18 -07:00
WARN_ON ( ! ( fn - > fn_flags & RTN_ROOT ) ) ;
2006-08-23 17:22:24 -07:00
FIB6_SUBTREE ( pn ) = NULL ;
2005-04-16 15:20:36 -07:00
nstate = FWS_L ;
} else {
2008-07-25 21:43:18 -07:00
WARN_ON ( fn - > fn_flags & RTN_ROOT ) ;
2005-04-16 15:20:36 -07:00
# endif
2014-03-28 12:07:03 +08:00
if ( pn - > right = = fn )
pn - > right = child ;
else if ( pn - > left = = fn )
pn - > left = child ;
2005-04-16 15:20:36 -07:00
# if RT6_DEBUG >= 2
2008-07-25 21:43:18 -07:00
else
WARN_ON ( 1 ) ;
2005-04-16 15:20:36 -07:00
# endif
if ( child )
child - > parent = pn ;
nstate = FWS_R ;
# ifdef CONFIG_IPV6_SUBTREES
}
# endif
2016-03-08 14:44:35 +01:00
read_lock ( & net - > ipv6 . fib6_walker_lock ) ;
FOR_WALKERS ( net , w ) {
2011-12-03 17:50:45 -05:00
if ( ! child ) {
2005-04-16 15:20:36 -07:00
if ( w - > root = = fn ) {
w - > root = w - > node = NULL ;
RT6_TRACE ( " W %p adjusted by delroot 1 \n " , w ) ;
} else if ( w - > node = = fn ) {
RT6_TRACE ( " W %p adjusted by delnode 1, s=%d/%d \n " , w , w - > state , nstate ) ;
w - > node = pn ;
w - > state = nstate ;
}
} else {
if ( w - > root = = fn ) {
w - > root = child ;
RT6_TRACE ( " W %p adjusted by delroot 2 \n " , w ) ;
}
if ( w - > node = = fn ) {
w - > node = child ;
if ( children & 2 ) {
RT6_TRACE ( " W %p adjusted by delnode 2, s=%d \n " , w , w - > state ) ;
2014-03-28 12:07:02 +08:00
w - > state = w - > state > = FWS_R ? FWS_U : FWS_INIT ;
2005-04-16 15:20:36 -07:00
} else {
RT6_TRACE ( " W %p adjusted by delnode 2, s=%d \n " , w , w - > state ) ;
2014-03-28 12:07:02 +08:00
w - > state = w - > state > = FWS_C ? FWS_U : FWS_INIT ;
2005-04-16 15:20:36 -07:00
}
}
}
}
2016-03-08 14:44:35 +01:00
read_unlock ( & net - > ipv6 . fib6_walker_lock ) ;
2005-04-16 15:20:36 -07:00
node_free ( fn ) ;
2011-12-03 17:50:45 -05:00
if ( pn - > fn_flags & RTN_RTINFO | | FIB6_SUBTREE ( pn ) )
2005-04-16 15:20:36 -07:00
return pn ;
rt6_release ( pn - > leaf ) ;
pn - > leaf = NULL ;
fn = pn ;
}
}
static void fib6_del_route ( struct fib6_node * fn , struct rt6_info * * rtp ,
2006-08-22 00:01:08 -07:00
struct nl_info * info )
2005-04-16 15:20:36 -07:00
{
2014-10-06 19:58:34 +02:00
struct fib6_walker * w ;
2005-04-16 15:20:36 -07:00
struct rt6_info * rt = * rtp ;
2008-03-03 23:34:17 -08:00
struct net * net = info - > nl_net ;
2005-04-16 15:20:36 -07:00
RT6_TRACE ( " fib6_del_route \n " ) ;
/* Unlink it */
2010-06-10 23:31:35 -07:00
* rtp = rt - > dst . rt6_next ;
2005-04-16 15:20:36 -07:00
rt - > rt6i_node = NULL ;
2008-03-03 23:34:17 -08:00
net - > ipv6 . rt6_stats - > fib_rt_entries - - ;
net - > ipv6 . rt6_stats - > fib_discarded_routes + + ;
2005-04-16 15:20:36 -07:00
2007-03-24 20:36:25 -07:00
/* Reset round-robin state, if necessary */
if ( fn - > rr_ptr = = rt )
fn - > rr_ptr = NULL ;
2012-10-22 03:42:09 +00:00
/* Remove this entry from other siblings */
if ( rt - > rt6i_nsiblings ) {
struct rt6_info * sibling , * next_sibling ;
list_for_each_entry_safe ( sibling , next_sibling ,
& rt - > rt6i_siblings , rt6i_siblings )
sibling - > rt6i_nsiblings - - ;
rt - > rt6i_nsiblings = 0 ;
list_del_init ( & rt - > rt6i_siblings ) ;
}
2005-04-16 15:20:36 -07:00
/* Adjust walkers */
2016-03-08 14:44:35 +01:00
read_lock ( & net - > ipv6 . fib6_walker_lock ) ;
FOR_WALKERS ( net , w ) {
2005-04-16 15:20:36 -07:00
if ( w - > state = = FWS_C & & w - > leaf = = rt ) {
RT6_TRACE ( " walker %p adjusted by delroute \n " , w ) ;
2010-06-10 23:31:35 -07:00
w - > leaf = rt - > dst . rt6_next ;
2011-12-03 17:50:45 -05:00
if ( ! w - > leaf )
2005-04-16 15:20:36 -07:00
w - > state = FWS_U ;
}
}
2016-03-08 14:44:35 +01:00
read_unlock ( & net - > ipv6 . fib6_walker_lock ) ;
2005-04-16 15:20:36 -07:00
2010-06-10 23:31:35 -07:00
rt - > dst . rt6_next = NULL ;
2005-04-16 15:20:36 -07:00
/* If it was last route, expunge its radix tree node */
2011-12-03 17:50:45 -05:00
if ( ! fn - > leaf ) {
2005-04-16 15:20:36 -07:00
fn - > fn_flags & = ~ RTN_RTINFO ;
2008-03-03 23:34:17 -08:00
net - > ipv6 . rt6_stats - > fib_route_nodes - - ;
2008-03-04 13:48:30 -08:00
fn = fib6_repair_tree ( net , fn ) ;
2005-04-16 15:20:36 -07:00
}
2015-01-26 15:11:17 +01:00
fib6_purge_rt ( rt , fn , net ) ;
2005-04-16 15:20:36 -07:00
2015-09-13 10:18:33 -07:00
inet6_rt_notify ( RTM_DELROUTE , rt , info , 0 ) ;
2005-04-16 15:20:36 -07:00
rt6_release ( rt ) ;
}
2006-08-22 00:01:08 -07:00
int fib6_del ( struct rt6_info * rt , struct nl_info * info )
2005-04-16 15:20:36 -07:00
{
2008-03-04 13:48:30 -08:00
struct net * net = info - > nl_net ;
2005-04-16 15:20:36 -07:00
struct fib6_node * fn = rt - > rt6i_node ;
struct rt6_info * * rtp ;
# if RT6_DEBUG >= 2
2014-03-28 12:07:02 +08:00
if ( rt - > dst . obsolete > 0 ) {
2015-03-29 14:00:05 +01:00
WARN_ON ( fn ) ;
2005-04-16 15:20:36 -07:00
return - ENOENT ;
}
# endif
2011-12-03 17:50:45 -05:00
if ( ! fn | | rt = = net - > ipv6 . ip6_null_entry )
2005-04-16 15:20:36 -07:00
return - ENOENT ;
2008-07-25 21:43:18 -07:00
WARN_ON ( ! ( fn - > fn_flags & RTN_RTINFO ) ) ;
2005-04-16 15:20:36 -07:00
2011-12-03 17:50:45 -05:00
if ( ! ( rt - > rt6i_flags & RTF_CACHE ) ) {
2006-08-23 17:22:55 -07:00
struct fib6_node * pn = fn ;
# ifdef CONFIG_IPV6_SUBTREES
/* clones of this route might be in another subtree */
if ( rt - > rt6i_src . plen ) {
2011-12-03 17:50:45 -05:00
while ( ! ( pn - > fn_flags & RTN_ROOT ) )
2006-08-23 17:22:55 -07:00
pn = pn - > parent ;
pn = pn - > parent ;
}
# endif
2014-05-09 13:31:43 +08:00
fib6_prune_clones ( info - > nl_net , pn ) ;
2006-08-23 17:22:55 -07:00
}
2005-04-16 15:20:36 -07:00
/*
* Walk the leaf entries looking for this route
*/
2010-06-10 23:31:35 -07:00
for ( rtp = & fn - > leaf ; * rtp ; rtp = & ( * rtp ) - > dst . rt6_next ) {
2005-04-16 15:20:36 -07:00
if ( * rtp = = rt ) {
2006-08-22 00:01:08 -07:00
fib6_del_route ( fn , rtp , info ) ;
2005-04-16 15:20:36 -07:00
return 0 ;
}
}
return - ENOENT ;
}
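/*
 * Illustration only, not part of this file: the leaf walk in fib6_del()
 * uses a pointer-to-pointer cursor (rtp), so the route can be unlinked
 * without a special case for the list head. A minimal user-space sketch,
 * assuming a hypothetical struct route with a plain next pointer in place
 * of rt6_info and dst.rt6_next. The sketch unlinks inline rather than
 * delegating to a helper such as fib6_del_route().
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rt6_info. */
struct route {
	struct route *next;
	int id;
};

/* Unlink rt from the list at *head; 0 on success, -1 if rt is not on it. */
static int leaf_list_del(struct route **head, struct route *rt)
{
	struct route **rtp;

	assert(head);
	for (rtp = head; *rtp; rtp = &(*rtp)->next) {
		if (*rtp == rt) {
			*rtp = rt->next;	/* works for head and interior entries alike */
			rt->next = NULL;
			return 0;
		}
	}
	return -1;
}
```

/*
 * Because rtp always points at the link that references the current entry,
 * removing the first entry simply rewrites *head.
 */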
/*
* Tree traversal function .
*
* It is not interrupt safe .
* However , it is reentrant with respect to itself and fib6_add / fib6_del .
* This means that the tree can be modified while it is being walked ,
* so this function can be used for garbage collection , clone pruning ,
2007-02-09 23:24:49 +09:00
* cleaning tree when a device goes down etc . etc .
2005-04-16 15:20:36 -07:00
*
* It guarantees that every node will be traversed ,
* and that it will be traversed only once .
*
* Callback function w - > func may return :
* 0 - > continue walking .
* positive value - > walking is suspended ( used by tree dumps ,
* and possibly by gc , if it is ever split into several slices )
* negative value - > terminate walking .
*
* The function itself returns :
* 0 - > walk is complete .
* > 0 - > walk is incomplete ( i . e . suspended )
* < 0 - > walk is terminated by an error .
*/
2014-10-06 19:58:34 +02:00
static int fib6_walk_continue ( struct fib6_walker * w )
2005-04-16 15:20:36 -07:00
{
struct fib6_node * fn , * pn ;
for ( ; ; ) {
fn = w - > node ;
2011-12-03 17:50:45 -05:00
if ( ! fn )
2005-04-16 15:20:36 -07:00
return 0 ;
if ( w - > prune & & fn ! = w - > root & &
2011-12-03 17:50:45 -05:00
fn - > fn_flags & RTN_RTINFO & & w - > state < FWS_C ) {
2005-04-16 15:20:36 -07:00
w - > state = FWS_C ;
w - > leaf = fn - > leaf ;
}
switch ( w - > state ) {
# ifdef CONFIG_IPV6_SUBTREES
case FWS_S :
2006-08-23 17:22:24 -07:00
if ( FIB6_SUBTREE ( fn ) ) {
w - > node = FIB6_SUBTREE ( fn ) ;
2005-04-16 15:20:36 -07:00
continue ;
}
w - > state = FWS_L ;
2007-02-09 23:24:49 +09:00
# endif
2005-04-16 15:20:36 -07:00
case FWS_L :
if ( fn - > left ) {
w - > node = fn - > left ;
w - > state = FWS_INIT ;
continue ;
}
w - > state = FWS_R ;
case FWS_R :
if ( fn - > right ) {
w - > node = fn - > right ;
w - > state = FWS_INIT ;
continue ;
}
w - > state = FWS_C ;
w - > leaf = fn - > leaf ;
case FWS_C :
2011-12-03 17:50:45 -05:00
if ( w - > leaf & & fn - > fn_flags & RTN_RTINFO ) {
2010-02-08 05:19:03 +00:00
int err ;
2012-06-25 15:37:19 -07:00
if ( w - > skip ) {
w - > skip - - ;
2014-04-24 09:48:53 -04:00
goto skip ;
2010-02-08 05:19:03 +00:00
}
err = w - > func ( w ) ;
2005-04-16 15:20:36 -07:00
if ( err )
return err ;
2010-02-08 05:19:03 +00:00
w - > count + + ;
2005-04-16 15:20:36 -07:00
continue ;
}
2014-04-24 09:48:53 -04:00
skip :
2005-04-16 15:20:36 -07:00
w - > state = FWS_U ;
case FWS_U :
if ( fn = = w - > root )
return 0 ;
pn = fn - > parent ;
w - > node = pn ;
# ifdef CONFIG_IPV6_SUBTREES
2006-08-23 17:22:24 -07:00
if ( FIB6_SUBTREE ( pn ) = = fn ) {
2008-07-25 21:43:18 -07:00
WARN_ON ( ! ( fn - > fn_flags & RTN_ROOT ) ) ;
2005-04-16 15:20:36 -07:00
w - > state = FWS_L ;
continue ;
}
# endif
if ( pn - > left = = fn ) {
w - > state = FWS_R ;
continue ;
}
if ( pn - > right = = fn ) {
w - > state = FWS_C ;
w - > leaf = w - > node - > leaf ;
continue ;
}
# if RT6_DEBUG >= 2
2008-07-25 21:43:18 -07:00
WARN_ON ( 1 ) ;
2005-04-16 15:20:36 -07:00
# endif
}
}
}
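/*
 * Illustration only, not part of this file: a reduced user-space sketch of
 * the resumable walk above. It keeps the idea of storing the position
 * (node plus state) in the walker instead of on the call stack, so a
 * callback can suspend the walk with a positive return value and the
 * caller can resume it later. The types struct node and struct walker,
 * the WS_* states and the helper names are hypothetical simplifications;
 * subtrees, per-node leaf lists, pruning and skip counts are left out.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for fib6_node and fib6_walker. */
enum walk_state { WS_L, WS_R, WS_C, WS_U };

struct node {
	struct node *parent, *left, *right;
	int value;
};

struct walker {
	struct node *root, *node;
	enum walk_state state;
	int (*func)(struct walker *w);
	void *arg;
};

/* Returns 0 when complete, >0 when suspended, <0 when aborted. */
static int walk_continue(struct walker *w)
{
	assert(w && w->func);
	for (;;) {
		struct node *fn = w->node, *pn;
		int err;

		if (!fn)
			return 0;
		switch (w->state) {
		case WS_L:
			if (fn->left) {
				w->node = fn->left;	/* descend; state stays WS_L */
				continue;
			}
			w->state = WS_R;
			/* fall through */
		case WS_R:
			if (fn->right) {
				w->node = fn->right;
				w->state = WS_L;
				continue;
			}
			w->state = WS_C;
			/* fall through */
		case WS_C:
			/* Record progress first so a suspended walk resumes upward. */
			w->state = WS_U;
			err = w->func(w);
			if (err)
				return err;
			/* fall through */
		case WS_U:
			if (fn == w->root)
				return 0;
			pn = fn->parent;
			w->node = pn;
			w->state = (pn->left == fn) ? WS_R : WS_C;
			continue;
		}
	}
}

struct visit_log { int *out; int cap; int n; };

/* Callback: record the node, then suspend; abort when the log is full. */
static int record_and_yield(struct walker *w)
{
	struct visit_log *log = w->arg;

	if (log->n >= log->cap)
		return -1;
	log->out[log->n++] = w->node->value;
	return 1;
}

/* Build a five-node tree (1 is the root; 2 has children 4 and 5), walk it. */
static int demo_walk(int *out, int cap)
{
	struct node n[5] = { 0 };
	struct visit_log log = { out, cap, 0 };
	struct walker w;
	int i, res;

	for (i = 0; i < 5; i++)
		n[i].value = i + 1;
	n[0].left = &n[1];
	n[0].right = &n[2];
	n[1].left = &n[3];
	n[1].right = &n[4];
	n[1].parent = n[2].parent = &n[0];
	n[3].parent = n[4].parent = &n[1];

	w.root = w.node = &n[0];
	w.state = WS_L;
	w.func = record_and_yield;
	w.arg = &log;
	while ((res = walk_continue(&w)) > 0)
		;	/* each positive return is one suspension */
	return res < 0 ? res : log.n;
}
```

/*
 * In this sketch every node is visited exactly once, children before their
 * parent, and each suspension costs no more than the walker itself.
 */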
2016-03-08 14:44:35 +01:00
static int fib6_walk ( struct net * net , struct fib6_walker * w )
2005-04-16 15:20:36 -07:00
{
int res ;
w - > state = FWS_INIT ;
w - > node = w - > root ;
2016-03-08 14:44:35 +01:00
fib6_walker_link ( net , w ) ;
2005-04-16 15:20:36 -07:00
res = fib6_walk_continue ( w ) ;
if ( res < = 0 )
2016-03-08 14:44:35 +01:00
fib6_walker_unlink ( net , w ) ;
2005-04-16 15:20:36 -07:00
return res ;
}
2014-10-06 19:58:34 +02:00
static int fib6_clean_node ( struct fib6_walker * w )
2005-04-16 15:20:36 -07:00
{
int res ;
struct rt6_info * rt ;
2014-10-06 19:58:34 +02:00
struct fib6_cleaner * c = container_of ( w , struct fib6_cleaner , w ) ;
2008-03-03 23:31:57 -08:00
struct nl_info info = {
. nl_net = c - > net ,
} ;
2005-04-16 15:20:36 -07:00
2014-10-06 19:58:38 +02:00
if ( c - > sernum ! = FIB6_NO_SERNUM_CHANGE & &
w - > node - > fn_sernum ! = c - > sernum )
w - > node - > fn_sernum = c - > sernum ;
if ( ! c - > func ) {
WARN_ON_ONCE ( c - > sernum = = FIB6_NO_SERNUM_CHANGE ) ;
w - > leaf = NULL ;
return 0 ;
}
2010-06-10 23:31:35 -07:00
for ( rt = w - > leaf ; rt ; rt = rt - > dst . rt6_next ) {
2005-04-16 15:20:36 -07:00
res = c - > func ( rt , c - > arg ) ;
if ( res < 0 ) {
w - > leaf = rt ;
2007-12-13 09:45:12 -08:00
res = fib6_del ( rt , & info ) ;
2005-04-16 15:20:36 -07:00
if ( res ) {
# if RT6_DEBUG >= 2
2012-05-15 14:11:54 +00:00
pr_debug ( " %s: del failed: rt=%p@%p err=%d \n " ,
__func__ , rt , rt - > rt6i_node , res ) ;
2005-04-16 15:20:36 -07:00
# endif
continue ;
}
return 0 ;
}
2008-07-25 21:43:18 -07:00
WARN_ON ( res ! = 0 ) ;
2005-04-16 15:20:36 -07:00
}
w - > leaf = rt ;
return 0 ;
}
/*
* Convenient frontend to tree walker .
2007-02-09 23:24:49 +09:00
*
2005-04-16 15:20:36 -07:00
* func is called on each route .
* It may return - 1 - > delete this route .
* 0 - > continue walking
*
* prune = = true - > only immediate children of node ( ignoring
* pure split nodes ) will be scanned .
*/
2008-03-03 23:31:57 -08:00
static void fib6_clean_tree ( struct net * net , struct fib6_node * root ,
2006-08-07 21:50:48 -07:00
int ( * func ) ( struct rt6_info * , void * arg ) ,
2014-10-06 19:58:38 +02:00
bool prune , int sernum , void * arg )
2005-04-16 15:20:36 -07:00
{
2014-10-06 19:58:34 +02:00
struct fib6_cleaner c ;
2005-04-16 15:20:36 -07:00
c . w . root = root ;
c . w . func = fib6_clean_node ;
c . w . prune = prune ;
2010-02-08 05:19:03 +00:00
c . w . count = 0 ;
c . w . skip = 0 ;
2005-04-16 15:20:36 -07:00
c . func = func ;
2014-10-06 19:58:38 +02:00
c . sernum = sernum ;
2005-04-16 15:20:36 -07:00
c . arg = arg ;
2008-03-03 23:31:57 -08:00
c . net = net ;
2005-04-16 15:20:36 -07:00
2016-03-08 14:44:35 +01:00
fib6_walk ( net , & c . w ) ;
2005-04-16 15:20:36 -07:00
}
2014-10-06 19:58:38 +02:00
static void __fib6_clean_all ( struct net * net ,
int ( * func ) ( struct rt6_info * , void * ) ,
int sernum , void * arg )
2006-08-04 23:20:06 -07:00
{
struct fib6_table * table ;
2008-03-03 23:25:27 -08:00
struct hlist_head * head ;
2006-08-10 23:11:17 -07:00
unsigned int h ;
2006-08-04 23:20:06 -07:00
2006-08-10 23:11:17 -07:00
rcu_read_lock ( ) ;
2009-07-30 18:52:15 -07:00
for ( h = 0 ; h < FIB6_TABLE_HASHSZ ; h + + ) {
2008-03-03 23:27:06 -08:00
head = & net - > ipv6 . fib_table_hash [ h ] ;
2013-02-27 17:06:00 -08:00
hlist_for_each_entry_rcu ( table , head , tb6_hlist ) {
2006-08-04 23:20:06 -07:00
write_lock_bh ( & table - > tb6_lock ) ;
2008-03-03 23:31:57 -08:00
fib6_clean_tree ( net , & table - > tb6_root ,
2014-10-06 19:58:38 +02:00
func , false , sernum , arg ) ;
2006-08-04 23:20:06 -07:00
write_unlock_bh ( & table - > tb6_lock ) ;
}
}
2006-08-10 23:11:17 -07:00
rcu_read_unlock ( ) ;
2006-08-04 23:20:06 -07:00
}
2014-10-06 19:58:38 +02:00
void fib6_clean_all ( struct net * net , int ( * func ) ( struct rt6_info * , void * ) ,
void * arg )
{
__fib6_clean_all ( net , func , FIB6_NO_SERNUM_CHANGE , arg ) ;
}
2005-04-16 15:20:36 -07:00
static int fib6_prune_clone ( struct rt6_info * rt , void * arg )
{
if ( rt - > rt6i_flags & RTF_CACHE ) {
RT6_TRACE ( " pruning clone %p \n " , rt ) ;
return - 1 ;
}
return 0 ;
}
2014-05-09 13:31:43 +08:00
static void fib6_prune_clones ( struct net * net , struct fib6_node * fn )
2005-04-16 15:20:36 -07:00
{
2014-10-06 19:58:38 +02:00
fib6_clean_tree ( net , fn , fib6_prune_clone , true ,
FIB6_NO_SERNUM_CHANGE , NULL ) ;
2014-09-28 00:46:06 +02:00
}
static void fib6_flush_trees ( struct net * net )
{
2014-10-06 19:58:37 +02:00
int new_sernum = fib6_new_sernum ( net ) ;
2014-09-28 00:46:06 +02:00
2014-10-06 19:58:38 +02:00
__fib6_clean_all ( net , NULL , new_sernum , NULL ) ;
2014-09-28 00:46:06 +02:00
}
2005-04-16 15:20:36 -07:00
/*
* Garbage collection
*/
2016-03-08 14:44:25 +01:00
struct fib6_gc_args
2005-04-16 15:20:36 -07:00
{
int timeout ;
int more ;
2016-03-08 14:44:25 +01:00
} ;
2005-04-16 15:20:36 -07:00
static int fib6_age ( struct rt6_info * rt , void * arg )
{
2016-03-08 14:44:25 +01:00
struct fib6_gc_args * gc_args = arg ;
2005-04-16 15:20:36 -07:00
unsigned long now = jiffies ;
/*
* Check addrconf expiration here .
* Routes are expired even if they are in use .
*
* Also age clones . Note that clones are aged out
* only if they are not currently in use .
*/
2011-12-28 20:19:20 -05:00
if ( rt - > rt6i_flags & RTF_EXPIRES & & rt - > dst . expires ) {
if ( time_after ( now , rt - > dst . expires ) ) {
2005-04-16 15:20:36 -07:00
RT6_TRACE ( " expiring %p \n " , rt ) ;
return - 1 ;
}
2016-03-08 14:44:25 +01:00
gc_args - > more + + ;
2005-04-16 15:20:36 -07:00
} else if ( rt - > rt6i_flags & RTF_CACHE ) {
2010-06-10 23:31:35 -07:00
if ( atomic_read ( & rt - > dst . __refcnt ) = = 0 & &
2016-03-08 14:44:25 +01:00
time_after_eq ( now , rt - > dst . lastuse + gc_args - > timeout ) ) {
2005-04-16 15:20:36 -07:00
RT6_TRACE ( " aging clone %p \n " , rt ) ;
return - 1 ;
2012-01-27 15:14:01 -08:00
} else if ( rt - > rt6i_flags & RTF_GATEWAY ) {
struct neighbour * neigh ;
__u8 neigh_flags = 0 ;
neigh = dst_neigh_lookup ( & rt - > dst , & rt - > rt6i_gateway ) ;
if ( neigh ) {
neigh_flags = neigh - > flags ;
neigh_release ( neigh ) ;
}
2012-06-07 06:51:04 +00:00
if ( ! ( neigh_flags & NTF_ROUTER ) ) {
2012-01-27 15:14:01 -08:00
RT6_TRACE ( " purging route %p via non-router but gateway \n " ,
rt ) ;
return - 1 ;
}
2005-04-16 15:20:36 -07:00
}
2016-03-08 14:44:25 +01:00
gc_args - > more + + ;
2005-04-16 15:20:36 -07:00
}
return 0 ;
}
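/*
 * Illustration only, not part of this file: fib6_age() compares jiffies
 * values with time_after() and time_after_eq(), which stay correct when
 * the tick counter wraps around. A user-space approximation follows; the
 * names tick_after, tick_after_eq and clone_is_stale are hypothetical,
 * and the signed-difference trick assumes a two's complement long.
 */

```c
#include <assert.h>
#include <limits.h>

/* True if tick a is later than tick b, even across a counter wrap. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if tick a is later than or equal to tick b. */
static int tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Mirrors the clone test above: unreferenced and idle for at least timeout. */
static int clone_is_stale(unsigned long now, unsigned long lastuse,
			  unsigned long timeout, int refcnt)
{
	assert(refcnt >= 0);
	return refcnt == 0 && tick_after_eq(now, lastuse + timeout);
}
```

/*
 * A plain "now > expires" comparison would misfire shortly after the
 * counter wraps; the signed difference avoids that as long as the two
 * ticks are less than half the counter range apart.
 */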
2013-08-01 10:04:14 +02:00
void fib6_run_gc ( unsigned long expires , struct net * net , bool force )
2005-04-16 15:20:36 -07:00
{
2016-03-08 14:44:25 +01:00
struct fib6_gc_args gc_args ;
2013-08-01 10:04:24 +02:00
unsigned long now ;
2013-08-01 10:04:14 +02:00
if ( force ) {
2016-03-08 14:44:45 +01:00
spin_lock_bh ( & net - > ipv6 . fib6_gc_lock ) ;
} else if ( ! spin_trylock_bh ( & net - > ipv6 . fib6_gc_lock ) ) {
2013-08-01 10:04:14 +02:00
mod_timer ( & net - > ipv6 . ip6_fib_timer , jiffies + HZ ) ;
return ;
2005-04-16 15:20:36 -07:00
}
2013-08-01 10:04:14 +02:00
gc_args . timeout = expires ? ( int ) expires :
net - > ipv6 . sysctl . ip6_rt_gc_interval ;
2005-04-16 15:20:36 -07:00
2008-07-22 14:35:50 -07:00
gc_args . more = icmp6_dst_gc ( ) ;
2008-03-03 23:27:06 -08:00
2016-03-08 14:44:25 +01:00
fib6_clean_all ( net , fib6_age , & gc_args ) ;
2013-08-01 10:04:24 +02:00
now = jiffies ;
net - > ipv6 . ip6_rt_last_gc = now ;
2005-04-16 15:20:36 -07:00
if ( gc_args . more )
2008-07-22 14:34:09 -07:00
mod_timer ( & net - > ipv6 . ip6_fib_timer ,
2013-08-01 10:04:24 +02:00
round_jiffies ( now
2008-07-22 14:34:09 -07:00
+ net - > ipv6 . sysctl . ip6_rt_gc_interval ) ) ;
2008-07-22 14:33:45 -07:00
else
del_timer ( & net - > ipv6 . ip6_fib_timer ) ;
2016-03-08 14:44:45 +01:00
spin_unlock_bh ( & net - > ipv6 . fib6_gc_lock ) ;
2005-04-16 15:20:36 -07:00
}
2008-03-03 23:28:58 -08:00
static void fib6_gc_timer_cb ( unsigned long arg )
{
2013-08-01 10:04:14 +02:00
fib6_run_gc ( 0 , ( struct net * ) arg , true ) ;
2008-03-03 23:28:58 -08:00
}
2010-01-17 03:35:32 +00:00
static int __net_init fib6_net_init ( struct net * net )
2005-04-16 15:20:36 -07:00
{
2010-10-13 08:22:03 +00:00
size_t size = sizeof ( struct hlist_head ) * FIB6_TABLE_HASHSZ ;
2016-03-08 14:44:45 +01:00
spin_lock_init ( & net - > ipv6 . fib6_gc_lock ) ;
2016-03-08 14:44:35 +01:00
rwlock_init ( & net - > ipv6 . fib6_walker_lock ) ;
INIT_LIST_HEAD ( & net - > ipv6 . fib6_walkers ) ;
2008-07-22 14:33:45 -07:00
setup_timer ( & net - > ipv6 . ip6_fib_timer , fib6_gc_timer_cb , ( unsigned long ) net ) ;
2008-03-03 23:31:11 -08:00
2008-03-03 23:34:17 -08:00
net - > ipv6 . rt6_stats = kzalloc ( sizeof ( * net - > ipv6 . rt6_stats ) , GFP_KERNEL ) ;
if ( ! net - > ipv6 . rt6_stats )
goto out_timer ;
2010-10-13 08:22:03 +00:00
/* Avoid false sharing : Use at least a full cache line */
size = max_t ( size_t , size , L1_CACHE_BYTES ) ;
net - > ipv6 . fib_table_hash = kzalloc ( size , GFP_KERNEL ) ;
2008-03-03 23:25:27 -08:00
if ( ! net - > ipv6 . fib_table_hash )
2008-03-03 23:34:17 -08:00
goto out_rt6_stats ;
2008-03-03 23:24:31 -08:00
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_main_tbl = kzalloc ( sizeof ( * net - > ipv6 . fib6_main_tbl ) ,
GFP_KERNEL ) ;
if ( ! net - > ipv6 . fib6_main_tbl )
2008-03-03 23:24:31 -08:00
goto out_fib_table_hash ;
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_main_tbl - > tb6_id = RT6_TABLE_MAIN ;
2008-03-04 13:48:30 -08:00
net - > ipv6 . fib6_main_tbl - > tb6_root . leaf = net - > ipv6 . ip6_null_entry ;
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_main_tbl - > tb6_root . fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO ;
2012-06-11 00:01:52 -07:00
inet_peer_base_init ( & net - > ipv6 . fib6_main_tbl - > tb6_peers ) ;
2008-03-03 23:24:31 -08:00
# ifdef CONFIG_IPV6_MULTIPLE_TABLES
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_local_tbl = kzalloc ( sizeof ( * net - > ipv6 . fib6_local_tbl ) ,
GFP_KERNEL ) ;
if ( ! net - > ipv6 . fib6_local_tbl )
2008-03-03 23:24:31 -08:00
goto out_fib6_main_tbl ;
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_local_tbl - > tb6_id = RT6_TABLE_LOCAL ;
2008-03-04 13:48:30 -08:00
net - > ipv6 . fib6_local_tbl - > tb6_root . leaf = net - > ipv6 . ip6_null_entry ;
2008-03-03 23:25:27 -08:00
net - > ipv6 . fib6_local_tbl - > tb6_root . fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO ;
2012-06-11 00:01:52 -07:00
inet_peer_base_init ( & net - > ipv6 . fib6_local_tbl - > tb6_peers ) ;
2008-03-03 23:24:31 -08:00
# endif
2008-03-03 23:25:27 -08:00
fib6_tables_init ( net ) ;
2007-12-07 00:45:16 -08:00
2008-07-22 14:33:45 -07:00
return 0 ;
2007-12-07 00:40:34 -08:00
2008-03-03 23:24:31 -08:00
# ifdef CONFIG_IPV6_MULTIPLE_TABLES
out_fib6_main_tbl :
2008-03-03 23:25:27 -08:00
kfree ( net - > ipv6 . fib6_main_tbl ) ;
2008-03-03 23:24:31 -08:00
# endif
out_fib_table_hash :
2008-03-03 23:25:27 -08:00
kfree ( net - > ipv6 . fib_table_hash ) ;
2008-03-03 23:34:17 -08:00
out_rt6_stats :
kfree ( net - > ipv6 . rt6_stats ) ;
2008-03-03 23:31:11 -08:00
out_timer :
2008-07-22 14:33:45 -07:00
return - ENOMEM ;
2014-03-28 12:07:02 +08:00
}
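/*
 * Illustration only, not part of this file: fib6_net_init() releases
 * partially built state through a ladder of goto labels, each label
 * freeing what was allocated before the failing step. A user-space sketch
 * of the same pattern follows; struct tables, tables_init, tables_exit
 * and the fail_step parameter (used here only to simulate an allocation
 * failure) are hypothetical.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-namespace IPv6 FIB state. */
struct tables {
	long *stats;
	void **hash;
	int *main_tbl;
};

/* fail_step: 0 = no forced failure, 1..3 = pretend that allocation fails. */
static int tables_init(struct tables *t, size_t nbuckets, int fail_step)
{
	assert(t && nbuckets > 0);

	t->stats = fail_step == 1 ? NULL : calloc(1, sizeof(*t->stats));
	if (!t->stats)
		goto out;

	t->hash = fail_step == 2 ? NULL : calloc(nbuckets, sizeof(*t->hash));
	if (!t->hash)
		goto out_stats;

	t->main_tbl = fail_step == 3 ? NULL : calloc(1, sizeof(*t->main_tbl));
	if (!t->main_tbl)
		goto out_hash;

	return 0;

	/* Labels unwind in reverse order of allocation. */
out_hash:
	free(t->hash);
	t->hash = NULL;
out_stats:
	free(t->stats);
	t->stats = NULL;
out:
	return -ENOMEM;
}

/* Tear down in reverse order, as fib6_net_exit() does for the real state. */
static void tables_exit(struct tables *t)
{
	free(t->main_tbl);
	free(t->hash);
	free(t->stats);
	t->main_tbl = NULL;
	t->hash = NULL;
	t->stats = NULL;
}
```

/*
 * Each failure jumps to the label that frees exactly what already exists,
 * so adding a new allocation means adding one check and one label.
 */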
2008-03-03 23:25:27 -08:00
static void fib6_net_exit ( struct net * net )
{
2008-03-04 13:48:30 -08:00
rt6_ifdown ( net , NULL ) ;
2008-07-22 14:33:45 -07:00
del_timer_sync ( & net - > ipv6 . ip6_fib_timer ) ;
2008-03-03 23:25:27 -08:00
# ifdef CONFIG_IPV6_MULTIPLE_TABLES
2012-06-11 00:01:52 -07:00
inetpeer_invalidate_tree ( & net - > ipv6 . fib6_local_tbl - > tb6_peers ) ;
2008-03-03 23:25:27 -08:00
kfree ( net - > ipv6 . fib6_local_tbl ) ;
# endif
2012-06-11 00:01:52 -07:00
inetpeer_invalidate_tree ( & net - > ipv6 . fib6_main_tbl - > tb6_peers ) ;
2008-03-03 23:25:27 -08:00
kfree ( net - > ipv6 . fib6_main_tbl ) ;
kfree ( net - > ipv6 . fib_table_hash ) ;
2008-03-03 23:34:17 -08:00
kfree ( net - > ipv6 . rt6_stats ) ;
2008-03-03 23:25:27 -08:00
}
static struct pernet_operations fib6_net_ops = {
. init = fib6_net_init ,
. exit = fib6_net_exit ,
} ;
int __init fib6_init ( void )
{
int ret = - ENOMEM ;
2008-03-03 23:31:11 -08:00
2008-03-03 23:25:27 -08:00
fib6_node_kmem = kmem_cache_create ( " fib6_nodes " ,
sizeof ( struct fib6_node ) ,
0 , SLAB_HWCACHE_ALIGN ,
NULL ) ;
if ( ! fib6_node_kmem )
goto out ;
ret = register_pernet_subsys ( & fib6_net_ops ) ;
if ( ret )
2008-03-03 23:34:17 -08:00
goto out_kmem_cache_create ;
2012-06-16 01:12:19 -07:00
ret = __rtnl_register ( PF_INET6 , RTM_GETROUTE , NULL , inet6_dump_fib ,
NULL ) ;
if ( ret )
goto out_unregister_subsys ;
2014-09-28 00:46:06 +02:00
__fib6_flush_trees = fib6_flush_trees ;
2008-03-03 23:25:27 -08:00
out :
return ret ;
2012-06-16 01:12:19 -07:00
out_unregister_subsys :
unregister_pernet_subsys ( & fib6_net_ops ) ;
2007-12-07 00:40:34 -08:00
out_kmem_cache_create :
kmem_cache_destroy ( fib6_node_kmem ) ;
goto out ;
2005-04-16 15:20:36 -07:00
}
void fib6_gc_cleanup ( void )
{
2008-03-03 23:25:27 -08:00
unregister_pernet_subsys ( & fib6_net_ops ) ;
2005-04-16 15:20:36 -07:00
kmem_cache_destroy ( fib6_node_kmem ) ;
}
2013-09-21 16:55:59 +02:00
# ifdef CONFIG_PROC_FS
struct ipv6_route_iter {
struct seq_net_private p ;
2014-10-06 19:58:34 +02:00
struct fib6_walker w ;
2013-09-21 16:55:59 +02:00
loff_t skip ;
struct fib6_table * tbl ;
2014-10-06 19:58:35 +02:00
int sernum ;
2013-09-21 16:55:59 +02:00
} ;
static int ipv6_route_seq_show ( struct seq_file * seq , void * v )
{
struct rt6_info * rt = v ;
struct ipv6_route_iter * iter = seq - > private ;
seq_printf ( seq , " %pi6 %02x " , & rt - > rt6i_dst . addr , rt - > rt6i_dst . plen ) ;
# ifdef CONFIG_IPV6_SUBTREES
seq_printf ( seq , " %pi6 %02x " , & rt - > rt6i_src . addr , rt - > rt6i_src . plen ) ;
# else
seq_puts ( seq , " 00000000000000000000000000000000 00 " ) ;
# endif
if ( rt - > rt6i_flags & RTF_GATEWAY )
seq_printf ( seq , " %pi6 " , & rt - > rt6i_gateway ) ;
else
seq_puts ( seq , " 00000000000000000000000000000000 " ) ;
seq_printf ( seq , " %08x %08x %08x %08x %8s \n " ,
rt - > rt6i_metric , atomic_read ( & rt - > dst . __refcnt ) ,
rt - > dst . __use , rt - > rt6i_flags ,
rt - > dst . dev ? rt - > dst . dev - > name : " " ) ;
iter - > w . leaf = NULL ;
return 0 ;
}
2014-10-06 19:58:34 +02:00
static int ipv6_route_yield ( struct fib6_walker * w )
2013-09-21 16:55:59 +02:00
{
struct ipv6_route_iter * iter = w - > args ;
if ( ! iter - > skip )
return 1 ;
do {
iter - > w . leaf = iter - > w . leaf - > dst . rt6_next ;
iter - > skip - - ;
if ( ! iter - > skip & & iter - > w . leaf )
return 1 ;
} while ( iter - > w . leaf ) ;
return 0 ;
}
2016-03-08 14:44:35 +01:00
static void ipv6_route_seq_setup_walk ( struct ipv6_route_iter * iter ,
struct net * net )
2013-09-21 16:55:59 +02:00
{
memset ( & iter - > w , 0 , sizeof ( iter - > w ) ) ;
iter - > w . func = ipv6_route_yield ;
iter - > w . root = & iter - > tbl - > tb6_root ;
iter - > w . state = FWS_INIT ;
iter - > w . node = iter - > w . root ;
iter - > w . args = iter ;
2013-09-21 16:56:10 +02:00
iter - > sernum = iter - > w . root - > fn_sernum ;
2013-09-21 16:55:59 +02:00
INIT_LIST_HEAD ( & iter - > w . lh ) ;
2016-03-08 14:44:35 +01:00
fib6_walker_link ( net , & iter - > w ) ;
2013-09-21 16:55:59 +02:00
}
static struct fib6_table * ipv6_route_seq_next_table ( struct fib6_table * tbl ,
struct net * net )
{
unsigned int h ;
struct hlist_node * node ;
if ( tbl ) {
h = ( tbl - > tb6_id & ( FIB6_TABLE_HASHSZ - 1 ) ) + 1 ;
node = rcu_dereference_bh ( hlist_next_rcu ( & tbl - > tb6_hlist ) ) ;
} else {
h = 0 ;
node = NULL ;
}
while ( ! node & & h < FIB6_TABLE_HASHSZ ) {
node = rcu_dereference_bh (
hlist_first_rcu ( & net - > ipv6 . fib_table_hash [ h + + ] ) ) ;
}
return hlist_entry_safe ( node , struct fib6_table , tb6_hlist ) ;
}
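/*
 * Illustration only, not part of this file: ipv6_route_seq_next_table()
 * finds the table that follows tbl by first trying the rest of tbl's hash
 * chain and then scanning the remaining buckets. A user-space sketch
 * without RCU follows; struct table, HASHSZ and next_table are
 * hypothetical names, and HASHSZ is assumed to be a power of two so that
 * the mask selects a bucket.
 */

```c
#include <assert.h>
#include <stddef.h>

#define HASHSZ 4	/* assumed to be a power of two */

/* Hypothetical stand-in for struct fib6_table and its hash linkage. */
struct table {
	struct table *next;	/* next table in the same bucket */
	unsigned int id;
};

/* Return the table after tbl, or the first table when tbl is NULL. */
static struct table *next_table(struct table *buckets[HASHSZ],
				struct table *tbl)
{
	unsigned int h = 0;
	struct table *node = NULL;

	assert(buckets);
	if (tbl) {
		h = (tbl->id & (HASHSZ - 1)) + 1;	/* bucket after tbl's own */
		node = tbl->next;			/* rest of the current chain */
	}
	while (!node && h < HASHSZ)
		node = buckets[h++];
	return node;
}
```

/*
 * Calling next_table() repeatedly, starting from NULL, visits each table
 * once in bucket order and yields NULL after the last one.
 */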
2013-09-21 16:56:10 +02:00
static void ipv6_route_check_sernum ( struct ipv6_route_iter * iter )
{
if ( iter - > sernum ! = iter - > w . root - > fn_sernum ) {
iter - > sernum = iter - > w . root - > fn_sernum ;
iter - > w . state = FWS_INIT ;
iter - > w . node = iter - > w . root ;
WARN_ON ( iter - > w . skip ) ;
iter - > w . skip = iter - > w . count ;
}
}
2013-09-21 16:55:59 +02:00
static void * ipv6_route_seq_next ( struct seq_file * seq , void * v , loff_t * pos )
{
int r ;
struct rt6_info * n ;
struct net * net = seq_file_net ( seq ) ;
struct ipv6_route_iter * iter = seq - > private ;
if ( ! v )
goto iter_table ;
n = ( ( struct rt6_info * ) v ) - > dst . rt6_next ;
if ( n ) {
+ + * pos ;
return n ;
}
iter_table :
2013-09-21 16:56:10 +02:00
ipv6_route_check_sernum ( iter ) ;
2013-09-21 16:55:59 +02:00
read_lock ( & iter - > tbl - > tb6_lock ) ;
r = fib6_walk_continue ( & iter - > w ) ;
read_unlock ( & iter - > tbl - > tb6_lock ) ;
if ( r > 0 ) {
if ( v )
+ + * pos ;
return iter - > w . leaf ;
} else if ( r < 0 ) {
2016-03-08 14:44:35 +01:00
fib6_walker_unlink ( net , & iter - > w ) ;
2013-09-21 16:55:59 +02:00
return NULL ;
}
2016-03-08 14:44:35 +01:00
fib6_walker_unlink ( net , & iter - > w ) ;
2013-09-21 16:55:59 +02:00
iter - > tbl = ipv6_route_seq_next_table ( iter - > tbl , net ) ;
if ( ! iter - > tbl )
return NULL ;
2016-03-08 14:44:35 +01:00
ipv6_route_seq_setup_walk ( iter , net ) ;
2013-09-21 16:55:59 +02:00
goto iter_table ;
}
static void * ipv6_route_seq_start ( struct seq_file * seq , loff_t * pos )
__acquires ( RCU_BH )
{
struct net * net = seq_file_net ( seq ) ;
struct ipv6_route_iter * iter = seq - > private ;
rcu_read_lock_bh ( ) ;
iter - > tbl = ipv6_route_seq_next_table ( NULL , net ) ;
iter - > skip = * pos ;
if ( iter - > tbl ) {
2016-03-08 14:44:35 +01:00
ipv6_route_seq_setup_walk ( iter , net ) ;
2013-09-21 16:55:59 +02:00
return ipv6_route_seq_next ( seq , NULL , pos ) ;
} else {
return NULL ;
}
}
static bool ipv6_route_iter_active ( struct ipv6_route_iter * iter )
{
2014-10-06 19:58:34 +02:00
struct fib6_walker * w = & iter - > w ;
2013-09-21 16:55:59 +02:00
return w - > node & & ! ( w - > state = = FWS_U & & w - > node = = w - > root ) ;
}
static void ipv6_route_seq_stop ( struct seq_file * seq , void * v )
__releases ( RCU_BH )
{
2016-03-08 14:44:35 +01:00
struct net * net = seq_file_net ( seq ) ;
2013-09-21 16:55:59 +02:00
struct ipv6_route_iter * iter = seq - > private ;
if ( ipv6_route_iter_active ( iter ) )
2016-03-08 14:44:35 +01:00
fib6_walker_unlink ( net , & iter - > w ) ;
2013-09-21 16:55:59 +02:00
rcu_read_unlock_bh ( ) ;
}
static const struct seq_operations ipv6_route_seq_ops = {
. start = ipv6_route_seq_start ,
. next = ipv6_route_seq_next ,
. stop = ipv6_route_seq_stop ,
. show = ipv6_route_seq_show
} ;
int ipv6_route_open ( struct inode * inode , struct file * file )
{
return seq_open_net ( inode , file , & ipv6_route_seq_ops ,
sizeof ( struct ipv6_route_iter ) ) ;
}
# endif /* CONFIG_PROC_FS */