2015-05-12 15:56:21 +03:00
/*
 * net/sched/cls_flower.c		Flower classifier
 *
 * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/rhashtable.h>
net, sched: respect rcu grace period on cls destruction
Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.
The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then, so that was indeed a valid case, after
d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); but the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.
In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.
Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in fast-path and, 2) that
outstanding readers of that tp in tc_classify() can still execute under
respect with RCU grace period as it is actually expected.
Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)
This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independent
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
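The deferred teardown described above for cls_flower (take a module reference, wait for the grace period, then destroy the table from a sleepable context) can be sketched in plain user-space C. This is only an illustration under stated assumptions: every `demo_*` name is hypothetical, the "grace period" and "work queue" are single-slot stand-ins driven by hand, and none of this is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the classifier head; not a kernel structure. */
struct demo_head {
	int *table;		/* plays the role of the hash table */
	bool table_assigned;	/* plays the role of mask_assigned */
};

typedef void (*demo_cb)(struct demo_head *head);

static int demo_module_refs;		/* stand-in for the module refcount */
static int demo_heads_freed;		/* counts completed teardowns */
static demo_cb demo_rcu_cb;		/* single pending "RCU" callback */
static struct demo_head *demo_rcu_arg;
static demo_cb demo_work_cb;		/* single pending work item */
static struct demo_head *demo_work_arg;

static struct demo_head *demo_head_alloc(void)
{
	struct demo_head *head = calloc(1, sizeof(*head));

	assert(head);
	head->table = calloc(4, sizeof(*head->table));
	assert(head->table);
	head->table_assigned = true;
	return head;
}

/* Stage 3: sleepable context, no reader can still see the head. */
static void demo_destroy_sleepable(struct demo_head *head)
{
	if (head->table_assigned)
		free(head->table);
	free(head);
	demo_heads_freed++;
	demo_module_refs--;	/* drop the reference taken in demo_destroy() */
}

/* Stage 2: runs after the grace period and only hands off to the worker. */
static void demo_destroy_rcu(struct demo_head *head)
{
	demo_work_cb = demo_destroy_sleepable;
	demo_work_arg = head;
}

/* Stage 1: writer side; readers may still be using the head afterwards. */
static void demo_destroy(struct demo_head *head)
{
	demo_module_refs++;	/* keeps the callback code alive until stage 3 */
	demo_rcu_cb = demo_destroy_rcu;
	demo_rcu_arg = head;
}

/* Simulates the end of a grace period by running the pending callback. */
static void demo_grace_period_elapsed(void)
{
	demo_cb cb = demo_rcu_cb;

	demo_rcu_cb = NULL;
	if (cb)
		cb(demo_rcu_arg);
}

/* Simulates the work queue running its pending item. */
static void demo_run_work(void)
{
	demo_cb cb = demo_work_cb;

	demo_work_cb = NULL;
	if (cb)
		cb(demo_work_arg);
}
```

Calling `demo_destroy()`, then `demo_grace_period_elapsed()`, then `demo_run_work()` walks the three stages in order; the head and its table stay valid until the last step, which is the property the fix restores for readers.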
2016-11-27 03:18:01 +03:00
#include <linux/workqueue.h>
2015-05-12 15:56:21 +03:00
#include <linux/if_ether.h>
#include <linux/in6.h>
#include <linux/ip.h>
2017-04-22 23:52:47 +03:00
#include <linux/mpls.h>
2015-05-12 15:56:21 +03:00
#include <net/sch_generic.h>
#include <net/pkt_cls.h>
#include <net/ip.h>
#include <net/flow_dissector.h>
2016-09-08 16:23:47 +03:00
#include <net/dst.h>
#include <net/dst_metadata.h>
2015-05-12 15:56:21 +03:00
struct fl_flow_key {
	int indev_ifindex;
2015-06-04 19:16:39 +03:00
	struct flow_dissector_key_control control;
2016-09-08 16:23:47 +03:00
	struct flow_dissector_key_control enc_control;
2015-05-12 15:56:21 +03:00
	struct flow_dissector_key_basic basic;
	struct flow_dissector_key_eth_addrs eth;
2016-08-17 13:36:13 +03:00
	struct flow_dissector_key_vlan vlan;
2015-05-12 15:56:21 +03:00
	union {
2015-06-04 19:16:40 +03:00
		struct flow_dissector_key_ipv4_addrs ipv4;
2015-05-12 15:56:21 +03:00
		struct flow_dissector_key_ipv6_addrs ipv6;
	};
	struct flow_dissector_key_ports tp;
2016-12-07 15:48:28 +03:00
	struct flow_dissector_key_icmp icmp;
2017-01-11 16:05:43 +03:00
	struct flow_dissector_key_arp arp;
2016-09-08 16:23:47 +03:00
	struct flow_dissector_key_keyid enc_key_id;
	union {
		struct flow_dissector_key_ipv4_addrs enc_ipv4;
		struct flow_dissector_key_ipv6_addrs enc_ipv6;
	};
2016-11-07 16:14:39 +03:00
	struct flow_dissector_key_ports enc_tp;
2017-04-22 23:52:47 +03:00
	struct flow_dissector_key_mpls mpls;
2017-05-23 19:40:45 +03:00
	struct flow_dissector_key_tcp tcp;
2017-06-01 21:37:38 +03:00
	struct flow_dissector_key_ip ip;
2015-05-12 15:56:21 +03:00
} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
struct fl_flow_mask_range {
	unsigned short int start;
	unsigned short int end;
};
struct fl_flow_mask {
	struct fl_flow_key key;
	struct fl_flow_mask_range range;
	struct rcu_head rcu;
};
struct cls_fl_head {
	struct rhashtable ht;
	struct fl_flow_mask mask;
	struct flow_dissector dissector;
	bool mask_assigned;
	struct list_head filters;
	struct rhashtable_params ht_params;
2016-11-27 03:18:01 +03:00
	union {
		struct work_struct work;
		struct rcu_head rcu;
	};
2017-08-30 09:31:58 +03:00
	struct idr handle_idr;
2015-05-12 15:56:21 +03:00
};
struct cls_fl_filter {
	struct rhash_head ht_node;
	struct fl_flow_key mkey;
	struct tcf_exts exts;
	struct tcf_result res;
	struct fl_flow_key key;
	struct list_head list;
	u32 handle;
2016-06-05 17:11:18 +03:00
	u32 flags;
2015-05-12 15:56:21 +03:00
	struct rcu_head rcu;
2016-12-01 15:06:37 +03:00
	struct net_device *hw_dev;
2015-05-12 15:56:21 +03:00
};
static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
{
	return mask->range.end - mask->range.start;
}
static void fl_mask_update_range(struct fl_flow_mask *mask)
{
	const u8 *bytes = (const u8 *) &mask->key;
	size_t size = sizeof(mask->key);
	size_t i, first = 0, last = size - 1;
	for (i = 0; i < sizeof(mask->key); i++) {
		if (bytes[i]) {
			if (!first && i)
				first = i;
			last = i;
		}
	}
	mask->range.start = rounddown(first, sizeof(long));
	mask->range.end = roundup(last + 1, sizeof(long));
}
static void *fl_key_get_start(struct fl_flow_key *key,
			      const struct fl_flow_mask *mask)
{
	return (u8 *) key + mask->range.start;
}
static void fl_set_masked_key(struct fl_flow_key *mkey, struct fl_flow_key *key,
			      struct fl_flow_mask *mask)
{
	const long *lkey = fl_key_get_start(key, mask);
	const long *lmask = fl_key_get_start(&mask->key, mask);
	long *lmkey = fl_key_get_start(mkey, mask);
	int i;
	for (i = 0; i < fl_mask_range(mask); i += sizeof(long))
		*lmkey++ = *lkey++ & *lmask++;
}
static void fl_clear_masked_range(struct fl_flow_key *key,
				  struct fl_flow_mask *mask)
{
	memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
}
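The mask-range helpers above can be tried outside the kernel with a small user-space sketch. It assumes a toy key of four `unsigned long` words; all `demo_*` names are hypothetical, and the rounding helpers are local substitutes for the kernel's `rounddown()` and `roundup()` macros. The range computation follows the same loop as `fl_mask_update_range()`, and the masked copy ANDs one long at a time over that range only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_KEY_WORDS 4

/* Hypothetical toy key: an array of longs, so there is no padding. */
struct demo_key {
	unsigned long words[DEMO_KEY_WORDS];
};

/* Byte offsets into the key, both multiples of sizeof(long). */
struct demo_range {
	size_t start;
	size_t end;
};

static size_t demo_rounddown(size_t v, size_t step)
{
	return v - (v % step);
}

static size_t demo_roundup(size_t v, size_t step)
{
	return demo_rounddown(v + step - 1, step);
}

/* Same scan as fl_mask_update_range(): track first/last non-zero byte. */
static struct demo_range demo_mask_range(const struct demo_key *mask)
{
	const unsigned char *bytes = (const unsigned char *)mask;
	size_t size = sizeof(*mask);
	size_t i, first = 0, last = size - 1;
	struct demo_range r;

	for (i = 0; i < size; i++) {
		if (bytes[i]) {
			if (!first && i)
				first = i;
			last = i;
		}
	}
	r.start = demo_rounddown(first, sizeof(long));
	r.end = demo_roundup(last + 1, sizeof(long));
	return r;
}

/* AND key and mask word by word, touching only the words in the range. */
static void demo_set_masked_key(struct demo_key *mkey,
				const struct demo_key *key,
				const struct demo_key *mask,
				struct demo_range r)
{
	size_t w;

	for (w = r.start / sizeof(long); w < r.end / sizeof(long); w++)
		mkey->words[w] = key->words[w] & mask->words[w];
}
```

With a mask that is non-zero only in the third word, the range covers exactly that word, so the lookup key handed to the hash table is one long wide instead of the whole structure.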
2017-01-16 11:45:13 +03:00
static struct cls_fl_filter *fl_lookup(struct cls_fl_head *head,
				       struct fl_flow_key *mkey)
{
	return rhashtable_lookup_fast(&head->ht,
				      fl_key_get_start(mkey, &head->mask),
				      head->ht_params);
}
2015-05-12 15:56:21 +03:00
static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
		       struct tcf_result *res)
{
	struct cls_fl_head *head = rcu_dereference_bh(tp->root);
	struct cls_fl_filter *f;
	struct fl_flow_key skb_key;
	struct fl_flow_key skb_mkey;
2016-06-05 17:11:18 +03:00
	if (!atomic_read(&head->ht.nelems))
		return -1;
2015-05-12 15:56:21 +03:00
	fl_clear_masked_range(&skb_key, &head->mask);
2016-09-08 16:23:47 +03:00
2015-05-12 15:56:21 +03:00
	skb_key.indev_ifindex = skb->skb_iif;
	/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
	 * so do it rather here.
	 */
	skb_key.basic.n_proto = skb->protocol;
2015-09-01 19:24:27 +03:00
	skb_flow_dissect(skb, &head->dissector, &skb_key, 0);
2015-05-12 15:56:21 +03:00
	fl_set_masked_key(&skb_mkey, &skb_key, &head->mask);
2017-01-16 11:45:13 +03:00
	f = fl_lookup(head, &skb_mkey);
2016-06-13 12:06:39 +03:00
	if (f && !tc_skip_sw(f->flags)) {
2015-05-12 15:56:21 +03:00
		*res = f->res;
		return tcf_exts_exec(skb, &f->exts, res);
	}
	return -1;
}
static int fl_init(struct tcf_proto *tp)
{
	struct cls_fl_head *head;
	head = kzalloc(sizeof(*head), GFP_KERNEL);
	if (!head)
		return -ENOBUFS;
	INIT_LIST_HEAD_RCU(&head->filters);
	rcu_assign_pointer(tp->root, head);
2017-08-30 09:31:58 +03:00
	idr_init(&head->handle_idr);
2015-05-12 15:56:21 +03:00
	return 0;
}
static void fl_destroy_filter(struct rcu_head *head)
{
	struct cls_fl_filter *f = container_of(head, struct cls_fl_filter, rcu);
	tcf_exts_destroy(&f->exts);
	kfree(f);
}
2016-12-01 15:06:35 +03:00
static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
2016-03-08 13:42:29 +03:00
{
2017-08-07 11:15:32 +03:00
	struct tc_cls_flower_offload cls_flower = {};
2016-12-01 15:06:37 +03:00
	struct net_device *dev = f->hw_dev;
2016-03-08 13:42:29 +03:00
2017-08-09 15:30:35 +03:00
	if (!tc_can_offload(dev))
2016-03-08 13:42:29 +03:00
		return;
2017-08-07 11:15:32 +03:00
	tc_cls_common_offload_init(&cls_flower.common, tp);
	cls_flower.command = TC_CLSFLOWER_DESTROY;
	cls_flower.cookie = (unsigned long) f;
2016-03-08 13:42:29 +03:00
2017-08-07 11:15:32 +03:00
	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER, &cls_flower);
2016-03-08 13:42:29 +03:00
}
2016-06-13 12:06:39 +03:00
static int fl_hw_replace_filter(struct tcf_proto *tp,
				struct flow_dissector *dissector,
				struct fl_flow_key *mask,
2016-12-01 15:06:35 +03:00
				struct cls_fl_filter *f)
2016-03-08 13:42:29 +03:00
{
	struct net_device *dev = tp->q->dev_queue->dev;
2017-08-07 11:15:32 +03:00
	struct tc_cls_flower_offload cls_flower = {};
2016-06-13 12:06:39 +03:00
	int err;
2016-03-08 13:42:29 +03:00
2017-08-09 15:30:35 +03:00
	if (!tc_can_offload(dev)) {
2016-12-04 16:25:19 +03:00
		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
2017-08-09 15:30:35 +03:00
		    (f->hw_dev && !tc_can_offload(f->hw_dev))) {
2016-12-04 16:25:19 +03:00
			f->hw_dev = dev;
2016-12-01 15:06:37 +03:00
			return tc_skip_sw(f->flags) ? -EINVAL : 0;
2016-12-04 16:25:19 +03:00
		}
2016-12-01 15:06:37 +03:00
		dev = f->hw_dev;
2017-08-07 11:15:32 +03:00
		cls_flower.egress_dev = true;
2016-12-01 15:06:37 +03:00
	} else {
		f->hw_dev = dev;
	}
2016-03-08 13:42:29 +03:00
2017-08-07 11:15:32 +03:00
	tc_cls_common_offload_init(&cls_flower.common, tp);
	cls_flower.command = TC_CLSFLOWER_REPLACE;
	cls_flower.cookie = (unsigned long) f;
	cls_flower.dissector = dissector;
	cls_flower.mask = mask;
	cls_flower.key = &f->mkey;
	cls_flower.exts = &f->exts;
2016-03-08 13:42:29 +03:00
2017-08-07 11:15:32 +03:00
	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER,
					    &cls_flower);
2017-02-16 11:31:13 +03:00
	if (!err)
		f->flags |= TCA_CLS_FLAGS_IN_HW;
2016-06-13 12:06:39 +03:00
2016-12-01 15:06:35 +03:00
	if (tc_skip_sw(f->flags))
2016-06-13 12:06:39 +03:00
		return err;
	return 0;
2016-03-08 13:42:29 +03:00
}
2016-05-13 15:55:37 +03:00
static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
{
2017-08-07 11:15:32 +03:00
	struct tc_cls_flower_offload cls_flower = {};
2016-12-01 15:06:37 +03:00
	struct net_device *dev = f->hw_dev;
2016-05-13 15:55:37 +03:00
2017-08-09 15:30:35 +03:00
	if (!tc_can_offload(dev))
2016-05-13 15:55:37 +03:00
		return;
2017-08-07 11:15:32 +03:00
	tc_cls_common_offload_init(&cls_flower.common, tp);
	cls_flower.command = TC_CLSFLOWER_STATS;
	cls_flower.cookie = (unsigned long) f;
	cls_flower.exts = &f->exts;
2016-05-13 15:55:37 +03:00
2017-08-16 18:15:18 +03:00
	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CLSFLOWER,
2017-08-07 11:15:32 +03:00
				      &cls_flower);
2016-05-13 15:55:37 +03:00
}
2016-11-01 17:08:29 +03:00
static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
{
2017-08-30 09:31:58 +03:00
	struct cls_fl_head *head = rtnl_dereference(tp->root);
	idr_remove_ext(&head->handle_idr, f->handle);
2016-11-01 17:08:29 +03:00
	list_del_rcu(&f->list);
2016-12-01 15:06:34 +03:00
	if (!tc_skip_hw(f->flags))
2016-12-01 15:06:35 +03:00
		fl_hw_destroy_filter(tp, f);
2016-11-01 17:08:29 +03:00
	tcf_unbind_filter(tp, &f->res);
	call_rcu(&f->rcu, fl_destroy_filter);
}
2016-11-27 03:18:01 +03:00
static void fl_destroy_sleepable(struct work_struct *work)
{
	struct cls_fl_head *head = container_of(work, struct cls_fl_head,
						work);
	if (head->mask_assigned)
		rhashtable_destroy(&head->ht);
	kfree(head);
	module_put(THIS_MODULE);
}
static void fl_destroy_rcu(struct rcu_head *rcu)
{
	struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
	INIT_WORK(&head->work, fl_destroy_sleepable);
	schedule_work(&head->work);
}
2017-04-20 00:21:21 +03:00
static void fl_destroy(struct tcf_proto *tp)
2015-05-12 15:56:21 +03:00
{
	struct cls_fl_head *head = rtnl_dereference(tp->root);
	struct cls_fl_filter *f, *next;
2016-11-01 17:08:29 +03:00
	list_for_each_entry_safe(f, next, &head->filters, list)
		__fl_delete(tp, f);
2017-08-30 09:31:58 +03:00
	idr_destroy(&head->handle_idr);
2016-11-27 03:18:01 +03:00
	__module_get(THIS_MODULE);
	call_rcu(&head->rcu, fl_destroy_rcu);
2015-05-12 15:56:21 +03:00
}
2017-08-05 07:31:43 +03:00
static void *fl_get(struct tcf_proto *tp, u32 handle)
2015-05-12 15:56:21 +03:00
{
	struct cls_fl_head *head = rtnl_dereference(tp->root);
2017-08-30 09:31:58 +03:00
	return idr_find_ext(&head->handle_idr, handle);
2015-05-12 15:56:21 +03:00
}
static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
	[TCA_FLOWER_UNSPEC]		= { .type = NLA_UNSPEC },
	[TCA_FLOWER_CLASSID]		= { .type = NLA_U32 },
	[TCA_FLOWER_INDEV]		= { .type = NLA_STRING,
					    .len = IFNAMSIZ },
	[TCA_FLOWER_KEY_ETH_DST]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ETH_DST_MASK]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ETH_SRC]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ETH_SRC_MASK]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ETH_TYPE]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_IP_PROTO]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_IPV4_SRC]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_IPV4_SRC_MASK]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_IPV4_DST]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_IPV4_DST_MASK]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_IPV6_SRC]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_IPV6_SRC_MASK]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_IPV6_DST]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_IPV6_DST_MASK]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_TCP_SRC]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_TCP_DST]	= { .type = NLA_U16 },
2015-06-25 13:55:27 +03:00
	[TCA_FLOWER_KEY_UDP_SRC]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_UDP_DST]	= { .type = NLA_U16 },
2016-08-17 13:36:13 +03:00
	[TCA_FLOWER_KEY_VLAN_ID]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_VLAN_PRIO]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_VLAN_ETH_TYPE]	= { .type = NLA_U16 },
2016-09-08 16:23:47 +03:00
	[TCA_FLOWER_KEY_ENC_KEY_ID]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ENC_IPV4_SRC]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ENC_IPV4_DST]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ENC_IPV6_SRC]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_ENC_IPV6_DST]	= { .len = sizeof(struct in6_addr) },
	[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len = sizeof(struct in6_addr) },
2016-09-15 15:28:22 +03:00
	[TCA_FLOWER_KEY_TCP_SRC_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_TCP_DST_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_UDP_SRC_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_UDP_DST_MASK]	= { .type = NLA_U16 },
2016-11-03 15:24:21 +03:00
	[TCA_FLOWER_KEY_SCTP_SRC_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_SCTP_DST_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_SCTP_SRC]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_SCTP_DST]	= { .type = NLA_U16 },
2016-11-07 16:14:39 +03:00
	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]	= { .type = NLA_U16 },
	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]	= { .type = NLA_U16 },
2016-12-07 15:03:10 +03:00
	[TCA_FLOWER_KEY_FLAGS]		= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_FLAGS_MASK]	= { .type = NLA_U32 },
2016-12-07 15:48:28 +03:00
	[TCA_FLOWER_KEY_ICMPV4_TYPE]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV4_TYPE_MASK] = { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV4_CODE]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV4_CODE_MASK] = { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV6_TYPE]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV6_TYPE_MASK] = { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV6_CODE]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ICMPV6_CODE_MASK] = { .type = NLA_U8 },
2017-01-11 16:05:43 +03:00
	[TCA_FLOWER_KEY_ARP_SIP]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ARP_SIP_MASK]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ARP_TIP]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ARP_TIP_MASK]	= { .type = NLA_U32 },
	[TCA_FLOWER_KEY_ARP_OP]		= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ARP_OP_MASK]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_ARP_SHA]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ARP_SHA_MASK]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ARP_THA]	= { .len = ETH_ALEN },
	[TCA_FLOWER_KEY_ARP_THA_MASK]	= { .len = ETH_ALEN },
2017-04-22 23:52:47 +03:00
	[TCA_FLOWER_KEY_MPLS_TTL]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_MPLS_BOS]	= { .type = NLA_U8 },
	[TCA_FLOWER_KEY_MPLS_TC]	= { .type = NLA_U8 },
[ TCA_FLOWER_KEY_MPLS_LABEL ] = { . type = NLA_U32 } ,
2017-05-23 19:40:45 +03:00
[ TCA_FLOWER_KEY_TCP_FLAGS ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_TCP_FLAGS_MASK ] = { . type = NLA_U16 } ,
2017-06-01 21:37:38 +03:00
[ TCA_FLOWER_KEY_IP_TOS ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TOS_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TTL ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TTL_MASK ] = { . type = NLA_U8 } ,
2015-05-12 15:56:21 +03:00
} ;
static void fl_set_key_val(struct nlattr **tb,
			   void *val, int val_type,
			   void *mask, int mask_type, int len)
{
	if (!tb[val_type])
		return;
	memcpy(val, nla_data(tb[val_type]), len);
	if (mask_type == TCA_FLOWER_UNSPEC || !tb[mask_type])
		memset(mask, 0xff, len);
	else
		memcpy(mask, nla_data(tb[mask_type]), len);
}
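/*
 * Illustrative userspace sketch, not part of cls_flower.c. It mimics the
 * value/mask convention of fl_set_key_val(): a missing value attribute
 * leaves both buffers untouched, and a value without a mask attribute gets
 * an all-ones (exact match) mask. The names sketch_attr and
 * sketch_set_key_val are hypothetical stand-ins for struct nlattr and the
 * netlink accessors. The __KERNEL__ guard keeps this out of a kernel build.
 */
#ifndef __KERNEL__
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_attr {
	const void *data;	/* payload, NULL when the attribute is absent */
};

static void sketch_set_key_val(const struct sketch_attr *val_attr,
			       const struct sketch_attr *mask_attr,
			       void *val, void *mask, size_t len)
{
	assert(val && mask);
	if (!val_attr || !val_attr->data)
		return;				/* no value: field stays wildcarded */
	memcpy(val, val_attr->data, len);
	if (!mask_attr || !mask_attr->data)
		memset(mask, 0xff, len);	/* default to an exact match */
	else
		memcpy(mask, mask_attr->data, len);
}
#endif /* !__KERNEL__ */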
2017-05-01 16:58:40 +03:00
static int fl_set_key_mpls ( struct nlattr * * tb ,
struct flow_dissector_key_mpls * key_val ,
struct flow_dissector_key_mpls * key_mask )
2017-04-22 23:52:47 +03:00
{
if ( tb [ TCA_FLOWER_KEY_MPLS_TTL ] ) {
key_val - > mpls_ttl = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_TTL ] ) ;
key_mask - > mpls_ttl = MPLS_TTL_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_BOS ] ) {
2017-05-01 16:58:40 +03:00
u8 bos = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_BOS ] ) ;
if ( bos & ~ MPLS_BOS_MASK )
return - EINVAL ;
key_val - > mpls_bos = bos ;
2017-04-22 23:52:47 +03:00
key_mask - > mpls_bos = MPLS_BOS_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_TC ] ) {
2017-05-01 16:58:40 +03:00
u8 tc = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_TC ] ) ;
if ( tc & ~ MPLS_TC_MASK )
return - EINVAL ;
key_val - > mpls_tc = tc ;
2017-04-22 23:52:47 +03:00
key_mask - > mpls_tc = MPLS_TC_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ) {
2017-05-01 16:58:40 +03:00
u32 label = nla_get_u32 ( tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ) ;
if ( label & ~ MPLS_LABEL_MASK )
return - EINVAL ;
key_val - > mpls_label = label ;
2017-04-22 23:52:47 +03:00
key_mask - > mpls_label = MPLS_LABEL_MASK ;
}
2017-05-01 16:58:40 +03:00
return 0 ;
2017-04-22 23:52:47 +03:00
}
2016-08-17 13:36:13 +03:00
static void fl_set_key_vlan(struct nlattr **tb,
			    struct flow_dissector_key_vlan *key_val,
			    struct flow_dissector_key_vlan *key_mask)
{
#define VLAN_PRIORITY_MASK	0x7
	if (tb[TCA_FLOWER_KEY_VLAN_ID]) {
		key_val->vlan_id =
			nla_get_u16(tb[TCA_FLOWER_KEY_VLAN_ID]) & VLAN_VID_MASK;
		key_mask->vlan_id = VLAN_VID_MASK;
	}
	if (tb[TCA_FLOWER_KEY_VLAN_PRIO]) {
		key_val->vlan_priority =
			nla_get_u8(tb[TCA_FLOWER_KEY_VLAN_PRIO]) &
			VLAN_PRIORITY_MASK;
		key_mask->vlan_priority = VLAN_PRIORITY_MASK;
	}
}
2016-12-07 15:03:10 +03:00
static void fl_set_key_flag(u32 flower_key, u32 flower_mask,
			    u32 *dissector_key, u32 *dissector_mask,
			    u32 flower_flag_bit, u32 dissector_flag_bit)
{
	if (flower_mask & flower_flag_bit) {
		*dissector_mask |= dissector_flag_bit;
		if (flower_key & flower_flag_bit)
			*dissector_key |= dissector_flag_bit;
	}
}
2016-12-22 15:28:15 +03:00
static int fl_set_key_flags ( struct nlattr * * tb ,
u32 * flags_key , u32 * flags_mask )
2016-12-07 15:03:10 +03:00
{
u32 key , mask ;
2016-12-22 15:28:15 +03:00
/* mask is mandatory for flags */
if ( ! tb [ TCA_FLOWER_KEY_FLAGS_MASK ] )
return - EINVAL ;
2016-12-07 15:03:10 +03:00
key = be32_to_cpu ( nla_get_u32 ( tb [ TCA_FLOWER_KEY_FLAGS ] ) ) ;
2016-12-22 15:28:15 +03:00
mask = be32_to_cpu ( nla_get_u32 ( tb [ TCA_FLOWER_KEY_FLAGS_MASK ] ) ) ;
2016-12-07 15:03:10 +03:00
* flags_key = 0 ;
* flags_mask = 0 ;
fl_set_key_flag ( key , mask , flags_key , flags_mask ,
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT , FLOW_DIS_IS_FRAGMENT ) ;
2016-12-22 15:28:15 +03:00
return 0 ;
2016-12-07 15:03:10 +03:00
}
2017-06-01 21:37:38 +03:00
static void fl_set_key_ip(struct nlattr **tb,
			  struct flow_dissector_key_ip *key,
			  struct flow_dissector_key_ip *mask)
{
	fl_set_key_val(tb, &key->tos, TCA_FLOWER_KEY_IP_TOS,
		       &mask->tos, TCA_FLOWER_KEY_IP_TOS_MASK,
		       sizeof(key->tos));
	fl_set_key_val(tb, &key->ttl, TCA_FLOWER_KEY_IP_TTL,
		       &mask->ttl, TCA_FLOWER_KEY_IP_TTL_MASK,
		       sizeof(key->ttl));
}
2015-05-12 15:56:21 +03:00
static int fl_set_key ( struct net * net , struct nlattr * * tb ,
struct fl_flow_key * key , struct fl_flow_key * mask )
{
2016-08-17 13:36:13 +03:00
__be16 ethertype ;
2016-12-22 15:28:15 +03:00
int ret = 0 ;
2015-05-14 20:20:15 +03:00
# ifdef CONFIG_NET_CLS_IND
2015-05-12 15:56:21 +03:00
if ( tb [ TCA_FLOWER_INDEV ] ) {
2015-05-14 20:20:15 +03:00
int err = tcf_change_indev ( net , tb [ TCA_FLOWER_INDEV ] ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
return err ;
key - > indev_ifindex = err ;
mask - > indev_ifindex = 0xffffffff ;
}
2015-05-14 20:20:15 +03:00
# endif
2015-05-12 15:56:21 +03:00
fl_set_key_val ( tb , key - > eth . dst , TCA_FLOWER_KEY_ETH_DST ,
mask - > eth . dst , TCA_FLOWER_KEY_ETH_DST_MASK ,
sizeof ( key - > eth . dst ) ) ;
fl_set_key_val ( tb , key - > eth . src , TCA_FLOWER_KEY_ETH_SRC ,
mask - > eth . src , TCA_FLOWER_KEY_ETH_SRC_MASK ,
sizeof ( key - > eth . src ) ) ;
2016-01-10 19:47:01 +03:00
2016-08-26 18:25:45 +03:00
if ( tb [ TCA_FLOWER_KEY_ETH_TYPE ] ) {
2016-08-17 13:36:13 +03:00
ethertype = nla_get_be16 ( tb [ TCA_FLOWER_KEY_ETH_TYPE ] ) ;
2016-08-26 18:25:45 +03:00
if ( ethertype = = htons ( ETH_P_8021Q ) ) {
fl_set_key_vlan ( tb , & key - > vlan , & mask - > vlan ) ;
fl_set_key_val ( tb , & key - > basic . n_proto ,
TCA_FLOWER_KEY_VLAN_ETH_TYPE ,
& mask - > basic . n_proto , TCA_FLOWER_UNSPEC ,
sizeof ( key - > basic . n_proto ) ) ;
} else {
key - > basic . n_proto = ethertype ;
mask - > basic . n_proto = cpu_to_be16 ( ~ 0 ) ;
}
2016-08-17 13:36:13 +03:00
}
2016-01-10 19:47:01 +03:00
2015-05-12 15:56:21 +03:00
if ( key - > basic . n_proto = = htons ( ETH_P_IP ) | |
key - > basic . n_proto = = htons ( ETH_P_IPV6 ) ) {
fl_set_key_val ( tb , & key - > basic . ip_proto , TCA_FLOWER_KEY_IP_PROTO ,
& mask - > basic . ip_proto , TCA_FLOWER_UNSPEC ,
sizeof ( key - > basic . ip_proto ) ) ;
2017-06-01 21:37:38 +03:00
fl_set_key_ip ( tb , & key - > ip , & mask - > ip ) ;
2015-05-12 15:56:21 +03:00
}
2016-01-10 19:47:01 +03:00
if ( tb [ TCA_FLOWER_KEY_IPV4_SRC ] | | tb [ TCA_FLOWER_KEY_IPV4_DST ] ) {
key - > control . addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > control . addr_type = ~ 0 ;
2015-05-12 15:56:21 +03:00
fl_set_key_val ( tb , & key - > ipv4 . src , TCA_FLOWER_KEY_IPV4_SRC ,
& mask - > ipv4 . src , TCA_FLOWER_KEY_IPV4_SRC_MASK ,
sizeof ( key - > ipv4 . src ) ) ;
fl_set_key_val ( tb , & key - > ipv4 . dst , TCA_FLOWER_KEY_IPV4_DST ,
& mask - > ipv4 . dst , TCA_FLOWER_KEY_IPV4_DST_MASK ,
sizeof ( key - > ipv4 . dst ) ) ;
2016-01-10 19:47:01 +03:00
} else if ( tb [ TCA_FLOWER_KEY_IPV6_SRC ] | | tb [ TCA_FLOWER_KEY_IPV6_DST ] ) {
key - > control . addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > control . addr_type = ~ 0 ;
2015-05-12 15:56:21 +03:00
fl_set_key_val ( tb , & key - > ipv6 . src , TCA_FLOWER_KEY_IPV6_SRC ,
& mask - > ipv6 . src , TCA_FLOWER_KEY_IPV6_SRC_MASK ,
sizeof ( key - > ipv6 . src ) ) ;
fl_set_key_val ( tb , & key - > ipv6 . dst , TCA_FLOWER_KEY_IPV6_DST ,
& mask - > ipv6 . dst , TCA_FLOWER_KEY_IPV6_DST_MASK ,
sizeof ( key - > ipv6 . dst ) ) ;
}
2016-01-10 19:47:01 +03:00
2015-05-12 15:56:21 +03:00
if ( key - > basic . ip_proto = = IPPROTO_TCP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_TCP_SRC ,
2016-09-15 15:28:22 +03:00
& mask - > tp . src , TCA_FLOWER_KEY_TCP_SRC_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_TCP_DST ,
2016-09-15 15:28:22 +03:00
& mask - > tp . dst , TCA_FLOWER_KEY_TCP_DST_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . dst ) ) ;
2017-05-23 19:40:45 +03:00
fl_set_key_val ( tb , & key - > tcp . flags , TCA_FLOWER_KEY_TCP_FLAGS ,
& mask - > tcp . flags , TCA_FLOWER_KEY_TCP_FLAGS_MASK ,
sizeof ( key - > tcp . flags ) ) ;
2015-05-12 15:56:21 +03:00
} else if ( key - > basic . ip_proto = = IPPROTO_UDP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_UDP_SRC ,
2016-09-15 15:28:22 +03:00
& mask - > tp . src , TCA_FLOWER_KEY_UDP_SRC_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_UDP_DST ,
2016-09-15 15:28:22 +03:00
& mask - > tp . dst , TCA_FLOWER_KEY_UDP_DST_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . dst ) ) ;
2016-11-03 15:24:21 +03:00
} else if ( key - > basic . ip_proto = = IPPROTO_SCTP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_SCTP_SRC ,
& mask - > tp . src , TCA_FLOWER_KEY_SCTP_SRC_MASK ,
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_SCTP_DST ,
& mask - > tp . dst , TCA_FLOWER_KEY_SCTP_DST_MASK ,
sizeof ( key - > tp . dst ) ) ;
2016-12-07 15:48:28 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_IP ) & &
key - > basic . ip_proto = = IPPROTO_ICMP ) {
fl_set_key_val ( tb , & key - > icmp . type , TCA_FLOWER_KEY_ICMPV4_TYPE ,
& mask - > icmp . type ,
TCA_FLOWER_KEY_ICMPV4_TYPE_MASK ,
sizeof ( key - > icmp . type ) ) ;
fl_set_key_val ( tb , & key - > icmp . code , TCA_FLOWER_KEY_ICMPV4_CODE ,
& mask - > icmp . code ,
TCA_FLOWER_KEY_ICMPV4_CODE_MASK ,
sizeof ( key - > icmp . code ) ) ;
} else if ( key - > basic . n_proto = = htons ( ETH_P_IPV6 ) & &
key - > basic . ip_proto = = IPPROTO_ICMPV6 ) {
fl_set_key_val ( tb , & key - > icmp . type , TCA_FLOWER_KEY_ICMPV6_TYPE ,
& mask - > icmp . type ,
TCA_FLOWER_KEY_ICMPV6_TYPE_MASK ,
sizeof ( key - > icmp . type ) ) ;
2017-01-30 18:19:02 +03:00
fl_set_key_val ( tb , & key - > icmp . code , TCA_FLOWER_KEY_ICMPV6_CODE ,
2016-12-07 15:48:28 +03:00
& mask - > icmp . code ,
2017-01-30 18:19:02 +03:00
TCA_FLOWER_KEY_ICMPV6_CODE_MASK ,
2016-12-07 15:48:28 +03:00
sizeof ( key - > icmp . code ) ) ;
2017-04-22 23:52:47 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_MPLS_UC ) | |
key - > basic . n_proto = = htons ( ETH_P_MPLS_MC ) ) {
2017-05-01 16:58:40 +03:00
ret = fl_set_key_mpls ( tb , & key - > mpls , & mask - > mpls ) ;
if ( ret )
return ret ;
2017-01-11 16:05:43 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_ARP ) | |
key - > basic . n_proto = = htons ( ETH_P_RARP ) ) {
fl_set_key_val ( tb , & key - > arp . sip , TCA_FLOWER_KEY_ARP_SIP ,
& mask - > arp . sip , TCA_FLOWER_KEY_ARP_SIP_MASK ,
sizeof ( key - > arp . sip ) ) ;
fl_set_key_val ( tb , & key - > arp . tip , TCA_FLOWER_KEY_ARP_TIP ,
& mask - > arp . tip , TCA_FLOWER_KEY_ARP_TIP_MASK ,
sizeof ( key - > arp . tip ) ) ;
fl_set_key_val ( tb , & key - > arp . op , TCA_FLOWER_KEY_ARP_OP ,
& mask - > arp . op , TCA_FLOWER_KEY_ARP_OP_MASK ,
sizeof ( key - > arp . op ) ) ;
fl_set_key_val ( tb , key - > arp . sha , TCA_FLOWER_KEY_ARP_SHA ,
mask - > arp . sha , TCA_FLOWER_KEY_ARP_SHA_MASK ,
sizeof ( key - > arp . sha ) ) ;
fl_set_key_val ( tb , key - > arp . tha , TCA_FLOWER_KEY_ARP_THA ,
mask - > arp . tha , TCA_FLOWER_KEY_ARP_THA_MASK ,
sizeof ( key - > arp . tha ) ) ;
2015-05-12 15:56:21 +03:00
}
2016-09-08 16:23:47 +03:00
if ( tb [ TCA_FLOWER_KEY_ENC_IPV4_SRC ] | |
tb [ TCA_FLOWER_KEY_ENC_IPV4_DST ] ) {
key - > enc_control . addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > enc_control . addr_type = ~ 0 ;
2016-09-08 16:23:47 +03:00
fl_set_key_val ( tb , & key - > enc_ipv4 . src ,
TCA_FLOWER_KEY_ENC_IPV4_SRC ,
& mask - > enc_ipv4 . src ,
TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK ,
sizeof ( key - > enc_ipv4 . src ) ) ;
fl_set_key_val ( tb , & key - > enc_ipv4 . dst ,
TCA_FLOWER_KEY_ENC_IPV4_DST ,
& mask - > enc_ipv4 . dst ,
TCA_FLOWER_KEY_ENC_IPV4_DST_MASK ,
sizeof ( key - > enc_ipv4 . dst ) ) ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_IPV6_SRC ] | |
tb [ TCA_FLOWER_KEY_ENC_IPV6_DST ] ) {
key - > enc_control . addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > enc_control . addr_type = ~ 0 ;
2016-09-08 16:23:47 +03:00
fl_set_key_val ( tb , & key - > enc_ipv6 . src ,
TCA_FLOWER_KEY_ENC_IPV6_SRC ,
& mask - > enc_ipv6 . src ,
TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK ,
sizeof ( key - > enc_ipv6 . src ) ) ;
fl_set_key_val ( tb , & key - > enc_ipv6 . dst ,
TCA_FLOWER_KEY_ENC_IPV6_DST ,
& mask - > enc_ipv6 . dst ,
TCA_FLOWER_KEY_ENC_IPV6_DST_MASK ,
sizeof ( key - > enc_ipv6 . dst ) ) ;
}
fl_set_key_val ( tb , & key - > enc_key_id . keyid , TCA_FLOWER_KEY_ENC_KEY_ID ,
2016-09-27 11:21:18 +03:00
& mask - > enc_key_id . keyid , TCA_FLOWER_UNSPEC ,
2016-09-08 16:23:47 +03:00
sizeof ( key - > enc_key_id . keyid ) ) ;
2016-11-07 16:14:39 +03:00
fl_set_key_val ( tb , & key - > enc_tp . src , TCA_FLOWER_KEY_ENC_UDP_SRC_PORT ,
& mask - > enc_tp . src , TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK ,
sizeof ( key - > enc_tp . src ) ) ;
fl_set_key_val ( tb , & key - > enc_tp . dst , TCA_FLOWER_KEY_ENC_UDP_DST_PORT ,
& mask - > enc_tp . dst , TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK ,
sizeof ( key - > enc_tp . dst ) ) ;
2016-12-22 15:28:15 +03:00
if ( tb [ TCA_FLOWER_KEY_FLAGS ] )
ret = fl_set_key_flags ( tb , & key - > control . flags , & mask - > control . flags ) ;
2016-12-07 15:03:10 +03:00
2016-12-22 15:28:15 +03:00
return ret ;
2015-05-12 15:56:21 +03:00
}
static bool fl_mask_eq(struct fl_flow_mask *mask1,
		       struct fl_flow_mask *mask2)
{
	const long *lmask1 = fl_key_get_start(&mask1->key, mask1);
	const long *lmask2 = fl_key_get_start(&mask2->key, mask2);
	return !memcmp(&mask1->range, &mask2->range, sizeof(mask1->range)) &&
	       !memcmp(lmask1, lmask2, fl_mask_range(mask1));
}
static const struct rhashtable_params fl_ht_params = {
	.key_offset = offsetof(struct cls_fl_filter, mkey), /* base offset */
	.head_offset = offsetof(struct cls_fl_filter, ht_node),
	.automatic_shrinking = true,
};
static int fl_init_hashtable(struct cls_fl_head *head,
			     struct fl_flow_mask *mask)
{
	head->ht_params = fl_ht_params;
	head->ht_params.key_len = fl_mask_range(mask);
	head->ht_params.key_offset += mask->range.start;
	return rhashtable_init(&head->ht, &head->ht_params);
}
#define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
#define FL_KEY_MEMBER_SIZE(member) (sizeof(((struct fl_flow_key *) 0)->member))
2016-08-17 13:36:12 +03:00
#define FL_KEY_IS_MASKED(mask, member)					\
	memchr_inv(((char *)(mask)) + FL_KEY_MEMBER_OFFSET(member),	\
		   0, FL_KEY_MEMBER_SIZE(member))
2015-05-12 15:56:21 +03:00
#define FL_KEY_SET(keys, cnt, id, member)				\
	do {								\
		keys[cnt].key_id = id;					\
		keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);	\
		cnt++;							\
	} while (0)
2016-08-17 13:36:12 +03:00
# define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member) \
2015-05-12 15:56:21 +03:00
do { \
2016-08-17 13:36:12 +03:00
if ( FL_KEY_IS_MASKED ( mask , member ) ) \
2015-05-12 15:56:21 +03:00
FL_KEY_SET ( keys , cnt , id , member ) ; \
	} while (0)
static void fl_init_dissector ( struct cls_fl_head * head ,
struct fl_flow_mask * mask )
{
struct flow_dissector_key keys [ FLOW_DISSECTOR_KEY_MAX ] ;
size_t cnt = 0 ;
2015-06-04 19:16:39 +03:00
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_CONTROL , control ) ;
2015-05-12 15:56:21 +03:00
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_BASIC , basic ) ;
2016-08-17 13:36:12 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ETH_ADDRS , eth ) ;
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_IPV4_ADDRS , ipv4 ) ;
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_IPV6_ADDRS , ipv6 ) ;
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_PORTS , tp ) ;
2017-06-01 21:37:38 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_IP , ip ) ;
2017-05-23 19:40:45 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_TCP , tcp ) ;
2016-12-07 15:48:28 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ICMP , icmp ) ;
2017-01-11 16:05:43 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ARP , arp ) ;
2017-04-22 23:52:47 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_MPLS , mpls ) ;
2016-08-17 13:36:13 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_VLAN , vlan ) ;
2016-11-07 16:14:38 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ENC_KEYID , enc_key_id ) ;
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS , enc_ipv4 ) ;
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS , enc_ipv6 ) ;
if ( FL_KEY_IS_MASKED ( & mask - > key , enc_ipv4 ) | |
FL_KEY_IS_MASKED ( & mask - > key , enc_ipv6 ) )
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_ENC_CONTROL ,
enc_control ) ;
2016-11-07 16:14:39 +03:00
FL_KEY_SET_IF_MASKED ( & mask - > key , keys , cnt ,
FLOW_DISSECTOR_KEY_ENC_PORTS , enc_tp ) ;
2015-05-12 15:56:21 +03:00
skb_flow_dissector_init ( & head - > dissector , keys , cnt ) ;
}
static int fl_check_assign_mask(struct cls_fl_head *head,
				struct fl_flow_mask *mask)
{
	int err;
	if (head->mask_assigned) {
		if (!fl_mask_eq(&head->mask, mask))
			return -EINVAL;
		else
			return 0;
	}
	/* Mask is not assigned yet. So assign it and init hashtable
	 * according to that.
	 */
	err = fl_init_hashtable(head, mask);
	if (err)
		return err;
	memcpy(&head->mask, mask, sizeof(head->mask));
	head->mask_assigned = true;
	fl_init_dissector(head, mask);
	return 0;
}
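/*
 * Illustrative userspace sketch, not part of cls_flower.c. It models only
 * the policy of fl_check_assign_mask(): the first filter's mask is adopted
 * by the head, and every later filter must present an identical mask or be
 * rejected with -EINVAL. The real function also compares the mask range and
 * sets up the hashtable and dissector, which this sketch omits. The names
 * sketch_head and sketch_check_assign_mask, and the 8-byte mask size, are
 * hypothetical.
 */
#ifndef __KERNEL__
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct sketch_head {
	unsigned char mask[8];	/* mask shared by all filters on this head */
	bool mask_assigned;
};

static int sketch_check_assign_mask(struct sketch_head *head,
				    const unsigned char mask[8])
{
	assert(head && mask);
	if (head->mask_assigned)	/* later filters must match exactly */
		return memcmp(head->mask, mask, sizeof(head->mask)) ?
		       -EINVAL : 0;
	memcpy(head->mask, mask, sizeof(head->mask));	/* first filter wins */
	head->mask_assigned = true;
	return 0;
}
#endif /* !__KERNEL__ */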
static int fl_set_parms ( struct net * net , struct tcf_proto * tp ,
struct cls_fl_filter * f , struct fl_flow_mask * mask ,
unsigned long base , struct nlattr * * tb ,
struct nlattr * est , bool ovr )
{
int err ;
2017-08-04 15:29:06 +03:00
err = tcf_exts_validate ( net , tp , tb , est , & f - > exts , ovr ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
return err ;
if ( tb [ TCA_FLOWER_CLASSID ] ) {
f - > res . classid = nla_get_u32 ( tb [ TCA_FLOWER_CLASSID ] ) ;
tcf_bind_filter ( tp , & f - > res , base ) ;
}
err = fl_set_key ( net , tb , & f - > key , & mask - > key ) ;
if ( err )
2017-08-04 15:29:06 +03:00
return err ;
2015-05-12 15:56:21 +03:00
fl_mask_update_range ( mask ) ;
fl_set_masked_key ( & f - > mkey , & f - > key , mask ) ;
return 0 ;
}
static int fl_change ( struct net * net , struct sk_buff * in_skb ,
struct tcf_proto * tp , unsigned long base ,
u32 handle , struct nlattr * * tca ,
2017-08-05 07:31:43 +03:00
void * * arg , bool ovr )
2015-05-12 15:56:21 +03:00
{
struct cls_fl_head * head = rtnl_dereference ( tp - > root ) ;
2017-08-05 07:31:43 +03:00
struct cls_fl_filter * fold = * arg ;
2015-05-12 15:56:21 +03:00
struct cls_fl_filter * fnew ;
2017-01-19 12:45:31 +03:00
struct nlattr * * tb ;
2015-05-12 15:56:21 +03:00
struct fl_flow_mask mask = { } ;
2017-08-30 09:31:58 +03:00
unsigned long idr_index ;
2015-05-12 15:56:21 +03:00
int err ;
if ( ! tca [ TCA_OPTIONS ] )
return - EINVAL ;
2017-01-19 12:45:31 +03:00
tb = kcalloc ( TCA_FLOWER_MAX + 1 , sizeof ( struct nlattr * ) , GFP_KERNEL ) ;
if ( ! tb )
return - ENOBUFS ;
2017-04-12 15:34:07 +03:00
err = nla_parse_nested ( tb , TCA_FLOWER_MAX , tca [ TCA_OPTIONS ] ,
fl_policy , NULL ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
2017-01-19 12:45:31 +03:00
goto errout_tb ;
2015-05-12 15:56:21 +03:00
2017-01-19 12:45:31 +03:00
if ( fold & & handle & & fold - > handle ! = handle ) {
err = - EINVAL ;
goto errout_tb ;
}
2015-05-12 15:56:21 +03:00
fnew = kzalloc ( sizeof ( * fnew ) , GFP_KERNEL ) ;
2017-01-19 12:45:31 +03:00
if ( ! fnew ) {
err = - ENOBUFS ;
goto errout_tb ;
}
2015-05-12 15:56:21 +03:00
2016-08-19 22:36:54 +03:00
err = tcf_exts_init ( & fnew - > exts , TCA_FLOWER_ACT , 0 ) ;
if ( err < 0 )
goto errout ;
2015-05-12 15:56:21 +03:00
if ( ! handle ) {
2017-08-30 09:31:58 +03:00
err = idr_alloc_ext ( & head - > handle_idr , fnew , & idr_index ,
1 , 0x80000000 , GFP_KERNEL ) ;
if ( err )
2015-05-12 15:56:21 +03:00
goto errout ;
2017-08-30 09:31:58 +03:00
fnew - > handle = idr_index ;
}
/* user specifies a handle and it doesn't exist */
if ( handle & & ! fold ) {
err = idr_alloc_ext ( & head - > handle_idr , fnew , & idr_index ,
handle , handle + 1 , GFP_KERNEL ) ;
if ( err )
goto errout ;
fnew - > handle = idr_index ;
2015-05-12 15:56:21 +03:00
}
2016-06-05 17:11:18 +03:00
if ( tb [ TCA_FLOWER_FLAGS ] ) {
fnew - > flags = nla_get_u32 ( tb [ TCA_FLOWER_FLAGS ] ) ;
if ( ! tc_flags_valid ( fnew - > flags ) ) {
err = - EINVAL ;
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2016-06-05 17:11:18 +03:00
}
}
2016-03-08 13:42:29 +03:00
2015-05-12 15:56:21 +03:00
err = fl_set_parms ( net , tp , fnew , & mask , base , tb , tca [ TCA_RATE ] , ovr ) ;
if ( err )
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2015-05-12 15:56:21 +03:00
err = fl_check_assign_mask ( head , & mask ) ;
if ( err )
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2015-05-12 15:56:21 +03:00
2016-06-13 12:06:39 +03:00
if ( ! tc_skip_sw ( fnew - > flags ) ) {
2017-01-16 11:45:13 +03:00
if ( ! fold & & fl_lookup ( head , & fnew - > mkey ) ) {
err = - EEXIST ;
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2017-01-16 11:45:13 +03:00
}
2016-06-05 17:11:18 +03:00
err = rhashtable_insert_fast ( & head - > ht , & fnew - > ht_node ,
head - > ht_params ) ;
if ( err )
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2016-06-05 17:11:18 +03:00
}
2016-03-08 13:42:29 +03:00
2016-12-01 15:06:34 +03:00
if ( ! tc_skip_hw ( fnew - > flags ) ) {
err = fl_hw_replace_filter ( tp ,
& head - > dissector ,
& mask . key ,
2016-12-01 15:06:35 +03:00
fnew ) ;
2016-12-01 15:06:34 +03:00
if ( err )
2017-09-20 19:18:45 +03:00
goto errout_idr ;
2016-12-01 15:06:34 +03:00
}
2016-03-08 13:42:29 +03:00
2017-02-16 11:31:13 +03:00
if ( ! tc_in_hw ( fnew - > flags ) )
fnew - > flags | = TCA_CLS_FLAGS_NOT_IN_HW ;
2016-03-08 13:42:29 +03:00
if ( fold ) {
2016-11-28 17:40:13 +03:00
if ( ! tc_skip_sw ( fold - > flags ) )
rhashtable_remove_fast ( & head - > ht , & fold - > ht_node ,
head - > ht_params ) ;
2016-12-01 15:06:34 +03:00
if ( ! tc_skip_hw ( fold - > flags ) )
2016-12-01 15:06:35 +03:00
fl_hw_destroy_filter ( tp , fold ) ;
2016-03-08 13:42:29 +03:00
}
2015-05-12 15:56:21 +03:00
2017-08-05 07:31:43 +03:00
* arg = fnew ;
2015-05-12 15:56:21 +03:00
if ( fold ) {
2017-08-30 09:31:58 +03:00
fnew - > handle = handle ;
idr_replace_ext ( & head - > handle_idr , fnew , fnew - > handle ) ;
2015-07-17 23:38:44 +03:00
list_replace_rcu ( & fold - > list , & fnew - > list ) ;
2015-05-12 15:56:21 +03:00
tcf_unbind_filter ( tp , & fold - > res ) ;
call_rcu ( & fold - > rcu , fl_destroy_filter ) ;
} else {
list_add_tail_rcu ( & fnew - > list , & head - > filters ) ;
}
2017-01-19 12:45:31 +03:00
kfree ( tb ) ;
2015-05-12 15:56:21 +03:00
return 0 ;
2017-09-20 19:18:45 +03:00
errout_idr :
if ( fnew - > handle )
idr_remove_ext ( & head - > handle_idr , fnew - > handle ) ;
2015-05-12 15:56:21 +03:00
errout :
2016-08-19 22:36:54 +03:00
tcf_exts_destroy ( & fnew - > exts ) ;
2015-05-12 15:56:21 +03:00
kfree ( fnew ) ;
2017-01-19 12:45:31 +03:00
errout_tb :
kfree ( tb ) ;
2015-05-12 15:56:21 +03:00
return err ;
}
2017-08-05 07:31:43 +03:00
static int fl_delete ( struct tcf_proto * tp , void * arg , bool * last )
2015-05-12 15:56:21 +03:00
{
struct cls_fl_head * head = rtnl_dereference ( tp - > root ) ;
2017-08-05 07:31:43 +03:00
struct cls_fl_filter * f = arg ;
2015-05-12 15:56:21 +03:00
2016-11-28 17:40:13 +03:00
if ( ! tc_skip_sw ( f - > flags ) )
rhashtable_remove_fast ( & head - > ht , & f - > ht_node ,
head - > ht_params ) ;
2016-11-01 17:08:29 +03:00
__fl_delete ( tp , f ) ;
2017-04-20 00:21:21 +03:00
* last = list_empty ( & head - > filters ) ;
2015-05-12 15:56:21 +03:00
return 0 ;
}
static void fl_walk ( struct tcf_proto * tp , struct tcf_walker * arg )
{
struct cls_fl_head * head = rtnl_dereference ( tp - > root ) ;
struct cls_fl_filter * f ;
list_for_each_entry_rcu ( f , & head - > filters , list ) {
if ( arg - > count < arg - > skip )
goto skip ;
2017-08-05 07:31:43 +03:00
if ( arg - > fn ( tp , f , arg ) < 0 ) {
2015-05-12 15:56:21 +03:00
arg - > stop = 1 ;
break ;
}
skip :
arg - > count + + ;
}
}
static int fl_dump_key_val(struct sk_buff *skb,
			   void *val, int val_type,
			   void *mask, int mask_type, int len)
{
	int err;
	if (!memchr_inv(mask, 0, len))
		return 0;
	err = nla_put(skb, val_type, len, val);
	if (err)
		return err;
	if (mask_type != TCA_FLOWER_UNSPEC) {
		err = nla_put(skb, mask_type, len, mask);
		if (err)
			return err;
	}
	return 0;
}
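/*
 * Illustrative userspace sketch, not part of cls_flower.c. It shows the
 * decision fl_dump_key_val() makes before emitting anything: a field whose
 * mask is all zeroes is wildcarded and is skipped entirely; otherwise the
 * value is dumped, followed by the mask when the field has a mask
 * attribute. memchr_inv() is kernel-only, so the hypothetical helper
 * sketch_mask_is_zero() stands in for !memchr_inv(mask, 0, len).
 * sketch_dump_count() is likewise a made-up name.
 */
#ifndef __KERNEL__
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool sketch_mask_is_zero(const void *mask, size_t len)
{
	const unsigned char *p = mask;
	size_t i;

	assert(mask);
	for (i = 0; i < len; i++)
		if (p[i])
			return false;	/* found a byte that takes part in matching */
	return true;
}

/* Number of attributes that would be emitted for one field. */
static int sketch_dump_count(const void *mask, size_t len, bool has_mask_attr)
{
	if (sketch_mask_is_zero(mask, len))
		return 0;		/* wildcarded field: dump nothing */
	return has_mask_attr ? 2 : 1;	/* value, plus mask when one exists */
}
#endif /* !__KERNEL__ */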
2017-04-22 23:52:47 +03:00
static int fl_dump_key_mpls(struct sk_buff *skb,
			    struct flow_dissector_key_mpls *mpls_key,
			    struct flow_dissector_key_mpls *mpls_mask)
{
	int err;
	if (!memchr_inv(mpls_mask, 0, sizeof(*mpls_mask)))
		return 0;
	if (mpls_mask->mpls_ttl) {
		err = nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_TTL,
				 mpls_key->mpls_ttl);
		if (err)
			return err;
	}
	if (mpls_mask->mpls_tc) {
		err = nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_TC,
				 mpls_key->mpls_tc);
		if (err)
			return err;
	}
	if (mpls_mask->mpls_label) {
		err = nla_put_u32(skb, TCA_FLOWER_KEY_MPLS_LABEL,
				  mpls_key->mpls_label);
		if (err)
			return err;
	}
	if (mpls_mask->mpls_bos) {
		err = nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_BOS,
				 mpls_key->mpls_bos);
		if (err)
			return err;
	}
	return 0;
}
2017-06-01 21:37:38 +03:00
static int fl_dump_key_ip(struct sk_buff *skb,
			  struct flow_dissector_key_ip *key,
			  struct flow_dissector_key_ip *mask)
{
	if (fl_dump_key_val(skb, &key->tos, TCA_FLOWER_KEY_IP_TOS, &mask->tos,
			    TCA_FLOWER_KEY_IP_TOS_MASK, sizeof(key->tos)) ||
	    fl_dump_key_val(skb, &key->ttl, TCA_FLOWER_KEY_IP_TTL, &mask->ttl,
			    TCA_FLOWER_KEY_IP_TTL_MASK, sizeof(key->ttl)))
		return -1;
	return 0;
}
2016-08-17 13:36:13 +03:00
static int fl_dump_key_vlan(struct sk_buff *skb,
			    struct flow_dissector_key_vlan *vlan_key,
			    struct flow_dissector_key_vlan *vlan_mask)
{
	int err;
	if (!memchr_inv(vlan_mask, 0, sizeof(*vlan_mask)))
		return 0;
	if (vlan_mask->vlan_id) {
		err = nla_put_u16(skb, TCA_FLOWER_KEY_VLAN_ID,
				  vlan_key->vlan_id);
		if (err)
			return err;
	}
	if (vlan_mask->vlan_priority) {
		err = nla_put_u8(skb, TCA_FLOWER_KEY_VLAN_PRIO,
				 vlan_key->vlan_priority);
		if (err)
			return err;
	}
	return 0;
}
2016-12-07 15:03:10 +03:00
static void fl_get_key_flag(u32 dissector_key, u32 dissector_mask,
			    u32 *flower_key, u32 *flower_mask,
			    u32 flower_flag_bit, u32 dissector_flag_bit)
{
	if (dissector_mask & dissector_flag_bit) {
		*flower_mask |= flower_flag_bit;
		if (dissector_key & dissector_flag_bit)
			*flower_key |= flower_flag_bit;
	}
}
static int fl_dump_key_flags(struct sk_buff *skb, u32 flags_key, u32 flags_mask)
{
	u32 key, mask;
	__be32 _key, _mask;
	int err;
	if (!memchr_inv(&flags_mask, 0, sizeof(flags_mask)))
		return 0;
	key = 0;
	mask = 0;
	fl_get_key_flag(flags_key, flags_mask, &key, &mask,
			TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
	_key = cpu_to_be32(key);
	_mask = cpu_to_be32(mask);
	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS, 4, &_key);
	if (err)
		return err;
	return nla_put(skb, TCA_FLOWER_KEY_FLAGS_MASK, 4, &_mask);
}
2017-08-05 07:31:43 +03:00
static int fl_dump ( struct net * net , struct tcf_proto * tp , void * fh ,
2015-05-12 15:56:21 +03:00
struct sk_buff * skb , struct tcmsg * t )
{
struct cls_fl_head * head = rtnl_dereference ( tp - > root ) ;
2017-08-05 07:31:43 +03:00
struct cls_fl_filter * f = fh ;
2015-05-12 15:56:21 +03:00
struct nlattr * nest ;
struct fl_flow_key * key , * mask ;
if ( ! f )
return skb - > len ;
t - > tcm_handle = f - > handle ;
nest = nla_nest_start ( skb , TCA_OPTIONS ) ;
if ( ! nest )
goto nla_put_failure ;
if ( f - > res . classid & &
nla_put_u32 ( skb , TCA_FLOWER_CLASSID , f - > res . classid ) )
goto nla_put_failure ;
key = & f - > key ;
mask = & head - > mask . key ;
if ( mask - > indev_ifindex ) {
struct net_device * dev ;
dev = __dev_get_by_index ( net , key - > indev_ifindex ) ;
if ( dev & & nla_put_string ( skb , TCA_FLOWER_INDEV , dev - > name ) )
goto nla_put_failure ;
}
2016-12-01 15:06:34 +03:00
if ( ! tc_skip_hw ( f - > flags ) )
fl_hw_update_stats ( tp , f ) ;
2016-05-13 15:55:37 +03:00
2015-05-12 15:56:21 +03:00
if ( fl_dump_key_val ( skb , key - > eth . dst , TCA_FLOWER_KEY_ETH_DST ,
mask - > eth . dst , TCA_FLOWER_KEY_ETH_DST_MASK ,
sizeof ( key - > eth . dst ) ) | |
fl_dump_key_val ( skb , key - > eth . src , TCA_FLOWER_KEY_ETH_SRC ,
mask - > eth . src , TCA_FLOWER_KEY_ETH_SRC_MASK ,
sizeof ( key - > eth . src ) ) | |
fl_dump_key_val ( skb , & key - > basic . n_proto , TCA_FLOWER_KEY_ETH_TYPE ,
& mask - > basic . n_proto , TCA_FLOWER_UNSPEC ,
sizeof ( key - > basic . n_proto ) ) )
goto nla_put_failure ;
2016-08-17 13:36:13 +03:00
2017-04-22 23:52:47 +03:00
if ( fl_dump_key_mpls ( skb , & key - > mpls , & mask - > mpls ) )
goto nla_put_failure ;
2016-08-17 13:36:13 +03:00
if ( fl_dump_key_vlan ( skb , & key - > vlan , & mask - > vlan ) )
goto nla_put_failure ;
2015-05-12 15:56:21 +03:00
if ( ( key - > basic . n_proto = = htons ( ETH_P_IP ) | |
key - > basic . n_proto = = htons ( ETH_P_IPV6 ) ) & &
2017-06-01 21:37:38 +03:00
( fl_dump_key_val ( skb , & key - > basic . ip_proto , TCA_FLOWER_KEY_IP_PROTO ,
2015-05-12 15:56:21 +03:00
& mask - > basic . ip_proto , TCA_FLOWER_UNSPEC ,
2017-06-01 21:37:38 +03:00
sizeof ( key - > basic . ip_proto ) ) | |
fl_dump_key_ip ( skb , & key - > ip , & mask - > ip ) ) )
2015-05-12 15:56:21 +03:00
goto nla_put_failure ;
2015-06-04 19:16:40 +03:00
	if (key->control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
2015-05-12 15:56:21 +03:00
	    (fl_dump_key_val(skb, &key->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC,
			     &mask->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC_MASK,
			     sizeof(key->ipv4.src)) ||
	     fl_dump_key_val(skb, &key->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST,
			     &mask->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST_MASK,
			     sizeof(key->ipv4.dst))))
		goto nla_put_failure;
2015-06-04 19:16:40 +03:00
	else if (key->control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
2015-05-12 15:56:21 +03:00
		 (fl_dump_key_val(skb, &key->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC,
				  &mask->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC_MASK,
				  sizeof(key->ipv6.src)) ||
		  fl_dump_key_val(skb, &key->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST,
				  &mask->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST_MASK,
				  sizeof(key->ipv6.dst))))
		goto nla_put_failure;
	if (key->basic.ip_proto == IPPROTO_TCP &&
	    (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_TCP_SRC,
2016-09-15 15:28:22 +03:00
			     &mask->tp.src, TCA_FLOWER_KEY_TCP_SRC_MASK,
2015-05-12 15:56:21 +03:00
			     sizeof(key->tp.src)) ||
	     fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
2016-09-15 15:28:22 +03:00
			     &mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
2017-05-23 19:40:45 +03:00
			     sizeof(key->tp.dst)) ||
	     fl_dump_key_val(skb, &key->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS,
			     &mask->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS_MASK,
			     sizeof(key->tcp.flags))))
2015-05-12 15:56:21 +03:00
		goto nla_put_failure;
	else if (key->basic.ip_proto == IPPROTO_UDP &&
		 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
2016-09-15 15:28:22 +03:00
				  &mask->tp.src, TCA_FLOWER_KEY_UDP_SRC_MASK,
2015-05-12 15:56:21 +03:00
				  sizeof(key->tp.src)) ||
		  fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_UDP_DST,
2016-09-15 15:28:22 +03:00
				  &mask->tp.dst, TCA_FLOWER_KEY_UDP_DST_MASK,
2016-11-03 15:24:21 +03:00
				  sizeof(key->tp.dst))))
		goto nla_put_failure;
	else if (key->basic.ip_proto == IPPROTO_SCTP &&
		 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_SCTP_SRC,
				  &mask->tp.src, TCA_FLOWER_KEY_SCTP_SRC_MASK,
				  sizeof(key->tp.src)) ||
		  fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_SCTP_DST,
				  &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
2015-05-12 15:56:21 +03:00
				  sizeof(key->tp.dst))))
		goto nla_put_failure;
2016-12-07 15:48:28 +03:00
	else if (key->basic.n_proto == htons(ETH_P_IP) &&
		 key->basic.ip_proto == IPPROTO_ICMP &&
		 (fl_dump_key_val(skb, &key->icmp.type,
				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
				  sizeof(key->icmp.type)) ||
		  fl_dump_key_val(skb, &key->icmp.code,
				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
				  sizeof(key->icmp.code))))
		goto nla_put_failure;
	else if (key->basic.n_proto == htons(ETH_P_IPV6) &&
		 key->basic.ip_proto == IPPROTO_ICMPV6 &&
		 (fl_dump_key_val(skb, &key->icmp.type,
				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
				  sizeof(key->icmp.type)) ||
		  fl_dump_key_val(skb, &key->icmp.code,
				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
				  sizeof(key->icmp.code))))
		goto nla_put_failure;
2017-01-11 16:05:43 +03:00
	else if ((key->basic.n_proto == htons(ETH_P_ARP) ||
		  key->basic.n_proto == htons(ETH_P_RARP)) &&
		 (fl_dump_key_val(skb, &key->arp.sip,
				  TCA_FLOWER_KEY_ARP_SIP, &mask->arp.sip,
				  TCA_FLOWER_KEY_ARP_SIP_MASK,
				  sizeof(key->arp.sip)) ||
		  fl_dump_key_val(skb, &key->arp.tip,
				  TCA_FLOWER_KEY_ARP_TIP, &mask->arp.tip,
				  TCA_FLOWER_KEY_ARP_TIP_MASK,
				  sizeof(key->arp.tip)) ||
		  fl_dump_key_val(skb, &key->arp.op,
				  TCA_FLOWER_KEY_ARP_OP, &mask->arp.op,
				  TCA_FLOWER_KEY_ARP_OP_MASK,
				  sizeof(key->arp.op)) ||
		  fl_dump_key_val(skb, key->arp.sha, TCA_FLOWER_KEY_ARP_SHA,
				  mask->arp.sha, TCA_FLOWER_KEY_ARP_SHA_MASK,
				  sizeof(key->arp.sha)) ||
		  fl_dump_key_val(skb, key->arp.tha, TCA_FLOWER_KEY_ARP_THA,
				  mask->arp.tha, TCA_FLOWER_KEY_ARP_THA_MASK,
				  sizeof(key->arp.tha))))
		goto nla_put_failure;
2015-05-12 15:56:21 +03:00
2016-09-08 16:23:47 +03:00
	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
	    (fl_dump_key_val(skb, &key->enc_ipv4.src,
			     TCA_FLOWER_KEY_ENC_IPV4_SRC, &mask->enc_ipv4.src,
			     TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
			     sizeof(key->enc_ipv4.src)) ||
	     fl_dump_key_val(skb, &key->enc_ipv4.dst,
			     TCA_FLOWER_KEY_ENC_IPV4_DST, &mask->enc_ipv4.dst,
			     TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
			     sizeof(key->enc_ipv4.dst))))
		goto nla_put_failure;
	else if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
		 (fl_dump_key_val(skb, &key->enc_ipv6.src,
				  TCA_FLOWER_KEY_ENC_IPV6_SRC, &mask->enc_ipv6.src,
				  TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
				  sizeof(key->enc_ipv6.src)) ||
		  fl_dump_key_val(skb, &key->enc_ipv6.dst,
				  TCA_FLOWER_KEY_ENC_IPV6_DST,
				  &mask->enc_ipv6.dst,
				  TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
				  sizeof(key->enc_ipv6.dst))))
		goto nla_put_failure;
	if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
2016-09-27 11:21:18 +03:00
			    &mask->enc_key_id, TCA_FLOWER_UNSPEC,
2016-11-07 16:14:39 +03:00
			    sizeof(key->enc_key_id)) ||
	    fl_dump_key_val(skb, &key->enc_tp.src,
			    TCA_FLOWER_KEY_ENC_UDP_SRC_PORT,
			    &mask->enc_tp.src,
			    TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,
			    sizeof(key->enc_tp.src)) ||
	    fl_dump_key_val(skb, &key->enc_tp.dst,
			    TCA_FLOWER_KEY_ENC_UDP_DST_PORT,
			    &mask->enc_tp.dst,
			    TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
			    sizeof(key->enc_tp.dst)))
2016-09-08 16:23:47 +03:00
		goto nla_put_failure;
2016-12-07 15:03:10 +03:00
	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
		goto nla_put_failure;
2017-02-16 11:31:10 +03:00
	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
		goto nla_put_failure;
2016-06-05 17:11:18 +03:00
2015-05-12 15:56:21 +03:00
	if (tcf_exts_dump(skb, &f->exts))
		goto nla_put_failure;
	nla_nest_end(skb, nest);
	if (tcf_exts_dump_stats(skb, &f->exts) < 0)
		goto nla_put_failure;
	return skb->len;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -1;
}
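The dump path above emits each match field as a key attribute, followed by a mask attribute where one is defined. The helper fl_dump_key_val() is not shown in this section, so the following is only a minimal userspace sketch of that pattern. It assumes the helper skips a field whose mask is all zero and omits the mask attribute when the mask type is the UNSPEC placeholder. The names struct buf, put_attr(), dump_key_val(), and the ATTR_* constants are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for netlink attribute types. */
enum {
	ATTR_UNSPEC = 0,
	ATTR_KEY_IP_PROTO = 1,
	ATTR_KEY_TCP_SRC = 2,
	ATTR_KEY_TCP_SRC_MASK = 3,
};

/* Hypothetical fixed-size message buffer standing in for an skb. */
struct buf {
	unsigned char data[64];
	size_t len;
};

/* Append one (type, length, payload) record; returns -1 when out of room. */
static int put_attr(struct buf *b, int type, const void *val, size_t len)
{
	if (len > 255 || b->len + 2 + len > sizeof(b->data))
		return -1;
	b->data[b->len++] = (unsigned char)type;
	b->data[b->len++] = (unsigned char)len;
	memcpy(b->data + b->len, val, len);
	b->len += len;
	return 0;
}

/* Returns 1 when every byte of the mask is zero (field is a wildcard). */
static int mask_is_zero(const void *mask, size_t len)
{
	const unsigned char *p = mask;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Assumed behaviour: skip wildcard fields entirely, always emit the key,
 * and emit the mask only when a real mask attribute type is given.
 */
static int dump_key_val(struct buf *b, const void *val, int val_type,
			const void *mask, int mask_type, size_t len)
{
	assert(b && val && mask);
	if (mask_is_zero(mask, len))
		return 0;
	if (put_attr(b, val_type, val, len))
		return -1;
	if (mask_type != ATTR_UNSPEC && put_attr(b, mask_type, mask, len))
		return -1;
	return 0;
}
```

Under these assumptions a field that was never configured costs nothing in the dump, and exact-match fields such as the IP protocol carry no separate mask attribute.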
net_sched: add reverse binding for tc class
TC filters, when used as classifiers, are bound to TC classes.
However, there is a hidden difference depending on the order in which
they are added:
1. If we add tc classes before their filters, everything is fine.
Logically, the classes exist before we specify their IDs in
filters, so it is easy to bind them together, just as in the current
code base.
2. If we add tc filters before the tc classes they bind to, we have to
do a dynamic lookup in the fast path. What's worse, this happens every
time, not just once: on the fast path tcf_result is passed on the
stack, so there is no way to propagate it back to the one in the tc filter.
This hidden difference silently hurts performance if we have many tc
classes in the hierarchy.
This patch intends to close this gap by doing the reverse binding when
we create a new class: in this case we can search all the
filters in its parent, match them, and fix them up by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new op for each filter to tell how to bind the class.
Note that we still can NOT totally get rid of the class lookup in
->enqueue(), because cgroup and flow filters have no way to determine
the classid at setup time; they still have to go through dynamic lookup.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31 00:30:36 +03:00
static void fl_bind_class(void *fh, u32 classid, unsigned long cl)
{
	struct cls_fl_filter *f = fh;
	if (f && f->res.classid == classid)
		f->res.class = cl;
}
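The reverse binding described in the commit message can be sketched in userspace as follows. This is a simplified illustration, not the kernel implementation: struct filter, bind_filter(), and bind_new_class() are hypothetical names, and the plain array walk stands in for whatever traversal the core uses to visit the filters attached to the parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counterpart of a filter's cached result. */
struct filter {
	uint32_t classid;	/* class the filter was configured to select */
	unsigned long class_;	/* opaque class handle, 0 while unbound */
};

/* Same shape as fl_bind_class(): bind only when the classid matches. */
static void bind_filter(struct filter *f, uint32_t classid, unsigned long cl)
{
	if (f && f->classid == classid)
		f->class_ = cl;
}

/*
 * Called when a class is created after its filters: visit every filter
 * and fix up the ones that name this classid. Returns how many filters
 * matched, so later lookups for them need no dynamic search.
 */
static size_t bind_new_class(struct filter *filters, size_t n,
			     uint32_t classid, unsigned long cl)
{
	size_t i, bound = 0;

	assert(filters != NULL || n == 0);
	for (i = 0; i < n; i++) {
		bind_filter(&filters[i], classid, cl);
		if (filters[i].classid == classid)
			bound++;
	}
	return bound;
}
```

In this sketch the cost of matching is paid once, at class creation, instead of on every packet that hits a filter whose class handle was never filled in.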
2015-05-12 15:56:21 +03:00
static struct tcf_proto_ops cls_fl_ops __read_mostly = {
	.kind		= "flower",
	.classify	= fl_classify,
	.init		= fl_init,
	.destroy	= fl_destroy,
	.get		= fl_get,
	.change		= fl_change,
	.delete		= fl_delete,
	.walk		= fl_walk,
	.dump		= fl_dump,
2017-08-31 00:30:36 +03:00
	.bind_class	= fl_bind_class,
2015-05-12 15:56:21 +03:00
	.owner		= THIS_MODULE,
};
static int __init cls_fl_init(void)
{
	return register_tcf_proto_ops(&cls_fl_ops);
}
static void __exit cls_fl_exit(void)
{
	unregister_tcf_proto_ops(&cls_fl_ops);
}
module_init(cls_fl_init);
module_exit(cls_fl_exit);
MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
MODULE_DESCRIPTION("Flower classifier");
MODULE_LICENSE("GPL v2");