2019-05-27 09:55:01 +03:00
// SPDX-License-Identifier: GPL-2.0-or-later
2015-05-12 15:56:21 +03:00
/*
 * net/sched/cls_flower.c		Flower classifier
 *
 * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
 */

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/rhashtable.h>
net, sched: respect rcu grace period on cls destruction
Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.
The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then, so that was indeed a valid case, after
d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); but the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.
In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.
Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in fast-path and, 2) that
outstanding readers of that tp in tc_classify() can still execute under
respect with RCU grace period as it is actually expected.
Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)
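The deferral described above can be illustrated with a small userspace toy. This is only a sketch under loud assumptions: it is single-threaded, every `toy_*` name is hypothetical, and its "grace period" simply waits until no reader is active at all, which is cruder than real RCU (which only waits for readers that started before the update). It shows the ordering that matters: the writer retires the object, and destruction runs only after readers are gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a classifier head; NOT the kernel's struct cls_fl_head. */
struct toy_head {
	int table;                     /* pretend this is the hash table */
	bool freed;                    /* set instead of free(), so callers can inspect it */
	struct toy_head *next_retired; /* link on the retired list */
};

static int toy_active_readers;            /* readers inside a read-side section */
static struct toy_head *toy_retired_list; /* retired, waiting for readers to leave */

static void toy_read_lock(void)
{
	toy_active_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_active_readers > 0);
	toy_active_readers--;
}

/* Writer side: queue the object instead of tearing it down immediately. */
static void toy_retire(struct toy_head *h)
{
	h->next_retired = toy_retired_list;
	toy_retired_list = h;
}

/*
 * Deferred destruction: does nothing while any reader is active,
 * otherwise "frees" everything on the retired list.
 * Returns the number of objects reclaimed.
 */
static int toy_reclaim(void)
{
	int reclaimed = 0;

	if (toy_active_readers > 0)
		return 0;

	while (toy_retired_list) {
		struct toy_head *h = toy_retired_list;

		toy_retired_list = h->next_retired;
		h->freed = true;
		reclaimed++;
	}
	return reclaimed;
}
```

A caller that retires an object while a reader is inside `toy_read_lock()` sees `toy_reclaim()` return 0; once the reader calls `toy_read_unlock()`, the next `toy_reclaim()` marks the object as freed.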
This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independent
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 03:18:01 +03:00
# include <linux/workqueue.h>
2019-03-21 16:17:35 +03:00
# include <linux/refcount.h>
2015-05-12 15:56:21 +03:00
# include <linux/if_ether.h>
# include <linux/in6.h>
# include <linux/ip.h>
2017-04-22 23:52:47 +03:00
# include <linux/mpls.h>
2015-05-12 15:56:21 +03:00
# include <net/sch_generic.h>
# include <net/pkt_cls.h>
2021-12-14 20:24:33 +03:00
# include <net/pkt_sched.h>
2015-05-12 15:56:21 +03:00
# include <net/ip.h>
# include <net/flow_dissector.h>
2018-08-07 18:36:01 +03:00
# include <net/geneve.h>
2019-11-21 13:03:28 +03:00
# include <net/vxlan.h>
2019-11-21 13:03:29 +03:00
# include <net/erspan.h>
2022-03-04 19:40:45 +03:00
# include <net/gtp.h>
2015-05-12 15:56:21 +03:00
2016-09-08 16:23:47 +03:00
# include <net/dst.h>
# include <net/dst_metadata.h>
2019-07-09 10:30:50 +03:00
# include <uapi/linux/netfilter/nf_conntrack_common.h>
#define TCA_FLOWER_KEY_CT_FLAGS_MAX \
		((__TCA_FLOWER_KEY_CT_FLAGS_MAX - 1) << 1)
#define TCA_FLOWER_KEY_CT_FLAGS_MASK \
		(TCA_FLOWER_KEY_CT_FLAGS_MAX - 1)

struct fl_flow_key {
	struct flow_dissector_key_meta meta;
	struct flow_dissector_key_control control;
	struct flow_dissector_key_control enc_control;
	struct flow_dissector_key_basic basic;
	struct flow_dissector_key_eth_addrs eth;
	struct flow_dissector_key_vlan vlan;
	struct flow_dissector_key_vlan cvlan;
	union {
		struct flow_dissector_key_ipv4_addrs ipv4;
		struct flow_dissector_key_ipv6_addrs ipv6;
	};
	struct flow_dissector_key_ports tp;
	struct flow_dissector_key_icmp icmp;
	struct flow_dissector_key_arp arp;
	struct flow_dissector_key_keyid enc_key_id;
	union {
		struct flow_dissector_key_ipv4_addrs enc_ipv4;
		struct flow_dissector_key_ipv6_addrs enc_ipv6;
	};
	struct flow_dissector_key_ports enc_tp;
	struct flow_dissector_key_mpls mpls;
	struct flow_dissector_key_tcp tcp;
	struct flow_dissector_key_ip ip;
	struct flow_dissector_key_ip enc_ip;
	struct flow_dissector_key_enc_opts enc_opts;
	union {
		struct flow_dissector_key_ports tp;
		struct {
			struct flow_dissector_key_ports tp_min;
			struct flow_dissector_key_ports tp_max;
		};
	} tp_range;
	struct flow_dissector_key_ct ct;
	struct flow_dissector_key_hash hash;
} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */

struct fl_flow_mask_range {
	unsigned short int start;
	unsigned short int end;
};

struct fl_flow_mask {
	struct fl_flow_key key;
	struct fl_flow_mask_range range;
	u32 flags;
	struct rhash_head ht_node;
	struct rhashtable ht;
	struct rhashtable_params filter_ht_params;
	struct flow_dissector dissector;
	struct list_head filters;
	struct rcu_work rwork;
	struct list_head list;
	refcount_t refcnt;
};

struct fl_flow_tmplt {
	struct fl_flow_key dummy_key;
	struct fl_flow_key mask;
	struct flow_dissector dissector;
	struct tcf_chain *chain;
};

struct cls_fl_head {
	struct rhashtable ht;
	spinlock_t masks_lock; /* Protect masks list */
	struct list_head masks;
	struct list_head hw_filters;
	struct rcu_work rwork;
	struct idr handle_idr;
};

struct cls_fl_filter {
	struct fl_flow_mask *mask;
	struct rhash_head ht_node;
	struct fl_flow_key mkey;
	struct tcf_exts exts;
	struct tcf_result res;
	struct fl_flow_key key;
	struct list_head list;
	struct list_head hw_list;
	u32 handle;
	u32 flags;
	u32 in_hw_count;
	struct rcu_work rwork;
	struct net_device *hw_dev;
	/* Flower classifier is unlocked, which means that its reference counter
	 * can be changed concurrently without any kind of external
	 * synchronization. Use atomic reference counter to be concurrency-safe.
	 */
	refcount_t refcnt;
	bool deleted;
};

static const struct rhashtable_params mask_ht_params = {
	.key_offset = offsetof(struct fl_flow_mask, key),
	.key_len = sizeof(struct fl_flow_key),
	.head_offset = offsetof(struct fl_flow_mask, ht_node),
	.automatic_shrinking = true,
};

static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
{
	return mask->range.end - mask->range.start;
}

static void fl_mask_update_range(struct fl_flow_mask *mask)
{
	const u8 *bytes = (const u8 *) &mask->key;
	size_t size = sizeof(mask->key);
	size_t i, first = 0, last;

	for (i = 0; i < size; i++) {
		if (bytes[i]) {
			first = i;
			break;
		}
	}
	last = first;
	for (i = size - 1; i != first; i--) {
		if (bytes[i]) {
			last = i;
			break;
		}
	}
	mask->range.start = rounddown(first, sizeof(long));
	mask->range.end = roundup(last + 1, sizeof(long));
}

static void *fl_key_get_start(struct fl_flow_key *key,
			      const struct fl_flow_mask *mask)
{
	return (u8 *) key + mask->range.start;
}

static void fl_set_masked_key(struct fl_flow_key *mkey, struct fl_flow_key *key,
			      struct fl_flow_mask *mask)
{
	const long *lkey = fl_key_get_start(key, mask);
	const long *lmask = fl_key_get_start(&mask->key, mask);
	long *lmkey = fl_key_get_start(mkey, mask);
	int i;

	for (i = 0; i < fl_mask_range(mask); i += sizeof(long))
		*lmkey++ = *lkey++ & *lmask++;
}
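
The range computation and the word-wise masking above can be tried out in userspace. The following is a minimal sketch, not the kernel code: `struct toy_key`, `struct toy_range` and both `toy_*` functions are hypothetical stand-ins, and the key is assumed to be a plain array of longs. It finds the first and last non-zero mask bytes, rounds them outward to `sizeof(long)` boundaries, and then ANDs key and mask one long at a time inside that range only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_LONGS 4

/* Hypothetical key layout: just a few longs, so it is long-aligned by construction. */
struct toy_key {
	unsigned long words[TOY_KEY_LONGS];
};

/* Byte offsets into the key; both are multiples of sizeof(long). */
struct toy_range {
	size_t start;
	size_t end;
};

/* Smallest long-aligned byte range that covers every non-zero mask byte. */
static struct toy_range toy_mask_range(const struct toy_key *mask)
{
	const unsigned char *bytes = (const unsigned char *)mask;
	size_t size = sizeof(*mask);
	size_t i, first = 0, last;
	struct toy_range r;

	for (i = 0; i < size; i++) {
		if (bytes[i]) {
			first = i;
			break;
		}
	}
	last = first;
	for (i = size - 1; i != first; i--) {
		if (bytes[i]) {
			last = i;
			break;
		}
	}
	r.start = first - first % sizeof(long);                             /* round down */
	r.end = (last + sizeof(long)) / sizeof(long) * sizeof(long);         /* round last + 1 up */
	return r;
}

/* mkey = key & mask, computed one long at a time and only inside the range. */
static void toy_set_masked_key(struct toy_key *mkey, const struct toy_key *key,
			       const struct toy_key *mask, struct toy_range r)
{
	size_t w;

	assert(r.start % sizeof(long) == 0 && r.end % sizeof(long) == 0);
	for (w = r.start / sizeof(long); w < r.end / sizeof(long); w++)
		mkey->words[w] = key->words[w] & mask->words[w];
}
```

With a mask that is non-zero only in words 1 and 2, the range covers exactly those two words, and words 0 and 3 of the masked key are never written. Skipping the untouched words is what makes the narrow range worthwhile.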

static bool fl_mask_fits_tmplt(struct fl_flow_tmplt *tmplt,
			       struct fl_flow_mask *mask)
{
	const long *lmask = fl_key_get_start(&mask->key, mask);
	const long *ltmplt;
	int i;

	if (!tmplt)
		return true;
	ltmplt = fl_key_get_start(&tmplt->mask, mask);
	for (i = 0; i < fl_mask_range(mask); i += sizeof(long)) {
		if (~*ltmplt++ & *lmask++)
			return false;
	}
	return true;
}
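
The template check above is a bitwise subset test: a mask fits when it sets no bit that the template leaves clear. A minimal userspace sketch of that test follows; `toy_mask_fits_tmplt` and its array-of-longs arguments are assumptions made for illustration, not the kernel's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when every bit set in mask[] is also set in tmplt[].
 * A NULL template means "no template", which accepts any mask.
 */
static bool toy_mask_fits_tmplt(const unsigned long *tmplt,
				const unsigned long *mask, size_t nwords)
{
	size_t i;

	assert(mask != NULL);
	if (!tmplt)
		return true;

	for (i = 0; i < nwords; i++) {
		/* ~tmplt & mask is non-zero exactly when mask uses a bit tmplt lacks. */
		if (~tmplt[i] & mask[i])
			return false;
	}
	return true;
}
```

For a template of `{0xff00, 0xffff}`, the mask `{0x0f00, 0x00ff}` fits, while `{0x00ff, 0}` does not because its low byte is outside the template.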

static void fl_clear_masked_range(struct fl_flow_key *key,
				  struct fl_flow_mask *mask)
{
	memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
}

static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
				  struct fl_flow_key *key,
				  struct fl_flow_key *mkey)
{
	u16 min_mask, max_mask, min_val, max_val;

	min_mask = ntohs(filter->mask->key.tp_range.tp_min.dst);
	max_mask = ntohs(filter->mask->key.tp_range.tp_max.dst);
	min_val = ntohs(filter->key.tp_range.tp_min.dst);
	max_val = ntohs(filter->key.tp_range.tp_max.dst);

	if (min_mask && max_mask) {
		if (ntohs(key->tp_range.tp.dst) < min_val ||
		    ntohs(key->tp_range.tp.dst) > max_val)
			return false;

		/* skb does not have min and max values */
		mkey->tp_range.tp_min.dst = filter->mkey.tp_range.tp_min.dst;
		mkey->tp_range.tp_max.dst = filter->mkey.tp_range.tp_max.dst;
	}
	return true;
}

static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
				  struct fl_flow_key *key,
				  struct fl_flow_key *mkey)
{
	u16 min_mask, max_mask, min_val, max_val;

	min_mask = ntohs(filter->mask->key.tp_range.tp_min.src);
	max_mask = ntohs(filter->mask->key.tp_range.tp_max.src);
	min_val = ntohs(filter->key.tp_range.tp_min.src);
	max_val = ntohs(filter->key.tp_range.tp_max.src);

	if (min_mask && max_mask) {
		if (ntohs(key->tp_range.tp.src) < min_val ||
		    ntohs(key->tp_range.tp.src) > max_val)
			return false;

		/* skb does not have min and max values */
		mkey->tp_range.tp_min.src = filter->mkey.tp_range.tp_min.src;
		mkey->tp_range.tp_max.src = filter->mkey.tp_range.tp_max.src;
	}
	return true;
}
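
Both comparison helpers convert ports with ntohs() before comparing, because ports are stored in network byte order and comparing the raw values would give the wrong ordering on a little-endian host. A minimal userspace sketch of the inclusive range test follows; `toy_port_in_range` is a hypothetical helper, and it assumes the caller has already ensured that min is not greater than max.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Inclusive range check on ports held in network byte order.
 * All three arguments are big-endian, as they would be in a packet header.
 */
static bool toy_port_in_range(uint16_t port_be, uint16_t min_be, uint16_t max_be)
{
	uint16_t port = ntohs(port_be);
	uint16_t min = ntohs(min_be);
	uint16_t max = ntohs(max_be);

	assert(min <= max); /* assumption of this sketch, validated by the caller */
	return port >= min && port <= max;
}
```

Port 256 is a useful test value: its big-endian bytes are 01 00, which a little-endian host would read as 1 if the conversion were skipped, so an unconverted comparison against the range 100 to 300 would wrongly reject it.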

static struct cls_fl_filter *__fl_lookup(struct fl_flow_mask *mask,
					 struct fl_flow_key *mkey)
{
	return rhashtable_lookup_fast(&mask->ht, fl_key_get_start(mkey, mask),
				      mask->filter_ht_params);
}

static struct cls_fl_filter *fl_lookup_range(struct fl_flow_mask *mask,
					     struct fl_flow_key *mkey,
					     struct fl_flow_key *key)
{
	struct cls_fl_filter *filter, *f;

	list_for_each_entry_rcu(filter, &mask->filters, list) {
		if (!fl_range_port_dst_cmp(filter, key, mkey))
			continue;

		if (!fl_range_port_src_cmp(filter, key, mkey))
			continue;

		f = __fl_lookup(mask, mkey);
		if (f)
			return f;
	}
	return NULL;
}

static noinline_for_stack
struct cls_fl_filter *fl_mask_lookup(struct fl_flow_mask *mask, struct fl_flow_key *key)
{
	struct fl_flow_key mkey;

	fl_set_masked_key(&mkey, key, mask);
	if ((mask->flags & TCA_FLOWER_MASK_FLAGS_RANGE))
		return fl_lookup_range(mask, &mkey, key);

	return __fl_lookup(mask, &mkey);
}

static u16 fl_ct_info_to_flower_map[] = {
	[IP_CT_ESTABLISHED] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
					TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED,
	[IP_CT_RELATED] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
					TCA_FLOWER_KEY_CT_FLAGS_RELATED,
	[IP_CT_ESTABLISHED_REPLY] =	TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
					TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED |
					TCA_FLOWER_KEY_CT_FLAGS_REPLY,
	[IP_CT_RELATED_REPLY] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
					TCA_FLOWER_KEY_CT_FLAGS_RELATED |
					TCA_FLOWER_KEY_CT_FLAGS_REPLY,
	[IP_CT_NEW] =			TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
					TCA_FLOWER_KEY_CT_FLAGS_NEW,
};

static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
		       struct tcf_result *res)
{
	struct cls_fl_head *head = rcu_dereference_bh(tp->root);
	bool post_ct = tc_skb_cb(skb)->post_ct;
	u16 zone = tc_skb_cb(skb)->zone;
	struct fl_flow_key skb_key;
	struct fl_flow_mask *mask;
	struct cls_fl_filter *f;

	list_for_each_entry_rcu(mask, &head->masks, list) {
		flow_dissector_init_keys(&skb_key.control, &skb_key.basic);
		fl_clear_masked_range(&skb_key, mask);

		skb_flow_dissect_meta(skb, &mask->dissector, &skb_key);
		/* skb_flow_dissect() does not set n_proto in case an unknown
		 * protocol, so do it rather here.
		 */
sched: consistently handle layer3 header accesses in the presence of VLANs
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
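The skip-all-VLAN-tags behaviour can be sketched on a raw Ethernet frame in userspace. This is an illustration under assumptions, not the kernel helper: `toy_frame_protocol` and the `TOY_*` constants are hypothetical names, the frame is assumed to carry its tags inline (the kernel also has to handle hardware-accelerated tags kept outside the packet data), and a return value of 0 is used here to signal a truncated frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_8021Q   0x8100 /* customer VLAN tag */
#define TOY_ETH_P_8021AD  0x88A8 /* service VLAN tag (outer tag in QinQ) */
#define TOY_ETH_TYPE_OFF  12     /* EtherType follows two 6-byte MAC addresses */
#define TOY_VLAN_HLEN     4      /* 2 bytes TCI + 2 bytes encapsulated EtherType */

static uint16_t toy_read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Returns the frame's protocol in host byte order, or 0 if the frame is
 * too short. With skip_vlan set, every stacked VLAN tag is skipped so the
 * result is the layer 3 protocol even for QinQ frames.
 */
static uint16_t toy_frame_protocol(const uint8_t *frame, size_t len, bool skip_vlan)
{
	size_t off = TOY_ETH_TYPE_OFF;
	uint16_t proto;

	assert(frame != NULL);
	if (len < off + 2)
		return 0;

	proto = toy_read_be16(frame + off);
	if (!skip_vlan)
		return proto;

	while (proto == TOY_ETH_P_8021Q || proto == TOY_ETH_P_8021AD) {
		off += TOY_VLAN_HLEN;
		if (off + 2 > len)
			return 0;
		proto = toy_read_be16(frame + off);
	}
	return proto;
}
```

For a QinQ frame with an outer 0x88A8 tag, an inner 0x8100 tag and an IPv4 payload, the call with `skip_vlan` false yields 0x88A8, while the call with `skip_vlan` true yields 0x0800.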
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
		skb_key.basic.n_proto = skb_protocol(skb, false);
		skb_flow_dissect_tunnel_info(skb, &mask->dissector, &skb_key);
		skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
				    fl_ct_info_to_flower_map,
				    ARRAY_SIZE(fl_ct_info_to_flower_map),
				    post_ct, zone);
		skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
cls_flower: Fix inability to match GRE/IPIP packets
When a packet of a new flow arrives in openvswitch kernel module, it dissects
the packet and passes the extracted flow key to ovs-vswitchd daemon. If
hw-offload configuration is enabled, the daemon creates a new TC flower entry to
bypass openvswitch kernel module for the flow (TC flower can also offload flows
to NICs but this time that does not matter).
In this processing flow, I found the following issue in cases of GRE/IPIP
packets.
When ovs_flow_key_extract() in openvswitch module parses a packet of a new
GRE (or IPIP) flow received on non-tunneling vports, it extracts information
of the outer IP header for ip_proto/src_ip/dst_ip match keys.
This means ovs-vswitchd creates a TC flower entry with IP protocol/addresses
match keys whose values are those of the outer IP header. OTOH, TC flower,
which uses flow_dissector (different parser from openvswitch module), extracts
information of the inner IP header.
The following flow is an example to describe the issue in more detail.
<----------- Outer IP -----------------> <---------- Inner IP ---------->
+----------+--------------+--------------+----------+----------+----------+
| ip_proto | src_ip | dst_ip | ip_proto | src_ip | dst_ip |
| 47 (GRE) | 192.168.10.1 | 192.168.10.2 | 6 (TCP) | 10.0.0.1 | 10.0.0.2 |
+----------+--------------+--------------+----------+----------+----------+
In this case, TC flower entry and extracted information are shown as below:
- ovs-vswitchd creates TC flower entry with:
- ip_proto: 47
- src_ip: 192.168.10.1
- dst_ip: 192.168.10.2
- TC flower extracts below for IP header matches:
- ip_proto: 6
- src_ip: 10.0.0.1
- dst_ip: 10.0.0.2
Thus, GRE or IPIP packets never match the TC flower entry, as each
dissector behaves differently.
IMHO, the behavior of TC flower (flow dissector) does not look correct,
as ip_proto/src_ip/dst_ip in TC flower match means the outermost IP
header information except for GRE/IPIP cases. This patch adds a new
flow_dissector flag FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP which skips
dissection of the encapsulated inner GRE/IPIP header in TC flower
classifier.
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
		skb_flow_dissect(skb, &mask->dissector, &skb_key,
				 FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP);

		f = fl_mask_lookup(mask, &skb_key);
		if (f && !tc_skip_sw(f->flags)) {
			*res = f->res;
			return tcf_exts_exec(skb, &f->exts, res);
		}
	}
	return -1;
}

static int fl_init(struct tcf_proto *tp)
{
	struct cls_fl_head *head;

	head = kzalloc(sizeof(*head), GFP_KERNEL);
	if (!head)
		return -ENOBUFS;

	spin_lock_init(&head->masks_lock);
	INIT_LIST_HEAD_RCU(&head->masks);
	INIT_LIST_HEAD(&head->hw_filters);
	rcu_assign_pointer(tp->root, head);
	idr_init(&head->handle_idr);

	return rhashtable_init(&head->ht, &mask_ht_params);
}

static void fl_mask_free(struct fl_flow_mask *mask, bool mask_init_done)
{
	/* temporary masks don't have their filters list and ht initialized */
	if (mask_init_done) {
		WARN_ON(!list_empty(&mask->filters));
		rhashtable_destroy(&mask->ht);
	}
	kfree(mask);
}

static void fl_mask_free_work(struct work_struct *work)
{
	struct fl_flow_mask *mask = container_of(to_rcu_work(work),
						 struct fl_flow_mask, rwork);

	fl_mask_free(mask, true);
}

static void fl_uninit_mask_free_work(struct work_struct *work)
{
	struct fl_flow_mask *mask = container_of(to_rcu_work(work),
						 struct fl_flow_mask, rwork);

	fl_mask_free(mask, false);
}

static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask)
{
	if (!refcount_dec_and_test(&mask->refcnt))
		return false;

	rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);

	spin_lock(&head->masks_lock);
	list_del_rcu(&mask->list);
	spin_unlock(&head->masks_lock);

	tcf_queue_work(&mask->rwork, fl_mask_free_work);

	return true;
}

static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
{
	/* Flower classifier only changes root pointer during init and destroy.
	 * Users must obtain reference to tcf_proto instance before calling its
	 * API, so tp->root pointer is protected from concurrent call to
	 * fl_destroy() by reference counting.
	 */
	return rcu_dereference_raw(tp->root);
}

static void __fl_destroy_filter(struct cls_fl_filter *f)
{
	tcf_exts_destroy(&f->exts);
	tcf_exts_put_net(&f->exts);
	kfree(f);
}

static void fl_destroy_filter_work(struct work_struct *work)
{
	struct cls_fl_filter *f = container_of(to_rcu_work(work),
					       struct cls_fl_filter, rwork);

	__fl_destroy_filter(f);
}

static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
				 bool rtnl_held, struct netlink_ext_ack *extack)
{
	struct tcf_block *block = tp->chain->block;
	struct flow_cls_offload cls_flower = {};

	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
	cls_flower.command = FLOW_CLS_DESTROY;
	cls_flower.cookie = (unsigned long) f;

	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
			    &f->flags, &f->in_hw_count, rtnl_held);
}

static int fl_hw_replace_filter(struct tcf_proto *tp,
				struct cls_fl_filter *f, bool rtnl_held,
				struct netlink_ext_ack *extack)
{
	struct tcf_block *block = tp->chain->block;
	struct flow_cls_offload cls_flower = {};
	bool skip_sw = tc_skip_sw(f->flags);
	int err = 0;

	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
	if (!cls_flower.rule)
		return -ENOMEM;

	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
	cls_flower.command = FLOW_CLS_REPLACE;
	cls_flower.cookie = (unsigned long) f;
	cls_flower.rule->match.dissector = &f->mask->dissector;
	cls_flower.rule->match.mask = &f->mask->key;
	cls_flower.rule->match.key = &f->mkey;
	cls_flower.classid = f->res.classid;

	err = tc_setup_offload_action(&cls_flower.rule->action, &f->exts);
	if (err) {
		kfree(cls_flower.rule);
		if (skip_sw) {
			NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
			return err;
		}
		return 0;
	}

	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
			      skip_sw, &f->flags, &f->in_hw_count, rtnl_held);
	tc_cleanup_offload_action(&cls_flower.rule->action);
	kfree(cls_flower.rule);

	if (err) {
		fl_hw_destroy_filter(tp, f, rtnl_held, NULL);
		return err;
	}

	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
		return -EINVAL;

	return 0;
}

static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
			       bool rtnl_held)
{
	struct tcf_block *block = tp->chain->block;
	struct flow_cls_offload cls_flower = {};

	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, NULL);
	cls_flower.command = FLOW_CLS_STATS;
	cls_flower.cookie = (unsigned long) f;
	cls_flower.classid = f->res.classid;

	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false,
			 rtnl_held);

	tcf_exts_hw_stats_update(&f->exts, cls_flower.stats.bytes,
				 cls_flower.stats.pkts,
				 cls_flower.stats.drops,
				 cls_flower.stats.lastused,
				 cls_flower.stats.used_hw_stats,
				 cls_flower.stats.used_hw_stats_valid);
}

static void __fl_put(struct cls_fl_filter *f)
{
	if (!refcount_dec_and_test(&f->refcnt))
		return;

	if (tcf_exts_get_net(&f->exts))
		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
	else
		__fl_destroy_filter(f);
}

static struct cls_fl_filter *__fl_get(struct cls_fl_head *head, u32 handle)
{
	struct cls_fl_filter *f;

	rcu_read_lock();
	f = idr_find(&head->handle_idr, handle);
	if (f && !refcount_inc_not_zero(&f->refcnt))
		f = NULL;
	rcu_read_unlock();

	return f;
}
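
The get/put pair above relies on two refcount rules: a lookup may only take a reference if the count has not already dropped to zero, and whoever drops the last reference performs the destruction. A minimal userspace sketch with C11 atomics follows. `struct toy_filter`, `toy_get` and `toy_put` are hypothetical names, and the compare-exchange loop is one common way to express "increment unless zero", not a claim about how the kernel's refcount_t is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; "destroyed" stands in for the real teardown. */
struct toy_filter {
	atomic_uint refcnt;
	bool destroyed;
};

/* Take a reference unless the count is already zero (object is being torn down). */
static bool toy_get(struct toy_filter *f)
{
	unsigned int old = atomic_load(&f->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&f->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this call released the last one. */
static bool toy_put(struct toy_filter *f)
{
	unsigned int old = atomic_fetch_sub(&f->refcnt, 1);

	assert(old != 0); /* dropping a reference nobody holds is a bug */
	if (old != 1)
		return false;

	f->destroyed = true;
	return true;
}
```

Starting from a count of 1, `toy_get()` succeeds and raises the count to 2; the first `toy_put()` returns false, the second returns true and marks the object destroyed, and any later `toy_get()` fails instead of resurrecting it.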

static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
		       bool *last, bool rtnl_held,
		       struct netlink_ext_ack *extack)
{
	struct cls_fl_head *head = fl_head_dereference(tp);

	*last = false;

	spin_lock(&tp->lock);
	if (f->deleted) {
		spin_unlock(&tp->lock);
		return -ENOENT;
	}

	f->deleted = true;
	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
			       f->mask->filter_ht_params);
	idr_remove(&head->handle_idr, f->handle);
	list_del_rcu(&f->list);
	spin_unlock(&tp->lock);

	*last = fl_mask_put(head, f->mask);
	if (!tc_skip_hw(f->flags))
		fl_hw_destroy_filter(tp, f, rtnl_held, extack);
	tcf_unbind_filter(tp, &f->res);
	__fl_put(f);

	return 0;
}
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)
This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independent
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 03:18:01 +03:00
static void fl_destroy_sleepable ( struct work_struct * work )
{
2018-05-24 01:26:53 +03:00
struct cls_fl_head * head = container_of ( to_rcu_work ( work ) ,
struct cls_fl_head ,
rwork ) ;
2018-06-03 10:06:13 +03:00
rhashtable_destroy ( & head - > ht ) ;
2016-11-27 03:18:01 +03:00
kfree ( head ) ;
module_put ( THIS_MODULE ) ;
}
2019-02-11 11:55:45 +03:00
static void fl_destroy ( struct tcf_proto * tp , bool rtnl_held ,
struct netlink_ext_ack * extack )
2015-05-12 15:56:21 +03:00
{
2019-03-21 16:17:33 +03:00
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
2018-04-30 14:28:30 +03:00
struct fl_flow_mask * mask , * next_mask ;
2015-05-12 15:56:21 +03:00
struct cls_fl_filter * f , * next ;
2019-03-21 16:17:36 +03:00
bool last ;
2015-05-12 15:56:21 +03:00
2018-04-30 14:28:30 +03:00
list_for_each_entry_safe ( mask , next_mask , & head - > masks , list ) {
list_for_each_entry_safe ( f , next , & mask - > filters , list ) {
2019-03-21 16:17:43 +03:00
__fl_delete ( tp , f , & last , rtnl_held , extack ) ;
2019-03-21 16:17:36 +03:00
if ( last )
2018-04-30 14:28:30 +03:00
break ;
}
}
2017-08-30 09:31:58 +03:00
idr_destroy ( & head - > handle_idr ) ;
2016-11-27 03:18:01 +03:00
__module_get ( THIS_MODULE ) ;
2018-05-24 01:26:53 +03:00
tcf_queue_work ( & head - > rwork , fl_destroy_sleepable ) ;
2015-05-12 15:56:21 +03:00
}
2019-03-21 16:17:35 +03:00
static void fl_put ( struct tcf_proto * tp , void * arg )
{
struct cls_fl_filter * f = arg ;
__fl_put ( f ) ;
}
2017-08-05 07:31:43 +03:00
static void * fl_get ( struct tcf_proto * tp , u32 handle )
2015-05-12 15:56:21 +03:00
{
2019-03-21 16:17:33 +03:00
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
2015-05-12 15:56:21 +03:00
2019-03-21 16:17:35 +03:00
return __fl_get ( head , handle ) ;
2015-05-12 15:56:21 +03:00
}
static const struct nla_policy fl_policy [ TCA_FLOWER_MAX + 1 ] = {
[ TCA_FLOWER_UNSPEC ] = { . type = NLA_UNSPEC } ,
[ TCA_FLOWER_CLASSID ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_INDEV ] = { . type = NLA_STRING ,
. len = IFNAMSIZ } ,
[ TCA_FLOWER_KEY_ETH_DST ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ETH_DST_MASK ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ETH_SRC ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ETH_SRC_MASK ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ETH_TYPE ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_IP_PROTO ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IPV4_SRC ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_IPV4_SRC_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_IPV4_DST ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_IPV4_DST_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_IPV6_SRC ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_IPV6_SRC_MASK ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_IPV6_DST ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_IPV6_DST_MASK ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_TCP_SRC ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_TCP_DST ] = { . type = NLA_U16 } ,
2015-06-25 13:55:27 +03:00
[ TCA_FLOWER_KEY_UDP_SRC ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_UDP_DST ] = { . type = NLA_U16 } ,
2016-08-17 13:36:13 +03:00
[ TCA_FLOWER_KEY_VLAN_ID ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_VLAN_PRIO ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_VLAN_ETH_TYPE ] = { . type = NLA_U16 } ,
2016-09-08 16:23:47 +03:00
[ TCA_FLOWER_KEY_ENC_KEY_ID ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_IPV4_SRC ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_IPV4_DST ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_IPV6_SRC ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_ENC_IPV6_DST ] = { . len = sizeof ( struct in6_addr ) } ,
[ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK ] = { . len = sizeof ( struct in6_addr ) } ,
2016-09-15 15:28:22 +03:00
[ TCA_FLOWER_KEY_TCP_SRC_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_TCP_DST_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_UDP_SRC_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_UDP_DST_MASK ] = { . type = NLA_U16 } ,
2016-11-03 15:24:21 +03:00
[ TCA_FLOWER_KEY_SCTP_SRC_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_SCTP_DST_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_SCTP_SRC ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_SCTP_DST ] = { . type = NLA_U16 } ,
2016-11-07 16:14:39 +03:00
[ TCA_FLOWER_KEY_ENC_UDP_SRC_PORT ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_ENC_UDP_DST_PORT ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK ] = { . type = NLA_U16 } ,
2016-12-07 15:03:10 +03:00
[ TCA_FLOWER_KEY_FLAGS ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_FLAGS_MASK ] = { . type = NLA_U32 } ,
2016-12-07 15:48:28 +03:00
[ TCA_FLOWER_KEY_ICMPV4_TYPE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV4_TYPE_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV4_CODE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV4_CODE_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV6_TYPE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV6_TYPE_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV6_CODE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ICMPV6_CODE_MASK ] = { . type = NLA_U8 } ,
2017-01-11 16:05:43 +03:00
[ TCA_FLOWER_KEY_ARP_SIP ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ARP_SIP_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ARP_TIP ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ARP_TIP_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ARP_OP ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ARP_OP_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ARP_SHA ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ARP_SHA_MASK ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ARP_THA ] = { . len = ETH_ALEN } ,
[ TCA_FLOWER_KEY_ARP_THA_MASK ] = { . len = ETH_ALEN } ,
2017-04-22 23:52:47 +03:00
[ TCA_FLOWER_KEY_MPLS_TTL ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_BOS ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_TC ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_LABEL ] = { . type = NLA_U32 } ,
cls_flower: Support filtering on multiple MPLS Label Stack Entries
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:04 +03:00
[ TCA_FLOWER_KEY_MPLS_OPTS ] = { . type = NLA_NESTED } ,
2017-05-23 19:40:45 +03:00
[ TCA_FLOWER_KEY_TCP_FLAGS ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_TCP_FLAGS_MASK ] = { . type = NLA_U16 } ,
2017-06-01 21:37:38 +03:00
[ TCA_FLOWER_KEY_IP_TOS ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TOS_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TTL ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_IP_TTL_MASK ] = { . type = NLA_U8 } ,
2018-07-06 08:38:16 +03:00
[ TCA_FLOWER_KEY_CVLAN_ID ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_CVLAN_PRIO ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_CVLAN_ETH_TYPE ] = { . type = NLA_U16 } ,
2018-07-17 19:27:18 +03:00
[ TCA_FLOWER_KEY_ENC_IP_TOS ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_IP_TOS_MASK ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_IP_TTL ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_IP_TTL_MASK ] = { . type = NLA_U8 } ,
2018-08-07 18:36:01 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS ] = { . type = NLA_NESTED } ,
[ TCA_FLOWER_KEY_ENC_OPTS_MASK ] = { . type = NLA_NESTED } ,
2021-02-09 09:37:49 +03:00
[ TCA_FLOWER_KEY_CT_STATE ] =
NLA_POLICY_MASK ( NLA_U16 , TCA_FLOWER_KEY_CT_FLAGS_MASK ) ,
[ TCA_FLOWER_KEY_CT_STATE_MASK ] =
NLA_POLICY_MASK ( NLA_U16 , TCA_FLOWER_KEY_CT_FLAGS_MASK ) ,
2019-07-09 10:30:50 +03:00
[ TCA_FLOWER_KEY_CT_ZONE ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_CT_ZONE_MASK ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_CT_MARK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_CT_MARK_MASK ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_CT_LABELS ] = { . type = NLA_BINARY ,
. len = 128 / BITS_PER_BYTE } ,
[ TCA_FLOWER_KEY_CT_LABELS_MASK ] = { . type = NLA_BINARY ,
. len = 128 / BITS_PER_BYTE } ,
2020-02-11 21:33:40 +03:00
[ TCA_FLOWER_FLAGS ] = { . type = NLA_U32 } ,
2020-07-23 01:03:01 +03:00
[ TCA_FLOWER_KEY_HASH ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_HASH_MASK ] = { . type = NLA_U32 } ,
2018-08-07 18:36:01 +03:00
} ;
static const struct nla_policy
enc_opts_policy [ TCA_FLOWER_KEY_ENC_OPTS_MAX + 1 ] = {
2019-11-21 13:03:28 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS_UNSPEC ] = {
. strict_start_type = TCA_FLOWER_KEY_ENC_OPTS_VXLAN } ,
2018-08-07 18:36:01 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS_GENEVE ] = { . type = NLA_NESTED } ,
2019-11-21 13:03:28 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS_VXLAN ] = { . type = NLA_NESTED } ,
2019-11-21 13:03:29 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS_ERSPAN ] = { . type = NLA_NESTED } ,
2022-03-04 19:40:45 +03:00
[ TCA_FLOWER_KEY_ENC_OPTS_GTP ] = { . type = NLA_NESTED } ,
2018-08-07 18:36:01 +03:00
} ;
static const struct nla_policy
geneve_opt_policy [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_MAX + 1 ] = {
[ TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS ] = { . type = NLA_U16 } ,
[ TCA_FLOWER_KEY_ENC_OPT_GENEVE_TYPE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_OPT_GENEVE_DATA ] = { . type = NLA_BINARY ,
. len = 128 } ,
2015-05-12 15:56:21 +03:00
} ;
2019-11-21 13:03:28 +03:00
static const struct nla_policy
vxlan_opt_policy [ TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX + 1 ] = {
[ TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP ] = { . type = NLA_U32 } ,
} ;
2019-11-21 13:03:29 +03:00
static const struct nla_policy
erspan_opt_policy [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX + 1 ] = {
[ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX ] = { . type = NLA_U32 } ,
[ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID ] = { . type = NLA_U8 } ,
} ;
2022-03-04 19:40:45 +03:00
static const struct nla_policy
gtp_opt_policy [ TCA_FLOWER_KEY_ENC_OPT_GTP_MAX + 1 ] = {
[ TCA_FLOWER_KEY_ENC_OPT_GTP_PDU_TYPE ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_ENC_OPT_GTP_QFI ] = { . type = NLA_U8 } ,
} ;
2020-05-26 15:29:04 +03:00
static const struct nla_policy
mpls_stack_entry_policy [ TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX + 1 ] = {
[ TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_OPT_LSE_TC ] = { . type = NLA_U8 } ,
[ TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL ] = { . type = NLA_U32 } ,
} ;
2015-05-12 15:56:21 +03:00
static void fl_set_key_val ( struct nlattr * * tb ,
void * val , int val_type ,
void * mask , int mask_type , int len )
{
if ( ! tb [ val_type ] )
return ;
2019-07-09 10:30:50 +03:00
nla_memcpy ( val , tb [ val_type ] , len ) ;
2015-05-12 15:56:21 +03:00
if ( mask_type = = TCA_FLOWER_UNSPEC | | ! tb [ mask_type ] )
memset ( mask , 0xff , len ) ;
else
2019-07-09 10:30:50 +03:00
nla_memcpy ( mask , tb [ mask_type ] , len ) ;
2015-05-12 15:56:21 +03:00
}
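The value/mask defaulting done by fl_set_key_val() above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: struct demo_attr and demo_set_key_val() are hypothetical stand-ins for struct nlattr and the kernel helper, and a NULL data pointer stands for "attribute absent".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed netlink attribute.
 * data == NULL means the attribute was not present in the request. */
struct demo_attr {
	const void *data;
};

/* Copy the value if it is present. When no mask attribute was supplied,
 * default the mask to all-ones, i.e. an exact match on every bit.
 * Returns 1 if the key was set, 0 if the value attribute was absent. */
static int demo_set_key_val(const struct demo_attr *val_attr,
			    const struct demo_attr *mask_attr,
			    void *val, void *mask, size_t len)
{
	assert(val && mask);

	if (!val_attr || !val_attr->data)
		return 0;	/* key not requested: val and mask stay untouched */

	memcpy(val, val_attr->data, len);

	if (!mask_attr || !mask_attr->data)
		memset(mask, 0xff, len);	/* no mask given: match all bits */
	else
		memcpy(mask, mask_attr->data, len);

	return 1;
}
```

Leaving the mask untouched when the value is absent matters in this sketch: a zeroed mask then means "this field does not take part in the match".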
2018-11-13 03:15:55 +03:00
static int fl_set_key_port_range ( struct nlattr * * tb , struct fl_flow_key * key ,
2020-03-23 23:48:51 +03:00
struct fl_flow_key * mask ,
struct netlink_ext_ack * extack )
2018-11-13 03:15:55 +03:00
{
2019-12-03 13:40:12 +03:00
fl_set_key_val ( tb , & key - > tp_range . tp_min . dst ,
TCA_FLOWER_KEY_PORT_DST_MIN , & mask - > tp_range . tp_min . dst ,
TCA_FLOWER_UNSPEC , sizeof ( key - > tp_range . tp_min . dst ) ) ;
fl_set_key_val ( tb , & key - > tp_range . tp_max . dst ,
TCA_FLOWER_KEY_PORT_DST_MAX , & mask - > tp_range . tp_max . dst ,
TCA_FLOWER_UNSPEC , sizeof ( key - > tp_range . tp_max . dst ) ) ;
fl_set_key_val ( tb , & key - > tp_range . tp_min . src ,
TCA_FLOWER_KEY_PORT_SRC_MIN , & mask - > tp_range . tp_min . src ,
TCA_FLOWER_UNSPEC , sizeof ( key - > tp_range . tp_min . src ) ) ;
fl_set_key_val ( tb , & key - > tp_range . tp_max . src ,
TCA_FLOWER_KEY_PORT_SRC_MAX , & mask - > tp_range . tp_max . src ,
TCA_FLOWER_UNSPEC , sizeof ( key - > tp_range . tp_max . src ) ) ;
2020-03-23 23:48:51 +03:00
if ( mask - > tp_range . tp_min . dst & & mask - > tp_range . tp_max . dst & &
2021-03-22 00:05:48 +03:00
ntohs ( key - > tp_range . tp_max . dst ) < =
ntohs ( key - > tp_range . tp_min . dst ) ) {
2020-03-23 23:48:51 +03:00
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_PORT_DST_MIN ] ,
" Invalid destination port range (min must be strictly smaller than max) " ) ;
2018-11-13 03:15:55 +03:00
return - EINVAL ;
2020-03-23 23:48:51 +03:00
}
if ( mask - > tp_range . tp_min . src & & mask - > tp_range . tp_max . src & &
2021-03-22 00:05:48 +03:00
ntohs ( key - > tp_range . tp_max . src ) < =
ntohs ( key - > tp_range . tp_min . src ) ) {
2020-03-23 23:48:51 +03:00
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_PORT_SRC_MIN ] ,
" Invalid source port range (min must be strictly smaller than max) " ) ;
return - EINVAL ;
}
2018-11-13 03:15:55 +03:00
return 0 ;
}
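The range check in fl_set_key_port_range() above compares ports in host byte order and only when both ends of the range were supplied. A minimal user-space sketch of that rule, assuming a hypothetical helper demo_check_port_range() that returns -1 where the kernel returns -EINVAL:

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Ports arrive in network byte order. A mask of 0 means that end of the
 * range was not set, in which case there is nothing to compare.
 * Returns 0 if the range is acceptable, -1 if max <= min. */
static int demo_check_port_range(uint16_t min_be, uint16_t min_mask,
				 uint16_t max_be, uint16_t max_mask)
{
	if (min_mask && max_mask && ntohs(max_be) <= ntohs(min_be))
		return -1;	/* min must be strictly smaller than max */
	return 0;
}
```

The ntohs() conversion is the important part: on a little-endian host, comparing the raw big-endian values would order port 256 before port 1.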
2020-05-26 15:29:04 +03:00
static int fl_set_key_mpls_lse ( const struct nlattr * nla_lse ,
struct flow_dissector_key_mpls * key_val ,
struct flow_dissector_key_mpls * key_mask ,
struct netlink_ext_ack * extack )
{
struct nlattr * tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX + 1 ] ;
struct flow_dissector_mpls_lse * lse_mask ;
struct flow_dissector_mpls_lse * lse_val ;
u8 lse_index ;
u8 depth ;
int err ;
err = nla_parse_nested ( tb , TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX , nla_lse ,
mpls_stack_entry_policy , extack ) ;
if ( err < 0 )
return err ;
if ( ! tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH ] ) {
NL_SET_ERR_MSG ( extack , " Missing MPLS option \" depth \" " ) ;
return - EINVAL ;
}
depth = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH ] ) ;
/* LSE depth starts at 1, for consistency with terminology used by
* RFC 3031 ( section 3.9 ) , where depth 0 refers to unlabeled packets .
*/
if ( depth < 1 | | depth > FLOW_DIS_MPLS_MAX ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH ] ,
" Invalid MPLS depth " ) ;
return - EINVAL ;
}
lse_index = depth - 1 ;
dissector_set_mpls_lse ( key_val , lse_index ) ;
dissector_set_mpls_lse ( key_mask , lse_index ) ;
lse_val = & key_val - > ls [ lse_index ] ;
lse_mask = & key_mask - > ls [ lse_index ] ;
if ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL ] ) {
lse_val - > mpls_ttl = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL ] ) ;
lse_mask - > mpls_ttl = MPLS_TTL_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS ] ) {
u8 bos = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS ] ) ;
if ( bos & ~ MPLS_BOS_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS ] ,
" Bottom Of Stack (BOS) must be 0 or 1 " ) ;
return - EINVAL ;
}
lse_val - > mpls_bos = bos ;
lse_mask - > mpls_bos = MPLS_BOS_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_TC ] ) {
u8 tc = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_TC ] ) ;
if ( tc & ~ MPLS_TC_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_TC ] ,
" Traffic Class (TC) must be between 0 and 7 " ) ;
return - EINVAL ;
}
lse_val - > mpls_tc = tc ;
lse_mask - > mpls_tc = MPLS_TC_MASK ;
}
if ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL ] ) {
u32 label = nla_get_u32 ( tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL ] ) ;
if ( label & ~ MPLS_LABEL_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL ] ,
" Label must be between 0 and 1048575 " ) ;
return - EINVAL ;
}
lse_val - > mpls_label = label ;
lse_mask - > mpls_label = MPLS_LABEL_MASK ;
}
return 0 ;
}
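The per-LSE validation above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: all `sketch_*` names are hypothetical, the mask values are taken from the ranges quoted in the error messages (BOS 0 or 1, TC 0 to 7, label 0 to 1048575), and the maximum depth of 7 is assumed from the FLOW_DIS_MPLS_MAX value stated in the flow_dissector commit message quoted in this file. Unlike the real function, every field is passed unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MPLS_MAX   7        /* assumed equal to FLOW_DIS_MPLS_MAX */
#define SKETCH_BOS_MASK   0x1u     /* BOS is a single bit */
#define SKETCH_TC_MASK    0x7u     /* TC is 3 bits wide */
#define SKETCH_LABEL_MASK 0xFFFFFu /* label is 20 bits wide */

struct sketch_lse {
	uint32_t label;
	uint8_t tc;
	uint8_t bos;
	uint8_t ttl;
};

struct sketch_mpls_key {
	struct sketch_lse ls[SKETCH_MPLS_MAX];
	uint8_t used_lses; /* bit N set means ls[N] holds a value */
};

/* Validate one label stack entry and store it at index (depth - 1).
 * Returns 0 on success and -EINVAL when depth or a field is out of range;
 * on failure the key is left untouched.
 */
int sketch_set_lse(struct sketch_mpls_key *key, unsigned int depth,
		   uint32_t label, uint8_t tc, uint8_t bos, uint8_t ttl)
{
	unsigned int idx;

	assert(key != NULL);

	/* Depth is 1-based: depth 0 would mean an unlabeled packet. */
	if (depth < 1 || depth > SKETCH_MPLS_MAX)
		return -EINVAL;
	if ((label & ~SKETCH_LABEL_MASK) || (tc & ~SKETCH_TC_MASK) ||
	    (bos & ~SKETCH_BOS_MASK))
		return -EINVAL;

	idx = depth - 1;
	key->used_lses |= (uint8_t)(1u << idx);
	key->ls[idx].label = label;
	key->ls[idx].tc = tc;
	key->ls[idx].bos = bos;
	key->ls[idx].ttl = ttl;
	return 0;
}
```

Calling `sketch_set_lse(&key, 1, 100, 0, 1, 64)` fills `ls[0]` and sets bit 0 of `used_lses`, while a depth of 0 or 8 is rejected before anything is written.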
static int fl_set_key_mpls_opts ( const struct nlattr * nla_mpls_opts ,
struct flow_dissector_key_mpls * key_val ,
struct flow_dissector_key_mpls * key_mask ,
struct netlink_ext_ack * extack )
{
struct nlattr * nla_lse ;
int rem ;
int err ;
if ( ! ( nla_mpls_opts - > nla_type & NLA_F_NESTED ) ) {
NL_SET_ERR_MSG_ATTR ( extack , nla_mpls_opts ,
" NLA_F_NESTED is missing " ) ;
return - EINVAL ;
}
nla_for_each_nested ( nla_lse , nla_mpls_opts , rem ) {
if ( nla_type ( nla_lse ) ! = TCA_FLOWER_KEY_MPLS_OPTS_LSE ) {
NL_SET_ERR_MSG_ATTR ( extack , nla_lse ,
" Invalid MPLS option type " ) ;
return - EINVAL ;
}
err = fl_set_key_mpls_lse ( nla_lse , key_val , key_mask , extack ) ;
if ( err < 0 )
return err ;
}
if ( rem ) {
NL_SET_ERR_MSG ( extack ,
" Bytes leftover after parsing MPLS options " ) ;
return - EINVAL ;
}
return 0 ;
}
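The loop above walks a list of nested attributes, rejects any unexpected type, and then treats leftover bytes as an error. A rough userspace sketch of that pattern follows. The `walk_*` names and the attribute header layout (16-bit length, 16-bit type, 4-byte alignment) are assumptions made for illustration; this is not the kernel's nla_for_each_nested() implementation, and the clamping of the final step is a simplification of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WALK_HDRLEN 4u
#define WALK_ALIGN(len) (((size_t)(len) + 3u) & ~(size_t)3u)

/* Hypothetical attribute header: total length (header included) and type. */
struct walk_attr {
	uint16_t len;
	uint16_t type;
};

/* Walk every attribute in buf[0..rem), requiring each to have
 * expected_type. Returns 0 when the buffer is consumed exactly and
 * -EINVAL on a type mismatch or when bytes are left over.
 */
int walk_attrs(const unsigned char *buf, size_t rem, uint16_t expected_type,
	       unsigned int *count)
{
	assert(buf != NULL && count != NULL);
	*count = 0;

	while (rem >= WALK_HDRLEN) {
		struct walk_attr a;
		size_t step;

		memcpy(&a, buf, sizeof(a));
		/* Stop on a malformed header; the remainder is then leftover. */
		if (a.len < WALK_HDRLEN || a.len > rem)
			break;
		if (a.type != expected_type)
			return -EINVAL;
		(*count)++;

		/* Advance by the aligned length, clamped to the buffer end. */
		step = WALK_ALIGN(a.len);
		if (step > rem)
			step = rem;
		buf += step;
		rem -= step;
	}

	return rem ? -EINVAL : 0;
}
```

The final `rem` check is what catches trailing garbage: the loop can exit cleanly only when every byte belonged to a well-formed attribute.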
2017-05-01 16:58:40 +03:00
static int fl_set_key_mpls ( struct nlattr * * tb ,
struct flow_dissector_key_mpls * key_val ,
2020-03-23 23:48:49 +03:00
struct flow_dissector_key_mpls * key_mask ,
struct netlink_ext_ack * extack )
2017-04-22 23:52:47 +03:00
{
flow_dissector: Parse multiple MPLS Label Stack Entries
The current MPLS dissector only parses the first MPLS Label Stack
Entry (second LSE can be parsed too, but only to set a key_id).
This patch adds the possibility to parse several LSEs by making
__skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
as the Bottom Of Stack bit hasn't been seen, up to a maximum of
FLOW_DIS_MPLS_MAX entries.
FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
many practical purposes, without wasting too much space.
To record the parsed values, flow_dissector_key_mpls is modified to
store an array of stack entries, instead of just the values of the
first one. A bit field, "used_lses", is also added to keep track of
the LSEs that have been set. The objective is to avoid defining a
new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
TC flower is adapted for the new struct flow_dissector_key_mpls layout.
Matching on several MPLS Label Stack Entries will be added in the next
patch.
The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
mlx5's parse_tunnel() now verify that the rule only uses the first LSE
and fail if it doesn't.
Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
slightly modified. Instead of recording the first Entropy Label, it
now records the last one. This shouldn't have any consequences since
there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
in the tree. We'd probably better do a hash of all parsed MPLS labels
instead (excluding reserved labels) anyway. That'd give better entropy
and would probably also simplify the code. But that's not the purpose
of this patch, so I'm keeping that as a future possible improvement.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:00 +03:00
struct flow_dissector_mpls_lse * lse_mask ;
struct flow_dissector_mpls_lse * lse_val ;
cls_flower: Support filtering on multiple MPLS Label Stack Entries
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows specifying only an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater than or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:04 +03:00
if ( tb [ TCA_FLOWER_KEY_MPLS_OPTS ] ) {
if ( tb [ TCA_FLOWER_KEY_MPLS_TTL ] | |
tb [ TCA_FLOWER_KEY_MPLS_BOS ] | |
tb [ TCA_FLOWER_KEY_MPLS_TC ] | |
tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_OPTS ] ,
" MPLS label, Traffic Class, Bottom Of Stack and Time To Live must be encapsulated in the MPLS options attribute " ) ;
return - EBADMSG ;
}
return fl_set_key_mpls_opts ( tb [ TCA_FLOWER_KEY_MPLS_OPTS ] ,
key_val , key_mask , extack ) ;
}
2020-05-26 15:29:00 +03:00
lse_val = & key_val - > ls [ 0 ] ;
lse_mask = & key_mask - > ls [ 0 ] ;
2017-04-22 23:52:47 +03:00
if ( tb [ TCA_FLOWER_KEY_MPLS_TTL ] ) {
2020-05-26 15:29:00 +03:00
lse_val - > mpls_ttl = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_TTL ] ) ;
lse_mask - > mpls_ttl = MPLS_TTL_MASK ;
dissector_set_mpls_lse ( key_val , 0 ) ;
dissector_set_mpls_lse ( key_mask , 0 ) ;
2017-04-22 23:52:47 +03:00
}
if ( tb [ TCA_FLOWER_KEY_MPLS_BOS ] ) {
2017-05-01 16:58:40 +03:00
u8 bos = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_BOS ] ) ;
2020-03-23 23:48:49 +03:00
if ( bos & ~ MPLS_BOS_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_BOS ] ,
" Bottom Of Stack (BOS) must be 0 or 1 " ) ;
2017-05-01 16:58:40 +03:00
return - EINVAL ;
2020-03-23 23:48:49 +03:00
}
2020-05-26 15:29:00 +03:00
lse_val - > mpls_bos = bos ;
lse_mask - > mpls_bos = MPLS_BOS_MASK ;
dissector_set_mpls_lse ( key_val , 0 ) ;
dissector_set_mpls_lse ( key_mask , 0 ) ;
2017-04-22 23:52:47 +03:00
}
if ( tb [ TCA_FLOWER_KEY_MPLS_TC ] ) {
2017-05-01 16:58:40 +03:00
u8 tc = nla_get_u8 ( tb [ TCA_FLOWER_KEY_MPLS_TC ] ) ;
2020-03-23 23:48:49 +03:00
if ( tc & ~ MPLS_TC_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_TC ] ,
" Traffic Class (TC) must be between 0 and 7 " ) ;
2017-05-01 16:58:40 +03:00
return - EINVAL ;
2020-03-23 23:48:49 +03:00
}
2020-05-26 15:29:00 +03:00
lse_val - > mpls_tc = tc ;
lse_mask - > mpls_tc = MPLS_TC_MASK ;
dissector_set_mpls_lse ( key_val , 0 ) ;
dissector_set_mpls_lse ( key_mask , 0 ) ;
2017-04-22 23:52:47 +03:00
}
if ( tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ) {
2017-05-01 16:58:40 +03:00
u32 label = nla_get_u32 ( tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ) ;
2020-03-23 23:48:49 +03:00
if ( label & ~ MPLS_LABEL_MASK ) {
NL_SET_ERR_MSG_ATTR ( extack ,
tb [ TCA_FLOWER_KEY_MPLS_LABEL ] ,
" Label must be between 0 and 1048575 " ) ;
2017-05-01 16:58:40 +03:00
return - EINVAL ;
2020-03-23 23:48:49 +03:00
}
2020-05-26 15:29:00 +03:00
lse_val - > mpls_label = label ;
lse_mask - > mpls_label = MPLS_LABEL_MASK ;
dissector_set_mpls_lse ( key_val , 0 ) ;
dissector_set_mpls_lse ( key_mask , 0 ) ;
2017-04-22 23:52:47 +03:00
}
2017-05-01 16:58:40 +03:00
return 0 ;
2017-04-22 23:52:47 +03:00
}
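The flow_dissector commit message quoted in this file describes parsing label stack entries until the Bottom Of Stack bit is seen, up to a fixed maximum, while recording which entries were filled in a bit field. The sketch below illustrates that idea in plain userspace C. It assumes the RFC 3032 entry layout (20-bit label, 3-bit TC, 1-bit S, 8-bit TTL) and words already converted to host byte order; the `demo_*` names are hypothetical and this is not the kernel's __skb_flow_dissect_mpls().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MPLS_MAX 7 /* assumed equal to FLOW_DIS_MPLS_MAX */

struct demo_lse {
	uint32_t label;
	uint8_t tc;
	uint8_t bos;
	uint8_t ttl;
};

/* Split one 32-bit label stack entry (host byte order) into its fields. */
void demo_decode_lse(uint32_t word, struct demo_lse *out)
{
	assert(out != NULL);
	out->label = (word >> 12) & 0xFFFFFu;
	out->tc = (uint8_t)((word >> 9) & 0x7u);
	out->bos = (uint8_t)((word >> 8) & 0x1u);
	out->ttl = (uint8_t)(word & 0xFFu);
}

/* Decode entries until the BOS bit is seen or DEMO_MPLS_MAX is reached.
 * Returns the number of entries decoded; bit N of *used_lses is set for
 * every out[N] that was written.
 */
unsigned int demo_parse_stack(const uint32_t *words, unsigned int nwords,
			      struct demo_lse out[DEMO_MPLS_MAX],
			      uint8_t *used_lses)
{
	unsigned int count = 0;

	assert(words != NULL && out != NULL && used_lses != NULL);
	*used_lses = 0;

	while (count < nwords && count < DEMO_MPLS_MAX) {
		demo_decode_lse(words[count], &out[count]);
		*used_lses |= (uint8_t)(1u << count);
		count++;
		if (out[count - 1].bos)
			break;
	}
	return count;
}
```

With a three-entry input whose second entry carries the BOS bit, only two entries are decoded and `used_lses` ends up as 0x03; a stack deeper than seven entries is truncated at seven.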
2016-08-17 13:36:13 +03:00
static void fl_set_key_vlan ( struct nlattr * * tb ,
2018-07-06 08:38:13 +03:00
__be16 ethertype ,
2018-07-06 08:38:16 +03:00
int vlan_id_key , int vlan_prio_key ,
2022-04-06 14:22:41 +03:00
int vlan_next_eth_type_key ,
2016-08-17 13:36:13 +03:00
struct flow_dissector_key_vlan * key_val ,
struct flow_dissector_key_vlan * key_mask )
{
# define VLAN_PRIORITY_MASK 0x7
2018-07-06 08:38:16 +03:00
if ( tb [ vlan_id_key ] ) {
2016-08-17 13:36:13 +03:00
key_val - > vlan_id =
2018-07-06 08:38:16 +03:00
nla_get_u16 ( tb [ vlan_id_key ] ) & VLAN_VID_MASK ;
2016-08-17 13:36:13 +03:00
key_mask - > vlan_id = VLAN_VID_MASK ;
}
2018-07-06 08:38:16 +03:00
if ( tb [ vlan_prio_key ] ) {
2016-08-17 13:36:13 +03:00
key_val - > vlan_priority =
2018-07-06 08:38:16 +03:00
nla_get_u8 ( tb [ vlan_prio_key ] ) &
2016-08-17 13:36:13 +03:00
VLAN_PRIORITY_MASK ;
key_mask - > vlan_priority = VLAN_PRIORITY_MASK ;
}
2018-07-06 08:38:13 +03:00
key_val - > vlan_tpid = ethertype ;
key_mask - > vlan_tpid = cpu_to_be16 ( ~ 0 ) ;
2022-04-06 14:22:41 +03:00
if ( tb [ vlan_next_eth_type_key ] ) {
key_val - > vlan_eth_type =
nla_get_be16 ( tb [ vlan_next_eth_type_key ] ) ;
key_mask - > vlan_eth_type = cpu_to_be16 ( ~ 0 ) ;
}
2016-08-17 13:36:13 +03:00
}
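The function above stores each VLAN field as a value/mask pair, truncating the user-supplied value to the width of the field. A small userspace sketch of that pairing, together with a conceptual masked comparison, might look as follows. The `demo_*` names are hypothetical, the 12-bit VID mask and 3-bit priority mask are the usual 802.1Q field widths, and the match helper is only an illustration of value/mask semantics, not how flower actually performs lookups. Unlike the real function, both fields are always set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VLAN_VID_MASK  0x0fffu /* 12-bit VLAN ID */
#define DEMO_VLAN_PRIO_MASK 0x7u    /* 3-bit priority */

struct demo_vlan_key {
	uint16_t vlan_id;
	uint8_t vlan_priority;
};

/* Store truncated values and set the corresponding masks to all-ones. */
void demo_vlan_set(struct demo_vlan_key *val, struct demo_vlan_key *mask,
		   uint16_t raw_id, uint8_t raw_prio)
{
	assert(val != NULL && mask != NULL);
	val->vlan_id = (uint16_t)(raw_id & DEMO_VLAN_VID_MASK);
	mask->vlan_id = DEMO_VLAN_VID_MASK;
	val->vlan_priority = (uint8_t)(raw_prio & DEMO_VLAN_PRIO_MASK);
	mask->vlan_priority = DEMO_VLAN_PRIO_MASK;
}

/* Conceptual match: compare only the bits selected by the mask. */
int demo_vlan_match(const struct demo_vlan_key *pkt,
		    const struct demo_vlan_key *val,
		    const struct demo_vlan_key *mask)
{
	assert(pkt != NULL && val != NULL && mask != NULL);
	return (pkt->vlan_id & mask->vlan_id) ==
	       (val->vlan_id & mask->vlan_id) &&
	       (pkt->vlan_priority & mask->vlan_priority) ==
	       (val->vlan_priority & mask->vlan_priority);
}
```

Passing a raw ID of 0xF064 stores 0x064, because the upper four bits fall outside the VID mask.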
2016-12-07 15:03:10 +03:00
static void fl_set_key_flag ( u32 flower_key , u32 flower_mask ,
u32 * dissector_key , u32 * dissector_mask ,
u32 flower_flag_bit , u32 dissector_flag_bit )
{
if ( flower_mask & flower_flag_bit ) {
* dissector_mask | = dissector_flag_bit ;
if ( flower_key & flower_flag_bit )
* dissector_key | = dissector_flag_bit ;
}
}
2020-03-23 23:48:53 +03:00
static int fl_set_key_flags ( struct nlattr * * tb , u32 * flags_key ,
u32 * flags_mask , struct netlink_ext_ack * extack )
2016-12-07 15:03:10 +03:00
{
u32 key , mask ;
2016-12-22 15:28:15 +03:00
/* mask is mandatory for flags */
2020-03-23 23:48:53 +03:00
if ( ! tb [ TCA_FLOWER_KEY_FLAGS_MASK ] ) {
NL_SET_ERR_MSG ( extack , " Missing flags mask " ) ;
2016-12-22 15:28:15 +03:00
return - EINVAL ;
2020-03-23 23:48:53 +03:00
}
2016-12-07 15:03:10 +03:00
2021-03-22 00:05:49 +03:00
key = be32_to_cpu ( nla_get_be32 ( tb [ TCA_FLOWER_KEY_FLAGS ] ) ) ;
mask = be32_to_cpu ( nla_get_be32 ( tb [ TCA_FLOWER_KEY_FLAGS_MASK ] ) ) ;
2016-12-07 15:03:10 +03:00
* flags_key = 0 ;
* flags_mask = 0 ;
fl_set_key_flag ( key , mask , flags_key , flags_mask ,
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT , FLOW_DIS_IS_FRAGMENT ) ;
2018-03-06 20:11:14 +03:00
fl_set_key_flag ( key , mask , flags_key , flags_mask ,
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST ,
FLOW_DIS_FIRST_FRAG ) ;
2016-12-22 15:28:15 +03:00
return 0 ;
2016-12-07 15:03:10 +03:00
}
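The two helpers above translate flag bits from the netlink attribute space into the dissector's flag space, copying a bit into the key only when the user-supplied mask selects it. The following userspace sketch shows the same translation with deliberately different bit positions on each side so the mapping is visible. All `demo_*` names and bit values are hypothetical and do not correspond to the real TCA_FLOWER_KEY_FLAGS_* or FLOW_DIS_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-facing flag bits. */
#define DEMO_UAPI_IS_FRAGMENT   0x1u
#define DEMO_UAPI_FRAG_IS_FIRST 0x2u

/* Hypothetical internal flag bits, placed elsewhere on purpose. */
#define DEMO_INT_IS_FRAGMENT 0x10u
#define DEMO_INT_FIRST_FRAG  0x20u

/* Translate one flag: the mask bit decides whether the flag takes part
 * in matching, the key bit decides the value to match.
 */
void demo_set_flag(uint32_t key, uint32_t mask,
		   uint32_t *out_key, uint32_t *out_mask,
		   uint32_t in_bit, uint32_t out_bit)
{
	assert(out_key != NULL && out_mask != NULL);
	if (mask & in_bit) {
		*out_mask |= out_bit;
		if (key & in_bit)
			*out_key |= out_bit;
	}
}

/* Translate all known flags, starting from a cleared key and mask. */
void demo_set_flags(uint32_t key, uint32_t mask,
		    uint32_t *out_key, uint32_t *out_mask)
{
	assert(out_key != NULL && out_mask != NULL);
	*out_key = 0;
	*out_mask = 0;
	demo_set_flag(key, mask, out_key, out_mask,
		      DEMO_UAPI_IS_FRAGMENT, DEMO_INT_IS_FRAGMENT);
	demo_set_flag(key, mask, out_key, out_mask,
		      DEMO_UAPI_FRAG_IS_FIRST, DEMO_INT_FIRST_FRAG);
}
```

A key bit that is not covered by the mask is silently dropped, which is why the mask attribute is mandatory in the real function.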
2018-07-17 19:27:18 +03:00
static void fl_set_key_ip ( struct nlattr * * tb , bool encap ,
2017-06-01 21:37:38 +03:00
struct flow_dissector_key_ip * key ,
struct flow_dissector_key_ip * mask )
{
2018-07-17 19:27:18 +03:00
int tos_key = encap ? TCA_FLOWER_KEY_ENC_IP_TOS : TCA_FLOWER_KEY_IP_TOS ;
int ttl_key = encap ? TCA_FLOWER_KEY_ENC_IP_TTL : TCA_FLOWER_KEY_IP_TTL ;
int tos_mask = encap ? TCA_FLOWER_KEY_ENC_IP_TOS_MASK : TCA_FLOWER_KEY_IP_TOS_MASK ;
int ttl_mask = encap ? TCA_FLOWER_KEY_ENC_IP_TTL_MASK : TCA_FLOWER_KEY_IP_TTL_MASK ;
2017-06-01 21:37:38 +03:00
2018-07-17 19:27:18 +03:00
fl_set_key_val ( tb , & key - > tos , tos_key , & mask - > tos , tos_mask , sizeof ( key - > tos ) ) ;
fl_set_key_val ( tb , & key - > ttl , ttl_key , & mask - > ttl , ttl_mask , sizeof ( key - > ttl ) ) ;
2017-06-01 21:37:38 +03:00
}
2018-08-07 18:36:01 +03:00
static int fl_set_geneve_opt ( const struct nlattr * nla , struct fl_flow_key * key ,
int depth , int option_len ,
struct netlink_ext_ack * extack )
{
struct nlattr * tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_MAX + 1 ] ;
struct nlattr * class = NULL , * type = NULL , * data = NULL ;
struct geneve_opt * opt ;
int err , data_len = 0 ;
if ( option_len > sizeof ( struct geneve_opt ) )
data_len = option_len - sizeof ( struct geneve_opt ) ;
opt = ( struct geneve_opt * ) & key - > enc_opts . data [ key - > enc_opts . len ] ;
memset ( opt , 0xff , option_len ) ;
opt - > length = data_len / 4 ;
opt - > r1 = 0 ;
opt - > r2 = 0 ;
opt - > r3 = 0 ;
/* If no mask has been provided we assume an exact match. */
if ( ! depth )
return sizeof ( struct geneve_opt ) + data_len ;
if ( nla_type ( nla ) ! = TCA_FLOWER_KEY_ENC_OPTS_GENEVE ) {
NL_SET_ERR_MSG ( extack , " Non-geneve option type for mask " ) ;
return - EINVAL ;
}
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 15:07:28 +03:00
err = nla_parse_nested_deprecated ( tb ,
TCA_FLOWER_KEY_ENC_OPT_GENEVE_MAX ,
nla , geneve_opt_policy , extack ) ;
2018-08-07 18:36:01 +03:00
if ( err < 0 )
return err ;
/* We are not allowed to omit any of CLASS, TYPE or DATA
* fields from the key .
*/
if ( ! option_len & &
( ! tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS ] | |
! tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_TYPE ] | |
! tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_DATA ] ) ) {
NL_SET_ERR_MSG ( extack , " Missing tunnel key geneve option class, type or data " ) ;
return - EINVAL ;
}
/* Omitting any of CLASS, TYPE or DATA fields is allowed
* for the mask .
*/
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_DATA ] ) {
int new_len = key - > enc_opts . len ;
data = tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_DATA ] ;
data_len = nla_len ( data ) ;
if ( data_len < 4 ) {
NL_SET_ERR_MSG ( extack , " Tunnel key geneve option data is less than 4 bytes long " ) ;
return - ERANGE ;
}
if ( data_len % 4 ) {
NL_SET_ERR_MSG ( extack , " Tunnel key geneve option data is not a multiple of 4 bytes long " ) ;
return - ERANGE ;
}
new_len + = sizeof ( struct geneve_opt ) + data_len ;
BUILD_BUG_ON ( FLOW_DIS_TUN_OPTS_MAX ! = IP_TUNNEL_OPTS_MAX ) ;
if ( new_len > FLOW_DIS_TUN_OPTS_MAX ) {
NL_SET_ERR_MSG ( extack , " Tunnel options exceeds max size " ) ;
return - ERANGE ;
}
opt - > length = data_len / 4 ;
memcpy ( opt - > opt_data , nla_data ( data ) , data_len ) ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS ] ) {
class = tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS ] ;
opt - > opt_class = nla_get_be16 ( class ) ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_TYPE ] ) {
type = tb [ TCA_FLOWER_KEY_ENC_OPT_GENEVE_TYPE ] ;
opt - > type = nla_get_u8 ( type ) ;
}
return sizeof ( struct geneve_opt ) + data_len ;
}
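The length checks in the Geneve option parser above follow a simple rule set: option data must be at least 4 bytes, a multiple of 4 bytes, and the accumulated options must fit in a fixed buffer. The sketch below restates those checks as a standalone function. The `demo_*` names are hypothetical, and the 4-byte option header size and the 255-byte limit are assumptions about sizeof(struct geneve_opt) and FLOW_DIS_TUN_OPTS_MAX rather than values shown in this file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_GENEVE_HDR_LEN 4   /* assumed sizeof(struct geneve_opt) */
#define DEMO_TUN_OPTS_MAX   255 /* assumed FLOW_DIS_TUN_OPTS_MAX */

/* Check one option's data length against the bytes already used.
 * Returns the number of bytes the option occupies (header plus data),
 * or -ERANGE when a length rule is violated.
 */
int demo_geneve_opt_len(int cur_len, int data_len)
{
	int new_len;

	assert(cur_len >= 0);
	if (data_len < 4)
		return -ERANGE;
	if (data_len % 4)
		return -ERANGE;

	new_len = cur_len + DEMO_GENEVE_HDR_LEN + data_len;
	if (new_len > DEMO_TUN_OPTS_MAX)
		return -ERANGE;

	return DEMO_GENEVE_HDR_LEN + data_len;
}

/* The option header stores the data length in 4-byte units. */
int demo_geneve_length_units(int data_len)
{
	return data_len / 4;
}
```

For example, 8 bytes of data occupy 12 bytes in total and are recorded with a length field of 2, while 6 bytes of data are rejected because they are not a multiple of 4.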
2019-11-21 13:03:28 +03:00
static int fl_set_vxlan_opt ( const struct nlattr * nla , struct fl_flow_key * key ,
int depth , int option_len ,
struct netlink_ext_ack * extack )
{
struct nlattr * tb [ TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX + 1 ] ;
struct vxlan_metadata * md ;
int err ;
md = ( struct vxlan_metadata * ) & key - > enc_opts . data [ key - > enc_opts . len ] ;
memset ( md , 0xff , sizeof ( * md ) ) ;
if ( ! depth )
return sizeof ( * md ) ;
if ( nla_type ( nla ) ! = TCA_FLOWER_KEY_ENC_OPTS_VXLAN ) {
NL_SET_ERR_MSG ( extack , " Non-vxlan option type for mask " ) ;
return - EINVAL ;
}
err = nla_parse_nested ( tb , TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX , nla ,
vxlan_opt_policy , extack ) ;
if ( err < 0 )
return err ;
if ( ! option_len & & ! tb [ TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP ] ) {
NL_SET_ERR_MSG ( extack , " Missing tunnel key vxlan option gbp " ) ;
return - EINVAL ;
}
2020-09-13 14:51:50 +03:00
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP ] ) {
2019-11-21 13:03:28 +03:00
md - > gbp = nla_get_u32 ( tb [ TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP ] ) ;
2020-09-13 14:51:50 +03:00
md - > gbp & = VXLAN_GBP_MASK ;
}
2019-11-21 13:03:28 +03:00
return sizeof ( * md ) ;
}
2019-11-21 13:03:29 +03:00
static int fl_set_erspan_opt ( const struct nlattr * nla , struct fl_flow_key * key ,
int depth , int option_len ,
struct netlink_ext_ack * extack )
{
struct nlattr * tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX + 1 ] ;
struct erspan_metadata * md ;
int err ;
md = ( struct erspan_metadata * ) & key - > enc_opts . data [ key - > enc_opts . len ] ;
memset ( md , 0xff , sizeof ( * md ) ) ;
md - > version = 1 ;
if ( ! depth )
return sizeof ( * md ) ;
if ( nla_type ( nla ) ! = TCA_FLOWER_KEY_ENC_OPTS_ERSPAN ) {
NL_SET_ERR_MSG ( extack , " Non-erspan option type for mask " ) ;
return - EINVAL ;
}
err = nla_parse_nested ( tb , TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX , nla ,
erspan_opt_policy , extack ) ;
if ( err < 0 )
return err ;
if ( ! option_len & & ! tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER ] ) {
NL_SET_ERR_MSG ( extack , " Missing tunnel key erspan option ver " ) ;
return - EINVAL ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER ] )
md - > version = nla_get_u8 ( tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER ] ) ;
if ( md - > version = = 1 ) {
if ( ! option_len & & ! tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX ] ) {
NL_SET_ERR_MSG ( extack , " Missing tunnel key erspan option index " ) ;
return - EINVAL ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX ] ) {
nla = tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX ] ;
2020-09-13 14:43:03 +03:00
memset ( & md - > u , 0x00 , sizeof ( md - > u ) ) ;
2019-11-21 13:03:29 +03:00
md - > u . index = nla_get_be32 ( nla ) ;
}
} else if ( md - > version = = 2 ) {
if ( ! option_len & & ( ! tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR ] | |
! tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID ] ) ) {
NL_SET_ERR_MSG ( extack , " Missing tunnel key erspan option dir or hwid " ) ;
return - EINVAL ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR ] ) {
nla = tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR ] ;
md - > u . md2 . dir = nla_get_u8 ( nla ) ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID ] ) {
nla = tb [ TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID ] ;
set_hwid ( & md - > u . md2 , nla_get_u8 ( nla ) ) ;
}
} else {
NL_SET_ERR_MSG ( extack , " Tunnel key erspan option ver is incorrect " ) ;
return - EINVAL ;
}
return sizeof ( * md ) ;
}
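The ERSPAN parser above dispatches on the metadata version: version 1 requires an index, version 2 requires a direction and a hardware ID, and any other version is rejected. A simplified userspace sketch of the key path (where the fields are mandatory) is shown below. The `demo_*` names and the struct layout are hypothetical; optional attributes are modelled as nullable pointers, and byte order and the width of the hardware ID field are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the ERSPAN metadata. */
struct demo_erspan_md {
	uint8_t version;
	union {
		uint32_t index;       /* used by version 1 */
		struct {
			uint8_t dir;
			uint8_t hwid;
		} md2;                /* used by version 2 */
	} u;
};

/* Fill md from optional inputs; a NULL pointer means "attribute absent".
 * Returns 0 on success and -EINVAL when a field required by the chosen
 * version is missing or the version is unknown.
 */
int demo_erspan_set_key(struct demo_erspan_md *md, const uint8_t *ver,
			const uint32_t *index, const uint8_t *dir,
			const uint8_t *hwid)
{
	assert(md != NULL);

	if (!ver)
		return -EINVAL;
	md->version = *ver;

	if (md->version == 1) {
		if (!index)
			return -EINVAL;
		md->u.index = *index;
	} else if (md->version == 2) {
		if (!dir || !hwid)
			return -EINVAL;
		md->u.md2.dir = *dir;
		md->u.md2.hwid = *hwid;
	} else {
		return -EINVAL;
	}
	return 0;
}
```

Because the version selects which union member is meaningful, attributes belonging to the other version are simply ignored rather than rejected, which mirrors the structure of the branches above.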
2022-03-04 19:40:45 +03:00
static int fl_set_gtp_opt ( const struct nlattr * nla , struct fl_flow_key * key ,
int depth , int option_len ,
struct netlink_ext_ack * extack )
{
struct nlattr * tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_MAX + 1 ] ;
struct gtp_pdu_session_info * sinfo ;
u8 len = key - > enc_opts . len ;
int err ;
sinfo = ( struct gtp_pdu_session_info * ) & key - > enc_opts . data [ len ] ;
memset ( sinfo , 0xff , option_len ) ;
if ( ! depth )
return sizeof ( * sinfo ) ;
if ( nla_type ( nla ) ! = TCA_FLOWER_KEY_ENC_OPTS_GTP ) {
NL_SET_ERR_MSG_MOD ( extack , " Non-gtp option type for mask " ) ;
return - EINVAL ;
}
err = nla_parse_nested ( tb , TCA_FLOWER_KEY_ENC_OPT_GTP_MAX , nla ,
gtp_opt_policy , extack ) ;
if ( err < 0 )
return err ;
if ( ! option_len & &
( ! tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_PDU_TYPE ] | |
! tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_QFI ] ) ) {
NL_SET_ERR_MSG_MOD ( extack ,
" Missing tunnel key gtp option pdu type or qfi " ) ;
return - EINVAL ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_PDU_TYPE ] )
sinfo - > pdu_type =
nla_get_u8 ( tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_PDU_TYPE ] ) ;
if ( tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_QFI ] )
sinfo - > qfi = nla_get_u8 ( tb [ TCA_FLOWER_KEY_ENC_OPT_GTP_QFI ] ) ;
return sizeof ( * sinfo ) ;
}
2018-08-07 18:36:01 +03:00
static int fl_set_enc_opt ( struct nlattr * * tb , struct fl_flow_key * key ,
struct fl_flow_key * mask ,
struct netlink_ext_ack * extack )
{
const struct nlattr * nla_enc_key , * nla_opt_key , * nla_opt_msk = NULL ;
2018-11-10 08:06:26 +03:00
int err , option_len , key_depth , msk_depth = 0 ;
2019-04-26 15:07:28 +03:00
err = nla_validate_nested_deprecated ( tb [ TCA_FLOWER_KEY_ENC_OPTS ] ,
TCA_FLOWER_KEY_ENC_OPTS_MAX ,
enc_opts_policy , extack ) ;
2018-11-10 08:06:26 +03:00
if ( err )
return err ;
2018-08-07 18:36:01 +03:00
nla_enc_key = nla_data ( tb [ TCA_FLOWER_KEY_ENC_OPTS ] ) ;
if ( tb [ TCA_FLOWER_KEY_ENC_OPTS_MASK ] ) {
2019-04-26 15:07:28 +03:00
err = nla_validate_nested_deprecated ( tb [ TCA_FLOWER_KEY_ENC_OPTS_MASK ] ,
TCA_FLOWER_KEY_ENC_OPTS_MAX ,
enc_opts_policy , extack ) ;
2018-11-10 08:06:26 +03:00
if ( err )
return err ;
2018-08-07 18:36:01 +03:00
nla_opt_msk = nla_data ( tb [ TCA_FLOWER_KEY_ENC_OPTS_MASK ] ) ;
msk_depth = nla_len ( tb [ TCA_FLOWER_KEY_ENC_OPTS_MASK ] ) ;
2021-01-15 21:50:24 +03:00
if ( ! nla_ok ( nla_opt_msk , msk_depth ) ) {
NL_SET_ERR_MSG ( extack , " Invalid nested attribute for masks " ) ;
return - EINVAL ;
}
2018-08-07 18:36:01 +03:00
}
nla_for_each_attr ( nla_opt_key , nla_enc_key ,
nla_len ( tb [ TCA_FLOWER_KEY_ENC_OPTS ] ) , key_depth ) {
switch ( nla_type ( nla_opt_key ) ) {
case TCA_FLOWER_KEY_ENC_OPTS_GENEVE :
2019-11-21 13:03:28 +03:00
if ( key - > enc_opts . dst_opt_type & &
key - > enc_opts . dst_opt_type ! = TUNNEL_GENEVE_OPT ) {
NL_SET_ERR_MSG ( extack , " Duplicate type for geneve options " ) ;
return - EINVAL ;
}
2018-08-07 18:36:01 +03:00
option_len = 0 ;
key - > enc_opts . dst_opt_type = TUNNEL_GENEVE_OPT ;
option_len = fl_set_geneve_opt ( nla_opt_key , key ,
key_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
key - > enc_opts . len + = option_len ;
/* At the same time we need to parse through the mask
* in order to verify exact and mask attribute lengths .
*/
mask - > enc_opts . dst_opt_type = TUNNEL_GENEVE_OPT ;
option_len = fl_set_geneve_opt ( nla_opt_msk , mask ,
msk_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
mask - > enc_opts . len + = option_len ;
if ( key - > enc_opts . len ! = mask - > enc_opts . len ) {
NL_SET_ERR_MSG ( extack , " Key and mask miss aligned " ) ;
return - EINVAL ;
}
2019-11-21 13:03:28 +03:00
break ;
case TCA_FLOWER_KEY_ENC_OPTS_VXLAN :
if ( key - > enc_opts . dst_opt_type ) {
NL_SET_ERR_MSG ( extack , " Duplicate type for vxlan options " ) ;
return - EINVAL ;
}
option_len = 0 ;
key - > enc_opts . dst_opt_type = TUNNEL_VXLAN_OPT ;
option_len = fl_set_vxlan_opt ( nla_opt_key , key ,
key_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
key - > enc_opts . len + = option_len ;
/* At the same time we need to parse through the mask
* in order to verify exact and mask attribute lengths .
*/
mask - > enc_opts . dst_opt_type = TUNNEL_VXLAN_OPT ;
option_len = fl_set_vxlan_opt ( nla_opt_msk , mask ,
msk_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
mask - > enc_opts . len + = option_len ;
if ( key - > enc_opts . len ! = mask - > enc_opts . len ) {
NL_SET_ERR_MSG ( extack , " Key and mask miss aligned " ) ;
return - EINVAL ;
}
2019-11-21 13:03:29 +03:00
break ;
case TCA_FLOWER_KEY_ENC_OPTS_ERSPAN :
if ( key - > enc_opts . dst_opt_type ) {
NL_SET_ERR_MSG ( extack , " Duplicate type for erspan options " ) ;
return - EINVAL ;
}
option_len = 0 ;
key - > enc_opts . dst_opt_type = TUNNEL_ERSPAN_OPT ;
option_len = fl_set_erspan_opt ( nla_opt_key , key ,
key_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
key - > enc_opts . len + = option_len ;
/* At the same time we need to parse through the mask
* in order to verify exact and mask attribute lengths .
*/
mask - > enc_opts . dst_opt_type = TUNNEL_ERSPAN_OPT ;
option_len = fl_set_erspan_opt ( nla_opt_msk , mask ,
msk_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
mask - > enc_opts . len + = option_len ;
if ( key - > enc_opts . len ! = mask - > enc_opts . len ) {
NL_SET_ERR_MSG ( extack , " Key and mask miss aligned " ) ;
return - EINVAL ;
}
2018-08-07 18:36:01 +03:00
break ;
2022-03-04 19:40:45 +03:00
case TCA_FLOWER_KEY_ENC_OPTS_GTP :
if ( key - > enc_opts . dst_opt_type ) {
NL_SET_ERR_MSG_MOD ( extack ,
" Duplicate type for gtp options " ) ;
return - EINVAL ;
}
option_len = 0 ;
key - > enc_opts . dst_opt_type = TUNNEL_GTP_OPT ;
option_len = fl_set_gtp_opt ( nla_opt_key , key ,
key_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
key - > enc_opts . len + = option_len ;
/* At the same time we need to parse through the mask
* in order to verify exact and mask attribute lengths .
*/
mask - > enc_opts . dst_opt_type = TUNNEL_GTP_OPT ;
option_len = fl_set_gtp_opt ( nla_opt_msk , mask ,
msk_depth , option_len ,
extack ) ;
if ( option_len < 0 )
return option_len ;
mask - > enc_opts . len + = option_len ;
if ( key - > enc_opts . len ! = mask - > enc_opts . len ) {
NL_SET_ERR_MSG_MOD ( extack ,
" Key and mask miss aligned " ) ;
return - EINVAL ;
}
break ;
2018-08-07 18:36:01 +03:00
default :
NL_SET_ERR_MSG ( extack , " Unknown tunnel option type " ) ;
return - EINVAL ;
}
2021-01-15 21:50:24 +03:00
if ( ! msk_depth )
continue ;
if ( ! nla_ok ( nla_opt_msk , msk_depth ) ) {
NL_SET_ERR_MSG ( extack , " A mask attribute is invalid " ) ;
return - EINVAL ;
}
nla_opt_msk = nla_next ( nla_opt_msk , & msk_depth ) ;
2018-08-07 18:36:01 +03:00
}
return 0 ;
}
2021-02-09 09:37:49 +03:00
static int fl_validate_ct_state ( u16 state , struct nlattr * tb ,
struct netlink_ext_ack * extack )
{
if ( state & & ! ( state & TCA_FLOWER_KEY_CT_FLAGS_TRACKED ) ) {
NL_SET_ERR_MSG_ATTR ( extack , tb ,
" no trk, so no other flag can be set " ) ;
return - EINVAL ;
}
if ( state & TCA_FLOWER_KEY_CT_FLAGS_NEW & &
state & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED ) {
NL_SET_ERR_MSG_ATTR ( extack , tb ,
" new and est are mutually exclusive " ) ;
return - EINVAL ;
}
2021-02-23 10:11:55 +03:00
if ( state & TCA_FLOWER_KEY_CT_FLAGS_INVALID & &
state & ~ ( TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
TCA_FLOWER_KEY_CT_FLAGS_INVALID ) ) {
NL_SET_ERR_MSG_ATTR ( extack , tb ,
" when inv is set, only trk may be set " ) ;
return - EINVAL ;
}
if ( state & TCA_FLOWER_KEY_CT_FLAGS_NEW & &
state & TCA_FLOWER_KEY_CT_FLAGS_REPLY ) {
NL_SET_ERR_MSG_ATTR ( extack , tb ,
" new and rpl are mutually exclusive " ) ;
return - EINVAL ;
}
2021-02-09 09:37:49 +03:00
return 0 ;
}
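The four checks in fl_validate_ct_state() can be tried outside the kernel.
The sketch below is a userspace approximation, not the kernel code: the CT_*
names and their bit positions are hypothetical stand-ins for the
TCA_FLOWER_KEY_CT_FLAGS_* constants, and ct_state_check() drops the extack
reporting.

```c
#include <assert.h>	/* for callers that exercise the sketch */
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TCA_FLOWER_KEY_CT_FLAGS_* bits; the bit
 * positions are assumptions chosen for this sketch.
 */
enum {
	CT_NEW         = 1 << 0,
	CT_ESTABLISHED = 1 << 1,
	CT_RELATED     = 1 << 2,
	CT_TRACKED     = 1 << 3,
	CT_INVALID     = 1 << 4,
	CT_REPLY       = 1 << 5,
};

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int ct_state_check(uint16_t state)
{
	/* Any flag at all requires trk. */
	if (state && !(state & CT_TRACKED))
		return -EINVAL;
	/* new and est are mutually exclusive. */
	if ((state & CT_NEW) && (state & CT_ESTABLISHED))
		return -EINVAL;
	/* With inv set, only trk may accompany it. */
	if ((state & CT_INVALID) && (state & ~(CT_TRACKED | CT_INVALID)))
		return -EINVAL;
	/* new and rpl are mutually exclusive. */
	if ((state & CT_NEW) && (state & CT_REPLY))
		return -EINVAL;
	return 0;
}
```

Note that fl_set_key_ct() passes key->ct_state & mask->ct_state, so only the
bits that are actually matched on take part in the check.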
2019-07-09 10:30:50 +03:00
static int fl_set_key_ct ( struct nlattr * * tb ,
struct flow_dissector_key_ct * key ,
struct flow_dissector_key_ct * mask ,
struct netlink_ext_ack * extack )
{
if ( tb [ TCA_FLOWER_KEY_CT_STATE ] ) {
2021-02-09 09:37:49 +03:00
int err ;
2019-07-09 10:30:50 +03:00
if ( ! IS_ENABLED ( CONFIG_NF_CONNTRACK ) ) {
NL_SET_ERR_MSG ( extack , " Conntrack isn't enabled " ) ;
return - EOPNOTSUPP ;
}
fl_set_key_val ( tb , & key - > ct_state , TCA_FLOWER_KEY_CT_STATE ,
& mask - > ct_state , TCA_FLOWER_KEY_CT_STATE_MASK ,
sizeof ( key - > ct_state ) ) ;
2021-02-09 09:37:49 +03:00
2021-03-17 07:02:43 +03:00
err = fl_validate_ct_state ( key - > ct_state & mask - > ct_state ,
2021-02-09 09:37:49 +03:00
tb [ TCA_FLOWER_KEY_CT_STATE_MASK ] ,
extack ) ;
if ( err )
return err ;
2019-07-09 10:30:50 +03:00
}
if ( tb [ TCA_FLOWER_KEY_CT_ZONE ] ) {
if ( ! IS_ENABLED ( CONFIG_NF_CONNTRACK_ZONES ) ) {
NL_SET_ERR_MSG ( extack , " Conntrack zones isn't enabled " ) ;
return - EOPNOTSUPP ;
}
fl_set_key_val ( tb , & key - > ct_zone , TCA_FLOWER_KEY_CT_ZONE ,
& mask - > ct_zone , TCA_FLOWER_KEY_CT_ZONE_MASK ,
sizeof ( key - > ct_zone ) ) ;
}
if ( tb [ TCA_FLOWER_KEY_CT_MARK ] ) {
if ( ! IS_ENABLED ( CONFIG_NF_CONNTRACK_MARK ) ) {
NL_SET_ERR_MSG ( extack , " Conntrack mark isn't enabled " ) ;
return - EOPNOTSUPP ;
}
fl_set_key_val ( tb , & key - > ct_mark , TCA_FLOWER_KEY_CT_MARK ,
& mask - > ct_mark , TCA_FLOWER_KEY_CT_MARK_MASK ,
sizeof ( key - > ct_mark ) ) ;
}
if ( tb [ TCA_FLOWER_KEY_CT_LABELS ] ) {
if ( ! IS_ENABLED ( CONFIG_NF_CONNTRACK_LABELS ) ) {
NL_SET_ERR_MSG ( extack , " Conntrack labels aren't enabled " ) ;
return - EOPNOTSUPP ;
}
fl_set_key_val ( tb , key - > ct_labels , TCA_FLOWER_KEY_CT_LABELS ,
mask - > ct_labels , TCA_FLOWER_KEY_CT_LABELS_MASK ,
sizeof ( key - > ct_labels ) ) ;
}
return 0 ;
}
2015-05-12 15:56:21 +03:00
static int fl_set_key ( struct net * net , struct nlattr * * tb ,
2018-01-18 19:20:54 +03:00
struct fl_flow_key * key , struct fl_flow_key * mask ,
struct netlink_ext_ack * extack )
2015-05-12 15:56:21 +03:00
{
2016-08-17 13:36:13 +03:00
__be16 ethertype ;
2016-12-22 15:28:15 +03:00
int ret = 0 ;
2019-06-15 12:03:49 +03:00
2015-05-12 15:56:21 +03:00
if ( tb [ TCA_FLOWER_INDEV ] ) {
2018-01-18 19:20:54 +03:00
int err = tcf_change_indev ( net , tb [ TCA_FLOWER_INDEV ] , extack ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
return err ;
2019-06-19 09:41:03 +03:00
key - > meta . ingress_ifindex = err ;
mask - > meta . ingress_ifindex = 0xffffffff ;
2015-05-12 15:56:21 +03:00
}
fl_set_key_val ( tb , key - > eth . dst , TCA_FLOWER_KEY_ETH_DST ,
mask - > eth . dst , TCA_FLOWER_KEY_ETH_DST_MASK ,
sizeof ( key - > eth . dst ) ) ;
fl_set_key_val ( tb , key - > eth . src , TCA_FLOWER_KEY_ETH_SRC ,
mask - > eth . src , TCA_FLOWER_KEY_ETH_SRC_MASK ,
sizeof ( key - > eth . src ) ) ;
2016-01-10 19:47:01 +03:00
2016-08-26 18:25:45 +03:00
if ( tb [ TCA_FLOWER_KEY_ETH_TYPE ] ) {
2016-08-17 13:36:13 +03:00
ethertype = nla_get_be16 ( tb [ TCA_FLOWER_KEY_ETH_TYPE ] ) ;
2018-07-06 08:38:13 +03:00
if ( eth_type_vlan ( ethertype ) ) {
2018-07-06 08:38:16 +03:00
fl_set_key_vlan ( tb , ethertype , TCA_FLOWER_KEY_VLAN_ID ,
2022-04-06 14:22:41 +03:00
TCA_FLOWER_KEY_VLAN_PRIO ,
TCA_FLOWER_KEY_VLAN_ETH_TYPE ,
& key - > vlan , & mask - > vlan ) ;
2018-07-06 08:38:16 +03:00
2018-07-09 05:26:20 +03:00
if ( tb [ TCA_FLOWER_KEY_VLAN_ETH_TYPE ] ) {
ethertype = nla_get_be16 ( tb [ TCA_FLOWER_KEY_VLAN_ETH_TYPE ] ) ;
if ( eth_type_vlan ( ethertype ) ) {
fl_set_key_vlan ( tb , ethertype ,
TCA_FLOWER_KEY_CVLAN_ID ,
TCA_FLOWER_KEY_CVLAN_PRIO ,
2022-04-06 14:22:41 +03:00
TCA_FLOWER_KEY_CVLAN_ETH_TYPE ,
2018-07-09 05:26:20 +03:00
& key - > cvlan , & mask - > cvlan ) ;
fl_set_key_val ( tb , & key - > basic . n_proto ,
TCA_FLOWER_KEY_CVLAN_ETH_TYPE ,
& mask - > basic . n_proto ,
TCA_FLOWER_UNSPEC ,
sizeof ( key - > basic . n_proto ) ) ;
} else {
key - > basic . n_proto = ethertype ;
mask - > basic . n_proto = cpu_to_be16 ( ~ 0 ) ;
}
2018-07-06 08:38:16 +03:00
}
2016-08-26 18:25:45 +03:00
} else {
key - > basic . n_proto = ethertype ;
mask - > basic . n_proto = cpu_to_be16 ( ~ 0 ) ;
}
2016-08-17 13:36:13 +03:00
}
2016-01-10 19:47:01 +03:00
2015-05-12 15:56:21 +03:00
if ( key - > basic . n_proto = = htons ( ETH_P_IP ) | |
key - > basic . n_proto = = htons ( ETH_P_IPV6 ) ) {
fl_set_key_val ( tb , & key - > basic . ip_proto , TCA_FLOWER_KEY_IP_PROTO ,
& mask - > basic . ip_proto , TCA_FLOWER_UNSPEC ,
sizeof ( key - > basic . ip_proto ) ) ;
2018-07-17 19:27:18 +03:00
fl_set_key_ip ( tb , false , & key - > ip , & mask - > ip ) ;
2015-05-12 15:56:21 +03:00
}
2016-01-10 19:47:01 +03:00
if ( tb [ TCA_FLOWER_KEY_IPV4_SRC ] | | tb [ TCA_FLOWER_KEY_IPV4_DST ] ) {
key - > control . addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > control . addr_type = ~ 0 ;
2015-05-12 15:56:21 +03:00
fl_set_key_val ( tb , & key - > ipv4 . src , TCA_FLOWER_KEY_IPV4_SRC ,
& mask - > ipv4 . src , TCA_FLOWER_KEY_IPV4_SRC_MASK ,
sizeof ( key - > ipv4 . src ) ) ;
fl_set_key_val ( tb , & key - > ipv4 . dst , TCA_FLOWER_KEY_IPV4_DST ,
& mask - > ipv4 . dst , TCA_FLOWER_KEY_IPV4_DST_MASK ,
sizeof ( key - > ipv4 . dst ) ) ;
2016-01-10 19:47:01 +03:00
} else if ( tb [ TCA_FLOWER_KEY_IPV6_SRC ] | | tb [ TCA_FLOWER_KEY_IPV6_DST ] ) {
key - > control . addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > control . addr_type = ~ 0 ;
2015-05-12 15:56:21 +03:00
fl_set_key_val ( tb , & key - > ipv6 . src , TCA_FLOWER_KEY_IPV6_SRC ,
& mask - > ipv6 . src , TCA_FLOWER_KEY_IPV6_SRC_MASK ,
sizeof ( key - > ipv6 . src ) ) ;
fl_set_key_val ( tb , & key - > ipv6 . dst , TCA_FLOWER_KEY_IPV6_DST ,
& mask - > ipv6 . dst , TCA_FLOWER_KEY_IPV6_DST_MASK ,
sizeof ( key - > ipv6 . dst ) ) ;
}
2016-01-10 19:47:01 +03:00
2015-05-12 15:56:21 +03:00
if ( key - > basic . ip_proto = = IPPROTO_TCP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_TCP_SRC ,
2016-09-15 15:28:22 +03:00
& mask - > tp . src , TCA_FLOWER_KEY_TCP_SRC_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_TCP_DST ,
2016-09-15 15:28:22 +03:00
& mask - > tp . dst , TCA_FLOWER_KEY_TCP_DST_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . dst ) ) ;
2017-05-23 19:40:45 +03:00
fl_set_key_val ( tb , & key - > tcp . flags , TCA_FLOWER_KEY_TCP_FLAGS ,
& mask - > tcp . flags , TCA_FLOWER_KEY_TCP_FLAGS_MASK ,
sizeof ( key - > tcp . flags ) ) ;
2015-05-12 15:56:21 +03:00
} else if ( key - > basic . ip_proto = = IPPROTO_UDP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_UDP_SRC ,
2016-09-15 15:28:22 +03:00
& mask - > tp . src , TCA_FLOWER_KEY_UDP_SRC_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_UDP_DST ,
2016-09-15 15:28:22 +03:00
& mask - > tp . dst , TCA_FLOWER_KEY_UDP_DST_MASK ,
2015-05-12 15:56:21 +03:00
sizeof ( key - > tp . dst ) ) ;
2016-11-03 15:24:21 +03:00
} else if ( key - > basic . ip_proto = = IPPROTO_SCTP ) {
fl_set_key_val ( tb , & key - > tp . src , TCA_FLOWER_KEY_SCTP_SRC ,
& mask - > tp . src , TCA_FLOWER_KEY_SCTP_SRC_MASK ,
sizeof ( key - > tp . src ) ) ;
fl_set_key_val ( tb , & key - > tp . dst , TCA_FLOWER_KEY_SCTP_DST ,
& mask - > tp . dst , TCA_FLOWER_KEY_SCTP_DST_MASK ,
sizeof ( key - > tp . dst ) ) ;
2016-12-07 15:48:28 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_IP ) & &
key - > basic . ip_proto = = IPPROTO_ICMP ) {
fl_set_key_val ( tb , & key - > icmp . type , TCA_FLOWER_KEY_ICMPV4_TYPE ,
& mask - > icmp . type ,
TCA_FLOWER_KEY_ICMPV4_TYPE_MASK ,
sizeof ( key - > icmp . type ) ) ;
fl_set_key_val ( tb , & key - > icmp . code , TCA_FLOWER_KEY_ICMPV4_CODE ,
& mask - > icmp . code ,
TCA_FLOWER_KEY_ICMPV4_CODE_MASK ,
sizeof ( key - > icmp . code ) ) ;
} else if ( key - > basic . n_proto = = htons ( ETH_P_IPV6 ) & &
key - > basic . ip_proto = = IPPROTO_ICMPV6 ) {
fl_set_key_val ( tb , & key - > icmp . type , TCA_FLOWER_KEY_ICMPV6_TYPE ,
& mask - > icmp . type ,
TCA_FLOWER_KEY_ICMPV6_TYPE_MASK ,
sizeof ( key - > icmp . type ) ) ;
2017-01-30 18:19:02 +03:00
fl_set_key_val ( tb , & key - > icmp . code , TCA_FLOWER_KEY_ICMPV6_CODE ,
2016-12-07 15:48:28 +03:00
& mask - > icmp . code ,
2017-01-30 18:19:02 +03:00
TCA_FLOWER_KEY_ICMPV6_CODE_MASK ,
2016-12-07 15:48:28 +03:00
sizeof ( key - > icmp . code ) ) ;
2017-04-22 23:52:47 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_MPLS_UC ) | |
key - > basic . n_proto = = htons ( ETH_P_MPLS_MC ) ) {
2020-03-23 23:48:49 +03:00
ret = fl_set_key_mpls ( tb , & key - > mpls , & mask - > mpls , extack ) ;
2017-05-01 16:58:40 +03:00
if ( ret )
return ret ;
2017-01-11 16:05:43 +03:00
} else if ( key - > basic . n_proto = = htons ( ETH_P_ARP ) | |
key - > basic . n_proto = = htons ( ETH_P_RARP ) ) {
fl_set_key_val ( tb , & key - > arp . sip , TCA_FLOWER_KEY_ARP_SIP ,
& mask - > arp . sip , TCA_FLOWER_KEY_ARP_SIP_MASK ,
sizeof ( key - > arp . sip ) ) ;
fl_set_key_val ( tb , & key - > arp . tip , TCA_FLOWER_KEY_ARP_TIP ,
& mask - > arp . tip , TCA_FLOWER_KEY_ARP_TIP_MASK ,
sizeof ( key - > arp . tip ) ) ;
fl_set_key_val ( tb , & key - > arp . op , TCA_FLOWER_KEY_ARP_OP ,
& mask - > arp . op , TCA_FLOWER_KEY_ARP_OP_MASK ,
sizeof ( key - > arp . op ) ) ;
fl_set_key_val ( tb , key - > arp . sha , TCA_FLOWER_KEY_ARP_SHA ,
mask - > arp . sha , TCA_FLOWER_KEY_ARP_SHA_MASK ,
sizeof ( key - > arp . sha ) ) ;
fl_set_key_val ( tb , key - > arp . tha , TCA_FLOWER_KEY_ARP_THA ,
mask - > arp . tha , TCA_FLOWER_KEY_ARP_THA_MASK ,
sizeof ( key - > arp . tha ) ) ;
2015-05-12 15:56:21 +03:00
}
2018-11-13 03:15:55 +03:00
if ( key - > basic . ip_proto = = IPPROTO_TCP | |
key - > basic . ip_proto = = IPPROTO_UDP | |
key - > basic . ip_proto = = IPPROTO_SCTP ) {
2020-03-23 23:48:51 +03:00
ret = fl_set_key_port_range ( tb , key , mask , extack ) ;
2018-11-13 03:15:55 +03:00
if ( ret )
return ret ;
}
2016-09-08 16:23:47 +03:00
if ( tb [ TCA_FLOWER_KEY_ENC_IPV4_SRC ] | |
tb [ TCA_FLOWER_KEY_ENC_IPV4_DST ] ) {
key - > enc_control . addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > enc_control . addr_type = ~ 0 ;
2016-09-08 16:23:47 +03:00
fl_set_key_val ( tb , & key - > enc_ipv4 . src ,
TCA_FLOWER_KEY_ENC_IPV4_SRC ,
& mask - > enc_ipv4 . src ,
TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK ,
sizeof ( key - > enc_ipv4 . src ) ) ;
fl_set_key_val ( tb , & key - > enc_ipv4 . dst ,
TCA_FLOWER_KEY_ENC_IPV4_DST ,
& mask - > enc_ipv4 . dst ,
TCA_FLOWER_KEY_ENC_IPV4_DST_MASK ,
sizeof ( key - > enc_ipv4 . dst ) ) ;
}
if ( tb [ TCA_FLOWER_KEY_ENC_IPV6_SRC ] | |
tb [ TCA_FLOWER_KEY_ENC_IPV6_DST ] ) {
key - > enc_control . addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS ;
2016-12-14 20:00:57 +03:00
mask - > enc_control . addr_type = ~ 0 ;
2016-09-08 16:23:47 +03:00
fl_set_key_val ( tb , & key - > enc_ipv6 . src ,
TCA_FLOWER_KEY_ENC_IPV6_SRC ,
& mask - > enc_ipv6 . src ,
TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK ,
sizeof ( key - > enc_ipv6 . src ) ) ;
fl_set_key_val ( tb , & key - > enc_ipv6 . dst ,
TCA_FLOWER_KEY_ENC_IPV6_DST ,
& mask - > enc_ipv6 . dst ,
TCA_FLOWER_KEY_ENC_IPV6_DST_MASK ,
sizeof ( key - > enc_ipv6 . dst ) ) ;
}
fl_set_key_val ( tb , & key - > enc_key_id . keyid , TCA_FLOWER_KEY_ENC_KEY_ID ,
2016-09-27 11:21:18 +03:00
& mask - > enc_key_id . keyid , TCA_FLOWER_UNSPEC ,
2016-09-08 16:23:47 +03:00
sizeof ( key - > enc_key_id . keyid ) ) ;
2016-11-07 16:14:39 +03:00
fl_set_key_val ( tb , & key - > enc_tp . src , TCA_FLOWER_KEY_ENC_UDP_SRC_PORT ,
& mask - > enc_tp . src , TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK ,
sizeof ( key - > enc_tp . src ) ) ;
fl_set_key_val ( tb , & key - > enc_tp . dst , TCA_FLOWER_KEY_ENC_UDP_DST_PORT ,
& mask - > enc_tp . dst , TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK ,
sizeof ( key - > enc_tp . dst ) ) ;
2018-07-17 19:27:18 +03:00
fl_set_key_ip ( tb , true , & key - > enc_ip , & mask - > enc_ip ) ;
2020-07-23 01:03:01 +03:00
fl_set_key_val ( tb , & key - > hash . hash , TCA_FLOWER_KEY_HASH ,
& mask - > hash . hash , TCA_FLOWER_KEY_HASH_MASK ,
sizeof ( key - > hash . hash ) ) ;
2018-08-07 18:36:01 +03:00
if ( tb [ TCA_FLOWER_KEY_ENC_OPTS ] ) {
ret = fl_set_enc_opt ( tb , key , mask , extack ) ;
if ( ret )
return ret ;
}
2019-07-09 10:30:50 +03:00
ret = fl_set_key_ct ( tb , & key - > ct , & mask - > ct , extack ) ;
if ( ret )
return ret ;
2016-12-22 15:28:15 +03:00
if ( tb [ TCA_FLOWER_KEY_FLAGS ] )
2020-03-23 23:48:53 +03:00
ret = fl_set_key_flags ( tb , & key - > control . flags ,
& mask - > control . flags , extack ) ;
2016-12-07 15:03:10 +03:00
2016-12-22 15:28:15 +03:00
return ret ;
2015-05-12 15:56:21 +03:00
}
2018-04-30 14:28:30 +03:00
static void fl_mask_copy ( struct fl_flow_mask * dst ,
struct fl_flow_mask * src )
2015-05-12 15:56:21 +03:00
{
2018-04-30 14:28:30 +03:00
const void * psrc = fl_key_get_start ( & src - > key , src ) ;
void * pdst = fl_key_get_start ( & dst - > key , src ) ;
2015-05-12 15:56:21 +03:00
2018-04-30 14:28:30 +03:00
memcpy ( pdst , psrc , fl_mask_range ( src ) ) ;
dst - > range = src - > range ;
2015-05-12 15:56:21 +03:00
}
static const struct rhashtable_params fl_ht_params = {
. key_offset = offsetof ( struct cls_fl_filter , mkey ) , /* base offset */
. head_offset = offsetof ( struct cls_fl_filter , ht_node ) ,
. automatic_shrinking = true ,
} ;
2018-04-30 14:28:30 +03:00
static int fl_init_mask_hashtable ( struct fl_flow_mask * mask )
2015-05-12 15:56:21 +03:00
{
2018-04-30 14:28:30 +03:00
mask - > filter_ht_params = fl_ht_params ;
mask - > filter_ht_params . key_len = fl_mask_range ( mask ) ;
mask - > filter_ht_params . key_offset + = mask - > range . start ;
2015-05-12 15:56:21 +03:00
2018-04-30 14:28:30 +03:00
return rhashtable_init ( & mask - > ht , & mask - > filter_ht_params ) ;
2015-05-12 15:56:21 +03:00
}
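fl_init_mask_hashtable() narrows the hashed key to the byte window covered by
the mask: key_offset is advanced by range.start and key_len is set to
fl_mask_range(mask). The sketch below illustrates the effect in userspace. It
assumes fl_mask_range() is the distance range.end - range.start (its
definition is not part of this section), and demo_range, demo_flat_key and
demo_key_equal() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a byte range inside a key, and a flat key. */
struct demo_range { size_t start, end; };
struct demo_flat_key { unsigned char bytes[16]; };

/* Compare two keys only over [start, end), roughly what a per-mask hash
 * table set up with key_offset += start and key_len = end - start sees.
 */
static int demo_key_equal(const struct demo_flat_key *a,
			  const struct demo_flat_key *b,
			  const struct demo_range *r)
{
	assert(r->start <= r->end && r->end <= sizeof(a->bytes));
	return memcmp(a->bytes + r->start, b->bytes + r->start,
		      r->end - r->start) == 0;
}
```

Bytes outside the window never influence the comparison, which is why two
filters that differ only in unmasked fields land on the same entry.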
# define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
2019-12-09 21:31:43 +03:00
# define FL_KEY_MEMBER_SIZE(member) sizeof_field(struct fl_flow_key, member)
2015-05-12 15:56:21 +03:00
2016-08-17 13:36:12 +03:00
# define FL_KEY_IS_MASKED(mask, member) \
memchr_inv ( ( ( char * ) mask ) + FL_KEY_MEMBER_OFFSET ( member ) , \
0 , FL_KEY_MEMBER_SIZE ( member ) ) \
2015-05-12 15:56:21 +03:00
# define FL_KEY_SET(keys, cnt, id, member) \
do { \
keys [ cnt ] . key_id = id ; \
keys [ cnt ] . offset = FL_KEY_MEMBER_OFFSET ( member ) ; \
cnt + + ; \
} while ( 0 ) ;
2016-08-17 13:36:12 +03:00
# define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member) \
2015-05-12 15:56:21 +03:00
do { \
2016-08-17 13:36:12 +03:00
if ( FL_KEY_IS_MASKED ( mask , member ) ) \
2015-05-12 15:56:21 +03:00
FL_KEY_SET ( keys , cnt , id , member ) ; \
} while ( 0 ) ;
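FL_KEY_IS_MASKED() asks whether any byte of a member's slice of the mask is
non-zero. A minimal userspace sketch of the same idea follows. struct
demo_key, the DEMO_* macros and any_nonzero() are hypothetical; any_nonzero()
stands in for the kernel's memchr_inv(p, 0, n) returning non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much smaller stand-in for struct fl_flow_key. */
struct demo_key {
	uint8_t  eth_dst[6];
	uint16_t n_proto;
	uint32_t ipv4_src;
};

/* Returns 1 if any of the n bytes at p is non-zero, 0 otherwise. */
static int any_nonzero(const void *p, size_t n)
{
	const unsigned char *b = p;

	assert(p != NULL);
	while (n--)
		if (*b++)
			return 1;
	return 0;
}

#define DEMO_MEMBER_OFFSET(member) offsetof(struct demo_key, member)
#define DEMO_MEMBER_SIZE(member)   sizeof(((struct demo_key *)0)->member)

/* A member is "masked" when its slice of the mask has a non-zero byte. */
#define DEMO_IS_MASKED(mask, member)					\
	any_nonzero((const char *)(mask) + DEMO_MEMBER_OFFSET(member),	\
		    DEMO_MEMBER_SIZE(member))
```

With an all-zero mask no member is reported; setting a single bit in
ipv4_src makes DEMO_IS_MASKED(&mask, ipv4_src) true while the other members
stay unmasked.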
2018-07-23 10:23:09 +03:00
static void fl_init_dissector ( struct flow_dissector * dissector ,
struct fl_flow_key * mask )
2015-05-12 15:56:21 +03:00
{
struct flow_dissector_key keys [ FLOW_DISSECTOR_KEY_MAX ] ;
size_t cnt = 0 ;
2019-06-19 09:41:03 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_META , meta ) ;
2015-06-04 19:16:39 +03:00
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_CONTROL , control ) ;
2015-05-12 15:56:21 +03:00
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_BASIC , basic ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-08-17 13:36:12 +03:00
FLOW_DISSECTOR_KEY_ETH_ADDRS , eth ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-08-17 13:36:12 +03:00
FLOW_DISSECTOR_KEY_IPV4_ADDRS , ipv4 ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-08-17 13:36:12 +03:00
FLOW_DISSECTOR_KEY_IPV6_ADDRS , ipv6 ) ;
2019-12-03 13:40:12 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_PORTS , tp ) ;
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_PORTS_RANGE , tp_range ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2017-06-01 21:37:38 +03:00
FLOW_DISSECTOR_KEY_IP , ip ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2017-05-23 19:40:45 +03:00
FLOW_DISSECTOR_KEY_TCP , tcp ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-12-07 15:48:28 +03:00
FLOW_DISSECTOR_KEY_ICMP , icmp ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2017-01-11 16:05:43 +03:00
FLOW_DISSECTOR_KEY_ARP , arp ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2017-04-22 23:52:47 +03:00
FLOW_DISSECTOR_KEY_MPLS , mpls ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-08-17 13:36:13 +03:00
FLOW_DISSECTOR_KEY_VLAN , vlan ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2018-07-06 08:38:16 +03:00
FLOW_DISSECTOR_KEY_CVLAN , cvlan ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-11-07 16:14:38 +03:00
FLOW_DISSECTOR_KEY_ENC_KEYID , enc_key_id ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-11-07 16:14:38 +03:00
FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS , enc_ipv4 ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-11-07 16:14:38 +03:00
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS , enc_ipv6 ) ;
2018-07-23 10:23:09 +03:00
if ( FL_KEY_IS_MASKED ( mask , enc_ipv4 ) | |
FL_KEY_IS_MASKED ( mask , enc_ipv6 ) )
2016-11-07 16:14:38 +03:00
FL_KEY_SET ( keys , cnt , FLOW_DISSECTOR_KEY_ENC_CONTROL ,
enc_control ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2016-11-07 16:14:39 +03:00
FLOW_DISSECTOR_KEY_ENC_PORTS , enc_tp ) ;
2018-07-23 10:23:09 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
2018-07-17 19:27:18 +03:00
FLOW_DISSECTOR_KEY_ENC_IP , enc_ip ) ;
2018-08-07 18:36:01 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_ENC_OPTS , enc_opts ) ;
2019-07-09 10:30:50 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_CT , ct ) ;
2020-07-23 01:03:01 +03:00
FL_KEY_SET_IF_MASKED ( mask , keys , cnt ,
FLOW_DISSECTOR_KEY_HASH , hash ) ;
2015-05-12 15:56:21 +03:00
2018-07-23 10:23:09 +03:00
skb_flow_dissector_init ( dissector , keys , cnt ) ;
2018-04-30 14:28:30 +03:00
}
static struct fl_flow_mask * fl_create_new_mask ( struct cls_fl_head * head ,
struct fl_flow_mask * mask )
{
struct fl_flow_mask * newmask ;
int err ;
newmask = kzalloc ( sizeof ( * newmask ) , GFP_KERNEL ) ;
if ( ! newmask )
return ERR_PTR ( - ENOMEM ) ;
fl_mask_copy ( newmask , mask ) ;
2019-12-03 13:40:12 +03:00
if ( ( newmask - > key . tp_range . tp_min . dst & &
newmask - > key . tp_range . tp_max . dst ) | |
( newmask - > key . tp_range . tp_min . src & &
newmask - > key . tp_range . tp_max . src ) )
2018-11-13 03:15:55 +03:00
newmask - > flags | = TCA_FLOWER_MASK_FLAGS_RANGE ;
2018-04-30 14:28:30 +03:00
err = fl_init_mask_hashtable ( newmask ) ;
if ( err )
goto errout_free ;
2018-07-23 10:23:09 +03:00
fl_init_dissector ( & newmask - > dissector , & newmask - > key ) ;
2018-04-30 14:28:30 +03:00
INIT_LIST_HEAD_RCU ( & newmask - > filters ) ;
2019-03-21 16:17:37 +03:00
refcount_set ( & newmask - > refcnt , 1 ) ;
2019-03-21 16:17:38 +03:00
err = rhashtable_replace_fast ( & head - > ht , & mask - > ht_node ,
& newmask - > ht_node , mask_ht_params ) ;
2018-04-30 14:28:30 +03:00
if ( err )
goto errout_destroy ;
2019-03-21 16:17:39 +03:00
spin_lock ( & head - > masks_lock ) ;
2018-04-30 14:28:30 +03:00
list_add_tail_rcu ( & newmask - > list , & head - > masks ) ;
2019-03-21 16:17:39 +03:00
spin_unlock ( & head - > masks_lock ) ;
2018-04-30 14:28:30 +03:00
return newmask ;
errout_destroy :
rhashtable_destroy ( & newmask - > ht ) ;
errout_free :
kfree ( newmask ) ;
return ERR_PTR ( err ) ;
2015-05-12 15:56:21 +03:00
}
static int fl_check_assign_mask ( struct cls_fl_head * head ,
2018-04-30 14:28:30 +03:00
struct cls_fl_filter * fnew ,
struct cls_fl_filter * fold ,
2015-05-12 15:56:21 +03:00
struct fl_flow_mask * mask )
{
2018-04-30 14:28:30 +03:00
struct fl_flow_mask * newmask ;
2019-03-21 16:17:37 +03:00
int ret = 0 ;
2015-05-12 15:56:21 +03:00
2019-03-21 16:17:37 +03:00
rcu_read_lock ( ) ;
2019-03-21 16:17:38 +03:00
/* Insert mask as temporary node to prevent concurrent creation of mask
* with same key . Any concurrent lookups with same key will return
2019-06-13 17:54:04 +03:00
* - EAGAIN because mask ' s refcnt is zero .
2019-03-21 16:17:38 +03:00
*/
fnew - > mask = rhashtable_lookup_get_insert_fast ( & head - > ht ,
& mask - > ht_node ,
mask_ht_params ) ;
2018-04-30 14:28:30 +03:00
if ( ! fnew - > mask ) {
2019-03-21 16:17:37 +03:00
rcu_read_unlock ( ) ;
2019-03-21 16:17:38 +03:00
if ( fold ) {
ret = - EINVAL ;
goto errout_cleanup ;
}
2015-05-12 15:56:21 +03:00
2018-04-30 14:28:30 +03:00
newmask = fl_create_new_mask ( head , mask ) ;
2019-03-21 16:17:38 +03:00
if ( IS_ERR ( newmask ) ) {
ret = PTR_ERR ( newmask ) ;
goto errout_cleanup ;
}
2015-05-12 15:56:21 +03:00
2018-04-30 14:28:30 +03:00
fnew - > mask = newmask ;
2019-03-21 16:17:37 +03:00
return 0 ;
2019-03-21 16:17:38 +03:00
} else if ( IS_ERR ( fnew - > mask ) ) {
ret = PTR_ERR ( fnew - > mask ) ;
2018-06-03 10:06:14 +03:00
} else if ( fold & & fold - > mask ! = fnew - > mask ) {
2019-03-21 16:17:37 +03:00
ret = - EINVAL ;
} else if ( ! refcount_inc_not_zero ( & fnew - > mask - > refcnt ) ) {
/* Mask was deleted concurrently, try again */
ret = - EAGAIN ;
2018-04-30 14:28:30 +03:00
}
2019-03-21 16:17:37 +03:00
rcu_read_unlock ( ) ;
return ret ;
2019-03-21 16:17:38 +03:00
errout_cleanup :
rhashtable_remove_fast ( & head - > ht , & mask - > ht_node ,
mask_ht_params ) ;
return ret ;
2015-05-12 15:56:21 +03:00
}
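The -EAGAIN path in fl_check_assign_mask() relies on a "take a reference only if the count is non-zero" primitive. The following is a minimal user-space sketch of that idea using C11 atomics. The names demo_mask, demo_inc_not_zero and demo_take_mask are hypothetical stand-ins and not part of the kernel API; the kernel itself uses refcount_inc_not_zero() on fl_flow_mask.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fl_flow_mask: only the refcount matters. */
struct demo_mask {
    atomic_uint refcnt;
};

/* User-space analogue of refcount_inc_not_zero(): take a reference only
 * if the count is currently non-zero. Returns true on success.
 */
static bool demo_inc_not_zero(atomic_uint *r)
{
    unsigned int old = atomic_load(r);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(r, &old, old + 1))
            return true;
    }
    return false;
}

/* Mirrors the tail of the lookup path: a mask that was found but whose
 * refcnt is still zero (temporary node, or concurrently deleted) yields
 * -EAGAIN so that the caller retries.
 */
static int demo_take_mask(struct demo_mask *found)
{
    assert(found != NULL);
    return demo_inc_not_zero(&found->refcnt) ? 0 : -EAGAIN;
}
```

A zero-initialized object (as produced by kzalloc() for the temporary mask) therefore cannot be referenced by a concurrent lookup until its owner sets the count to one.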
static int fl_set_parms ( struct net * net , struct tcf_proto * tp ,
struct cls_fl_filter * f , struct fl_flow_mask * mask ,
unsigned long base , struct nlattr * * tb ,
2021-07-30 02:12:14 +03:00
struct nlattr * est ,
2021-12-17 21:16:28 +03:00
struct fl_flow_tmplt * tmplt ,
u32 flags , u32 fl_flags ,
2018-01-18 19:20:52 +03:00
struct netlink_ext_ack * extack )
2015-05-12 15:56:21 +03:00
{
int err ;
2021-12-17 21:16:28 +03:00
err = tcf_exts_validate_ex ( net , tp , tb , est , & f - > exts , flags ,
fl_flags , extack ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
return err ;
if ( tb [ TCA_FLOWER_CLASSID ] ) {
f - > res . classid = nla_get_u32 ( tb [ TCA_FLOWER_CLASSID ] ) ;
2021-07-30 02:12:14 +03:00
if ( flags & TCA_ACT_FLAGS_NO_RTNL )
2019-03-21 16:17:43 +03:00
rtnl_lock ( ) ;
2015-05-12 15:56:21 +03:00
tcf_bind_filter ( tp , & f - > res , base ) ;
2021-07-30 02:12:14 +03:00
if ( flags & TCA_ACT_FLAGS_NO_RTNL )
2019-03-21 16:17:43 +03:00
rtnl_unlock ( ) ;
2015-05-12 15:56:21 +03:00
}
2018-01-18 19:20:54 +03:00
err = fl_set_key ( net , tb , & f - > key , & mask - > key , extack ) ;
2015-05-12 15:56:21 +03:00
if ( err )
2017-08-04 15:29:06 +03:00
return err ;
2015-05-12 15:56:21 +03:00
fl_mask_update_range ( mask ) ;
fl_set_masked_key ( & f - > mkey , & f - > key , mask ) ;
2018-07-23 10:23:10 +03:00
if ( ! fl_mask_fits_tmplt ( tmplt , mask ) ) {
NL_SET_ERR_MSG_MOD ( extack , "Mask does not fit the template" ) ;
return - EINVAL ;
}
2015-05-12 15:56:21 +03:00
return 0 ;
}
2019-04-05 20:56:26 +03:00
static int fl_ht_insert_unique ( struct cls_fl_filter * fnew ,
struct cls_fl_filter * fold ,
bool * in_ht )
{
struct fl_flow_mask * mask = fnew - > mask ;
int err ;
2019-04-11 19:12:20 +03:00
err = rhashtable_lookup_insert_fast ( & mask - > ht ,
& fnew - > ht_node ,
mask - > filter_ht_params ) ;
2019-04-05 20:56:26 +03:00
if ( err ) {
* in_ht = false ;
/* It is okay if filter with same key exists when
* overwriting.
*/
return fold & & err = = - EEXIST ? 0 : err ;
}
* in_ht = true ;
return 0 ;
}
2015-05-12 15:56:21 +03:00
static int fl_change ( struct net * net , struct sk_buff * in_skb ,
struct tcf_proto * tp , unsigned long base ,
u32 handle , struct nlattr * * tca ,
2021-07-30 02:12:14 +03:00
void * * arg , u32 flags ,
2019-02-11 11:55:45 +03:00
struct netlink_ext_ack * extack )
2015-05-12 15:56:21 +03:00
{
2019-03-21 16:17:33 +03:00
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
2021-07-30 02:12:14 +03:00
bool rtnl_held = ! ( flags & TCA_ACT_FLAGS_NO_RTNL ) ;
2017-08-05 07:31:43 +03:00
struct cls_fl_filter * fold = * arg ;
2015-05-12 15:56:21 +03:00
struct cls_fl_filter * fnew ;
2019-01-16 18:53:52 +03:00
struct fl_flow_mask * mask ;
2017-01-19 12:45:31 +03:00
struct nlattr * * tb ;
2019-04-05 20:56:26 +03:00
bool in_ht ;
2015-05-12 15:56:21 +03:00
int err ;
2019-03-21 16:17:35 +03:00
if ( ! tca [ TCA_OPTIONS ] ) {
err = - EINVAL ;
goto errout_fold ;
}
2015-05-12 15:56:21 +03:00
2019-01-16 18:53:52 +03:00
mask = kzalloc ( sizeof ( struct fl_flow_mask ) , GFP_KERNEL ) ;
2019-03-21 16:17:35 +03:00
if ( ! mask ) {
err = - ENOBUFS ;
goto errout_fold ;
}
2017-01-19 12:45:31 +03:00
2019-01-16 18:53:52 +03:00
tb = kcalloc ( TCA_FLOWER_MAX + 1 , sizeof ( struct nlattr * ) , GFP_KERNEL ) ;
if ( ! tb ) {
err = - ENOBUFS ;
goto errout_mask_alloc ;
}
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
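The split into four strictness options described above can be sketched as a small flag set. This is an illustration only: the DEMO_* names, struct demo_attr and demo_validate_attr() are hypothetical, the error codes are chosen for the example, and the TRAILING option is declared but not exercised because it applies to a whole message rather than to a single attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags mirroring the four options named in the commit message. */
enum demo_validate {
    DEMO_TRAILING     = 1 << 0, /* no trailing data after the attributes */
    DEMO_MAXTYPE      = 1 << 1, /* reject attribute types above the known max */
    DEMO_UNSPEC       = 1 << 2, /* reject attributes without a policy entry */
    DEMO_STRICT_ATTRS = 1 << 3, /* attribute length must match exactly */
};

#define DEMO_LIBERAL           0
#define DEMO_DEPRECATED_STRICT (DEMO_TRAILING | DEMO_MAXTYPE)
#define DEMO_STRICT            (DEMO_TRAILING | DEMO_MAXTYPE | \
                                DEMO_UNSPEC | DEMO_STRICT_ATTRS)

/* Hypothetical, simplified view of one parsed attribute. */
struct demo_attr {
    int type;
    int len;
    int expected_len;
    bool has_policy;
};

/* Validate a single attribute under the given strictness flags. */
static int demo_validate_attr(const struct demo_attr *a, int maxtype,
                              unsigned int flags)
{
    assert(a != NULL);
    if (a->type > maxtype)
        return (flags & DEMO_MAXTYPE) ? -EINVAL : 0; /* liberal: ignore */
    if (!a->has_policy && (flags & DEMO_UNSPEC))
        return -EINVAL;
    if (a->len < a->expected_len)
        return -ERANGE; /* too short is rejected at every level */
    if (a->len > a->expected_len && (flags & DEMO_STRICT_ATTRS))
        return -EINVAL;
    return 0;
}
```

Under this sketch an unknown attribute type passes with DEMO_LIBERAL but fails with DEMO_DEPRECATED_STRICT, while an over-long attribute is only rejected once DEMO_STRICT_ATTRS is set.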
2019-04-26 15:07:28 +03:00
err = nla_parse_nested_deprecated ( tb , TCA_FLOWER_MAX ,
tca [ TCA_OPTIONS ] , fl_policy , NULL ) ;
2015-05-12 15:56:21 +03:00
if ( err < 0 )
2017-01-19 12:45:31 +03:00
goto errout_tb ;
2015-05-12 15:56:21 +03:00
2017-01-19 12:45:31 +03:00
if ( fold & & handle & & fold - > handle ! = handle ) {
err = - EINVAL ;
goto errout_tb ;
}
2015-05-12 15:56:21 +03:00
fnew = kzalloc ( sizeof ( * fnew ) , GFP_KERNEL ) ;
2017-01-19 12:45:31 +03:00
if ( ! fnew ) {
err = - ENOBUFS ;
goto errout_tb ;
}
2019-04-24 09:53:31 +03:00
INIT_LIST_HEAD ( & fnew - > hw_list ) ;
2019-03-21 16:17:35 +03:00
refcount_set ( & fnew - > refcnt , 1 ) ;
2015-05-12 15:56:21 +03:00
2019-02-21 08:37:42 +03:00
err = tcf_exts_init ( & fnew - > exts , net , TCA_FLOWER_ACT , 0 ) ;
2016-08-19 22:36:54 +03:00
if ( err < 0 )
goto errout ;
2015-05-12 15:56:21 +03:00
2016-06-05 17:11:18 +03:00
if ( tb [ TCA_FLOWER_FLAGS ] ) {
fnew - > flags = nla_get_u32 ( tb [ TCA_FLOWER_FLAGS ] ) ;
if ( ! tc_flags_valid ( fnew - > flags ) ) {
err = - EINVAL ;
2019-03-06 17:22:12 +03:00
goto errout ;
2016-06-05 17:11:18 +03:00
}
}
2016-03-08 13:42:29 +03:00
2021-07-30 02:12:14 +03:00
err = fl_set_parms ( net , tp , fnew , mask , base , tb , tca [ TCA_RATE ] ,
2021-12-17 21:16:28 +03:00
tp - > chain - > tmplt_priv , flags , fnew - > flags ,
extack ) ;
2015-05-12 15:56:21 +03:00
if ( err )
2019-03-06 17:22:12 +03:00
goto errout ;
2015-05-12 15:56:21 +03:00
2019-01-16 18:53:52 +03:00
err = fl_check_assign_mask ( head , fnew , fold , mask ) ;
2015-05-12 15:56:21 +03:00
if ( err )
2019-03-06 17:22:12 +03:00
goto errout ;
2019-04-05 20:56:26 +03:00
err = fl_ht_insert_unique ( fnew , fold , & in_ht ) ;
if ( err )
goto errout_mask ;
2016-12-01 15:06:34 +03:00
if ( ! tc_skip_hw ( fnew - > flags ) ) {
2019-03-21 16:17:43 +03:00
err = fl_hw_replace_filter ( tp , fnew , rtnl_held , extack ) ;
2016-12-01 15:06:34 +03:00
if ( err )
2019-04-05 20:56:26 +03:00
goto errout_ht ;
2016-12-01 15:06:34 +03:00
}
2016-03-08 13:42:29 +03:00
2017-02-16 11:31:13 +03:00
if ( ! tc_in_hw ( fnew - > flags ) )
fnew - > flags | = TCA_CLS_FLAGS_NOT_IN_HW ;
2019-03-21 16:17:42 +03:00
spin_lock ( & tp - > lock ) ;
2019-03-21 16:17:41 +03:00
/* tp was deleted concurrently. -EAGAIN will cause the caller to look up
* the proto again or create a new one, if necessary.
*/
if ( tp - > deleting ) {
err = - EAGAIN ;
goto errout_hw ;
}
2016-03-08 13:42:29 +03:00
if ( fold ) {
2019-03-21 16:17:36 +03:00
/* Fold filter was deleted concurrently. Retry lookup. */
if ( fold - > deleted ) {
err = - EAGAIN ;
goto errout_hw ;
}
2019-03-21 16:17:34 +03:00
fnew - > handle = handle ;
2019-04-05 20:56:26 +03:00
if ( ! in_ht ) {
struct rhashtable_params params =
fnew - > mask - > filter_ht_params ;
err = rhashtable_insert_fast ( & fnew - > mask - > ht ,
& fnew - > ht_node ,
params ) ;
if ( err )
goto errout_hw ;
in_ht = true ;
}
2019-03-21 16:17:34 +03:00
2019-04-24 09:53:31 +03:00
refcount_inc ( & fnew - > refcnt ) ;
2018-12-19 19:07:56 +03:00
rhashtable_remove_fast ( & fold - > mask - > ht ,
& fold - > ht_node ,
fold - > mask - > filter_ht_params ) ;
2017-11-28 17:56:36 +03:00
idr_replace ( & head - > handle_idr , fnew , fnew - > handle ) ;
2015-07-17 23:38:44 +03:00
list_replace_rcu ( & fold - > list , & fnew - > list ) ;
2019-03-21 16:17:36 +03:00
fold - > deleted = true ;
2019-03-21 16:17:34 +03:00
2019-03-21 16:17:42 +03:00
spin_unlock ( & tp - > lock ) ;
2019-04-12 00:54:19 +03:00
fl_mask_put ( head , fold - > mask ) ;
2019-03-21 16:17:34 +03:00
if ( ! tc_skip_hw ( fold - > flags ) )
2019-03-21 16:17:43 +03:00
fl_hw_destroy_filter ( tp , fold , rtnl_held , NULL ) ;
2015-05-12 15:56:21 +03:00
tcf_unbind_filter ( tp , & fold - > res ) ;
2019-03-21 16:17:35 +03:00
/* Caller holds reference to fold, so refcnt is always > 0
* after this.
*/
refcount_dec ( & fold - > refcnt ) ;
__fl_put ( fold ) ;
2015-05-12 15:56:21 +03:00
} else {
2019-03-21 16:17:34 +03:00
if ( handle ) {
/* user specifies a handle and it doesn't exist */
err = idr_alloc_u32 ( & head - > handle_idr , fnew , & handle ,
handle , GFP_ATOMIC ) ;
2019-03-21 16:17:40 +03:00
/* Filter with specified handle was concurrently
* inserted after initial check in cls_api. This is not
* necessarily an error if NLM_F_EXCL is not set in
* message flags. Returning EAGAIN will cause cls_api to
* try to update concurrently inserted rule.
*/
if ( err = = - ENOSPC )
err = - EAGAIN ;
2019-03-21 16:17:34 +03:00
} else {
handle = 1 ;
err = idr_alloc_u32 ( & head - > handle_idr , fnew , & handle ,
INT_MAX , GFP_ATOMIC ) ;
}
if ( err )
goto errout_hw ;
2019-04-24 09:53:31 +03:00
refcount_inc ( & fnew - > refcnt ) ;
2019-03-21 16:17:34 +03:00
fnew - > handle = handle ;
2018-04-30 14:28:30 +03:00
list_add_tail_rcu ( & fnew - > list , & fnew - > mask - > filters ) ;
2019-03-21 16:17:42 +03:00
spin_unlock ( & tp - > lock ) ;
2015-05-12 15:56:21 +03:00
}
2019-03-21 16:17:34 +03:00
* arg = fnew ;
2017-01-19 12:45:31 +03:00
kfree ( tb ) ;
2019-06-13 17:54:04 +03:00
tcf_queue_work ( & mask - > rwork , fl_uninit_mask_free_work ) ;
2015-05-12 15:56:21 +03:00
return 0 ;
2019-04-24 09:53:31 +03:00
errout_ht :
spin_lock ( & tp - > lock ) ;
2019-03-21 16:17:34 +03:00
errout_hw :
2019-04-24 09:53:31 +03:00
fnew - > deleted = true ;
2019-03-21 16:17:42 +03:00
spin_unlock ( & tp - > lock ) ;
2019-03-21 16:17:34 +03:00
if ( ! tc_skip_hw ( fnew - > flags ) )
2019-03-21 16:17:43 +03:00
fl_hw_destroy_filter ( tp , fnew , rtnl_held , NULL ) ;
2019-04-05 20:56:26 +03:00
if ( in_ht )
rhashtable_remove_fast ( & fnew - > mask - > ht , & fnew - > ht_node ,
fnew - > mask - > filter_ht_params ) ;
2019-03-06 17:22:12 +03:00
errout_mask :
2019-04-12 00:54:19 +03:00
fl_mask_put ( head , fnew - > mask ) ;
2015-05-12 15:56:21 +03:00
errout :
2019-04-24 09:53:31 +03:00
__fl_put ( fnew ) ;
2017-01-19 12:45:31 +03:00
errout_tb :
kfree ( tb ) ;
2019-01-16 18:53:52 +03:00
errout_mask_alloc :
2019-06-13 17:54:04 +03:00
tcf_queue_work ( & mask - > rwork , fl_uninit_mask_free_work ) ;
2019-03-21 16:17:35 +03:00
errout_fold :
if ( fold )
__fl_put ( fold ) ;
2015-05-12 15:56:21 +03:00
return err ;
}
2018-01-18 19:20:53 +03:00
static int fl_delete ( struct tcf_proto * tp , void * arg , bool * last ,
2019-02-11 11:55:45 +03:00
bool rtnl_held , struct netlink_ext_ack * extack )
2015-05-12 15:56:21 +03:00
{
2019-03-21 16:17:33 +03:00
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
2017-08-05 07:31:43 +03:00
struct cls_fl_filter * f = arg ;
2019-03-21 16:17:36 +03:00
bool last_on_mask ;
int err = 0 ;
2015-05-12 15:56:21 +03:00
2019-03-21 16:17:43 +03:00
err = __fl_delete ( tp , f , & last_on_mask , rtnl_held , extack ) ;
2018-04-30 14:28:30 +03:00
* last = list_empty ( & head - > masks ) ;
2019-03-21 16:17:35 +03:00
__fl_put ( f ) ;
2019-03-21 16:17:36 +03:00
return err ;
2015-05-12 15:56:21 +03:00
}
2019-02-11 11:55:45 +03:00
static void fl_walk ( struct tcf_proto * tp , struct tcf_walker * arg ,
bool rtnl_held )
2015-05-12 15:56:21 +03:00
{
2019-06-28 21:03:42 +03:00
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
unsigned long id = arg - > cookie , tmp ;
2015-05-12 15:56:21 +03:00
struct cls_fl_filter * f ;
2018-04-30 14:28:30 +03:00
2018-07-09 13:29:11 +03:00
arg - > count = arg - > skip ;
2021-09-29 18:08:49 +03:00
rcu_read_lock ( ) ;
2019-06-28 21:03:42 +03:00
idr_for_each_entry_continue_ul ( & head - > handle_idr , f , tmp , id ) {
/* don't return filters that are being deleted */
if ( ! refcount_inc_not_zero ( & f - > refcnt ) )
continue ;
2021-09-29 18:08:49 +03:00
rcu_read_unlock ( ) ;
2018-07-09 13:29:11 +03:00
if ( arg - > fn ( tp , f , arg ) < 0 ) {
2019-03-21 16:17:35 +03:00
__fl_put ( f ) ;
2018-07-09 13:29:11 +03:00
arg - > stop = 1 ;
2021-09-29 18:08:49 +03:00
rcu_read_lock ( ) ;
2018-07-09 13:29:11 +03:00
break ;
2018-04-30 14:28:30 +03:00
}
2019-03-21 16:17:35 +03:00
__fl_put ( f ) ;
2018-07-09 13:29:11 +03:00
arg - > count + + ;
2021-09-29 18:08:49 +03:00
rcu_read_lock ( ) ;
2015-05-12 15:56:21 +03:00
}
2021-09-29 18:08:49 +03:00
rcu_read_unlock ( ) ;
2019-06-28 21:03:42 +03:00
arg - > cookie = id ;
2015-05-12 15:56:21 +03:00
}
2019-04-24 09:53:31 +03:00
static struct cls_fl_filter *
fl_get_next_hw_filter ( struct tcf_proto * tp , struct cls_fl_filter * f , bool add )
{
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
spin_lock ( & tp - > lock ) ;
if ( list_empty ( & head - > hw_filters ) ) {
spin_unlock ( & tp - > lock ) ;
return NULL ;
}
if ( ! f )
f = list_entry ( & head - > hw_filters , struct cls_fl_filter ,
hw_list ) ;
list_for_each_entry_continue ( f , & head - > hw_filters , hw_list ) {
if ( ! ( add & & f - > deleted ) & & refcount_inc_not_zero ( & f - > refcnt ) ) {
spin_unlock ( & tp - > lock ) ;
return f ;
}
}
spin_unlock ( & tp - > lock ) ;
return NULL ;
}
2019-07-19 19:20:15 +03:00
static int fl_reoffload ( struct tcf_proto * tp , bool add , flow_setup_cb_t * cb ,
2018-06-26 00:30:06 +03:00
void * cb_priv , struct netlink_ext_ack * extack )
{
struct tcf_block * block = tp - > chain - > block ;
2019-07-09 23:55:49 +03:00
struct flow_cls_offload cls_flower = { } ;
2019-04-24 09:53:31 +03:00
struct cls_fl_filter * f = NULL ;
2018-06-26 00:30:06 +03:00
int err ;
2019-04-24 09:53:31 +03:00
/* hw_filters list can only be changed by hw offload functions after
* obtaining rtnl lock. Make sure it is not changed while reoffload is
* iterating it.
*/
ASSERT_RTNL ( ) ;
2019-02-02 14:50:46 +03:00
2019-04-24 09:53:31 +03:00
while ( ( f = fl_get_next_hw_filter ( tp , f , add ) ) ) {
2019-04-03 01:53:20 +03:00
cls_flower . rule =
flow_rule_alloc ( tcf_exts_num_actions ( & f - > exts ) ) ;
if ( ! cls_flower . rule ) {
__fl_put ( f ) ;
return - ENOMEM ;
}
2018-06-26 00:30:06 +03:00
2019-04-03 01:53:20 +03:00
tc_cls_common_offload_init ( & cls_flower . common , tp , f - > flags ,
2019-05-07 03:24:21 +03:00
extack ) ;
2019-04-03 01:53:20 +03:00
cls_flower . command = add ?
2019-07-09 23:55:49 +03:00
FLOW_CLS_REPLACE : FLOW_CLS_DESTROY ;
2019-04-03 01:53:20 +03:00
cls_flower . cookie = ( unsigned long ) f ;
cls_flower . rule - > match . dissector = & f - > mask - > dissector ;
cls_flower . rule - > match . mask = & f - > mask - > key ;
cls_flower . rule - > match . key = & f - > mkey ;
2021-12-17 21:16:20 +03:00
err = tc_setup_offload_action ( & cls_flower . rule - > action , & f - > exts ) ;
2019-04-03 01:53:20 +03:00
if ( err ) {
2019-02-02 14:50:43 +03:00
kfree ( cls_flower . rule ) ;
2019-04-03 01:53:20 +03:00
if ( tc_skip_sw ( f - > flags ) ) {
NL_SET_ERR_MSG_MOD ( extack , "Failed to setup flow action" ) ;
__fl_put ( f ) ;
return err ;
2018-06-26 00:30:06 +03:00
}
2019-04-03 01:53:20 +03:00
goto next_flow ;
}
2018-06-26 00:30:06 +03:00
2019-04-03 01:53:20 +03:00
cls_flower . classid = f - > res . classid ;
2019-08-26 16:44:59 +03:00
err = tc_setup_cb_reoffload ( block , tp , add , cb ,
TC_SETUP_CLSFLOWER , & cls_flower ,
cb_priv , & f - > flags ,
& f - > in_hw_count ) ;
2021-12-17 21:16:20 +03:00
tc_cleanup_offload_action ( & cls_flower . rule - > action ) ;
2019-04-03 01:53:20 +03:00
kfree ( cls_flower . rule ) ;
if ( err ) {
2019-08-26 16:44:59 +03:00
__fl_put ( f ) ;
return err ;
2018-06-26 00:30:06 +03:00
}
2019-04-03 01:53:20 +03:00
next_flow :
__fl_put ( f ) ;
2018-06-26 00:30:06 +03:00
}
return 0 ;
}
2019-08-26 16:45:00 +03:00
static void fl_hw_add ( struct tcf_proto * tp , void * type_data )
{
struct flow_cls_offload * cls_flower = type_data ;
struct cls_fl_filter * f =
( struct cls_fl_filter * ) cls_flower - > cookie ;
struct cls_fl_head * head = fl_head_dereference ( tp ) ;
spin_lock ( & tp - > lock ) ;
list_add ( & f - > hw_list , & head - > hw_filters ) ;
spin_unlock ( & tp - > lock ) ;
}
static void fl_hw_del ( struct tcf_proto * tp , void * type_data )
{
struct flow_cls_offload * cls_flower = type_data ;
struct cls_fl_filter * f =
( struct cls_fl_filter * ) cls_flower - > cookie ;
spin_lock ( & tp - > lock ) ;
if ( ! list_empty ( & f - > hw_list ) )
list_del_init ( & f - > hw_list ) ;
spin_unlock ( & tp - > lock ) ;
}
2019-02-02 14:50:43 +03:00
static int fl_hw_create_tmplt ( struct tcf_chain * chain ,
struct fl_flow_tmplt * tmplt )
2018-07-23 10:23:11 +03:00
{
2019-07-09 23:55:49 +03:00
struct flow_cls_offload cls_flower = { } ;
2018-07-23 10:23:11 +03:00
struct tcf_block * block = chain - > block ;
2019-02-02 14:50:45 +03:00
cls_flower . rule = flow_rule_alloc ( 0 ) ;
2019-02-02 14:50:43 +03:00
if ( ! cls_flower . rule )
return - ENOMEM ;
2018-07-23 10:23:11 +03:00
cls_flower . common . chain_index = chain - > index ;
2019-07-09 23:55:49 +03:00
cls_flower . command = FLOW_CLS_TMPLT_CREATE ;
2018-07-23 10:23:11 +03:00
cls_flower . cookie = ( unsigned long ) tmplt ;
2019-02-02 14:50:43 +03:00
cls_flower . rule - > match . dissector = & tmplt - > dissector ;
cls_flower . rule - > match . mask = & tmplt - > mask ;
cls_flower . rule - > match . key = & tmplt - > dummy_key ;
2018-07-23 10:23:11 +03:00
/* We don't care if driver (any of them) fails to handle this
* call. It serves just as a hint for it.
*/
2019-08-26 16:44:59 +03:00
tc_setup_cb_call ( block , TC_SETUP_CLSFLOWER , & cls_flower , false , true ) ;
2019-02-02 14:50:43 +03:00
kfree ( cls_flower . rule ) ;
return 0 ;
2018-07-23 10:23:11 +03:00
}
static void fl_hw_destroy_tmplt ( struct tcf_chain * chain ,
struct fl_flow_tmplt * tmplt )
{
2019-07-09 23:55:49 +03:00
struct flow_cls_offload cls_flower = { } ;
2018-07-23 10:23:11 +03:00
struct tcf_block * block = chain - > block ;
cls_flower . common . chain_index = chain - > index ;
2019-07-09 23:55:49 +03:00
cls_flower . command = FLOW_CLS_TMPLT_DESTROY ;
2018-07-23 10:23:11 +03:00
cls_flower . cookie = ( unsigned long ) tmplt ;
2019-08-26 16:44:59 +03:00
tc_setup_cb_call ( block , TC_SETUP_CLSFLOWER , & cls_flower , false , true ) ;
2018-07-23 10:23:11 +03:00
}
2018-07-23 10:23:10 +03:00
static void * fl_tmplt_create ( struct net * net , struct tcf_chain * chain ,
struct nlattr * * tca ,
struct netlink_ext_ack * extack )
{
struct fl_flow_tmplt * tmplt ;
struct nlattr * * tb ;
int err ;
if ( ! tca [ TCA_OPTIONS ] )
return ERR_PTR ( - EINVAL ) ;
tb = kcalloc ( TCA_FLOWER_MAX + 1 , sizeof ( struct nlattr * ) , GFP_KERNEL ) ;
if ( ! tb )
return ERR_PTR ( - ENOBUFS ) ;
2019-04-26 15:07:28 +03:00
err = nla_parse_nested_deprecated ( tb , TCA_FLOWER_MAX ,
tca [ TCA_OPTIONS ] , fl_policy , NULL ) ;
2018-07-23 10:23:10 +03:00
if ( err )
goto errout_tb ;
tmplt = kzalloc ( sizeof ( * tmplt ) , GFP_KERNEL ) ;
2018-08-03 22:27:55 +03:00
if ( ! tmplt ) {
err = - ENOMEM ;
2018-07-23 10:23:10 +03:00
goto errout_tb ;
2018-08-03 22:27:55 +03:00
}
2018-07-23 10:23:10 +03:00
tmplt - > chain = chain ;
err = fl_set_key ( net , tb , & tmplt - > dummy_key , & tmplt - > mask , extack ) ;
if ( err )
goto errout_tmplt ;
fl_init_dissector ( & tmplt - > dissector , & tmplt - > mask ) ;
2019-02-02 14:50:43 +03:00
err = fl_hw_create_tmplt ( chain , tmplt ) ;
if ( err )
goto errout_tmplt ;
2018-07-23 10:23:11 +03:00
2019-02-02 14:50:43 +03:00
kfree ( tb ) ;
2018-07-23 10:23:10 +03:00
return tmplt ;
errout_tmplt :
kfree ( tmplt ) ;
errout_tb :
kfree ( tb ) ;
return ERR_PTR ( err ) ;
}
2018-09-20 02:37:29 +03:00
static void fl_tmplt_destroy ( void * tmplt_priv )
{
struct fl_flow_tmplt * tmplt = tmplt_priv ;
2018-10-02 22:50:19 +03:00
fl_hw_destroy_tmplt ( tmplt - > chain , tmplt ) ;
kfree ( tmplt ) ;
2018-09-20 02:37:29 +03:00
}
2015-05-12 15:56:21 +03:00
static int fl_dump_key_val ( struct sk_buff * skb ,
void * val , int val_type ,
void * mask , int mask_type , int len )
{
int err ;
if ( ! memchr_inv ( mask , 0 , len ) )
return 0 ;
err = nla_put ( skb , val_type , len , val ) ;
if ( err )
return err ;
if ( mask_type ! = TCA_FLOWER_UNSPEC ) {
err = nla_put ( skb , mask_type , len , mask ) ;
if ( err )
return err ;
}
return 0 ;
}
2018-11-13 03:15:55 +03:00
static int fl_dump_key_port_range ( struct sk_buff * skb , struct fl_flow_key * key ,
struct fl_flow_key * mask )
{
2019-12-03 13:40:12 +03:00
if ( fl_dump_key_val ( skb , & key - > tp_range . tp_min . dst ,
TCA_FLOWER_KEY_PORT_DST_MIN ,
& mask - > tp_range . tp_min . dst , TCA_FLOWER_UNSPEC ,
sizeof ( key - > tp_range . tp_min . dst ) ) | |
fl_dump_key_val ( skb , & key - > tp_range . tp_max . dst ,
TCA_FLOWER_KEY_PORT_DST_MAX ,
& mask - > tp_range . tp_max . dst , TCA_FLOWER_UNSPEC ,
sizeof ( key - > tp_range . tp_max . dst ) ) | |
fl_dump_key_val ( skb , & key - > tp_range . tp_min . src ,
TCA_FLOWER_KEY_PORT_SRC_MIN ,
& mask - > tp_range . tp_min . src , TCA_FLOWER_UNSPEC ,
sizeof ( key - > tp_range . tp_min . src ) ) | |
fl_dump_key_val ( skb , & key - > tp_range . tp_max . src ,
TCA_FLOWER_KEY_PORT_SRC_MAX ,
& mask - > tp_range . tp_max . src , TCA_FLOWER_UNSPEC ,
sizeof ( key - > tp_range . tp_max . src ) ) )
2018-11-13 03:15:55 +03:00
return - 1 ;
return 0 ;
}
cls_flower: Support filtering on multiple MPLS Label Stack Entries
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:04 +03:00
static int fl_dump_key_mpls_opt_lse ( struct sk_buff * skb ,
struct flow_dissector_key_mpls * mpls_key ,
struct flow_dissector_key_mpls * mpls_mask ,
u8 lse_index )
{
struct flow_dissector_mpls_lse * lse_mask = & mpls_mask - > ls [ lse_index ] ;
struct flow_dissector_mpls_lse * lse_key = & mpls_key - > ls [ lse_index ] ;
int err ;
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH ,
lse_index + 1 ) ;
if ( err )
return err ;
if ( lse_mask - > mpls_ttl ) {
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL ,
lse_key - > mpls_ttl ) ;
if ( err )
return err ;
}
if ( lse_mask - > mpls_bos ) {
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS ,
lse_key - > mpls_bos ) ;
if ( err )
return err ;
}
if ( lse_mask - > mpls_tc ) {
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_OPT_LSE_TC ,
lse_key - > mpls_tc ) ;
if ( err )
return err ;
}
if ( lse_mask - > mpls_label ) {
2020-12-09 18:48:41 +03:00
err = nla_put_u32 ( skb , TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL ,
lse_key - > mpls_label ) ;
2020-05-26 15:29:04 +03:00
if ( err )
return err ;
}
return 0 ;
}
static int fl_dump_key_mpls_opts ( struct sk_buff * skb ,
struct flow_dissector_key_mpls * mpls_key ,
struct flow_dissector_key_mpls * mpls_mask )
{
struct nlattr * opts ;
struct nlattr * lse ;
u8 lse_index ;
int err ;
opts = nla_nest_start ( skb , TCA_FLOWER_KEY_MPLS_OPTS ) ;
if ( ! opts )
return - EMSGSIZE ;
for ( lse_index = 0 ; lse_index < FLOW_DIS_MPLS_MAX ; lse_index + + ) {
if ( ! ( mpls_mask - > used_lses & 1 < < lse_index ) )
continue ;
lse = nla_nest_start ( skb , TCA_FLOWER_KEY_MPLS_OPTS_LSE ) ;
if ( ! lse ) {
err = - EMSGSIZE ;
goto err_opts ;
}
err = fl_dump_key_mpls_opt_lse ( skb , mpls_key , mpls_mask ,
lse_index ) ;
if ( err )
goto err_opts_lse ;
nla_nest_end ( skb , lse ) ;
}
nla_nest_end ( skb , opts ) ;
return 0 ;
err_opts_lse :
nla_nest_cancel ( skb , lse ) ;
err_opts :
nla_nest_cancel ( skb , opts ) ;
return err ;
}
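The loop in fl_dump_key_mpls_opts() walks the used_lses bit field and emits one nested LSE per set bit, with the dumped depth being the index plus one. A minimal user-space sketch of that selection step follows. DEMO_MPLS_MAX and demo_used_depths() are hypothetical names; the value 7 is taken from the FLOW_DIS_MPLS_MAX figure quoted in the flow_dissector commit message, and no netlink attributes are built here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror FLOW_DIS_MPLS_MAX as stated in the commit message. */
#define DEMO_MPLS_MAX 7

/* Collect the 1-based depths of all LSEs flagged in used_lses.
 * Returns the number of depths written to 'depths'.
 */
static size_t demo_used_depths(uint8_t used_lses,
                               uint8_t depths[DEMO_MPLS_MAX])
{
    size_t n = 0;

    assert(depths != NULL);
    for (uint8_t i = 0; i < DEMO_MPLS_MAX; i++) {
        if (!(used_lses & (1u << i)))
            continue; /* this LSE is not part of the match */
        depths[n++] = (uint8_t)(i + 1); /* depth is index + 1 */
    }
    return n;
}
```

For a mask of 0x05 this yields depths 1 and 3, which corresponds to two TCA_FLOWER_KEY_MPLS_OPTS_LSE nests in the dump.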
2017-04-22 23:52:47 +03:00
static int fl_dump_key_mpls ( struct sk_buff * skb ,
struct flow_dissector_key_mpls * mpls_key ,
struct flow_dissector_key_mpls * mpls_mask )
{
flow_dissector: Parse multiple MPLS Label Stack Entries
The current MPLS dissector only parses the first MPLS Label Stack
Entry (second LSE can be parsed too, but only to set a key_id).
This patch adds the possibility to parse several LSEs by making
__skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
as the Bottom Of Stack bit hasn't been seen, up to a maximum of
FLOW_DIS_MPLS_MAX entries.
FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
many practical purposes, without wasting too much space.
To record the parsed values, flow_dissector_key_mpls is modified to
store an array of stack entries, instead of just the values of the
first one. A bit field, "used_lses", is also added to keep track of
the LSEs that have been set. The objective is to avoid defining a
new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
TC flower is adapted for the new struct flow_dissector_key_mpls layout.
Matching on several MPLS Label Stack Entries will be added in the next
patch.
The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
mlx5's parse_tunnel() now verify that the rule only uses the first LSE
and fail if it doesn't.
Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
slightly modified. Instead of recording the first Entropy Label, it
now records the last one. This shouldn't have any consequences since
there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
in the tree. We'd probably better do a hash of all parsed MPLS labels
instead (excluding reserved labels) anyway. That'd give better entropy
and would probably also simplify the code. But that's not the purpose
of this patch, so I'm keeping that as a future possible improvement.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:00 +03:00
struct flow_dissector_mpls_lse * lse_mask ;
struct flow_dissector_mpls_lse * lse_key ;
2017-04-22 23:52:47 +03:00
int err ;
cls_flower: Support filtering on multiple MPLS Label Stack Entries
With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.
In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).
For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.
The new API also allows specifying only an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater than or
equal to the provided depth (that is, an LSE exists at this depth).
Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26 15:29:04 +03:00
if ( ! mpls_mask - > used_lses )
2017-04-22 23:52:47 +03:00
return 0 ;
2020-05-26 15:29:00 +03:00
lse_mask = & mpls_mask - > ls [ 0 ] ;
lse_key = & mpls_key - > ls [ 0 ] ;
2020-05-26 15:29:04 +03:00
	/* For backward compatibility, don't use the MPLS nested attributes if
	 * the rule can be expressed using the old attributes.
	 */
	if (mpls_mask->used_lses & ~1 ||
	    (!lse_mask->mpls_ttl && !lse_mask->mpls_bos &&
	     !lse_mask->mpls_tc && !lse_mask->mpls_label))
		return fl_dump_key_mpls_opts(skb, mpls_key, mpls_mask);
2020-05-26 15:29:00 +03:00
if ( lse_mask - > mpls_ttl ) {
2017-04-22 23:52:47 +03:00
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_TTL ,
2020-05-26 15:29:00 +03:00
lse_key - > mpls_ttl ) ;
2017-04-22 23:52:47 +03:00
if ( err )
return err ;
}
2020-05-26 15:29:00 +03:00
if ( lse_mask - > mpls_tc ) {
2017-04-22 23:52:47 +03:00
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_TC ,
2020-05-26 15:29:00 +03:00
lse_key - > mpls_tc ) ;
2017-04-22 23:52:47 +03:00
if ( err )
return err ;
}
2020-05-26 15:29:00 +03:00
if ( lse_mask - > mpls_label ) {
2017-04-22 23:52:47 +03:00
err = nla_put_u32 ( skb , TCA_FLOWER_KEY_MPLS_LABEL ,
2020-05-26 15:29:00 +03:00
lse_key - > mpls_label ) ;
2017-04-22 23:52:47 +03:00
if ( err )
return err ;
}
2020-05-26 15:29:00 +03:00
if ( lse_mask - > mpls_bos ) {
2017-04-22 23:52:47 +03:00
err = nla_put_u8 ( skb , TCA_FLOWER_KEY_MPLS_BOS ,
2020-05-26 15:29:00 +03:00
lse_key - > mpls_bos ) ;
2017-04-22 23:52:47 +03:00
if ( err )
return err ;
}
return 0 ;
}
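The choice between the legacy attributes and the nested TCA_FLOWER_KEY_MPLS_OPTS
encoding made in fl_dump_key_mpls() can be looked at in isolation. What follows
is a minimal user-space sketch, not kernel code: struct lse_mask, struct
mpls_mask and mpls_dump_needs_nested() are hypothetical stand-ins for
flow_dissector_mpls_lse and flow_dissector_key_mpls, the field widths are
assumed, and the helper expects used_lses to be non-zero (the function above
returns early otherwise).

```c
#include <stdbool.h>
#include <stdint.h>

#define MPLS_MAX 7 /* mirrors FLOW_DIS_MPLS_MAX as stated in the commit message */

/* Hypothetical stand-in for the mask of one LSE; field widths are assumed. */
struct lse_mask {
	uint32_t label;
	uint8_t ttl;
	uint8_t bos;
	uint8_t tc;
};

/* Hypothetical stand-in for the MPLS mask: bit i of used_lses marks LSE i. */
struct mpls_mask {
	struct lse_mask ls[MPLS_MAX];
	uint8_t used_lses;
};

/*
 * Returns true when the rule cannot be expressed with the legacy
 * attributes: either an LSE other than the first is used, or the first
 * LSE is used without matching any of its fields (depth-only match).
 * Assumes m->used_lses != 0.
 */
static bool mpls_dump_needs_nested(const struct mpls_mask *m)
{
	const struct lse_mask *first = &m->ls[0];

	if (m->used_lses & ~1u)
		return true;
	return !first->ttl && !first->bos && !first->tc && !first->label;
}
```

A rule that matches only the label of the first LSE would therefore stay on
the legacy attributes, while a depth-only match on the first LSE, or any
match on a deeper LSE, would go through the nested encoding.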
2018-07-17 19:27:18 +03:00
static int fl_dump_key_ip ( struct sk_buff * skb , bool encap ,
2017-06-01 21:37:38 +03:00
struct flow_dissector_key_ip * key ,
struct flow_dissector_key_ip * mask )
{
2018-07-17 19:27:18 +03:00
	int tos_key = encap ? TCA_FLOWER_KEY_ENC_IP_TOS : TCA_FLOWER_KEY_IP_TOS;
	int ttl_key = encap ? TCA_FLOWER_KEY_ENC_IP_TTL : TCA_FLOWER_KEY_IP_TTL;
	int tos_mask = encap ? TCA_FLOWER_KEY_ENC_IP_TOS_MASK : TCA_FLOWER_KEY_IP_TOS_MASK;
	int ttl_mask = encap ? TCA_FLOWER_KEY_ENC_IP_TTL_MASK : TCA_FLOWER_KEY_IP_TTL_MASK;
	if (fl_dump_key_val(skb, &key->tos, tos_key, &mask->tos, tos_mask, sizeof(key->tos)) ||
	    fl_dump_key_val(skb, &key->ttl, ttl_key, &mask->ttl, ttl_mask, sizeof(key->ttl)))
2017-06-01 21:37:38 +03:00
return - 1 ;
return 0 ;
}
2016-08-17 13:36:13 +03:00
static int fl_dump_key_vlan ( struct sk_buff * skb ,
2018-07-06 08:38:16 +03:00
int vlan_id_key , int vlan_prio_key ,
2016-08-17 13:36:13 +03:00
struct flow_dissector_key_vlan * vlan_key ,
struct flow_dissector_key_vlan * vlan_mask )
{
int err ;
if ( ! memchr_inv ( vlan_mask , 0 , sizeof ( * vlan_mask ) ) )
return 0 ;
if ( vlan_mask - > vlan_id ) {
2018-07-06 08:38:16 +03:00
err = nla_put_u16 ( skb , vlan_id_key ,
2016-08-17 13:36:13 +03:00
vlan_key - > vlan_id ) ;
if ( err )
return err ;
}
if ( vlan_mask - > vlan_priority ) {
2018-07-06 08:38:16 +03:00
err = nla_put_u8 ( skb , vlan_prio_key ,
2016-08-17 13:36:13 +03:00
vlan_key - > vlan_priority ) ;
if ( err )
return err ;
}
return 0 ;
}
2016-12-07 15:03:10 +03:00
static void fl_get_key_flag(u32 dissector_key, u32 dissector_mask,
			    u32 *flower_key, u32 *flower_mask,
			    u32 flower_flag_bit, u32 dissector_flag_bit)
{
	if (dissector_mask & dissector_flag_bit) {
		*flower_mask |= flower_flag_bit;
		if (dissector_key & dissector_flag_bit)
			*flower_key |= flower_flag_bit;
	}
}
static int fl_dump_key_flags(struct sk_buff *skb, u32 flags_key, u32 flags_mask)
{
	u32 key, mask;
	__be32 _key, _mask;
	int err;
	if (!memchr_inv(&flags_mask, 0, sizeof(flags_mask)))
		return 0;
	key = 0;
	mask = 0;
	fl_get_key_flag(flags_key, flags_mask, &key, &mask,
			TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
2018-03-06 20:11:14 +03:00
fl_get_key_flag ( flags_key , flags_mask , & key , & mask ,
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST ,
FLOW_DIS_FIRST_FRAG ) ;
2016-12-07 15:03:10 +03:00
	_key = cpu_to_be32(key);
	_mask = cpu_to_be32(mask);
	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS, 4, &_key);
	if (err)
		return err;
	return nla_put(skb, TCA_FLOWER_KEY_FLAGS_MASK, 4, &_mask);
}
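The per-bit translation done by fl_get_key_flag() can be tried outside the
kernel. The sketch below is an illustration only: the SRC_*/DST_* bit values
and the names translate_flag() and translate_all() are hypothetical and do not
correspond to the real FLOW_DIS_* or TCA_FLOWER_KEY_FLAGS_* constants. It shows
that a destination key bit is only ever set when the matching mask bit is set.

```c
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this illustration. */
#define SRC_FLAG_A (1u << 0)
#define SRC_FLAG_B (1u << 1)
#define DST_FLAG_A (1u << 4)
#define DST_FLAG_B (1u << 5)

/* Copy one flag from the source numbering into the destination numbering. */
static void translate_flag(uint32_t src_key, uint32_t src_mask,
			   uint32_t *dst_key, uint32_t *dst_mask,
			   uint32_t dst_bit, uint32_t src_bit)
{
	if (src_mask & src_bit) {
		*dst_mask |= dst_bit;
		if (src_key & src_bit)
			*dst_key |= dst_bit;
	}
}

/* Translate every known flag; the outputs start from zero. */
static void translate_all(uint32_t src_key, uint32_t src_mask,
			  uint32_t *dst_key, uint32_t *dst_mask)
{
	*dst_key = 0;
	*dst_mask = 0;
	translate_flag(src_key, src_mask, dst_key, dst_mask,
		       DST_FLAG_A, SRC_FLAG_A);
	translate_flag(src_key, src_mask, dst_key, dst_mask,
		       DST_FLAG_B, SRC_FLAG_B);
}
```

A key bit that is set but not covered by the mask is dropped, so the dumped
key never carries bits that the filter does not actually match on.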
2018-08-07 18:36:01 +03:00
static int fl_dump_key_geneve_opt ( struct sk_buff * skb ,
struct flow_dissector_key_enc_opts * enc_opts )
{
struct geneve_opt * opt ;
struct nlattr * nest ;
int opt_off = 0 ;
2019-04-26 12:13:06 +03:00
nest = nla_nest_start_noflag ( skb , TCA_FLOWER_KEY_ENC_OPTS_GENEVE ) ;
2018-08-07 18:36:01 +03:00
if ( ! nest )
goto nla_put_failure ;
	while (enc_opts->len > opt_off) {
		opt = (struct geneve_opt *)&enc_opts->data[opt_off];
		if (nla_put_be16(skb, TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS,
				 opt->opt_class))
			goto nla_put_failure;
		if (nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_GENEVE_TYPE,
			       opt->type))
			goto nla_put_failure;
		if (nla_put(skb, TCA_FLOWER_KEY_ENC_OPT_GENEVE_DATA,
			    opt->length * 4, opt->opt_data))
			goto nla_put_failure;
		opt_off += sizeof(struct geneve_opt) + opt->length * 4;
	}
	nla_nest_end(skb, nest);
	return 0;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}
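The loop in fl_dump_key_geneve_opt() steps through a packed buffer of
variable-length options, advancing by the header size plus the payload length
expressed in 4-byte words. A minimal user-space sketch of that walk is given
below. It is not the kernel's code: struct opt_hdr is a simplified,
hypothetical 4-byte header (the real struct geneve_opt uses bit-fields),
count_opts() is an invented name, and the bounds checks are an addition that
the function above does not perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical option header; 4 bytes, no padding expected. */
struct opt_hdr {
	uint16_t opt_class;
	uint8_t type;
	uint8_t length; /* payload length in 4-byte words */
};

/*
 * Walk a packed option buffer and count the options in it.
 * Returns the number of options, or -1 if an option is truncated.
 */
static int count_opts(const uint8_t *data, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (len > off) {
		struct opt_hdr hdr;
		size_t step;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access through a cast pointer */
		memcpy(&hdr, data + off, sizeof(hdr));
		step = sizeof(hdr) + (size_t)hdr.length * 4;
		if (step > len - off)
			return -1;
		off += step;
		n++;
	}
	return n;
}
```

The stride computation is the same as in the kernel loop; only the validation
of the remaining length differs, since this sketch treats the buffer as
untrusted input.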
2019-11-21 13:03:28 +03:00
static int fl_dump_key_vxlan_opt(struct sk_buff *skb,
				 struct flow_dissector_key_enc_opts *enc_opts)
{
	struct vxlan_metadata *md;
	struct nlattr *nest;
	nest = nla_nest_start_noflag(skb, TCA_FLOWER_KEY_ENC_OPTS_VXLAN);
	if (!nest)
		goto nla_put_failure;
	md = (struct vxlan_metadata *)&enc_opts->data[0];
	if (nla_put_u32(skb, TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP, md->gbp))
		goto nla_put_failure;
	nla_nest_end(skb, nest);
	return 0;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}
2019-11-21 13:03:29 +03:00
static int fl_dump_key_erspan_opt(struct sk_buff *skb,
				  struct flow_dissector_key_enc_opts *enc_opts)
{
	struct erspan_metadata *md;
	struct nlattr *nest;
	nest = nla_nest_start_noflag(skb, TCA_FLOWER_KEY_ENC_OPTS_ERSPAN);
	if (!nest)
		goto nla_put_failure;
	md = (struct erspan_metadata *)&enc_opts->data[0];
	if (nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER, md->version))
		goto nla_put_failure;
	if (md->version == 1 &&
	    nla_put_be32(skb, TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX, md->u.index))
		goto nla_put_failure;
	if (md->version == 2 &&
	    (nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR,
			md->u.md2.dir) ||
	     nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID,
			get_hwid(&md->u.md2))))
		goto nla_put_failure;
	nla_nest_end(skb, nest);
	return 0;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}
2022-03-04 19:40:45 +03:00
static int fl_dump_key_gtp_opt(struct sk_buff *skb,
			       struct flow_dissector_key_enc_opts *enc_opts)
{
	struct gtp_pdu_session_info *session_info;
	struct nlattr *nest;
	nest = nla_nest_start_noflag(skb, TCA_FLOWER_KEY_ENC_OPTS_GTP);
	if (!nest)
		goto nla_put_failure;
	session_info = (struct gtp_pdu_session_info *)&enc_opts->data[0];
	if (nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_GTP_PDU_TYPE,
		       session_info->pdu_type))
		goto nla_put_failure;
	if (nla_put_u8(skb, TCA_FLOWER_KEY_ENC_OPT_GTP_QFI, session_info->qfi))
		goto nla_put_failure;
	nla_nest_end(skb, nest);
	return 0;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}
2019-07-09 10:30:50 +03:00
static int fl_dump_key_ct(struct sk_buff *skb,
			  struct flow_dissector_key_ct *key,
			  struct flow_dissector_key_ct *mask)
{
	if (IS_ENABLED(CONFIG_NF_CONNTRACK) &&
	    fl_dump_key_val(skb, &key->ct_state, TCA_FLOWER_KEY_CT_STATE,
			    &mask->ct_state, TCA_FLOWER_KEY_CT_STATE_MASK,
			    sizeof(key->ct_state)))
		goto nla_put_failure;
	if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
	    fl_dump_key_val(skb, &key->ct_zone, TCA_FLOWER_KEY_CT_ZONE,
			    &mask->ct_zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
			    sizeof(key->ct_zone)))
		goto nla_put_failure;
	if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
	    fl_dump_key_val(skb, &key->ct_mark, TCA_FLOWER_KEY_CT_MARK,
			    &mask->ct_mark, TCA_FLOWER_KEY_CT_MARK_MASK,
			    sizeof(key->ct_mark)))
		goto nla_put_failure;
	if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
	    fl_dump_key_val(skb, &key->ct_labels, TCA_FLOWER_KEY_CT_LABELS,
			    &mask->ct_labels, TCA_FLOWER_KEY_CT_LABELS_MASK,
			    sizeof(key->ct_labels)))
		goto nla_put_failure;
	return 0;
nla_put_failure:
	return -EMSGSIZE;
}
2018-08-07 18:36:01 +03:00
static int fl_dump_key_options ( struct sk_buff * skb , int enc_opt_type ,
struct flow_dissector_key_enc_opts * enc_opts )
{
struct nlattr * nest ;
int err ;
if ( ! enc_opts - > len )
return 0 ;
2019-04-26 12:13:06 +03:00
nest = nla_nest_start_noflag ( skb , enc_opt_type ) ;
2018-08-07 18:36:01 +03:00
if ( ! nest )
goto nla_put_failure ;
switch ( enc_opts - > dst_opt_type ) {
case TUNNEL_GENEVE_OPT :
err = fl_dump_key_geneve_opt ( skb , enc_opts ) ;
if ( err )
goto nla_put_failure ;
break ;
2019-11-21 13:03:28 +03:00
case TUNNEL_VXLAN_OPT :
err = fl_dump_key_vxlan_opt ( skb , enc_opts ) ;
if ( err )
goto nla_put_failure ;
break ;
2019-11-21 13:03:29 +03:00
case TUNNEL_ERSPAN_OPT :
err = fl_dump_key_erspan_opt ( skb , enc_opts ) ;
if ( err )
goto nla_put_failure ;
break ;
2022-03-04 19:40:45 +03:00
case TUNNEL_GTP_OPT :
err = fl_dump_key_gtp_opt ( skb , enc_opts ) ;
if ( err )
goto nla_put_failure ;
break ;
2018-08-07 18:36:01 +03:00
	default:
		goto nla_put_failure;
	}
	nla_nest_end(skb, nest);
	return 0;
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}
static int fl_dump_key_enc_opt(struct sk_buff *skb,
			       struct flow_dissector_key_enc_opts *key_opts,
			       struct flow_dissector_key_enc_opts *msk_opts)
{
	int err;
	err = fl_dump_key_options(skb, TCA_FLOWER_KEY_ENC_OPTS, key_opts);
	if (err)
		return err;
	return fl_dump_key_options(skb, TCA_FLOWER_KEY_ENC_OPTS_MASK, msk_opts);
}
2018-07-23 10:23:08 +03:00
static int fl_dump_key ( struct sk_buff * skb , struct net * net ,
struct fl_flow_key * key , struct fl_flow_key * mask )
2015-05-12 15:56:21 +03:00
{
2019-06-19 09:41:03 +03:00
if ( mask - > meta . ingress_ifindex ) {
2015-05-12 15:56:21 +03:00
struct net_device * dev ;
2019-06-19 09:41:03 +03:00
dev = __dev_get_by_index ( net , key - > meta . ingress_ifindex ) ;
2015-05-12 15:56:21 +03:00
if ( dev & & nla_put_string ( skb , TCA_FLOWER_INDEV , dev - > name ) )
goto nla_put_failure ;
}
	if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
			    mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
			    sizeof(key->eth.dst)) ||
	    fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
			    mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
			    sizeof(key->eth.src)) ||
	    fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
			    &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
			    sizeof(key->basic.n_proto)))
		goto nla_put_failure;

	if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
		goto nla_put_failure;

	if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
			     TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
		goto nla_put_failure;

	if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_CVLAN_ID,
			     TCA_FLOWER_KEY_CVLAN_PRIO,
			     &key->cvlan, &mask->cvlan) ||
	    (mask->cvlan.vlan_tpid &&
	     nla_put_be16(skb, TCA_FLOWER_KEY_VLAN_ETH_TYPE,
			  key->cvlan.vlan_tpid)))
		goto nla_put_failure;

	if (mask->basic.n_proto) {
		if (mask->cvlan.vlan_eth_type) {
			if (nla_put_be16(skb, TCA_FLOWER_KEY_CVLAN_ETH_TYPE,
					 key->basic.n_proto))
				goto nla_put_failure;
		} else if (mask->vlan.vlan_eth_type) {
			if (nla_put_be16(skb, TCA_FLOWER_KEY_VLAN_ETH_TYPE,
					 key->vlan.vlan_eth_type))
				goto nla_put_failure;
		}
	}

	if ((key->basic.n_proto == htons(ETH_P_IP) ||
	     key->basic.n_proto == htons(ETH_P_IPV6)) &&
	    (fl_dump_key_val(skb, &key->basic.ip_proto, TCA_FLOWER_KEY_IP_PROTO,
			     &mask->basic.ip_proto, TCA_FLOWER_UNSPEC,
			     sizeof(key->basic.ip_proto)) ||
	    fl_dump_key_ip(skb, false, &key->ip, &mask->ip)))
		goto nla_put_failure;

	if (key->control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
	    (fl_dump_key_val(skb, &key->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC,
			     &mask->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC_MASK,
			     sizeof(key->ipv4.src)) ||
	     fl_dump_key_val(skb, &key->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST,
			     &mask->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST_MASK,
			     sizeof(key->ipv4.dst))))
		goto nla_put_failure;
	else if (key->control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
		 (fl_dump_key_val(skb, &key->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC,
				  &mask->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC_MASK,
				  sizeof(key->ipv6.src)) ||
		  fl_dump_key_val(skb, &key->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST,
				  &mask->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST_MASK,
				  sizeof(key->ipv6.dst))))
		goto nla_put_failure;

	if (key->basic.ip_proto == IPPROTO_TCP &&
	    (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_TCP_SRC,
			     &mask->tp.src, TCA_FLOWER_KEY_TCP_SRC_MASK,
			     sizeof(key->tp.src)) ||
	     fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
			     &mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
			     sizeof(key->tp.dst)) ||
	     fl_dump_key_val(skb, &key->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS,
			     &mask->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS_MASK,
			     sizeof(key->tcp.flags))))
		goto nla_put_failure;
	else if (key->basic.ip_proto == IPPROTO_UDP &&
		 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
				  &mask->tp.src, TCA_FLOWER_KEY_UDP_SRC_MASK,
				  sizeof(key->tp.src)) ||
		  fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_UDP_DST,
				  &mask->tp.dst, TCA_FLOWER_KEY_UDP_DST_MASK,
				  sizeof(key->tp.dst))))
		goto nla_put_failure;
	else if (key->basic.ip_proto == IPPROTO_SCTP &&
		 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_SCTP_SRC,
				  &mask->tp.src, TCA_FLOWER_KEY_SCTP_SRC_MASK,
				  sizeof(key->tp.src)) ||
		  fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_SCTP_DST,
				  &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
				  sizeof(key->tp.dst))))
		goto nla_put_failure;
	else if (key->basic.n_proto == htons(ETH_P_IP) &&
		 key->basic.ip_proto == IPPROTO_ICMP &&
		 (fl_dump_key_val(skb, &key->icmp.type,
				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
				  sizeof(key->icmp.type)) ||
		  fl_dump_key_val(skb, &key->icmp.code,
				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
				  sizeof(key->icmp.code))))
		goto nla_put_failure;
	else if (key->basic.n_proto == htons(ETH_P_IPV6) &&
		 key->basic.ip_proto == IPPROTO_ICMPV6 &&
		 (fl_dump_key_val(skb, &key->icmp.type,
				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
				  sizeof(key->icmp.type)) ||
		  fl_dump_key_val(skb, &key->icmp.code,
				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
				  sizeof(key->icmp.code))))
		goto nla_put_failure;
	else if ((key->basic.n_proto == htons(ETH_P_ARP) ||
		  key->basic.n_proto == htons(ETH_P_RARP)) &&
		 (fl_dump_key_val(skb, &key->arp.sip,
				  TCA_FLOWER_KEY_ARP_SIP, &mask->arp.sip,
				  TCA_FLOWER_KEY_ARP_SIP_MASK,
				  sizeof(key->arp.sip)) ||
		  fl_dump_key_val(skb, &key->arp.tip,
				  TCA_FLOWER_KEY_ARP_TIP, &mask->arp.tip,
				  TCA_FLOWER_KEY_ARP_TIP_MASK,
				  sizeof(key->arp.tip)) ||
		  fl_dump_key_val(skb, &key->arp.op,
				  TCA_FLOWER_KEY_ARP_OP, &mask->arp.op,
				  TCA_FLOWER_KEY_ARP_OP_MASK,
				  sizeof(key->arp.op)) ||
		  fl_dump_key_val(skb, key->arp.sha, TCA_FLOWER_KEY_ARP_SHA,
				  mask->arp.sha, TCA_FLOWER_KEY_ARP_SHA_MASK,
				  sizeof(key->arp.sha)) ||
		  fl_dump_key_val(skb, key->arp.tha, TCA_FLOWER_KEY_ARP_THA,
				  mask->arp.tha, TCA_FLOWER_KEY_ARP_THA_MASK,
				  sizeof(key->arp.tha))))
		goto nla_put_failure;

	if ((key->basic.ip_proto == IPPROTO_TCP ||
	     key->basic.ip_proto == IPPROTO_UDP ||
	     key->basic.ip_proto == IPPROTO_SCTP) &&
	     fl_dump_key_port_range(skb, key, mask))
		goto nla_put_failure;

	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
	    (fl_dump_key_val(skb, &key->enc_ipv4.src,
			    TCA_FLOWER_KEY_ENC_IPV4_SRC, &mask->enc_ipv4.src,
			    TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
			    sizeof(key->enc_ipv4.src)) ||
	     fl_dump_key_val(skb, &key->enc_ipv4.dst,
			     TCA_FLOWER_KEY_ENC_IPV4_DST, &mask->enc_ipv4.dst,
			     TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
			     sizeof(key->enc_ipv4.dst))))
		goto nla_put_failure;
	else if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
		 (fl_dump_key_val(skb, &key->enc_ipv6.src,
				  TCA_FLOWER_KEY_ENC_IPV6_SRC, &mask->enc_ipv6.src,
				  TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
				  sizeof(key->enc_ipv6.src)) ||
		  fl_dump_key_val(skb, &key->enc_ipv6.dst,
				  TCA_FLOWER_KEY_ENC_IPV6_DST,
				  &mask->enc_ipv6.dst,
				  TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
				  sizeof(key->enc_ipv6.dst))))
		goto nla_put_failure;

	if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
			    &mask->enc_key_id, TCA_FLOWER_UNSPEC,
			    sizeof(key->enc_key_id)) ||
	    fl_dump_key_val(skb, &key->enc_tp.src,
			    TCA_FLOWER_KEY_ENC_UDP_SRC_PORT,
			    &mask->enc_tp.src,
			    TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,
			    sizeof(key->enc_tp.src)) ||
	    fl_dump_key_val(skb, &key->enc_tp.dst,
			    TCA_FLOWER_KEY_ENC_UDP_DST_PORT,
			    &mask->enc_tp.dst,
			    TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
			    sizeof(key->enc_tp.dst)) ||
	    fl_dump_key_ip(skb, true, &key->enc_ip, &mask->enc_ip) ||
	    fl_dump_key_enc_opt(skb, &key->enc_opts, &mask->enc_opts))
		goto nla_put_failure;

	if (fl_dump_key_ct(skb, &key->ct, &mask->ct))
		goto nla_put_failure;

	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
		goto nla_put_failure;

	if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
			    &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
			    sizeof(key->hash.hash)))
		goto nla_put_failure;

	return 0;

nla_put_failure:
	return -EMSGSIZE;
}

static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
		   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
{
	struct cls_fl_filter *f = fh;
	struct nlattr *nest;
	struct fl_flow_key *key, *mask;
	bool skip_hw;

	if (!f)
		return skb->len;

	t->tcm_handle = f->handle;

	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
	if (!nest)
		goto nla_put_failure;

	spin_lock(&tp->lock);

	if (f->res.classid &&
	    nla_put_u32(skb, TCA_FLOWER_CLASSID, f->res.classid))
		goto nla_put_failure_locked;

	key = &f->key;
	mask = &f->mask->key;
	skip_hw = tc_skip_hw(f->flags);

	if (fl_dump_key(skb, net, key, mask))
		goto nla_put_failure_locked;

	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
		goto nla_put_failure_locked;

	spin_unlock(&tp->lock);

	if (!skip_hw)
		fl_hw_update_stats(tp, f, rtnl_held);

	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
		goto nla_put_failure;

	if (tcf_exts_dump(skb, &f->exts))
		goto nla_put_failure;

	nla_nest_end(skb, nest);

	if (tcf_exts_dump_stats(skb, &f->exts) < 0)
		goto nla_put_failure;

	return skb->len;

nla_put_failure_locked:
	spin_unlock(&tp->lock);
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -1;
}

static int fl_terse_dump(struct net *net, struct tcf_proto *tp, void *fh,
			 struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
{
	struct cls_fl_filter *f = fh;
	struct nlattr *nest;
	bool skip_hw;

	if (!f)
		return skb->len;

	t->tcm_handle = f->handle;

	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
	if (!nest)
		goto nla_put_failure;

	spin_lock(&tp->lock);

	skip_hw = tc_skip_hw(f->flags);

	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
		goto nla_put_failure_locked;

	spin_unlock(&tp->lock);

	if (!skip_hw)
		fl_hw_update_stats(tp, f, rtnl_held);

	if (tcf_exts_terse_dump(skb, &f->exts))
		goto nla_put_failure;

	nla_nest_end(skb, nest);

	return skb->len;

nla_put_failure_locked:
	spin_unlock(&tp->lock);
nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -1;
}

static int fl_tmplt_dump(struct sk_buff *skb, struct net *net, void *tmplt_priv)
{
	struct fl_flow_tmplt *tmplt = tmplt_priv;
	struct fl_flow_key *key, *mask;
	struct nlattr *nest;

	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
	if (!nest)
		goto nla_put_failure;

	key = &tmplt->dummy_key;
	mask = &tmplt->mask;

	if (fl_dump_key(skb, net, key, mask))
		goto nla_put_failure;

	nla_nest_end(skb, nest);

	return skb->len;

nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}

/*
 * net_sched: add reverse binding for tc class
 *
 * TC filters when used as classifiers are bound to TC classes.
 * However, there is a hidden difference when adding them in different
 * orders:
 *
 * 1. If we add tc classes before its filters, everything is fine.
 *    Logically, the classes exist before we specify their ID's in
 *    filters, it is easy to bind them together, just as in the current
 *    code base.
 *
 * 2. If we add tc filters before the tc classes they bind, we have to
 *    do dynamic lookup in fast path. What's worse, this happens all
 *    the time not just once, because on fast path tcf_result is passed
 *    on stack, there is no way to propagate back to the one in tc filters.
 *
 * This hidden difference hurts performance silently if we have many tc
 * classes in hierarchy.
 *
 * This patch intends to close this gap by doing the reverse binding when
 * we create a new class, in this case we can actually search all the
 * filters in its parent, match and fixup by classid. And because
 * tcf_result is specific to each type of tc filter, we have to introduce
 * a new ops for each filter to tell how to bind the class.
 *
 * Note, we still can NOT totally get rid of those class lookup in
 * ->enqueue() because cgroup and flow filters have no way to determine
 * the classid at setup time, they still have to go through dynamic lookup.
 *
 * Cc: Jamal Hadi Salim <jhs@mojatatu.com>
 * Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
 * Signed-off-by: David S. Miller <davem@davemloft.net>
 */
static void fl_bind_class(void *fh, u32 classid, unsigned long cl, void *q,
			  unsigned long base)
{
	struct cls_fl_filter *f = fh;

	if (f && f->res.classid == classid) {
		if (cl)
			__tcf_bind_filter(q, &f->res, base);
		else
			__tcf_unbind_filter(q, &f->res);
	}
}

/*
 * net/sched: add delete_empty() to filters and use it in cls_flower
 *
 * Revert "net/sched: cls_u32: fix refcount leak in the error path of
 * u32_change()", and fix the u32 refcount leak in a more generic way that
 * preserves the semantic of rule dumping.
 *
 * On tc filters that don't support lockless insertion/removal, there is no
 * need to guard against concurrent insertion when a removal is in progress.
 * Therefore, for most of them we can avoid a full walk() when deleting, and
 * just decrease the refcount, like it was done on older Linux kernels.
 * This fixes situations where walk() was wrongly detecting a non-empty
 * filter, like it happened with cls_u32 in the error path of change(), thus
 * leading to failures in the following tdc selftests:
 *
 *  6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
 *  6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
 *  74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
 *
 * On cls_flower, and on (future) lockless filters, this check is necessary:
 * move all the check_empty() logic in a callback so that each filter
 * can have its own implementation. For cls_flower, it's sufficient to check
 * if no IDRs have been allocated.
 *
 * This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.
 *
 * Changes since v1:
 *  - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
 *    is used, thanks to Vlad Buslov
 *  - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
 *  - squash revert and new fix in a single patch, to be nice with bisect
 *    tests that run tdc on u32 filter, thanks to Dave Miller
 *
 * Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
 * Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
 * Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
 * Suggested-by: Vlad Buslov <vladbu@mellanox.com>
 * Signed-off-by: Davide Caratti <dcaratti@redhat.com>
 * Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
 * Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
 * Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
 * Signed-off-by: David S. Miller <davem@davemloft.net>
 */
static bool fl_delete_empty(struct tcf_proto *tp)
{
	struct cls_fl_head *head = fl_head_dereference(tp);

	spin_lock(&tp->lock);
	tp->deleting = idr_is_empty(&head->handle_idr);
	spin_unlock(&tp->lock);

	return tp->deleting;
}

static struct tcf_proto_ops cls_fl_ops __read_mostly = {
	.kind		= "flower",
	.classify	= fl_classify,
	.init		= fl_init,
	.destroy	= fl_destroy,
	.get		= fl_get,
	.put		= fl_put,
	.change		= fl_change,
	.delete		= fl_delete,
	.delete_empty	= fl_delete_empty,
	.walk		= fl_walk,
	.reoffload	= fl_reoffload,
	.hw_add		= fl_hw_add,
	.hw_del		= fl_hw_del,
	.dump		= fl_dump,
	.terse_dump	= fl_terse_dump,
	.bind_class	= fl_bind_class,
	.tmplt_create	= fl_tmplt_create,
	.tmplt_destroy	= fl_tmplt_destroy,
	.tmplt_dump	= fl_tmplt_dump,
	.owner		= THIS_MODULE,
	.flags		= TCF_PROTO_OPS_DOIT_UNLOCKED,
};

static int __init cls_fl_init(void)
{
	return register_tcf_proto_ops(&cls_fl_ops);
}

static void __exit cls_fl_exit(void)
{
	unregister_tcf_proto_ops(&cls_fl_ops);
}

module_init(cls_fl_init);
module_exit(cls_fl_exit);

MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
MODULE_DESCRIPTION("Flower classifier");
MODULE_LICENSE("GPL v2");