linux/net/core
Martin KaFai Lau 46f8bc9275 bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
In the kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock.  For example, in __netdev_pick_tx():

static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	/* ... */

	struct sock *sk = skb->sk;

		if (queue_index != new_index && sk &&
		    sk_fullsock(sk) &&
		    rcu_access_pointer(sk->sk_dst_cache))
			sk_tx_queue_set(sk, new_index);

	/* ... */

	return queue_index;
}
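
As a rough userspace illustration of why that check matters, the
sketch below models a socket that is only "full" when it is not in a
time-wait or new-syn-recv state.  Every model_* name is hypothetical
and exists only for this example; it is not the kernel's sk_fullsock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket states relevant to the check. */
enum model_sk_state {
	MODEL_ESTABLISHED  = 1,
	MODEL_TIME_WAIT    = 6,
	MODEL_NEW_SYN_RECV = 12,
};

/* Hypothetical, heavily trimmed socket. */
struct model_sock {
	int state;
	unsigned int family;   /* part of the common area */
	unsigned int priority; /* only meaningful for a full socket */
};

/* Assumed behaviour: "mini" sockets (time-wait, request) are not full. */
static bool model_sk_fullsock(const struct model_sock *sk)
{
	assert(sk != NULL);
	return sk->state != MODEL_TIME_WAIT &&
	       sk->state != MODEL_NEW_SYN_RECV;
}

/* The "sk && sk_fullsock(sk)" pattern from the text, in model form. */
static bool model_can_touch_full_fields(const struct model_sock *sk)
{
	return sk != NULL && model_sk_fullsock(sk);
}
```

A caller would read sk->priority only when
model_can_touch_full_fields(sk) returns true.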

This patch adds a "struct bpf_sock *sk" pointer to "struct __sk_buff".
A few of the convert_ctx_access() functions in filter.c have already
been accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().

"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fields in "bpf_sock" are not directly accessible
through the "__sk_buff->sk" pointer.  Access is limited by the
new "bpf_sock_common_is_valid_access()", e.g. the existing "type",
"protocol", "mark" and "priority" in bpf_sock are not allowed.
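
The restriction can be pictured with a small userspace model.  This is
only a sketch under assumptions: "struct model_bpf_sock" is a trimmed,
hypothetical stand-in for bpf_sock, and both model_* checkers are
invented for illustration rather than copied from filter.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for "struct bpf_sock". */
struct model_bpf_sock {
	uint32_t bound_dev_if;
	uint32_t family;
	uint32_t type;
	uint32_t protocol;
	uint32_t mark;
	uint32_t priority;
};

/* Model of the full-socket check: aligned 4-byte reads inside the struct. */
static bool model_full_is_valid_access(size_t off, size_t size)
{
	return size == sizeof(uint32_t) &&
	       off % size == 0 &&
	       off + size <= sizeof(struct model_bpf_sock);
}

/*
 * Model of the sock_common check: reject "type" through "priority",
 * then defer to the full-socket check for everything else.
 */
static bool model_common_is_valid_access(size_t off, size_t size)
{
	size_t first = offsetof(struct model_bpf_sock, type);
	size_t end = offsetof(struct model_bpf_sock, priority) +
		     sizeof(uint32_t);

	assert(first < end);
	if (off >= first && off < end)
		return false;
	return model_full_is_valid_access(off, size);
}
```

In this model, reading "family" passes both checks, while reading
"protocol" passes only the full-socket check.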

The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).

int cg_skb_foo(struct __sk_buff *skb) {
	struct bpf_sock *sk;

	sk = skb->sk;
	if (!sk)
		return 1;

	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;

	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
		return 1;

	/* some_traffic_shaping(); */

	return 1;
}

(1) The sk is read-only.

(2) There is no new "struct bpf_sock_common" introduced.

(3) In the future, kernel sock members could be added to bpf_sock
    only, instead of being added repeatedly in multiple places as is
    currently done in bpf_sock_ops_md, bpf_sock_addr_md,
    sk_reuseport_md, etc.

(4) After "sk = skb->sk", the reg holding sk has type
    PTR_TO_SOCK_COMMON_OR_NULL.

(5) After bpf_sk_fullsock(), the returned pointer has type
    PTR_TO_SOCKET_OR_NULL, which is the same as the return type of
    bpf_sk_lookup_xxx().

    However, bpf_sk_fullsock() does not take a refcnt.  Currently,
    whether acquire_reference_state() is called depends only on the
    return type.  To avoid acquiring a reference for
    bpf_sk_fullsock(), a new is_acquire_function() is checked before
    calling acquire_reference_state().

(6) The WARN_ON in "release_reference_state()" is no longer an
    internal verifier bug.

    When reg->id is not found in state->refs[], it means the
    bpf_prog does something wrong like
    "bpf_sk_release(bpf_sk_fullsock(skb->sk))", where no reference
    has ever been acquired by calling "bpf_sk_fullsock(skb->sk)".

    Instead of WARN_ON, -EINVAL is returned and a verbose() message
    is emitted.  A test is added to the test_verifier in a later patch.

    Since the WARN_ON in "release_reference_state()" is no longer
    needed, "__release_reference_state()" is folded into
    "release_reference_state()" also.
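
The reference tracking described in (5) and (6) can be sketched in
userspace as follows.  This is a simplified model, not the verifier
code: the model_* types, the fixed-size refs[] array and the two
helper ids are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper ids: one that takes a reference, one that does not. */
enum model_func_id {
	MODEL_FUNC_sk_lookup,
	MODEL_FUNC_sk_fullsock,
};

#define MODEL_MAX_REFS 8

/* Hypothetical stand-in for the verifier's per-function state. */
struct model_func_state {
	int refs[MODEL_MAX_REFS]; /* ids of acquired references */
	int acquired_refs;
};

static bool model_is_acquire_function(enum model_func_id func)
{
	return func == MODEL_FUNC_sk_lookup;
}

/*
 * Both helpers return the same pointer type, so the decision to record
 * a reference is keyed on the helper id, not on the return type.
 * Returns the id of the resulting register, or a negative errno.
 */
static int model_call_helper(struct model_func_state *st,
			     enum model_func_id func, int new_id)
{
	assert(st != NULL);
	if (model_is_acquire_function(func)) {
		if (st->acquired_refs == MODEL_MAX_REFS)
			return -ENOMEM;
		st->refs[st->acquired_refs++] = new_id;
	}
	return new_id;
}

/* An unknown id is treated as a program error, not an internal bug. */
static int model_release_reference_state(struct model_func_state *st, int id)
{
	int i, last;

	assert(st != NULL);
	last = st->acquired_refs - 1;
	for (i = 0; i <= last; i++) {
		if (st->refs[i] == id) {
			st->refs[i] = st->refs[last]; /* fill the hole */
			st->acquired_refs--;
			return 0;
		}
	}
	return -EINVAL;
}
```

In this model, releasing an id obtained via MODEL_FUNC_sk_fullsock
returns -EINVAL because no reference was ever recorded for it.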

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-10 19:46:17 -08:00
datagram.c for-4.21/block-20181221 2018-12-28 13:19:59 -08:00
dev_addr_lists.c net: dev: Issue NETDEV_PRE_CHANGEADDR 2018-12-13 18:41:38 -08:00
dev_ioctl.c net: dev: Add extack argument to dev_set_mac_address() 2018-12-13 18:41:38 -08:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-02-06 16:56:20 -08:00
devlink.c devlink: Add health dump {get,clear} commands 2019-02-07 10:34:29 -08:00
drop_monitor.c treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
dst_cache.c net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency 2018-03-05 12:52:45 -05:00
dst.c net: add a route cache full diagnostic message 2019-01-17 15:37:25 -08:00
ethtool.c ethtool: add ethtool_rx_flow_spec to flow_rule structure translator 2019-02-06 10:38:26 -08:00
failover.c net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_notifier.c net: Fix fib notifer to return errno 2018-03-29 14:10:30 -04:00
fib_rules.c net/fib_rules: Update fib_nl_dumprule for strict data checking 2018-10-08 10:39:05 -07:00
filter.c bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper 2019-02-10 19:46:17 -08:00
flow_dissector.c net/flow_dissector: move bpf case into __skb_flow_bpf_dissect 2019-01-29 01:08:29 +01:00
flow_offload.c flow_offload: add flow action infrastructure 2019-02-06 10:38:25 -08:00
gen_estimator.c net: core: protect rate estimator statistics pointer with lock 2018-08-11 12:37:10 -07:00
gen_stats.c net/core: make function ___gnet_stats_copy_basic() static 2018-09-28 10:25:11 -07:00
gro_cells.c gro_cell: add napi_disable in gro_cells_destroy 2018-12-19 15:50:02 -08:00
hwbm.c
link_watch.c net: linkwatch: add check for netdevice being present to linkwatch_do_dev 2018-09-19 21:06:46 -07:00
lwt_bpf.c bpf: in __bpf_redirect_no_mac pull mac only if present 2019-01-20 01:11:48 +01:00
lwtunnel.c ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
Makefile flow_offload: add flow_rule and flow_match structures and use them 2019-02-06 10:38:25 -08:00
neighbour.c neighbour: Do not perturb drop profiles when neigh_probe 2019-01-17 22:08:14 -08:00
net_namespace.c net: namespace: perform strict checks also for doit handlers 2019-01-19 10:09:58 -08:00
net-procfs.c proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
net-sysfs.c net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
net-sysfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
net-traces.c net/ipv6: Udate fib6_table_lookup tracepoint 2018-05-24 23:01:15 -04:00
netclassid_cgroup.c cgroup, netclassid: add a preemption point to write_classid 2018-10-23 12:58:17 -07:00
netevent.c
netpoll.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
netprio_cgroup.c net: remove duplicate includes 2017-12-13 13:18:46 -05:00
page_pool.c net/page_pool: Fix inconsistent lock state warning 2018-07-19 23:23:01 -07:00
pktgen.c pktgen: Fix fall-through annotation 2018-09-13 15:36:41 -07:00
ptp_classifier.c
request_sock.c
rtnetlink.c net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID 2019-02-06 14:17:03 -08:00
scm.c socket: Add SO_TIMESTAMPING_NEW 2019-02-03 11:17:31 -08:00
secure_seq.c infiniband: i40iw, nes: don't use wall time for TCP sequence numbers 2018-07-11 12:10:19 -06:00
skbuff.c net, skbuff: do not prefer skb allocation fails early 2019-01-04 12:53:16 -08:00
skmsg.c Optimize sk_msg_clone() by data merge to end dst sg entry 2019-01-17 11:42:26 -08:00
sock_diag.c net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() 2018-08-14 10:01:24 -07:00
sock_map.c bpf: skmsg, fix psock create on existing kcm/tls port 2018-10-20 00:40:45 +02:00
sock_reuseport.c sctp: add sock_reuseport for the sock in __sctp_hash_endpoint 2018-11-12 09:09:51 -08:00
sock.c net: Fix fall through warning in y2038 tstamp changes. 2019-02-03 20:25:31 -08:00
stream.c tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT 2018-12-04 21:21:18 -08:00
sysctl_net_core.c net: introduce a knob to control whether to inherit devconf config 2019-01-22 11:07:21 -08:00
timestamping.c
tso.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
utils.c net: Remove some unneeded semicolon 2018-08-04 13:05:39 -07:00
xdp.c xdp: remove redundant variable 'headroom' 2018-09-01 01:35:53 +02:00