Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-06-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 55 non-merge commits during the last 1 day(s) which contain
a total of 91 files changed, 4986 insertions(+), 463 deletions(-).

The main changes are:

1) Add rx_queue_mapping to bpf_sock, from Amritha.

2) Add BPF ring buffer, from Andrii.

3) Attach and run programs through devmap, from David.

4) Allow SO_BINDTODEVICE opt in bpf_setsockopt, from Ferenc.

5) Link-based flow_dissector, from Jakub.

6) Use tracing helpers for lsm programs, from Jiri.

7) Several sk_msg fixes and extensions, from John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller 2020-06-01 15:53:08 -07:00
commit 9a25c1df24
99 changed files with 5053 additions and 542 deletions

@ -0,0 +1,209 @@
===============
BPF ring buffer
===============

This document describes BPF ring buffer design, API, and implementation details.

.. contents::
    :local:
    :depth: 2

Motivation
----------

There are two distinct motivators for this work, neither of which is satisfied
by the existing perf buffer, and together they prompted the creation of a new
ring buffer implementation:

- more efficient memory utilization by sharing a ring buffer across CPUs;
- preserving the ordering of events that happen sequentially in time, even
  across multiple CPUs (e.g., fork/exec/exit events for a task).

These two problems are independent, but perf buffer fails to satisfy both.
Both are a result of the choice to have a per-CPU perf ring buffer, and both
can be solved by an MPSC (multi-producer, single-consumer) ring buffer
implementation. The ordering problem could technically be solved for perf
buffer with some in-kernel counting, but given that the first problem requires
an MPSC buffer, the same solution solves the second problem automatically.

Semantics and APIs
------------------

A single ring buffer is presented to BPF programs as an instance of a BPF map
of type ``BPF_MAP_TYPE_RINGBUF``. Two other alternatives were considered, but
ultimately rejected.

One way would be, similar to ``BPF_MAP_TYPE_PERF_EVENT_ARRAY``, to make
``BPF_MAP_TYPE_RINGBUF`` represent an array of ring buffers, but without
enforcing the "same CPU only" rule. This would be a more familiar interface,
compatible with existing perf buffer use in BPF, but would fail if an
application needed more advanced logic to look up a ring buffer by an
arbitrary key. With the chosen approach, ``BPF_MAP_TYPE_HASH_OF_MAPS``
addresses this. Additionally, given the performance of BPF ringbuf, many use
cases would just opt into a simple single ring buffer shared among all CPUs,
for which an array of ring buffers would be overkill.

Another approach could introduce a new concept, alongside BPF maps, to
represent a generic "container" object, which doesn't necessarily have a
key/value interface with lookup/update/delete operations. This approach would
require a lot of extra infrastructure to be built for observability and
verifier support. It would also add another concept that BPF developers would
have to familiarize themselves with, new syntax in libbpf, etc., while
providing no additional benefit over the approach of using a map.
``BPF_MAP_TYPE_RINGBUF`` doesn't support lookup/update/delete operations, but
neither do a few other map types (e.g., queue and stack; array doesn't support
delete, etc.).

The chosen approach has the advantage of reusing existing BPF map
infrastructure (introspection APIs in the kernel, libbpf support, etc.), being
a familiar concept (no need to teach users a new type of object in BPF
programs), and utilizing existing tooling (bpftool). For the common scenario of
using a single ring buffer for all CPUs, it's as simple and straightforward as
a dedicated "container" object would be. On the other hand, by being a map, it
can be combined with ``ARRAY_OF_MAPS`` and ``HASH_OF_MAPS`` map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases) to complicated
application-level hashing/sharding of ring buffers (e.g., a small pool of ring
buffers with the hashed task's tgid as the lookup key, to preserve ordering
while reducing contention).

Key and value sizes are enforced to be zero. ``max_entries`` is used to
specify the size of the ring buffer in bytes and has to be a power of 2.

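The two constraints above can be expressed as a small check. This is a minimal sketch, not kernel or libbpf code: ``ringbuf_params_valid()`` and ``ring_offset()`` are hypothetical names, and the check models only the rules stated here (the kernel may enforce additional requirements that are not modeled). ``ring_offset()`` shows why a power-of-2 size is convenient: a logical position maps to a data-area offset with a single mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check mirroring the documented BPF_MAP_TYPE_RINGBUF rules:
 * zero key/value sizes and a power-of-2 max_entries (ring size in bytes). */
static bool ringbuf_params_valid(uint32_t key_size, uint32_t value_size,
                                 uint32_t max_entries)
{
    /* Key and value sizes must both be zero. */
    if (key_size != 0 || value_size != 0)
        return false;
    /* A power of 2 has exactly one bit set; zero is rejected. */
    return max_entries != 0 && (max_entries & (max_entries - 1)) == 0;
}

/* With a power-of-2 size, a logical position maps to an offset inside
 * the data area with one AND instead of a division. */
static uint32_t ring_offset(uint64_t logical_pos, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (uint32_t)(logical_pos & (uint64_t)(size - 1));
}
```

For example, ``ring_offset(4096 + 10, 4096)`` yields 10, the same result as ``(4096 + 10) % 4096``.
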
There are a bunch of similarities between perf buffer
(``BPF_MAP_TYPE_PERF_EVENT_ARRAY``) and the new BPF ring buffer semantics:

- variable-length records;
- if there is no more space left in the ring buffer, reservation fails; there
  is no blocking;
- memory-mappable data area for user-space applications for ease of
  consumption and high performance;
- epoll notifications for new incoming data;
- but still the ability to do busy polling for new data to achieve the
  lowest latency, if necessary.

BPF ringbuf provides two sets of APIs to BPF programs:

- ``bpf_ringbuf_output()`` allows *copying* data from one place into a ring
  buffer, similarly to ``bpf_perf_event_output()``;
- ``bpf_ringbuf_reserve()``/``bpf_ringbuf_submit()``/``bpf_ringbuf_discard()``
  APIs split the whole process into two steps. First, a fixed amount of space
  is reserved. If successful, a pointer to data inside the ring buffer data
  area is returned, which BPF programs can use similarly to data inside
  array/hash maps. Once ready, this piece of memory is either committed or
  discarded. Discard is similar to commit, but makes the consumer ignore the
  record.

``bpf_ringbuf_output()`` has the disadvantage of incurring an extra memory
copy, because the record has to be prepared in some other place first. But it
allows submitting records whose length is not known to the verifier
beforehand. It also closely matches ``bpf_perf_event_output()``, so it
significantly simplifies migration.

``bpf_ringbuf_reserve()`` avoids the extra memory copy by providing a pointer
directly into ring buffer memory. In a lot of cases, records are larger than
the 512-byte BPF stack allows, so many programs have to use an extra per-CPU
array as a temporary heap for preparing a sample. ``bpf_ringbuf_reserve()``
avoids this need completely. In exchange, it only allows a known constant size
of memory to be reserved, so that the verifier can verify that the BPF program
can't access memory outside its reserved record space.
``bpf_ringbuf_output()``, while slightly slower due to the extra memory copy,
covers some use cases that are not suitable for ``bpf_ringbuf_reserve()``.

The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary
``malloc()``/``free()`` within a single BPF program invocation.

Each reserved record is tracked by the verifier through the existing
reference-tracking logic, similar to socket ref-tracking. It is thus
impossible to reserve a record but forget to submit (or discard) it.

The ``bpf_ringbuf_query()`` helper allows querying various properties of the
ring buffer. Currently 4 are supported:

- ``BPF_RB_AVAIL_DATA`` returns the amount of unconsumed data in the ring
  buffer;
- ``BPF_RB_RING_SIZE`` returns the size of the ring buffer;
- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` return the current logical position
  of the consumer/producer, respectively.

Returned values are momentary snapshots of the ring buffer state and could be
off by the time the helper returns, so they should be used only for
debugging/reporting or for implementing heuristics that take into account the
highly changeable nature of some of those characteristics.

One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in the ring buffer. Together with
the ``BPF_RB_NO_WAKEUP``/``BPF_RB_FORCE_WAKEUP`` flags for the
output/submit/discard helpers, it gives BPF programs a high degree of control
and allows, e.g., more efficient batched notifications. The default
self-balancing strategy, though, should be adequate for most applications and
already works reliably and efficiently.

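As an illustration of such a heuristic, the sketch below picks the flags for the next submit based on how much unconsumed data has accumulated. The flag values mirror the UAPI definitions of ``BPF_RB_NO_WAKEUP`` and ``BPF_RB_FORCE_WAKEUP``; ``choose_wakeup_flags()`` and the ``batch_bytes`` threshold are hypothetical. In a BPF program, ``avail_data`` would come from ``bpf_ringbuf_query()`` with ``BPF_RB_AVAIL_DATA``.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the UAPI header. */
#define BPF_RB_NO_WAKEUP    (1ULL << 0)
#define BPF_RB_FORCE_WAKEUP (1ULL << 1)

/* Hypothetical batching policy: suppress wakeups until at least
 * batch_bytes of unconsumed data have accumulated, then force one.
 * The result would be passed as the flags argument of the
 * submit/discard/output helpers. */
static uint64_t choose_wakeup_flags(uint64_t avail_data, uint64_t batch_bytes)
{
    assert(batch_bytes > 0);
    return avail_data >= batch_bytes ? BPF_RB_FORCE_WAKEUP : BPF_RB_NO_WAKEUP;
}
```

One design caveat: with wakeups suppressed, a final batch smaller than ``batch_bytes`` never triggers a notification, so a consumer using such a policy would likely need to poll with a timeout.
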
Design and Implementation
-------------------------

This reserve/commit scheme gives a natural way for multiple producers, either
on different CPUs or even on the same CPU/in the same BPF program, to reserve
independent records and work with them without blocking other producers. This
means that if a BPF program is interrupted by another BPF program sharing the
same ring buffer, they will both get a record reserved (provided there is
enough space left) and can work with and submit their records independently.
This applies to NMI context as well, except that, because a spinlock is used
during reservation, ``bpf_ringbuf_reserve()`` in NMI context might fail to get
the lock, in which case reservation fails even if the ring buffer is not full.

The ring buffer itself is internally implemented as a power-of-2 sized
circular buffer, with two logical and ever-increasing counters (which might
wrap around on 32-bit architectures; that's not a problem):

- the consumer counter shows up to which logical position the consumer has
  consumed the data;
- the producer counter denotes the amount of data reserved by all producers.

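A simplified model shows why wraparound is harmless. It uses 32-bit unsigned positions, as on a 32-bit architecture. The struct and function names are hypothetical, and the model ignores the memory ordering that real producers and consumers need. Unsigned subtraction is modulo 2^32, so ``producer - consumer`` stays correct across the wrap, and because a power-of-2 size divides 2^32, masking a wrapped position still gives the right offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two logical counters. */
struct rb_positions {
    uint32_t consumer_pos; /* logical position consumed up to */
    uint32_t producer_pos; /* logical position reserved up to */
    uint32_t size;         /* power-of-2 size of the data area */
};

/* Unconsumed bytes (the idea behind BPF_RB_AVAIL_DATA). Correct even
 * after producer_pos wraps past UINT32_MAX, as long as the true
 * distance never exceeds the ring size. */
static uint32_t rb_avail_data(const struct rb_positions *p)
{
    uint32_t avail = p->producer_pos - p->consumer_pos;
    assert(avail <= p->size);
    return avail;
}

/* Offset of a logical position inside the data area. */
static uint32_t rb_offset(const struct rb_positions *p, uint32_t pos)
{
    return pos & (p->size - 1);
}
```

With ``consumer_pos = UINT32_MAX - 15`` and ``producer_pos = 16`` in a 4096-byte ring, the consumer sits 16 bytes before the end of the data area and 32 bytes are unconsumed, even though the producer counter has wrapped.
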
Each time a record is reserved, the producer that "owns" the record
successfully advances the producer counter. At that point, however, the data
is not yet ready to be consumed. Each record has an 8-byte header, which
contains the length of the reserved record, as well as two extra bits: the
busy bit, denoting that the record is still being worked on, and the discard
bit, which might be set at commit time if the record is discarded. In the
latter case, the consumer is supposed to skip the record and move on to the
next one. The record header also encodes the record's relative offset from the
beginning of the ring buffer data area (in pages). This allows
``bpf_ringbuf_submit()``/``bpf_ringbuf_discard()`` to accept only the pointer
to the record itself, without also requiring a pointer to the ring buffer. The
ring buffer's memory location is restored from the record's metadata header.
This significantly simplifies the verifier and improves API usability.

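The header handling described above can be sketched as follows. The bit and size constants mirror ``BPF_RINGBUF_BUSY_BIT``, ``BPF_RINGBUF_DISCARD_BIT`` and ``BPF_RINGBUF_HDR_SZ`` from the UAPI header. The layout of ``struct rb_hdr`` (a 32-bit length word followed by a 32-bit page offset) and all function names are assumptions made for illustration. The base-address recovery also assumes a page-aligned base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the UAPI header. */
#define BPF_RINGBUF_BUSY_BIT    (1U << 31)
#define BPF_RINGBUF_DISCARD_BIT (1U << 30)
#define BPF_RINGBUF_HDR_SZ      8

/* Assumed layout: length word (with busy/discard bits) + page offset. */
struct rb_hdr {
    uint32_t len;
    uint32_t pg_off;
};
_Static_assert(sizeof(struct rb_hdr) == BPF_RINGBUF_HDR_SZ, "8-byte header");

/* At reservation time the record is marked busy. */
static void rb_hdr_reserve(struct rb_hdr *h, uint32_t len, uint32_t pg_off)
{
    assert((len & (BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT)) == 0);
    h->len = len | BPF_RINGBUF_BUSY_BIT;
    h->pg_off = pg_off;
}

/* Commit clears the busy bit; a discard additionally sets the discard bit. */
static void rb_hdr_commit(struct rb_hdr *h, bool discard)
{
    uint32_t new_len = h->len & ~BPF_RINGBUF_BUSY_BIT;
    if (discard)
        new_len |= BPF_RINGBUF_DISCARD_BIT;
    h->len = new_len;
}

/* Payload length with both flag bits masked off. */
static uint32_t rb_hdr_payload_len(const struct rb_hdr *h)
{
    return h->len & ~(BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT);
}

/* Recover the base address from a record address alone: round down to
 * the record's page, then step back pg_off pages. */
static uintptr_t rb_base_from_rec(uintptr_t rec_addr, uint32_t pg_off,
                                  uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (rec_addr & ~(page_size - 1)) - (uintptr_t)pg_off * page_size;
}
```
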
Producer counter increments are serialized under a spinlock, so there is
a strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to the
consumer in the order of reservations, but only after all previous records
have been committed. It is thus possible for slow producers to temporarily
hold off submitted records that were reserved later.

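The ordering rule can be illustrated with a toy consumer over records kept in reservation order. It delivers committed records, skips discarded ones, and stops at the first record that is still busy, even if later records are already committed. This is a simplified single-threaded model with hypothetical names. A real consumer reads headers with acquire semantics and walks a byte-addressed circular data area rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_BUSY_BIT    (1U << 31)
#define RB_DISCARD_BIT (1U << 30)

/* Toy record: header word (length + flag bits) and a stand-in payload. */
struct rec {
    uint32_t hdr_len;
    int payload;
};

/* Consume from index cons in reservation order. Delivered payloads go to
 * out[] and their count to *n_out. Returns the new consumer index. */
static size_t consume(const struct rec *recs, size_t n, size_t cons,
                      int *out, size_t *n_out)
{
    *n_out = 0;
    while (cons < n) {
        uint32_t hdr = recs[cons].hdr_len;
        if (hdr & RB_BUSY_BIT)
            break; /* a slow producer holds back every later record */
        if (!(hdr & RB_DISCARD_BIT))
            out[(*n_out)++] = recs[cons].payload;
        cons++; /* discarded records are skipped but still consumed */
    }
    return cons;
}
```

With records ``{committed, discarded, busy, committed}``, the first pass delivers only the first payload and stops at index 2. Once the busy record is committed, the next pass delivers the remaining two.
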
The reservation/commit/consumer protocol is verified by litmus tests in
``Documentation/litmus_tests/bpf-rb/``.

One interesting implementation detail, which significantly simplifies (and
thus also speeds up) the implementation of both producers and consumers, is
that the data area is mapped twice, contiguously and back-to-back, in virtual
memory. This means no special measures are needed for samples that wrap around
at the end of the circular buffer data area: the page after the last data page
is the first data page again, so the sample still appears completely contiguous
in virtual memory. See the comment and a simple ASCII diagram showing this
visually in ``bpf_ringbuf_area_alloc()``.

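The same trick can be reproduced in user space with POSIX ``mmap()``. The sketch below is an analogue for illustration, not the kernel's ``bpf_ringbuf_area_alloc()`` code. It maps the pages of an anonymous temporary file twice, back to back, so a write that runs past the end of the first mapping lands at the start of the buffer and can still be read as one contiguous range. ``map_twice()`` is a hypothetical name, and the sketch assumes a POSIX system that supports ``MAP_FIXED`` over a reserved ``PROT_NONE`` region.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, fileno, ftruncate on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map size bytes of file-backed memory twice, back to back. Returns the
 * base of the 2 * size region, or NULL on failure. */
static unsigned char *map_twice(size_t size)
{
    unsigned char *base = NULL;
    void *area, *lo, *hi;
    FILE *f;

    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);
    f = tmpfile();
    if (!f)
        return NULL;
    if (ftruncate(fileno(f), (off_t)size) != 0)
        goto out;
    /* Reserve 2 * size bytes of address space... */
    area = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        goto out;
    /* ...then overlay both halves with the same file pages. */
    lo = mmap(area, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
              fileno(f), 0);
    hi = mmap((unsigned char *)area + size, size, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_FIXED, fileno(f), 0);
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(area, 2 * size);
        goto out;
    }
    base = area;
out:
    /* The mappings keep the file's pages alive after it is closed. */
    fclose(f);
    return base;
}
```

With ``rb = map_twice(size)``, a byte written to ``rb[0]`` is visible at ``rb[size]``. A record copied to ``rb + size - 3`` can therefore be read back in one piece, while its tail physically occupies the first bytes of the buffer.
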
Another feature that distinguishes BPF ringbuf from perf ring buffer is
self-pacing notification of new data availability. The
``bpf_ringbuf_submit()`` implementation sends a notification that a new record
is available after commit only if the consumer has already caught up to the
record being committed. If not, the consumer still has to catch up and thus
will see the new data anyway, without needing an extra poll notification.

Benchmarks (see ``tools/testing/selftests/bpf/benchs/bench_ringbuf.c``) show
that this achieves very high throughput without having to resort to tricks
like "notify only every Nth sample", which are necessary with perf buffer. For
extreme cases, when a BPF program wants more manual control of notifications,
the submit/discard/output helpers accept the ``BPF_RB_NO_WAKEUP`` and
``BPF_RB_FORCE_WAKEUP`` flags, which give full control over notifications of
data availability, but require extra caution and diligence in using this API.

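The submit-time decision described above can be modeled as a small predicate. This is a simplified model of the stated rule, not the kernel's ``bpf_ringbuf_submit()`` code. ``should_notify()`` is a hypothetical name, and it compares logical positions. The flag values mirror the UAPI definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the UAPI header. */
#define BPF_RB_NO_WAKEUP    (1ULL << 0)
#define BPF_RB_FORCE_WAKEUP (1ULL << 1)

/* rec_pos: position of the record just committed.
 * cons_pos: consumer position observed at commit time. */
static bool should_notify(uint64_t cons_pos, uint64_t rec_pos, uint64_t flags)
{
    if (flags & BPF_RB_FORCE_WAKEUP)
        return true;  /* caller explicitly asks for a wakeup */
    if (flags & BPF_RB_NO_WAKEUP)
        return false; /* caller explicitly suppresses it */
    /* Self-pacing: wake only if the consumer has caught up to this
     * record; otherwise it is still draining and will reach it anyway. */
    return cons_pos == rec_pos;
}
```
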

@ -18456,7 +18456,7 @@ L: netdev@vger.kernel.org
L: bpf@vger.kernel.org
S: Maintained
F: include/net/xdp_sock*
F: include/net/xsk_buffer_pool.h
F: include/net/xsk_buff_pool.h
F: include/uapi/linux/if_xdp.h
F: net/xdp/
F: samples/bpf/xdpsock*

@ -263,7 +263,7 @@ static int ena_xdp_tx_map_buff(struct ena_ring *xdp_ring,
dma_addr_t dma = 0;
u32 size;
tx_info->xdpf = convert_to_xdp_frame(xdp);
tx_info->xdpf = xdp_convert_buff_to_frame(xdp);
size = tx_info->xdpf->len;
ena_buf = tx_info->bufs;

@ -2167,7 +2167,7 @@ static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
{
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return I40E_XDP_CONSUMED;

@ -254,7 +254,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring)
*/
int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring *xdp_ring)
{
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return ICE_XDP_CONSUMED;

@ -2215,7 +2215,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
case XDP_PASS:
break;
case XDP_TX:
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf)) {
result = IXGBE_XDP_CONSUMED;
break;

@ -107,7 +107,7 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
case XDP_PASS:
break;
case XDP_TX:
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf)) {
result = IXGBE_XDP_CONSUMED;
break;

@ -2073,7 +2073,7 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
int cpu;
u32 ret;
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return MVNETA_XDP_DROPPED;

@ -64,7 +64,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
struct xdp_frame *xdpf;
dma_addr_t dma_addr;
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return false;
@ -97,10 +97,10 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
xdpi.frame.xdpf = xdpf;
xdpi.frame.dma_addr = dma_addr;
} else {
/* Driver assumes that convert_to_xdp_frame returns an xdp_frame
* that points to the same memory region as the original
* xdp_buff. It allows to map the memory only once and to use
* the DMA_BIDIRECTIONAL mode.
/* Driver assumes that xdp_convert_buff_to_frame returns
* an xdp_frame that points to the same memory region as
* the original xdp_buff. It allows to map the memory only
* once and to use the DMA_BIDIRECTIONAL mode.
*/
xdpi.mode = MLX5E_XDP_XMIT_MODE_PAGE;

@ -329,7 +329,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
case XDP_TX:
/* Buffer ownership passes to tx on success. */
xdpf = convert_to_xdp_frame(&xdp);
xdpf = xdp_convert_buff_to_frame(&xdp);
err = efx_xdp_tx_buffers(efx, 1, &xdpf, true);
if (unlikely(err != 1)) {
efx_free_rx_buffers(rx_queue, rx_buf, 1);

@ -867,7 +867,7 @@ static u32 netsec_xdp_queue_one(struct netsec_priv *priv,
static u32 netsec_xdp_xmit_back(struct netsec_priv *priv, struct xdp_buff *xdp)
{
struct netsec_desc_ring *tx_ring = &priv->desc_ring[NETSEC_RING_TX];
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
u32 ret;
if (unlikely(!xdpf))

@ -1355,7 +1355,7 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp,
ret = CPSW_XDP_PASS;
break;
case XDP_TX:
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
goto drop;

@ -1295,7 +1295,7 @@ resample:
static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
{
struct xdp_frame *frame = convert_to_xdp_frame(xdp);
struct xdp_frame *frame = xdp_convert_buff_to_frame(xdp);
if (unlikely(!frame))
return -EOVERFLOW;

@ -541,7 +541,7 @@ out:
static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
struct veth_xdp_tx_bq *bq)
{
struct xdp_frame *frame = convert_to_xdp_frame(xdp);
struct xdp_frame *frame = xdp_convert_buff_to_frame(xdp);
if (unlikely(!frame))
return -EOVERFLOW;
@ -575,11 +575,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
struct xdp_buff xdp;
u32 act;
xdp.data_hard_start = hard_start;
xdp.data = frame->data;
xdp.data_end = frame->data + frame->len;
xdp.data_meta = frame->data - frame->metasize;
xdp.frame_sz = frame->frame_sz;
xdp_convert_frame_to_buff(frame, &xdp);
xdp.rxq = &rq->xdp_rxq;
act = bpf_prog_run_xdp(xdp_prog, &xdp);

@ -703,7 +703,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
break;
case XDP_TX:
stats->xdp_tx++;
xdpf = convert_to_xdp_frame(&xdp);
xdpf = xdp_convert_buff_to_frame(&xdp);
if (unlikely(!xdpf))
goto err_xdp;
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
@ -892,7 +892,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
break;
case XDP_TX:
stats->xdp_tx++;
xdpf = convert_to_xdp_frame(&xdp);
xdpf = xdp_convert_buff_to_frame(&xdp);
if (unlikely(!xdpf))
goto err_xdp;
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);

include/linux/bpf-netns.h
@ -0,0 +1,64 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BPF_NETNS_H
#define _BPF_NETNS_H
#include <linux/mutex.h>
#include <uapi/linux/bpf.h>
enum netns_bpf_attach_type {
NETNS_BPF_INVALID = -1,
NETNS_BPF_FLOW_DISSECTOR = 0,
MAX_NETNS_BPF_ATTACH_TYPE
};
static inline enum netns_bpf_attach_type
to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
{
switch (attach_type) {
case BPF_FLOW_DISSECTOR:
return NETNS_BPF_FLOW_DISSECTOR;
default:
return NETNS_BPF_INVALID;
}
}
/* Protects updates to netns_bpf */
extern struct mutex netns_bpf_mutex;
union bpf_attr;
struct bpf_prog;
#ifdef CONFIG_NET
int netns_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
int netns_bpf_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog);
int netns_bpf_prog_detach(const union bpf_attr *attr);
int netns_bpf_link_create(const union bpf_attr *attr,
struct bpf_prog *prog);
#else
static inline int netns_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
return -EOPNOTSUPP;
}
static inline int netns_bpf_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
{
return -EOPNOTSUPP;
}
static inline int netns_bpf_link_create(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
#endif
#endif /* _BPF_NETNS_H */

@ -90,6 +90,8 @@ struct bpf_map_ops {
int (*map_direct_value_meta)(const struct bpf_map *map,
u64 imm, u32 *off);
int (*map_mmap)(struct bpf_map *map, struct vm_area_struct *vma);
__poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts);
};
struct bpf_map_memory {
@ -244,6 +246,9 @@ enum bpf_arg_type {
ARG_PTR_TO_LONG, /* pointer to long */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */
ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */
ARG_PTR_TO_ALLOC_MEM, /* pointer to dynamically allocated memory */
ARG_PTR_TO_ALLOC_MEM_OR_NULL, /* pointer to dynamically allocated memory or NULL */
ARG_CONST_ALLOC_SIZE_OR_ZERO, /* number of allocated bytes requested */
};
/* type of values returned from helper functions */
@ -255,6 +260,7 @@ enum bpf_return_type {
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
RET_PTR_TO_TCP_SOCK_OR_NULL, /* returns a pointer to a tcp_sock or NULL */
RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */
RET_PTR_TO_ALLOC_MEM_OR_NULL, /* returns a pointer to dynamically allocated memory or NULL */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@ -322,6 +328,8 @@ enum bpf_reg_type {
PTR_TO_XDP_SOCK, /* reg points to struct xdp_sock */
PTR_TO_BTF_ID, /* reg points to kernel struct */
PTR_TO_BTF_ID_OR_NULL, /* reg points to kernel struct or NULL */
PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_MEM_OR_NULL, /* reg points to valid memory region or NULL */
};
/* The information passed from prog-specific *_is_valid_access
@ -1242,6 +1250,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
struct net_device *dev_rx);
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
struct bpf_prog *xdp_prog);
bool dev_map_can_have_prog(struct bpf_map *map);
struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
void __cpu_map_flush(void);
@ -1355,6 +1364,10 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
{
return NULL;
}
static inline bool dev_map_can_have_prog(struct bpf_map *map)
{
return false;
}
static inline void __dev_flush(void)
{
@ -1611,10 +1624,18 @@ extern const struct bpf_func_proto bpf_tcp_sock_proto;
extern const struct bpf_func_proto bpf_jiffies64_proto;
extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto;
extern const struct bpf_func_proto bpf_event_output_data_proto;
extern const struct bpf_func_proto bpf_ringbuf_output_proto;
extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
extern const struct bpf_func_proto bpf_ringbuf_query_proto;
const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
/* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

@ -118,6 +118,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
#if defined(CONFIG_BPF_JIT)
BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
@ -125,3 +126,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
#ifdef CONFIG_NET
BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
#endif

@ -54,6 +54,8 @@ struct bpf_reg_state {
u32 btf_id; /* for PTR_TO_BTF_ID */
u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
/* Max size from any of the above. */
unsigned long raw;
};
@ -63,6 +65,8 @@ struct bpf_reg_state {
* offset, so they can share range knowledge.
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
* came from, when one is tested for != NULL.
* For PTR_TO_MEM_OR_NULL this is used to identify memory allocation
* for the purpose of tracking that it's freed.
* For PTR_TO_SOCKET this is used to share which pointers retain the
* same reference to the socket, to determine proper reference freeing.
*/

@ -1283,32 +1283,6 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
const struct flow_dissector_key *key,
unsigned int key_count);
#ifdef CONFIG_NET
int skb_flow_dissector_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog);
int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr);
#else
static inline int skb_flow_dissector_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
return -EOPNOTSUPP;
}
static inline int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
{
return -EOPNOTSUPP;
}
#endif
struct bpf_flow_dissector;
bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
__be16 proto, int nhoff, int hlen, unsigned int flags);

@ -437,4 +437,12 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs)
psock_set_prog(&progs->skb_verdict, NULL);
}
int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb);
static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
{
if (!psock)
return false;
return psock->parser.enabled;
}
#endif /* _LINUX_SKMSG_H */

@ -8,6 +8,8 @@
#include <linux/string.h>
#include <uapi/linux/if_ether.h>
struct bpf_prog;
struct net;
struct sk_buff;
/**
@ -369,4 +371,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
memset(key_basic, 0, sizeof(*key_basic));
}
#ifdef CONFIG_BPF_SYSCALL
int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
#endif /* CONFIG_BPF_SYSCALL */
#endif

@ -33,6 +33,7 @@
#include <net/netns/mpls.h>
#include <net/netns/can.h>
#include <net/netns/xdp.h>
#include <net/netns/bpf.h>
#include <linux/ns_common.h>
#include <linux/idr.h>
#include <linux/skbuff.h>
@ -162,7 +163,8 @@ struct net {
#endif
struct net_generic __rcu *gen;
struct bpf_prog __rcu *flow_dissector_prog;
/* Used to store attached BPF programs */
struct netns_bpf bpf;
/* Note : following structs are cache line aligned */
#ifdef CONFIG_XFRM

include/net/netns/bpf.h
@ -0,0 +1,18 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* BPF programs attached to network namespace
*/
#ifndef __NETNS_BPF_H__
#define __NETNS_BPF_H__
#include <linux/bpf-netns.h>
struct bpf_prog;
struct netns_bpf {
struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
};
#endif /* __NETNS_BPF_H__ */

@ -2690,7 +2690,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
void sock_def_readable(struct sock *sk);
int sock_bindtoindex(struct sock *sk, int ifindex);
int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
void sock_enable_timestamps(struct sock *sk);
void sock_no_linger(struct sock *sk);
void sock_set_keepalive(struct sock *sk);

@ -571,6 +571,15 @@ static inline bool tls_sw_has_ctx_tx(const struct sock *sk)
return !!tls_sw_ctx_tx(ctx);
}
static inline bool tls_sw_has_ctx_rx(const struct sock *sk)
{
struct tls_context *ctx = tls_get_ctx(sk);
if (!ctx)
return false;
return !!tls_sw_ctx_rx(ctx);
}
void tls_sw_write_space(struct sock *sk, struct tls_context *ctx);
void tls_device_write_space(struct sock *sk, struct tls_context *ctx);

@ -61,12 +61,17 @@ struct xdp_rxq_info {
struct xdp_mem_info mem;
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
struct xdp_txq_info {
struct net_device *dev;
};
struct xdp_buff {
void *data;
void *data_end;
void *data_meta;
void *data_hard_start;
struct xdp_rxq_info *rxq;
struct xdp_txq_info *txq;
u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
};
@ -106,9 +111,19 @@ void xdp_warn(const char *msg, const char *func, const int line);
struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
static inline
void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
{
xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
xdp->data = frame->data;
xdp->data_end = frame->data + frame->len;
xdp->data_meta = frame->data - frame->metasize;
xdp->frame_sz = frame->frame_sz;
}
/* Convert xdp_buff to xdp_frame */
static inline
struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
{
struct xdp_frame *xdp_frame;
int metasize;

@ -147,6 +147,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
};
/* Note that tracing related programs such as
@ -224,6 +225,7 @@ enum bpf_attach_type {
BPF_CGROUP_INET6_GETPEERNAME,
BPF_CGROUP_INET4_GETSOCKNAME,
BPF_CGROUP_INET6_GETSOCKNAME,
BPF_XDP_DEVMAP,
__MAX_BPF_ATTACH_TYPE
};
@ -235,6 +237,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TRACING = 2,
BPF_LINK_TYPE_CGROUP = 3,
BPF_LINK_TYPE_ITER = 4,
BPF_LINK_TYPE_NETNS = 5,
MAX_BPF_LINK_TYPE,
};
@ -3157,6 +3160,59 @@ union bpf_attr {
* **bpf_sk_cgroup_id**\ ().
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
* void *bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
* Description
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* IF BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* 0, on success;
* < 0, on error.
*
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
*
* void bpf_ringbuf_submit(void *data, u64 flags)
* Description
* Submit reserved ring buffer sample, pointed to by *data*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* IF BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* Nothing. Always succeeds.
*
* void bpf_ringbuf_discard(void *data, u64 flags)
* Description
* Discard reserved ring buffer sample, pointed to by *data*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* IF BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* Nothing. Always succeeds.
*
* u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
* Description
* Query various characteristics of provided ring buffer. What
* exactly is queries is determined by *flags*:
* - BPF_RB_AVAIL_DATA - amount of data not yet consumed;
* - BPF_RB_RING_SIZE - the size of ring buffer;
* - BPF_RB_CONS_POS - consumer position (can wrap around);
* - BPF_RB_PROD_POS - producer(s) position (can wrap around);
* Data returned is just a momentary snapshots of actual values
* and could be inaccurate, so this facility should be used to
* power heuristics and for reporting, not to make 100% correct
* calculation.
* Return
* Requested value, or 0, if flags are not recognized.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -3288,7 +3344,12 @@ union bpf_attr {
FN(seq_printf), \
FN(seq_write), \
FN(sk_cgroup_id), \
FN(sk_ancestor_cgroup_id),
FN(sk_ancestor_cgroup_id), \
FN(ringbuf_output), \
FN(ringbuf_reserve), \
FN(ringbuf_submit), \
FN(ringbuf_discard), \
FN(ringbuf_query),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@ -3398,6 +3459,29 @@ enum {
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
};
/* BPF_FUNC_bpf_ringbuf_commit, BPF_FUNC_bpf_ringbuf_discard, and
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
/* BPF_FUNC_bpf_ringbuf_query flags */
enum {
BPF_RB_AVAIL_DATA = 0,
BPF_RB_RING_SIZE = 1,
BPF_RB_CONS_POS = 2,
BPF_RB_PROD_POS = 3,
};
/* BPF ring buffer constants */
enum {
BPF_RINGBUF_BUSY_BIT = (1U << 31),
BPF_RINGBUF_DISCARD_BIT = (1U << 30),
BPF_RINGBUF_HDR_SZ = 8,
};
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
@ -3530,6 +3614,7 @@ struct bpf_sock {
__u32 dst_ip4;
__u32 dst_ip6[4];
__u32 state;
__s32 rx_queue_mapping;
};
struct bpf_tcp_sock {
@ -3623,6 +3708,8 @@ struct xdp_md {
/* Below access go through struct xdp_rxq_info */
__u32 ingress_ifindex; /* rxq->dev->ifindex */
__u32 rx_queue_index; /* rxq->queue_index */
__u32 egress_ifindex; /* txq->dev->ifindex */
};
enum sk_action {
@ -3645,6 +3732,8 @@ struct sk_msg_md {
__u32 remote_port; /* Stored in network byte order */
__u32 local_port; /* stored in host byte order */
__u32 size; /* Total size of sk_msg */
__bpf_md_ptr(struct bpf_sock *, sk); /* current socket */
};
struct sk_reuseport_md {
@ -3751,6 +3840,10 @@ struct bpf_link_info {
__u64 cgroup_id;
__u32 attach_type;
} cgroup;
struct {
__u32 netns_ino;
__u32 attach_type;
} netns;
};
} __attribute__((aligned(8)));
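
The ring buffer constants above describe an 8-byte record header whose first 32-bit word holds the sample length plus two flag bits. As a minimal user-space sketch (assuming the `len`/`pg_off` header layout from kernel/bpf/ringbuf.c; the `rec_*` helper names are hypothetical and not a libbpf API), a consumer could decode that word like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the uapi "BPF ring buffer constants" enum. */
#define RB_BUSY_BIT    (1U << 31)
#define RB_DISCARD_BIT (1U << 30)
#define RB_HDR_SZ      8U

/* Hypothetical helpers: they only interpret the first u32 (len) of a
 * record header, as written by __bpf_ringbuf_reserve() and
 * bpf_ringbuf_commit().
 */

/* Record reserved by a producer but not yet submitted or discarded. */
static bool rec_is_busy(uint32_t len_word)
{
	return (len_word & RB_BUSY_BIT) != 0;
}

/* Record committed via bpf_ringbuf_discard(): consumer should skip it. */
static bool rec_is_discarded(uint32_t len_word)
{
	return (len_word & RB_DISCARD_BIT) != 0;
}

/* Payload size requested by the producer, with flag bits masked off. */
static uint32_t rec_payload_len(uint32_t len_word)
{
	return len_word & ~(RB_BUSY_BIT | RB_DISCARD_BIT);
}

/* Bytes the consumer position advances past this record: header plus
 * payload, rounded up to 8 bytes, matching round_up(size + HDR_SZ, 8).
 */
static uint32_t rec_advance(uint32_t len_word)
{
	uint32_t total = rec_payload_len(len_word) + RB_HDR_SZ;

	return (total + 7U) & ~7U;
}
```

A real consumer also has to read the length word with acquire semantics and stop at the first busy record, since records are consumed strictly in order; this sketch leaves memory ordering out.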

@ -4,7 +4,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o
@ -13,6 +13,7 @@ ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
obj-$(CONFIG_BPF_SYSCALL) += offload.o
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
endif
ifeq ($(CONFIG_PERF_EVENTS),y)
obj-$(CONFIG_BPF_SYSCALL) += stackmap.o

@ -49,6 +49,6 @@ const struct bpf_prog_ops lsm_prog_ops = {
};
const struct bpf_verifier_ops lsm_verifier_ops = {
.get_func_proto = bpf_tracing_func_proto,
.get_func_proto = tracing_prog_func_proto,
.is_valid_access = btf_ctx_access,
};

@ -595,7 +595,7 @@ static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
mutex_lock(&cgroup_mutex);
/* link might have been auto-released by dying cgroup, so fail */
if (!cg_link->cgroup) {
ret = -EINVAL;
ret = -ENOLINK;
goto out_unlock;
}
if (old_prog && link->prog != old_prog) {

@ -1543,7 +1543,7 @@ select_insn:
/* ARG1 at this point is guaranteed to point to CTX from
* the verifier side due to the fact that the tail call is
* handeled like a helper, that is, bpf_tail_call_proto,
* handled like a helper, that is, bpf_tail_call_proto,
* where arg1_type is ARG_PTR_TO_CTX.
*/
insn = prog->insnsi;

@ -621,7 +621,7 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
{
struct xdp_frame *xdpf;
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return -EOVERFLOW;

@ -60,12 +60,23 @@ struct xdp_dev_bulk_queue {
unsigned int count;
};
/* DEVMAP values */
struct bpf_devmap_val {
u32 ifindex; /* device index */
union {
int fd; /* prog fd on map write */
u32 id; /* prog id on map read */
} bpf_prog;
};
struct bpf_dtab_netdev {
struct net_device *dev; /* must be first member, due to tracepoint */
struct hlist_node index_hlist;
struct bpf_dtab *dtab;
struct bpf_prog *xdp_prog;
struct rcu_head rcu;
unsigned int idx;
struct bpf_devmap_val val;
};
struct bpf_dtab {
@ -105,12 +116,18 @@ static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
{
u32 valsize = attr->value_size;
u64 cost = 0;
int err;
/* check sanity of attributes */
/* check sanity of attributes. 2 value sizes supported:
* 4 bytes: ifindex
* 8 bytes: ifindex + prog fd
*/
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK)
(valsize != offsetofend(struct bpf_devmap_val, ifindex) &&
valsize != offsetofend(struct bpf_devmap_val, bpf_prog.fd)) ||
attr->map_flags & ~DEV_CREATE_FLAG_MASK)
return -EINVAL;
/* Lookup returns a pointer straight to dev->ifindex, so make sure the
@ -217,6 +234,8 @@ static void dev_map_free(struct bpf_map *map)
hlist_for_each_entry_safe(dev, next, head, index_hlist) {
hlist_del_rcu(&dev->index_hlist);
if (dev->xdp_prog)
bpf_prog_put(dev->xdp_prog);
dev_put(dev->dev);
kfree(dev);
}
@ -231,6 +250,8 @@ static void dev_map_free(struct bpf_map *map)
if (!dev)
continue;
if (dev->xdp_prog)
bpf_prog_put(dev->xdp_prog);
dev_put(dev->dev);
kfree(dev);
}
@ -317,6 +338,16 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
return -ENOENT;
}
bool dev_map_can_have_prog(struct bpf_map *map)
{
if ((map->map_type == BPF_MAP_TYPE_DEVMAP ||
map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) &&
map->value_size != offsetofend(struct bpf_devmap_val, ifindex))
return true;
return false;
}
static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
{
struct net_device *dev = bq->dev;
@ -434,13 +465,40 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
if (unlikely(err))
return err;
xdpf = convert_to_xdp_frame(xdp);
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
return -EOVERFLOW;
return bq_enqueue(dev, xdpf, dev_rx);
}
static struct xdp_buff *dev_map_run_prog(struct net_device *dev,
struct xdp_buff *xdp,
struct bpf_prog *xdp_prog)
{
struct xdp_txq_info txq = { .dev = dev };
u32 act;
xdp->txq = &txq;
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
return xdp;
case XDP_DROP:
break;
default:
bpf_warn_invalid_xdp_action(act);
fallthrough;
case XDP_ABORTED:
trace_xdp_exception(dev, xdp_prog, act);
break;
}
xdp_return_buff(xdp);
return NULL;
}
int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
struct net_device *dev_rx)
{
@ -452,6 +510,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
{
struct net_device *dev = dst->dev;
if (dst->xdp_prog) {
xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog);
if (!xdp)
return 0;
}
return __xdp_enqueue(dev, xdp, dev_rx);
}
@ -472,18 +535,15 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
struct net_device *dev = obj ? obj->dev : NULL;
return dev ? &dev->ifindex : NULL;
return obj ? &obj->val : NULL;
}
static void *dev_map_hash_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_dtab_netdev *obj = __dev_map_hash_lookup_elem(map,
*(u32 *)key);
struct net_device *dev = obj ? obj->dev : NULL;
return dev ? &dev->ifindex : NULL;
return obj ? &obj->val : NULL;
}
static void __dev_map_entry_free(struct rcu_head *rcu)
@ -491,6 +551,8 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
struct bpf_dtab_netdev *dev;
dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
if (dev->xdp_prog)
bpf_prog_put(dev->xdp_prog);
dev_put(dev->dev);
kfree(dev);
}
@ -541,9 +603,10 @@ static int dev_map_hash_delete_elem(struct bpf_map *map, void *key)
static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
struct bpf_dtab *dtab,
u32 ifindex,
struct bpf_devmap_val *val,
unsigned int idx)
{
struct bpf_prog *prog = NULL;
struct bpf_dtab_netdev *dev;
dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
@ -551,24 +614,46 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
if (!dev)
return ERR_PTR(-ENOMEM);
dev->dev = dev_get_by_index(net, ifindex);
if (!dev->dev) {
kfree(dev);
return ERR_PTR(-EINVAL);
dev->dev = dev_get_by_index(net, val->ifindex);
if (!dev->dev)
goto err_out;
if (val->bpf_prog.fd >= 0) {
prog = bpf_prog_get_type_dev(val->bpf_prog.fd,
BPF_PROG_TYPE_XDP, false);
if (IS_ERR(prog))
goto err_put_dev;
if (prog->expected_attach_type != BPF_XDP_DEVMAP)
goto err_put_prog;
}
dev->idx = idx;
dev->dtab = dtab;
if (prog) {
dev->xdp_prog = prog;
dev->val.bpf_prog.id = prog->aux->id;
} else {
dev->xdp_prog = NULL;
dev->val.bpf_prog.id = 0;
}
dev->val.ifindex = val->ifindex;
return dev;
err_put_prog:
bpf_prog_put(prog);
err_put_dev:
dev_put(dev->dev);
err_out:
kfree(dev);
return ERR_PTR(-EINVAL);
}
static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
struct bpf_dtab_netdev *dev, *old_dev;
u32 ifindex = *(u32 *)value;
u32 i = *(u32 *)key;
if (unlikely(map_flags > BPF_EXIST))
@ -578,10 +663,16 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
if (unlikely(map_flags == BPF_NOEXIST))
return -EEXIST;
if (!ifindex) {
/* already verified value_size <= sizeof val */
memcpy(&val, value, map->value_size);
if (!val.ifindex) {
dev = NULL;
/* cannot specify fd if ifindex is 0 */
if (val.bpf_prog.fd != -1)
return -EINVAL;
} else {
dev = __dev_map_alloc_node(net, dtab, ifindex, i);
dev = __dev_map_alloc_node(net, dtab, &val, i);
if (IS_ERR(dev))
return PTR_ERR(dev);
}
@ -608,13 +699,16 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
void *key, void *value, u64 map_flags)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
struct bpf_dtab_netdev *dev, *old_dev;
u32 ifindex = *(u32 *)value;
u32 idx = *(u32 *)key;
unsigned long flags;
int err = -EEXIST;
if (unlikely(map_flags > BPF_EXIST || !ifindex))
/* already verified value_size <= sizeof val */
memcpy(&val, value, map->value_size);
if (unlikely(map_flags > BPF_EXIST || !val.ifindex))
return -EINVAL;
spin_lock_irqsave(&dtab->index_lock, flags);
@ -623,7 +717,7 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
if (old_dev && (map_flags & BPF_NOEXIST))
goto out_err;
dev = __dev_map_alloc_node(net, dtab, ifindex, idx);
dev = __dev_map_alloc_node(net, dtab, &val, idx);
if (IS_ERR(dev)) {
err = PTR_ERR(dev);
goto out_err;
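
The devmap changes above accept two value sizes and default the program fd to -1 when only an ifindex is supplied. The following hypothetical user-space sketch restates the checks from dev_map_init_map() and __dev_map_update_elem() in a single function; `struct devmap_val` and `devmap_parse_value()` are illustrative names, and `offsetofend()` is reimplemented here because it is a kernel-internal macro.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Kernel-style helper: offset of the first byte past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Mirrors struct bpf_devmap_val from the hunk above. */
struct devmap_val {
	uint32_t ifindex;         /* device index */
	union {
		int fd;           /* prog fd on map write */
		uint32_t id;      /* prog id on map read */
	} bpf_prog;
};

/* Accept a 4-byte (ifindex only) or 8-byte (ifindex + prog fd) value,
 * default the fd to -1 ("no program"), and reject an fd without an
 * ifindex. Returns 0 on success or a negative errno.
 */
static int devmap_parse_value(const void *value, size_t value_size,
			      struct devmap_val *out)
{
	struct devmap_val val = { .bpf_prog.fd = -1 };

	if (value_size != offsetofend(struct devmap_val, ifindex) &&
	    value_size != offsetofend(struct devmap_val, bpf_prog.fd))
		return -EINVAL;

	/* value_size <= sizeof(val) is guaranteed by the check above */
	memcpy(&val, value, value_size);

	if (!val.ifindex && val.bpf_prog.fd != -1)
		return -EINVAL;

	*out = val;
	return 0;
}
```

Note that the hash variant, __dev_map_hash_update_elem(), is stricter and rejects `ifindex == 0` outright, whereas the array variant treats a zero ifindex as clearing the slot.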

@ -601,6 +601,12 @@ const struct bpf_func_proto bpf_event_output_data_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
const struct bpf_func_proto bpf_get_current_task_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
const struct bpf_func_proto bpf_probe_read_kernel_proto __weak;
const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
@ -629,6 +635,16 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_ktime_get_boot_ns:
return &bpf_ktime_get_boot_ns_proto;
case BPF_FUNC_ringbuf_output:
return &bpf_ringbuf_output_proto;
case BPF_FUNC_ringbuf_reserve:
return &bpf_ringbuf_reserve_proto;
case BPF_FUNC_ringbuf_submit:
return &bpf_ringbuf_submit_proto;
case BPF_FUNC_ringbuf_discard:
return &bpf_ringbuf_discard_proto;
case BPF_FUNC_ringbuf_query:
return &bpf_ringbuf_query_proto;
default:
break;
}
@ -647,6 +663,24 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return bpf_get_trace_printk_proto();
case BPF_FUNC_jiffies64:
return &bpf_jiffies64_proto;
default:
break;
}
if (!perfmon_capable())
return NULL;
switch (func_id) {
case BPF_FUNC_get_current_task:
return &bpf_get_current_task_proto;
case BPF_FUNC_probe_read_user:
return &bpf_probe_read_user_proto;
case BPF_FUNC_probe_read_kernel:
return &bpf_probe_read_kernel_proto;
case BPF_FUNC_probe_read_user_str:
return &bpf_probe_read_user_str_proto;
case BPF_FUNC_probe_read_kernel_str:
return &bpf_probe_read_kernel_str_proto;
default:
return NULL;
}

kernel/bpf/net_namespace.c

@ -0,0 +1,373 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/filter.h>
#include <net/net_namespace.h>
/*
* Functions to manage BPF programs attached to netns
*/
struct bpf_netns_link {
struct bpf_link link;
enum bpf_attach_type type;
enum netns_bpf_attach_type netns_type;
/* We don't hold a ref to net in order to auto-detach the link
* when netns is going away. Instead we rely on pernet
* pre_exit callback to clear this pointer. Must be accessed
* with netns_bpf_mutex held.
*/
struct net *net;
};
/* Protects updates to netns_bpf */
DEFINE_MUTEX(netns_bpf_mutex);
/* Must be called with netns_bpf_mutex held. */
static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
{
struct bpf_netns_link *net_link =
container_of(link, struct bpf_netns_link, link);
net_link->net = NULL;
}
static void bpf_netns_link_release(struct bpf_link *link)
{
struct bpf_netns_link *net_link =
container_of(link, struct bpf_netns_link, link);
enum netns_bpf_attach_type type = net_link->netns_type;
struct net *net;
/* Link auto-detached by dying netns. */
if (!net_link->net)
return;
mutex_lock(&netns_bpf_mutex);
/* Recheck after potential sleep. We can race with cleanup_net
* here, but if we see a non-NULL struct net pointer, pre_exit
* has not happened yet and will block on netns_bpf_mutex.
*/
net = net_link->net;
if (!net)
goto out_unlock;
net->bpf.links[type] = NULL;
RCU_INIT_POINTER(net->bpf.progs[type], NULL);
out_unlock:
mutex_unlock(&netns_bpf_mutex);
}
static void bpf_netns_link_dealloc(struct bpf_link *link)
{
struct bpf_netns_link *net_link =
container_of(link, struct bpf_netns_link, link);
kfree(net_link);
}
static int bpf_netns_link_update_prog(struct bpf_link *link,
struct bpf_prog *new_prog,
struct bpf_prog *old_prog)
{
struct bpf_netns_link *net_link =
container_of(link, struct bpf_netns_link, link);
enum netns_bpf_attach_type type = net_link->netns_type;
struct net *net;
int ret = 0;
if (old_prog && old_prog != link->prog)
return -EPERM;
if (new_prog->type != link->prog->type)
return -EINVAL;
mutex_lock(&netns_bpf_mutex);
net = net_link->net;
if (!net || !check_net(net)) {
/* Link auto-detached or netns dying */
ret = -ENOLINK;
goto out_unlock;
}
old_prog = xchg(&link->prog, new_prog);
rcu_assign_pointer(net->bpf.progs[type], new_prog);
bpf_prog_put(old_prog);
out_unlock:
mutex_unlock(&netns_bpf_mutex);
return ret;
}
static int bpf_netns_link_fill_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
const struct bpf_netns_link *net_link =
container_of(link, struct bpf_netns_link, link);
unsigned int inum = 0;
struct net *net;
mutex_lock(&netns_bpf_mutex);
net = net_link->net;
if (net && check_net(net))
inum = net->ns.inum;
mutex_unlock(&netns_bpf_mutex);
info->netns.netns_ino = inum;
info->netns.attach_type = net_link->type;
return 0;
}
static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
struct bpf_link_info info = {};
bpf_netns_link_fill_info(link, &info);
seq_printf(seq,
"netns_ino:\t%u\n"
"attach_type:\t%u\n",
info.netns.netns_ino,
info.netns.attach_type);
}
static const struct bpf_link_ops bpf_netns_link_ops = {
.release = bpf_netns_link_release,
.dealloc = bpf_netns_link_dealloc,
.update_prog = bpf_netns_link_update_prog,
.fill_link_info = bpf_netns_link_fill_info,
.show_fdinfo = bpf_netns_link_show_fdinfo,
};
int netns_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
u32 prog_id, prog_cnt = 0, flags = 0;
enum netns_bpf_attach_type type;
struct bpf_prog *attached;
struct net *net;
if (attr->query.query_flags)
return -EINVAL;
type = to_netns_bpf_attach_type(attr->query.attach_type);
if (type < 0)
return -EINVAL;
net = get_net_ns_by_fd(attr->query.target_fd);
if (IS_ERR(net))
return PTR_ERR(net);
rcu_read_lock();
attached = rcu_dereference(net->bpf.progs[type]);
if (attached) {
prog_cnt = 1;
prog_id = attached->aux->id;
}
rcu_read_unlock();
put_net(net);
if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
return -EFAULT;
if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
return -EFAULT;
if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
return 0;
if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
return -EFAULT;
return 0;
}
int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
enum netns_bpf_attach_type type;
struct net *net;
int ret;
type = to_netns_bpf_attach_type(attr->attach_type);
if (type < 0)
return -EINVAL;
net = current->nsproxy->net_ns;
mutex_lock(&netns_bpf_mutex);
/* Attaching prog directly is not compatible with links */
if (net->bpf.links[type]) {
ret = -EEXIST;
goto out_unlock;
}
switch (type) {
case NETNS_BPF_FLOW_DISSECTOR:
ret = flow_dissector_bpf_prog_attach(net, prog);
break;
default:
ret = -EINVAL;
break;
}
out_unlock:
mutex_unlock(&netns_bpf_mutex);
return ret;
}
/* Must be called with netns_bpf_mutex held. */
static int __netns_bpf_prog_detach(struct net *net,
enum netns_bpf_attach_type type)
{
struct bpf_prog *attached;
/* Progs attached via links cannot be detached */
if (net->bpf.links[type])
return -EINVAL;
attached = rcu_dereference_protected(net->bpf.progs[type],
lockdep_is_held(&netns_bpf_mutex));
if (!attached)
return -ENOENT;
RCU_INIT_POINTER(net->bpf.progs[type], NULL);
bpf_prog_put(attached);
return 0;
}
int netns_bpf_prog_detach(const union bpf_attr *attr)
{
enum netns_bpf_attach_type type;
int ret;
type = to_netns_bpf_attach_type(attr->attach_type);
if (type < 0)
return -EINVAL;
mutex_lock(&netns_bpf_mutex);
ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
mutex_unlock(&netns_bpf_mutex);
return ret;
}
static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
enum netns_bpf_attach_type type)
{
struct bpf_prog *prog;
int err;
mutex_lock(&netns_bpf_mutex);
/* Allow attaching only one prog or link for now */
if (net->bpf.links[type]) {
err = -E2BIG;
goto out_unlock;
}
/* Links are not compatible with attaching prog directly */
prog = rcu_dereference_protected(net->bpf.progs[type],
lockdep_is_held(&netns_bpf_mutex));
if (prog) {
err = -EEXIST;
goto out_unlock;
}
switch (type) {
case NETNS_BPF_FLOW_DISSECTOR:
err = flow_dissector_bpf_prog_attach(net, link->prog);
break;
default:
err = -EINVAL;
break;
}
if (err)
goto out_unlock;
net->bpf.links[type] = link;
out_unlock:
mutex_unlock(&netns_bpf_mutex);
return err;
}
int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
enum netns_bpf_attach_type netns_type;
struct bpf_link_primer link_primer;
struct bpf_netns_link *net_link;
enum bpf_attach_type type;
struct net *net;
int err;
if (attr->link_create.flags)
return -EINVAL;
type = attr->link_create.attach_type;
netns_type = to_netns_bpf_attach_type(type);
if (netns_type < 0)
return -EINVAL;
net = get_net_ns_by_fd(attr->link_create.target_fd);
if (IS_ERR(net))
return PTR_ERR(net);
net_link = kzalloc(sizeof(*net_link), GFP_USER);
if (!net_link) {
err = -ENOMEM;
goto out_put_net;
}
bpf_link_init(&net_link->link, BPF_LINK_TYPE_NETNS,
&bpf_netns_link_ops, prog);
net_link->net = net;
net_link->type = type;
net_link->netns_type = netns_type;
err = bpf_link_prime(&net_link->link, &link_primer);
if (err) {
kfree(net_link);
goto out_put_net;
}
err = netns_bpf_link_attach(net, &net_link->link, netns_type);
if (err) {
bpf_link_cleanup(&link_primer);
goto out_put_net;
}
put_net(net);
return bpf_link_settle(&link_primer);
out_put_net:
put_net(net);
return err;
}
static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
{
enum netns_bpf_attach_type type;
struct bpf_link *link;
mutex_lock(&netns_bpf_mutex);
for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
link = net->bpf.links[type];
if (link)
bpf_netns_link_auto_detach(link);
else
__netns_bpf_prog_detach(net, type);
}
mutex_unlock(&netns_bpf_mutex);
}
static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
.pre_exit = netns_bpf_pernet_pre_exit,
};
static int __init netns_bpf_init(void)
{
return register_pernet_subsys(&netns_bpf_pernet_ops);
}
subsys_initcall(netns_bpf_init);
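
net_namespace.c enforces that, per attach type, a netns has either one directly attached program or one bpf_link, never both. The resulting error codes can be summarized with a small single-threaded model. This is a hypothetical sketch: it replaces the real `struct bpf_prog`/`struct bpf_link` pointers with plain integers, ignores netns_bpf_mutex, and does not model the flow-dissector-specific checks in flow_dissector_bpf_prog_attach().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one netns attach point (e.g. the flow dissector). */
struct attach_point {
	int prog;       /* 0 = no program attached */
	bool via_link;  /* program is owned by a bpf_link */
};

/* Like netns_bpf_prog_attach(): refused while a link is attached;
 * otherwise this model simply installs the new program.
 */
static int model_prog_attach(struct attach_point *ap, int prog)
{
	if (ap->via_link)
		return -EEXIST;
	ap->prog = prog;
	return 0;
}

/* Like netns_bpf_link_attach(): at most one link, and never on top of
 * a directly attached program.
 */
static int model_link_attach(struct attach_point *ap, int prog)
{
	if (ap->via_link)
		return -E2BIG;
	if (ap->prog)
		return -EEXIST;
	ap->prog = prog;
	ap->via_link = true;
	return 0;
}

/* Like __netns_bpf_prog_detach(): link-owned programs cannot be detached. */
static int model_prog_detach(struct attach_point *ap)
{
	if (ap->via_link)
		return -EINVAL;
	if (!ap->prog)
		return -ENOENT;
	ap->prog = 0;
	return 0;
}

/* Like bpf_netns_link_release(): dropping the link clears the slot. */
static void model_link_release(struct attach_point *ap)
{
	if (ap->via_link) {
		ap->prog = 0;
		ap->via_link = false;
	}
}
```

The asymmetry between -EEXIST and -E2BIG mirrors the source: -E2BIG signals "only one link for now", leaving room to allow multiple links later.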

kernel/bpf/ringbuf.c

@ -0,0 +1,501 @@
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/err.h>
#include <linux/irq_work.h>
#include <linux/slab.h>
#include <linux/filter.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/wait.h>
#include <linux/poll.h>
#include <uapi/linux/btf.h>
#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
/* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
#define RINGBUF_PGOFF \
(offsetof(struct bpf_ringbuf, consumer_pos) >> PAGE_SHIFT)
/* consumer page and producer page */
#define RINGBUF_POS_PAGES 2
#define RINGBUF_MAX_RECORD_SZ (UINT_MAX/4)
/* Maximum size of ring buffer area is limited by 32-bit page offset within
* record header, counted in pages. Reserve 8 bits for extensibility, and take
* into account a few extra pages for consumer/producer pages and
* non-mmap()'able parts. This gives a 64GB limit, which seems plenty for a
* single ring buffer.
*/
#define RINGBUF_MAX_DATA_SZ \
(((1ULL << 24) - RINGBUF_POS_PAGES - RINGBUF_PGOFF) * PAGE_SIZE)
struct bpf_ringbuf {
wait_queue_head_t waitq;
struct irq_work work;
u64 mask;
struct page **pages;
int nr_pages;
spinlock_t spinlock ____cacheline_aligned_in_smp;
/* Consumer and producer counters are put into separate pages to allow
* mapping consumer page as r/w, but restrict producer page to r/o.
* This protects producer position from being modified by user-space
* application and ruining in-kernel position tracking.
*/
unsigned long consumer_pos __aligned(PAGE_SIZE);
unsigned long producer_pos __aligned(PAGE_SIZE);
char data[] __aligned(PAGE_SIZE);
};
struct bpf_ringbuf_map {
struct bpf_map map;
struct bpf_map_memory memory;
struct bpf_ringbuf *rb;
};
/* 8-byte ring buffer record header structure */
struct bpf_ringbuf_hdr {
u32 len;
u32 pg_off;
};
static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
{
const gfp_t flags = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN |
__GFP_ZERO;
int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
int nr_data_pages = data_sz >> PAGE_SHIFT;
int nr_pages = nr_meta_pages + nr_data_pages;
struct page **pages, *page;
struct bpf_ringbuf *rb;
size_t array_size;
int i;
/* Each data page is mapped twice to allow "virtual"
* contiguous read of samples wrapping around the end of the ring
* buffer area:
* ------------------------------------------------------
* | meta pages | real data pages | same data pages |
* ------------------------------------------------------
* | | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 |
* ------------------------------------------------------
* | | TA DA | TA DA |
* ------------------------------------------------------
* ^^^^^^^
* |
* Here, there is no need to worry about special handling of wrapped-around
* data due to double-mapped data pages. This works both in kernel and
* when mmap()'ed in user-space, simplifying both kernel and
* user-space implementations significantly.
*/
array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
if (array_size > PAGE_SIZE)
pages = vmalloc_node(array_size, numa_node);
else
pages = kmalloc_node(array_size, flags, numa_node);
if (!pages)
return NULL;
for (i = 0; i < nr_pages; i++) {
page = alloc_pages_node(numa_node, flags, 0);
if (!page) {
nr_pages = i;
goto err_free_pages;
}
pages[i] = page;
if (i >= nr_meta_pages)
pages[nr_data_pages + i] = page;
}
rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
VM_ALLOC | VM_USERMAP, PAGE_KERNEL);
if (rb) {
rb->pages = pages;
rb->nr_pages = nr_pages;
return rb;
}
err_free_pages:
for (i = 0; i < nr_pages; i++)
__free_page(pages[i]);
kvfree(pages);
return NULL;
}
static void bpf_ringbuf_notify(struct irq_work *work)
{
struct bpf_ringbuf *rb = container_of(work, struct bpf_ringbuf, work);
wake_up_all(&rb->waitq);
}
static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
{
struct bpf_ringbuf *rb;
if (!data_sz || !PAGE_ALIGNED(data_sz))
return ERR_PTR(-EINVAL);
#ifdef CONFIG_64BIT
/* on 32-bit arch, it's impossible to overflow record's hdr->pg_off */
if (data_sz > RINGBUF_MAX_DATA_SZ)
return ERR_PTR(-E2BIG);
#endif
rb = bpf_ringbuf_area_alloc(data_sz, numa_node);
if (!rb)
return ERR_PTR(-ENOMEM);
spin_lock_init(&rb->spinlock);
init_waitqueue_head(&rb->waitq);
init_irq_work(&rb->work, bpf_ringbuf_notify);
rb->mask = data_sz - 1;
rb->consumer_pos = 0;
rb->producer_pos = 0;
return rb;
}
static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
{
struct bpf_ringbuf_map *rb_map;
u64 cost;
int err;
if (attr->map_flags & ~RINGBUF_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
if (attr->key_size || attr->value_size ||
attr->max_entries == 0 || !PAGE_ALIGNED(attr->max_entries))
return ERR_PTR(-EINVAL);
rb_map = kzalloc(sizeof(*rb_map), GFP_USER);
if (!rb_map)
return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&rb_map->map, attr);
cost = sizeof(struct bpf_ringbuf_map) +
sizeof(struct bpf_ringbuf) +
attr->max_entries;
err = bpf_map_charge_init(&rb_map->map.memory, cost);
if (err)
goto err_free_map;
rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
if (IS_ERR(rb_map->rb)) {
err = PTR_ERR(rb_map->rb);
goto err_uncharge;
}
return &rb_map->map;
err_uncharge:
bpf_map_charge_finish(&rb_map->map.memory);
err_free_map:
kfree(rb_map);
return ERR_PTR(err);
}
static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
{
/* copy pages pointer and nr_pages to local variables, as we are going
* to unmap rb itself with vunmap() below
*/
struct page **pages = rb->pages;
int i, nr_pages = rb->nr_pages;
vunmap(rb);
for (i = 0; i < nr_pages; i++)
__free_page(pages[i]);
kvfree(pages);
}
static void ringbuf_map_free(struct bpf_map *map)
{
struct bpf_ringbuf_map *rb_map;
/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
* so the programs (can be more than one that used this map) were
* disconnected from events. Wait for outstanding critical sections in
* these programs to complete
*/
synchronize_rcu();
rb_map = container_of(map, struct bpf_ringbuf_map, map);
bpf_ringbuf_free(rb_map->rb);
kfree(rb_map);
}
static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
{
return ERR_PTR(-ENOTSUPP);
}
static int ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 flags)
{
return -ENOTSUPP;
}
static int ringbuf_map_delete_elem(struct bpf_map *map, void *key)
{
return -ENOTSUPP;
}
static int ringbuf_map_get_next_key(struct bpf_map *map, void *key,
void *next_key)
{
return -ENOTSUPP;
}
static size_t bpf_ringbuf_mmap_page_cnt(const struct bpf_ringbuf *rb)
{
size_t data_pages = (rb->mask + 1) >> PAGE_SHIFT;
/* consumer page + producer page + 2 x data pages */
return RINGBUF_POS_PAGES + 2 * data_pages;
}
static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
{
struct bpf_ringbuf_map *rb_map;
size_t mmap_sz;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
mmap_sz = bpf_ringbuf_mmap_page_cnt(rb_map->rb) << PAGE_SHIFT;
if (vma->vm_pgoff * PAGE_SIZE + (vma->vm_end - vma->vm_start) > mmap_sz)
return -EINVAL;
return remap_vmalloc_range(vma, rb_map->rb,
vma->vm_pgoff + RINGBUF_PGOFF);
}
static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
{
unsigned long cons_pos, prod_pos;
cons_pos = smp_load_acquire(&rb->consumer_pos);
prod_pos = smp_load_acquire(&rb->producer_pos);
return prod_pos - cons_pos;
}
static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts)
{
struct bpf_ringbuf_map *rb_map;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
poll_wait(filp, &rb_map->rb->waitq, pts);
if (ringbuf_avail_data_sz(rb_map->rb))
return EPOLLIN | EPOLLRDNORM;
return 0;
}
const struct bpf_map_ops ringbuf_map_ops = {
.map_alloc = ringbuf_map_alloc,
.map_free = ringbuf_map_free,
.map_mmap = ringbuf_map_mmap,
.map_poll = ringbuf_map_poll,
.map_lookup_elem = ringbuf_map_lookup_elem,
.map_update_elem = ringbuf_map_update_elem,
.map_delete_elem = ringbuf_map_delete_elem,
.map_get_next_key = ringbuf_map_get_next_key,
};
/* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself,
* calculate offset from record metadata to ring buffer in pages, rounded
* down. This page offset is stored as part of record metadata and allows
* restoring struct bpf_ringbuf * from a record pointer. This page offset is
* stored at offset 4 of record metadata header.
*/
static size_t bpf_ringbuf_rec_pg_off(struct bpf_ringbuf *rb,
struct bpf_ringbuf_hdr *hdr)
{
return ((void *)hdr - (void *)rb) >> PAGE_SHIFT;
}
/* Given pointer to ring buffer record header, restore pointer to struct
* bpf_ringbuf itself by using page offset stored at offset 4
*/
static struct bpf_ringbuf *
bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr)
{
unsigned long addr = (unsigned long)(void *)hdr;
unsigned long off = (unsigned long)hdr->pg_off << PAGE_SHIFT;
return (void*)((addr & PAGE_MASK) - off);
}
static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
{
unsigned long cons_pos, prod_pos, new_prod_pos, flags;
u32 len, pg_off;
struct bpf_ringbuf_hdr *hdr;
if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
return NULL;
len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
cons_pos = smp_load_acquire(&rb->consumer_pos);
if (in_nmi()) {
if (!spin_trylock_irqsave(&rb->spinlock, flags))
return NULL;
} else {
spin_lock_irqsave(&rb->spinlock, flags);
}
prod_pos = rb->producer_pos;
new_prod_pos = prod_pos + len;
/* check for out of ringbuf space by ensuring producer position
* doesn't advance more than (ringbuf_size - 1) ahead
*/
if (new_prod_pos - cons_pos > rb->mask) {
spin_unlock_irqrestore(&rb->spinlock, flags);
return NULL;
}
hdr = (void *)rb->data + (prod_pos & rb->mask);
pg_off = bpf_ringbuf_rec_pg_off(rb, hdr);
hdr->len = size | BPF_RINGBUF_BUSY_BIT;
hdr->pg_off = pg_off;
/* pairs with consumer's smp_load_acquire() */
smp_store_release(&rb->producer_pos, new_prod_pos);
spin_unlock_irqrestore(&rb->spinlock, flags);
return (void *)hdr + BPF_RINGBUF_HDR_SZ;
}
BPF_CALL_3(bpf_ringbuf_reserve, struct bpf_map *, map, u64, size, u64, flags)
{
struct bpf_ringbuf_map *rb_map;
if (unlikely(flags))
return 0;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
return (unsigned long)__bpf_ringbuf_reserve(rb_map->rb, size);
}
const struct bpf_func_proto bpf_ringbuf_reserve_proto = {
.func = bpf_ringbuf_reserve,
.ret_type = RET_PTR_TO_ALLOC_MEM_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO,
.arg3_type = ARG_ANYTHING,
};
static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
{
unsigned long rec_pos, cons_pos;
struct bpf_ringbuf_hdr *hdr;
struct bpf_ringbuf *rb;
u32 new_len;
hdr = sample - BPF_RINGBUF_HDR_SZ;
rb = bpf_ringbuf_restore_from_rec(hdr);
new_len = hdr->len ^ BPF_RINGBUF_BUSY_BIT;
if (discard)
new_len |= BPF_RINGBUF_DISCARD_BIT;
/* update record header with correct final size prefix */
xchg(&hdr->len, new_len);
/* if consumer caught up and is waiting for our record, notify about
* new data availability
*/
rec_pos = (void *)hdr - (void *)rb->data;
cons_pos = smp_load_acquire(&rb->consumer_pos) & rb->mask;
if (flags & BPF_RB_FORCE_WAKEUP)
irq_work_queue(&rb->work);
else if (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP))
irq_work_queue(&rb->work);
}
BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
{
bpf_ringbuf_commit(sample, flags, false /* discard */);
return 0;
}
const struct bpf_func_proto bpf_ringbuf_submit_proto = {
.func = bpf_ringbuf_submit,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM,
.arg2_type = ARG_ANYTHING,
};
BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
{
bpf_ringbuf_commit(sample, flags, true /* discard */);
return 0;
}
const struct bpf_func_proto bpf_ringbuf_discard_proto = {
.func = bpf_ringbuf_discard,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM,
.arg2_type = ARG_ANYTHING,
};
BPF_CALL_4(bpf_ringbuf_output, struct bpf_map *, map, void *, data, u64, size,
u64, flags)
{
struct bpf_ringbuf_map *rb_map;
void *rec;
if (unlikely(flags & ~(BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP)))
return -EINVAL;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
rec = __bpf_ringbuf_reserve(rb_map->rb, size);
if (!rec)
return -EAGAIN;
memcpy(rec, data, size);
bpf_ringbuf_commit(rec, flags, false /* discard */);
return 0;
}
const struct bpf_func_proto bpf_ringbuf_output_proto = {
.func = bpf_ringbuf_output,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_MEM,
.arg3_type = ARG_CONST_SIZE_OR_ZERO,
.arg4_type = ARG_ANYTHING,
};
BPF_CALL_2(bpf_ringbuf_query, struct bpf_map *, map, u64, flags)
{
struct bpf_ringbuf *rb;
rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
switch (flags) {
case BPF_RB_AVAIL_DATA:
return ringbuf_avail_data_sz(rb);
case BPF_RB_RING_SIZE:
return rb->mask + 1;
case BPF_RB_CONS_POS:
return smp_load_acquire(&rb->consumer_pos);
case BPF_RB_PROD_POS:
return smp_load_acquire(&rb->producer_pos);
default:
return 0;
}
}
const struct bpf_func_proto bpf_ringbuf_query_proto = {
.func = bpf_ringbuf_query,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_ANYTHING,
};
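The commit path above tracks record state in the header's `len` word. A reserved record carries `BPF_RINGBUF_BUSY_BIT`. Commit clears that bit, and also sets `BPF_RINGBUF_DISCARD_BIT` for a discard, using a single `xchg()`. A wakeup is queued only when the consumer is parked exactly at this record, unless the caller passes `BPF_RB_NO_WAKEUP` or `BPF_RB_FORCE_WAKEUP`. The sketch below models that logic in user space. It assumes the uapi values: busy bit 31, discard bit 30, `BPF_RB_NO_WAKEUP` = 1 and `BPF_RB_FORCE_WAKEUP` = 2. The `model_` names are hypothetical, and C11 atomics stand in for the kernel's `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the uapi values in <linux/bpf.h>; redefined here so
 * the sketch builds in user space. */
#define MODEL_RB_BUSY_BIT     (1U << 31)
#define MODEL_RB_DISCARD_BIT  (1U << 30)
#define MODEL_RB_NO_WAKEUP    (1ULL << 0)
#define MODEL_RB_FORCE_WAKEUP (1ULL << 1)

/* Hypothetical stand-in for the 8-byte record header (len + pg_off). */
struct model_rb_hdr {
	_Atomic uint32_t len;
	uint32_t pg_off;
};

/* Reserve: publish the header with the busy bit set, so a consumer stops
 * at this record until the producer commits it. */
static void model_reserve(struct model_rb_hdr *hdr, uint32_t size)
{
	assert((size & (MODEL_RB_BUSY_BIT | MODEL_RB_DISCARD_BIT)) == 0);
	atomic_store(&hdr->len, size | MODEL_RB_BUSY_BIT);
}

/* Commit: clear the busy bit (and optionally mark the record discarded)
 * with one atomic exchange, like xchg(&hdr->len, new_len). */
static void model_commit(struct model_rb_hdr *hdr, bool discard)
{
	uint32_t new_len = atomic_load(&hdr->len) ^ MODEL_RB_BUSY_BIT;

	if (discard)
		new_len |= MODEL_RB_DISCARD_BIT;
	atomic_exchange(&hdr->len, new_len);
}

/* Wakeup decision: forced, suppressed, or only when the consumer has
 * caught up to exactly this record's position. */
static bool model_should_wakeup(uint64_t flags, uint64_t cons_pos,
				uint64_t rec_pos)
{
	if (flags & MODEL_RB_FORCE_WAKEUP)
		return true;
	return cons_pos == rec_pos && !(flags & MODEL_RB_NO_WAKEUP);
}
```

A consumer reading `len` therefore sees either the busy bit (stop and wait) or the final size. It can skip a discarded record's payload using that size alone.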

@ -26,6 +26,8 @@
#include <linux/audit.h>
#include <uapi/linux/btf.h>
#include <linux/bpf_lsm.h>
#include <linux/poll.h>
#include <linux/bpf-netns.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@ -662,6 +664,16 @@ out:
return err;
}
static __poll_t bpf_map_poll(struct file *filp, struct poll_table_struct *pts)
{
struct bpf_map *map = filp->private_data;
if (map->ops->map_poll)
return map->ops->map_poll(map, filp, pts);
return EPOLLERR;
}
const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_map_show_fdinfo,
@ -670,6 +682,7 @@ const struct file_operations bpf_map_fops = {
.read = bpf_dummy_read,
.write = bpf_dummy_write,
.mmap = bpf_map_mmap,
.poll = bpf_map_poll,
};
int bpf_map_new_fd(struct bpf_map *map, int flags)
@ -1387,7 +1400,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);
if (!buf) {
kvfree(buf_prevkey);
kfree(buf_prevkey);
return -ENOMEM;
}
@ -1472,7 +1485,8 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
@ -2855,7 +2869,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
ret = lirc_prog_attach(attr, prog);
break;
case BPF_PROG_TYPE_FLOW_DISSECTOR:
ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
ret = netns_bpf_prog_attach(attr, prog);
break;
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SKB:
@ -2895,7 +2909,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
case BPF_PROG_TYPE_FLOW_DISSECTOR:
if (!capable(CAP_NET_ADMIN))
return -EPERM;
return skb_flow_dissector_bpf_prog_detach(attr);
return netns_bpf_prog_detach(attr);
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SKB:
case BPF_PROG_TYPE_CGROUP_SOCK:
@ -2948,7 +2962,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
case BPF_LIRC_MODE2:
return lirc_prog_query(attr, uattr);
case BPF_FLOW_DISSECTOR:
return skb_flow_dissector_prog_query(attr, uattr);
return netns_bpf_prog_query(attr, uattr);
default:
return -EINVAL;
}
@ -3873,6 +3887,9 @@ static int link_create(union bpf_attr *attr)
case BPF_PROG_TYPE_TRACING:
ret = tracing_bpf_link_attach(attr, prog);
break;
case BPF_PROG_TYPE_FLOW_DISSECTOR:
ret = netns_bpf_link_create(attr, prog);
break;
default:
ret = -EINVAL;
}
@ -3924,7 +3941,7 @@ static int link_update(union bpf_attr *attr)
if (link->ops->update_prog)
ret = link->ops->update_prog(link, new_prog, old_prog);
else
ret = EINVAL;
ret = -EINVAL;
out_put_progs:
if (old_prog)

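The syscall hunks above contain two small correctness fixes. First, `BPF_MAP_LOOKUP_AND_DELETE_ELEM` now requires the map fd to be both readable and writable, because it returns the value (a read) and removes the element (a write). Second, `link_update()` now returns `-EINVAL` instead of a positive `EINVAL` when the link has no `update_prog` callback. The sketch below models both checks. The permission bits are hypothetical stand-ins for `FMODE_CAN_READ` and `FMODE_CAN_WRITE`, and the `model_` functions are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical permission bits standing in for FMODE_CAN_READ/WRITE. */
#define MODEL_CAN_READ  (1u << 0)
#define MODEL_CAN_WRITE (1u << 1)

/* Lookup-and-delete both reads the value and removes the element, so
 * both permissions are required; missing either one yields -EPERM. */
static int model_lookup_and_delete_perm(unsigned int perms)
{
	if (!(perms & MODEL_CAN_READ) || !(perms & MODEL_CAN_WRITE))
		return -EPERM;
	return 0;
}

/* Errors follow the kernel convention of negative errno values. A
 * positive EINVAL would slip past callers that only treat ret < 0 as
 * failure. */
static int model_link_update(bool has_update_prog)
{
	if (!has_update_prog)
		return -EINVAL;
	return 0;
}
```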

@ -233,6 +233,7 @@ struct bpf_call_arg_meta {
bool pkt_access;
int regno;
int access_size;
int mem_size;
u64 msize_max_value;
int ref_obj_id;
int func_id;
@ -408,7 +409,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type type)
type == PTR_TO_SOCKET_OR_NULL ||
type == PTR_TO_SOCK_COMMON_OR_NULL ||
type == PTR_TO_TCP_SOCK_OR_NULL ||
type == PTR_TO_BTF_ID_OR_NULL;
type == PTR_TO_BTF_ID_OR_NULL ||
type == PTR_TO_MEM_OR_NULL;
}
static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
@ -422,7 +424,9 @@ static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type)
return type == PTR_TO_SOCKET ||
type == PTR_TO_SOCKET_OR_NULL ||
type == PTR_TO_TCP_SOCK ||
type == PTR_TO_TCP_SOCK_OR_NULL;
type == PTR_TO_TCP_SOCK_OR_NULL ||
type == PTR_TO_MEM ||
type == PTR_TO_MEM_OR_NULL;
}
static bool arg_type_may_be_refcounted(enum bpf_arg_type type)
@ -436,7 +440,9 @@ static bool arg_type_may_be_refcounted(enum bpf_arg_type type)
*/
static bool is_release_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_sk_release;
return func_id == BPF_FUNC_sk_release ||
func_id == BPF_FUNC_ringbuf_submit ||
func_id == BPF_FUNC_ringbuf_discard;
}
static bool may_be_acquire_function(enum bpf_func_id func_id)
@ -444,7 +450,8 @@ static bool may_be_acquire_function(enum bpf_func_id func_id)
return func_id == BPF_FUNC_sk_lookup_tcp ||
func_id == BPF_FUNC_sk_lookup_udp ||
func_id == BPF_FUNC_skc_lookup_tcp ||
func_id == BPF_FUNC_map_lookup_elem;
func_id == BPF_FUNC_map_lookup_elem ||
func_id == BPF_FUNC_ringbuf_reserve;
}
static bool is_acquire_function(enum bpf_func_id func_id,
@ -454,7 +461,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
if (func_id == BPF_FUNC_sk_lookup_tcp ||
func_id == BPF_FUNC_sk_lookup_udp ||
func_id == BPF_FUNC_skc_lookup_tcp)
func_id == BPF_FUNC_skc_lookup_tcp ||
func_id == BPF_FUNC_ringbuf_reserve)
return true;
if (func_id == BPF_FUNC_map_lookup_elem &&
@ -494,6 +502,8 @@ static const char * const reg_type_str[] = {
[PTR_TO_XDP_SOCK] = "xdp_sock",
[PTR_TO_BTF_ID] = "ptr_",
[PTR_TO_BTF_ID_OR_NULL] = "ptr_or_null_",
[PTR_TO_MEM] = "mem",
[PTR_TO_MEM_OR_NULL] = "mem_or_null",
};
static char slot_type_char[] = {
@ -2468,32 +2478,49 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
return 0;
}
/* check read/write into map element returned by bpf_map_lookup_elem() */
static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
int size, bool zero_size_allowed)
/* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
static int __check_mem_access(struct bpf_verifier_env *env, int regno,
int off, int size, u32 mem_size,
bool zero_size_allowed)
{
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_map *map = regs[regno].map_ptr;
bool size_ok = size > 0 || (size == 0 && zero_size_allowed);
struct bpf_reg_state *reg;
if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
off + size > map->value_size) {
if (off >= 0 && size_ok && (u64)off + size <= mem_size)
return 0;
reg = &cur_regs(env)[regno];
switch (reg->type) {
case PTR_TO_MAP_VALUE:
verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
map->value_size, off, size);
return -EACCES;
mem_size, off, size);
break;
case PTR_TO_PACKET:
case PTR_TO_PACKET_META:
case PTR_TO_PACKET_END:
verbose(env, "invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
off, size, regno, reg->id, off, mem_size);
break;
case PTR_TO_MEM:
default:
verbose(env, "invalid access to memory, mem_size=%u off=%d size=%d\n",
mem_size, off, size);
}
return 0;
return -EACCES;
}
/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, bool zero_size_allowed)
/* check read/write into a memory region with possible variable offset */
static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, u32 mem_size,
bool zero_size_allowed)
{
struct bpf_verifier_state *vstate = env->cur_state;
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *reg = &state->regs[regno];
int err;
/* We may have adjusted the register to this map value, so we
/* We may have adjusted the register pointing to memory region, so we
* need to try adding each of min_value and max_value to off
* to make sure our theoretical access will be safe.
*/
@ -2514,10 +2541,10 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
regno);
return -EACCES;
}
err = __check_map_access(env, regno, reg->smin_value + off, size,
zero_size_allowed);
err = __check_mem_access(env, regno, reg->smin_value + off, size,
mem_size, zero_size_allowed);
if (err) {
verbose(env, "R%d min value is outside of the array range\n",
verbose(env, "R%d min value is outside of the allowed memory range\n",
regno);
return err;
}
@ -2527,18 +2554,38 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
* If reg->umax_value + off could overflow, treat that as unbounded too.
*/
if (reg->umax_value >= BPF_MAX_VAR_OFF) {
verbose(env, "R%d unbounded memory access, make sure to bounds check any array access into a map\n",
verbose(env, "R%d unbounded memory access, make sure to bounds check any such access\n",
regno);
return -EACCES;
}
err = __check_map_access(env, regno, reg->umax_value + off, size,
zero_size_allowed);
if (err)
verbose(env, "R%d max value is outside of the array range\n",
err = __check_mem_access(env, regno, reg->umax_value + off, size,
mem_size, zero_size_allowed);
if (err) {
verbose(env, "R%d max value is outside of the allowed memory range\n",
regno);
return err;
}
if (map_value_has_spin_lock(reg->map_ptr)) {
u32 lock = reg->map_ptr->spin_lock_off;
return 0;
}
/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, bool zero_size_allowed)
{
struct bpf_verifier_state *vstate = env->cur_state;
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *reg = &state->regs[regno];
struct bpf_map *map = reg->map_ptr;
int err;
err = check_mem_region_access(env, regno, off, size, map->value_size,
zero_size_allowed);
if (err)
return err;
if (map_value_has_spin_lock(map)) {
u32 lock = map->spin_lock_off;
/* if any part of struct bpf_spin_lock can be touched by
* load/store reject this program.
@ -2596,21 +2643,6 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
}
}
static int __check_packet_access(struct bpf_verifier_env *env, u32 regno,
int off, int size, bool zero_size_allowed)
{
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_reg_state *reg = &regs[regno];
if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
(u64)off + size > reg->range) {
verbose(env, "invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
off, size, regno, reg->id, reg->off, reg->range);
return -EACCES;
}
return 0;
}
static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
int size, bool zero_size_allowed)
{
@ -2631,16 +2663,17 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
regno);
return -EACCES;
}
err = __check_packet_access(env, regno, off, size, zero_size_allowed);
err = __check_mem_access(env, regno, off, size, reg->range,
zero_size_allowed);
if (err) {
verbose(env, "R%d offset is outside of the packet\n", regno);
return err;
}
/* __check_packet_access has made sure "off + size - 1" is within u16.
/* __check_mem_access has made sure "off + size - 1" is within u16.
* reg->umax_value can't be bigger than MAX_PACKET_OFF which is 0xffff,
* otherwise find_good_pkt_pointers would have refused to set range info
* that __check_packet_access would have rejected this pkt access.
* that __check_mem_access would have rejected this pkt access.
* Therefore, "off + reg->umax_value + size - 1" won't overflow u32.
*/
env->prog->aux->max_pkt_offset =
@ -3220,6 +3253,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
mark_reg_unknown(env, regs, value_regno);
}
}
} else if (reg->type == PTR_TO_MEM) {
if (t == BPF_WRITE && value_regno >= 0 &&
is_pointer_value(env, value_regno)) {
verbose(env, "R%d leaks addr into mem\n", value_regno);
return -EACCES;
}
err = check_mem_region_access(env, regno, off, size,
reg->mem_size, false);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
} else if (reg->type == PTR_TO_CTX) {
enum bpf_reg_type reg_type = SCALAR_VALUE;
u32 btf_id = 0;
@ -3557,6 +3600,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
return -EACCES;
return check_map_access(env, regno, reg->off, access_size,
zero_size_allowed);
case PTR_TO_MEM:
return check_mem_region_access(env, regno, reg->off,
access_size, reg->mem_size,
zero_size_allowed);
default: /* scalar_value|ptr_to_stack or invalid ptr */
return check_stack_boundary(env, regno, access_size,
zero_size_allowed, meta);
@ -3661,6 +3708,17 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
type == ARG_CONST_SIZE_OR_ZERO;
}
static bool arg_type_is_alloc_mem_ptr(enum bpf_arg_type type)
{
return type == ARG_PTR_TO_ALLOC_MEM ||
type == ARG_PTR_TO_ALLOC_MEM_OR_NULL;
}
static bool arg_type_is_alloc_size(enum bpf_arg_type type)
{
return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
}
static bool arg_type_is_int_ptr(enum bpf_arg_type type)
{
return type == ARG_PTR_TO_INT ||
@ -3720,7 +3778,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
type != expected_type)
goto err_type;
} else if (arg_type == ARG_CONST_SIZE ||
arg_type == ARG_CONST_SIZE_OR_ZERO) {
arg_type == ARG_CONST_SIZE_OR_ZERO ||
arg_type == ARG_CONST_ALLOC_SIZE_OR_ZERO) {
expected_type = SCALAR_VALUE;
if (type != expected_type)
goto err_type;
@ -3791,13 +3850,29 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
* happens during stack boundary checking.
*/
if (register_is_null(reg) &&
arg_type == ARG_PTR_TO_MEM_OR_NULL)
(arg_type == ARG_PTR_TO_MEM_OR_NULL ||
arg_type == ARG_PTR_TO_ALLOC_MEM_OR_NULL))
/* final test in check_stack_boundary() */;
else if (!type_is_pkt_pointer(type) &&
type != PTR_TO_MAP_VALUE &&
type != PTR_TO_MEM &&
type != expected_type)
goto err_type;
meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
} else if (arg_type_is_alloc_mem_ptr(arg_type)) {
expected_type = PTR_TO_MEM;
if (register_is_null(reg) &&
arg_type == ARG_PTR_TO_ALLOC_MEM_OR_NULL)
/* final test in check_stack_boundary() */;
else if (type != expected_type)
goto err_type;
if (meta->ref_obj_id) {
verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
regno, reg->ref_obj_id,
meta->ref_obj_id);
return -EFAULT;
}
meta->ref_obj_id = reg->ref_obj_id;
} else if (arg_type_is_int_ptr(arg_type)) {
expected_type = PTR_TO_STACK;
if (!type_is_pkt_pointer(type) &&
@ -3893,6 +3968,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
zero_size_allowed, meta);
if (!err)
err = mark_chain_precision(env, regno);
} else if (arg_type_is_alloc_size(arg_type)) {
if (!tnum_is_const(reg->var_off)) {
verbose(env, "R%d unbounded size, use 'var &= const' or 'if (var < const)'\n",
regno);
return -EACCES;
}
meta->mem_size = reg->var_off.value;
} else if (arg_type_is_int_ptr(arg_type)) {
int size = int_ptr_type_to_size(arg_type);
@ -3929,6 +4011,14 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_xdp_output)
goto error;
break;
case BPF_MAP_TYPE_RINGBUF:
if (func_id != BPF_FUNC_ringbuf_output &&
func_id != BPF_FUNC_ringbuf_reserve &&
func_id != BPF_FUNC_ringbuf_submit &&
func_id != BPF_FUNC_ringbuf_discard &&
func_id != BPF_FUNC_ringbuf_query)
goto error;
break;
case BPF_MAP_TYPE_STACK_TRACE:
if (func_id != BPF_FUNC_get_stackid)
goto error;
@ -4655,6 +4745,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL;
regs[BPF_REG_0].id = ++env->id_gen;
} else if (fn->ret_type == RET_PTR_TO_ALLOC_MEM_OR_NULL) {
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
regs[BPF_REG_0].id = ++env->id_gen;
regs[BPF_REG_0].mem_size = meta.mem_size;
} else {
verbose(env, "unknown return type %d of func %s#%d\n",
fn->ret_type, func_id_name(func_id), func_id);
@ -6611,6 +6706,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
reg->type = PTR_TO_TCP_SOCK;
} else if (reg->type == PTR_TO_BTF_ID_OR_NULL) {
reg->type = PTR_TO_BTF_ID;
} else if (reg->type == PTR_TO_MEM_OR_NULL) {
reg->type = PTR_TO_MEM;
}
if (is_null) {
/* We don't need id and ref_obj_id from this point

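The verifier refactor above folds the map-value and packet bounds checks into one predicate, `__check_mem_access()`. That predicate is parameterised by the region size: `map->value_size`, `reg->range`, or the `mem_size` recorded for a ringbuf sample. The sketch below is a user-space model of the predicate and of the min/max probing done by `check_mem_region_access()`. The `model_` names are hypothetical. `MODEL_MAX_VAR_OFF` is assumed to equal `BPF_MAX_VAR_OFF` (1 << 29). The variable-offset part is simplified: the real verifier also handles negative minimums, tnum alignment and logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_VAR_OFF (1ULL << 29) /* assumed to match BPF_MAX_VAR_OFF */

/* Accept condition of __check_mem_access(): the access [off, off + size)
 * must fit inside a region of mem_size bytes. size_ok is checked before
 * the sum, so size is non-negative there, and the u64 arithmetic cannot
 * overflow. */
static bool model_mem_access_ok(int64_t off, int32_t size, uint32_t mem_size,
				bool zero_size_allowed)
{
	bool size_ok = size > 0 || (size == 0 && zero_size_allowed);

	return off >= 0 && size_ok &&
	       (uint64_t)off + (uint64_t)size <= mem_size;
}

/* Variable-offset variant in the spirit of check_mem_region_access(): the
 * register's offset lies in [smin, umax], so both extremes plus the fixed
 * instruction offset must pass the same predicate. */
static bool model_region_access_ok(int32_t smin, uint64_t umax, int32_t off,
				   int32_t size, uint32_t mem_size,
				   bool zero_size_allowed)
{
	/* Unbounded maximum offsets are rejected outright. */
	if (umax >= MODEL_MAX_VAR_OFF)
		return false;
	return model_mem_access_ok((int64_t)smin + off, size, mem_size,
				   zero_size_allowed) &&
	       model_mem_access_ok((int64_t)umax + off, size, mem_size,
				   zero_size_allowed);
}
```

Keeping the size as a parameter lets one predicate serve map values, packets, and the new `PTR_TO_MEM` registers returned by `bpf_ringbuf_reserve()`. For the ringbuf case, `reg->mem_size` comes from the constant size argument.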

@ -147,7 +147,7 @@ BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
return ret;
}
static const struct bpf_func_proto bpf_probe_read_user_proto = {
const struct bpf_func_proto bpf_probe_read_user_proto = {
.func = bpf_probe_read_user,
.gpl_only = true,
.ret_type = RET_INTEGER,
@ -167,7 +167,7 @@ BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
return ret;
}
static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
const struct bpf_func_proto bpf_probe_read_user_str_proto = {
.func = bpf_probe_read_user_str,
.gpl_only = true,
.ret_type = RET_INTEGER,
@ -198,7 +198,7 @@ BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
}
static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
const struct bpf_func_proto bpf_probe_read_kernel_proto = {
.func = bpf_probe_read_kernel,
.gpl_only = true,
.ret_type = RET_INTEGER,
@ -253,7 +253,7 @@ BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
}
static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
.func = bpf_probe_read_kernel_str,
.gpl_only = true,
.ret_type = RET_INTEGER,
@ -585,9 +585,9 @@ BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
goto out;
}
err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
(void *) (long) args[fmt_cnt],
MAX_SEQ_PRINTF_STR_LEN);
err = strncpy_from_unsafe_strict(bufs->buf[memcpy_cnt],
(void *) (long) args[fmt_cnt],
MAX_SEQ_PRINTF_STR_LEN);
if (err < 0)
bufs->buf[memcpy_cnt][0] = '\0';
params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
@ -907,7 +907,7 @@ BPF_CALL_0(bpf_get_current_task)
return (long) current;
}
static const struct bpf_func_proto bpf_get_current_task_proto = {
const struct bpf_func_proto bpf_get_current_task_proto = {
.func = bpf_get_current_task,
.gpl_only = true,
.ret_type = RET_INTEGER,
@ -1088,6 +1088,16 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_perf_event_read_value_proto;
case BPF_FUNC_get_ns_current_pid_tgid:
return &bpf_get_ns_current_pid_tgid_proto;
case BPF_FUNC_ringbuf_output:
return &bpf_ringbuf_output_proto;
case BPF_FUNC_ringbuf_reserve:
return &bpf_ringbuf_reserve_proto;
case BPF_FUNC_ringbuf_submit:
return &bpf_ringbuf_submit_proto;
case BPF_FUNC_ringbuf_discard:
return &bpf_ringbuf_discard_proto;
case BPF_FUNC_ringbuf_query:
return &bpf_ringbuf_query_proto;
default:
return NULL;
}
@ -1457,7 +1467,7 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
}
}
static const struct bpf_func_proto *
const struct bpf_func_proto *
tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {

@ -5420,6 +5420,18 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
struct bpf_prog *new = xdp->prog;
int ret = 0;
if (new) {
u32 i;
/* generic XDP does not work with DEVMAPs that can
* have a bpf_prog installed on an entry
*/
for (i = 0; i < new->aux->used_map_cnt; i++) {
if (dev_map_can_have_prog(new->aux->used_maps[i]))
return -EINVAL;
}
}
switch (xdp->command) {
case XDP_SETUP_PROG:
rcu_assign_pointer(dev->xdp_prog, new);
@ -8835,6 +8847,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
return -EINVAL;
}
if (prog->expected_attach_type == BPF_XDP_DEVMAP) {
NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
bpf_prog_put(prog);
return -EINVAL;
}
/* prog->aux->id may be 0 for orphaned device-bound progs */
if (prog->aux->id && prog->aux->id == prog_id) {
bpf_prog_put(prog);

@ -4248,6 +4248,9 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
static int _bpf_setsockopt(struct sock *sk, int level, int optname,
char *optval, int optlen, u32 flags)
{
char devname[IFNAMSIZ];
struct net *net;
int ifindex;
int ret = 0;
int val;
@ -4257,7 +4260,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
sock_owned_by_me(sk);
if (level == SOL_SOCKET) {
if (optlen != sizeof(int))
if (optlen != sizeof(int) && optname != SO_BINDTODEVICE)
return -EINVAL;
val = *((int *)optval);
@ -4298,6 +4301,29 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
sk_dst_reset(sk);
}
break;
case SO_BINDTODEVICE:
ret = -ENOPROTOOPT;
#ifdef CONFIG_NETDEVICES
optlen = min_t(long, optlen, IFNAMSIZ - 1);
strncpy(devname, optval, optlen);
devname[optlen] = 0;
ifindex = 0;
if (devname[0] != '\0') {
struct net_device *dev;
ret = -ENODEV;
net = sock_net(sk);
dev = dev_get_by_name(net, devname);
if (!dev)
break;
ifindex = dev->ifindex;
dev_put(dev);
}
ret = sock_bindtoindex(sk, ifindex, false);
#endif
break;
default:
ret = -EINVAL;
}
@ -6443,6 +6469,26 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_msg_push_data_proto;
case BPF_FUNC_msg_pop_data:
return &bpf_msg_pop_data_proto;
case BPF_FUNC_perf_event_output:
return &bpf_event_output_data_proto;
case BPF_FUNC_get_current_uid_gid:
return &bpf_get_current_uid_gid_proto;
case BPF_FUNC_get_current_pid_tgid:
return &bpf_get_current_pid_tgid_proto;
case BPF_FUNC_sk_storage_get:
return &bpf_sk_storage_get_proto;
case BPF_FUNC_sk_storage_delete:
return &bpf_sk_storage_delete_proto;
#ifdef CONFIG_CGROUPS
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_get_current_ancestor_cgroup_id:
return &bpf_get_current_ancestor_cgroup_id_proto;
#endif
#ifdef CONFIG_CGROUP_NET_CLASSID
case BPF_FUNC_get_cgroup_classid:
return &bpf_get_cgroup_classid_curr_proto;
#endif
default:
return bpf_base_func_proto(func_id);
}
@ -6829,6 +6875,7 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
case offsetof(struct bpf_sock, protocol):
case offsetof(struct bpf_sock, dst_port):
case offsetof(struct bpf_sock, src_port):
case offsetof(struct bpf_sock, rx_queue_mapping):
case bpf_ctx_range(struct bpf_sock, src_ip4):
case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
case bpf_ctx_range(struct bpf_sock, dst_ip4):
@ -6994,6 +7041,13 @@ static bool xdp_is_valid_access(int off, int size,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
if (prog->expected_attach_type != BPF_XDP_DEVMAP) {
switch (off) {
case offsetof(struct xdp_md, egress_ifindex):
return false;
}
}
if (type == BPF_WRITE) {
if (bpf_prog_is_dev_bound(prog->aux)) {
switch (off) {
@ -7257,6 +7311,11 @@ static bool sk_msg_is_valid_access(int off, int size,
if (size != sizeof(__u64))
return false;
break;
case offsetof(struct sk_msg_md, sk):
if (size != sizeof(__u64))
return false;
info->reg_type = PTR_TO_SOCKET;
break;
case bpf_ctx_range(struct sk_msg_md, family):
case bpf_ctx_range(struct sk_msg_md, remote_ip4):
case bpf_ctx_range(struct sk_msg_md, local_ip4):
@ -7872,6 +7931,23 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
skc_state),
target_size));
break;
case offsetof(struct bpf_sock, rx_queue_mapping):
#ifdef CONFIG_XPS
*insn++ = BPF_LDX_MEM(
BPF_FIELD_SIZEOF(struct sock, sk_rx_queue_mapping),
si->dst_reg, si->src_reg,
bpf_target_off(struct sock, sk_rx_queue_mapping,
sizeof_field(struct sock,
sk_rx_queue_mapping),
target_size));
*insn++ = BPF_JMP_IMM(BPF_JNE, si->dst_reg, NO_QUEUE_MAPPING,
1);
*insn++ = BPF_MOV64_IMM(si->dst_reg, -1);
#else
*insn++ = BPF_MOV64_IMM(si->dst_reg, -1);
*target_size = 2;
#endif
break;
}
return insn - insn_buf;
@ -7942,6 +8018,16 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
offsetof(struct xdp_rxq_info,
queue_index));
break;
case offsetof(struct xdp_md, egress_ifindex):
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
si->dst_reg, si->src_reg,
offsetof(struct xdp_buff, txq));
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
si->dst_reg, si->dst_reg,
offsetof(struct xdp_txq_info, dev));
*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
offsetof(struct net_device, ifindex));
break;
}
return insn - insn_buf;
@ -8593,6 +8679,12 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
si->dst_reg, si->src_reg,
offsetof(struct sk_msg_sg, size));
break;
case offsetof(struct sk_msg_md, sk):
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_msg, sk),
si->dst_reg, si->src_reg,
offsetof(struct sk_msg, sk));
break;
}
return insn - insn_buf;
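`_bpf_setsockopt()` now accepts `SO_BINDTODEVICE` with a string option value instead of an `int`. It clamps the length to `IFNAMSIZ - 1` and NUL-terminates the copy. An empty name maps to ifindex 0, which unbinds the socket. Any other name is resolved in the socket's netns before `sock_bindtoindex(sk, ifindex, false)` is called; `false` is passed because the socket is already owned in this path. The sketch below models the string handling in user space. The `model_` names and the small device table that replaces `dev_get_by_name()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16 /* assumed equal to IFNAMSIZ on Linux */

/* Copy a possibly unterminated option value into a fixed buffer. The
 * length is clamped to IFNAMSIZ - 1 so the terminator always fits.
 * Returns the number of bytes copied. */
static size_t model_copy_devname(char devname[MODEL_IFNAMSIZ],
				 const char *optval, size_t optlen)
{
	if (optlen > MODEL_IFNAMSIZ - 1)
		optlen = MODEL_IFNAMSIZ - 1;
	strncpy(devname, optval, optlen);
	devname[optlen] = '\0';
	return optlen;
}

/* Hypothetical name -> ifindex lookup standing in for dev_get_by_name(). */
static int model_lookup_ifindex(const char *name)
{
	static const struct { const char *name; int ifindex; } devs[] = {
		{ "lo", 1 }, { "eth0", 2 },
	};

	for (size_t i = 0; i < sizeof(devs) / sizeof(devs[0]); i++)
		if (strcmp(devs[i].name, name) == 0)
			return devs[i].ifindex;
	return -ENODEV;
}

/* Resolve the ifindex to bind to: an empty name unbinds (ifindex 0). */
static int model_bindtodevice_ifindex(const char *optval, size_t optlen)
{
	char devname[MODEL_IFNAMSIZ];

	model_copy_devname(devname, optval, optlen);
	if (devname[0] == '\0')
		return 0;
	return model_lookup_ifindex(devname);
}
```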

@ -31,8 +31,7 @@
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_labels.h>
#endif
static DEFINE_MUTEX(flow_dissector_mutex);
#include <linux/bpf-netns.h>
static void dissector_set_key(struct flow_dissector *flow_dissector,
enum flow_dissector_key_id key_id)
@ -70,54 +69,11 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
}
EXPORT_SYMBOL(skb_flow_dissector_init);
int skb_flow_dissector_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
#ifdef CONFIG_BPF_SYSCALL
int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
{
__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
u32 prog_id, prog_cnt = 0, flags = 0;
enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
struct bpf_prog *attached;
struct net *net;
if (attr->query.query_flags)
return -EINVAL;
net = get_net_ns_by_fd(attr->query.target_fd);
if (IS_ERR(net))
return PTR_ERR(net);
rcu_read_lock();
attached = rcu_dereference(net->flow_dissector_prog);
if (attached) {
prog_cnt = 1;
prog_id = attached->aux->id;
}
rcu_read_unlock();
put_net(net);
if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
return -EFAULT;
if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
return -EFAULT;
if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
return 0;
if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
return -EFAULT;
return 0;
}
int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
struct bpf_prog *attached;
struct net *net;
int ret = 0;
net = current->nsproxy->net_ns;
mutex_lock(&flow_dissector_mutex);
if (net == &init_net) {
/* BPF flow dissector in the root namespace overrides
@ -130,70 +86,29 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
for_each_net(ns) {
if (ns == &init_net)
continue;
if (rcu_access_pointer(ns->flow_dissector_prog)) {
ret = -EEXIST;
goto out;
}
if (rcu_access_pointer(ns->bpf.progs[type]))
return -EEXIST;
}
} else {
/* Make sure root flow dissector is not attached
* when attaching to the non-root namespace.
*/
if (rcu_access_pointer(init_net.flow_dissector_prog)) {
ret = -EEXIST;
goto out;
}
if (rcu_access_pointer(init_net.bpf.progs[type]))
return -EEXIST;
}
attached = rcu_dereference_protected(net->flow_dissector_prog,
lockdep_is_held(&flow_dissector_mutex));
if (attached == prog) {
attached = rcu_dereference_protected(net->bpf.progs[type],
lockdep_is_held(&netns_bpf_mutex));
if (attached == prog)
/* The same program cannot be attached twice */
ret = -EINVAL;
goto out;
}
rcu_assign_pointer(net->flow_dissector_prog, prog);
return -EINVAL;
rcu_assign_pointer(net->bpf.progs[type], prog);
if (attached)
bpf_prog_put(attached);
out:
mutex_unlock(&flow_dissector_mutex);
return ret;
}
static int flow_dissector_bpf_prog_detach(struct net *net)
{
struct bpf_prog *attached;
mutex_lock(&flow_dissector_mutex);
attached = rcu_dereference_protected(net->flow_dissector_prog,
lockdep_is_held(&flow_dissector_mutex));
if (!attached) {
mutex_unlock(&flow_dissector_mutex);
return -ENOENT;
}
RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
bpf_prog_put(attached);
mutex_unlock(&flow_dissector_mutex);
return 0;
}
int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
{
return flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
}
static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
{
/* We're not racing with attach/detach because there are no
* references to netns left when pre_exit gets called.
*/
if (rcu_access_pointer(net->flow_dissector_prog))
flow_dissector_bpf_prog_detach(net);
}
static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
.pre_exit = flow_dissector_pernet_pre_exit,
};
#endif /* CONFIG_BPF_SYSCALL */
/**
* __skb_flow_get_ports - extract the upper layer ports and return them
@ -1044,11 +959,13 @@ bool __skb_flow_dissect(const struct net *net,
WARN_ON_ONCE(!net);
if (net) {
enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
rcu_read_lock();
attached = rcu_dereference(init_net.flow_dissector_prog);
attached = rcu_dereference(init_net.bpf.progs[type]);
if (!attached)
attached = rcu_dereference(net->flow_dissector_prog);
attached = rcu_dereference(net->bpf.progs[type]);
if (attached) {
struct bpf_flow_keys flow_keys;
@ -1869,7 +1786,6 @@ static int __init init_default_flow_dissectors(void)
skb_flow_dissector_init(&flow_keys_basic_dissector,
flow_keys_basic_dissector_keys,
ARRAY_SIZE(flow_keys_basic_dissector_keys));
return register_pernet_subsys(&flow_dissector_pernet_ops);
return 0;
}
core_initcall(init_default_flow_dissectors);
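After this change, the flow dissector program lives in a generic per-netns slot, `net->bpf.progs[NETNS_BPF_FLOW_DISSECTOR]`. It is managed through the `netns_bpf_*` helpers and can now also be attached via a BPF link. The attach rules themselves are unchanged:

- A program in the root namespace overrides all others, so it is refused while any other namespace has one.
- A non-root program is refused while root has one.
- Attaching the already-attached program fails with `-EINVAL`.

At dissect time the root program is consulted first. The sketch below is a single-threaded model of these rules; the kernel runs under `netns_bpf_mutex` and uses RCU. Array index 0 stands in for `init_net`, and the integer program IDs and `model_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_NS 4 /* hypothetical: namespace 0 plays init_net */

/* Hypothetical per-namespace slot, standing in for
 * net->bpf.progs[NETNS_BPF_FLOW_DISSECTOR]; 0 means "no program". */
static int model_progs[MODEL_MAX_NS];

static int model_flow_dissector_attach(int ns, int prog)
{
	assert(ns >= 0 && ns < MODEL_MAX_NS && prog != 0);
	if (ns == 0) {
		/* Root overrides everyone: refuse while any non-root
		 * namespace has its own program. */
		for (int i = 1; i < MODEL_MAX_NS; i++)
			if (model_progs[i])
				return -EEXIST;
	} else if (model_progs[0]) {
		/* Refuse non-root programs while root has one. */
		return -EEXIST;
	}
	if (model_progs[ns] == prog)
		return -EINVAL; /* same program cannot be attached twice */
	model_progs[ns] = prog; /* replace; the kernel also puts the old prog */
	return 0;
}

/* Lookup order used by __skb_flow_dissect(): root first, then own netns. */
static int model_flow_dissector_lookup(int ns)
{
	return model_progs[0] ? model_progs[0] : model_progs[ns];
}
```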

@ -7,6 +7,7 @@
#include <net/sock.h>
#include <net/tcp.h>
#include <net/tls.h>
static bool sk_msg_try_coalesce_ok(struct sk_msg *msg, int elem_first_coalesce)
{
@ -682,13 +683,75 @@ static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
return container_of(parser, struct sk_psock, parser);
}
static void sk_psock_verdict_apply(struct sk_psock *psock,
struct sk_buff *skb, int verdict)
static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb)
{
struct sk_psock *psock_other;
struct sock *sk_other;
bool ingress;
sk_other = tcp_skb_bpf_redirect_fetch(skb);
if (unlikely(!sk_other)) {
kfree_skb(skb);
return;
}
psock_other = sk_psock(sk_other);
if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
kfree_skb(skb);
return;
}
ingress = tcp_skb_bpf_ingress(skb);
if ((!ingress && sock_writeable(sk_other)) ||
(ingress &&
atomic_read(&sk_other->sk_rmem_alloc) <=
sk_other->sk_rcvbuf)) {
if (!ingress)
skb_set_owner_w(skb, sk_other);
skb_queue_tail(&psock_other->ingress_skb, skb);
schedule_work(&psock_other->work);
} else {
kfree_skb(skb);
}
}
static void sk_psock_tls_verdict_apply(struct sk_psock *psock,
struct sk_buff *skb, int verdict)
{
switch (verdict) {
case __SK_REDIRECT:
sk_psock_skb_redirect(psock, skb);
break;
case __SK_PASS:
case __SK_DROP:
default:
break;
}
}
int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
{
struct bpf_prog *prog;
int ret = __SK_PASS;
rcu_read_lock();
prog = READ_ONCE(psock->progs.skb_verdict);
if (likely(prog)) {
tcp_skb_bpf_redirect_clear(skb);
ret = sk_psock_bpf_run(psock, prog, skb);
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
}
rcu_read_unlock();
sk_psock_tls_verdict_apply(psock, skb, ret);
return ret;
}
EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
static void sk_psock_verdict_apply(struct sk_psock *psock,
struct sk_buff *skb, int verdict)
{
struct sock *sk_other;
switch (verdict) {
case __SK_PASS:
sk_other = psock->sk;
@ -707,25 +770,8 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
}
goto out_free;
case __SK_REDIRECT:
sk_other = tcp_skb_bpf_redirect_fetch(skb);
if (unlikely(!sk_other))
goto out_free;
psock_other = sk_psock(sk_other);
if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED))
goto out_free;
ingress = tcp_skb_bpf_ingress(skb);
if ((!ingress && sock_writeable(sk_other)) ||
(ingress &&
atomic_read(&sk_other->sk_rmem_alloc) <=
sk_other->sk_rcvbuf)) {
if (!ingress)
skb_set_owner_w(skb, sk_other);
skb_queue_tail(&psock_other->ingress_skb, skb);
schedule_work(&psock_other->work);
break;
}
/* fall-through */
sk_psock_skb_redirect(psock, skb);
break;
case __SK_DROP:
/* fall-through */
default:
@ -779,9 +825,13 @@ static void sk_psock_strp_data_ready(struct sock *sk)
rcu_read_lock();
psock = sk_psock(sk);
if (likely(psock)) {
write_lock_bh(&sk->sk_callback_lock);
strp_data_ready(&psock->parser.strp);
write_unlock_bh(&sk->sk_callback_lock);
if (tls_sw_has_ctx_rx(sk)) {
psock->parser.saved_data_ready(sk);
} else {
write_lock_bh(&sk->sk_callback_lock);
strp_data_ready(&psock->parser.strp);
write_unlock_bh(&sk->sk_callback_lock);
}
}
rcu_read_unlock();
}

@ -594,13 +594,15 @@ out:
return ret;
}
int sock_bindtoindex(struct sock *sk, int ifindex)
int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk)
{
int ret;
lock_sock(sk);
if (lock_sk)
lock_sock(sk);
ret = sock_bindtoindex_locked(sk, ifindex);
release_sock(sk);
if (lock_sk)
release_sock(sk);
return ret;
}
@ -646,7 +648,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
goto out;
}
return sock_bindtoindex(sk, index);
return sock_bindtoindex(sk, index, true);
out:
#endif

@ -22,7 +22,7 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
goto error;
if (cfg->bind_ifindex) {
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex);
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex, true);
if (err < 0)
goto error;
}

@ -30,7 +30,7 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
goto error;
}
if (cfg->bind_ifindex) {
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex);
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex, true);
if (err < 0)
goto error;
}
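
The new `lock_sk` argument to `sock_bindtoindex()` lets the existing setsockopt and UDP tunnel callers keep taking the socket lock, while a caller that already holds it (presumably the `bpf_setsockopt()` path that gains SO_BINDTODEVICE in this pull) can pass `false` and avoid self-deadlock. A minimal userspace sketch of the same pattern, assuming a pthread mutex stands in for `lock_sock()`/`release_sock()`; `struct fake_sock`, `bind_locked()` and `bind_to_index()` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: a lock plus the bound ifindex. */
struct fake_sock {
	pthread_mutex_t lock;
	int bound_ifindex;
};

/* Caller must hold sk->lock (plays the role of sock_bindtoindex_locked()). */
static int bind_locked(struct fake_sock *sk, int ifindex)
{
	assert(sk != NULL);
	if (ifindex < 0)
		return -1; /* hypothetical validation, stands in for -EINVAL */
	sk->bound_ifindex = ifindex;
	return 0;
}

/* Take the lock only when the caller does not already hold it. */
static int bind_to_index(struct fake_sock *sk, int ifindex, bool lock_sk)
{
	int ret;

	if (lock_sk)
		pthread_mutex_lock(&sk->lock);
	ret = bind_locked(sk, ifindex);
	if (lock_sk)
		pthread_mutex_unlock(&sk->lock);
	return ret;
}
```

A plain setsockopt-style caller passes `true`; code already running under the lock passes `false`, exactly as the two call shapes in the diff do.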

@ -1742,6 +1742,7 @@ int tls_sw_recvmsg(struct sock *sk,
long timeo;
bool is_kvec = iov_iter_is_kvec(&msg->msg_iter);
bool is_peek = flags & MSG_PEEK;
bool bpf_strp_enabled;
int num_async = 0;
int pending;
@ -1752,6 +1753,7 @@ int tls_sw_recvmsg(struct sock *sk,
psock = sk_psock_get(sk);
lock_sock(sk);
bpf_strp_enabled = sk_psock_strp_enabled(psock);
/* Process pending decrypted records. It must be non-zero-copy */
err = process_rx_list(ctx, msg, &control, &cmsg, 0, len, false,
@ -1805,11 +1807,12 @@ int tls_sw_recvmsg(struct sock *sk,
if (to_decrypt <= len && !is_kvec && !is_peek &&
ctx->control == TLS_RECORD_TYPE_DATA &&
prot->version != TLS_1_3_VERSION)
prot->version != TLS_1_3_VERSION &&
!bpf_strp_enabled)
zc = true;
/* Do not use async mode if record is non-data */
if (ctx->control == TLS_RECORD_TYPE_DATA)
if (ctx->control == TLS_RECORD_TYPE_DATA && !bpf_strp_enabled)
async_capable = ctx->async_capable;
else
async_capable = false;
@ -1859,6 +1862,19 @@ int tls_sw_recvmsg(struct sock *sk,
goto pick_next_record;
if (!zc) {
if (bpf_strp_enabled) {
err = sk_psock_tls_strp_read(psock, skb);
if (err != __SK_PASS) {
rxm->offset = rxm->offset + rxm->full_len;
rxm->full_len = 0;
if (err == __SK_DROP)
consume_skb(skb);
ctx->recv_pkt = NULL;
__strp_unpause(&ctx->strp);
continue;
}
}
if (rxm->full_len > len) {
retain_skb = true;
chunk = len;

@ -553,7 +553,7 @@ static int do_dump(int argc, char **argv)
btf = btf__parse_elf(*argv, NULL);
if (IS_ERR(btf)) {
err = PTR_ERR(btf);
err = -PTR_ERR(btf);
btf = NULL;
p_err("failed to load BTF from %s: %s",
*argv, strerror(err));
@ -951,9 +951,9 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s btf { show | list } [id BTF_ID]\n"
" %s btf dump BTF_SRC [format FORMAT]\n"
" %s btf help\n"
"Usage: %1$s %2$s { show | list } [id BTF_ID]\n"
" %1$s %2$s dump BTF_SRC [format FORMAT]\n"
" %1$s %2$s help\n"
"\n"
" BTF_SRC := { id BTF_ID | prog PROG | map MAP [{key | value | kv | all}] | file FILE }\n"
" FORMAT := { raw | c }\n"
@ -961,7 +961,7 @@ static int do_help(int argc, char **argv)
" " HELP_SPEC_PROGRAM "\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, bin_name, bin_name);
bin_name, "btf");
return 0;
}

@ -491,20 +491,18 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s { show | list } CGROUP [**effective**]\n"
" %s %s tree [CGROUP_ROOT] [**effective**]\n"
" %s %s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
" %s %s detach CGROUP ATTACH_TYPE PROG\n"
" %s %s help\n"
"Usage: %1$s %2$s { show | list } CGROUP [**effective**]\n"
" %1$s %2$s tree [CGROUP_ROOT] [**effective**]\n"
" %1$s %2$s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
" %1$s %2$s detach CGROUP ATTACH_TYPE PROG\n"
" %1$s %2$s help\n"
"\n"
HELP_SPEC_ATTACH_TYPES "\n"
" " HELP_SPEC_ATTACH_FLAGS "\n"
" " HELP_SPEC_PROGRAM "\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2]);
bin_name, argv[-2]);
return 0;
}
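
The bpftool help texts switch from plain `%s` to POSIX positional conversions (`%1$s`, `%2$s`), so `bin_name` and the subcommand are passed once and referenced on every line, instead of repeating `bin_name, argv[-2]` per line. Positional arguments are a POSIX/XSI extension supported by glibc, not ISO C. A small sketch, where `format_help()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render a two-line usage text; argument 1 is the binary name and
 * argument 2 the subcommand, each reused on every line. */
static int format_help(char *buf, size_t len, const char *bin_name,
		       const char *cmd)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"Usage: %1$s %2$s { show | list }\n"
			"       %1$s %2$s help\n",
			bin_name, cmd);
}
```

When every conversion in the format is numbered, the argument list no longer has to grow with the number of help lines, which is what removes the long `bin_name, argv[-2], ...` tails in the diff.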

@ -758,11 +758,29 @@ static void section_misc(const char *define_prefix, __u32 ifindex)
print_end_section();
}
#ifdef USE_LIBCAP
#define capability(c) { c, false, #c }
#define capability_msg(a, i) a[i].set ? "" : a[i].name, a[i].set ? "" : ", "
#endif
static int handle_perms(void)
{
#ifdef USE_LIBCAP
cap_value_t cap_list[1] = { CAP_SYS_ADMIN };
bool has_sys_admin_cap = false;
struct {
cap_value_t cap;
bool set;
char name[14]; /* strlen("CAP_SYS_ADMIN") + 1 for the NUL */
} bpf_caps[] = {
capability(CAP_SYS_ADMIN),
#ifdef CAP_BPF
capability(CAP_BPF),
capability(CAP_NET_ADMIN),
capability(CAP_PERFMON),
#endif
};
cap_value_t cap_list[ARRAY_SIZE(bpf_caps)];
unsigned int i, nb_bpf_caps = 0;
bool cap_sys_admin_only = true;
cap_flag_value_t val;
int res = -1;
cap_t caps;
@ -774,35 +792,64 @@ static int handle_perms(void)
return -1;
}
if (cap_get_flag(caps, CAP_SYS_ADMIN, CAP_EFFECTIVE, &val)) {
p_err("bug: failed to retrieve CAP_SYS_ADMIN status");
goto exit_free;
}
if (val == CAP_SET)
has_sys_admin_cap = true;
#ifdef CAP_BPF
if (CAP_IS_SUPPORTED(CAP_BPF))
cap_sys_admin_only = false;
#endif
if (!run_as_unprivileged && !has_sys_admin_cap) {
p_err("full feature probing requires CAP_SYS_ADMIN, run as root or use 'unprivileged'");
goto exit_free;
for (i = 0; i < ARRAY_SIZE(bpf_caps); i++) {
const char *cap_name = bpf_caps[i].name;
cap_value_t cap = bpf_caps[i].cap;
if (cap_get_flag(caps, cap, CAP_EFFECTIVE, &val)) {
p_err("bug: failed to retrieve %s status: %s", cap_name,
strerror(errno));
goto exit_free;
}
if (val == CAP_SET) {
bpf_caps[i].set = true;
cap_list[nb_bpf_caps++] = cap;
}
if (cap_sys_admin_only)
/* System does not know about CAP_BPF, meaning that
* CAP_SYS_ADMIN is the only capability required. We
* just checked it, break.
*/
break;
}
if ((run_as_unprivileged && !has_sys_admin_cap) ||
(!run_as_unprivileged && has_sys_admin_cap)) {
if ((run_as_unprivileged && !nb_bpf_caps) ||
(!run_as_unprivileged && nb_bpf_caps == ARRAY_SIZE(bpf_caps)) ||
(!run_as_unprivileged && cap_sys_admin_only && nb_bpf_caps)) {
/* We are all good, exit now */
res = 0;
goto exit_free;
}
/* if (run_as_unprivileged && has_sys_admin_cap), drop CAP_SYS_ADMIN */
if (!run_as_unprivileged) {
if (cap_sys_admin_only)
p_err("missing %s, required for full feature probing; run as root or use 'unprivileged'",
bpf_caps[0].name);
else
p_err("missing %s%s%s%s%s%s%s%srequired for full feature probing; run as root or use 'unprivileged'",
capability_msg(bpf_caps, 0),
capability_msg(bpf_caps, 1),
capability_msg(bpf_caps, 2),
capability_msg(bpf_caps, 3));
goto exit_free;
}
if (cap_set_flag(caps, CAP_EFFECTIVE, ARRAY_SIZE(cap_list), cap_list,
/* if (run_as_unprivileged && nb_bpf_caps > 0), drop capabilities. */
if (cap_set_flag(caps, CAP_EFFECTIVE, nb_bpf_caps, cap_list,
CAP_CLEAR)) {
p_err("bug: failed to clear CAP_SYS_ADMIN from capabilities");
p_err("bug: failed to clear capabilities: %s", strerror(errno));
goto exit_free;
}
if (cap_set_proc(caps)) {
p_err("failed to drop CAP_SYS_ADMIN: %s", strerror(errno));
p_err("failed to drop capabilities: %s", strerror(errno));
goto exit_free;
}
@ -817,7 +864,7 @@ exit_free:
return res;
#else
/* Detection assumes user has sufficient privileges (CAP_SYS_ADMIN).
/* Detection assumes user has specific privileges.
* We do not use libcap so let's approximate, and restrict usage to
* root user only.
*/
@ -901,7 +948,7 @@ static int do_probe(int argc, char **argv)
}
}
/* Full feature detection requires CAP_SYS_ADMIN privilege.
/* Full feature detection requires specific privileges.
* Let's approximate, and warn if user is not root.
*/
if (handle_perms())
@ -937,12 +984,12 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s probe [COMPONENT] [full] [unprivileged] [macros [prefix PREFIX]]\n"
" %s %s help\n"
"Usage: %1$s %2$s probe [COMPONENT] [full] [unprivileged] [macros [prefix PREFIX]]\n"
" %1$s %2$s help\n"
"\n"
" COMPONENT := { kernel | dev NAME }\n"
"",
bin_name, argv[-2], bin_name, argv[-2]);
bin_name, argv[-2]);
return 0;
}
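
In `handle_perms()`, each entry of `bpf_caps[]` records whether the capability is in the effective set, and each `capability_msg()` expands to either two empty strings or the name followed by `", "`, so the error lists only the missing capabilities. A sketch of the same idea without libcap, assuming hard-coded `set` flags in place of `cap_get_flag()` and a loop in place of the four fixed `capability_msg()` expansions; `struct cap_state` and `build_missing_msg()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct cap_state {
	const char *name;
	bool set; /* would come from cap_get_flag(caps, cap, CAP_EFFECTIVE, ...) */
};

/* Append "NAME, " for every capability that is not set; return how many
 * were missing. The result is always NUL-terminated. */
static int build_missing_msg(const struct cap_state *caps, int n,
			     char *buf, size_t len)
{
	int missing = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		if (caps[i].set)
			continue;
		strncat(buf, caps[i].name, len - strlen(buf) - 1);
		strncat(buf, ", ", len - strlen(buf) - 1);
		missing++;
	}
	return missing;
}
```

The trailing `", "` matches the bpftool format string, which places `required for full feature probing` directly after the last expansion.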

@ -586,12 +586,12 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %1$s gen skeleton FILE\n"
" %1$s gen help\n"
"Usage: %1$s %2$s skeleton FILE\n"
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name);
bin_name, "gen");
return 0;
}

@ -68,10 +68,10 @@ close_obj:
static int do_help(int argc, char **argv)
{
fprintf(stderr,
"Usage: %s %s pin OBJ PATH\n"
" %s %s help\n"
"\n",
bin_name, argv[-2], bin_name, argv[-2]);
"Usage: %1$s %2$s pin OBJ PATH\n"
" %1$s %2$s help\n"
"",
bin_name, "iter");
return 0;
}

@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_TRACING] = "tracing",
[BPF_LINK_TYPE_CGROUP] = "cgroup",
[BPF_LINK_TYPE_ITER] = "iter",
[BPF_LINK_TYPE_NETNS] = "netns",
};
static int link_parse_fd(int *argc, char ***argv)
@ -62,6 +63,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
}
static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)
{
if (attach_type < ARRAY_SIZE(attach_type_name))
jsonw_string_field(wtr, "attach_type",
attach_type_name[attach_type]);
else
jsonw_uint_field(wtr, "attach_type", attach_type);
}
static int get_prog_info(int prog_id, struct bpf_prog_info *info)
{
__u32 len = sizeof(*info);
@ -105,22 +115,18 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
jsonw_uint_field(json_wtr, "prog_type",
prog_info.type);
if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
jsonw_string_field(json_wtr, "attach_type",
attach_type_name[info->tracing.attach_type]);
else
jsonw_uint_field(json_wtr, "attach_type",
info->tracing.attach_type);
show_link_attach_type_json(info->tracing.attach_type,
json_wtr);
break;
case BPF_LINK_TYPE_CGROUP:
jsonw_lluint_field(json_wtr, "cgroup_id",
info->cgroup.cgroup_id);
if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
jsonw_string_field(json_wtr, "attach_type",
attach_type_name[info->cgroup.attach_type]);
else
jsonw_uint_field(json_wtr, "attach_type",
info->cgroup.attach_type);
show_link_attach_type_json(info->cgroup.attach_type, json_wtr);
break;
case BPF_LINK_TYPE_NETNS:
jsonw_uint_field(json_wtr, "netns_ino",
info->netns.netns_ino);
show_link_attach_type_json(info->netns.attach_type, json_wtr);
break;
default:
break;
@ -153,6 +159,14 @@ static void show_link_header_plain(struct bpf_link_info *info)
printf("prog %u ", info->prog_id);
}
static void show_link_attach_type_plain(__u32 attach_type)
{
if (attach_type < ARRAY_SIZE(attach_type_name))
printf("attach_type %s ", attach_type_name[attach_type]);
else
printf("attach_type %u ", attach_type);
}
static int show_link_close_plain(int fd, struct bpf_link_info *info)
{
struct bpf_prog_info prog_info;
@ -176,19 +190,15 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
else
printf("\n\tprog_type %u ", prog_info.type);
if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
printf("attach_type %s ",
attach_type_name[info->tracing.attach_type]);
else
printf("attach_type %u ", info->tracing.attach_type);
show_link_attach_type_plain(info->tracing.attach_type);
break;
case BPF_LINK_TYPE_CGROUP:
printf("\n\tcgroup_id %zu ", (size_t)info->cgroup.cgroup_id);
if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
printf("attach_type %s ",
attach_type_name[info->cgroup.attach_type]);
else
printf("attach_type %u ", info->cgroup.attach_type);
show_link_attach_type_plain(info->cgroup.attach_type);
break;
case BPF_LINK_TYPE_NETNS:
printf("\n\tnetns_ino %u ", info->netns.netns_ino);
show_link_attach_type_plain(info->netns.attach_type);
break;
default:
break;
@ -312,7 +322,6 @@ static int do_help(int argc, char **argv)
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_LINK "\n"
" " HELP_SPEC_PROGRAM "\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2]);
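
The new `show_link_attach_type_plain()` and `show_link_attach_type_json()` helpers factor out one bounds-checked lookup that the tracing, cgroup and netns cases now share: print the symbolic name when the value indexes into `attach_type_name[]`, otherwise print the raw number. A sketch of that lookup with a hypothetical, deliberately sparse table; the extra NULL-slot check is an assumption of this sketch, not something the bpftool helper does:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical name table indexed by attach type; slot 2 is left empty. */
static const char * const demo_attach_type_name[] = {
	[0] = "ingress",
	[1] = "egress",
	[3] = "flow_dissector",
};

/* Format "attach_type NAME " when known, else "attach_type N ". */
static void format_attach_type(char *buf, size_t len, unsigned int type)
{
	assert(buf != NULL && len > 0);
	if (type < ARRAY_SIZE(demo_attach_type_name) &&
	    demo_attach_type_name[type])
		snprintf(buf, len, "attach_type %s ", demo_attach_type_name[type]);
	else
		snprintf(buf, len, "attach_type %u ", type);
}
```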

@ -1561,24 +1561,24 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s { show | list } [MAP]\n"
" %s %s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
" entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
" [dev NAME]\n"
" %s %s dump MAP\n"
" %s %s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
" %s %s lookup MAP [key DATA]\n"
" %s %s getnext MAP [key DATA]\n"
" %s %s delete MAP key DATA\n"
" %s %s pin MAP FILE\n"
" %s %s event_pipe MAP [cpu N index M]\n"
" %s %s peek MAP\n"
" %s %s push MAP value VALUE\n"
" %s %s pop MAP\n"
" %s %s enqueue MAP value VALUE\n"
" %s %s dequeue MAP\n"
" %s %s freeze MAP\n"
" %s %s help\n"
"Usage: %1$s %2$s { show | list } [MAP]\n"
" %1$s %2$s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
" entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
" [dev NAME]\n"
" %1$s %2$s dump MAP\n"
" %1$s %2$s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
" %1$s %2$s lookup MAP [key DATA]\n"
" %1$s %2$s getnext MAP [key DATA]\n"
" %1$s %2$s delete MAP key DATA\n"
" %1$s %2$s pin MAP FILE\n"
" %1$s %2$s event_pipe MAP [cpu N index M]\n"
" %1$s %2$s peek MAP\n"
" %1$s %2$s push MAP value VALUE\n"
" %1$s %2$s pop MAP\n"
" %1$s %2$s enqueue MAP value VALUE\n"
" %1$s %2$s dequeue MAP\n"
" %1$s %2$s freeze MAP\n"
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_MAP "\n"
" DATA := { [hex] BYTES }\n"
@ -1593,11 +1593,6 @@ static int do_help(int argc, char **argv)
" queue | stack | sk_storage | struct_ops }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2]);
return 0;

@ -458,10 +458,10 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s { show | list } [dev <devname>]\n"
" %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
" %s %s detach ATTACH_TYPE dev <devname>\n"
" %s %s help\n"
"Usage: %1$s %2$s { show | list } [dev <devname>]\n"
" %1$s %2$s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
" %1$s %2$s detach ATTACH_TYPE dev <devname>\n"
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_PROGRAM "\n"
" ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
@ -470,8 +470,8 @@ static int do_help(int argc, char **argv)
" For progs attached to cgroups, use \"bpftool cgroup\"\n"
" to dump program attachments. For program types\n"
" sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
" consult iproute2.\n",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
" consult iproute2.\n"
"",
bin_name, argv[-2]);
return 0;

@ -231,7 +231,7 @@ static int do_show(int argc, char **argv)
static int do_help(int argc, char **argv)
{
fprintf(stderr,
"Usage: %s %s { show | list | help }\n"
"Usage: %1$s %2$s { show | list | help }\n"
"",
bin_name, argv[-2]);

@ -1984,24 +1984,24 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s { show | list } [PROG]\n"
" %s %s dump xlated PROG [{ file FILE | opcodes | visual | linum }]\n"
" %s %s dump jited PROG [{ file FILE | opcodes | linum }]\n"
" %s %s pin PROG FILE\n"
" %s %s { load | loadall } OBJ PATH \\\n"
"Usage: %1$s %2$s { show | list } [PROG]\n"
" %1$s %2$s dump xlated PROG [{ file FILE | opcodes | visual | linum }]\n"
" %1$s %2$s dump jited PROG [{ file FILE | opcodes | linum }]\n"
" %1$s %2$s pin PROG FILE\n"
" %1$s %2$s { load | loadall } OBJ PATH \\\n"
" [type TYPE] [dev NAME] \\\n"
" [map { idx IDX | name NAME } MAP]\\\n"
" [pinmaps MAP_DIR]\n"
" %s %s attach PROG ATTACH_TYPE [MAP]\n"
" %s %s detach PROG ATTACH_TYPE [MAP]\n"
" %s %s run PROG \\\n"
" %1$s %2$s attach PROG ATTACH_TYPE [MAP]\n"
" %1$s %2$s detach PROG ATTACH_TYPE [MAP]\n"
" %1$s %2$s run PROG \\\n"
" data_in FILE \\\n"
" [data_out FILE [data_size_out L]] \\\n"
" [ctx_in FILE [ctx_out FILE [ctx_size_out M]]] \\\n"
" [repeat N]\n"
" %s %s profile PROG [duration DURATION] METRICs\n"
" %s %s tracelog\n"
" %s %s help\n"
" %1$s %2$s profile PROG [duration DURATION] METRICs\n"
" %1$s %2$s tracelog\n"
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_MAP "\n"
" " HELP_SPEC_PROGRAM "\n"
@ -2022,10 +2022,7 @@ static int do_help(int argc, char **argv)
" METRIC := { cycles | instructions | l1d_loads | llc_misses }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2]);
bin_name, argv[-2]);
return 0;
}

@ -566,16 +566,15 @@ static int do_help(int argc, char **argv)
}
fprintf(stderr,
"Usage: %s %s { show | list } [STRUCT_OPS_MAP]\n"
" %s %s dump [STRUCT_OPS_MAP]\n"
" %s %s register OBJ\n"
" %s %s unregister STRUCT_OPS_MAP\n"
" %s %s help\n"
"Usage: %1$s %2$s { show | list } [STRUCT_OPS_MAP]\n"
" %1$s %2$s dump [STRUCT_OPS_MAP]\n"
" %1$s %2$s register OBJ\n"
" %1$s %2$s unregister STRUCT_OPS_MAP\n"
" %1$s %2$s help\n"
"\n"
" OPTIONS := { {-j|--json} [{-p|--pretty}] }\n"
" STRUCT_OPS_MAP := [ id STRUCT_OPS_MAP_ID | name STRUCT_OPS_MAP_NAME ]\n",
bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2],
" STRUCT_OPS_MAP := [ id STRUCT_OPS_MAP_ID | name STRUCT_OPS_MAP_NAME ]\n"
"",
bin_name, argv[-2]);
return 0;

@ -147,6 +147,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
};
/* Note that tracing related programs such as
@ -224,6 +225,7 @@ enum bpf_attach_type {
BPF_CGROUP_INET6_GETPEERNAME,
BPF_CGROUP_INET4_GETSOCKNAME,
BPF_CGROUP_INET6_GETSOCKNAME,
BPF_XDP_DEVMAP,
__MAX_BPF_ATTACH_TYPE
};
@ -235,6 +237,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TRACING = 2,
BPF_LINK_TYPE_CGROUP = 3,
BPF_LINK_TYPE_ITER = 4,
BPF_LINK_TYPE_NETNS = 5,
MAX_BPF_LINK_TYPE,
};
@ -3157,6 +3160,59 @@ union bpf_attr {
* **bpf_sk_cgroup_id**\ ().
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
* int bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
* Description
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* 0, on success;
* < 0, on error.
*
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
*
* void bpf_ringbuf_submit(void *data, u64 flags)
* Description
* Submit reserved ring buffer sample, pointed to by *data*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* Nothing. Always succeeds.
*
* void bpf_ringbuf_discard(void *data, u64 flags)
* Description
* Discard reserved ring buffer sample, pointed to by *data*.
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
* new data availability is sent.
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
* new data availability is sent unconditionally.
* Return
* Nothing. Always succeeds.
*
* u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
* Description
* Query various characteristics of the provided ring buffer. What
* exactly is queried is determined by *flags*:
* - BPF_RB_AVAIL_DATA - amount of data not yet consumed;
* - BPF_RB_RING_SIZE - the size of ring buffer;
* - BPF_RB_CONS_POS - consumer position (can wrap around);
* - BPF_RB_PROD_POS - producer(s) position (can wrap around);
* The returned data is just a momentary snapshot of actual
* values and could be inaccurate, so this facility should be
* used to power heuristics and for reporting, not for
* calculations that must be exact.
* Return
* Requested value, or 0, if flags are not recognized.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -3288,7 +3344,12 @@ union bpf_attr {
FN(seq_printf), \
FN(seq_write), \
FN(sk_cgroup_id), \
FN(sk_ancestor_cgroup_id),
FN(sk_ancestor_cgroup_id), \
FN(ringbuf_output), \
FN(ringbuf_reserve), \
FN(ringbuf_submit), \
FN(ringbuf_discard), \
FN(ringbuf_query),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@ -3398,6 +3459,29 @@ enum {
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
};
/* BPF_FUNC_bpf_ringbuf_submit, BPF_FUNC_bpf_ringbuf_discard, and
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
/* BPF_FUNC_bpf_ringbuf_query flags */
enum {
BPF_RB_AVAIL_DATA = 0,
BPF_RB_RING_SIZE = 1,
BPF_RB_CONS_POS = 2,
BPF_RB_PROD_POS = 3,
};
/* BPF ring buffer constants */
enum {
BPF_RINGBUF_BUSY_BIT = (1U << 31),
BPF_RINGBUF_DISCARD_BIT = (1U << 30),
BPF_RINGBUF_HDR_SZ = 8,
};
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
@ -3530,6 +3614,7 @@ struct bpf_sock {
__u32 dst_ip4;
__u32 dst_ip6[4];
__u32 state;
__s32 rx_queue_mapping;
};
struct bpf_tcp_sock {
@ -3623,6 +3708,8 @@ struct xdp_md {
/* Below access go through struct xdp_rxq_info */
__u32 ingress_ifindex; /* rxq->dev->ifindex */
__u32 rx_queue_index; /* rxq->queue_index */
__u32 egress_ifindex; /* txq->dev->ifindex */
};
enum sk_action {
@ -3645,6 +3732,8 @@ struct sk_msg_md {
__u32 remote_port; /* Stored in network byte order */
__u32 local_port; /* stored in host byte order */
__u32 size; /* Total size of sk_msg */
__bpf_md_ptr(struct bpf_sock *, sk); /* current socket */
};
struct sk_reuseport_md {
@ -3751,6 +3840,10 @@ struct bpf_link_info {
__u64 cgroup_id;
__u32 attach_type;
} cgroup;
struct {
__u32 netns_ino;
__u32 attach_type;
} netns;
};
} __attribute__((aligned(8)));
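
Each ring buffer record starts with a `BPF_RINGBUF_HDR_SZ` (8-byte) header whose 32-bit length word also carries `BPF_RINGBUF_BUSY_BIT` (record still being written) and `BPF_RINGBUF_DISCARD_BIT` (record dropped). The libbpf consumer's `roundup_len()` strips those two bits, adds the header and rounds up to 8 bytes to find the next record. A sketch of that arithmetic, with the constants copied locally under hypothetical `DEMO_` names so it builds without `linux/bpf.h`:

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the BPF ring buffer constants from the enum above. */
#define DEMO_RINGBUF_BUSY_BIT    (1U << 31)
#define DEMO_RINGBUF_DISCARD_BIT (1U << 30)
#define DEMO_RINGBUF_HDR_SZ      8

static_assert(DEMO_RINGBUF_HDR_SZ % 8 == 0, "header keeps 8-byte alignment");

/* Bytes a record occupies in the ring: payload + header, 8-byte aligned. */
static uint32_t record_span(uint32_t len_word)
{
	uint32_t len = len_word & ~(DEMO_RINGBUF_BUSY_BIT | DEMO_RINGBUF_DISCARD_BIT);

	len += DEMO_RINGBUF_HDR_SZ;
	return (len + 7) / 8 * 8;
}
```

For example, a 13-byte sample occupies 8 + 13 = 21 bytes, rounded up to 24, whether or not its discard bit is set.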

@ -1,3 +1,3 @@
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
btf_dump.o
btf_dump.o ringbuf.o

@ -151,7 +151,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN_SHARED) | \
sed 's/\[.*\]//' | \
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \
sort -u | wc -l)
VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \
grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
CMD_TARGETS = $(LIB_TARGET) $(PC_FILE)
@ -218,7 +218,7 @@ check_abi: $(OUTPUT)libbpf.so
sed 's/\[.*\]//' | \
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \
sort -u > $(OUTPUT)libbpf_global_syms.tmp; \
readelf -s --wide $(OUTPUT)libbpf.so | \
readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \
grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \
sort -u > $(OUTPUT)libbpf_versioned_syms.tmp; \
diff -u $(OUTPUT)libbpf_global_syms.tmp \
@ -264,7 +264,7 @@ install_pkgconfig: $(PC_FILE)
$(call QUIET_INSTALL, $(PC_FILE)) \
$(call do_install,$(PC_FILE),$(libdir_SQ)/pkgconfig,644)
install: install_lib install_pkgconfig
install: install_lib install_pkgconfig install_headers
### Cleaning rules

@ -6657,6 +6657,8 @@ static const struct bpf_sec_def section_defs[] = {
.expected_attach_type = BPF_TRACE_ITER,
.is_attach_btf = true,
.attach_fn = attach_iter),
BPF_EAPROG_SEC("xdp_devmap", BPF_PROG_TYPE_XDP,
BPF_XDP_DEVMAP),
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
@ -7894,8 +7896,9 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
return bpf_program__attach_iter(prog, NULL);
}
struct bpf_link *
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
static struct bpf_link *
bpf_program__attach_fd(struct bpf_program *prog, int target_fd,
const char *target_name)
{
enum bpf_attach_type attach_type;
char errmsg[STRERR_BUFSIZE];
@ -7915,12 +7918,12 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
link->detach = &bpf_link__detach_fd;
attach_type = bpf_program__get_expected_attach_type(prog);
link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
if (link_fd < 0) {
link_fd = -errno;
free(link);
pr_warn("program '%s': failed to attach to cgroup: %s\n",
bpf_program__title(prog, false),
pr_warn("program '%s': failed to attach to %s: %s\n",
bpf_program__title(prog, false), target_name,
libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
return ERR_PTR(link_fd);
}
@ -7928,6 +7931,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
return link;
}
struct bpf_link *
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
{
return bpf_program__attach_fd(prog, cgroup_fd, "cgroup");
}
struct bpf_link *
bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
{
return bpf_program__attach_fd(prog, netns_fd, "netns");
}
struct bpf_link *
bpf_program__attach_iter(struct bpf_program *prog,
const struct bpf_iter_attach_opts *opts)
@ -8137,9 +8152,12 @@ void perf_buffer__free(struct perf_buffer *pb)
if (!pb)
return;
if (pb->cpu_bufs) {
for (i = 0; i < pb->cpu_cnt && pb->cpu_bufs[i]; i++) {
for (i = 0; i < pb->cpu_cnt; i++) {
struct perf_cpu_buf *cpu_buf = pb->cpu_bufs[i];
if (!cpu_buf)
continue;
bpf_map_delete_elem(pb->map_fd, &cpu_buf->map_key);
perf_buffer__free_cpu_buf(pb, cpu_buf);
}
@ -8456,6 +8474,25 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
return cnt < 0 ? -errno : cnt;
}
int perf_buffer__consume(struct perf_buffer *pb)
{
int i, err;
for (i = 0; i < pb->cpu_cnt; i++) {
struct perf_cpu_buf *cpu_buf = pb->cpu_bufs[i];
if (!cpu_buf)
continue;
err = perf_buffer__process_records(pb, cpu_buf);
if (err) {
pr_warn("error while processing records: %d\n", err);
return err;
}
}
return 0;
}
struct bpf_prog_info_array_desc {
int array_offset; /* e.g. offset of jited_prog_insns */
int count_offset; /* e.g. offset of jited_prog_len */
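
The `perf_buffer__free()` fix changes the loop from stopping at the first NULL `cpu_bufs[i]` to visiting every slot and skipping empty ones, and the new `perf_buffer__consume()` uses the same shape. This matters when the per-CPU array is sparse (for example, if buffers were only set up for some CPUs): the old loop would silently leave the later buffers untouched. A sketch contrasting the two loop shapes, with hypothetical counting functions standing in for the free/consume work:

```c
#include <assert.h>
#include <stddef.h>

/* Old loop shape: stops at the first empty slot. */
static int count_stop_at_null(void *const *bufs, int cnt)
{
	int i, n = 0;

	assert(cnt >= 0);
	for (i = 0; i < cnt && bufs[i]; i++)
		n++;
	return n;
}

/* New loop shape: visits every slot and skips empty ones. */
static int count_skip_null(void *const *bufs, int cnt)
{
	int i, n = 0;

	assert(cnt >= 0);
	for (i = 0; i < cnt; i++) {
		if (!bufs[i])
			continue;
		n++;
	}
	return n;
}
```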

@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_lsm(struct bpf_program *prog);
LIBBPF_API struct bpf_link *
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
struct bpf_map;
@ -478,6 +480,27 @@ LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
size_t info_size, __u32 flags);
/* Ring buffer APIs */
struct ring_buffer;
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
struct ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define ring_buffer_opts__last_field sz
LIBBPF_API struct ring_buffer *
ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
const struct ring_buffer_opts *opts);
LIBBPF_API void ring_buffer__free(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
/* Perf buffer APIs */
struct perf_buffer;
typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu,
@ -533,6 +556,7 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
typedef enum bpf_perf_event_ret
(*bpf_perf_event_print_t)(struct perf_event_header *hdr,

@ -262,4 +262,11 @@ LIBBPF_0.0.9 {
bpf_link_get_fd_by_id;
bpf_link_get_next_id;
bpf_program__attach_iter;
bpf_program__attach_netns;
perf_buffer__consume;
ring_buffer__add;
ring_buffer__consume;
ring_buffer__free;
ring_buffer__new;
ring_buffer__poll;
} LIBBPF_0.0.8;

@ -238,6 +238,11 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
if (btf_fd < 0)
return false;
break;
case BPF_MAP_TYPE_RINGBUF:
key_size = 0;
value_size = 0;
max_entries = 4096;
break;
case BPF_MAP_TYPE_UNSPEC:
case BPF_MAP_TYPE_HASH:
case BPF_MAP_TYPE_ARRAY:
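
For `BPF_MAP_TYPE_RINGBUF`, `key_size` and `value_size` must be 0 and `max_entries` is the data area size in bytes; the kernel requires it to be a power of two and a multiple of the page size, which is why the probe uses 4096 and why libbpf can compute `r->mask = info.max_entries - 1`. A sketch of both checks, assuming the page size is passed in rather than queried:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A valid ring size is a non-zero power of two and a multiple of the page size. */
static bool ringbuf_size_ok(size_t sz, size_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return sz != 0 && (sz & (sz - 1)) == 0 && sz % page_size == 0;
}

/* With a power-of-two size, "pos & (sz - 1)" equals "pos % sz", so the
 * ever-increasing producer/consumer positions map to ring offsets cheaply. */
static size_t ring_offset(unsigned long pos, size_t sz)
{
	return pos & (sz - 1);
}
```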

tools/lib/bpf/ringbuf.c

@ -0,0 +1,288 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/*
* Ring buffer operations.
*
* Copyright (C) 2020 Facebook, Inc.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <linux/err.h>
#include <linux/bpf.h>
#include <asm/barrier.h>
#include <sys/mman.h>
#include <sys/epoll.h>
#include <tools/libc_compat.h>
#include "libbpf.h"
#include "libbpf_internal.h"
#include "bpf.h"
/* make sure libbpf doesn't use kernel-only integer typedefs */
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
struct ring {
ring_buffer_sample_fn sample_cb;
void *ctx;
void *data;
unsigned long *consumer_pos;
unsigned long *producer_pos;
unsigned long mask;
int map_fd;
};
struct ring_buffer {
struct epoll_event *events;
struct ring *rings;
size_t page_size;
int epoll_fd;
int ring_cnt;
};
static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r)
{
if (r->consumer_pos) {
munmap(r->consumer_pos, rb->page_size);
r->consumer_pos = NULL;
}
if (r->producer_pos) {
munmap(r->producer_pos, rb->page_size + 2 * (r->mask + 1));
r->producer_pos = NULL;
}
}
/* Add extra RINGBUF maps to this ring buffer manager */
int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx)
{
struct bpf_map_info info;
__u32 len = sizeof(info);
struct epoll_event *e;
struct ring *r;
void *tmp;
int err;
memset(&info, 0, sizeof(info));
err = bpf_obj_get_info_by_fd(map_fd, &info, &len);
if (err) {
err = -errno;
pr_warn("ringbuf: failed to get map info for fd=%d: %d\n",
map_fd, err);
return err;
}
if (info.type != BPF_MAP_TYPE_RINGBUF) {
pr_warn("ringbuf: map fd=%d is not BPF_MAP_TYPE_RINGBUF\n",
map_fd);
return -EINVAL;
}
tmp = reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings));
if (!tmp)
return -ENOMEM;
rb->rings = tmp;
tmp = reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events));
if (!tmp)
return -ENOMEM;
rb->events = tmp;
r = &rb->rings[rb->ring_cnt];
memset(r, 0, sizeof(*r));
r->map_fd = map_fd;
r->sample_cb = sample_cb;
r->ctx = ctx;
r->mask = info.max_entries - 1;
/* Map writable consumer page */
tmp = mmap(NULL, rb->page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
map_fd, 0);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
map_fd, err);
return err;
}
r->consumer_pos = tmp;
/* Map read-only producer page and data pages. We map twice as big
* data size to allow simple reading of samples that wrap around the
* end of a ring buffer. See kernel implementation for details.
*/
tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
MAP_SHARED, map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
err = -errno;
ringbuf_unmap_ring(rb, r);
pr_warn("ringbuf: failed to mmap data pages for map fd=%d: %d\n",
map_fd, err);
return err;
}
r->producer_pos = tmp;
r->data = tmp + rb->page_size;
e = &rb->events[rb->ring_cnt];
memset(e, 0, sizeof(*e));
e->events = EPOLLIN;
e->data.fd = rb->ring_cnt;
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, e) < 0) {
err = -errno;
ringbuf_unmap_ring(rb, r);
pr_warn("ringbuf: failed to epoll add map fd=%d: %d\n",
map_fd, err);
return err;
}
rb->ring_cnt++;
return 0;
}
void ring_buffer__free(struct ring_buffer *rb)
{
int i;
if (!rb)
return;
for (i = 0; i < rb->ring_cnt; ++i)
ringbuf_unmap_ring(rb, &rb->rings[i]);
if (rb->epoll_fd >= 0)
close(rb->epoll_fd);
free(rb->events);
free(rb->rings);
free(rb);
}
struct ring_buffer *
ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
const struct ring_buffer_opts *opts)
{
struct ring_buffer *rb;
int err;
if (!OPTS_VALID(opts, ring_buffer_opts))
return NULL;
rb = calloc(1, sizeof(*rb));
if (!rb)
return NULL;
rb->page_size = getpagesize();
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (rb->epoll_fd < 0) {
err = -errno;
pr_warn("ringbuf: failed to create epoll instance: %d\n", err);
goto err_out;
}
err = ring_buffer__add(rb, map_fd, sample_cb, ctx);
if (err)
goto err_out;
return rb;
err_out:
ring_buffer__free(rb);
return NULL;
}
static inline int roundup_len(__u32 len)
{
/* clear out top 2 bits (discard and busy, if set) */
len <<= 2;
len >>= 2;
/* add length prefix */
len += BPF_RINGBUF_HDR_SZ;
/* round up to 8 byte alignment */
return (len + 7) / 8 * 8;
}
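roundup_len() turns the 32-bit length word at the start of each record into the number of bytes the consumer must skip. It masks off the busy and discard flag bits, adds the 8-byte header, and rounds up to 8-byte alignment. The standalone version below reproduces that arithmetic. It assumes the flag and header values the benchmark defines later in this diff: busy is bit 31, discard is bit 30, and the header is 8 bytes. `rb_record_span` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to match BPF_RINGBUF_BUSY_BIT, BPF_RINGBUF_DISCARD_BIT
 * and BPF_RINGBUF_HDR_SZ. */
#define RB_BUSY_BIT    (1U << 31)
#define RB_DISCARD_BIT (1U << 30)
#define RB_HDR_SZ      8U

/* Bytes a record occupies in the ring: header + payload, 8-byte aligned. */
static uint32_t rb_record_span(uint32_t len_word)
{
	uint32_t len = len_word & ~(RB_BUSY_BIT | RB_DISCARD_BIT);

	/* Same result as the shift-left/shift-right trick in roundup_len() */
	assert(len == ((len_word << 2) >> 2));
	return (len + RB_HDR_SZ + 7) / 8 * 8;
}
```

An 8-byte payload therefore occupies 16 bytes. That matches the benchmark comment "record data + header take 16 bytes", assuming its records carry an 8-byte payload.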
static int ringbuf_process_ring(struct ring *r)
{
int *len_ptr, len, err, cnt = 0;
unsigned long cons_pos, prod_pos;
bool got_new_data;
void *sample;
cons_pos = smp_load_acquire(r->consumer_pos);
do {
got_new_data = false;
prod_pos = smp_load_acquire(r->producer_pos);
while (cons_pos < prod_pos) {
len_ptr = r->data + (cons_pos & r->mask);
len = smp_load_acquire(len_ptr);
/* sample not committed yet, bail out for now */
if (len & BPF_RINGBUF_BUSY_BIT)
goto done;
got_new_data = true;
cons_pos += roundup_len(len);
if ((len & BPF_RINGBUF_DISCARD_BIT) == 0) {
sample = (void *)len_ptr + BPF_RINGBUF_HDR_SZ;
err = r->sample_cb(r->ctx, sample, len);
if (err) {
/* update consumer pos and bail out */
smp_store_release(r->consumer_pos,
cons_pos);
return err;
}
cnt++;
}
smp_store_release(r->consumer_pos, cons_pos);
}
} while (got_new_data);
done:
return cnt;
}
/* Consume available ring buffer(s) data without event polling.
* Returns number of records consumed across all registered ring buffers, or
* negative number if any of the callbacks return error.
*/
int ring_buffer__consume(struct ring_buffer *rb)
{
int i, err, res = 0;
for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = &rb->rings[i];
err = ringbuf_process_ring(ring);
if (err < 0)
return err;
res += err;
}
return res;
}
/* Poll for available data and consume records, if any are available.
* Returns number of records consumed, or a negative error code if
* epoll_wait() fails or any of the registered callbacks returns an error.
*/
int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
{
int i, cnt, err, res = 0;
cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
for (i = 0; i < cnt; i++) {
__u32 ring_id = rb->events[i].data.fd;
struct ring *ring = &rb->rings[ring_id];
err = ringbuf_process_ring(ring);
if (err < 0)
return err;
res += err;
}
return cnt < 0 ? -errno : res;
}
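ringbuf_process_ring() is the core of the consumer. It reads the producer position and walks records starting at the consumer position. It stops at the first record whose busy bit is still set, skips discarded records, and publishes the new consumer position. The sketch below replays that loop against a plain in-memory buffer so it can run without a kernel. It is a single-threaded model with these assumptions:

- plain loads and stores stand in for smp_load_acquire()/smp_store_release();
- the data area is allocated twice over and every write is mirrored into both copies, to emulate the double mmap of the data pages;
- `struct sim_ring`, `sim_produce()`, `sim_consume()` and `sum_lens()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RB_BUSY_BIT    (1U << 31)
#define RB_DISCARD_BIT (1U << 30)
#define RB_HDR_SZ      8U
#define RB_SIZE        64U            /* data area size, power of two */
#define RB_MASK        (RB_SIZE - 1)

struct sim_ring {
	uint64_t consumer_pos;            /* free-running, never wrapped */
	uint64_t producer_pos;
	uint8_t data[2 * RB_SIZE];        /* each byte stored twice */
};

typedef int (*sim_sample_fn)(void *ctx, const void *data, uint32_t len);

static uint32_t rb_span(uint32_t len)
{
	return (len + RB_HDR_SZ + 7) / 8 * 8;
}

/* Write n bytes at a ring position, mirroring them into the second copy. */
static void sim_write(struct sim_ring *r, uint64_t pos, const void *src, uint32_t n)
{
	const uint8_t *p = src;

	for (uint32_t i = 0; i < n; i++) {
		uint64_t off = (pos + i) & RB_MASK;

		r->data[off] = p[i];
		r->data[off + RB_SIZE] = p[i];
	}
}

/* Producer model: header word, payload, then publish producer_pos. */
static int sim_produce(struct sim_ring *r, const void *buf, uint32_t len, int discard)
{
	uint32_t hdr = len | (discard ? RB_DISCARD_BIT : 0);
	uint32_t span = rb_span(len);

	assert(len < RB_DISCARD_BIT);     /* length must not collide with flags */
	if (r->producer_pos - r->consumer_pos + span > RB_SIZE)
		return -1;                    /* ring full */
	sim_write(r, r->producer_pos, &hdr, sizeof(hdr));
	sim_write(r, r->producer_pos + RB_HDR_SZ, buf, len);
	r->producer_pos += span;
	return 0;
}

/* Consumer model of ringbuf_process_ring(): returns records delivered,
 * or the first non-zero callback result. */
static int sim_consume(struct sim_ring *r, sim_sample_fn cb, void *ctx)
{
	int cnt = 0;

	while (r->consumer_pos < r->producer_pos) {
		const uint8_t *rec = r->data + (r->consumer_pos & RB_MASK);
		uint32_t hdr, len;
		int err;

		memcpy(&hdr, rec, sizeof(hdr));
		if (hdr & RB_BUSY_BIT)
			break;                    /* not committed yet */
		len = hdr & ~(RB_BUSY_BIT | RB_DISCARD_BIT);
		r->consumer_pos += rb_span(len);
		if (hdr & RB_DISCARD_BIT)
			continue;                 /* space reclaimed, no callback */
		err = cb(ctx, rec + RB_HDR_SZ, len);
		if (err)
			return err;
		cnt++;
	}
	return cnt;
}

/* Example callback: accumulate payload bytes into *(uint32_t *)ctx. */
static int sum_lens(void *ctx, const void *data, uint32_t len)
{
	(void)data;
	*(uint32_t *)ctx += len;
	return 0;
}
```

Because of the mirrored copy, a record that starts near the end of the data area can still be handed to the callback as one contiguous pointer. The double mapping gives the real consumer the same property.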

@ -413,12 +413,15 @@ $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h
$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
$(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
$(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
$(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
$(OUTPUT)/perfbuf_bench.skel.h
$(OUTPUT)/bench.o: bench.h testing_helpers.h
$(OUTPUT)/bench: LDLIBS += -lm
$(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
$(OUTPUT)/bench_count.o \
$(OUTPUT)/bench_rename.o \
$(OUTPUT)/bench_trigger.o
$(OUTPUT)/bench_trigger.o \
$(OUTPUT)/bench_ringbufs.o
$(call msg,BINARY,,$@)
$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)

@ -130,6 +130,13 @@ static const struct argp_option opts[] = {
{},
};
extern struct argp bench_ringbufs_argp;
static const struct argp_child bench_parsers[] = {
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
{},
};
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
static int pos_args;
@ -208,6 +215,7 @@ static void parse_cmdline_args(int argc, char **argv)
.options = opts,
.parser = parse_arg,
.doc = argp_program_doc,
.children = bench_parsers,
};
if (argp_parse(&argp, argc, argv, 0, NULL, NULL))
exit(1);
@ -310,6 +318,10 @@ extern const struct bench bench_trig_rawtp;
extern const struct bench bench_trig_kprobe;
extern const struct bench bench_trig_fentry;
extern const struct bench bench_trig_fmodret;
extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom;
extern const struct bench bench_pb_libbpf;
extern const struct bench bench_pb_custom;
static const struct bench *benchs[] = {
&bench_count_global,
@ -327,6 +339,10 @@ static const struct bench *benchs[] = {
&bench_trig_kprobe,
&bench_trig_fentry,
&bench_trig_fmodret,
&bench_rb_libbpf,
&bench_rb_custom,
&bench_pb_libbpf,
&bench_pb_custom,
};
static void setup_benchmark()

@ -0,0 +1,566 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <asm/barrier.h>
#include <linux/perf_event.h>
#include <linux/ring_buffer.h>
#include <sys/epoll.h>
#include <sys/mman.h>
#include <argp.h>
#include <stdlib.h>
#include "bench.h"
#include "ringbuf_bench.skel.h"
#include "perfbuf_bench.skel.h"
static struct {
bool back2back;
int batch_cnt;
bool sampled;
int sample_rate;
int ringbuf_sz; /* per-ringbuf, in bytes */
bool ringbuf_use_output; /* use slower output API */
int perfbuf_sz; /* per-CPU size, in pages */
} args = {
.back2back = false,
.batch_cnt = 500,
.sampled = false,
.sample_rate = 500,
.ringbuf_sz = 512 * 1024,
.ringbuf_use_output = false,
.perfbuf_sz = 128,
};
enum {
ARG_RB_BACK2BACK = 2000,
ARG_RB_USE_OUTPUT = 2001,
ARG_RB_BATCH_CNT = 2002,
ARG_RB_SAMPLED = 2003,
ARG_RB_SAMPLE_RATE = 2004,
};
static const struct argp_option opts[] = {
{ "rb-b2b", ARG_RB_BACK2BACK, NULL, 0, "Back-to-back mode"},
{ "rb-use-output", ARG_RB_USE_OUTPUT, NULL, 0, "Use bpf_ringbuf_output() instead of bpf_ringbuf_reserve()"},
{ "rb-batch-cnt", ARG_RB_BATCH_CNT, "CNT", 0, "Set BPF-side record batch count"},
{ "rb-sampled", ARG_RB_SAMPLED, NULL, 0, "Notification sampling"},
{ "rb-sample-rate", ARG_RB_SAMPLE_RATE, "RATE", 0, "Notification sample rate"},
{},
};
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
switch (key) {
case ARG_RB_BACK2BACK:
args.back2back = true;
break;
case ARG_RB_USE_OUTPUT:
args.ringbuf_use_output = true;
break;
case ARG_RB_BATCH_CNT:
args.batch_cnt = strtol(arg, NULL, 10);
if (args.batch_cnt < 0) {
fprintf(stderr, "Invalid batch count.\n");
argp_usage(state);
}
break;
case ARG_RB_SAMPLED:
args.sampled = true;
break;
case ARG_RB_SAMPLE_RATE:
args.sample_rate = strtol(arg, NULL, 10);
if (args.sample_rate < 0) {
fprintf(stderr, "Invalid sample rate.\n");
argp_usage(state);
}
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
/* exported into benchmark runner */
const struct argp bench_ringbufs_argp = {
.options = opts,
.parser = parse_arg,
};
/* RINGBUF-LIBBPF benchmark */
static struct counter buf_hits;
static inline void bufs_trigger_batch()
{
(void)syscall(__NR_getpgid);
}
static void bufs_validate()
{
if (env.consumer_cnt != 1) {
fprintf(stderr, "benchmark doesn't support multi-consumer!\n");
exit(1);
}
if (args.back2back && env.producer_cnt > 1) {
fprintf(stderr, "back-to-back mode makes sense only for single-producer case!\n");
exit(1);
}
}
static void *bufs_sample_producer(void *input)
{
if (args.back2back) {
/* initial batch to get everything started */
bufs_trigger_batch();
return NULL;
}
while (true)
bufs_trigger_batch();
return NULL;
}
static struct ringbuf_libbpf_ctx {
struct ringbuf_bench *skel;
struct ring_buffer *ringbuf;
} ringbuf_libbpf_ctx;
static void ringbuf_libbpf_measure(struct bench_res *res)
{
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
res->hits = atomic_swap(&buf_hits.value, 0);
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
static struct ringbuf_bench *ringbuf_setup_skeleton()
{
struct ringbuf_bench *skel;
setup_libbpf();
skel = ringbuf_bench__open();
if (!skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
skel->rodata->batch_cnt = args.batch_cnt;
skel->rodata->use_output = args.ringbuf_use_output ? 1 : 0;
if (args.sampled)
/* record data + header take 16 bytes */
skel->rodata->wakeup_data_size = args.sample_rate * 16;
bpf_map__resize(skel->maps.ringbuf, args.ringbuf_sz);
if (ringbuf_bench__load(skel)) {
fprintf(stderr, "failed to load skeleton\n");
exit(1);
}
return skel;
}
static int buf_process_sample(void *ctx, void *data, size_t len)
{
atomic_inc(&buf_hits.value);
return 0;
}
static void ringbuf_libbpf_setup()
{
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
struct bpf_link *link;
ctx->skel = ringbuf_setup_skeleton();
ctx->ringbuf = ring_buffer__new(bpf_map__fd(ctx->skel->maps.ringbuf),
buf_process_sample, NULL, NULL);
if (!ctx->ringbuf) {
fprintf(stderr, "failed to create ringbuf\n");
exit(1);
}
link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
if (IS_ERR(link)) {
fprintf(stderr, "failed to attach program!\n");
exit(1);
}
}
static void *ringbuf_libbpf_consumer(void *input)
{
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
while (ring_buffer__poll(ctx->ringbuf, -1) >= 0) {
if (args.back2back)
bufs_trigger_batch();
}
fprintf(stderr, "ringbuf polling failed!\n");
return NULL;
}
/* RINGBUF-CUSTOM benchmark */
struct ringbuf_custom {
__u64 *consumer_pos;
__u64 *producer_pos;
__u64 mask;
void *data;
int map_fd;
};
static struct ringbuf_custom_ctx {
struct ringbuf_bench *skel;
struct ringbuf_custom ringbuf;
int epoll_fd;
struct epoll_event event;
} ringbuf_custom_ctx;
static void ringbuf_custom_measure(struct bench_res *res)
{
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
res->hits = atomic_swap(&buf_hits.value, 0);
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
static void ringbuf_custom_setup()
{
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
const size_t page_size = getpagesize();
struct bpf_link *link;
struct ringbuf_custom *r;
void *tmp;
int err;
ctx->skel = ringbuf_setup_skeleton();
ctx->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (ctx->epoll_fd < 0) {
fprintf(stderr, "failed to create epoll fd: %d\n", -errno);
exit(1);
}
r = &ctx->ringbuf;
r->map_fd = bpf_map__fd(ctx->skel->maps.ringbuf);
r->mask = args.ringbuf_sz - 1;
/* Map writable consumer page */
tmp = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
r->map_fd, 0);
if (tmp == MAP_FAILED) {
fprintf(stderr, "failed to mmap consumer page: %d\n", -errno);
exit(1);
}
r->consumer_pos = tmp;
/* Map read-only producer page and data pages. */
tmp = mmap(NULL, page_size + 2 * args.ringbuf_sz, PROT_READ, MAP_SHARED,
r->map_fd, page_size);
if (tmp == MAP_FAILED) {
fprintf(stderr, "failed to mmap data pages: %d\n", -errno);
exit(1);
}
r->producer_pos = tmp;
r->data = tmp + page_size;
ctx->event.events = EPOLLIN;
err = epoll_ctl(ctx->epoll_fd, EPOLL_CTL_ADD, r->map_fd, &ctx->event);
if (err < 0) {
fprintf(stderr, "failed to epoll add ringbuf: %d\n", -errno);
exit(1);
}
link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
if (IS_ERR(link)) {
fprintf(stderr, "failed to attach program\n");
exit(1);
}
}
#define RINGBUF_BUSY_BIT (1 << 31)
#define RINGBUF_DISCARD_BIT (1 << 30)
#define RINGBUF_META_LEN 8
static inline int roundup_len(__u32 len)
{
/* clear out top 2 bits */
len <<= 2;
len >>= 2;
/* add length prefix */
len += RINGBUF_META_LEN;
/* round up to 8 byte alignment */
return (len + 7) / 8 * 8;
}
static void ringbuf_custom_process_ring(struct ringbuf_custom *r)
{
unsigned long cons_pos, prod_pos;
int *len_ptr, len;
bool got_new_data;
cons_pos = smp_load_acquire(r->consumer_pos);
while (true) {
got_new_data = false;
prod_pos = smp_load_acquire(r->producer_pos);
while (cons_pos < prod_pos) {
len_ptr = r->data + (cons_pos & r->mask);
len = smp_load_acquire(len_ptr);
/* sample not committed yet, bail out for now */
if (len & RINGBUF_BUSY_BIT)
return;
got_new_data = true;
cons_pos += roundup_len(len);
atomic_inc(&buf_hits.value);
}
if (got_new_data)
smp_store_release(r->consumer_pos, cons_pos);
else
break;
}
}
static void *ringbuf_custom_consumer(void *input)
{
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
int cnt;
do {
if (args.back2back)
bufs_trigger_batch();
cnt = epoll_wait(ctx->epoll_fd, &ctx->event, 1, -1);
if (cnt > 0)
ringbuf_custom_process_ring(&ctx->ringbuf);
} while (cnt >= 0);
fprintf(stderr, "ringbuf polling failed!\n");
return NULL;
}
/* PERFBUF-LIBBPF benchmark */
static struct perfbuf_libbpf_ctx {
struct perfbuf_bench *skel;
struct perf_buffer *perfbuf;
} perfbuf_libbpf_ctx;
static void perfbuf_measure(struct bench_res *res)
{
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
res->hits = atomic_swap(&buf_hits.value, 0);
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
static struct perfbuf_bench *perfbuf_setup_skeleton()
{
struct perfbuf_bench *skel;
setup_libbpf();
skel = perfbuf_bench__open();
if (!skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
skel->rodata->batch_cnt = args.batch_cnt;
if (perfbuf_bench__load(skel)) {
fprintf(stderr, "failed to load skeleton\n");
exit(1);
}
return skel;
}
static enum bpf_perf_event_ret
perfbuf_process_sample_raw(void *input_ctx, int cpu,
struct perf_event_header *e)
{
switch (e->type) {
case PERF_RECORD_SAMPLE:
atomic_inc(&buf_hits.value);
break;
case PERF_RECORD_LOST:
break;
default:
return LIBBPF_PERF_EVENT_ERROR;
}
return LIBBPF_PERF_EVENT_CONT;
}
static void perfbuf_libbpf_setup()
{
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
struct perf_event_attr attr;
struct perf_buffer_raw_opts pb_opts = {
.event_cb = perfbuf_process_sample_raw,
.ctx = (void *)(long)0,
.attr = &attr,
};
struct bpf_link *link;
ctx->skel = perfbuf_setup_skeleton();
memset(&attr, 0, sizeof(attr));
attr.config = PERF_COUNT_SW_BPF_OUTPUT;
attr.type = PERF_TYPE_SOFTWARE;
attr.sample_type = PERF_SAMPLE_RAW;
/* notify only every Nth sample */
if (args.sampled) {
attr.sample_period = args.sample_rate;
attr.wakeup_events = args.sample_rate;
} else {
attr.sample_period = 1;
attr.wakeup_events = 1;
}
if (args.sample_rate > args.batch_cnt) {
fprintf(stderr, "sample rate %d is too high for given batch count %d\n",
args.sample_rate, args.batch_cnt);
exit(1);
}
ctx->perfbuf = perf_buffer__new_raw(bpf_map__fd(ctx->skel->maps.perfbuf),
args.perfbuf_sz, &pb_opts);
if (!ctx->perfbuf) {
fprintf(stderr, "failed to create perfbuf\n");
exit(1);
}
link = bpf_program__attach(ctx->skel->progs.bench_perfbuf);
if (IS_ERR(link)) {
fprintf(stderr, "failed to attach program\n");
exit(1);
}
}
static void *perfbuf_libbpf_consumer(void *input)
{
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
while (perf_buffer__poll(ctx->perfbuf, -1) >= 0) {
if (args.back2back)
bufs_trigger_batch();
}
fprintf(stderr, "perfbuf polling failed!\n");
return NULL;
}
/* PERFBUF-CUSTOM benchmark */
/* copies of internal libbpf definitions */
struct perf_cpu_buf {
struct perf_buffer *pb;
void *base; /* mmap()'ed memory */
void *buf; /* for reconstructing segmented data */
size_t buf_size;
int fd;
int cpu;
int map_key;
};
struct perf_buffer {
perf_buffer_event_fn event_cb;
perf_buffer_sample_fn sample_cb;
perf_buffer_lost_fn lost_cb;
void *ctx; /* passed into callbacks */
size_t page_size;
size_t mmap_size;
struct perf_cpu_buf **cpu_bufs;
struct epoll_event *events;
int cpu_cnt; /* number of allocated CPU buffers */
int epoll_fd; /* perf event FD */
int map_fd; /* BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map FD */
};
static void *perfbuf_custom_consumer(void *input)
{
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
struct perf_buffer *pb = ctx->perfbuf;
struct perf_cpu_buf *cpu_buf;
struct perf_event_mmap_page *header;
size_t mmap_mask = pb->mmap_size - 1;
struct perf_event_header *ehdr;
__u64 data_head, data_tail;
size_t ehdr_size;
void *base;
int i, cnt;
while (true) {
if (args.back2back)
bufs_trigger_batch();
cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, -1);
if (cnt <= 0) {
fprintf(stderr, "perf epoll failed: %d\n", -errno);
exit(1);
}
for (i = 0; i < cnt; ++i) {
cpu_buf = pb->events[i].data.ptr;
header = cpu_buf->base;
base = ((void *)header) + pb->page_size;
data_head = ring_buffer_read_head(header);
data_tail = header->data_tail;
while (data_head != data_tail) {
ehdr = base + (data_tail & mmap_mask);
ehdr_size = ehdr->size;
if (ehdr->type == PERF_RECORD_SAMPLE)
atomic_inc(&buf_hits.value);
data_tail += ehdr_size;
}
ring_buffer_write_tail(header, data_tail);
}
}
return NULL;
}
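perfbuf_custom_consumer() follows the classic perf ring protocol instead. The producer advances data_head only after a record is complete, so records carry no busy bit. The reader walks variable-size records from data_tail to data_head using each header's size field, then writes data_tail back to release the space. The sketch below is a minimal in-memory model of that walk. Assumptions:

- `struct sim_perf_hdr` is a hypothetical stand-in laid out like `struct perf_event_header` (u32 type, u16 misc, u16 size);
- the constants copy the values of PERF_RECORD_LOST and PERF_RECORD_SAMPLE;
- plain loads and stores stand in for ring_buffer_read_head()/ring_buffer_write_tail();
- payload bytes are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_RECORD_LOST    2   /* PERF_RECORD_LOST */
#define SIM_RECORD_SAMPLE  9   /* PERF_RECORD_SAMPLE */
#define SIM_BUF_SIZE       64U /* power of two */

struct sim_perf_hdr {          /* same layout as struct perf_event_header */
	uint32_t type;
	uint16_t misc;
	uint16_t size;             /* whole record, header included */
};

struct sim_perf_buf {
	uint64_t data_head;        /* advanced by the producer */
	uint64_t data_tail;        /* advanced by the consumer */
	uint8_t data[SIM_BUF_SIZE];
};

/* Append a record header of total size `size` (multiple of 8, >= header). */
static void sim_perf_emit(struct sim_perf_buf *b, uint32_t type, uint16_t size)
{
	struct sim_perf_hdr h = { .type = type, .misc = 0, .size = size };

	assert(size % 8 == 0 && size >= sizeof(h));
	assert(b->data_head - b->data_tail + size <= SIM_BUF_SIZE);
	memcpy(b->data + (b->data_head & (SIM_BUF_SIZE - 1)), &h, sizeof(h));
	b->data_head += size;      /* publish only after the record is written */
}

/* Walk data_tail..data_head, counting samples; returns samples seen. */
static int sim_perf_drain(struct sim_perf_buf *b)
{
	uint64_t head = b->data_head, tail = b->data_tail;
	int samples = 0;

	while (tail != head) {
		struct sim_perf_hdr h;

		memcpy(&h, b->data + (tail & (SIM_BUF_SIZE - 1)), sizeof(h));
		if (h.type == SIM_RECORD_SAMPLE)
			samples++;
		tail += h.size;
	}
	b->data_tail = tail;       /* hand the space back to the producer */
	return samples;
}
```

Unlike the BPF ring buffer model, no per-record commit state exists here. That is one reason this format needs one buffer per CPU rather than a single buffer shared by concurrent producers.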
const struct bench bench_rb_libbpf = {
.name = "rb-libbpf",
.validate = bufs_validate,
.setup = ringbuf_libbpf_setup,
.producer_thread = bufs_sample_producer,
.consumer_thread = ringbuf_libbpf_consumer,
.measure = ringbuf_libbpf_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_rb_custom = {
.name = "rb-custom",
.validate = bufs_validate,
.setup = ringbuf_custom_setup,
.producer_thread = bufs_sample_producer,
.consumer_thread = ringbuf_custom_consumer,
.measure = ringbuf_custom_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_pb_libbpf = {
.name = "pb-libbpf",
.validate = bufs_validate,
.setup = perfbuf_libbpf_setup,
.producer_thread = bufs_sample_producer,
.consumer_thread = perfbuf_libbpf_consumer,
.measure = perfbuf_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_pb_custom = {
.name = "pb-custom",
.validate = bufs_validate,
.setup = perfbuf_libbpf_setup,
.producer_thread = bufs_sample_producer,
.consumer_thread = perfbuf_custom_consumer,
.measure = perfbuf_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};

@ -0,0 +1,75 @@
#!/bin/bash
set -eufo pipefail
RUN_BENCH="sudo ./bench -w3 -d10 -a"
function hits()
{
echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
}
function drops()
{
echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
}
function header()
{
local len=${#1}
printf "\n%s\n" "$1"
for i in $(seq 1 $len); do printf '='; done
printf '\n'
}
function summarize()
{
bench="$1"
summary=$(echo $2 | tail -n1)
printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
}
header "Single-producer, parallel producer"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
summarize $b "$($RUN_BENCH $b)"
done
header "Single-producer, parallel producer, sampled notification"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
summarize $b "$($RUN_BENCH --rb-sampled $b)"
done
header "Single-producer, back-to-back mode"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
summarize $b "$($RUN_BENCH --rb-b2b $b)"
summarize $b-sampled "$($RUN_BENCH --rb-sampled --rb-b2b $b)"
done
header "Ringbuf back-to-back, effect of sample rate"
for b in 1 5 10 25 50 100 250 500 1000 2000 3000; do
summarize "rb-sampled-$b" "$($RUN_BENCH --rb-b2b --rb-batch-cnt $b --rb-sampled --rb-sample-rate $b rb-custom)"
done
header "Perfbuf back-to-back, effect of sample rate"
for b in 1 5 10 25 50 100 250 500 1000 2000 3000; do
summarize "pb-sampled-$b" "$($RUN_BENCH --rb-b2b --rb-batch-cnt $b --rb-sampled --rb-sample-rate $b pb-custom)"
done
header "Ringbuf back-to-back, reserve+commit vs output"
summarize "reserve" "$($RUN_BENCH --rb-b2b rb-custom)"
summarize "output" "$($RUN_BENCH --rb-b2b --rb-use-output rb-custom)"
header "Ringbuf sampled, reserve+commit vs output"
summarize "reserve-sampled" "$($RUN_BENCH --rb-sampled rb-custom)"
summarize "output-sampled" "$($RUN_BENCH --rb-sampled --rb-use-output rb-custom)"
header "Single-producer, consumer/producer competing on the same CPU, low batch count"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
summarize $b "$($RUN_BENCH --rb-batch-cnt 1 --rb-sample-rate 1 --prod-affinity 0 --cons-affinity 0 $b)"
done
header "Ringbuf, multi-producer contention"
for b in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
summarize "rb-libbpf nr_prod $b" "$($RUN_BENCH -p$b --rb-batch-cnt 50 rb-libbpf)"
done

@ -6,6 +6,8 @@
#include <linux/if_tun.h>
#include <sys/uio.h>
#include "bpf_flow.skel.h"
#ifndef IP_MF
#define IP_MF 0x2000
#endif
@ -101,6 +103,7 @@ struct test {
#define VLAN_HLEN 4
static __u32 duration;
struct test tests[] = {
{
.name = "ipv4",
@ -444,17 +447,130 @@ static int ifup(const char *ifname)
return 0;
}
static int init_prog_array(struct bpf_object *obj, struct bpf_map *prog_array)
{
int i, err, map_fd, prog_fd;
struct bpf_program *prog;
char prog_name[32];
map_fd = bpf_map__fd(prog_array);
if (map_fd < 0)
return -1;
for (i = 0; i < bpf_map__def(prog_array)->max_entries; i++) {
snprintf(prog_name, sizeof(prog_name), "flow_dissector/%i", i);
prog = bpf_object__find_program_by_title(obj, prog_name);
if (!prog)
return -1;
prog_fd = bpf_program__fd(prog);
if (prog_fd < 0)
return -1;
err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
if (err)
return -1;
}
return 0;
}
static void run_tests_skb_less(int tap_fd, struct bpf_map *keys)
{
int i, err, keys_fd;
keys_fd = bpf_map__fd(keys);
if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
return;
for (i = 0; i < ARRAY_SIZE(tests); i++) {
/* Keep in sync with 'flags' from eth_get_headlen. */
__u32 eth_get_headlen_flags =
BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
struct bpf_prog_test_run_attr tattr = {};
struct bpf_flow_keys flow_keys = {};
__u32 key = (__u32)(tests[i].keys.sport) << 16 |
tests[i].keys.dport;
/* For skb-less case we can't pass input flags; run
* only the tests that have a matching set of flags.
*/
if (tests[i].flags != eth_get_headlen_flags)
continue;
err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
err = bpf_map_delete_elem(keys_fd, &key);
CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
}
}
static void test_skb_less_prog_attach(struct bpf_flow *skel, int tap_fd)
{
int err, prog_fd;
prog_fd = bpf_program__fd(skel->progs._dissect);
if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
return;
err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno))
return;
run_tests_skb_less(tap_fd, skel->maps.last_dissection);
err = bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
CHECK(err, "bpf_prog_detach", "err %d errno %d\n", err, errno);
}
static void test_skb_less_link_create(struct bpf_flow *skel, int tap_fd)
{
struct bpf_link *link;
int err, net_fd;
net_fd = open("/proc/self/ns/net", O_RDONLY);
if (CHECK(net_fd < 0, "open(/proc/self/ns/net)", "err %d\n", errno))
return;
link = bpf_program__attach_netns(skel->progs._dissect, net_fd);
if (CHECK(IS_ERR(link), "attach_netns", "err %ld\n", PTR_ERR(link)))
goto out_close;
run_tests_skb_less(tap_fd, skel->maps.last_dissection);
err = bpf_link__destroy(link);
CHECK(err, "bpf_link__destroy", "err %d\n", err);
out_close:
close(net_fd);
}
void test_flow_dissector(void)
{
int i, err, prog_fd, keys_fd = -1, tap_fd;
struct bpf_object *obj;
__u32 duration = 0;
struct bpf_flow *skel;
err = bpf_flow_load(&obj, "./bpf_flow.o", "flow_dissector",
"jmp_table", "last_dissection", &prog_fd, &keys_fd);
if (CHECK_FAIL(err))
skel = bpf_flow__open_and_load();
if (CHECK(!skel, "skel", "failed to open/load skeleton\n"))
return;
prog_fd = bpf_program__fd(skel->progs._dissect);
if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
goto out_destroy_skel;
keys_fd = bpf_map__fd(skel->maps.last_dissection);
if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
goto out_destroy_skel;
err = init_prog_array(skel->obj, skel->maps.jmp_table);
if (CHECK(err, "init_prog_array", "err %d\n", err))
goto out_destroy_skel;
for (i = 0; i < ARRAY_SIZE(tests); i++) {
struct bpf_flow_keys flow_keys;
struct bpf_prog_test_run_attr tattr = {
@ -487,43 +603,17 @@ void test_flow_dissector(void)
* via BPF map in this case.
*/
err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno);
tap_fd = create_tap("tap0");
CHECK(tap_fd < 0, "create_tap", "tap_fd %d errno %d\n", tap_fd, errno);
err = ifup("tap0");
CHECK(err, "ifup", "err %d errno %d\n", err, errno);
for (i = 0; i < ARRAY_SIZE(tests); i++) {
/* Keep in sync with 'flags' from eth_get_headlen. */
__u32 eth_get_headlen_flags =
BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
struct bpf_prog_test_run_attr tattr = {};
struct bpf_flow_keys flow_keys = {};
__u32 key = (__u32)(tests[i].keys.sport) << 16 |
tests[i].keys.dport;
/* Test direct prog attachment */
test_skb_less_prog_attach(skel, tap_fd);
/* Test indirect prog attachment via link */
test_skb_less_link_create(skel, tap_fd);
/* For skb-less case we can't pass input flags; run
* only the tests that have a matching set of flags.
*/
if (tests[i].flags != eth_get_headlen_flags)
continue;
err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
err = bpf_map_delete_elem(keys_fd, &key);
CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
}
bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
bpf_object__close(obj);
close(tap_fd);
out_destroy_skel:
bpf_flow__destroy(skel);
}
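run_tests_skb_less() looks up each dissection result in last_dissection under a 32-bit key built from the expected ports, `sport << 16 | dport`. The BPF program is assumed to store its results under the same key. The sketch below shows the packing and its inverse; the helper names are hypothetical. The ports are combined exactly as stored. In struct bpf_flow_keys they are in network byte order, and the packing neither changes nor depends on that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the key expression in run_tests_skb_less().
 * Ports are combined as stored; no byte-order conversion happens. */
static uint32_t flow_key_pack(uint16_t sport, uint16_t dport)
{
	return (uint32_t)sport << 16 | dport;
}

static void flow_key_unpack(uint32_t key, uint16_t *sport, uint16_t *dport)
{
	assert(sport && dport);
	*sport = key >> 16;
	*dport = key & 0xffff;
}
```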

@ -11,6 +11,7 @@
#include <fcntl.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/bpf.h>
@ -18,21 +19,30 @@
#include "test_progs.h"
static bool is_attached(int netns)
static int init_net = -1;
static __u32 query_attached_prog_id(int netns)
{
__u32 cnt;
__u32 prog_ids[1] = {};
__u32 prog_cnt = ARRAY_SIZE(prog_ids);
int err;
err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL, NULL, &cnt);
err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL,
prog_ids, &prog_cnt);
if (CHECK_FAIL(err)) {
perror("bpf_prog_query");
return true; /* fail-safe */
return 0;
}
return cnt > 0;
return prog_cnt == 1 ? prog_ids[0] : 0;
}
static int load_prog(void)
static bool prog_is_attached(int netns)
{
return query_attached_prog_id(netns) > 0;
}
static int load_prog(enum bpf_prog_type type)
{
struct bpf_insn prog[] = {
BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
@ -40,61 +50,566 @@ static int load_prog(void)
};
int fd;
fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
fd = bpf_load_program(type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
if (CHECK_FAIL(fd < 0))
perror("bpf_load_program");
return fd;
}
static void do_flow_dissector_reattach(void)
static __u32 query_prog_id(int prog)
{
int prog_fd[2] = { -1, -1 };
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
int err;
prog_fd[0] = load_prog();
if (prog_fd[0] < 0)
return;
prog_fd[1] = load_prog();
if (prog_fd[1] < 0)
goto out_close;
err = bpf_prog_attach(prog_fd[0], 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach-0");
goto out_close;
err = bpf_obj_get_info_by_fd(prog, &info, &info_len);
if (CHECK_FAIL(err || info_len != sizeof(info))) {
perror("bpf_obj_get_info_by_fd");
return 0;
}
return info.id;
}
static int unshare_net(int old_net)
{
int err, new_net;
err = unshare(CLONE_NEWNET);
if (CHECK_FAIL(err)) {
perror("unshare(CLONE_NEWNET)");
return -1;
}
new_net = open("/proc/self/ns/net", O_RDONLY);
if (CHECK_FAIL(new_net < 0)) {
perror("open(/proc/self/ns/net)");
setns(old_net, CLONE_NEWNET);
return -1;
}
return new_net;
}
static void test_prog_attach_prog_attach(int netns, int prog1, int prog2)
{
int err;
err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect success when attaching a different program */
err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach-1");
perror("bpf_prog_attach(prog2) #1");
goto out_detach;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
/* Expect failure when attaching the same program twice */
err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(!err || errno != EINVAL))
perror("bpf_prog_attach-2");
perror("bpf_prog_attach(prog2) #2");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
out_detach:
err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
if (CHECK_FAIL(err))
perror("bpf_prog_detach");
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_create_link_create(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
int link1, link2;
link1 = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link1 < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect failure creating link when another link exists */
errno = 0;
link2 = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link2 != -1 || errno != E2BIG))
perror("bpf_link_create(prog2) expected E2BIG");
if (link2 != -1)
close(link2);
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(link1);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_prog_attach_link_create(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
int err, link;
err = bpf_prog_attach(prog1, -1, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect failure creating link when prog attached */
errno = 0;
link = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link != -1 || errno != EEXIST))
perror("bpf_link_create(prog2) expected EEXIST");
if (link != -1)
close(link);
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
if (CHECK_FAIL(err))
perror("bpf_prog_detach");
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_create_prog_attach(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
int err, link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect failure attaching prog when link exists */
errno = 0;
err = bpf_prog_attach(prog2, -1, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(!err || errno != EEXIST))
perror("bpf_prog_attach(prog2) expected EEXIST");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_create_prog_detach(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
int err, link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect failure detaching prog when link exists */
errno = 0;
err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
if (CHECK_FAIL(!err || errno != EINVAL))
perror("bpf_prog_detach expected EINVAL");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_prog_attach_detach_query(int netns, int prog1, int prog2)
{
int err;
err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
if (CHECK_FAIL(err)) {
perror("bpf_prog_detach");
return;
}
/* Expect no prog attached after successful detach */
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_create_close_query(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
int link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(link);
/* Expect no prog attached after closing last link FD */
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_update_no_old_prog(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
int err, link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect success replacing the prog when old prog not specified */
update_opts.flags = 0;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(err))
perror("bpf_link_update");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_update_replace_old_prog(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
int err, link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect update with F_REPLACE and matching old prog to succeed */
update_opts.flags = BPF_F_REPLACE;
update_opts.old_prog_fd = prog1;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(err))
perror("bpf_link_update");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_update_invalid_opts(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
int err, link;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect update to fail w/ old prog FD but w/o F_REPLACE */
errno = 0;
update_opts.flags = 0;
update_opts.old_prog_fd = prog1;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(!err || errno != EINVAL)) {
perror("bpf_link_update expected EINVAL");
goto out_close;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect update to fail on old prog FD mismatch */
errno = 0;
update_opts.flags = BPF_F_REPLACE;
update_opts.old_prog_fd = prog2;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(!err || errno != EPERM)) {
perror("bpf_link_update expected EPERM");
goto out_close;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect update to fail for invalid old prog FD */
errno = 0;
update_opts.flags = BPF_F_REPLACE;
update_opts.old_prog_fd = -1;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(!err || errno != EBADF)) {
perror("bpf_link_update expected EBADF");
goto out_close;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect update to fail with invalid flags */
errno = 0;
update_opts.flags = BPF_F_ALLOW_MULTI;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(!err || errno != EINVAL))
perror("bpf_link_update expected EINVAL");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
out_close:
close(prog_fd[1]);
close(prog_fd[0]);
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_update_invalid_prog(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
int err, link, prog3;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
/* Expect failure when new prog FD is not valid */
errno = 0;
update_opts.flags = 0;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, -1, &update_opts);
if (CHECK_FAIL(!err || errno != EBADF)) {
perror("bpf_link_update expected EBADF");
goto out_close_link;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
prog3 = load_prog(BPF_PROG_TYPE_SOCKET_FILTER);
if (prog3 < 0)
goto out_close_link;
/* Expect failure when new prog FD type doesn't match */
errno = 0;
update_opts.flags = 0;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, prog3, &update_opts);
if (CHECK_FAIL(!err || errno != EINVAL))
perror("bpf_link_update expected EINVAL");
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(prog3);
out_close_link:
close(link);
CHECK_FAIL(prog_is_attached(netns));
}
static void test_link_update_netns_gone(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
int err, link, old_net;
old_net = netns;
netns = unshare_net(old_net);
if (netns < 0)
return;
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
return;
}
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
close(netns);
err = setns(old_net, CLONE_NEWNET);
if (CHECK_FAIL(err)) {
perror("setns(CLONE_NEWNET)");
close(link);
return;
}
/* Expect failure when netns destroyed */
errno = 0;
update_opts.flags = 0;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(!err || errno != ENOLINK))
perror("bpf_link_update");
close(link);
}
static void test_link_get_info(int netns, int prog1, int prog2)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
struct bpf_link_info info = {};
struct stat netns_stat = {};
__u32 info_len, link_id;
int err, link, old_net;
old_net = netns;
netns = unshare_net(old_net);
if (netns < 0)
return;
err = fstat(netns, &netns_stat);
if (CHECK_FAIL(err)) {
perror("stat(netns)");
goto out_resetns;
}
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
if (CHECK_FAIL(link < 0)) {
perror("bpf_link_create(prog1)");
goto out_resetns;
}
info_len = sizeof(info);
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
if (CHECK_FAIL(err)) {
perror("bpf_obj_get_info");
goto out_unlink;
}
CHECK_FAIL(info_len != sizeof(info));
/* Expect link info to be sane and match prog and netns details */
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
CHECK_FAIL(info.id == 0);
CHECK_FAIL(info.prog_id != query_prog_id(prog1));
CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
update_opts.flags = 0;
update_opts.old_prog_fd = 0;
err = bpf_link_update(link, prog2, &update_opts);
if (CHECK_FAIL(err)) {
perror("bpf_link_update(prog2)");
goto out_unlink;
}
link_id = info.id;
info_len = sizeof(info);
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
if (CHECK_FAIL(err)) {
perror("bpf_obj_get_info");
goto out_unlink;
}
CHECK_FAIL(info_len != sizeof(info));
/* Expect no info change after update except in prog id */
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
CHECK_FAIL(info.id != link_id);
CHECK_FAIL(info.prog_id != query_prog_id(prog2));
CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
/* Leave the netns the link is attached to and close the last FD to it */
err = setns(old_net, CLONE_NEWNET);
if (CHECK_FAIL(err)) {
perror("setns(NEWNET)");
goto out_unlink;
}
close(netns);
old_net = -1;
netns = -1;
info_len = sizeof(info);
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
if (CHECK_FAIL(err)) {
perror("bpf_obj_get_info");
goto out_unlink;
}
CHECK_FAIL(info_len != sizeof(info));
/* Expect netns_ino to change to 0 */
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
CHECK_FAIL(info.id != link_id);
CHECK_FAIL(info.prog_id != query_prog_id(prog2));
CHECK_FAIL(info.netns.netns_ino != 0);
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
out_unlink:
close(link);
out_resetns:
if (old_net != -1)
setns(old_net, CLONE_NEWNET);
if (netns != -1)
close(netns);
}
static void run_tests(int netns)
{
struct test {
const char *test_name;
void (*test_func)(int netns, int prog1, int prog2);
} tests[] = {
{ "prog attach, prog attach",
test_prog_attach_prog_attach },
{ "link create, link create",
test_link_create_link_create },
{ "prog attach, link create",
test_prog_attach_link_create },
{ "link create, prog attach",
test_link_create_prog_attach },
{ "link create, prog detach",
test_link_create_prog_detach },
{ "prog attach, detach, query",
test_prog_attach_detach_query },
{ "link create, close, query",
test_link_create_close_query },
{ "link update no old prog",
test_link_update_no_old_prog },
{ "link update with replace old prog",
test_link_update_replace_old_prog },
{ "link update invalid opts",
test_link_update_invalid_opts },
{ "link update invalid prog",
test_link_update_invalid_prog },
{ "link update netns gone",
test_link_update_netns_gone },
{ "link get info",
test_link_get_info },
};
int i, progs[2] = { -1, -1 };
char test_name[80];
for (i = 0; i < ARRAY_SIZE(progs); i++) {
progs[i] = load_prog(BPF_PROG_TYPE_FLOW_DISSECTOR);
if (progs[i] < 0)
goto out_close;
}
for (i = 0; i < ARRAY_SIZE(tests); i++) {
snprintf(test_name, sizeof(test_name),
"flow dissector %s%s",
tests[i].test_name,
netns == init_net ? " (init_net)" : "");
if (test__start_subtest(test_name))
tests[i].test_func(netns, progs[0], progs[1]);
}
out_close:
for (i = 0; i < ARRAY_SIZE(progs); i++) {
if (progs[i] != -1)
CHECK_FAIL(close(progs[i]));
}
}
void test_flow_dissector_reattach(void)
{
int init_net, self_net, err;
int err, new_net, saved_net;
self_net = open("/proc/self/ns/net", O_RDONLY);
if (CHECK_FAIL(self_net < 0)) {
saved_net = open("/proc/self/ns/net", O_RDONLY);
if (CHECK_FAIL(saved_net < 0)) {
perror("open(/proc/self/ns/net)");
return;
}
@ -111,30 +626,29 @@ void test_flow_dissector_reattach(void)
goto out_close;
}
if (is_attached(init_net)) {
if (prog_is_attached(init_net)) {
test__skip();
printf("Can't test with flow dissector attached to init_net\n");
goto out_setns;
}
/* First run tests in root network namespace */
do_flow_dissector_reattach();
run_tests(init_net);
/* Then repeat tests in a non-root namespace */
err = unshare(CLONE_NEWNET);
if (CHECK_FAIL(err)) {
perror("unshare(CLONE_NEWNET)");
new_net = unshare_net(init_net);
if (new_net < 0)
goto out_setns;
}
do_flow_dissector_reattach();
run_tests(new_net);
close(new_net);
out_setns:
/* Move back to netns we started in. */
err = setns(self_net, CLONE_NEWNET);
err = setns(saved_net, CLONE_NEWNET);
if (CHECK_FAIL(err))
perror("setns(/proc/self/ns/net)");
out_close:
close(init_net);
close(self_net);
close(saved_net);
}
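The flow dissector subtests above encode a small set of per-namespace attach rules through the errno values they check (EEXIST, EINVAL, EPERM, ENOLINK). As a reading aid, here is a hypothetical user-space model of those outcomes. It is not kernel code: the `model_*` functions and `MODEL_F_*` flags are invented for illustration, file-descriptor validation (EBADF) and program-type checks are left out, and the order of checks in `model_link_update()` is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags for this model only (stand-ins for BPF_F_REPLACE and
 * "any other flag"); the values are not taken from a kernel header. */
#define MODEL_F_REPLACE (1u << 0)
#define MODEL_F_OTHER   (1u << 1)

/* Hypothetical flow dissector state of one network namespace.
 * Programs are small positive ints; 0 means "nothing attached". */
struct netns_model {
	int prog;      /* currently attached program */
	bool via_link; /* attached through a bpf_link, not prog_attach */
	bool alive;    /* false once the namespace has been destroyed */
};

/* prog_attach: refused while a link owns the attach point. */
static int model_prog_attach(struct netns_model *ns, int prog)
{
	assert(prog > 0);
	if (ns->via_link)
		return -EEXIST;
	ns->prog = prog;
	return 0;
}

/* prog_detach: also refused while a link owns the attach point. */
static int model_prog_detach(struct netns_model *ns)
{
	if (ns->via_link)
		return -EINVAL;
	ns->prog = 0;
	return 0;
}

/* link_create: refused if any program is attached, by link or not. */
static int model_link_create(struct netns_model *ns, int prog)
{
	assert(prog > 0);
	if (ns->prog)
		return -EEXIST;
	ns->prog = prog;
	ns->via_link = true;
	return 0;
}

/* Closing the last link FD detaches the program. */
static void model_link_close(struct netns_model *ns)
{
	ns->prog = 0;
	ns->via_link = false;
}

/* link_update: returns the errno values the update subtests expect. */
static int model_link_update(struct netns_model *ns, unsigned int flags,
			     int old_prog, int new_prog)
{
	assert(ns->via_link && new_prog > 0);
	if (flags & ~MODEL_F_REPLACE)
		return -EINVAL;  /* unsupported flag */
	if (!(flags & MODEL_F_REPLACE) && old_prog)
		return -EINVAL;  /* old prog given without REPLACE */
	if ((flags & MODEL_F_REPLACE) && old_prog != ns->prog)
		return -EPERM;   /* old prog does not match */
	if (!ns->alive)
		return -ENOLINK; /* namespace is gone */
	ns->prog = new_prog;
	return 0;
}
```

Walking the model through the same sequences as the subtests (attach then link, link then attach/detach, the update variants, then a destroyed namespace) reproduces the expected errno for each step.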

@ -0,0 +1,211 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
#include <linux/compiler.h>
#include <asm/barrier.h>
#include <test_progs.h>
#include <sys/mman.h>
#include <sys/epoll.h>
#include <time.h>
#include <sched.h>
#include <signal.h>
#include <pthread.h>
#include <sys/sysinfo.h>
#include <linux/perf_event.h>
#include <linux/ring_buffer.h>
#include "test_ringbuf.skel.h"
#define EDONE 7777
static int duration = 0;
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
static int sample_cnt;
static int process_sample(void *ctx, void *data, size_t len)
{
struct sample *s = data;
sample_cnt++;
switch (s->seq) {
case 0:
CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
333L, s->value);
return 0;
case 1:
CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
777L, s->value);
return -EDONE;
default:
/* we don't care about the rest */
return 0;
}
}
static struct test_ringbuf *skel;
static struct ring_buffer *ringbuf;
static void trigger_samples(void)
{
skel->bss->dropped = 0;
skel->bss->total = 0;
skel->bss->discarded = 0;
/* trigger exactly two samples */
skel->bss->value = 333;
syscall(__NR_getpgid);
skel->bss->value = 777;
syscall(__NR_getpgid);
}
static void *poll_thread(void *input)
{
long timeout = (long)input;
return (void *)(long)ring_buffer__poll(ringbuf, timeout);
}
void test_ringbuf(void)
{
const size_t rec_sz = BPF_RINGBUF_HDR_SZ + sizeof(struct sample);
pthread_t thread;
long bg_ret = -1;
int err;
skel = test_ringbuf__open_and_load();
if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
return;
/* only trigger BPF program for current process */
skel->bss->pid = getpid();
ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf),
process_sample, NULL, NULL);
if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
goto cleanup;
err = test_ringbuf__attach(skel);
if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
goto cleanup;
trigger_samples();
/* 2 submitted + 1 discarded records */
CHECK(skel->bss->avail_data != 3 * rec_sz,
"err_avail_size", "exp %ld, got %ld\n",
3L * rec_sz, skel->bss->avail_data);
CHECK(skel->bss->ring_size != 4096,
"err_ring_size", "exp %ld, got %ld\n",
4096L, skel->bss->ring_size);
CHECK(skel->bss->cons_pos != 0,
"err_cons_pos", "exp %ld, got %ld\n",
0L, skel->bss->cons_pos);
CHECK(skel->bss->prod_pos != 3 * rec_sz,
"err_prod_pos", "exp %ld, got %ld\n",
3L * rec_sz, skel->bss->prod_pos);
/* poll for samples */
err = ring_buffer__poll(ringbuf, -1);
/* -EDONE is used as an indicator that we are done */
if (CHECK(err != -EDONE, "err_done", "done err: %d\n", err))
goto cleanup;
/* we expect extra polling to return nothing */
err = ring_buffer__poll(ringbuf, 0);
if (CHECK(err != 0, "extra_samples", "poll result: %d\n", err))
goto cleanup;
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
0L, skel->bss->dropped);
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
2L, skel->bss->total);
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
1L, skel->bss->discarded);
/* now validate consumer position is updated and returned */
trigger_samples();
CHECK(skel->bss->cons_pos != 3 * rec_sz,
"err_cons_pos", "exp %ld, got %ld\n",
3L * rec_sz, skel->bss->cons_pos);
err = ring_buffer__poll(ringbuf, -1);
CHECK(err <= 0, "poll_err", "err %d\n", err);
/* start poll in background w/ long timeout */
err = pthread_create(&thread, NULL, poll_thread, (void *)(long)10000);
if (CHECK(err, "bg_poll", "pthread_create failed: %d\n", err))
goto cleanup;
/* turn off notifications now */
skel->bss->flags = BPF_RB_NO_WAKEUP;
/* give the background thread a bit of time */
usleep(50000);
trigger_samples();
/* sleeping arbitrarily is bad, but there is no better way to know that
* epoll_wait() **DID NOT** unblock in background thread
*/
usleep(50000);
/* background poll should still be blocked */
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
if (CHECK(err != EBUSY, "try_join", "err %d\n", err))
goto cleanup;
/* BPF side did everything right */
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
0L, skel->bss->dropped);
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
2L, skel->bss->total);
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
1L, skel->bss->discarded);
/* clear flags to return to "adaptive" notification mode */
skel->bss->flags = 0;
/* produce new samples; no notification should be triggered because
* the consumer is now behind
*/
trigger_samples();
/* background poll should still be blocked */
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
if (CHECK(err != EBUSY, "try_join", "err %d\n", err))
goto cleanup;
/* now force notifications */
skel->bss->flags = BPF_RB_FORCE_WAKEUP;
sample_cnt = 0;
trigger_samples();
/* now we should get a pending notification */
usleep(50000);
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
if (CHECK(err, "join_bg", "err %d\n", err))
goto cleanup;
if (CHECK(bg_ret != 1, "bg_ret", "epoll_wait result: %ld\n", bg_ret))
goto cleanup;
/* 3 rounds, 2 samples each */
CHECK(sample_cnt != 6, "wrong_sample_cnt",
"expected to see %d samples, got %d\n", 6, sample_cnt);
/* BPF side did everything right */
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
0L, skel->bss->dropped);
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
2L, skel->bss->total);
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
1L, skel->bss->discarded);
test_ringbuf__detach(skel);
cleanup:
ring_buffer__free(ringbuf);
test_ringbuf__destroy(skel);
}
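The notification phases exercised in test_ringbuf() (BPF_RB_NO_WAKEUP, then adaptive mode with flags = 0, then BPF_RB_FORCE_WAKEUP) come down to one decision made when a record is committed. Below is a hedged user-space sketch of that decision as we understand the adaptive mode: wake the consumer only when it has already caught up to the record being committed. The RB_* names are local stand-ins, not the uapi constants, and the kernel's actual logic may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP; the values are
 * local to this sketch. */
#define RB_NO_WAKEUP    (1u << 0)
#define RB_FORCE_WAKEUP (1u << 1)

/* Should committing the record that starts at rec_pos notify the consumer?
 * cons_pos is the consumer position observed at commit time. */
static bool should_wakeup(uint64_t flags, uint64_t cons_pos, uint64_t rec_pos)
{
	/* The consumer never moves past a record that is not yet committed. */
	assert(cons_pos <= rec_pos);

	if (flags & RB_FORCE_WAKEUP)
		return true;
	if (flags & RB_NO_WAKEUP)
		return false;
	/* Adaptive mode: notify only if the consumer has caught up to this
	 * record. If it is still behind, it is assumed to have been notified
	 * about the earlier data already, so another wakeup is skipped. */
	return cons_pos == rec_pos;
}
```

This also explains the middle phase of the test: after records committed with BPF_RB_NO_WAKEUP, the consumer is behind, so adaptive commits stay silent and only BPF_RB_FORCE_WAKEUP unblocks the background poll.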

@ -0,0 +1,102 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
#include <test_progs.h>
#include <sys/epoll.h>
#include "test_ringbuf_multi.skel.h"
static int duration = 0;
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
static int process_sample(void *ctx, void *data, size_t len)
{
int ring = (unsigned long)ctx;
struct sample *s = data;
switch (s->seq) {
case 0:
CHECK(ring != 1, "sample1_ring", "exp %d, got %d\n", 1, ring);
CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
333L, s->value);
break;
case 1:
CHECK(ring != 2, "sample2_ring", "exp %d, got %d\n", 2, ring);
CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
777L, s->value);
break;
default:
CHECK(true, "extra_sample", "unexpected sample seq %d, val %ld\n",
s->seq, s->value);
return -1;
}
return 0;
}
void test_ringbuf_multi(void)
{
struct test_ringbuf_multi *skel;
struct ring_buffer *ringbuf;
int err;
skel = test_ringbuf_multi__open_and_load();
if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
return;
/* only trigger BPF program for current process */
skel->bss->pid = getpid();
ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf1),
process_sample, (void *)(long)1, NULL);
if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
goto cleanup;
err = ring_buffer__add(ringbuf, bpf_map__fd(skel->maps.ringbuf2),
process_sample, (void *)(long)2);
if (CHECK(err, "ringbuf_add", "failed to add another ring\n"))
goto cleanup;
err = test_ringbuf_multi__attach(skel);
if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
goto cleanup;
/* trigger a few samples; some will be skipped */
skel->bss->target_ring = 0;
skel->bss->value = 333;
syscall(__NR_getpgid);
/* skipped, no ringbuf in slot 1 */
skel->bss->target_ring = 1;
skel->bss->value = 555;
syscall(__NR_getpgid);
skel->bss->target_ring = 2;
skel->bss->value = 777;
syscall(__NR_getpgid);
/* poll for samples, should get 2 ringbufs back */
err = ring_buffer__poll(ringbuf, -1);
if (CHECK(err != 4, "poll_res", "expected 4 records, got %d\n", err))
goto cleanup;
/* expect extra polling to return nothing */
err = ring_buffer__poll(ringbuf, 0);
if (CHECK(err < 0, "extra_samples", "poll result: %d\n", err))
goto cleanup;
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
0L, skel->bss->dropped);
CHECK(skel->bss->skipped != 1, "err_skipped", "exp %ld, got %ld\n",
1L, skel->bss->skipped);
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
2L, skel->bss->total);
cleanup:
ring_buffer__free(ringbuf);
test_ringbuf_multi__destroy(skel);
}

@ -0,0 +1,30 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
void test_skb_helpers(void)
{
struct __sk_buff skb = {
.wire_len = 100,
.gso_segs = 8,
.gso_size = 10,
};
struct bpf_prog_test_run_attr tattr = {
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.ctx_in = &skb,
.ctx_size_in = sizeof(skb),
.ctx_out = &skb,
.ctx_size_out = sizeof(skb),
};
struct bpf_object *obj;
int err;
err = bpf_prog_load("./test_skb_helpers.o", BPF_PROG_TYPE_SCHED_CLS, &obj,
&tattr.prog_fd);
if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
return;
err = bpf_prog_test_run_xattr(&tattr);
CHECK_ATTR(err, "len", "err %d errno %d\n", err, errno);
bpf_object__close(obj);
}

@ -1,7 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Cloudflare
#include <error.h>
#include "test_progs.h"
#include "test_skmsg_load_helpers.skel.h"
#define TCP_REPAIR 19 /* TCP sock is under repair right now */
@ -70,10 +72,43 @@ out:
close(s);
}
static void test_skmsg_helpers(enum bpf_map_type map_type)
{
struct test_skmsg_load_helpers *skel;
int err, map, verdict;
skel = test_skmsg_load_helpers__open_and_load();
if (CHECK_FAIL(!skel)) {
perror("test_skmsg_load_helpers__open_and_load");
return;
}
verdict = bpf_program__fd(skel->progs.prog_msg_verdict);
map = bpf_map__fd(skel->maps.sock_map);
err = bpf_prog_attach(verdict, map, BPF_SK_MSG_VERDICT, 0);
if (CHECK_FAIL(err)) {
perror("bpf_prog_attach");
goto out;
}
err = bpf_prog_detach2(verdict, map, BPF_SK_MSG_VERDICT);
if (CHECK_FAIL(err)) {
perror("bpf_prog_detach2");
goto out;
}
out:
test_skmsg_load_helpers__destroy(skel);
}
void test_sockmap_basic(void)
{
if (test__start_subtest("sockmap create_update_free"))
test_sockmap_create_update_free(BPF_MAP_TYPE_SOCKMAP);
if (test__start_subtest("sockhash create_update_free"))
test_sockmap_create_update_free(BPF_MAP_TYPE_SOCKHASH);
if (test__start_subtest("sockmap sk_msg load helpers"))
test_skmsg_helpers(BPF_MAP_TYPE_SOCKMAP);
if (test__start_subtest("sockhash sk_msg load helpers"))
test_skmsg_helpers(BPF_MAP_TYPE_SOCKHASH);
}

@ -0,0 +1,97 @@
// SPDX-License-Identifier: GPL-2.0
#include <uapi/linux/bpf.h>
#include <linux/if_link.h>
#include <test_progs.h>
#include "test_xdp_devmap_helpers.skel.h"
#include "test_xdp_with_devmap_helpers.skel.h"
#define IFINDEX_LO 1
struct bpf_devmap_val {
u32 ifindex; /* device index */
union {
int fd; /* prog fd on map write */
u32 id; /* prog id on map read */
} bpf_prog;
};
void test_xdp_with_devmap_helpers(void)
{
struct test_xdp_with_devmap_helpers *skel;
struct bpf_prog_info info = {};
struct bpf_devmap_val val = {
.ifindex = IFINDEX_LO,
};
__u32 len = sizeof(info);
__u32 duration = 0, idx = 0;
int err, dm_fd, map_fd;
skel = test_xdp_with_devmap_helpers__open_and_load();
if (CHECK_FAIL(!skel)) {
perror("test_xdp_with_devmap_helpers__open_and_load");
return;
}
/* cannot attach a program that uses DEVMAPs with program entries
* in generic (SKB) XDP mode
*/
dm_fd = bpf_program__fd(skel->progs.xdp_redir_prog);
err = bpf_set_link_xdp_fd(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE);
CHECK(err == 0, "Generic attach of program with 8-byte devmap",
"should have failed\n");
dm_fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
map_fd = bpf_map__fd(skel->maps.dm_ports);
err = bpf_obj_get_info_by_fd(dm_fd, &info, &len);
if (CHECK_FAIL(err))
goto out_close;
val.bpf_prog.fd = dm_fd;
err = bpf_map_update_elem(map_fd, &idx, &val, 0);
CHECK(err, "Add program to devmap entry",
"err %d errno %d\n", err, errno);
err = bpf_map_lookup_elem(map_fd, &idx, &val);
CHECK(err, "Read devmap entry", "err %d errno %d\n", err, errno);
CHECK(info.id != val.bpf_prog.id, "Expected program id in devmap entry",
"expected %u read %u\n", info.id, val.bpf_prog.id);
/* cannot attach a BPF_XDP_DEVMAP program directly to a device */
err = bpf_set_link_xdp_fd(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE);
CHECK(err == 0, "Attach of BPF_XDP_DEVMAP program",
"should have failed\n");
val.ifindex = 1;
val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_prog);
err = bpf_map_update_elem(map_fd, &idx, &val, 0);
CHECK(err == 0, "Add non-BPF_XDP_DEVMAP program to devmap entry",
"should have failed\n");
out_close:
test_xdp_with_devmap_helpers__destroy(skel);
}
void test_neg_xdp_devmap_helpers(void)
{
struct test_xdp_devmap_helpers *skel;
__u32 duration = 0;
skel = test_xdp_devmap_helpers__open_and_load();
if (CHECK(skel,
"Load of XDP program accessing egress ifindex without attach type",
"should have failed\n")) {
test_xdp_devmap_helpers__destroy(skel);
}
}
void test_xdp_devmap_attach(void)
{
if (test__start_subtest("DEVMAP with programs in entries"))
test_xdp_with_devmap_helpers();
if (test__start_subtest("Verifier check of DEVMAP programs"))
test_neg_xdp_devmap_helpers();
}
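The devmap test relies on struct bpf_devmap_val having two meaningful prefixes: just the ifindex, or the ifindex plus the program slot (the "8-byte devmap" in the check names). A small sketch that derives both sizes from a local copy of the struct; OFFSETOFEND and devmap_value_size() are helpers invented here, and the results assume a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the layout declared in the test above. */
struct devmap_val_sketch {
	uint32_t ifindex;  /* device index */
	union {
		int fd;        /* prog fd on map write */
		uint32_t id;   /* prog id on map read */
	} bpf_prog;
};

/* Offset of the first byte past MEMBER inside TYPE. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Value size for an ifindex-only entry (with_prog == 0) or for an entry
 * that also carries a program (with_prog != 0). */
static size_t devmap_value_size(int with_prog)
{
	size_t sz = with_prog ? OFFSETOFEND(struct devmap_val_sketch, bpf_prog)
			      : OFFSETOFEND(struct devmap_val_sketch, ifindex);

	assert(sz <= sizeof(struct devmap_val_sketch));
	return sz;
}
```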

@ -20,20 +20,20 @@
#include <bpf/bpf_endian.h>
int _version SEC("version") = 1;
#define PROG(F) SEC(#F) int bpf_func_##F
#define PROG(F) PROG_(F, _##F)
#define PROG_(NUM, NAME) SEC("flow_dissector/"#NUM) int bpf_func##NAME
/* These are the identifiers of the BPF programs that will be used in tail
* calls. Names are limited to 16 characters including the terminating NUL;
* after the bpf_func_ prefix above, only 6 are left and anything longer is
* cropped.
*/
enum {
IP,
IPV6,
IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
IPV6FR, /* Fragmentation IPv6 Extension Header */
MPLS,
VLAN,
};
#define IP 0
#define IPV6 1
#define IPV6OP 2 /* Destination/Hop-by-Hop Options IPv6 Ext. Header */
#define IPV6FR 3 /* Fragmentation IPv6 Extension Header */
#define MPLS 4
#define VLAN 5
#define MAX_PROG 6
#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
@ -59,7 +59,7 @@ struct frag_hdr {
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 8);
__uint(max_entries, MAX_PROG);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");
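The hunk above replaces the enum of program identifiers with #defines and routes PROG() through PROG_(). The reason is preprocessor ordering: the operand of `#` is stringified without being macro-expanded, so `#NUM` only yields "0", "1", ... if an outer macro has already expanded the argument, and an enumerator is never expanded by the preprocessor at all (it would stringify as "IP"). A stand-alone sketch of the same two-level pattern; NAME, NAME_ and struct prog_name are illustrative names, not part of the selftest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IP   0
#define IPV6 1

struct prog_name {
	const char *section;
	const char *func;
};

/* Inner macro: stringifies NUM as-is, so NUM must already be a number. */
#define NAME_(NUM, FUNC) { "flow_dissector/" #NUM, #FUNC }
/* Outer macro: the plain use of F is expanded (IP -> 0) before NAME_ sees
 * it, while the F pasted with ## keeps its spelling (bpf_func_IP). */
#define NAME(F) NAME_(F, bpf_func_##F)

static const struct prog_name names[] = { NAME(IP), NAME(IPV6) };

/* Returns 1 if entry i has the expected section and function names. */
static int names_match(size_t i, const char *section, const char *func)
{
	assert(i < sizeof(names) / sizeof(names[0]));
	return strcmp(names[i].section, section) == 0 &&
	       strcmp(names[i].func, func) == 0;
}
```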

@ -9,6 +9,8 @@
#include <linux/in6.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <linux/if.h>
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
@ -21,6 +23,10 @@
#define TCP_CA_NAME_MAX 16
#endif
#ifndef IFNAMSIZ
#define IFNAMSIZ 16
#endif
int _version SEC("version") = 1;
__attribute__ ((noinline))
@ -75,6 +81,29 @@ static __inline int set_cc(struct bpf_sock_addr *ctx)
return 0;
}
static __inline int bind_to_device(struct bpf_sock_addr *ctx)
{
char veth1[IFNAMSIZ] = "test_sock_addr1";
char veth2[IFNAMSIZ] = "test_sock_addr2";
char missing[IFNAMSIZ] = "nonexistent_dev";
char del_bind[IFNAMSIZ] = "";
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
&veth1, sizeof(veth1)))
return 1;
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
&veth2, sizeof(veth2)))
return 1;
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
&missing, sizeof(missing)) != -ENODEV)
return 1;
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
&del_bind, sizeof(del_bind)))
return 1;
return 0;
}
SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
@ -88,6 +117,10 @@ int connect_v4_prog(struct bpf_sock_addr *ctx)
tuple.ipv4.daddr = bpf_htonl(DST_REWRITE_IP4);
tuple.ipv4.dport = bpf_htons(DST_REWRITE_PORT4);
/* Bind to device and unbind it. */
if (bind_to_device(ctx))
return 0;
if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
return 0;
else if (ctx->type == SOCK_STREAM)

@ -0,0 +1,33 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook
#include <linux/bpf.h>
#include <stdint.h>
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(value_size, sizeof(int));
__uint(key_size, sizeof(int));
} perfbuf SEC(".maps");
const volatile int batch_cnt = 0;
long sample_val = 42;
long dropped __attribute__((aligned(128))) = 0;
SEC("fentry/__x64_sys_getpgid")
int bench_perfbuf(void *ctx)
{
__u64 *sample;
int i;
for (i = 0; i < batch_cnt; i++) {
if (bpf_perf_event_output(ctx, &perfbuf, BPF_F_CURRENT_CPU,
&sample_val, sizeof(sample_val)))
__sync_add_and_fetch(&dropped, 1);
}
return 0;
}

@ -0,0 +1,60 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook
#include <linux/bpf.h>
#include <stdint.h>
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
} ringbuf SEC(".maps");
const volatile int batch_cnt = 0;
const volatile long use_output = 0;
long sample_val = 42;
long dropped __attribute__((aligned(128))) = 0;
const volatile long wakeup_data_size = 0;
static __always_inline long get_flags(void)
{
long sz;
if (!wakeup_data_size)
return 0;
sz = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
return sz >= wakeup_data_size ? BPF_RB_FORCE_WAKEUP : BPF_RB_NO_WAKEUP;
}
SEC("fentry/__x64_sys_getpgid")
int bench_ringbuf(void *ctx)
{
long *sample, flags;
int i;
if (!use_output) {
for (i = 0; i < batch_cnt; i++) {
sample = bpf_ringbuf_reserve(&ringbuf,
sizeof(sample_val), 0);
if (!sample) {
__sync_add_and_fetch(&dropped, 1);
} else {
*sample = sample_val;
flags = get_flags();
bpf_ringbuf_submit(sample, flags);
}
}
} else {
for (i = 0; i < batch_cnt; i++) {
flags = get_flags();
if (bpf_ringbuf_output(&ringbuf, &sample_val,
sizeof(sample_val), flags))
__sync_add_and_fetch(&dropped, 1);
}
}
return 0;
}
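get_flags() in the ring buffer benchmark above batches notifications: with a non-zero wakeup_data_size, a commit forces a wakeup only once enough data is pending and suppresses it otherwise. A small simulation of that policy, assuming 16-byte records (an 8-byte header plus the 8-byte sample) and an idealized consumer that drains everything as soon as it is woken; the RB_* values and count_wakeups() are local to this sketch.

```c
#include <assert.h>

/* Stand-ins for BPF_RB_NO_WAKEUP / BPF_RB_FORCE_WAKEUP (local values). */
#define RB_NO_WAKEUP    1
#define RB_FORCE_WAKEUP 2

/* Same decision as get_flags() above, with the queried amount of
 * available data passed in instead of read via bpf_ringbuf_query(). */
static long threshold_flags(long avail, long wakeup_data_size)
{
	if (!wakeup_data_size)
		return 0; /* adaptive mode */
	return avail >= wakeup_data_size ? RB_FORCE_WAKEUP : RB_NO_WAKEUP;
}

/* Commit n records of rec_sz bytes each; the consumer drains everything
 * whenever it is woken. Returns the number of forced wakeups. */
static int count_wakeups(int n, long rec_sz, long threshold)
{
	long avail = 0;
	int i, wakeups = 0;

	assert(rec_sz > 0 && threshold > 0);
	for (i = 0; i < n; i++) {
		avail += rec_sz; /* reserve already advanced the producer */
		if (threshold_flags(avail, threshold) == RB_FORCE_WAKEUP) {
			wakeups++;
			avail = 0; /* idealized consumer drains the ring */
		}
	}
	return wakeups;
}
```

With a 48-byte threshold, ten 16-byte records produce three wakeups instead of ten, which is the kind of batching the benchmark's wakeup_data_size knob is meant to measure.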

@ -0,0 +1,78 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 1 << 12);
} ringbuf SEC(".maps");
/* inputs */
int pid = 0;
long value = 0;
long flags = 0;
/* outputs */
long total = 0;
long discarded = 0;
long dropped = 0;
long avail_data = 0;
long ring_size = 0;
long cons_pos = 0;
long prod_pos = 0;
/* inner state */
long seq = 0;
SEC("tp/syscalls/sys_enter_getpgid")
int test_ringbuf(void *ctx)
{
int cur_pid = bpf_get_current_pid_tgid() >> 32;
struct sample *sample;
int zero = 0;
if (cur_pid != pid)
return 0;
sample = bpf_ringbuf_reserve(&ringbuf, sizeof(*sample), 0);
if (!sample) {
__sync_fetch_and_add(&dropped, 1);
return 1;
}
sample->pid = pid;
bpf_get_current_comm(sample->comm, sizeof(sample->comm));
sample->value = value;
sample->seq = seq++;
__sync_fetch_and_add(&total, 1);
if (sample->seq & 1) {
/* copy from reserved sample to a new one... */
bpf_ringbuf_output(&ringbuf, sample, sizeof(*sample), flags);
/* ...and then discard reserved sample */
bpf_ringbuf_discard(sample, flags);
__sync_fetch_and_add(&discarded, 1);
} else {
bpf_ringbuf_submit(sample, flags);
}
avail_data = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
ring_size = bpf_ringbuf_query(&ringbuf, BPF_RB_RING_SIZE);
cons_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_CONS_POS);
prod_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_PROD_POS);
return 0;
}
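The user-space test_ringbuf() expects avail_data and prod_pos to equal 3 * rec_sz after two submitted records and one discarded one. That follows from how positions advance: reserving a record moves the producer position by the header plus the payload, rounded up to 8 bytes, and a discarded record keeps its space until the consumer steps over it. A simplified model; HDR_SZ = 8 stands in for BPF_RINGBUF_HDR_SZ, struct sample is taken to be 32 bytes (LP64), and the capacity check is simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_SZ 8UL /* stand-in for BPF_RINGBUF_HDR_SZ */

struct pos_model {
	unsigned long prod_pos; /* bytes ever reserved */
	unsigned long cons_pos; /* bytes ever consumed */
	unsigned long size;     /* ring capacity in bytes */
};

/* Space one record occupies: header + payload, rounded up to 8 bytes. */
static unsigned long rec_len(size_t payload)
{
	return (payload + HDR_SZ + 7) & ~7UL;
}

/* Reserve space; the position advances whether the record is later
 * submitted or discarded. Returns 0 on success, -1 if it does not fit
 * (simplified capacity check). */
static int model_reserve(struct pos_model *rb, size_t payload)
{
	unsigned long len = rec_len(payload);

	assert(rb->prod_pos >= rb->cons_pos);
	if (rb->prod_pos - rb->cons_pos + len > rb->size)
		return -1;
	rb->prod_pos += len;
	return 0;
}

/* Roughly what BPF_RB_AVAIL_DATA reports: produced, not yet consumed. */
static unsigned long model_avail(const struct pos_model *rb)
{
	return rb->prod_pos - rb->cons_pos;
}
```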

@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
struct ringbuf_map {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 1 << 12);
} ringbuf1 SEC(".maps"),
ringbuf2 SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
__uint(max_entries, 4);
__type(key, int);
__array(values, struct ringbuf_map);
} ringbuf_arr SEC(".maps") = {
.values = {
[0] = &ringbuf1,
[2] = &ringbuf2,
},
};
/* inputs */
int pid = 0;
int target_ring = 0;
long value = 0;
/* outputs */
long total = 0;
long dropped = 0;
long skipped = 0;
SEC("tp/syscalls/sys_enter_getpgid")
int test_ringbuf(void *ctx)
{
int cur_pid = bpf_get_current_pid_tgid() >> 32;
struct sample *sample;
void *rb;
int zero = 0;
if (cur_pid != pid)
return 0;
rb = bpf_map_lookup_elem(&ringbuf_arr, &target_ring);
if (!rb) {
skipped += 1;
return 1;
}
sample = bpf_ringbuf_reserve(rb, sizeof(*sample), 0);
if (!sample) {
dropped += 1;
return 1;
}
sample->pid = pid;
bpf_get_current_comm(sample->comm, sizeof(sample->comm));
sample->value = value;
sample->seq = total;
total += 1;
bpf_ringbuf_submit(sample, 0);
return 0;
}
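test_ringbuf_multi picks its output ring at run time through an ARRAY_OF_MAPS in which only slots 0 and 2 are populated, so a sample aimed at slot 1 is counted as skipped. The same dispatch logic in plain C, with a hypothetical struct fake_ring standing in for a ring buffer map; an empty or out-of-range slot plays the role of a failed bpf_map_lookup_elem().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring buffer map: counts delivered records. */
struct fake_ring {
	int id;
	int records;
};

struct counters {
	long total;
	long skipped;
};

/* Mirror of the BPF program: look up slot `target` in a sparse array of
 * rings and emit one record there, or count a skip if the slot is empty.
 * Returns 0 when a record was emitted, 1 when it was skipped. */
static int emit(struct fake_ring *slots[], size_t nslots, size_t target,
		struct counters *c)
{
	struct fake_ring *rb;

	assert(c != NULL);
	if (target >= nslots || !(rb = slots[target])) {
		c->skipped++;
		return 1;
	}
	rb->records++;
	c->total++;
	return 0;
}
```

On the user-space side, the per-ring ctx passed to ring_buffer__new() and ring_buffer__add() (1 and 2 in the test) is what lets a single callback tell the two rings apart.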

@ -0,0 +1,28 @@
// SPDX-License-Identifier: GPL-2.0-only
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#define TEST_COMM_LEN 16
struct {
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
__uint(max_entries, 1);
__type(key, u32);
__type(value, u32);
} cgroup_map SEC(".maps");
char _license[] SEC("license") = "GPL";
SEC("classifier/test_skb_helpers")
int test_skb_helpers(struct __sk_buff *skb)
{
struct task_struct *task;
char comm[TEST_COMM_LEN];
__u32 tpid;
task = (struct task_struct *)bpf_get_current_task();
bpf_probe_read_kernel(&tpid, sizeof(tpid), &task->tgid);
bpf_probe_read_kernel_str(&comm, sizeof(comm), &task->comm);
return 0;
}

@@ -0,0 +1,47 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Isovalent, Inc.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_SOCKMAP);
__uint(max_entries, 2);
__type(key, __u32);
__type(value, __u64);
} sock_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_SOCKHASH);
__uint(max_entries, 2);
__type(key, __u32);
__type(value, __u64);
} sock_hash SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_SK_STORAGE);
__uint(map_flags, BPF_F_NO_PREALLOC);
__type(key, __u32);
__type(value, __u64);
} socket_storage SEC(".maps");
SEC("sk_msg")
int prog_msg_verdict(struct sk_msg_md *msg)
{
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
int verdict = SK_PASS;
__u32 pid, tpid;
__u64 *sk_stg;
pid = bpf_get_current_pid_tgid() >> 32;
sk_stg = bpf_sk_storage_get(&socket_storage, msg->sk, 0, BPF_SK_STORAGE_GET_F_CREATE);
if (!sk_stg)
return SK_DROP;
*sk_stg = pid;
bpf_probe_read_kernel(&tpid, sizeof(tpid), &task->tgid);
if (pid != tpid)
verdict = SK_DROP;
bpf_sk_storage_delete(&socket_storage, (void *)msg->sk);
return verdict;
}
char _license[] SEC("license") = "GPL";

@@ -79,11 +79,18 @@ struct {
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 1);
__uint(max_entries, 2);
__type(key, int);
__type(value, int);
} sock_skb_opts SEC(".maps");
struct {
__uint(type, TEST_MAP_TYPE);
__uint(max_entries, 20);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
} tls_sock_map SEC(".maps");
SEC("sk_skb1")
int bpf_prog1(struct __sk_buff *skb)
{
@@ -118,6 +125,43 @@ int bpf_prog2(struct __sk_buff *skb)
}
SEC("sk_skb3")
int bpf_prog3(struct __sk_buff *skb)
{
const int one = 1;
int err, *f, ret = SK_PASS;
void *data_end;
char *c;
err = bpf_skb_pull_data(skb, 19);
if (err)
goto tls_out;
c = (char *)(long)skb->data;
data_end = (void *)(long)skb->data_end;
if (c + 18 < data_end)
memcpy(&c[13], "PASS", 4);
f = bpf_map_lookup_elem(&sock_skb_opts, &one);
if (f && *f) {
__u64 flags = 0;
ret = 0;
flags = *f;
#ifdef SOCKMAP
return bpf_sk_redirect_map(skb, &tls_sock_map, ret, flags);
#else
return bpf_sk_redirect_hash(skb, &tls_sock_map, &ret, flags);
#endif
}
f = bpf_map_lookup_elem(&sock_skb_opts, &one);
if (f && *f)
ret = SK_DROP;
tls_out:
return ret;
}
SEC("sockops")
int bpf_sockmap(struct bpf_sock_ops *skops)
{

@@ -0,0 +1,22 @@
// SPDX-License-Identifier: GPL-2.0
/* fails to load without expected_attach_type = BPF_XDP_DEVMAP
* because of access to egress_ifindex
*/
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
SEC("xdp_dm_log")
int xdpdm_devlog(struct xdp_md *ctx)
{
char fmt[] = "devmap redirect: dev %u -> dev %u len %u\n";
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
unsigned int len = data_end - data;
bpf_trace_printk(fmt, sizeof(fmt),
ctx->ingress_ifindex, ctx->egress_ifindex, len);
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";

@@ -0,0 +1,44 @@
// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(struct bpf_devmap_val));
__uint(max_entries, 4);
} dm_ports SEC(".maps");
SEC("xdp_redir")
int xdp_redir_prog(struct xdp_md *ctx)
{
return bpf_redirect_map(&dm_ports, 1, 0);
}
/* invalid program on DEVMAP entry;
* SEC name means expected attach type not set
*/
SEC("xdp_dummy")
int xdp_dummy_prog(struct xdp_md *ctx)
{
return XDP_PASS;
}
/* valid program on DEVMAP entry via SEC name;
* has access to egress and ingress ifindex
*/
SEC("xdp_devmap")
int xdp_dummy_dm(struct xdp_md *ctx)
{
char fmt[] = "devmap redirect: dev %u -> dev %u len %u\n";
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
unsigned int len = data_end - data;
bpf_trace_printk(fmt, sizeof(fmt),
ctx->ingress_ifindex, ctx->egress_ifindex, len);
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";

@@ -1394,23 +1394,25 @@ static void test_map_rdonly(void)
key = 1;
value = 1234;
/* Insert key=1 element. */
/* Try to insert key=1 element. */
assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == -1 &&
errno == EPERM);
/* Check that key=2 is not found. */
/* Check that key=1 is not found. */
assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT);
assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == ENOENT);
close(fd);
}
static void test_map_wronly(void)
static void test_map_wronly_hash(void)
{
int fd, key = 0, value = 0;
fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
MAP_SIZE, map_flags | BPF_F_WRONLY);
if (fd < 0) {
printf("Failed to create map for read only test '%s'!\n",
printf("Failed to create map for write only test '%s'!\n",
strerror(errno));
exit(1);
}
@@ -1420,9 +1422,49 @@ static void test_map_wronly(void)
/* Insert key=1 element. */
assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0);
/* Check that key=2 is not found. */
/* Check that reading elements and keys from the map is not allowed. */
assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM);
assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == EPERM);
close(fd);
}
static void test_map_wronly_stack_or_queue(enum bpf_map_type map_type)
{
int fd, value = 0;
assert(map_type == BPF_MAP_TYPE_QUEUE ||
map_type == BPF_MAP_TYPE_STACK);
fd = bpf_create_map(map_type, 0, sizeof(value), MAP_SIZE,
map_flags | BPF_F_WRONLY);
/* Stack/Queue maps do not support BPF_F_NO_PREALLOC */
if (map_flags & BPF_F_NO_PREALLOC) {
assert(fd < 0 && errno == EINVAL);
return;
}
if (fd < 0) {
printf("Failed to create map '%s'!\n", strerror(errno));
exit(1);
}
value = 1234;
assert(bpf_map_update_elem(fd, NULL, &value, BPF_ANY) == 0);
/* Peek element should fail */
assert(bpf_map_lookup_elem(fd, NULL, &value) == -1 && errno == EPERM);
/* Pop element should fail */
assert(bpf_map_lookup_and_delete_elem(fd, NULL, &value) == -1 &&
errno == EPERM);
close(fd);
}
static void test_map_wronly(void)
{
test_map_wronly_hash();
test_map_wronly_stack_or_queue(BPF_MAP_TYPE_STACK);
test_map_wronly_stack_or_queue(BPF_MAP_TYPE_QUEUE);
}
static void prepare_reuseport_grp(int type, int map_fd, size_t map_elem_size,

@@ -63,8 +63,8 @@ int s1, s2, c1, c2, p1, p2;
int test_cnt;
int passed;
int failed;
int map_fd[8];
struct bpf_map *maps[8];
int map_fd[9];
struct bpf_map *maps[9];
int prog_fd[11];
int txmsg_pass;
@@ -79,7 +79,10 @@ int txmsg_end_push;
int txmsg_start_pop;
int txmsg_pop;
int txmsg_ingress;
int txmsg_skb;
int txmsg_redir_skb;
int txmsg_ktls_skb;
int txmsg_ktls_skb_drop;
int txmsg_ktls_skb_redir;
int ktls;
int peek_flag;
@@ -104,7 +107,7 @@ static const struct option long_options[] = {
{"txmsg_start_pop", required_argument, NULL, 'w'},
{"txmsg_pop", required_argument, NULL, 'x'},
{"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
{"txmsg_skb", no_argument, &txmsg_skb, 1 },
{"txmsg_redir_skb", no_argument, &txmsg_redir_skb, 1 },
{"ktls", no_argument, &ktls, 1 },
{"peek", no_argument, &peek_flag, 1 },
{"whitelist", required_argument, NULL, 'n' },
@@ -169,7 +172,8 @@ static void test_reset(void)
txmsg_start_push = txmsg_end_push = 0;
txmsg_pass = txmsg_drop = txmsg_redir = 0;
txmsg_apply = txmsg_cork = 0;
txmsg_ingress = txmsg_skb = 0;
txmsg_ingress = txmsg_redir_skb = 0;
txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
}
static int test_start_subtest(const struct _test *t, struct sockmap_options *o)
@@ -502,14 +506,41 @@ unwind_iov:
static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz)
{
int i, j, bytes_cnt = 0;
int i, j = 0, bytes_cnt = 0;
unsigned char k = 0;
for (i = 0; i < msg->msg_iovlen; i++) {
unsigned char *d = msg->msg_iov[i].iov_base;
for (j = 0;
j < msg->msg_iov[i].iov_len && size; j++) {
/* Special case test for skb ingress + ktls */
if (i == 0 && txmsg_ktls_skb) {
if (msg->msg_iov[i].iov_len < 4)
return -EIO;
if (txmsg_ktls_skb_redir) {
if (memcmp(&d[13], "PASS", 4) != 0) {
fprintf(stderr,
"detected redirect ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[13], d[14], d[15], d[16]);
return -EIO;
}
d[13] = 0;
d[14] = 1;
d[15] = 2;
d[16] = 3;
j = 13;
} else if (txmsg_ktls_skb) {
if (memcmp(d, "PASS", 4) != 0) {
fprintf(stderr,
"detected ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[0], d[1], d[2], d[3]);
return -EIO;
}
d[0] = 0;
d[1] = 1;
d[2] = 2;
d[3] = 3;
}
}
for (; j < msg->msg_iov[i].iov_len && size; j++) {
if (d[j] != k++) {
fprintf(stderr,
"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
@@ -724,7 +755,7 @@ static int sendmsg_test(struct sockmap_options *opt)
rxpid = fork();
if (rxpid == 0) {
iov_buf -= (txmsg_pop - txmsg_start_pop + 1);
if (opt->drop_expected)
if (opt->drop_expected || txmsg_ktls_skb_drop)
_exit(0);
if (!iov_buf) /* zero bytes sent case */
@@ -911,8 +942,28 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
return err;
}
/* Attach programs to TLS sockmap */
if (txmsg_ktls_skb) {
err = bpf_prog_attach(prog_fd[0], map_fd[8],
BPF_SK_SKB_STREAM_PARSER, 0);
if (err) {
fprintf(stderr,
"ERROR: bpf_prog_attach (TLS sockmap %i->%i): %d (%s)\n",
prog_fd[0], map_fd[8], err, strerror(errno));
return err;
}
err = bpf_prog_attach(prog_fd[2], map_fd[8],
BPF_SK_SKB_STREAM_VERDICT, 0);
if (err) {
fprintf(stderr, "ERROR: bpf_prog_attach (TLS sockmap): %d (%s)\n",
err, strerror(errno));
return err;
}
}
/* Attach to cgroups */
err = bpf_prog_attach(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
err = bpf_prog_attach(prog_fd[3], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
if (err) {
fprintf(stderr, "ERROR: bpf_prog_attach (groups): %d (%s)\n",
err, strerror(errno));
@@ -928,15 +979,15 @@ run:
/* Attach txmsg program to sockmap */
if (txmsg_pass)
tx_prog_fd = prog_fd[3];
else if (txmsg_redir)
tx_prog_fd = prog_fd[4];
else if (txmsg_apply)
else if (txmsg_redir)
tx_prog_fd = prog_fd[5];
else if (txmsg_cork)
else if (txmsg_apply)
tx_prog_fd = prog_fd[6];
else if (txmsg_drop)
else if (txmsg_cork)
tx_prog_fd = prog_fd[7];
else if (txmsg_drop)
tx_prog_fd = prog_fd[8];
else
tx_prog_fd = 0;
@@ -1108,7 +1159,35 @@ run:
}
}
if (txmsg_skb) {
if (txmsg_ktls_skb) {
int ingress = BPF_F_INGRESS;
i = 0;
err = bpf_map_update_elem(map_fd[8], &i, &p2, BPF_ANY);
if (err) {
fprintf(stderr,
"ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
err, strerror(errno));
}
if (txmsg_ktls_skb_redir) {
i = 1;
err = bpf_map_update_elem(map_fd[7],
&i, &ingress, BPF_ANY);
if (err) {
fprintf(stderr,
"ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
err, strerror(errno));
}
}
if (txmsg_ktls_skb_drop) {
i = 1;
err = bpf_map_update_elem(map_fd[7], &i, &i, BPF_ANY);
}
}
if (txmsg_redir_skb) {
int skb_fd = (test == SENDMSG || test == SENDPAGE) ?
p2 : p1;
int ingress = BPF_F_INGRESS;
@@ -1123,8 +1202,7 @@ run:
}
i = 3;
err = bpf_map_update_elem(map_fd[0],
&i, &skb_fd, BPF_ANY);
err = bpf_map_update_elem(map_fd[0], &i, &skb_fd, BPF_ANY);
if (err) {
fprintf(stderr,
"ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
@@ -1158,9 +1236,12 @@ run:
fprintf(stderr, "unknown test\n");
out:
/* Detach and zero all the maps */
bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);
bpf_prog_detach2(prog_fd[3], cg_fd, BPF_CGROUP_SOCK_OPS);
bpf_prog_detach2(prog_fd[0], map_fd[0], BPF_SK_SKB_STREAM_PARSER);
bpf_prog_detach2(prog_fd[1], map_fd[0], BPF_SK_SKB_STREAM_VERDICT);
bpf_prog_detach2(prog_fd[0], map_fd[8], BPF_SK_SKB_STREAM_PARSER);
bpf_prog_detach2(prog_fd[2], map_fd[8], BPF_SK_SKB_STREAM_VERDICT);
if (tx_prog_fd >= 0)
bpf_prog_detach2(tx_prog_fd, map_fd[1], BPF_SK_MSG_VERDICT);
@@ -1229,8 +1310,10 @@ static void test_options(char *options)
}
if (txmsg_ingress)
strncat(options, "ingress,", OPTSTRING);
if (txmsg_skb)
strncat(options, "skb,", OPTSTRING);
if (txmsg_redir_skb)
strncat(options, "redir_skb,", OPTSTRING);
if (txmsg_ktls_skb)
strncat(options, "ktls_skb,", OPTSTRING);
if (ktls)
strncat(options, "ktls,", OPTSTRING);
if (peek_flag)
@@ -1362,6 +1445,40 @@ static void test_txmsg_ingress_redir(int cgrp, struct sockmap_options *opt)
test_send(opt, cgrp);
}
static void test_txmsg_skb(int cgrp, struct sockmap_options *opt)
{
bool data = opt->data_test;
int k = ktls;
opt->data_test = true;
ktls = 1;
txmsg_pass = txmsg_drop = 0;
txmsg_ingress = txmsg_redir = 0;
txmsg_ktls_skb = 1;
txmsg_pass = 1;
/* Using data verification so ensure iov layout is
* expected from test receiver side. e.g. has enough
* bytes to write test code.
*/
opt->iov_length = 100;
opt->iov_count = 1;
opt->rate = 1;
test_exec(cgrp, opt);
txmsg_ktls_skb_drop = 1;
test_exec(cgrp, opt);
txmsg_ktls_skb_drop = 0;
txmsg_ktls_skb_redir = 1;
test_exec(cgrp, opt);
opt->data_test = data;
ktls = k;
}
/* Test cork with hung data. This tests poor usage patterns where
* cork can leave data on the ring if user program is buggy and
* doesn't flush them somehow. They do take some time however
@@ -1542,11 +1659,13 @@ char *map_names[] = {
"sock_bytes",
"sock_redir_flags",
"sock_skb_opts",
"tls_sock_map",
};
int prog_attach_type[] = {
BPF_SK_SKB_STREAM_PARSER,
BPF_SK_SKB_STREAM_VERDICT,
BPF_SK_SKB_STREAM_VERDICT,
BPF_CGROUP_SOCK_OPS,
BPF_SK_MSG_VERDICT,
BPF_SK_MSG_VERDICT,
@@ -1558,6 +1677,7 @@ int prog_attach_type[] = {
};
int prog_type[] = {
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_SOCK_OPS,
@@ -1620,6 +1740,7 @@ struct _test test[] = {
{"txmsg test redirect", test_txmsg_redir},
{"txmsg test drop", test_txmsg_drop},
{"txmsg test ingress redirect", test_txmsg_ingress_redir},
{"txmsg test skb", test_txmsg_skb},
{"txmsg test apply", test_txmsg_apply},
{"txmsg test cork", test_txmsg_cork},
{"txmsg test hanging corks", test_txmsg_cork_hangs},

@@ -15,7 +15,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R0 max value is outside of the array range",
.errstr = "R0 max value is outside of the allowed memory range",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@@ -44,7 +44,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R0 max value is outside of the array range",
.errstr = "R0 max value is outside of the allowed memory range",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},

@@ -117,7 +117,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R0 min value is outside of the array range",
.errstr = "R0 min value is outside of the allowed memory range",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@@ -137,7 +137,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
.errstr = "R0 unbounded memory access, make sure to bounds check any such access",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},

@@ -20,7 +20,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_8b = { 3 },
.errstr = "R0 max value is outside of the array range",
.errstr = "R0 max value is outside of the allowed memory range",
.result = REJECT,
},
{
@@ -146,7 +146,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_8b = { 3 },
.errstr = "R0 min value is outside of the array range",
.errstr = "R0 min value is outside of the allowed memory range",
.result = REJECT
},
{
@@ -354,7 +354,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_8b = { 3 },
.errstr = "R0 max value is outside of the array range",
.errstr = "R0 max value is outside of the allowed memory range",
.result = REJECT
},
{

@@ -105,7 +105,7 @@
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.fixup_map_hash_8b = { 16 },
.result = REJECT,
.errstr = "R0 min value is outside of the array range",
.errstr = "R0 min value is outside of the allowed memory range",
},
{
"calls: overlapping caller/callee",

@@ -68,7 +68,7 @@
},
.fixup_map_array_48b = { 1 },
.result = REJECT,
.errstr = "R1 min value is outside of the array range",
.errstr = "R1 min value is outside of the allowed memory range",
},
{
"direct map access, write test 7",
@@ -220,7 +220,7 @@
},
.fixup_map_array_small = { 1 },
.result = REJECT,
.errstr = "R1 min value is outside of the array range",
.errstr = "R1 min value is outside of the allowed memory range",
},
{
"direct map access, write test 19",

@@ -318,7 +318,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 4 },
.errstr = "R1 min value is outside of the array range",
.errstr = "R1 min value is outside of the allowed memory range",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},

@@ -280,7 +280,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R1 min value is outside of the array range",
.errstr = "R1 min value is outside of the allowed memory range",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@@ -415,7 +415,7 @@
BPF_EXIT_INSN(),
},
.fixup_map_hash_48b = { 3 },
.errstr = "R1 min value is outside of the array range",
.errstr = "R1 min value is outside of the allowed memory range",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@@ -926,7 +926,7 @@
},
.fixup_map_hash_16b = { 3, 10 },
.result = REJECT,
.errstr = "R2 unbounded memory access, make sure to bounds check any array access into a map",
.errstr = "R2 unbounded memory access, make sure to bounds check any such access",
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
{

@@ -50,7 +50,7 @@
.fixup_map_array_48b = { 8 },
.result = ACCEPT,
.result_unpriv = REJECT,
.errstr_unpriv = "R0 min value is outside of the array range",
.errstr_unpriv = "R0 min value is outside of the allowed memory range",
.retval = 1,
},
{
@@ -325,7 +325,7 @@
},
.fixup_map_array_48b = { 3 },
.result = REJECT,
.errstr = "R0 min value is outside of the array range",
.errstr = "R0 min value is outside of the allowed memory range",
.result_unpriv = REJECT,
.errstr_unpriv = "R0 pointer arithmetic of map value goes out of range",
},
@@ -601,7 +601,7 @@
},
.fixup_map_array_48b = { 3 },
.result = REJECT,
.errstr = "R1 max value is outside of the array range",
.errstr = "R1 max value is outside of the allowed memory range",
.errstr_unpriv = "R1 pointer arithmetic of map value goes out of range",
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@@ -726,7 +726,7 @@
},
.fixup_map_array_48b = { 3 },
.result = REJECT,
.errstr = "R0 min value is outside of the array range",
.errstr = "R0 min value is outside of the allowed memory range",
},
{
"map access: value_ptr -= known scalar, 2",