Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-06-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 55 non-merge commits during the last 1 day(s) which contain
a total of 91 files changed, 4986 insertions(+), 463 deletions(-).

The main changes are:

1) Add rx_queue_mapping to bpf_sock from Amritha.
2) Add BPF ring buffer, from Andrii.
3) Attach and run programs through devmap, from David.
4) Allow SO_BINDTODEVICE opt in bpf_setsockopt, from Ferenc.
5) Link-based flow_dissector, from Jakub.
6) Use tracing helpers for lsm programs, from Jiri.
7) Several sk_msg fixes and extensions, from John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9a25c1df24
Documentation/bpf/ringbuf.rst (new file, 209 lines)
@ -0,0 +1,209 @@
===============
BPF ring buffer
===============

This document describes BPF ring buffer design, API, and implementation details.

.. contents::
    :local:
    :depth: 2

Motivation
----------

There are two distinct motivators for this work, neither of which is satisfied
by the existing perf buffer, and which together prompted the creation of a new
ring buffer implementation:
- more efficient memory utilization by sharing the ring buffer across CPUs;
- preserving the ordering of events that happen sequentially in time, even
  across multiple CPUs (e.g., fork/exec/exit events for a task).

These two problems are independent, but the perf buffer satisfies neither. Both
are a consequence of the choice to have a per-CPU perf ring buffer, and both
can be solved by an MPSC (multi-producer, single-consumer) ring buffer
implementation. The ordering problem could technically be solved for the perf
buffer with some in-kernel counting, but given that the first problem already
requires an MPSC buffer, the same solution solves the second problem
automatically.
Semantics and APIs
------------------

A single ring buffer is presented to BPF programs as an instance of a BPF map
of type ``BPF_MAP_TYPE_RINGBUF``. Two other alternatives were considered, but
ultimately rejected.

One way would have been, similarly to ``BPF_MAP_TYPE_PERF_EVENT_ARRAY``, to let
``BPF_MAP_TYPE_RINGBUF`` represent an array of ring buffers, but without
enforcing a "same CPU only" rule. This would have been a more familiar
interface, compatible with existing perf buffer usage in BPF, but it would fail
when an application needs more advanced logic to look up a ring buffer by an
arbitrary key. With the current approach, ``BPF_MAP_TYPE_HASH_OF_MAPS``
addresses such use cases. Additionally, given the performance of BPF ringbuf,
many use cases would just opt for a simple single ring buffer shared among all
CPUs, for which the array-based approach would be overkill.
Another approach could have introduced a new concept, alongside BPF maps, to
represent a generic "container" object, which wouldn't necessarily have the
key/value interface with lookup/update/delete operations. This approach would
add a lot of extra infrastructure that would have to be built for observability
and verifier support. It would also add another concept for BPF developers to
familiarize themselves with, new syntax in libbpf, etc., while providing no
real additional benefit over the approach of using a map.
``BPF_MAP_TYPE_RINGBUF`` doesn't support lookup/update/delete operations, but
neither do a few other map types (e.g., queue and stack; array doesn't support
delete, etc.).

The approach chosen has the advantage of re-using existing BPF map
infrastructure (introspection APIs in the kernel, libbpf support, etc.), being
a familiar concept (no need to teach users a new type of object in a BPF
program), and utilizing existing tooling (bpftool). For the common scenario of
using a single ring buffer for all CPUs, it is as simple and straightforward as
a dedicated "container" object would be. On the other hand, by being a map, it
can be combined with ``ARRAY_OF_MAPS`` and ``HASH_OF_MAPS`` map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases), to complicated
application-level hashing/sharding of ring buffers (e.g., a small pool of ring
buffers with a hashed task's tgid as the lookup key, to preserve ordering while
reducing contention).

Key and value sizes are enforced to be zero. ``max_entries`` is used to specify
the size of the ring buffer and has to be a power of 2.
There are a bunch of similarities between the perf buffer
(``BPF_MAP_TYPE_PERF_EVENT_ARRAY``) semantics and the new BPF ring buffer
semantics:

- variable-length records;
- if there is no more space left in the ring buffer, reservation fails and
  never blocks;
- a memory-mappable data area, so user-space applications get easy and
  high-performance consumption;
- epoll notifications for new incoming data;
- but still the ability to busy-poll for new data to achieve the lowest
  possible latency.
BPF ringbuf provides two sets of APIs to BPF programs:

- ``bpf_ringbuf_output()`` allows *copying* data from one place into a ring
  buffer, similarly to ``bpf_perf_event_output()``;
- the ``bpf_ringbuf_reserve()``/``bpf_ringbuf_commit()``/``bpf_ringbuf_discard()``
  APIs split the process into two steps. First, a fixed amount of space is
  reserved. If successful, a pointer to data inside the ring buffer data area
  is returned, which BPF programs can use much like data inside array/hash
  maps. Once ready, this piece of memory is either committed or discarded.
  Discard is similar to commit, but makes the consumer ignore the record.

``bpf_ringbuf_output()`` has the disadvantage of incurring an extra memory
copy, because the record has to be prepared somewhere else first. On the other
hand, it allows submitting records whose length is not known to the verifier
beforehand. It also closely matches ``bpf_perf_event_output()``, which
significantly simplifies migration.

``bpf_ringbuf_reserve()`` avoids the extra copy by handing out a pointer
directly into ring buffer memory. In a lot of cases records are larger than
BPF stack space allows, so many programs have to use an extra per-CPU array as
a temporary heap for preparing a sample; ``bpf_ringbuf_reserve()`` avoids this
need completely. In exchange, it only allows reserving a known, constant amount
of memory, so that the verifier can prove the BPF program can't access memory
outside its reserved record space. ``bpf_ringbuf_output()``, while slightly
slower due to the extra memory copy, covers some use cases that are not
suitable for ``bpf_ringbuf_reserve()``.

The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary
``malloc()``/``free()`` within a single BPF program invocation.

Each reserved record is tracked by the verifier through its existing
reference-tracking logic, similar to socket reference tracking. It is thus
impossible to reserve a record but forget to submit or discard it.
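
As a concrete illustration, here is a minimal sketch of the reservation-based
producer path (the program name, map name, event layout, and tracepoint are
made up for this example; it assumes libbpf-style BTF-defined maps):

.. code-block:: c

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct event {
            int pid;
            char comm[16];
    };

    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);
            __uint(max_entries, 256 * 1024);   /* size in bytes, power of 2 */
    } rb SEC(".maps");

    SEC("tp/sched/sched_process_exec")
    int handle_exec(void *ctx)
    {
            struct event *e;

            /* reserve fixed-size space directly in the ring buffer data area */
            e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
            if (!e)
                    return 0;   /* buffer is full; reservation never blocks */

            e->pid = bpf_get_current_pid_tgid() >> 32;
            bpf_get_current_comm(e->comm, sizeof(e->comm));

            /* every reservation must be either submitted or discarded */
            bpf_ringbuf_submit(e, 0);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

For a record whose size is only known at run time, the copy-based
``bpf_ringbuf_output(&rb, data, size, 0)`` call would be used instead.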
The ``bpf_ringbuf_query()`` helper allows querying various properties of the
ring buffer. Currently four are supported:

- ``BPF_RB_AVAIL_DATA`` returns the amount of unconsumed data in the ring
  buffer;
- ``BPF_RB_RING_SIZE`` returns the size of the ring buffer;
- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` return the current logical position
  of the consumer/producer, respectively.

The returned values are momentary snapshots of the ring buffer state and could
be off by the time the helper returns, so this should be used only for
debugging/reporting purposes or for implementing heuristics that take into
account the highly changeable nature of some of those characteristics.

One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in the ring buffer. Together with
the ``BPF_RB_NO_WAKEUP``/``BPF_RB_FORCE_WAKEUP`` flags for the
output/commit/discard helpers, it gives a BPF program a high degree of control
and, e.g., more efficient batched notifications. The default self-balancing
strategy, though, should be adequate for most applications and already works
reliably and efficiently.
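
A sketch of such a batching heuristic is shown below. It is a fragment from a
program like the producer example above; the ``rb`` map and the 64 KiB
threshold are illustrative, and a real program would also need some fallback
(e.g. a consumer polling with a timeout) so the last few records are eventually
noticed:

.. code-block:: c

    struct event *e;

    e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
            return 0;
    /* ... fill in *e ... */

    /* skip the default notification unless enough data has piled up */
    if (bpf_ringbuf_query(&rb, BPF_RB_AVAIL_DATA) > 64 * 1024)
            bpf_ringbuf_submit(e, BPF_RB_FORCE_WAKEUP);
    else
            bpf_ringbuf_submit(e, BPF_RB_NO_WAKEUP);
    return 0;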
Design and Implementation
-------------------------

This reserve/commit schema gives multiple producers, whether on different CPUs
or even on the same CPU in the same BPF program, a natural way to reserve
independent records and work with them without blocking other producers. This
means that if a BPF program is interrupted by another BPF program sharing the
same ring buffer, both will get a record reserved (provided there is enough
space left) and can work with it and submit it independently. This applies to
NMI context as well, except that, due to a spinlock taken during reservation,
``bpf_ringbuf_reserve()`` in NMI context might fail to get the lock, in which
case the reservation fails even if the ring buffer is not full.

Internally, the ring buffer is implemented as a power-of-2 sized circular
buffer, with two logical, ever-increasing counters (which might wrap around on
32-bit architectures; that is not a problem):

- the consumer counter shows up to which logical position the consumer has
  consumed the data;
- the producer counter denotes the amount of data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record
successfully advances the producer counter. At that point the data is still not
ready to be consumed, though. Each record has an 8-byte header, which contains
the length of the reserved record as well as two extra bits: a busy bit,
denoting that the record is still being worked on, and a discard bit, which may
be set at commit time if the record is discarded. In the latter case, the
consumer is supposed to skip the record and move on to the next one. The record
header also encodes the record's relative offset from the beginning of the ring
buffer data area (in pages). This allows
``bpf_ringbuf_commit()``/``bpf_ringbuf_discard()`` to accept only the pointer
to the record itself, without also requiring a pointer to the ring buffer: the
ring buffer memory location is restored from the record metadata header. This
significantly simplifies the verifier and improves API usability.
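
For illustration, a simplified user-space consumer of this protocol (roughly
what libbpf does internally; the function and variable names are made up,
while the ``BPF_RINGBUF_*`` constants are the ones added to the UAPI by this
series) could look like this:

.. code-block:: c

    #include <stdint.h>
    #include <linux/bpf.h>   /* BPF_RINGBUF_BUSY_BIT, etc. */

    /* cons_pos, prod_pos and data point into the mmap()'ed consumer page,
     * producer page, and data area of the map; mask is (ring size - 1);
     * handle() is whatever processes one sample.
     */
    static void drain(uint64_t *cons_pos, uint64_t *prod_pos, uint8_t *data,
                      uint64_t mask, void (*handle)(void *sample, uint32_t len))
    {
            uint64_t cons = *cons_pos;

            while (cons < __atomic_load_n(prod_pos, __ATOMIC_ACQUIRE)) {
                    uint32_t *hdr = (uint32_t *)(data + (cons & mask));
                    uint32_t len = __atomic_load_n(hdr, __ATOMIC_ACQUIRE);

                    if (len & BPF_RINGBUF_BUSY_BIT)
                            break;   /* reserved but not yet committed */

                    if (!(len & BPF_RINGBUF_DISCARD_BIT))
                            handle((uint8_t *)hdr + BPF_RINGBUF_HDR_SZ, len);

                    /* records are padded to 8-byte alignment, header included */
                    len &= ~BPF_RINGBUF_DISCARD_BIT;
                    cons += (len + BPF_RINGBUF_HDR_SZ + 7) & ~7UL;
                    __atomic_store_n(cons_pos, cons, __ATOMIC_RELEASE);
            }
    }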
Producer counter increments are serialized under a spinlock, so there is
strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to the
consumer in the order of their reservations, but only after all previous
records have been committed. It is thus possible for slow producers to
temporarily hold back submitted records that were reserved later.

The reservation/commit/consumer protocol is verified by litmus tests in
Documentation/litmus_tests/bpf-rb/_.

One interesting implementation detail, which significantly simplifies (and thus
speeds up) both producers and consumers, is that the data area is mapped twice,
back-to-back, in virtual memory. This means no special measures are needed for
samples that wrap around at the end of the circular buffer data area, because
the page after the last data page is the first data page again, so the sample
still appears completely contiguous in virtual memory. See the comment and a
simple ASCII diagram showing this visually in ``bpf_ringbuf_area_alloc()``.
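
The same trick can be demonstrated in isolation from user space (a toy,
stand-alone sketch using ``memfd_create()``; the kernel implementation uses
``vmap()`` on the buffer's pages instead, and error handling is omitted here):

.. code-block:: c

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            size_t size = 4 * sysconf(_SC_PAGESIZE);   /* data area size */
            int fd = memfd_create("ringbuf-demo", 0);
            char *base;

            ftruncate(fd, size);
            /* reserve 2 * size of address space, then map the fd twice */
            base = mmap(NULL, 2 * size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            mmap(base, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, 0);
            mmap(base + size, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, 0);

            /* a "wrapping" write is a single contiguous memcpy... */
            memcpy(base + size - 8, "wrap-around!", 13);
            /* ...and its tail shows up at the start of the buffer */
            printf("%s\n", base);   /* prints "und!" */
            return 0;
    }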
Another feature that distinguishes BPF ringbuf from the perf ring buffer is
self-pacing notification of new data availability. The
``bpf_ringbuf_commit()`` implementation sends a notification that a new record
is available only if the consumer has already caught up to the record being
committed. If not, the consumer still has to catch up and will therefore see
the new data anyway, without needing an extra poll notification. Benchmarks
(see tools/testing/selftests/bpf/benchs/bench_ringbuf.c_) show that this allows
achieving very high throughput without having to resort to tricks like "notify
only every Nth sample", which are necessary with the perf buffer. For extreme
cases, when a BPF program wants more manual control over notifications, the
commit/discard/output helpers accept ``BPF_RB_NO_WAKEUP`` and
``BPF_RB_FORCE_WAKEUP`` flags, which give full control over data availability
notifications, but require extra caution and diligence when using this API.
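
On the consumer side, the libbpf ring buffer API that accompanies this feature
wraps the epoll-based notification scheme described above. A minimal user-space
sketch (the ``struct event`` layout and the way the map fd is obtained are
assumptions carried over from the hypothetical producer example above):

.. code-block:: c

    #include <stdio.h>
    #include <bpf/libbpf.h>

    struct event {
            int pid;
            char comm[16];
    };

    static int handle_event(void *ctx, void *data, size_t size)
    {
            const struct event *e = data;

            printf("exec: pid=%d comm=%s\n", e->pid, e->comm);
            return 0;
    }

    int consume(int ringbuf_map_fd)
    {
            struct ring_buffer *rb;
            int err;

            rb = ring_buffer__new(ringbuf_map_fd, handle_event, NULL, NULL);
            if (!rb)
                    return -1;

            /* blocks in epoll_wait() until the kernel signals new data */
            while ((err = ring_buffer__poll(rb, 100 /* ms */)) >= 0)
                    ;

            ring_buffer__free(rb);
            return err;
    }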
@ -18456,7 +18456,7 @@ L: netdev@vger.kernel.org
|
||||
L: bpf@vger.kernel.org
|
||||
S: Maintained
|
||||
F: include/net/xdp_sock*
|
||||
F: include/net/xsk_buffer_pool.h
|
||||
F: include/net/xsk_buff_pool.h
|
||||
F: include/uapi/linux/if_xdp.h
|
||||
F: net/xdp/
|
||||
F: samples/bpf/xdpsock*
|
||||
|
@ -263,7 +263,7 @@ static int ena_xdp_tx_map_buff(struct ena_ring *xdp_ring,
|
||||
dma_addr_t dma = 0;
|
||||
u32 size;
|
||||
|
||||
tx_info->xdpf = convert_to_xdp_frame(xdp);
|
||||
tx_info->xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
size = tx_info->xdpf->len;
|
||||
ena_buf = tx_info->bufs;
|
||||
|
||||
|
@ -2167,7 +2167,7 @@ static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
|
||||
|
||||
int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
|
||||
{
|
||||
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
|
||||
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
|
||||
if (unlikely(!xdpf))
|
||||
return I40E_XDP_CONSUMED;
|
||||
|
@ -254,7 +254,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring)
|
||||
*/
|
||||
int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring *xdp_ring)
|
||||
{
|
||||
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
|
||||
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
|
||||
if (unlikely(!xdpf))
|
||||
return ICE_XDP_CONSUMED;
|
||||
|
@ -2215,7 +2215,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
|
||||
case XDP_PASS:
|
||||
break;
|
||||
case XDP_TX:
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf)) {
|
||||
result = IXGBE_XDP_CONSUMED;
|
||||
break;
|
||||
|
@ -107,7 +107,7 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
|
||||
case XDP_PASS:
|
||||
break;
|
||||
case XDP_TX:
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf)) {
|
||||
result = IXGBE_XDP_CONSUMED;
|
||||
break;
|
||||
|
@ -2073,7 +2073,7 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
|
||||
int cpu;
|
||||
u32 ret;
|
||||
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf))
|
||||
return MVNETA_XDP_DROPPED;
|
||||
|
||||
|
@ -64,7 +64,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
|
||||
struct xdp_frame *xdpf;
|
||||
dma_addr_t dma_addr;
|
||||
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf))
|
||||
return false;
|
||||
|
||||
@ -97,10 +97,10 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
|
||||
xdpi.frame.xdpf = xdpf;
|
||||
xdpi.frame.dma_addr = dma_addr;
|
||||
} else {
|
||||
/* Driver assumes that convert_to_xdp_frame returns an xdp_frame
|
||||
* that points to the same memory region as the original
|
||||
* xdp_buff. It allows to map the memory only once and to use
|
||||
* the DMA_BIDIRECTIONAL mode.
|
||||
/* Driver assumes that xdp_convert_buff_to_frame returns
|
||||
* an xdp_frame that points to the same memory region as
|
||||
* the original xdp_buff. It allows to map the memory only
|
||||
* once and to use the DMA_BIDIRECTIONAL mode.
|
||||
*/
|
||||
|
||||
xdpi.mode = MLX5E_XDP_XMIT_MODE_PAGE;
|
||||
|
@ -329,7 +329,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
|
||||
|
||||
case XDP_TX:
|
||||
/* Buffer ownership passes to tx on success. */
|
||||
xdpf = convert_to_xdp_frame(&xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(&xdp);
|
||||
err = efx_xdp_tx_buffers(efx, 1, &xdpf, true);
|
||||
if (unlikely(err != 1)) {
|
||||
efx_free_rx_buffers(rx_queue, rx_buf, 1);
|
||||
|
@ -867,7 +867,7 @@ static u32 netsec_xdp_queue_one(struct netsec_priv *priv,
|
||||
static u32 netsec_xdp_xmit_back(struct netsec_priv *priv, struct xdp_buff *xdp)
|
||||
{
|
||||
struct netsec_desc_ring *tx_ring = &priv->desc_ring[NETSEC_RING_TX];
|
||||
struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
|
||||
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
u32 ret;
|
||||
|
||||
if (unlikely(!xdpf))
|
||||
|
@ -1355,7 +1355,7 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp,
|
||||
ret = CPSW_XDP_PASS;
|
||||
break;
|
||||
case XDP_TX:
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf))
|
||||
goto drop;
|
||||
|
||||
|
@ -1295,7 +1295,7 @@ resample:
|
||||
|
||||
static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
|
||||
{
|
||||
struct xdp_frame *frame = convert_to_xdp_frame(xdp);
|
||||
struct xdp_frame *frame = xdp_convert_buff_to_frame(xdp);
|
||||
|
||||
if (unlikely(!frame))
|
||||
return -EOVERFLOW;
|
||||
|
@ -541,7 +541,7 @@ out:
|
||||
static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
|
||||
struct veth_xdp_tx_bq *bq)
|
||||
{
|
||||
struct xdp_frame *frame = convert_to_xdp_frame(xdp);
|
||||
struct xdp_frame *frame = xdp_convert_buff_to_frame(xdp);
|
||||
|
||||
if (unlikely(!frame))
|
||||
return -EOVERFLOW;
|
||||
@ -575,11 +575,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
|
||||
struct xdp_buff xdp;
|
||||
u32 act;
|
||||
|
||||
xdp.data_hard_start = hard_start;
|
||||
xdp.data = frame->data;
|
||||
xdp.data_end = frame->data + frame->len;
|
||||
xdp.data_meta = frame->data - frame->metasize;
|
||||
xdp.frame_sz = frame->frame_sz;
|
||||
xdp_convert_frame_to_buff(frame, &xdp);
|
||||
xdp.rxq = &rq->xdp_rxq;
|
||||
|
||||
act = bpf_prog_run_xdp(xdp_prog, &xdp);
|
||||
|
@ -703,7 +703,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
|
||||
break;
|
||||
case XDP_TX:
|
||||
stats->xdp_tx++;
|
||||
xdpf = convert_to_xdp_frame(&xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(&xdp);
|
||||
if (unlikely(!xdpf))
|
||||
goto err_xdp;
|
||||
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
|
||||
@ -892,7 +892,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
|
||||
break;
|
||||
case XDP_TX:
|
||||
stats->xdp_tx++;
|
||||
xdpf = convert_to_xdp_frame(&xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(&xdp);
|
||||
if (unlikely(!xdpf))
|
||||
goto err_xdp;
|
||||
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
|
||||
|
include/linux/bpf-netns.h (new file, 64 lines)
@ -0,0 +1,64 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef _BPF_NETNS_H
|
||||
#define _BPF_NETNS_H
|
||||
|
||||
#include <linux/mutex.h>
|
||||
#include <uapi/linux/bpf.h>
|
||||
|
||||
enum netns_bpf_attach_type {
|
||||
NETNS_BPF_INVALID = -1,
|
||||
NETNS_BPF_FLOW_DISSECTOR = 0,
|
||||
MAX_NETNS_BPF_ATTACH_TYPE
|
||||
};
|
||||
|
||||
static inline enum netns_bpf_attach_type
|
||||
to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
|
||||
{
|
||||
switch (attach_type) {
|
||||
case BPF_FLOW_DISSECTOR:
|
||||
return NETNS_BPF_FLOW_DISSECTOR;
|
||||
default:
|
||||
return NETNS_BPF_INVALID;
|
||||
}
|
||||
}
|
||||
|
||||
/* Protects updates to netns_bpf */
|
||||
extern struct mutex netns_bpf_mutex;
|
||||
|
||||
union bpf_attr;
|
||||
struct bpf_prog;
|
||||
|
||||
#ifdef CONFIG_NET
|
||||
int netns_bpf_prog_query(const union bpf_attr *attr,
|
||||
union bpf_attr __user *uattr);
|
||||
int netns_bpf_prog_attach(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog);
|
||||
int netns_bpf_prog_detach(const union bpf_attr *attr);
|
||||
int netns_bpf_link_create(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog);
|
||||
#else
|
||||
static inline int netns_bpf_prog_query(const union bpf_attr *attr,
|
||||
union bpf_attr __user *uattr)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int netns_bpf_prog_attach(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int netns_bpf_link_create(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _BPF_NETNS_H */
|
@ -90,6 +90,8 @@ struct bpf_map_ops {
|
||||
int (*map_direct_value_meta)(const struct bpf_map *map,
|
||||
u64 imm, u32 *off);
|
||||
int (*map_mmap)(struct bpf_map *map, struct vm_area_struct *vma);
|
||||
__poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
|
||||
struct poll_table_struct *pts);
|
||||
};
|
||||
|
||||
struct bpf_map_memory {
|
||||
@ -244,6 +246,9 @@ enum bpf_arg_type {
|
||||
ARG_PTR_TO_LONG, /* pointer to long */
|
||||
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */
|
||||
ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */
|
||||
ARG_PTR_TO_ALLOC_MEM, /* pointer to dynamically allocated memory */
|
||||
ARG_PTR_TO_ALLOC_MEM_OR_NULL, /* pointer to dynamically allocated memory or NULL */
|
||||
ARG_CONST_ALLOC_SIZE_OR_ZERO, /* number of allocated bytes requested */
|
||||
};
|
||||
|
||||
/* type of values returned from helper functions */
|
||||
@ -255,6 +260,7 @@ enum bpf_return_type {
|
||||
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
|
||||
RET_PTR_TO_TCP_SOCK_OR_NULL, /* returns a pointer to a tcp_sock or NULL */
|
||||
RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */
|
||||
RET_PTR_TO_ALLOC_MEM_OR_NULL, /* returns a pointer to dynamically allocated memory or NULL */
|
||||
};
|
||||
|
||||
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
|
||||
@ -322,6 +328,8 @@ enum bpf_reg_type {
|
||||
PTR_TO_XDP_SOCK, /* reg points to struct xdp_sock */
|
||||
PTR_TO_BTF_ID, /* reg points to kernel struct */
|
||||
PTR_TO_BTF_ID_OR_NULL, /* reg points to kernel struct or NULL */
|
||||
PTR_TO_MEM, /* reg points to valid memory region */
|
||||
PTR_TO_MEM_OR_NULL, /* reg points to valid memory region or NULL */
|
||||
};
|
||||
|
||||
/* The information passed from prog-specific *_is_valid_access
|
||||
@ -1242,6 +1250,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
|
||||
struct net_device *dev_rx);
|
||||
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
|
||||
struct bpf_prog *xdp_prog);
|
||||
bool dev_map_can_have_prog(struct bpf_map *map);
|
||||
|
||||
struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
|
||||
void __cpu_map_flush(void);
|
||||
@ -1355,6 +1364,10 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
static inline bool dev_map_can_have_prog(struct bpf_map *map)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
static inline void __dev_flush(void)
|
||||
{
|
||||
@ -1611,10 +1624,18 @@ extern const struct bpf_func_proto bpf_tcp_sock_proto;
|
||||
extern const struct bpf_func_proto bpf_jiffies64_proto;
|
||||
extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto;
|
||||
extern const struct bpf_func_proto bpf_event_output_data_proto;
|
||||
extern const struct bpf_func_proto bpf_ringbuf_output_proto;
|
||||
extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
|
||||
extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
|
||||
extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
|
||||
extern const struct bpf_func_proto bpf_ringbuf_query_proto;
|
||||
|
||||
const struct bpf_func_proto *bpf_tracing_func_proto(
|
||||
enum bpf_func_id func_id, const struct bpf_prog *prog);
|
||||
|
||||
const struct bpf_func_proto *tracing_prog_func_proto(
|
||||
enum bpf_func_id func_id, const struct bpf_prog *prog);
|
||||
|
||||
/* Shared helpers among cBPF and eBPF. */
|
||||
void bpf_user_rnd_init_once(void);
|
||||
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
|
||||
|
@ -118,6 +118,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
|
||||
#if defined(CONFIG_BPF_JIT)
|
||||
BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
|
||||
#endif
|
||||
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
|
||||
|
||||
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
|
||||
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
|
||||
@ -125,3 +126,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
|
||||
BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
|
||||
#endif
|
||||
BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
|
||||
#ifdef CONFIG_NET
|
||||
BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
|
||||
#endif
|
||||
|
@ -54,6 +54,8 @@ struct bpf_reg_state {
|
||||
|
||||
u32 btf_id; /* for PTR_TO_BTF_ID */
|
||||
|
||||
u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
|
||||
|
||||
/* Max size from any of the above. */
|
||||
unsigned long raw;
|
||||
};
|
||||
@ -63,6 +65,8 @@ struct bpf_reg_state {
|
||||
* offset, so they can share range knowledge.
|
||||
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
|
||||
* came from, when one is tested for != NULL.
|
||||
* For PTR_TO_MEM_OR_NULL this is used to identify memory allocation
|
||||
* for the purpose of tracking that it's freed.
|
||||
* For PTR_TO_SOCKET this is used to share which pointers retain the
|
||||
* same reference to the socket, to determine proper reference freeing.
|
||||
*/
|
||||
|
@ -1283,32 +1283,6 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
|
||||
const struct flow_dissector_key *key,
|
||||
unsigned int key_count);
|
||||
|
||||
#ifdef CONFIG_NET
|
||||
int skb_flow_dissector_prog_query(const union bpf_attr *attr,
|
||||
union bpf_attr __user *uattr);
|
||||
int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog);
|
||||
|
||||
int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr);
|
||||
#else
|
||||
static inline int skb_flow_dissector_prog_query(const union bpf_attr *attr,
|
||||
union bpf_attr __user *uattr)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
|
||||
struct bpf_prog *prog)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
#endif
|
||||
|
||||
struct bpf_flow_dissector;
|
||||
bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
|
||||
__be16 proto, int nhoff, int hlen, unsigned int flags);
|
||||
|
@ -437,4 +437,12 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs)
|
||||
psock_set_prog(&progs->skb_verdict, NULL);
|
||||
}
|
||||
|
||||
int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb);
|
||||
|
||||
static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
|
||||
{
|
||||
if (!psock)
|
||||
return false;
|
||||
return psock->parser.enabled;
|
||||
}
|
||||
#endif /* _LINUX_SKMSG_H */
|
||||
|
@ -8,6 +8,8 @@
|
||||
#include <linux/string.h>
|
||||
#include <uapi/linux/if_ether.h>
|
||||
|
||||
struct bpf_prog;
|
||||
struct net;
|
||||
struct sk_buff;
|
||||
|
||||
/**
|
||||
@ -369,4 +371,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
|
||||
memset(key_basic, 0, sizeof(*key_basic));
|
||||
}
|
||||
|
||||
#ifdef CONFIG_BPF_SYSCALL
|
||||
int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
|
||||
#endif /* CONFIG_BPF_SYSCALL */
|
||||
|
||||
#endif
|
||||
|
@ -33,6 +33,7 @@
|
||||
#include <net/netns/mpls.h>
|
||||
#include <net/netns/can.h>
|
||||
#include <net/netns/xdp.h>
|
||||
#include <net/netns/bpf.h>
|
||||
#include <linux/ns_common.h>
|
||||
#include <linux/idr.h>
|
||||
#include <linux/skbuff.h>
|
||||
@ -162,7 +163,8 @@ struct net {
|
||||
#endif
|
||||
struct net_generic __rcu *gen;
|
||||
|
||||
struct bpf_prog __rcu *flow_dissector_prog;
|
||||
/* Used to store attached BPF programs */
|
||||
struct netns_bpf bpf;
|
||||
|
||||
/* Note : following structs are cache line aligned */
|
||||
#ifdef CONFIG_XFRM
|
||||
|
include/net/netns/bpf.h (new file, 18 lines)
@ -0,0 +1,18 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/*
|
||||
* BPF programs attached to network namespace
|
||||
*/
|
||||
|
||||
#ifndef __NETNS_BPF_H__
|
||||
#define __NETNS_BPF_H__
|
||||
|
||||
#include <linux/bpf-netns.h>
|
||||
|
||||
struct bpf_prog;
|
||||
|
||||
struct netns_bpf {
|
||||
struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
|
||||
struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
|
||||
};
|
||||
|
||||
#endif /* __NETNS_BPF_H__ */
|
@ -2690,7 +2690,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
|
||||
|
||||
void sock_def_readable(struct sock *sk);
|
||||
|
||||
int sock_bindtoindex(struct sock *sk, int ifindex);
|
||||
int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
|
||||
void sock_enable_timestamps(struct sock *sk);
|
||||
void sock_no_linger(struct sock *sk);
|
||||
void sock_set_keepalive(struct sock *sk);
|
||||
|
@ -571,6 +571,15 @@ static inline bool tls_sw_has_ctx_tx(const struct sock *sk)
|
||||
return !!tls_sw_ctx_tx(ctx);
|
||||
}
|
||||
|
||||
static inline bool tls_sw_has_ctx_rx(const struct sock *sk)
|
||||
{
|
||||
struct tls_context *ctx = tls_get_ctx(sk);
|
||||
|
||||
if (!ctx)
|
||||
return false;
|
||||
return !!tls_sw_ctx_rx(ctx);
|
||||
}
|
||||
|
||||
void tls_sw_write_space(struct sock *sk, struct tls_context *ctx);
|
||||
void tls_device_write_space(struct sock *sk, struct tls_context *ctx);
|
||||
|
||||
|
@ -61,12 +61,17 @@ struct xdp_rxq_info {
|
||||
struct xdp_mem_info mem;
|
||||
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
|
||||
|
||||
struct xdp_txq_info {
|
||||
struct net_device *dev;
|
||||
};
|
||||
|
||||
struct xdp_buff {
|
||||
void *data;
|
||||
void *data_end;
|
||||
void *data_meta;
|
||||
void *data_hard_start;
|
||||
struct xdp_rxq_info *rxq;
|
||||
struct xdp_txq_info *txq;
|
||||
u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
|
||||
};
|
||||
|
||||
@ -106,9 +111,19 @@ void xdp_warn(const char *msg, const char *func, const int line);
|
||||
|
||||
struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
|
||||
|
||||
static inline
|
||||
void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
|
||||
{
|
||||
xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
|
||||
xdp->data = frame->data;
|
||||
xdp->data_end = frame->data + frame->len;
|
||||
xdp->data_meta = frame->data - frame->metasize;
|
||||
xdp->frame_sz = frame->frame_sz;
|
||||
}
|
||||
|
||||
/* Convert xdp_buff to xdp_frame */
|
||||
static inline
|
||||
struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
|
||||
struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
|
||||
{
|
||||
struct xdp_frame *xdp_frame;
|
||||
int metasize;
|
||||
|
@ -147,6 +147,7 @@ enum bpf_map_type {
|
||||
BPF_MAP_TYPE_SK_STORAGE,
|
||||
BPF_MAP_TYPE_DEVMAP_HASH,
|
||||
BPF_MAP_TYPE_STRUCT_OPS,
|
||||
BPF_MAP_TYPE_RINGBUF,
|
||||
};
|
||||
|
||||
/* Note that tracing related programs such as
|
||||
@ -224,6 +225,7 @@ enum bpf_attach_type {
|
||||
BPF_CGROUP_INET6_GETPEERNAME,
|
||||
BPF_CGROUP_INET4_GETSOCKNAME,
|
||||
BPF_CGROUP_INET6_GETSOCKNAME,
|
||||
BPF_XDP_DEVMAP,
|
||||
__MAX_BPF_ATTACH_TYPE
|
||||
};
|
||||
|
||||
@ -235,6 +237,7 @@ enum bpf_link_type {
|
||||
BPF_LINK_TYPE_TRACING = 2,
|
||||
BPF_LINK_TYPE_CGROUP = 3,
|
||||
BPF_LINK_TYPE_ITER = 4,
|
||||
BPF_LINK_TYPE_NETNS = 5,
|
||||
|
||||
MAX_BPF_LINK_TYPE,
|
||||
};
|
||||
@ -3157,6 +3160,59 @@ union bpf_attr {
|
||||
* **bpf_sk_cgroup_id**\ ().
|
||||
* Return
|
||||
* The id is returned or 0 in case the id could not be retrieved.
|
||||
*
|
||||
* void *bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
|
||||
* Description
|
||||
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* 0, on success;
|
||||
* < 0, on error.
|
||||
*
|
||||
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
|
||||
* Description
|
||||
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
|
||||
* Return
|
||||
* Valid pointer with *size* bytes of memory available; NULL,
|
||||
* otherwise.
|
||||
*
|
||||
* void bpf_ringbuf_submit(void *data, u64 flags)
|
||||
* Description
|
||||
* Submit reserved ring buffer sample, pointed to by *data*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* Nothing. Always succeeds.
|
||||
*
|
||||
* void bpf_ringbuf_discard(void *data, u64 flags)
|
||||
* Description
|
||||
* Discard reserved ring buffer sample, pointed to by *data*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* Nothing. Always succeeds.
|
||||
*
|
||||
* u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
|
||||
* Description
|
||||
* Query various characteristics of provided ring buffer. What
|
||||
* exactly is queried is determined by *flags*:
|
||||
* - BPF_RB_AVAIL_DATA - amount of data not yet consumed;
|
||||
* - BPF_RB_RING_SIZE - the size of ring buffer;
|
||||
* - BPF_RB_CONS_POS - consumer position (can wrap around);
|
||||
* - BPF_RB_PROD_POS - producer(s) position (can wrap around);
|
||||
* Data returned is just a momentary snapshot of actual values
|
||||
* and could be inaccurate, so this facility should be used to
|
||||
* power heuristics and for reporting, not to make 100% correct
|
||||
* calculation.
|
||||
* Return
|
||||
* Requested value, or 0, if flags are not recognized.
|
||||
*/
|
||||
#define __BPF_FUNC_MAPPER(FN) \
|
||||
FN(unspec), \
|
||||
@ -3288,7 +3344,12 @@ union bpf_attr {
|
||||
FN(seq_printf), \
|
||||
FN(seq_write), \
|
||||
FN(sk_cgroup_id), \
|
||||
FN(sk_ancestor_cgroup_id),
|
||||
FN(sk_ancestor_cgroup_id), \
|
||||
FN(ringbuf_output), \
|
||||
FN(ringbuf_reserve), \
|
||||
FN(ringbuf_submit), \
|
||||
FN(ringbuf_discard), \
|
||||
FN(ringbuf_query),
|
||||
|
||||
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
|
||||
* function eBPF program intends to call
|
||||
@ -3398,6 +3459,29 @@ enum {
|
||||
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
|
||||
};
|
||||
|
||||
/* BPF_FUNC_bpf_ringbuf_commit, BPF_FUNC_bpf_ringbuf_discard, and
|
||||
* BPF_FUNC_bpf_ringbuf_output flags.
|
||||
*/
|
||||
enum {
|
||||
BPF_RB_NO_WAKEUP = (1ULL << 0),
|
||||
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
|
||||
};
|
||||
|
||||
/* BPF_FUNC_bpf_ringbuf_query flags */
|
||||
enum {
|
||||
BPF_RB_AVAIL_DATA = 0,
|
||||
BPF_RB_RING_SIZE = 1,
|
||||
BPF_RB_CONS_POS = 2,
|
||||
BPF_RB_PROD_POS = 3,
|
||||
};
|
||||
|
||||
/* BPF ring buffer constants */
|
||||
enum {
|
||||
BPF_RINGBUF_BUSY_BIT = (1U << 31),
|
||||
BPF_RINGBUF_DISCARD_BIT = (1U << 30),
|
||||
BPF_RINGBUF_HDR_SZ = 8,
|
||||
};
|
||||
|
||||
/* Mode for BPF_FUNC_skb_adjust_room helper. */
|
||||
enum bpf_adj_room_mode {
|
||||
BPF_ADJ_ROOM_NET,
|
||||
@ -3530,6 +3614,7 @@ struct bpf_sock {
|
||||
__u32 dst_ip4;
|
||||
__u32 dst_ip6[4];
|
||||
__u32 state;
|
||||
__s32 rx_queue_mapping;
|
||||
};
|
||||
|
||||
struct bpf_tcp_sock {
|
||||
@ -3623,6 +3708,8 @@ struct xdp_md {
|
||||
/* Below access go through struct xdp_rxq_info */
|
||||
__u32 ingress_ifindex; /* rxq->dev->ifindex */
|
||||
__u32 rx_queue_index; /* rxq->queue_index */
|
||||
|
||||
__u32 egress_ifindex; /* txq->dev->ifindex */
|
||||
};
|
||||
|
||||
enum sk_action {
|
||||
@ -3645,6 +3732,8 @@ struct sk_msg_md {
|
||||
__u32 remote_port; /* Stored in network byte order */
|
||||
__u32 local_port; /* stored in host byte order */
|
||||
__u32 size; /* Total size of sk_msg */
|
||||
|
||||
__bpf_md_ptr(struct bpf_sock *, sk); /* current socket */
|
||||
};
|
||||
|
||||
struct sk_reuseport_md {
|
||||
@ -3751,6 +3840,10 @@ struct bpf_link_info {
|
||||
__u64 cgroup_id;
|
||||
__u32 attach_type;
|
||||
} cgroup;
|
||||
struct {
|
||||
__u32 netns_ino;
|
||||
__u32 attach_type;
|
||||
} netns;
|
||||
};
|
||||
} __attribute__((aligned(8)));
|
||||
|
||||
|
@ -4,7 +4,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init)
|
||||
|
||||
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
|
||||
obj-$(CONFIG_BPF_JIT) += trampoline.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += btf.o
|
||||
@ -13,6 +13,7 @@ ifeq ($(CONFIG_NET),y)
|
||||
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += offload.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
|
||||
endif
|
||||
ifeq ($(CONFIG_PERF_EVENTS),y)
|
||||
obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
|
||||
|
@ -49,6 +49,6 @@ const struct bpf_prog_ops lsm_prog_ops = {
|
||||
};
|
||||
|
||||
const struct bpf_verifier_ops lsm_verifier_ops = {
|
||||
.get_func_proto = bpf_tracing_func_proto,
|
||||
.get_func_proto = tracing_prog_func_proto,
|
||||
.is_valid_access = btf_ctx_access,
|
||||
};
|
||||
|
@ -595,7 +595,7 @@ static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
|
||||
mutex_lock(&cgroup_mutex);
|
||||
/* link might have been auto-released by dying cgroup, so fail */
|
||||
if (!cg_link->cgroup) {
|
||||
ret = -EINVAL;
|
||||
ret = -ENOLINK;
|
||||
goto out_unlock;
|
||||
}
|
||||
if (old_prog && link->prog != old_prog) {
|
||||
|
@ -1543,7 +1543,7 @@ select_insn:
|
||||
|
||||
/* ARG1 at this point is guaranteed to point to CTX from
|
||||
* the verifier side due to the fact that the tail call is
|
||||
* handeled like a helper, that is, bpf_tail_call_proto,
|
||||
* handled like a helper, that is, bpf_tail_call_proto,
|
||||
* where arg1_type is ARG_PTR_TO_CTX.
|
||||
*/
|
||||
insn = prog->insnsi;
|
||||
|
@ -621,7 +621,7 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
|
||||
{
|
||||
struct xdp_frame *xdpf;
|
||||
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf))
|
||||
return -EOVERFLOW;
|
||||
|
||||
|
@ -60,12 +60,23 @@ struct xdp_dev_bulk_queue {
|
||||
unsigned int count;
|
||||
};
|
||||
|
||||
/* DEVMAP values */
|
||||
struct bpf_devmap_val {
|
||||
u32 ifindex; /* device index */
|
||||
union {
|
||||
int fd; /* prog fd on map write */
|
||||
u32 id; /* prog id on map read */
|
||||
} bpf_prog;
|
||||
};
|
||||
|
||||
struct bpf_dtab_netdev {
|
||||
struct net_device *dev; /* must be first member, due to tracepoint */
|
||||
struct hlist_node index_hlist;
|
||||
struct bpf_dtab *dtab;
|
||||
struct bpf_prog *xdp_prog;
|
||||
struct rcu_head rcu;
|
||||
unsigned int idx;
|
||||
struct bpf_devmap_val val;
|
||||
};
|
||||
|
||||
struct bpf_dtab {
|
||||
@ -105,12 +116,18 @@ static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
|
||||
|
||||
static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
|
||||
{
|
||||
u32 valsize = attr->value_size;
|
||||
u64 cost = 0;
|
||||
int err;
|
||||
|
||||
/* check sanity of attributes */
|
||||
/* check sanity of attributes. 2 value sizes supported:
|
||||
* 4 bytes: ifindex
|
||||
* 8 bytes: ifindex + prog fd
|
||||
*/
|
||||
if (attr->max_entries == 0 || attr->key_size != 4 ||
|
||||
attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK)
|
||||
(valsize != offsetofend(struct bpf_devmap_val, ifindex) &&
|
||||
valsize != offsetofend(struct bpf_devmap_val, bpf_prog.fd)) ||
|
||||
attr->map_flags & ~DEV_CREATE_FLAG_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
/* Lookup returns a pointer straight to dev->ifindex, so make sure the
|
||||
@ -217,6 +234,8 @@ static void dev_map_free(struct bpf_map *map)
|
||||
|
||||
hlist_for_each_entry_safe(dev, next, head, index_hlist) {
|
||||
hlist_del_rcu(&dev->index_hlist);
|
||||
if (dev->xdp_prog)
|
||||
bpf_prog_put(dev->xdp_prog);
|
||||
dev_put(dev->dev);
|
||||
kfree(dev);
|
||||
}
|
||||
@ -231,6 +250,8 @@ static void dev_map_free(struct bpf_map *map)
|
||||
if (!dev)
|
||||
continue;
|
||||
|
||||
if (dev->xdp_prog)
|
||||
bpf_prog_put(dev->xdp_prog);
|
||||
dev_put(dev->dev);
|
||||
kfree(dev);
|
||||
}
|
||||
@ -317,6 +338,16 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
bool dev_map_can_have_prog(struct bpf_map *map)
|
||||
{
|
||||
if ((map->map_type == BPF_MAP_TYPE_DEVMAP ||
|
||||
map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) &&
|
||||
map->value_size != offsetofend(struct bpf_devmap_val, ifindex))
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
|
||||
{
|
||||
struct net_device *dev = bq->dev;
|
||||
@ -434,13 +465,40 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
|
||||
if (unlikely(err))
|
||||
return err;
|
||||
|
||||
xdpf = convert_to_xdp_frame(xdp);
|
||||
xdpf = xdp_convert_buff_to_frame(xdp);
|
||||
if (unlikely(!xdpf))
|
||||
return -EOVERFLOW;
|
||||
|
||||
return bq_enqueue(dev, xdpf, dev_rx);
|
||||
}
|
||||
|
||||
static struct xdp_buff *dev_map_run_prog(struct net_device *dev,
|
||||
struct xdp_buff *xdp,
|
||||
struct bpf_prog *xdp_prog)
|
||||
{
|
||||
struct xdp_txq_info txq = { .dev = dev };
|
||||
u32 act;
|
||||
|
||||
xdp->txq = &txq;
|
||||
|
||||
act = bpf_prog_run_xdp(xdp_prog, xdp);
|
||||
switch (act) {
|
||||
case XDP_PASS:
|
||||
return xdp;
|
||||
case XDP_DROP:
|
||||
break;
|
||||
default:
|
||||
bpf_warn_invalid_xdp_action(act);
|
||||
fallthrough;
|
||||
case XDP_ABORTED:
|
||||
trace_xdp_exception(dev, xdp_prog, act);
|
||||
break;
|
||||
}
|
||||
|
||||
xdp_return_buff(xdp);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
|
||||
struct net_device *dev_rx)
|
||||
{
|
||||
@ -452,6 +510,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
|
||||
{
|
||||
struct net_device *dev = dst->dev;
|
||||
|
||||
if (dst->xdp_prog) {
|
||||
xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog);
|
||||
if (!xdp)
|
||||
return 0;
|
||||
}
|
||||
return __xdp_enqueue(dev, xdp, dev_rx);
|
||||
}
|
||||
|
||||
@ -472,18 +535,15 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
|
||||
static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
|
||||
struct net_device *dev = obj ? obj->dev : NULL;
|
||||
|
||||
return dev ? &dev->ifindex : NULL;
|
||||
return obj ? &obj->val : NULL;
|
||||
}
|
||||
|
||||
static void *dev_map_hash_lookup_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
struct bpf_dtab_netdev *obj = __dev_map_hash_lookup_elem(map,
|
||||
*(u32 *)key);
|
||||
struct net_device *dev = obj ? obj->dev : NULL;
|
||||
|
||||
return dev ? &dev->ifindex : NULL;
|
||||
return obj ? &obj->val : NULL;
|
||||
}
|
||||
|
||||
static void __dev_map_entry_free(struct rcu_head *rcu)
|
||||
@ -491,6 +551,8 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
|
||||
struct bpf_dtab_netdev *dev;
|
||||
|
||||
dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
|
||||
if (dev->xdp_prog)
|
||||
bpf_prog_put(dev->xdp_prog);
|
||||
dev_put(dev->dev);
|
||||
kfree(dev);
|
||||
}
|
||||
@ -541,9 +603,10 @@ static int dev_map_hash_delete_elem(struct bpf_map *map, void *key)
|
||||
|
||||
static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
|
||||
struct bpf_dtab *dtab,
|
||||
u32 ifindex,
|
||||
struct bpf_devmap_val *val,
|
||||
unsigned int idx)
|
||||
{
|
||||
struct bpf_prog *prog = NULL;
|
||||
struct bpf_dtab_netdev *dev;
|
||||
|
||||
dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
|
||||
@ -551,24 +614,46 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
|
||||
if (!dev)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
dev->dev = dev_get_by_index(net, ifindex);
|
||||
if (!dev->dev) {
|
||||
kfree(dev);
|
||||
return ERR_PTR(-EINVAL);
|
||||
dev->dev = dev_get_by_index(net, val->ifindex);
|
||||
if (!dev->dev)
|
||||
goto err_out;
|
||||
|
||||
if (val->bpf_prog.fd >= 0) {
|
||||
prog = bpf_prog_get_type_dev(val->bpf_prog.fd,
|
||||
BPF_PROG_TYPE_XDP, false);
|
||||
if (IS_ERR(prog))
|
||||
goto err_put_dev;
|
||||
if (prog->expected_attach_type != BPF_XDP_DEVMAP)
|
||||
goto err_put_prog;
|
||||
}
|
||||
|
||||
dev->idx = idx;
|
||||
dev->dtab = dtab;
|
||||
if (prog) {
|
||||
dev->xdp_prog = prog;
|
||||
dev->val.bpf_prog.id = prog->aux->id;
|
||||
} else {
|
||||
dev->xdp_prog = NULL;
|
||||
dev->val.bpf_prog.id = 0;
|
||||
}
|
||||
dev->val.ifindex = val->ifindex;
|
||||
|
||||
return dev;
|
||||
err_put_prog:
|
||||
bpf_prog_put(prog);
|
||||
err_put_dev:
|
||||
dev_put(dev->dev);
|
||||
err_out:
|
||||
kfree(dev);
|
||||
return ERR_PTR(-EINVAL);
|
||||
}
|
||||
|
||||
static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
|
||||
struct bpf_dtab_netdev *dev, *old_dev;
|
||||
u32 ifindex = *(u32 *)value;
|
||||
u32 i = *(u32 *)key;
|
||||
|
||||
if (unlikely(map_flags > BPF_EXIST))
|
||||
@ -578,10 +663,16 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
|
||||
if (unlikely(map_flags == BPF_NOEXIST))
|
||||
return -EEXIST;
|
||||
|
||||
if (!ifindex) {
|
||||
/* already verified value_size <= sizeof val */
|
||||
memcpy(&val, value, map->value_size);
|
||||
|
||||
if (!val.ifindex) {
|
||||
dev = NULL;
|
||||
/* can not specify fd if ifindex is 0 */
|
||||
if (val.bpf_prog.fd != -1)
|
||||
return -EINVAL;
|
||||
} else {
|
||||
dev = __dev_map_alloc_node(net, dtab, ifindex, i);
|
||||
dev = __dev_map_alloc_node(net, dtab, &val, i);
|
||||
if (IS_ERR(dev))
|
||||
return PTR_ERR(dev);
|
||||
}
|
||||
@ -608,13 +699,16 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
|
||||
void *key, void *value, u64 map_flags)
|
||||
{
|
||||
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
|
||||
struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
|
||||
struct bpf_dtab_netdev *dev, *old_dev;
|
||||
u32 ifindex = *(u32 *)value;
|
||||
u32 idx = *(u32 *)key;
|
||||
unsigned long flags;
|
||||
int err = -EEXIST;
|
||||
|
||||
if (unlikely(map_flags > BPF_EXIST || !ifindex))
|
||||
/* already verified value_size <= sizeof val */
|
||||
memcpy(&val, value, map->value_size);
|
||||
|
||||
if (unlikely(map_flags > BPF_EXIST || !val.ifindex))
|
||||
return -EINVAL;
|
||||
|
||||
spin_lock_irqsave(&dtab->index_lock, flags);
|
||||
@ -623,7 +717,7 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
|
||||
if (old_dev && (map_flags & BPF_NOEXIST))
|
||||
goto out_err;
|
||||
|
||||
dev = __dev_map_alloc_node(net, dtab, ifindex, idx);
|
||||
dev = __dev_map_alloc_node(net, dtab, &val, idx);
|
||||
if (IS_ERR(dev)) {
|
||||
err = PTR_ERR(dev);
|
||||
goto out_err;
|
||||
|
@ -601,6 +601,12 @@ const struct bpf_func_proto bpf_event_output_data_proto = {
|
||||
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
|
||||
};
|
||||
|
||||
const struct bpf_func_proto bpf_get_current_task_proto __weak;
|
||||
const struct bpf_func_proto bpf_probe_read_user_proto __weak;
|
||||
const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
|
||||
const struct bpf_func_proto bpf_probe_read_kernel_proto __weak;
|
||||
const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
|
||||
|
||||
const struct bpf_func_proto *
|
||||
bpf_base_func_proto(enum bpf_func_id func_id)
|
||||
{
|
||||
@ -629,6 +635,16 @@ bpf_base_func_proto(enum bpf_func_id func_id)
|
||||
return &bpf_ktime_get_ns_proto;
|
||||
case BPF_FUNC_ktime_get_boot_ns:
|
||||
return &bpf_ktime_get_boot_ns_proto;
|
||||
case BPF_FUNC_ringbuf_output:
|
||||
return &bpf_ringbuf_output_proto;
|
||||
case BPF_FUNC_ringbuf_reserve:
|
||||
return &bpf_ringbuf_reserve_proto;
|
||||
case BPF_FUNC_ringbuf_submit:
|
||||
return &bpf_ringbuf_submit_proto;
|
||||
case BPF_FUNC_ringbuf_discard:
|
||||
return &bpf_ringbuf_discard_proto;
|
||||
case BPF_FUNC_ringbuf_query:
|
||||
return &bpf_ringbuf_query_proto;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
@ -647,6 +663,24 @@ bpf_base_func_proto(enum bpf_func_id func_id)
|
||||
return bpf_get_trace_printk_proto();
|
||||
case BPF_FUNC_jiffies64:
|
||||
return &bpf_jiffies64_proto;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
|
||||
if (!perfmon_capable())
|
||||
return NULL;
|
||||
|
||||
switch (func_id) {
|
||||
case BPF_FUNC_get_current_task:
|
||||
return &bpf_get_current_task_proto;
|
||||
case BPF_FUNC_probe_read_user:
|
||||
return &bpf_probe_read_user_proto;
|
||||
case BPF_FUNC_probe_read_kernel:
|
||||
return &bpf_probe_read_kernel_proto;
|
||||
case BPF_FUNC_probe_read_user_str:
|
||||
return &bpf_probe_read_user_str_proto;
|
||||
case BPF_FUNC_probe_read_kernel_str:
|
||||
return &bpf_probe_read_kernel_str_proto;
|
||||
default:
|
||||
return NULL;
|
||||
}
|
||||
|
kernel/bpf/net_namespace.c (new file, 373 lines)
@ -0,0 +1,373 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/filter.h>
|
||||
#include <net/net_namespace.h>
|
||||
|
||||
/*
|
||||
* Functions to manage BPF programs attached to netns
|
||||
*/
|
||||
|
||||
struct bpf_netns_link {
|
||||
struct bpf_link link;
|
||||
enum bpf_attach_type type;
|
||||
enum netns_bpf_attach_type netns_type;
|
||||
|
||||
/* We don't hold a ref to net in order to auto-detach the link
|
||||
* when netns is going away. Instead we rely on pernet
|
||||
* pre_exit callback to clear this pointer. Must be accessed
|
||||
* with netns_bpf_mutex held.
|
||||
*/
|
||||
struct net *net;
|
||||
};
|
||||
|
||||
/* Protects updates to netns_bpf */
|
||||
DEFINE_MUTEX(netns_bpf_mutex);
|
||||
|
||||
/* Must be called with netns_bpf_mutex held. */
|
||||
static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
|
||||
{
|
||||
struct bpf_netns_link *net_link =
|
||||
container_of(link, struct bpf_netns_link, link);
|
||||
|
||||
net_link->net = NULL;
|
||||
}
|
||||
|
||||
static void bpf_netns_link_release(struct bpf_link *link)
|
||||
{
|
||||
struct bpf_netns_link *net_link =
|
||||
container_of(link, struct bpf_netns_link, link);
|
||||
enum netns_bpf_attach_type type = net_link->netns_type;
|
||||
struct net *net;
|
||||
|
||||
/* Link auto-detached by dying netns. */
|
||||
if (!net_link->net)
|
||||
return;
|
||||
|
||||
mutex_lock(&netns_bpf_mutex);
|
||||
|
||||
	/* Recheck after potential sleep. We can race with cleanup_net
	 * here, but if we see a non-NULL struct net pointer pre_exit
	 * has not happened yet and will block on netns_bpf_mutex.
	 */
	net = net_link->net;
	if (!net)
		goto out_unlock;

	net->bpf.links[type] = NULL;
	RCU_INIT_POINTER(net->bpf.progs[type], NULL);

out_unlock:
	mutex_unlock(&netns_bpf_mutex);
}

static void bpf_netns_link_dealloc(struct bpf_link *link)
{
	struct bpf_netns_link *net_link =
		container_of(link, struct bpf_netns_link, link);

	kfree(net_link);
}

static int bpf_netns_link_update_prog(struct bpf_link *link,
				      struct bpf_prog *new_prog,
				      struct bpf_prog *old_prog)
{
	struct bpf_netns_link *net_link =
		container_of(link, struct bpf_netns_link, link);
	enum netns_bpf_attach_type type = net_link->netns_type;
	struct net *net;
	int ret = 0;

	if (old_prog && old_prog != link->prog)
		return -EPERM;
	if (new_prog->type != link->prog->type)
		return -EINVAL;

	mutex_lock(&netns_bpf_mutex);

	net = net_link->net;
	if (!net || !check_net(net)) {
		/* Link auto-detached or netns dying */
		ret = -ENOLINK;
		goto out_unlock;
	}

	old_prog = xchg(&link->prog, new_prog);
	rcu_assign_pointer(net->bpf.progs[type], new_prog);
	bpf_prog_put(old_prog);

out_unlock:
	mutex_unlock(&netns_bpf_mutex);
	return ret;
}

static int bpf_netns_link_fill_info(const struct bpf_link *link,
				    struct bpf_link_info *info)
{
	const struct bpf_netns_link *net_link =
		container_of(link, struct bpf_netns_link, link);
	unsigned int inum = 0;
	struct net *net;

	mutex_lock(&netns_bpf_mutex);
	net = net_link->net;
	if (net && check_net(net))
		inum = net->ns.inum;
	mutex_unlock(&netns_bpf_mutex);

	info->netns.netns_ino = inum;
	info->netns.attach_type = net_link->type;
	return 0;
}

static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
				       struct seq_file *seq)
{
	struct bpf_link_info info = {};

	bpf_netns_link_fill_info(link, &info);
	seq_printf(seq,
		   "netns_ino:\t%u\n"
		   "attach_type:\t%u\n",
		   info.netns.netns_ino,
		   info.netns.attach_type);
}

static const struct bpf_link_ops bpf_netns_link_ops = {
	.release = bpf_netns_link_release,
	.dealloc = bpf_netns_link_dealloc,
	.update_prog = bpf_netns_link_update_prog,
	.fill_link_info = bpf_netns_link_fill_info,
	.show_fdinfo = bpf_netns_link_show_fdinfo,
};

int netns_bpf_prog_query(const union bpf_attr *attr,
			 union bpf_attr __user *uattr)
{
	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
	u32 prog_id, prog_cnt = 0, flags = 0;
	enum netns_bpf_attach_type type;
	struct bpf_prog *attached;
	struct net *net;

	if (attr->query.query_flags)
		return -EINVAL;

	type = to_netns_bpf_attach_type(attr->query.attach_type);
	if (type < 0)
		return -EINVAL;

	net = get_net_ns_by_fd(attr->query.target_fd);
	if (IS_ERR(net))
		return PTR_ERR(net);

	rcu_read_lock();
	attached = rcu_dereference(net->bpf.progs[type]);
	if (attached) {
		prog_cnt = 1;
		prog_id = attached->aux->id;
	}
	rcu_read_unlock();

	put_net(net);

	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
		return -EFAULT;
	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
		return -EFAULT;

	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
		return 0;

	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
		return -EFAULT;

	return 0;
}

int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
	enum netns_bpf_attach_type type;
	struct net *net;
	int ret;

	type = to_netns_bpf_attach_type(attr->attach_type);
	if (type < 0)
		return -EINVAL;

	net = current->nsproxy->net_ns;
	mutex_lock(&netns_bpf_mutex);

	/* Attaching prog directly is not compatible with links */
	if (net->bpf.links[type]) {
		ret = -EEXIST;
		goto out_unlock;
	}

	switch (type) {
	case NETNS_BPF_FLOW_DISSECTOR:
		ret = flow_dissector_bpf_prog_attach(net, prog);
		break;
	default:
		ret = -EINVAL;
		break;
	}
out_unlock:
	mutex_unlock(&netns_bpf_mutex);

	return ret;
}

/* Must be called with netns_bpf_mutex held. */
static int __netns_bpf_prog_detach(struct net *net,
				   enum netns_bpf_attach_type type)
{
	struct bpf_prog *attached;

	/* Progs attached via links cannot be detached */
	if (net->bpf.links[type])
		return -EINVAL;

	attached = rcu_dereference_protected(net->bpf.progs[type],
					     lockdep_is_held(&netns_bpf_mutex));
	if (!attached)
		return -ENOENT;
	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
	bpf_prog_put(attached);
	return 0;
}

int netns_bpf_prog_detach(const union bpf_attr *attr)
{
	enum netns_bpf_attach_type type;
	int ret;

	type = to_netns_bpf_attach_type(attr->attach_type);
	if (type < 0)
		return -EINVAL;

	mutex_lock(&netns_bpf_mutex);
	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
	mutex_unlock(&netns_bpf_mutex);

	return ret;
}

static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
				 enum netns_bpf_attach_type type)
{
	struct bpf_prog *prog;
	int err;

	mutex_lock(&netns_bpf_mutex);

	/* Allow attaching only one prog or link for now */
	if (net->bpf.links[type]) {
		err = -E2BIG;
		goto out_unlock;
	}
	/* Links are not compatible with attaching prog directly */
	prog = rcu_dereference_protected(net->bpf.progs[type],
					 lockdep_is_held(&netns_bpf_mutex));
	if (prog) {
		err = -EEXIST;
		goto out_unlock;
	}

	switch (type) {
	case NETNS_BPF_FLOW_DISSECTOR:
		err = flow_dissector_bpf_prog_attach(net, link->prog);
		break;
	default:
		err = -EINVAL;
		break;
	}
	if (err)
		goto out_unlock;

	net->bpf.links[type] = link;

out_unlock:
	mutex_unlock(&netns_bpf_mutex);
	return err;
}

int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
	enum netns_bpf_attach_type netns_type;
	struct bpf_link_primer link_primer;
	struct bpf_netns_link *net_link;
	enum bpf_attach_type type;
	struct net *net;
	int err;

	if (attr->link_create.flags)
		return -EINVAL;

	type = attr->link_create.attach_type;
	netns_type = to_netns_bpf_attach_type(type);
	if (netns_type < 0)
		return -EINVAL;

	net = get_net_ns_by_fd(attr->link_create.target_fd);
	if (IS_ERR(net))
		return PTR_ERR(net);

	net_link = kzalloc(sizeof(*net_link), GFP_USER);
	if (!net_link) {
		err = -ENOMEM;
		goto out_put_net;
	}
	bpf_link_init(&net_link->link, BPF_LINK_TYPE_NETNS,
		      &bpf_netns_link_ops, prog);
	net_link->net = net;
	net_link->type = type;
	net_link->netns_type = netns_type;

	err = bpf_link_prime(&net_link->link, &link_primer);
	if (err) {
		kfree(net_link);
		goto out_put_net;
	}

	err = netns_bpf_link_attach(net, &net_link->link, netns_type);
	if (err) {
		bpf_link_cleanup(&link_primer);
		goto out_put_net;
	}

	put_net(net);
	return bpf_link_settle(&link_primer);

out_put_net:
	put_net(net);
	return err;
}

static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
{
	enum netns_bpf_attach_type type;
	struct bpf_link *link;

	mutex_lock(&netns_bpf_mutex);
	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
		link = net->bpf.links[type];
		if (link)
			bpf_netns_link_auto_detach(link);
		else
			__netns_bpf_prog_detach(net, type);
	}
	mutex_unlock(&netns_bpf_mutex);
}

static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
	.pre_exit = netns_bpf_pernet_pre_exit,
};

static int __init netns_bpf_init(void)
{
	return register_pernet_subsys(&netns_bpf_pernet_ops);
}

subsys_initcall(netns_bpf_init);
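The netns link path above is driven from user space through the BPF_LINK_CREATE command. A minimal, hypothetical sketch of the caller side, assuming libbpf's low-level bpf_link_create() wrapper and a flow-dissector program that has already been loaded (prog_fd and the /proc path are placeholders, not part of this commit):

#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

/* Attach an already loaded flow-dissector program to a network namespace
 * through a bpf_link, so it auto-detaches when the link fd is closed.
 */
static int attach_flow_dissector_link(int prog_fd)
{
	int netns_fd, link_fd;

	netns_fd = open("/proc/self/ns/net", O_RDONLY);	/* target netns */
	if (netns_fd < 0)
		return -1;

	/* ends up in the netns_bpf_link_create() path added above */
	link_fd = bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR, NULL);

	close(netns_fd);
	return link_fd;	/* keep open: closing it detaches the program */
}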
501
kernel/bpf/ringbuf.c
Normal file
@ -0,0 +1,501 @@
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/err.h>
#include <linux/irq_work.h>
#include <linux/slab.h>
#include <linux/filter.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/wait.h>
#include <linux/poll.h>
#include <uapi/linux/btf.h>

#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)

/* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
#define RINGBUF_PGOFF \
	(offsetof(struct bpf_ringbuf, consumer_pos) >> PAGE_SHIFT)
/* consumer page and producer page */
#define RINGBUF_POS_PAGES 2

#define RINGBUF_MAX_RECORD_SZ (UINT_MAX/4)

/* Maximum size of ring buffer area is limited by 32-bit page offset within
 * record header, counted in pages. Reserve 8 bits for extensibility, and take
 * into account few extra pages for consumer/producer pages and
 * non-mmap()'able parts. This gives 64GB limit, which seems plenty for single
 * ring buffer.
 */
#define RINGBUF_MAX_DATA_SZ \
	(((1ULL << 24) - RINGBUF_POS_PAGES - RINGBUF_PGOFF) * PAGE_SIZE)

struct bpf_ringbuf {
	wait_queue_head_t waitq;
	struct irq_work work;
	u64 mask;
	struct page **pages;
	int nr_pages;
	spinlock_t spinlock ____cacheline_aligned_in_smp;
	/* Consumer and producer counters are put into separate pages to allow
	 * mapping consumer page as r/w, but restrict producer page to r/o.
	 * This protects producer position from being modified by user-space
	 * application and ruining in-kernel position tracking.
	 */
	unsigned long consumer_pos __aligned(PAGE_SIZE);
	unsigned long producer_pos __aligned(PAGE_SIZE);
	char data[] __aligned(PAGE_SIZE);
};

struct bpf_ringbuf_map {
	struct bpf_map map;
	struct bpf_map_memory memory;
	struct bpf_ringbuf *rb;
};

/* 8-byte ring buffer record header structure */
struct bpf_ringbuf_hdr {
	u32 len;
	u32 pg_off;
};

static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
{
	const gfp_t flags = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN |
			    __GFP_ZERO;
	int nr_meta_pages = RINGBUF_PGOFF + RINGBUF_POS_PAGES;
	int nr_data_pages = data_sz >> PAGE_SHIFT;
	int nr_pages = nr_meta_pages + nr_data_pages;
	struct page **pages, *page;
	struct bpf_ringbuf *rb;
	size_t array_size;
	int i;

	/* Each data page is mapped twice to allow "virtual"
	 * continuous read of samples wrapping around the end of ring
	 * buffer area:
	 * ------------------------------------------------------
	 * | meta pages |  real data pages  |  same data pages  |
	 * ------------------------------------------------------
	 * |            | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 |
	 * ------------------------------------------------------
	 * |            | TA             DA | TA             DA |
	 * ------------------------------------------------------
	 *                               ^^^^^^^
	 *                                  |
	 * Here, no need to worry about special handling of wrapped-around
	 * data due to double-mapped data pages. This works both in kernel and
	 * when mmap()'ed in user-space, simplifying both kernel and
	 * user-space implementations significantly.
	 */
	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
	if (array_size > PAGE_SIZE)
		pages = vmalloc_node(array_size, numa_node);
	else
		pages = kmalloc_node(array_size, flags, numa_node);
	if (!pages)
		return NULL;

	for (i = 0; i < nr_pages; i++) {
		page = alloc_pages_node(numa_node, flags, 0);
		if (!page) {
			nr_pages = i;
			goto err_free_pages;
		}
		pages[i] = page;
		if (i >= nr_meta_pages)
			pages[nr_data_pages + i] = page;
	}

	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
		  VM_ALLOC | VM_USERMAP, PAGE_KERNEL);
	if (rb) {
		rb->pages = pages;
		rb->nr_pages = nr_pages;
		return rb;
	}

err_free_pages:
	for (i = 0; i < nr_pages; i++)
		__free_page(pages[i]);
	kvfree(pages);
	return NULL;
}

static void bpf_ringbuf_notify(struct irq_work *work)
{
	struct bpf_ringbuf *rb = container_of(work, struct bpf_ringbuf, work);

	wake_up_all(&rb->waitq);
}

static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
{
	struct bpf_ringbuf *rb;

	if (!data_sz || !PAGE_ALIGNED(data_sz))
		return ERR_PTR(-EINVAL);

#ifdef CONFIG_64BIT
	/* on 32-bit arch, it's impossible to overflow record's hdr->pgoff */
	if (data_sz > RINGBUF_MAX_DATA_SZ)
		return ERR_PTR(-E2BIG);
#endif

	rb = bpf_ringbuf_area_alloc(data_sz, numa_node);
	if (!rb)
		return ERR_PTR(-ENOMEM);

	spin_lock_init(&rb->spinlock);
	init_waitqueue_head(&rb->waitq);
	init_irq_work(&rb->work, bpf_ringbuf_notify);

	rb->mask = data_sz - 1;
	rb->consumer_pos = 0;
	rb->producer_pos = 0;

	return rb;
}

static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
{
	struct bpf_ringbuf_map *rb_map;
	u64 cost;
	int err;

	if (attr->map_flags & ~RINGBUF_CREATE_FLAG_MASK)
		return ERR_PTR(-EINVAL);

	if (attr->key_size || attr->value_size ||
	    attr->max_entries == 0 || !PAGE_ALIGNED(attr->max_entries))
		return ERR_PTR(-EINVAL);

	rb_map = kzalloc(sizeof(*rb_map), GFP_USER);
	if (!rb_map)
		return ERR_PTR(-ENOMEM);

	bpf_map_init_from_attr(&rb_map->map, attr);

	cost = sizeof(struct bpf_ringbuf_map) +
	       sizeof(struct bpf_ringbuf) +
	       attr->max_entries;
	err = bpf_map_charge_init(&rb_map->map.memory, cost);
	if (err)
		goto err_free_map;

	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
	if (IS_ERR(rb_map->rb)) {
		err = PTR_ERR(rb_map->rb);
		goto err_uncharge;
	}

	return &rb_map->map;

err_uncharge:
	bpf_map_charge_finish(&rb_map->map.memory);
err_free_map:
	kfree(rb_map);
	return ERR_PTR(err);
}

static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
{
	/* copy pages pointer and nr_pages to local variable, as we are going
	 * to unmap rb itself with vunmap() below
	 */
	struct page **pages = rb->pages;
	int i, nr_pages = rb->nr_pages;

	vunmap(rb);
	for (i = 0; i < nr_pages; i++)
		__free_page(pages[i]);
	kvfree(pages);
}

static void ringbuf_map_free(struct bpf_map *map)
{
	struct bpf_ringbuf_map *rb_map;

	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
	 * so the programs (can be more than one that used this map) were
	 * disconnected from events. Wait for outstanding critical sections in
	 * these programs to complete
	 */
	synchronize_rcu();

	rb_map = container_of(map, struct bpf_ringbuf_map, map);
	bpf_ringbuf_free(rb_map->rb);
	kfree(rb_map);
}

static void *ringbuf_map_lookup_elem(struct bpf_map *map, void *key)
{
	return ERR_PTR(-ENOTSUPP);
}

static int ringbuf_map_update_elem(struct bpf_map *map, void *key, void *value,
				   u64 flags)
{
	return -ENOTSUPP;
}

static int ringbuf_map_delete_elem(struct bpf_map *map, void *key)
{
	return -ENOTSUPP;
}

static int ringbuf_map_get_next_key(struct bpf_map *map, void *key,
				    void *next_key)
{
	return -ENOTSUPP;
}

static size_t bpf_ringbuf_mmap_page_cnt(const struct bpf_ringbuf *rb)
{
	size_t data_pages = (rb->mask + 1) >> PAGE_SHIFT;

	/* consumer page + producer page + 2 x data pages */
	return RINGBUF_POS_PAGES + 2 * data_pages;
}

static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
{
	struct bpf_ringbuf_map *rb_map;
	size_t mmap_sz;

	rb_map = container_of(map, struct bpf_ringbuf_map, map);
	mmap_sz = bpf_ringbuf_mmap_page_cnt(rb_map->rb) << PAGE_SHIFT;

	if (vma->vm_pgoff * PAGE_SIZE + (vma->vm_end - vma->vm_start) > mmap_sz)
		return -EINVAL;

	return remap_vmalloc_range(vma, rb_map->rb,
				   vma->vm_pgoff + RINGBUF_PGOFF);
}

static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
{
	unsigned long cons_pos, prod_pos;

	cons_pos = smp_load_acquire(&rb->consumer_pos);
	prod_pos = smp_load_acquire(&rb->producer_pos);
	return prod_pos - cons_pos;
}

static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
				 struct poll_table_struct *pts)
{
	struct bpf_ringbuf_map *rb_map;

	rb_map = container_of(map, struct bpf_ringbuf_map, map);
	poll_wait(filp, &rb_map->rb->waitq, pts);

	if (ringbuf_avail_data_sz(rb_map->rb))
		return EPOLLIN | EPOLLRDNORM;
	return 0;
}

const struct bpf_map_ops ringbuf_map_ops = {
	.map_alloc = ringbuf_map_alloc,
	.map_free = ringbuf_map_free,
	.map_mmap = ringbuf_map_mmap,
	.map_poll = ringbuf_map_poll,
	.map_lookup_elem = ringbuf_map_lookup_elem,
	.map_update_elem = ringbuf_map_update_elem,
	.map_delete_elem = ringbuf_map_delete_elem,
	.map_get_next_key = ringbuf_map_get_next_key,
};

/* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself,
 * calculate offset from record metadata to ring buffer in pages, rounded
 * down. This page offset is stored as part of record metadata and allows to
 * restore struct bpf_ringbuf * from record pointer. This page offset is
 * stored at offset 4 of record metadata header.
 */
static size_t bpf_ringbuf_rec_pg_off(struct bpf_ringbuf *rb,
				     struct bpf_ringbuf_hdr *hdr)
{
	return ((void *)hdr - (void *)rb) >> PAGE_SHIFT;
}

/* Given pointer to ring buffer record header, restore pointer to struct
 * bpf_ringbuf itself by using page offset stored at offset 4
 */
static struct bpf_ringbuf *
bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr)
{
	unsigned long addr = (unsigned long)(void *)hdr;
	unsigned long off = (unsigned long)hdr->pg_off << PAGE_SHIFT;

	return (void*)((addr & PAGE_MASK) - off);
}

static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
{
	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
	u32 len, pg_off;
	struct bpf_ringbuf_hdr *hdr;

	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
		return NULL;

	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
	cons_pos = smp_load_acquire(&rb->consumer_pos);

	if (in_nmi()) {
		if (!spin_trylock_irqsave(&rb->spinlock, flags))
			return NULL;
	} else {
		spin_lock_irqsave(&rb->spinlock, flags);
	}

	prod_pos = rb->producer_pos;
	new_prod_pos = prod_pos + len;

	/* check for out of ringbuf space by ensuring producer position
	 * doesn't advance more than (ringbuf_size - 1) ahead
	 */
	if (new_prod_pos - cons_pos > rb->mask) {
		spin_unlock_irqrestore(&rb->spinlock, flags);
		return NULL;
	}

	hdr = (void *)rb->data + (prod_pos & rb->mask);
	pg_off = bpf_ringbuf_rec_pg_off(rb, hdr);
	hdr->len = size | BPF_RINGBUF_BUSY_BIT;
	hdr->pg_off = pg_off;

	/* pairs with consumer's smp_load_acquire() */
	smp_store_release(&rb->producer_pos, new_prod_pos);

	spin_unlock_irqrestore(&rb->spinlock, flags);

	return (void *)hdr + BPF_RINGBUF_HDR_SZ;
}

BPF_CALL_3(bpf_ringbuf_reserve, struct bpf_map *, map, u64, size, u64, flags)
{
	struct bpf_ringbuf_map *rb_map;

	if (unlikely(flags))
		return 0;

	rb_map = container_of(map, struct bpf_ringbuf_map, map);
	return (unsigned long)__bpf_ringbuf_reserve(rb_map->rb, size);
}

const struct bpf_func_proto bpf_ringbuf_reserve_proto = {
	.func = bpf_ringbuf_reserve,
	.ret_type = RET_PTR_TO_ALLOC_MEM_OR_NULL,
	.arg1_type = ARG_CONST_MAP_PTR,
	.arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO,
	.arg3_type = ARG_ANYTHING,
};

static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
{
	unsigned long rec_pos, cons_pos;
	struct bpf_ringbuf_hdr *hdr;
	struct bpf_ringbuf *rb;
	u32 new_len;

	hdr = sample - BPF_RINGBUF_HDR_SZ;
	rb = bpf_ringbuf_restore_from_rec(hdr);
	new_len = hdr->len ^ BPF_RINGBUF_BUSY_BIT;
	if (discard)
		new_len |= BPF_RINGBUF_DISCARD_BIT;

	/* update record header with correct final size prefix */
	xchg(&hdr->len, new_len);

	/* if consumer caught up and is waiting for our record, notify about
	 * new data availability
	 */
	rec_pos = (void *)hdr - (void *)rb->data;
	cons_pos = smp_load_acquire(&rb->consumer_pos) & rb->mask;

	if (flags & BPF_RB_FORCE_WAKEUP)
		irq_work_queue(&rb->work);
	else if (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP))
		irq_work_queue(&rb->work);
}

BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
{
	bpf_ringbuf_commit(sample, flags, false /* discard */);
	return 0;
}

const struct bpf_func_proto bpf_ringbuf_submit_proto = {
	.func = bpf_ringbuf_submit,
	.ret_type = RET_VOID,
	.arg1_type = ARG_PTR_TO_ALLOC_MEM,
	.arg2_type = ARG_ANYTHING,
};

BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
{
	bpf_ringbuf_commit(sample, flags, true /* discard */);
	return 0;
}

const struct bpf_func_proto bpf_ringbuf_discard_proto = {
	.func = bpf_ringbuf_discard,
	.ret_type = RET_VOID,
	.arg1_type = ARG_PTR_TO_ALLOC_MEM,
	.arg2_type = ARG_ANYTHING,
};

BPF_CALL_4(bpf_ringbuf_output, struct bpf_map *, map, void *, data, u64, size,
	   u64, flags)
{
	struct bpf_ringbuf_map *rb_map;
	void *rec;

	if (unlikely(flags & ~(BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP)))
		return -EINVAL;

	rb_map = container_of(map, struct bpf_ringbuf_map, map);
	rec = __bpf_ringbuf_reserve(rb_map->rb, size);
	if (!rec)
		return -EAGAIN;

	memcpy(rec, data, size);
	bpf_ringbuf_commit(rec, flags, false /* discard */);
	return 0;
}

const struct bpf_func_proto bpf_ringbuf_output_proto = {
	.func = bpf_ringbuf_output,
	.ret_type = RET_INTEGER,
	.arg1_type = ARG_CONST_MAP_PTR,
	.arg2_type = ARG_PTR_TO_MEM,
	.arg3_type = ARG_CONST_SIZE_OR_ZERO,
	.arg4_type = ARG_ANYTHING,
};

BPF_CALL_2(bpf_ringbuf_query, struct bpf_map *, map, u64, flags)
{
	struct bpf_ringbuf *rb;

	rb = container_of(map, struct bpf_ringbuf_map, map)->rb;

	switch (flags) {
	case BPF_RB_AVAIL_DATA:
		return ringbuf_avail_data_sz(rb);
	case BPF_RB_RING_SIZE:
		return rb->mask + 1;
	case BPF_RB_CONS_POS:
		return smp_load_acquire(&rb->consumer_pos);
	case BPF_RB_PROD_POS:
		return smp_load_acquire(&rb->producer_pos);
	default:
		return 0;
	}
}

const struct bpf_func_proto bpf_ringbuf_query_proto = {
	.func = bpf_ringbuf_query,
	.ret_type = RET_INTEGER,
	.arg1_type = ARG_CONST_MAP_PTR,
	.arg2_type = ARG_ANYTHING,
};
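For context, a minimal sketch of how a BPF program would drive the reserve/commit API implemented above. The map definition and event layout are hypothetical; it only assumes the BPF_MAP_TYPE_RINGBUF map type and the bpf_ringbuf_reserve()/bpf_ringbuf_submit() helpers introduced by this series:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical BPF-side producer: reserve a sample, fill it, submit it. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct event {
	int pid;
	char comm[16];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024); /* must be a power-of-2 multiple of PAGE_SIZE */
} rb SEC(".maps");

SEC("tp/sched/sched_process_exec")
int handle_exec(void *ctx)
{
	struct event *e;

	/* reserve returns a referenced PTR_TO_MEM sample, or NULL on overflow */
	e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
	if (!e)
		return 0;

	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_get_current_comm(e->comm, sizeof(e->comm));

	/* submit (or discard) releases the reference taken by reserve */
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";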
@ -26,6 +26,8 @@
#include <linux/audit.h>
#include <uapi/linux/btf.h>
#include <linux/bpf_lsm.h>
#include <linux/poll.h>
#include <linux/bpf-netns.h>

#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@ -662,6 +664,16 @@ out:
	return err;
}

static __poll_t bpf_map_poll(struct file *filp, struct poll_table_struct *pts)
{
	struct bpf_map *map = filp->private_data;

	if (map->ops->map_poll)
		return map->ops->map_poll(map, filp, pts);

	return EPOLLERR;
}

const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
	.show_fdinfo = bpf_map_show_fdinfo,
@ -670,6 +682,7 @@ const struct file_operations bpf_map_fops = {
	.read = bpf_dummy_read,
	.write = bpf_dummy_write,
	.mmap = bpf_map_mmap,
	.poll = bpf_map_poll,
};

int bpf_map_new_fd(struct bpf_map *map, int flags)
@ -1387,7 +1400,7 @@ int generic_map_lookup_batch(struct bpf_map *map,

	buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);
	if (!buf) {
		kvfree(buf_prevkey);
		kfree(buf_prevkey);
		return -ENOMEM;
	}

@ -1472,7 +1485,8 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
	map = __bpf_map_get(f);
	if (IS_ERR(map))
		return PTR_ERR(map);
	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
		err = -EPERM;
		goto err_put;
	}
@ -2855,7 +2869,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
		ret = lirc_prog_attach(attr, prog);
		break;
	case BPF_PROG_TYPE_FLOW_DISSECTOR:
		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
		ret = netns_bpf_prog_attach(attr, prog);
		break;
	case BPF_PROG_TYPE_CGROUP_DEVICE:
	case BPF_PROG_TYPE_CGROUP_SKB:
@ -2895,7 +2909,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
	case BPF_PROG_TYPE_FLOW_DISSECTOR:
		if (!capable(CAP_NET_ADMIN))
			return -EPERM;
		return skb_flow_dissector_bpf_prog_detach(attr);
		return netns_bpf_prog_detach(attr);
	case BPF_PROG_TYPE_CGROUP_DEVICE:
	case BPF_PROG_TYPE_CGROUP_SKB:
	case BPF_PROG_TYPE_CGROUP_SOCK:
@ -2948,7 +2962,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
	case BPF_LIRC_MODE2:
		return lirc_prog_query(attr, uattr);
	case BPF_FLOW_DISSECTOR:
		return skb_flow_dissector_prog_query(attr, uattr);
		return netns_bpf_prog_query(attr, uattr);
	default:
		return -EINVAL;
	}
@ -3873,6 +3887,9 @@ static int link_create(union bpf_attr *attr)
	case BPF_PROG_TYPE_TRACING:
		ret = tracing_bpf_link_attach(attr, prog);
		break;
	case BPF_PROG_TYPE_FLOW_DISSECTOR:
		ret = netns_bpf_link_create(attr, prog);
		break;
	default:
		ret = -EINVAL;
	}
@ -3924,7 +3941,7 @@ static int link_update(union bpf_attr *attr)
	if (link->ops->update_prog)
		ret = link->ops->update_prog(link, new_prog, old_prog);
	else
		ret = EINVAL;
		ret = -EINVAL;

out_put_progs:
	if (old_prog)
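The .poll file operation wired up above is what lets a user-space consumer sleep on the ring buffer map fd. A minimal consumer sketch, assuming the libbpf ring_buffer__new()/ring_buffer__poll() API that ships alongside this kernel change (map_fd is a placeholder for an already created BPF_MAP_TYPE_RINGBUF fd):

#include <stdio.h>
#include <bpf/libbpf.h>

/* Called for each committed (non-discarded) sample, in submission order. */
static int handle_sample(void *ctx, void *data, size_t size)
{
	printf("sample of %zu bytes\n", size);
	return 0;
}

static int consume(int map_fd)
{
	struct ring_buffer *rb;
	int err;

	rb = ring_buffer__new(map_fd, handle_sample, NULL, NULL);
	if (!rb)
		return -1;

	/* epoll-based wait; wakeups come from the irq_work in bpf_ringbuf_commit() */
	while ((err = ring_buffer__poll(rb, 100 /* ms */)) >= 0)
		;

	ring_buffer__free(rb);
	return err;
}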
@ -233,6 +233,7 @@ struct bpf_call_arg_meta {
	bool pkt_access;
	int regno;
	int access_size;
	int mem_size;
	u64 msize_max_value;
	int ref_obj_id;
	int func_id;
@ -408,7 +409,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type type)
		type == PTR_TO_SOCKET_OR_NULL ||
		type == PTR_TO_SOCK_COMMON_OR_NULL ||
		type == PTR_TO_TCP_SOCK_OR_NULL ||
		type == PTR_TO_BTF_ID_OR_NULL;
		type == PTR_TO_BTF_ID_OR_NULL ||
		type == PTR_TO_MEM_OR_NULL;
}

static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
@ -422,7 +424,9 @@ static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type)
	return type == PTR_TO_SOCKET ||
		type == PTR_TO_SOCKET_OR_NULL ||
		type == PTR_TO_TCP_SOCK ||
		type == PTR_TO_TCP_SOCK_OR_NULL;
		type == PTR_TO_TCP_SOCK_OR_NULL ||
		type == PTR_TO_MEM ||
		type == PTR_TO_MEM_OR_NULL;
}

static bool arg_type_may_be_refcounted(enum bpf_arg_type type)
@ -436,7 +440,9 @@ static bool arg_type_may_be_refcounted(enum bpf_arg_type type)
 */
static bool is_release_function(enum bpf_func_id func_id)
{
	return func_id == BPF_FUNC_sk_release;
	return func_id == BPF_FUNC_sk_release ||
	       func_id == BPF_FUNC_ringbuf_submit ||
	       func_id == BPF_FUNC_ringbuf_discard;
}

static bool may_be_acquire_function(enum bpf_func_id func_id)
@ -444,7 +450,8 @@ static bool may_be_acquire_function(enum bpf_func_id func_id)
	return func_id == BPF_FUNC_sk_lookup_tcp ||
		func_id == BPF_FUNC_sk_lookup_udp ||
		func_id == BPF_FUNC_skc_lookup_tcp ||
		func_id == BPF_FUNC_map_lookup_elem;
		func_id == BPF_FUNC_map_lookup_elem ||
		func_id == BPF_FUNC_ringbuf_reserve;
}

static bool is_acquire_function(enum bpf_func_id func_id,
@ -454,7 +461,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,

	if (func_id == BPF_FUNC_sk_lookup_tcp ||
	    func_id == BPF_FUNC_sk_lookup_udp ||
	    func_id == BPF_FUNC_skc_lookup_tcp)
	    func_id == BPF_FUNC_skc_lookup_tcp ||
	    func_id == BPF_FUNC_ringbuf_reserve)
		return true;

	if (func_id == BPF_FUNC_map_lookup_elem &&
@ -494,6 +502,8 @@ static const char * const reg_type_str[] = {
	[PTR_TO_XDP_SOCK] = "xdp_sock",
	[PTR_TO_BTF_ID] = "ptr_",
	[PTR_TO_BTF_ID_OR_NULL] = "ptr_or_null_",
	[PTR_TO_MEM] = "mem",
	[PTR_TO_MEM_OR_NULL] = "mem_or_null",
};

static char slot_type_char[] = {
@ -2468,32 +2478,49 @@ static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
	return 0;
}

/* check read/write into map element returned by bpf_map_lookup_elem() */
static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
			      int size, bool zero_size_allowed)
/* check read/write into memory region (e.g., map value, ringbuf sample, etc) */
static int __check_mem_access(struct bpf_verifier_env *env, int regno,
			      int off, int size, u32 mem_size,
			      bool zero_size_allowed)
{
	struct bpf_reg_state *regs = cur_regs(env);
	struct bpf_map *map = regs[regno].map_ptr;
	bool size_ok = size > 0 || (size == 0 && zero_size_allowed);
	struct bpf_reg_state *reg;

	if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
	    off + size > map->value_size) {
	if (off >= 0 && size_ok && (u64)off + size <= mem_size)
		return 0;

	reg = &cur_regs(env)[regno];
	switch (reg->type) {
	case PTR_TO_MAP_VALUE:
		verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
			map->value_size, off, size);
		return -EACCES;
			mem_size, off, size);
		break;
	case PTR_TO_PACKET:
	case PTR_TO_PACKET_META:
	case PTR_TO_PACKET_END:
		verbose(env, "invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
			off, size, regno, reg->id, off, mem_size);
		break;
	case PTR_TO_MEM:
	default:
		verbose(env, "invalid access to memory, mem_size=%u off=%d size=%d\n",
			mem_size, off, size);
	}
	return 0;

	return -EACCES;
}

/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
			    int off, int size, bool zero_size_allowed)
/* check read/write into a memory region with possible variable offset */
static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
				   int off, int size, u32 mem_size,
				   bool zero_size_allowed)
{
	struct bpf_verifier_state *vstate = env->cur_state;
	struct bpf_func_state *state = vstate->frame[vstate->curframe];
	struct bpf_reg_state *reg = &state->regs[regno];
	int err;

	/* We may have adjusted the register to this map value, so we
	/* We may have adjusted the register pointing to memory region, so we
	 * need to try adding each of min_value and max_value to off
	 * to make sure our theoretical access will be safe.
	 */
@ -2514,10 +2541,10 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
			regno);
		return -EACCES;
	}
	err = __check_map_access(env, regno, reg->smin_value + off, size,
				 zero_size_allowed);
	err = __check_mem_access(env, regno, reg->smin_value + off, size,
				 mem_size, zero_size_allowed);
	if (err) {
		verbose(env, "R%d min value is outside of the array range\n",
		verbose(env, "R%d min value is outside of the allowed memory range\n",
			regno);
		return err;
	}
@ -2527,18 +2554,38 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
	 * If reg->umax_value + off could overflow, treat that as unbounded too.
	 */
	if (reg->umax_value >= BPF_MAX_VAR_OFF) {
		verbose(env, "R%d unbounded memory access, make sure to bounds check any array access into a map\n",
		verbose(env, "R%d unbounded memory access, make sure to bounds check any such access\n",
			regno);
		return -EACCES;
	}
	err = __check_map_access(env, regno, reg->umax_value + off, size,
				 zero_size_allowed);
	if (err)
		verbose(env, "R%d max value is outside of the array range\n",
	err = __check_mem_access(env, regno, reg->umax_value + off, size,
				 mem_size, zero_size_allowed);
	if (err) {
		verbose(env, "R%d max value is outside of the allowed memory range\n",
			regno);
		return err;
	}

	if (map_value_has_spin_lock(reg->map_ptr)) {
		u32 lock = reg->map_ptr->spin_lock_off;
	return 0;
}

/* check read/write into a map element with possible variable offset */
static int check_map_access(struct bpf_verifier_env *env, u32 regno,
			    int off, int size, bool zero_size_allowed)
{
	struct bpf_verifier_state *vstate = env->cur_state;
	struct bpf_func_state *state = vstate->frame[vstate->curframe];
	struct bpf_reg_state *reg = &state->regs[regno];
	struct bpf_map *map = reg->map_ptr;
	int err;

	err = check_mem_region_access(env, regno, off, size, map->value_size,
				      zero_size_allowed);
	if (err)
		return err;

	if (map_value_has_spin_lock(map)) {
		u32 lock = map->spin_lock_off;

		/* if any part of struct bpf_spin_lock can be touched by
		 * load/store reject this program.
@ -2596,21 +2643,6 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
	}
}

static int __check_packet_access(struct bpf_verifier_env *env, u32 regno,
				 int off, int size, bool zero_size_allowed)
{
	struct bpf_reg_state *regs = cur_regs(env);
	struct bpf_reg_state *reg = &regs[regno];

	if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
	    (u64)off + size > reg->range) {
		verbose(env, "invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
			off, size, regno, reg->id, reg->off, reg->range);
		return -EACCES;
	}
	return 0;
}

static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
			       int size, bool zero_size_allowed)
{
@ -2631,16 +2663,17 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
			regno);
		return -EACCES;
	}
	err = __check_packet_access(env, regno, off, size, zero_size_allowed);
	err = __check_mem_access(env, regno, off, size, reg->range,
				 zero_size_allowed);
	if (err) {
		verbose(env, "R%d offset is outside of the packet\n", regno);
		return err;
	}

	/* __check_packet_access has made sure "off + size - 1" is within u16.
	/* __check_mem_access has made sure "off + size - 1" is within u16.
	 * reg->umax_value can't be bigger than MAX_PACKET_OFF which is 0xffff,
	 * otherwise find_good_pkt_pointers would have refused to set range info
	 * that __check_packet_access would have rejected this pkt access.
	 * that __check_mem_access would have rejected this pkt access.
	 * Therefore, "off + reg->umax_value + size - 1" won't overflow u32.
	 */
	env->prog->aux->max_pkt_offset =
@ -3220,6 +3253,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
				mark_reg_unknown(env, regs, value_regno);
			}
		}
	} else if (reg->type == PTR_TO_MEM) {
		if (t == BPF_WRITE && value_regno >= 0 &&
		    is_pointer_value(env, value_regno)) {
			verbose(env, "R%d leaks addr into mem\n", value_regno);
			return -EACCES;
		}
		err = check_mem_region_access(env, regno, off, size,
					      reg->mem_size, false);
		if (!err && t == BPF_READ && value_regno >= 0)
			mark_reg_unknown(env, regs, value_regno);
	} else if (reg->type == PTR_TO_CTX) {
		enum bpf_reg_type reg_type = SCALAR_VALUE;
		u32 btf_id = 0;
@ -3557,6 +3600,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
			return -EACCES;
		return check_map_access(env, regno, reg->off, access_size,
					zero_size_allowed);
	case PTR_TO_MEM:
		return check_mem_region_access(env, regno, reg->off,
					       access_size, reg->mem_size,
					       zero_size_allowed);
	default: /* scalar_value|ptr_to_stack or invalid ptr */
		return check_stack_boundary(env, regno, access_size,
					    zero_size_allowed, meta);
@ -3661,6 +3708,17 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
	       type == ARG_CONST_SIZE_OR_ZERO;
}

static bool arg_type_is_alloc_mem_ptr(enum bpf_arg_type type)
{
	return type == ARG_PTR_TO_ALLOC_MEM ||
	       type == ARG_PTR_TO_ALLOC_MEM_OR_NULL;
}

static bool arg_type_is_alloc_size(enum bpf_arg_type type)
{
	return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
}

static bool arg_type_is_int_ptr(enum bpf_arg_type type)
{
	return type == ARG_PTR_TO_INT ||
@ -3720,7 +3778,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
		    type != expected_type)
			goto err_type;
	} else if (arg_type == ARG_CONST_SIZE ||
		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
		   arg_type == ARG_CONST_SIZE_OR_ZERO ||
		   arg_type == ARG_CONST_ALLOC_SIZE_OR_ZERO) {
		expected_type = SCALAR_VALUE;
		if (type != expected_type)
			goto err_type;
@ -3791,13 +3850,29 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
		 * happens during stack boundary checking.
		 */
		if (register_is_null(reg) &&
		    arg_type == ARG_PTR_TO_MEM_OR_NULL)
		    (arg_type == ARG_PTR_TO_MEM_OR_NULL ||
		     arg_type == ARG_PTR_TO_ALLOC_MEM_OR_NULL))
			/* final test in check_stack_boundary() */;
		else if (!type_is_pkt_pointer(type) &&
			 type != PTR_TO_MAP_VALUE &&
			 type != PTR_TO_MEM &&
			 type != expected_type)
			goto err_type;
		meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
	} else if (arg_type_is_alloc_mem_ptr(arg_type)) {
		expected_type = PTR_TO_MEM;
		if (register_is_null(reg) &&
		    arg_type == ARG_PTR_TO_ALLOC_MEM_OR_NULL)
			/* final test in check_stack_boundary() */;
		else if (type != expected_type)
			goto err_type;
		if (meta->ref_obj_id) {
			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
				regno, reg->ref_obj_id,
				meta->ref_obj_id);
			return -EFAULT;
		}
		meta->ref_obj_id = reg->ref_obj_id;
	} else if (arg_type_is_int_ptr(arg_type)) {
		expected_type = PTR_TO_STACK;
		if (!type_is_pkt_pointer(type) &&
@ -3893,6 +3968,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
					      zero_size_allowed, meta);
		if (!err)
			err = mark_chain_precision(env, regno);
	} else if (arg_type_is_alloc_size(arg_type)) {
		if (!tnum_is_const(reg->var_off)) {
			verbose(env, "R%d unbounded size, use 'var &= const' or 'if (var < const)'\n",
				regno);
			return -EACCES;
		}
		meta->mem_size = reg->var_off.value;
	} else if (arg_type_is_int_ptr(arg_type)) {
		int size = int_ptr_type_to_size(arg_type);

@ -3929,6 +4011,14 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
		    func_id != BPF_FUNC_xdp_output)
			goto error;
		break;
	case BPF_MAP_TYPE_RINGBUF:
		if (func_id != BPF_FUNC_ringbuf_output &&
		    func_id != BPF_FUNC_ringbuf_reserve &&
		    func_id != BPF_FUNC_ringbuf_submit &&
		    func_id != BPF_FUNC_ringbuf_discard &&
		    func_id != BPF_FUNC_ringbuf_query)
			goto error;
		break;
	case BPF_MAP_TYPE_STACK_TRACE:
		if (func_id != BPF_FUNC_get_stackid)
			goto error;
@ -4655,6 +4745,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
		mark_reg_known_zero(env, regs, BPF_REG_0);
		regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL;
		regs[BPF_REG_0].id = ++env->id_gen;
	} else if (fn->ret_type == RET_PTR_TO_ALLOC_MEM_OR_NULL) {
		mark_reg_known_zero(env, regs, BPF_REG_0);
		regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
		regs[BPF_REG_0].id = ++env->id_gen;
		regs[BPF_REG_0].mem_size = meta.mem_size;
	} else {
		verbose(env, "unknown return type %d of func %s#%d\n",
			fn->ret_type, func_id_name(func_id), func_id);
@ -6611,6 +6706,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
			reg->type = PTR_TO_TCP_SOCK;
		} else if (reg->type == PTR_TO_BTF_ID_OR_NULL) {
			reg->type = PTR_TO_BTF_ID;
		} else if (reg->type == PTR_TO_MEM_OR_NULL) {
			reg->type = PTR_TO_MEM;
		}
		if (is_null) {
			/* We don't need id and ref_obj_id from this point
@ -147,7 +147,7 @@ BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
	return ret;
}

static const struct bpf_func_proto bpf_probe_read_user_proto = {
const struct bpf_func_proto bpf_probe_read_user_proto = {
	.func = bpf_probe_read_user,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
@ -167,7 +167,7 @@ BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
	return ret;
}

static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
const struct bpf_func_proto bpf_probe_read_user_str_proto = {
	.func = bpf_probe_read_user_str,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
@ -198,7 +198,7 @@ BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
	return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
}

static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
const struct bpf_func_proto bpf_probe_read_kernel_proto = {
	.func = bpf_probe_read_kernel,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
@ -253,7 +253,7 @@ BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
	return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
}

static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
	.func = bpf_probe_read_kernel_str,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
@ -585,9 +585,9 @@ BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
			goto out;
		}

		err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
					  (void *) (long) args[fmt_cnt],
					  MAX_SEQ_PRINTF_STR_LEN);
		err = strncpy_from_unsafe_strict(bufs->buf[memcpy_cnt],
						 (void *) (long) args[fmt_cnt],
						 MAX_SEQ_PRINTF_STR_LEN);
		if (err < 0)
			bufs->buf[memcpy_cnt][0] = '\0';
		params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
@ -907,7 +907,7 @@ BPF_CALL_0(bpf_get_current_task)
	return (long) current;
}

static const struct bpf_func_proto bpf_get_current_task_proto = {
const struct bpf_func_proto bpf_get_current_task_proto = {
	.func = bpf_get_current_task,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
@ -1088,6 +1088,16 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_perf_event_read_value_proto;
	case BPF_FUNC_get_ns_current_pid_tgid:
		return &bpf_get_ns_current_pid_tgid_proto;
	case BPF_FUNC_ringbuf_output:
		return &bpf_ringbuf_output_proto;
	case BPF_FUNC_ringbuf_reserve:
		return &bpf_ringbuf_reserve_proto;
	case BPF_FUNC_ringbuf_submit:
		return &bpf_ringbuf_submit_proto;
	case BPF_FUNC_ringbuf_discard:
		return &bpf_ringbuf_discard_proto;
	case BPF_FUNC_ringbuf_query:
		return &bpf_ringbuf_query_proto;
	default:
		return NULL;
	}
@ -1457,7 +1467,7 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
	}
}

static const struct bpf_func_proto *
const struct bpf_func_proto *
tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
	switch (func_id) {
@ -5420,6 +5420,18 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
	struct bpf_prog *new = xdp->prog;
	int ret = 0;

	if (new) {
		u32 i;

		/* generic XDP does not work with DEVMAPs that can
		 * have a bpf_prog installed on an entry
		 */
		for (i = 0; i < new->aux->used_map_cnt; i++) {
			if (dev_map_can_have_prog(new->aux->used_maps[i]))
				return -EINVAL;
		}
	}

	switch (xdp->command) {
	case XDP_SETUP_PROG:
		rcu_assign_pointer(dev->xdp_prog, new);
@ -8835,6 +8847,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
			return -EINVAL;
		}

		if (prog->expected_attach_type == BPF_XDP_DEVMAP) {
			NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
			bpf_prog_put(prog);
			return -EINVAL;
		}

		/* prog->aux->id may be 0 for orphaned device-bound progs */
		if (prog->aux->id && prog->aux->id == prog_id) {
			bpf_prog_put(prog);
@ -4248,6 +4248,9 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
static int _bpf_setsockopt(struct sock *sk, int level, int optname,
			   char *optval, int optlen, u32 flags)
{
	char devname[IFNAMSIZ];
	struct net *net;
	int ifindex;
	int ret = 0;
	int val;

@ -4257,7 +4260,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
	sock_owned_by_me(sk);

	if (level == SOL_SOCKET) {
		if (optlen != sizeof(int))
		if (optlen != sizeof(int) && optname != SO_BINDTODEVICE)
			return -EINVAL;
		val = *((int *)optval);

@ -4298,6 +4301,29 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
				sk_dst_reset(sk);
			}
			break;
		case SO_BINDTODEVICE:
			ret = -ENOPROTOOPT;
#ifdef CONFIG_NETDEVICES
			optlen = min_t(long, optlen, IFNAMSIZ - 1);
			strncpy(devname, optval, optlen);
			devname[optlen] = 0;

			ifindex = 0;
			if (devname[0] != '\0') {
				struct net_device *dev;

				ret = -ENODEV;

				net = sock_net(sk);
				dev = dev_get_by_name(net, devname);
				if (!dev)
					break;
				ifindex = dev->ifindex;
				dev_put(dev);
			}
			ret = sock_bindtoindex(sk, ifindex, false);
#endif
			break;
		default:
			ret = -EINVAL;
		}
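The SO_BINDTODEVICE branch above becomes reachable from any BPF program type that may call bpf_setsockopt(). A hypothetical sketch from a BPF_PROG_TYPE_SOCK_OPS program; the interface name and the fallback option defines are placeholders, not part of this commit:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef SOL_SOCKET
#define SOL_SOCKET 1		/* value from asm-generic/socket.h */
#endif
#ifndef SO_BINDTODEVICE
#define SO_BINDTODEVICE 25	/* value from asm-generic/socket.h */
#endif

SEC("sockops")
int bind_outgoing_to_dev(struct bpf_sock_ops *skops)
{
	char dev[] = "eth0";	/* placeholder interface name */

	/* act when an active (outgoing) TCP connection is being set up */
	if (skops->op == BPF_SOCK_OPS_TCP_CONNECT_CB)
		bpf_setsockopt(skops, SOL_SOCKET, SO_BINDTODEVICE,
			       dev, sizeof(dev));
	return 1;
}

char LICENSE[] SEC("license") = "GPL";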
@ -6443,6 +6469,26 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_msg_push_data_proto;
	case BPF_FUNC_msg_pop_data:
		return &bpf_msg_pop_data_proto;
	case BPF_FUNC_perf_event_output:
		return &bpf_event_output_data_proto;
	case BPF_FUNC_get_current_uid_gid:
		return &bpf_get_current_uid_gid_proto;
	case BPF_FUNC_get_current_pid_tgid:
		return &bpf_get_current_pid_tgid_proto;
	case BPF_FUNC_sk_storage_get:
		return &bpf_sk_storage_get_proto;
	case BPF_FUNC_sk_storage_delete:
		return &bpf_sk_storage_delete_proto;
#ifdef CONFIG_CGROUPS
	case BPF_FUNC_get_current_cgroup_id:
		return &bpf_get_current_cgroup_id_proto;
	case BPF_FUNC_get_current_ancestor_cgroup_id:
		return &bpf_get_current_ancestor_cgroup_id_proto;
#endif
#ifdef CONFIG_CGROUP_NET_CLASSID
	case BPF_FUNC_get_cgroup_classid:
		return &bpf_get_cgroup_classid_curr_proto;
#endif
	default:
		return bpf_base_func_proto(func_id);
	}
@ -6829,6 +6875,7 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
	case offsetof(struct bpf_sock, protocol):
	case offsetof(struct bpf_sock, dst_port):
	case offsetof(struct bpf_sock, src_port):
	case offsetof(struct bpf_sock, rx_queue_mapping):
	case bpf_ctx_range(struct bpf_sock, src_ip4):
	case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
	case bpf_ctx_range(struct bpf_sock, dst_ip4):
@ -6994,6 +7041,13 @@ static bool xdp_is_valid_access(int off, int size,
				const struct bpf_prog *prog,
				struct bpf_insn_access_aux *info)
{
	if (prog->expected_attach_type != BPF_XDP_DEVMAP) {
		switch (off) {
		case offsetof(struct xdp_md, egress_ifindex):
			return false;
		}
	}

	if (type == BPF_WRITE) {
		if (bpf_prog_is_dev_bound(prog->aux)) {
			switch (off) {
@ -7257,6 +7311,11 @@ static bool sk_msg_is_valid_access(int off, int size,
		if (size != sizeof(__u64))
			return false;
		break;
	case offsetof(struct sk_msg_md, sk):
		if (size != sizeof(__u64))
			return false;
		info->reg_type = PTR_TO_SOCKET;
		break;
	case bpf_ctx_range(struct sk_msg_md, family):
	case bpf_ctx_range(struct sk_msg_md, remote_ip4):
	case bpf_ctx_range(struct sk_msg_md, local_ip4):
@ -7872,6 +7931,23 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
						      skc_state),
				       target_size));
		break;
	case offsetof(struct bpf_sock, rx_queue_mapping):
#ifdef CONFIG_XPS
		*insn++ = BPF_LDX_MEM(
			BPF_FIELD_SIZEOF(struct sock, sk_rx_queue_mapping),
			si->dst_reg, si->src_reg,
			bpf_target_off(struct sock, sk_rx_queue_mapping,
				       sizeof_field(struct sock,
						    sk_rx_queue_mapping),
				       target_size));
		*insn++ = BPF_JMP_IMM(BPF_JNE, si->dst_reg, NO_QUEUE_MAPPING,
				      1);
		*insn++ = BPF_MOV64_IMM(si->dst_reg, -1);
#else
		*insn++ = BPF_MOV64_IMM(si->dst_reg, -1);
		*target_size = 2;
#endif
		break;
	}

	return insn - insn_buf;
@ -7942,6 +8018,16 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
				      offsetof(struct xdp_rxq_info,
					       queue_index));
		break;
	case offsetof(struct xdp_md, egress_ifindex):
		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
				      si->dst_reg, si->src_reg,
				      offsetof(struct xdp_buff, txq));
		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
				      si->dst_reg, si->dst_reg,
				      offsetof(struct xdp_txq_info, dev));
		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
				      offsetof(struct net_device, ifindex));
		break;
	}

	return insn - insn_buf;
@ -8593,6 +8679,12 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
				      si->dst_reg, si->src_reg,
				      offsetof(struct sk_msg_sg, size));
		break;

	case offsetof(struct sk_msg_md, sk):
		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_msg, sk),
				      si->dst_reg, si->src_reg,
				      offsetof(struct sk_msg, sk));
		break;
	}

	return insn - insn_buf;
@ -31,8 +31,7 @@
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_labels.h>
#endif

static DEFINE_MUTEX(flow_dissector_mutex);
#include <linux/bpf-netns.h>

static void dissector_set_key(struct flow_dissector *flow_dissector,
			      enum flow_dissector_key_id key_id)
@ -70,54 +69,11 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
}
EXPORT_SYMBOL(skb_flow_dissector_init);

int skb_flow_dissector_prog_query(const union bpf_attr *attr,
				  union bpf_attr __user *uattr)
#ifdef CONFIG_BPF_SYSCALL
int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
{
	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
	u32 prog_id, prog_cnt = 0, flags = 0;
	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
	struct bpf_prog *attached;
	struct net *net;

	if (attr->query.query_flags)
		return -EINVAL;

	net = get_net_ns_by_fd(attr->query.target_fd);
	if (IS_ERR(net))
		return PTR_ERR(net);

	rcu_read_lock();
	attached = rcu_dereference(net->flow_dissector_prog);
	if (attached) {
		prog_cnt = 1;
		prog_id = attached->aux->id;
	}
	rcu_read_unlock();

	put_net(net);

	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
		return -EFAULT;
	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
		return -EFAULT;

	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
		return 0;

	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
		return -EFAULT;

	return 0;
}

int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
				       struct bpf_prog *prog)
{
	struct bpf_prog *attached;
	struct net *net;
	int ret = 0;

	net = current->nsproxy->net_ns;
	mutex_lock(&flow_dissector_mutex);

	if (net == &init_net) {
		/* BPF flow dissector in the root namespace overrides
@ -130,70 +86,29 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
		for_each_net(ns) {
			if (ns == &init_net)
				continue;
			if (rcu_access_pointer(ns->flow_dissector_prog)) {
				ret = -EEXIST;
				goto out;
			}
			if (rcu_access_pointer(ns->bpf.progs[type]))
				return -EEXIST;
		}
	} else {
		/* Make sure root flow dissector is not attached
		 * when attaching to the non-root namespace.
		 */
		if (rcu_access_pointer(init_net.flow_dissector_prog)) {
			ret = -EEXIST;
			goto out;
		}
		if (rcu_access_pointer(init_net.bpf.progs[type]))
			return -EEXIST;
	}

	attached = rcu_dereference_protected(net->flow_dissector_prog,
					     lockdep_is_held(&flow_dissector_mutex));
	if (attached == prog) {
	attached = rcu_dereference_protected(net->bpf.progs[type],
					     lockdep_is_held(&netns_bpf_mutex));
	if (attached == prog)
		/* The same program cannot be attached twice */
		ret = -EINVAL;
		goto out;
	}
	rcu_assign_pointer(net->flow_dissector_prog, prog);
		return -EINVAL;

	rcu_assign_pointer(net->bpf.progs[type], prog);
	if (attached)
		bpf_prog_put(attached);
out:
	mutex_unlock(&flow_dissector_mutex);
	return ret;
}

static int flow_dissector_bpf_prog_detach(struct net *net)
{
	struct bpf_prog *attached;

	mutex_lock(&flow_dissector_mutex);
	attached = rcu_dereference_protected(net->flow_dissector_prog,
					     lockdep_is_held(&flow_dissector_mutex));
	if (!attached) {
		mutex_unlock(&flow_dissector_mutex);
		return -ENOENT;
	}
	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
	bpf_prog_put(attached);
	mutex_unlock(&flow_dissector_mutex);
	return 0;
}

int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
{
	return flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
}

static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
{
	/* We're not racing with attach/detach because there are no
	 * references to netns left when pre_exit gets called.
	 */
	if (rcu_access_pointer(net->flow_dissector_prog))
		flow_dissector_bpf_prog_detach(net);
}

static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
	.pre_exit = flow_dissector_pernet_pre_exit,
};
#endif /* CONFIG_BPF_SYSCALL */

/**
 * __skb_flow_get_ports - extract the upper layer ports and return them
@ -1044,11 +959,13 @@ bool __skb_flow_dissect(const struct net *net,

	WARN_ON_ONCE(!net);
	if (net) {
		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;

		rcu_read_lock();
		attached = rcu_dereference(init_net.flow_dissector_prog);
		attached = rcu_dereference(init_net.bpf.progs[type]);

		if (!attached)
			attached = rcu_dereference(net->flow_dissector_prog);
			attached = rcu_dereference(net->bpf.progs[type]);

		if (attached) {
			struct bpf_flow_keys flow_keys;
@ -1869,7 +1786,6 @@ static int __init init_default_flow_dissectors(void)
	skb_flow_dissector_init(&flow_keys_basic_dissector,
				flow_keys_basic_dissector_keys,
				ARRAY_SIZE(flow_keys_basic_dissector_keys));

	return register_pernet_subsys(&flow_dissector_pernet_ops);
	return 0;
}
core_initcall(init_default_flow_dissectors);
@ -7,6 +7,7 @@
|
||||
|
||||
#include <net/sock.h>
|
||||
#include <net/tcp.h>
|
||||
#include <net/tls.h>
|
||||
|
||||
static bool sk_msg_try_coalesce_ok(struct sk_msg *msg, int elem_first_coalesce)
|
||||
{
|
||||
@ -682,13 +683,75 @@ static struct sk_psock *sk_psock_from_strp(struct strparser *strp)
|
||||
return container_of(parser, struct sk_psock, parser);
|
||||
}
|
||||
|
||||
static void sk_psock_verdict_apply(struct sk_psock *psock,
|
||||
struct sk_buff *skb, int verdict)
|
||||
static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb)
|
||||
{
|
||||
struct sk_psock *psock_other;
|
||||
struct sock *sk_other;
|
||||
bool ingress;
|
||||
|
||||
sk_other = tcp_skb_bpf_redirect_fetch(skb);
|
||||
if (unlikely(!sk_other)) {
|
||||
kfree_skb(skb);
|
||||
return;
|
||||
}
|
||||
psock_other = sk_psock(sk_other);
|
||||
if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
|
||||
!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
|
||||
kfree_skb(skb);
|
||||
return;
|
||||
}
|
||||
|
||||
ingress = tcp_skb_bpf_ingress(skb);
|
||||
if ((!ingress && sock_writeable(sk_other)) ||
|
||||
(ingress &&
|
||||
atomic_read(&sk_other->sk_rmem_alloc) <=
|
||||
sk_other->sk_rcvbuf)) {
|
||||
if (!ingress)
|
||||
skb_set_owner_w(skb, sk_other);
|
||||
skb_queue_tail(&psock_other->ingress_skb, skb);
|
||||
schedule_work(&psock_other->work);
|
||||
} else {
|
||||
kfree_skb(skb);
|
||||
}
|
||||
}
|
||||
|
||||
static void sk_psock_tls_verdict_apply(struct sk_psock *psock,
|
||||
struct sk_buff *skb, int verdict)
|
||||
{
|
||||
switch (verdict) {
|
||||
case __SK_REDIRECT:
|
||||
sk_psock_skb_redirect(psock, skb);
|
||||
break;
|
||||
case __SK_PASS:
|
||||
case __SK_DROP:
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
|
||||
{
|
||||
struct bpf_prog *prog;
|
||||
int ret = __SK_PASS;
|
||||
|
||||
rcu_read_lock();
|
||||
prog = READ_ONCE(psock->progs.skb_verdict);
|
||||
if (likely(prog)) {
|
||||
tcp_skb_bpf_redirect_clear(skb);
|
||||
ret = sk_psock_bpf_run(psock, prog, skb);
|
||||
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
|
||||
}
|
||||
rcu_read_unlock();
|
||||
sk_psock_tls_verdict_apply(psock, skb, ret);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
|
||||
|
||||
static void sk_psock_verdict_apply(struct sk_psock *psock,
|
||||
struct sk_buff *skb, int verdict)
|
||||
{
|
||||
struct sock *sk_other;
|
||||
|
||||
switch (verdict) {
|
||||
case __SK_PASS:
|
||||
sk_other = psock->sk;
|
||||
@ -707,25 +770,8 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
|
||||
}
|
||||
goto out_free;
|
||||
case __SK_REDIRECT:
|
||||
sk_other = tcp_skb_bpf_redirect_fetch(skb);
|
||||
if (unlikely(!sk_other))
|
||||
goto out_free;
|
||||
psock_other = sk_psock(sk_other);
|
||||
if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
|
||||
!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED))
|
||||
goto out_free;
|
||||
ingress = tcp_skb_bpf_ingress(skb);
|
||||
if ((!ingress && sock_writeable(sk_other)) ||
|
||||
(ingress &&
|
||||
atomic_read(&sk_other->sk_rmem_alloc) <=
|
||||
sk_other->sk_rcvbuf)) {
|
||||
if (!ingress)
|
||||
skb_set_owner_w(skb, sk_other);
|
||||
skb_queue_tail(&psock_other->ingress_skb, skb);
|
||||
schedule_work(&psock_other->work);
|
||||
break;
|
||||
}
|
||||
/* fall-through */
|
||||
sk_psock_skb_redirect(psock, skb);
|
||||
break;
|
||||
case __SK_DROP:
|
||||
/* fall-through */
|
||||
default:
|
||||
@ -779,9 +825,13 @@ static void sk_psock_strp_data_ready(struct sock *sk)
|
||||
rcu_read_lock();
|
||||
psock = sk_psock(sk);
|
||||
if (likely(psock)) {
|
||||
write_lock_bh(&sk->sk_callback_lock);
|
||||
strp_data_ready(&psock->parser.strp);
|
||||
write_unlock_bh(&sk->sk_callback_lock);
|
||||
if (tls_sw_has_ctx_rx(sk)) {
|
||||
psock->parser.saved_data_ready(sk);
|
||||
} else {
|
||||
write_lock_bh(&sk->sk_callback_lock);
|
||||
strp_data_ready(&psock->parser.strp);
|
||||
write_unlock_bh(&sk->sk_callback_lock);
|
||||
}
|
||||
}
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
@ -594,13 +594,15 @@ out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
int sock_bindtoindex(struct sock *sk, int ifindex)
|
||||
int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk)
|
||||
{
|
||||
int ret;
|
||||
|
||||
lock_sock(sk);
|
||||
if (lock_sk)
|
||||
lock_sock(sk);
|
||||
ret = sock_bindtoindex_locked(sk, ifindex);
|
||||
release_sock(sk);
|
||||
if (lock_sk)
|
||||
release_sock(sk);
|
||||
|
||||
return ret;
|
||||
}
|
||||
@ -646,7 +648,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
|
||||
goto out;
|
||||
}
|
||||
|
||||
return sock_bindtoindex(sk, index);
|
||||
return sock_bindtoindex(sk, index, true);
|
||||
out:
|
||||
#endif
|
||||
|
||||
|
@ -22,7 +22,7 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
|
||||
goto error;
|
||||
|
||||
if (cfg->bind_ifindex) {
|
||||
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex);
|
||||
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex, true);
|
||||
if (err < 0)
|
||||
goto error;
|
||||
}
|
||||
|
@ -30,7 +30,7 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
|
||||
goto error;
|
||||
}
|
||||
if (cfg->bind_ifindex) {
|
||||
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex);
|
||||
err = sock_bindtoindex(sock->sk, cfg->bind_ifindex, true);
|
||||
if (err < 0)
|
||||
goto error;
|
||||
}
|
||||
|
@ -1742,6 +1742,7 @@ int tls_sw_recvmsg(struct sock *sk,
|
||||
long timeo;
|
||||
bool is_kvec = iov_iter_is_kvec(&msg->msg_iter);
|
||||
bool is_peek = flags & MSG_PEEK;
|
||||
bool bpf_strp_enabled;
|
||||
int num_async = 0;
|
||||
int pending;
|
||||
|
||||
@ -1752,6 +1753,7 @@ int tls_sw_recvmsg(struct sock *sk,
|
||||
|
||||
psock = sk_psock_get(sk);
|
||||
lock_sock(sk);
|
||||
bpf_strp_enabled = sk_psock_strp_enabled(psock);
|
||||
|
||||
/* Process pending decrypted records. It must be non-zero-copy */
|
||||
err = process_rx_list(ctx, msg, &control, &cmsg, 0, len, false,
|
||||
@ -1805,11 +1807,12 @@ int tls_sw_recvmsg(struct sock *sk,
|
||||
|
||||
if (to_decrypt <= len && !is_kvec && !is_peek &&
|
||||
ctx->control == TLS_RECORD_TYPE_DATA &&
|
||||
prot->version != TLS_1_3_VERSION)
|
||||
prot->version != TLS_1_3_VERSION &&
|
||||
!bpf_strp_enabled)
|
||||
zc = true;
|
||||
|
||||
/* Do not use async mode if record is non-data */
|
||||
if (ctx->control == TLS_RECORD_TYPE_DATA)
|
||||
if (ctx->control == TLS_RECORD_TYPE_DATA && !bpf_strp_enabled)
|
||||
async_capable = ctx->async_capable;
|
||||
else
|
||||
async_capable = false;
|
||||
@ -1859,6 +1862,19 @@ int tls_sw_recvmsg(struct sock *sk,
|
||||
goto pick_next_record;
|
||||
|
||||
if (!zc) {
|
||||
if (bpf_strp_enabled) {
|
||||
err = sk_psock_tls_strp_read(psock, skb);
|
||||
if (err != __SK_PASS) {
|
||||
rxm->offset = rxm->offset + rxm->full_len;
|
||||
rxm->full_len = 0;
|
||||
if (err == __SK_DROP)
|
||||
consume_skb(skb);
|
||||
ctx->recv_pkt = NULL;
|
||||
__strp_unpause(&ctx->strp);
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
if (rxm->full_len > len) {
|
||||
retain_skb = true;
|
||||
chunk = len;
|
||||
|
@ -553,7 +553,7 @@ static int do_dump(int argc, char **argv)
|
||||
btf = btf__parse_elf(*argv, NULL);
|
||||
|
||||
if (IS_ERR(btf)) {
|
||||
err = PTR_ERR(btf);
|
||||
err = -PTR_ERR(btf);
|
||||
btf = NULL;
|
||||
p_err("failed to load BTF from %s: %s",
|
||||
*argv, strerror(err));
|
||||
@ -951,9 +951,9 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s btf { show | list } [id BTF_ID]\n"
|
||||
" %s btf dump BTF_SRC [format FORMAT]\n"
|
||||
" %s btf help\n"
|
||||
"Usage: %1$s %2$s { show | list } [id BTF_ID]\n"
|
||||
" %1$s %2$s dump BTF_SRC [format FORMAT]\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" BTF_SRC := { id BTF_ID | prog PROG | map MAP [{key | value | kv | all}] | file FILE }\n"
|
||||
" FORMAT := { raw | c }\n"
|
||||
@ -961,7 +961,7 @@ static int do_help(int argc, char **argv)
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, bin_name, bin_name);
|
||||
bin_name, "btf");
|
||||
|
||||
return 0;
|
||||
}
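The do_help() changes above replace repeated "%s %s" pairs with POSIX positional conversions ("%1$s", "%2$s"), so the binary name and sub-command are passed to fprintf() only once. A minimal standalone sketch of the same technique (not bpftool code, names are illustrative):

#include <stdio.h>

int main(void)
{
	const char *bin_name = "bpftool", *cmd = "btf";

	/* "%1$s" and "%2$s" are positional conversion specifiers:
	 * every usage line reuses the same two arguments, so they
	 * only need to appear once in the argument list.
	 */
	fprintf(stderr,
		"Usage: %1$s %2$s { show | list } [id BTF_ID]\n"
		"       %1$s %2$s help\n",
		bin_name, cmd);
	return 0;
}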
|
||||
|
@ -491,20 +491,18 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } CGROUP [**effective**]\n"
|
||||
" %s %s tree [CGROUP_ROOT] [**effective**]\n"
|
||||
" %s %s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
|
||||
" %s %s detach CGROUP ATTACH_TYPE PROG\n"
|
||||
" %s %s help\n"
|
||||
"Usage: %1$s %2$s { show | list } CGROUP [**effective**]\n"
|
||||
" %1$s %2$s tree [CGROUP_ROOT] [**effective**]\n"
|
||||
" %1$s %2$s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
|
||||
" %1$s %2$s detach CGROUP ATTACH_TYPE PROG\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
HELP_SPEC_ATTACH_TYPES "\n"
|
||||
" " HELP_SPEC_ATTACH_FLAGS "\n"
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2]);
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -758,11 +758,29 @@ static void section_misc(const char *define_prefix, __u32 ifindex)
|
||||
print_end_section();
|
||||
}
|
||||
|
||||
#ifdef USE_LIBCAP
|
||||
#define capability(c) { c, false, #c }
|
||||
#define capability_msg(a, i) a[i].set ? "" : a[i].name, a[i].set ? "" : ", "
|
||||
#endif
|
||||
|
||||
static int handle_perms(void)
|
||||
{
|
||||
#ifdef USE_LIBCAP
|
||||
cap_value_t cap_list[1] = { CAP_SYS_ADMIN };
|
||||
bool has_sys_admin_cap = false;
|
||||
struct {
|
||||
cap_value_t cap;
|
||||
bool set;
|
||||
char name[14]; /* strlen("CAP_SYS_ADMIN") */
|
||||
} bpf_caps[] = {
|
||||
capability(CAP_SYS_ADMIN),
|
||||
#ifdef CAP_BPF
|
||||
capability(CAP_BPF),
|
||||
capability(CAP_NET_ADMIN),
|
||||
capability(CAP_PERFMON),
|
||||
#endif
|
||||
};
|
||||
cap_value_t cap_list[ARRAY_SIZE(bpf_caps)];
|
||||
unsigned int i, nb_bpf_caps = 0;
|
||||
bool cap_sys_admin_only = true;
|
||||
cap_flag_value_t val;
|
||||
int res = -1;
|
||||
cap_t caps;
|
||||
@ -774,35 +792,64 @@ static int handle_perms(void)
|
||||
return -1;
|
||||
}
|
||||
|
||||
if (cap_get_flag(caps, CAP_SYS_ADMIN, CAP_EFFECTIVE, &val)) {
|
||||
p_err("bug: failed to retrieve CAP_SYS_ADMIN status");
|
||||
goto exit_free;
|
||||
}
|
||||
if (val == CAP_SET)
|
||||
has_sys_admin_cap = true;
|
||||
#ifdef CAP_BPF
|
||||
if (CAP_IS_SUPPORTED(CAP_BPF))
|
||||
cap_sys_admin_only = false;
|
||||
#endif
|
||||
|
||||
if (!run_as_unprivileged && !has_sys_admin_cap) {
|
||||
p_err("full feature probing requires CAP_SYS_ADMIN, run as root or use 'unprivileged'");
|
||||
goto exit_free;
|
||||
for (i = 0; i < ARRAY_SIZE(bpf_caps); i++) {
|
||||
const char *cap_name = bpf_caps[i].name;
|
||||
cap_value_t cap = bpf_caps[i].cap;
|
||||
|
||||
if (cap_get_flag(caps, cap, CAP_EFFECTIVE, &val)) {
|
||||
p_err("bug: failed to retrieve %s status: %s", cap_name,
|
||||
strerror(errno));
|
||||
goto exit_free;
|
||||
}
|
||||
|
||||
if (val == CAP_SET) {
|
||||
bpf_caps[i].set = true;
|
||||
cap_list[nb_bpf_caps++] = cap;
|
||||
}
|
||||
|
||||
if (cap_sys_admin_only)
|
||||
/* System does not know about CAP_BPF, meaning that
|
||||
* CAP_SYS_ADMIN is the only capability required. We
|
||||
* just checked it, break.
|
||||
*/
|
||||
break;
|
||||
}
|
||||
|
||||
if ((run_as_unprivileged && !has_sys_admin_cap) ||
|
||||
(!run_as_unprivileged && has_sys_admin_cap)) {
|
||||
if ((run_as_unprivileged && !nb_bpf_caps) ||
|
||||
(!run_as_unprivileged && nb_bpf_caps == ARRAY_SIZE(bpf_caps)) ||
|
||||
(!run_as_unprivileged && cap_sys_admin_only && nb_bpf_caps)) {
|
||||
/* We are all good, exit now */
|
||||
res = 0;
|
||||
goto exit_free;
|
||||
}
|
||||
|
||||
/* if (run_as_unprivileged && has_sys_admin_cap), drop CAP_SYS_ADMIN */
|
||||
if (!run_as_unprivileged) {
|
||||
if (cap_sys_admin_only)
|
||||
p_err("missing %s, required for full feature probing; run as root or use 'unprivileged'",
|
||||
bpf_caps[0].name);
|
||||
else
|
||||
p_err("missing %s%s%s%s%s%s%s%srequired for full feature probing; run as root or use 'unprivileged'",
|
||||
capability_msg(bpf_caps, 0),
|
||||
capability_msg(bpf_caps, 1),
|
||||
capability_msg(bpf_caps, 2),
|
||||
capability_msg(bpf_caps, 3));
|
||||
goto exit_free;
|
||||
}
|
||||
|
||||
if (cap_set_flag(caps, CAP_EFFECTIVE, ARRAY_SIZE(cap_list), cap_list,
|
||||
/* if (run_as_unprivileged && nb_bpf_caps > 0), drop capabilities. */
|
||||
if (cap_set_flag(caps, CAP_EFFECTIVE, nb_bpf_caps, cap_list,
|
||||
CAP_CLEAR)) {
|
||||
p_err("bug: failed to clear CAP_SYS_ADMIN from capabilities");
|
||||
p_err("bug: failed to clear capabilities: %s", strerror(errno));
|
||||
goto exit_free;
|
||||
}
|
||||
|
||||
if (cap_set_proc(caps)) {
|
||||
p_err("failed to drop CAP_SYS_ADMIN: %s", strerror(errno));
|
||||
p_err("failed to drop capabilities: %s", strerror(errno));
|
||||
goto exit_free;
|
||||
}
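The loop above probes each candidate capability with cap_get_flag() and later clears whatever subset was found. A stripped-down sketch of that libcap pattern for a single hard-coded capability (hypothetical helper, error handling reduced, build with -lcap; not part of bpftool):

/* Hypothetical sketch: read the effective flag for CAP_SYS_ADMIN,
 * then drop it from the effective set if it is present.
 */
#include <sys/capability.h>

int drop_sys_admin(void)
{
	cap_value_t cap = CAP_SYS_ADMIN;
	cap_flag_value_t val;
	cap_t caps;
	int err = -1;

	caps = cap_get_proc();
	if (!caps)
		return -1;
	if (cap_get_flag(caps, cap, CAP_EFFECTIVE, &val))
		goto out;
	if (val == CAP_SET &&
	    (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_CLEAR) ||
	     cap_set_proc(caps)))
		goto out;
	err = 0;
out:
	cap_free(caps);
	return err;
}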
|
||||
|
||||
@ -817,7 +864,7 @@ exit_free:
|
||||
|
||||
return res;
|
||||
#else
|
||||
/* Detection assumes user has sufficient privileges (CAP_SYS_ADMIN).
|
||||
/* Detection assumes user has specific privileges.
|
||||
* We do not use libcap so let's approximate, and restrict usage to
|
||||
* root user only.
|
||||
*/
|
||||
@ -901,7 +948,7 @@ static int do_probe(int argc, char **argv)
|
||||
}
|
||||
}
|
||||
|
||||
/* Full feature detection requires CAP_SYS_ADMIN privilege.
|
||||
/* Full feature detection requires specific privileges.
|
||||
* Let's approximate, and warn if user is not root.
|
||||
*/
|
||||
if (handle_perms())
|
||||
@ -937,12 +984,12 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s probe [COMPONENT] [full] [unprivileged] [macros [prefix PREFIX]]\n"
|
||||
" %s %s help\n"
|
||||
"Usage: %1$s %2$s probe [COMPONENT] [full] [unprivileged] [macros [prefix PREFIX]]\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" COMPONENT := { kernel | dev NAME }\n"
|
||||
"",
|
||||
bin_name, argv[-2], bin_name, argv[-2]);
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -586,12 +586,12 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %1$s gen skeleton FILE\n"
|
||||
" %1$s gen help\n"
|
||||
"Usage: %1$s %2$s skeleton FILE\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name);
|
||||
bin_name, "gen");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -68,10 +68,10 @@ close_obj:
|
||||
static int do_help(int argc, char **argv)
|
||||
{
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s pin OBJ PATH\n"
|
||||
" %s %s help\n"
|
||||
"\n",
|
||||
bin_name, argv[-2], bin_name, argv[-2]);
|
||||
"Usage: %1$s %2$s pin OBJ PATH\n"
|
||||
" %1$s %2$s help\n"
|
||||
"",
|
||||
bin_name, "iter");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
|
||||
[BPF_LINK_TYPE_TRACING] = "tracing",
|
||||
[BPF_LINK_TYPE_CGROUP] = "cgroup",
|
||||
[BPF_LINK_TYPE_ITER] = "iter",
|
||||
[BPF_LINK_TYPE_NETNS] = "netns",
|
||||
};
|
||||
|
||||
static int link_parse_fd(int *argc, char ***argv)
|
||||
@ -62,6 +63,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
|
||||
jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
|
||||
}
|
||||
|
||||
static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)
|
||||
{
|
||||
if (attach_type < ARRAY_SIZE(attach_type_name))
|
||||
jsonw_string_field(wtr, "attach_type",
|
||||
attach_type_name[attach_type]);
|
||||
else
|
||||
jsonw_uint_field(wtr, "attach_type", attach_type);
|
||||
}
|
||||
|
||||
static int get_prog_info(int prog_id, struct bpf_prog_info *info)
|
||||
{
|
||||
__u32 len = sizeof(*info);
|
||||
@ -105,22 +115,18 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
|
||||
jsonw_uint_field(json_wtr, "prog_type",
|
||||
prog_info.type);
|
||||
|
||||
if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
|
||||
jsonw_string_field(json_wtr, "attach_type",
|
||||
attach_type_name[info->tracing.attach_type]);
|
||||
else
|
||||
jsonw_uint_field(json_wtr, "attach_type",
|
||||
info->tracing.attach_type);
|
||||
show_link_attach_type_json(info->tracing.attach_type,
|
||||
json_wtr);
|
||||
break;
|
||||
case BPF_LINK_TYPE_CGROUP:
|
||||
jsonw_lluint_field(json_wtr, "cgroup_id",
|
||||
info->cgroup.cgroup_id);
|
||||
if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
|
||||
jsonw_string_field(json_wtr, "attach_type",
|
||||
attach_type_name[info->cgroup.attach_type]);
|
||||
else
|
||||
jsonw_uint_field(json_wtr, "attach_type",
|
||||
info->cgroup.attach_type);
|
||||
show_link_attach_type_json(info->cgroup.attach_type, json_wtr);
|
||||
break;
|
||||
case BPF_LINK_TYPE_NETNS:
|
||||
jsonw_uint_field(json_wtr, "netns_ino",
|
||||
info->netns.netns_ino);
|
||||
show_link_attach_type_json(info->netns.attach_type, json_wtr);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
@ -153,6 +159,14 @@ static void show_link_header_plain(struct bpf_link_info *info)
|
||||
printf("prog %u ", info->prog_id);
|
||||
}
|
||||
|
||||
static void show_link_attach_type_plain(__u32 attach_type)
|
||||
{
|
||||
if (attach_type < ARRAY_SIZE(attach_type_name))
|
||||
printf("attach_type %s ", attach_type_name[attach_type]);
|
||||
else
|
||||
printf("attach_type %u ", attach_type);
|
||||
}
|
||||
|
||||
static int show_link_close_plain(int fd, struct bpf_link_info *info)
|
||||
{
|
||||
struct bpf_prog_info prog_info;
|
||||
@ -176,19 +190,15 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
|
||||
else
|
||||
printf("\n\tprog_type %u ", prog_info.type);
|
||||
|
||||
if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
|
||||
printf("attach_type %s ",
|
||||
attach_type_name[info->tracing.attach_type]);
|
||||
else
|
||||
printf("attach_type %u ", info->tracing.attach_type);
|
||||
show_link_attach_type_plain(info->tracing.attach_type);
|
||||
break;
|
||||
case BPF_LINK_TYPE_CGROUP:
|
||||
printf("\n\tcgroup_id %zu ", (size_t)info->cgroup.cgroup_id);
|
||||
if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
|
||||
printf("attach_type %s ",
|
||||
attach_type_name[info->cgroup.attach_type]);
|
||||
else
|
||||
printf("attach_type %u ", info->cgroup.attach_type);
|
||||
show_link_attach_type_plain(info->cgroup.attach_type);
|
||||
break;
|
||||
case BPF_LINK_TYPE_NETNS:
|
||||
printf("\n\tnetns_ino %u ", info->netns.netns_ino);
|
||||
show_link_attach_type_plain(info->netns.attach_type);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
@ -312,7 +322,6 @@ static int do_help(int argc, char **argv)
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" " HELP_SPEC_LINK "\n"
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, argv[-2]);
|
||||
|
@ -1561,24 +1561,24 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } [MAP]\n"
|
||||
" %s %s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
|
||||
" entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
|
||||
" [dev NAME]\n"
|
||||
" %s %s dump MAP\n"
|
||||
" %s %s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
|
||||
" %s %s lookup MAP [key DATA]\n"
|
||||
" %s %s getnext MAP [key DATA]\n"
|
||||
" %s %s delete MAP key DATA\n"
|
||||
" %s %s pin MAP FILE\n"
|
||||
" %s %s event_pipe MAP [cpu N index M]\n"
|
||||
" %s %s peek MAP\n"
|
||||
" %s %s push MAP value VALUE\n"
|
||||
" %s %s pop MAP\n"
|
||||
" %s %s enqueue MAP value VALUE\n"
|
||||
" %s %s dequeue MAP\n"
|
||||
" %s %s freeze MAP\n"
|
||||
" %s %s help\n"
|
||||
"Usage: %1$s %2$s { show | list } [MAP]\n"
|
||||
" %1$s %2$s create FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
|
||||
" entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
|
||||
" [dev NAME]\n"
|
||||
" %1$s %2$s dump MAP\n"
|
||||
" %1$s %2$s update MAP [key DATA] [value VALUE] [UPDATE_FLAGS]\n"
|
||||
" %1$s %2$s lookup MAP [key DATA]\n"
|
||||
" %1$s %2$s getnext MAP [key DATA]\n"
|
||||
" %1$s %2$s delete MAP key DATA\n"
|
||||
" %1$s %2$s pin MAP FILE\n"
|
||||
" %1$s %2$s event_pipe MAP [cpu N index M]\n"
|
||||
" %1$s %2$s peek MAP\n"
|
||||
" %1$s %2$s push MAP value VALUE\n"
|
||||
" %1$s %2$s pop MAP\n"
|
||||
" %1$s %2$s enqueue MAP value VALUE\n"
|
||||
" %1$s %2$s dequeue MAP\n"
|
||||
" %1$s %2$s freeze MAP\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" " HELP_SPEC_MAP "\n"
|
||||
" DATA := { [hex] BYTES }\n"
|
||||
@ -1593,11 +1593,6 @@ static int do_help(int argc, char **argv)
|
||||
" queue | stack | sk_storage | struct_ops }\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
|
@ -458,10 +458,10 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } [dev <devname>]\n"
|
||||
" %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
|
||||
" %s %s detach ATTACH_TYPE dev <devname>\n"
|
||||
" %s %s help\n"
|
||||
"Usage: %1$s %2$s { show | list } [dev <devname>]\n"
|
||||
" %1$s %2$s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
|
||||
" %1$s %2$s detach ATTACH_TYPE dev <devname>\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
" ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
|
||||
@ -470,8 +470,8 @@ static int do_help(int argc, char **argv)
|
||||
" For progs attached to cgroups, use \"bpftool cgroup\"\n"
|
||||
" to dump program attachments. For program types\n"
|
||||
" sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
|
||||
" consult iproute2.\n",
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
" consult iproute2.\n"
|
||||
"",
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
|
@ -231,7 +231,7 @@ static int do_show(int argc, char **argv)
|
||||
static int do_help(int argc, char **argv)
|
||||
{
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list | help }\n"
|
||||
"Usage: %1$s %2$s { show | list | help }\n"
|
||||
"",
|
||||
bin_name, argv[-2]);
|
||||
|
||||
|
@ -1984,24 +1984,24 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } [PROG]\n"
|
||||
" %s %s dump xlated PROG [{ file FILE | opcodes | visual | linum }]\n"
|
||||
" %s %s dump jited PROG [{ file FILE | opcodes | linum }]\n"
|
||||
" %s %s pin PROG FILE\n"
|
||||
" %s %s { load | loadall } OBJ PATH \\\n"
|
||||
"Usage: %1$s %2$s { show | list } [PROG]\n"
|
||||
" %1$s %2$s dump xlated PROG [{ file FILE | opcodes | visual | linum }]\n"
|
||||
" %1$s %2$s dump jited PROG [{ file FILE | opcodes | linum }]\n"
|
||||
" %1$s %2$s pin PROG FILE\n"
|
||||
" %1$s %2$s { load | loadall } OBJ PATH \\\n"
|
||||
" [type TYPE] [dev NAME] \\\n"
|
||||
" [map { idx IDX | name NAME } MAP]\\\n"
|
||||
" [pinmaps MAP_DIR]\n"
|
||||
" %s %s attach PROG ATTACH_TYPE [MAP]\n"
|
||||
" %s %s detach PROG ATTACH_TYPE [MAP]\n"
|
||||
" %s %s run PROG \\\n"
|
||||
" %1$s %2$s attach PROG ATTACH_TYPE [MAP]\n"
|
||||
" %1$s %2$s detach PROG ATTACH_TYPE [MAP]\n"
|
||||
" %1$s %2$s run PROG \\\n"
|
||||
" data_in FILE \\\n"
|
||||
" [data_out FILE [data_size_out L]] \\\n"
|
||||
" [ctx_in FILE [ctx_out FILE [ctx_size_out M]]] \\\n"
|
||||
" [repeat N]\n"
|
||||
" %s %s profile PROG [duration DURATION] METRICs\n"
|
||||
" %s %s tracelog\n"
|
||||
" %s %s help\n"
|
||||
" %1$s %2$s profile PROG [duration DURATION] METRICs\n"
|
||||
" %1$s %2$s tracelog\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" " HELP_SPEC_MAP "\n"
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
@ -2022,10 +2022,7 @@ static int do_help(int argc, char **argv)
|
||||
" METRIC := { cycles | instructions | l1d_loads | llc_misses }\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2]);
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -566,16 +566,15 @@ static int do_help(int argc, char **argv)
|
||||
}
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } [STRUCT_OPS_MAP]\n"
|
||||
" %s %s dump [STRUCT_OPS_MAP]\n"
|
||||
" %s %s register OBJ\n"
|
||||
" %s %s unregister STRUCT_OPS_MAP\n"
|
||||
" %s %s help\n"
|
||||
"Usage: %1$s %2$s { show | list } [STRUCT_OPS_MAP]\n"
|
||||
" %1$s %2$s dump [STRUCT_OPS_MAP]\n"
|
||||
" %1$s %2$s register OBJ\n"
|
||||
" %1$s %2$s unregister STRUCT_OPS_MAP\n"
|
||||
" %1$s %2$s help\n"
|
||||
"\n"
|
||||
" OPTIONS := { {-j|--json} [{-p|--pretty}] }\n"
|
||||
" STRUCT_OPS_MAP := [ id STRUCT_OPS_MAP_ID | name STRUCT_OPS_MAP_NAME ]\n",
|
||||
bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2],
|
||||
" STRUCT_OPS_MAP := [ id STRUCT_OPS_MAP_ID | name STRUCT_OPS_MAP_NAME ]\n"
|
||||
"",
|
||||
bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
|
@ -147,6 +147,7 @@ enum bpf_map_type {
|
||||
BPF_MAP_TYPE_SK_STORAGE,
|
||||
BPF_MAP_TYPE_DEVMAP_HASH,
|
||||
BPF_MAP_TYPE_STRUCT_OPS,
|
||||
BPF_MAP_TYPE_RINGBUF,
|
||||
};
|
||||
|
||||
/* Note that tracing related programs such as
|
||||
@ -224,6 +225,7 @@ enum bpf_attach_type {
|
||||
BPF_CGROUP_INET6_GETPEERNAME,
|
||||
BPF_CGROUP_INET4_GETSOCKNAME,
|
||||
BPF_CGROUP_INET6_GETSOCKNAME,
|
||||
BPF_XDP_DEVMAP,
|
||||
__MAX_BPF_ATTACH_TYPE
|
||||
};
|
||||
|
||||
@ -235,6 +237,7 @@ enum bpf_link_type {
|
||||
BPF_LINK_TYPE_TRACING = 2,
|
||||
BPF_LINK_TYPE_CGROUP = 3,
|
||||
BPF_LINK_TYPE_ITER = 4,
|
||||
BPF_LINK_TYPE_NETNS = 5,
|
||||
|
||||
MAX_BPF_LINK_TYPE,
|
||||
};
|
||||
@ -3157,6 +3160,59 @@ union bpf_attr {
|
||||
* **bpf_sk_cgroup_id**\ ().
|
||||
* Return
|
||||
* The id is returned or 0 in case the id could not be retrieved.
|
||||
*
|
||||
* void *bpf_ringbuf_output(void *ringbuf, void *data, u64 size, u64 flags)
|
||||
* Description
|
||||
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* 0, on success;
|
||||
* < 0, on error.
|
||||
*
|
||||
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
|
||||
* Description
|
||||
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
|
||||
* Return
|
||||
* Valid pointer with *size* bytes of memory available; NULL,
|
||||
* otherwise.
|
||||
*
|
||||
* void bpf_ringbuf_submit(void *data, u64 flags)
|
||||
* Description
|
||||
* Submit reserved ring buffer sample, pointed to by *data*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* Nothing. Always succeeds.
|
||||
*
|
||||
* void bpf_ringbuf_discard(void *data, u64 flags)
|
||||
* Description
|
||||
* Discard reserved ring buffer sample, pointed to by *data*.
|
||||
* If BPF_RB_NO_WAKEUP is specified in *flags*, no notification of
|
||||
* new data availability is sent.
|
||||
* If BPF_RB_FORCE_WAKEUP is specified in *flags*, notification of
|
||||
* new data availability is sent unconditionally.
|
||||
* Return
|
||||
* Nothing. Always succeeds.
|
||||
*
|
||||
* u64 bpf_ringbuf_query(void *ringbuf, u64 flags)
|
||||
* Description
|
||||
* Query various characteristics of provided ring buffer. What
|
||||
* exactly is queried is determined by *flags*:
|
||||
* - BPF_RB_AVAIL_DATA - amount of data not yet consumed;
|
||||
* - BPF_RB_RING_SIZE - the size of ring buffer;
|
||||
* - BPF_RB_CONS_POS - consumer position (can wrap around);
|
||||
* - BPF_RB_PROD_POS - producer(s) position (can wrap around);
|
||||
* Data returned is just a momentary snapshot of actual values
|
||||
* and could be inaccurate, so this facility should be used to
|
||||
* power heuristics and for reporting, not to make 100% correct
|
||||
* calculations.
|
||||
* Return
|
||||
* Requested value, or 0, if flags are not recognized.
|
||||
*/
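From the BPF side, the helpers documented above are used against a BPF_MAP_TYPE_RINGBUF map. A hedged sketch of the reserve/submit flow (map size, event layout and attach point are illustrative, not taken from this series):

/* Hypothetical BPF-side sketch of bpf_ringbuf_reserve()/submit();
 * the event struct, map size and tracepoint are made up.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct event {
	int pid;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024); /* power of 2, multiple of page size */
} rb SEC(".maps");

SEC("tracepoint/sched/sched_process_exec")
int handle_exec(void *ctx)
{
	struct event *e;

	e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
	if (!e)
		return 0; /* buffer full, sample is dropped */

	e->pid = bpf_get_current_pid_tgid() >> 32;
	/* flags == 0 lets the kernel decide when to notify the consumer;
	 * BPF_RB_NO_WAKEUP / BPF_RB_FORCE_WAKEUP override that choice.
	 */
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";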
|
||||
#define __BPF_FUNC_MAPPER(FN) \
|
||||
FN(unspec), \
|
||||
@ -3288,7 +3344,12 @@ union bpf_attr {
|
||||
FN(seq_printf), \
|
||||
FN(seq_write), \
|
||||
FN(sk_cgroup_id), \
|
||||
FN(sk_ancestor_cgroup_id),
|
||||
FN(sk_ancestor_cgroup_id), \
|
||||
FN(ringbuf_output), \
|
||||
FN(ringbuf_reserve), \
|
||||
FN(ringbuf_submit), \
|
||||
FN(ringbuf_discard), \
|
||||
FN(ringbuf_query),
|
||||
|
||||
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
|
||||
* function eBPF program intends to call
|
||||
@ -3398,6 +3459,29 @@ enum {
|
||||
BPF_F_GET_BRANCH_RECORDS_SIZE = (1ULL << 0),
|
||||
};
|
||||
|
||||
/* BPF_FUNC_bpf_ringbuf_commit, BPF_FUNC_bpf_ringbuf_discard, and
|
||||
* BPF_FUNC_bpf_ringbuf_output flags.
|
||||
*/
|
||||
enum {
|
||||
BPF_RB_NO_WAKEUP = (1ULL << 0),
|
||||
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
|
||||
};
|
||||
|
||||
/* BPF_FUNC_bpf_ringbuf_query flags */
|
||||
enum {
|
||||
BPF_RB_AVAIL_DATA = 0,
|
||||
BPF_RB_RING_SIZE = 1,
|
||||
BPF_RB_CONS_POS = 2,
|
||||
BPF_RB_PROD_POS = 3,
|
||||
};
|
||||
|
||||
/* BPF ring buffer constants */
|
||||
enum {
|
||||
BPF_RINGBUF_BUSY_BIT = (1U << 31),
|
||||
BPF_RINGBUF_DISCARD_BIT = (1U << 30),
|
||||
BPF_RINGBUF_HDR_SZ = 8,
|
||||
};
|
||||
|
||||
/* Mode for BPF_FUNC_skb_adjust_room helper. */
|
||||
enum bpf_adj_room_mode {
|
||||
BPF_ADJ_ROOM_NET,
|
||||
@ -3530,6 +3614,7 @@ struct bpf_sock {
|
||||
__u32 dst_ip4;
|
||||
__u32 dst_ip6[4];
|
||||
__u32 state;
|
||||
__s32 rx_queue_mapping;
|
||||
};
|
||||
|
||||
struct bpf_tcp_sock {
|
||||
@ -3623,6 +3708,8 @@ struct xdp_md {
|
||||
/* Below access go through struct xdp_rxq_info */
|
||||
__u32 ingress_ifindex; /* rxq->dev->ifindex */
|
||||
__u32 rx_queue_index; /* rxq->queue_index */
|
||||
|
||||
__u32 egress_ifindex; /* txq->dev->ifindex */
|
||||
};
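The new egress_ifindex field is only populated for programs run at devmap-redirect time (the new BPF_XDP_DEVMAP attach type). A hedged sketch of such a program, using the "xdp_devmap" section name that this series teaches libbpf to recognize; the pass/drop logic is made up for illustration:

/* Hypothetical devmap-attached XDP sketch reading egress_ifindex. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp_devmap")
int xdp_devmap_prog(struct xdp_md *ctx)
{
	/* egress_ifindex is only meaningful when the program runs for a
	 * packet being redirected through a devmap entry.
	 */
	if (ctx->egress_ifindex == 0)
		return XDP_DROP;
	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";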
|
||||
|
||||
enum sk_action {
|
||||
@ -3645,6 +3732,8 @@ struct sk_msg_md {
|
||||
__u32 remote_port; /* Stored in network byte order */
|
||||
__u32 local_port; /* stored in host byte order */
|
||||
__u32 size; /* Total size of sk_msg */
|
||||
|
||||
__bpf_md_ptr(struct bpf_sock *, sk); /* current socket */
|
||||
};
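With the new sk member, an sk_msg verdict program can reach the owning socket directly. A hedged, minimal sketch (the verdict logic is arbitrary and only for illustration):

/* Hypothetical sk_msg verdict sketch using the new msg->sk pointer. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("sk_msg")
int msg_verdict(struct sk_msg_md *msg)
{
	/* msg->sk may be NULL and must be checked before use */
	if (!msg->sk)
		return SK_DROP;
	return SK_PASS;
}

char LICENSE[] SEC("license") = "GPL";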
|
||||
|
||||
struct sk_reuseport_md {
|
||||
@ -3751,6 +3840,10 @@ struct bpf_link_info {
|
||||
__u64 cgroup_id;
|
||||
__u32 attach_type;
|
||||
} cgroup;
|
||||
struct {
|
||||
__u32 netns_ino;
|
||||
__u32 attach_type;
|
||||
} netns;
|
||||
};
|
||||
} __attribute__((aligned(8)));
|
||||
|
||||
|
@ -1,3 +1,3 @@
|
||||
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
|
||||
netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
|
||||
btf_dump.o
|
||||
btf_dump.o ringbuf.o
|
||||
|
@ -151,7 +151,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN_SHARED) | \
|
||||
sed 's/\[.*\]//' | \
|
||||
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \
|
||||
sort -u | wc -l)
|
||||
VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
|
||||
VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \
|
||||
grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
|
||||
|
||||
CMD_TARGETS = $(LIB_TARGET) $(PC_FILE)
|
||||
@ -218,7 +218,7 @@ check_abi: $(OUTPUT)libbpf.so
|
||||
sed 's/\[.*\]//' | \
|
||||
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \
|
||||
sort -u > $(OUTPUT)libbpf_global_syms.tmp; \
|
||||
readelf -s --wide $(OUTPUT)libbpf.so | \
|
||||
readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \
|
||||
grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \
|
||||
sort -u > $(OUTPUT)libbpf_versioned_syms.tmp; \
|
||||
diff -u $(OUTPUT)libbpf_global_syms.tmp \
|
||||
@ -264,7 +264,7 @@ install_pkgconfig: $(PC_FILE)
|
||||
$(call QUIET_INSTALL, $(PC_FILE)) \
|
||||
$(call do_install,$(PC_FILE),$(libdir_SQ)/pkgconfig,644)
|
||||
|
||||
install: install_lib install_pkgconfig
|
||||
install: install_lib install_pkgconfig install_headers
|
||||
|
||||
### Cleaning rules
|
||||
|
||||
|
@ -6657,6 +6657,8 @@ static const struct bpf_sec_def section_defs[] = {
|
||||
.expected_attach_type = BPF_TRACE_ITER,
|
||||
.is_attach_btf = true,
|
||||
.attach_fn = attach_iter),
|
||||
BPF_EAPROG_SEC("xdp_devmap", BPF_PROG_TYPE_XDP,
|
||||
BPF_XDP_DEVMAP),
|
||||
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
|
||||
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
|
||||
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
|
||||
@ -7894,8 +7896,9 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
|
||||
return bpf_program__attach_iter(prog, NULL);
|
||||
}
|
||||
|
||||
struct bpf_link *
|
||||
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
|
||||
static struct bpf_link *
|
||||
bpf_program__attach_fd(struct bpf_program *prog, int target_fd,
|
||||
const char *target_name)
|
||||
{
|
||||
enum bpf_attach_type attach_type;
|
||||
char errmsg[STRERR_BUFSIZE];
|
||||
@ -7915,12 +7918,12 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
|
||||
link->detach = &bpf_link__detach_fd;
|
||||
|
||||
attach_type = bpf_program__get_expected_attach_type(prog);
|
||||
link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
|
||||
link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
|
||||
if (link_fd < 0) {
|
||||
link_fd = -errno;
|
||||
free(link);
|
||||
pr_warn("program '%s': failed to attach to cgroup: %s\n",
|
||||
bpf_program__title(prog, false),
|
||||
pr_warn("program '%s': failed to attach to %s: %s\n",
|
||||
bpf_program__title(prog, false), target_name,
|
||||
libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
|
||||
return ERR_PTR(link_fd);
|
||||
}
|
||||
@ -7928,6 +7931,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
|
||||
return link;
|
||||
}
|
||||
|
||||
struct bpf_link *
|
||||
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
|
||||
{
|
||||
return bpf_program__attach_fd(prog, cgroup_fd, "cgroup");
|
||||
}
|
||||
|
||||
struct bpf_link *
|
||||
bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
|
||||
{
|
||||
return bpf_program__attach_fd(prog, netns_fd, "netns");
|
||||
}
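bpf_program__attach_netns() pairs with the link-based flow dissector attachment added in this series. A hedged userspace sketch (object path and section title are made up, error paths leak for brevity):

/* Hypothetical userspace sketch: attach a flow_dissector program to the
 * current network namespace as a BPF_LINK_TYPE_NETNS link.
 */
#include <fcntl.h>
#include <bpf/libbpf.h>

int attach_flow_dissector(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;
	int netns_fd;

	obj = bpf_object__open_file("flow_dissector.o", NULL);
	if (libbpf_get_error(obj))
		return -1;
	if (bpf_object__load(obj))
		return -1;

	prog = bpf_object__find_program_by_title(obj, "flow_dissector");
	if (!prog)
		return -1;

	netns_fd = open("/proc/self/ns/net", O_RDONLY);
	if (netns_fd < 0)
		return -1;

	/* the program stays attached until the returned link is destroyed */
	link = bpf_program__attach_netns(prog, netns_fd);
	if (libbpf_get_error(link))
		return -1;
	return 0;
}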
|
||||
|
||||
struct bpf_link *
|
||||
bpf_program__attach_iter(struct bpf_program *prog,
|
||||
const struct bpf_iter_attach_opts *opts)
|
||||
@ -8137,9 +8152,12 @@ void perf_buffer__free(struct perf_buffer *pb)
|
||||
if (!pb)
|
||||
return;
|
||||
if (pb->cpu_bufs) {
|
||||
for (i = 0; i < pb->cpu_cnt && pb->cpu_bufs[i]; i++) {
|
||||
for (i = 0; i < pb->cpu_cnt; i++) {
|
||||
struct perf_cpu_buf *cpu_buf = pb->cpu_bufs[i];
|
||||
|
||||
if (!cpu_buf)
|
||||
continue;
|
||||
|
||||
bpf_map_delete_elem(pb->map_fd, &cpu_buf->map_key);
|
||||
perf_buffer__free_cpu_buf(pb, cpu_buf);
|
||||
}
|
||||
@ -8456,6 +8474,25 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
|
||||
return cnt < 0 ? -errno : cnt;
|
||||
}
|
||||
|
||||
int perf_buffer__consume(struct perf_buffer *pb)
|
||||
{
|
||||
int i, err;
|
||||
|
||||
for (i = 0; i < pb->cpu_cnt; i++) {
|
||||
struct perf_cpu_buf *cpu_buf = pb->cpu_bufs[i];
|
||||
|
||||
if (!cpu_buf)
|
||||
continue;
|
||||
|
||||
err = perf_buffer__process_records(pb, cpu_buf);
|
||||
if (err) {
|
||||
pr_warn("error while processing records: %d\n", err);
|
||||
return err;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct bpf_prog_info_array_desc {
|
||||
int array_offset; /* e.g. offset of jited_prog_insns */
|
||||
int count_offset; /* e.g. offset of jited_prog_len */
|
||||
|
@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
|
||||
bpf_program__attach_lsm(struct bpf_program *prog);
|
||||
LIBBPF_API struct bpf_link *
|
||||
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
|
||||
LIBBPF_API struct bpf_link *
|
||||
bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
|
||||
|
||||
struct bpf_map;
|
||||
|
||||
@ -478,6 +480,27 @@ LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
|
||||
LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
|
||||
size_t info_size, __u32 flags);
|
||||
|
||||
/* Ring buffer APIs */
|
||||
struct ring_buffer;
|
||||
|
||||
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
|
||||
|
||||
struct ring_buffer_opts {
|
||||
size_t sz; /* size of this struct, for forward/backward compatibility */
|
||||
};
|
||||
|
||||
#define ring_buffer_opts__last_field sz
|
||||
|
||||
LIBBPF_API struct ring_buffer *
|
||||
ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
|
||||
const struct ring_buffer_opts *opts);
|
||||
LIBBPF_API void ring_buffer__free(struct ring_buffer *rb);
|
||||
LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
|
||||
ring_buffer_sample_fn sample_cb, void *ctx);
|
||||
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
|
||||
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
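A hedged userspace sketch of how the ring_buffer API above is typically consumed (the map fd would come from bpf_map__fd() on a loaded object; names are illustrative):

/* Hypothetical consumer sketch for the libbpf ring_buffer API. */
#include <stdio.h>
#include <bpf/libbpf.h>

static int handle_event(void *ctx, void *data, size_t size)
{
	/* called once per submitted (not discarded) sample */
	printf("got %zu byte sample\n", size);
	return 0; /* non-zero stops consumption and is propagated */
}

int consume_ringbuf(int ringbuf_map_fd)
{
	struct ring_buffer *rb;
	int err;

	rb = ring_buffer__new(ringbuf_map_fd, handle_event, NULL, NULL);
	if (!rb)
		return -1;

	/* each successful poll returns the number of records consumed */
	while ((err = ring_buffer__poll(rb, 100 /* timeout, ms */)) >= 0)
		;

	ring_buffer__free(rb);
	return err;
}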
|
||||
|
||||
/* Perf buffer APIs */
|
||||
struct perf_buffer;
|
||||
|
||||
typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu,
|
||||
@ -533,6 +556,7 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
|
||||
|
||||
LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
|
||||
LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
|
||||
LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
|
||||
|
||||
typedef enum bpf_perf_event_ret
|
||||
(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
|
||||
|
@ -262,4 +262,11 @@ LIBBPF_0.0.9 {
|
||||
bpf_link_get_fd_by_id;
|
||||
bpf_link_get_next_id;
|
||||
bpf_program__attach_iter;
|
||||
bpf_program__attach_netns;
|
||||
perf_buffer__consume;
|
||||
ring_buffer__add;
|
||||
ring_buffer__consume;
|
||||
ring_buffer__free;
|
||||
ring_buffer__new;
|
||||
ring_buffer__poll;
|
||||
} LIBBPF_0.0.8;
|
||||
|
@ -238,6 +238,11 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
|
||||
if (btf_fd < 0)
|
||||
return false;
|
||||
break;
|
||||
case BPF_MAP_TYPE_RINGBUF:
|
||||
key_size = 0;
|
||||
value_size = 0;
|
||||
max_entries = 4096;
|
||||
break;
|
||||
case BPF_MAP_TYPE_UNSPEC:
|
||||
case BPF_MAP_TYPE_HASH:
|
||||
case BPF_MAP_TYPE_ARRAY:
|
||||
|
288
tools/lib/bpf/ringbuf.c
Normal file
@ -0,0 +1,288 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/*
|
||||
* Ring buffer operations.
|
||||
*
|
||||
* Copyright (C) 2020 Facebook, Inc.
|
||||
*/
|
||||
#ifndef _GNU_SOURCE
|
||||
#define _GNU_SOURCE
|
||||
#endif
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <errno.h>
|
||||
#include <unistd.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <asm/barrier.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/epoll.h>
|
||||
#include <tools/libc_compat.h>
|
||||
|
||||
#include "libbpf.h"
|
||||
#include "libbpf_internal.h"
|
||||
#include "bpf.h"
|
||||
|
||||
/* make sure libbpf doesn't use kernel-only integer typedefs */
|
||||
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
|
||||
|
||||
struct ring {
|
||||
ring_buffer_sample_fn sample_cb;
|
||||
void *ctx;
|
||||
void *data;
|
||||
unsigned long *consumer_pos;
|
||||
unsigned long *producer_pos;
|
||||
unsigned long mask;
|
||||
int map_fd;
|
||||
};
|
||||
|
||||
struct ring_buffer {
|
||||
struct epoll_event *events;
|
||||
struct ring *rings;
|
||||
size_t page_size;
|
||||
int epoll_fd;
|
||||
int ring_cnt;
|
||||
};
|
||||
|
||||
static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r)
|
||||
{
|
||||
if (r->consumer_pos) {
|
||||
munmap(r->consumer_pos, rb->page_size);
|
||||
r->consumer_pos = NULL;
|
||||
}
|
||||
if (r->producer_pos) {
|
||||
munmap(r->producer_pos, rb->page_size + 2 * (r->mask + 1));
|
||||
r->producer_pos = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/* Add extra RINGBUF maps to this ring buffer manager */
|
||||
int ring_buffer__add(struct ring_buffer *rb, int map_fd,
|
||||
ring_buffer_sample_fn sample_cb, void *ctx)
|
||||
{
|
||||
struct bpf_map_info info;
|
||||
__u32 len = sizeof(info);
|
||||
struct epoll_event *e;
|
||||
struct ring *r;
|
||||
void *tmp;
|
||||
int err;
|
||||
|
||||
memset(&info, 0, sizeof(info));
|
||||
|
||||
err = bpf_obj_get_info_by_fd(map_fd, &info, &len);
|
||||
if (err) {
|
||||
err = -errno;
|
||||
pr_warn("ringbuf: failed to get map info for fd=%d: %d\n",
|
||||
map_fd, err);
|
||||
return err;
|
||||
}
|
||||
|
||||
if (info.type != BPF_MAP_TYPE_RINGBUF) {
|
||||
pr_warn("ringbuf: map fd=%d is not BPF_MAP_TYPE_RINGBUF\n",
|
||||
map_fd);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
tmp = reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings));
|
||||
if (!tmp)
|
||||
return -ENOMEM;
|
||||
rb->rings = tmp;
|
||||
|
||||
tmp = reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events));
|
||||
if (!tmp)
|
||||
return -ENOMEM;
|
||||
rb->events = tmp;
|
||||
|
||||
r = &rb->rings[rb->ring_cnt];
|
||||
memset(r, 0, sizeof(*r));
|
||||
|
||||
r->map_fd = map_fd;
|
||||
r->sample_cb = sample_cb;
|
||||
r->ctx = ctx;
|
||||
r->mask = info.max_entries - 1;
|
||||
|
||||
/* Map writable consumer page */
|
||||
tmp = mmap(NULL, rb->page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
|
||||
map_fd, 0);
|
||||
if (tmp == MAP_FAILED) {
|
||||
err = -errno;
|
||||
pr_warn("ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
|
||||
map_fd, err);
|
||||
return err;
|
||||
}
|
||||
r->consumer_pos = tmp;
|
||||
|
||||
/* Map read-only producer page and data pages. We map twice as big
|
||||
* data size to allow simple reading of samples that wrap around the
|
||||
* end of a ring buffer. See kernel implementation for details.
|
||||
*/
|
||||
tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
|
||||
MAP_SHARED, map_fd, rb->page_size);
|
||||
if (tmp == MAP_FAILED) {
|
||||
err = -errno;
|
||||
ringbuf_unmap_ring(rb, r);
|
||||
pr_warn("ringbuf: failed to mmap data pages for map fd=%d: %d\n",
|
||||
map_fd, err);
|
||||
return err;
|
||||
}
|
||||
r->producer_pos = tmp;
|
||||
r->data = tmp + rb->page_size;
|
||||
|
||||
e = &rb->events[rb->ring_cnt];
|
||||
memset(e, 0, sizeof(*e));
|
||||
|
||||
e->events = EPOLLIN;
|
||||
e->data.fd = rb->ring_cnt;
|
||||
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, e) < 0) {
|
||||
err = -errno;
|
||||
ringbuf_unmap_ring(rb, r);
|
||||
pr_warn("ringbuf: failed to epoll add map fd=%d: %d\n",
|
||||
map_fd, err);
|
||||
return err;
|
||||
}
|
||||
|
||||
rb->ring_cnt++;
|
||||
return 0;
|
||||
}
|
||||
|
||||
void ring_buffer__free(struct ring_buffer *rb)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (!rb)
|
||||
return;
|
||||
|
||||
for (i = 0; i < rb->ring_cnt; ++i)
|
||||
ringbuf_unmap_ring(rb, &rb->rings[i]);
|
||||
if (rb->epoll_fd >= 0)
|
||||
close(rb->epoll_fd);
|
||||
|
||||
free(rb->events);
|
||||
free(rb->rings);
|
||||
free(rb);
|
||||
}
|
||||
|
||||
struct ring_buffer *
|
||||
ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
|
||||
const struct ring_buffer_opts *opts)
|
||||
{
|
||||
struct ring_buffer *rb;
|
||||
int err;
|
||||
|
||||
if (!OPTS_VALID(opts, ring_buffer_opts))
|
||||
return NULL;
|
||||
|
||||
rb = calloc(1, sizeof(*rb));
|
||||
if (!rb)
|
||||
return NULL;
|
||||
|
||||
rb->page_size = getpagesize();
|
||||
|
||||
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
|
||||
if (rb->epoll_fd < 0) {
|
||||
err = -errno;
|
||||
pr_warn("ringbuf: failed to create epoll instance: %d\n", err);
|
||||
goto err_out;
|
||||
}
|
||||
|
||||
err = ring_buffer__add(rb, map_fd, sample_cb, ctx);
|
||||
if (err)
|
||||
goto err_out;
|
||||
|
||||
return rb;
|
||||
|
||||
err_out:
|
||||
ring_buffer__free(rb);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline int roundup_len(__u32 len)
|
||||
{
|
||||
/* clear out top 2 bits (discard and busy, if set) */
|
||||
len <<= 2;
|
||||
len >>= 2;
|
||||
/* add length prefix */
|
||||
len += BPF_RINGBUF_HDR_SZ;
|
||||
/* round up to 8 byte alignment */
|
||||
return (len + 7) / 8 * 8;
|
||||
}
|
||||
|
||||
static int ringbuf_process_ring(struct ring* r)
|
||||
{
|
||||
int *len_ptr, len, err, cnt = 0;
|
||||
unsigned long cons_pos, prod_pos;
|
||||
bool got_new_data;
|
||||
void *sample;
|
||||
|
||||
cons_pos = smp_load_acquire(r->consumer_pos);
|
||||
do {
|
||||
got_new_data = false;
|
||||
prod_pos = smp_load_acquire(r->producer_pos);
|
||||
while (cons_pos < prod_pos) {
|
||||
len_ptr = r->data + (cons_pos & r->mask);
|
||||
len = smp_load_acquire(len_ptr);
|
||||
|
||||
/* sample not committed yet, bail out for now */
|
||||
if (len & BPF_RINGBUF_BUSY_BIT)
|
||||
goto done;
|
||||
|
||||
got_new_data = true;
|
||||
cons_pos += roundup_len(len);
|
||||
|
||||
if ((len & BPF_RINGBUF_DISCARD_BIT) == 0) {
|
||||
sample = (void *)len_ptr + BPF_RINGBUF_HDR_SZ;
|
||||
err = r->sample_cb(r->ctx, sample, len);
|
||||
if (err) {
|
||||
/* update consumer pos and bail out */
|
||||
smp_store_release(r->consumer_pos,
|
||||
cons_pos);
|
||||
return err;
|
||||
}
|
||||
cnt++;
|
||||
}
|
||||
|
||||
smp_store_release(r->consumer_pos, cons_pos);
|
||||
}
|
||||
} while (got_new_data);
|
||||
done:
|
||||
return cnt;
|
||||
}
|
||||
|
||||
/* Consume available ring buffer(s) data without event polling.
|
||||
* Returns number of records consumed across all registered ring buffers, or
|
||||
* negative number if any of the callbacks return error.
|
||||
*/
|
||||
int ring_buffer__consume(struct ring_buffer *rb)
|
||||
{
|
||||
int i, err, res = 0;
|
||||
|
||||
for (i = 0; i < rb->ring_cnt; i++) {
|
||||
struct ring *ring = &rb->rings[i];
|
||||
|
||||
err = ringbuf_process_ring(ring);
|
||||
if (err < 0)
|
||||
return err;
|
||||
res += err;
|
||||
}
|
||||
return res;
|
||||
}
|
||||
|
||||
/* Poll for available data and consume records, if any are available.
|
||||
* Returns number of records consumed, or negative number, if any of the
|
||||
* registered callbacks returned error.
|
||||
*/
|
||||
int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
|
||||
{
|
||||
int i, cnt, err, res = 0;
|
||||
|
||||
cnt = epoll_wait(rb->epoll_fd, rb->events, rb->ring_cnt, timeout_ms);
|
||||
for (i = 0; i < cnt; i++) {
|
||||
__u32 ring_id = rb->events[i].data.fd;
|
||||
struct ring *ring = &rb->rings[ring_id];
|
||||
|
||||
err = ringbuf_process_ring(ring);
|
||||
if (err < 0)
|
||||
return err;
|
||||
res += cnt;
|
||||
}
|
||||
return cnt < 0 ? -errno : res;
|
||||
}
|
@ -413,12 +413,15 @@ $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h
|
||||
$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
|
||||
$(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
|
||||
$(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
|
||||
$(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
|
||||
$(OUTPUT)/perfbuf_bench.skel.h
|
||||
$(OUTPUT)/bench.o: bench.h testing_helpers.h
|
||||
$(OUTPUT)/bench: LDLIBS += -lm
|
||||
$(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
|
||||
$(OUTPUT)/bench_count.o \
|
||||
$(OUTPUT)/bench_rename.o \
|
||||
$(OUTPUT)/bench_trigger.o
|
||||
$(OUTPUT)/bench_trigger.o \
|
||||
$(OUTPUT)/bench_ringbufs.o
|
||||
$(call msg,BINARY,,$@)
|
||||
$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
|
||||
|
||||
|
@ -130,6 +130,13 @@ static const struct argp_option opts[] = {
|
||||
{},
|
||||
};
|
||||
|
||||
extern struct argp bench_ringbufs_argp;
|
||||
|
||||
static const struct argp_child bench_parsers[] = {
|
||||
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
|
||||
{},
|
||||
};
|
||||
|
||||
static error_t parse_arg(int key, char *arg, struct argp_state *state)
|
||||
{
|
||||
static int pos_args;
|
||||
@ -208,6 +215,7 @@ static void parse_cmdline_args(int argc, char **argv)
|
||||
.options = opts,
|
||||
.parser = parse_arg,
|
||||
.doc = argp_program_doc,
|
||||
.children = bench_parsers,
|
||||
};
|
||||
if (argp_parse(&argp, argc, argv, 0, NULL, NULL))
|
||||
exit(1);
|
||||
@ -310,6 +318,10 @@ extern const struct bench bench_trig_rawtp;
|
||||
extern const struct bench bench_trig_kprobe;
|
||||
extern const struct bench bench_trig_fentry;
|
||||
extern const struct bench bench_trig_fmodret;
|
||||
extern const struct bench bench_rb_libbpf;
|
||||
extern const struct bench bench_rb_custom;
|
||||
extern const struct bench bench_pb_libbpf;
|
||||
extern const struct bench bench_pb_custom;
|
||||
|
||||
static const struct bench *benchs[] = {
|
||||
&bench_count_global,
|
||||
@ -327,6 +339,10 @@ static const struct bench *benchs[] = {
|
||||
&bench_trig_kprobe,
|
||||
&bench_trig_fentry,
|
||||
&bench_trig_fmodret,
|
||||
&bench_rb_libbpf,
|
||||
&bench_rb_custom,
|
||||
&bench_pb_libbpf,
|
||||
&bench_pb_custom,
|
||||
};
|
||||
|
||||
static void setup_benchmark()
|
||||
|
566
tools/testing/selftests/bpf/benchs/bench_ringbufs.c
Normal file
@ -0,0 +1,566 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* Copyright (c) 2020 Facebook */
|
||||
#include <asm/barrier.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <linux/ring_buffer.h>
|
||||
#include <sys/epoll.h>
|
||||
#include <sys/mman.h>
|
||||
#include <argp.h>
|
||||
#include <stdlib.h>
|
||||
#include "bench.h"
|
||||
#include "ringbuf_bench.skel.h"
|
||||
#include "perfbuf_bench.skel.h"
|
||||
|
||||
static struct {
|
||||
bool back2back;
|
||||
int batch_cnt;
|
||||
bool sampled;
|
||||
int sample_rate;
|
||||
int ringbuf_sz; /* per-ringbuf, in bytes */
|
||||
bool ringbuf_use_output; /* use slower output API */
|
||||
int perfbuf_sz; /* per-CPU size, in pages */
|
||||
} args = {
|
||||
.back2back = false,
|
||||
.batch_cnt = 500,
|
||||
.sampled = false,
|
||||
.sample_rate = 500,
|
||||
.ringbuf_sz = 512 * 1024,
|
||||
.ringbuf_use_output = false,
|
||||
.perfbuf_sz = 128,
|
||||
};
|
||||
|
||||
enum {
|
||||
ARG_RB_BACK2BACK = 2000,
|
||||
ARG_RB_USE_OUTPUT = 2001,
|
||||
ARG_RB_BATCH_CNT = 2002,
|
||||
ARG_RB_SAMPLED = 2003,
|
||||
ARG_RB_SAMPLE_RATE = 2004,
|
||||
};
|
||||
|
||||
static const struct argp_option opts[] = {
|
||||
{ "rb-b2b", ARG_RB_BACK2BACK, NULL, 0, "Back-to-back mode"},
|
||||
{ "rb-use-output", ARG_RB_USE_OUTPUT, NULL, 0, "Use bpf_ringbuf_output() instead of bpf_ringbuf_reserve()"},
|
||||
{ "rb-batch-cnt", ARG_RB_BATCH_CNT, "CNT", 0, "Set BPF-side record batch count"},
|
||||
{ "rb-sampled", ARG_RB_SAMPLED, NULL, 0, "Notification sampling"},
|
||||
{ "rb-sample-rate", ARG_RB_SAMPLE_RATE, "RATE", 0, "Notification sample rate"},
|
||||
{},
|
||||
};
|
||||
|
||||
static error_t parse_arg(int key, char *arg, struct argp_state *state)
|
||||
{
|
||||
switch (key) {
|
||||
case ARG_RB_BACK2BACK:
|
||||
args.back2back = true;
|
||||
break;
|
||||
case ARG_RB_USE_OUTPUT:
|
||||
args.ringbuf_use_output = true;
|
||||
break;
|
||||
case ARG_RB_BATCH_CNT:
|
||||
args.batch_cnt = strtol(arg, NULL, 10);
|
||||
if (args.batch_cnt < 0) {
|
||||
fprintf(stderr, "Invalid batch count.");
|
||||
argp_usage(state);
|
||||
}
|
||||
break;
|
||||
case ARG_RB_SAMPLED:
|
||||
args.sampled = true;
|
||||
break;
|
||||
case ARG_RB_SAMPLE_RATE:
|
||||
args.sample_rate = strtol(arg, NULL, 10);
|
||||
if (args.sample_rate < 0) {
|
||||
fprintf(stderr, "Invalid perfbuf sample rate.");
|
||||
argp_usage(state);
|
||||
}
|
||||
break;
|
||||
default:
|
||||
return ARGP_ERR_UNKNOWN;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* exported into benchmark runner */
|
||||
const struct argp bench_ringbufs_argp = {
|
||||
.options = opts,
|
||||
.parser = parse_arg,
|
||||
};
|
||||
|
||||
/* RINGBUF-LIBBPF benchmark */
|
||||
|
||||
static struct counter buf_hits;
|
||||
|
||||
static inline void bufs_trigger_batch()
|
||||
{
|
||||
(void)syscall(__NR_getpgid);
|
||||
}
|
||||
|
||||
static void bufs_validate()
|
||||
{
|
||||
if (env.consumer_cnt != 1) {
|
||||
fprintf(stderr, "rb-libbpf benchmark doesn't support multi-consumer!\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if (args.back2back && env.producer_cnt > 1) {
|
||||
fprintf(stderr, "back-to-back mode makes sense only for single-producer case!\n");
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
static void *bufs_sample_producer(void *input)
|
||||
{
|
||||
if (args.back2back) {
|
||||
/* initial batch to get everything started */
|
||||
bufs_trigger_batch();
|
||||
return NULL;
|
||||
}
|
||||
|
||||
while (true)
|
||||
bufs_trigger_batch();
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static struct ringbuf_libbpf_ctx {
|
||||
struct ringbuf_bench *skel;
|
||||
struct ring_buffer *ringbuf;
|
||||
} ringbuf_libbpf_ctx;
|
||||
|
||||
static void ringbuf_libbpf_measure(struct bench_res *res)
|
||||
{
|
||||
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
|
||||
|
||||
res->hits = atomic_swap(&buf_hits.value, 0);
|
||||
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
|
||||
}
|
||||
|
||||
static struct ringbuf_bench *ringbuf_setup_skeleton()
|
||||
{
|
||||
struct ringbuf_bench *skel;
|
||||
|
||||
setup_libbpf();
|
||||
|
||||
skel = ringbuf_bench__open();
|
||||
if (!skel) {
|
||||
fprintf(stderr, "failed to open skeleton\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
skel->rodata->batch_cnt = args.batch_cnt;
|
||||
skel->rodata->use_output = args.ringbuf_use_output ? 1 : 0;
|
||||
|
||||
if (args.sampled)
|
||||
/* record data + header take 16 bytes */
|
||||
skel->rodata->wakeup_data_size = args.sample_rate * 16;
|
||||
|
||||
bpf_map__resize(skel->maps.ringbuf, args.ringbuf_sz);
|
||||
|
||||
if (ringbuf_bench__load(skel)) {
|
||||
fprintf(stderr, "failed to load skeleton\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
return skel;
|
||||
}
|
||||
|
||||
static int buf_process_sample(void *ctx, void *data, size_t len)
|
||||
{
|
||||
atomic_inc(&buf_hits.value);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void ringbuf_libbpf_setup()
|
||||
{
|
||||
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
|
||||
struct bpf_link *link;
|
||||
|
||||
ctx->skel = ringbuf_setup_skeleton();
|
||||
ctx->ringbuf = ring_buffer__new(bpf_map__fd(ctx->skel->maps.ringbuf),
|
||||
buf_process_sample, NULL, NULL);
|
||||
if (!ctx->ringbuf) {
|
||||
fprintf(stderr, "failed to create ringbuf\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
|
||||
if (IS_ERR(link)) {
|
||||
fprintf(stderr, "failed to attach program!\n");
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
static void *ringbuf_libbpf_consumer(void *input)
|
||||
{
|
||||
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
|
||||
|
||||
while (ring_buffer__poll(ctx->ringbuf, -1) >= 0) {
|
||||
if (args.back2back)
|
||||
bufs_trigger_batch();
|
||||
}
|
||||
fprintf(stderr, "ringbuf polling failed!\n");
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/* RINGBUF-CUSTOM benchmark */
|
||||
struct ringbuf_custom {
|
||||
__u64 *consumer_pos;
|
||||
__u64 *producer_pos;
|
||||
__u64 mask;
|
||||
void *data;
|
||||
int map_fd;
|
||||
};
|
||||
|
||||
static struct ringbuf_custom_ctx {
|
||||
struct ringbuf_bench *skel;
|
||||
struct ringbuf_custom ringbuf;
|
||||
int epoll_fd;
|
||||
struct epoll_event event;
|
||||
} ringbuf_custom_ctx;
|
||||
|
||||
static void ringbuf_custom_measure(struct bench_res *res)
|
||||
{
|
||||
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
|
||||
|
||||
res->hits = atomic_swap(&buf_hits.value, 0);
|
||||
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
|
||||
}
|
||||
|
||||
static void ringbuf_custom_setup()
|
||||
{
|
||||
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
|
||||
const size_t page_size = getpagesize();
|
||||
struct bpf_link *link;
|
||||
struct ringbuf_custom *r;
|
||||
void *tmp;
|
||||
int err;
|
||||
|
||||
ctx->skel = ringbuf_setup_skeleton();
|
||||
|
||||
ctx->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
|
||||
if (ctx->epoll_fd < 0) {
|
||||
fprintf(stderr, "failed to create epoll fd: %d\n", -errno);
|
||||
exit(1);
|
||||
}
|
||||
|
||||
r = &ctx->ringbuf;
|
||||
r->map_fd = bpf_map__fd(ctx->skel->maps.ringbuf);
|
||||
r->mask = args.ringbuf_sz - 1;
|
||||
|
||||
/* Map writable consumer page */
|
||||
tmp = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
|
||||
r->map_fd, 0);
|
||||
if (tmp == MAP_FAILED) {
|
||||
fprintf(stderr, "failed to mmap consumer page: %d\n", -errno);
|
||||
exit(1);
|
||||
}
|
||||
r->consumer_pos = tmp;
|
||||
|
||||
/* Map read-only producer page and data pages. */
|
||||
tmp = mmap(NULL, page_size + 2 * args.ringbuf_sz, PROT_READ, MAP_SHARED,
|
||||
r->map_fd, page_size);
|
||||
if (tmp == MAP_FAILED) {
|
||||
fprintf(stderr, "failed to mmap data pages: %d\n", -errno);
|
||||
exit(1);
|
||||
}
|
||||
r->producer_pos = tmp;
|
||||
r->data = tmp + page_size;
|
||||
|
||||
ctx->event.events = EPOLLIN;
|
||||
err = epoll_ctl(ctx->epoll_fd, EPOLL_CTL_ADD, r->map_fd, &ctx->event);
|
||||
if (err < 0) {
|
||||
fprintf(stderr, "failed to epoll add ringbuf: %d\n", -errno);
|
||||
exit(1);
|
||||
}
|
||||
|
||||
link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
|
||||
if (IS_ERR(link)) {
|
||||
fprintf(stderr, "failed to attach program\n");
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
#define RINGBUF_BUSY_BIT (1 << 31)
#define RINGBUF_DISCARD_BIT (1 << 30)
#define RINGBUF_META_LEN 8

static inline int roundup_len(__u32 len)
{
	/* clear out top 2 bits (busy and discard flags) */
	len <<= 2;
	len >>= 2;
	/* add the 8-byte length prefix */
	len += RINGBUF_META_LEN;
	/* round up to 8 byte alignment */
	return (len + 7) / 8 * 8;
}
|
||||
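/*
 * Editorial sketch (not part of the commit): the defines and roundup_len()
 * above encode the consumer-side record layout this benchmark assumes --
 * every record is preceded by an 8-byte length word whose low bits hold
 * the payload length and whose top two bits carry the BUSY and DISCARD
 * flags. A hypothetical helper splitting that raw word could look like:
 */
static inline void ringbuf_parse_hdr(__u32 raw, __u32 *data_len,
				     bool *busy, bool *discard)
{
	*busy = raw & RINGBUF_BUSY_BIT;       /* producer still writing */
	*discard = raw & RINGBUF_DISCARD_BIT; /* committed, but dropped */
	*data_len = raw & ~(RINGBUF_BUSY_BIT | RINGBUF_DISCARD_BIT);
}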
|
||||
static void ringbuf_custom_process_ring(struct ringbuf_custom *r)
|
||||
{
|
||||
unsigned long cons_pos, prod_pos;
|
||||
int *len_ptr, len;
|
||||
bool got_new_data;
|
||||
|
||||
cons_pos = smp_load_acquire(r->consumer_pos);
|
||||
while (true) {
|
||||
got_new_data = false;
|
||||
prod_pos = smp_load_acquire(r->producer_pos);
|
||||
while (cons_pos < prod_pos) {
|
||||
len_ptr = r->data + (cons_pos & r->mask);
|
||||
len = smp_load_acquire(len_ptr);
|
||||
|
||||
/* sample not committed yet, bail out for now */
|
||||
if (len & RINGBUF_BUSY_BIT)
|
||||
return;
|
||||
|
||||
got_new_data = true;
|
||||
cons_pos += roundup_len(len);
|
||||
|
||||
atomic_inc(&buf_hits.value);
|
||||
}
|
||||
if (got_new_data)
|
||||
smp_store_release(r->consumer_pos, cons_pos);
|
||||
else
|
||||
break;
|
||||
};
|
||||
}
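/*
 * Editorial note (not part of the commit): the pairing above is what makes
 * this lock-free consumer work -- smp_load_acquire() on producer_pos pairs
 * with the producer's store-release when a record is committed, the
 * per-record length word is also load-acquire'd so the payload only becomes
 * visible once the BUSY bit has been cleared, and the final
 * smp_store_release() on consumer_pos publishes how far the consumer got,
 * which the kernel consults when deciding whether the next commit needs an
 * epoll notification.
 */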
|
||||
|
||||
static void *ringbuf_custom_consumer(void *input)
|
||||
{
|
||||
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
|
||||
int cnt;
|
||||
|
||||
do {
|
||||
if (args.back2back)
|
||||
bufs_trigger_batch();
|
||||
cnt = epoll_wait(ctx->epoll_fd, &ctx->event, 1, -1);
|
||||
if (cnt > 0)
|
||||
ringbuf_custom_process_ring(&ctx->ringbuf);
|
||||
} while (cnt >= 0);
|
||||
fprintf(stderr, "ringbuf polling failed!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* PERFBUF-LIBBPF benchmark */
|
||||
static struct perfbuf_libbpf_ctx {
|
||||
struct perfbuf_bench *skel;
|
||||
struct perf_buffer *perfbuf;
|
||||
} perfbuf_libbpf_ctx;
|
||||
|
||||
static void perfbuf_measure(struct bench_res *res)
|
||||
{
|
||||
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
|
||||
|
||||
res->hits = atomic_swap(&buf_hits.value, 0);
|
||||
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
|
||||
}
|
||||
|
||||
static struct perfbuf_bench *perfbuf_setup_skeleton()
|
||||
{
|
||||
struct perfbuf_bench *skel;
|
||||
|
||||
setup_libbpf();
|
||||
|
||||
skel = perfbuf_bench__open();
|
||||
if (!skel) {
|
||||
fprintf(stderr, "failed to open skeleton\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
skel->rodata->batch_cnt = args.batch_cnt;
|
||||
|
||||
if (perfbuf_bench__load(skel)) {
|
||||
fprintf(stderr, "failed to load skeleton\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
return skel;
|
||||
}
|
||||
|
||||
static enum bpf_perf_event_ret
|
||||
perfbuf_process_sample_raw(void *input_ctx, int cpu,
|
||||
struct perf_event_header *e)
|
||||
{
|
||||
switch (e->type) {
|
||||
case PERF_RECORD_SAMPLE:
|
||||
atomic_inc(&buf_hits.value);
|
||||
break;
|
||||
case PERF_RECORD_LOST:
|
||||
break;
|
||||
default:
|
||||
return LIBBPF_PERF_EVENT_ERROR;
|
||||
}
|
||||
return LIBBPF_PERF_EVENT_CONT;
|
||||
}
|
||||
|
||||
static void perfbuf_libbpf_setup()
|
||||
{
|
||||
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
|
||||
struct perf_event_attr attr;
|
||||
struct perf_buffer_raw_opts pb_opts = {
|
||||
.event_cb = perfbuf_process_sample_raw,
|
||||
.ctx = (void *)(long)0,
|
||||
.attr = &attr,
|
||||
};
|
||||
struct bpf_link *link;
|
||||
|
||||
ctx->skel = perfbuf_setup_skeleton();
|
||||
|
||||
memset(&attr, 0, sizeof(attr));
|
||||
attr.config = PERF_COUNT_SW_BPF_OUTPUT;
|
||||
attr.type = PERF_TYPE_SOFTWARE;
|
||||
attr.sample_type = PERF_SAMPLE_RAW;
|
||||
/* notify only every Nth sample */
|
||||
if (args.sampled) {
|
||||
attr.sample_period = args.sample_rate;
|
||||
attr.wakeup_events = args.sample_rate;
|
||||
} else {
|
||||
attr.sample_period = 1;
|
||||
attr.wakeup_events = 1;
|
||||
}
|
||||
|
||||
if (args.sample_rate > args.batch_cnt) {
|
||||
fprintf(stderr, "sample rate %d is too high for given batch count %d\n",
|
||||
args.sample_rate, args.batch_cnt);
|
||||
exit(1);
|
||||
}
|
||||
|
||||
ctx->perfbuf = perf_buffer__new_raw(bpf_map__fd(ctx->skel->maps.perfbuf),
|
||||
args.perfbuf_sz, &pb_opts);
|
||||
if (!ctx->perfbuf) {
|
||||
fprintf(stderr, "failed to create perfbuf\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
link = bpf_program__attach(ctx->skel->progs.bench_perfbuf);
|
||||
if (IS_ERR(link)) {
|
||||
fprintf(stderr, "failed to attach program\n");
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
static void *perfbuf_libbpf_consumer(void *input)
|
||||
{
|
||||
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
|
||||
|
||||
while (perf_buffer__poll(ctx->perfbuf, -1) >= 0) {
|
||||
if (args.back2back)
|
||||
bufs_trigger_batch();
|
||||
}
|
||||
fprintf(stderr, "perfbuf polling failed!\n");
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/* PERFBUF-CUSTOM benchmark */
|
||||
|
||||
/* copies of internal libbpf definitions */
|
||||
struct perf_cpu_buf {
|
||||
struct perf_buffer *pb;
|
||||
void *base; /* mmap()'ed memory */
|
||||
void *buf; /* for reconstructing segmented data */
|
||||
size_t buf_size;
|
||||
int fd;
|
||||
int cpu;
|
||||
int map_key;
|
||||
};
|
||||
|
||||
struct perf_buffer {
|
||||
perf_buffer_event_fn event_cb;
|
||||
perf_buffer_sample_fn sample_cb;
|
||||
perf_buffer_lost_fn lost_cb;
|
||||
void *ctx; /* passed into callbacks */
|
||||
|
||||
size_t page_size;
|
||||
size_t mmap_size;
|
||||
struct perf_cpu_buf **cpu_bufs;
|
||||
struct epoll_event *events;
|
||||
int cpu_cnt; /* number of allocated CPU buffers */
|
||||
int epoll_fd; /* perf event FD */
|
||||
int map_fd; /* BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map FD */
|
||||
};
|
||||
|
||||
static void *perfbuf_custom_consumer(void *input)
|
||||
{
|
||||
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
|
||||
struct perf_buffer *pb = ctx->perfbuf;
|
||||
struct perf_cpu_buf *cpu_buf;
|
||||
struct perf_event_mmap_page *header;
|
||||
size_t mmap_mask = pb->mmap_size - 1;
|
||||
struct perf_event_header *ehdr;
|
||||
__u64 data_head, data_tail;
|
||||
size_t ehdr_size;
|
||||
void *base;
|
||||
int i, cnt;
|
||||
|
||||
while (true) {
|
||||
if (args.back2back)
|
||||
bufs_trigger_batch();
|
||||
cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, -1);
|
||||
if (cnt <= 0) {
|
||||
fprintf(stderr, "perf epoll failed: %d\n", -errno);
|
||||
exit(1);
|
||||
}
|
||||
|
||||
for (i = 0; i < cnt; ++i) {
|
||||
cpu_buf = pb->events[i].data.ptr;
|
||||
header = cpu_buf->base;
|
||||
base = ((void *)header) + pb->page_size;
|
||||
|
||||
data_head = ring_buffer_read_head(header);
|
||||
data_tail = header->data_tail;
|
||||
while (data_head != data_tail) {
|
||||
ehdr = base + (data_tail & mmap_mask);
|
||||
ehdr_size = ehdr->size;
|
||||
|
||||
if (ehdr->type == PERF_RECORD_SAMPLE)
|
||||
atomic_inc(&buf_hits.value);
|
||||
|
||||
data_tail += ehdr_size;
|
||||
}
|
||||
ring_buffer_write_tail(header, data_tail);
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
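/*
 * Editorial note (not part of the commit): the loop above follows the
 * standard perf mmap consumer protocol -- ring_buffer_read_head() reads
 * data_head with acquire semantics, records between data_tail and data_head
 * are walked using the self-describing perf_event_header::size, and
 * ring_buffer_write_tail() publishes the new data_tail so the kernel may
 * reuse the consumed space. The libbpf structures copied above keep a
 * scratch "buf" precisely to reassemble records that wrap around the end of
 * the mmap'ed area; this stripped-down consumer skips that step because the
 * benchmark only looks at each record's header.
 */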
|
||||
|
||||
const struct bench bench_rb_libbpf = {
|
||||
.name = "rb-libbpf",
|
||||
.validate = bufs_validate,
|
||||
.setup = ringbuf_libbpf_setup,
|
||||
.producer_thread = bufs_sample_producer,
|
||||
.consumer_thread = ringbuf_libbpf_consumer,
|
||||
.measure = ringbuf_libbpf_measure,
|
||||
.report_progress = hits_drops_report_progress,
|
||||
.report_final = hits_drops_report_final,
|
||||
};
|
||||
|
||||
const struct bench bench_rb_custom = {
|
||||
.name = "rb-custom",
|
||||
.validate = bufs_validate,
|
||||
.setup = ringbuf_custom_setup,
|
||||
.producer_thread = bufs_sample_producer,
|
||||
.consumer_thread = ringbuf_custom_consumer,
|
||||
.measure = ringbuf_custom_measure,
|
||||
.report_progress = hits_drops_report_progress,
|
||||
.report_final = hits_drops_report_final,
|
||||
};
|
||||
|
||||
const struct bench bench_pb_libbpf = {
|
||||
.name = "pb-libbpf",
|
||||
.validate = bufs_validate,
|
||||
.setup = perfbuf_libbpf_setup,
|
||||
.producer_thread = bufs_sample_producer,
|
||||
.consumer_thread = perfbuf_libbpf_consumer,
|
||||
.measure = perfbuf_measure,
|
||||
.report_progress = hits_drops_report_progress,
|
||||
.report_final = hits_drops_report_final,
|
||||
};
|
||||
|
||||
const struct bench bench_pb_custom = {
|
||||
.name = "pb-custom",
|
||||
.validate = bufs_validate,
|
||||
.setup = perfbuf_libbpf_setup,
|
||||
.producer_thread = bufs_sample_producer,
|
||||
.consumer_thread = perfbuf_custom_consumer,
|
||||
.measure = perfbuf_measure,
|
||||
.report_progress = hits_drops_report_progress,
|
||||
.report_final = hits_drops_report_final,
|
||||
};
|
||||
|
75
tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
Executable file
@ -0,0 +1,75 @@
|
||||
#!/bin/bash

set -eufo pipefail

RUN_BENCH="sudo ./bench -w3 -d10 -a"

function hits()
{
	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
}

function drops()
{
	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
}

function header()
{
	local len=${#1}

	printf "\n%s\n" "$1"
	for i in $(seq 1 $len); do printf '='; done
	printf '\n'
}

function summarize()
{
	bench="$1"
	summary=$(echo $2 | tail -n1)
	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
}

header "Single-producer, parallel producer"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
	summarize $b "$($RUN_BENCH $b)"
done

header "Single-producer, parallel producer, sampled notification"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
	summarize $b "$($RUN_BENCH --rb-sampled $b)"
done

header "Single-producer, back-to-back mode"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
	summarize $b "$($RUN_BENCH --rb-b2b $b)"
	summarize $b-sampled "$($RUN_BENCH --rb-sampled --rb-b2b $b)"
done

header "Ringbuf back-to-back, effect of sample rate"
for b in 1 5 10 25 50 100 250 500 1000 2000 3000; do
	summarize "rb-sampled-$b" "$($RUN_BENCH --rb-b2b --rb-batch-cnt $b --rb-sampled --rb-sample-rate $b rb-custom)"
done
header "Perfbuf back-to-back, effect of sample rate"
for b in 1 5 10 25 50 100 250 500 1000 2000 3000; do
	summarize "pb-sampled-$b" "$($RUN_BENCH --rb-b2b --rb-batch-cnt $b --rb-sampled --rb-sample-rate $b pb-custom)"
done

header "Ringbuf back-to-back, reserve+commit vs output"
summarize "reserve" "$($RUN_BENCH --rb-b2b rb-custom)"
summarize "output" "$($RUN_BENCH --rb-b2b --rb-use-output rb-custom)"

header "Ringbuf sampled, reserve+commit vs output"
summarize "reserve-sampled" "$($RUN_BENCH --rb-sampled rb-custom)"
summarize "output-sampled" "$($RUN_BENCH --rb-sampled --rb-use-output rb-custom)"

header "Single-producer, consumer/producer competing on the same CPU, low batch count"
for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
	summarize $b "$($RUN_BENCH --rb-batch-cnt 1 --rb-sample-rate 1 --prod-affinity 0 --cons-affinity 0 $b)"
done

header "Ringbuf, multi-producer contention"
for b in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
	summarize "rb-libbpf nr_prod $b" "$($RUN_BENCH -p$b --rb-batch-cnt 50 rb-libbpf)"
done
|
||||
|
@ -6,6 +6,8 @@
|
||||
#include <linux/if_tun.h>
|
||||
#include <sys/uio.h>
|
||||
|
||||
#include "bpf_flow.skel.h"
|
||||
|
||||
#ifndef IP_MF
|
||||
#define IP_MF 0x2000
|
||||
#endif
|
||||
@ -101,6 +103,7 @@ struct test {
|
||||
|
||||
#define VLAN_HLEN 4
|
||||
|
||||
static __u32 duration;
|
||||
struct test tests[] = {
|
||||
{
|
||||
.name = "ipv4",
|
||||
@ -444,17 +447,130 @@ static int ifup(const char *ifname)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int init_prog_array(struct bpf_object *obj, struct bpf_map *prog_array)
|
||||
{
|
||||
int i, err, map_fd, prog_fd;
|
||||
struct bpf_program *prog;
|
||||
char prog_name[32];
|
||||
|
||||
map_fd = bpf_map__fd(prog_array);
|
||||
if (map_fd < 0)
|
||||
return -1;
|
||||
|
||||
for (i = 0; i < bpf_map__def(prog_array)->max_entries; i++) {
|
||||
snprintf(prog_name, sizeof(prog_name), "flow_dissector/%i", i);
|
||||
|
||||
prog = bpf_object__find_program_by_title(obj, prog_name);
|
||||
if (!prog)
|
||||
return -1;
|
||||
|
||||
prog_fd = bpf_program__fd(prog);
|
||||
if (prog_fd < 0)
|
||||
return -1;
|
||||
|
||||
err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
|
||||
if (err)
|
||||
return -1;
|
||||
}
|
||||
return 0;
|
||||
}
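/*
 * Editorial note (sketch, not part of the commit): init_prog_array()
 * populates the flow dissector's tail-call map -- slot i of jmp_table is
 * filled with the program whose section title is "flow_dissector/i", so
 * the entry program can bpf_tail_call() into the per-protocol parsers.
 * Filling a single slot by hand would look roughly like:
 *
 *	int idx = 0;                        // hypothetical slot number
 *	int fd = bpf_program__fd(prog);     // parser program for that slot
 *	bpf_map_update_elem(map_fd, &idx, &fd, BPF_ANY);
 */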
|
||||
|
||||
static void run_tests_skb_less(int tap_fd, struct bpf_map *keys)
|
||||
{
|
||||
int i, err, keys_fd;
|
||||
|
||||
keys_fd = bpf_map__fd(keys);
|
||||
if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
|
||||
return;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(tests); i++) {
|
||||
/* Keep in sync with 'flags' from eth_get_headlen. */
|
||||
__u32 eth_get_headlen_flags =
|
||||
BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
|
||||
struct bpf_prog_test_run_attr tattr = {};
|
||||
struct bpf_flow_keys flow_keys = {};
|
||||
__u32 key = (__u32)(tests[i].keys.sport) << 16 |
|
||||
tests[i].keys.dport;
|
||||
|
||||
/* For skb-less case we can't pass input flags; run
|
||||
* only the tests that have a matching set of flags.
|
||||
*/
|
||||
|
||||
if (tests[i].flags != eth_get_headlen_flags)
|
||||
continue;
|
||||
|
||||
err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
|
||||
CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
|
||||
|
||||
err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
|
||||
CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
|
||||
|
||||
CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
|
||||
CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
|
||||
|
||||
err = bpf_map_delete_elem(keys_fd, &key);
|
||||
CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
|
||||
}
|
||||
}
|
||||
|
||||
static void test_skb_less_prog_attach(struct bpf_flow *skel, int tap_fd)
|
||||
{
|
||||
int err, prog_fd;
|
||||
|
||||
prog_fd = bpf_program__fd(skel->progs._dissect);
|
||||
if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
|
||||
return;
|
||||
|
||||
err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno))
|
||||
return;
|
||||
|
||||
run_tests_skb_less(tap_fd, skel->maps.last_dissection);
|
||||
|
||||
err = bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
|
||||
CHECK(err, "bpf_prog_detach", "err %d errno %d\n", err, errno);
|
||||
}
|
||||
|
||||
static void test_skb_less_link_create(struct bpf_flow *skel, int tap_fd)
|
||||
{
|
||||
struct bpf_link *link;
|
||||
int err, net_fd;
|
||||
|
||||
net_fd = open("/proc/self/ns/net", O_RDONLY);
|
||||
if (CHECK(net_fd < 0, "open(/proc/self/ns/net)", "err %d\n", errno))
|
||||
return;
|
||||
|
||||
link = bpf_program__attach_netns(skel->progs._dissect, net_fd);
|
||||
if (CHECK(IS_ERR(link), "attach_netns", "err %ld\n", PTR_ERR(link)))
|
||||
goto out_close;
|
||||
|
||||
run_tests_skb_less(tap_fd, skel->maps.last_dissection);
|
||||
|
||||
err = bpf_link__destroy(link);
|
||||
CHECK(err, "bpf_link__destroy", "err %d\n", err);
|
||||
out_close:
|
||||
close(net_fd);
|
||||
}
|
||||
|
||||
void test_flow_dissector(void)
|
||||
{
|
||||
int i, err, prog_fd, keys_fd = -1, tap_fd;
|
||||
struct bpf_object *obj;
|
||||
__u32 duration = 0;
|
||||
struct bpf_flow *skel;
|
||||
|
||||
err = bpf_flow_load(&obj, "./bpf_flow.o", "flow_dissector",
|
||||
"jmp_table", "last_dissection", &prog_fd, &keys_fd);
|
||||
if (CHECK_FAIL(err))
|
||||
skel = bpf_flow__open_and_load();
|
||||
if (CHECK(!skel, "skel", "failed to open/load skeleton\n"))
|
||||
return;
|
||||
|
||||
prog_fd = bpf_program__fd(skel->progs._dissect);
|
||||
if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
|
||||
goto out_destroy_skel;
|
||||
keys_fd = bpf_map__fd(skel->maps.last_dissection);
|
||||
if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
|
||||
goto out_destroy_skel;
|
||||
err = init_prog_array(skel->obj, skel->maps.jmp_table);
|
||||
if (CHECK(err, "init_prog_array", "err %d\n", err))
|
||||
goto out_destroy_skel;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(tests); i++) {
|
||||
struct bpf_flow_keys flow_keys;
|
||||
struct bpf_prog_test_run_attr tattr = {
|
||||
@ -487,43 +603,17 @@ void test_flow_dissector(void)
|
||||
* via BPF map in this case.
|
||||
*/
|
||||
|
||||
err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno);
|
||||
|
||||
tap_fd = create_tap("tap0");
|
||||
CHECK(tap_fd < 0, "create_tap", "tap_fd %d errno %d\n", tap_fd, errno);
|
||||
err = ifup("tap0");
|
||||
CHECK(err, "ifup", "err %d errno %d\n", err, errno);
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(tests); i++) {
|
||||
/* Keep in sync with 'flags' from eth_get_headlen. */
|
||||
__u32 eth_get_headlen_flags =
|
||||
BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
|
||||
struct bpf_prog_test_run_attr tattr = {};
|
||||
struct bpf_flow_keys flow_keys = {};
|
||||
__u32 key = (__u32)(tests[i].keys.sport) << 16 |
|
||||
tests[i].keys.dport;
|
||||
/* Test direct prog attachment */
|
||||
test_skb_less_prog_attach(skel, tap_fd);
|
||||
/* Test indirect prog attachment via link */
|
||||
test_skb_less_link_create(skel, tap_fd);
|
||||
|
||||
/* For skb-less case we can't pass input flags; run
|
||||
* only the tests that have a matching set of flags.
|
||||
*/
|
||||
|
||||
if (tests[i].flags != eth_get_headlen_flags)
|
||||
continue;
|
||||
|
||||
err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
|
||||
CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
|
||||
|
||||
err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
|
||||
CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
|
||||
|
||||
CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
|
||||
CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
|
||||
|
||||
err = bpf_map_delete_elem(keys_fd, &key);
|
||||
CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
|
||||
}
|
||||
|
||||
bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
|
||||
bpf_object__close(obj);
|
||||
close(tap_fd);
|
||||
out_destroy_skel:
|
||||
bpf_flow__destroy(skel);
|
||||
}
|
||||
|
@ -11,6 +11,7 @@
|
||||
#include <fcntl.h>
|
||||
#include <sched.h>
|
||||
#include <stdbool.h>
|
||||
#include <sys/stat.h>
|
||||
#include <unistd.h>
|
||||
|
||||
#include <linux/bpf.h>
|
||||
@ -18,21 +19,30 @@
|
||||
|
||||
#include "test_progs.h"
|
||||
|
||||
static bool is_attached(int netns)
|
||||
static int init_net = -1;
|
||||
|
||||
static __u32 query_attached_prog_id(int netns)
|
||||
{
|
||||
__u32 cnt;
|
||||
__u32 prog_ids[1] = {};
|
||||
__u32 prog_cnt = ARRAY_SIZE(prog_ids);
|
||||
int err;
|
||||
|
||||
err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL, NULL, &cnt);
|
||||
err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL,
|
||||
prog_ids, &prog_cnt);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_query");
|
||||
return true; /* fail-safe */
|
||||
return 0;
|
||||
}
|
||||
|
||||
return cnt > 0;
|
||||
return prog_cnt == 1 ? prog_ids[0] : 0;
|
||||
}
|
||||
|
||||
static int load_prog(void)
|
||||
static bool prog_is_attached(int netns)
|
||||
{
|
||||
return query_attached_prog_id(netns) > 0;
|
||||
}
|
||||
|
||||
static int load_prog(enum bpf_prog_type type)
|
||||
{
|
||||
struct bpf_insn prog[] = {
|
||||
BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
|
||||
@ -40,61 +50,566 @@ static int load_prog(void)
|
||||
};
|
||||
int fd;
|
||||
|
||||
fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
|
||||
ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
|
||||
fd = bpf_load_program(type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
|
||||
if (CHECK_FAIL(fd < 0))
|
||||
perror("bpf_load_program");
|
||||
|
||||
return fd;
|
||||
}
|
||||
|
||||
static void do_flow_dissector_reattach(void)
|
||||
static __u32 query_prog_id(int prog)
|
||||
{
|
||||
int prog_fd[2] = { -1, -1 };
|
||||
struct bpf_prog_info info = {};
|
||||
__u32 info_len = sizeof(info);
|
||||
int err;
|
||||
|
||||
prog_fd[0] = load_prog();
|
||||
if (prog_fd[0] < 0)
|
||||
return;
|
||||
|
||||
prog_fd[1] = load_prog();
|
||||
if (prog_fd[1] < 0)
|
||||
goto out_close;
|
||||
|
||||
err = bpf_prog_attach(prog_fd[0], 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach-0");
|
||||
goto out_close;
|
||||
err = bpf_obj_get_info_by_fd(prog, &info, &info_len);
|
||||
if (CHECK_FAIL(err || info_len != sizeof(info))) {
|
||||
perror("bpf_obj_get_info_by_fd");
|
||||
return 0;
|
||||
}
|
||||
|
||||
return info.id;
|
||||
}
|
||||
|
||||
static int unshare_net(int old_net)
|
||||
{
|
||||
int err, new_net;
|
||||
|
||||
err = unshare(CLONE_NEWNET);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("unshare(CLONE_NEWNET)");
|
||||
return -1;
|
||||
}
|
||||
new_net = open("/proc/self/ns/net", O_RDONLY);
|
||||
if (CHECK_FAIL(new_net < 0)) {
|
||||
perror("open(/proc/self/ns/net)");
|
||||
setns(old_net, CLONE_NEWNET);
|
||||
return -1;
|
||||
}
|
||||
return new_net;
|
||||
}
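/*
 * Editorial note (not part of the commit): unshare_net() moves the calling
 * thread into a fresh network namespace and returns an FD to it; the tests
 * below pass that FD as the attach target of
 * bpf_link_create(..., BPF_FLOW_DISSECTOR, ...). The caller is expected to
 * setns() back to old_net and close the new FD when done, as
 * test_link_update_netns_gone() and test_link_get_info() do.
 */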
|
||||
|
||||
static void test_prog_attach_prog_attach(int netns, int prog1, int prog2)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect success when attaching a different program */
|
||||
err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
|
||||
err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach-1");
|
||||
perror("bpf_prog_attach(prog2) #1");
|
||||
goto out_detach;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
|
||||
|
||||
/* Expect failure when attaching the same program twice */
|
||||
err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
|
||||
err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(!err || errno != EINVAL))
|
||||
perror("bpf_prog_attach-2");
|
||||
perror("bpf_prog_attach(prog2) #2");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
|
||||
|
||||
out_detach:
|
||||
err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
|
||||
if (CHECK_FAIL(err))
|
||||
perror("bpf_prog_detach");
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_create_link_create(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
|
||||
int link1, link2;
|
||||
|
||||
link1 = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link1 < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect failure creating link when another link exists */
|
||||
errno = 0;
|
||||
link2 = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link2 != -1 || errno != E2BIG))
|
||||
perror("bpf_prog_attach(prog2) expected E2BIG");
|
||||
if (link2 != -1)
|
||||
close(link2);
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(link1);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_prog_attach_link_create(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
|
||||
int err, link;
|
||||
|
||||
err = bpf_prog_attach(prog1, -1, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect failure creating link when prog attached */
|
||||
errno = 0;
|
||||
link = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link != -1 || errno != EEXIST))
|
||||
perror("bpf_link_create(prog2) expected EEXIST");
|
||||
if (link != -1)
|
||||
close(link);
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
|
||||
if (CHECK_FAIL(err))
|
||||
perror("bpf_prog_detach");
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_create_prog_attach(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
|
||||
int err, link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect failure attaching prog when link exists */
|
||||
errno = 0;
|
||||
err = bpf_prog_attach(prog2, -1, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(!err || errno != EEXIST))
|
||||
perror("bpf_prog_attach(prog2) expected EEXIST");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_create_prog_detach(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
|
||||
int err, link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect failure detaching prog when link exists */
|
||||
errno = 0;
|
||||
err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
|
||||
if (CHECK_FAIL(!err || errno != EINVAL))
|
||||
perror("bpf_prog_detach expected EINVAL");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_prog_attach_detach_query(int netns, int prog1, int prog2)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_detach");
|
||||
return;
|
||||
}
|
||||
|
||||
/* Expect no prog attached after successful detach */
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_create_close_query(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
|
||||
int link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(link);
|
||||
/* Expect no prog attached after closing last link FD */
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_update_no_old_prog(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
int err, link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect success replacing the prog when old prog not specified */
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(err))
|
||||
perror("bpf_link_update");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
|
||||
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_update_replace_old_prog(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
int err, link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect success when F_REPLACE is set and the old prog is specified */
|
||||
update_opts.flags = BPF_F_REPLACE;
|
||||
update_opts.old_prog_fd = prog1;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(err))
|
||||
perror("bpf_link_update");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
|
||||
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_update_invalid_opts(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
int err, link;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect update to fail w/ old prog FD but w/o F_REPLACE */
|
||||
errno = 0;
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = prog1;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EINVAL)) {
|
||||
perror("bpf_link_update expected EINVAL");
|
||||
goto out_close;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect update to fail on old prog FD mismatch */
|
||||
errno = 0;
|
||||
update_opts.flags = BPF_F_REPLACE;
|
||||
update_opts.old_prog_fd = prog2;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EPERM)) {
|
||||
perror("bpf_link_update expected EPERM");
|
||||
goto out_close;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect update to fail for invalid old prog FD */
|
||||
errno = 0;
|
||||
update_opts.flags = BPF_F_REPLACE;
|
||||
update_opts.old_prog_fd = -1;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EBADF)) {
|
||||
perror("bpf_link_update expected EBADF");
|
||||
goto out_close;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect update to fail with invalid flags */
|
||||
errno = 0;
|
||||
update_opts.flags = BPF_F_ALLOW_MULTI;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EINVAL))
|
||||
perror("bpf_link_update expected EINVAL");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
out_close:
|
||||
close(prog_fd[1]);
|
||||
close(prog_fd[0]);
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_update_invalid_prog(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
int err, link, prog3;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
/* Expect failure when new prog FD is not valid */
|
||||
errno = 0;
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, -1, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EBADF)) {
|
||||
perror("bpf_link_update expected EINVAL");
|
||||
goto out_close_link;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
prog3 = load_prog(BPF_PROG_TYPE_SOCKET_FILTER);
|
||||
if (prog3 < 0)
|
||||
goto out_close_link;
|
||||
|
||||
/* Expect failure when new prog FD type doesn't match */
|
||||
errno = 0;
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, prog3, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != EINVAL))
|
||||
perror("bpf_link_update expected EINVAL");
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(prog3);
|
||||
out_close_link:
|
||||
close(link);
|
||||
CHECK_FAIL(prog_is_attached(netns));
|
||||
}
|
||||
|
||||
static void test_link_update_netns_gone(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
int err, link, old_net;
|
||||
|
||||
old_net = netns;
|
||||
netns = unshare_net(old_net);
|
||||
if (netns < 0)
|
||||
return;
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
return;
|
||||
}
|
||||
CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
|
||||
|
||||
close(netns);
|
||||
err = setns(old_net, CLONE_NEWNET);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("setns(CLONE_NEWNET)");
|
||||
close(link);
|
||||
return;
|
||||
}
|
||||
|
||||
/* Expect failure when netns destroyed */
|
||||
errno = 0;
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(!err || errno != ENOLINK))
|
||||
perror("bpf_link_update");
|
||||
|
||||
close(link);
|
||||
}
|
||||
|
||||
static void test_link_get_info(int netns, int prog1, int prog2)
|
||||
{
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
|
||||
DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
|
||||
struct bpf_link_info info = {};
|
||||
struct stat netns_stat = {};
|
||||
__u32 info_len, link_id;
|
||||
int err, link, old_net;
|
||||
|
||||
old_net = netns;
|
||||
netns = unshare_net(old_net);
|
||||
if (netns < 0)
|
||||
return;
|
||||
|
||||
err = fstat(netns, &netns_stat);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("stat(netns)");
|
||||
goto out_resetns;
|
||||
}
|
||||
|
||||
link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
|
||||
if (CHECK_FAIL(link < 0)) {
|
||||
perror("bpf_link_create(prog1)");
|
||||
goto out_resetns;
|
||||
}
|
||||
|
||||
info_len = sizeof(info);
|
||||
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_obj_get_info");
|
||||
goto out_unlink;
|
||||
}
|
||||
CHECK_FAIL(info_len != sizeof(info));
|
||||
|
||||
/* Expect link info to be sane and match prog and netns details */
|
||||
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
|
||||
CHECK_FAIL(info.id == 0);
|
||||
CHECK_FAIL(info.prog_id != query_prog_id(prog1));
|
||||
CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
|
||||
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
|
||||
|
||||
update_opts.flags = 0;
|
||||
update_opts.old_prog_fd = 0;
|
||||
err = bpf_link_update(link, prog2, &update_opts);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_link_update(prog2)");
|
||||
goto out_unlink;
|
||||
}
|
||||
|
||||
link_id = info.id;
|
||||
info_len = sizeof(info);
|
||||
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_obj_get_info");
|
||||
goto out_unlink;
|
||||
}
|
||||
CHECK_FAIL(info_len != sizeof(info));
|
||||
|
||||
/* Expect no info change after update except in prog id */
|
||||
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
|
||||
CHECK_FAIL(info.id != link_id);
|
||||
CHECK_FAIL(info.prog_id != query_prog_id(prog2));
|
||||
CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
|
||||
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
|
||||
|
||||
/* Leave netns link is attached to and close last FD to it */
|
||||
err = setns(old_net, CLONE_NEWNET);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("setns(NEWNET)");
|
||||
goto out_unlink;
|
||||
}
|
||||
close(netns);
|
||||
old_net = -1;
|
||||
netns = -1;
|
||||
|
||||
info_len = sizeof(info);
|
||||
err = bpf_obj_get_info_by_fd(link, &info, &info_len);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_obj_get_info");
|
||||
goto out_unlink;
|
||||
}
|
||||
CHECK_FAIL(info_len != sizeof(info));
|
||||
|
||||
/* Expect netns_ino to change to 0 */
|
||||
CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
|
||||
CHECK_FAIL(info.id != link_id);
|
||||
CHECK_FAIL(info.prog_id != query_prog_id(prog2));
|
||||
CHECK_FAIL(info.netns.netns_ino != 0);
|
||||
CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
|
||||
|
||||
out_unlink:
|
||||
close(link);
|
||||
out_resetns:
|
||||
if (old_net != -1)
|
||||
setns(old_net, CLONE_NEWNET);
|
||||
if (netns != -1)
|
||||
close(netns);
|
||||
}
|
||||
|
||||
static void run_tests(int netns)
|
||||
{
|
||||
struct test {
|
||||
const char *test_name;
|
||||
void (*test_func)(int netns, int prog1, int prog2);
|
||||
} tests[] = {
|
||||
{ "prog attach, prog attach",
|
||||
test_prog_attach_prog_attach },
|
||||
{ "link create, link create",
|
||||
test_link_create_link_create },
|
||||
{ "prog attach, link create",
|
||||
test_prog_attach_link_create },
|
||||
{ "link create, prog attach",
|
||||
test_link_create_prog_attach },
|
||||
{ "link create, prog detach",
|
||||
test_link_create_prog_detach },
|
||||
{ "prog attach, detach, query",
|
||||
test_prog_attach_detach_query },
|
||||
{ "link create, close, query",
|
||||
test_link_create_close_query },
|
||||
{ "link update no old prog",
|
||||
test_link_update_no_old_prog },
|
||||
{ "link update with replace old prog",
|
||||
test_link_update_replace_old_prog },
|
||||
{ "link update invalid opts",
|
||||
test_link_update_invalid_opts },
|
||||
{ "link update invalid prog",
|
||||
test_link_update_invalid_prog },
|
||||
{ "link update netns gone",
|
||||
test_link_update_netns_gone },
|
||||
{ "link get info",
|
||||
test_link_get_info },
|
||||
};
|
||||
int i, progs[2] = { -1, -1 };
|
||||
char test_name[80];
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(progs); i++) {
|
||||
progs[i] = load_prog(BPF_PROG_TYPE_FLOW_DISSECTOR);
|
||||
if (progs[i] < 0)
|
||||
goto out_close;
|
||||
}
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(tests); i++) {
|
||||
snprintf(test_name, sizeof(test_name),
|
||||
"flow dissector %s%s",
|
||||
tests[i].test_name,
|
||||
netns == init_net ? " (init_net)" : "");
|
||||
if (test__start_subtest(test_name))
|
||||
tests[i].test_func(netns, progs[0], progs[1]);
|
||||
}
|
||||
out_close:
|
||||
for (i = 0; i < ARRAY_SIZE(progs); i++) {
|
||||
if (progs[i] != -1)
|
||||
CHECK_FAIL(close(progs[i]));
|
||||
}
|
||||
}
|
||||
|
||||
void test_flow_dissector_reattach(void)
|
||||
{
|
||||
int init_net, self_net, err;
|
||||
int err, new_net, saved_net;
|
||||
|
||||
self_net = open("/proc/self/ns/net", O_RDONLY);
|
||||
if (CHECK_FAIL(self_net < 0)) {
|
||||
saved_net = open("/proc/self/ns/net", O_RDONLY);
|
||||
if (CHECK_FAIL(saved_net < 0)) {
|
||||
perror("open(/proc/self/ns/net");
|
||||
return;
|
||||
}
|
||||
@ -111,30 +626,29 @@ void test_flow_dissector_reattach(void)
|
||||
goto out_close;
|
||||
}
|
||||
|
||||
if (is_attached(init_net)) {
|
||||
if (prog_is_attached(init_net)) {
|
||||
test__skip();
|
||||
printf("Can't test with flow dissector attached to init_net\n");
|
||||
goto out_setns;
|
||||
}
|
||||
|
||||
/* First run tests in root network namespace */
|
||||
do_flow_dissector_reattach();
|
||||
run_tests(init_net);
|
||||
|
||||
/* Then repeat tests in a non-root namespace */
|
||||
err = unshare(CLONE_NEWNET);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("unshare(CLONE_NEWNET)");
|
||||
new_net = unshare_net(init_net);
|
||||
if (new_net < 0)
|
||||
goto out_setns;
|
||||
}
|
||||
do_flow_dissector_reattach();
|
||||
run_tests(new_net);
|
||||
close(new_net);
|
||||
|
||||
out_setns:
|
||||
/* Move back to netns we started in. */
|
||||
err = setns(self_net, CLONE_NEWNET);
|
||||
err = setns(saved_net, CLONE_NEWNET);
|
||||
if (CHECK_FAIL(err))
|
||||
perror("setns(/proc/self/ns/net)");
|
||||
|
||||
out_close:
|
||||
close(init_net);
|
||||
close(self_net);
|
||||
close(saved_net);
|
||||
}
|
||||
|
211
tools/testing/selftests/bpf/prog_tests/ringbuf.c
Normal file
@ -0,0 +1,211 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
#define _GNU_SOURCE
|
||||
#include <linux/compiler.h>
|
||||
#include <asm/barrier.h>
|
||||
#include <test_progs.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/epoll.h>
|
||||
#include <time.h>
|
||||
#include <sched.h>
|
||||
#include <signal.h>
|
||||
#include <pthread.h>
|
||||
#include <sys/sysinfo.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <linux/ring_buffer.h>
|
||||
#include "test_ringbuf.skel.h"
|
||||
|
||||
#define EDONE 7777
|
||||
|
||||
static int duration = 0;
|
||||
|
||||
struct sample {
|
||||
int pid;
|
||||
int seq;
|
||||
long value;
|
||||
char comm[16];
|
||||
};
|
||||
|
||||
static int sample_cnt;
|
||||
|
||||
static int process_sample(void *ctx, void *data, size_t len)
|
||||
{
|
||||
struct sample *s = data;
|
||||
|
||||
sample_cnt++;
|
||||
|
||||
switch (s->seq) {
|
||||
case 0:
|
||||
CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
|
||||
333L, s->value);
|
||||
return 0;
|
||||
case 1:
|
||||
CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
|
||||
777L, s->value);
|
||||
return -EDONE;
|
||||
default:
|
||||
/* we don't care about the rest */
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
static struct test_ringbuf *skel;
|
||||
static struct ring_buffer *ringbuf;
|
||||
|
||||
static void trigger_samples()
|
||||
{
|
||||
skel->bss->dropped = 0;
|
||||
skel->bss->total = 0;
|
||||
skel->bss->discarded = 0;
|
||||
|
||||
/* trigger exactly two samples */
|
||||
skel->bss->value = 333;
|
||||
syscall(__NR_getpgid);
|
||||
skel->bss->value = 777;
|
||||
syscall(__NR_getpgid);
|
||||
}
|
||||
|
||||
static void *poll_thread(void *input)
|
||||
{
|
||||
long timeout = (long)input;
|
||||
|
||||
return (void *)(long)ring_buffer__poll(ringbuf, timeout);
|
||||
}
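/*
 * Editorial note (not part of the commit): the background thread smuggles
 * ring_buffer__poll()'s result (number of records consumed, or a negative
 * error) out through the pthread return value; the main test later
 * pthread_tryjoin_np()s it into bg_ret to verify that the poll stayed
 * blocked while notifications were suppressed with BPF_RB_NO_WAKEUP and
 * only returned once BPF_RB_FORCE_WAKEUP was set.
 */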
|
||||
|
||||
void test_ringbuf(void)
|
||||
{
|
||||
const size_t rec_sz = BPF_RINGBUF_HDR_SZ + sizeof(struct sample);
|
||||
pthread_t thread;
|
||||
long bg_ret = -1;
|
||||
int err;
|
||||
|
||||
skel = test_ringbuf__open_and_load();
|
||||
if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
|
||||
return;
|
||||
|
||||
/* only trigger BPF program for current process */
|
||||
skel->bss->pid = getpid();
|
||||
|
||||
ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf),
|
||||
process_sample, NULL, NULL);
|
||||
if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
|
||||
goto cleanup;
|
||||
|
||||
err = test_ringbuf__attach(skel);
|
||||
if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
trigger_samples();
|
||||
|
||||
/* 2 submitted + 1 discarded records */
|
||||
CHECK(skel->bss->avail_data != 3 * rec_sz,
|
||||
"err_avail_size", "exp %ld, got %ld\n",
|
||||
3L * rec_sz, skel->bss->avail_data);
|
||||
CHECK(skel->bss->ring_size != 4096,
|
||||
"err_ring_size", "exp %ld, got %ld\n",
|
||||
4096L, skel->bss->ring_size);
|
||||
CHECK(skel->bss->cons_pos != 0,
|
||||
"err_cons_pos", "exp %ld, got %ld\n",
|
||||
0L, skel->bss->cons_pos);
|
||||
CHECK(skel->bss->prod_pos != 3 * rec_sz,
|
||||
"err_prod_pos", "exp %ld, got %ld\n",
|
||||
3L * rec_sz, skel->bss->prod_pos);
|
||||
|
||||
/* poll for samples */
|
||||
err = ring_buffer__poll(ringbuf, -1);
|
||||
|
||||
/* -EDONE is used as an indicator that we are done */
|
||||
if (CHECK(err != -EDONE, "err_done", "done err: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* we expect extra polling to return nothing */
|
||||
err = ring_buffer__poll(ringbuf, 0);
|
||||
if (CHECK(err != 0, "extra_samples", "poll result: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
|
||||
0L, skel->bss->dropped);
|
||||
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
|
||||
2L, skel->bss->total);
|
||||
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
|
||||
1L, skel->bss->discarded);
|
||||
|
||||
/* now validate consumer position is updated and returned */
|
||||
trigger_samples();
|
||||
CHECK(skel->bss->cons_pos != 3 * rec_sz,
|
||||
"err_cons_pos", "exp %ld, got %ld\n",
|
||||
3L * rec_sz, skel->bss->cons_pos);
|
||||
err = ring_buffer__poll(ringbuf, -1);
|
||||
CHECK(err <= 0, "poll_err", "err %d\n", err);
|
||||
|
||||
/* start poll in background w/ long timeout */
|
||||
err = pthread_create(&thread, NULL, poll_thread, (void *)(long)10000);
|
||||
if (CHECK(err, "bg_poll", "pthread_create failed: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* turn off notifications now */
|
||||
skel->bss->flags = BPF_RB_NO_WAKEUP;
|
||||
|
||||
/* give background thread a bit of a time */
|
||||
usleep(50000);
|
||||
trigger_samples();
|
||||
/* sleeping arbitrarily is bad, but no better way to know that
|
||||
* epoll_wait() **DID NOT** unblock in background thread
|
||||
*/
|
||||
usleep(50000);
|
||||
/* background poll should still be blocked */
|
||||
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
|
||||
if (CHECK(err != EBUSY, "try_join", "err %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* BPF side did everything right */
|
||||
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
|
||||
0L, skel->bss->dropped);
|
||||
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
|
||||
2L, skel->bss->total);
|
||||
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
|
||||
1L, skel->bss->discarded);
|
||||
|
||||
/* clear flags to return to "adaptive" notification mode */
|
||||
skel->bss->flags = 0;
|
||||
|
||||
/* produce new samples, no notification should be triggered, because
|
||||
* consumer is now behind
|
||||
*/
|
||||
trigger_samples();
|
||||
|
||||
/* background poll should still be blocked */
|
||||
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
|
||||
if (CHECK(err != EBUSY, "try_join", "err %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* now force notifications */
|
||||
skel->bss->flags = BPF_RB_FORCE_WAKEUP;
|
||||
sample_cnt = 0;
|
||||
trigger_samples();
|
||||
|
||||
/* now we should get a pending notification */
|
||||
usleep(50000);
|
||||
err = pthread_tryjoin_np(thread, (void **)&bg_ret);
|
||||
if (CHECK(err, "join_bg", "err %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
if (CHECK(bg_ret != 1, "bg_ret", "epoll_wait result: %ld", bg_ret))
|
||||
goto cleanup;
|
||||
|
||||
/* 3 rounds, 2 samples each */
|
||||
CHECK(sample_cnt != 6, "wrong_sample_cnt",
|
||||
"expected to see %d samples, got %d\n", 6, sample_cnt);
|
||||
|
||||
/* BPF side did everything right */
|
||||
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
|
||||
0L, skel->bss->dropped);
|
||||
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
|
||||
2L, skel->bss->total);
|
||||
CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
|
||||
1L, skel->bss->discarded);
|
||||
|
||||
test_ringbuf__detach(skel);
|
||||
cleanup:
|
||||
ring_buffer__free(ringbuf);
|
||||
test_ringbuf__destroy(skel);
|
||||
}
|
102
tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
Normal file
@ -0,0 +1,102 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
#define _GNU_SOURCE
|
||||
#include <test_progs.h>
|
||||
#include <sys/epoll.h>
|
||||
#include "test_ringbuf_multi.skel.h"
|
||||
|
||||
static int duration = 0;
|
||||
|
||||
struct sample {
|
||||
int pid;
|
||||
int seq;
|
||||
long value;
|
||||
char comm[16];
|
||||
};
|
||||
|
||||
static int process_sample(void *ctx, void *data, size_t len)
|
||||
{
|
||||
int ring = (unsigned long)ctx;
|
||||
struct sample *s = data;
|
||||
|
||||
switch (s->seq) {
|
||||
case 0:
|
||||
CHECK(ring != 1, "sample1_ring", "exp %d, got %d\n", 1, ring);
|
||||
CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
|
||||
333L, s->value);
|
||||
break;
|
||||
case 1:
|
||||
CHECK(ring != 2, "sample2_ring", "exp %d, got %d\n", 2, ring);
|
||||
CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
|
||||
777L, s->value);
|
||||
break;
|
||||
default:
|
||||
CHECK(true, "extra_sample", "unexpected sample seq %d, val %ld\n",
|
||||
s->seq, s->value);
|
||||
return -1;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
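/*
 * Editorial note (not part of the commit): the ctx argument here is the
 * per-ring cookie registered below via ring_buffer__new()/ring_buffer__add()
 * ((void *)(long)1 and (void *)(long)2), which is how a single callback can
 * tell which of the two ring buffers a sample arrived on.
 */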
|
||||
|
||||
void test_ringbuf_multi(void)
|
||||
{
|
||||
struct test_ringbuf_multi *skel;
|
||||
struct ring_buffer *ringbuf;
|
||||
int err;
|
||||
|
||||
skel = test_ringbuf_multi__open_and_load();
|
||||
if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
|
||||
return;
|
||||
|
||||
/* only trigger BPF program for current process */
|
||||
skel->bss->pid = getpid();
|
||||
|
||||
ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf1),
|
||||
process_sample, (void *)(long)1, NULL);
|
||||
if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
|
||||
goto cleanup;
|
||||
|
||||
err = ring_buffer__add(ringbuf, bpf_map__fd(skel->maps.ringbuf2),
|
||||
process_sample, (void *)(long)2);
|
||||
if (CHECK(err, "ringbuf_add", "failed to add another ring\n"))
|
||||
goto cleanup;
|
||||
|
||||
err = test_ringbuf_multi__attach(skel);
|
||||
if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* trigger few samples, some will be skipped */
|
||||
skel->bss->target_ring = 0;
|
||||
skel->bss->value = 333;
|
||||
syscall(__NR_getpgid);
|
||||
|
||||
/* skipped, no ringbuf in slot 1 */
|
||||
skel->bss->target_ring = 1;
|
||||
skel->bss->value = 555;
|
||||
syscall(__NR_getpgid);
|
||||
|
||||
skel->bss->target_ring = 2;
|
||||
skel->bss->value = 777;
|
||||
syscall(__NR_getpgid);
|
||||
|
||||
/* poll for samples, should get 2 ringbufs back */
|
||||
err = ring_buffer__poll(ringbuf, -1);
|
||||
if (CHECK(err != 4, "poll_res", "expected 4 records, got %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
/* expect extra polling to return nothing */
|
||||
err = ring_buffer__poll(ringbuf, 0);
|
||||
if (CHECK(err < 0, "extra_samples", "poll result: %d\n", err))
|
||||
goto cleanup;
|
||||
|
||||
CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
|
||||
0L, skel->bss->dropped);
|
||||
CHECK(skel->bss->skipped != 1, "err_skipped", "exp %ld, got %ld\n",
|
||||
1L, skel->bss->skipped);
|
||||
CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
|
||||
2L, skel->bss->total);
|
||||
|
||||
cleanup:
|
||||
ring_buffer__free(ringbuf);
|
||||
test_ringbuf_multi__destroy(skel);
|
||||
}
|
30
tools/testing/selftests/bpf/prog_tests/skb_helpers.c
Normal file
@ -0,0 +1,30 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
#include <test_progs.h>
|
||||
#include <network_helpers.h>
|
||||
|
||||
void test_skb_helpers(void)
|
||||
{
|
||||
struct __sk_buff skb = {
|
||||
.wire_len = 100,
|
||||
.gso_segs = 8,
|
||||
.gso_size = 10,
|
||||
};
|
||||
struct bpf_prog_test_run_attr tattr = {
|
||||
.data_in = &pkt_v4,
|
||||
.data_size_in = sizeof(pkt_v4),
|
||||
.ctx_in = &skb,
|
||||
.ctx_size_in = sizeof(skb),
|
||||
.ctx_out = &skb,
|
||||
.ctx_size_out = sizeof(skb),
|
||||
};
|
||||
struct bpf_object *obj;
|
||||
int err;
|
||||
|
||||
err = bpf_prog_load("./test_skb_helpers.o", BPF_PROG_TYPE_SCHED_CLS, &obj,
|
||||
&tattr.prog_fd);
|
||||
if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
|
||||
return;
|
||||
err = bpf_prog_test_run_xattr(&tattr);
|
||||
CHECK_ATTR(err, "len", "err %d errno %d\n", err, errno);
|
||||
bpf_object__close(obj);
|
||||
}
|
@ -1,7 +1,9 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2020 Cloudflare
|
||||
#include <error.h>
|
||||
|
||||
#include "test_progs.h"
|
||||
#include "test_skmsg_load_helpers.skel.h"
|
||||
|
||||
#define TCP_REPAIR 19 /* TCP sock is under repair right now */
|
||||
|
||||
@ -70,10 +72,43 @@ out:
|
||||
close(s);
|
||||
}
|
||||
|
||||
static void test_skmsg_helpers(enum bpf_map_type map_type)
|
||||
{
|
||||
struct test_skmsg_load_helpers *skel;
|
||||
int err, map, verdict;
|
||||
|
||||
skel = test_skmsg_load_helpers__open_and_load();
|
||||
if (CHECK_FAIL(!skel)) {
|
||||
perror("test_skmsg_load_helpers__open_and_load");
|
||||
return;
|
||||
}
|
||||
|
||||
verdict = bpf_program__fd(skel->progs.prog_msg_verdict);
|
||||
map = bpf_map__fd(skel->maps.sock_map);
|
||||
|
||||
err = bpf_prog_attach(verdict, map, BPF_SK_MSG_VERDICT, 0);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_attach");
|
||||
goto out;
|
||||
}
|
||||
|
||||
err = bpf_prog_detach2(verdict, map, BPF_SK_MSG_VERDICT);
|
||||
if (CHECK_FAIL(err)) {
|
||||
perror("bpf_prog_detach2");
|
||||
goto out;
|
||||
}
|
||||
out:
|
||||
test_skmsg_load_helpers__destroy(skel);
|
||||
}
|
||||
|
||||
void test_sockmap_basic(void)
|
||||
{
|
||||
if (test__start_subtest("sockmap create_update_free"))
|
||||
test_sockmap_create_update_free(BPF_MAP_TYPE_SOCKMAP);
|
||||
if (test__start_subtest("sockhash create_update_free"))
|
||||
test_sockmap_create_update_free(BPF_MAP_TYPE_SOCKHASH);
|
||||
if (test__start_subtest("sockmap sk_msg load helpers"))
|
||||
test_skmsg_helpers(BPF_MAP_TYPE_SOCKMAP);
|
||||
if (test__start_subtest("sockhash sk_msg load helpers"))
|
||||
test_skmsg_helpers(BPF_MAP_TYPE_SOCKHASH);
|
||||
}
|
||||
|
97
tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
Normal file
@ -0,0 +1,97 @@
// SPDX-License-Identifier: GPL-2.0
#include <uapi/linux/bpf.h>
#include <linux/if_link.h>
#include <test_progs.h>

#include "test_xdp_devmap_helpers.skel.h"
#include "test_xdp_with_devmap_helpers.skel.h"

#define IFINDEX_LO 1

struct bpf_devmap_val {
	u32 ifindex;   /* device index */
	union {
		int fd;  /* prog fd on map write */
		u32 id;  /* prog id on map read */
	} bpf_prog;
};

void test_xdp_with_devmap_helpers(void)
{
	struct test_xdp_with_devmap_helpers *skel;
	struct bpf_prog_info info = {};
	struct bpf_devmap_val val = {
		.ifindex = IFINDEX_LO,
	};
	__u32 len = sizeof(info);
	__u32 duration = 0, idx = 0;
	int err, dm_fd, map_fd;

	skel = test_xdp_with_devmap_helpers__open_and_load();
	if (CHECK_FAIL(!skel)) {
		perror("test_xdp_with_devmap_helpers__open_and_load");
		return;
	}

	/* can not attach program with DEVMAPs that allow programs
	 * as xdp generic
	 */
	dm_fd = bpf_program__fd(skel->progs.xdp_redir_prog);
	err = bpf_set_link_xdp_fd(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE);
	CHECK(err == 0, "Generic attach of program with 8-byte devmap",
	      "should have failed\n");

	dm_fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
	map_fd = bpf_map__fd(skel->maps.dm_ports);
	err = bpf_obj_get_info_by_fd(dm_fd, &info, &len);
	if (CHECK_FAIL(err))
		goto out_close;

	val.bpf_prog.fd = dm_fd;
	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
	CHECK(err, "Add program to devmap entry",
	      "err %d errno %d\n", err, errno);

	err = bpf_map_lookup_elem(map_fd, &idx, &val);
	CHECK(err, "Read devmap entry", "err %d errno %d\n", err, errno);
	CHECK(info.id != val.bpf_prog.id, "Expected program id in devmap entry",
	      "expected %u read %u\n", info.id, val.bpf_prog.id);

	/* can not attach BPF_XDP_DEVMAP program to a device */
	err = bpf_set_link_xdp_fd(IFINDEX_LO, dm_fd, XDP_FLAGS_SKB_MODE);
	CHECK(err == 0, "Attach of BPF_XDP_DEVMAP program",
	      "should have failed\n");

	val.ifindex = 1;
	val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_prog);
	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
	CHECK(err == 0, "Add non-BPF_XDP_DEVMAP program to devmap entry",
	      "should have failed\n");

out_close:
	test_xdp_with_devmap_helpers__destroy(skel);
}

void test_neg_xdp_devmap_helpers(void)
{
	struct test_xdp_devmap_helpers *skel;
	__u32 duration = 0;

	skel = test_xdp_devmap_helpers__open_and_load();
	if (CHECK(skel,
		  "Load of XDP program accessing egress ifindex without attach type",
		  "should have failed\n")) {
		test_xdp_devmap_helpers__destroy(skel);
	}
}

void test_xdp_devmap_attach(void)
{
	if (test__start_subtest("DEVMAP with programs in entries"))
		test_xdp_with_devmap_helpers();

	if (test__start_subtest("Verifier check of DEVMAP programs"))
		test_neg_xdp_devmap_helpers();
}
@ -20,20 +20,20 @@
#include <bpf/bpf_endian.h>

int _version SEC("version") = 1;
#define PROG(F) SEC(#F) int bpf_func_##F
#define PROG(F) PROG_(F, _##F)
#define PROG_(NUM, NAME) SEC("flow_dissector/"#NUM) int bpf_func##NAME

/* These are the identifiers of the BPF programs that will be used in tail
 * calls. Name is limited to 16 characters, with the terminating character and
 * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
 */
enum {
	IP,
	IPV6,
	IPV6OP,	/* Destination/Hop-by-Hop Options IPv6 Extension header */
	IPV6FR,	/* Fragmentation IPv6 Extension Header */
	MPLS,
	VLAN,
};
#define IP 0
#define IPV6 1
#define IPV6OP 2 /* Destination/Hop-by-Hop Options IPv6 Ext. Header */
#define IPV6FR 3 /* Fragmentation IPv6 Extension Header */
#define MPLS 4
#define VLAN 5
#define MAX_PROG 6

#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
@ -59,7 +59,7 @@ struct frag_hdr {

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 8);
	__uint(max_entries, MAX_PROG);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");
@ -9,6 +9,8 @@
#include <linux/in6.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <linux/if.h>
#include <errno.h>

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
@ -21,6 +23,10 @@
#define TCP_CA_NAME_MAX 16
#endif

#ifndef IFNAMSIZ
#define IFNAMSIZ 16
#endif

int _version SEC("version") = 1;

__attribute__ ((noinline))
@ -75,6 +81,29 @@ static __inline int set_cc(struct bpf_sock_addr *ctx)
	return 0;
}

static __inline int bind_to_device(struct bpf_sock_addr *ctx)
{
	char veth1[IFNAMSIZ] = "test_sock_addr1";
	char veth2[IFNAMSIZ] = "test_sock_addr2";
	char missing[IFNAMSIZ] = "nonexistent_dev";
	char del_bind[IFNAMSIZ] = "";

	if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
			   &veth1, sizeof(veth1)))
		return 1;
	if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
			   &veth2, sizeof(veth2)))
		return 1;
	if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
			   &missing, sizeof(missing)) != -ENODEV)
		return 1;
	if (bpf_setsockopt(ctx, SOL_SOCKET, SO_BINDTODEVICE,
			   &del_bind, sizeof(del_bind)))
		return 1;

	return 0;
}

SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
@ -88,6 +117,10 @@ int connect_v4_prog(struct bpf_sock_addr *ctx)
	tuple.ipv4.daddr = bpf_htonl(DST_REWRITE_IP4);
	tuple.ipv4.dport = bpf_htons(DST_REWRITE_PORT4);

	/* Bind to device and unbind it. */
	if (bind_to_device(ctx))
		return 0;

	if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
		return 0;
	else if (ctx->type == SOCK_STREAM)
33
tools/testing/selftests/bpf/progs/perfbuf_bench.c
Normal file
@ -0,0 +1,33 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook

#include <linux/bpf.h>
#include <stdint.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(value_size, sizeof(int));
	__uint(key_size, sizeof(int));
} perfbuf SEC(".maps");

const volatile int batch_cnt = 0;

long sample_val = 42;
long dropped __attribute__((aligned(128))) = 0;

SEC("fentry/__x64_sys_getpgid")
int bench_perfbuf(void *ctx)
{
	__u64 *sample;
	int i;

	for (i = 0; i < batch_cnt; i++) {
		if (bpf_perf_event_output(ctx, &perfbuf, BPF_F_CURRENT_CPU,
					  &sample_val, sizeof(sample_val)))
			__sync_add_and_fetch(&dropped, 1);
	}
	return 0;
}
60
tools/testing/selftests/bpf/progs/ringbuf_bench.c
Normal file
@ -0,0 +1,60 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook

#include <linux/bpf.h>
#include <stdint.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
} ringbuf SEC(".maps");

const volatile int batch_cnt = 0;
const volatile long use_output = 0;

long sample_val = 42;
long dropped __attribute__((aligned(128))) = 0;

const volatile long wakeup_data_size = 0;

static __always_inline long get_flags()
{
	long sz;

	if (!wakeup_data_size)
		return 0;

	sz = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
	return sz >= wakeup_data_size ? BPF_RB_FORCE_WAKEUP : BPF_RB_NO_WAKEUP;
}

SEC("fentry/__x64_sys_getpgid")
int bench_ringbuf(void *ctx)
{
	long *sample, flags;
	int i;

	if (!use_output) {
		for (i = 0; i < batch_cnt; i++) {
			sample = bpf_ringbuf_reserve(&ringbuf,
						     sizeof(sample_val), 0);
			if (!sample) {
				__sync_add_and_fetch(&dropped, 1);
			} else {
				*sample = sample_val;
				flags = get_flags();
				bpf_ringbuf_submit(sample, flags);
			}
		}
	} else {
		for (i = 0; i < batch_cnt; i++) {
			flags = get_flags();
			if (bpf_ringbuf_output(&ringbuf, &sample_val,
					       sizeof(sample_val), flags))
				__sync_add_and_fetch(&dropped, 1);
		}
	}
	return 0;
}
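For context: the const volatile knobs above (batch_cnt, use_output, wakeup_data_size) are meant to be filled in from userspace before the object is loaded. A minimal sketch of how a benchmark harness could do that through a generated skeleton follows; the ringbuf_bench.skel.h header and its field names follow the usual bpftool skeleton convention and are assumptions, not something shown in this diff.

/* Sketch only: assumes "bpftool gen skeleton ringbuf_bench.o > ringbuf_bench.skel.h" */
#include <stdbool.h>
#include "ringbuf_bench.skel.h"

static struct ringbuf_bench *setup_ringbuf_bench(int batch, bool use_output)
{
	struct ringbuf_bench *skel;

	skel = ringbuf_bench__open();
	if (!skel)
		return NULL;

	/* const volatile globals must be set before __load() */
	skel->rodata->batch_cnt = batch;
	skel->rodata->use_output = use_output;
	skel->rodata->wakeup_data_size = 0; /* 0 = default notification policy */

	if (ringbuf_bench__load(skel) || ringbuf_bench__attach(skel)) {
		ringbuf_bench__destroy(skel);
		return NULL;
	}
	return skel;
}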
78
tools/testing/selftests/bpf/progs/test_ringbuf.c
Normal file
@ -0,0 +1,78 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct sample {
	int pid;
	int seq;
	long value;
	char comm[16];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 12);
} ringbuf SEC(".maps");

/* inputs */
int pid = 0;
long value = 0;
long flags = 0;

/* outputs */
long total = 0;
long discarded = 0;
long dropped = 0;

long avail_data = 0;
long ring_size = 0;
long cons_pos = 0;
long prod_pos = 0;

/* inner state */
long seq = 0;

SEC("tp/syscalls/sys_enter_getpgid")
int test_ringbuf(void *ctx)
{
	int cur_pid = bpf_get_current_pid_tgid() >> 32;
	struct sample *sample;
	int zero = 0;

	if (cur_pid != pid)
		return 0;

	sample = bpf_ringbuf_reserve(&ringbuf, sizeof(*sample), 0);
	if (!sample) {
		__sync_fetch_and_add(&dropped, 1);
		return 1;
	}

	sample->pid = pid;
	bpf_get_current_comm(sample->comm, sizeof(sample->comm));
	sample->value = value;

	sample->seq = seq++;
	__sync_fetch_and_add(&total, 1);

	if (sample->seq & 1) {
		/* copy from reserved sample to a new one... */
		bpf_ringbuf_output(&ringbuf, sample, sizeof(*sample), flags);
		/* ...and then discard reserved sample */
		bpf_ringbuf_discard(sample, flags);
		__sync_fetch_and_add(&discarded, 1);
	} else {
		bpf_ringbuf_submit(sample, flags);
	}

	avail_data = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
	ring_size = bpf_ringbuf_query(&ringbuf, BPF_RB_RING_SIZE);
	cons_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_CONS_POS);
	prod_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_PROD_POS);

	return 0;
}
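On the userspace side, samples submitted by a program like the one above are drained through libbpf's ring buffer API (ring_buffer__new / ring_buffer__poll). A minimal consumer sketch follows; apart from the libbpf calls, the function names here are illustrative assumptions and not part of this commit.

#include <stdio.h>
#include <bpf/libbpf.h>

struct sample {
	int pid;
	int seq;
	long value;
	char comm[16];
};

/* invoked by libbpf once per submitted sample */
static int handle_sample(void *ctx, void *data, size_t size)
{
	const struct sample *s = data;

	printf("pid=%d seq=%d value=%ld comm=%s\n",
	       s->pid, s->seq, s->value, s->comm);
	return 0;
}

/* map_fd is the fd of the BPF_MAP_TYPE_RINGBUF map ("ringbuf" above) */
static int consume_ringbuf(int map_fd)
{
	struct ring_buffer *rb;
	int err;

	rb = ring_buffer__new(map_fd, handle_sample, NULL, NULL);
	if (!rb)
		return -1;

	/* poll until an error occurs; the callback runs for each sample */
	while ((err = ring_buffer__poll(rb, 100 /* timeout, ms */)) >= 0)
		;

	ring_buffer__free(rb);
	return err;
}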
77
tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
Normal file
@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Facebook

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct sample {
	int pid;
	int seq;
	long value;
	char comm[16];
};

struct ringbuf_map {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 12);
} ringbuf1 SEC(".maps"),
  ringbuf2 SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 4);
	__type(key, int);
	__array(values, struct ringbuf_map);
} ringbuf_arr SEC(".maps") = {
	.values = {
		[0] = &ringbuf1,
		[2] = &ringbuf2,
	},
};

/* inputs */
int pid = 0;
int target_ring = 0;
long value = 0;

/* outputs */
long total = 0;
long dropped = 0;
long skipped = 0;

SEC("tp/syscalls/sys_enter_getpgid")
int test_ringbuf(void *ctx)
{
	int cur_pid = bpf_get_current_pid_tgid() >> 32;
	struct sample *sample;
	void *rb;
	int zero = 0;

	if (cur_pid != pid)
		return 0;

	rb = bpf_map_lookup_elem(&ringbuf_arr, &target_ring);
	if (!rb) {
		skipped += 1;
		return 1;
	}

	sample = bpf_ringbuf_reserve(rb, sizeof(*sample), 0);
	if (!sample) {
		dropped += 1;
		return 1;
	}

	sample->pid = pid;
	bpf_get_current_comm(sample->comm, sizeof(sample->comm));
	sample->value = value;

	sample->seq = total;
	total += 1;

	bpf_ringbuf_submit(sample, 0);

	return 0;
}
28
tools/testing/selftests/bpf/progs/test_skb_helpers.c
Normal file
@ -0,0 +1,28 @@
// SPDX-License-Identifier: GPL-2.0-only
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define TEST_COMM_LEN 16

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, u32);
} cgroup_map SEC(".maps");

char _license[] SEC("license") = "GPL";

SEC("classifier/test_skb_helpers")
int test_skb_helpers(struct __sk_buff *skb)
{
	struct task_struct *task;
	char comm[TEST_COMM_LEN];
	__u32 tpid;

	task = (struct task_struct *)bpf_get_current_task();
	bpf_probe_read_kernel(&tpid , sizeof(tpid), &task->tgid);
	bpf_probe_read_kernel_str(&comm, sizeof(comm), &task->comm);
	return 0;
}
47
tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
Normal file
@ -0,0 +1,47 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Isovalent, Inc.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 2);
	__type(key, __u32);
	__type(value, __u64);
} sock_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_SOCKHASH);
	__uint(max_entries, 2);
	__type(key, __u32);
	__type(value, __u64);
} sock_hash SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, __u32);
	__type(value, __u64);
} socket_storage SEC(".maps");

SEC("sk_msg")
int prog_msg_verdict(struct sk_msg_md *msg)
{
	struct task_struct *task = (struct task_struct *)bpf_get_current_task();
	int verdict = SK_PASS;
	__u32 pid, tpid;
	__u64 *sk_stg;

	pid = bpf_get_current_pid_tgid() >> 32;
	sk_stg = bpf_sk_storage_get(&socket_storage, msg->sk, 0, BPF_SK_STORAGE_GET_F_CREATE);
	if (!sk_stg)
		return SK_DROP;
	*sk_stg = pid;
	bpf_probe_read_kernel(&tpid , sizeof(tpid), &task->tgid);
	if (pid != tpid)
		verdict = SK_DROP;
	bpf_sk_storage_delete(&socket_storage, (void *)msg->sk);
	return verdict;
}

char _license[] SEC("license") = "GPL";
@ -79,11 +79,18 @@ struct {

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__uint(max_entries, 2);
	__type(key, int);
	__type(value, int);
} sock_skb_opts SEC(".maps");

struct {
	__uint(type, TEST_MAP_TYPE);
	__uint(max_entries, 20);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} tls_sock_map SEC(".maps");

SEC("sk_skb1")
int bpf_prog1(struct __sk_buff *skb)
{
@ -118,6 +125,43 @@ int bpf_prog2(struct __sk_buff *skb)

}

SEC("sk_skb3")
int bpf_prog3(struct __sk_buff *skb)
{
	const int one = 1;
	int err, *f, ret = SK_PASS;
	void *data_end;
	char *c;

	err = bpf_skb_pull_data(skb, 19);
	if (err)
		goto tls_out;

	c = (char *)(long)skb->data;
	data_end = (void *)(long)skb->data_end;

	if (c + 18 < data_end)
		memcpy(&c[13], "PASS", 4);
	f = bpf_map_lookup_elem(&sock_skb_opts, &one);
	if (f && *f) {
		__u64 flags = 0;

		ret = 0;
		flags = *f;
#ifdef SOCKMAP
		return bpf_sk_redirect_map(skb, &tls_sock_map, ret, flags);
#else
		return bpf_sk_redirect_hash(skb, &tls_sock_map, &ret, flags);
#endif
	}

	f = bpf_map_lookup_elem(&sock_skb_opts, &one);
	if (f && *f)
		ret = SK_DROP;
tls_out:
	return ret;
}

SEC("sockops")
int bpf_sockmap(struct bpf_sock_ops *skops)
{
22
tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
Normal file
@ -0,0 +1,22 @@
// SPDX-License-Identifier: GPL-2.0
/* fails to load without expected_attach_type = BPF_XDP_DEVMAP
 * because of access to egress_ifindex
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp_dm_log")
int xdpdm_devlog(struct xdp_md *ctx)
{
	char fmt[] = "devmap redirect: dev %u -> dev %u len %u\n";
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	unsigned int len = data_end - data;

	bpf_trace_printk(fmt, sizeof(fmt),
			 ctx->ingress_ifindex, ctx->egress_ifindex, len);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
@ -0,0 +1,44 @@
// SPDX-License-Identifier: GPL-2.0

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(struct bpf_devmap_val));
	__uint(max_entries, 4);
} dm_ports SEC(".maps");

SEC("xdp_redir")
int xdp_redir_prog(struct xdp_md *ctx)
{
	return bpf_redirect_map(&dm_ports, 1, 0);
}

/* invalid program on DEVMAP entry;
 * SEC name means expected attach type not set
 */
SEC("xdp_dummy")
int xdp_dummy_prog(struct xdp_md *ctx)
{
	return XDP_PASS;
}

/* valid program on DEVMAP entry via SEC name;
 * has access to egress and ingress ifindex
 */
SEC("xdp_devmap")
int xdp_dummy_dm(struct xdp_md *ctx)
{
	char fmt[] = "devmap redirect: dev %u -> dev %u len %u\n";
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	unsigned int len = data_end - data;

	bpf_trace_printk(fmt, sizeof(fmt),
			 ctx->ingress_ifindex, ctx->egress_ifindex, len);

	return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
@ -1394,23 +1394,25 @@ static void test_map_rdonly(void)

	key = 1;
	value = 1234;
	/* Insert key=1 element. */
	/* Try to insert key=1 element. */
	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == -1 &&
	       errno == EPERM);

	/* Check that key=2 is not found. */
	/* Check that key=1 is not found. */
	assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT);
	assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == ENOENT);

	close(fd);
}

static void test_map_wronly(void)
static void test_map_wronly_hash(void)
{
	int fd, key = 0, value = 0;

	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
			    MAP_SIZE, map_flags | BPF_F_WRONLY);
	if (fd < 0) {
		printf("Failed to create map for read only test '%s'!\n",
		printf("Failed to create map for write only test '%s'!\n",
		       strerror(errno));
		exit(1);
	}
@ -1420,9 +1422,49 @@ static void test_map_wronly(void)
	/* Insert key=1 element. */
	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0);

	/* Check that key=2 is not found. */
	/* Check that reading elements and keys from the map is not allowed. */
	assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM);
	assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == EPERM);

	close(fd);
}

static void test_map_wronly_stack_or_queue(enum bpf_map_type map_type)
{
	int fd, value = 0;

	assert(map_type == BPF_MAP_TYPE_QUEUE ||
	       map_type == BPF_MAP_TYPE_STACK);
	fd = bpf_create_map(map_type, 0, sizeof(value), MAP_SIZE,
			    map_flags | BPF_F_WRONLY);
	/* Stack/Queue maps do not support BPF_F_NO_PREALLOC */
	if (map_flags & BPF_F_NO_PREALLOC) {
		assert(fd < 0 && errno == EINVAL);
		return;
	}
	if (fd < 0) {
		printf("Failed to create map '%s'!\n", strerror(errno));
		exit(1);
	}

	value = 1234;
	assert(bpf_map_update_elem(fd, NULL, &value, BPF_ANY) == 0);

	/* Peek element should fail */
	assert(bpf_map_lookup_elem(fd, NULL, &value) == -1 && errno == EPERM);

	/* Pop element should fail */
	assert(bpf_map_lookup_and_delete_elem(fd, NULL, &value) == -1 &&
	       errno == EPERM);

	close(fd);
}

static void test_map_wronly(void)
{
	test_map_wronly_hash();
	test_map_wronly_stack_or_queue(BPF_MAP_TYPE_STACK);
	test_map_wronly_stack_or_queue(BPF_MAP_TYPE_QUEUE);
}

static void prepare_reuseport_grp(int type, int map_fd, size_t map_elem_size,
@ -63,8 +63,8 @@ int s1, s2, c1, c2, p1, p2;
int test_cnt;
int passed;
int failed;
int map_fd[8];
struct bpf_map *maps[8];
int map_fd[9];
struct bpf_map *maps[9];
int prog_fd[11];

int txmsg_pass;
@ -79,7 +79,10 @@ int txmsg_end_push;
int txmsg_start_pop;
int txmsg_pop;
int txmsg_ingress;
int txmsg_skb;
int txmsg_redir_skb;
int txmsg_ktls_skb;
int txmsg_ktls_skb_drop;
int txmsg_ktls_skb_redir;
int ktls;
int peek_flag;

@ -104,7 +107,7 @@ static const struct option long_options[] = {
	{"txmsg_start_pop", required_argument, NULL, 'w'},
	{"txmsg_pop", required_argument, NULL, 'x'},
	{"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
	{"txmsg_skb", no_argument, &txmsg_skb, 1 },
	{"txmsg_redir_skb", no_argument, &txmsg_redir_skb, 1 },
	{"ktls", no_argument, &ktls, 1 },
	{"peek", no_argument, &peek_flag, 1 },
	{"whitelist", required_argument, NULL, 'n' },
@ -169,7 +172,8 @@ static void test_reset(void)
	txmsg_start_push = txmsg_end_push = 0;
	txmsg_pass = txmsg_drop = txmsg_redir = 0;
	txmsg_apply = txmsg_cork = 0;
	txmsg_ingress = txmsg_skb = 0;
	txmsg_ingress = txmsg_redir_skb = 0;
	txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
}

static int test_start_subtest(const struct _test *t, struct sockmap_options *o)
@ -502,14 +506,41 @@ unwind_iov:

static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz)
{
	int i, j, bytes_cnt = 0;
	int i, j = 0, bytes_cnt = 0;
	unsigned char k = 0;

	for (i = 0; i < msg->msg_iovlen; i++) {
		unsigned char *d = msg->msg_iov[i].iov_base;

		for (j = 0;
		     j < msg->msg_iov[i].iov_len && size; j++) {
		/* Special case test for skb ingress + ktls */
		if (i == 0 && txmsg_ktls_skb) {
			if (msg->msg_iov[i].iov_len < 4)
				return -EIO;
			if (txmsg_ktls_skb_redir) {
				if (memcmp(&d[13], "PASS", 4) != 0) {
					fprintf(stderr,
						"detected redirect ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[13], d[14], d[15], d[16]);
					return -EIO;
				}
				d[13] = 0;
				d[14] = 1;
				d[15] = 2;
				d[16] = 3;
				j = 13;
			} else if (txmsg_ktls_skb) {
				if (memcmp(d, "PASS", 4) != 0) {
					fprintf(stderr,
						"detected ktls_skb data error with skb ingress update @iov[%i]:%i \"%02x %02x %02x %02x\" != \"PASS\"\n", i, 0, d[0], d[1], d[2], d[3]);
					return -EIO;
				}
				d[0] = 0;
				d[1] = 1;
				d[2] = 2;
				d[3] = 3;
			}
		}

		for (; j < msg->msg_iov[i].iov_len && size; j++) {
			if (d[j] != k++) {
				fprintf(stderr,
					"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
@ -724,7 +755,7 @@ static int sendmsg_test(struct sockmap_options *opt)
	rxpid = fork();
	if (rxpid == 0) {
		iov_buf -= (txmsg_pop - txmsg_start_pop + 1);
		if (opt->drop_expected)
		if (opt->drop_expected || txmsg_ktls_skb_drop)
			_exit(0);

		if (!iov_buf) /* zero bytes sent case */
@ -911,8 +942,28 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
		return err;
	}

	/* Attach programs to TLS sockmap */
	if (txmsg_ktls_skb) {
		err = bpf_prog_attach(prog_fd[0], map_fd[8],
				      BPF_SK_SKB_STREAM_PARSER, 0);
		if (err) {
			fprintf(stderr,
				"ERROR: bpf_prog_attach (TLS sockmap %i->%i): %d (%s)\n",
				prog_fd[0], map_fd[8], err, strerror(errno));
			return err;
		}

		err = bpf_prog_attach(prog_fd[2], map_fd[8],
				      BPF_SK_SKB_STREAM_VERDICT, 0);
		if (err) {
			fprintf(stderr, "ERROR: bpf_prog_attach (TLS sockmap): %d (%s)\n",
				err, strerror(errno));
			return err;
		}
	}

	/* Attach to cgroups */
	err = bpf_prog_attach(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
	err = bpf_prog_attach(prog_fd[3], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
	if (err) {
		fprintf(stderr, "ERROR: bpf_prog_attach (groups): %d (%s)\n",
			err, strerror(errno));
@ -928,15 +979,15 @@ run:

	/* Attach txmsg program to sockmap */
	if (txmsg_pass)
		tx_prog_fd = prog_fd[3];
	else if (txmsg_redir)
		tx_prog_fd = prog_fd[4];
	else if (txmsg_apply)
	else if (txmsg_redir)
		tx_prog_fd = prog_fd[5];
	else if (txmsg_cork)
	else if (txmsg_apply)
		tx_prog_fd = prog_fd[6];
	else if (txmsg_drop)
	else if (txmsg_cork)
		tx_prog_fd = prog_fd[7];
	else if (txmsg_drop)
		tx_prog_fd = prog_fd[8];
	else
		tx_prog_fd = 0;

@ -1108,7 +1159,35 @@ run:
		}
	}

	if (txmsg_skb) {
	if (txmsg_ktls_skb) {
		int ingress = BPF_F_INGRESS;

		i = 0;
		err = bpf_map_update_elem(map_fd[8], &i, &p2, BPF_ANY);
		if (err) {
			fprintf(stderr,
				"ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
				err, strerror(errno));
		}

		if (txmsg_ktls_skb_redir) {
			i = 1;
			err = bpf_map_update_elem(map_fd[7],
						  &i, &ingress, BPF_ANY);
			if (err) {
				fprintf(stderr,
					"ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
					err, strerror(errno));
			}
		}

		if (txmsg_ktls_skb_drop) {
			i = 1;
			err = bpf_map_update_elem(map_fd[7], &i, &i, BPF_ANY);
		}
	}

	if (txmsg_redir_skb) {
		int skb_fd = (test == SENDMSG || test == SENDPAGE) ?
				p2 : p1;
		int ingress = BPF_F_INGRESS;
@ -1123,8 +1202,7 @@ run:
		}

		i = 3;
		err = bpf_map_update_elem(map_fd[0],
					  &i, &skb_fd, BPF_ANY);
		err = bpf_map_update_elem(map_fd[0], &i, &skb_fd, BPF_ANY);
		if (err) {
			fprintf(stderr,
				"ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
@ -1158,9 +1236,12 @@ run:
		fprintf(stderr, "unknown test\n");
out:
	/* Detatch and zero all the maps */
	bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);
	bpf_prog_detach2(prog_fd[3], cg_fd, BPF_CGROUP_SOCK_OPS);
	bpf_prog_detach2(prog_fd[0], map_fd[0], BPF_SK_SKB_STREAM_PARSER);
	bpf_prog_detach2(prog_fd[1], map_fd[0], BPF_SK_SKB_STREAM_VERDICT);
	bpf_prog_detach2(prog_fd[0], map_fd[8], BPF_SK_SKB_STREAM_PARSER);
	bpf_prog_detach2(prog_fd[2], map_fd[8], BPF_SK_SKB_STREAM_VERDICT);

	if (tx_prog_fd >= 0)
		bpf_prog_detach2(tx_prog_fd, map_fd[1], BPF_SK_MSG_VERDICT);

@ -1229,8 +1310,10 @@ static void test_options(char *options)
	}
	if (txmsg_ingress)
		strncat(options, "ingress,", OPTSTRING);
	if (txmsg_skb)
		strncat(options, "skb,", OPTSTRING);
	if (txmsg_redir_skb)
		strncat(options, "redir_skb,", OPTSTRING);
	if (txmsg_ktls_skb)
		strncat(options, "ktls_skb,", OPTSTRING);
	if (ktls)
		strncat(options, "ktls,", OPTSTRING);
	if (peek_flag)
@ -1362,6 +1445,40 @@ static void test_txmsg_ingress_redir(int cgrp, struct sockmap_options *opt)
	test_send(opt, cgrp);
}

static void test_txmsg_skb(int cgrp, struct sockmap_options *opt)
{
	bool data = opt->data_test;
	int k = ktls;

	opt->data_test = true;
	ktls = 1;

	txmsg_pass = txmsg_drop = 0;
	txmsg_ingress = txmsg_redir = 0;
	txmsg_ktls_skb = 1;
	txmsg_pass = 1;

	/* Using data verification so ensure iov layout is
	 * expected from test receiver side. e.g. has enough
	 * bytes to write test code.
	 */
	opt->iov_length = 100;
	opt->iov_count = 1;
	opt->rate = 1;
	test_exec(cgrp, opt);

	txmsg_ktls_skb_drop = 1;
	test_exec(cgrp, opt);

	txmsg_ktls_skb_drop = 0;
	txmsg_ktls_skb_redir = 1;
	test_exec(cgrp, opt);

	opt->data_test = data;
	ktls = k;
}

/* Test cork with hung data. This tests poor usage patterns where
 * cork can leave data on the ring if user program is buggy and
 * doesn't flush them somehow. They do take some time however
@ -1542,11 +1659,13 @@ char *map_names[] = {
	"sock_bytes",
	"sock_redir_flags",
	"sock_skb_opts",
	"tls_sock_map",
};

int prog_attach_type[] = {
	BPF_SK_SKB_STREAM_PARSER,
	BPF_SK_SKB_STREAM_VERDICT,
	BPF_SK_SKB_STREAM_VERDICT,
	BPF_CGROUP_SOCK_OPS,
	BPF_SK_MSG_VERDICT,
	BPF_SK_MSG_VERDICT,
@ -1558,6 +1677,7 @@ int prog_attach_type[] = {
};

int prog_type[] = {
	BPF_PROG_TYPE_SK_SKB,
	BPF_PROG_TYPE_SK_SKB,
	BPF_PROG_TYPE_SK_SKB,
	BPF_PROG_TYPE_SOCK_OPS,
@ -1620,6 +1740,7 @@ struct _test test[] = {
	{"txmsg test redirect", test_txmsg_redir},
	{"txmsg test drop", test_txmsg_drop},
	{"txmsg test ingress redirect", test_txmsg_ingress_redir},
	{"txmsg test skb", test_txmsg_skb},
	{"txmsg test apply", test_txmsg_apply},
	{"txmsg test cork", test_txmsg_cork},
	{"txmsg test hanging corks", test_txmsg_cork_hangs},
@ -15,7 +15,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R0 max value is outside of the array range",
	.errstr = "R0 max value is outside of the allowed memory range",
	.result = REJECT,
	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@ -44,7 +44,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R0 max value is outside of the array range",
	.errstr = "R0 max value is outside of the allowed memory range",
	.result = REJECT,
	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},

@ -117,7 +117,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R0 min value is outside of the array range",
	.errstr = "R0 min value is outside of the allowed memory range",
	.result = REJECT,
	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@ -137,7 +137,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
	.errstr = "R0 unbounded memory access, make sure to bounds check any such access",
	.result = REJECT,
	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},

@ -20,7 +20,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_8b = { 3 },
	.errstr = "R0 max value is outside of the array range",
	.errstr = "R0 max value is outside of the allowed memory range",
	.result = REJECT,
},
{
@ -146,7 +146,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_8b = { 3 },
	.errstr = "R0 min value is outside of the array range",
	.errstr = "R0 min value is outside of the allowed memory range",
	.result = REJECT
},
{
@ -354,7 +354,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_8b = { 3 },
	.errstr = "R0 max value is outside of the array range",
	.errstr = "R0 max value is outside of the allowed memory range",
	.result = REJECT
},
{

@ -105,7 +105,7 @@
	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
	.fixup_map_hash_8b = { 16 },
	.result = REJECT,
	.errstr = "R0 min value is outside of the array range",
	.errstr = "R0 min value is outside of the allowed memory range",
},
{
	"calls: overlapping caller/callee",

@ -68,7 +68,7 @@
	},
	.fixup_map_array_48b = { 1 },
	.result = REJECT,
	.errstr = "R1 min value is outside of the array range",
	.errstr = "R1 min value is outside of the allowed memory range",
},
{
	"direct map access, write test 7",
@ -220,7 +220,7 @@
	},
	.fixup_map_array_small = { 1 },
	.result = REJECT,
	.errstr = "R1 min value is outside of the array range",
	.errstr = "R1 min value is outside of the allowed memory range",
},
{
	"direct map access, write test 19",

@ -318,7 +318,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 4 },
	.errstr = "R1 min value is outside of the array range",
	.errstr = "R1 min value is outside of the allowed memory range",
	.result = REJECT,
	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},

@ -280,7 +280,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R1 min value is outside of the array range",
	.errstr = "R1 min value is outside of the allowed memory range",
	.result = REJECT,
	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@ -415,7 +415,7 @@
	BPF_EXIT_INSN(),
	},
	.fixup_map_hash_48b = { 3 },
	.errstr = "R1 min value is outside of the array range",
	.errstr = "R1 min value is outside of the allowed memory range",
	.result = REJECT,
	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@ -926,7 +926,7 @@
	},
	.fixup_map_hash_16b = { 3, 10 },
	.result = REJECT,
	.errstr = "R2 unbounded memory access, make sure to bounds check any array access into a map",
	.errstr = "R2 unbounded memory access, make sure to bounds check any such access",
	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
{

@ -50,7 +50,7 @@
	.fixup_map_array_48b = { 8 },
	.result = ACCEPT,
	.result_unpriv = REJECT,
	.errstr_unpriv = "R0 min value is outside of the array range",
	.errstr_unpriv = "R0 min value is outside of the allowed memory range",
	.retval = 1,
},
{
@ -325,7 +325,7 @@
	},
	.fixup_map_array_48b = { 3 },
	.result = REJECT,
	.errstr = "R0 min value is outside of the array range",
	.errstr = "R0 min value is outside of the allowed memory range",
	.result_unpriv = REJECT,
	.errstr_unpriv = "R0 pointer arithmetic of map value goes out of range",
},
@ -601,7 +601,7 @@
	},
	.fixup_map_array_48b = { 3 },
	.result = REJECT,
	.errstr = "R1 max value is outside of the array range",
	.errstr = "R1 max value is outside of the allowed memory range",
	.errstr_unpriv = "R1 pointer arithmetic of map value goes out of range",
	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@ -726,7 +726,7 @@
	},
	.fixup_map_array_48b = { 3 },
	.result = REJECT,
	.errstr = "R0 min value is outside of the array range",
	.errstr = "R0 min value is outside of the allowed memory range",
},
{
	"map access: value_ptr -= known scalar, 2",