d2dcc67df9
Consider this code snippet: struct node { long key; bpf_list_node l; bpf_rb_node r; bpf_refcount ref; } int some_bpf_prog(void *ctx) { struct node *n = bpf_obj_new(/*...*/), *m; bpf_spin_lock(&glock); bpf_rbtree_add(&some_tree, &n->r, /* ... */); m = bpf_refcount_acquire(n); bpf_rbtree_add(&other_tree, &m->r, /* ... */); bpf_spin_unlock(&glock); /* ... */ } After bpf_refcount_acquire, n and m point to the same underlying memory, and that node's bpf_rb_node field is being used by the some_tree insert, so overwriting it as a result of the second insert is an error. In order to properly support refcounted nodes, the rbtree and list insert functions must be allowed to fail. This patch adds such support. The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to return an int indicating success/failure, with 0 -> success, nonzero -> failure. bpf_obj_drop on failure ======================= Currently the only reason an insert can fail is the example above: the bpf_{list,rb}_node is already in use. When such a failure occurs, the insert kfuncs will bpf_obj_drop the input node. This allows the insert operations to logically fail without changing their verifier owning ref behavior, namely the unconditional release_reference of the input owning ref. With insert that always succeeds, ownership of the node is always passed to the collection, since the node always ends up in the collection. With a possibly-failed insert w/ bpf_obj_drop, ownership of the node is always passed either to the collection (success), or to bpf_obj_drop (failure). Regardless, it's correct to continue unconditionally releasing the input owning ref, as something is always taking ownership from the calling program on insert. Keeping owning ref behavior unchanged results in a nice default UX for insert functions that can fail. If the program's reaction to a failed insert is "fine, just get rid of this owning ref for me and let me go on with my business", then there's no reason to check for failure since that's default behavior. e.g.: long important_failures = 0; int some_bpf_prog(void *ctx) { struct node *n, *m, *o; /* all bpf_obj_new'd */ bpf_spin_lock(&glock); bpf_rbtree_add(&some_tree, &n->node, /* ... */); bpf_rbtree_add(&some_tree, &m->node, /* ... */); if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) { important_failures++; } bpf_spin_unlock(&glock); } If we instead chose to pass ownership back to the program on failed insert - by returning NULL on success or an owning ref on failure - programs would always have to do something with the returned ref on failure. The most likely action is probably "I'll just get rid of this owning ref and go about my business", which ideally would look like: if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */)) bpf_obj_drop(n); But bpf_obj_drop isn't allowed in a critical section and inserts must occur within one, so in reality error handling would become a hard-to-parse mess. For refcounted nodes, we can replicate the "pass ownership back to program on failure" logic with this patch's semantics, albeit in an ugly way: struct node *n = bpf_obj_new(/* ... */), *m; bpf_spin_lock(&glock); m = bpf_refcount_acquire(n); if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) { /* Do something with m */ } bpf_spin_unlock(&glock); bpf_obj_drop(m); bpf_refcount_acquire is used to simulate "return owning ref on failure". This should be an uncommon occurrence, though. Addition of two verifier-fixup'd args to collection inserts =========================================================== The actual bpf_obj_drop kfunc is bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop macro populating the second arg with 0 and the verifier later filling in the arg during insn fixup. Because bpf_rbtree_add and bpf_list_push_{front,back} now might do bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be passed to bpf_obj_drop_impl. Similarly, because the 'node' param to those insert functions is the bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a pointer to the beginning of the node, the insert functions need to be able to find the beginning of the node struct. A second verifier-populated param is necessary: the offset of {list,rb}_node within the node type. These two new params allow the insert kfuncs to correctly call __bpf_obj_drop_impl: beginning_of_node = bpf_rb_node_ptr - offset if (already_inserted) __bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record); Similarly to other kfuncs with "hidden" verifier-populated params, the insert functions are renamed with _impl prefix and a macro is provided for common usage. For example, bpf_rbtree_add kfunc is now bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets "hidden" args to 0. Due to the two new args BPF progs will need to be recompiled to work with the new _impl kfuncs. This patch also rewrites the "hidden argument" explanation to more directly say why the BPF program writer doesn't need to populate the arguments with anything meaningful. How does this new logic affect non-owning references? ===================================================== Currently, non-owning refs are valid until the end of the critical section in which they're created. We can make this guarantee because, if a non-owning ref exists, the referent was added to some collection. The collection will drop() its nodes when it goes away, but it can't go away while our program is accessing it, so that's not a problem. If the referent is removed from the collection in the same CS that it was added in, it can't be bpf_obj_drop'd until after CS end. Those are the only two ways to free the referent's memory and neither can happen until after the non-owning ref's lifetime ends. On first glance, having these collection insert functions potentially bpf_obj_drop their input seems like it breaks the "can't be bpf_obj_drop'd until after CS end" line of reasoning. But we care about the memory not being _freed_ until end of CS end, and a previous patch in the series modified bpf_obj_drop such that it doesn't free refcounted nodes until refcount == 0. So the statement can be more accurately rewritten as "can't be free'd until after CS end". We can prove that this rewritten statement holds for any non-owning reference produced by collection insert functions: * If the input to the insert function is _not_ refcounted * We have an owning reference to the input, and can conclude it isn't in any collection * Inserting a node in a collection turns owning refs into non-owning, and since our input type isn't refcounted, there's no way to obtain additional owning refs to the same underlying memory * Because our node isn't in any collection, the insert operation cannot fail, so bpf_obj_drop will not execute * If bpf_obj_drop is guaranteed not to execute, there's no risk of memory being free'd * Otherwise, the input to the insert function is refcounted * If the insert operation fails due to the node's list_head or rb_root already being in some collection, there was some previous successful insert which passed refcount to the collection * We have an owning reference to the input, it must have been acquired via bpf_refcount_acquire, which bumped the refcount * refcount must be >= 2 since there's a valid owning reference and the node is already in a collection * Insert triggering bpf_obj_drop will decr refcount to >= 1, never resulting in a free So although we may do bpf_obj_drop during the critical section, this will never result in memory being free'd, and no changes to non-owning ref logic are needed in this patch. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230415201811.343116-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
135 lines
4.7 KiB
C
135 lines
4.7 KiB
C
#ifndef __BPF_EXPERIMENTAL__
|
|
#define __BPF_EXPERIMENTAL__
|
|
|
|
#include <vmlinux.h>
|
|
#include <bpf/bpf_tracing.h>
|
|
#include <bpf/bpf_helpers.h>
|
|
#include <bpf/bpf_core_read.h>
|
|
|
|
#define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node)))
|
|
|
|
/* Description
|
|
* Allocates an object of the type represented by 'local_type_id' in
|
|
* program BTF. User may use the bpf_core_type_id_local macro to pass the
|
|
* type ID of a struct in program BTF.
|
|
*
|
|
* The 'local_type_id' parameter must be a known constant.
|
|
* The 'meta' parameter is rewritten by the verifier, no need for BPF
|
|
* program to set it.
|
|
* Returns
|
|
* A pointer to an object of the type corresponding to the passed in
|
|
* 'local_type_id', or NULL on failure.
|
|
*/
|
|
extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_obj_new_impl */
|
|
#define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
|
|
|
|
/* Description
|
|
* Free an allocated object. All fields of the object that require
|
|
* destruction will be destructed before the storage is freed.
|
|
*
|
|
* The 'meta' parameter is rewritten by the verifier, no need for BPF
|
|
* program to set it.
|
|
* Returns
|
|
* Void.
|
|
*/
|
|
extern void bpf_obj_drop_impl(void *kptr, void *meta) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_obj_drop_impl */
|
|
#define bpf_obj_drop(kptr) bpf_obj_drop_impl(kptr, NULL)
|
|
|
|
/* Description
|
|
* Increment the refcount on a refcounted local kptr, turning the
|
|
* non-owning reference input into an owning reference in the process.
|
|
*
|
|
* The 'meta' parameter is rewritten by the verifier, no need for BPF
|
|
* program to set it.
|
|
* Returns
|
|
* An owning reference to the object pointed to by 'kptr'
|
|
*/
|
|
extern void *bpf_refcount_acquire_impl(void *kptr, void *meta) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_refcount_acquire_impl */
|
|
#define bpf_refcount_acquire(kptr) bpf_refcount_acquire_impl(kptr, NULL)
|
|
|
|
/* Description
|
|
* Add a new entry to the beginning of the BPF linked list.
|
|
*
|
|
* The 'meta' and 'off' parameters are rewritten by the verifier, no need
|
|
* for BPF programs to set them
|
|
* Returns
|
|
* 0 if the node was successfully added
|
|
* -EINVAL if the node wasn't added because it's already in a list
|
|
*/
|
|
extern int bpf_list_push_front_impl(struct bpf_list_head *head,
|
|
struct bpf_list_node *node,
|
|
void *meta, __u64 off) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_list_push_front_impl */
|
|
#define bpf_list_push_front(head, node) bpf_list_push_front_impl(head, node, NULL, 0)
|
|
|
|
/* Description
|
|
* Add a new entry to the end of the BPF linked list.
|
|
*
|
|
* The 'meta' and 'off' parameters are rewritten by the verifier, no need
|
|
* for BPF programs to set them
|
|
* Returns
|
|
* 0 if the node was successfully added
|
|
* -EINVAL if the node wasn't added because it's already in a list
|
|
*/
|
|
extern int bpf_list_push_back_impl(struct bpf_list_head *head,
|
|
struct bpf_list_node *node,
|
|
void *meta, __u64 off) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_list_push_back_impl */
|
|
#define bpf_list_push_back(head, node) bpf_list_push_back_impl(head, node, NULL, 0)
|
|
|
|
/* Description
|
|
* Remove the entry at the beginning of the BPF linked list.
|
|
* Returns
|
|
* Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
|
|
*/
|
|
extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ksym;
|
|
|
|
/* Description
|
|
* Remove the entry at the end of the BPF linked list.
|
|
* Returns
|
|
* Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
|
|
*/
|
|
extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
|
|
|
|
/* Description
|
|
* Remove 'node' from rbtree with root 'root'
|
|
* Returns
|
|
* Pointer to the removed node, or NULL if 'root' didn't contain 'node'
|
|
*/
|
|
extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
|
|
struct bpf_rb_node *node) __ksym;
|
|
|
|
/* Description
|
|
* Add 'node' to rbtree with root 'root' using comparator 'less'
|
|
*
|
|
* The 'meta' and 'off' parameters are rewritten by the verifier, no need
|
|
* for BPF programs to set them
|
|
* Returns
|
|
* 0 if the node was successfully added
|
|
* -EINVAL if the node wasn't added because it's already in a tree
|
|
*/
|
|
extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *node,
|
|
bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b),
|
|
void *meta, __u64 off) __ksym;
|
|
|
|
/* Convenience macro to wrap over bpf_rbtree_add_impl */
|
|
#define bpf_rbtree_add(head, node, less) bpf_rbtree_add_impl(head, node, less, NULL, 0)
|
|
|
|
/* Description
|
|
* Return the first (leftmost) node in input tree
|
|
* Returns
|
|
* Pointer to the node, which is _not_ removed from the tree. If the tree
|
|
* contains no nodes, returns NULL.
|
|
*/
|
|
extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
|
|
|
|
#endif
|