Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
This commit is contained in:
Linus Torvalds
2020-10-15 18:42:13 -07:00
2301 changed files with 130033 additions and 50954 deletions

View File

@ -124,6 +124,7 @@ enum bpf_cmd {
BPF_ENABLE_STATS,
BPF_ITER_CREATE,
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
};
enum bpf_map_type {
@ -155,6 +156,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
BPF_MAP_TYPE_INODE_STORAGE,
};
/* Note that tracing related programs such as
@ -345,19 +347,45 @@ enum bpf_link_type {
/* The verifier internal test flag. Behavior is undefined */
#define BPF_F_TEST_STATE_FREQ (1U << 3)
/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
* restrict map and helper usage for such programs. Sleepable BPF programs can
* only be attached to hooks where kernel execution context allows sleeping.
* Such programs are allowed to use helpers that may sleep like
* bpf_copy_from_user().
*/
#define BPF_F_SLEEPABLE (1U << 4)
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* two extensions:
* the following extensions:
*
* insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
* insn[0].imm: map fd map fd
* insn[1].imm: 0 offset into value
* insn[0].off: 0 0
* insn[1].off: 0 0
* ldimm64 rewrite: address of map address of map[0]+offset
* verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
* insn[0].src_reg: BPF_PSEUDO_MAP_FD
* insn[0].imm: map fd
* insn[1].imm: 0
* insn[0].off: 0
* insn[1].off: 0
* ldimm64 rewrite: address of map
* verifier type: CONST_PTR_TO_MAP
*/
#define BPF_PSEUDO_MAP_FD 1
/* insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
* insn[0].imm: map fd
* insn[1].imm: offset into value
* insn[0].off: 0
* insn[1].off: 0
* ldimm64 rewrite: address of map[0]+offset
* verifier type: PTR_TO_MAP_VALUE
*/
#define BPF_PSEUDO_MAP_VALUE 2
/* insn[0].src_reg: BPF_PSEUDO_BTF_ID
* insn[0].imm: kernel btd id of VAR
* insn[1].imm: 0
* insn[0].off: 0
* insn[1].off: 0
* ldimm64 rewrite: address of the kernel variable
* verifier type: PTR_TO_BTF_ID or PTR_TO_MEM, depending on whether the var
* is struct/union.
*/
#define BPF_PSEUDO_BTF_ID 3
/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function
@ -404,6 +432,12 @@ enum {
/* Enable memory-mapping BPF map */
BPF_F_MMAPABLE = (1U << 10),
/* Share perf_event among processes */
BPF_F_PRESERVE_ELEMS = (1U << 11),
/* Create a map that is suitable to be an inner map with dynamic max entries */
BPF_F_INNER_MAP = (1U << 12),
};
/* Flags for BPF_PROG_QUERY. */
@ -414,6 +448,11 @@ enum {
*/
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
/* Flags for BPF_PROG_TEST_RUN */
/* If set, run the test on the cpu specified by bpf_attr.test.cpu */
#define BPF_F_TEST_RUN_ON_CPU (1U << 0)
/* type for BPF_ENABLE_STATS */
enum bpf_stats_type {
/* enabled run_time_ns and run_cnt */
@ -556,6 +595,8 @@ union bpf_attr {
*/
__aligned_u64 ctx_in;
__aligned_u64 ctx_out;
__u32 flags;
__u32 cpu;
} test;
struct { /* anonymous struct used by BPF_*_GET_*_ID */
@ -622,8 +663,13 @@ union bpf_attr {
};
__u32 attach_type; /* attach type */
__u32 flags; /* extra flags */
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
union {
__u32 target_btf_id; /* btf_id of target to attach to */
struct {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
};
};
} link_create;
struct { /* struct used by BPF_LINK_UPDATE command */
@ -649,6 +695,12 @@ union bpf_attr {
__u32 flags;
} iter_create;
struct { /* struct used by BPF_PROG_BIND_MAP command */
__u32 prog_fd;
__u32 map_fd;
__u32 flags; /* extra flags */
} prog_bind_map;
} __attribute__((aligned(8)));
/* The description below is an attempt at providing documentation to eBPF
@ -1438,8 +1490,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
* * 0, if the *skb* task belongs to the cgroup2.
* * 1, if the *skb* task does not belong to the cgroup2.
* * 0, if current task belongs to the cgroup2.
* * 1, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
@ -1649,7 +1701,7 @@ union bpf_attr {
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
* **TCP_KEEPIDLE**, **TCP_KEEPINTVL**, **TCP_KEEPCNT**,
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**.
* **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**.
* * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
* * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
* Return
@ -2204,7 +2256,7 @@ union bpf_attr {
* Description
* This helper is used in programs implementing policies at the
* skb socket level. If the sk_buff *skb* is allowed to pass (i.e.
* if the verdeict eBPF program returns **SK_PASS**), redirect it
* if the verdict eBPF program returns **SK_PASS**), redirect it
* to the socket referenced by *map* (of type
* **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
* egress interfaces can be used for redirection. The
@ -2496,7 +2548,7 @@ union bpf_attr {
* result is from *reuse*\ **->socks**\ [] using the hash of the
* tuple.
*
* long bpf_sk_release(struct bpf_sock *sock)
* long bpf_sk_release(void *sock)
* Description
* Release the reference held by *sock*. *sock* must be a
* non-**NULL** pointer that was returned from
@ -2676,7 +2728,7 @@ union bpf_attr {
* result is from *reuse*\ **->socks**\ [] using the hash of the
* tuple.
*
* long bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
* long bpf_tcp_check_syncookie(void *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
* Description
* Check whether *iph* and *th* contain a valid SYN cookie ACK for
* the listening socket in *sk*.
@ -2807,7 +2859,7 @@ union bpf_attr {
*
* **-ERANGE** if resulting value was out of range.
*
* void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
* void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
* Description
* Get a bpf-local-storage from a *sk*.
*
@ -2823,6 +2875,9 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*.
*
* *sk* is a kernel **struct sock** pointer for LSM program.
* *sk* is a **struct bpf_sock** pointer for other program types.
*
* An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used
@ -2835,13 +2890,14 @@ union bpf_attr {
* **NULL** if not found or there was an error in adding
* a new bpf-local-storage.
*
* long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
* long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
* Description
* Delete a bpf-local-storage from a *sk*.
* Return
* 0 on success.
*
* **-ENOENT** if the bpf-local-storage cannot be found.
* **-EINVAL** if sk is not a fullsock (e.g. a request_sock).
*
* long bpf_send_signal(u32 sig)
* Description
@ -2858,7 +2914,7 @@ union bpf_attr {
*
* **-EAGAIN** if bpf program can try again.
*
* s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
* s64 bpf_tcp_gen_syncookie(void *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
* Description
* Try to issue a SYN cookie for the packet with corresponding
* IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
@ -3087,7 +3143,7 @@ union bpf_attr {
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
* long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
* long bpf_sk_assign(struct sk_buff *skb, void *sk, u64 flags)
* Description
* Helper is overloaded depending on BPF program type. This
* description applies to **BPF_PROG_TYPE_SCHED_CLS** and
@ -3215,11 +3271,11 @@ union bpf_attr {
*
* **-EOVERFLOW** if an overflow happened: The same object will be tried again.
*
* u64 bpf_sk_cgroup_id(struct bpf_sock *sk)
* u64 bpf_sk_cgroup_id(void *sk)
* Description
* Return the cgroup v2 id of the socket *sk*.
*
* *sk* must be a non-**NULL** pointer to a full socket, e.g. one
* *sk* must be a non-**NULL** pointer to a socket, e.g. one
* returned from **bpf_sk_lookup_xxx**\ (),
* **bpf_sk_fullsock**\ (), etc. The format of returned id is
* same as in **bpf_skb_cgroup_id**\ ().
@ -3229,7 +3285,7 @@ union bpf_attr {
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
* u64 bpf_sk_ancestor_cgroup_id(struct bpf_sock *sk, int ancestor_level)
* u64 bpf_sk_ancestor_cgroup_id(void *sk, int ancestor_level)
* Description
* Return id of cgroup v2 that is ancestor of cgroup associated
* with the *sk* at the *ancestor_level*. The root cgroup is at
@ -3337,38 +3393,38 @@ union bpf_attr {
* Description
* Dynamically cast a *sk* pointer to a *tcp6_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
* *sk* if casting is valid, or **NULL** otherwise.
*
* struct tcp_sock *bpf_skc_to_tcp_sock(void *sk)
* Description
* Dynamically cast a *sk* pointer to a *tcp_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
* *sk* if casting is valid, or **NULL** otherwise.
*
* struct tcp_timewait_sock *bpf_skc_to_tcp_timewait_sock(void *sk)
* Description
* Dynamically cast a *sk* pointer to a *tcp_timewait_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
* *sk* if casting is valid, or **NULL** otherwise.
*
* struct tcp_request_sock *bpf_skc_to_tcp_request_sock(void *sk)
* Description
* Dynamically cast a *sk* pointer to a *tcp_request_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
* *sk* if casting is valid, or **NULL** otherwise.
*
* struct udp6_sock *bpf_skc_to_udp6_sock(void *sk)
* Description
* Dynamically cast a *sk* pointer to a *udp6_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
* *sk* if casting is valid, or **NULL** otherwise.
*
* long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
* Description
* Return a user or a kernel stack in bpf program provided buffer.
* To achieve this, the helper needs *task*, which is a valid
* pointer to struct task_struct. To store the stacktrace, the
* bpf program provides *buf* with a nonnegative *size*.
* pointer to **struct task_struct**. To store the stacktrace, the
* bpf program provides *buf* with a nonnegative *size*.
*
* The last argument, *flags*, holds the number of stack frames to
* skip (from 0 to 255), masked with
@ -3395,6 +3451,293 @@ union bpf_attr {
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
*
* long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
* Description
* Load header option. Support reading a particular TCP header
* option for bpf program (**BPF_PROG_TYPE_SOCK_OPS**).
*
* If *flags* is 0, it will search the option from the
* *skops*\ **->skb_data**. The comment in **struct bpf_sock_ops**
* has details on what skb_data contains under different
* *skops*\ **->op**.
*
* The first byte of the *searchby_res* specifies the
* kind that it wants to search.
*
* If the searching kind is an experimental kind
* (i.e. 253 or 254 according to RFC6994). It also
* needs to specify the "magic" which is either
* 2 bytes or 4 bytes. It then also needs to
* specify the size of the magic by using
* the 2nd byte which is "kind-length" of a TCP
* header option and the "kind-length" also
* includes the first 2 bytes "kind" and "kind-length"
* itself as a normal TCP header option also does.
*
* For example, to search experimental kind 254 with
* 2 byte magic 0xeB9F, the searchby_res should be
* [ 254, 4, 0xeB, 0x9F, 0, 0, .... 0 ].
*
* To search for the standard window scale option (3),
* the *searchby_res* should be [ 3, 0, 0, .... 0 ].
* Note, kind-length must be 0 for regular option.
*
* Searching for No-Op (0) and End-of-Option-List (1) are
* not supported.
*
* *len* must be at least 2 bytes which is the minimal size
* of a header option.
*
* Supported flags:
*
* * **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
* saved_syn packet or the just-received syn packet.
*
* Return
* > 0 when found, the header option is copied to *searchby_res*.
* The return value is the total length copied. On failure, a
* negative error code is returned:
*
* **-EINVAL** if a parameter is invalid.
*
* **-ENOMSG** if the option is not found.
*
* **-ENOENT** if no syn packet is available when
* **BPF_LOAD_HDR_OPT_TCP_SYN** is used.
*
* **-ENOSPC** if there is not enough space. Only *len* number of
* bytes are copied.
*
* **-EFAULT** on failure to parse the header options in the
* packet.
*
* **-EPERM** if the helper cannot be used under the current
* *skops*\ **->op**.
*
* long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from, u32 len, u64 flags)
* Description
* Store header option. The data will be copied
* from buffer *from* with length *len* to the TCP header.
*
* The buffer *from* should have the whole option that
* includes the kind, kind-length, and the actual
* option data. The *len* must be at least kind-length
* long. The kind-length does not have to be 4 byte
* aligned. The kernel will take care of the padding
* and setting the 4 bytes aligned value to th->doff.
*
* This helper will check for duplicated option
* by searching the same option in the outgoing skb.
*
* This helper can only be called during
* **BPF_SOCK_OPS_WRITE_HDR_OPT_CB**.
*
* Return
* 0 on success, or negative error in case of failure:
*
* **-EINVAL** If param is invalid.
*
* **-ENOSPC** if there is not enough space in the header.
* Nothing has been written
*
* **-EEXIST** if the option already exists.
*
* **-EFAULT** on failrue to parse the existing header options.
*
* **-EPERM** if the helper cannot be used under the current
* *skops*\ **->op**.
*
* long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64 flags)
* Description
* Reserve *len* bytes for the bpf header option. The
* space will be used by **bpf_store_hdr_opt**\ () later in
* **BPF_SOCK_OPS_WRITE_HDR_OPT_CB**.
*
* If **bpf_reserve_hdr_opt**\ () is called multiple times,
* the total number of bytes will be reserved.
*
* This helper can only be called during
* **BPF_SOCK_OPS_HDR_OPT_LEN_CB**.
*
* Return
* 0 on success, or negative error in case of failure:
*
* **-EINVAL** if a parameter is invalid.
*
* **-ENOSPC** if there is not enough space in the header.
*
* **-EPERM** if the helper cannot be used under the current
* *skops*\ **->op**.
*
* void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
* Description
* Get a bpf_local_storage from an *inode*.
*
* Logically, it could be thought of as getting the value from
* a *map* with *inode* as the **key**. From this
* perspective, the usage is not much different from
* **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
* helper enforces the key must be an inode and the map must also
* be a **BPF_MAP_TYPE_INODE_STORAGE**.
*
* Underneath, the value is stored locally at *inode* instead of
* the *map*. The *map* is used as the bpf-local-storage
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf_local_storage residing at *inode*.
*
* An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf_local_storage will be
* created if one does not exist. *value* can be used
* together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
* the initial value of a bpf_local_storage. If *value* is
* **NULL**, the new bpf_local_storage will be zero initialized.
* Return
* A bpf_local_storage pointer is returned on success.
*
* **NULL** if not found or there was an error in adding
* a new bpf_local_storage.
*
* int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
* Description
* Delete a bpf_local_storage from an *inode*.
* Return
* 0 on success.
*
* **-ENOENT** if the bpf_local_storage cannot be found.
*
* long bpf_d_path(struct path *path, char *buf, u32 sz)
* Description
* Return full path for given **struct path** object, which
* needs to be the kernel BTF *path* object. The path is
* returned in the provided buffer *buf* of size *sz* and
* is zero terminated.
*
* Return
* On success, the strictly positive length of the string,
* including the trailing NUL character. On error, a negative
* value.
*
* long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
* Description
* Read *size* bytes from user space address *user_ptr* and store
* the data in *dst*. This is a wrapper of **copy_from_user**\ ().
* Return
* 0 on success, or a negative error in case of failure.
*
* long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags)
* Description
* Use BTF to store a string representation of *ptr*->ptr in *str*,
* using *ptr*->type_id. This value should specify the type
* that *ptr*->ptr points to. LLVM __builtin_btf_type_id(type, 1)
* can be used to look up vmlinux BTF type ids. Traversing the
* data structure using BTF, the type information and values are
* stored in the first *str_size* - 1 bytes of *str*. Safe copy of
* the pointer data is carried out to avoid kernel crashes during
* operation. Smaller types can use string space on the stack;
* larger programs can use map data to store the string
* representation.
*
* The string can be subsequently shared with userspace via
* bpf_perf_event_output() or ring buffer interfaces.
* bpf_trace_printk() is to be avoided as it places too small
* a limit on string size to be useful.
*
* *flags* is a combination of
*
* **BTF_F_COMPACT**
* no formatting around type information
* **BTF_F_NONAME**
* no struct/union member names/types
* **BTF_F_PTR_RAW**
* show raw (unobfuscated) pointer values;
* equivalent to printk specifier %px.
* **BTF_F_ZERO**
* show zero-valued struct/union members; they
* are not displayed by default
*
* Return
* The number of bytes that were written (or would have been
* written if output had to be truncated due to string size),
* or a negative error in cases of failure.
*
* long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 ptr_size, u64 flags)
* Description
* Use BTF to write to seq_write a string representation of
* *ptr*->ptr, using *ptr*->type_id as per bpf_snprintf_btf().
* *flags* are identical to those used for bpf_snprintf_btf.
* Return
* 0 on success or a negative error in case of failure.
*
* u64 bpf_skb_cgroup_classid(struct sk_buff *skb)
* Description
* See **bpf_get_cgroup_classid**\ () for the main description.
* This helper differs from **bpf_get_cgroup_classid**\ () in that
* the cgroup v1 net_cls class is retrieved only from the *skb*'s
* associated socket instead of the current process.
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
* long bpf_redirect_neigh(u32 ifindex, u64 flags)
* Description
* Redirect the packet to another net device of index *ifindex*
* and fill in L2 addresses from neighboring subsystem. This helper
* is somewhat similar to **bpf_redirect**\ (), except that it
* populates L2 addresses as well, meaning, internally, the helper
* performs a FIB lookup based on the skb's networking header to
* get the address of the next hop and then relies on the neighbor
* lookup for the L2 address of the nexthop.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types, and enabled
* for IPv4 and IPv6 protocols.
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
*
* void *bpf_per_cpu_ptr(const void *percpu_ptr, u32 cpu)
* Description
* Take a pointer to a percpu ksym, *percpu_ptr*, and return a
* pointer to the percpu kernel variable on *cpu*. A ksym is an
* extern variable decorated with '__ksym'. For ksym, there is a
* global var (either static or global) defined of the same name
* in the kernel. The ksym is percpu if the global var is percpu.
* The returned pointer points to the global percpu var on *cpu*.
*
* bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
* kernel, except that bpf_per_cpu_ptr() may return NULL. This
* happens if *cpu* is larger than nr_cpu_ids. The caller of
* bpf_per_cpu_ptr() must check the returned value.
* Return
* A pointer pointing to the kernel percpu variable on *cpu*, or
* NULL, if *cpu* is invalid.
*
* void *bpf_this_cpu_ptr(const void *percpu_ptr)
* Description
* Take a pointer to a percpu ksym, *percpu_ptr*, and return a
* pointer to the percpu kernel variable on this cpu. See the
* description of 'ksym' in **bpf_per_cpu_ptr**\ ().
*
* bpf_this_cpu_ptr() has the same semantic as this_cpu_ptr() in
* the kernel. Different from **bpf_per_cpu_ptr**\ (), it would
* never return NULL.
* Return
* A pointer pointing to the kernel percpu variable on this cpu.
*
* long bpf_redirect_peer(u32 ifindex, u64 flags)
* Description
* Redirect the packet to another net device of index *ifindex*.
* This helper is somewhat similar to **bpf_redirect**\ (), except
* that the redirection happens to the *ifindex*' peer device and
* the netns switch takes place from ingress to ingress without
* going through the CPU's backlog queue.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types at the ingress
* hook and for veth device types. The peer device must reside in a
* different network namespace.
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -3539,6 +3882,20 @@ union bpf_attr {
FN(skc_to_tcp_request_sock), \
FN(skc_to_udp6_sock), \
FN(get_task_stack), \
FN(load_hdr_opt), \
FN(store_hdr_opt), \
FN(reserve_hdr_opt), \
FN(inode_storage_get), \
FN(inode_storage_delete), \
FN(d_path), \
FN(copy_from_user), \
FN(snprintf_btf), \
FN(seq_printf_btf), \
FN(skb_cgroup_classid), \
FN(redirect_neigh), \
FN(bpf_per_cpu_ptr), \
FN(bpf_this_cpu_ptr), \
FN(redirect_peer), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -3648,9 +4005,13 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
};
/* BPF_FUNC_sk_storage_get flags */
/* BPF_FUNC_<kernel_obj>_storage_get flags */
enum {
BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0),
BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
/* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
* and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
*/
BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
};
/* BPF_FUNC_read_branch_records flags. */
@ -4071,6 +4432,15 @@ struct bpf_link_info {
__u64 cgroup_id;
__u32 attach_type;
} cgroup;
struct {
__aligned_u64 target_name; /* in/out: target_name buffer ptr */
__u32 target_name_len; /* in/out: target_name buffer len */
union {
struct {
__u32 map_id;
} map;
};
} iter;
struct {
__u32 netns_ino;
__u32 attach_type;
@ -4158,6 +4528,36 @@ struct bpf_sock_ops {
__u64 bytes_received;
__u64 bytes_acked;
__bpf_md_ptr(struct bpf_sock *, sk);
/* [skb_data, skb_data_end) covers the whole TCP header.
*
* BPF_SOCK_OPS_PARSE_HDR_OPT_CB: The packet received
* BPF_SOCK_OPS_HDR_OPT_LEN_CB: Not useful because the
* header has not been written.
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB: The header and options have
* been written so far.
* BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: The SYNACK that concludes
* the 3WHS.
* BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: The ACK that concludes
* the 3WHS.
*
* bpf_load_hdr_opt() can also be used to read a particular option.
*/
__bpf_md_ptr(void *, skb_data);
__bpf_md_ptr(void *, skb_data_end);
__u32 skb_len; /* The total length of a packet.
* It includes the header, options,
* and payload.
*/
__u32 skb_tcp_flags; /* tcp_flags of the header. It provides
* an easy way to check for tcp_flags
* without parsing skb_data.
*
* In particular, the skb_tcp_flags
* will still be available in
* BPF_SOCK_OPS_HDR_OPT_LEN even though
* the outgoing header has not
* been written yet.
*/
};
/* Definitions for bpf_sock_ops_cb_flags */
@ -4166,8 +4566,51 @@ enum {
BPF_SOCK_OPS_RETRANS_CB_FLAG = (1<<1),
BPF_SOCK_OPS_STATE_CB_FLAG = (1<<2),
BPF_SOCK_OPS_RTT_CB_FLAG = (1<<3),
/* Call bpf for all received TCP headers. The bpf prog will be
* called under sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB
*
* Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
* for the header option related helpers that will be useful
* to the bpf programs.
*
* It could be used at the client/active side (i.e. connect() side)
* when the server told it that the server was in syncookie
* mode and required the active side to resend the bpf-written
* options. The active side can keep writing the bpf-options until
* it received a valid packet from the server side to confirm
* the earlier packet (and options) has been received. The later
* example patch is using it like this at the active side when the
* server is in syncookie mode.
*
* The bpf prog will usually turn this off in the common cases.
*/
BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG = (1<<4),
/* Call bpf when kernel has received a header option that
* the kernel cannot handle. The bpf prog will be called under
* sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB.
*
* Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
* for the header option related helpers that will be useful
* to the bpf programs.
*/
BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG = (1<<5),
/* Call bpf when the kernel is writing header options for the
* outgoing packet. The bpf prog will first be called
* to reserve space in a skb under
* sock_ops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB. Then
* the bpf prog will be called to write the header option(s)
* under sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*
* Please refer to the comment in BPF_SOCK_OPS_HDR_OPT_LEN_CB
* and BPF_SOCK_OPS_WRITE_HDR_OPT_CB for the header option
* related helpers that will be useful to the bpf programs.
*
* The kernel gets its chance to reserve space and write
* options first before the BPF program does.
*/
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
/* Mask of all currently supported cb flags */
BPF_SOCK_OPS_ALL_CB_FLAGS = 0xF,
BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
};
/* List of known BPF sock_ops operators.
@ -4223,6 +4666,63 @@ enum {
*/
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
*/
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle
* the packets received at
* an already established
* connection.
*
* sock_ops->skb_data:
* Referring to the received skb.
* It covers the TCP header only.
*
* bpf_load_hdr_opt() can also
* be used to search for a
* particular option.
*/
BPF_SOCK_OPS_HDR_OPT_LEN_CB, /* Reserve space for writing the
* header option later in
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
* Arg1: bool want_cookie. (in
* writing SYNACK only)
*
* sock_ops->skb_data:
* Not available because no header has
* been written yet.
*
* sock_ops->skb_tcp_flags:
* The tcp_flags of the
* outgoing skb. (e.g. SYN, ACK, FIN).
*
* bpf_reserve_hdr_opt() should
* be used to reserve space.
*/
BPF_SOCK_OPS_WRITE_HDR_OPT_CB, /* Write the header options
* Arg1: bool want_cookie. (in
* writing SYNACK only)
*
* sock_ops->skb_data:
* Referring to the outgoing skb.
* It covers the TCP header
* that has already been written
* by the kernel and the
* earlier bpf-progs.
*
* sock_ops->skb_tcp_flags:
* The tcp_flags of the outgoing
* skb. (e.g. SYN, ACK, FIN).
*
* bpf_store_hdr_opt() should
* be used to write the
* option.
*
* bpf_load_hdr_opt() can also
* be used to search for a
* particular option that
* has already been written
* by the kernel or the
* earlier bpf-progs.
*/
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
@ -4250,6 +4750,63 @@ enum {
enum {
TCP_BPF_IW = 1001, /* Set TCP initial congestion window */
TCP_BPF_SNDCWND_CLAMP = 1002, /* Set sndcwnd_clamp */
TCP_BPF_DELACK_MAX = 1003, /* Max delay ack in usecs */
TCP_BPF_RTO_MIN = 1004, /* Min delay ack in usecs */
/* Copy the SYN pkt to optval
*
* BPF_PROG_TYPE_SOCK_OPS only. It is similar to the
* bpf_getsockopt(TCP_SAVED_SYN) but it does not limit
* to only getting from the saved_syn. It can either get the
* syn packet from:
*
* 1. the just-received SYN packet (only available when writing the
* SYNACK). It will be useful when it is not necessary to
* save the SYN packet for latter use. It is also the only way
* to get the SYN during syncookie mode because the syn
* packet cannot be saved during syncookie.
*
* OR
*
* 2. the earlier saved syn which was done by
* bpf_setsockopt(TCP_SAVE_SYN).
*
* The bpf_getsockopt(TCP_BPF_SYN*) option will hide where the
* SYN packet is obtained.
*
* If the bpf-prog does not need the IP[46] header, the
* bpf-prog can avoid parsing the IP header by using
* TCP_BPF_SYN. Otherwise, the bpf-prog can get both
* IP[46] and TCP header by using TCP_BPF_SYN_IP.
*
* >0: Total number of bytes copied
* -ENOSPC: Not enough space in optval. Only optlen number of
* bytes is copied.
* -ENOENT: The SYN skb is not available now and the earlier SYN pkt
* is not saved by setsockopt(TCP_SAVE_SYN).
*/
TCP_BPF_SYN = 1005, /* Copy the TCP header */
TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
};
enum {
BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0),
};
/* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*/
enum {
BPF_WRITE_HDR_TCP_CURRENT_MSS = 1, /* Kernel is finding the
* total option spaces
* required for an established
* sk in order to calculate the
* MSS. No skb is actually
* sent.
*/
BPF_WRITE_HDR_TCP_SYNACK_COOKIE = 2, /* Kernel is in syncookie mode
* when sending a SYN.
*/
};
struct bpf_perf_event_value {
@ -4447,4 +5004,34 @@ struct bpf_sk_lookup {
__u32 local_port; /* Host byte order */
};
/*
* struct btf_ptr is used for typed pointer representation; the
* type id is used to render the pointer data as the appropriate type
* via the bpf_snprintf_btf() helper described above. A flags field -
* potentially to specify additional details about the BTF pointer
* (rather than its mode of display) - is included for future use.
* Display flags - BTF_F_* - are passed to bpf_snprintf_btf separately.
*/
struct btf_ptr {
void *ptr;
__u32 type_id;
__u32 flags; /* BTF ptr flags; unused at present. */
};
/*
* Flags to control bpf_snprintf_btf() behaviour.
* - BTF_F_COMPACT: no formatting around type information
* - BTF_F_NONAME: no struct/union member names/types
* - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
* equivalent to %px.
* - BTF_F_ZERO: show zero-valued struct/union members; they
* are not displayed by default
*/
enum {
BTF_F_COMPACT = (1ULL << 0),
BTF_F_NONAME = (1ULL << 1),
BTF_F_PTR_RAW = (1ULL << 2),
BTF_F_ZERO = (1ULL << 3),
};
#endif /* _UAPI__LINUX_BPF_H__ */