IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add a new vlan options dump flag which causes only global vlan options
to be dumped. The dumps are done only with bridge devices, ports are
ignored. They support vlan compression if the options in sequential
vlans are equal (currently always true).
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can have two types of vlan options depending on context:
- per-device vlan options (split in per-bridge and per-port)
- global vlan options
The second type wasn't supported in the bridge until now, but we need
them for per-vlan multicast support, per-vlan STP support and other
options which require global vlan context. They are contained in the global
bridge vlan context even if the vlan is not configured on the bridge device
itself. This patch adds initial netlink attributes and support for setting
these global vlan options, they can only be set (RTM_NEWVLAN) and the
operation must use the bridge device. Since there are no such options yet
it shouldn't have any functional effect.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the port multicast context to check if the router port is a vlan and
in case it is include its vlan id in the notification.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a global knob that controls if vlan multicast snooping is enabled.
The proper contexts (vlan or bridge-wide) will be chosen based on the knob
when processing packets and changing bridge device state. Note that
vlans have their individual mcast snooping enabled by default, but this
knob is needed to turn on bridge vlan snooping. It is disabled by
default. To enable the knob vlan filtering must also be enabled, it
doesn't make sense to have vlan mcast snooping without vlan filtering
since that would lead to inconsistencies. Disabling vlan filtering will
also automatically disable vlan mcast snooping.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.
This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:
* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)
This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.
In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:
a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.
The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html
Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs,
so it's now possible to call bpf_get_func_ip from both kprobe and
kretprobe programs.
Taking the caller's address from 'struct kprobe::addr', which is
defined for both kprobe and kretprobe.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs,
specifically for all trampoline attach types.
The trampoline's caller IP address is stored in (ctx - 8) address.
so there's no reason to actually call the helper, but rather fixup
the call instruction and return [ctx - 8] value directly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org
Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
in hash/array/lru maps as a regular field and helpers to operate on it:
// Initialize the timer.
// First 4 bits of 'flags' specify clockid.
// Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags);
// Configure the timer to call 'callback_fn' static function.
long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);
// Arm the timer to expire 'nsec' nanoseconds from the current time.
long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags);
// Cancel the timer and wait for callback_fn to finish if it was running.
long bpf_timer_cancel(struct bpf_timer *timer);
Here is how BPF program might look like:
struct map_elem {
int counter;
struct bpf_timer timer;
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1000);
__type(key, int);
__type(value, struct map_elem);
} hmap SEC(".maps");
static int timer_cb(void *map, int *key, struct map_elem *val);
/* val points to particular map element that contains bpf_timer. */
SEC("fentry/bpf_fentry_test1")
int BPF_PROG(test1, int a)
{
struct map_elem *val;
int key = 0;
val = bpf_map_lookup_elem(&hmap, &key);
if (val) {
bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME);
bpf_timer_set_callback(&val->timer, timer_cb);
bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0);
}
}
This patch adds helper implementations that rely on hrtimers
to call bpf functions as timers expire.
The following patches add necessary safety checks.
Only programs with CAP_BPF are allowed to use bpf_timer.
The amount of timers used by the program is constrained by
the memcg recorded at map creation time.
The bpf_timer_init() helper needs explicit 'map' argument because inner maps
are dynamic and not known at load time. While the bpf_timer_set_callback() is
receiving hidden 'aux->prog' argument supplied by the verifier.
The prog pointer is needed to do refcnting of bpf program to make sure that
program doesn't get freed while the timer is armed. This approach relies on
"user refcnt" scheme used in prog_array that stores bpf programs for
bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is
paired with bpf_timer_cancel() that will drop the prog refcnt. The
ops->map_release_uref is responsible for cancelling the timers and dropping
prog refcnt when user space reference to a map reaches zero.
This uref approach is done to make sure that Ctrl-C of user space process will
not leave timers running forever unless the user space explicitly pinned a map
that contained timers in bpffs.
bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't
have user references (is not held by open file descriptor from user space and
not pinned in bpffs).
The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
and free the timer if given map element had it allocated.
"bpftool map update" command can be used to cancel timers.
The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because
'__u64 :64' has 1 byte alignment of 8 byte padding.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDnGVYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv6UEAC78zkseI8TmKaowNfkz/+MkP9eSFb1pVn3
rxpbPOsZompHoZpeWt4oHL+3Rmm3a9iRo/APA2ELas4zvp+Q+6uG7eha2Dc4hUA9
YgeO4z9YfG8wQNZc3x7bncb6ZwqEE5nnbFe/m25SyrAZVLlZ7FKHxfoZDqjhlGFC
eLNiYO6vdvwgCoBMcotyCDttrPfEu6947/5vB1zevv57twdQQaEWGUhvyx1XrlDX
0YD5fmdOjNU2isgxt4xo2Ur2zL6w254/hvj58sV3Z7JfkJpI9DCK+ztKEfzuyEhA
WYz06rDAT1+1KuVLfowaZ+pYiPPOIsL0+QXI83r3nLaE7WGGlfS8Hmz//1FbziYs
ZSZI826kEN+/lKeWTcKOOMhmkYyXEFFuQZS34eg9KI4xwML8v+ILlHmcp+tjebw9
vzNF6f7N2ki+jnyxxyNxeMHxeAMWsqnIRROOhZg6bbs6UVNpDy4qRzpQaDOaJsVe
uSAQ6PTd/etR9KE+ClhLe6X7Rmp/lfZCPe64wqM/3k1qV2KWhE1fwCQO4c5o1MBN
rpk3Ef5PZYP3aakCvZnfcjMWlpZNbq/xMc6vPc+yq32akq1t1KbODVBiR5odcH0C
Gt5N11im50SO06haBt7EOe4JMQLbK5sxG15t4C6mNQZgPegGfaLlVkKpzIkOzUha
OkRofKMcDA==
=gHse
-----END PGP SIGNATURE-----
Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"A combination of changes that ended up depending on both the driver
and core branch (and/or the IDE removal), and a few late arriving
fixes. In detail:
- Fix io ticks wrap-around issue (Chunguang)
- nvme-tcp sock locking fix (Maurizio)
- s390-dasd fixes (Kees, Christoph)
- blk_execute_rq polling support (Keith)
- blk-cgroup RCU iteration fix (Yu)
- nbd backend ID addition (Prasanna)
- Partition deletion fix (Yufen)
- Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)
- Removal of now dead block request types due to IDE removal
(Christoph)
- Loop probing and control device cleanups (Christoph)
- Device uevent fix (Christoph)
- Misc cleanups/fixes (Tetsuo, Christoph)"
* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
block: fix the problem of io_ticks becoming smaller
nvme-tcp: can't set sk_user_data without write_lock
loop: remove unused variable in loop_set_status()
block: remove the bdgrab in blk_drop_partitions
block: grab a device refcount in disk_uevent
s390/dasd: Avoid field over-reading memcpy()
dasd: unexport dasd_set_target_state
block: check disk exist before trying to add partition
ubd: remove dead code in ubd_setup_common
nvme: use return value from blk_execute_rq()
block: return errors from blk_execute_rq()
nvme: use blk_execute_rq() for passthrough commands
block: support polling through blk_execute_rq
block: remove REQ_OP_SCSI_{IN,OUT}
block: mark blk_mq_init_queue_data static
loop: rewrite loop_exit using idr_for_each_entry
loop: split loop_lookup
loop: don't allow deleting an unspecified loop device
loop: move loop_ctl_mutex locking into loop_add
...
Doorbell remapping for ifcvf, mlx5.
virtio_vdpa support for mlx5.
Validate device input in several drivers (for SEV and friends).
ZONE_MOVABLE aware handling in virtio-mem.
Misc fixes, cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmDm5jQPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp6mYIAMTk5ggM5xdt6NCAASAigssEAoCTMorfoxkx
i7O562TEejgLvYKx/EZnYF+YpmYGyWEY9AgxMPxP/nPRLszuf0nZSmMp5ivu/vMz
zwpAto+7RpUmIQP+N6QjWabiWrpQI9EnXA47kOnyU703Y+RnITPNCvD1PpnDG3zs
W2GdH7DKqwsCY22hB+zboH2D6HNf3gTuUtgUBYbdBnYVxdOsSd1dx9Te0EKUTV3y
uvENmFEcushDRYpUhAsZm4bKcLOn+6rgNGXuXNa4R/hUlJTwrQjGmzu+ua6vfMwF
dcGxdaeMJUo8o0C1Pz7wJBXF5UZXQlxoyBP+0b0ZTm69AwmIHMY=
=o6A1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio,vhost,vdpa updates from Michael Tsirkin:
- Doorbell remapping for ifcvf, mlx5
- virtio_vdpa support for mlx5
- Validate device input in several drivers (for SEV and friends)
- ZONE_MOVABLE aware handling in virtio-mem
- Misc fixes, cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-mem: prioritize unplug from ZONE_MOVABLE in Big Block Mode
virtio-mem: simplify high-level unplug handling in Big Block Mode
virtio-mem: prioritize unplug from ZONE_MOVABLE in Sub Block Mode
virtio-mem: simplify high-level unplug handling in Sub Block Mode
virtio-mem: simplify high-level plug handling in Sub Block Mode
virtio-mem: use page_zonenum() in virtio_mem_fake_offline()
virtio-mem: don't read big block size in Sub Block Mode
virtio/vdpa: clear the virtqueue state during probe
vp_vdpa: allow set vq state to initial state after reset
virtio-pci library: introduce vp_modern_get_driver_features()
vdpa: support packed virtqueue for set/get_vq_state()
virtio-ring: store DMA metadata in desc_extra for split virtqueue
virtio: use err label in __vring_new_virtqueue()
virtio_ring: introduce virtqueue_desc_add_split()
virtio_ring: secure handling of mapping errors
virtio-ring: factor out desc_extra allocation
virtio_ring: rename vring_desc_extra_packed
virtio-ring: maintain next in extra state for packed virtqueue
vdpa/mlx5: Clear vq ready indication upon device reset
vdpa/mlx5: Add support for doorbell bypassing
...
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmDnZwAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW1BD/9SHWGYhxLY+xL27eO0Q8XOPePb
diqllGavzq3fcakmJ3+6iIpb/WYX0ztu1M4KMBRP3QxNjP6nFkS1ph3PC0LL3ec2
h23hRfOrhlQd4rdonPcq/Z7oXKhrkem9G6KneVfvB94HmXnaZIrNBjwQRy0uRMXE
/IVNH4o6YMR8Av/VrG+L6BS+O/oXVnYVSLOuXsIrxmxS24NybsOpRzHvl14ZUsHt
eiwzcRC3ugAaxJn8cOSrHdBwvdOgbFFWEtMITcesQpYru+EmQcsCZdmJ0DbwsV2e
9k+LrVoy0CZFoekBtaaFvZq+JVBjUZKoAUYBML4ejWnQKolJH0BZQRh4RT0rbTjc
UMiuE3kFUsdJjzJRyO4pcqpwaNhCiZ2XrwyKeev/FLIn95bD1xbLJWfRvoKhioiI
X+1vujN2+N5n8T+u8sCVohujJCkUkMjevfF6ew8rvYOj3FrGqTi4jgrXUFAIsjLa
mHdA92oHIjNOCjyVIqnoUFTDltVMW9CwnLtd5nPnGvJoMtsj7lthy6fdtdPH0WVu
iNR4toE/AjBJo4rtib/irYbZtqmw2AbBFqoRk4yj8Fw4ZdSPYELwAR1aah0Oce9R
t1T9OE66vlr28XIC0NF917JfSNkc2eXnx4B21Zh+a/68XSJ1FzXPTob3lvXVVhQR
Ou4aw6dH7mql/2bq1w==
=wAww
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove unneeded semicolon in um_arch.c
um: Remove the repeated declaration
um: fix error return code in winch_tramp()
um: fix error return code in slip_open()
um: Fix stack pointer alignment
um: implement flush_cache_vmap/flush_cache_vunmap
um: add a UML specific futex implementation
um: enable the use of optimized xor routines in UML
um: Add support for host CPU flags and alignment
um: allow not setting extra rpaths in the linux binary
um: virtio/pci: enable suspend/resume
um: add PCI over virtio emulation driver
um: irqs: allow invoking time-travel handler multiple times
um: time-travel/signals: fix ndelay() in interrupt
um: expose time-travel mode to userspace side
um: export signals_enabled directly
um: remove unused smp_sigio_handler() declaration
lib: add iomem emulation (logic_iomem)
um: allow disabling NO_IOMEM
Pull yet more updates from Andrew Morton:
"54 patches.
Subsystems affected by this patch series: lib, mm (slub, secretmem,
cleanups, init, pagemap, and mremap), and debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (54 commits)
powerpc/mm: enable HAVE_MOVE_PMD support
powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
mm/mremap: allow arch runtime override
mm/mremap: hold the rmap lock in write mode when moving page table entries.
mm/mremap: use pmd/pud_poplulate to update page table entries
mm/mremap: don't enable optimized PUD move if page table levels is 2
mm/mremap: convert huge PUD move to separate helper
selftest/mremap_test: avoid crash with static build
selftest/mremap_test: update the test to handle pagesize other than 4K
mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
kdump: use vmlinux_build_id to simplify
buildid: fix kernel-doc notation
buildid: mark some arguments const
scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
scripts/decode_stacktrace.sh: support debuginfod
x86/dumpstack: use %pSb/%pBb for backtrace printing
arm64: stacktrace: use %pSb for backtrace printing
module: add printk formats to add module build ID to stacktraces
...
Introduce "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and not mapped not
only to other processes but in the kernel page tables as well.
The secretmem feature is off by default and the user must explicitly
enable it at the boot time.
Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call. The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.
Secretmem is designed to provide the following protections:
* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Seceretmem makes
"simple" ROP insufficient to perform exfiltration, which increases the
required complexity of the attack. Along with other protections like
the kernel stack size limit and address space layout randomization which
make finding gadgets is really hard, absence of any in-kernel primitive
for accessing secret memory means the one gadget ROP attack can't work.
Since the only way to access secret memory is to reconstruct the missing
mapping entry, the attacker has to recover the physical page and insert
a PTE pointing to it in the kernel and then retrieve the contents. That
takes at least three gadgets which is a level of difficulty beyond most
standard attacks.
* Prevent cross-process secret userspace memory exposures. Once the
secret memory is allocated, the user can't accidentally pass it into the
kernel to be transmitted somewhere. The secreremem pages cannot be
accessed via the direct map and they are disallowed in GUP.
* Harden against exploited kernel flaws. In order to access secretmem,
a kernel-side attack would need to either walk the page tables and
create new ones, or spawn a new privileged uiserspace process to perform
secrets exfiltration using ptrace.
The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File
descriptor approach allows explicit and controlled sharing of the memory
areas, it allows to seal the operations. Besides, file descriptor based
memory paves the way for VMMs to remove the secret memory range from the
userspace hipervisor process, for instance QEMU. Andy Lutomirski says:
"Getting fd-backed memory into a guest will take some possibly major
work in the kernel, but getting vma-backed memory into a guest without
mapping it in the host user address space seems much, much worse."
memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because it's purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory. Nowadays a new system call cost is negligible while it is way
simpler for userspace to deal with a clear-cut system calls than with a
multiplexer or an overloaded syscall. Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create() so there is no much sense in overloading memfd_create() to
begin with. If there will be a need for code sharing between these
implementation it can be easily achieved without a need to adjust user
visible APIs.
The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.
Once there will be a use case that will require exposing secretmem to the
kernel it will be an opt-in request in the system call flags so that user
would have to decide what data can be exposed to the kernel.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which
affects the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.
Pages in the secretmem regions are unevictable and unmovable to avoid
accidental exposure of the sensitive data via swap or during page
migration.
Since the secretmem mappings are locked in memory they cannot exceed
RLIMIT_MEMLOCK. Since these mappings are already locked independently
from mlock(), an attempt to mlock()/munlock() secretmem range would fail
and mlockall()/munlockall() will ignore secretmem mappings.
However, unlike mlock()ed memory, secretmem currently behaves more like
long-term GUP: secretmem mappings are unmovable mappings directly consumed
by user space. With default limits, there is no excessive use of
secretmem and it poses no real problem in combination with
ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.
A page that was a part of the secret memory area is cleared when it is
freed to ensure the data is not exposed to the next user of that page.
The following example demonstrates creation of a secret mapping (error
handling is omitted):
fd = memfd_secret(0);
ftruncate(fd, MAP_SIZE);
ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
[akpm@linux-foundation.org: suppress Kconfig whine]
Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for
BPF_PROG_TEST_RUN.
The intended use case is to pass some XDP meta data to the test runs of
XDP programs that are used as tail calls.
For programs that use bpf_prog_test_run_xdp, support xdp_md input and
output. Unlike with an actual xdp_md during a non-test run, data_meta must
be 0 because it must point to the start of the provided user data. From
the initial xdp_md, use data and data_end to adjust the pointers in the
generated xdp_buff. All other non-zero fields are prohibited (with
EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
different) xdp_md back to the userspace.
We require all fields of input xdp_md except the ones we explicitly
support to be set to zero. The expectation is that in the future we might
add support for more fields and we want to fail explicitly if the user
runs the program on the kernel where we don't yet support them.
Co-developed-by: Cody Haas <chaas@riotgames.com>
Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Cody Haas <chaas@riotgames.com>
Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Zvi Effron <zeffron@riotgames.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
- Prevent sigaltstack out of bounds writes. The kernel unconditionally
writes the FPU state to the alternate stack without checking whether
the stack is large enough to accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been
updated despite the fact that the FPU state which is stored on the
signal stack has grown over time which causes trouble in the field
when AVX512 is available on a CPU. The kernel does not expose the
minimum requirements for the alternate stack size depending on the
available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well
- A major cleanup of the x86 FPU code. The recent discoveries of XSTATE
related issues unearthed quite some inconsistencies, duplicated code
and other issues.
The fine granular overhaul addresses this, makes the code more robust
and maintainable, which allows to integrate upcoming XSTATE related
features in sane ways.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDlcpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeP5D/4i+AgYYeiMLgGb+NS7iaKPfoWo6LIz
y3qdTSA0DQaIYbYivWwRO/g0GYdDMXDWeZalFi7eGnVI8O3eOog+22Zrf/y0UINB
KJHdYd4ApWHhs401022y5hexrWQvnV8w1yQCuj/zLm6eC+AVhdwt2AY+IBoRrdUj
wqY97B/4rJNsBvvqTDn9EeDrJA2y0y0Suc7AhIp2BGMI+dpIdxys8RJDamXNWyDL
gJf0YRgUoiIn3AHKb+fgv60AoxfC175NSg/5/y/scFNXqVlW0Up4YCb7pqG9o2Ga
f3XvtWfbw1N5PmUYjFkALwEkzGUbM3v0RA3xLY2j2WlWm9fBPPy59dt+i/h/VKyA
GrA7i7lcIqX8dfVH6XkrReZBkRDSB6t9SZTvV54jAz5fcIZO2Rg++UFUvI/R6GKK
XCcxukYaArwo+IG62iqDszS3gfLGhcor/cviOeULRC5zMUIO4Jah+IhDnifmShtC
M5s9QzrwIRD/XMewGRQmvkiN4kBfE7jFoBQr1J9leCXJKrM+2JQmMzVInuubTQIq
SdlKOaAIn7xtekz+6XdFG9Gmhck0PCLMJMOLNvQkKWI3KqGLRZ+dAWKK0vsCizAx
0BA7ZeB9w9lFT+D8mQCX77JvW9+VNwyfwIOLIrJRHk3VqVpS5qvoiFTLGJJBdZx4
/TbbRZu7nXDN2w==
=Mq1m
-----END PGP SIGNATURE-----
Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes.
The kernel unconditionally writes the FPU state to the alternate
stack without checking whether the stack is large enough to
accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space
data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
been updated despite the fact that the FPU state which is stored on
the signal stack has grown over time which causes trouble in the
field when AVX512 is available on a CPU. The kernel does not expose
the minimum requirements for the alternate stack size depending on
the available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well.
- A major cleanup of the x86 FPU code. The recent discoveries of
XSTATE related issues unearthed quite some inconsistencies,
duplicated code and other issues.
The fine granular overhaul addresses this, makes the code more
robust and maintainable, which allows to integrate upcoming XSTATE
related features in sane ways"
* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
x86/fpu/signal: Let xrstor handle the features to init
x86/fpu/signal: Handle #PF in the direct restore path
x86/fpu: Return proper error codes from user access functions
x86/fpu/signal: Split out the direct restore code
x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
x86/fpu/signal: Sanitize the xstate check on sigframe
x86/fpu/signal: Remove the legacy alignment check
x86/fpu/signal: Move initial checks into fpu__restore_sig()
x86/fpu: Mark init_fpstate __ro_after_init
x86/pkru: Remove xstate fiddling from write_pkru()
x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
x86/fpu: Remove PKRU handling from switch_fpu_finish()
x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
x86/fpu: Hook up PKRU into ptrace()
x86/fpu: Add PKRU storage outside of task XSAVE buffer
x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
...
nf_conntrack_netlink.h does not exist, refer to nfnetlink_conntrack.h instead.
Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYORvYQAKCRDh3BK/laaZ
PCfvAQCbU+PW2RbwlqjZMet6w9qorh29XYe786P5pNRVbMYCygD+N45l66Sbd/Rz
7M7ioVDseyTW4dnLhb8SzSNB0zr6jQs=
=MDvD
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fixes for virtiofs submounts
- Misc fixes and cleanups
* tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtiofs: Fix spelling mistakes
fuse: use DIV_ROUND_UP helper macro for calculations
fuse: fix illegal access to inode with reused nodeid
fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)
fuse: Make fuse_fill_super_submount() static
fuse: Switch to fc_mount() for submounts
fuse: Call vfs_get_tree() for submounts
fuse: add dedicated filesystem context ops for submounts
virtiofs: propagate sync() to file server
fuse: reject internal errno
fuse: check connected before queueing on fpq->io
fuse: ignore PG_workingset after stealing
fuse: Fix infinite loop in sget_fc()
fuse: Fix crash if superblock of submount gets killed early
fuse: Fix crash in fuse_dentry_automount() error path
Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups. Highlights
are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM4qQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylKvQCfbh+OmTkDlDlDhSWlxuV05M1XTXoAoLUcLZru
s5JCnwSZztQQLMDHj7Pd
=Zupm
-----END PGP SIGNATURE-----
Merge tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups.
Highlights are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (227 commits)
serial: mvebu-uart: remove unused member nb from struct mvebu_uart
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
dt-bindings: mvebu-uart: fix documentation
serial: mvebu-uart: correctly calculate minimal possible baudrate
serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
serial: mvebu-uart: fix calculation of clock divisor
tty: make linux/tty_flip.h self-contained
serial: Prefer unsigned int to bare use of unsigned
serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
serial: qcom_geni_serial: use DT aliases according to DT bindings
Revert "tty: serial: Add UART driver for Cortina-Access platform"
tty: serial: Add UART driver for Cortina-Access platform
MAINTAINERS: add me back as mxser maintainer
mxser: Documentation, fix typos
mxser: Documentation, make the docs up-to-date
mxser: Documentation, remove traces of callout device
mxser: introduce mxser_16550A_or_MUST helper
mxser: rename flags to old_speed in mxser_set_serial_info
mxser: use port variable in mxser_set_serial_info
mxser: access info->MCR under info->slock
...
Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanna driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems mushed
together" tree...
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM8jQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymECgCg0yL+8WxDKO5Gg5llM5PshvLB1rQAn0y5pDgg
nw78LV3HQ0U7qaZBtI91
=x+AR
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanalabs driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems
mushed together" tree...
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
mcb: Use DEFINE_RES_MEM() helper macro and fix the end address
PNP: moved EXPORT_SYMBOL so that it immediately followed its function/variable
bus: mhi: pci-generic: Add missing 'pci_disable_pcie_error_reporting()' calls
bus: mhi: Wait for M2 state during system resume
bus: mhi: core: Fix power down latency
intel_th: Wait until port is in reset before programming it
intel_th: msu: Make contiguous buffers uncached
intel_th: Remove an unused exit point from intel_th_remove()
stm class: Spelling fix
nitro_enclaves: Set Bus Master for the NE PCI device
misc: ibmasm: Modify matricies to matrices
misc: vmw_vmci: return the correct errno code
siox: Simplify error handling via dev_err_probe()
fpga: machxo2-spi: Address warning about unused variable
lkdtm/heap: Add init_on_alloc tests
selftests/lkdtm: Enable various testable CONFIGs
lkdtm: Add CONFIG hints in errors where possible
lkdtm: Enable DOUBLE_FAULT on all architectures
lkdtm/heap: Add vmalloc linear overflow test
lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE
...
- Add support for the CXL Fixed Memory Window Structure, a recent
extension of the ACPI CEDT (CXL Early Discovery Table)
- Add infrastructure for component registers
- Add HDM (Host-managed device memory) decoder definitions
- Define a device model for an HDM decoder tree
- Bridge CXL persistent memory capabilities to an NVDIMM bus /
device-model
- Switch to fine grained mapping of CXL MMIO registers to allow
different drivers / system software to own individual register blocks
- Enable media provisioning commands, and publish the label storage area
size in sysfs
- Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYOB2lAAKCRDfioYZHlFs
ZyyaAP9O+SnYflFX+3gpoU4pK92VbIUl9KzzHdvJdW2CqtEVMgD9GO4V2Ng17WFg
/Mzn9Mj9S+YaHYvOsN6qEF1V0QvqNQ4=
=+X3m
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"This subsystem is still in the build-out phase as the bulk of the
update is improvements to enumeration and fleshing out the device
model. In terms of new features, more mailbox commands have been added
to the allowed-list in support of persistent memory provisioning
support targeting v5.15.
The critical update from an enumeration perspective is support for the
CXL Fixed Memory Window Structure that indicates to Linux which system
physical address ranges decode to the CXL Host Bridges in the system.
This allows the driver to detect which address ranges have been mapped
by firmware and what address ranges are available for future hotplug.
So, again, mostly skeleton this round, with more meat targeting v5.15.
Summary:
- Add support for the CXL Fixed Memory Window Structure, a recent
extension of the ACPI CEDT (CXL Early Discovery Table)
- Add infrastructure for component registers
- Add HDM (Host-managed device memory) decoder definitions
- Define a device model for an HDM decoder tree
- Bridge CXL persistent memory capabilities to an NVDIMM bus /
device-model
- Switch to fine grained mapping of CXL MMIO registers to allow
different drivers / system software to own individual register
blocks
- Enable media provisioning commands, and publish the label storage
area size in sysfs
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
cxl/pci: Rename CXL REGLOC ID
cxl/acpi: Use the ACPI CFMWS to create static decoder objects
cxl/acpi: Add the Host Bridge base address to CXL port objects
cxl/pmem: Register 'pmem' / cxl_nvdimm devices
libnvdimm: Drop unused device power management support
libnvdimm: Export nvdimm shutdown helper, nvdimm_delete()
cxl/pmem: Add initial infrastructure for pmem support
cxl/core: Add cxl-bus driver infrastructure
cxl/pci: Add media provisioning required commands
cxl/component_regs: Fix offset
cxl/hdm: Fix decoder count calculation
cxl/acpi: Introduce cxl_decoder objects
cxl/acpi: Enumerate host bridge root ports
cxl/acpi: Add downstream port data to cxl_port instances
cxl/Kconfig: Default drivers to CONFIG_CXL_BUS
cxl/acpi: Introduce the root of a cxl_port topology
cxl/pci: Fixup devm_cxl_iomap_block() to take a 'struct device *'
cxl/pci: Add HDM decoder capabilities
cxl/pci: Reserve individual register block regions
cxl/pci: Map registers based on capabilities
...
As the diffstat scatters over the tree, we've got many tree-wide
small changes, but also got quite a few intrusive changes in the
core side. The only ABI-visible core change is the new rawmidi
framing mode support while others are kernel-internal, mostly code
refactoring and/or nice improvements.
Here are some highlights:
Core:
- A new framing access mode for rawmidi to get timestamps
- Cleanup / refactoring of buffer memory management helper code
- Support for automatic negotiation of ASoC DAI formats
- Revival of software suspend for PCM and control core, as a
preliminary work for PCI BAR rescan support
ASoC:
- Accessory detection support for several Qualcomm parts
- Support for IEC958 control with hdmi-codec
- Merging of Tegra machine drivers into a single driver
- Support for AmLogic SM1 TOACODEC, Intel AlderLake-M, several NXP
i.MX8 variants, NXP TFA1 and TDF9897, Rockchip RK817, Qualcomm
Quinary MI2S, Texas Instruments TAS2505
USB-audio:
- Reduction of latency at playback start
- Code cleanup / fixes of usx2y driver
- Scarlett2 mixer code fixes and enhancements
- Quirks for Ozone and Denon devices
HD-audio:
- A few quirks for HP and ASUS machines
- Display power management fixes
Others:
- FireWire code refactoring and enhancements
- Tree-wide trivial coding-style fixes
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmDezZ8OHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8kxhAAhWyQBU9HX7xo/132J0SsP0tqT0NhePr217T7
RSmRre3kD8iNDtpe9P9eziyN5q5TB8+InY52yUiDiWdSxCHOBDKBPwzzDeVRdOGg
/BdQqCtEAtdgzsnPEsXxvK+y3LayLwlkCyN1kMLCl4KGFfMg8fQgMbXobEOfyKwT
r23Xgbkb8jIFyjEAmvJnbvPXjkUjv2ew0+RB+cqV80m9h+KdNN/k/8+0wdWaQBEd
6oRSE8ApYclgyLf4TYCkLAwTP3NQ8SuvYSuZ9tFGy7vjz7l69xPWS/6ndbfrJvEP
pO615dYWKTWQKi2SjStnjFUH7ODirbPZzyDqU9tvEAGbKj0l07LNXy0JrbwJ8VB2
JBfsdTlmSwHcAXmj0JB7IhflmG96W9azUKf1a8YNJLw7UgHoXjDZC0NFtOanaB7x
PEdf+27Ty1pZaNbtJpwdzEyc06UNA8dNRAr6FsSMphRTnBBzcUz2uEAfWbevS7Yl
Q5IOZd2lqKIUHPB6j2r0iDiCKScXzlHVhFzOLj3vHS4L8vjfbVshxkIkSY0HeA9r
HINbkby2mpk901oyqaFwTLkoatXBk5zDX2m2ow7rQLLUIdF0hWBXrE5Xrl3c/T/x
aFXuDxJfZm4YibJiBAOO5xv03DzVBNJz01E020oyum2m0NHhB1xD3RK428pcjshV
zgXJaZg=
=IN9G
-----END PGP SIGNATURE-----
Merge tag 'sound-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"As the diffstat scatters over the tree, we've got many tree-wide small
changes, but also got quite a few intrusive changes in the core side.
The only ABI-visible core change is the new rawmidi framing mode
support while others are kernel-internal, mostly code refactoring
and/or nice improvements.
Here are some highlights:
Core:
- A new framing access mode for rawmidi to get timestamps
- Cleanup / refactoring of buffer memory management helper code
- Support for automatic negotiation of ASoC DAI formats
- Revival of software suspend for PCM and control core, as a
preliminary work for PCI BAR rescan support
ASoC:
- Accessory detection support for several Qualcomm parts
- Support for IEC958 control with hdmi-codec
- Merging of Tegra machine drivers into a single driver
- Support for AmLogic SM1 TOACODEC, Intel AlderLake-M, several NXP
i.MX8 variants, NXP TFA1 and TDF9897, Rockchip RK817, Qualcomm
Quinary MI2S, Texas Instruments TAS2505
USB-audio:
- Reduction of latency at playback start
- Code cleanup / fixes of usx2y driver
- Scarlett2 mixer code fixes and enhancements
- Quirks for Ozone and Denon devices
HD-audio:
- A few quirks for HP and ASUS machines
- Display power management fixes
Others:
- FireWire code refactoring and enhancements
- Tree-wide trivial coding-style fixes"
* tag 'sound-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (594 commits)
ALSA: usb-audio: scarlett2: Fix for loop increment in scarlett2_usb_get_config
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
ALSA: hda/realtek - Add ALC285 HP init procedure
ALSA: hda/realtek - Add type for ALC287
ALSA: scarlett2: Fix scarlett2_*_ctl_put() return values again
ALSA: scarlett2: Fix pad count for 18i8 Gen 3
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
ALSA: firewire-lib: Fix 'amdtp_domain_start()' when no AMDTP_OUT_STREAM stream is found
ASoC: qcom: lpass-cpu: mark IRQ_CLEAR register as volatile and readable
ALSA: hda: Release codec display power during shutdown/reboot
ALSA: hda: Release controller display power during shutdown/reboot
ALSA: hda/realtek: Apply LED fixup for HP Dragonfly G1, too
ASoC: fsl: remove unnecessary oom message
ASoC: tlv320aic32x4: dt-bindings: add TAS2505 to compatible
ASoC: tlv320aic32x4: add support for TAS2505
ASoC: tlv320aic32x4: add type to device private data struct
ASoC: tegra30: ahub: Use devm_platform_get_and_ioremap_resource()
ASoC: tegra: tegra210_admaif: Use devm_platform_get_and_ioremap_resource()
...
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
This PR contains a replacement driver for Intel iWarp hardware. This new
driver supports the old ethernet hardware and also newer chips that can do
ROCE. Otherwise this contains the typical mix of patches:
- Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
- Many static checker driven code clean ups, including a wide refcount_t
conversion
- Several series for the hns driver, more HIP09 HW capabilities, migration
to new HW register manipulators, and code cleanups
- Minor fixes and improvements in srp, rts, and cm
- Improvements throughout for sysfs related code to use DEVICE_ATTR_*,
make the ib_port sysfs first-class, and overall use sysfs APIs properly
- Intel's new irdma driver replacing i40iw
- rxe general clean ups and Memory Window support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmDcunQACgkQOG33FX4g
mxqSBA//dsZi/UzpzgU+YqyMFmUp04wd2/iCYzOcCViNPQZCyCARbGaMXI4kMa4s
8dM5xU76OnCuNSnXHaIwvHC3CdN9GUm08j9eWY7syvAiKtXCjzv7qmCVfBw35UyK
IXKfXh57toTSSAIfxw8yKc97QgaDSJ2zQ34fXkoE0AvTlfyN6pHQe9ef/Ca0ejS4
awUGYVG/oilLXrEHcSSAv5UoX6hOUje6jqqRgp5jmZTI3g7SlIPL8mWgXBkHAYmd
kDX7lBd09CKo2bmR071/kF6xUzvbCg1tmeE6lZze7gE+aKlBkZcvCBe1RAh3sBzK
ysLfON5GGw5qnkMaY8j5h3sgWvi3qTTEW+jCAmmVi/6z4PF47mvmVVn+/pZc3y2e
PqH43cunhwS0KuoUJ5Sd48J/UvabrvdbCNZrjCGCpt45EF4VwKxYMh74Bf0ABEQS
i9eKR/+wyHG6Uv1U37fIXsqa8yUttl9aV/s88s8irn4xhG8ygBLZgeVQNeGUfvdV
1W0XLEjRmKFezC1FhiPOz7CLIgL3BfSU1V+S7p0Gvb6ijZqyZTfRUaWbaD3KJpRT
1kwzE4qp6IbJMEqgQH/lq0xBzvzF48FPvBslX5kwlm0phQRrMCwMVIafutpu395q
ySeStEvsTVfz/JUHL3ZaEJyTRjAvPL0lXLH80XUpgWk9GzsksOM=
=wyqt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This contains a replacement driver for Intel iWarp hardware. This new
driver supports the old ethernet hardware and also newer chips that
can do ROCE.
Other than that, this contains the typical mix of patches:
- Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
- Many static checker driven code clean ups, including a wide
refcount_t conversion
- Several series for the hns driver, more HIP09 HW capabilities,
migration to new HW register manipulators, and code cleanups
- Minor fixes and improvements in srp, rts, and cm
- Improvements throughout for sysfs related code to use
DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use
sysfs APIs properly
- Intel's new irdma driver replacing i40iw
- rxe general clean ups and Memory Window support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
RDMA/core: Always release restrack object
RDMA/mlx5: Don't access NULL-cleared mpi pointer
RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
RDMA/rxe: Missing unlock on error in get_srq_wqe()
RDMA/cma: Fix rdma_resolve_route() memory leak
RDMA/core/sa_query: Remove unused argument
RDMA/cma: Fix incorrect Packet Lifetime calculation
RDMA/cma: Protect RMW with qp_mutex
RDMA/cma: Remove unnecessary INIT->INIT transition
RDMA/hns: Add window selection field of congestion control
RDMA/hfi1: Remove use of kmap()
RDMA/irdma: Remove use of kmap()
RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
IB/isert: Align target max I/O size to initiator size
RDMA/hns: Fix incorrect vlan enable bit in QPC
MAINTAINERS: Update Broadcom RDMA maintainers
RDMA/irdma: Use the queried port attributes
RDMA/rxe: Fix redundant skb_put_zero
RDMA/rxe: Fix extra copy in prepare_ack_packet
...
Since PTP virtual clock support is added, there can be
several PTP virtual clocks based on one PTP physical
clock for timestamping.
This patch is to extend SO_TIMESTAMPING API to support
PHC (PTP Hardware Clock) binding by adding a new flag
SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are
in use, user space can configure to bind one for
timestamping, but PTP physical clock is not supported
and not needed to bind.
This patch is preparation for timestamp conversion from
raw timestamp to a specific PTP virtual clock time in
core net.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDcl7AACgkQnJ2qBz9k
QNnsBQf+LBAPsfykQ/f8EdHErO1lfbVTmwf2g/JzTkjrIVZTZ6Ic47aCIiFxgHU2
Js9ufaPxpsbbopzpn2PAoCUzxNsZDqgXtnC03MOUAqoSFbAvgLHz2sQwjqeYJUGQ
P6n7VipEA/qBVpQI5zeCUhHYcahoNrRjSLzaFnE2Z8CrQYQ6Ry9gVEhduvu2OTru
62cWlAWlTJfx/FcR1Y0F/ZznnNSKMiAHcEe3F6Beztplg2ooq+z6FclJYrkmnxMq
SXSOsqTCdi1/oFx36NpvLkykrIS9I7N/iqCnKwbm6X+nyZZKyAwYZhWVqkbozPPu
+u1Ppq8o0IuWwEA6/UAmxgAO3m/Gkw==
=tn0h
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc fs updates from Jan Kara:
"The new quotactl_fd() syscall (remake of quotactl_path() syscall that
got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs,
isofs, and writeback fixes and cleanups"
* tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: fix obtain a reference to a freeing memcg css
quota: remove unnecessary oom message
isofs: remove redundant continue statement
quota: Wire up quotactl_fd syscall
quota: Change quotactl_path() systcall to an fd-based one
reiserfs: Remove unneed check in reiserfs_write_full_page()
udf: Fix NULL pointer dereference in udf_symlink function
reiserfs: add check for invalid 1st journal block
I. Background: Sparse Memory Mappings
When we manage sparse memory mappings dynamically in user space - also
sometimes involving MAP_NORESERVE - we want to dynamically populate/
discard memory inside such a sparse memory region. Example users are
hypervisors (especially implementing memory ballooning or similar
technologies like virtio-mem) and memory allocators. In addition, we want
to fail in a nice way (instead of generating SIGBUS) if populating does
not succeed because we are out of backend memory (which can happen easily
with file-based mappings, especially tmpfs and hugetlbfs).
While MADV_DONTNEED, MADV_REMOVE and FALLOC_FL_PUNCH_HOLE allow for
reliably discarding memory for most mapping types, there is no generic
approach to populate page tables and preallocate memory.
Although mmap() supports MAP_POPULATE, it is not applicable to the concept
of sparse memory mappings, where we want to populate/discard dynamically
and avoid expensive/problematic remappings. In addition, we never
actually report errors during the final populate phase - it is best-effort
only.
fallocate() can be used to preallocate file-based memory and fail in a
safe way. However, it cannot really be used for any private mappings on
anonymous files via memfd due to COW semantics. In addition, fallocate()
does not actually populate page tables, so we still always get pagefaults
on first access - which is sometimes undesired (i.e., real-time workloads)
and requires real prefaulting of page tables, not just a preallocation of
backend storage. There might be interesting use cases for sparse memory
regions along with mlockall(MCL_ONFAULT) which fallocate() cannot satisfy
as it does not prefault page tables.
II. On preallcoation/prefaulting from user space
Because we don't have a proper interface, what applications (like QEMU and
databases) end up doing is touching (i.e., reading+writing one byte to not
overwrite existing data) all individual pages.
However, that approach
1) Can result in wear on storage backing, because we end up reading/writing
each page; this is especially a problem for dax/pmem.
2) Can result in mmap_sem contention when prefaulting via multiple
threads.
3) Requires expensive signal handling, especially to catch SIGBUS in case
of hugetlbfs/shmem/file-backed memory. For example, this is
problematic in hypervisors like QEMU where SIGBUS handlers might already
be used by other subsystems concurrently to e.g, handle hardware errors.
"Simply" doing preallocation concurrently from other thread is not that
easy.
III. On MADV_WILLNEED
Extending MADV_WILLNEED is not an option because
1. It would change the semantics: "Expect access in the near future." and
"might be a good idea to read some pages" vs. "Definitely populate/
preallocate all memory and definitely fail on errors.".
2. Existing users (like virtio-balloon in QEMU when deflating the balloon)
don't want populate/prealloc semantics. They treat this rather as a hint
to give a little performance boost without too much overhead - and don't
expect that a lot of memory might get consumed or a lot of time
might be spent.
IV. MADV_POPULATE_READ and MADV_POPULATE_WRITE
Let's introduce MADV_POPULATE_READ and MADV_POPULATE_WRITE, inspired by
MAP_POPULATE, with the following semantics:
1. MADV_POPULATE_READ can be used to prefault page tables just like
manually reading each individual page. This will not break any COW
mappings. The shared zero page might get mapped and no backend storage
might get preallocated -- allocation might be deferred to
write-fault time. Especially shared file mappings require an explicit
fallocate() upfront to actually preallocate backend memory (blocks in
the file system) in case the file might have holes.
2. If MADV_POPULATE_READ succeeds, all page tables have been populated
(prefaulted) readable once.
3. MADV_POPULATE_WRITE can be used to preallocate backend memory and
prefault page tables just like manually writing (or
reading+writing) each individual page. This will break any COW
mappings -- e.g., the shared zeropage is never populated.
4. If MADV_POPULATE_WRITE succeeds, all page tables have been populated
(prefaulted) writable once.
5. MADV_POPULATE_READ and MADV_POPULATE_WRITE cannot be applied to special
mappings marked with VM_PFNMAP and VM_IO. Also, proper access
permissions (e.g., PROT_READ, PROT_WRITE) are required. If any such
mapping is encountered, madvise() fails with -EINVAL.
6. If MADV_POPULATE_READ or MADV_POPULATE_WRITE fails, some page tables
might have been populated.
7. MADV_POPULATE_READ and MADV_POPULATE_WRITE will return -EHWPOISON
when encountering a HW poisoned page in the range.
8. Similar to MAP_POPULATE, MADV_POPULATE_READ and MADV_POPULATE_WRITE
cannot protect from the OOM (Out Of Memory) handler killing the
process.
While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e.,
preallocate memory and prefault page tables for VMs), one issue is that
whenever we prefault pages writable, the pages have to be marked dirty,
because the CPU could dirty them any time. while not a real problem for
hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
page will be marked dirty and has to be written back later when evicting.
MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
mapping from backend storage without marking it dirty, such that eviction
won't have to write it back. As discussed above, shared file mappings
might require an explciit fallocate() upfront to achieve
preallcoation+prepopulation.
Although sparse memory mappings are the primary use case, this will also
be useful for other preallocate/prefault use cases where MAP_POPULATE is
not desired or the semantics of MAP_POPULATE are not sufficient: as one
example, QEMU users can trigger preallocation/prefaulting of guest RAM
after the mapping was created -- and don't want errors to be silently
suppressed.
Looking at the history, MADV_POPULATE was already proposed in 2013 [1],
however, the main motivation back than was performance improvements --
which should also still be the case.
V. Single-threaded performance comparison
I did a short experiment, prefaulting page tables on completely *empty
mappings/files* and repeated the experiment 10 times. The results
correspond to the shortest execution time. In general, the performance
benefit for huge pages is negligible with small mappings.
V.1: Private mappings
POPULATE_READ and POPULATE_WRITE is fastest. Note that
Reading/POPULATE_READ will populate the shared zeropage where applicable
-- which result in short population times.
The fastest way to allocate backend storage (here: swap or huge pages) and
prefault page tables is POPULATE_WRITE.
V.2: Shared mappings
fallocate() is fastest, however, doesn't prefault page tables.
POPULATE_WRITE is faster than simple writes and read/writes.
POPULATE_READ is faster than simple reads.
Without a fd, the fastest way to allocate backend storage and prefault
page tables is POPULATE_WRITE. With an fd, the fastest way is usually
FALLOCATE+POPULATE_READ or FALLOCATE+POPULATE_WRITE respectively; one
exception are actual files: FALLOCATE+Read is slightly faster than
FALLOCATE+POPULATE_READ.
The fastest way to allocate backend storage prefault page tables is
FALLOCATE+POPULATE_WRITE -- except when dealing with actual files; then,
FALLOCATE+POPULATE_READ is fastest and won't directly mark all pages as
dirty.
v.3: Detailed results
==================================================
2 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB : Read : 0.119 ms
Anon 4 KiB : Write : 0.222 ms
Anon 4 KiB : Read/Write : 0.380 ms
Anon 4 KiB : POPULATE_READ : 0.060 ms
Anon 4 KiB : POPULATE_WRITE : 0.158 ms
Memfd 4 KiB : Read : 0.034 ms
Memfd 4 KiB : Write : 0.310 ms
Memfd 4 KiB : Read/Write : 0.362 ms
Memfd 4 KiB : POPULATE_READ : 0.039 ms
Memfd 4 KiB : POPULATE_WRITE : 0.229 ms
Memfd 2 MiB : Read : 0.030 ms
Memfd 2 MiB : Write : 0.030 ms
Memfd 2 MiB : Read/Write : 0.030 ms
Memfd 2 MiB : POPULATE_READ : 0.030 ms
Memfd 2 MiB : POPULATE_WRITE : 0.030 ms
tmpfs : Read : 0.033 ms
tmpfs : Write : 0.313 ms
tmpfs : Read/Write : 0.406 ms
tmpfs : POPULATE_READ : 0.039 ms
tmpfs : POPULATE_WRITE : 0.285 ms
file : Read : 0.033 ms
file : Write : 0.351 ms
file : Read/Write : 0.408 ms
file : POPULATE_READ : 0.039 ms
file : POPULATE_WRITE : 0.290 ms
hugetlbfs : Read : 0.030 ms
hugetlbfs : Write : 0.030 ms
hugetlbfs : Read/Write : 0.030 ms
hugetlbfs : POPULATE_READ : 0.030 ms
hugetlbfs : POPULATE_WRITE : 0.030 ms
**************************************************
4096 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB : Read : 237.940 ms
Anon 4 KiB : Write : 708.409 ms
Anon 4 KiB : Read/Write : 1054.041 ms
Anon 4 KiB : POPULATE_READ : 124.310 ms
Anon 4 KiB : POPULATE_WRITE : 572.582 ms
Memfd 4 KiB : Read : 136.928 ms
Memfd 4 KiB : Write : 963.898 ms
Memfd 4 KiB : Read/Write : 1106.561 ms
Memfd 4 KiB : POPULATE_READ : 78.450 ms
Memfd 4 KiB : POPULATE_WRITE : 805.881 ms
Memfd 2 MiB : Read : 357.116 ms
Memfd 2 MiB : Write : 357.210 ms
Memfd 2 MiB : Read/Write : 357.606 ms
Memfd 2 MiB : POPULATE_READ : 356.094 ms
Memfd 2 MiB : POPULATE_WRITE : 356.937 ms
tmpfs : Read : 137.536 ms
tmpfs : Write : 954.362 ms
tmpfs : Read/Write : 1105.954 ms
tmpfs : POPULATE_READ : 80.289 ms
tmpfs : POPULATE_WRITE : 822.826 ms
file : Read : 137.874 ms
file : Write : 987.025 ms
file : Read/Write : 1107.439 ms
file : POPULATE_READ : 80.413 ms
file : POPULATE_WRITE : 857.622 ms
hugetlbfs : Read : 355.607 ms
hugetlbfs : Write : 355.729 ms
hugetlbfs : Read/Write : 356.127 ms
hugetlbfs : POPULATE_READ : 354.585 ms
hugetlbfs : POPULATE_WRITE : 355.138 ms
**************************************************
2 MiB MAP_SHARED:
**************************************************
Anon 4 KiB : Read : 0.394 ms
Anon 4 KiB : Write : 0.348 ms
Anon 4 KiB : Read/Write : 0.400 ms
Anon 4 KiB : POPULATE_READ : 0.326 ms
Anon 4 KiB : POPULATE_WRITE : 0.273 ms
Anon 2 MiB : Read : 0.030 ms
Anon 2 MiB : Write : 0.030 ms
Anon 2 MiB : Read/Write : 0.030 ms
Anon 2 MiB : POPULATE_READ : 0.030 ms
Anon 2 MiB : POPULATE_WRITE : 0.030 ms
Memfd 4 KiB : Read : 0.412 ms
Memfd 4 KiB : Write : 0.372 ms
Memfd 4 KiB : Read/Write : 0.419 ms
Memfd 4 KiB : POPULATE_READ : 0.343 ms
Memfd 4 KiB : POPULATE_WRITE : 0.288 ms
Memfd 4 KiB : FALLOCATE : 0.137 ms
Memfd 4 KiB : FALLOCATE+Read : 0.446 ms
Memfd 4 KiB : FALLOCATE+Write : 0.330 ms
Memfd 4 KiB : FALLOCATE+Read/Write : 0.454 ms
Memfd 4 KiB : FALLOCATE+POPULATE_READ : 0.379 ms
Memfd 4 KiB : FALLOCATE+POPULATE_WRITE : 0.268 ms
Memfd 2 MiB : Read : 0.030 ms
Memfd 2 MiB : Write : 0.030 ms
Memfd 2 MiB : Read/Write : 0.030 ms
Memfd 2 MiB : POPULATE_READ : 0.030 ms
Memfd 2 MiB : POPULATE_WRITE : 0.030 ms
Memfd 2 MiB : FALLOCATE : 0.030 ms
Memfd 2 MiB : FALLOCATE+Read : 0.031 ms
Memfd 2 MiB : FALLOCATE+Write : 0.031 ms
Memfd 2 MiB : FALLOCATE+Read/Write : 0.031 ms
Memfd 2 MiB : FALLOCATE+POPULATE_READ : 0.030 ms
Memfd 2 MiB : FALLOCATE+POPULATE_WRITE : 0.030 ms
tmpfs : Read : 0.416 ms
tmpfs : Write : 0.369 ms
tmpfs : Read/Write : 0.425 ms
tmpfs : POPULATE_READ : 0.346 ms
tmpfs : POPULATE_WRITE : 0.295 ms
tmpfs : FALLOCATE : 0.139 ms
tmpfs : FALLOCATE+Read : 0.447 ms
tmpfs : FALLOCATE+Write : 0.333 ms
tmpfs : FALLOCATE+Read/Write : 0.454 ms
tmpfs : FALLOCATE+POPULATE_READ : 0.380 ms
tmpfs : FALLOCATE+POPULATE_WRITE : 0.272 ms
file : Read : 0.191 ms
file : Write : 0.511 ms
file : Read/Write : 0.524 ms
file : POPULATE_READ : 0.196 ms
file : POPULATE_WRITE : 0.434 ms
file : FALLOCATE : 0.004 ms
file : FALLOCATE+Read : 0.197 ms
file : FALLOCATE+Write : 0.554 ms
file : FALLOCATE+Read/Write : 0.480 ms
file : FALLOCATE+POPULATE_READ : 0.201 ms
file : FALLOCATE+POPULATE_WRITE : 0.381 ms
hugetlbfs : Read : 0.030 ms
hugetlbfs : Write : 0.030 ms
hugetlbfs : Read/Write : 0.030 ms
hugetlbfs : POPULATE_READ : 0.030 ms
hugetlbfs : POPULATE_WRITE : 0.030 ms
hugetlbfs : FALLOCATE : 0.030 ms
hugetlbfs : FALLOCATE+Read : 0.031 ms
hugetlbfs : FALLOCATE+Write : 0.031 ms
hugetlbfs : FALLOCATE+Read/Write : 0.030 ms
hugetlbfs : FALLOCATE+POPULATE_READ : 0.030 ms
hugetlbfs : FALLOCATE+POPULATE_WRITE : 0.030 ms
**************************************************
4096 MiB MAP_SHARED:
**************************************************
Anon 4 KiB : Read : 1053.090 ms
Anon 4 KiB : Write : 913.642 ms
Anon 4 KiB : Read/Write : 1060.350 ms
Anon 4 KiB : POPULATE_READ : 893.691 ms
Anon 4 KiB : POPULATE_WRITE : 782.885 ms
Anon 2 MiB : Read : 358.553 ms
Anon 2 MiB : Write : 358.419 ms
Anon 2 MiB : Read/Write : 357.992 ms
Anon 2 MiB : POPULATE_READ : 357.533 ms
Anon 2 MiB : POPULATE_WRITE : 357.808 ms
Memfd 4 KiB : Read : 1078.144 ms
Memfd 4 KiB : Write : 942.036 ms
Memfd 4 KiB : Read/Write : 1100.391 ms
Memfd 4 KiB : POPULATE_READ : 925.829 ms
Memfd 4 KiB : POPULATE_WRITE : 804.394 ms
Memfd 4 KiB : FALLOCATE : 304.632 ms
Memfd 4 KiB : FALLOCATE+Read : 1163.359 ms
Memfd 4 KiB : FALLOCATE+Write : 933.186 ms
Memfd 4 KiB : FALLOCATE+Read/Write : 1187.304 ms
Memfd 4 KiB : FALLOCATE+POPULATE_READ : 1013.660 ms
Memfd 4 KiB : FALLOCATE+POPULATE_WRITE : 794.560 ms
Memfd 2 MiB : Read : 358.131 ms
Memfd 2 MiB : Write : 358.099 ms
Memfd 2 MiB : Read/Write : 358.250 ms
Memfd 2 MiB : POPULATE_READ : 357.563 ms
Memfd 2 MiB : POPULATE_WRITE : 357.334 ms
Memfd 2 MiB : FALLOCATE : 356.735 ms
Memfd 2 MiB : FALLOCATE+Read : 358.152 ms
Memfd 2 MiB : FALLOCATE+Write : 358.331 ms
Memfd 2 MiB : FALLOCATE+Read/Write : 358.018 ms
Memfd 2 MiB : FALLOCATE+POPULATE_READ : 357.286 ms
Memfd 2 MiB : FALLOCATE+POPULATE_WRITE : 357.523 ms
tmpfs : Read : 1087.265 ms
tmpfs : Write : 950.840 ms
tmpfs : Read/Write : 1107.567 ms
tmpfs : POPULATE_READ : 922.605 ms
tmpfs : POPULATE_WRITE : 810.094 ms
tmpfs : FALLOCATE : 306.320 ms
tmpfs : FALLOCATE+Read : 1169.796 ms
tmpfs : FALLOCATE+Write : 933.730 ms
tmpfs : FALLOCATE+Read/Write : 1191.610 ms
tmpfs : FALLOCATE+POPULATE_READ : 1020.474 ms
tmpfs : FALLOCATE+POPULATE_WRITE : 798.945 ms
file : Read : 654.101 ms
file : Write : 1259.142 ms
file : Read/Write : 1289.509 ms
file : POPULATE_READ : 661.642 ms
file : POPULATE_WRITE : 1106.816 ms
file : FALLOCATE : 1.864 ms
file : FALLOCATE+Read : 656.328 ms
file : FALLOCATE+Write : 1153.300 ms
file : FALLOCATE+Read/Write : 1180.613 ms
file : FALLOCATE+POPULATE_READ : 668.347 ms
file : FALLOCATE+POPULATE_WRITE : 996.143 ms
hugetlbfs : Read : 357.245 ms
hugetlbfs : Write : 357.413 ms
hugetlbfs : Read/Write : 357.120 ms
hugetlbfs : POPULATE_READ : 356.321 ms
hugetlbfs : POPULATE_WRITE : 356.693 ms
hugetlbfs : FALLOCATE : 355.927 ms
hugetlbfs : FALLOCATE+Read : 357.074 ms
hugetlbfs : FALLOCATE+Write : 357.120 ms
hugetlbfs : FALLOCATE+Read/Write : 356.983 ms
hugetlbfs : FALLOCATE+POPULATE_READ : 356.413 ms
hugetlbfs : FALLOCATE+POPULATE_WRITE : 356.266 ms
**************************************************
[1] https://lkml.org/lkml/2013/6/27/698
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20210419135443.12822-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MPOL_LOCAL policy has been setup as a real policy, but it is still handled
like a faked POL_PREFERRED policy with one internal MPOL_F_LOCAL flag bit
set, and there are many places having to judge the real 'prefer' or the
'local' policy, which are quite confusing.
In current code, there are 4 cases that MPOL_LOCAL are used:
1. user specifies 'local' policy
2. user specifies 'prefer' policy, but with empty nodemask
3. system 'default' policy is used
4. 'prefer' policy + valid 'preferred' node with MPOL_F_STATIC_NODES
flag set, and when it is 'rebind' to a nodemask which doesn't contains
the 'preferred' node, it will perform as 'local' policy
So make 'local' a real policy instead of a fake 'prefer' one, and kill
MPOL_F_LOCAL bit, which can greatly reduce the confusion for code reading.
For case 4, the logic of mpol_rebind_preferred() is confusing, as Michal
Hocko pointed out:
: I do believe that rebinding preferred policy is just bogus and it should
: be dropped altogether on the ground that a preference is a mere hint from
: userspace where to start the allocation. Unless I am missing something
: cpusets will be always authoritative for the final placement. The
: preferred node just acts as a starting point and it should be really
: preserved when cpusets changes. Otherwise we have a very subtle behavior
: corner cases.
So dump all the tricky transformation between 'prefer' and 'local', and
just record the new nodemask of rebinding.
[feng.tang@intel.com: fix a problem in mpol_set_nodemask(), per Michal Hocko]
Link: https://lkml.kernel.org/r/1622560492-1294-3-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: refine code and comments of mpol_set_nodemask(), per Michal]
Link: https://lkml.kernel.org/r/20210603081807.GE56979@shbuild999.sh.intel.com
Link: https://lkml.kernel.org/r/1622469956-82897-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the feature is fully implemented (the faulting path hooks exist
so userspace is notified, and the ioctl to resolve such faults is
available), advertise this as a supported feature.
Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener
to another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance
(for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require
jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address
allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks
in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
(our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm 60GHz WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
=LvtX
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmDbjZ4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMMMRAAgwwYgJ8pqac21co0rLCEMLdnSKvu
ueaqCa0F/c+UJ9VmHZ0eDr4FHFgCxKamUP1+/Zn4U53tc483mX/Kz6blGd91hz1K
YnYJGK/tlOmhOJtaZOsp0saRrYn/62kO5tyzp+mhNKje5ON4ICA5jR3Wz+B+RDSM
GXH8EDuxKVkAuqpTZHm79a8LL2G9uDre2uWWHehDxLNAOuLxK/3DnObHRdEi4ZcV
9MSFljLnSocFdLOPRL2R05LQ2ce18NHkTW+FMzPbTezLP4lm62lEsAYOQ1vvUm0A
MoncR4bFGMPYbM5WQmwiDDBBVoo6+HB5Q0AK6fbZwaIJYNFCCc0ytKKyfSyyIaRK
8qSaGeXpUU4nHeygeHvhz7uOVFJfG+WXWRM2WIOwUWU0KKbvCSJ3UDM3cuJHIQol
jv2yuYIZQcz/tXTsOn59ivjsxXclsb3d1Z6FmligaoAoASCerCgeGkhDC4CwHYg5
qZHxE/rtiZxzgZ15cghxITOcGzOVEeYqbo9FzD5aXh66MGzOkrtYwJ+CCVTSx27O
uco6PKelVI2EJ9v4KeQtIMOFDU/ZyJrde/q2otfqQoSaR12Jarpi0qL/5ossuR2B
bEqNN6sX22++QfvLZOAJ8EmYjzkmDfs4q4LYQxMpQfw27OTcK17rXcnI//mmRbUr
SrCuVW3mF8FzxJc=
=1bAZ
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another merge window, another small audit pull request.
Four patches in total: one is cosmetic, one removes an unnecessary
initialization, one renames some enum values to prevent name
collisions, and one converts list_del()/list_add() to list_move().
None of these are earth shattering and all pass the audit-testsuite
tests while merging cleanly on top of your tree from earlier today"
* tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove unnecessary 'ret' initialization
audit: remove trailing spaces and tabs
audit: Use list_move instead of list_del/list_add
audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
audit: add blank line after variable declarations
Problem:
On reconfigure of device, there is no way to defend if the backend
storage is matching with the initial backend storage.
Say, if an initial connect request for backend "pool1/image1" got
mapped to /dev/nbd0 and the userspace process is terminated. A next
reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to
use /dev/nbd0 for a different backend "pool1/image2"
For example, an operation like below could be dangerous:
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -9 rbd-nbd
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
Solution:
Provide a way for userspace processes to keep some metadata to identify
between the device and the backend, so that when a reconfigure request is
made, we can compare and avoid such dangerous operations.
With this solution, as part of the initial connect request, backend
path can be stored in the sysfs per device config, so that on a reconfigure
request it's easy to check if the backend path matches with the initial
connect backend path.
Please note, ioctl interface to nbd will not have these changes, as there
won't be any reconfigure.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210429102828.31248-1-prasanna.kalever@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Highlights:
- New think-lmi driver adding support for changing BIOS settings from
within Linux using the standard firmware-attributes class sysfs API
- MS Surface aggregator-cdev now also supports forwarding events to
user-space (for debugging / new driver development purposes only)
- New intel_skl_int3472 driver this provides the necessary glue to
translate ACPI table information to GPIOs, regulators, etc. for
camera sensors on Intel devices with IPU3 attached MIPI cameras
- A whole bunch of other fixes + device-specific quirk additions
- New devm_work_autocancel() devm-helpers.h function
Note this also contains merges of the following immutable branches/tags
shared with other subsystems:
- platform-drivers-x86-goodix-v5.14-1
- intel-gpio-v5.14-1
- linux-pm/acpi-scan
- devm-helpers-v5.14-1
The following is an automated git shortlog grouped by driver:
ACPI:
- scan: initialize local variable to avoid garbage being returned
- scan: Add function to fetch dependent of ACPI device
- scan: Extend acpi_walk_dep_device_list()
- scan: Rearrange dep_unmet initialization
Add intel_skl_int3472 driver:
- Add intel_skl_int3472 driver
ISST:
- Use numa node id for cpu pci dev mapping
- Optimize CPU to PCI device mapping
Input:
- goodix - platform/x86: touchscreen_dmi - Move upside down quirks to touchscreen_dmi.c
MAINTAINERS:
- Update IRC link for Surface System Aggregator subsystem
- Update info for telemetry
Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans:
- Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans
Merge tag 'devm-helpers-v5.14-1' into review-hans:
- Merge tag 'devm-helpers-v5.14-1' into review-hans
Merge tag 'intel-gpio-v5.14-1' into review-hans:
- Merge tag 'intel-gpio-v5.14-1' into review-hans
Merge tag 'platform-drivers-x86-goodix-v5.14-1' into review-hans:
- Merge tag 'platform-drivers-x86-goodix-v5.14-1' into review-hans
Remove "default n" entries:
- Remove "default n" entries
Rename hp-wireless to wireless-hotkey:
- Rename hp-wireless to wireless-hotkey
asus-nb-wmi:
- Revert "add support for ASUS ROG Zephyrus G14 and G15"
- Revert "Drop duplicate DMI quirk structures"
dcdbas:
- drop unneeded assignment in host_control_smi()
dell-privacy:
- Add support for Dell hardware privacy
dell-wmi:
- Rename dell-wmi.c to dell-wmi-base.c
dell-wmi-sysman:
- Change user experience when Admin/System Password is modified
- fw_attr_inuse can be static
- Use firmware_attributes_class helper
- Make populate_foo_data functions more robust
dell-wmi-sysman/think-lmi:
- Make fw_attr_class global static
devm-helpers:
- Add resource managed version of work init
docs:
- driver-api: Update Surface Aggregator user-space interface documentation
extcon:
- extcon-max8997: Simplify driver using devm
- extcon-max8997: Fix IRQ freeing at error path
- extcon-max77693.c: Fix potential work-queue cancellation race
- extcon-max14577: Fix potential work-queue cancellation race
firmware_attributes_class:
- Create helper file for handling firmware-attributes class registration events
gpio:
- wcove: Split error handling for CTRL and IRQ registers
- wcove: Unify style of to_reg() with to_ireg()
- wcove: Use IRQ hardware number getter instead of direct access
- crystalcove: remove platform_set_drvdata() + cleanup probe
gpiolib:
- acpi: Add acpi_gpio_get_io_resource()
- acpi: Introduce acpi_get_and_request_gpiod() helper
hdaps:
- Constify static attribute_group struct
ideapad-laptop:
- Ignore VPC event bit 10
intel_cht_int33fe:
- Move to its own subfolder
- Correct "displayport" fwnode reference
intel_ips:
- fix set but unused warning in read_mgtv
intel_pmt_crashlog:
- Constify static attribute_group struct
intel_skl_int3472:
- Uninitialized variable in skl_int3472_handle_gpio_resources()
- Move to intel/ subfolder
- Provide skl_int3472_unregister_clock()
- Provide skl_int3472_unregister_regulator()
- Use ACPI GPIO resource directly
- Fix dependencies (drop CLKDEV_LOOKUP)
- Free ACPI device resources after use
mfd:
- tps68470: Remove tps68470 MFD driver
platform/mellanox:
- mlxreg-hotplug: Revert "move to use request_irq by IRQF_NO_AUTOEN flag"
platform/surface:
- aggregator: Use list_move_tail instead of list_del/list_add_tail in ssh_packet_layer.c
- aggregator: Use list_move_tail instead of list_del/list_add_tail in ssh_request_layer.c
- aggregator: Drop unnecessary variable initialization
- aggregator: Do not return uninitialized value
- aggregator_cdev: Add lockdep support
- aggregator_cdev: Allow enabling of events from user-space
- aggregator_cdev: Add support for forwarding events to user-space
- aggregator: Update copyright
- aggregator: Allow enabling of events without notifiers
- aggregator: Allow registering notifiers without enabling events
- dtx: Add missing mutex_destroy() call in failure path
- aggregator: Fix event disable function
- aggregator_registry: Consolidate node groups for 5th- and 6th-gen devices
- aggregator_registry: Add support for 13" Intel Surface Laptop 4
- aggregator_registry: Update comments for 15" AMD Surface Laptop 4
samsung-laptop:
- set debugfs blobs to read only
- use octal numbers for rwx file permissions
tc1100-wmi:
- Constify static attribute_group struct
think-lmi:
- Move kfree(setting->possible_values) to tlmi_attr_setting_release()
- Split current_value to reflect only the value
- Fix issues with duplicate attributes
- Return EINVAL when kbdlang gets set to a 0 length string
- Add missing MODULE_DEVICE_TABLE
- Avoid potential read before start of the buffer
- Fix check for admin password being set
- Add WMI interface support on Lenovo platforms
thinkpad-lmi:
- Remove unused display_name member from struct tlmi_pwd_setting
thinkpad_acpi:
- Add X1 Carbon Gen 9 second fan support
- Fix inconsistent indenting
tools/power/x86/intel-speed-select:
- v1.10 release
- Fix uncore memory frequency display
toshiba_acpi:
- Fix missing error code in toshiba_acpi_setup_keyboard()
toshiba_haps:
- Fix missing newline in pr_debug call in toshiba_haps_notify
touchscreen_dmi:
- Fix Chuwi Hi10 Pro comment
- Add info for the Goodix GT912 panel of TM800A550L tablets
- Add an extra entry for the upside down Goodix touchscreen on Teclast X89 tablets
x86/platform/uv:
- Constify static attribute_group struct
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmDbELwUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yp2wgAj1mTOJi/4Rx1g8wXLpP/hflEkFMU
yyMeKe3LOEzuo/LZUfW4tqWiXa4aTgN6rUOF8KUumsIor/72hKcczuPVY+qCqF7V
qYZ0vMG93DfAyVPQvBrNjHMXiVevD/gMFRqJEOOgXt96B6Zea4vh1pBvLACAHFZ0
bjkZDX3cO89TSfUF7uhiU9UkMvMMAVs34Knc1Pe4QnZ16e2kPGcKip3qb73yT+xC
8NVRgE6fdSIJfDAVzqpdh91rfDdzHDJ6vT10uijOTkriJciN07UKtYuK5StCpAo5
sXIQllHySHRHj5N0IWZ04w6RMQ+l/9CaHDttkYWW3fV1EU9SVzvp/+d6zA==
=tAuE
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"Highlights:
- New think-lmi driver adding support for changing Lenovo Thinkpad
BIOS settings from within Linux using the standard firmware-
attributes class sysfs API
- MS Surface aggregator-cdev now also supports forwarding events to
user-space (for debugging / new driver development purposes only)
- New intel_skl_int3472 driver this provides the necessary glue to
translate ACPI table information to GPIOs, regulators, etc. for
camera sensors on Intel devices with IPU3 attached MIPI cameras
- A whole bunch of other fixes + device-specific quirk additions
- New devm_work_autocancel() devm-helpers.h function"
* tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (83 commits)
platform/x86: dell-wmi-sysman: Change user experience when Admin/System Password is modified
platform/x86: intel_skl_int3472: Uninitialized variable in skl_int3472_handle_gpio_resources()
platform/x86: think-lmi: Move kfree(setting->possible_values) to tlmi_attr_setting_release()
platform/x86: think-lmi: Split current_value to reflect only the value
platform/x86: think-lmi: Fix issues with duplicate attributes
platform/x86: think-lmi: Return EINVAL when kbdlang gets set to a 0 length string
platform/x86: intel_cht_int33fe: Move to its own subfolder
platform/x86: intel_skl_int3472: Move to intel/ subfolder
platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock()
platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_regulator()
platform/x86: intel_skl_int3472: Use ACPI GPIO resource directly
platform/x86: intel_skl_int3472: Fix dependencies (drop CLKDEV_LOOKUP)
platform/x86: intel_skl_int3472: Free ACPI device resources after use
platform/x86: Remove "default n" entries
platform/x86: ISST: Use numa node id for cpu pci dev mapping
platform/x86: ISST: Optimize CPU to PCI device mapping
tools/power/x86/intel-speed-select: v1.10 release
tools/power/x86/intel-speed-select: Fix uncore memory frequency display
extcon: extcon-max8997: Simplify driver using devm
extcon: extcon-max8997: Fix IRQ freeing at error path
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYNnKNAAKCRCRxhvAZXjc
ok/HAQDz3FK3/yeqxH6OLyCedUcD+YBFPPzrfqX+3y6q3z5tGgD9GAGxFXWcMFA2
/cbfmizwh1eJ3WMnbHUp7x6ogpQhWwQ=
=PpNs
-----END PGP SIGNATURE-----
Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr updates from Christian Brauner:
"A few releases ago the old mount API gained support for a mount
options which prevents following symlinks on a given mount. This adds
support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW
flag via mount_setattr() and fsmount(). With mount_setattr() that flag
can even be applied recursively.
There's an additional ack from Ross Zwisler who originally authored
the nosymfollow patch. As I've already had the patches in my for-next
I didn't add his ack explicitly"
* tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr()
mount: Support "nosymfollow" in new mount api
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDZ4TwACgkQxWXV+ddt
WDvWCQ/8Dgnk+FBC25JOkqgu29VZtvhfWkY1poDRuG+tca6VeMMnDbPgnTQFyeS1
38F4uNNi/F5UdFuLz3RK0jYgGFKXTp+sFjavFuXeJQpFxe7VSu7JrilZPaA1Dti8
E8Dp42ilrHDikDbZaT8JB9GSnR7a8tHnIs0RfZSIkHsd+rPs7QPtM0TTzEZyLHqH
2uYoVyd5EvclvM5JLVGxRZ3lTU64zfZlJg+TnoAkBpilqUHqpD+x5cEoNYbdhbAb
j3sF11h/zEa/wmU5w5LRd4Qvl3JygCrnAo+6VAxB/u0yzJnH+UwOEJdDDeUpB/9k
2F/Zy69CUQ7DdXM+Es4TOfAyQ9fpPLt8Z96GIBrdD5BxWbam4pyU5xH4cDPNpsHo
zRCepdU1zwD6z3cfEYKmUAx89ewC8SE8XlUOWiGun4pBKdi3tgwcrytTnu+02JND
mEkP4vTWG2bU+S0Si0u/aAKHcFvOwiY9iHM9tmblVvvlSFYrhFAclsytihPwu9NQ
d9FRQMo9JZbQZXqaWpcmd8eXACz9+5AulIhofpuZLciyhvWpL+CQ+xGNnzJ1DnTH
ct0m+ByFb33bTpAnblkgCMQa9xuwlM57NxvIclRaDPXWipqyZReih9fbF1TkHbXQ
0dkrKe8cHn9w+DI1Hs1Hu1zdD7WJJxNMY2x9MowMU9gDVNBbbVs=
=htVu
-----END PGP SIGNATURE-----
Merge tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"A normal mix of improvements, core changes and features that user have
been missing or complaining about.
User visible changes:
- new sysfs exports:
- add sysfs knob to limit scrub IO bandwidth per device
- device stats are also available in
/sys/fs/btrfs/FSID/devinfo/DEVID/error_stats
- support cancellable resize and device delete ioctls
- change how the empty value is interpreted when setting a property,
so far we have only 'btrfs.compression' and we need to distinguish
a reset to defaults and setting "do not compress", in general the
empty value will always mean 'reset to defaults' for any other
property, for compression it's either 'no' or 'none' to forbid
compression
Performance improvements:
- no need for full sync when truncation does not touch extents,
reported run time change is -12%
- avoid unnecessary logging of xattrs during fast fsyncs (+17%
throughput, -17% runtime on xattr stress workload)
Core:
- preemptive flushing improvements and fixes
- adjust clamping logic on multi-threaded workloads to avoid
flushing too soon
- take into account global block reserve, may help on almost full
filesystems
- continue flushing when there are enough pending delalloc and
ordered bytes
- simplify logic around conditional transaction commit, a workaround
used in the past for throttling that's been superseded by ticket
reservations that manage the throttling in a better way
- subpage blocksize preparation:
- submit read time repair only for each corrupted sector
- scrub repair now works with sectors and not pages
- free space cache (v1) works with sectors and not pages
- more fine grained bio tracking for extents
- subpage support in page callbacks, extent callbacks, end io
callbacks
- simplify transaction abort logic and always abort and don't check
various potentially unreliable stats tracked by the transaction
- exclusive operations can do more checks when started and allow eg.
cancellation of the same running operation
- ensure relocation never runs while we have send operations running,
e.g. when zoned background auto reclaim starts
Fixes:
- zoned: more sanity checks of write pointer
- improve error handling in delayed inodes
- send:
- fix invalid path for unlink operations after parent
orphanization
- fix crash when memory allocations trigger reclaim
- skip compression of we have only one page (can't make things
better)
- empty value of a property newly means reset to default
Other:
- lots of cleanups, comment updates, yearly typo fixing
- disable build on platforms having page size 256K"
* tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (101 commits)
btrfs: remove unused btrfs_fs_info::total_pinned
btrfs: rip out btrfs_space_info::total_bytes_pinned
btrfs: rip the first_ticket_bytes logic from fail_all_tickets
btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing
btrfs: rip out may_commit_transaction
btrfs: send: fix crash when memory allocations trigger reclaim
btrfs: ensure relocation never runs while we have send operations running
btrfs: shorten integrity checker extent data mount option
btrfs: switch mount option bits to enums and use wider type
btrfs: props: change how empty value is interpreted
btrfs: compression: don't try to compress if we don't have enough pages
btrfs: fix unbalanced unlock in qgroup_account_snapshot()
btrfs: sysfs: export dev stats in devinfo directory
btrfs: fix typos in comments
btrfs: remove a stale comment for btrfs_decompress_bio()
btrfs: send: use list_move_tail instead of list_del/list_add_tail
btrfs: disable build on platforms having page size 256K
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: inline wait_current_trans_commit_start in its caller
btrfs: sink wait_for_unblock parameter to async commit
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmDZzHUACgkQCF8+vY7k
4RWjmQ//QSMEAlz/Xv7YmnkheMEbEvMGgRzUK6UIzfbI9sb2ZPx/1dYySVExrPba
Z1WtrH9oEZGX75IwRSBb6Kb0j2k3E1Y9UZzaAAofE4yyrw5sH8oNvhxhAdak+YRC
XaVJq3xulIq/ClsEyaDzZPFfIhZ5Uo/Cz9s4bZMiA8IwZOImnttJmbrw+Og9ly0+
TrvA0MMkO790h+OOnu5Lv1Q2qJZXaVoTIZ+/icDW29WbQdTQnEZ6XOz/Y+4BZFwW
dA49MLdz7BlypV7a3ijIM4ENPrPmmRIf5agKkQ13Z84gNH14Vb8zShDFq2vHwdKC
7qqvZLuCug/GyK7hyPQHOM8b7wN2utNMZCJIcWVob4oZDzHSvLO+wFI4RCI/RBdY
3tOxrH5cd1FJXB5Vi4KtWNk2Ne63UyaJUYSDk9j30LSwNC/EA6+7fc9LeRY2ykNg
rSD8aXMBoHWtxPxBH5O1ljVdbvG4f8Ds6Yb1cjrxvQWIXvzE7TIC2zhRl0o/r6My
1BWOkVVHw0i5/U+PTyQnVK2XYprq0Jp0b9a0ErkJoOtRpd4VW0pFFcDzWN2tlR0h
McIinHAXdeZO6qHbNMyArYcwfTgWP51IWKm5qDQxPtEnrZDQIjbG3GfZZz9wtks/
Moz7vUTbSUaxSqY/Eg+meg1lilU/lyee9O6hhKCDsVQDxaZUKU8=
=VQX8
-----END PGP SIGNATURE-----
Merge tag 'media/v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- V4L2 core control API was split into separate files
- New RC maps: tango and tc-90405
- Hantro driver got support for G2/HEVC decoder
- av7710 is moving to staging, together with some legacy APIs
- several cleanups related to compat_ioctl32 code
- Move the MPEG-2 stateless control type out of staging
- Address several issues with RPM get logic on media drivers
- Lots of cleanups, bug fixes and improvements.
* tag 'media/v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits)
media: s5p-mfc: Fix display delay control creation
media: mtk-vpu: on suspend, read/write regs only if vpu is running
media: video-mux: Skip dangling endpoints
media: Fix Media Controller API config checks
media: i2c: rdacm20: Re-work ov10635 reset
media: i2c: rdacm20: Check return values
media: i2c: rdacm20: Report camera module name
media: i2c: rdacm20: Enable noise immunity
media: i2c: rdacm20: Embed 'serializer' field
media: i2c: rdacm21: Power up OV10640 before OV490
media: i2c: rdacm21: Fix OV10640 powerup
media: i2c: rdacm21: Add delay after OV490 reset
media: i2c: max9271: Introduce wake_up() function
media: i2c: max9271: Check max9271_write() return
media: i2c: max9286: Rework comments in .bound()
media: i2c: max9286: Define high channel amplitude
media: i2c: max9286: Cache channel amplitude
media: i2c: max9286: Rename reverse_channel_mv
media: i2c: max9286: Adjust parameters indent
media: hantro: add support for Rockchip RK3036
...
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of live migration
- Support for initializing the PDPTRs without loading them from memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this has
been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
+/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
=NcRh
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This covers all architectures (except MIPS) so I don't expect any
other feature pull requests this merge window.
ARM:
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration and
apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical
address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of
live migration
- Support for initializing the PDPTRs without loading them from
memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this
has been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM
registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
on AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
kvm: x86: disable the narrow guest module parameter on unload
selftests: kvm: Allows userspace to handle emulation errors.
kvm: x86: Allow userspace to handle emulation errors
KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
KVM: x86: Enhance comments for MMU roles and nested transition trickiness
KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
KVM: x86/mmu: Use MMU's role to determine PTTYPE
KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
KVM: x86/mmu: Add a helper to calculate root from role_regs
KVM: x86/mmu: Add helper to update paging metadata
KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
KVM: x86/mmu: Get nested MMU's root level from the MMU's role
...
* aggregation handling improvements for some drivers
* hidden AP discovery on 6 GHz and other HE 6 GHz
improvements
* minstrel improvements for no-ack frames
* deferred rate control for TXQs to improve reaction
times
* virtual time-based airtime scheduler
* along with various little cleanups/fixups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmDWUHgACgkQB8qZga/f
l8RxXA/+KjkrFUGDHlMTIzS/i03EcB+idLYy+XBWYuO3PN4p/uCCXLzdGfiRKTxp
iCAjYp0ZP01FlN4sOIN+qqrUMoN0e8xqBEY1DFbJELBB1knV4VR0FeBPPcBMUEgE
xAjdZnIsfKO1yU2mrdQVDnkvXr4jHDALePvGwCQe+4KUbwCmVjo6nb534v17Ie3C
MDNnGq395fCvo9NW+Yvzw6s9ZLdhr28bGhSBgBrqmPBx/9vm3b7BA8qO+X6BzAny
87S2x+z40wabBujCGMvw9J5H6OJ0ZLfbT0TumNznqQbCbmwRdSUC+W/cjgRq/aQn
CA3E9T7tofIr1mk9EgcjE8ax91TB/TTX5Nh9Huki25B7rNRBM57VnuEuzmXP5qBS
xgsr60agQHp2ZAGRPBC9msPcPCbvRfdl9ckTAF8/4vI7uEXSyYViX92M0F1FEy60
Mw13B8WeRA4bH22xHpqdXsyl0x2OQHXHlLHfpzmhcUy0i2cQOQMsWbl3g6CJMTq+
/InNSaG+cqrCVse734KJLumRhOq4+MkedYJlqL7uHZJrXlLgQ9Z8ewANjJ/BjTsQ
d8I16XkKPyv8UiJEwHZ6NaHvU48LR8T/r4ym9c/rz0DTWHZyp87QMYa55iwuk4Zo
cW7G3f/N7dMk9fGRxgfIrnNwNUCMsGcBSJeJyr5qT7u8+RW0bRo=
=nFvj
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2021-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes berg says:
====================
Lots of changes:
* aggregation handling improvements for some drivers
* hidden AP discovery on 6 GHz and other HE 6 GHz
improvements
* minstrel improvements for no-ack frames
* deferred rate control for TXQs to improve reaction
times
* virtual time-based airtime scheduler
* along with various little cleanups/fixups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>