To be compatible with accelerator devices of different versions, the
'UACCE_CMD_QM_SET_QP_INFO' ioctl is added so that userspace can obtain
queue information, including the queue depth and the buffer
description size.
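A hedged userspace sketch of calling the new ioctl; the header path, the
struct name and its fields (hisi_qp_info with sqe_size/sq_depth/cq_depth)
are assumptions based on this description, not taken from the patch text:
#include <sys/ioctl.h>
#include <misc/uacce/hisi_qm.h>		/* assumed header providing the ioctl */

/* 'qp_fd' is a queue fd obtained via the accelerator's uacce interface
 * (setup omitted); struct/field names below are assumptions. */
static int query_qp_info(int qp_fd)
{
	struct hisi_qp_info qp_info = {};

	if (ioctl(qp_fd, UACCE_CMD_QM_SET_QP_INFO, &qp_info))
		return -1;

	/* queue depths and buffer (SQE) description size reported by the device */
	return (int)qp_info.sq_depth;
}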
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Like what was done with IFLA_PROMISCUITY, add IFLA_ALLMULTI to advertise
the allmulti counter.
The flag IFF_ALLMULTI is advertised only if it was directly set by a
userland app.
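A hedged userspace sketch of reading the new attribute from an RTM_NEWLINK
message (netlink socket setup omitted; assumes headers that already define
IFLA_ALLMULTI):
#include <stdio.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* 'nlh' is an RTM_NEWLINK message received on an rtnetlink socket */
static void print_allmulti(struct nlmsghdr *nlh)
{
	struct ifinfomsg *ifi = NLMSG_DATA(nlh);
	int len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
	struct rtattr *rta;

	for (rta = IFLA_RTA(ifi); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_ALLMULTI)
			printf("ifindex %d allmulti count %u\n",
			       ifi->ifi_index, *(__u32 *)RTA_DATA(rta));
	}
}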
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable CAN_RAW sockets to read and write CAN XL frames, analogous to the
CAN FD extension (new CAN_RAW_XL_FRAMES sockopt).
A CAN XL network interface is capable of handling Classical CAN, CAN FD and
CAN XL frames. When CAN_RAW_XL_FRAMES is enabled, the CAN_RAW socket checks
whether the addressed CAN network interface is capable of handling the
provided CAN frame.
In contrast to the fixed number of bytes for
- CAN frames (CAN_MTU = sizeof(struct can_frame))
- CAN FD frames (CANFD_MTU = sizeof(struct canfd_frame))
the number of bytes when reading/writing CAN XL frames depends on the
number of data bytes. For efficiency reasons the length of the struct
canxl_frame is truncated to the needed size for read/write operations.
This leads to a calculated size of CANXL_HDR_SIZE + canxl_frame::len which
is enforced on write() operations and guaranteed on read() operations.
NB: Valid length values are 1 .. 2048 (CANXL_MIN_DLEN .. CANXL_MAX_DLEN).
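A minimal userspace sketch of the resulting API (error handling mostly
omitted; assumes headers providing struct canxl_frame, CANXL_HDR_SIZE and
CAN_RAW_XL_FRAMES):
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>

static int send_xl(const char *ifname, __u16 len)
{
	struct canxl_frame cfx = {0};
	struct sockaddr_can addr = {0};
	int on = 1;
	int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);

	if (s < 0)
		return -1;

	/* opt in to CAN XL frames, analogous to CAN_RAW_FD_FRAMES */
	setsockopt(s, SOL_CAN_RAW, CAN_RAW_XL_FRAMES, &on, sizeof(on));

	addr.can_family = AF_CAN;
	addr.can_ifindex = if_nametoindex(ifname);
	bind(s, (struct sockaddr *)&addr, sizeof(addr));

	cfx.prio = 0x123;		/* 11 bit priority */
	cfx.flags = CANXL_XLF;		/* mandatory CAN XL flag */
	cfx.len = len;			/* 1 .. CANXL_MAX_DLEN data bytes */
	memset(cfx.data, 0xAA, len);

	/* the write() length must be exactly CANXL_HDR_SIZE + cfx.len */
	if (write(s, &cfx, CANXL_HDR_SIZE + cfx.len) < 0) {
		close(s);
		return -1;
	}
	close(s);
	return 0;
}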
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20220912170725.120748-8-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds defines for data structures and length information for
CAN XL (CAN with eXtended data Length), which can transfer up to 2048
bytes inside a single frame.
Notable changes from CAN FD:
- the 11 bit arbitration field is now named 'priority' instead of 'can_id'
(there are no 29 bit identifiers nor RTR frames anymore)
- the data length needs a uint16 value to cover up to 2048 bytes
(the length element position is different from struct can[fd]_frame)
- new fields (SDT, AF) and a SEC bit have been introduced
- the virtual CAN interface identifier is not part of the CAN XL frame
struct as this VCID value is stored in struct skbuff (analogous to the VLAN id)
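For illustration, the frame layout described above roughly corresponds to
the following; treat this as a sketch of the uapi definition, not a verbatim
copy of it:
struct canxl_frame {
	canid_t prio;	/* 11 bit priority (no 29 bit IDs, no RTR frames) */
	__u8    flags;	/* additional flags, e.g. CANXL_XLF, CANXL_SEC */
	__u8    sdt;	/* SDU (service data unit) type */
	__u16   len;	/* 1 .. 2048 (CANXL_MIN_DLEN .. CANXL_MAX_DLEN) */
	__u32   af;	/* acceptance field */
	__u8    data[CANXL_MAX_DLEN];
};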
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20220912170725.120748-5-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
To simplify the testing in user space all struct canfd_frame's provided by
the CAN subsystem of the Linux kernel now have the CANFD_FDF flag set in
canfd_frame::flags.
NB: Handcrafted ETH_P_CANFD frames introduced via PF_PACKET socket might
not set this bit correctly. During the check for sufficient headroom in
PF_PACKET sk_buffs the uninitialized CAN sk_buff data structures are filled.
In the case of a CAN FD frame the CANFD_FDF flag is set accordingly.
As the CAN frame content is already zero initialized in alloc_canfd_skb()
the obsolete initialization of cf->flags in the CTU CAN FD driver has been
removed as it would overwrite the already set CANFD_FDF flag.
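A hedged userspace illustration of what this guarantee means when reading
from a CAN_RAW socket with CAN_RAW_FD_FRAMES enabled (socket setup omitted):
#include <stdio.h>
#include <unistd.h>
#include <linux/can.h>

/* 's' is a CAN_RAW socket with CAN_RAW_FD_FRAMES enabled */
static void dump_one_frame(int s)
{
	struct canfd_frame cf;
	ssize_t n = read(s, &cf, sizeof(cf));

	if (n == CANFD_MTU && (cf.flags & CANFD_FDF))
		printf("CAN FD frame, %u data bytes\n", cf.len);
	else if (n == CAN_MTU)
		printf("Classical CAN frame\n");
}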
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20220912170725.120748-4-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount of
time) can be exploited, or at least can make some kinds of exploits
easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so that, for kernel faults to be handled by userfaultfd,
either the process needs CAP_SYS_PTRACE, or the vm.unprivileged_userfaultfd
sysctl must be configured so that any unprivileged user can do it.
In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults. But, both options above are less than ideal:
- Toggling the sysctl increases attack surface by allowing any
unprivileged user to do it.
- Granting the live migration process CAP_SYS_PTRACE gives it this
ability, but *also* the ability to "observe and control the
execution of another process [...], and examine and change [its]
memory and registers" (from ptrace(2)). This isn't something we need
or want to be able to do, so granting this permission violates the
"principle of least privilege".
This is all a long-winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional permissions
at the same time.
To achieve this, add a /dev/userfaultfd misc device. This device provides
an alternative to the userfaultfd(2) syscall for the creation of new
userfaultfds. The idea is, any userfaultfds created this way will be able
to handle kernel faults, without the caller having any special
capabilities. Access to this mechanism is instead restricted using e.g.
standard filesystem permissions.
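A minimal sketch of the intended usage; the creation ioctl name
(USERFAULTFD_IOC_NEW) and the open mode are taken from the rest of this
series and may differ:
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static int open_userfaultfd(void)
{
	/* access control via standard file permissions on the device node */
	int devfd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
	int uffd;

	if (devfd < 0)
		return -1;

	/* returns a new userfaultfd that may handle kernel faults without
	 * CAP_SYS_PTRACE and without relaxing the sysctl */
	uffd = ioctl(devfd, USERFAULTFD_IOC_NEW, O_CLOEXEC | O_NONBLOCK);
	close(devfd);
	return uffd;
}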
[axelrasmussen@google.com: Handle misc_register() failure properly]
Link: https://lkml.kernel.org/r/20220819205201.658693-3-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-3-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This idea was introduced by David Rientjes[1].
Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request
a synchronous collapse of memory at their own expense.
The benefits of this approach are:
* CPU is charged to the process that wants to spend the cycles for the
THP
* Avoid unpredictable timing of khugepaged collapse
Semantics
This call is independent of the system-wide THP sysfs settings, but will
fail for memory marked VM_NOHUGEPAGE. If the ranges provided span
multiple VMAs, the semantics of the collapse over each VMA are independent
of the others. This implies a hugepage cannot cross a VMA boundary. If
collapse of a given hugepage-aligned/sized region fails, the operation may
continue to attempt collapsing the remainder of memory specified.
The memory ranges provided must be page-aligned, but are not required to
be hugepage-aligned. If the memory ranges are not hugepage-aligned, the
start/end of the range will be clamped to the first/last hugepage-aligned
address covered by said range. The memory ranges must span at least one
hugepage-sized region.
All non-resident pages covered by the range will first be
swapped/faulted-in, before being internally copied onto a freshly
allocated hugepage. Unmapped pages will have their data directly
initialized to 0 in the new hugepage. However, for every eligible
hugepage aligned/sized region to-be collapsed, at least one page must
currently be backed by memory (a PMD covering the address range must
already exist).
Allocation for the new hugepage may enter direct reclaim and/or
compaction, regardless of VMA flags. When the system has multiple NUMA
nodes, the hugepage will be allocated from the node providing the most
native pages. This operation acts on the current state of the
specified process and makes no persistent changes or guarantees on how
pages will be mapped, constructed, or faulted in the future.
Return Value
If all hugepage-sized/aligned regions covered by the provided range were
either successfully collapsed, or were already PMD-mapped THPs, this
operation will be deemed successful. On success, process_madvise(2)
returns the number of bytes advised, and madvise(2) returns 0. Else, -1
is returned and errno is set to indicate the error for the most-recently
attempted hugepage collapse. Note that many failures might have occurred,
since the operation may continue to collapse in the event a single
hugepage-sized/aligned region fails.
ENOMEM  Memory allocation failed or VMA not found
EBUSY   Memcg charging failed
EAGAIN  Required resource temporarily unavailable. Trying again
        might succeed.
EINVAL  Other error: No PMD found, subpage doesn't have Present
        bit set, "Special" page not backed by struct page, VMA
        incorrectly sized, address not page-aligned, ...
Most notable here is ENOMEM and EBUSY (new to madvise) which are intended
to provide the caller with actionable feedback so they may take an
appropriate fallback measure.
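A minimal sketch of how a caller might use this; the fallback define for
MADV_COLLAPSE is an assumption for building against older userspace headers:
#include <stdio.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from this series; assumed if headers lack it */
#endif

/* ask the kernel to synchronously collapse an anonymous, hugepage-aligned
 * region into THPs; errno reports the most recently failed collapse */
static int collapse_region(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLLAPSE)) {
		perror("madvise(MADV_COLLAPSE)");	/* ENOMEM, EBUSY, EAGAIN, EINVAL, ... */
		return -1;
	}
	return 0;
}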
Use Cases
Immediate users of this new functionality are malloc() implementations
that manage memory in hugepage-sized chunks, but sometimes subrelease
memory back to the system in native-sized chunks via MADV_DONTNEED;
zapping the pmd. Later, when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage
coverage and dTLB performance. TCMalloc is such an implementation that
could benefit from this[2].
Only privately-mapped anon memory is supported for now, but additional
support for file, shmem, and HugeTLB high-granularity mappings[2] is
expected. File and tmpfs/shmem support would permit:
* Backing executable text by THPs. Current support provided by
CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which
might impair services from serving at their full rated load after
(re)starting. Tricks like mremap(2)'ing text onto anonymous memory to
immediately realize iTLB performance prevent page sharing and demand
paging, both of which increase steady state memory footprint. With
MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance
and lower RAM footprints.
* Backing guest memory by hugepages after the memory contents have been
migrated in native-page-sized chunks to a new host, in a
userfaultfd-based live-migration stack.
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
[2] https://github.com/google/tcmalloc/tree/master/tcmalloc
[jrdr.linux@gmail.com: avoid possible memory leak in failure path]
Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com
[zokeefe@google.com: add missing kfree() to madvise_collapse()]
Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/
Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com
[zokeefe@google.com: delay computation of hpage boundaries until use]
Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Traditionally, the conditions for when DIO (direct I/O) is supported
were fairly simple. For both block devices and regular files, DIO had
to be aligned to the logical block size of the block device.
However, due to filesystem features that have been added over time (e.g.
multi-device support, data journalling, inline data, encryption, verity,
compression, checkpoint disabling, log-structured mode), the conditions
for when DIO is allowed on a regular file have gotten increasingly
complex. Whether a particular regular file supports DIO, and with what
alignment, can depend on various file attributes and filesystem mount
options, as well as which block device(s) the file's data is located on.
Moreover, the general rule of DIO needing to be aligned to the block
device's logical block size was recently relaxed to allow user buffers
(but not file offsets) aligned to the DMA alignment instead. See
commit bf8d08532b ("iomap: add support for dma aligned direct-io").
XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information.
Uplifting this to the VFS is one possibility. However, as discussed
(https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
this ioctl is rarely used and not known to be used outside of
XFS-specific code. It was also never intended to indicate when a file
doesn't support DIO at all, nor was it intended for block devices.
Therefore, let's expose this information via statx(). Add the
STATX_DIOALIGN flag and two new statx fields associated with it:
* stx_dio_mem_align: the alignment (in bytes) required for user memory
buffers for DIO, or 0 if DIO is not supported on the file.
* stx_dio_offset_align: the alignment (in bytes) required for file
offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
on the file. This will only be nonzero if stx_dio_mem_align is
nonzero, and vice versa.
Note that as with other statx() extensions, if STATX_DIOALIGN isn't set
in the returned statx struct, then these new fields won't be filled in.
This will happen if the file is neither a regular file nor a block
device, or if the file is a regular file and the filesystem doesn't
support STATX_DIOALIGN. It might also happen if the caller didn't
include STATX_DIOALIGN in the request mask, since statx() isn't required
to return unrequested information.
This commit only adds the VFS-level plumbing for STATX_DIOALIGN. For
regular files, individual filesystems will still need to add code to
support it. For block devices, a separate commit will wire it up too.
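A minimal userspace sketch, assuming glibc's statx() wrapper and headers
that already define STATX_DIOALIGN and the two new fields:
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>

static int print_dio_alignment(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx))
		return -1;

	/* the filesystem/device may simply not report this information */
	if (!(stx.stx_mask & STATX_DIOALIGN)) {
		printf("%s: DIO alignment not reported\n", path);
		return 0;
	}

	if (!stx.stx_dio_mem_align)	/* 0 means DIO is not supported at all */
		printf("%s: DIO not supported\n", path);
	else
		printf("%s: memory align %u, offset align %u\n", path,
		       stx.stx_dio_mem_align, stx.stx_dio_offset_align);
	return 0;
}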
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220827065851.135710-2-ebiggers@kernel.org
Florian Westphal says:
====================
The following set contains changes for your *net-next* tree:
- make conntrack ignore packets that are delayed (containing
data already acked). The current behaviour of flagging them as INVALID
causes more harm than good; let them pass so the peer can send an
immediate ACK for the most recent sequence number.
- make conntrack recognize when both peers have sent 'invalid' FINs:
This helps cleaning out stale connections faster for those cases where
conntrack is no longer in sync with the actual connection state.
- Now that DECNET is gone, we don't need to reserve space for DECNET
related information.
- compact common 'find a free port number for the new inbound
connection' code and move it to a helper, then cap number of tries
the new helper will make until it gives up.
- replace various instances of strlcpy with strscpy, from Wolfram Sang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA logging allows a device to internally record what DMAs the device is
initiating and report them back to userspace. It is part of the VFIO
migration infrastructure that allows implementing dirty page tracking
during the pre-copy phase of live migration. Only DMA WRITEs are logged,
and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
This patch introduces the DMA logging involved uAPIs.
It uses the FEATURE ioctl with its GET/SET/PROBE options as described below.
It exposes a PROBE option to detect if the device supports DMA logging.
It exposes a SET option to start device DMA logging in given IOVA
ranges.
It exposes a SET option to stop device DMA logging that was previously
started.
It exposes a GET option to read back and clear the device DMA log.
Extra details exist as part of vfio.h per a specific option.
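As a hedged illustration, probing for DMA logging support from userspace
could look like the following; the feature name
(VFIO_DEVICE_FEATURE_DMA_LOGGING_START) and the exact flag combination are
assumptions based on vfio.h, not spelled out in this text:
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int dma_logging_supported(int device_fd)
{
	struct vfio_device_feature feature = {
		.argsz = sizeof(feature),
		/* PROBE combined with SET: ask whether starting DMA logging works */
		.flags = VFIO_DEVICE_FEATURE_PROBE |
			 VFIO_DEVICE_FEATURE_SET |
			 VFIO_DEVICE_FEATURE_DMA_LOGGING_START,
	};

	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &feature) == 0;
}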
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220908183448.195262-4-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
drivers/net/ethernet/freescale/fec.h
7d650df99d ("net: fec: add pm_qos support on imx6q platform")
40c79ce13b ("net: fec: add stop mode support for imx8 platform")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now, instead of the number of arguments, the number of registers
holding argument values is stored in the trampoline. Update
the description of the bpf_get_func_arg[_cnt]() helpers. Previous
programs without struct arguments should continue to work
as usual.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152657.2078805-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-09-05
The following pull-request contains BPF updates for your *net-next* tree.
We've added 106 non-merge commits during the last 18 day(s) which contain
a total of 159 files changed, 5225 insertions(+), 1358 deletions(-).
There are two small merge conflicts, resolve them as follows:
1) tools/testing/selftests/bpf/DENYLIST.s390x
Commit 27e23836ce ("selftests/bpf: Add lru_bug to s390x deny list") in
bpf tree was needed to get BPF CI green on s390x, but it conflicted with
newly added tests on bpf-next. Resolve by adding both hunks, result:
[...]
lru_bug # prog 'printk': failed to auto-attach: -524
setget_sockopt # attach unexpected error: -524 (trampoline)
cb_refs # expected error message unexpected error: -524 (trampoline)
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
[...]
2) net/core/filter.c
Commit 1227c1771d ("net: Fix data-races around sysctl_[rw]mem_(max|default).")
from net tree conflicts with commit 29003875bd ("bpf: Change bpf_setsockopt(SOL_SOCKET)
to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from
bpf-next tree, result:
[...]
	if (getopt) {
		if (optname == SO_BINDTODEVICE)
			return -EINVAL;
		return sk_getsockopt(sk, SOL_SOCKET, optname,
				     KERNEL_SOCKPTR(optval),
				     KERNEL_SOCKPTR(optlen));
	}

	return sk_setsockopt(sk, SOL_SOCKET, optname,
			     KERNEL_SOCKPTR(optval), *optlen);
[...]
The main changes are:
1) Add an any-context BPF-specific memory allocator which is useful in particular for BPF
tracing, with the bonus of performance equal to full prealloc, from Alexei Starovoitov.
2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort
to reuse the existing core socket code as much as possible, from Martin KaFai Lau.
3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector
with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani.
4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo.
5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF
integration with the rstat framework, from Yosry Ahmed.
6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev.
7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao.
8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF
backend, from James Hilliard.
9) Fix verifier helper permissions and reference state management for synchronous
callbacks, from Kumar Kartikeya Dwivedi.
10) Add support for BPF selftest's xskxceiver to also be used against real devices that
support MAC loopback, from Maciej Fijalkowski.
11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet.
12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu.
13) Various minor misc improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits)
bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
bpf: Remove usage of kmem_cache from bpf_mem_cache.
bpf: Remove prealloc-only restriction for sleepable bpf programs.
bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
bpf: Remove tracing program restriction on map types
bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
bpf: Add percpu allocation support to bpf_mem_alloc.
bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
bpf: Adjust low/high watermarks in bpf_mem_cache
bpf: Optimize call_rcu in non-preallocated hash map.
bpf: Optimize element count in non-preallocated hash map.
bpf: Relax the requirement to use preallocated hash maps in tracing progs.
samples/bpf: Reduce syscall overhead in map_perf_test.
selftests/bpf: Improve test coverage of test_maps
bpf: Convert hash map to bpf_mem_alloc.
bpf: Introduce any context BPF specific memory allocator.
selftest/bpf: Add test for bpf_getsockopt()
bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
...
====================
Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
drm-misc-next for v6.1:
UAPI Changes:
Cross-subsystem Changes:
- DMA-buf: documentation updates.
- Assorted small fixes to vga16fb
- Fix fbdev drivers to use the aperture helpers.
- Make removal of conflicting drivers work correctly without fbdev enabled.
Core Changes:
- bridge, scheduler, dp-mst: Assorted small fixes.
- Add more format helpers to fourcc, and use it to replace the cpp usage.
- Add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx (single channel), and DRM_FORMAT_Dxx
("darkness", inverted single channel)
- Add packed AYUV8888 and XYUV8888 formats.
- Assorted documentation updates.
- Rename ttm_bo_init to ttm_bo_init_validate.
- Allow TTM bo's to exist without backing store.
- Convert drm selftests to kunit.
- Add managed init functions for (panel) bridge, crtc, encoder and connector.
- Fix endianness handling in various format conversion helpers.
- Make tests pass on big-endian platforms, and add test for rgb888 -> rgb565
- Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers and rename, so
drm_plane_helper is no longer needed in most drivers.
- Use idr_init_base instead of idr_init.
- Rename FB and GEM CMA helpers to DMA helpers.
- Rework XRGB8888 related conversion helpers, and add drm_fb_blit() that
takes a iosys_map. Make drm_fb_memcpy take an iosys_map too.
- Move edid luminance calculation to core, and use it in i915.
Driver Changes:
- bridge/{adv7511,ti-sn65dsi86,parade-ps8640}, panel/{simple,nt35510,tc358767},
nouveau, sun4i, mipi-dsi, mgag200, bochs, arm, komeda, vmwgfx, pl111:
Assorted small fixes and doc updates.
- vc4: Rework hdmi power up, and depend on PM.
- panel/simple: Add Samsung LTL101AL01.
- ingenic: Add JZ4760(B) support, avoid a modeset when sharpness property
is unchanged, and use the new PM ops.
- Revert some amdgpu commits that cause garbled graphics when starting
X, and reapply them with the real problem fixed.
- Completely rework vc4 init to use managed helpers.
- Rename via_drv to via_dri1, and move all the stuff there that is only used
by the dri1 implementation, in preparation for atomic modeset.
- Use regmap bulk write in ssd130x.
- Power sequence and clock updates to it6505.
- Split panel-sitronix-st7701 init sequence and rework mode programming code.
- virtio: Improve error and edge conditions handling, and convert to use managed
helpers.
- Add Samsung LTL101AL01, B120XAN01.0, R140NWF5 RH, and DMT028VGHMCMI-1A T panels.
- Add generic fbdev support to komeda.
- Split mgag200 modeset handling to make it more model-specific.
- Convert simpledrm to use atomic helpers.
- Improve udl suspend/disconnect handling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f0c71766-61e8-19b7-763a-5fbcdefc633d@linux.intel.com
Add a new event type for tap, called gesture; the direction can be used
to differentiate single and double tap. This may be used by accelerometer
sensors to express single and double tap events. For directional tap,
modifiers like IIO_MOD_(X/Y/Z) can be used along with the singletap and
doubletap directions.
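A hedged sketch of how an accelerometer driver might advertise these events;
the mask choice is illustrative and not taken from this patch:
#include <linux/bits.h>
#include <linux/iio/iio.h>
#include <linux/iio/events.h>

/* single and double tap reported via the new gesture event type */
static const struct iio_event_spec tap_events[] = {
	{
		.type = IIO_EV_TYPE_GESTURE,
		.dir = IIO_EV_DIR_SINGLETAP,
		.mask_separate = BIT(IIO_EV_INFO_ENABLE),
	},
	{
		.type = IIO_EV_TYPE_GESTURE,
		.dir = IIO_EV_DIR_DOUBLETAP,
		.mask_separate = BIT(IIO_EV_INFO_ENABLE),
	},
};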
Signed-off-by: Jagath Jog J <jagathjog1996@gmail.com>
Link: https://lore.kernel.org/r/20220831063117.4141-2-jagathjog1996@gmail.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
We need the USB fixes in here and this resolves the merge issue in:
drivers/usb/dwc3/gadget.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Berg says:
====================
drivers
- rtw89: large update across the map, e.g. coex, pci(e), etc.
- ath9k: uninit memory read fix
- ath10k: small peer map fix and a WCN3990 device fix
- wfx: underflow
stack
- the "change MAC address while IFF_UP" change from James
we discussed
- more MLO work, including a set of fixes for the previous
code, now that we have more code we can exercise it more
- prevent some features with MLO that aren't ready yet
(AP_VLAN and 4-address connections)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an MLD address attribute to BSS entries that the interface
is currently associated with to help userspace figure out what's
going on.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new extended feature bit signifying that the wireless hardware
supports changing the MAC address while the underlying net_device is
powered. Note that this has a different meaning from
IFF_LIVE_ADDR_CHANGE as additional restrictions might be imposed by
the hardware, such as:
- No connection is active on this interface, carrier is off
- No scan is in progress
- No offchannel operations are in progress
Signed-off-by: James Prestwood <prestwoj@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We now have a cleaner way to keep compatibility with user-space
(a.k.a. not breaking it) when we need to keep in place a one-element
array (for its use in user-space) together with a flexible-array
member (for its use in kernel-space) without making it hard to read
at the source level. This is through the use of the new
__DECLARE_FLEX_ARRAY() helper macro.
The size and memory layout of the structure is preserved after the
changes. See below.
Before changes:
$ pahole -C ip_msfilter net/ipv4/igmp.o
struct ip_msfilter {
	union {
		struct {
			__be32     imsf_multiaddr_aux;    /*     0     4 */
			__be32     imsf_interface_aux;    /*     4     4 */
			__u32      imsf_fmode_aux;        /*     8     4 */
			__u32      imsf_numsrc_aux;       /*    12     4 */
			__be32     imsf_slist[1];         /*    16     4 */
		};                                        /*     0    20 */
		struct {
			__be32     imsf_multiaddr;        /*     0     4 */
			__be32     imsf_interface;        /*     4     4 */
			__u32      imsf_fmode;            /*     8     4 */
			__u32      imsf_numsrc;           /*    12     4 */
			__be32     imsf_slist_flex[0];    /*    16     0 */
		};                                        /*     0    16 */
	};                                                /*     0    20 */

	/* size: 20, cachelines: 1, members: 1 */
	/* last cacheline: 20 bytes */
};
After changes:
$ pahole -C ip_msfilter net/ipv4/igmp.o
struct ip_msfilter {
	__be32             imsf_multiaddr;         /*     0     4 */
	__be32             imsf_interface;         /*     4     4 */
	__u32              imsf_fmode;             /*     8     4 */
	__u32              imsf_numsrc;            /*    12     4 */
	union {
		__be32     imsf_slist[1];          /*    16     4 */
		struct {
			struct {
			} __empty_imsf_slist_flex; /*    16     0 */
			__be32 imsf_slist_flex[0]; /*    16     0 */
		};                                 /*    16     0 */
	};                                         /*    16     4 */

	/* size: 20, cachelines: 1, members: 5 */
	/* last cacheline: 20 bytes */
};
In the past, we had to duplicate the whole original structure within
a union, and update the names of all the members. Now, we just need to
declare the flexible-array member to be used in kernel-space through
the __DECLARE_FLEX_ARRAY() helper together with the one-element array,
within a union. This makes the source code cleaner and easier to read.
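For illustration, the resulting pattern looks roughly like this; the struct
name is altered to avoid clashing with the real uapi header, and the helper
comes from linux/stddef.h:
#include <linux/types.h>
#include <linux/stddef.h>

struct ip_msfilter_example {		/* illustrative copy, not the uapi struct */
	__be32	imsf_multiaddr;
	__be32	imsf_interface;
	__u32	imsf_fmode;
	__u32	imsf_numsrc;
	union {
		/* old one-element array kept for userspace */
		__be32	imsf_slist[1];
		/* flexible-array member used by kernel code */
		__DECLARE_FLEX_ARRAY(__be32, imsf_slist_flex);
	};
};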
Link: https://github.com/KSPP/linux/issues/193
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull io_uring fixes from Jens Axboe:
- A single fix for over-eager retries for networking (Pavel)
- Revert the notification slot support for zerocopy sends.
It turns out that even after more than a year of development and
testing, there's not full agreement on whether just using plain
ordered notifications is Good Enough to avoid the complexity of using
the notifications slots. Because of that, we decided that it's best
left to a future final decision.
We can always bring back this feature, but we can't really change it
or remove it once we've released 6.0 with it enabled. The reverts
leave the usual CQE notifications as the primary interface for
knowing when data was sent, and when it was acked. (Pavel)
* tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block:
selftests/net: return back io_uring zc send tests
io_uring/net: simplify zerocopy send user API
io_uring/notif: remove notif registration
Revert "io_uring: rename IORING_OP_FILES_UPDATE"
Revert "io_uring: add zc notification flush requests"
selftests/net: temporarily disable io_uring zc test
io_uring/net: fix overexcessive retries
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
(id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
tun_flags to the BPF program.
It makes sense to expose tun_flags to the BPF program.
Assume, for example, multiple GRE tunnels maintained on a single GRE
interface in collect_md mode. The program expects origins to initiate
over GRE; however, different origins use different GRE characteristics
(e.g. some prefer to use the GRE checksum, some do not; some pass a GRE key,
some do not, etc.).
A BPF program getting tun_flags can therefore remember the relevant
flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In
the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
to correctly reply to the remote, using similar characteristics, based
on the stored tunnel flags.
Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If
specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.
Decided to use the existing unused 'tunnel_ext' as the storage for the
'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.
Also, the following has been considered during the design:
1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
and place into the new 'tunnel_flags' field. This has 2 drawbacks:
- The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
e.g. BPF_F_ZERO_CSUM_TX. It is awkward that it is "returned" into
tunnel_flags from a *get_tunnel_key* call.
- Not all "interesting" TUNNEL_xxx flags can be mapped to existing
BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
flags just for purposes of the returned tunnel_flags.
2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
"interesting" flags. That's ok, but the drawback is that what's
"interesting" for my usecase might be limiting for other usecases.
Therefore I decided to expose what's in key.tun_flags *as is*, which seems
most flexible. The BPF user can just choose to ignore bits he's not
interested in. The TUNNEL_xxx are also UAPI, so no harm exposing them
back in the get_tunnel_key call.
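A hedged sketch of a tc BPF program using the new flag; the per-remote map
storage is elided, and TUNNEL_CSUM comes from linux/if_tunnel.h:
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_tunnel.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int remember_gre_flags(struct __sk_buff *skb)
{
	struct bpf_tunnel_key key = {};

	/* with BPF_F_TUNINFO_FLAGS, tunnel_flags is filled instead of tunnel_ext */
	if (bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_FLAGS))
		return TC_ACT_OK;

	if (key.tunnel_flags & TUNNEL_CSUM) {
		/* e.g. store the flags per remote for use on the reply path */
	}
	return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";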
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-1-shmulik.ladkani@gmail.com
This patch adds the following new device features for low power entry
and exit to the header file. The implementation for the same will be
added in subsequent patches.
- VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
- VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
- VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
For vfio-pci based devices, not all power states can be achieved with
the standard PCI PM registers alone. The platform-based power management
needs to be involved to go into the lowest power state. For doing low
power entry and exit with platform-based power management,
these device features can be used.
The entry device feature has two variants. The two variants exist mainly
to support different behaviours for low power entry.
If there is any access for the VFIO device on the host side, then the
device will be moved out of the low power state without the user's
guest driver involvement. Some devices (for example NVIDIA VGA or
3D controller) require the user's guest driver involvement for
each low-power entry. In the first variant, the host can return the
device to low power automatically. The device will continue to
attempt to reach low power until the low power exit feature is called.
In the second variant, if the device exits low power due to an access,
the host kernel will signal the user via the provided eventfd and will
not return the device to low power without a subsequent call to one of
the low power entry features. A call to the low power exit feature is
optional if the user provided eventfd is signaled.
These device features only support VFIO_DEVICE_FEATURE_SET and
VFIO_DEVICE_FEATURE_PROBE operations.
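A hedged sketch of requesting the first entry variant from userspace; the
wakeup-eventfd variant additionally passes a payload after the feature
header, which is omitted here:
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int enter_low_power(int device_fd)
{
	struct vfio_device_feature feature = {
		.argsz = sizeof(feature),
		/* first variant: host may return the device to low power on its own */
		.flags = VFIO_DEVICE_FEATURE_SET |
			 VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY,
	};

	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &feature);
}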
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-2-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, bpf and wireless.
Current release - regressions:
- bpf:
- fix wrong last sg check in sk_msg_recvmsg()
- fix kernel BUG in purge_effective_progs()
- mac80211:
- fix possible leak in ieee80211_tx_control_port()
- potential NULL dereference in ieee80211_tx_control_port()
Current release - new code bugs:
- nfp: fix the access to management firmware hanging
Previous releases - regressions:
- ip: fix triggering of 'icmp redirect'
- sched: tbf: don't call qdisc_put() while holding tree lock
- bpf: fix corrupted packets for XDP_SHARED_UMEM
- bluetooth: hci_sync: fix suspend performance regression
- micrel: fix probe failure
Previous releases - always broken:
- tcp: make global challenge ack rate limitation per net-ns and
default disabled
- tg3: fix potential hang-up on system reboot
- mac802154: fix reception for no-daddr packets
Misc:
- r8152: add PID for the lenovo onelink+ dock"
* tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
net/smc: Remove redundant refcount increase
Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
tcp: make global challenge ack rate limitation per net-ns and default disabled
tcp: annotate data-race around challenge_timestamp
net: dsa: hellcreek: Print warning only once
ip: fix triggering of 'icmp redirect'
sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
selftests: net: sort .gitignore file
Documentation: networking: correct possessive "its"
kcm: fix strp_init() order and cleanup
mlxbf_gige: compute MDIO period based on i1clk
ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
net: lan966x: improve error handle in lan966x_fdma_rx_get_frame()
nfp: fix the access to management firmware hanging
net: phy: micrel: Make the GPIO to be non-exclusive
net: virtio_net: fix notification coalescing comments
net/sched: fix netdevice reference leaks in attach_default_qdiscs()
net: sched: tbf: don't call qdisc_put() while holding tree lock
net: Use u64_stats_fetch_begin_irq() for stats fetch.
net: dsa: xrs700x: Use irqsave variant for u64 stats update
...
Following user feedback, this patch simplifies the zerocopy send API. One of
the main complaints is that the current API is difficult to use, with
userspace managing notification slots, and send retries with error
handling make it even worse.
Instead of keeping notification slots change it to the per-request
notifications model, which posts both completion and notification CQEs
for each request when any data has been sent, and only one CQE if it
fails. All notification CQEs will have IORING_CQE_F_NOTIF set, and
IORING_CQE_F_MORE in completion CQEs indicates whether to wait for a
notification or not.
IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now.
This is less flexible, but greatly simplifies the user API and also the
kernel implementation. We reuse notif helpers in this patch, but in the
future there won't be a need for keeping two requests.
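A hedged liburing-style sketch of consuming the CQEs described above;
termination logic and request submission are omitted:
#include <liburing.h>

static void reap_send_zc(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		if (cqe->flags & IORING_CQE_F_NOTIF) {
			/* notification: the send buffer may be reused/freed now */
		} else {
			/* completion: cqe->res holds bytes sent (or -errno) */
			if (!(cqe->flags & IORING_CQE_F_MORE)) {
				/* no notification will follow (e.g. the send failed) */
			}
		}
		io_uring_cqe_seen(ring, cqe);
		/* loop break condition omitted for brevity */
	}
}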
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
USBIP driver packs URB transfer flags in network packets that are
exchanged between Server (usbip_host) and Client (vhci_hcd).
URB_* flags are internal to the kernel and could change, whereas the USBIP
URB flags exchanged in network packets are part of the USBIP user API and
must not change.
Add USBIP_URB* flags to make this an explicit API and change the
client and server to map them. Details as follows:
Client tx path (USBIP_CMD_SUBMIT):
- Maps URB_* to USBIP_URB_* when it sends USBIP_CMD_SUBMIT packet.
Server rx path (USBIP_CMD_SUBMIT):
- Maps USBIP_URB_* to URB_* when it receives USBIP_CMD_SUBMIT packet.
Flags aren't included in USBIP_CMD_UNLINK and USBIP_RET_SUBMIT packets
and no special handling is needed for them in the following cases:
- Server rx path (USBIP_CMD_UNLINK)
- Client rx path & Server tx path (USBIP_RET_SUBMIT)
Update protocol documentation to reflect the change.
Suggested-by: Hongren Zenithal Zheng <i@zenithal.me>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220824002456.94605-1-skhan@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is currently no way to report via extack in a structured way
that an attribute is missing. This leads to families resorting to
string messages.
Add a pair of attributes - @offset and @type - for a machine-readable
way of reporting missing attributes. The @offset points to the
nest which should have contained the attribute, @type is the
expected nla_type. The offset will be skipped if the attribute
is missing at the message level rather than inside a nest.
User space should be able to figure out which attribute enum
(AKA attribute space AKA attribute set) the nest pointed to by
@offset is using.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The V4L2_CAP_ASYNCIO capability was never implemented (and in fact
it isn't clear what it was supposed to do in the first place).
Drop it from the capabilities list. Keep it in videodev2.h with the
other defines under ifndef __KERNEL__ for backwards compatibility.
This will free up a capability bit for other future uses. And having
an unused and undefined I/O method is just plain confusing.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The comment for the VP8 loop filter flags uses the partially wrong name
for the flags. Unlike the other VP8 flag names, the loop filter flag names
don't have "_FLAG" in them. Change the comment so that it matches the
actual flag definitions in the header.
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The DW100 driver gets the dewarping mapping as a binary blob from the
userspace application through a custom control.
The blob format is hardware specific so create a dedicated control for
this purpose.
Signed-off-by: Xavier Roumegue <xavier.roumegue@oss.nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Allow specifying the xfrm interface if_id and link as part of a route
metadata using the lwtunnel infrastructure.
This allows for example using a single xfrm interface in collect_md
mode as the target of multiple routes each specifying a different if_id.
With the appropriate changes to iproute2, and considering an xfrm device
ipsec1 in collect_md mode, one can for example add a route specifying
an if_id like so:
ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1
In which case traffic routed to the device via this route would use
if_id in the xfrm interface policy lookup.
Or in the context of vrf, one can also specify the "link" property:
ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1 link_dev eth15
Note: LWT_XFRM_LINK uses NLA_U32 similar to IFLA_XFRM_LINK even though
internally "link" is signed. This is consistent with other _LINK
attributes in other devices as well as in bpf and should not have an
effect as device indexes can't be negative.
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This commit adds support for 'collect_md' mode on xfrm interfaces.
Each net can have one collect_md device, created by providing the
IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be
altered and has no if_id or link device attributes.
On transmit to this device, the if_id is fetched from the attached dst
metadata on the skb. If present, the link property is also fetched from
the metadata. The dst metadata type used is METADATA_XFRM which holds
these properties.
On the receive side, xfrmi_rcv_cb() populates a dst metadata for each
packet received and attaches it to the skb. The if_id used in this case is
fetched from the xfrm state, and the link is fetched from the incoming
device. This information can later be used by upper layers such as tc,
ebpf, and ip rules.
Because the skb is scrubbed in xfrmi_rcv_cb(), the attachment of the dst
metadata is postponed until after scrubbing. Similarly, xfrm_input() is
adapted to avoid dropping metadata dsts by only dropping 'valid'
(skb_valid_dst(skb) == true) dsts.
Policy matching on packets arriving from collect_md xfrmi devices is
done by using the xfrm state existing in the skb's sec_path.
The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session()
is changed to keep the details of the if_id extraction tucked away
in xfrm_interface.c.
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
BRBE-captured branch types will overflow perf_branch_entry.type and the
generic branch types in perf_branch_entry.new_type. So override each
available arch-specific branch type in the following manner to
comprehensively process all branch types reported by BRBE.
PERF_BR_ARM64_FIQ         PERF_BR_NEW_ARCH_1
PERF_BR_ARM64_DEBUG_HALT  PERF_BR_NEW_ARCH_2
PERF_BR_ARM64_DEBUG_EXIT  PERF_BR_NEW_ARCH_3
PERF_BR_ARM64_DEBUG_INST  PERF_BR_NEW_ARCH_4
PERF_BR_ARM64_DEBUG_DATA  PERF_BR_NEW_ARCH_5
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-5-anshuman.khandual@arm.com
Platforms like arm64 could capture privilege level information for all the
branch records. Hence this adds a new element in the struct branch_entry to
record the privilege level information, which could be requested through a
new event.attr.branch_sample_type based flag PERF_SAMPLE_BRANCH_PRIV_SAVE.
This flag helps user choose whether privilege information is captured.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-4-anshuman.khandual@arm.com
branch_entry.type has now run out of space to accommodate more branch type
classifications. This will prevent the perf branch stack implementation on
arm64 (via BRBE) from capturing all available branch types. Extending this
bit field, i.e. branch_entry.type [4 bits], is not an option as it would
break the user space ABI for both little and big endian perf tools.
Extend branch classification with a new field branch_entry.new_type via a
new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which
could decode PERF_BR_EXTEND_ABI, will then parse branch_entry.new_type as
well.
branch_entry.new_type is a 4 bit field which can hold up to 16 branch types.
The first three branch types will hold various generic page faults, followed
by five architecture specific branch types, which can be overridden by the
platform for specific use cases. These architecture specific branch types
get overridden on the arm64 platform for the BRBE implementation.
New generic branch types
- PERF_BR_NEW_FAULT_ALGN
- PERF_BR_NEW_FAULT_DATA
- PERF_BR_NEW_FAULT_INST
New arch specific branch types
- PERF_BR_NEW_ARCH_1
- PERF_BR_NEW_ARCH_2
- PERF_BR_NEW_ARCH_3
- PERF_BR_NEW_ARCH_4
- PERF_BR_NEW_ARCH_5
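A hedged sketch of how a branch-record consumer might decode the extended
ABI (the helper name is illustrative):
#include <linux/perf_event.h>

static void classify_branch(const struct perf_branch_entry *e)
{
	if (e->type != PERF_BR_EXTEND_ABI) {
		/* classic branch type lives in e->type */
		return;
	}
	/* extended branch type lives in the 4 bit e->new_type field */
	switch (e->new_type) {
	case PERF_BR_NEW_FAULT_ALGN:
	case PERF_BR_NEW_FAULT_DATA:
	case PERF_BR_NEW_FAULT_INST:
		/* generic page fault classifications */
		break;
	default:
		/* PERF_BR_NEW_ARCH_1..5: meaning defined per platform (e.g. BRBE) */
		break;
	}
}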
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-3-anshuman.khandual@arm.com
This expands the generic branch type classification by adding two more
entries, i.e. system error and not-in-transaction. This also updates the x86
implementation to process X86_BR_NO_TX records as appropriate. This changes
the branch types reported to user space on the x86 platform, but it should
not be a problem. The possible scenarios and impacts are enumerated here.
--------------------------------------------------------------------------
| kernel | perf tool | Impact |
--------------------------------------------------------------------------
| old | old | Works as before |
--------------------------------------------------------------------------
| old | new | PERF_BR_UNKNOWN is processed |
--------------------------------------------------------------------------
| new | old | PERF_BR_NO_TX is blocked via old PERF_BR_MAX |
--------------------------------------------------------------------------
| new | new | PERF_BR_NO_TX is recognized |
--------------------------------------------------------------------------
When PERF_BR_NO_TX is blocked via the old PERF_BR_MAX (new kernel with old
perf tool), user space might throw up a warning complaining about
unrecognized branch types being reported, but that is expected. PERF_BR_SERROR
& PERF_BR_NO_TX branch types will be used for BRBE implementation on arm64
platform.
PERF_BR_NO_TX complements the 'abort' and 'in_tx' elements in
perf_branch_entry, which represent other transaction states for a given
branch record, as this completes the transaction state classification.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220824044822.70230-2-anshuman.khandual@arm.com
CRIU preserves the ifindexes of net devices across restoration. However,
the current Open vSwitch API does not allow targeting an ifindex, so we
cannot correctly restore the OVS configuration.
Add a new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as the desired
ifindex.
Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify the new netdev
ifindex.
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new "spec" bitfield to branch entries for providing speculation
information. This will be populated using hints provided by branch sampling
features on supported hardware. The following cases are covered:
* No branch speculation information is available
* Branch is speculative but taken on the wrong path
* Branch is non-speculative but taken on the correct path
* Branch is speculative and taken on the correct path
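A hedged sketch of decoding the new field; the enum names follow this patch
and the helper name is illustrative:
#include <linux/perf_event.h>

static const char *branch_spec_str(const struct perf_branch_entry *e)
{
	switch (e->spec) {
	case PERF_BR_SPEC_WRONG_PATH:		return "speculative, wrong path";
	case PERF_BR_NON_SPEC_CORRECT_PATH:	return "non-speculative, correct path";
	case PERF_BR_SPEC_CORRECT_PATH:		return "speculative, correct path";
	case PERF_BR_SPEC_NA:
	default:				return "no speculation info";
	}
}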
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
Pull io_uring fixes from Jens Axboe:
- Add missing header file to the MAINTAINERS entry for io_uring (Ammar)
- liburing and the kernel ship the same io_uring.h header, but one
change we've had for a long time only in liburing is to ensure it's
C++ safe. Add extern C around it, so we can more easily sync them in
the future (Ammar)
- Fix an off-by-one in the sync cancel added in this merge window (me)
- Error handling fix for passthrough (Kanchan)
- Fix for address saving for async execution for the zc tx support
(Pavel)
- Fix ordering for TCP zc notifications, so we always have them ordered
correctly between "data was sent" and "data was acked". This isn't
strictly needed with the notification slots, but we've been pondering
disabling the slot support for 6.0 - and if we do, then we do require
the ordering to be sane. Regardless of that, it's the sane thing to
do in terms of API (Pavel)
- Minor cleanup for indentation and lockdep annotation (Pavel)
* tag 'io_uring-6.0-2022-08-26' of git://git.kernel.dk/linux-block:
io_uring/net: save address for sendzc async execution
io_uring: conditional ->async_data allocation
io_uring/notif: order notif vs send CQEs
io_uring/net: fix indentation
io_uring/net: fix zc send link failing
io_uring/net: fix must_hold annotation
io_uring: fix submission-failure handling for uring-cmd
io_uring: fix off-by-one in sync cancelation file check
io_uring: uapi: Add `extern "C"` in io_uring.h for liburing
MAINTAINERS: Add `include/linux/io_uring_types.h`