33472 Commits

Author SHA1 Message Date
Joerg Roedel
73f693c3a7 mm: remove vmalloc_sync_(un)mappings()
These functions are not needed anymore because the vmalloc and ioremap
mappings are now synchronized when they are created or torn down.

Remove all callers and function definitions.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:12 -07:00
Christoph Hellwig
041de93ff8 mm: remove vmalloc_user_node_flags
Open code it in __bpf_map_area_alloc, which is the only caller.  Also
clean up __bpf_map_area_alloc to have a single vmalloc call with slightly
different flags instead of the current two different calls.

For this to compile for the nommu case add a __vmalloc_node_range stub to
nommu.c.

[akpm@linux-foundation.org: fix nommu.c build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200414131348.444715-27-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
2b9059489c mm: remove __vmalloc_node_flags_caller
Just use __vmalloc_node instead which gets and extra argument.  To be able
to to use __vmalloc_node in all caller make it available outside of
vmalloc and implement it in nommu.c.

[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200414131348.444715-25-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
88dca4ca5a mm: remove the pgprot argument to __vmalloc
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
515e5b6d90 dma-mapping: use vmap insted of reimplementing it
Replace the open coded instance of vmap with the actual function.  In
the non-contiguous (IOMMU) case this requires an extra find_vm_area,
but given that this isn't a fast path function that is a small price
to pay.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:10 -07:00
NeilBrown
a37b0715dd mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).

The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processses have been
throttled.  The purpose of this is to avoid deadlock that happen when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being thottled and cannot write.

This approach was designed when all threads were blocked equally,
independently on which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics.  This means the simple "add 25%" heuristic is no longer
reliable.

The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.

This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below below (or equal
to) the free-run threshold for that bdi.  This ensures it will always be
able to have some pages in flight, and so will not deadlock.

In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.

This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), were
it causes attention to be focussed only on the target bdi.

So this patch
 - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
 - removes the 25% bonus that that flag gives, and
 - If PF_LOCAL_THROTTLE is set, don't delay at all unless the
   global and the local free-run thresholds are exceeded.

Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads.  This patch does *not* change the behvaiour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks.  I don't know what is wanted for realtime.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chuck Lever <chuck.lever@oracle.com>	[nfsd]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:08 -07:00
Wei Li
c893de12e1 kdb: Remove the misfeature 'KDBFLAGS'
Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined
by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG'
is set, and the user can define an environment variable named 'KDBFLAGS'
too. These are puzzling indeed.

After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature.
So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we
wrote into. After this modification, we can use `md4c1 kdb_flags` instead,
to observe the state flags.

Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20200521072125.21103-1-liwei391@huawei.com
[daniel.thompson@linaro.org: Make kdb_flags unsigned to avoid arithmetic
right shift]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02 15:15:46 +01:00
Douglas Anderson
1b310030bb kdb: Cleanup math with KDB_CMD_HISTORY_COUNT
From code inspection the math in handle_ctrl_cmd() looks super sketchy
because it subjects -1 from cmdptr and then does a "%
KDB_CMD_HISTORY_COUNT".  It turns out that this code works because
"cmdptr" is unsigned and KDB_CMD_HISTORY_COUNT is a nice power of 2.
Let's make this a little less sketchy.

This patch should be a no-op.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200507161125.1.I2cce9ac66e141230c3644b8174b6c15d4e769232@changeid
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02 15:15:46 +01:00
Douglas Anderson
b1350132fe kgdb: Don't call the deinit under spinlock
When I combined kgdboc_earlycon with an inflight patch titled ("soc:
qcom-geni-se: Add interconnect support to fix earlycon crash") [1]
things went boom.  Specifically I got a crash during the transition
between kgdboc_earlycon and the main kgdboc that looked like this:

Call trace:
 __schedule_bug+0x68/0x6c
 __schedule+0x75c/0x924
 schedule+0x8c/0xbc
 schedule_timeout+0x9c/0xfc
 do_wait_for_common+0xd0/0x160
 wait_for_completion_timeout+0x54/0x74
 rpmh_write_batch+0x1fc/0x23c
 qcom_icc_bcm_voter_commit+0x1b4/0x388
 qcom_icc_set+0x2c/0x3c
 apply_constraints+0x5c/0x98
 icc_set_bw+0x204/0x3bc
 icc_put+0x30/0xf8
 geni_remove_earlycon_icc_vote+0x6c/0x9c
 qcom_geni_serial_earlycon_exit+0x10/0x1c
 kgdboc_earlycon_deinit+0x38/0x58
 kgdb_register_io_module+0x11c/0x194
 configure_kgdboc+0x108/0x174
 kgdboc_probe+0x38/0x60
 platform_drv_probe+0x90/0xb0
 really_probe+0x130/0x2fc
 ...

The problem was that we were holding the "kgdb_registration_lock"
while calling into code that didn't expect to be called in spinlock
context.

Let's slightly defer when we call the deinit code so that it's not
done under spinlock.

NOTE: this does mean that the "deinit" call of the old kgdb IO module
is now made _after_ the init of the new IO module, but presumably
that's OK.

[1] https://lkml.kernel.org/r/1588919619-21355-3-git-send-email-akashast@codeaurora.org

Fixes: 220995622da5 ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200526142001.1.I523dc33f96589cb9956f5679976d402c8cda36fa@changeid
[daniel.thompson@linaro.org: Resolved merge issues by hand]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-06-02 15:15:45 +01:00
Ingo Molnar
25de110d14 irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
Some SMP platforms don't have CONFIG_IRQ_WORK defined, resulting in a link
error at build time.

Define a stub and clean up the prototype definitions.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-02 12:34:45 +02:00
Linus Torvalds
8b39a57e96 Merge branch 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/coredump updates from Al Viro:
 "set_fs() removal in coredump-related area - mostly Christoph's
  stuff..."

* 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
  binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
  binfmt_elf: remove the set_fs in fill_siginfo_note
  signal: refactor copy_siginfo_to_user32
  powerpc/spufs: simplify spufs core dumping
  powerpc/spufs: stop using access_ok
  powerpc/spufs: fix copy_to_user while atomic
2020-06-01 16:21:46 -07:00
Linus Torvalds
4fdea5848b Merge branch 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/__put-user updates from Al Viro:
 "Removal of __put_user() calls - misc patches that don't fit into any
  other series"

* 'uaccess.__put_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  pcm_native: result of put_user() needs to be checked
  scsi_ioctl.c: switch SCSI_IOCTL_GET_IDLUN to copy_to_user()
  compat sysinfo(2): don't bother with field-by-field copyout
2020-06-01 16:17:04 -07:00
Linus Torvalds
e148a8f948 Merge branch 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/readdir updates from Al Viro:
 "Finishing the conversion of readdir.c to unsafe_... API.

  This includes the uaccess_{read,write}_begin series by Christophe
  Leroy"

* 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  readdir.c: get rid of the last __put_user(), drop now-useless access_ok()
  readdir.c: get compat_filldir() more or less in sync with filldir()
  switch readdir(2) to unsafe_copy_dirent_name()
  drm/i915/gem: Replace user_access_begin by user_write_access_begin
  uaccess: Selectively open read or write user access
  uaccess: Add user_read_access_begin/end and user_write_access_begin/end
2020-06-01 16:11:38 -07:00
Linus Torvalds
b23c4771ff A fair amount of stuff this time around, dominated by yet another massive
set from Mauro toward the completion of the RST conversion.  I *really*
 hope we are getting close to the end of this.  Meanwhile, those patches
 reach pretty far afield to update document references around the tree;
 there should be no actual code changes there.  There will be, alas, more of
 the usual trivial merge conflicts.
 
 Beyond that we have more translations, improvements to the sphinx
 scripting, a number of additions to the sysctl documentation, and lots of
 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl7VId8PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yq/gH/iaDgirQZV6UZ2v9sfwQNYolNpf2sKAuOZjd
 bPFB7WJoMQbKwQEvYrAUL2+5zPOcLYuIfzyOfo1BV1py+EyKbACcKjI4AedxfJF7
 +NchmOBhlEqmEhzx2U08HRc4/8J223WG17fJRVsV3p+opJySexSFeQucfOciX5NR
 RUCxweWWyg/FgyqjkyMMTtsePqZPmcT5dWTlVXISlbWzcv5NFhuJXnSrw8Sfzcmm
 SJMzqItv3O+CabnKQ8kMLV2PozXTMfjeWH47ZUK0Y8/8PP9+cvqwFzZ0UDQJ1Xaz
 oyW/TqmunaXhfMsMFeFGSwtfgwRHvXdxkQdtwNHvo1dV4dzTvDw=
 =fDC/
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.8' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another
  massive set from Mauro toward the completion of the RST conversion. I
  *really* hope we are getting close to the end of this. Meanwhile,
  those patches reach pretty far afield to update document references
  around the tree; there should be no actual code changes there. There
  will be, alas, more of the usual trivial merge conflicts.

  Beyond that we have more translations, improvements to the sphinx
  scripting, a number of additions to the sysctl documentation, and lots
  of fixes"

* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
2020-06-01 15:45:27 -07:00
Linus Torvalds
c2b0fc847f ARM updates for 5.8-rc1:
- remove a now unnecessary usage of the KERNEL_DS for
   sys_oabi_epoll_ctl()
 - update my email address in a number of drivers
 - decompressor EFI updates from Ard Biesheuvel
 - module unwind section handling updates
 - sparsemem Kconfig cleanups
 - make act_mm macro respect THREAD_SIZE
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAl7VacAACgkQ9OeQG+St
 rGRHaA//Z8m+8LnSd+sqwrlEZYVj6IdPoihOhVZRsEhp4Fb09ZENOxL06W4lynHy
 tPcs4YAqLyp3Xmn+pk5NuH3SBoFGPOuUxseMpQKyT2ZnA126LB/3sW+xFPSbCayY
 gBHT7QhX6MxJEwxCxgvp2McOs6F55rYENcjozQ+DQMiNY5MTm0fKgGgbn1kpzglz
 7N2U7MR9ulXTCof3hZolQWBMOKa6LRldG7C3ajPITeOtk+vjyAPobqrkbzRDsIPV
 09j6BruFQoUbuyxtycNC0x+BDotrS/NN5OyhR07eJR5R0QNDW+qn8iqrkkVQUQsr
 mZpTR8CelzLL2+/1CDY2KrweY13eFbDoxiTVJl9aqCdlOsJKxwk1yv4HrEcpbBoK
 vtKwPDxPIKxFeJSCJX3xFjg9g6mRrBJ5CItPOThVgEqNt/dsbogqXlX4UhIjXzPs
 DBbeQ+EEZgNg7Ws/EwXIwtM8ZPc+bZZY8fskJd0gRCjbiCtstXXNjsHRd1vZ16KM
 yytpDxEIB7A+6lxcnV80VSCjD++A//kVThZ5kBl+ec1HOxRSyYOGIMGUMZhuyfE8
 f4xE3KVVsbqHGyh94C6tDLx73XgkmjfNx8YAgGRss+fQBoJbmwkJ0fDy4MhKlznD
 UnVcOXSjs7Iqih7R+icAtbIkbo1EUF5Mwu2I3SEZ/FOJmzEbbCY=
 =vMHE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

 - remove a now unnecessary usage of the KERNEL_DS for
   sys_oabi_epoll_ctl()

 - update my email address in a number of drivers

 - decompressor EFI updates from Ard Biesheuvel

 - module unwind section handling updates

 - sparsemem Kconfig cleanups

 - make act_mm macro respect THREAD_SIZE

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8980/1: Allow either FLATMEM or SPARSEMEM on the multiplatform build
  ARM: 8979/1: Remove redundant ARCH_SPARSEMEM_DEFAULT setting
  ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE
  ARM: decompressor: run decompressor in place if loaded via UEFI
  ARM: decompressor: move GOT into .data for EFI enabled builds
  ARM: decompressor: defer loading of the contents of the LC0 structure
  ARM: decompressor: split off _edata and stack base into separate object
  ARM: decompressor: move headroom variable out of LC0
  ARM: 8976/1: module: allow arch overrides for .init section names
  ARM: 8975/1: module: fix handling of unwind init sections
  ARM: 8974/1: use SPARSMEM_STATIC when SPARSEMEM is enabled
  ARM: 8971/1: replace the sole use of a symbol with its definition
  ARM: 8969/1: decompressor: simplify libfdt builds
  Update rmk's email address in various drivers
  ARM: compat: remove KERNEL_DS usage in sys_oabi_epoll_ctl()
2020-06-01 15:36:32 -07:00
Jakub Sitnicki
0c047ecbb7 bpf, cgroup: Return ENOLINK for auto-detached links on update
Failure to update a bpf_link because it has been auto-detached by a dying
cgroup currently results in EINVAL error, even though the arguments passed
to bpf() syscall are not wrong.

bpf_links attaching to netns in this case will return ENOLINK, which
carries the message that the link is no longer attached to anything.

Change cgroup bpf_links to do the same to keep the uAPI errors consistent.

Fixes: 0c991ebc8c69 ("bpf: Implement bpf_prog replacement for an active bpf_cgroup_link")
Suggested-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-6-jakub@cloudflare.com
2020-06-01 15:21:03 -07:00
Jakub Sitnicki
7f045a49fe bpf: Add link-based BPF program attachment to network namespace
Extend bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program when link exists result in -EINVAL.

Attachment of multiple links of same attach type to one netns is not
supported with the intention to lift the restriction when a use-case
presents itself. Because of that link create returns -E2BIG when trying to
create another netns link, when one already exists.

Link-based attachments to netns don't keep a netns alive by holding a ref
to it. Instead links get auto-detached from netns when the latter is being
destroyed, using a pernet pre_exit callback.

When auto-detached, link lives in defunct state as long there are open FDs
for it. -ENOLINK is returned if a user tries to update a defunct link.

Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing, updating, or filling link info. The netns might be
getting torn down when any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops have to always check if auto-detach has not
happened yet and if netns is still alive (refcnt > 0).

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
2020-06-01 15:21:03 -07:00
Jakub Sitnicki
b27f7bb590 flow_dissector: Move out netns_bpf prog callbacks
Move functions to manage BPF programs attached to netns that are not
specific to flow dissector to a dedicated module named
bpf/net_namespace.c.

The set of functions will grow with the addition of bpf_link support for
netns attached programs. This patch prepares ground by creating a place
for it.

This is a code move with no functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-4-jakub@cloudflare.com
2020-06-01 15:21:02 -07:00
Jakub Sitnicki
a3fd7ceee0 net: Introduce netns_bpf for BPF programs attached to netns
In order to:

 (1) attach more than one BPF program type to netns, or
 (2) support attaching BPF programs to netns with bpf_link, or
 (3) support multi-prog attach points for netns

we will need to keep more state per netns than a single pointer like we
have now for BPF flow dissector program.

Prepare for the above by extracting netns_bpf that is part of struct net,
for storing all state related to BPF programs attached to netns.

Turn flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. Next patch will move the
generic callbacks into their own module.

This is similar to how it is organized for cgroup with cgroup_bpf.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
2020-06-01 15:21:02 -07:00
Linus Torvalds
533b220f7b arm64 updates for 5.8
- Branch Target Identification (BTI)
 	* Support for ARMv8.5-BTI in both user- and kernel-space. This
 	  allows branch targets to limit the types of branch from which
 	  they can be called and additionally prevents branching to
 	  arbitrary code, although kernel support requires a very recent
 	  toolchain.
 
 	* Function annotation via SYM_FUNC_START() so that assembly
 	  functions are wrapped with the relevant "landing pad"
 	  instructions.
 
 	* BPF and vDSO updates to use the new instructions.
 
 	* Addition of a new HWCAP and exposure of BTI capability to
 	  userspace via ID register emulation, along with ELF loader
 	  support for the BTI feature in .note.gnu.property.
 
 	* Non-critical fixes to CFI unwind annotations in the sigreturn
 	  trampoline.
 
 - Shadow Call Stack (SCS)
 	* Support for Clang's Shadow Call Stack feature, which reserves
 	  platform register x18 to point at a separate stack for each
 	  task that holds only return addresses. This protects function
 	  return control flow from buffer overruns on the main stack.
 
 	* Save/restore of x18 across problematic boundaries (user-mode,
 	  hypervisor, EFI, suspend, etc).
 
 	* Core support for SCS, should other architectures want to use it
 	  too.
 
 	* SCS overflow checking on context-switch as part of the existing
 	  stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
 
 - CPU feature detection
 	* Removed numerous "SANITY CHECK" errors when running on a system
 	  with mismatched AArch32 support at EL1. This is primarily a
 	  concern for KVM, which disabled support for 32-bit guests on
 	  such a system.
 
 	* Addition of new ID registers and fields as the architecture has
 	  been extended.
 
 - Perf and PMU drivers
 	* Minor fixes and cleanups to system PMU drivers.
 
 - Hardware errata
 	* Unify KVM workarounds for VHE and nVHE configurations.
 
 	* Sort vendor errata entries in Kconfig.
 
 - Secure Monitor Call Calling Convention (SMCCC)
 	* Update to the latest specification from Arm (v1.2).
 
 	* Allow PSCI code to query the SMCCC version.
 
 - Software Delegated Exception Interface (SDEI)
 	* Unexport a bunch of unused symbols.
 
 	* Minor fixes to handling of firmware data.
 
 - Pointer authentication
 	* Add support for dumping the kernel PAC mask in vmcoreinfo so
 	  that the stack can be unwound by tools such as kdump.
 
 	* Simplification of key initialisation during CPU bringup.
 
 - BPF backend
 	* Improve immediate generation for logical and add/sub
 	  instructions.
 
 - vDSO
 	- Minor fixes to the linker flags for consistency with other
 	  architectures and support for LLVM's unwinder.
 
 	- Clean up logic to initialise and map the vDSO into userspace.
 
 - ACPI
 	- Work around for an ambiguity in the IORT specification relating
 	  to the "num_ids" field.
 
 	- Support _DMA method for all named components rather than only
 	  PCIe root complexes.
 
 	- Minor other IORT-related fixes.
 
 - Miscellaneous
 	* Initialise debug traps early for KGDB and fix KDB cacheflushing
 	  deadlock.
 
 	* Minor tweaks to early boot state (documentation update, set
 	  TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
 
 	* Refactoring and cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl7U9csQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLBHCACs/YU4SM7Om5f+7QnxIKao5DBr2CnGGvdC
 yTfDghFDTLQVv3MufLlfno3yBe5G8sQpcZfcc+hewfcGoMzVZXu8s7LzH6VSn9T9
 jmT3KjDMrg0RjSHzyumJp2McyelTk0a4FiKArSIIKsJSXUyb1uPSgm7SvKVDwEwU
 JGDzL9IGilmq59GiXfDzGhTZgmC37QdwRoRxDuqtqWQe5CHoRXYexg87HwBKOQxx
 HgU9L7ehri4MRZfpyjaDrr6quJo3TVnAAKXNBh3mZAskVS9ZrfKpEH0kYWYuqybv
 znKyHRecl/rrGePV8RTMtrwnSdU26zMXE/omsVVauDfG9hqzqm+Q
 =w3qi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "A sizeable pile of arm64 updates for 5.8.

  Summary below, but the big two features are support for Branch Target
  Identification and Clang's Shadow Call stack. The latter is currently
  arm64-only, but the high-level parts are all in core code so it could
  easily be adopted by other architectures pending toolchain support

  Branch Target Identification (BTI):

   - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
     branch targets to limit the types of branch from which they can be
     called and additionally prevents branching to arbitrary code,
     although kernel support requires a very recent toolchain.

   - Function annotation via SYM_FUNC_START() so that assembly functions
     are wrapped with the relevant "landing pad" instructions.

   - BPF and vDSO updates to use the new instructions.

   - Addition of a new HWCAP and exposure of BTI capability to userspace
     via ID register emulation, along with ELF loader support for the
     BTI feature in .note.gnu.property.

   - Non-critical fixes to CFI unwind annotations in the sigreturn
     trampoline.

  Shadow Call Stack (SCS):

   - Support for Clang's Shadow Call Stack feature, which reserves
     platform register x18 to point at a separate stack for each task
     that holds only return addresses. This protects function return
     control flow from buffer overruns on the main stack.

   - Save/restore of x18 across problematic boundaries (user-mode,
     hypervisor, EFI, suspend, etc).

   - Core support for SCS, should other architectures want to use it
     too.

   - SCS overflow checking on context-switch as part of the existing
     stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

  CPU feature detection:

   - Removed numerous "SANITY CHECK" errors when running on a system
     with mismatched AArch32 support at EL1. This is primarily a concern
     for KVM, which disabled support for 32-bit guests on such a system.

   - Addition of new ID registers and fields as the architecture has
     been extended.

  Perf and PMU drivers:

   - Minor fixes and cleanups to system PMU drivers.

  Hardware errata:

   - Unify KVM workarounds for VHE and nVHE configurations.

   - Sort vendor errata entries in Kconfig.

  Secure Monitor Call Calling Convention (SMCCC):

   - Update to the latest specification from Arm (v1.2).

   - Allow PSCI code to query the SMCCC version.

  Software Delegated Exception Interface (SDEI):

   - Unexport a bunch of unused symbols.

   - Minor fixes to handling of firmware data.

  Pointer authentication:

   - Add support for dumping the kernel PAC mask in vmcoreinfo so that
     the stack can be unwound by tools such as kdump.

   - Simplification of key initialisation during CPU bringup.

  BPF backend:

   - Improve immediate generation for logical and add/sub instructions.

  vDSO:

   - Minor fixes to the linker flags for consistency with other
     architectures and support for LLVM's unwinder.

   - Clean up logic to initialise and map the vDSO into userspace.

  ACPI:

   - Work around for an ambiguity in the IORT specification relating to
     the "num_ids" field.

   - Support _DMA method for all named components rather than only PCIe
     root complexes.

   - Minor other IORT-related fixes.

  Miscellaneous:

   - Initialise debug traps early for KGDB and fix KDB cacheflushing
     deadlock.

   - Minor tweaks to early boot state (documentation update, set
     TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

   - Refactoring and cleanup"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  KVM: arm64: Check advertised Stage-2 page size capability
  arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
  ACPI/IORT: Remove the unused __get_pci_rid()
  arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
  arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
  arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
  arm64/cpufeature: Introduce ID_MMFR5 CPU register
  arm64/cpufeature: Introduce ID_DFR1 CPU register
  arm64/cpufeature: Introduce ID_PFR2 CPU register
  arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
  arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
  arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
  arm64: mm: Add asid_gen_match() helper
  firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
  arm64: vdso: Fix CFI directives in sigreturn trampoline
  arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
  ...
2020-06-01 15:18:27 -07:00
Jiri Olsa
958a3f2d2a bpf: Use tracing helpers for lsm programs
Currenty lsm uses bpf_tracing_func_proto helpers which do
not include stack trace or perf event output. It's useful
to have those for bpftrace lsm support [1].

Using tracing_prog_func_proto helpers for lsm programs.

[1] https://github.com/iovisor/bpftrace/pull/1347

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200531154255.896551-1-jolsa@kernel.org
2020-06-01 15:08:04 -07:00
Lorenzo Bianconi
1b698fa5d8 xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame
In order to use standard 'xdp' prefix, rename convert_to_xdp_frame
utility routine in xdp_convert_buff_to_frame and replace all the
occurrences

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
2020-06-01 15:02:53 -07:00
Denis Efremov
bb2359f4db bpf: Change kvfree to kfree in generic_map_lookup_batch()
buf_prevkey in generic_map_lookup_batch() is allocated with
kmalloc(). It's safe to free it with kfree().

Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200601162814.17426-1-efremov@linux.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:33 -07:00
David Ahern
64b59025c1 xdp: Add xdp_txq_info to xdp_buff
Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.

>From a UAPI perspective, add egress_ifindex to xdp context for
bpf programs to see the Tx device.

Update the verifier to only allow accesses to egress_ifindex by
XDP programs with BPF_XDP_DEVMAP expected attach type.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-4-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:32 -07:00
David Ahern
fbee97feed bpf: Add support to attach bpf program to a devmap entry
Add BPF_XDP_DEVMAP attach type for use with programs associated with a
DEVMAP entry.

Allow DEVMAPs to associate a program with a device entry by adding
a bpf_prog.fd to 'struct bpf_devmap_val'. Values read show the program
id, so the fd and id are a union. bpf programs can get access to the
struct via vmlinux.h.

The program associated with the fd must have type XDP with expected
attach type BPF_XDP_DEVMAP. When a program is associated with a device
index, the program is run on an XDP_REDIRECT and before the buffer is
added to the per-cpu queue. At this point rxq data is still valid; the
next patch adds tx device information allowing the prorgam to see both
ingress and egress device indices.

XDP generic is skb based and XDP programs do not work with skb's. Block
the use case by walking maps used by a program that is to be attached
via xdpgeneric and fail if any of them are DEVMAP / DEVMAP_HASH with

Block attach of BPF_XDP_DEVMAP programs to devices.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-3-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:32 -07:00
David Ahern
7f1c04269f devmap: Formalize map value as a named struct
Add 'struct bpf_devmap_val' to formalize the expected values that can
be passed in for a DEVMAP. Update devmap code to use the struct.

Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-2-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:31 -07:00
Yonghong Song
b36e62eb85 bpf: Use strncpy_from_unsafe_strict() in bpf_seq_printf() helper
In bpf_seq_printf() helper, when user specified a "%s" in the
format string, strncpy_from_unsafe() is used to read the actual string
to a buffer. The string could be a format string or a string in
the kernel data structure. It is really unlikely that the string
will reside in the user memory.

This is different from Commit b2a5212fb634 ("bpf: Restrict bpf_trace_printk()'s %s
usage and add %pks, %pus specifier") which still used
strncpy_from_unsafe() for "%s" to preserve the old behavior.

If in the future, bpf_seq_printf() indeed needs to read user
memory, we can implement "%pus" format string.

Based on discussion in [1], if the intent is to read kernel memory,
strncpy_from_unsafe_strict() should be used. So this patch
changed to use strncpy_from_unsafe_strict().

[1]: https://lore.kernel.org/bpf/20200521152301.2587579-1-hch@lst.de/T/

Fixes: 492e639f0c22 ("bpf: Add bpf_seq_printf and bpf_seq_write helpers")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200529004810.3352219-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:48:31 -07:00
Andrii Nakryiko
457f44363a bpf: Implement BPF ring buffer and verifier support for it
This commit adds a new MPSC ring buffer implementation into BPF ecosystem,
which allows multiple CPUs to submit data to a single shared ring buffer. On
the consumption side, only single consumer is assumed.

Motivation
----------
There are two distinctive motivators for this work, which are not satisfied by
existing perf buffer, which prompted creation of a new ring buffer
implementation.
  - more efficient memory utilization by sharing ring buffer across CPUs;
  - preserving ordering of events that happen sequentially in time, even
  across multiple CPUs (e.g., fork/exec/exit events for a task).

These two problems are independent, but perf buffer fails to satisfy both.
Both are a result of a choice to have per-CPU perf ring buffer.  Both can be
also solved by having an MPSC implementation of ring buffer. The ordering
problem could technically be solved for perf buffer with some in-kernel
counting, but given the first one requires an MPSC buffer, the same solution
would solve the second problem automatically.

Semantics and APIs
------------------
Single ring buffer is presented to BPF programs as an instance of BPF map of
type BPF_MAP_TYPE_RINGBUF. Two other alternatives considered, but ultimately
rejected.

One way would be to, similar to BPF_MAP_TYPE_PERF_EVENT_ARRAY, make
BPF_MAP_TYPE_RINGBUF could represent an array of ring buffers, but not enforce
"same CPU only" rule. This would be more familiar interface compatible with
existing perf buffer use in BPF, but would fail if application needed more
advanced logic to lookup ring buffer by arbitrary key. HASH_OF_MAPS addresses
this with current approach. Additionally, given the performance of BPF
ringbuf, many use cases would just opt into a simple single ring buffer shared
among all CPUs, for which current approach would be an overkill.

Another approach could introduce a new concept, alongside BPF map, to
represent generic "container" object, which doesn't necessarily have key/value
interface with lookup/update/delete operations. This approach would add a lot
of extra infrastructure that has to be built for observability and verifier
support. It would also add another concept that BPF developers would have to
familiarize themselves with, new syntax in libbpf, etc. But then would really
provide no additional benefits over the approach of using a map.
BPF_MAP_TYPE_RINGBUF doesn't support lookup/update/delete operations, but so
doesn't few other map types (e.g., queue and stack; array doesn't support
delete, etc).

The approach chosen has an advantage of re-using existing BPF map
infrastructure (introspection APIs in kernel, libbpf support, etc), being
familiar concept (no need to teach users a new type of object in BPF program),
and utilizing existing tooling (bpftool). For common scenario of using
a single ring buffer for all CPUs, it's as simple and straightforward, as
would be with a dedicated "container" object. On the other hand, by being
a map, it can be combined with ARRAY_OF_MAPS and HASH_OF_MAPS map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases), to a complicated
application hashing/sharding of ring buffers (e.g., having a small pool of
ring buffers with hashed task's tgid being a look up key to preserve order,
but reduce contention).

Key and value sizes are enforced to be zero. max_entries is used to specify
the size of ring buffer and has to be a power of 2 value.

There are a bunch of similarities between perf buffer
(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
  - variable-length records;
  - if there is no more space left in ring buffer, reservation fails, no
    blocking;
  - memory-mappable data area for user-space applications for ease of
    consumption and high performance;
  - epoll notifications for new incoming data;
  - but still the ability to do busy polling for new data to achieve the
    lowest latency, if necessary.

BPF ringbuf provides two sets of APIs to BPF programs:
  - bpf_ringbuf_output() allows to *copy* data from one place to a ring
    buffer, similarly to bpf_perf_event_output();
  - bpf_ringbuf_reserve()/bpf_ringbuf_commit()/bpf_ringbuf_discard() APIs
    split the whole process into two steps. First, a fixed amount of space is
    reserved. If successful, a pointer to a data inside ring buffer data area
    is returned, which BPF programs can use similarly to a data inside
    array/hash maps. Once ready, this piece of memory is either committed or
    discarded. Discard is similar to commit, but makes consumer ignore the
    record.

bpf_ringbuf_output() has disadvantage of incurring extra memory copy, because
record has to be prepared in some other place first. But it allows to submit
records of the length that's not known to verifier beforehand. It also closely
matches bpf_perf_event_output(), so will simplify migration significantly.

bpf_ringbuf_reserve() avoids the extra copy of memory by providing a memory
pointer directly to ring buffer memory. In a lot of cases records are larger
than BPF stack space allows, so many programs have use extra per-CPU array as
a temporary heap for preparing sample. bpf_ringbuf_reserve() avoid this needs
completely. But in exchange, it only allows a known constant size of memory to
be reserved, such that verifier can verify that BPF program can't access
memory outside its reserved record space. bpf_ringbuf_output(), while slightly
slower due to extra memory copy, covers some use cases that are not suitable
for bpf_ringbuf_reserve().

The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use-cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary malloc()/free()
within single BPF program invocation.

Each reserved record is tracked by verifier through existing
reference-tracking logic, similar to socket ref-tracking. It is thus
impossible to reserve a record, but forget to submit (or discard) it.

bpf_ringbuf_query() helper allows to query various properties of ring buffer.
Currently 4 are supported:
  - BPF_RB_AVAIL_DATA returns amount of unconsumed data in ring buffer;
  - BPF_RB_RING_SIZE returns the size of ring buffer;
  - BPF_RB_CONS_POS/BPF_RB_PROD_POS returns current logical possition of
    consumer/producer, respectively.
Returned values are momentarily snapshots of ring buffer state and could be
off by the time helper returns, so this should be used only for
debugging/reporting reasons or for implementing various heuristics, that take
into account highly-changeable nature of some of those characteristics.

One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in ring buffer. Together with
BPF_RB_NO_WAKEUP/BPF_RB_FORCE_WAKEUP flags for output/commit/discard helpers,
it allows BPF program a high degree of control and, e.g., more efficient
batched notifications. Default self-balancing strategy, though, should be
adequate for most applications and will work reliable and efficiently already.

Design and implementation
-------------------------
This reserve/commit schema allows a natural way for multiple producers, either
on different CPUs or even on the same CPU/in the same BPF program, to reserve
independent records and work with them without blocking other producers. This
means that if BPF program was interruped by another BPF program sharing the
same ring buffer, they will both get a record reserved (provided there is
enough space left) and can work with it and submit it independently. This
applies to NMI context as well, except that due to using a spinlock during
reservation, in NMI context, bpf_ringbuf_reserve() might fail to get a lock,
in which case reservation will fail even if ring buffer is not full.

The ring buffer itself internally is implemented as a power-of-2 sized
circular buffer, with two logical and ever-increasing counters (which might
wrap around on 32-bit architectures, that's not a problem):
  - consumer counter shows up to which logical position consumer consumed the
    data;
  - producer counter denotes amount of data reserved by all producers.

Each time a record is reserved, producer that "owns" the record will
successfully advance producer counter. At that point, data is still not yet
ready to be consumed, though. Each record has 8 byte header, which contains
the length of reserved record, as well as two extra bits: busy bit to denote
that record is still being worked on, and discard bit, which might be set at
commit time if record is discarded. In the latter case, consumer is supposed
to skip the record and move on to the next one. Record header also encodes
record's relative offset from the beginning of ring buffer data area (in
pages). This allows bpf_ringbuf_commit()/bpf_ringbuf_discard() to accept only
the pointer to the record itself, without requiring also the pointer to ring
buffer itself. Ring buffer memory location will be restored from record
metadata header. This significantly simplifies verifier, as well as improving
API usability.

Producer counter increments are serialized under spinlock, so there is
a strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to consumer
in the order of reservations, but only after all previous records where
already committed. It is thus possible for slow producers to temporarily hold
off submitted records, that were reserved later.

Reservation/commit/consumer protocol is verified by litmus tests in
Documentation/litmus-test/bpf-rb.

One interesting implementation bit, that significantly simplifies (and thus
speeds up as well) implementation of both producers and consumers is how data
area is mapped twice contiguously back-to-back in the virtual memory. This
allows to not take any special measures for samples that have to wrap around
at the end of the circular buffer data area, because the next page after the
last data page would be first data page again, and thus the sample will still
appear completely contiguous in virtual memory. See comment and a simple ASCII
diagram showing this visually in bpf_ringbuf_area_alloc().

Another feature that distinguishes BPF ringbuf from perf ring buffer is
a self-pacing notifications of new data being availability.
bpf_ringbuf_commit() implementation will send a notification of new record
being available after commit only if consumer has already caught up right up
to the record being committed. If not, consumer still has to catch up and thus
will see new data anyways without needing an extra poll notification.
Benchmarks (see tools/testing/selftests/bpf/benchs/bench_ringbuf.c) show that
this allows to achieve a very high throughput without having to resort to
tricks like "notify only every Nth sample", which are necessary with perf
buffer. For extreme cases, when BPF program wants more manual control of
notifications, commit/discard/output helpers accept BPF_RB_NO_WAKEUP and
BPF_RB_FORCE_WAKEUP flags, which give full control over notifications of data
availability, but require extra caution and diligence in using this API.

Comparison to alternatives
--------------------------
Before considering implementing BPF ring buffer from scratch existing
alternatives in kernel were evaluated, but didn't seem to meet the needs. They
largely fell into few categores:
  - per-CPU buffers (perf, ftrace, etc), which don't satisfy two motivations
    outlined above (ordering and memory consumption);
  - linked list-based implementations; while some were multi-producer designs,
    consuming these from user-space would be very complicated and most
    probably not performant; memory-mapping contiguous piece of memory is
    simpler and more performant for user-space consumers;
  - io_uring is SPSC, but also requires fixed-sized elements. Naively turning
    SPSC queue into MPSC w/ lock would have subpar performance compared to
    locked reserve + lockless commit, as with BPF ring buffer. Fixed sized
    elements would be too limiting for BPF programs, given existing BPF
    programs heavily rely on variable-sized perf buffer already;
  - specialized implementations (like a new printk ring buffer, [0]) with lots
    of printk-specific limitations and implications, that didn't seem to fit
    well for intended use with BPF programs.

  [0] https://lwn.net/Articles/779550/

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-2-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:22 -07:00
Anton Protopopov
1ea0f9120c bpf: Fix map permissions check
The map_lookup_and_delete_elem() function should check for both FMODE_CAN_WRITE
and FMODE_CAN_READ permissions because it returns a map element to user space.

Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall")
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200527185700.14658-5-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:21 -07:00
John Fastabend
f470378c75 bpf: Extend bpf_base_func_proto helpers with probe_* and *current_task*
Often it is useful when applying policy to know something about the
task. If the administrator has CAP_SYS_ADMIN rights then they can
use kprobe + networking hook and link the two programs together to
accomplish this. However, this is a bit clunky and also means we have
to call both the network program and kprobe program when we could just
use a single program and avoid passing metadata through sk_msg/skb->cb,
socket, maps, etc.

To accomplish this add probe_* helpers to bpf_base_func_proto programs
guarded by a perfmon_capable() check. New supported helpers are the
following,

 BPF_FUNC_get_current_task
 BPF_FUNC_probe_read_user
 BPF_FUNC_probe_read_kernel
 BPF_FUNC_probe_read_user_str
 BPF_FUNC_probe_read_kernel_str

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/159033905529.12355.4368381069655254932.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:20 -07:00
Chris Packham
0142dddcbe bpf: Fix spelling in comment explaining ARG1 in ___bpf_prog_run
Change 'handeled' to 'handled'.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200525230025.14470-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:20 -07:00
Jakub Sitnicki
fe537393b5 bpf: Fix returned error sign when link doesn't support updates
System calls encode returned errors as negative values. Fix a typo that
breaks this convention for bpf(LINK_UPDATE) when bpf_link doesn't support
update operation.

Fixes: f9d041271cf4 ("bpf: Refactor bpf_link update handling")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200525122928.1164495-1-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01 14:38:19 -07:00
Linus Torvalds
17e0a7cb6a Misc cleanups, with an emphasis on removing obsolete/dead code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VLcQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iFnhAArGBqco3C2RPQugv7UDDbKEaMvxOGrc5B
 kwnyOS/k/yeIkfhT9u11oBuLcaj/Zgw8YCjFyRfaNsorRqnytLyZzZ6PvdCCE3YU
 X3DVYgulcdAQnM4bS2e3Kt9ciJvFxB27XNm0AfuyLMUxMqCD+iIO4gJ6TuQNBYy3
 dfUMfB1R9OUDW13GCrASe+p1Dw76uaqVngdFWJhnC8Rm49E6gFXq7CLQp5Cka81I
 KZeJ8I6ug9p3gqhOIXdi+S6g5CM5jf86Wkk7dOHwHFH7CceFb3FIz7z0n1je4Wgd
 L5rYX7+PwfNeZ73GIuvEBN+agJH2K0H/KmnlWNWeZHzc+J12MeruSdSMBIkBOEpn
 iSbYAOmDpQLzBjTdZjC8bDqTZf472WrTh4VwN9NxHLucjdC+IqGoTAvnyyEOmZ5o
 R7sv7Q++316CVwRhYVXbzwZcqtiinCDE1EkP5nKTo9z3z0kMF5+ce/k7wn5sgZIk
 zJq3LXtaToiDoDRAPGxcvFPts9MdC0EI1aKTIjaK/n6i2h/SpJfrTKgANWaldYTe
 XJIqlSB43saqf5YAQ3/sY+wnpCRBmmCU+sfKja4C8bH7RuggI3mZS19uhFs0Qctq
 Yx5bIXVSBAIqjJtgzQ0WAAZ5LrCpNNyAzb35ZYefQlGyJlx1URKXVBmxa6S99biU
 KiYX7Dk5uhQ=
 =0ZQd
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups, with an emphasis on removing obsolete/dead code"

* tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/spinlock: Remove obsolete ticket spinlock macros and types
  x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit
  x86/apb_timer: Drop unused declaration and macro
  x86/apb_timer: Drop unused TSC calibration
  x86/io_apic: Remove unused function mp_init_irq_at_boot()
  x86/mm: Stop printing BRK addresses
  x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()
  x86/nmi: Remove edac.h include leftover
  mm: Remove MPX leftovers
  x86/mm/mmap: Fix -Wmissing-prototypes warnings
  x86/early_printk: Remove unused includes
  crash_dump: Remove no longer used saved_max_pfn
  x86/smpboot: Remove the last ICPU() macro
2020-06-01 13:47:10 -07:00
Linus Torvalds
d861f6e682 Misc cleanups in the SMP hotplug and cross-call code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VJfsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ihcA/+Ko18kdGRPAlShM9qkDWO5N80p1LEp7F0
 ku1OxPAz9ii7K/jlnGr9wYYPxsIL3lbFeqFE7q5q5socXufaN8MUj9sVCmN7ScmR
 zO84aTHtxrJJhKIPM6HkUTbVl5KrQaud3F/J56CCjuKPsJWy9iuCGnKtfKK38bx+
 qJEfVKVm95Bv0NSEvqvci3DKKPYjzpKzuuttHXQ8Z80zG94FEkwj0JwZzttIjLl1
 rgRMgWTH7+3tQCMnZEfXG8xBxbXS9i3hKyr/v5QTNgIICyXGquPkf5MiwjJFS2Xb
 wpPqNh8HTo5kUJstYygRjcftatU7K72h2Rz/CoUkN2roNYlvRAhdBaBMwN0cGaG8
 pPhnLHHHRYZjl4fiROgRwVV3A6LcAHSrIcKzwGrvpCSpqyVozPGsmD/e8ZG1JYpC
 vxESTZbCDywng2Ls8jqQBut+dFGElvopXl1s004bCak89IFR4p15qojMJK2MSsqu
 BxhjIoqp8/f1fsAX+1p0RBEYnEr1KFtWa+nY8aVKL6bEx+Y7Qyq0ypMGtKavP06X
 VMcPMm1gYeXoGpLaTLYBRL5t7Rmm7i+xufuDQKUJetenfh2YS4aQ9lfV+rsQH1YE
 wavQrbwThfBZ9K1XkEmOkSqONysZ2YAtK9slKzciQIZvY3V8NbKAmBudCgqTgarp
 xqeW9NFfeFc=
 =Rr2n
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP updates from Ingo Molnar:
 "Misc cleanups in the SMP hotplug and cross-call code"

* tag 'smp-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Remove __freeze_secondary_cpus()
  cpu/hotplug: Remove disable_nonboot_cpus()
  cpu/hotplug: Fix a typo in comment "broadacasted"->"broadcasted"
  smp: Use smp_call_func_t in on_each_cpu()
2020-06-01 13:38:55 -07:00
Linus Torvalds
a7092c8204 Kernel side changes:
- Add AMD Fam17h RAPL support
   - Introduce CAP_PERFMON to kernel and user space
   - Add Zhaoxin CPU support
   - Misc fixes and cleanups
 
 Tooling changes:
 
   perf record:
 
     - Introduce --switch-output-event to use arbitrary events to be setup
       and read from a side band thread and, when they take place a signal
       be sent to the main 'perf record' thread, reusing the --switch-output
       code to take perf.data snapshots from the --overwrite ring buffer, e.g.:
 
 	# perf record --overwrite -e sched:* \
 		      --switch-output-event syscalls:*connect* \
 		      workload
 
       will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
       connect syscalls.
 
     - Add --num-synthesize-threads option to control degree of parallelism of the
       synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be
       time consuming. This mimics pre-existing behaviour in 'perf top'.
 
   perf bench:
 
     - Add a multi-threaded synthesize benchmark.
     - Add kallsyms parsing benchmark.
 
   Intel PT support:
 
     - Stitch LBR records from multiple samples to get deeper backtraces,
       there are caveats, see the csets for details.
     - Allow using Intel PT to synthesize callchains for regular events.
     - Add support for synthesizing branch stacks for regular events (cycles,
       instructions, etc) from Intel PT data.
 
   Misc changes:
 
     - Updated perf vendor events for power9 and Coresight.
     - Add flamegraph.py script via 'perf flamegraph'
     - Misc other changes, fixes and cleanups - see the Git log for details.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VJAcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hAYw/8DFtzGkMaaWkrDSj62LXtWQiqr1l01ZFt
 9GzV4aN4/go+K4BQtsQN8cUjOkRHFnOryLuD9LfSBfqsdjuiyTynV/cJkeUGQBck
 TT/GgWf3XKJzTUBRQRk367Gbqs9UKwBP8CdFhOXcNzGEQpjhbwwIDPmem94U4L1N
 XLsysgC45ejWL1kMTZKmk6hDIidlFeDg9j70WDPX1nNfCeisk25rxwTpdgvjsjcj
 3RzPRt2EGS+IkuF4QSCT5leYSGaCpVDHCQrVpHj57UoADfWAyC71uopTLG4OgYSx
 PVd9gvloMeeqWmroirIxM67rMd/TBTfVekNolhnQDjqp60Huxm+gGUYmhsyjNqdx
 Pb8HRZCBAudei9Ue4jNMfhCRK2Ug1oL5wNvN1xcSteAqrwMlwBMGHWns6l12x0ks
 BxYhyLvfREvnKijXc1o8D5paRgqohJgfnHlrUZeacyaw5hQCbiVRpwg0T1mWAF53
 u9hfWLY0Oy+Qs2C7EInNsWSYXRw8oPQNTFVx2I968GZqsEn4DC6Pt3ovWrDKIDnz
 ugoZJQkJ3/O8stYSMiyENehdWlo575NkapCTDwhLWnYztrw4skqqHE8ighU/e8ug
 o/Kx7ANWN9OjjjQpq2GVUeT0jCaFO+OMiGMNEkKoniYgYjogt3Gw5PeedBMtY07p
 OcWTiQZamjU=
 =i27M
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add AMD Fam17h RAPL support

   - Introduce CAP_PERFMON to kernel and user space

   - Add Zhaoxin CPU support

   - Misc fixes and cleanups

  Tooling changes:

   - perf record:

     Introduce '--switch-output-event' to use arbitrary events to be
     setup and read from a side band thread and, when they take place a
     signal be sent to the main 'perf record' thread, reusing the core
     for '--switch-output' to take perf.data snapshots from the ring
     buffer used for '--overwrite', e.g.:

	# perf record --overwrite -e sched:* \
		      --switch-output-event syscalls:*connect* \
		      workload

     will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
     connect syscalls.

     Add '--num-synthesize-threads' option to control degree of
     parallelism of the synthesize_mmap() code which is scanning
     /proc/PID/task/PID/maps and can be time consuming. This mimics
     pre-existing behaviour in 'perf top'.

   - perf bench:

     Add a multi-threaded synthesize benchmark and kallsyms parsing
     benchmark.

   - Intel PT support:

     Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.

     Allow using Intel PT to synthesize callchains for regular events.

     Add support for synthesizing branch stacks for regular events
     (cycles, instructions, etc) from Intel PT data.

  Misc changes:

   - Updated perf vendor events for power9 and Coresight.

   - Add flamegraph.py script via 'perf flamegraph'

   - Misc other changes, fixes and cleanups - see the Git log for details

  Also, since over the last couple of years perf tooling has matured and
  decoupled from the kernel perf changes to a large degree, going
  forward Arnaldo is going to send perf tooling changes via direct pull
  requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
  perf/x86/rapl: Add AMD Fam17h RAPL support
  perf/x86/rapl: Make perf_probe_msr() more robust and flexible
  perf/x86/rapl: Flip logic on default events visibility
  perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
  perf/x86/rapl: Move RAPL support to common x86 code
  perf/core: Replace zero-length array with flexible-array
  perf/x86: Replace zero-length array with flexible-array
  perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
  perf/x86/rapl: Add Ice Lake RAPL support
  perf flamegraph: Use /bin/bash for report and record scripts
  perf cs-etm: Move definition of 'traceid_list' global variable from header file
  libsymbols kallsyms: Move hex2u64 out of header
  libsymbols kallsyms: Parse using io api
  perf bench: Add kallsyms parsing
  perf: cs-etm: Update to build with latest opencsd version.
  perf symbol: Fix kernel symbol address display
  perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  ...
2020-06-01 13:23:59 -07:00
Linus Torvalds
60056060be The biggest change to core locking facilities in this cycle is the introduction
of local_lock_t - this primitive comes from the -rt project and identifies
 CPU-local locking dependencies normally handled opaquely beind preempt_disable()
 or local_irq_save/disable() critical sections.
 
 The generated code on mainline kernels doesn't change as a result, but still there
 are benefits: improved debugging and better documentation of data structure
 accesses.
 
 The new local_lock_t primitives are introduced and then utilized in a couple of
 kernel subsystems. No change in functionality is intended.
 
 There's also other smaller changes and cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VAogRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h67BAAusYb44jJyZUE74rmaLnJr0c6j7eJ6twT
 8LKRwxb21Y35DMuX6M5ewmvnHiLFYmjL728z+y8O+SP8vb4PSJBX/75X+wsawIJB
 cjHdxonyynVVC4zcbdrc37FsrOiVoKLbbZcpqRzHksKkCq2PHbFVxBNvEaKHZCWW
 1jnq0MRy9wEJtW9EThDWPLD+OPWhBvocUFYJH4fiqCIaDiip/E16fz3i+yMPt545
 Jz4Ibnsq+G5Ehm1N2AkaZuK9V9nYv85E7Z/UNiK4mkDOApE6OMS+q3d86BhqgPg5
 g/HL3HNXAtIY74tBYAac5tAQglT+283LuTpEPt9BEjNM7QxKg/ecXO7lwtn7Boku
 dACMqeuMHbLyru8uhbun/VBx1gca7HIhW1cvXO5OoR7o78fHpEFivjJ0B0OuSYAI
 y+/DsA41OlkWSEnboUs+zTQgFatqxQPke92xpGOJtjVVZRYHRqxcPtw9WFmoVqWA
 HeczDQLcSUhqbKSfr6X9BO2u3qxys5BzmImTKMqXEQ4d8Kk0QXbJgGYGfS8+ASey
 Am/jwUP3Cvzs99NxLH5gECKRSuTx3rY7nRGaIBYa+Ui575bdSF8sVAF13riB2mBp
 NJq2Pw0D36WcX7ecaC2Fk2ezkphbeuAr8E7gh/Mt/oVxjrfwRGfPMrnIwKygUydw
 1W5x+WZ+WsY=
 =TBTY
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "The biggest change to core locking facilities in this cycle is the
  introduction of local_lock_t - this primitive comes from the -rt
  project and identifies CPU-local locking dependencies normally handled
  opaquely beind preempt_disable() or local_irq_save/disable() critical
  sections.

  The generated code on mainline kernels doesn't change as a result, but
  still there are benefits: improved debugging and better documentation
  of data structure accesses.

  The new local_lock_t primitives are introduced and then utilized in a
  couple of kernel subsystems. No change in functionality is intended.

  There's also other smaller changes and cleanups"

* tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  zram: Use local lock to protect per-CPU data
  zram: Allocate struct zcomp_strm as per-CPU memory
  connector/cn_proc: Protect send_msg() with a local lock
  squashfs: Make use of local lock in multi_cpu decompressor
  mm/swap: Use local_lock for protection
  radix-tree: Use local_lock for protection
  locking: Introduce local_lock()
  locking/lockdep: Replace zero-length array with flexible-array
  locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()
2020-06-01 13:03:31 -07:00
Linus Torvalds
2227e5b21a The RCU updates for this cycle were:
- RCU-tasks update, including addition of RCU Tasks Trace for
    BPF use and TASKS_RUDE_RCU
  - kfree_rcu() updates.
  - Remove scheduler locking restriction
  - RCU CPU stall warning updates.
  - Torture-test updates.
  - Miscellaneous fixes and other updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7U/r0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hSNxAAirKhPGBoLI9DW1qde4OFhZg+BlIpS+LD
 IE/0eGB8hGwhb1793RGbzIJfSnRQpSOPxWbWc6DJZ4Zpi5/ZbVkiPKsuXpM1xGxs
 kuBCTOhWy1/p3iCZ1JH/JCrCAdWGZkIzEoaV7ipnHtV/+UrRbCWH5PB7R0fYvcbI
 q5bUcWJyEp/bYMxQn8DhAih6SLPHx+F9qaGAqqloLSHstTYG2HkBhBGKnqcd/Jex
 twkLK53poCkeP/c08V1dyagU2IRWj2jGB1NjYh/Ocm+Sn/vru15CVGspjVjqO5FF
 oq07lad357ddMsZmKoM2F5DhXbOh95A+EqF9VDvIzCvfGMUgqYI1oxWF4eycsGhg
 /aYJgYuN23YeEe2DkDzJB67GvBOwl4WgdoFaxKRzOiCSfrhkM8KqM4G9Fz1JIepG
 abRJCF85iGcLslU9DkrShQiDsd/CRPzu/jz6ybK0I2II2pICo6QRf76T7TdOvKnK
 yXwC6OdL7/dwOht20uT6XfnDXMCWI4MutiUrb8/C1DbaihwEaI2denr3YYL+IwrB
 B38CdP6sfKZ5UFxKh0xb+sOzWrw0KA+ThSAXeJhz3tKdxdyB6nkaw3J9lFg8oi20
 XGeAujjtjMZG5cxt2H+wO9kZY0RRau/nTqNtmmRrCobd5yJjHHPHH8trEd0twZ9A
 X5Wjh11lv3E=
 =Yisx
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The RCU updates for this cycle were:

   - RCU-tasks update, including addition of RCU Tasks Trace for BPF use
     and TASKS_RUDE_RCU

   - kfree_rcu() updates.

   - Remove scheduler locking restriction

   - RCU CPU stall warning updates.

   - Torture-test updates.

   - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  rcu: Allow for smp_call_function() running callbacks from idle
  rcu: Provide rcu_irq_exit_check_preempt()
  rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
  rcu: Provide __rcu_is_watching()
  rcu: Provide rcu_irq_exit_preempt()
  rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  rcu/tree: Mark the idle relevant functions noinstr
  x86: Replace ist_enter() with nmi_enter()
  x86/mce: Send #MC singal from task work
  x86/entry: Get rid of ist_begin/end_non_atomic()
  sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
  lockdep: Always inline lockdep_{off,on}()
  hardirq/nmi: Allow nested nmi_enter()
  arm64: Prepare arch_nmi_enter() for recursion
  printk: Disallow instrumenting print_nmi_enter()
  printk: Prepare for nested printk_nmi_enter()
  rcutorture: Convert ULONG_CMP_LT() to time_before()
  torture: Add a --kasan argument
  torture: Save a few lines by using config_override_param initially
  ...
2020-06-01 12:56:29 -07:00
Linus Torvalds
0bd957eb11 Various kprobes updates, mostly centered around cleaning up the no-instrumentation
logic, instead of the current per debug facility blacklist, use the more generic
 .noinstr.text approach, combined with a 'noinstr' marker for functions.
 
 Also add instrumentation_begin()/end() to better manage the exact place in entry
 code where instrumentation may be used.
 
 Also add a kprobes blacklist for modules.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7U/KERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h6xg//bnWhJzrxlOr89d7c5pEUeZehTscZ4OxU
 HyiWnfgd6bHJGHiB8TRHZInJFys/Y0UG+xzQvCP2YCIHW42tguD3u0wQ1rOrA6im
 VkDxUwHn72avqnBq+knMwtqiKQjxJrPe+YpikWOgb4B+9jQwLARzTArhs+aoWBRn
 a9jRP1jcuS26F/9wxctFoHVvKZ7Vv+HCgtNzequHsd1e0J8ElvDRk+QkfkaZopl5
 cQ44TIfzR8xjJuGqW45hXwOw5PPjhZHwytSoFquSMb57txoWL2devn7S38VaCWv7
 /fqmQAnQqlW5eG5ipJ0zWY1n0uLZLRrIecfA1INY8fdJeFFr6cxaN6FM1GhVZ93I
 GjZZFYwxDv9IftpeSyCaIzF1zISV+as3r9sMKMt89us77XazRiobjWCi1aE9a1rX
 QRv1nTjmypWg65IMV+nfIT26riP6YXSZ3uXQJPwm+kzEjJJl0LSi2AfjWQadcHeZ
 Z8svSIepP4oJBJ9tJlZ3K7kHBV3E0G4SV3fnHaUYGrp9gheqhe33U0VWfILcvq7T
 zIhtZXzqRGaMKuw0IFy2xITCQyEZAXwTedtSSeyXt0CN/hwhaxbrd38HhKOBw8WH
 k+OAmXZ+lgSO5ZvkoxgV6QgHtjsif3ICcHNelJtcbRA80/3oj/QwJ5dAVR61EDZa
 3Jn8mMxvCn0=
 =25Vr
 -----END PGP SIGNATURE-----

Merge tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull kprobes updates from Ingo Molnar:
 "Various kprobes updates, mostly centered around cleaning up the
  no-instrumentation logic.

  Instead of the current per debug facility blacklist, use the more
  generic .noinstr.text approach, combined with a 'noinstr' marker for
  functions.

  Also add instrumentation_begin()/end() to better manage the exact
  place in entry code where instrumentation may be used.

  And add a kprobes blacklist for modules"

* tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Prevent probes in .noinstr.text section
  vmlinux.lds.h: Create section for protection against instrumentation
  samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers.
  kprobes: Support NOKPROBE_SYMBOL() in modules
  kprobes: Support __kprobes blacklist in modules
  kprobes: Lock kprobe_mutex while showing kprobe_blacklist
2020-06-01 12:45:04 -07:00
Linus Torvalds
ca1f5df23f Printk changes for 5.8
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl7U1TIACgkQUqAMR0iA
 lPI0mQ//TcVlRJgts/iwv0M2Simje28t9tziOHgWmEeiyGwE7vwDPDzt8QFiKzBa
 IrJ5iTRMtCrEF+eapqeH4g+Ve4Npm5Cobl8/h9JEiVu3SNC48TuaiUzU3+Bfl1zV
 vcDfZtN9QD1/CdLGlyKO75xjkCOaJRCFnx5ToXnd3llshMKI2XebUCnEH4TDe6Fz
 NGTjJL4kCPwzmae++UhlMfKwkayBtNbqcLkaTb7d67Tw2DcuuIVixUER77oC9QPN
 SfxdS07s0UVc4C9bCVe3KtYZR5YU/riOjKNJNutzP3JDtQNugywrrtI0qBwEisqM
 puMJ3xLeLssTn10FJxRK9ewRlXy2zT9mmcCuaVU6LtiyGnHOuEwIU+Ewu0itiJGu
 JuFovsNqTvOiZqFP7+pkDktOjffF2hsY/a6NxHr6aof8CrdO2w9dmCAgzGvnwV1N
 /zlmmPSEigVLz47eeivIIdQrPUejrEV7g1wOYYApnIlNCmGjdkGnXKNaUrLGDehQ
 QIlpx1uvmhntjiw1hTSbOV7KOQxLGtuy6DWLYC7uHD3H3aGDQyQO8dZYXSfOY1Qu
 cvQ/K8ykW0Kq2JKEHjkwSRHKDpxhvbQg7N5JCHQA49ahdZD3O5bOfeTofw+7nFO0
 j9g2Qv6nWFjLbAIAFtqjh6wz0UtGNoTl2cqQswsS10wlDcsq8vs=
 =pUBG
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Benjamin Herrenschmidt solved a problem with non-matched console
   aliases by first checking consoles defined on the command line. It is
   a more conservative approach than the previous attempts.

 - Benjamin also made sure that the console accessible via /dev/console
   always has CON_CONSDEV flag.

 - Andy Shevchenko added the %ptT modifier for printing struct time64_t.
   It extends the existing %ptR handling for struct rtc_time.

 - Bruno Meneguele fixed /dev/kmsg error value returned by unsupported
   SEEK_CUR.

 - Tetsuo Handa removed unused pr_cont_once().

... and a few small fixes.

* tag 'printk-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Remove pr_cont_once()
  printk: handle blank console arguments passed in.
  kernel/printk: add kmsg SEEK_CUR handling
  printk: Fix a typo in comment "interator"->"iterator"
  usb: pulse8-cec: Switch to use %ptT
  ARM: bcm2835: Switch to use %ptT
  lib/vsprintf: Print time64_t in human readable format
  lib/vsprintf: update comment about simple_strto<foo>() functions
  printk: Correctly set CON_CONSDEV even when preferred console was not registered
  printk: Fix preferred console selection with multiple matches
  printk: Move console matching logic into a separate function
  printk: Convert a use of sprintf to snprintf in console_unlock
2020-06-01 12:13:30 -07:00
Linus Torvalds
829f3b9401 Fixes and new features for pstore
- refactor pstore locking for safer module unloading (Kees Cook)
 - remove orphaned records from pstorefs when backend unloaded (Kees Cook)
 - refactor dump_oops parameter into max_reason (Pavel Tatashin)
 - introduce pstore/zone for common code for contiguous storage (WeiXiong Liao)
 - introduce pstore/blk for block device backend (WeiXiong Liao)
 - introduce mtd backend (WeiXiong Liao)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl7UbYYWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpkgD/9/09OkJIWydwk2lr2T89HW5fSF
 5uBT0a309/QDUpnV9yhcRsrESEicnvbtaGxD0kuYIInkiW/2cj1l689EkyRjUmy9
 q3z4GzLqOlC7qvd7LUPFNGHmllBb09H/CxmXDxRP3aynB9oHzdpNQdPcpLBDA00r
 0byp/AE48dFbKIhtT0QxpGUYZFOlyc7XVAaOkED4bmu148gx8q7MU1AxFgbx0Feb
 9iPV0r6XYMgXJZ3sn/3PJsxF0V/giDSJ8ui2xsYRjCE408zVIYLdDs2e8dz+2yW6
 +3Lyankgo+ofZc4XYExTYgn3WjhPFi+pjVRUaj+BcyTk9SLNIj2WmZdmcLMuzanh
 BaUurmED7ffTtlsH4PhQgn8/OY4FX2PO2MwUHwlU+87Y8YDiW0lpzTq5H822OO8p
 QQ8awql/6lLCJuyzuWIciVUsS65MCPxsZ4+LSiMZzyYpWu1sxrEY8ic3agzCgsA0
 0i+4nZFlLG+Aap/oiKpegenkIyAunn2tDXAyFJFH6qLOiZJ78iRuws3XZqjCElhJ
 XqvyDJIfjkJhWUb++ckeqX7ThOR4CPSnwba/7GHv7NrQWuk3Cn+GQ80oxydXUY6b
 2/4eYjq0wtvf9NeuJ4/LYNXotLR/bq9zS0zqwTWG50v+RPmuC3bNJB+RmF7fCiCG
 jo1Sd1LMeTQ7bnULpA==
 =7s1u
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "Fixes and new features for pstore.

  This is a pretty big set of changes (relative to past pstore pulls),
  but it has been in -next for a while. The biggest change here is the
  ability to support a block device as a pstore backend, which has been
  desired for a while. A lot of additional fixes and refactorings are
  also included, mostly in support of the new features.

   - refactor pstore locking for safer module unloading (Kees Cook)

   - remove orphaned records from pstorefs when backend unloaded (Kees
     Cook)

   - refactor dump_oops parameter into max_reason (Pavel Tatashin)

   - introduce pstore/zone for common code for contiguous storage
     (WeiXiong Liao)

   - introduce pstore/blk for block device backend (WeiXiong Liao)

   - introduce mtd backend (WeiXiong Liao)"

* tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits)
  mtd: Support kmsg dumper based on pstore/blk
  pstore/blk: Introduce "best_effort" mode
  pstore/blk: Support non-block storage devices
  pstore/blk: Provide way to query pstore configuration
  pstore/zone: Provide way to skip "broken" zone for MTD devices
  Documentation: Add details for pstore/blk
  pstore/zone,blk: Add ftrace frontend support
  pstore/zone,blk: Add console frontend support
  pstore/zone,blk: Add support for pmsg frontend
  pstore/blk: Introduce backend for block devices
  pstore/zone: Introduce common layer to manage storage zones
  ramoops: Add "max-reason" optional field to ramoops DT node
  pstore/ram: Introduce max_reason and convert dump_oops
  pstore/platform: Pass max_reason to kmesg dump
  printk: Introduce kmsg_dump_reason_str()
  printk: honor the max_reason field in kmsg_dumper
  printk: Collapse shutdown types into a single dump reason
  pstore/ftrace: Provide ftrace log merging routine
  pstore/ram: Refactor ftrace buffer merging
  pstore/ram: Refactor DT size parsing
  ...
2020-06-01 12:07:34 -07:00
Linus Torvalds
81e8c10dac Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Introduce crypto_shash_tfm_digest() and use it wherever possible.
   - Fix use-after-free and race in crypto_spawn_alg.
   - Add support for parallel and batch requests to crypto_engine.

  Algorithms:
   - Update jitter RNG for SP800-90B compliance.
   - Always use jitter RNG as seed in drbg.

  Drivers:
   - Add Arm CryptoCell driver cctrng.
   - Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
  crypto: hisilicon - fix driver compatibility issue with different versions of devices
  crypto: engine - do not requeue in case of fatal error
  crypto: cavium/nitrox - Fix a typo in a comment
  crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
  crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
  crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
  crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
  crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
  crypto: hisilicon/qm - add debugfs to the QM state machine
  crypto: hisilicon/qm - add debugfs for QM
  crypto: stm32/crc32 - protect from concurrent accesses
  crypto: stm32/crc32 - don't sleep in runtime pm
  crypto: stm32/crc32 - fix multi-instance
  crypto: stm32/crc32 - fix run-time self test issue.
  crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
  crypto: hisilicon/zip - Use temporary sqe when doing work
  crypto: hisilicon - add device error report through abnormal irq
  crypto: hisilicon - remove codes of directly report device errors through MSI
  crypto: hisilicon - QM memory management optimization
  crypto: hisilicon - unify initial value assignment into QM
  ...
2020-06-01 12:00:10 -07:00
Lai Jiangshan
10cdb15759 workqueue: use BUILD_BUG_ON() for compile time test instead of WARN_ON()
Any runtime WARN_ON() has to be fixed, and BUILD_BUG_ON() can
help you nitice it earlier.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-06-01 11:02:42 -04:00
Rafael J. Wysocki
be6018a44c Merge branches 'pm-core' and 'pm-sleep'
* pm-core:
  PM: runtime: Replace pm_runtime_callbacks_present()
  PM: runtime: clk: Fix clk_pm_runtime_get() error path
  PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()

* pm-sleep:
  PM: hibernate: Restrict writes to the resume device
  PM: hibernate: Split off snapshot dev option
  PM: hibernate: Incorporate concurrency handling
  PM: sleep: Helpful edits for devices.rst documentation
  Documentation: PM: sleep: Update driver flags documentation
  PM: sleep: core: Rename DPM_FLAG_LEAVE_SUSPENDED
  PM: sleep: core: Rename DPM_FLAG_NEVER_SKIP
  PM: sleep: core: Rename dev_pm_smart_suspend_and_suspended()
  PM: sleep: core: Rename dev_pm_may_skip_resume()
  PM: sleep: core: Rework the power.may_skip_resume handling
  PM: sleep: core: Do not skip callbacks in the resume phase
  PM: sleep: core: Fold functions into their callers
  PM: sleep: core: Simplify the SMART_SUSPEND flag handling
2020-06-01 15:19:08 +02:00
Steven Rostedt (VMware)
c200784a08 tracing: Add a trace print when traceoff_on_warning is triggered
When "traceoff_on_warning" is enabled and a warning happens, there can still
be many trace events happening on other CPUs between the time the warning
occurred and the last trace event on that same CPU. This can cause confusion
in examining the trace, as it may not be obvious where the warning happened.
By adding a trace print into the trace just before disabling tracing, it
makes it obvious where the warning occurred, and the developer doesn't have
to look at other means to see what CPU it occurred on.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-01 08:23:47 -04:00
Tom Zanussi
726721a518 tracing: Move synthetic events to a separate file
With the addition of the in-kernel synthetic event API, synthetic
events are no longer specifically tied to the histogram triggers.

The synthetic event code is also making trace_event_hist.c very
bloated, so for those reasons, move it to a separate file,
trace_events_synth.c, along with a new trace_synth.h header file.

Because synthetic events are now independent from hist triggers, add a
new CONFIG_SYNTH_EVENTS config option, and have CONFIG_HIST_TRIGGERS
select it, and have CONFIG_SYNTH_EVENT_GEN_TEST depend on it.

Link: http://lkml.kernel.org/r/4d1fa1f85ed5982706ac44844ac92451dcb04715.1590693308.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-01 08:23:22 -04:00
Tom Zanussi
2d19bd79ae tracing: Add hist_debug trace event files for histogram debugging
Add a new "hist_debug" file for each trace event, which when read will
dump out a bunch of internal details about the hist triggers defined
on that event.

This is normally off but can be enabled by saying 'y' to the new
CONFIG_HIST_TRIGGERS_DEBUG config option.

This is in support of the new Documentation file describing histogram
internals, Documentation/trace/histogram-design.rst, which was
requested by developers trying to understand the internals when
extending or making use of the hist triggers for higher-level tools.

The histogram-design.rst documentation refers to the hist_debug files
and demonstrates their use with output in the test examples.

Link: http://lkml.kernel.org/r/77914c22b0ba493d9783c53bbfbc6087d6a7e1b1.1585941485.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-01 08:22:30 -04:00
Tom Zanussi
1b94b3aed3 tracing: Check state.disabled in synth event trace functions
Since trace_state.disabled is set in __synth_event_trace_start() at
the same time -ENOENT is returned, don't bother returning -ENOENT -
just have callers check trace_state.disabled instead, and avoid the
extra return val munging.

Link: http://lkml.kernel.org/r/87315c3889af870e8370e82b76cf48b426d70130.1585941485.git.zanussi@kernel.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@godmis.org>
2020-06-01 08:21:08 -04:00
Ingo Molnar
cb3cb6733f Merge branch 'WIP.core/rcu' into core/rcu, to pick up two x86/entry dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-01 11:47:29 +02:00
Petr Mladek
d053cf0d77 Merge branch 'for-5.8' into for-linus 2020-06-01 10:15:16 +02:00
Petr Mladek
6a0af9fc8c Merge branch 'for-5.7-preferred-console' into for-linus 2020-06-01 10:13:51 +02:00