5196 Commits

Author SHA1 Message Date
Linus Torvalds
78273df7f6 header cleanups for 6.8
The goal is to get sched.h down to a type only header, so the main thing
 happening in this patchset is splitting out various _types.h headers and
 dependency fixups, as well as moving some things out of sched.h to
 better locations.
 
 This is prep work for the memory allocation profiling patchset which
 adds new sched.h interdependencies.
 
 Testing - it's been in -next, and fixes from pretty much all
 architectures have percolated in - nothing major.

Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs

Pull header cleanups from Kent Overstreet:
 "The goal is to get sched.h down to a type only header, so the main
  thing happening in this patchset is splitting out various _types.h
  headers and dependency fixups, as well as moving some things out of
  sched.h to better locations.

  This is prep work for the memory allocation profiling patchset which
   adds new sched.h interdependencies"
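
To illustrate the splitting pattern used throughout the series, here is a minimal, hypothetical sketch (the foo names are illustrative, not taken from the actual patches): the _types.h header carries only the type definition, so headers that merely embed the type no longer have to pull in the full API header.

  /* include/linux/foo_types.h -- type definitions only */
  #ifndef _LINUX_FOO_TYPES_H
  #define _LINUX_FOO_TYPES_H

  struct foo {
          int count;
  };

  #endif /* _LINUX_FOO_TYPES_H */

  /* include/linux/foo.h -- full API; embedders include foo_types.h instead */
  #ifndef _LINUX_FOO_H
  #define _LINUX_FOO_H

  #include <linux/foo_types.h>

  void foo_get(struct foo *f);
  void foo_put(struct foo *f);

  #endif /* _LINUX_FOO_H */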

* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
  Kill sched.h dependency on rcupdate.h
  kill unnecessary thread_info.h include
  Kill unnecessary kernel.h include
  preempt.h: Kill dependency on list.h
  rseq: Split out rseq.h from sched.h
  LoongArch: signal.c: add header file to fix build error
  restart_block: Trim includes
  lockdep: move held_lock to lockdep_types.h
  sem: Split out sem_types.h
  uidgid: Split out uidgid_types.h
  seccomp: Split out seccomp_types.h
  refcount: Split out refcount_types.h
  uapi/linux/resource.h: fix include
  x86/signal: kill dependency on time.h
  syscall_user_dispatch.h: split out *_types.h
  mm_types_task.h: Trim dependencies
  Split out irqflags_types.h
  ipc: Kill bogus dependency on spinlock.h
  shm: Slim down dependencies
  workqueue: Split out workqueue_types.h
  ...
2024-01-10 16:43:55 -08:00
Linus Torvalds
063a7ce32d lsm/stable-6.8 PR 20240105

Merge tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull security module updates from Paul Moore:

 - Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and
   lsm_set_self_attr().

   The first syscall simply lists the enabled LSMs, while the second and
   third get and set the current process' LSM attributes. Yes, these
   syscalls may provide similar functionality to what can be found under
   /proc or /sys, but they were designed from the start to support
   multiple, simultaneous (stacked) LSMs, as opposed to the current
   /proc-based solutions, which were created at a time when only one LSM
   was allowed to be active at a given time. (A usage sketch for the new
   syscalls follows this list.)

   We have spent considerable time discussing ways to extend the
   existing /proc interfaces to support multiple, simultaneous LSMs, and
   even our best ideas have been far too ugly to support as a kernel
   API; after 20+ years in the kernel, I felt the LSM layer had
   established itself enough to justify a handful of syscalls.

   Support amongst the individual LSM developers has been nearly
   unanimous, with a single objection coming from Tetsuo (TOMOYO) as he
   is worried that the LSM_ID_XXX token concept will make it more
   difficult for out-of-tree LSMs to survive. Several members of the LSM
   community have demonstrated the ability for out-of-tree LSMs to
   continue to exist by picking high/unused LSM_ID values as well as
   pointing out that many kernel APIs rely on integer identifiers, e.g.
   syscalls (!), but unfortunately Tetsuo's objections remain.

   My personal opinion is that while I have no interest in penalizing
   out-of-tree LSMs, I'm not going to penalize in-tree development to
   support out-of-tree development, and I view this as a necessary step
   forward to support the push for expanded LSM stacking and reduce our
   reliance on /proc and /sys, which has occasionally been problematic
   for some container users. Finally, we have included the linux-api
   folks on (all?) recent revisions of the patchset and addressed all of
   their concerns.

 - Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit
   ioctls on 64-bit systems problem.

   This patch includes support for all of the existing LSMs which
   provide ioctl hooks, although it turns out only SELinux actually
   cares about the individual ioctls. It is worth noting that while
   Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this
   patch, they did both indicate they are okay with the changes.

 - Fix a potential memory leak in the CALIPSO code when IPv6 is disabled
   at boot.

   While it's good that we are fixing this, I doubt this is something
   users are seeing in the wild as you need to both disable IPv6 and
   then attempt to configure IPv6 labeled networking via
   NetLabel/CALIPSO; that just doesn't make much sense.

   Normally this would go through netdev, but Jakub asked me to take
   this patch and of all the trees I maintain, the LSM tree seemed like
   the best fit.

 - Update the LSM MAINTAINERS entry with additional information about
   our process docs, patchwork, bug reporting, etc.

   I also noticed that the Lockdown LSM is missing a dedicated
   MAINTAINERS entry so I've added that to the pull request. I've been
   working with one of the major Lockdown authors/contributors to see if
   they are willing to step up and assume a Lockdown maintainer role;
   hopefully that will happen soon, but in the meantime I'll continue to
   look after it.

 - Add a handful of mailmap entries for Serge Hallyn and myself.
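
As noted above, a minimal usage sketch for the new syscalls. It assumes the LSM uapi header (<linux/lsm.h>) and a unistd header that defines __NR_lsm_list_modules; the buffer sizing and error handling are simplified.

  /* sketch: enumerate the active LSMs via lsm_list_modules() */
  #include <stdio.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/lsm.h>                  /* LSM_ID_* values */

  int main(void)
  {
          uint64_t ids[32];
          uint32_t size = sizeof(ids);    /* in: buffer size, out: bytes needed */
          long n, i;

          /* returns the number of enabled LSMs and fills ids[] with LSM_ID_* */
          n = syscall(__NR_lsm_list_modules, ids, &size, 0);
          for (i = 0; i < n; i++)
                  printf("active LSM id: %llu\n", (unsigned long long)ids[i]);
          return 0;
  }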

* tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits)
  lsm: new security_file_ioctl_compat() hook
  lsm: Add a __counted_by() annotation to lsm_ctx.ctx
  calipso: fix memory leak in netlbl_calipso_add_pass()
  selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test
  MAINTAINERS: add an entry for the lockdown LSM
  MAINTAINERS: update the LSM entry
  mailmap: add entries for Serge Hallyn's dead accounts
  mailmap: update/replace my old email addresses
  lsm: mark the lsm_id variables are marked as static
  lsm: convert security_setselfattr() to use memdup_user()
  lsm: align based on pointer length in lsm_fill_user_ctx()
  lsm: consolidate buffer size handling into lsm_fill_user_ctx()
  lsm: correct error codes in security_getselfattr()
  lsm: cleanup the size counters in security_getselfattr()
  lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation
  lsm: drop LSM_ID_IMA
  LSM: selftests for Linux Security Module syscalls
  SELinux: Add selfattr hooks
  AppArmor: Add selfattr hooks
  Smack: implement setselfattr and getselfattr hooks
  ...
2024-01-09 12:57:46 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintenance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ directory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintenance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against the buffer_head code in the series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY's alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintenance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ directory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintenance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against the buffer_head code in the series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY's alloc/copy/free (a usage sketch follows this list).

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"
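
As referenced in the userfaultfd item above, here is a minimal sketch of driving UFFDIO_MOVE from a userspace compactor. The struct and ioctl names follow my reading of the 6.8 uapi <linux/userfaultfd.h> and should be treated as assumptions; registration of the range and error handling are omitted.

  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /* Move one page from @src to @dst inside a registered range, instead of
   * the allocate + copy + free sequence that UFFDIO_COPY implies. */
  static int move_page_sketch(int uffd, unsigned long dst, unsigned long src,
                              unsigned long page_size)
  {
          struct uffdio_move mv = {
                  .dst  = dst,
                  .src  = src,
                  .len  = page_size,
                  .mode = 0,
          };

          return ioctl(uffd, UFFDIO_MOVE, &mv);
  }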

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Linus Torvalds
bfe8eb3b85 Scheduler changes for v6.8:
- Energy scheduling:
 
     - Consolidate how the max compute capacity is
       used in the scheduler and how we calculate
       the frequency for a level of utilization.
 
     - Rework interface between the scheduler and
       the schedutil governor
 
     - Simplify the util_est logic
 
  - Deadline scheduler:
 
     - Work more towards reducing SCHED_DEADLINE
       starvation of low-priority tasks (e.g., SCHED_OTHER)
       when higher-priority tasks monopolize CPU
       cycles, via the introduction of 'deadline servers'
       (nested/2-level scheduling).
       "Fair servers" to make use of this facility are
       not introduced yet.
 
  - EEVDF:
 
     - Introduce O(1) fastpath for EEVDF task selection
 
  - NUMA balancing:
 
     - Tune the NUMA-balancing vma scanning logic some more,
       to better distribute the probability
       of a particular vma getting scanned.
 
  - Plus misc fixes, cleanups and updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Energy scheduling:

   - Consolidate how the max compute capacity is used in the scheduler
     and how we calculate the frequency for a level of utilization.

   - Rework interface between the scheduler and the schedutil governor

   - Simplify the util_est logic

  Deadline scheduler:

   - Work more towards reducing SCHED_DEADLINE starvation of
     low-priority tasks (e.g., SCHED_OTHER) when higher-priority tasks
     monopolize CPU cycles, via the introduction of 'deadline servers'
     (nested/2-level scheduling).

     "Fair servers" to make use of this facility are not introduced yet.

  EEVDF:

   - Introduce O(1) fastpath for EEVDF task selection

  NUMA balancing:

   - Tune the NUMA-balancing vma scanning logic some more, to better
     distribute the probability of a particular vma getting scanned.

  Plus misc fixes, cleanups and updates"

* tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  sched/fair: Fix tg->load when offlining a CPU
  sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair()
  sched/fair: Use all little CPUs for CPU-bound workloads
  sched/fair: Simplify util_est
  sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)
  arm64/amu: Use capacity_ref_freq() to set AMU ratio
  cpufreq/cppc: Set the frequency used for computing the capacity
  cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}()
  energy_model: Use a fixed reference frequency
  cpufreq/schedutil: Use a fixed reference frequency
  cpufreq: Use the fixed and coherent frequency for scaling capacity
  sched/topology: Add a new arch_scale_freq_ref() method
  freezer,sched: Clean saved_state when restoring it during thaw
  sched/fair: Update min_vruntime for reweight_entity() correctly
  sched/doc: Update documentation after renames and synchronize Chinese version
  sched/cpufreq: Rework iowait boost
  sched/cpufreq: Rework schedutil governor performance estimation
  sched/pelt: Avoid underestimation of task utilization
  sched/timers: Explain why idle task schedules out on remote timer enqueue
  sched/cpuidle: Comment about timers requirements VS idle handler
  ...
2024-01-08 19:49:17 -08:00
Florian Fainelli
f0a78b3e2a arm64: Update __NR_compat_syscalls for statmount/listmount
Commit d8b0f5465012 ("wire up syscalls for statmount/listmount") added
two new system calls to arch/arm64/include/asm/unistd32.h but forgot to
update the __NR_compat_syscalls number, thus causing the following build
failures:

  arch/arm64/include/asm/unistd32.h:922:24: error: array index in initializer exceeds array bounds
    922 | #define __NR_statmount 457
        |                        ^~~
  arch/arm64/kernel/sys32.c:130:34: note: in definition of macro '__SYSCALL'
    130 | #define __SYSCALL(nr, sym)      [nr] = __arm64_##sym,
        |                                  ^~

Bump the number up by two to accommodate the newly added system calls.
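
For reference, the resulting definition looks like this (459 follows from __NR_statmount being 457 and __NR_listmount 458, as in the error above; a sketch, not the literal patch):

  /* arch/arm64/include/asm/unistd.h: was 457, now covers the two new calls */
  #define __NR_compat_syscalls            459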

Fixes: d8b0f5465012 ("wire up syscalls for statmount/listmount")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08 18:09:54 -08:00
Linus Torvalds
ab5f3fcb7c arm64 updates for 6.8
* for-next/cpufeature
 
   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde
     Thunder-X machines.
   - Avoid mapping KPTI trampoline when it is not required.
   - Make CPU capability API more robust during early initialisation.
 
 * for-next/early-idreg-overrides
 
   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped.
 
 * for-next/fpsimd
 
   - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd
     code sequences in the kernel with pre-emption enabled.
 
 * for-next/kbuild
 
   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y.
   - Makefile cleanups.
 
 * for-next/lpa2-prep
 
   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).
 
 * for-next/misc
 
   - Remove dead code and fix a typo.
 
 * for-next/mm
 
   - Pass NUMA node information for IRQ stack allocations.
 
 * for-next/perf
 
   - Add perf support for the Synopsys DesignWare PCIe PMU.
   - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced
     in Armv8.8.
   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
   - Minor PMU driver fixes and optimisations.
 
 * for-next/rip-vpipt
 
   - Remove what support we had for the obsolete VPIPT I-cache policy.
 
 * for-next/selftests
 
   - Improvements to the SVE and SME selftests.
 
 * for-next/stacktrace
 
   - Refactor kernel unwind logic so that it can be used by BPF unwinding
     and, eventually, reliable backtracing.
 
 * for-next/sysregs
 
   - Update a bunch of register definitions based on the latest XML drop
     from Arm.

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "CPU features:

   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
     olde Thunder-X machines

   - Avoid mapping KPTI trampoline when it is not required

   - Make CPU capability API more robust during early initialisation

  Early idreg overrides:

   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped

  FPsimd:

   - Restore kernel-mode fpsimd context lazily, allowing us to run
     fpsimd code sequences in the kernel with pre-emption enabled

  KBuild:

   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y

   - Makefile cleanups

  LPA2 prep:

   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).

  Misc:

   - Remove dead code and fix a typo

  MM:

   - Pass NUMA node information for IRQ stack allocations

  Perf:

   - Add perf support for the Synopsys DesignWare PCIe PMU

   - Add support for event counting thresholds (FEAT_PMUv3_TH)
     introduced in Armv8.8

   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.

   - Minor PMU driver fixes and optimisations

  RIP VPIPT:

   - Remove what support we had for the obsolete VPIPT I-cache policy

  Selftests:

   - Improvements to the SVE and SME selftests

  Stacktrace:

   - Refactor kernel unwind logic so that it can be used by BPF unwinding
     and, eventually, reliable backtracing

  Sysregs:

   - Update a bunch of register definitions based on the latest XML drop
     from Arm"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  kselftest/arm64: Don't probe the current VL for unsupported vector types
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64: memory: remove duplicated include
  arm: perf: Fix ARCH=arm build with GCC
  arm64: Align boot cpucap handling with system cpucap handling
  arm64: Cleanup system cpucap handling
  MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
  drivers/perf: add DesignWare PCIe PMU driver
  PCI: Move pci_clear_and_set_dword() helper to PCI header
  PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  arm64: irq: set the correct node for shadow call stack
  Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  ...
2024-01-08 16:32:09 -08:00
Kirill A. Shutemov
5e0a760b44 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
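
For illustration, a sketch of what iterating over page orders looks like after the rename (NR_PAGE_ORDERS comes from the companion patch in this series and equals MAX_PAGE_ORDER + 1; the function itself is illustrative):

  #include <linux/mmzone.h>

  /* sketch: MAX_PAGE_ORDER is the highest valid order, inclusive */
  static unsigned long zone_free_pages_sketch(struct zone *zone)
  {
          unsigned long pages = 0;
          unsigned int order;

          for (order = 0; order <= MAX_PAGE_ORDER; order++)
                  pages += zone->free_area[order].nr_free << order;

          return pages;
  }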

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Linus Torvalds
8c9440fea7 vfs-6.8.mount

Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
 "This contains the work to retrieve detailed information about mounts
  via two new system calls. This is hopefully the beginning of the end
  of the saga that started with fsinfo() years ago.

  The LWN articles in [1] and [2] can serve as a summary so we can avoid
  rehashing everything here.

  At LSFMM in May 2022 we got into a room and agreed on what we want to
  do about fsinfo(). Basically, split it into pieces. This is the first
  part of that agreement. Specifically, it is concerned with retrieving
  information about mounts. So this only concerns the mount information
  retrieval, not the mount table change notification, or the extended
  filesystem specific mount option work. That is separate work.

  Currently mounts have a 32bit id. Mount ids are already in heavy use
  by libmount and other low-level userspace but they can't be relied
  upon because they're recycled very quickly. We agreed that mounts
  should carry a unique 64bit id by which they can be referenced
  directly. This is now implemented as part of this work.

  The new 64bit mount id is exposed in statx() through the new
  STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
  returned. If it is raised and the kernel supports the new 64bit mount
  id the flag is raised in the result mask and the new 64bit mount id is
  returned. New and old mount ids do not overlap so they cannot be
  conflated.

  Two new system calls are introduced that operate on the 64bit mount
  id: statmount() and listmount(). A summary of the api and usage can be
  found on LWN as well (cf. [3]) but of course, I'll provide a summary
  here as well.

  Both system calls rely on struct mnt_id_req. Which is the request
  struct used to pass the 64bit mount id identifying the mount to
  operate on. It is extensible to allow for the addition of new
  parameters and for future use in other apis that make use of mount
  ids.

  statmount() mimics the semantics of statx() and exposes a set of flags
  that userspace may raise in mnt_id_req to request specific information
  to be retrieved. A statmount() call returns a struct statmount filled
  in with information about the requested mount. Supported requests are
  indicated by raising the request flag passed in struct mnt_id_req in
  the @mask argument in struct statmount.

  Currently we do support:

   - STATMOUNT_SB_BASIC:
     Basic filesystem info

   - STATMOUNT_MNT_BASIC
     Mount information (mount id, parent mount id, mount attributes etc)

   - STATMOUNT_PROPAGATE_FROM
     Propagation from what mount in current namespace

   - STATMOUNT_MNT_ROOT
     Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

   - STATMOUNT_MNT_POINT
     Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

   - STATMOUNT_FS_TYPE
     Name of the filesystem type as the magic number isn't enough due to submounts

  The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
  are appended to the end of the struct. Userspace can use the offsets
  in @fs_type, @mnt_root, and @mnt_point to reference those strings
  easily.

  The struct statmount reserves quite a bit of space currently for
  future extensibility. This isn't really a problem and if this bothers
  us we can just send a follow-up pull request during this cycle.

  listmount() is given a 64bit mount id via mnt_id_req just as
  statmount(). It takes a buffer and a size to return an array of the
  64bit ids of the child mounts of the requested mount. Userspace can
  thus choose to either retrieve child mounts for a mount in batches or
  iterate through the child mounts. For most use-cases it will be
  sufficient to just leave space for a few child mounts. But for big
  mount tables having an iterator is really helpful. Iterating through a
  mount table works by setting @param in mnt_id_req to the mount id of
  the last child mount retrieved in the previous listmount() call"

Link: https://lwn.net/Articles/934469 [1]
Link: https://lwn.net/Articles/829212 [2]
Link: https://lwn.net/Articles/950569 [3]
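
A usage sketch tying the pieces together: obtain the unique 64-bit mount id via statx(STATX_MNT_ID_UNIQUE), then feed it to statmount(). The struct layouts and the exact field carrying the STATMOUNT_* request mask follow my reading of the 6.8 uapi <linux/mount.h> and should be treated as assumptions; error handling is omitted.

  #include <stdio.h>
  #include <string.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/stat.h>       /* struct statx, STATX_MNT_ID_UNIQUE */
  #include <linux/mount.h>      /* struct mnt_id_req, struct statmount, STATMOUNT_* */

  int main(void)
  {
          struct statx sx;
          char buf[4096];
          struct statmount *sm = (struct statmount *)buf;
          struct mnt_id_req req;

          /* 1. Get the unique 64-bit id of the mount containing "/" */
          syscall(__NR_statx, AT_FDCWD, "/", 0, STATX_MNT_ID_UNIQUE, &sx);

          /* 2. Ask statmount() for that mount's filesystem type */
          memset(&req, 0, sizeof(req));
          req.size   = sizeof(req);
          req.mnt_id = sx.stx_mnt_id;
          req.param  = STATMOUNT_FS_TYPE;      /* requested information */
          syscall(__NR_statmount, &req, sm, sizeof(buf), 0);

          if (sm->mask & STATMOUNT_FS_TYPE)
                  printf("fs type: %s\n", sm->str + sm->fs_type);
          return 0;
  }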

* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  add selftest for statmount/listmount
  fs: keep struct mnt_id_req extensible
  wire up syscalls for statmount/listmount
  add listmount(2) syscall
  statmount: simplify string option retrieval
  statmount: simplify numeric option retrieval
  add statmount(2) syscall
  namespace: extract show_path() helper
  mounts: keep list of mounts in an rbtree
  add unique mount ID
2024-01-08 10:57:34 -08:00
Ingo Molnar
cdb3033e19 Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window
This fix didn't make it upstream in time; pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-08 12:57:28 +01:00
Will Deacon
db32cf8e28 Merge branch 'for-next/fixes' into for-next/core
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings()
can be updated to use arm64_kernel_unmapped_at_el0() instead of checking
the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly.

* for-next/fixes:
  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
  perf/arm-cmn: Fail DTC counter allocation correctly
  arm64: Avoid enabling KPTI unnecessarily
2024-01-04 12:32:33 +00:00
Will Deacon
3e8626b4ed Merge branch 'for-next/sysregs' into for-next/core
* for-next/sysregs:
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64/sysreg: Add new system registers for GCS
  arm64/sysreg: Add definition for FPMR
  arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
  arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
  arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
  arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
  arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
  arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
  arm64/sysreg: Add definition for ID_AA64PFR2_EL1
  arm64/sysreg: update CPACR_EL1 register
  arm64/sysreg: add system register POR_EL{0,1}
  arm64/sysreg: Add definition for HAFGRTR_EL2
  arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
2024-01-04 12:28:38 +00:00
Will Deacon
41cff14b03 Merge branch 'for-next/stacktrace' into for-next/core
* for-next/stacktrace:
  arm64: stacktrace: factor out kunwind_stack_walk()
  arm64: stacktrace: factor out kernel unwind state
2024-01-04 12:28:31 +00:00
Will Deacon
30431774fe Merge branch 'for-next/rip-vpipt' into for-next/core
* for-next/rip-vpipt:
  arm64: Rename reserved values for CTR_EL0.L1Ip
  arm64: Kill detection of VPIPT i-cache policy
  KVM: arm64: Remove VPIPT I-cache handling
2024-01-04 12:28:14 +00:00
Will Deacon
65180649fa Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: memory: remove duplicated include
  arm64: Delete the zero_za macro
  Documentation/arch/arm64: Fix typo
2024-01-04 12:27:49 +00:00
Will Deacon
ccaeeec529 Merge branch 'for-next/lpa2-prep' into for-next/core
* for-next/lpa2-prep:
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: mm: Take potential load offset into account when KASLR is off
  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  arm64: Add ARM64_HAS_LPA2 CPU capability
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
  arm64/mm: Modify range-based tlbi to decrement scale
2024-01-04 12:27:42 +00:00
Will Deacon
88619527b4 Merge branch 'for-next/kbuild' into for-next/core
* for-next/kbuild:
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64: replace <asm-generic/export.h> with <linux/export.h>
  arm64: vdso32: rename 32-bit debug vdso to vdso32.so.dbg
2024-01-04 12:27:35 +00:00
Will Deacon
79eb42b269 Merge branch 'for-next/fpsimd' into for-next/core
* for-next/fpsimd:
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  arm64: fpsimd: Drop unneeded 'busy' flag
2024-01-04 12:27:29 +00:00
Andrey Konovalov
27232ba96c kasan/arm64: improve comments for KASAN_SHADOW_START/END
Patch series "kasan: assorted clean-ups".

Code clean-ups, nothing worthy of being backported to stable.


This patch (of 11):

Unify and improve the comments for KASAN_SHADOW_START/END definitions from
include/asm/kasan.h and include/asm/memory.h.

Also put both definitions together in include/asm/memory.h.

Also clarify the related BUILD_BUG_ON checks in mm/kasan_init.c.

Link: https://lkml.kernel.org/r/cover.1703188911.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/140108ca0b164648c395a41fbeecb0601b1ae9e1.1703188911.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 11:58:43 -08:00
Vincent Guittot
9942cb22ea sched/topology: Add a new arch_scale_freq_ref() method
Create a new method to get a unique and fixed max frequency. Currently
cpuinfo.max_freq or the highest (or last) state of performance domain are
used as the max frequency when computing the frequency for a level of
utilization, but:

  - cpuinfo.max_freq can change at runtime; boost is one example of
    such a change.

  - cpuinfo.max_freq and last item of the PD can be different leading to
    different results between cpufreq and energy model.

We need to save the reference frequency that has been used when computing
the CPUs capacity and use this fixed and coherent value to convert between
frequency and CPU's capacity.

In fact, we already save the frequency that has been used when computing
the capacity of each CPU. We extend the precision to save the value in kHz
instead of MHz, and we change its type to match the other variables used
when converting between frequency and capacity.
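
A sketch of the conversion this reference value enables (arch_scale_freq_ref() is the method added here, arch_scale_cpu_capacity() already exists; the wrapper function is illustrative):

  /* sketch: convert a frequency in kHz into capacity units for @cpu */
  static unsigned long freq_to_capacity(int cpu, unsigned long freq_khz)
  {
          unsigned long max_cap = arch_scale_cpu_capacity(cpu);  /* e.g. 1024 */
          unsigned long ref_khz = arch_scale_freq_ref(cpu);      /* fixed reference */

          return freq_khz * max_cap / ref_khz;
  }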

[ mingo: Minor edits. ]

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20231211104855.558096-2-vincent.guittot@linaro.org
2023-12-23 15:52:34 +01:00
Linus Torvalds
a4aebe9365 posix-timers: Get rid of [COMPAT_]SYS_NI() uses
Only the posix timer system calls use this (when the posix timer support
is disabled, which does not actually happen in any normal case), because
they had debug code to print out a warning about missing system calls.

Get rid of that special case, and just use the standard COND_SYSCALL
interface that creates weak system call stubs that return -ENOSYS for
when the system call does not exist.
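
For illustration, the COND_SYSCALL pattern referred to here (timer_create is one of the affected posix timer calls; the macro comes from <linux/syscalls.h>):

  #include <linux/syscalls.h>

  /*
   * Declares a weak sys_timer_create() stub that returns -ENOSYS; when the
   * real syscall is built in, its strong definition takes precedence.
   */
  COND_SYSCALL(timer_create);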

This fixes a kCFI issue with the SYS_NI() hackery:

  CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
  WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0

Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-20 21:30:27 -08:00
Kent Overstreet
04bc786d66 arm64: Fix circular header dependency
Replace linux/percpu.h include with asm/percpu.h to avoid circular
dependency.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-12-20 19:26:30 -05:00
Fuad Tabba
4ebee8cebd arm64/sysreg: Add missing system instruction definitions for FGT
Add the definitions of missing system instructions that are
trappable by fine grain traps. The definitions are based on
DDI0602 2023-09.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20231214100158.2305400-5-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-17 12:11:23 +00:00
Wang Jinchao
5cc5ed7a66 arm64: memory: remove duplicated include
remove duplicated include

Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Link: https://lore.kernel.org/r/202312151439+0800-wangjinchao@xfusion.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-17 12:07:53 +00:00
Miklos Szeredi
d8b0f54650
wire up syscalls for statmount/listmount
Wire up all archs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20231025140205.3586473-7-mszeredi@redhat.com
Reviewed-by: Ian Kent <raven@themaw.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-14 11:49:17 +01:00
Mark Rutland
eb15d707c2 arm64: Align boot cpucap handling with system cpucap handling
Currently the detection+enablement of boot cpucaps is separate from the
patching of boot cpucap alternatives, which means there's a period where
cpus_have_cap($CAP) and alternative_has_cap($CAP) may be mismatched.

It would be preferable to manage the boot cpucaps in the same way as the
system cpucaps, both for clarity and to minimize the risk of accidental
usage of code relying upon an alternative which has not yet been
patched.

This patch aligns the handling of boot cpucaps with the handling of
system cpucaps:

* The existing setup_boot_cpu_capabilities() function is moved to be
  closer to the setup_system_capabilities() and setup_system_features()
  functions so that they're more clearly related and more likely to be
  updated together in future.

* The patching of boot cpucap alternatives is moved into
  setup_boot_cpu_capabilities(), immediately after boot cpucaps are
  detected and enabled.

* A new setup_boot_cpu_features() function is added to mirror
  setup_system_features(); this handles initialization of cpucap data
  structures and calls setup_boot_cpu_capabilities(). This makes
  init_cpu_features() a closer mirror to update_cpu_features(), and
  makes smp_prepare_boot_cpu() a closer mirror to smp_cpus_done().

Importantly, while these changes alter the structure of the code, they
retain the existing order of calls to:

  init_cpu_features(); // prefix initializing feature regs
  init_cpucap_indirect_list();
  detect_system_supports_pseudo_nmi();
  update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
  enable_cpu_capabilities(SCOPE_BOOT_CPU);
  apply_boot_alternatives();

... and hence there should be no functional change as a result of this
patch; this is purely a structural cleanup.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231212170910.3745497-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-13 16:02:01 +00:00
James Houghton
3c0696076a arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
It is currently possible for a userspace application to enter an
infinite page fault loop when using HugeTLB pages implemented with
contiguous PTEs when HAFDBS is not available. This happens because:

1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
   (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
   the memory access on a system without HAFDBS, we will get a page
   fault.

3. HugeTLB will check if it needs to update the dirty bits on the PTE.
   For contiguous PTEs, it will check to see if the pgprot bits need
   updating. In this case, HugeTLB wants to write a sequence of
   sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
   to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
   so it thinks no update is necessary.

We can get the kernel to write a sw-dirty, hw-clean PTE with the
following steps (showing the relevant VMA flags and pgprot bits):

i.   Create a valid, writable contiguous PTE.
       VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
       VMA pgprot bits: PTE_RDONLY | PTE_WRITE
       PTE pgprot bits: PTE_DIRTY | PTE_WRITE

ii.  mprotect the VMA to PROT_NONE.
       VMA vmflags:     VM_SHARED
       VMA pgprot bits: PTE_RDONLY
       PTE pgprot bits: PTE_DIRTY | PTE_RDONLY

iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
       VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
       VMA pgprot bits: PTE_RDONLY | PTE_WRITE
       PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

Make it impossible to create a writeable sw-dirty, hw-clean PTE with
pte_modify(). Such a PTE should be impossible to create, and there may
be places that assume that pte_dirty() implies pte_hw_dirty().
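
The shape of the fix, as a sketch (pte_sw_dirty() and pte_mkdirty() are the existing arm64 helpers; on arm64, pte_mkdirty() also clears PTE_RDONLY for writable PTEs, which is what makes the result hw-dirty):

  /* at the end of arm64's pte_modify(), before returning the new pte: */
  if (pte_sw_dirty(pte))
          pte = pte_mkdirty(pte);   /* sw-dirty + writable => also hw-dirty */
  return pte;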

Signed-off-by: James Houghton <jthoughton@google.com>
Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags")
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-12-12 16:25:26 +00:00
Ard Biesheuvel
2632e25217 arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
Now that kernel mode FPSIMD state is context switched along with other
task state, we can enable the existing logic that keeps track of which
task's FPSIMD state the CPU is holding in its registers. If it is the
context of the task that we are switching to, we can elide the reload of
the FPSIMD state from memory.

Note that we also need to check whether the FPSIMD state on this CPU is
the most recent: if a task gets migrated away and back again, the state
in memory may be more recent than the state in the CPU. So add another
CPU id field to task_struct to keep track of this. (We could reuse the
existing CPU id field used for user mode context, but that might result
in user state being discarded unnecessarily, given that two distinct
CPUs could be holding the most recent user mode state and the most
recent kernel mode state)
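
Conceptually, the check that allows the reload to be elided looks something like this (the per-CPU variable and field names are illustrative, not the actual ones):

  /* sketch: may we skip reloading next's kernel-mode FPSIMD state? */
  static DEFINE_PER_CPU(struct task_struct *, fpsimd_last_kernel_task);  /* illustrative */

  static bool kernel_fpsimd_state_is_live(struct task_struct *next)
  {
          int cpu = smp_processor_id();

          /*
           * The registers on this CPU must hold next's kernel-mode state,
           * and next must not have saved newer state on another CPU since.
           */
          return this_cpu_read(fpsimd_last_kernel_task) == next &&
                 next->thread.kernel_fpsimd_cpu == cpu;
  }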

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231208113218.3001940-9-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:31:55 +00:00
Ard Biesheuvel
aefbab8e77 arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
Currently, the FPSIMD register file is not preserved and restored along
with the general registers on exception entry/exit or context switch.
For this reason, we disable preemption when enabling FPSIMD for kernel
mode use in task context, and suspend the processing of softirqs so that
there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not
be used at all in other contexts).

Disabling preemption while doing CPU intensive work on inputs of
potentially unbounded size is bad for real-time performance, which is
why we try and ensure that SIMD crypto code does not operate on more
than ~4k at a time, which is an arbitrary limit and requires assembler
code to implement efficiently.

We can avoid the need for disabling preemption if we can ensure that any
in-kernel users of the NEON will not lose the FPSIMD register state
across a context switch. And given that disabling softirqs implicitly
disables preemption as well, we will also have to ensure that a softirq
that runs code using FPSIMD can safely interrupt an in-kernel user.

So introduce a thread_info flag TIF_KERNEL_FPSTATE, and modify the
context switch hook for FPSIMD to preserve and restore the kernel mode
FPSIMD to/from struct thread_struct when it is set. This avoids any
scheduling blackouts due to prolonged use of FPSIMD in kernel mode,
without the need for manual yielding.

In order to support softirq processing while FPSIMD is being used in
kernel task context, use the same flag to decide whether the kernel mode
FPSIMD state needs to be preserved and restored before allowing FPSIMD
to be used in softirq context.
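
In pseudo-C, the context-switch behaviour described above looks roughly like this (TIF_KERNEL_FPSTATE is the flag introduced here; the save/restore helper names are illustrative):

  /* sketch of the FPSIMD context-switch hook with the new flag */
  static void fpsimd_thread_switch_sketch(struct task_struct *next)
  {
          if (test_thread_flag(TIF_KERNEL_FPSTATE))
                  /* current is using FPSIMD in kernel mode: preserve that state */
                  save_kernel_fpsimd_state(current);
          else
                  save_user_fpsimd_state(current);

          if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE))
                  restore_kernel_fpsimd_state(next);  /* unless already live */
  }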

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231208113218.3001940-8-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:31:54 +00:00
Ard Biesheuvel
9b19700e62 arm64: fpsimd: Drop unneeded 'busy' flag
Kernel mode NEON will preserve the user mode FPSIMD state by saving it
into the task struct before clobbering the registers. In order to avoid
the need for preserving kernel mode state too, we disallow nested use of
kernel mode NEON, i.e., use in softirq context while the interrupted
task context was using kernel mode NEON too.

Originally, this policy was implemented using a per-CPU flag which was
exposed via may_use_simd(), requiring the users of the kernel mode NEON
to deal with the possibility that it might return false, and having NEON
and non-NEON code paths. This policy was changed by commit
13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
disabled"), and now, softirq processing is disabled entirely instead,
and so may_use_simd() can never fail when called from task or softirq
context.

This means we can drop the fpsimd_context_busy flag entirely, and
instead, ensure that we disable softirq processing in places where we
formerly relied on the flag for preventing races in the FPSIMD preserve
routines.
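
For reference, the usage pattern that this guarantee simplifies (kernel_neon_begin()/kernel_neon_end() and may_use_simd() are the existing arm64 APIs; the wrapper function is illustrative):

  #include <linux/types.h>
  #include <asm/neon.h>
  #include <asm/simd.h>

  static void crypto_block_sketch(u8 *dst, const u8 *src, int len)
  {
          if (may_use_simd()) {
                  /* task/softirq context: may_use_simd() no longer fails here */
                  kernel_neon_begin();
                  /* ... NEON/FPSIMD implementation ... */
                  kernel_neon_end();
          } else {
                  /* ... scalar fallback for other contexts (e.g. hardirq) ... */
          }
  }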

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20231208113218.3001940-7-ardb@google.com
[will: Folded in fix from CAMj1kXFhzbJRyWHELCivQW1yJaF=p07LLtbuyXYX3G1WtsdyQg@mail.gmail.com]
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 14:29:16 +00:00
Ard Biesheuvel
376f5a3bd7 arm64: mm: get rid of kimage_vaddr global variable
We store the address of _text in kimage_vaddr, but since commit
09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
decision"), we no longer reference this variable from modules so we no
longer need to export it.

In fact, we don't need it at all so let's just get rid of it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231129111555.3594833-46-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 11:06:28 +00:00
Ard Biesheuvel
a22fc8e102 arm64: mm: Take potential load offset into account when KASLR is off
We enable CONFIG_RELOCATABLE even when CONFIG_RANDOMIZE_BASE is
disabled, and this permits the loader (i.e., EFI) to place the kernel
anywhere in physical memory as long as the base address is 64k aligned.

This means that the 'KASLR' case described in the header that defines
the size of the statically allocated page tables could take effect even
when CONFIG_RANDOMIZE_BASE=n. So check for CONFIG_RELOCATABLE instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231129111555.3594833-45-ardb@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12 11:06:27 +00:00
Joey Gouly
c0c5a8ea96 arm64/sysreg: add system register POR_EL{0,1}
Add POR_EL{0,1} according to DDI0601 2023-03.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-3-45284e538474@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11 19:03:54 +00:00
Zenghui Yu
86d1921c9d arm64: Delete the zero_za macro
zero_za was introduced in commit ca8a4ebcff44 ("arm64/sme: Manually encode
SME instructions") but doesn't appear to have any in kernel user. Drop it.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231205160140.1438-1-yuzenghui@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11 13:18:06 +00:00
Mark Rutland
1beef60e7d arm64: stacktrace: factor out kernel unwind state
On arm64 we share some unwinding code between the regular kernel
unwinder and the KVM hyp unwinder. Some of this common code only matters
to the regular unwinder, e.g. the `kr_cur` and `task` fields in the
common struct unwind_state.

We're likely to add more state which only matters for regular kernel
unwinding (or only for hyp unwinding). In preparation for such changes,
this patch factors out the kernel-specific state into a new struct
kunwind_state, and updates the kernel unwind code accordingly.

There should be no functional change as a result of this patch.
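
Roughly, the factoring described here looks like this (kr_cur and task are the fields named above; the exact member list is illustrative):

  /* sketch: kernel-only unwind state wrapping the shared core state */
  struct kunwind_state {
          struct unwind_state common;   /* state shared with the KVM hyp unwinder */
          struct llist_node *kr_cur;    /* kretprobe cursor, regular kernel only */
          struct task_struct *task;     /* task being unwound, regular kernel only */
  };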

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11 11:42:55 +00:00
Masahiro Yamada
8fd7588fd4 arm64: replace <asm-generic/export.h> with <linux/export.h>
Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
deprecated <asm-generic/export.h>, which is now a wrapper of
<linux/export.h>.

Replace #include <asm-generic/export.h> with #include <linux/export.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20231126151045.1556686-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05 14:25:30 +00:00
Marc Zyngier
d8e12a0d37 arm64: Kill detection of VPIPT i-cache policy
Since the kernel will never run on a system with the VPIPT i-cache
policy, drop the detection code altogether.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231204143606.1806432-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05 11:38:03 +00:00
Marc Zyngier
ced242ba9d KVM: arm64: Remove VPIPT I-cache handling
We have some special handling for VPIPT I-cache in critical parts
of the cache and TLB maintenance. Remove it.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231204143606.1806432-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05 11:38:03 +00:00
Ryan Roberts
b1366d21da arm64: Add ARM64_HAS_LPA2 CPU capability
Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in the hypervisor.

Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, we combine both into a single
capability, allowing us to simplify the implementation. KVM requires
support in both stages in order to use LPA2 since the same library is
used for hyp stage 1 and guest stage 2 pgtables.
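
A minimal, self-contained sketch of the "both stages or nothing" logic
described above; the helper names and the stubbed detection are
assumptions for illustration, not the kernel's cpufeature code:

  #include <stdbool.h>
  #include <stdint.h>

  /* Stub detectors: the real checks read ID_AA64MMFR0_EL1 fields. */
  static bool stage1_supports_lpa2(uint64_t mmfr0) { (void)mmfr0; return false; }
  static bool stage2_supports_lpa2(uint64_t mmfr0) { (void)mmfr0; return false; }

  /* Report the single combined capability only when both stages
   * advertise LPA2, mirroring the rationale above. */
  static bool has_lpa2(uint64_t mmfr0)
  {
          return stage1_supports_lpa2(mmfr0) && stage2_supports_lpa2(mmfr0);
  }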

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Anshuman Khandual
e477c8c483 arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
PAGE_SIZE support is tested against possible minimum and maximum values for
its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
or unsigned. But the FEAT_LPA2 implementation needs to be validated for 4K
and 16K page sizes via feature-specific ID_AA64MMFR0.TGRAN values. Hence this
adds FEAT_LPA2-specific ID_AA64MMFR0.TGRAN[2] values per the ARM ARM (0487G.A).

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-5-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
c910f2b655 arm64/mm: Update tlb invalidation routines for FEAT_LPA2
FEAT_LPA2 impacts tlb invalidation in two ways. Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value as a
level hint for the 4KB granule (this is due to the extra level of
translation); previously TTL=0b0100 meant no hint and was treated as
0b0000. Secondly, the BADDR field of the range-based tlbi instructions
is specified in 64KB units when LPA2 is in use (TCR.DS=1), whereas it is
in page units otherwise. Changes are required for tlbi to continue to
operate correctly when LPA2 is in use.

Solve the first problem by always adding the level hint if the level is
between [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. While we
are at it, we replace the notion of 0 being the non-hinted sentinel with
a macro, TLBI_TTL_UNKNOWN. This means callers won't need updating
if/when translation depth increases in future.

The second issue is more complex: When LPA2 is in use, use the non-range
tlbi instructions to forward align to a 64KB boundary first, then use
range-based tlbi from there on, until we have either invalidated all
pages or we have a single page remaining. If the latter, that is done
with non-range tlbi. We determine whether LPA2 is in use based on
lpa2_is_enabled() (for kernel calls) or kvm_lpa2_is_enabled() (for kvm
calls).
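
To make the splitting strategy concrete, here is a hedged, self-contained
sketch (4KB pages assumed); tlbi_page() and tlbi_range() are hypothetical
stand-ins for the real instructions, not kernel helpers:

  #include <stdbool.h>
  #include <stdint.h>

  static void tlbi_page(uint64_t va)                  { (void)va; }
  static void tlbi_range(uint64_t va, uint64_t pages) { (void)va; (void)pages; }

  static void invalidate(uint64_t start, uint64_t pages, bool lpa2)
  {
          const uint64_t page_size = 4096, align = 65536;

          /* With LPA2, BADDR is in 64KB units: peel off single pages
           * until the start address is 64KB aligned. */
          if (lpa2) {
                  while (pages > 1 && (start & (align - 1))) {
                          tlbi_page(start);
                          start += page_size;
                          pages--;
                  }
          }

          if (pages == 1)
                  tlbi_page(start);         /* final page: non-range op */
          else if (pages > 1)
                  tlbi_range(start, pages); /* remainder via range op */
  }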

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-4-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
936a4ec281 arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
Add stub functions which initially always return false. These provide
the hooks that we need to update the range-based TLBI routines, whose
operands are encoded differently depending on whether lpa2 is enabled or
not.

The kernel and kvm will enable the use of lpa2 asynchronously in future,
and part of that enablement will involve fleshing out their respective
hooks to advertise when they are using lpa2.

Since the kernel's decision to use lpa2 relies on more than just whether
the HW supports the feature, it can't just use the same static key as
kvm. This is another reason to use separate functions. lpa2_is_enabled()
is already implemented as part of Ard's kernel lpa2 series. Since kvm
will make its decision solely based on HW support, kvm_lpa2_is_enabled()
will be defined as system_supports_lpa2() once kvm starts using lpa2.
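
A minimal sketch of the stubs as described (both simply return false for
now; kvm's is expected to become system_supports_lpa2() later):

  #include <stdbool.h>

  static inline bool lpa2_is_enabled(void)     { return false; }
  static inline bool kvm_lpa2_is_enabled(void) { return false; }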

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-3-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Ryan Roberts
e2768b798a arm64/mm: Modify range-based tlbi to decrement scale
In preparation for adding support for LPA2 to the tlb invalidation
routines, modify the algorithm used by range-based tlbi to start at the
highest 'scale' and decrement instead of starting at the lowest 'scale'
and incrementing. This new approach makes it possible to maintain 64K
alignment as we work through the range, until the last op (at scale=0).
This is required when LPA2 is enabled. (This part will be added in a
subsequent commit).
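
A self-contained sketch of the decrementing-scale walk follows (4KB
pages; a range op at a given scale covers (num + 1) << (5 * scale + 1)
pages with num in [0, 31]). The tlbi helpers are hypothetical stand-ins,
and the total range is assumed to fit a single pass:

  #include <stdint.h>

  static void tlbi_page(uint64_t va)                      { (void)va; }
  static void tlbi_range(uint64_t va, int scale, int num) { (void)va; (void)scale; (void)num; }

  static void flush_range(uint64_t start, uint64_t pages)
  {
          const uint64_t page_size = 4096;

          /* Assumes pages <= 32 << 16, the most one pass can cover;
           * larger ranges would fall back to a full flush. */
          for (int scale = 3; scale >= 0 && pages > 1; scale--) {
                  uint64_t span = 1ULL << (5 * scale + 1);

                  if (pages < span)
                          continue;       /* too few pages for this scale */

                  int num = (int)(pages / span) - 1;      /* 0..31 here */
                  uint64_t done = (uint64_t)(num + 1) * span;

                  tlbi_range(start, scale, num);
                  start += done * page_size;
                  pages -= done;
          }

          if (pages == 1)
                  tlbi_page(start);       /* last page: non-range op */
  }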

This change is separated into its own patch because it will also impact
non-LPA2 systems, and I want to make it easy to bisect in case it leads
to performance regression (see below for benchmarks that suggest this
should not be a problem).

The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for _incrementing_ scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Indeed benchmarking kernel compilation, a TLBI-heavy workload, suggests
that this new approach actually _improves_ performance slightly (using a
virtual machine on Apple M2):

Table shows time to execute kernel compilation workload with 8 jobs,
relative to baseline without this patch (more negative number is
bigger speedup). Repeated 9 times across 3 system reboots:

| counter   |       mean |     stdev |
|:----------|-----------:|----------:|
| real-time |      -0.6% |      0.0% |
| kern-time |      -1.6% |      0.5% |
| user-time |      -0.4% |      0.1% |

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-2-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Will Deacon
acfa60dbe0 arm64: mm: Fix "rodata=on" when CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
When CONFIG_RODATA_FULL_DEFAULT_ENABLED=y, passing "rodata=on" on the
kernel command-line (rather than "rodata=full") should turn off the
"full" behaviour, leaving writable linear aliases of read-only kernel
memory. Unfortunately, the option has no effect in this situation and
the only way to disable the "rodata=full" behaviour is to disable rodata
protection entirely by passing "rodata=off".

Fix this by parsing the "on" and "off" options in the arch code,
additionally enforcing that 'rodata_full' cannot be set without also
setting 'rodata_enabled', allowing us to simplify a couple of checks
in the process.
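
A hedged sketch of the resulting decision table (not the kernel's
parameter plumbing): "full" implies rodata is on, plain "on" keeps rodata
but drops the full linear-alias protection, and "off" clears both, so
'rodata_full' can never be set without 'rodata_enabled':

  #include <stdbool.h>
  #include <string.h>

  static bool rodata_enabled, rodata_full;

  static int parse_rodata(const char *arg)
  {
          if (strcmp(arg, "full") == 0) {
                  rodata_enabled = rodata_full = true;
          } else if (strcmp(arg, "on") == 0) {
                  rodata_enabled = true;
                  rodata_full = false;
          } else if (strcmp(arg, "off") == 0) {
                  rodata_enabled = rodata_full = false;
          } else {
                  return -1;      /* unrecognised value */
          }
          return 0;
  }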

Fixes: 2e8cff0a0eee ("arm64: fix rodata=full")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231117131422.29663-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-22 18:46:05 +00:00
Casey Schaufler
5f42375904 LSM: wireup Linux Security Module syscalls
Wire up lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules
system calls.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-api@vger.kernel.org
Reviewed-by: Mickaël Salaün <mic@digikod.net>
[PM: forward ported beyond v6.6 due to merge window changes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-11-12 22:54:42 -05:00
Linus Torvalds
3ca112b71f Probes fixes for v6.7-rc1:
- Documentation update: Add a note that argument and return value
    fetching is best effort because it depends on the type.
 
  - objpool: Fix to make internal global variables static in
    test_objpool.c.
 
  - kprobes: Unify kprobe_exceptions_notify() prototypes. Some
    architectures declare the same prototype in asm/kprobes.h, but
    others are missing it, which causes a warning. So move the
    prototype into linux/kprobes.h.
 
  - tracing: Fix to check the tracepoint event and return event at the
    parsing stage. The tracepoint event doesn't support %return, but if
    $retval exists it will be converted to %return silently. This finds
    that case and rejects it.
 
  - tracing: Fix the order of the descriptions about the parameters
    of __kprobe_event_gen_cmd_start() to be consistent with the
    argument list of the function.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmVOwAQbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bItMH/0F/vyiirgLrRVvQ+5Tr
 Hm32oc1BQzxnQ0+9bjzk3r90KYk5cysBEEqxKzgxq9/RsJdyCczQUpxYehU0BoZT
 1B4pB5eQ0DwcdGAVk4TyBRYVBb3uhCyyZNXv+F60AsO8i87fHHoJXT9SoKK+Vgx4
 MAklE1gnxFFlRoYCBQpks89NajRx6n3aEL4/oXO3WYSrv+H2WGtZamB+RhpufkDx
 Qx5TkIGnjulcW6J5m7Px5N3z9AX00SbfooZHAae3fqsek5RPNecfc1/WiANNXrSm
 SYsG/i1jcHVvmk2YmCVokVLPKzhCOsKIuiW91rBu/Tu6lqiJmC+fxWxuZqAdXFUi
 +kw=
 =uymB
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Documentation update: Add a note that argument and return value
   fetching is best effort because it depends on the type.

 - objpool: Fix to make internal global variables static in
   test_objpool.c.

 - kprobes: Unify kprobe_exceptions_notify() prototypes. Some
   architectures declare the same prototype in asm/kprobes.h, but some
   are missing it, which causes a warning. So move the prototype into
   linux/kprobes.h.

 - tracing: Fix to check the tracepoint event and return event at the
   parsing stage. The tracepoint event doesn't support %return, but if
   $retval exists it will be converted to %return silently. This finds
   that case and rejects it.

 - tracing: Fix the order of the descriptions about the parameters of
   __kprobe_event_gen_cmd_start() to be consistent with the argument
   list of the function.

* tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/kprobes: Fix the order of argument descriptions
  tracing: fprobe-event: Fix to check tracepoint event and return
  kprobes: unify kprobe_exceptions_notify() prototypes
  lib: test_objpool: make global variables static
  Documentation: tracing: Add a note about argument and retval access
2023-11-10 16:35:04 -08:00
Linus Torvalds
ac347a0655 arm64 fixes:
- Move the MediaTek GIC quirk handling from irqchip to core. Before the
   merge window, commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo
   NMIs on MediaTek devices w/ firmware issues") temporarily addressed
   this issue. It is now fixed at a deeper level in the arch code.
 
 - Reject events meant for other PMUs in the CoreSight PMU driver,
   otherwise some of the core PMU events would disappear.
 
 - Fix the Armv8 PMUv3 driver to not truncate 64-bit registers,
   causing some events to be invisible.
 
 - Remove duplicate declaration of __arm64_sys##name following the patch
   to avoid prototype warning for syscalls.
 
 - Typos in the elf_hwcap documentation.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmVObkoACgkQa9axLQDI
 XvHfiQ//eM5pDYlXTtkD8lqqAMKL5270iig9kN3lbrHO9+fPU0f15tntPyBJbgdv
 mTLTkfw5Uz1WqCuJkDHIL3aqeJU7uphJQgS+X4Js//37txJ0T+soJ2LQ+yCxIhVi
 PrJBcfNe6lz+0j/AeP7548hXt+gmUFIkBrSqy0NYPnhEd9Ly1mkk5Ggvt6e1baU3
 STSjsjFXNl9YtmsiU4Yy3X4n/vrt3rqQzsuq18R51Cw/w/J/CvI2g6z0bhMcThY1
 NHrMJU5xhTfDxOASS2p40HFZau4yCtIvbr0Y0HF1UsXilBXp7F17J7j6Og6+IEO1
 bOTgPnZ9p6faZ4BrNvC8wYNtclonHf5eYyrdf+aUzoyDIXkAtAqAU9lPg1+2+Aiv
 FrRmROtgnLX1upM9fq7/sSX+SUYUZMibtVlt1aNqgQktVUkUc6t0tzaj7lBtnvXQ
 PkUnA7qcUnwsF3r2GbUvYI3mzQfN7hTt924eFOtiDcXjWwrhhXeBI3vQyCwS2JGa
 zl2D+5tw/tERKYXwkNHWw69d9BYu7eVP5cw06nOXk3iDVNW8dJf7J3eUWnqNl7Ss
 nSBdYKgE97MvVWmaeaKWrrOO//zeHKeFoaH1BxxlHRTwhgpi6DWcRccB8F9RqKwe
 eZP1vKW66qH82DpHR9ivQ+OXE1WCDi0ZdcKhi2KYdNtf6wuXssY=
 =c+Yt
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Mostly PMU fixes and a reworking of the pseudo-NMI disabling on broken
  MediaTek firmware:

   - Move the MediaTek GIC quirk handling from irqchip to core. Before
     the merge window, commit 44bd78dd2b88 ("irqchip/gic-v3: Disable
     pseudo NMIs on MediaTek devices w/ firmware issues") temporarily
     addressed this issue. It is now fixed at a deeper level in the arch
     code

   - Reject events meant for other PMUs in the CoreSight PMU driver,
     otherwise some of the core PMU events would disappear

   - Fix the Armv8 PMUv3 driver to not truncate 64-bit registers,
     causing some events to be invisible

   - Remove duplicate declaration of __arm64_sys##name following the
     patch to avoid prototype warning for syscalls

   - Typos in the elf_hwcap documentation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/syscall: Remove duplicate declaration
  Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW"
  arm64: Move MediaTek GIC quirk handling from irqchip to core
  arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
  perf: arm_cspmu: Reject events meant for other PMUs
  Documentation/arm64: Fix typos in elf_hwcaps
2023-11-10 12:22:14 -08:00
Arnd Bergmann
abc28463c8 kprobes: unify kprobe_exceptions_notify() prototypes
Most architectures that support kprobes declare this function in their
own asm/kprobes.h header and provide an override, but some are missing
the prototype, which causes a warning for the __weak stub implementation:

kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes]
 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self,

Move the prototype into linux/kprobes.h so it is visible to all
the definitions.
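
For reference, a sketch of the shared declaration; the signature matches
the weak stub quoted in the warning, with the remaining parameters
assumed from the usual notifier callback shape:

  /* Declared once in a common header so the __weak stub and every
   * architecture override agree on the prototype. */
  struct notifier_block;
  int kprobe_exceptions_notify(struct notifier_block *self,
                               unsigned long val, void *data);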

Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-10 19:59:05 +09:00
Kevin Brodsky
f86128050d arm64/syscall: Remove duplicate declaration
Commit 6ac19f96515e ("arm64: avoid prototype warnings for syscalls")
added missing declarations to various syscall wrapper macros. It
however proved a little too zealous in __SYSCALL_DEFINEx(), as a
declaration for __arm64_sys##name was already present. A declaration
is required before the call to ALLOW_ERROR_INJECTION(), so keep
the original one and remove the new one.

Fixes: 6ac19f96515e ("arm64: avoid prototype warnings for syscalls")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231109141153.250046-1-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-09 17:19:14 +00:00
Ilkka Koskinen
403edfa436 arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
The driver used to truncate several 64-bit registers, such as the
PMCEID[n] registers that describe whether architectural and
microarchitectural events in the range 0x4000-0x401f exist. By
discarding those bits, the driver made the events invisible, even if
they existed.

Moreover, PMCCFILTR and PMCR registers have additional bits in the upper
32 bits. This patch makes them available although they aren't currently
used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are
removed as they are not used at all.
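
An illustrative (non-kernel) example of the bug class: narrowing a
64-bit register value to 32 bits silently discards the upper half, which
is how the 0x4000-0x401f event bits went missing.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t pmceid = (1ULL << 32) | 1ULL;  /* made-up register value */
          uint32_t truncated = (uint32_t)pmceid;  /* bits [63:32] are lost */

          printf("full: %#" PRIx64 " truncated: %#" PRIx32 "\n",
                 pmceid, truncated);
          return 0;
  }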

Fixes: df29ddf4f04b ("arm64: perf: Abstract system register accesses away")
Reported-by: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/..
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-07 11:00:57 +00:00
Linus Torvalds
8f6f76a6a2 As usual, lots of singleton and doubleton patches all over the tree and
there's little I can say which isn't in the individual changelogs.
 
 The lengthier patch series are
 
 - "kdump: use generic functions to simplify crashkernel reservation in
   arch", from Baoquan He.  This is mainly cleanups and consolidation of
   the "crashkernel=" kernel parameter handling.
 
 - After much discussion, David Laight's "minmax: Relax type checks in
   min() and max()" is here.  Hopefully reduces some typecasting and the
   use of min_t() and max_t().
 
 - A group of patches from Oleg Nesterov which clean up and slightly fix
   our handling of reads from /proc/PID/task/...  and which remove
   task_struct.thread_group.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
 jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
 =On4u
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00