25628 Commits

Author SHA1 Message Date
Kajol Jain
424bcb570c powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
commit 76d588dddc459fefa1da96e0a081a397c5c8e216 upstream.

Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18 11:58:21 +01:00
Michael Jeanson
938791ad58 powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1
commit ad050d2390fccb22aa3e6f65e11757ce7a5a7ca5 upstream.

In v5.7 the powerpc syscall entry/exit logic was rewritten in C, on
PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table
changing from their dot prefixed variant to the non-prefixed ones.

Since ftrace prefixes a dot to the syscall names when matching them to
build its syscall event list, this resulted in no syscall events being
available.

Remove the PPC64_ELF_ABI_V1 specific version of
arch_syscall_match_sym_name to have the same behavior across all powerpc
variants.

Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-07 11:11:48 +01:00
Nathan Lynch
482d990a5d powerpc/rtas: avoid scheduling in rtas_os_term()
[ Upstream commit 6c606e57eecc37d6b36d732b1ff7e55b7dc32dd4 ]

It's unsafe to use rtas_busy_delay() to handle a busy status from
the ibm,os-term RTAS function in rtas_os_term():

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Tainted: G      D            6.0.0-rc5-02182-gf8553a572277-dirty #9
Call Trace:
[c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
[c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
[c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
[c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
[c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
[c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
[c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
[c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
[c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
[c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
[c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

Use rtas_busy_delay_time() instead, which signals without side effects
whether to attempt the ibm,os-term RTAS call again.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-04 11:28:58 +01:00
Nathan Lynch
464d10e8d7 powerpc/rtas: avoid device tree lookups in rtas_os_term()
[ Upstream commit ed2213bfb192ab51f09f12e9b49b5d482c6493f3 ]

rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.

Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-04 11:28:57 +01:00
Nathan Lynch
0ab8499443 powerpc/pseries/eeh: use correct API for error log size
[ Upstream commit 9aafbfa5f57a4b75bafd3bed0191e8429c5fa618 ]

rtas-error-log-max is not the name of an RTAS function, so rtas_token()
is not the appropriate API for retrieving its value. We already have
rtas_get_error_log_max() which returns a sensible value if the property
is absent for any reason, so use that instead.

Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log retrieval")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop no-longer possible error handling as noticed by ajd]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-6-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:52 +01:00
Kajol Jain
01e3641b16 powerpc/hv-gpci: Fix hv_gpci event list
[ Upstream commit 03f7c1d2a49acd30e38789cd809d3300721e9b0e ]

Based on getPerfCountInfo v1.018 documentation, some of the
hv_gpci events were deprecated for platform firmware that
supports counter_info_version 0x8 or above.

Fix the hv_gpci event list by adding a new attribute group
called "hv_gpci_event_attrs_v6" and a "ENABLE_EVENTS_COUNTERINFO_V6"
macro to enable these events for platform firmware
that supports counter_info_version 0x6 or below. And assigning
the hv_gpci event list based on output counter info version
of underlying plaform.

Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:51 +01:00
Yang Yingliang
983e289747 powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe()
[ Upstream commit 4d0eea415216fe3791da2f65eb41399e70c7bedf ]

If platform_device_add() is not called or failed, it can not call
platform_device_del() to clean up memory, it should call
platform_device_put() in error case.

Fixes: 26f6cb999366 ("[POWERPC] fsl_soc: add support for fsl_spi")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221029111626.429971-1-yangyingliang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:51 +01:00
Nicholas Piggin
5429722209 powerpc/perf: callchain validate kernel stack pointer bounds
[ Upstream commit 32c5209214bd8d4f8c4e9d9b630ef4c671f58e79 ]

The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.

The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. Possibly this could cause a crash at least. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.

Fixes: 20002ded4d93 ("perf_counter: powerpc: Add callchain support")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-4-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:51 +01:00
Pali Rohár
e45e3aae8d powerpc: dts: turris1x.dts: Add channel labels for temperature sensor
[ Upstream commit 67bbb62f61e810734da0a1577a9802ddaed24140 ]

Channel 0 of SA56004ED chip refers to internal SA56004ED chip sensor (chip
itself is located on the board) and channel 1 of SA56004ED chip refers to
external sensor which is connected to temperature diode of the P2020 CPU.

Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930123901.10251-1-pali@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:51 +01:00
Nayna Jain
03d7168103 powerpc/pseries: fix plpks_read_var() code for different consumers
[ Upstream commit 1f622f3f80cbf8999ff5955a2fcfbd801a1f32e0 ]

Even though plpks_read_var() is currently called to read variables
owned by different consumers, it internally supports only OS consumer.

Fix plpks_read_var() to handle different consumers correctly.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-7-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Nayna Jain
c156df808e powerpc/pseries: Return -EIO instead of -EINTR for H_ABORTED error
[ Upstream commit bb8e4c7cb759b90a04f2e94056b50288ff46a0ed ]

Some commands for eg. "cat" might continue to retry on encountering
EINTR. This is not expected for original error code H_ABORTED.

Map H_ABORTED to more relevant Linux error code EIO.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-4-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Nayna Jain
1a7657adb0 powerpc/pseries: Fix the H_CALL error code in PLPKS driver
[ Upstream commit af223e1728c448073d1e12fe464bf344310edeba ]

PAPR Spec defines H_P1 actually as H_PARAMETER and maps H_ABORTED to
a different numerical value.

Fix the error codes as per PAPR Specification.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Nayna Jain
b38e5a3c46 powerpc/pseries: fix the object owners enum value in plpks driver
[ Upstream commit 2330757e0be0acad88852e211dcd6106390a729b ]

OS_VAR_LINUX enum in PLPKS driver should be 0x02 instead of 0x01.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-2-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Yang Yingliang
8240299519 powerpc/xive: add missing iounmap() in error path in xive_spapr_populate_irq_data()
[ Upstream commit 8b49670f3bb3f10cd4d5a6dca17f5a31b173ecdc ]

If remapping 'data->trig_page' fails, the 'data->eoi_mmio' need be unmapped
before returning from xive_spapr_populate_irq_data().

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221017032333.1852406-1-yangyingliang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Gustavo A. R. Silva
028631dd53 powerpc/xmon: Fix -Wswitch-unreachable warning in bpt_cmds
[ Upstream commit 1c4a4a4c8410be4a231a58b23e7a30923ff954ac ]

When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable into the case that uses it, which silences the warning:

arch/powerpc/xmon/xmon.c: In function ‘bpt_cmds’:
arch/powerpc/xmon/xmon.c:1529:13: warning: statement will never be executed [-Wswitch-unreachable]
 1529 |         int mode;
      |             ^~~~

Fixes: 09b6c1129f89 ("powerpc/xmon: Fix compile error with PPC_8xx=y")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YySE6FHiOcbWWR+9@work
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Christophe JAILLET
fb3ef6a5af powerpc/52xx: Fix a resource leak in an error handling path
[ Upstream commit 5836947613ef33d311b4eff6a32d019580a214f5 ]

The error handling path of mpc52xx_lpbfifo_probe() has a request_irq()
that is not balanced by a corresponding free_irq().

Add the missing call, as already done in the remove function.

Fixes: 3c9059d79f5e ("powerpc/5200: add LocalPlus bus FIFO device driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dec1496d46ccd5311d0f6e9f9ca4238be11bf6a6.1643440531.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:32:50 +01:00
Michael Ellerman
2e7ec190a0 powerpc/64s: Add missing declaration for machine_check_early_boot()
There's no declaration for machine_check_early_boot(), which leads to a
build failure with W=1. Add one.

Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125132521.2167039-1-mpe@ellerman.id.au
2022-11-26 00:25:32 +11:00
Christophe Leroy
89d21e259a powerpc/bpf/32: Fix Oops on tail call tests
test_bpf tail call tests end up as:

  test_bpf: #0 Tail call leaf jited:1 85 PASS
  test_bpf: #1 Tail call 2 jited:1 111 PASS
  test_bpf: #2 Tail call 3 jited:1 145 PASS
  test_bpf: #3 Tail call 4 jited:1 170 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
  test_bpf: #5 Tail call load/store jited:1
  BUG: Unable to handle kernel data access on write at 0xf1b4e000
  Faulting instruction address: 0xbe86b710
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash PowerMac
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
  Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
  NIP:  be86b710 LR: be857e88 CTR: be86b704
  REGS: f1b4df20 TRAP: 0300   Not tainted  (6.1.0-rc4+)
  MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28008242  XER: 00000000
  DAR: f1b4e000 DSISR: 42000000
  GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
  GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
  GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
  GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
  NIP [be86b710] 0xbe86b710
  LR [be857e88] __run_one+0xec/0x264 [test_bpf]
  Call Trace:
  [f1b4dfe0] [00000002] 0x2 (unreliable)
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 0000000000000000 ]---

This is a tentative to write above the stack. The problem is encoutered
with tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call")

This happens because tail call is done to a BPF prog with a different
stack_depth. At the time being, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.

This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary the maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.

Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32 bits parameter, then tail call better match with
normal BPF function entry, the down side being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.

Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.

With the fix, tail call tests are now successfull:

  test_bpf: #0 Tail call leaf jited:1 53 PASS
  test_bpf: #1 Tail call 2 jited:1 115 PASS
  test_bpf: #2 Tail call 3 jited:1 154 PASS
  test_bpf: #3 Tail call 4 jited:1 165 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
  test_bpf: #5 Tail call load/store jited:1 141 PASS
  test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
  test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
  test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
  test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
  test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu
2022-11-24 23:05:10 +11:00
Nicholas Piggin
eb761a1760 powerpc: Fix writable sections being moved into the rodata region
.data.rel.ro*  catches .data.rel.root_cpuacct, and the kernel crashes on
a store in css_clear_dir. At least we know read-only data protection is
working...

Fixes: b6adc6d6d3272 ("powerpc/build: move .data.rel.ro, .sdata2 to read-only")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221116043954.3307852-1-npiggin@gmail.com
2022-11-16 21:37:14 +11:00
Michael Ellerman
02a771c9a6 powerpc/32: Select ARCH_SPLIT_ARG64
On 32-bit kernels, 64-bit syscall arguments are split into two
registers. For that to work with syscall wrappers, the prototype of the
syscall must have the argument split so that the wrapper macro properly
unpacks the arguments from pt_regs.

The fanotify_mark() syscall is one such syscall, which already has a
split prototype, guarded behind ARCH_SPLIT_ARG64.

So select ARCH_SPLIT_ARG64 to get that prototype and fix fanotify_mark()
on 32-bit kernels with syscall wrappers.

Note also that fanotify_mark() is the only usage of ARCH_SPLIT_ARG64.

Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221101034852.2340319-1-mpe@ellerman.id.au
2022-11-01 15:27:12 +11:00
Andreas Schwab
ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Fixes: e23750623835 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Michael Ellerman
2153fc9623 powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
There's a build failure for Book3E without AltiVec:
  Error: cc1: error: AltiVec not supported in this target
  make[6]: *** [/linux/scripts/Makefile.build:250:
  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o] Error 1

This happens because the amdgpu build is only gated by
PPC_LONG_DOUBLE_128, but that symbol can be enabled even though AltiVec
is disabled.

The only user of PPC_LONG_DOUBLE_128 is amdgpu, so just add a dependency
on AltiVec to that symbol to fix the build.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221027125626.1383092-1-mpe@ellerman.id.au
2022-10-31 17:39:44 +11:00
Nicholas Piggin
65722736c3 powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
Commit a4cb3651a1743 ("powerpc/64s/interrupt: Fix lost interrupts when
returning to soft-masked context") fixed the problem of pending irqs
being cleared when clearing the HARD_DIS bit, but then it didn't clear
the bit at all. This change clears HARD_DIS without affecting other bits
in the mask.

When an interrupt hits in a soft-masked section that has MSR[EE]=1, it
can hard disable and set PACA_IRQS_HARD_DIS, which must be cleared when
returning to the EE=1 caller (unless it was set due to a MUST_HARD_MASK
interrupt becoming pending). Failure to clear this leaves the
returned-to context running with MSR[EE]=1 and PACA_IRQS_HARD_DIS, which
confuses irq assertions and could be dangerous for code that might test
the flag.

This was observed in a hash MMU kernel where a kernel hash fault hits in
a local_irqs_disabled region that has EE=1. The hash fault also runs
with EE=1, then as it returns, a decrementer hits in the restart section
and the irq restart code hard-masks which sets the PACA_IRQ_HARD_DIS
flag, which is not clear when the original context is returned to.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: a4cb3651a1743 ("powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221022052207.471328-1-npiggin@gmail.com
2022-10-27 00:38:35 +11:00
Nicholas Piggin
dc398a084d powerpc/64s/interrupt: Perf NMI should not take normal exit path
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash in fixups from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006140413.126443-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
a073672eb0 powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
NMI PMIs really should not return using the normal interrupt_return
function. If such a PMI hits in code returning to user with the context
switched to user mode, this warning can fire. This was enough to cause
crashes when reproducing on 64s, because another perf interrupt would
hit while reporting bug, and that would cause another bug, and so on
until smashing the stack.

Work around that particular crash for now by just disabling that context
warning for PMIs. This is a hack and not a complete fix, there could be
other such problems lurking in corners. But it does fix the known crash.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
e59b3399fd KVM: PPC: BookS PR-KVM and BookE do not support context tracking
The context tracking code in PR-KVM and BookE implementations is not
complete, and can cause host crashes if context tracking is enabled.

Make these implementations depend on !CONTEXT_TRACKING_USER.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-2-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
00ff1eaac1 powerpc: Fix reschedule bug in KUAP-unlocked user copy
schedule must not be explicitly called while KUAP is unlocked, because
the AMR register will not be saved across the context switch on
64s (preemption is allowed because that is driven by interrupts which do
save the AMR).

exit_vmx_usercopy() runs inside an unlocked user access region, and it
calls preempt_enable() which will call schedule() if need_resched() was
set while non-preemptible. This can cause tasks to run unprotected when
the should not, and can cause the user copy to be improperly blocked
when scheduling back to it.

Fix this by avoiding the explicit resched for preempt kernels by
generating an interrupt to reschedule the context if need_resched() got
set.

Reported-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
2b2095f3a6 powerpc/64s: Fix hash__change_memory_range preemption warning
stop_machine_cpuslocked takes a mutex so it must be called in a
preemptible context, so it can't simply be fixed by disabling
preemption.

This is not a bug, because CPU hotplug is locked, so this processor will
call in to the stop machine function. So raw_smp_processor_id() could be
used. This leaves a small chance that this thread will be migrated to
another CPU, so the master work would be done by a CPU from a different
context. Better for test coverage to make that a common case by just
having the first CPU to call in become the master.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-2-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin
b9ef323ea1 powerpc/64s: Disable preemption in hash lazy mmu mode
apply_to_page_range on kernel pages does not disable preemption, which
is a requirement for hash's lazy mmu mode, which keeps track of the
TLBs to flush with a per-cpu array.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-1-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin
b12eb279ff powerpc/64s: make linear_map_hash_lock a raw spinlock
This lock is taken while the raw kfence_freelist_lock is held, so it
must also be a raw spinlock, as reported by lockdep when raw lock
nesting checking is enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-3-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin
35159b5717 powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe
With kfence enabled, there are several cases where HPTE and TLBIE locks
are called from softirq context, for example:

  WARNING: inconsistent lock state
  6.0.0-11845-g0cbbc95b12ac #1 Tainted: G                 N
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
  c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
  {IN-SOFTIRQ-W} state was registered at:
    .lock_acquire+0x20c/0x520
    ._raw_spin_lock+0x4c/0x70
    .native_hpte_invalidate+0x62c/0x840
    .hash__kernel_map_pages+0x450/0x640
    .kfence_protect+0x58/0xc0
    .kfence_guarded_free+0x374/0x5a0
    .__slab_free+0x3d0/0x630
    .put_cred_rcu+0xcc/0x120
    .rcu_core+0x3c4/0x14e0
    .__do_softirq+0x1dc/0x7dc
    .do_softirq_own_stack+0x40/0x60

Fix this by consistently disabling irqs while taking either of these
locks. Don't just disable bh because several of the more common cases
already disable irqs, so this just makes the locks always irq-safe.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-2-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin
be83d5485d powerpc/64s: Add lockdep for HPTE lock
Add lockdep annotation for the HPTE bit-spinlock. Modern systems don't
take the tlbie lock, so this shows up some of the same lockdep warnings
that were being reported by the ppc970. And they're not taken in exactly
the same places so this is nice to have in its own right.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-1-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Haren Myneni
2147783d6b powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU
The hypervisor assigns VAS (Virtual Accelerator Switchboard)
windows depends on cores configured in LPAR. The kernel uses
OF reconfig notifier to reconfig VAS windows for DLPAR CPU event.
In the case of shared CPU mode partition, the hypervisor assigns
VAS windows depends on CPU entitled capacity, not based on vcpus.
When the user changes CPU entitled capacity for the partition,
drmgr uses /proc/ppc64/lparcfg interface to notify the kernel.

This patch adds the following changes to update VAS resources
for shared mode:
- Call vas reconfig windows from lparcfg_write()
- Ignore reconfig changes in the VAS notifier

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Rework error handling, report any errors as EIO]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/efa9c16e4a78dda4567a16f13dabfd73cb4674a2.camel@linux.ibm.com
2022-10-18 22:46:18 +11:00
Haren Myneni
89ed0b769d powerpc/pseries/vas: Add VAS IRQ primary handler
irq_default_primary_handler() can be used only with IRQF_ONESHOT
flag, but the flag disables IRQ before executing the thread handler
and enables it after the interrupt is handled. But this IRQ disable
sets the VAS IRQ OFF state in the hypervisor. In case if NX faults
during this window, the hypervisor will not deliver the fault
interrupt to the partition and the user space may wait continuously
for the CSB update. So use VAS specific IRQ handler instead of
calling the default primary handler.

Increment pending_faults counter in IRQ handler and the bottom
thread handler will process all faults based on this counter.
In case if the another interrupt is received while the thread is
running, it will be processed using this counter. The synchronization
of top and bottom handlers will be done with IRQTF_RUNTHREAD flag
and will re-enter to bottom half if this flag is set.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaad8813b4762a6753cfcd0b605a7574a5192ec7.camel@linux.ibm.com
2022-10-18 22:46:18 +11:00
Linus Torvalds
f1947d7c8a Random number generator fixes for Linux 6.1-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
 A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
 qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
 R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
 lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
 MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
 8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
 XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
 Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
 vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
 4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
 lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
 =M+mV
 -----END PGP SIGNATURE-----

Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds
5e714bf171 - Alistair Popple has a series which addresses a race which causes page
refcounting errors in ZONE_DEVICE pages.
 
 - Peter Xu fixes some userfaultfd test harness instability.
 
 - Various other patches in MM, mainly fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0j6igAKCRDdBJ7gKXxA
 jnGxAP99bV39ZtOsoY4OHdZlWU16BUjKuf/cb3bZlC2G849vEwD+OKlij86SG20j
 MGJQ6TfULJ8f1dnQDd6wvDfl3FMl7Qc=
 =tbdp
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

 - fix userfaultfd test harness instability (Peter Xu)

 - various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...
2022-10-14 12:28:43 -07:00
Linus Torvalds
70609c1495 powerpc fixes for 6.1 #2
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs.
    Notably this broke ftruncate64 & pread/write64, which can lead to file corruption.
 
  - Fix lost interrupts when returning to soft-masked context on 64-bit.
 
  - Fix build failure when CONFIG_DTL=n.
 
 Thanks to: Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNJQp8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPd3D/9+uPapAs6/ZGkrbojseiF/fZNyIfTF
 HcLZ3iWUP9+APuaRxd1m3HbBoWdgFQ9yC5abYnX1z+E/yEhbTaOof9Gu1vXfxbIF
 Orlc+X12uZMrdU1L2qlgk/89QLeKOTlp/b2eUXJ332Vl4DfJDoknMsqTqbMz9NrL
 EDj6vmG3JzJgKHmplwxd7d4CExQNh9khZr3lZ6Ae/hC1cZ1PVXD67akwxsc0i8E2
 dBGzwsP1SYXPrBMXONT1GNs/whXrKBHNyLWFueab3p0FsSkbc4sY/HaAPzjq7YzK
 aPV0FQi2YpIZ5IIunZA+vW+tQ8EvyOpOUPMGb7iiN/31YcFbf/k6Y92AlCFlHRq3
 9vvsNGzDeFBfvH2wWcR1XaAMQHmBpC4L+dYbCKIyuK9DjgZtpZUV3acagRMFXqMN
 TeN9h2GXAYxCdNc27I5v3wAvbNcx69E4Bs5F7voKGD4pb7V/4d/li2oix6v0FKIy
 qAt1PGhD4IgPE3oJOEOYsJRVQNyDfG4eKlFh5ijyyc5hQeGNeKiAH+G+a61R3ueo
 GvVOdkR/1QxK1XkbgrMyclb0nHN1kEjRHeIcMFW1+dMKA9UvOlJ6u8nkOFRvA3js
 r00O1uE7DT5WzdGXgEHCCDd7PybBKxpM32m4/vzgDM7xG+DHxp3j7LTQsDOfkFsH
 bkc31LqrAw3yDQ==
 =B+7s
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

 - Fix lost interrupts when returning to soft-masked context on 64-bit.

 - Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
2022-10-14 11:16:18 -07:00
Nicholas Piggin
90d5ce82e1 powerpc/pseries: Fix CONFIG_DTL=n build
The recently moved dtl code must be compiled-in if
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y even if CONFIG_DTL=n.

Fixes: 6ba5aa541aaa0 ("powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013073131.1485742-1-npiggin@gmail.com
2022-10-13 22:30:07 +11:00
Nicholas Piggin
a4cb3651a1 powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
2022-10-13 22:29:49 +11:00
Alistair Popple
ef23345089 mm: free device private pages have zero refcount
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use.  However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.

This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount.  Instead we should return
pages to the drivers page allocator with a zero reference count.  Kernel
code can then safely use kernel functions such as get_page_unless_zero().

Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00
Alistair Popple
16ce101db8 mm/memory.c: fix race when faulting a device private page
Patch series "Fix several device private page reference counting issues",
v2

This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages.  These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems.  However
without these fixes it is possible to crash the kernel from userspace. 
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting. 
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. 
Unfortunately I lack hardware to test either of those so any help there
would be appreciated.  The changes mimic what is done in for both Nouveau
and hmm-tests though so I doubt they will cause problems.


This patch (of 8):

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called.  However no
reference is taken on the faulting page.  Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap.  This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid.  It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed.  Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail.  To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.

[mpe@ellerman.id.au: fix build]
  Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Nicholas Piggin
e237506238 powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.

The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.

With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.

Fix this by having 32-bit kernels share those syscall definitions with
compat.

Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72452 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
2022-10-13 00:49:58 +11:00
Jason A. Donenfeld
197173db99 treewide: use get_random_bytes() when possible
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:58 -06:00
Jason A. Donenfeld
81895a65ec treewide: use prandom_u32_max() when possible, part 1
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:55 -06:00
Joel Stanley
ae5b6779fa powerpc: Fix 85xx build
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.

Fixes: 8afc66e8d43b ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-11 10:13:34 -07:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
3604a7f568 This update includes the following changes:
API:
 
 - Feed untrusted RNGs into /dev/random.
 - Allow HWRNG sleeping to be more interruptible.
 - Create lib/utils module.
 - Setting private keys no longer required for akcipher.
 - Remove tcrypt mode=1000.
 - Reorganised Kconfig entries.
 
 Algorithms:
 
 - Load x86/sha512 based on CPU features.
 - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher.
 
 Drivers:
 
 - Add HACE crypto driver aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmM785cACgkQxycdCkmx
 i6dveBAAmGVYtrPmcGfA6CmzZ8ps9KdZxhjHjzLKwuqrOMulZvE2IYeUV4QtNqpQ
 6NLY2+TkqL0XIbCXoByIk32lMYIlXBaJdMYdHHDTeo7E2wqZn/46SPSWeNKazyJx
 dkL8Oj62nqDc2s0LOi3vLvod+sENFQ69R+vkHOa0fZhX0UBsac3NIXo+74Y2A7bE
 0+iQFKTWdNnoQzQ0j4q8WMiolKYh21iPZ9l5sjgMgichLCaE6PrITlRcaWrtPhey
 U1OmJtbTPsg+5X1r9KyLtoAXtBDONl66GQyne+p/ZYD8cMhxomjJaPlMhwWE/n4d
 d2KJKvoXoPPo4c+yNIS9hBav07ZriPl0q0jd2M1rd6oYTmFpaodTgIBfjvxO+wfV
 GoqDS8PEc42U1uwkuKC/cvfr6pB8WiybfXy+vSXBm/jUgIOO3y+eqsC8Jx9ZoQeG
 F+d34PYfJrJbmDRtcA6ZKdzN0OmKq7aCilx1kGKGPg0D+uq64FBo7zsT6XzTK8HL
 2Za9AACPn87xLQwGrKDSBfyrlSSIJm2FaIIPayUXHEo7cyoiZwbTpXRRJ1mDR+v9
 jzI+xPEXCthtjysuRmufNhTkiZUv3lZ8ORfQ0QFKR53tjZUm+dVQo0V/N/ZSXoSV
 SyRvXYO+ToXePAofNWl1LcO1grX/vxtFNedMkDLHXooRcnCaIYo=
 =rq2f
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Feed untrusted RNGs into /dev/random
   - Allow HWRNG sleeping to be more interruptible
   - Create lib/utils module
   - Setting private keys no longer required for akcipher
   - Remove tcrypt mode=1000
   - Reorganised Kconfig entries

  Algorithms:
   - Load x86/sha512 based on CPU features
   - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher

  Drivers:
   - Add HACE crypto driver aspeed"

* tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits)
  crypto: aspeed - Remove redundant dev_err call
  crypto: scatterwalk - Remove unused inline function scatterwalk_aligned()
  crypto: aead - Remove unused inline functions from aead
  crypto: bcm - Simplify obtain the name for cipher
  crypto: marvell/octeontx - use sysfs_emit() to instead of scnprintf()
  hwrng: core - start hwrng kthread also for untrusted sources
  crypto: zip - remove the unneeded result variable
  crypto: qat - add limit to linked list parsing
  crypto: octeontx2 - Remove the unneeded result variable
  crypto: ccp - Remove the unneeded result variable
  crypto: aspeed - Fix check for platform_get_irq() errors
  crypto: virtio - fix memory-leak
  crypto: cavium - prevent integer overflow loading firmware
  crypto: marvell/octeontx - prevent integer overflows
  crypto: aspeed - fix build error when only CRYPTO_DEV_ASPEED is enabled
  crypto: hisilicon/qm - fix the qos value initialization
  crypto: sun4i-ss - use DEFINE_SHOW_ATTRIBUTE to simplify sun4i_ss_debugfs
  crypto: tcrypt - add async speed test for aria cipher
  crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher
  crypto: aria - prepare generic module for optimized implementations
  ...
2022-10-10 13:04:25 -07:00
Linus Torvalds
d4013bc4d4 bitmap patches for v6.1-rc1
From Phil Auld:
 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
 
 From me:
 cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess
 
 This series cleans that mess and adds new config FORCE_NR_CPUS that
 allows to optimize cpumask subsystem if the number of CPUs is known
 at compile-time.
 
 From me:
 lib: optimize find_bit() functions
 
 Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.
 
 From me:
 lib/find: add find_nth_bit()
 
 Adds find_nth_bit(), which is ~70 times faster than bitcounting with
 for_each() loop:
         for_each_set_bit(bit, mask, size)
                 if (n-- == 0)
                         return bit;
 
 Also adds bitmap_weight_and() to let people replace this pattern:
 	tmp = bitmap_alloc(nbits);
 	bitmap_and(tmp, map1, map2, nbits);
 	weight = bitmap_weight(tmp, nbits);
 	bitmap_free(tmp);
 with a single bitmap_weight_and() call.
 
 From me:
 cpumask: repair cpumask_check()
 
 After switching cpumask to use nr_cpu_ids, cpumask_check() started
 generating many false-positive warnings. This series fixes it.
 
 From Valentin Schneider:
 bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot()
 
 Extends the API with one more function and applies it in sched/core.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof
 vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow
 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD
 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6
 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc
 /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001
 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm
 r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV
 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3
 GR7sMQae
 =Q/+F
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds new config FORCE_NR_CPUS that
   allows to optimize cpumask subsystem if the number of CPUs is known
   at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with
   for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
   Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
2022-10-10 12:49:34 -07:00
Linus Torvalds
8afc66e8d4 Kbuild updates for v6.1
- Remove potentially incomplete targets when Kbuid is interrupted by
    SIGINT etc. in case GNU Make may miss to do that when stderr is piped
    to another program.
 
  - Rewrite the single target build so it works more correctly.
 
  - Fix rpm-pkg builds with V=1.
 
  - List top-level subdirectories in ./Kbuild.
 
  - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in kallsyms.
 
  - Avoid two different modules in lib/zstd/ having shared code, which
    potentially causes building the common code as build-in and modular
    back-and-forth.
 
  - Unify two modpost invocations to optimize the build process.
 
  - Remove head-y syntax in favor of linker scripts for placing particular
    sections in the head of vmlinux.
 
  - Bump the minimal GNU Make version to 3.82.
 
  - Clean up misc Makefiles and scripts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmM+4vcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGY2IQAInr0JUNnkkxwUSXtOcQuA3IK8RJ
 FbU9HXJRoV9H+7+l3SMlN7mIbrs5eE5fTY3iwQ3CVe139d1+1q7nvTMRv8owywJx
 GBgzswncuu1lk7iQQ//CxiqMwSCG8GJdYn1uDVy4I5jg3o+DtFZJtyq2Wb7pqsMm
 ZhZ4PozRN+idYQJSF6Vx/zEVLHI7quMBwfe4CME8/0Kg2+hnYzbXV/aUf0ED2emq
 zdCMDQgIOK5AhY+8qgMXKYnBUJMTqBp6LoR4p3ApfUkwRFY0sGa0/LK3U/B22OE7
 uWyR4fCUExGyerlcHEVev+9eBfmsLLPyqlchNwpSDOPf5OSdnKmgqJEBR/Cvx0eh
 URerPk7EHxyH3G8yi+cU2GtofNTGc5RHPRgJE2ADsQEi5TAUKGmbXMlsFRL/51Vn
 lTANZObBNa1d4enljF6TfTL5nuccOa+DKvXnH9fQ49t0QdtSikv6J/lGwilwm1Sr
 BctmCsySPuURZfkpI9OQnLuouloMXl9f7Q/+S39haS/tSgvPpyITyO71nxDnXn/s
 BbFObZJUk9QkqOACjBP1hNErTLt83uBxQ9z+rDCw/SbLIe4nw0wyneuygfHI5rI8
 3RZB2DbGauuJHX2Zs6YGS14SLSY33IsLqKR1/Vy3LrPvOHuEvNiOR8LITq5E0YCK
 OffK2Y5cIlXR0QWf
 =DHiN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove potentially incomplete targets when Kbuid is interrupted by
   SIGINT etc in case GNU Make may miss to do that when stderr is piped
   to another program.

 - Rewrite the single target build so it works more correctly.

 - Fix rpm-pkg builds with V=1.

 - List top-level subdirectories in ./Kbuild.

 - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in
   kallsyms.

 - Avoid two different modules in lib/zstd/ having shared code, which
   potentially causes building the common code as build-in and modular
   back-and-forth.

 - Unify two modpost invocations to optimize the build process.

 - Remove head-y syntax in favor of linker scripts for placing
   particular sections in the head of vmlinux.

 - Bump the minimal GNU Make version to 3.82.

 - Clean up misc Makefiles and scripts.

* tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
  docs: bump minimal GNU Make version to 3.82
  ia64: simplify esi object addition in Makefile
  Revert "kbuild: Check if linker supports the -X option"
  kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
  kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
  zstd: Fixing mixed module-builtin objects
  kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
  kallsyms: take the input file instead of reading stdin
  kallsyms: drop duplicated ignore patterns from kallsyms.c
  kbuild: reuse mksysmap output for kallsyms
  mksysmap: update comment about __crc_*
  kbuild: remove head-y syntax
  kbuild: use obj-y instead extra-y for objects placed at the head
  kbuild: hide error checker logs for V=1 builds
  kbuild: re-run modpost when it is updated
  kbuild: unify two modpost invocations
  kbuild: move vmlinux.o rule to the top Makefile
  kbuild: move .vmlinux.objs rule to Makefile.modpost
  kbuild: list sub-directories in ./Kbuild
  Makefile.compiler: replace cc-ifversion with compiler-specific macros
  ...
2022-10-10 12:00:45 -07:00