34771 Commits

Author SHA1 Message Date
Rafael J. Wysocki
9a2a9ebc0a cpufreq: Introduce governor flags
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10 18:31:17 +01:00
Will Deacon
04e613ded8 arm64: smp: Tell RCU about CPUs that fail to come online
Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
that RCU is informed early about incoming CPUs that might end up calling
into printk() before they are online. However, if such a CPU fails the
early CPU feature compatibility checks in check_local_cpu_capabilities(),
then it will be powered off or parked without informing RCU, leading to
an endless stream of stalls:

  | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  | rcu:	2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
  | (detected by 0, t=5252 jiffies, g=9317, q=136)
  | Task dump for CPU 2:
  | task:swapper/2       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x00000028
  | Call trace:
  | ret_from_fork+0x0/0x30

Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
off or parked.

Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Suggested-by: Qian Cai <cai@redhat.com>
Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck
Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-10 13:11:44 +00:00
Wang Qing
abbaa433de bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare
There is a bug when passing zero to PTR_ERR() and return.
Fix the smatch error.

Fixes: c4d0bfb45068 ("bpf: Add bpf_snprintf_btf helper")
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1604735144-686-1-git-send-email-wangqing@vivo.com
2020-11-09 22:37:19 +01:00
Peter Zijlstra
1908dc9117 perf: Tweak perf_event_attr::exclusive semantics
Currently perf_event_attr::exclusive can be used to ensure an
event(group) is the sole group scheduled on the PMU. One consequence
is that when you have a pinned event (say the watchdog) you can no
longer have regular exclusive event(group)s.

Inspired by the fact that !pinned events are considered less strict,
allow !pinned,exclusive events to share the PMU with pinned,!exclusive
events.

Pinned,exclusive is still fully exclusive.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162902.105962225@infradead.org
2020-11-09 18:12:36 +01:00
Peter Zijlstra
2714c3962f perf: Fix event multiplexing for exclusive groups
Commit 9e6302056f80 ("perf: Use hrtimers for event multiplexing")
placed the hrtimer (re)start call in the wrong place.  Instead of
capturing all scheduling failures, it only considered the PMU failure.

The result is that groups using perf_event_attr::exclusive are no
longer rotated.

Fixes: 9e6302056f80 ("perf: Use hrtimers for event multiplexing")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162902.038667689@infradead.org
2020-11-09 18:12:36 +01:00
Peter Zijlstra
251ff2d493 perf: Simplify group_sched_in()
Collate the error paths. Code duplication only leads to divergence and
extra bugs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162901.972161394@infradead.org
2020-11-09 18:12:35 +01:00
Peter Zijlstra
8c7855d829 perf: Simplify group_sched_out()
Since event_sched_out() clears cpuctx->exclusive upon removal of an
exclusive event (and only group leaders can be exclusive), there is no
point in group_sched_out() trying to do it too. It is impossible for
cpuctx->exclusive to still be set here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162901.904060564@infradead.org
2020-11-09 18:12:35 +01:00
Peter Zijlstra
76a4efa809 perf/arch: Remove perf_sample_data::regs_user_copy
struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09 18:12:34 +01:00
Peter Zijlstra
09da9c8125 perf: Optimize get_recursion_context()
"Look ma, no branches!"

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lkml.kernel.org/r/20201030151955.187580298@infradead.org
2020-11-09 18:12:34 +01:00
Peter Zijlstra
ce0f17fc93 perf: Fix get_recursion_context()
One should use in_serving_softirq() to detect SoftIRQ context.

Fixes: 96f6d4444302 ("perf_counter: avoid recursion")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201030151955.120572175@infradead.org
2020-11-09 18:12:34 +01:00
Peter Zijlstra
267fb27352 perf: Reduce stack usage of perf_output_begin()
__perf_output_begin() has an on-stack struct perf_sample_data in the
unlikely case it needs to generate a LOST record. However, every call
to perf_output_begin() must already have a perf_sample_data on-stack.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-11-09 18:12:33 +01:00
Dan Carpenter
1e106aa350 futex: Don't enable IRQs unconditionally in put_pi_state()
The exit_pi_state_list() function calls put_pi_state() with IRQs disabled
and is not expecting that IRQs will be enabled inside the function.

Use the _irqsave() variant so that IRQs are restored to the original state
instead of being enabled unconditionally.

Fixes: 153fbd1226fb ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201106085205.GA1159983@mwanda
2020-11-09 14:30:30 +01:00
Eddy Wu
b4e00444ca fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent
current->group_leader->exit_signal may change during copy_process() if
current->real_parent exits.

Move the assignment inside tasklist_lock to avoid the race.

Signed-off-by: Eddy Wu <eddy_wu@trendmicro.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-08 11:18:39 -08:00
Linus Torvalds
100e38914a A single fix for the perf core plugging a memory leak in the address filter
parser.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oC4ATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXveD/4mIAq1umSj/YdThj65S6nAHjhW1MjO
 VxGRNSEiUxOld6OnxZlyup1q4NvyYB1cVk71jLjdBIvEK/ANdq3Cd+PVB5tAAoNK
 726zrPlOXx4NJulq2q17ygazZQgOtl/5QsYovS0PymCyyvwfz56yr/OfPRThCe+S
 gJZs+zt8uunhEKz1J7lki/vn7D6bHRdaL8yTLabsPIQ296WXrUd2NS4fuanFNyLO
 kxETMAECyy77q17dgClFFDheyLHAotoySLm6qSwNeWUPFwlCmoCbHgzSBoryIlH5
 DMlwA5iuA5qVv/4jlZ0+WvVWeSWOyF0boZ1cZI+r4nitVvdwnnxTgXcyuTxDSggu
 nygQqLlTIQ31dwQleAfKrooJfwk9nB5uLEQkTwdx0BQwdHhvb7usnNEcIMx8WVFf
 86LorPlfbKrUTD7PWj7zg3x32GNv/nWjuDnzD0NW6BPKqVy6XaRZtaG2WyYfB9YW
 9KXpfsxjDeTBrZK5b/YOna1ah00J+2hwIL7sUhVf24aOmmImy3+Yqe2bQOlR/JGs
 yIBFOAPCWrl7AJOwziMD2NQGCTzk9hqKRyhy1WtPC/L0uohf16/xbnViyL1Ijhud
 Hzu09qoo8EXYbSwRcHQ+qK39s9fwDK1yvPp+k4XGXS/v94YN0W5rd3o49bTMvhxE
 T8CKfwngsBzeYg==
 =r2wz
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
 "A single fix for the perf core plugging a memory leak in the address
  filter parser"

* tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix a memory leak in perf_event_parse_addr_filter()
2020-11-08 10:05:10 -08:00
Linus Torvalds
aaaaa7ecdc A single fix for the futex code where an intermediate state in the
underlying RT mutex was not handled correctly and triggering a BUG()
 instead of treating it as another variant of retry condition.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oCqcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUSvD/kBBV+cSHoLZqWzoYCySxnIgTX5xLgH
 imdhBnGX4MZsHydD1zu9IKuHbOTdRdnf/aXk+6akqyRTqtAxqL3e1BW7zfX2aWo/
 EOk5P36ZLzdqqLrc4Z/FZOLaUAx95t2dse0PA3jXOUTHaxr/bwih+/f1vR84xFBG
 /Sv1TubPLU7koZI057SrZsHmhF7yc22BMvg30UFUgx8tZyXGBIUEYOYS6sgGBwQe
 qXIxTHkpiRwCHu+i44QJiBdhPg8gEb4tsJQLlnmhPYz4m5v/w4HGe7ixZlBQU7f1
 isY4GEuDHRRheOzvP0d5LgcRx6r1hNDDu8MWNjWp7N1sTTKMGnDnZ/a36TgBtlvc
 Go+HVNCapfhSeul8qjZuL022a9Yv2NcsouwdPWfZpMyWuhkXYWnaVHWvuEKsB4UZ
 z0b34q5pULwSz/R4BiqwoLZXdWNsN8x/C31F3v0jBNa/HwK5dTEAgeez0+HNJhDA
 Ci/+CgggLuch6V6JYwOT1AEB4ymEXTuRK/d8IwlVADA3fN0wulUUAnbgVYHRWD4y
 JaXJKWHnw/G8R7AI31aVHa70D64tAM+0UdsLM+4/idP2kc2Cv7EcCdIHz2OUfQBD
 I1/RmugdCPsF5Zutb0dQFNumzxRGk6l+woa7EFnzpkO96GBJDCQiUZKvTFINdCh0
 LoLv49vOBVDifQ==
 =PX5t
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull futex fix from Thomas Gleixner:
 "A single fix for the futex code where an intermediate state in the
  underlying RT mutex was not handled correctly and triggering a BUG()
  instead of treating it as another variant of retry condition"

* tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Handle transient "ownerless" rtmutex state correctly
2020-11-08 09:56:37 -08:00
Linus Torvalds
15a9844458 A set of fixes for interrupt chip drivers:
- Fix the fallout of the IPI as interrupt conversion in Kconfig and the
    BCM2836 interrupt chip driver/
 
  - Fixes for interrupt affinity setting and the handling of hierarchical
    irq domains in the SiFive PLIC driver.
 
  - Make the unmapped event handling in the TI SCI driver work correctly.
 
  - A few minor fixes and cleanups in various chip drivers and Kconfig.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oCiATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUIkEAC9wYn4x5fObSRpamwj0lagGkUsd0dB
 JmKW+u/oUkhM1ZosqPegslA6RDc7LitONFcnqBD348JBrxyP94OnskhBWITv9Nx1
 p4AkLS7HDs0pgOpam66qJdxAmLZ0F0vOPnEgr2VQWqaRV8qlTESNiCxT2wKF7mRd
 qJn5pW3kSJnyfA66cpFVC3Db64KmdWQvv0BCnc1Wqq3odXgOuTvOkqzVrlqJZQ59
 RrZudU0Lz9FtUujQ7AuP+RZY7Ti/AjmEaZDvcsnz3SR6vZGV8qG1f/88RP31d4qK
 62cIOfz/bsl1uxAwnKGM4U84tzYua6djhRZXInlzL4/iYKlm5qVR8Qpotl7IxT5n
 ntMJGhu/Evy987mq1maOR2rWyqVNU5BoVJUeHHibP8LANTKf7sdWOaNqieXQRKYS
 ZnTmoRImBOFhxi0BuqKwwHqJILC4yOZpOa+ARMPqns2KzA4jpAvN+MwPWwpsBVaD
 giVm8e8CQvFgMQjjRHcOkfernSsQs/fyQSYQY9qJI/IzVqTEFcYUONneelJGFB7R
 iDFMURx7aQUPNf8p9c7eEx0BSBUBan+Quul4HQ1I+SnmOZg3QgRvlMm/zPxhnfyU
 3NV4oi4c/fG+7ex+tR3igcfYPCK35BP+6iDG+GKTBjAHimQqk9XXsDYN/Wh3nW6f
 UnsJIs9zdhPJsA==
 =HUPi
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Fix the fallout of the IPI as interrupt conversion in Kconfig and
     the BCM2836 interrupt chip driver

   - Fixes for interrupt affinity setting and the handling of
     hierarchical irq domains in the SiFive PLIC driver

   - Make the unmapped event handling in the TI SCI driver work
     correctly

   - A few minor fixes and cleanups in various chip drivers and Kconfig"

* tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events
  irqchip/ti-sci-inta: Add support for unmapped event handling
  dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling
  irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
  irqchip/sifive-plic: Fix chip_data access within a hierarchy
  irqchip/sifive-plic: Fix broken irq_set_affinity() callback
  irqchip/stm32-exti: Add all LP timer exti direct events support
  irqchip/bcm2836: Fix missing __init annotation
  irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY
  irqchip/mst: Make mst_intc_of_init static
  irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7
  genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
2020-11-08 09:52:57 -08:00
Linus Torvalds
6a8d0d283d A single fix for the generic entry code to correct the wrong assumption
that the lockdep interrupt state needs not to be established before calling
 the RCU check.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oCN8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYgDD/42A/CTltvs1meUm8gpCbACe/mwY4Ms
 4K35p9E/3ozyhjkXbP1odZ6I/rh1pKZYz+qxS7JYDEDgl+3ZfZCeb8WFNspBRvD1
 uURn3ou5aVFHGutjF/5yEzrHP+Vkw8ViOvIjhZ1zpPQ0kWyEP7P5K+0wPFV1BVda
 ugk6T0ouQo7Kl7c7g/EP9tr9UNorih9ZmkJcsZ7fkzDin+5Ine6v9J1VRumSkm11
 4OU7My/c/0EpTuslnW4w4kVYk90B+o6dJccG5SVJVrRHvziD3qhM3slI2UqnIIb+
 /8NTrUy79mikZQh6LG5QrPoX9/+tp68csoFwzE6MpVvV+GTlfnbX7xn2ZW0t7ZSA
 FVYUxYSO3kCzvU2VfqzLc8HtgK/esoMpcgloB0FM9XDw2khlgvnJyTCAnsn44QIP
 tIA+FHGv9oa+QW5NoiRpBtTAYImTfiwaTa5AXCBFijHXsJVMziWDqZ+AQwf58LP9
 RjItnOu0ZKzvLHRYDfqpusIUytFv04YuAn2kJ1vDWL4nXeo3vma2at+s/zbPHOjW
 LZDUeavv7e5pqMDFC1QDja4o5fSfK0Q7ddVP8Xm1yrnZz5jPmQNCGpLxZWx8wsjq
 qxPEUdfqTFvRJQhLHkVznuQUrOVYPVlZfoPro47HNHeHs+qLI1QxmsJHhRu2vd1M
 PwVVBz1vvk4Txg==
 =ZSdr
 -----END PGP SIGNATURE-----

Merge tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull entry code fix from Thomas Gleixner:
 "A single fix for the generic entry code to correct the wrong
  assumption that the lockdep interrupt state needs not to be
  established before calling the RCU check"

* tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Fix the incorrect ordering of lockdep and RCU check
2020-11-08 09:51:28 -08:00
Mike Galbraith
9f5d1c336a futex: Handle transient "ownerless" rtmutex state correctly
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio       Operation
T1   120	lock(F)
T2   120	lock(F)   -> blocks (top waiter)
T3   50 (RT)	lock(F)   -> boosts T1 and blocks (new top waiter)
XX   		timeout/  -> wakes T2
		signal
T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
     			     and the lower priority T2 cannot steal the lock.
     			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
2020-11-07 22:07:04 +01:00
kiyin(尹亮)
7bdb157cde perf/core: Fix a memory leak in perf_event_parse_addr_filter()
As shown through runtime testing, the "filename" allocation is not
always freed in perf_event_parse_addr_filter().

There are three possible ways that this could happen:

 - It could be allocated twice on subsequent iterations through the loop,
 - or leaked on the success path,
 - or on the failure path.

Clean up the code flow to make it obvious that 'filename' is always
freed in the reallocation path and in the two return paths as well.

We rely on the fact that kfree(NULL) is NOP and filename is initialized
with NULL.

This fixes the leak. No other side effects expected.

[ Dan Carpenter: cleaned up the code flow & added a changelog. ]
[ Ingo Molnar: updated the changelog some more. ]

Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu>
Cc: Anthony Liguori <aliguori@amazon.com>
--
 kernel/events/core.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
2020-11-07 13:07:26 +01:00
Jakub Kicinski
86bbf01977 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-11-06

1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.

2) Tighten bpf_lsm function check, from KP.

3) Fix bpftool attaching to flow dissector, from Lorenz.

4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Update verification logic for LSM programs
  bpf: Zero-fill re-used per-cpu map element
  bpf: BPF_PRELOAD depends on BPF_SYSCALL
  tools/bpftool: Fix attaching flow dissector
  libbpf: Fix possible use after free in xsk_socket__delete
  libbpf: Fix null dereference in xsk_socket__delete
  libbpf, hashmap: Fix undefined behavior in hash_bits
  bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
  tools, bpftool: Remove two unused variables.
  tools, bpftool: Avoid array index warnings.
  xsk: Fix possible memory leak at socket close
  bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
  samples/bpf: Set rlimit for memlock to infinity in all samples
  bpf: Fix -Wshadow warnings
  selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================

Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 17:49:34 -08:00
KP Singh
6f64e47783 bpf: Update verification logic for LSM programs
The current logic checks if the name of the BTF type passed in
attach_btf_id starts with "bpf_lsm_", this is not sufficient as it also
allows attachment to non-LSM hooks like the very function that performs
this check, i.e. bpf_lsm_verify_prog.

In order to ensure that this verification logic allows attachment to
only LSM hooks, the LSM_HOOK definitions in lsm_hook_defs.h are used to
generate a BTF_ID set. Upon verification, the attach_btf_id of the
program being attached is checked for presence in this set.

Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution")
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105230651.2621917-1-kpsingh@chromium.org
2020-11-06 13:15:21 -08:00
Lukas Bulwahn
90574a9c02 printk: remove unneeded dead-store assignment
make clang-analyzer on x86_64 defconfig caught my attention with:

  kernel/printk/printk_ringbuffer.c:885:3: warning:
  Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores]

		desc = to_desc(desc_ring, head_id);
		^

Commit b6cf8b3f3312 ("printk: add lockless ringbuffer") introduced
desc_reserve() with this unneeded dead-store assignment.

As discussed with John Ogness privately, this is probably just some minor
left-over from previous iterations of the ringbuffer implementation. So,
simply remove this unneeded dead assignment to make clang-analyzer happy.

As compilers will detect this unneeded assignment and optimize this anyway,
the resulting object code is identical before and after this change.

No functional change. No change to object code.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
2020-11-06 11:05:44 +01:00
David Verbeiren
d3bec0138b bpf: Zero-fill re-used per-cpu map element
Zero-fill element values for all other cpus than current, just as
when not using prealloc. This is the only way the bpf program can
ensure known initial values for all cpus ('onallcpus' cannot be
set when coming from the bpf program).

The scenario is: bpf program inserts some elements in a per-cpu
map, then deletes some (or userspace does). When later adding
new elements using bpf_map_update_elem(), the bpf program can
only set the value of the new elements for the current cpu.
When prealloc is enabled, previously deleted elements are re-used.
Without the fix, values for other cpus remain whatever they were
when the re-used entry was previously freed.

A selftest is added to validate correct operation in above
scenario as well as in case of LRU per-cpu map element re-use.

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: David Verbeiren <david.verbeiren@tessares.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net
2020-11-05 19:55:57 -08:00
Randy Dunlap
7c0afcad75 bpf: BPF_PRELOAD depends on BPF_SYSCALL
Fix build error when BPF_SYSCALL is not set/enabled but BPF_PRELOAD is
by making BPF_PRELOAD depend on BPF_SYSCALL.

ERROR: modpost: "bpf_preload_ops" [kernel/bpf/preload/bpf_preload.ko] undefined!

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105195109.26232-1-rdunlap@infradead.org
2020-11-05 18:49:29 -08:00
Linus Torvalds
3249fe4563 Tracing fixes:
- Fix off-by-one error in retrieving the context buffer for trace_printk()
 
 - Fix off-by-one error in stack nesting limit
 
 - Fix recursion to not make all NMI code false positive as recursing
 
 - Stop losing events in function tracing when transitioning between irq context
 
 - Stop losing events in ring buffer when transitioning between irq context
 
 - Fix return code of error pointer in parse_synth_field() to prevent
   NULL pointer dereference.
 
 - Fix false positive of NMI recursion in kprobe event handling
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX6QqmRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWiAQCKf3xblmQDcge2BLQPy2H7ih5VCSSb
 vDPrQKgLa35pSgEAhGj7mgAKxb6kDbae1BqPorfa/qn9Bi13XfRNTxBK2Qw=
 =zKfd
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix off-by-one error in retrieving the context buffer for
   trace_printk()

 - Fix off-by-one error in stack nesting limit

 - Fix recursion to not make all NMI code false positive as recursing

 - Stop losing events in function tracing when transitioning between irq
   context

 - Stop losing events in ring buffer when transitioning between irq
   context

 - Fix return code of error pointer in parse_synth_field() to prevent
   NULL pointer dereference.

 - Fix false positive of NMI recursion in kprobe event handling

* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  kprobes: Tell lockdep about kprobe nesting
  tracing: Make -ENOMEM the default error for parse_synth_field()
  ring-buffer: Fix recursion protection transitions between interrupt context
  tracing: Fix the checking of stackidx in __ftrace_trace_stack
  ftrace: Handle tracing when switching between context
  ftrace: Fix recursion check for NMI test
  tracing: Fix out of bounds write in get_trace_buf
2020-11-05 11:41:38 -08:00
Linus Torvalds
f786dfa374 Power management fixes for 5.10-rc3.
- Unify the handling of managed and stateless device links in the
    runtime PM framework and prevent runtime PM references to devices
    from being leaked after device link removal (Rafael Wysocki).
 
  - Fix two mistakes in the cpuidle documentation (Julia Lawall).
 
  - Prevent the schedutil cpufreq governor from missing policy
    limits updates in some cases (Viresh Kumar).
 
  - Prevent static OPPs from being dropped by mistake (Viresh Kumar).
 
  - Prevent helper function in the OPP framework from returning
    prematurely (Viresh Kumar).
 
  - Prevent opp_table_lock from being held too long during removal
    of OPP tables with no more active references (Viresh Kumar).
 
  - Drop redundant semicolon from the Intel RAPL power capping
    driver (Tom Rix).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+kAnUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxND0P/j5e/wqFZ/34XEaUa4PIs27TDoJHq1Jh
 PfUxqSBEVO6Xm8TMCtOIuS8gw4x7ehenXC8gFdivy+Fu+4QNkPKYggF5/LgWO8Gl
 cMNzYwJ8shIBQWQZftIx01Bn5tAvk1YhV1mnSNf580Iy7FhKqojMnrvzQnpGD4jR
 piBkvBcbIJWDk+T96RzbqnqmMD0euvlYfzg1KnsyqsOpoRl7kZoH4ahWYskdIxDY
 NtA4SqQ8dvxjTwI8+a+JBb//ua9jjjDyjd7FUV87HMdcVh9rZKbmHKkk4Yv/3C4C
 jZuCTV6zaWIHCbbkZwi+OTNE8q21GgWm1xqwnGWVTFSGrs8ZyfjLs73/g9S5zI3N
 C/n4YUJxiYDVcnrAVwR7kSTEouiQ6mTuChOt7T3r1Ilx67rw4TDsI59Fk1Xn07bZ
 1wfaPASUPsFAxF7vMhkdzsidhpR3BYNgMwmAWo9R4Yvw4evyRN4tywoUQJ1X27zg
 ERir6mFVz6XCnXPrJhyx0bWLo+VD8J/arhfIPSTqHR7wn0tgc23aVeYy7wvfYiiu
 QQJqQ58Aoa0Z7ZpeAv3xbSung+eqQ/dDC9FnXqxdZCI6brYUhrZ19OhsFnEyzGUR
 IDRzcP1/72AxhidpjO71txOJSJFTqKF2LzL/wuS6kMbYwmFjYfj18xX5GvOUuXmL
 hO+E+QdleQFo
 =lKQA
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix the device links support in runtime PM, correct mistakes in
  the cpuidle documentation, fix the handling of policy limits changes
  in the schedutil cpufreq governor, fix assorted issues in the OPP
  (operating performance points) framework and make one janitorial
  change.

  Specifics:

   - Unify the handling of managed and stateless device links in the
     runtime PM framework and prevent runtime PM references to devices
     from being leaked after device link removal (Rafael Wysocki).

   - Fix two mistakes in the cpuidle documentation (Julia Lawall).

   - Prevent the schedutil cpufreq governor from missing policy limits
     updates in some cases (Viresh Kumar).

   - Prevent static OPPs from being dropped by mistake (Viresh Kumar).

   - Prevent helper function in the OPP framework from returning
     prematurely (Viresh Kumar).

   - Prevent opp_table_lock from being held too long during removal of
     OPP tables with no more active references (Viresh Kumar).

   - Drop redundant semicolon from the Intel RAPL power capping driver
     (Tom Rix)"

* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: runtime: Resume the device earlier in __device_release_driver()
  PM: runtime: Drop pm_runtime_clean_up_links()
  PM: runtime: Drop runtime PM references to supplier on link removal
  powercap/intel_rapl: remove unneeded semicolon
  Documentation: PM: cpuidle: correct path name
  Documentation: PM: cpuidle: correct typo
  cpufreq: schedutil: Don't skip freq update if need_freq_update is set
  opp: Reduce the size of critical section in _opp_table_kref_release()
  opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
  opp: Don't always remove static OPPs in _of_add_opp_table_v1()
2020-11-05 11:04:29 -08:00
Thomas Gleixner
9d820f68b2 entry: Fix the incorrect ordering of lockdep and RCU check
When an exception/interrupt hits kernel space and the kernel is not
currently in the idle task then RCU must be watching.

irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
turn invokes lockdep when taking a lock. But at that point lockdep does not
yet know about the fact that interrupts have been disabled by the CPU,
which triggers a lockdep splat complaining about inconsistent state.

Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.

So use the same sequence as for the idle case and tell lockdep about the
irq state change first, invoke the RCU check and then do the lockdep and
tracer update.

Fixes: a5497bab5f72 ("entry: Provide generic interrupt entry/exit code")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de
2020-11-04 18:06:14 +01:00
Steven Rostedt (VMware)
645f224e7b kprobes: Tell lockdep about kprobe nesting
Since the kprobe handlers have protection that prohibits other handlers from
executing in other contexts (like if an NMI comes in while processing a
kprobe, and executes the same kprobe, it will get fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting api to
differentiate between locks taken in INT3 context and other context to
suppress the false warnings.

Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-04 09:46:06 -05:00
Steven Rostedt (VMware)
561ca66910 tracing: Make -ENOMEM the default error for parse_synth_field()
parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.

Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.

Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 15:58:32 -05:00
Steven Rostedt (VMware)
b02414c8f0 ring-buffer: Fix recursion protection transitions between interrupt context
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.

Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.

Cc: stable@vger.kernel.org
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 15:58:32 -05:00
Lukas Bulwahn
3b70ae4f5c kernel/hung_task.c: make type annotations consistent
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.

It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.

Hence, sparse complains:

  kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))

Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:

  kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
  kernel/hung_task.c:277:52:    expected void *
  kernel/hung_task.c:277:52:    got void [noderef] __user *buffer

No functional change. No change in object code.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ignatov <rdna@fb.com>
Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02 12:14:19 -08:00
Zqiang
6993d0fdbe kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:

	CPU0						CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
     __kthread_cancel_work()
        work->canceling++;
					      kthread_delayed_work_timer_fn()
						   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling is
set.

Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02 12:14:19 -08:00
Oleg Nesterov
7b3c36fc4c ptrace: fix task_join_group_stop() for the case when current is traced
This testcase

	#include <stdio.h>
	#include <unistd.h>
	#include <signal.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <pthread.h>
	#include <assert.h>

	void *tf(void *arg)
	{
		return NULL;
	}

	int main(void)
	{
		int pid = fork();
		if (!pid) {
			kill(getpid(), SIGSTOP);

			pthread_t th;
			pthread_create(&th, NULL, tf, NULL);

			return 0;
		}

		waitpid(pid, NULL, WSTOPPED);

		ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
		waitpid(pid, NULL, 0);

		ptrace(PTRACE_CONT, pid, 0,0);
		waitpid(pid, NULL, 0);

		int status;
		int thread = waitpid(-1, &status, 0);
		assert(thread > 0 && thread != pid);
		assert(status == 0x80137f);

		return 0;
	}

fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().

This is because task_join_group_stop() has 2 problems when current is traced:

	1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
	   can be woken up by debugger and it can clone another thread which
	   should join the group-stop.

	   We need to check group_stop_count || SIGNAL_STOP_STOPPED.

	2. If SIGNAL_STOP_STOPPED is already set, we should not increment
	   sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
	   should stop without another do_notify_parent_cldstop() report.

To clarify, the problem is very old and we should blame
ptrace_init_task().  But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.

Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <christian@brauner.io>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02 12:14:19 -08:00
Viresh Kumar
23a881852f cpufreq: schedutil: Don't skip freq update if need_freq_update is set
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.

Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-02 17:57:49 +01:00
Qiujun Huang
906695e593 tracing: Fix the checking of stackidx in __ftrace_trace_stack
The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING
is illegal too. And fix two typos by the way.

Link: https://lkml.kernel.org/r/20201031085714.2147-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 11:21:40 -05:00
Christoph Hellwig
fc0021aa34 swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t.  swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets.  Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.

Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-11-02 10:10:39 -05:00
Stefano Stabellini
e9696d259d swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling

  memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);

If the allocation must fail, no_iotlb_memory is set.

Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.

When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.

Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.

Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.

Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-11-02 10:10:37 -05:00
Steven Rostedt (VMware)
726b3d3f14 ftrace: Handle tracing when switching between context
When an interrupt or NMI comes in and switches the context, there's a delay
from when the preempt_count() shows the update. As the preempt_count() is
used to detect recursion having each context have its own bit get set when
tracing starts, and if that bit is already set, it is considered a recursion
and the function exits. But if this happens in that section where context
has changed but preempt_count() has not been updated, this will be
incorrectly flagged as a recursion.

To handle this case, create another bit call TRANSITION and test it if the
current context bit is already set. Flag the call as a recursion if the
TRANSITION bit is already set, and if not, set it and continue. The
TRANSITION bit will be cleared normally on the return of the function that
set it, or if the current context bit is clear, set it and clear the
TRANSITION bit to allow for another transition between the current context
and an even higher one.

Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 08:52:18 -05:00
Steven Rostedt (VMware)
ee11b93f95 ftrace: Fix recursion check for NMI test
The code that checks recursion will work to only do the recursion check once
if there's nested checks. The top one will do the check, the other nested
checks will see recursion was already checked and return zero for its "bit".
On the return side, nothing will be done if the "bit" is zero.

The problem is that zero is returned for the "good" bit when in NMI context.
This will set the bit for NMIs making it look like *all* NMI tracing is
recursing, and prevent tracing of anything in NMI context!

The simple fix is to return "bit + 1" and subtract that bit on the end to
get the real bit.

Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 08:52:18 -05:00
Qiujun Huang
c1acb4ac1a tracing: Fix out of bounds write in get_trace_buf
The nesting count of trace_printk allows for 4 levels of nesting. The
nesting counter starts at zero and is incremented before being used to
retrieve the current context's buffer. But the index to the buffer uses the
nesting counter after it was incremented, and not its original number,
which in needs to do.

Link: https://lkml.kernel.org/r/20201029161905.4269-1-hqjagain@gmail.com

Cc: stable@vger.kernel.org
Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification")
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02 08:52:18 -05:00
Linus Torvalds
4312e0e8d3 A few fixes for timers/timekeeping:
- Prevent undefined behaviour in the timespec64_to_ns() conversion which
     is used for converting user supplied time input to nanoseconds. It
     lacked overflow protection.
 
   - Mark sched_clock_read_begin/retry() to prevent recursion in the tracer
 
   - Remove unused debug functions in the hrtimer and timerlist code
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evaETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoT69D/0Z3ZGdNQu3mAPYdMSaW7R3MfXnLKno
 biRCwre/T88Exv+Yu612iLthTlfNlxZbofAgUc5BvCwjAjtt+uXlmS8mwXNsbGFd
 516hisUnjJkjbcHI6WxrQQ7+YsK2XLjOeV90TpLs9zMgI94kQ+uzEzqKc6ZaRQP4
 Zi/emqgHeO/RoObNBWjNh9McrHqXllXU/LeqD7y82JXdLPik78PniKFkh7B1d6u8
 RL7kW1TeblkNaHMVjO4/lM9zR1hkZr4GPYbEHIbVN8FMQ10BAD2iswUR72U2mqx7
 zf2Jt+09UEzPRu2gOr4Lrvo6bJQKClu4ZUnRt4apU4BeTgyG0gi7pgy0VZ3M1xtq
 KUnN8dNSMBwTurv9GuGTdiNpfFad2nsnSAra8r7GlZDHxNe1wWIPJmyu/FV1xohD
 yUgRq2s7rvCb894oz159hyIwkI/ZNbFr/AJTCvdzowgf9nf4WS5/Y2/JVLF28LRT
 4xPhvuVUfrhHJajoZayuVvxYSVEPZFek2SgUEIFi6b1+FiMFwV6guv/f31D5GQ+E
 NHgvDMbDsVmzGt++c4zFHoqWvBrgC8Os8AEP8s2oczfvwMt2A8g7nIrGieAux+Jm
 eLO1DfXaXpQi/MmTBf/WD1dKwjuoLV9dT0Fp2y2docu9SXm/bMgRhxX7fm3uDge7
 W87XWkfIxd5fzA==
 =Gdoq
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A few fixes for timers/timekeeping:

   - Prevent undefined behaviour in the timespec64_to_ns() conversion
     which is used for converting user supplied time input to
     nanoseconds. It lacked overflow protection.

   - Mark sched_clock_read_begin/retry() to prevent recursion in the
     tracer

   - Remove unused debug functions in the hrtimer and timerlist code"

* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Prevent undefined behaviour in timespec64_to_ns()
  timers: Remove unused inline funtion debug_timer_free()
  hrtimer: Remove unused inline function debug_hrtimer_free()
  time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
2020-11-01 11:13:45 -08:00
Linus Torvalds
82423b46fc A single fix for stop machine. Mark functions no trace to prevent a crash
caused by recursion when enabling or disabling a tracer on RISC-V (probably
 all architectures which patch through stop machine).
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evLQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQR6D/49PIepVEn63HmgsKdePUShUFzYb+v6
 mPV0iwaDCdW6BKXVEuHkzGdz3dB5ytdIZP6tJLV5a44tYxPhYESO+O4FwAJnqBxD
 JGWLUELxqeEESnjBHww26/YptxqRwcVir2G0WeRDIO0eV2JE424ptdRSisgRVC2s
 Kcu41D9Ry5SLayEKN29GyRXQPrMwLCUTAzeziN4TwJeRB2exSUMg6iDFiGC2uZy6
 IzSu9BXak5T61LL7q5urElkY8a+sGLx6N29QBtSNxazYeXCzCoEtX+1IQ2e9b3PO
 4abFsEtYJ9Mbw4R+iKm0lIpUMKmr/GOPWBs/PDTqK0aYhpc8UnKXTCr6hWLTvtkE
 Anb/08AqN7YtvgbBR+bEN4fPOP4jL5kQVjLbNlIKm5ktRjZ5Sfd9OKOc//c/Bs3e
 Auqe2u+UW2REDsB3lNiVFY+I/1nQUVwKLwfOsBHJW1KzH6O+BhJrJgigUERTeyTf
 B5eLsOIxKgP1vprA/qQgPc/VT25PFIMOtxr7KhH1qPBej9R/wGEOdeOGSdJoiNYM
 aH3kCTUMU0NP8IUSyVQFt4v2IhZ5aF+3rQmyVXifGRl54QCXKgb5gzqcQ/L2SFn+
 I4SkB35hAoEzJ9S4vtx5ZtWPXAO20V644qRMalYv5X739zMVZQ2AyHGB9rhhktmX
 a8ZDIVFCh0ssmw==
 =38RZ
 -----END PGP SIGNATURE-----

Merge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fix from Thomas Gleixner:
 "A single fix for stop machine.

  Mark functions no trace to prevent a crash caused by recursion when
  enabling or disabling a tracer on RISC-V (probably all architectures
  which patch through stop machine)"

* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine, rcu: Mark functions as notrace
2020-11-01 11:11:38 -08:00
Linus Torvalds
8d99084efc A couple of locking fixes:
- Fix incorrect failure injection handling on the fuxtex code
 
  - Prevent a preemption warning in lockdep when tracking local_irq_enable()
    and interrupts are already enabled
 
  - Remove more raw_cpu_read() usage from lockdep which causes state
    corruption on !X86 architectures.
 
  - Make the nr_unused_locks accounting in lockdep correct again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evEUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYod93EADG90GmRBYQxn6y2eQUKE/9f5SMiMFJ
 KdvuqNBqHOhYm3iUPbZJcb0P/JZi1NHP0fBFMishdESGi96tD96K7T04WD0gmjtm
 ArWFroe8uzEYtY9atlEwM0Nrvq0w8ZBLv9x1adXzJ59vB8/8Uq+wzYioSWn9yMcv
 ye3jfVyAlM7ouFHDQAA36s/nhvZfxms4C0t+6S3gjVTIp/6riGuYh5t7dbXUMlnu
 nGLiIJFjU+ekurweVDGpqD/nAxYfqf3UxebWnrosf7iu6suwYwaPFZGZ/kxlbr5e
 qWx0B1RuhjAoefVJlPTkHmuhd0SnH/Gm/tTNkQ3LidJhPTIhLJlb7zffwyZlc510
 VdaUipfZ6bNqDD6/dK6fKJtdKSE4w/z3pT53954NUD5zw/jIcHlgnaQieh72DH+F
 1EKqmsNrwHAxYfMndQxLGdIoBScUAFzHzDnzsY9KKS2cfhChljzLa2nDIfMsDfKQ
 aROugzEbZPQEb1iWUEOF3XopcuZzZQCaPlLDLvAnsBeYEPm0gdmbKFPFsDjOyBVX
 /Qc41O7DyHKcoiLX2zM2c7CxnV5J6YEZz3jQSZLFlpH9Ih7jwAl9/6VirggNUNvV
 YVsgM/myhYQtJBqHHojNppZFFW3KdgfxWuY7+qt7Ox5w/ck5qYQwRnoB4FROwVHV
 pzcYTBE5qkQnIw==
 =S01o
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A couple of locking fixes:

   - Fix incorrect failure injection handling in the fuxtex code

   - Prevent a preemption warning in lockdep when tracking
     local_irq_enable() and interrupts are already enabled

   - Remove more raw_cpu_read() usage from lockdep which causes state
     corruption on !X86 architectures.

   - Make the nr_unused_locks accounting in lockdep correct again"

* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix nr_unused_locks accounting
  locking/lockdep: Remove more raw_cpu_read() usage
  futex: Fix incorrect should_fail_futex() handling
  lockdep: Fix preemption WARN for spurious IRQ-enable
2020-11-01 11:08:17 -08:00
Linus Torvalds
53760f9b74 flexible-array member conversion patches for 5.10-rc2
Hi Linus,
 
 Please, pull the following patches that replace zero-length arrays with
 flexible-array members.
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl+cjRUACgkQRwW0y0cG
 2zGWAhAAjUfTsAmXWhKNaWFSCYR0Q822puTUWOKfiBd+jjGaO04luTtr2gjv2Dkb
 Vgad8H4N8oZU79xfh5JZ5PUyScaso8wE6ZJTh2PLKXpKmNd213f5x/pIt78CCDTa
 Y1L/eR41mmveTL3VNS3sf6WaZpT9owxJKGIY8JgdiOmSjxJQpX5zdaC1KYso4eXr
 lIXIRo9VLEmVLhhHhZi+QmX6+aQ05E1D9K0ENe4/uEnRsV525W78iwZ4fYeLzr+A
 krEOdgx6sPgzajPYnHoayrrcKNKxD5YY1SWuVSm2tqYYIhlRoK3f5xgLOd10RiHE
 YMgx8aWzGmGJwoUhgp1bo/l9EZ7O8OWRqM/GOP4x6Wgjdhqw2x5jgskmhsKNGEXu
 /BlbS+qL5aUrMCxhvNbApuZW6xBiBbva76MH3vU9vFhZbVz1CHLQdGI0tfxggYWS
 jc2UPgoxL9OQlf3jSc+gK7RMFhBGNWn2Aiy8GQas3BxPYXuYPvwOj+irDOG/qZ9D
 VZ5swUw4+th+DsF5K53mEFeLv0fONMgL9Ka5bNR6+k6HG0WNLYYVOiet3xYUDo1f
 eZbMZthfc+QW7R8cwG0WuFk6rC6mLqE+A9nQuLZoJD+VMuJd4pwW9+6EW8nDX08w
 FS4/o92xUFJfOCgaLRS61FSAuSmFENieN+yoKMK/Uf6PJVdNMb4=
 =vyu3
 -----END PGP SIGNATURE-----

Merge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull more flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members"

* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  printk: ringbuffer: Replace zero-length array with flexible-array member
  net/smc: Replace zero-length array with flexible-array member
  net/mlx5: Replace zero-length array with flexible-array member
  mei: hw: Replace zero-length array with flexible-array member
  gve: Replace zero-length array with flexible-array member
  Bluetooth: btintel: Replace zero-length array with flexible-array member
  scsi: target: tcmu: Replace zero-length array with flexible-array member
  ima: Replace zero-length array with flexible-array member
  enetc: Replace zero-length array with flexible-array member
  fs: Replace zero-length array with flexible-array member
  Bluetooth: Replace zero-length array with flexible-array member
  params: Replace zero-length array with flexible-array member
  tracepoint: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
  platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
  mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
  dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
2020-10-31 14:31:28 -07:00
Gustavo A. R. Silva
a38283da05 printk: ringbuffer: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-10-30 16:57:42 -05:00
Linus Torvalds
8843f40550 Power management fixes for 5.10-rc2
- Modify Kconfig to prevent configuring either the "conservative"
    or the "ondemand" governor as the default cpufreq governor if
    intel_pstate is selected, in which case "schedutil" is the
    default choice for the default governor setting (Rafael Wysocki).
 
  - Modify the cpufreq core, intel_pstate and the schedutil governor
    to avoid missing updates of the HWP max limit when intel_pstate
    operates in the passive mode with HWP enabled (Rafael Wysocki).
 
  - Fix max_cstate module parameter handling in intel_idle for
    processor models with C-state tables coming from ACPI (Chen Yu).
 
  - Clean up assorted pieces of power management code (Jackie Zamow,
    Tom Rix, Zhang Qilong).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+cOyISHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxxj0P/1qmvZeLR5ODvSKLURSlzQYPiF4Ud0uo
 7huByO3K63hJ9qpEQZVVzNl0Yuu5vCa7cliebnIRNDHMy4AVOVw5Qf9Uhz/tFlmZ
 ZBvSX1LGWdvMzUnx00AASdHbNOrFy+YyooQ5RfA8sFmtvAhRhPNRW9VCzc9Midoh
 cICSQ6fQDn+0wb+hv89HQlIRvbakM6a8QDL4xI6Y8aS/a5OwOHGJQnC7ZLPIoQLi
 BxD/IEDdNvDiLFPx8OpxJ93NtLaz1tyvWFJHvqSnOREDYB22YkG8WLZftY7h8pz6
 MMbXrZN11/Tex1H0KXDI4YscCBnwxNnFuBo1ur/FX6GvDm2Sm9ybpVHCXxYKux46
 ZFc8K2auqApt98xTlrqKru/rfCB2/fubypUustqpYeCjowBENPFbjyag8bRaKLh3
 awiDwwz1wZDd2mZfEl4iaxJI4gjUwRmPDC1UiT1dlc7nozaDzPxrGGiQghAshOnW
 JVQAiw+evERIFuH5YvJ+/si1sJYhreMjoQ7mICNMmpgJ2ElVvNufpW9aVRQiycp1
 9a0u1ZgGs9DI1580+YTl1nGvsRefdgvMDpozcESwxY8dmMm+027+lyLhwlrUTPfy
 gmakUcb4jbvIicG53huxus7teqBLwrX36HOHBb1l3tSJfOkcQieddGUZxuT8Q9xp
 rAY2mRMRr7SI
 =w+LI
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a few issues related to running intel_pstate in the passive
  mode with HWP enabled, correct the handling of the max_cstate module
  parameter in intel_idle and make a few janitorial changes.

  Specifics:

   - Modify Kconfig to prevent configuring either the "conservative" or
     the "ondemand" governor as the default cpufreq governor if
     intel_pstate is selected, in which case "schedutil" is the default
     choice for the default governor setting (Rafael Wysocki).

   - Modify the cpufreq core, intel_pstate and the schedutil governor to
     avoid missing updates of the HWP max limit when intel_pstate
     operates in the passive mode with HWP enabled (Rafael Wysocki).

   - Fix max_cstate module parameter handling in intel_idle for
     processor models with C-state tables coming from ACPI (Chen Yu).

   - Clean up assorted pieces of power management code (Jackie Zamow,
     Tom Rix, Zhang Qilong)"

* tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set
  cpufreq: Introduce cpufreq_driver_test_flags()
  cpufreq: speedstep: remove unneeded semicolon
  PM: sleep: fix typo in kernel/power/process.c
  intel_idle: Fix max_cstate for processor models without C-state tables
  cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode
  cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag
  cpufreq: Avoid configuring old governors as default with intel_pstate
  cpufreq: e_powersaver: remove unreachable break
2020-10-30 12:45:04 -07:00
Peter Zijlstra
1a39340865 lockdep: Fix nr_unused_locks accounting
Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
usage_traceoverflow") breaks the nr_unused_locks validation code
triggered by /proc/lockdep_stats.

By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
indicator for accounting nr_unused_locks; simplyfy by using any first
bit.

Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net
2020-10-30 17:07:18 +01:00
Peter Zijlstra
d48e385003 locking/lockdep: Remove more raw_cpu_read() usage
I initially thought raw_cpu_read() was OK, since if it is !0 we have
IRQs disabled and can't get migrated, so if we get migrated both CPUs
must have 0 and it doesn't matter which 0 we read.

And while that is true; it isn't the whole store, on pretty much all
architectures (except x86) this can result in computing the address for
one CPU, getting migrated, the old CPU continuing execution with another
task (possibly setting recursion) and then the new CPU reading the value
of the old CPU, which is no longer 0.

Similer to:

  baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net
2020-10-30 17:07:18 +01:00
Rafael J. Wysocki
dea47cf45a Merge branches 'pm-cpuidle' and 'pm-sleep'
* pm-cpuidle:
  intel_idle: Fix max_cstate for processor models without C-state tables

* pm-sleep:
  PM: sleep: fix typo in kernel/power/process.c
2020-10-30 16:30:14 +01:00
Ard Biesheuvel
080b6f4076 bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.

However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and results in all other optimization
options to be forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info to be emitted for the function.

This reverts commit 3193c0836, and instead, it disables the -fgcse
optimization for the entire source file, but only when building for
X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.

Fixes: 3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org
2020-10-29 20:01:46 -07:00