27383 Commits

Author SHA1 Message Date
Peter Zijlstra
a22e47a4e3 sched/core: Convert nohz_flags to atomic_t
Using atomic_t allows us to use the more flexible bitops provided
there. Also its smaller.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 07:59:16 +01:00
Peter Zijlstra
8f111bc357 cpufreq/schedutil: Rewrite CPUFREQ_RT support
Instead of trying to duplicate scheduler state to track if an RT task
is running, directly use the scheduler runqueue state for it.

This vastly simplifies things and fixes a number of bugs related to
sugov and the scheduler getting out of sync wrt this state.

As a consequence we not also update the remove cfs/dl state when
iterating the shared mask.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 07:59:15 +01:00
Peter Zijlstra
4042d003a0 cpufreq/schedutil: Remove unused CPUFREQ_DL
Bitrot...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 07:59:14 +01:00
Norbert Manthey
13a453c241 sched/fair: Add ';' after label attributes
Due to using GCC defines for configuration, some labels might be unused in
certain configurations. While adding a __maybe_unused to the label is
fine in general, the line has to be terminated with ';'. This is also
reflected in the GCC documentation, but GCC parsed the previous variant
without an error message.

This has been spotted while compiling with goto-cc, the compiler for the
CPROVER tool suite.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Signed-off-by: Michael Tautschnig <tautschn@amazon.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1519717660-16157-1-git-send-email-nmanthey@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 07:59:13 +01:00
Ingo Molnar
fc4c5a3828 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 07:32:20 +01:00
Richard Guy Briggs
45b578fe4c audit: link denied should not directly generate PATH record
Audit link denied events generate duplicate PATH records which disagree
in different ways from symlink and hardlink denials.
audit_log_link_denied() should not directly generate PATH records.

See: https://github.com/linux-audit/audit-kernel/issues/21

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-08 19:25:35 -05:00
Richard Guy Briggs
15564ff0a1 audit: make ANOM_LINK obey audit_enabled and audit_dummy_context
Audit link denied events emit disjointed records when audit is disabled.
No records should be emitted when audit is disabled.

See: https://github.com/linux-audit/audit-kernel/issues/21

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-08 19:19:54 -05:00
Leon Yu
3f553b308b module: propagate error in modules_open()
otherwise kernel can oops later in seq_release() due to dereferencing null
file->private_data which is only set if seq_open() succeeds.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: seq_release+0xc/0x30
Call Trace:
 close_pdeo+0x37/0xd0
 proc_reg_release+0x5d/0x60
 __fput+0x9d/0x1d0
 ____fput+0x9/0x10
 task_work_run+0x75/0x90
 do_exit+0x252/0xa00
 do_group_exit+0x36/0xb0
 SyS_exit_group+0xf/0x10

Fixes: 516fb7f2e73d ("/proc/module: use the same logic as /proc/kallsyms for address exposure")
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-03-08 21:58:51 +01:00
Linus Torvalds
e67548254b MIPS fixes for 4.16-rc5
A miscellaneous pile of MIPS fixes for 4.16:
  - Move put_compat_sigset() to evade hardened usercopy warnings (4.16)
  - Select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16)
  - Fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0)
  - Fix disabling of IPIs during BMIPS suspend (3.19)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlqhPTwACgkQbAtpk944
 dnq+WA//UR0iGZWc+FlpSAKjvVMWecvjQx81RhoBsFL18TmhDp9dtAVrm+oDmlJB
 /WpnA+BZCM58oEeXYNZ4dc4k4nD9VQTrFksaFJkVKBNyuVEiZbv3u21v6NKbUYBw
 0S4J2YKkUWtLIPFi5aymzPWn1bajY/Q2wi/1REeTIdcVeghWQDd/iShjYpSnvpCY
 XGvZebJyzFVigX404Z7WFDNoO9GhsZwdOMe00nW536ph2LFyCE6U65VzwdmvVZkr
 95kpnfQAb5aYv1/2jFEhyoX2ddhHuCmT+TfJ5Db68dd3AXkV/pdOjcNMdPOhmlFB
 ebDlFw91XKoAT360M/JtQkamZt09Zzl+Ea7lJ7Me4N6JGmEcqIjrZc7tNTUKknnW
 2W8WhrDpuZ2x4x3jx0ckGSvFFYhtkcFqNTktTjMYOSwmSiyd1Txe2VPLSVHy3d3J
 SbE3ioqnbIq+LHiVEqtFxmmsqrfgrh36v+Mc3kihQCCybTji+dGH8GR3bF7ccLOW
 6rpENPPVlP6/A/UNeNfH4d6apjhVBLuo1qdo22KEDKz6o9OUI3UKTKUG9wGhzvho
 rV+2LaYgtn4qmYoqjxBp2NgxZRkuw5TdYOXQL0XQl8te2L7bWyGXqWazfhbwMLpT
 p6yZaMIzPJhpPbhsyh/gTjqqPFmf86lFQW6SXwsbOT1aZnK9tGs=
 =X1dj
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS fixes from James Hogan:
 "A miscellaneous pile of MIPS fixes for 4.16:

   - move put_compat_sigset() to evade hardened usercopy warnings (4.16)

   - select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16)

   - fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0)

   - fix disabling of IPIs during BMIPS suspend (3.19)"

* tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: BMIPS: Do not mask IPIs during suspend
  MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_SERIO
  MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_PARPORT
  signals: Move put_compat_sigset to compat.h to silence hardened usercopy
  MIPS: OCTEON: irq: Check for null return on kzalloc allocation
  MIPS: ath25: Check for kzalloc allocation failure
2018-03-08 10:03:12 -08:00
Borislav Petkov
5ad7510537 panic: Add closing panic marker parenthesis
Otherwise it looks unbalanced.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-2-bp@alien8.de
2018-03-08 12:01:10 +01:00
Teng Qin
95da0cdb72 bpf: add support to read sample address in bpf program
This commit adds new field "addr" to bpf_perf_event_data which could be
read and used by bpf programs attached to perf events. The value of the
field is copied from bpf_perf_event_data_kern.addr and contains the
address value recorded by specifying sample_type with PERF_SAMPLE_ADDR
when calling perf_event_open.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-08 02:22:34 +01:00
Stephen Smalley
6b4f3d0105 usb, signal, security: only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill
commit d178bc3a708f39cbfefc3fab37032d3f2511b4ec ("user namespace: usb:
 make usb urbs user namespace aware (v2)") changed kill_pid_info_as_uid
to kill_pid_info_as_cred, saving and passing a cred structure instead of
uids.  Since the secid can be obtained from the cred, drop the secid fields
from the usb_dev_state and async structures, and drop the secid argument to
kill_pid_info_as_cred.  Replace the secid argument to security_task_kill
with the cred.  Update SELinux, Smack, and AppArmor to use the cred, which
avoids the need for Smack and AppArmor to use a secid at all in this hook.
Further changes to Smack might still be required to take full advantage of
this change, since it should now be possible to perform capability
checking based on the supplied cred.  The changes to Smack and AppArmor
have only been compile-tested.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2018-03-07 09:05:53 +11:00
Oliver O'Halloran
167f5594b5 kernel/memremap: Remove stale devres_free() call
devm_memremap_pages() was re-worked in e8d513483300 "memremap: change
devm_memremap_pages interface to use struct dev_pagemap" to take a
caller allocated struct dev_pagemap as a function parameter. A call to
devres_free() was left in the error cleanup path which results in a
kernel panic if the remap fails for some reason. Remove it to fix the
panic and let devm_memremap_pages() fail gracefully.

Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-06 10:58:54 -08:00
Greg Edwards
11dd266637 audit: do not panic on invalid boot parameter
If you pass in an invalid audit boot parameter value, e.g. "audit=off",
the kernel panics very early in boot before the regular console is
initialized.  Unless you have earlyprintk enabled, there is no
indication of what the problem is on the console.

Convert the panic() calls to pr_err(), and leave auditing enabled if an
invalid parameter value was passed in.

Modify the parameter to also accept "on" or "off" as valid values, and
update the documentation accordingly.

Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-06 13:50:07 -05:00
Ingo Molnar
8af31363cd Linux 4.16-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlqceRweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG59gH/0CVX4x6EobO/PQu
 CzLVtAoRGFuIghB6Gmbx3Q1Ck4sn4q2SqUKTtkf03yauRGvnJsEmd6wEZ8f2IOHy
 f30nX9s+4irzpQUIum4rH9KP6SMVJfNXlSVSisnamA6MbhPre3/NRcAIBUxdE4cK
 lP81TaT6Nvp5cOySlPjPdWSbN4B1froFQ6rZ/lvG406QzqCvKvlS39h6IYjOF7Ds
 zB/h3RkyuK9YyxFUO338RTEQ583esc0jTiTN4Pzb6nH3x8aTawDqGrwI2B4mkTLw
 vNSPPE2VW9to0cZX+J7TH+uusPNXIlHZCD9tXwqWe5M+sCrE2FuydnmpZIf1A2LY
 aWs0KQs=
 =4nyn
 -----END PGP SIGNATURE-----

Merge tag 'v4.16-rc4' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-06 07:30:22 +01:00
David S. Miller
0f3e9c97eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All of the conflicts were cases of overlapping changes.

In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06 01:20:46 -05:00
Linus Torvalds
547046141f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Use an appropriate TSQ pacing shift in mac80211, from Toke
    Høiland-Jørgensen.

 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
    in ip6_route_me_harder, from Eric Dumazet.

 3) Fix several shutdown races and similar other problems in l2tp, from
    James Chapman.

 4) Handle missing XDP flush properly in tuntap, for real this time.
    From Jason Wang.

 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
    Borkmann.

 6) Fix phy_resume() locking, from Andrew Lunn.

 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
    from Xin Long.

 8) Revert F-RTO middle box workarounds, they only handle one dimension
    of the problem. From Yuchung Cheng.

 9) Fix socket refcounting in RDS, from Ka-Cheong Poon.

10) Don't allow ppp unit registration to an unregistered channel, from
    Guillaume Nault.

11) Various hv_netvsc fixes from Stephen Hemminger.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
  hv_netvsc: propagate rx filters to VF
  hv_netvsc: filter multicast/broadcast
  hv_netvsc: defer queue selection to VF
  hv_netvsc: use napi_schedule_irqoff
  hv_netvsc: fix race in napi poll when rescheduling
  hv_netvsc: cancel subchannel setup before halting device
  hv_netvsc: fix error unwind handling if vmbus_open fails
  hv_netvsc: only wake transmit queue if link is up
  hv_netvsc: avoid retry on send during shutdown
  virtio-net: re enable XDP_REDIRECT for mergeable buffer
  ppp: prevent unregistered channels from connecting to PPP units
  tc-testing: skbmod: fix match value of ethertype
  mlxsw: spectrum_switchdev: Check success of FDB add operation
  net: make skb_gso_*_seglen functions private
  net: xfrm: use skb_gso_validate_network_len() to check gso sizes
  net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  rds: Incorrect reference counting in TCP socket creation
  net: ethtool: don't ignore return from driver get_fecparam method
  vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
  ...
2018-03-05 11:29:24 -08:00
Linus Torvalds
4c4ce3022d Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A small set of fixes from the timer departement:

   - Add a missing timer wheel clock forward when migrating timers off a
     unplugged CPU to prevent operating on a stale clock base and
     missing timer deadlines.

   - Use the proper shift count to extract data from a register value to
     prevent evaluating unrelated bits

   - Make the error return check in the FSL timer driver work correctly.
     Checking an unsigned variable for less than zero does not really
     work well.

   - Clarify the confusing comments in the ARC timer code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Forward timer base before migrating timers
  clocksource/drivers/arc_timer: Update some comments
  clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
  clocksource/drivers/fsl_ftm_timer: Fix error return checking
2018-03-04 11:34:49 -08:00
Ingo Molnar
14a7405b2e sched/core: Undefine tracepoint creation at the end of core.c
Make it easier to concatenate all the scheduler .c files for single-module
compilation.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04 12:39:34 +01:00
Ingo Molnar
02d8ec9456 sched/deadline, rt: Rename queue_push_tasks/queue_pull_task to create separate namespace
There are similarly named functions in both of these modules:

  kernel/sched/deadline.c:static inline void queue_push_tasks(struct rq *rq)
  kernel/sched/deadline.c:static inline void queue_pull_task(struct rq *rq)
  kernel/sched/deadline.c:static inline void queue_push_tasks(struct rq *rq)
  kernel/sched/deadline.c:static inline void queue_pull_task(struct rq *rq)
  kernel/sched/deadline.c:	queue_push_tasks(rq);
  kernel/sched/deadline.c:	queue_pull_task(rq);
  kernel/sched/deadline.c:			queue_push_tasks(rq);
  kernel/sched/deadline.c:			queue_pull_task(rq);
  kernel/sched/rt.c:static inline void queue_push_tasks(struct rq *rq)
  kernel/sched/rt.c:static inline void queue_pull_task(struct rq *rq)
  kernel/sched/rt.c:static inline void queue_push_tasks(struct rq *rq)
  kernel/sched/rt.c:	queue_push_tasks(rq);
  kernel/sched/rt.c:	queue_pull_task(rq);
  kernel/sched/rt.c:			queue_push_tasks(rq);
  kernel/sched/rt.c:			queue_pull_task(rq);

... which makes it harder to grep for them. Prefix them with
deadline_ and rt_, respectively.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04 12:39:34 +01:00
Ingo Molnar
a92057e14b sched/idle: Merge kernel/sched/idle.c and kernel/sched/idle_task.c
Merge these two small .c modules as they implement two aspects
of idle task handling.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04 12:39:33 +01:00
Ingo Molnar
325ea10c08 sched/headers: Simplify and clean up header usage in the scheduler
Do the following cleanups and simplifications:

 - sched/sched.h already includes <asm/paravirt.h>, so no need to
   include it in sched/core.c again.

 - order the <linux/sched/*.h> headers alphabetically

 - add all <linux/sched/*.h> headers to kernel/sched/sched.h

 - remove all unnecessary includes from the .c files that
   are already included in kernel/sched/sched.h.

Finally, make all scheduler .c files use a single common header:

  #include "sched.h"

... which now contains a union of the relied upon headers.

This makes the various .c files easier to read and easier to handle.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04 12:39:29 +01:00
Linus Torvalds
20f14172cb Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "A 4.16 regression fix, three fixes for -stable, and a cleanup fix:

   - During the merge window support for the new ACPI NVDIMM Platform
     Capabilities structure disabled support for "deep flush", a
     force-unit- access like mechanism for persistent memory. Restore
     that mechanism.

   - VFIO like RDMA is yet one more memory registration / pinning
     interface that is incompatible with Filesystem-DAX. Disable long
     term pins of Filesystem-DAX mappings via VFIO.

   - The Filesystem-DAX detection to prevent long terms pins mistakenly
     also disabled Device-DAX pins which are not subject to the same
     block- map collision concerns.

   - Similar to the setup path, softlockup warnings can trigger in the
     shutdown path for large persistent memory namespaces. Teach
     for_each_device_pfn() to perform cond_resched() in all cases.

   - Boaz noticed that the might_sleep() in dax_direct_access() is stale
     as of the v4.15 kernel.

  These have received a build success notification from the 0day robot,
  and the longterm pin fixes have appeared in -next. However, I recently
  rebased the tree to remove some other fixes that need to be reworked
  after review feedback.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  memremap: fix softlockup reports at teardown
  libnvdimm: re-enable deep flush for pmem devices via fsync()
  vfio: disable filesystem-dax page pinning
  dax: fix vma_is_fsdax() helper
  dax: ->direct_access does not sleep anymore
2018-03-03 14:32:00 -08:00
Ingo Molnar
97fb7a0a89 sched: Clean up and harmonize the coding style of the scheduler code base
A good number of small style inconsistencies have accumulated
in the scheduler core, so do a pass over them to harmonize
all these details:

 - fix speling in comments,

 - use curly braces for multi-line statements,

 - remove unnecessary parentheses from integer literals,

 - capitalize consistently,

 - remove stray newlines,

 - add comments where necessary,

 - remove invalid/unnecessary comments,

 - align structure definitions and other data types vertically,

 - add missing newlines for increased readability,

 - fix vertical tabulation where it's misaligned,

 - harmonize preprocessor conditional block labeling
   and vertical alignment,

 - remove line-breaks where they uglify the code,

 - add newline after local variable definitions,

No change in functionality:

  md5:
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-03 15:50:21 +01:00
Mario Leinweber
c2e513821d sched/deadline: Clean up various coding style details
- Fixed style error: Missing space before the open parenthesis
- Fixed style warnings: 2x Missing blank line after declaration

One warning left: else after return
 (I don't feel comfortable fixing that without side effects)

Signed-off-by: Mario Leinweber <marioleinweber@web.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302182007.28691-1-marioleinweber@web.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-03 15:50:20 +01:00
Dan Williams
949b93250a memremap: fix softlockup reports at teardown
The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.

Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Fixes: 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()...")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-02 19:34:50 -08:00
Matt Redfearn
fde9fc766e
signals: Move put_compat_sigset to compat.h to silence hardened usercopy
Since commit afcc90f8621e ("usercopy: WARN() on slab cache usercopy
region violations"), MIPS systems booting with a compat root filesystem
emit a warning when copying compat siginfo to userspace:

WARNING: CPU: 0 PID: 953 at mm/usercopy.c:81 usercopy_warn+0x98/0xe8
Bad or missing usercopy whitelist? Kernel memory exposure attempt
detected from SLAB object 'task_struct' (offset 1432, size 16)!
Modules linked in:
CPU: 0 PID: 953 Comm: S01logging Not tainted 4.16.0-rc2 #10
Stack : ffffffff808c0000 0000000000000000 0000000000000001 65ac85163f3bdc4a
	65ac85163f3bdc4a 0000000000000000 90000000ff667ab8 ffffffff808c0000
	00000000000003f8 ffffffff808d0000 00000000000000d1 0000000000000000
	000000000000003c 0000000000000000 ffffffff808c8ca8 ffffffff808d0000
	ffffffff808d0000 ffffffff80810000 fffffc0000000000 ffffffff80785c30
	0000000000000009 0000000000000051 90000000ff667eb0 90000000ff667db0
	000000007fe0d938 0000000000000018 ffffffff80449958 0000000020052798
	ffffffff808c0000 90000000ff664000 90000000ff667ab0 00000000100c0000
	ffffffff80698810 0000000000000000 0000000000000000 0000000000000000
	0000000000000000 0000000000000000 ffffffff8010d02c 65ac85163f3bdc4a
	...
Call Trace:
[<ffffffff8010d02c>] show_stack+0x9c/0x130
[<ffffffff80698810>] dump_stack+0x90/0xd0
[<ffffffff80137b78>] __warn+0x100/0x118
[<ffffffff80137bdc>] warn_slowpath_fmt+0x4c/0x70
[<ffffffff8021e4a8>] usercopy_warn+0x98/0xe8
[<ffffffff8021e68c>] __check_object_size+0xfc/0x250
[<ffffffff801bbfb8>] put_compat_sigset+0x30/0x88
[<ffffffff8011af24>] setup_rt_frame_n32+0xc4/0x160
[<ffffffff8010b8b4>] do_signal+0x19c/0x230
[<ffffffff8010c408>] do_notify_resume+0x60/0x78
[<ffffffff80106f50>] work_notifysig+0x10/0x18
---[ end trace 88fffbf69147f48a ]---

Commit 5905429ad856 ("fork: Provide usercopy whitelisting for
task_struct") noted that:

"While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), it is always
copied with a static length (i.e. sizeof(sigset_t))."

However, this is not true in the case of compat signals, whose sigset
is copied by put_compat_sigset and receives size as an argument.

At most call sites, put_compat_sigset is copying a sigset from the
current task_struct. This triggers a warning when
CONFIG_HARDENED_USERCOPY is active. However, by marking this function as
static inline, the warning can be avoided because in all of these cases
the size is constant at compile time, which is allowed. The only site
where this is not the case is handling the rt_sigpending syscall, but
there the copy is being made from a stack local variable so does not
trigger the warning.

Move put_compat_sigset to compat.h, and mark it static inline. This
fixes the WARN on MIPS.

Fixes: afcc90f8621e ("usercopy: WARN() on slab cache usercopy region violations")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Dmitry V . Levin" <ldv@altlinux.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18639/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-03-02 21:31:55 +00:00
David S. Miller
a5f7b0eeb2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-02-28

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Add schedule points and reduce the number of loop iterations
   the test_bpf kernel module is performing in order to not hog
   the CPU for too long, from Eric.

2) Fix an out of bounds access in tail calls in the ppc64 BPF
   JIT compiler, from Daniel.

3) Fix a crash on arm64 on unaligned BPF xadd operations that
   could be triggered via interpreter and JIT, from Daniel.

Please not that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 21:42:07 -05:00
Linus Torvalds
7bec4a9646 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
 "Make sure that we wake up userspace loggers. This fixes a race
  introduced by the console waiter logic during this merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: Wake klogd when passing console_lock owner
2018-03-01 10:06:39 -08:00
Lingutla Chandrasekhar
c52232a49e timers: Forward timer base before migrating timers
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.

If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.

In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base:;clock.

But there is a worse problem than that. The following sequence of events
illustrates it:

 - CPU0 timer1 is queued expires = 59969 and base->clk = 59131.

   The timer is queued at wheel level 2, with resulting expiry time = 60032
   (due to level granularity).

 - CPU1 enters idle @60007, with next timer expiry @60020.

 - CPU0 is hotplugged at @60009

 - CPU1 exits idle and runs the control thread which migrates the
   timers from CPU0

   timer1 is now queued in level 0 for immediate handling in the next
   softirq because the requested expiry time 59969 is before CPU1 base->clk
   60007

 - CPU1 runs code which forwards the base clock which succeeds because the
   next expiring timer. which was collected at idle entry time is still set
   to 60020.

   So it forwards beyond 60007 and therefore misses to expire the migrated
   timer1. That timer gets expired when the wheel wraps around again, which
   takes between 63 and 630ms depending on the HZ setting.

Address both problems by invoking forward_timer_base() for the control CPUs
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.

[ tglx: Massaged comment and changelog ]

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
2018-02-28 23:34:33 +01:00
Andy Shevchenko
d61e2944b6 genirq: Add wakeup sysfs node to show IRQ wakeup state
Surprisingly there is no simple way to see if the IRQ line in question
is wakeup source or not.

Note that wakeup might be an OOB (out-of-band) source like GPIO line
which makes things slightly more complicated.

Add a sysfs node to cover this case.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20180226155043.67937-1-andriy.shevchenko@linux.intel.com
2018-02-28 18:07:20 +01:00
Borislav Petkov
04860d48a8 locking/lockdep: Show unadorned pointers
Show unadorned pointers in lockdep reports - lockdep is a debugging
facility and hashing pointers there doesn't make a whole lotta sense.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20180226134926.23069-1-bp@alien8.de
2018-02-28 15:25:44 +01:00
Baolin Wang
27263e8dc0 clocksource: Use ATTRIBUTE_GROUPS
Use ATTRIBUTE_GROUPS instead of manually creating the individual device
files.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: sboyd@codeaurora.org
Cc: broonie@kernel.org
Cc: john.stultz@linaro.org
Link: https://lkml.kernel.org/r/d80dccb981dc2461781ebb8d71a32ccdc1b0e6f9.1516167691.git.baolin.wang@linaro.org
2018-02-28 14:05:07 +01:00
Baolin Wang
e87821d18c clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
Convert DEVICE_ATTR to DEVICE_ATTR_RW/RO/WO which is the preferred and
simpler way of implementation.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: sboyd@codeaurora.org
Cc: broonie@kernel.org
Cc: john.stultz@linaro.org
Link: https://lkml.kernel.org/r/8f35c77e753e957b61187e8e7b2e4a3d61e4a72b.1516167691.git.baolin.wang@linaro.org
2018-02-28 14:04:52 +01:00
Baolin Wang
7f852afe44 clocksource: Don't walk the clocksource list for empty override
If the override clocksource name is empty there is no point in walking the
clocksource list for a match.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: sboyd@codeaurora.org
Cc: broonie@kernel.org
Cc: john.stultz@linaro.org
Link: https://lkml.kernel.org/r/069ce2a605546bcad6552968cff755f0a03f9f10.1516167691.git.baolin.wang@linaro.org
2018-02-28 14:04:52 +01:00
Petr Mladek
c14376de3a printk: Wake klogd when passing console_lock owner
wake_klogd is a local variable in console_unlock(). The information
is lost when the console_lock owner using the busy wait added by
the commit dbdda842fe96f8932 ("printk: Add console owner and waiter
logic to load balance console writes"). The following race is
possible:

CPU0				CPU1
console_unlock()

  for (;;)
     /* calling console for last message */

				printk()
				  log_store()
				    log_next_seq++;

     /* see new message */
     if (seen_seq != log_next_seq) {
	wake_klogd = true;
	seen_seq = log_next_seq;
     }

     console_lock_spinning_enable();

				  if (console_trylock_spinning())
				     /* spinning */

     if (console_lock_spinning_disable_and_check()) {
	printk_safe_exit_irqrestore(flags);
	return;

				  console_unlock()
				    if (seen_seq != log_next_seq) {
				    /* already seen */
				    /* nothing to do */

Result: Nobody would wakeup klogd.

One solution would be to make a global variable from wake_klogd.
But then we would need to manipulate it under a lock or so.

This patch wakes klogd also when console_lock is passed to the
spinning waiter. It looks like the right way to go. Also userspace
should have a chance to see and store any "flood" of messages.

Note that the very late klogd wake up was a historic solution.
It made sense on single CPU systems or when sys_syslog() operations
were synchronized using the big kernel lock like in v2.1.113.
But it is questionable these days.

Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
Link: http://lkml.kernel.org/r/20180226155734.dzwg3aovqnwtvkoy@pathway.suse.cz
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-02-27 10:25:50 +01:00
Linus Torvalds
85a2d939c0 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Yet another pile of melted spectrum related changes:

   - sanitize the array_index_nospec protection mechanism: Remove the
     overengineered array_index_nospec_mask_check() magic and allow
     const-qualified types as index to avoid temporary storage in a
     non-const local variable.

   - make the microcode loader more robust by properly propagating error
     codes. Provide information about new feature bits after micro code
     was updated so administrators can act upon.

   - optimizations of the entry ASM code which reduce code footprint and
     make the code simpler and faster.

   - fix the {pmd,pud}_{set,clear}_flags() implementations to work
     properly on paravirt kernels by removing the address translation
     operations.

   - revert the harmful vmexit_fill_RSB() optimization

   - use IBRS around firmware calls

   - teach objtool about retpolines and add annotations for indirect
     jumps and calls.

   - explicitly disable jumplabel patching in __init code and handle
     patching failures properly instead of silently ignoring them.

   - remove indirect paravirt calls for writing the speculation control
     MSR as these calls are obviously proving the same attack vector
     which is tried to be mitigated.

   - a few small fixes which address build issues with recent compiler
     and assembler versions"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
  KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
  objtool, retpolines: Integrate objtool with retpoline support more closely
  x86/entry/64: Simplify ENCODE_FRAME_POINTER
  extable: Make init_kernel_text() global
  jump_label: Warn on failed jump_label patching attempt
  jump_label: Explicitly disable jump labels in __init code
  x86/entry/64: Open-code switch_to_thread_stack()
  x86/entry/64: Move ASM_CLAC to interrupt_entry()
  x86/entry/64: Remove 'interrupt' macro
  x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
  x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
  x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
  x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
  objtool: Add module specific retpoline rules
  objtool: Add retpoline validation
  objtool: Use existing global variables for options
  x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
  x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
  x86/paravirt, objtool: Annotate indirect calls
  ...
2018-02-26 09:34:21 -08:00
David S. Miller
ba6056a41c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-02-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Various improvements for BPF kselftests: i) skip unprivileged tests
   when kernel.unprivileged_bpf_disabled sysctl knob is set, ii) count
   the number of skipped tests from unprivileged, iii) when a test case
   had an unexpected error then print the actual but also the unexpected
   one for better comparison, from Joe.

2) Add a sample program for collecting CPU state statistics with regards
   to how long the CPU resides in cstate and pstate levels. Based on
   cpu_idle and cpu_frequency trace points, from Leo.

3) Various x64 BPF JIT optimizations to further shrink the generated
   image size in order to make it more icache friendly. When tested on
   the Cilium generated programs, image size reduced by approx 4-5% in
   best case mainly due to how LLVM emits unsigned 32 bit constants,
   from Daniel.

4) Improvements and fixes on the BPF sockmap sample programs: i) fix
   the sockmap's Makefile to include nlattr.o for libbpf, ii) detach
   the sock ops programs from the cgroup before exit, from Prashant.

5) Avoid including xdp.h in filter.h by just forward declaring the
   struct xdp_rxq_info in filter.h, from Jesper.

6) Fix the BPF kselftests Makefile for cgroup_helpers.c by only declaring
   it a dependency for test_dev_cgroup.c but not every other test case
   where it is not needed, from Jesper.

7) Adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user selftest since the
   default is insufficient for creating the 'global_map' used in the
   corresponding BPF program, from Yonghong.

8) Likewise, for the xdp_redirect sample, Tushar ran into the same when
   invoking xdp_redirect and xdp_monitor at the same time, therefore
   in order to have the sample generically work bump the limit here,
   too. Fix from Tushar.

9) Avoid an unnecessary NULL check in BPF_CGROUP_RUN_PROG_INET_SOCK()
   since sk is always guaranteed to be non-NULL, from Yafang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 10:37:24 -05:00
Linus Torvalds
c23a757591 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes:

   - UAPI data type correction for hyperv

   - correct the cpu cores field in /proc/cpuinfo on CPU hotplug

   - return proper error code in the resctrl file system failure path to
     avoid silent subsequent failures

   - correct a subtle accounting issue in the new vector allocation code
     which went unnoticed for a while and caused suspend/resume
     failures"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
  x86/topology: Fix function name in documentation
  x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
  x86/apic/vector: Handle vector release on CPU unplug correctly
  genirq/matrix: Handle CPU offlining proper
  x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
2018-02-25 16:58:55 -08:00
David S. Miller
f74290fdb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-24 00:04:20 -05:00
Paul E. McKenney
338c46403f Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD
fixes.2018.02.23a: Miscellaneous fixes
srcu.2018.02.20a: SRCU updates
torture.2018.02.20a: Torture-test updates
2018-02-23 15:15:41 -08:00
Paul E. McKenney
ad7c946b35 rcu: Create RCU-specific workqueues with rescuers
RCU's expedited grace periods can participate in out-of-memory deadlocks
due to all available system_wq kthreads being blocked and there not being
memory available to create more.  This commit prevents such deadlocks
by allocating an RCU-specific workqueue_struct at early boot time, and
providing it with a rescuer to ensure forward progress.  This uses the
shiny new init_rescuer() function provided by Tejun (but indirectly).

This commit also causes SRCU to use this new RCU-specific
workqueue_struct.  Note that SRCU's use of workqueues never blocks them
waiting for readers, so this should be safe from a forward-progress
viewpoint.  Note that this moves SRCU from system_power_efficient_wq
to a normal workqueue.  In the unlikely event that this results in
measurable degradation, a separate power-efficient workqueue will be
creates for SRCU.

Reported-by: Prateek Sood <prsood@codeaurora.org>
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-02-23 15:14:40 -08:00
Linus Torvalds
9cb9c07d6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.

 2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.

 3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.

 4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.

 5) Fix potential deadlocks in netfilter getsockopt() code paths, from
    Paolo Abeni.

 6) Netfilter stackpointer size checks really are needed to validate
    user input, from Florian Westphal.

 7) Missing timer init in x_tables, from Paolo Abeni.

 8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.

 9) When an ibmvnic device is brought down then back up again, it can be
    sent queue entries from a previous session, handle this properly
    instead of crashing. From Thomas Falcon.

10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.

11) When we are dumping filters in cls_api, the output SKB is empty, and
    the filter we are dumping is too large for the space in the SKB, we
    should return -EMSGSIZE like other netlink dump operations do.
    Otherwise userland has no signal that is needs to increase the size
    of its read buffer. From Roman Kapl.

12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.

13) Module refcount leak in netlink when a dump start fails, from Jason
    Donenfeld.

14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
    from Eric Dumazet.

15) Releasing bpf per-cpu arraymaps can take a long time, add a
    condtional scheduling point. From Eric Dumazet.

16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
    Daniel Borkmann.

17) Fix page leak in gianfar driver, from Andy Spencer.

18) Missed clearing of estimator scratch buffer, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net_sched: gen_estimator: fix broken estimators based on percpu stats
  gianfar: simplify FCS handling and fix memory leak
  ipv6 sit: work around bogus gcc-8 -Wrestrict warning
  macvlan: fix use-after-free in macvlan_common_newlink()
  bpf, arm64: fix out of bounds access in tail call
  bpf, x64: implement retpoline for tail call
  rxrpc: Fix send in rxrpc_send_data_packet()
  net: aquantia: Fix error handling in aq_pci_probe()
  bpf: fix rcu lockdep warning for lpm_trie map_free callback
  bpf: add schedule points in percpu arrays management
  regulatory: add NUL to request alpha2
  ibmvnic: Fix early release of login buffer
  net/smc9194: Remove bogus CONFIG_MAC reference
  net: ipv4: Set addr_type in hash_keys for forwarded case
  tcp_bbr: better deal with suboptimal GSO
  smsc75xx: fix smsc75xx_set_features()
  netlink: put module reference if dump start fails
  selftests/bpf/test_maps: exit child process without error in ENOMEM case
  selftests/bpf: update gitignore with test_libbpf_open
  selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
  ..
2018-02-23 15:14:17 -08:00
Linus Torvalds
2eb02aa94f Merge branch 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:

 - keys fixes via David Howells:
      "A collection of fixes for Linux keyrings, mostly thanks to Eric
       Biggers:

        - Fix some PKCS#7 verification issues.

        - Fix handling of unsupported crypto in X.509.

        - Fix too-large allocation in big_key"

 - Seccomp updates via Kees Cook:
      "These are fixes for the get_metadata interface that landed during
       -rc1. While the new selftest is strictly not a bug fix, I think
       it's in the same spirit of avoiding bugs"

 - an IMA build fix from Randy Dunlap

* 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity/security: fix digsig.c build error with header file
  KEYS: Use individual pages in big_key for crypto buffers
  X.509: fix NULL dereference when restricting key with unsupported_sig
  X.509: fix BUG_ON() when hash algorithm is unsupported
  PKCS#7: fix direct verification of SignerInfo signature
  PKCS#7: fix certificate blacklisting
  PKCS#7: fix certificate chain verification
  seccomp: add a selftest for get_metadata
  ptrace, seccomp: tweak get_metadata behavior slightly
  seccomp, ptrace: switch get_metadata types to arch independent
2018-02-23 15:04:24 -08:00
Daniel Borkmann
ca36960211 bpf: allow xadd only on aligned memory
The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:

  [  830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
  [...]
  [  830.916161] Internal error: Oops: 96000021 [#1] SMP
  [  830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
  [  830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
  [  830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
  [  831.003793] pc : __ll_sc_atomic_add+0x4/0x18
  [  831.008055] lr : ___bpf_prog_run+0x1198/0x1588
  [  831.012485] sp : ffff00001ccabc20
  [  831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
  [  831.021087] x27: 0000000000000001 x26: 0000000000000000
  [  831.026387] x25: 000000c168d9db98 x24: 0000000000000000
  [  831.031686] x23: ffff000008203878 x22: ffff000009488000
  [  831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
  [  831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
  [  831.047585] x17: 0000000000000000 x16: 0000000000000000
  [  831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
  [  831.058184] x13: 0000000000000000 x12: 0000000000000000
  [  831.063484] x11: 0000000000000001 x10: 0000000000000000
  [  831.068783] x9 : 0000000000000000 x8 : 0000000000000000
  [  831.074083] x7 : 0000000000000000 x6 : 000580d428000000
  [  831.079383] x5 : 0000000000000018 x4 : 0000000000000000
  [  831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
  [  831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
  [  831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
  [  831.102577] Call trace:
  [  831.105012]  __ll_sc_atomic_add+0x4/0x18
  [  831.108923]  __bpf_prog_run32+0x4c/0x70
  [  831.112748]  bpf_test_run+0x78/0xf8
  [  831.116224]  bpf_prog_test_run_xdp+0xb4/0x120
  [  831.120567]  SyS_bpf+0x77c/0x1110
  [  831.123873]  el0_svc_naked+0x30/0x34
  [  831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)

Reason for this is because memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.

xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense in
fact are map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-23 14:33:39 -08:00
Linus Torvalds
8961ca441b exynos, meson, ipuv3, secondary gpu, cirrus, edid quirk fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaj38KAAoJEAx081l5xIa+AY8P/0oX+UPtjNxVqUTzeejxxZG7
 EpmcJWP2SENnkOSdiyPMLI4SIOgv0B+73hX6ATbsVx9nseqxAJyoAFJZCQy7ioS3
 RjB6wXi/WQrxrXc3MU5FUp8AfPLvZx2BlAHGqyuk3V2f3fIjl0tWmMuxgdc0WX1j
 wzzHNBEKoXG5WVVEOXJZq5xd8s35QTdhqGpqrvl1ruHtqmnls8n67qPB9F7F6lHm
 Iwi6MlvIxwoLIuWj0cJyOoUdw0Z6/MQ+Of8zW1E0NJIfgfa9LKjtIRUacJvOndRP
 Oq9XUCI/6gmNswmdktz65w1SfuU/cq9j46FuBh23QYNvYfuYgtvL0xhQPYF08vtK
 83X1Sop8Pzz9f2jCL2TPKLF37TetNpMT1gTP/NsGirRc+cvZTMBl1+OcWO47oTYZ
 TZ70L7GSJOdJV/n5vdCE5bSBS/thvLC5tyUGgRH+y7E6Lt2HouVN3ulkKb/stuQ3
 ee9NbI16YXZepK3+Z4YUdFziC40BO7K0LGlyAjs9G95LBRQNq9jNJLXTog5vSUJa
 3DFjEqQ558iciGkmYx4cQhlCqYvzuNClutz2D4RN7LqA5wHKqt4LWwTgjnUk9Z82
 lvNm3IGB+HiXEWpQmEuQeMqC+Xwxfdx3n+s3I7TpztdbgIWJM4KqAa4OKKK2NUM6
 qxEYwcQ2P84obOwBkVu3
 =LRdE
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A bunch of fixes for rc3:

  Exynos:
   - fixes for using monotonic timestamps
   - register definitions
   - removal of unused file

  ipu-v3L
   - minor changes
   - make some register arrays const+static
   - fix some leaks

  meson:
   - fix for vsync

  atomic:
   - fix for memory leak

  EDID parser:
   - add quirks for some more non-desktop devices
   - 6-bit panel fix.

  drm_mm:
   - fix a bug in the core drm mm hole handling

  cirrus:
   - fix lut loading regression

  Lastly there is a deadlock fix around runtime suspend for secondary
  GPUs.

  There was a deadlock between one thread trying to wait for a workqueue
  job to finish in the runtime suspend path, and the workqueue job it
  was waiting for in turn waiting for a runtime_get_sync to return.

  The fixes avoids it by not doing the runtime sync in the workqueue as
  then we always wait for all those tasks to complete before we runtime
  suspend"

* tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux: (25 commits)
  drm/tve200: fix kernel-doc documentation comment include
  drm/edid: quirk Sony PlayStation VR headset as non-desktop
  drm/edid: quirk Windows Mixed Reality headsets as non-desktop
  drm/edid: quirk Oculus Rift headsets as non-desktop
  drm/meson: fix vsync buffer update
  drm: Handle unexpected holes in color-eviction
  drm: exynos: Use proper macro definition for HDMI_I2S_PIN_SEL_1
  drm/exynos: remove exynos_drm_rotator.h
  drm/exynos: g2d: Delete an error message for a failed memory allocation in two functions
  drm/exynos: fix comparison to bitshift when dealing with a mask
  drm/exynos: g2d: use monotonic timestamps
  drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
  gpu: ipu-csi: add 10/12-bit grayscale support to mbus_code_to_bus_cfg
  gpu: ipu-cpmem: add 16-bit grayscale support to ipu_cpmem_set_image
  gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle
  gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle
  drm/amdgpu: Fix deadlock on runtime suspend
  drm/radeon: Fix deadlock on runtime suspend
  drm/nouveau: Fix deadlock on runtime suspend
  drm: Allow determining if current task is output poll worker
  ...
2018-02-23 10:31:31 -08:00
Paul Moore
ce423631ce audit: track the owner of the command mutex ourselves
Evidently the __mutex_owner() function was never intended for use
outside the core mutex code, so build a thing locking wrapper around
the mutex code which allows us to track the mutex owner.

One, arguably positive, side effect is that this allows us to hide
the audit_cmd_mutex inside of kernel/audit.c behind the lock/unlock
functions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-02-23 11:22:22 -05:00
Thomas Gleixner
651ca2c004 genirq/matrix: Handle CPU offlining proper
At CPU hotunplug the corresponding per cpu matrix allocator is shut down and
the allocated interrupt bits are discarded under the assumption that all
allocated bits have been either migrated away or shut down through the
managed interrupts mechanism.

This is not true because interrupts which are not started up might have a
vector allocated on the outgoing CPU. When the interrupt is started up
later or completely shutdown and freed then the allocated vector is handed
back, triggering warnings or causing accounting issues which result in
suspend failures and other issues.

Change the CPU hotplug mechanism of the matrix allocator so that the
remaining allocations at unplug time are preserved and global accounting at
hotplug is correctly readjusted to take the dormant vectors into account.

Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator")
Reported-by: Yuriy Vostrikov <delamonpansie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yuriy Vostrikov <delamonpansie@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180222112316.849980972@linutronix.de
2018-02-22 22:05:43 +01:00
Yonghong Song
6c5f61023c bpf: fix rcu lockdep warning for lpm_trie map_free callback
Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
fixed a memory leak and removed unnecessary locks in map_free callback function.
Unfortrunately, it introduced a lockdep warning. When lockdep checking is turned on,
running tools/testing/selftests/bpf/test_lpm_map will have:

  [   98.294321] =============================
  [   98.294807] WARNING: suspicious RCU usage
  [   98.295359] 4.16.0-rc2+ #193 Not tainted
  [   98.295907] -----------------------------
  [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious rcu_dereference_check() usage!
  [   98.297657]
  [   98.297657] other info that might help us debug this:
  [   98.297657]
  [   98.298663]
  [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
  [   98.299536] 2 locks held by kworker/2:1/54:
  [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0
  [   98.301381]  #1:  ((work_completion)(&map->work)){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0

Since actual trie tree removal happens only after no other
accesses to the tree are possible, replacing
  rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock))
with
  rcu_dereference_protected(*slot, 1)
fixed the issue.

Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22 21:29:12 +01:00
Eric Dumazet
32fff239de bpf: add schedule points in percpu arrays management
syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22 21:27:06 +01:00