3398 Commits

Author SHA1 Message Date
Nitin Gupta
a8e654f01c sparc64: update pmdp_invalidate() to return old pmd value
It's required to avoid losing dirty and accessed bits.

[akpm@linux-foundation.org: add a `do' to the do-while loop]
Link: http://lkml.kernel.org/r/20171213105756.69879-9-kirill.shutemov@linux.intel.com
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:38 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Linus Torvalds
d4173023e6 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo cleanups from Eric Biederman:
 "Long ago when 2.4 was just a testing release copy_siginfo_to_user was
  made to copy individual fields to userspace, possibly for efficiency
  and to ensure initialized values were not copied to userspace.

  Unfortunately the design was complex, it's assumptions unstated, and
  humans are fallible and so while it worked much of the time that
  design failed to ensure unitialized memory is not copied to userspace.

  This set of changes is part of a new design to clean up siginfo and
  simplify things, and hopefully make the siginfo handling robust enough
  that a simple inspection of the code can be made to ensure we don't
  copy any unitializied fields to userspace.

  The design is to unify struct siginfo and struct compat_siginfo into a
  single definition that is shared between all architectures so that
  anyone adding to the set of information shared with struct siginfo can
  see the whole picture. Hopefully ensuring all future si_code
  assignments are arch independent.

  The design is to unify copy_siginfo_to_user32 and
  copy_siginfo_from_user32 so that those function are complete and cope
  with all of the different cases documented in signinfo_layout. I don't
  think there was a single implementation of either of those functions
  that was complete and correct before my changes unified them.

  The design is to introduce a series of helpers including
  force_siginfo_fault that take the values that are needed in struct
  siginfo and build the siginfo structure for their callers. Ensuring
  struct siginfo is built correctly.

  The remaining work for 4.17 (unless someone thinks it is post -rc1
  material) is to push usage of those helpers down into the
  architectures so that architecture specific code will not need to deal
  with the fiddly work of intializing struct siginfo, and then when
  struct siginfo is guaranteed to be fully initialized change copy
  siginfo_to_user into a simple wrapper around copy_to_user.

  Further there is work in progress on the issues that have been
  documented requires arch specific knowledge to sort out.

  The changes below fix or at least document all of the issues that have
  been found with siginfo generation. Then proceed to unify struct
  siginfo the 32 bit helpers that copy siginfo to and from userspace,
  and generally clean up anything that is not arch specific with regards
  to siginfo generation.

  It is a lot but with the unification you can of siginfo you can
  already see the code reduction in the kernel"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
  signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
  mm/memory_failure: Remove unused trapno from memory_failure
  signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
  signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
  signal: Helpers for faults with specialized siginfo layouts
  signal: Add send_sig_fault and force_sig_fault
  signal: Replace memset(info,...) with clear_siginfo for clarity
  signal: Don't use structure initializers for struct siginfo
  signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
  ptrace: Use copy_siginfo in setsiginfo and getsiginfo
  signal: Unify and correct copy_siginfo_to_user32
  signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
  signal: Unify and correct copy_siginfo_from_user32
  signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
  signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
  signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
  signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
  signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
  signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
  signal: Move addr_lsb into the _sigfault union for clarity
  ...
2018-01-30 14:18:52 -08:00
Linus Torvalds
49f9c3552c init_task out-of-lining
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv
 QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU
 0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr
 pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf
 NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq
 A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q
 f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG
 PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh
 BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F
 0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ
 jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG
 tdfUsev1Mc8=
 =jhWg
 -----END PGP SIGNATURE-----

Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull init_task initializer cleanups from David Howells:
 "It doesn't seem useful to have the init_task in a header file rather
  than in a normal source file. We could consolidate init_task handling
  instead and expand out various macros.

  Here's a series of patches that consolidate init_task handling:

   (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
       openrisc.

   (2) Alter the INIT_TASK_DATA linker script macro to set
       init_thread_union and init_stack rather than defining these in C.

       Insert init_task and init_thread_into into the init_stack area in
       the linker script as appropriate to the configuration, with
       different section markers so that they end up correctly ordered.

       We can then get merge ia64's init_task.c into the main one.

       We then have a bunch of single-use INIT_*() macros that seem only
       to be macros because they used to be used per-arch. We can then
       expand these in place of the user and get rid of a few lines and
       a lot of backslashes.

   (3) Expand INIT_TASK() in place.

   (4) Expand in place various small INIT_*() macros that are defined
       conditionally. Expand them and surround them by #if[n]def/#endif
       in the .c file as it takes fewer lines.

   (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.

   (6) Expand INIT_STRUCT_PID in place.

  These macros can then be discarded"

* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  Expand INIT_STRUCT_PID and remove
  Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
  Expand various INIT_* macros and remove
  Expand INIT_TASK() in init/init_task.c and remove
  Construct init thread stack in the linker script rather than by union
  openrisc: Make THREAD_SIZE available to vmlinux.lds
  hexagon: Make THREAD_SIZE available to vmlinux.lds
  cris: Make THREAD_SIZE available to vmlinux.lds
2018-01-29 09:08:34 -08:00
Corentin Labbe
aebb48f5e4 sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64
This patch fixes the typo CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64

Fixes: 81658ad0d923 ("sparc64: Add CAMELLIA driver making use of the new camellia opcodes.")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:47:55 -05:00
Eric W. Biederman
ea64d5acc8 signal: Unify and correct copy_siginfo_to_user32
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems.  Some architectures fail to handle all of the cases in in
the siginfo union.  Some architectures perform a blind copy of the
siginfo union when the si_code is negative.  A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.

Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.

A special case is made for x86 x32 format.  This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats.  By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures.  Vastly increasing the testing base and the chances of
finding bugs.

As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 19:56:20 -06:00
Eric W. Biederman
212a36a17e signal: Unify and correct copy_siginfo_from_user32
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.

Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.

In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.

When copying the embedded sigval union copy the si_int member.  That
ensures the 32bit values passes through the kernel unchanged.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:55:59 -06:00
Al Viro
b713da69e4 signal: unify compat_siginfo_t
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
      Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
      linux/compat.h

      CONFIG_X86_X32 is set when the user requests X32 support.

      CONFIG_X86_X32_ABI is set when the user requests X32 support
      and the tool-chain has X32 allowing X32 support to be built.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-01-15 17:40:31 -06:00
Eric W. Biederman
2f82a46f66 signal: Remove _sys_private and _overrun_incr from struct compat_siginfo
We have never passed either field to or from userspace so just remove them.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-12 14:34:46 -06:00
David Howells
0500871f21 Construct init thread stack in the linker script rather than by union
Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.

The following symbols are then made available from INIT_TASK_DATA() linker
script macro:

	init_thread_union
	init_stack

INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack.  init_thread_union is given its own section so that
it can be placed into the stack space in the right order.  I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2018-01-09 23:21:02 +00:00
Jan Engelhardt
59585b4be9 sparc64: repair calling incorrect hweight function from stubs
Commit v4.12-rc4-1-g9289ea7f952b introduced a mistake that made the
64-bit hweight stub call the 16-bit hweight function.

Fixes: 9289ea7f952b ("sparc64: Use indirect calls in hamming weight stubs")
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 20:29:48 -05:00
Linus Torvalds
ead68f2161 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller"
 "What's a holiday weekend without some networking bug fixes? [1]

   1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
      calls, from Daniel Borkmann.

   2) Fix regression from errata limiting change to marvell PHY driver,
      from Zhao Qiang.

   3) Fix u16 overflow in SCTP, from Xin Long.

   4) Fix potential memory leak during bridge newlink, from Nikolay
      Aleksandrov.

   5) Fix BPF selftest build on s390, from Hendrik Brueckner.

   6) Don't append to cfg80211 automatically generated certs file,
      always write new ones from scratch. From Thierry Reding.

   7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.

   8) Fix hang on tg3 MTU change with certain chips, from Brian King.

   9) Add stall detection to arc emac driver and reset chip when this
      happens, from Alexander Kochetkov.

  10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.

  11) Fix stmmac timestamping bug due to mis-shifting of field. From
      Fredrik Hallenberg.

  12) Fix metrics match when deleting an ipv4 route. The kernel sets
      some internal metrics bits which the user isn't going to set when
      it makes the delete request. From Phil Sutter.

  13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
      from Yelena Krivosheev.

  14) Fix double free and memory corruption in get_net_ns_by_id, from
      Eric W. Biederman.

  15) Flush ipv4 FIB tables in the reverse order. Some tables can share
      their actual backing data, in particular this happens for the MAIN
      and LOCAL tables. We have to kill the LOCAL table first, because
      it uses MAIN's backing memory. Fix from Ido Schimmel.

  16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
      Horn, and Alexei Starovoitov.

  17) Make changes to ipv6 autoflowlabel sysctl really propagate to
      sockets, unless the socket has set the per-socket value
      explicitly. From Shaohua Li.

  18) Fix leaks and double callback invocations of zerocopy SKBs, from
      Willem de Bruijn"

[1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  skbuff: skb_copy_ubufs must release uarg even without user frags
  skbuff: orphan frags before zerocopy clone
  net: reevalulate autoflowlabel setting after sysctl setting
  openvswitch: Fix pop_vlan action for double tagged frames
  ipv6: Honor specified parameters in fibmatch lookup
  bpf: do not allow root to mangle valid pointers
  selftests/bpf: add tests for recent bugfixes
  bpf: fix integer overflows
  bpf: don't prune branches when a scalar is replaced with a pointer
  bpf: force strict alignment checks for stack pointers
  bpf: fix missing error return in check_stack_boundary()
  bpf: fix 32-bit ALU op verification
  bpf: fix incorrect tracking of register size truncation
  bpf: fix incorrect sign extension in check_alu_op()
  bpf/verifier: fix bounds calculation on BPF_RSH
  ipv4: Fix use-after-free when flushing FIB tables
  s390/qeth: fix error handling in checksum cmd callback
  tipc: remove joining group member from congested list
  selftests: net: Adding config fragment CONFIG_NUMA=y
  nfp: bpf: keep track of the offloaded program
  ...
2017-12-21 15:57:30 -08:00
Kees Cook
10a7e9d849 Do not hash userspace addresses in fault handlers
The hashing of %p was designed to restrict kernel addresses. There is
no reason to hash the userspace values seen during a segfault report,
so switch these to %px. (Some architectures already use %lx.)

Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-19 17:04:43 -08:00
David S. Miller
b36025b19a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-17

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a corner case in generic XDP where we have non-linear skbs
   but enough tailroom in the skb to not miss to linearizing there,
   from Song.

2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
   BPF context is not skb, from Daniel.

3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
   call would use the wrong register for the skb, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18 10:49:22 -05:00
Linus Torvalds
f6f3732162 Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths"
This reverts commits 5c9d2d5c269c, c7da82b894e9, and e7fe7b5cae90.

We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.

Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was.  Dave
Hansen says:

 "So, the underlying bug here is that we now a get_user_pages_remote()
  and then go ahead and do the p*_access_permitted() checks against the
  current PKRU. This was introduced recently with the addition of the
  new p??_access_permitted() calls.

  We have checks in the VMA path for the "remote" gups and we avoid
  consulting PKRU for them. This got missed in the pkeys selftests
  because I did a ptrace read, but not a *write*. I also didn't
  explicitly test it against something where a COW needed to be done"

It's also not entirely clear that it makes sense to check the protection
key bits at this level at all.  But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.

We'll see.

Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back.  Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76f7 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-15 18:53:22 -08:00
Daniel Borkmann
07aee94394 bpf, sparc: fix usage of wrong reg for load_skb_regs after call
When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.

Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-15 09:19:35 -08:00
Hendrik Brueckner
c895f6f703 bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type
Commit 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
program type") introduced the bpf_perf_event_data structure which
exports the pt_regs structure.  This is OK for multiple architectures
but fail for s390 and arm64 which do not export pt_regs.  Programs
using them, for example, the bpf selftest fail to compile on these
architectures.

For s390, exporting the pt_regs is not an option because s390 wants
to allow changes to it.  For arm64, there is a user_pt_regs structure
that covers parts of the pt_regs structure for use by user space.

To solve the broken uapi for s390 and arm64, introduce an abstract
type for pt_regs and add an asm/bpf_perf_event.h file that concretes
the type.  An asm-generic header file covers the architectures that
export pt_regs today.

The arch-specific enablement for s390 and arm64 follows in separate
commits.

Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-05 15:02:40 +01:00
Linus Torvalds
a0908a1b7d Merge branch 'akpm' (patches from Andrew)
Mergr misc fixes from Andrew Morton:
 "28 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
  fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
  mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
  autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
  autofs: revert "autofs: take more care to not update last_used on path walk"
  fs/fat/inode.c: fix sb_rdonly() change
  mm, memcg: fix mem_cgroup_swapout() for THPs
  mm: migrate: fix an incorrect call of prep_transhuge_page()
  kmemleak: add scheduling point to kmemleak_scan()
  scripts/bloat-o-meter: don't fail with division by 0
  fs/mbcache.c: make count_objects() more robust
  Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
  mm/madvise.c: fix madvise() infinite loop under special circumstances
  exec: avoid RLIMIT_STACK races with prlimit()
  IB/core: disable memory registration of filesystem-dax vmas
  v4l2: disable filesystem-dax mapping support
  mm: fail get_vaddr_frames() for filesystem-dax mappings
  mm: introduce get_user_pages_longterm
  device-dax: implement ->split() to catch invalid munmap attempts
  mm, hugetlbfs: introduce ->split() to vm_operations_struct
  scripts/faddr2line: extend usage on generic arch
  ...
2017-11-29 19:12:44 -08:00
Dan Williams
c7da82b894 mm: replace pmd_write with pmd_access_permitted in fault + gup paths
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

 - validate that the mapping is writable from a protection keys
   standpoint

 - validate that the pte has _PAGE_USER set since all fault paths where
   pmd_write is must be referencing user-memory.

Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Dan Williams
e7fe7b5cae mm: replace pud_write with pud_access_permitted in fault + gup paths
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

 - validate that the mapping is writable from a protection keys
   standpoint

 - validate that the pte has _PAGE_USER set since all fault paths where
   pud_write is must be referencing user-memory.

[dan.j.williams@intel.com: fix powerpc compile error]
  Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Dan Williams
e4e40e0263 mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:

    did you consider using the other paradigm:

    In arch include files:
    #define pud_write       pud_write
    static inline int pud_write(pud_t pud)
     .....

    Then in include/asm-generic/pgtable.h:

    #ifndef pud_write
    tatic inline int pud_write(pud_t pud)
    {
            ....
    }
    #endif

    If you had, then the powerpc code would have worked ... ;-) and many
    of the other interfaces in include/asm-generic/pgtable.h are
    protected that way ...

Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.

Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Al Viro
c71d227fc4 make kernel-side POLL... arch-independent
mangle/demangle on the way to/from userland

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-29 19:00:41 -05:00
David S. Miller
e5372cd5ef sparc64: Fix boot on T4 and later.
If we don't put the NG4fls.o object into the same part of
the link as the generic sparc64 objects for fls() and __fls()
then the relocation in the branch we use for patching will
not fit.

Move NG4fls.o into lib-y to fix this problem.

Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
2017-11-29 15:09:29 -05:00
Al Viro
8ced390c2b define __poll_t, annotate constants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:19:52 -05:00
Linus Torvalds
1deab8ce2c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:

 1) Add missing cmpxchg64() for 32-bit sparc.

 2) Timer conversions from Allen Pais and Kees Cook.

 3) vDSO support, from Nagarathnam Muthusamy.

 4) Fix sparc64 huge page table walks based upon bug report by Al Viro,
    from Nitin Gupta.

 5) Optimized fls() for T4 and above, from Vijay Kumar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix page table walk for PUD hugepages
  sparc64: Convert timers to user timer_setup()
  sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
  sparc/led: Convert timers to use timer_setup()
  sparc64: Use sparc optimized fls and __fls for T4 and above
  sparc64: SPARC optimized __fls function
  sparc64: SPARC optimized fls function
  sparc64: Define SPARC default __fls function
  sparc64: Define SPARC default fls function
  vDSO for sparc
  sparc32: Add cmpxchg64().
  sbus: char: Move D7S_MINOR to include/linux/miscdevice.h
  sparc: time: Remove unneeded linux/miscdevice.h include
  sparc64: mmu_context: Add missing include files
2017-11-17 20:21:44 -08:00
Linus Torvalds
fa7f578076 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a bit more MM

 - procfs updates

 - dynamic-debug fixes

 - lib/ updates

 - checkpatch

 - epoll

 - nilfs2

 - signals

 - rapidio

 - PID management cleanup and optimization

 - kcov updates

 - sysvipc updates

 - quite a few misc things all over the place

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  EXPERT Kconfig menu: fix broken EXPERT menu
  include/asm-generic/topology.h: remove unused parent_node() macro
  arch/tile/include/asm/topology.h: remove unused parent_node() macro
  arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
  arch/sh/include/asm/topology.h: remove unused parent_node() macro
  arch/ia64/include/asm/topology.h: remove unused parent_node() macro
  drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
  mm: add infrastructure for get_user_pages_fast() benchmarking
  sysvipc: make get_maxid O(1) again
  sysvipc: properly name ipc_addid() limit parameter
  sysvipc: duplicate lock comments wrt ipc_addid()
  sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
  initramfs: use time64_t timestamps
  drivers/watchdog: make use of devm_register_reboot_notifier()
  kernel/reboot.c: add devm_register_reboot_notifier()
  kcov: update documentation
  Makefile: support flag -fsanitizer-coverage=trace-cmp
  kcov: support comparison operands collection
  kcov: remove pointless current != NULL check
  kernel/panic.c: add TAINT_AUX
  ...
2017-11-17 16:56:17 -08:00
Dou Liyang
5f4cdac6bc arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed the last user of parent_node().

The parent_node() macro in SPARC64 platform is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1504234599-29533-6-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:04 -08:00
Linus Torvalds
e29116758c Merge branch 'parisc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
 "Highlights:

   - one important fix from Dave to prevent kernel crash when userspace
     hands over invalid values to our in-kernel CAS implementation.

   - added CPU topology support, including multi-core scheduler support
     on PA8900 CPUs

  Minor changes:

   - minor fixes for sparse (from Luc)

   - drop duplicates for CPU_BIG_ENDIAN from parisc and sparc top
     Kconfig files (from Babu)

   - reorganized parisc PDC (firmware-access) header files for usage
     from userspace. Required for upcoming qemu parisc emulator and
     SeaBIOS fork to support parisc"

* 'parisc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  arch: Fix duplicates in Kconfig for parisc and sparc
  parisc: Make some PDC structures accessible in uapi headers
  parisc: Pass endianness info to sparse
  parisc: Add CPU topology support
  parisc: Fix validity check of pointer size argument in new CAS implementation
2017-11-17 14:26:14 -08:00
Linus Torvalds
5a3e0b196b File locking related changes for v4.15
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDuoWAAoJEAAOaEEZVoIVXEQP/jQYoU9hgvEj8j3ZIgi56SDJ
 pR45w2zcJz2/uU43DEKyShyLgsuoBbJQ3l/gGBH/tl+xGm9NzB0gatoEu9GmKNYz
 /IN6/vUFnoIAUyD+iMZbpmsYKIkz0z2YJo261IfspAwIft/cvHJnYYGQrP9YXg9F
 c7bdDuANTKocdQigc4BQyOe3OfIBGfTwJhuakO+1yuZmGOVNyxEcdYbMM8FiTfc8
 +62kvQQ3t7WMqSbM8M0QdGcYQjG0EwcVAuV7COurLJIva7hUkVel32MVUjoFcf28
 BnRu2ztFJCubm1HA85twlJDtpeXbcMqrUl/CcwRMpwDaePd5GVB1h5iKqbZ51BZ1
 fWT2STmt+8hY2B5eiXoYEaG3B7ZRr+r0oroxqOxpiZ/m4AVeouF+gPGv+NV5zgvD
 NGWC0MdklIJ4xaC99NEeP6kBhz0M74VKymFCTeHkVg9m4TqDepNvitKed0qagw19
 uw8seei7TOTm4o117+l55NHmyfTHXFO4U0WLTJyeZcoEnUs0rOcHeqyy0RwCBMrK
 W2fJtdBLFr+tBIIrID4TnPhhYtSvIPjz+FpiRDobqhgvMva/PIvLGTWK4unrgIjG
 ZQ7YGnwWda8GjqKhgZacn/BSXyJzOAF9hJp0mz2ORaOxaMarEV55duiZufCvGuZw
 uUQWRCKuQX7Oi05i9jXp
 =fCeF
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking update from Jeff Layton:
 "A couple of fixes for a patch that went into v4.14, and the bug report
  just came in a few days ago.. It passes my (minimal) testing, and has
  been in linux-next for a few days now.

  I also would like to get my address changed in MAINTAINERS to clear
  that hurdle"

* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
  fcntl: don't leak fd reference when fixup_compat_flock fails
  MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
2017-11-17 13:21:58 -08:00
Linus Torvalds
93f30c73ec Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to ->compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
2017-11-17 11:54:55 -08:00
Babu Moger
7b8b098c47 arch: Fix duplicates in Kconfig for parisc and sparc
Fix duplicates for sparc and parisc. This was due these following commits.

1. commit 4c97a0c8fee3 ("arch: define CPU_BIG_ENDIAN for all fixed big
   endian archs")
2. commit 97d9f969161d ("arch/sparc: Define config parameter
   CPU_BIG_ENDIAN")
3. commit 74ad3d28af21 ("parisc: Define CONFIG_CPU_BIG_ENDIAN")

Remove duplicates.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-11-17 15:27:54 +01:00
Linus Torvalds
7c225c69f8 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-15 19:42:40 -08:00
Mel Gorman
2d4894b5d2 mm: remove cold parameter from free_hot_cold_page*
Most callers users of free_hot_cold_page claim the pages being released
are cache hot.  The exception is the page reclaim paths where it is
likely that enough pages will be freed in the near future that the
per-cpu lists are going to be recycled and the cache hotness information
is lost.  As no one really cares about the hotness of pages being
released to the allocator, just ditch the parameter.

The APIs are renamed to indicate that it's no longer about hot/cold
pages.  It should also be less confusing as there are subtle differences
between them.  __free_pages drops a reference and frees a page when the
refcount reaches zero.  free_hot_cold_page handled pages whose refcount
was already zero which is non-obvious from the name.  free_unref_page
should be more obvious.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

[mgorman@techsingularity.net: add pages to head, not tail]
  Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Pavel Tatashin
78c943662f sparc64: optimize struct page zeroing
Add an optimized mm_zero_struct_page(), so struct page's are zeroed
without calling memset().  We do eight to ten regular stores based on
the size of struct page.  Compiler optimizes out the conditions of
switch() statement.

SPARC-M6 with 15T of memory, single thread performance:

                               BASE            FIX  OPTIMIZED_FIX
        bootmem_init   28.440467985s   2.305674818s   2.305161615s
free_area_init_nodes  202.845901673s 225.343084508s 172.556506560s
                      --------------------------------------------
Total                 231.286369658s 227.648759326s 174.861668175s

BASE:  current linux
FIX:   This patch series without "optimized struct page zeroing"
OPTIMIZED_FIX: This patch series including the current patch.

bootmem_init() is where memory for struct pages is zeroed during
allocation.  Note, about two seconds in this function is a fixed time:
it does not increase as memory is increased.

Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
df8ee57889 sparc64: simplify vmemmap_populate
Remove duplicating code by using common functions vmemmap_pud_populate
and vmemmap_pgd_populate.

Link: http://lkml.kernel.org/r/20171013173214.27300-5-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
2a20aa1710 sparc64/mm: set fields in deferred pages
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled there is a case where we set
some fields prior to initializing:

mem_init() {
     register_page_bootmem_info();
     free_all_bootmem();
     ...
}

When register_page_bootmem_info() is called only non-deferred struct
pages are initialized.  But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.

mem_init
register_page_bootmem_info
register_page_bootmem_info_node
 get_page_bootmem
  .. setting fields here ..
  such as: page->freelist = (void *)type;

free_all_bootmem()
free_low_memory_core_early()
 for_each_reserved_mem_region()
  reserve_bootmem_region()
   init_reserved_page() <- Only if this is deferred reserved page
    __init_single_pfn()
     __init_single_page()
      memset(0) <-- Loose the set fields here

We end up with similar issue as in the previous patch, where currently
we do not observe problem as memory is zeroed.  But, if flag asserts are
changed we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
75f296d93b kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it.

Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
c4812909f5 mm: introduce wrappers to access mm->nr_ptes
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd
and nr_pud.

The patch also makes nr_ptes accounting dependent onto CONFIG_MMU.  Page
table accounting doesn't make sense if you don't have page tables.

It's preparation for consolidation of page-table counters in mm_struct.

Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
b4e98d9ac7 mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
account PUD page tables, only PMD and PTE.

We already addressed the same issue for PMD page tables, see commit
dc6c9a35b66b ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.

The patch expands accounting to PUD level.

[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
  Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
  Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Linus Torvalds
1b6115fbe3 pci-v4.15-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaDFIdAAoJEFmIoMA60/r8Jg4P/3IrmMNVnpqmYEZ7lRSW7UQ3
 8jtupbzIkbPsIAEhbJ7xqO7zKx85j6Og+ZSOv4a8u/tS6cd1aVZu2PpWsTkacez0
 7nLGVCSL3HZi5qcrtOvb2Pmke18SUKSPxVYSgS2ajQavB1oKaY03FbHDWyWidCZx
 qxkeGZOiUDw5kSGkQWyks1WgB0dd76rVbPcrKKEJueGgrdSm+EdgdDv8eT6bZInZ
 uMrCmSjNYTQP0KASCJJvgYOtJbdwvP6NuQTxzOlU2G+H2SqsLRjsz4UUR8FF06T5
 cndpgpG3QSAZLx7wCeWTvRorTEYORzKMoyw/AUjhiGbRep9Zw0aKNvCC99E6xjyD
 FECrk6kCrqZs7l+LVXK4SwpBXCVjNgRoFAHBEKF2X3/SWUkUhHXZHCVvMQB8LQiS
 2p8VRoYWw2aCLkHCGynuzToUrD2P2Pjxe5n/13aYVJkyBNfQqqZ3l2YHiZdpDO3j
 rgG6RW0WCrpZxfb/0WAbPnQ2qpZAwDPO6hOW7dIfTZabFVXRIkBvNq53by/0MxvP
 jyOcMTsq2l8y46f3VgNPUAHj0f52HwfZA3PQRPh+MQDz5385BJklDRWtfVM7cQx9
 IoeGkq1zLLvpOh63he/jnnRELxDvNVcxND8lOkenJlObj9kK63hUEcXg/zEMS4w3
 oetLw9TqE32Jb7GfpVSw
 =j4L3
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

  - detach driver before tearing down procfs/sysfs (Alex Williamson)

  - disable PCIe services during shutdown (Sinan Kaya)

  - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel)

  - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas)

  - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas)

  - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn
    Helgaas)

  - report non-fatal AER errors only to the affected endpoint (Gabriele
    Paoloni)

  - distribute bus numbers, MMIO, and I/O space among hotplug bridges to
    allow more devices to be hot-added (Mika Westerberg)

  - fix pciehp races during initialization and surprise link down (Mika
    Westerberg)

  - handle surprise-removed devices in PME handling (Qiang)

  - support resizable BARs for large graphics devices (Christian König)

  - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo
    Sironi)

  - create SR-IOV virtfn/physfn sysfs links before attaching driver
    (Stuart Hayes)

  - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen)

  - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy)

  - avoid slot reset if bridge itself is broken (Jan Glauber)

  - clean up pci_reset_function() path (Jan H. Schönherr)

  - make pci_map_rom() fail if the option ROM is invalid (Changbin Du)

  - convert timers to timer_setup() (Kees Cook)

  - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap)

  - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal)

  - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master()
    declarations (Bjorn Helgaas)

  - fix endpoint framework overflows and BUG()s (Dan Carpenter)

  - fix endpoint framework issues (Kishon Vijay Abraham I)

  - avoid broken Cavium CN8xxx bus reset behavior (David Daney)

  - extend Cavium ACS capability quirks (Vadim Lomovtsev)

  - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel)

  - turn off dra7xx clocks cleanly on shutdown (Keerthy)

  - fix Faraday probe error path (Wei Yongjun)

  - support HiSilicon STB SoC PCIe host controller (Jianguo Sun)

  - fix Hyper-V interrupt affinity issue (Dexuan Cui)

  - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly
    Kuznetsov)

  - support multiple MSI on iProc (Sandor Bodo-Merle)

  - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou
    Zhiqiang)

  - fix Layerscape default error response (Minghuan Lian)

  - support MSI on Tango host controller (Marc Gonzalez)

  - support Tegra186 PCIe host controller (Manikanta Maddireddy)

  - use generic accessors on Tegra when possible (Thierry Reding)

  - support V3 Semiconductor PCI host controller (Linus Walleij)

* tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits)
  PCI/ASPM: Add L1 Substates definitions
  PCI/ASPM: Reformat ASPM register definitions
  PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
  PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
  PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe()
  PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up()
  PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up()
  PCI: Fix kernel-doc build warning
  PCI: Fail pci_map_rom() if the option ROM is invalid
  PCI: Move pci_map_rom() error path
  PCI: Move PCI_QUIRKS to the PCI bus menu
  alpha/PCI: Make pdev_save_srm_config() static
  PCI: Remove unused declarations
  PCI: Remove redundant pci_dev, pci_bus, resource declarations
  PCI: Remove redundant pcibios_set_master() declarations
  PCI/PME: Handle invalid data when reading Root Status
  PCI: hv: Use effective affinity mask
  PCI: pciehp: Do not clear Presence Detect Changed during initialization
  PCI: pciehp: Fix race condition handling surprise link down
  PCI: Distribute available resources to hotplug-capable bridges
  ...
2017-11-15 15:01:28 -08:00
Jeff Layton
4d2dc2cc76 fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.

Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.

With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.

Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
2017-11-15 08:08:36 -05:00
Nitin Gupta
70f3c8b7c2 sparc64: Fix page table walk for PUD hugepages
For a PUD hugepage entry, we need to propagate bits [32:22]
from virtual address to resolve at 4M granularity. However,
the current code was incorrectly propagating bits [29:19].
This bug can cause incorrect data to be returned for pages
backed with 16G hugepages.

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:37:43 +09:00
Allen Pais
ff0296877a sparc64: Convert timers to user timer_setup()
Switch to using the new timer_setup() and from_timer()
in LDOM Virtual I/O handshake.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:36:27 +09:00
Elena Reshetova
cdf5976fcb sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable mdesc_handle.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:28:23 +09:00
Kees Cook
68fa10dce0 sparc/led: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a static variable to hold timeout
value.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:27:50 +09:00
Vijay Kumar
46ad8d2d22 sparc64: Use sparc optimized fls and __fls for T4 and above
For T4 and above, patch fls and __fls functions
at the boot time to use lzcnt instruction.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
2b41ce5df2 sparc64: SPARC optimized __fls function
Defined SPARC optimized __fls using lzcnt opcode.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
70cbec0c53 sparc64: SPARC optimized fls function
Defined SPARC optimized fls using lzcnt opcode.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
be52bbe3ea sparc64: Define SPARC default __fls function
__fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
41413a6035 sparc64: Define SPARC default fls function
fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:45 +09:00