Commit Graph

802418 Commits

Author SHA1 Message Date
Linus Torvalds
38c0ecf608 Fix bug in auxdisplay.
- charlcd: fix x/y command parsing
     From Mans Rullgard
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlwdT8QACgkQGXyLc2ht
 IW3xxQ/9EEw7lrrACBaTzj9DaEkln1Dz1ObEkSZdZJ+YxkPaagKAZZvaDmigYrkL
 SRSAxNr11mRVLOLjPK4/D1JpK07Xb7Ahu0lgcfInqxEv+lgkqVNdxCxomCqNowRM
 FbCd1xKH2VMWo1z6Bi/tvnJtD/tYnJzzBPPPNMRtd5Cmtt6OF5LVjE68pGcgHELq
 HD6uygy3WmGIzHxhV3O5PSM7FhD0mp9HgOzheBXqJY0ku5y/fj5eyp7QaucqSd2E
 oTr8X43GdGhAbntMbQ7Go0c9u4r6TJKkmqw9ylvXICDyNRzP3fLOipqRDh9U9slv
 ApHkZeM7v/2vzYdf1VJNep5Cb3EoeAE93CGnmVN+Jux006/6BQSTZ6k4z6XfXTpA
 wguhLgnRJ0EUx6REhC1ciPdmB9sJnkg6AUO8O4KQFU8X5MDT2pV+055ZopKkGTBB
 RKYvn3ymmY0FcOXPcn0lu48aCCkkyGZ7ANxTfZDpBt5UnMEWWHiYaElyet2OssG9
 RhkxS6ogzhlEesEGF5c9ZkRujHorBnvpwPACocK+kwoEMBaRQ6SDWQ+gZS6YGceS
 21MqY3WqMXZxT1PzJ1r9Z+FVB6gPPLGCiDEJg4QOrBQ3r6cq5JQK5LVNnD3b9Uqz
 /wWJj0KT1oOyMkdxzxxiGtcutDTzRluBOmRyQ0A6z0xTabgD0r8=
 =BHr1
 -----END PGP SIGNATURE-----

Merge tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux

Pull auxdisplay fix from Miguel Ojeda:
 "charlcd: fix x/y command parsing (Mans Rullgard)"

* tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux:
  auxdisplay: charlcd: fix x/y command parsing
2018-12-22 14:25:23 -08:00
Christian Brauner
94f82008ce Revert "vfs: Allow userns root to call mknod on owned filesystems."
This reverts commit 55956b59df.

commit 55956b59df ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:

bool may_open_dev(const struct path *path)
{
        return !(path->mnt->mnt_flags & MNT_NODEV) &&
                !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}

The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.

All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).

Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.

[1]: https://github.com/systemd/systemd/pull/9483

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22 14:18:34 -08:00
Sai Praneeth Prakhya
1debf0958f x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
The following commit:

  d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd")

forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and
EFI_MIXED_MODE:

- EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
  using ioremap() and init_memory_mapping(). This feature can be enabled by
  passing "efi=old_map" as kernel command line argument. But,
  efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
  efi_pgd and hence cannot be used for unmapping EFI boot services code/data
  regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

- EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
  64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
  all RAM (i.e. namely EFI regions like EFI_CONVENTIONAL_MEMORY,
  EFI_LOADER_<CODE/DATA>, EFI_BOOT_SERVICES_<CODE/DATA> and
  EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
  facilitate EFI runtime calls access it's arguments in 1:1 mode.

Hence, don't unmap EFI boot services code/data regions when booted in mixed mode.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-22 20:58:30 +01:00
Christoph Hellwig
0cd60eb1a7 dma-mapping: fix flags in dma_alloc_wc
We really need the writecombine flag in dma_alloc_wc, fix a stupid
oversight.

Fixes: 7ed1d91a9e ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22 08:46:27 -08:00
Linus Torvalds
23203e3f34 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: fix has_unmovable_pages for HugePages
  fork,memcg: fix crash in free_thread_stack on memcg charge fail
  mm: thp: fix flags for pmd migration when split
  mm, memory_hotplug: initialize struct pages for the full memory section
2018-12-21 14:59:00 -08:00
Oscar Salvador
17e2e7d7e1 mm, page_alloc: fix has_unmovable_pages for HugePages
While playing with gigantic hugepages and memory_hotplug, I triggered
the following #PF when "cat memoryX/removable":

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:has_unmovable_pages+0x154/0x210
  Call Trace:
   is_mem_section_removable+0x7d/0x100
   removable_show+0x90/0xb0
   dev_attr_show+0x1c/0x50
   sysfs_kf_seq_show+0xca/0x1b0
   seq_read+0x133/0x380
   __vfs_read+0x26/0x180
   vfs_read+0x89/0x140
   ksys_read+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking
all hstates's size to match PAGE_SIZE.

Obviously, we do not find any hstate matching that size, and we return
NULL.  Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.

Fix that by getting the head page before calling page_hstate().

Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages.  While are it, we can also get rid of the
round_up().

[osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
  Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Rik van Riel
5eed6f1dff fork,memcg: fix crash in free_thread_stack on memcg charge fail
Commit 9b6f7e163c ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

#5 [ffffc900244efc88] die at ffffffff8101f0ab
 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163c ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Peter Xu
2e83ee1d86 mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs.  However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_yound() while actually they don't make sense at all when it's a
migration entry.  Fix them up.  Since at it, drop the ifdef together as
not needed.

Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of
swap offset instead of bit 2) and it could potentially corrupt the
memory of an userspace program which depends on the dirty bit.

Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Mikhail Zaslonko
2830bf6f05 mm, memory_hotplug: initialize struct pages for the full memory section
If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered
by memory_hotplug sysfs handlers:

Here are the the panic examples:
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_PGFLAGS=y

 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( test_pages_in_a_zone+0xde/0x160)
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( is_mem_section_removable+0xb4/0x190)
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.

Michal said:

: This has alwways been problem AFAIU.  It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100d8 kernels mostly and
: therefore Fixes: f7f99100d8 ("mm: stop zeroing memory during
: allocation in vmemmap")

Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Linus Torvalds
6cafab50ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "Just some small fixes here and there, and a refcount leak in a serial
  driver, nothing serious"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  serial/sunsu: fix refcount leak
  sparc: Set "ARCH: sunxx" information on the same line
  sparc: vdso: Drop implicit common-page-size linker flag
2018-12-21 14:23:57 -08:00
Linus Torvalds
87935eee57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking fixes from David Miller:
 "Some more bug fixes have trickled in, we have:

  1) Local MAC entries properly in mscc driver, from Allan W. Nielsen.

  2) Eric Dumazet found some more of the typical "pskb_may_pull() -->
     oops forgot to reload the header pointer" bugs in ipv6 tunnel
     handling.

  3) Bad SKB socket pointer in ipv6 fragmentation handling, from Herbert
     Xu.

  4) Overflow fix in sk_msg_clone(), from Vakul Garg.

  5) Validate address lengths in AF_PACKET, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
  qmi_wwan: Add support for Fibocom NL678 series
  tls: Do not call sk_memcopy_from_iter with zero length
  ipv6: tunnels: fix two use-after-free
  Prevent overflow of sk_msg in sk_msg_clone()
  packet: validate address length
  net: netxen: fix a missing check and an uninitialized use
  tcp: fix a race in inet_diag_dump_icsk()
  MAINTAINERS: update cxgb4 and cxgb3 maintainer
  ipv6: frags: Fix bogus skb->sk in reassembled packets
  mscc: Configured MAC entries should be locked.
2018-12-21 14:21:17 -08:00
Mans Rullgard
9bc30ab821 auxdisplay: charlcd: fix x/y command parsing
The x/y command parsing has been broken since commit 129957069e
("staging: panel: Fixed checkpatch warning about simple_strtoul()").

Commit b34050fadb ("auxdisplay: charlcd: Fix and clean up handling of
x/y commands") fixed some problems by rewriting the parsing code,
but also broke things further by removing the check for a complete
command before attempting to parse it.  As a result, parsing is
terminated at the first x or y character.

This reinstates the check for a final semicolon.  Whereas the original
code use strchr(), this is wasteful seeing as the semicolon is always
at the end of the buffer.  Thus check this character directly instead.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-12-21 21:27:21 +01:00
Yangtao Li
d430aff8cd serial/sunsu: fix refcount leak
The function of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.

su_get_type() doesn't do that. The match node are used as an identifier
to compare against the current node, so we can directly drop the refcount
after getting the node from the path as it is not used as pointer.

Fix this by use a single variable and drop the refcount right after
of_find_node_by_path().

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 11:24:38 -08:00
Corentin Labbe
afaffac368 sparc: Set "ARCH: sunxx" information on the same line
While checking boot log from SPARC qemu, I saw that the "ARCH: sunxx"
information was split on two different line.
This patchs merge both line together.
In the meantime, thoses information need to be printed via pr_info
since printk print them by default via the warning loglevel.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 11:24:38 -08:00
ndesaulniers@google.com
0ff70f62c6 sparc: vdso: Drop implicit common-page-size linker flag
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/sparc/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.

Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 11:24:38 -08:00
Linus Torvalds
5092adb227 Unbreak AMD nested virtualization.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcHOSgAAoJEL/70l94x66DAw4H/jQjdRjT1DAf4vswXwMD6lpJ
 qHcSyAYL4d/PFbcovfAm2ca8F0HJylVWDeZcqQRP3zdX53diqJ4gyYMaNuuY0niX
 zKvzNhFw1oaZK93rwrF6BX1jl4Virw2uC4qL9bhgV/OfkmvTPvIFkP8gJGVDt9YY
 Kn5yhWnJOpHOCQs3GW8zOy2LWtiuCrp7epSrMGjGsWrp50ccW1tTioxYyDmBr3mF
 GizAIgDD2xMwIeOlj4IngQhDTahwekOA9XzhSMKjm0/GMcZ33TXPcnUdoa0Yxguj
 Uu3cXLfcEUfakZdefi3FB5eDB2knDe3kbmKviok2giAAY1hBvEO5b6bHrn+5W2g=
 =l4oP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:
 "A simple patch for a pretty bad bug: Unbreak AMD nested
  virtualization."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: fix switch to guest mmu
2018-12-21 11:15:36 -08:00
Daniele Palmas
d667044f49 qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
This patch fixes qmap header retrieval when modem is configured for
dl data aggregation.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 10:58:45 -08:00
Linus Torvalds
e572fa0e84 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix a division by zero crash in the posix-timers code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Fix division by zero bug
2018-12-21 10:51:54 -08:00
Jörgen Storvist
7c3db4105c qmi_wwan: Add support for Fibocom NL678 series
Added support for Fibocom NL678 series cellular module QMI interface.
Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x40 series chipsets.

Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 10:51:21 -08:00
Vakul Garg
65a10e28ae tls: Do not call sk_memcopy_from_iter with zero length
In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC,
the number of bytes to be copied using subsequent function
sk_msg_memcopy_from_iter() becomes zero. This causes function
sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg()
to return failure. To prevent it, do not call sk_msg_memcopy_from_iter()
when number of bytes to copy (indicated by 'try_to_copy') is zero.

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 10:26:54 -08:00
Linus Torvalds
d5fa080d4c Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fix from Ingo Molnar:
 "A single fix for a robust futexes race between sys_exit() and
  sys_futex_lock_pi()"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Cure exit race
2018-12-21 10:11:51 -08:00
Linus Torvalds
70ad6368e8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "The biggest part is a series of reverts for the macro based GCC
  inlining workarounds. It caused regressions in distro build and other
  kernel tooling environments, and the GCC project was very receptive to
  fixing the underlying inliner weaknesses - so as time ran out we
  decided to do a reasonably straightforward revert of the patches. The
  plan is to rely on the 'asm inline' GCC 9 feature, which might be
  backported to GCC 8 and could thus become reasonably widely available
  on modern distros.

  Other than those reverts, there's misc fixes from all around the
  place.

  I wish our final x86 pull request for v4.20 was smaller..."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
  Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
  Revert "x86/refcount: Work around GCC inlining bug"
  Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
  Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
  Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
  Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
  Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
  Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
  x86/mtrr: Don't copy uninitialized gentry fields back to userspace
  x86/fsgsbase/64: Fix the base write helper functions
  x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
  x86/vdso: Pass --eh-frame-hdr to the linker
  x86/mm: Fix decoy address handling vs 32-bit builds
  x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
  x86/dump_pagetables: Fix LDT remap address marker
  x86/mm: Fix guard hole handling
2018-12-21 09:22:24 -08:00
Eric Dumazet
cbb49697d5 ipv6: tunnels: fix two use-after-free
xfrm6_policy_check() might have re-allocated skb->head, we need
to reload ipv6 header pointer.

sysbot reported :

BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304

CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40
 ipv6_addr_type include/net/ipv6.h:403 [inline]
 ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727
 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757
 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
IPVS: ftp: loaded support on port[0] = 21
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq+0x19/0x20 kernel/softirq.c:340
 netif_rx_ni+0x521/0x860 net/core/dev.c:4569
 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84
 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727
 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176
 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727
 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747
 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline]
 rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945
kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>'
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop!
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 sock_write_iter+0x35e/0x5c0 net/socket.c:900
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues'
kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0'
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues'
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457669
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669
RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003
kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env
RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4
R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff

Allocated by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3684 [inline]
 __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698
 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140
 __alloc_skb+0x155/0x760 net/core/skbuff.c:208
kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0'
 alloc_skb include/linux/skbuff.h:1011 [inline]
 __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450
 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619
 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116
 __sys_sendmsg+0x11d/0x280 net/socket.c:2154
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg net/socket.c:2161 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices'

Freed by task 1304:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3817
 skb_free_head+0x93/0xb0 net/core/skbuff.c:553
 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498
 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896
 pskb_may_pull include/linux/skbuff.h:2188 [inline]
 _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272
kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env
 __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322
 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline]
 xfrm_policy_check include/net/xfrm.h:1175 [inline]
 xfrm6_policy_check include/net/xfrm.h:1185 [inline]
 vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316
 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132
 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394
 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443
 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083
 process_backlog+0x24e/0x7a0 net/core/dev.c:5923
kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0'
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412
 __do_softirq+0x308/0xb7e kernel/softirq.c:292

The buggy address belongs to the object at ffff888191b8cac0
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 176 bytes inside of
 512-byte region [ffff888191b8cac0, ffff888191b8ccc0)
The buggy address belongs to the page:
page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940
raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>'

Memory state around the buggy address:
 ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 0d3c703a9d ("ipv6: Cleanup IPv6 tunnel receive path")
Fixes: ed1efb2aef ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 09:18:31 -08:00
Linus Torvalds
96d6ee7d2f final drm-fixes for 4.20
array_index_nospec patch, cc: stable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAlwc6lsACgkQTA9ye/CY
 qnFlcw/+PM/Rqi1MpI7lhHORI+MzodHfIAnftjstD4SPcl5e1/d51m9QMzS6iM6i
 AnZwl5WqsSfBg/5FcKg4r38NNo/nRWmrhFjQFsDyQeuPFTHKGS7jcYxy8sZBkkS1
 A/Y9iVsVNLUM48Ddhgbeibqb9vmUbdek0l8qkwYvi/RMIqNO8q+I2e8H7DOGXPKZ
 9glCiTPxhAHc/LmxqEPSzHTTBRlZBsgG+4A0Pb1w598xeW5GmCVvM8qZIQl/ET9g
 x3Z3GiOdKozR3WMue4dndxvtCg0SAZPPZp2C9bGsmapPiU126mtquTUl49Hou6qK
 Hjz4xnyNE9yM/4+Mce5j46SQLE0pezr10yYrJ4MtfF8I7W2qyxlb4aBni2v8ldVO
 kVVTayrUagi6owjrO6zJyJuDlXpxDsIfjaWvBsXEf73cmftQaQM6G55D85RJGu7o
 /yS2p+5xpjmy1aej46qam/S3AMiRgSery7naBYIWB7dwoIQ+sEvOoBZAY0TIh1VY
 CIq/hgh8zxlpfttU8u3Qbi6gWp4rc8pMvU6YpGvSmyzmZ1J9eDdv4bFh1vGsveTm
 Y8TDjra5Nh2nMAUP1mvVNDbkfV+jMuAoQQ+RMylnj4gugl+Llr9r3Sni+pat54vn
 jbIKmLmA3UMWoL5CIup3jaiI9n3+1UcXKG2aAJLe6uMWfYbIfNM=
 =yEGc
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2018-12-21' of git://anongit.freedesktop.org/drm/drm

Pull final drm fix from Daniel Vetter:
 "Very calm week, so either everything perfect or everyone on holidays
  already. Just one array_index_nospec patch, also for stable"

* tag 'drm-fixes-2018-12-21' of git://anongit.freedesktop.org/drm/drm:
  drm/ioctl: Fix Spectre v1 vulnerabilities
2018-12-21 09:17:52 -08:00
Vakul Garg
5c1e7e94a7 Prevent overflow of sk_msg in sk_msg_clone()
Fixed function sk_msg_clone() to prevent overflow of 'dst' while adding
pages in scatterlist entries. The overflow of 'dst' causes crash in kernel
tls module while doing record encryption.

Crash fixed by this patch.

[   78.796119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   78.804900] Mem abort info:
[   78.807683]   ESR = 0x96000004
[   78.810744]   Exception class = DABT (current EL), IL = 32 bits
[   78.816677]   SET = 0, FnV = 0
[   78.819727]   EA = 0, S1PTW = 0
[   78.822873] Data abort info:
[   78.825759]   ISV = 0, ISS = 0x00000004
[   78.829600]   CM = 0, WnR = 0
[   78.832576] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000bf8ee311
[   78.839195] [0000000000000008] pgd=0000000000000000
[   78.844081] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   78.849642] Modules linked in: tls xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables xt_CHECKSUM cpve cpufreq_conservative lm90 ina2xx crct10dif_ce
[   78.865377] CPU: 0 PID: 6007 Comm: openssl Not tainted 4.20.0-rc6-01647-g754d5da63145-dirty #107
[   78.874149] Hardware name: LS1043A RDB Board (DT)
[   78.878844] pstate: 60000005 (nZCv daif -PAN -UAO)
[   78.883632] pc : scatterwalk_copychunks+0x164/0x1c8
[   78.888500] lr : scatterwalk_copychunks+0x160/0x1c8
[   78.893366] sp : ffff00001d04b600
[   78.896668] x29: ffff00001d04b600 x28: ffff80006814c680
[   78.901970] x27: 0000000000000000 x26: ffff80006c8de786
[   78.907272] x25: ffff00001d04b760 x24: 000000000000001a
[   78.912573] x23: 0000000000000006 x22: ffff80006814e440
[   78.917874] x21: 0000000000000100 x20: 0000000000000000
[   78.923175] x19: 000081ffffffffff x18: 0000000000000400
[   78.928476] x17: 0000000000000008 x16: 0000000000000000
[   78.933778] x15: 0000000000000100 x14: 0000000000000001
[   78.939079] x13: 0000000000001080 x12: 0000000000000020
[   78.944381] x11: 0000000000001080 x10: 00000000ffff0002
[   78.949683] x9 : ffff80006814c248 x8 : 00000000ffff0000
[   78.954985] x7 : ffff80006814c318 x6 : ffff80006c8de786
[   78.960286] x5 : 0000000000000f80 x4 : ffff80006c8de000
[   78.965588] x3 : 0000000000000000 x2 : 0000000000001086
[   78.970889] x1 : ffff7e0001b74e02 x0 : 0000000000000000
[   78.976192] Process openssl (pid: 6007, stack limit = 0x00000000291367f9)
[   78.982968] Call trace:
[   78.985406]  scatterwalk_copychunks+0x164/0x1c8
[   78.989927]  skcipher_walk_next+0x28c/0x448
[   78.994099]  skcipher_walk_done+0xfc/0x258
[   78.998187]  gcm_encrypt+0x434/0x4c0
[   79.001758]  tls_push_record+0x354/0xa58 [tls]
[   79.006194]  bpf_exec_tx_verdict+0x1e4/0x3e8 [tls]
[   79.010978]  tls_sw_sendmsg+0x650/0x780 [tls]
[   79.015326]  inet_sendmsg+0x2c/0xf8
[   79.018806]  sock_sendmsg+0x18/0x30
[   79.022284]  __sys_sendto+0x104/0x138
[   79.025935]  __arm64_sys_sendto+0x24/0x30
[   79.029936]  el0_svc_common+0x60/0xe8
[   79.033588]  el0_svc_handler+0x2c/0x80
[   79.037327]  el0_svc+0x8/0xc
[   79.040200] Code: 6b01005f 54fff788 940169b1 f9000320 (b9400801)
[   79.046283] ---[ end trace 74db007d069c1cf7 ]---

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 09:12:49 -08:00
Willem de Bruijn
99137b7888 packet: validate address length
Packet sockets with SOCK_DGRAM may pass an address for use in
dev_hard_header. Ensure that it is of sufficient length.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 09:11:25 -08:00
Linus Torvalds
0b51733372 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Switching a few devices with Synaptics over to SMbus and disabling
  SMbus on a couple devices with Elan touchpads as they need more
  plumbing on PS/2 side"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - enable SMBus for HP EliteBook 840 G4
  Input: elantech - disable elan-i2c for P52 and P72
  Input: synaptics - enable RMI on ThinkPad T560
  Input: omap-keypad - fix idle configuration to not block SoC idle states
2018-12-21 09:09:30 -08:00
Linus Torvalds
bc380733a5 GPIO fixes for v4.20:
- ACPI IRQ request deferral
 - OMAP: revert deferred wakeup quirk
 - MAX7301: fix DMA safe memory handling
 - MVEBU: selective probe failure on missing clk
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcHKOUAAoJEEEQszewGV1zFCoP/0E/AfCLVxTZSgEPBlhzBL1a
 naXw4bFAM5WLF3t9aXqck0+mZ5T8ko7/VutOmmn54ogmAW++bGK+FyGmSxu+i9Ia
 EDVRlHzSvGmT6mQ1zK/J4YrWIlaW9paTOsbAdTkqrl2fsJygdyKOoWF0IzcHa0St
 L90+sCseloPj285GS96EYQFolCvyIEqaRVfRiBsnxD7fIpSNRpywKvKG83b8WGdZ
 aPyDfkjeDk9DZBaUx7je9INz5LwO5Aa3wnlEs26DXTj426+/KKPN/4Q64V2FSmbT
 3WE2rOtgadyVqjF0YopjQ+IqK97/cN6dXYrPJKpW8QaOAS+4MWy4cvRrXyUSGfOS
 piu6DzSyBMUPjRhbo8Vxxhu4N+MKxBu4wwLKtuW4sVS03e2DZ7JIwhZQbGkP26+3
 v3vmbHL2/KN5N2syW5d5ZKURcf6cG1DWC5ZL1XFrXFv1635DAPfT9i9cHBNMxRLO
 p1M5bgmIyVsIHQ96xEW/2Bvrw+MkS8KVLWrNHmqengGWrXM2X01GKzgD+CmkWh6b
 KU5u7kmNjACoYg+cCF+yN8YKrkrc+NxbFEgh3MhUUilgggCTCqFUOkJCAf8wuaqD
 FS1tpRe3s7RFW3u06Yb94q7Dt6jNsAzIOSTriSQExxydtQDI6ukyizxeVZRorwaU
 Np4KSWpl+gl1CfcZvimQ
 =+Iw2
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Hopefully last round of GPIO fixes.

  The ACPI patch is pretty important for some laptop users, the rest is
  driver-specific for embedded (mostly ARM) systems.

  I took out one ACPI patch that wasn't critical enough because I
  couldn't justify sending it at this point, and that is why the commit
  date is today, but the patches have been in linux-next.

  Sorry for not sending some of them earlier :(

  Notice that we have a co-maintainer for GPIO now, Bartosz Golaszewski,
  and he might jump in and make some pull requests at times when I am
  off.

  Summary:

   - ACPI IRQ request deferral

   - OMAP: revert deferred wakeup quirk

   - MAX7301: fix DMA safe memory handling

   - MVEBU: selective probe failure on missing clk"

* tag 'gpio-v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: mvebu: only fail on missing clk if pwm is actually to be used
  gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
  gpio: gpio-omap: Revert deferred wakeup quirk handling for regressions
  gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers
2018-12-21 09:05:28 -08:00
Kangjie Lu
d134e486e8 net: netxen: fix a missing check and an uninitialized use
When netxen_rom_fast_read() fails, "bios" is left uninitialized and may
contain random value, thus should not be used.

The fix ensures that if netxen_rom_fast_read() fails, we return "-EIO".

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 09:01:47 -08:00
Linus Torvalds
783619556a an important smb3 fix for an regression to some servers introduced by compounding optimization to rmdir
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlwcW2IACgkQiiy9cAdy
 T1FaLAv/Vs0QhYATaJHkvmjk7EdFoXTZST2wYxPcPSrOfWbCaMjsKREwPVUjOCiW
 W//zL/Qbi0FSVMWdoskJP0KQC5rAhUSxlvtDxrpzqszfeJprIs0oL5IRlOwxNdtT
 F9/i8+s/AuYq0TTn15UtqZHS6wZdWcerdttWV1V/97hEwcO5Xg0pEtyCmLPf7k8W
 wNdxCAQFZ9j2pDVyuJO3a0+Tas34dc2t/cac12h0qeXbrE6e88bQWa6bpEKSvCKr
 cMU94pkajCKeelZOhq+ga7cCmlBJs6gt4sgsKEsoDn72tQCsWVH6p1N4+AxmLsZU
 bR65XodusR1WHMesSth7QraUk0pIQ4ZzMRPZJCkh9bSjaa+fxX1Up/sjn74q1prf
 DHJ/52rQrWK3hvETUZD2B6N9AEDN0swbqeCJRlUYlzG5OEdfit9qSgfTaRzYxVnX
 +tct7j+8mjzk+rsGuTXrQupPXUndPTcpUrKFp5db9Sejcx/Gw/atKhW6mVgL3LiR
 8kVIraSV
 =+8yd
 -----END PGP SIGNATURE-----

Merge tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fix from Steve French:
 "An important smb3 fix for an regression to some servers introduced by
  compounding optimization to rmdir.

  This fix has been tested by multiple developers (including me) with
  the usual private xfstesting, but also by the new cifs/smb3 "buildbot"
  xfstest VMs (thank you Ronnie and Aurelien for good work on this
  automation). The automated testing has been updated so that it will
  catch problems like this in the future.

  Note that Pavel discovered (very recently) some unrelated but
  extremely important bugs in credit handling (smb3 flow control problem
  that can lead to disconnects/reconnects) when compounding, that I
  would have liked to send in ASAP but the complete testing of those two
  fixes may not be done in time and have to wait for 4.21"

* tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: Fix rmdir compounding regression to strict servers
2018-12-21 08:56:31 -08:00
Anup Patel
9b9afe4a0e
RISC-V: Select GENERIC_SCHED_CLOCK for clocksource drivers
The riscv_timer driver can provide sched_clock using "rdtime"
instruction but to achieve this we require generic sched_clock
framework hence this patch selects GENERIC_SCHED_CLOCK for RISCV.

Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:17:19 -08:00
Olof Johansson
a266cdba17
RISC-V: lib: minor asm cleanup
Fix tab/space conversion and use ENTRY/ENDPROC macros.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:17:02 -08:00
Palmer Dabbelt
358f3fff52
RISC-V: Move from EARLY_PRINTK to SBI earlycon
Now that we have earlycon support in the SBI console driver there is no
reason to have our arch-specific early printk support.  This patch set
turns on SBI earlycon support and removes the old early printk.
2018-12-21 08:15:39 -08:00
Nick Kossifidis
3aed8c4326
RISC-V: Update Kconfig to better handle CMDLINE
Added a menu to choose how the built-in command line will be
used and CMDLINE_EXTEND for compatibility with FDT code.

v2: Improved help messages, removed references to bootloader
and made them more descriptive. I also asked help from a
friend who's a language expert just in case.

v3: This time used the corrected text

v4: Copy the config strings from the arm32 port.

v5: Actually copy the config strings from the arm32 port.

Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Signed-off-by: Debbie Maliotaki <dmaliotaki@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:15:02 -08:00
David Abdurachmanov
397182e0db
riscv: remove unused variable in ftrace
Noticed while building kernel-4.20.0-0.rc5.git2.1.fc30 for
Fedora 30/RISCV.

[..]
BUILDSTDERR: arch/riscv/kernel/ftrace.c: In function 'prepare_ftrace_return':
BUILDSTDERR: arch/riscv/kernel/ftrace.c:135:6: warning: unused variable 'err' [-Wunused-variable]
BUILDSTDERR:   int err;
BUILDSTDERR:       ^~~
[..]

Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Fixes: e949b6db51 ("riscv/function_graph: Simplify with function_graph_enter()")
Reviewed-by: Olof Johansson <olof@lixom.net>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:11:26 -08:00
Yangtao Li
cd378dbb3d
RISC-V: add of_node_put()
use of_node_put() to release the refcount.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:11:08 -08:00
Atish Patra
94f9bf118f
RISC-V: Fix of_node_* refcount
Fix of_node* refcount at various places by using of_node_put.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:10:49 -08:00
Andrea Parri
8b699616f3
riscv, atomic: Add #define's for the atomic_{cmp,}xchg_*() variants
If an architecture does not define the atomic_{cmp,}xchg_*() variants,
the generic implementation defaults them to the fully-ordered version.

riscv's had its own variants since "the beginning", but it never told
(#define-d these for) the generic implementation: it is time to do so.

Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21 08:10:30 -08:00
Mark Brown
c3b5725965
Merge remote-tracking branch 'regulator/topic/coupled' into regulator-next 2018-12-21 13:43:35 +00:00
Mark Brown
b27d9668be
Merge branch 'regulator-4.21' into regulator-next 2018-12-21 13:43:32 +00:00
Mark Brown
67a2ab931e
Merge branch 'regulator-4.20' into regulator-linus 2018-12-21 13:43:30 +00:00
Robert Hoo
a0aea130af KVM: x86: Add CPUID support for new instruction WBNOINVD
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 14:26:32 +01:00
Andrew Jones
57d5edfe64 kvm: selftests: ucall: fix exit mmio address guessing
Fix two more bugs in the exit_mmio address guessing.
The first bug was that the start and step calculations were
wrong since they were dividing the number of address bits instead
of the address space. The second other bug was that the guessing
algorithm wasn't considering the valid physical and virtual address
ranges correctly for an identity map.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 13:58:46 +01:00
Sean Christopherson
2bcbd40671 Revert "compiler-gcc: disable -ftracer for __noclone functions"
The -ftracer optimization was disabled in __noclone as a workaround to
GCC duplicating a blob of inline assembly that happened to define a
global variable.  It has been pointed out that no amount of workarounds
can guarantee the compiler won't duplicate inline assembly[1], and that
disabling the -ftracer optimization has several unintended and nasty
side effects[2][3].

Now that the offending KVM code which required the workaround has
been properly fixed and no longer uses __noclone, remove the -ftracer
optimization tweak from __noclone.

[1] https://lore.kernel.org/lkml/ri6y38lo23g.fsf@suse.cz/T/#u
[2] https://lore.kernel.org/lkml/20181218140105.ajuiglkpvstt3qxs@treble/T/#u
[3] https://patchwork.kernel.org/patch/8707981/#21817015

This reverts commit 95272c2937.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:44:28 +01:00
Kangjie Lu
cd07e3701f
regulator: tps65910: fix a missing check of return value
tps65910_reg_set_bits() may fail. The fix checks if it fails, and if so,
returns with its error code.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-12-21 11:04:59 +00:00
Sean Christopherson
453eafbe65 KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
Transitioning to/from a VMX guest requires KVM to manually save/load
the bulk of CPU state that the guest is allowed to direclty access,
e.g. XSAVE state, CR2, GPRs, etc...  For obvious reasons, loading the
guest's GPR snapshot prior to VM-Enter and saving the snapshot after
VM-Exit is done via handcoded assembly.  The assembly blob is written
as inline asm so that it can easily access KVM-defined structs that
are used to hold guest state, e.g. moving the blob to a standalone
assembly file would require generating defines for struct offsets.

The other relevant aspect of VMX transitions in KVM is the handling of
VM-Exits.  KVM doesn't employ a separate VM-Exit handler per se, but
rather treats the VMX transition as a mega instruction (with many side
effects), i.e. sets the VMCS.HOST_RIP to a label immediately following
VMLAUNCH/VMRESUME.  The label is then exposed to C code via a global
variable definition in the inline assembly.

Because of the global variable, KVM takes steps to (attempt to) ensure
only a single instance of the owning C function, e.g. vmx_vcpu_run, is
generated by the compiler.  The earliest approach placed the inline
assembly in a separate noinline function[1].  Later, the assembly was
folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which
is still used today.

After moving to __noclone, an edge case was encountered where GCC's
-ftracer optimization resulted in the inline assembly blob being
duplicated.  This was "fixed" by explicitly disabling -ftracer in the
__noclone definition[4].

Recently, it was found that disabling -ftracer causes build warnings
for unsuspecting users of __noclone[5], and more importantly for KVM,
prevents the compiler for properly optimizing vmx_vcpu_run()[6].  And
perhaps most importantly of all, it was pointed out that there is no
way to prevent duplication of a function with 100% reliability[7],
i.e. more edge cases may be encountered in the future.

So to summarize, the only way to prevent the compiler from duplicating
the global variable definition is to move the variable out of inline
assembly, which has been suggested several times over[1][7][8].

Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and
VM-Exit "handler" to standalone assembly sub-routines.  Moving only
the core VMX transition codes allows the struct indexing to remain as
inline assembly and also allows the sub-routines to be used by
nested_vmx_check_vmentry_hw().  Reusing the sub-routines has a happy
side-effect of eliminating two VMWRITEs in the nested_early_check path
as there is no longer a need to dynamically change VMCS.HOST_RIP.

Note that callers to vmx_vmenter() must account for the CALL modifying
RSP, e.g. must subtract op-size from RSP when synchronizing RSP with
VMCS.HOST_RSP and "restore" RSP prior to the CALL.  There are no great
alternatives to fudging RSP.  Saving RSP in vmx_enter() is difficult
because doing so requires a second register (VMWRITE does not provide
an immediate encoding for the VMCS field and KVM supports Hyper-V's
memory-based eVMCS ABI).  The other more drastic alternative would be
to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu
variable (which can be encoded as e.g. gs:[imm]).  But because a valid
stack is needed at the time of VM-Exit (NMIs aren't blocked and a user
could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a
dedicated per-cpu VM-Exit stack would be required.  A dedicated stack
isn't difficult to implement, but it would require at least one page
per CPU and knowledge of the stack in the dumpstack routines.  And in
most cases there is essentially zero overhead in dynamically updating
VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first
VMLAUNCH unless nested_early_check=1, which is not a fast path.  In
other words, avoiding the VMCS.HOST_RSP by using a dedicated stack
would only make the code marginally less ugly while requiring at least
one page per CPU and forcing the kernel to be aware (and approve) of
the VM-Exit stack shenanigans.

[1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
[2] a3b5ba49a8 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
[3] 104f226bfd ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
[4] 95272c2937 ("compiler-gcc: disable -ftracer for __noclone functions")
[5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
[6] https://patchwork.kernel.org/patch/8707981/#21817015
[7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
[8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com

Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Martin Jambor <mjambor@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:50 +01:00
Sean Christopherson
051a2d3e59 KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the
'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run()
and nested_vmx_check_vmentry_hw().  Using the symbolic name means that
adding/removing an output parameter(s) requires "rewriting" almost all
of the asm blob, which makes it nearly impossible to understand what's
being changed in even the most minor patches.

Opportunistically improve the code comments.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:43 +01:00
Axel Lin
77ea906082
regulator: mcp16502: Select REGMAP_I2C to fix build error
Fix build error when CONFIG_REGMAP_I2C=m && CONFIG_REGULATOR_MCP16502=y.

drivers/regulator/mcp16502.o: In function `mcp16502_probe':
mcp16502.c:(.text+0xca): undefined reference to `__devm_regmap_init_i2c'

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-12-21 11:02:24 +00:00
Paolo Bonzini
c6ad459733 Second PPC KVM update for 4.21
This has 5 commits that fix page dirty tracking when running nested
 HV KVM guests, from Suraj Jitindar Singh.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJcHGnCAAoJEJ2a6ncsY3GfaMQIAKP1KQ/SzE38gORRjdf4aUTf
 8X6+B6bAjVPwu0OgR2xoU4hPT8vpViG8gIjlspDm/v1igRlokz5GeCEXXlttQwK6
 5JFfqS+psCQ/Z/bPFo0uW7cOojeDeE3s1Vd3XXjMH79T6Mvpg54fYutvxd+qbjW6
 gAl/jGK6xxo1XsaQfySeSSLA3b8ibI77mjnPBgvbSHbrxBAIjoAfqKTVSjrPwR2b
 ZvHyCbaoX2uOGyWrw9O73CqCgsiXEOQvr8dLEjsT7a+brrJBO4mv20s6DzlpC1lS
 YVHd8GAfk44h1fxsNXsj9eEUIcRMEB4fYYOd/u0TECdeyFFWz+8JjBhm6u0TRck=
 =Zrya
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next

Second PPC KVM update for 4.21

This has 5 commits that fix page dirty tracking when running nested
HV KVM guests, from Suraj Jitindar Singh.
2018-12-21 11:48:41 +01:00
Sean Christopherson
e814349950 KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
____kvm_handle_fault_on_reboot() provides a generic exception fixup
handler that is used to cleanly handle faults on VMX/SVM instructions
during reboot (or at least try to).  If there isn't a reboot in
progress, ____kvm_handle_fault_on_reboot() treats any exception as
fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
a BUG() to get a stack trace and die.

When it was originally added by commit 4ecac3fd6d ("KVM: Handle
virtualization instruction #UD faults during reboot"), the "call" to
kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
is the RIP of the faulting instructing.

The PUSH+JMP trickery is necessary because the exception fixup handler
code lies outside of its associated function, e.g. right after the
function.  An actual CALL from the .fixup code would show a slightly
bogus stack trace, e.g. an extra "random" function would be inserted
into the trace, as the return RIP on the stack would point to no known
function (and the unwinder will likely try to guess who owns the RIP).

Unfortunately, the JMP was replaced with a CALL when the macro was
reworked to not spin indefinitely during reboot (commit b7c4145ba2
"KVM: Don't spin on virt instruction faults during reboot").  This
causes the aforementioned behavior where a bogus function is inserted
into the stack trace, e.g. my builds like to blame free_kvm_area().

Revert the CALL back to a JMP.  The changelog for commit b7c4145ba2
("KVM: Don't spin on virt instruction faults during reboot") contains
nothing that indicates the switch to CALL was deliberate.  This is
backed up by the fact that the PUSH <insn RIP> was left intact.

Note that an alternative to the PUSH+JMP magic would be to JMP back
to the "real" code and CALL from there, but that would require adding
a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
and would add no value, i.e. the stack trace would be the same.

Using CALL:

------------[ cut here ]------------
kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
invalid opcode: 0000 [#1] SMP
CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
Call Trace:
 free_kvm_area+0x1044/0x43ea [kvm_intel]
 ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? __set_task_blocked+0x38/0x90
 ? __set_current_blocked+0x50/0x60
 ? __fpu__restore_sig+0x97/0x490
 ? do_vfs_ioctl+0xa1/0x620
 ? __x64_sys_futex+0x89/0x180
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x4f/0x100
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
---[ end trace 9775b14b123b1713 ]---

Using JMP:

------------[ cut here ]------------
kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
invalid opcode: 0000 [#1] SMP
CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
Call Trace:
 vmx_vcpu_run+0x156/0x630 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? __set_task_blocked+0x38/0x90
 ? __set_current_blocked+0x50/0x60
 ? __fpu__restore_sig+0x97/0x490
 ? do_vfs_ioctl+0xa1/0x620
 ? __x64_sys_futex+0x89/0x180
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x4f/0x100
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
---[ end trace f9daedb85ab3ddba ]---

Fixes: b7c4145ba2 ("KVM: Don't spin on virt instruction faults during reboot")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:48:23 +01:00