568874 Commits

Author SHA1 Message Date
Hugh Dickins
003e476716 kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Hugh Dickins
edde73205b kaiser: do not set _PAGE_NX on pgd_none
native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).

The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too.  So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:

A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.

This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.

What we need is a flag to set_pgd(), to tell it we're dealing with
userspace.  Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that.  But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.

But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Dave Hansen
bed9bb7f3e kaiser: merged update
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Richard Fellner
8a43ddfb93 KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Tom Lendacky
0fa147b407 x86/boot: Add early cmdline parsing for options with arguments
commit e505371dd83963caae1a37ead9524e8d997341be upstream.

Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Greg Kroah-Hartman
e68d6189c7 Linux 4.4.109 2018-01-02 20:33:28 +01:00
Andy Lutomirski
5d67dbef74 mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
commit 5dd0b16cdaff9b94da06074d5888b03235c0bf17 upstream.

This fixes CONFIG_SMP=n, CONFIG_DEBUG_TLBFLUSH=y without introducing
further #ifdef soup.  Caught by a Kbuild bot randconfig build.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
Link: http://lkml.kernel.org/r/76da9a3cc4415996f2ad2c905b93414add322021.1496673616.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Linus Torvalds
2e38839225 n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
commit 966031f340185eddd05affcf72b740549f056348 upstream.

We added support for EXTPROC back in 2010 in commit 26df6d13406d ("tty:
Add EXTPROC support for LINEMODE") and the intent was to allow it to
override some (all?) ICANON behavior.  Quoting from that original commit
message:

         There is a new bit in the termios local flag word, EXTPROC.
         When this bit is set, several aspects of the terminal driver
         are disabled.  Input line editing, character echo, and mapping
         of signals are all disabled.  This allows the telnetd to turn
         off these functions when in linemode, but still keep track of
         what state the user wants the terminal to be in.

but the problem turns out that "several aspects of the terminal driver
are disabled" is a bit ambiguous, and you can really confuse the n_tty
layer by setting EXTPROC and then causing some of the ICANON invariants
to no longer be maintained.

This fixes at least one such case (TIOCINQ) becoming unhappy because of
the confusion over whether ICANON really means ICANON when EXTPROC is set.

This basically makes TIOCINQ match the case of read: if EXTPROC is set,
we ignore ICANON.  Also, make sure to reset the ICANON state ie EXTPROC
changes, not just if ICANON changes.

Fixes: 26df6d13406d ("tty: Add EXTPROC support for LINEMODE")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Thomas Gleixner
977614061c x86/smpboot: Remove stale TLB flush invocations
commit 322f8b8b340c824aef891342b0f5795d15e11562 upstream.

smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector()
invoke local_flush_tlb() for no obvious reason.

Digging in history revealed that the original code in the 2.1 era added
those because the code manipulated a swapper_pg_dir pagetable entry. The
pagetable manipulation was removed long ago in the 2.3 timeframe, but the
TLB flush invocations stayed around forever.

Remove them along with the pointless pr_debug()s which come from the same 2.1
change.

Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171230211829.586548655@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Thomas Gleixner
458ed31799 nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

  if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Daniel Thompson
46e9398c96 usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
commit da99706689481717998d1d48edd389f339eea979 upstream.

When plugging in a USB webcam I see the following message:
xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs
XHCI_TRUST_TX_LENGTH quirk?
handle_tx_event: 913 callbacks suppressed

All is quiet again with this patch (and I've done a fair but of soak
testing with the camera since).

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:28 +01:00
Mathias Nyman
0ceb7a6639 USB: Fix off by one in type-specific length check of BOS SSP capability
commit 07b9f12864d16c3a861aef4817eb1efccbc5d0e6 upstream.

USB 3.1 devices are not detected as 3.1 capable since 4.15-rc3 due to a
off by one in commit 81cf4a45360f ("USB: core: Add type-specific length
check of BOS descriptors")

It uses USB_DT_USB_SSP_CAP_SIZE() to get SSP capability size which takes
the zero based SSAC as argument, not the actual count of sublink speed
attributes.

USB3 spec 9.6.2.5 says "The number of Sublink Speed Attributes = SSAC + 1."

The type-specific length check patch was added to stable and needs to be
fixed there as well

Fixes: 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors")
CC: Masakazu Mokuno <masakazu.mokuno@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Oliver Neukum
c359123bfd usb: add RESET_RESUME for ELSA MicroLink 56K
commit b9096d9f15c142574ebebe8fbb137012bb9d99c2 upstream.

This modem needs this quirk to operate. It produces timeouts when
resumed without reset.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Dmitry Fleytman Dmitry Fleytman
955ae55f2d usb: Add device quirk for Logitech HD Pro Webcam C925e
commit 7f038d256c723dd390d2fca942919573995f4cfd upstream.

Commit e0429362ab15
("usb: Add device quirk for Logitech HD Pro Webcams C920 and C930e")
introduced quirk to workaround an issue with some Logitech webcams.

There is one more model that has the same issue - C925e, so applying
the same quirk as well.

See aforementioned commit message for detailed explanation of the problem.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
SZ Lin (林上智)
e8cd11a935 USB: serial: option: adding support for YUGA CLM920-NC5
commit 3920bb713038810f25770e7545b79f204685c8f2 upstream.

This patch adds support for YUGA CLM920-NC5 PID 0x9625 USB modem to option
driver.

Interface layout:
0: QCDM/DIAG
1: ADB
2: MODEM
3: AT
4: RMNET

Signed-off-by: Taiyi Wu <taiyity.wu@moxa.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Daniele Palmas
140cfcc8b0 USB: serial: option: add support for Telit ME910 PID 0x1101
commit 08933099e6404f588f81c2050bfec7313e06eeaf upstream.

This patch adds support for PID 0x1101 of Telit ME910.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Reinhard Speyerer
232972f739 USB: serial: qcserial: add Sierra Wireless EM7565
commit 92a18a657fb2e2ffbfa0659af32cc18fd2346516 upstream.

Sierra Wireless EM7565 devices use the QCSERIAL_SWI layout for their
serial ports

T:  Bus=01 Lev=03 Prnt=29 Port=01 Cnt=02 Dev#= 31 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1199 ProdID=9091 Rev= 0.06
S:  Manufacturer=Sierra Wireless, Incorporated
S:  Product=Sierra Wireless EM7565 Qualcomm Snapdragon X16 LTE-A
S:  SerialNumber=xxxxxxxx
C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=qcserial
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial
E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=86(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

but need sendsetup = true for the NMEA port to make it work properly.

Simplify the patch compared to v1 as suggested by Bjørn Mork by taking
advantage of the fact that existing devices work with sendsetup = true
too.

Use sendsetup = true for the NMEA interface of QCSERIAL_SWI and add
DEVICE_SWI entries for the EM7565 PID 0x9091 and the EM7565 QDL PID
0x9090.

Tests with several MC73xx/MC74xx/MC77xx devices have been performed in
order to verify backward compatibility.

Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Max Schulze
24162c1781 USB: serial: ftdi_sio: add id for Airbus DS P8GR
commit c6a36ad383559a60a249aa6016cebf3cb8b6c485 upstream.

Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver.

Signed-off-by: Max Schulze <max.schulze@posteo.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Shuah Khan
76f611cbdb usbip: vhci: stop printing kernel pointer addresses in messages
commit 8272d099d05f7ab2776cf56a2ab9f9443be18907 upstream.

Remove and/or change debug, info. and error messages to not print
kernel pointer addresses.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:27 +01:00
Shuah Khan
feeae6411a usbip: stub: stop printing kernel pointer addresses in messages
commit 248a22044366f588d46754c54dfe29ffe4f8b4df upstream.

Remove and/or change debug, info. and error messages to not print
kernel pointer addresses.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Juan Zea
e0362688b4 usbip: fix usbip bind writing random string after command in match_busid
commit 544c4605acc5ae4afe7dd5914147947db182f2fb upstream.

usbip bind writes commands followed by random string when writing to
match_busid attribute in sysfs, caused by using full variable size
instead of string length.

Signed-off-by: Juan Zea <juan.zea@qindel.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Willem de Bruijn
3f54a6d41d sock: free skb in skb_complete_tx_timestamp on error
[ Upstream commit 35b99dffc3f710cafceee6c8c6ac6a98eb2cb4bf ]

skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Grygorii Strashko
623bfb5818 net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
[ Upstream commit c1a8d0a3accf64a014d605e6806ce05d1c17adf1 ]

Under some circumstances driver will perform PHY reset in
ksz9031_read_status() to fix autoneg failure case (idle error count =
0xFF). When this happens ksz9031 will not detect link status change any
more when connecting to Netgear 1G switch (link can be recovered sometimes by
restarting netdevice "ifconfig down up"). Reproduced with TI am572x board
equipped with ksz9031 PHY while connecting to Netgear 1G switch.

Fix the issue by reconfiguring autonegotiation after PHY reset in
ksz9031_read_status().

Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Eric W. Biederman
5854ca90c6 net: Fix double free and memory corruption in get_net_ns_by_id()
[ Upstream commit 21b5944350052d2583e82dd59b19a9ba94a007f0 ]

(I can trivially verify that that idr_remove in cleanup_net happens
 after the network namespace count has dropped to zero --EWB)

Function get_net_ns_by_id() does not check for net::count
after it has found a peer in netns_ids idr.

It may dereference a peer, after its count has already been
finaly decremented. This leads to double free and memory
corruption:

put_net(peer)                                   rtnl_lock()
atomic_dec_and_test(&peer->count) [count=0]     ...
__put_net(peer)                                 get_net_ns_by_id(net, id)
  spin_lock(&cleanup_list_lock)
  list_add(&net->cleanup_list, &cleanup_list)
  spin_unlock(&cleanup_list_lock)
queue_work()                                      peer = idr_find(&net->netns_ids, id)
  |                                               get_net(peer) [count=1]
  |                                               ...
  |                                               (use after final put)
  v                                               ...
  cleanup_net()                                   ...
    spin_lock(&cleanup_list_lock)                 ...
    list_replace_init(&cleanup_list, ..)          ...
    spin_unlock(&cleanup_list_lock)               ...
    ...                                           ...
    ...                                           put_net(peer)
    ...                                             atomic_dec_and_test(&peer->count) [count=0]
    ...                                               spin_lock(&cleanup_list_lock)
    ...                                               list_add(&net->cleanup_list, &cleanup_list)
    ...                                               spin_unlock(&cleanup_list_lock)
    ...                                             queue_work()
    ...                                           rtnl_unlock()
    rtnl_lock()                                   ...
    for_each_net(tmp) {                           ...
      id = __peernet2id(tmp, peer)                ...
      spin_lock_irq(&tmp->nsid_lock)              ...
      idr_remove(&tmp->netns_ids, id)             ...
      ...                                         ...
      net_drop_ns()                               ...
	net_free(peer)                            ...
    }                                             ...
  |
  v
  cleanup_net()
    ...
    (Second free of peer)

Also, put_net() on the right cpu may reorder with left's cpu
list_replace_init(&cleanup_list, ..), and then cleanup_list
will be corrupted.

Since cleanup_net() is executed in worker thread, while
put_net(peer) can happen everywhere, there should be
enough time for concurrent get_net_ns_by_id() to pick
the peer up, and the race does not seem to be unlikely.
The patch fixes the problem in standard way.

(Also, there is possible problem in peernet2id_alloc(), which requires
check for net::count under nsid_lock and maybe_get_net(peer), but
in current stable kernel it's used under rtnl_lock() and it has to be
safe. Openswitch begun to use peernet2id_alloc(), and possibly it should
be fixed too. While this is not in stable kernel yet, so I'll send
a separate message to netdev@ later).

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids"
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Nikolay Aleksandrov
dd1e454c4d net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
[ Upstream commit 84aeb437ab98a2bce3d4b2111c79723aedfceb33 ]

The early call to br_stp_change_bridge_id in bridge's newlink can cause
a memory leak if an error occurs during the newlink because the fdb
entries are not cleaned up if a different lladdr was specified, also
another minor issue is that it generates fdb notifications with
ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
which get added on NETDEV_REGISTER event, but are not cleaned up in the
newlink error path. To remove this special case the call to
br_stp_change_bridge_id is done after netdev register and we cleanup the
bridge on changelink error via br_dev_delete to plug all leaks.

This patch makes netlink bridge destruction on newlink error the same as
dellink and ioctl del which is necessary since at that point we have a
fully initialized bridge device.

To reproduce the issue:
$ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
RTNETLINK answers: Invalid argument

$ rmmod bridge
[ 1822.142525] =============================================================================
[ 1822.143640] BUG bridge_fdb_cache (Tainted: G           O    ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
[ 1822.144821] -----------------------------------------------------------------------------

[ 1822.145990] Disabling lock debugging due to kernel taint
[ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
[ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G    B      O     4.15.0-rc2+ #87
[ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1822.150008] Call Trace:
[ 1822.150510]  dump_stack+0x78/0xa9
[ 1822.151156]  slab_err+0xb1/0xd3
[ 1822.151834]  ? __kmalloc+0x1bb/0x1ce
[ 1822.152546]  __kmem_cache_shutdown+0x151/0x28b
[ 1822.153395]  shutdown_cache+0x13/0x144
[ 1822.154126]  kmem_cache_destroy+0x1c0/0x1fb
[ 1822.154669]  SyS_delete_module+0x194/0x244
[ 1822.155199]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1822.155773]  entry_SYSCALL_64_fastpath+0x23/0x9a
[ 1822.156343] RIP: 0033:0x7f929bd38b17
[ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
[ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
[ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
[ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
[ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
[ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
[ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128

Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Ido Schimmel
169a9861c6 ipv4: Fix use-after-free when flushing FIB tables
[ Upstream commit b4681c2829e24943aadd1a7bb3a30d41d0a20050 ]

Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
local table uses the same trie allocated for the main table when custom
rules are not in use.

When a net namespace is dismantled, the main table is flushed and freed
(via an RCU callback) before the local table. In case the callback is
invoked before the local table is iterated, a use-after-free can occur.

Fix this by iterating over the FIB tables in reverse order, so that the
main table is always freed after the local table.

v3: Reworded comment according to Alex's suggestion.
v2: Add a comment to make the fix more explicit per Dave's and Alex's
feedback.

Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:26 +01:00
Tonghao Zhang
4033c396f8 sctp: Replace use of sockets_allocated with specified macro.
[ Upstream commit 8cb38a602478e9f806571f6920b0a3298aabf042 ]

The patch(180d8cd942ce) replaces all uses of struct sock fields'
memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
to accessor macros. But the sockets_allocated field of sctp sock is
not replaced at all. Then replace it now for unifying the code.

Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Tobias Jordan
ca57949d16 net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
[ Upstream commit 589bf32f09852041fbd3b7ce1a9e703f95c230ba ]

add appropriate calls to clk_disable_unprepare() by jumping to out_mdio
in case orion_mdio_probe() returns -EPROBE_DEFER.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock")
Signed-off-by: Tobias Jordan <Tobias.Jordan@elektrobit.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Mohamed Ghannam
be27b620a8 net: ipv4: fix for a race condition in raw_sendmsg
[ Upstream commit 8f659a03a0ba9289b9aeb9b4470e6fb263d6f483 ]

inet->hdrincl is racy, and could lead to uninitialized stack pointer
usage, so its value should be read only once.

Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt")
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Brian King
6dfc02ba51 tg3: Fix rx hang on MTU change with 5717/5719
[ Upstream commit 748a240c589824e9121befb1cba5341c319885bc ]

This fixes a hang issue seen when changing the MTU size from 1500 MTU
to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom,
they've indicated that these chipsets have the same phy as the 57766
chipset, so the same workarounds apply. This has been tested by IBM
on both Power 8 and Power 9 systems as well as by Broadcom on x86
hardware and has been confirmed to resolve the hang issue.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Christoph Paasch
6925223ab3 tcp md5sig: Use skb's saddr when replying to an incoming segment
[ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ]

The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.

Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.

This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.

Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Shaohua Li
e0bdd21a86 net: reevalulate autoflowlabel setting after sysctl setting
[ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ]

sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7c4
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Sebastian Sjoholm
06525d5b8e net: qmi_wwan: add Sierra EM7565 1199:9091
[ Upstream commit aceef61ee56898cfa7b6960fb60b9326c3860441 ]

Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem.
The USB id is added to qmi_wwan.c to allow QMI communication
with the EM7565.

Signed-off-by: Sebastian Sjoholm <ssjoholm@mac.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:25 +01:00
Kevin Cernekee
f778ce6faa netlink: Add netns check on taps
[ Upstream commit 93c647643b48f0131f02e45da3bd367d80443291 ]

Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity.  Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.

Test case:

    vpnns -- bash -c "ip link add nlmon0 type nlmon; \
                      ip link set nlmon0 up; \
                      tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
    sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
        spi 0x1 mode transport \
        auth sha1 0x6162633132330000000000000000000000000000 \
        enc aes 0x00000000000000000000000000000000
    grep --binary abc123 /tmp/nlmon.pcap

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Kevin Cernekee
373b423038 net: igmp: Use correct source address on IGMPv3 reports
[ Upstream commit a46182b00290839fa3fa159d54fd3237bd8669f0 ]

Closing a multicast socket after the final IPv4 address is deleted
from an interface can generate a membership report that uses the
source IP from a different interface.  The following test script, run
from an isolated netns, reproduces the issue:

    #!/bin/bash

    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link set dummy0 up
    ip link set dummy1 up
    ip addr add 10.1.1.1/24 dev dummy0
    ip addr add 192.168.99.99/24 dev dummy1

    tcpdump -U -i dummy0 &
    socat EXEC:"sleep 2" \
        UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &

    sleep 1
    ip addr del 10.1.1.1/24 dev dummy0
    sleep 5
    kill %tcpdump

RFC 3376 specifies that the report must be sent with a valid IP source
address from the destination subnet, or from address 0.0.0.0.  Add an
extra check to make sure this is the case.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Eric Dumazet
0a8cb76e11 ipv6: mcast: better catch silly mtu values
[ Upstream commit b9b312a7a451e9c098921856e7cfbc201120e1a7 ]

syzkaller reported crashes in IPv6 stack [1]

Xin Long found that lo MTU was set to silly values.

IPv6 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in mld code where it is assumed
the mtu is suitable.

Fix this by reading device mtu once and checking IPv6 minimal MTU.

[1]
 skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
 head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:104!
 invalid opcode: 0000 [#1] SMP KASAN
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
 RSP: 0018:ffff8801db307508 EFLAGS: 00010286
 RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
 RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
 RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
 R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
 FS:  0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  skb_over_panic net/core/skbuff.c:109 [inline]
  skb_put+0x181/0x1c0 net/core/skbuff.c:1694
  add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
  add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
  mld_send_cr net/ipv6/mcast.c:1903 [inline]
  mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
  call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
  expire_timers kernel/time/timer.c:1357 [inline]
  __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
  run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d3/0x210 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Eric Dumazet
3d406a18bd ipv4: igmp: guard against silly MTU values
[ Upstream commit b5476022bbada3764609368f03329ca287528dc8 ]

IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Linus Torvalds
1cd09d4b38 kbuild: add '-fno-stack-check' to kernel build options
commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream.

It appears that hardened gentoo enables "-fstack-check" by default for
gcc.

That doesn't work _at_all_ for the kernel, because the kernel stack
doesn't act like a user stack at all: it's much smaller, and it doesn't
auto-expand on use.  So the extra "probe one page below the stack" code
generated by -fstack-check just breaks the kernel in horrible ways,
causing infinite double faults etc.

[ I have to say, that the particular code gcc generates looks very
  stupid even for user space where it works, but that's a separate
  issue.  ]

Reported-and-tested-by: Alexander Tsoy <alexander@tsoy.me>
Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Andy Lutomirski
6c4db09c29 x86/mm/64: Fix reboot interaction with CR4.PCIDE
commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream.

Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.)  The result is a GPF and
a hang instead of a reboot.

I didn't catch this in testing because neither my computer nor my VM
reboots this way.  I can trigger it with reboot=bios, though.

Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Andy Lutomirski
fd0504525e x86/mm: Enable CR4.PCIDE on supported systems
commit 660da7c9228f685b2ebe664f9fd69aaddcc420b5 upstream.

We can use PCID if the CPU has PCID and PGE and we're not on Xen.

By itself, this has no effect. A followup patch will start using PCID.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:24 +01:00
Andy Lutomirski
dcccd3c266 x86/mm: Add the 'nopcid' boot option to turn off PCID
commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.

The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
78043e5b6f x86/mm: Disable PCID on 32-bit kernels
commit cba4671af7550e008f7a7835f06df0763825bf3e upstream.

32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode.  Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
b2e24274d5 x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
commit ce4a4e565f5264909a18c733b864c3f74467f69e upstream.

The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

 - flush_tlb_mm_range() didn't flush individual pages if the range
   was small.

 - The lazy TLB code was much weaker.  This usually wouldn't matter,
   but, if a kernel thread flushed its lazy "active_mm" more than
   once (due to reclaim or similar), it wouldn't be unlazied and
   would instead pointlessly flush repeatedly.

 - Tracepoints were missing.

Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
3efba6062a x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.

flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:

 - It was missing an smp_mb() in the case where
   current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)

 - It was missing tracepoints and vm counter updates.

The only reason that I can see for keeping it at as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page.  This
hardly seems worthwhile.  If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
9f4d1ba1d4 x86/mm: Make flush_tlb_mm_range() more predictable
commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.

I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
227d6f0e79 x86/mm: Remove flush_tlb() and flush_tlb_current_task()
commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.

I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
6ce9d1e681 x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
commit 9ccee2373f0658f234727700e619df097ba57023 upstream.

mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Hui Wang
c04ed3a849 ALSA: hda - fix headset mic detection issue on a Dell machine
commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream.

It has the codec alc256, and add its pin definition to pin quirk
table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Takashi Iwai
0ba2ebc9f3 ALSA: hda: Drop useless WARN_ON()
commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream.

Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915
pointer leftover in error path") cleared hdac_acomp pointer, the
WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give
a false-positive warning, as the function gets called no matter
whether the component is registered or not.  For fixing it, let's get
rid of the spurious WARN_ON().

Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path")
Reported-by: Kouta Okamoto <kouta.okamoto@toshiba.co.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:22 +01:00
Johan Hovold
5251932b97 ASoC: twl4030: fix child-node lookup
commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.

Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:22 +01:00