639186 Commits

Author SHA1 Message Date
Darrick J. Wong
ce83e494d1 xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent
commit e1a4e37cc7b665b6804fba812aca2f4d7402c249 upstream.

In a pathological scenario where we are trying to bunmapi a single
extent in which every other block is shared, it's possible that trying
to unmap the entire large extent in a single transaction can generate so
many EFIs that we overflow the transaction reservation.

Therefore, use a heuristic to guess at the number of blocks we can
safely unmap from a reflink file's data fork in an single transaction.
This should prevent problems such as the log head slamming into the tail
and ASSERTs that trigger because we've exceeded the transaction
reservation.

Note that since bunmapi can fail to unmap the entire range, we must also
teach the deferred unmap code to roll into a new transaction whenever we
get low on reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: random edits, all bugs are my fault]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Brian Foster
7cb011bbac xfs: push buffer of flush locked dquot to avoid quotacheck deadlock
commit 7912e7fef2aebe577f0b46d3cba261f2783c5695 upstream.

Reclaim during quotacheck can lead to deadlocks on the dquot flush
lock:

 - Quotacheck populates a local delwri queue with the physical dquot
   buffers.
 - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
   dirties all of the dquots.
 - Reclaim kicks in and attempts to flush a dquot whose buffer is
   already queud on the quotacheck queue. The flush succeeds but
   queueing to the reclaim delwri queue fails as the backing buffer is
   already queued. The flush unlock is now deferred to I/O completion
   of the buffer from the quotacheck queue.
 - The dqadjust bulkstat continues and dirties the recently flushed
   dquot once again.
 - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
   the flush lock to update the backing buffers with the in-core
   recalculated values. It deadlocks on the redirtied dquot as the
   flush lock was already acquired by reclaim, but the buffer resides
   on the local delwri queue which isn't submitted until the end of
   quotacheck.

This is reproduced by running quotacheck on a filesystem with a
couple million inodes in low memory (512MB-1GB) situations. This is
a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
buffer lists"), which removed a trylock and buffer I/O submission
from the quotacheck dquot flush sequence.

Quotacheck first resets and collects the physical dquot buffers in a
delwri queue. Then, it traverses the filesystem inodes via bulkstat,
updates the in-core dquots, flushes the corrected dquots to the
backing buffers and finally submits the delwri queue for I/O. Since
the backing buffers are queued across the entire quotacheck
operation, dquot reclaim cannot possibly complete a dquot flush
before quotacheck completes.

Therefore, quotacheck must submit the buffer for I/O in order to
cycle the flush lock and flush the dirty in-core dquot to the
buffer. Add a delwri queue buffer push mechanism to submit an
individual buffer for I/O without losing the delwri queue status and
use it from quotacheck to avoid the deadlock. This restores
quotacheck behavior to as before the regression was introduced.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Brian Foster
85ab1b23d2 xfs: fix spurious spin_is_locked() assert failures on non-smp kernels
commit 95989c46d2a156365867b1d795fdefce71bce378 upstream.

The 0-day kernel test robot reports assertion failures on
!CONFIG_SMP kernels due to failed spin_is_locked() checks. As it
turns out, spin_is_locked() is hardcoded to return zero on
!CONFIG_SMP kernels and so this function cannot be relied on to
verify spinlock state in this configuration.

To avoid this problem, replace the associated asserts with lockdep
variants that do the right thing regardless of kernel configuration.
Drop the one assert that checks for an unlocked lock as there is no
suitable lockdep variant for that case. This moves the spinlock
checks from XFS debug code to lockdep, but generally provides the
same level of protection.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Jan Kara
4c1d33c4cf xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
commit a54fba8f5a0dc36161cacdf2aa90f007f702ec1a upstream.

Currently several places in xfs_find_get_desired_pgoff() handle the case
of a missing page. Make them all handled in one place after the loop has
terminated.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Andy Lutomirski
3fddeb8003 x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
commit e137a4d8f4dd2e277e355495b6b2cb241a8693c3 upstream.

Switching FS and GS is a mess, and the current code is still subtly
wrong: it assumes that "Loading a nonzero value into FS sets the
index and base", which is false on AMD CPUs if the value being
loaded is 1, 2, or 3.

(The current code came from commit 3e2b68d752c9 ("x86/asm,
sched/x86: Rewrite the FS and GS context switch code"), which made
it better but didn't fully fix it.)

Rewrite it to be much simpler and more obviously correct.  This
should fix it fully on AMD CPUs and shouldn't adversely affect
performance.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Andy Lutomirski
0caec70692 x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
commit 9584d98bed7a7a904d0702ad06bbcc94703cb5b4 upstream.

In ELF_COPY_CORE_REGS, we're copying from the current task, so
accessing thread.fsbase and thread.gsbase makes no sense.  Just read
the values from the CPU registers.

In practice, the old code would have been correct most of the time
simply because thread.fsbase and thread.gsbase usually matched the
CPU registers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Andy Lutomirski
c7d1ddec25 x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
commit 767d035d838f4fd6b5a5bbd7a3f6d293b7f65a49 upstream.

execve used to leak FSBASE and GSBASE on AMD CPUs.  Fix it.

The security impact of this bug is small but not quite zero -- it
could weaken ASLR when a privileged task execs a less privileged
program, but only if program changed bitness across the exec, or the
child binary was highly unusual or actively malicious.  A child
program that was compromised after the exec would not have access to
the leaked base.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:57 +02:00
Jaegeuk Kim
cc9618c9ff f2fs: check hot_data for roll-forward recovery
commit 125c9fb1ccb53eb2ea9380df40f3c743f3fb2fed upstream.

We need to check HOT_DATA to truncate any previous data block when doing
roll-forward recovery.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Jaegeuk Kim
0f90297cba f2fs: let fill_super handle roll-forward errors
commit afd2b4da40b3b567ef8d8e6881479345a2312a03 upstream.

If we set CP_ERROR_FLAG in roll-forward error, f2fs is no longer to proceed
any IOs due to f2fs_cp_error(). But, for example, if some stale data is involved
on roll-forward process, we're able to get -ENOENT, getting fs stuck.
If we get any error, let fill_super set SBI_NEED_FSCK and try to recover back
to stable point.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Haishuang Yan
60b94125a1 ip_tunnel: fix setting ttl and tos value in collect_md mode
[ Upstream commit 0f693f1995cf002432b70f43ce73f79bf8d0b6c9 ]

ttl and tos variables are declared and assigned, but are not used in
iptunnel_xmit() function.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Marcelo Ricardo Leitner
3f60dadbe1 sctp: fix missing wake ups in some situations
[ Upstream commit 7906b00f5cd1cd484fced7fcda892176e3202c8a ]

Commit fb586f25300f ("sctp: delay calls to sk_data_ready() as much as
possible") minimized the number of wake ups that are triggered in case
the association receives a packet with multiple data chunks on it and/or
when io_events are enabled and then commit 0970f5b36659 ("sctp: signal
sk_data_ready earlier on data chunks reception") moved the wake up to as
soon as possible. It thus relies on the state machine running later to
clean the flag that the event was already generated.

The issue is that there are 2 call paths that calls
sctp_ulpq_tail_event() outside of the state machine, causing the flag to
linger and possibly omitting a needed wake up in the sequence.

One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
partial reliability triggers removal of chunks from the send queue when
the application calls sendmsg().

This commit fixes it by not setting the flag in case the socket is not
owned by the user, as it won't be cleaned later. This works for
user-initiated calls and also for rx path processing.

Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
Reported-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Eric Dumazet
bf8ed95d2c ipv6: fix typo in fib6_net_exit()
[ Upstream commit 32a805baf0fb70b6dbedefcd7249ac7f580f9e3b ]

IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Sabrina Dubroca
c9335db792 ipv6: fix memory leak with multiple tables during netns destruction
[ Upstream commit ba1cc08d9488c94cb8d94f545305688b72a2a300 ]

fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.

Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_table's. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.

Reproducer:
    ip netns add x
    ip -net x -6 rule add from 6003:1::/64 table 100
    ip netns del x

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Xin Long
ca7d8a337b ip6_gre: update mtu properly in ip6gre_err
[ Upstream commit 5c25f30c93fdc5bf25e62101aeaae7a4f9b421b3 ]

Now when probessing ICMPV6_PKT_TOOBIG, ip6gre_err only subtracts the
offset of gre header from mtu info. The expected mtu of gre device
should also subtract gre header. Otherwise, the next packets still
can't be sent out.

Jianlin found this issue when using the topo:
  client(ip6gre)<---->(nic1)route(nic2)<----->(ip6gre)server

and reducing nic2's mtu, then both tcp and sctp's performance with
big size data became 0.

This patch is to fix it by also subtracting grehdr (tun->tun_hlen)
from mtu info when updating gre device's mtu in ip6gre_err(). It
also needs to subtract ETH_HLEN if gre dev'type is ARPHRD_ETHER.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Jason Wang
f5755c0e87 vhost_net: correctly check tx avail during rx busy polling
[ Upstream commit 8b949bef9172ca69d918e93509a4ecb03d0355e0 ]

We check tx avail through vhost_enable_notify() in the past which is
wrong since it only checks whether or not guest has filled more
available buffer since last avail idx synchronization which was just
done by vhost_vq_avail_empty() before. What we really want is checking
pending buffers in the avail ring. Fix this by calling
vhost_vq_avail_empty() instead.

This issue could be noticed by doing netperf TCP_RR benchmark as
client from guest (but not host). With this fix, TCP_RR from guest to
localhost restores from 1375.91 trans per sec to 55235.28 trans per
sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).

Fixes: 030881372460 ("vhost_net: basic polling support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Claudiu Manoil
90406e68e4 gianfar: Fix Tx flow control deactivation
[ Upstream commit 5d621672bc1a1e5090c1ac5432a18c79e0e13e03 ]

The wrong register is checked for the Tx flow control bit,
it should have been maccfg1 not maccfg2.
This went unnoticed for so long probably because the impact is
hardly visible, not to mention the tangled code from adjust_link().
First, link flow control (i.e. handling of Rx/Tx link level pause frames)
is disabled by default (needs to be enabled via 'ethtool -A').
Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
old boards), which results in Tx flow control remaining always on
once activated.

Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:56 +02:00
Jesper Dangaard Brouer
1bcf18718e Revert "net: fix percpu memory leaks"
[ Upstream commit 5a63643e583b6a9789d7a225ae076fb4e603991c ]

This reverts commit 1d6119baf0610f813eb9d9580eb4fd16de5b4ceb.

After reverting commit 6d7b857d541e ("net: use lib/percpu_counter API
for fragmentation mem accounting") then here is no need for this
fix-up patch.  As percpu_counter is no longer used, it cannot
memory leak it any-longer.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Jesper Dangaard Brouer
5a7a40bad2 Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
[ Upstream commit fb452a1aa3fd4034d7999e309c5466ff2d7005aa ]

This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2.

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note that commit 1d6119baf061 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Ido Schimmel
b5a3ae8b12 bridge: switchdev: Clear forward mark when transmitting packet
[ Upstream commit 79e99bdd60b484af9afe0147e85a13e66d5c1cdb ]

Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for
stacked devices") added the 'offload_fwd_mark' bit to the skb in order
to allow drivers to indicate to the bridge driver that they already
forwarded the packet in L2.

In case the bit is set, before transmitting the packet from each port,
the port's mark is compared with the mark stored in the skb's control
block. If both marks are equal, we know the packet arrived from a switch
device that already forwarded the packet and it's not re-transmitted.

However, if the packet is transmitted from the bridge device itself
(e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark
stored in the skb's control block isn't valid.

This scenario can happen in rare cases where a packet was trapped during
L3 forwarding and forwarded by the kernel to a bridge device.

Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Tested-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Ido Schimmel
73ee5a73e7 mlxsw: spectrum: Forbid linking to devices that have uppers
[ Upstream commit 25cc72a33835ed8a6f53180a822cadab855852ac ]

The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.

Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.

One example to such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.

Until such configurations are supported forbid them by checking that the
upper device doesn't have uppers of its own.

Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nogah Frankel <nogahf@mellanox.com>
Tested-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Wei Wang
a10c510179 tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
[ Upstream commit 499350a5a6e7512d9ed369ed63a4244b6536f4f8 ]

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
__tcp_select_window() call path to have division by 0 issue.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.

Reported-by: Andrey Konovalov  <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Florian Fainelli
a6e51fda71 Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
[ Upstream commit ebc8254aeae34226d0bc8fda309fd9790d4dccfe ]

This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.

David Daney provide the following call trace and diagram of events:

When ndo_stop() is called we call:

 phy_disconnect()
    +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
    +---> phy_stop_machine()
    |      +---> phy_state_machine()
    |              +----> queue_delayed_work(): Work queued.
    +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
    +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

 CPU 12 Unable to handle kernel paging request at virtual address
0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 80000004021ed100 task.stack: 8000000409d70000
$ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
$ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
$ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
$12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
$16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
$20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
$24   : 0000000000000061 ffffffff808637b0
$28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
Hi    : 000000000000002a
Lo    : 000000000000003f
epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3        KX SX UX KERNEL EXL IE
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
task=80000004021ed100, tls=0000000000000000)
Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
        0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
        80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
        ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
        8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
        ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
        0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
        8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
        8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
        8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
        ...
Call Trace:
[<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
[<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
[<ffffffff808a1708>] process_one_work+0x158/0x368
[<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
[<ffffffff808a8598>] kthread+0xc8/0xe0
[<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c

The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.

PHYLIB has never made any such guarantees ever because phy_stop() merely just
tells the workqueue to move into PHY_HALTED state which will happen
asynchronously.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Eric Dumazet
af33da0ed9 kcm: do not attach PF_KCM sockets to avoid deadlock
[ Upstream commit 351050ecd6523374b370341cc29fe61e2201556b ]

syzkaller had no problem to trigger a deadlock, attaching a KCM socket
to another one (or itself). (original syzkaller report was a very
confusing lockdep splat during a sendmsg())

It seems KCM claims to only support TCP, but no enforcement is done,
so we might need to add additional checks.

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:55 +02:00
Benjamin Poirier
8c623e5d03 packet: Don't write vnet header beyond end of buffer
[ Upstream commit edbd58be15a957f6a760c4a514cd475217eb97fd ]

... which may happen with certain values of tp_reserve and maclen.

Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
Stefano Brivio
2b3bd5972a cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()
[ Upstream commit 0f3086868e8889a823a6e0f3d299102aa895d947 ]

Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is actually smaller,
causes out-of-bounds stack accesses in t4_record_mbox() while
copying command words here:

	for (i = 0; i < size / 8; i++)
		entry->cmd[i] = be64_to_cpu(cmd[i]);

Up to 48 bytes from the stack are then leaked to debugfs.

This happens whenever we send (and log) commands described by
structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
fw_sched_cmd(32), fw_devlog_cmd(32).

The cxgb4vf driver got this right instead.

When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.

Fixes: 7f080c3f2ff0 ("cxgb4: Add support to enable logging of firmware mailbox commands")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
stephen hemminger
de2ecec26d netvsc: fix deadlock betwen link status and removal
[ Upstream commit 9b4e946ce14e20d7addbfb7d9139e604f9fda107 ]

There is a deadlock possible when canceling the link status
delayed work queue. The removal process is run with RTNL held,
and the link status callback is acquring RTNL.

Resolve the issue by using trylock and rescheduling.
If cancel is in process, that block it from happening.

Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
Arnd Bergmann
64dfc67548 qlge: avoid memcpy buffer overflow
[ Upstream commit e58f95831e7468d25eb6e41f234842ecfe6f014f ]

gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:

    inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
  memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);

Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.

The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.

Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
Stefano Brivio
08d56d8a99 sctp: Avoid out-of-bounds reads from address storage
[ Upstream commit ee6c88bb754e3d363e568da78086adfedb692447 ]

inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
to export diagnostic information to userspace.

However, the memory allocated to store sockaddr information is
smaller than that and depends on the address family, so we leak
up to 100 uninitialized bytes to userspace. Just use the size of
the source structs instead, in all the three cases this is what
userspace expects. Zero out the remaining memory.

Unused bytes (i.e. when IPv4 addresses are used) in source
structs sctp_sockaddr_entry and sctp_transport are already
cleared by sctp_add_bind_addr() and sctp_transport_new(),
respectively.

Noticed while testing KASAN-enabled kernel with 'ss':

[ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
[ 2326.896800] Read of size 128 by task ss/9527
[ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
[ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
[ 2326.917585] Call Trace:
[ 2326.920312]  dump_stack+0x63/0x8d
[ 2326.924014]  kasan_object_err+0x21/0x70
[ 2326.928295]  kasan_report+0x288/0x540
[ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.938500]  ? skb_put+0x8b/0xd0
[ 2326.942098]  ? memset+0x31/0x40
[ 2326.945599]  check_memory_region+0x13c/0x1a0
[ 2326.950362]  memcpy+0x23/0x50
[ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
[ 2326.966495]  ? __lock_sock+0x102/0x150
[ 2326.970671]  ? sock_def_wakeup+0x60/0x60
[ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
[ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
[ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
[ 2326.990504]  ? memset+0x31/0x40
[ 2326.994007]  ? mutex_lock+0x12/0x40
[ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
[ 2327.003340]  ? __sys_sendmsg+0x150/0x150
[ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
[ 2327.012979]  netlink_dump+0x1e6/0x490
[ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
[ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
[ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
[ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
[ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
[ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
[ 2327.050199]  netlink_rcv_skb+0x14b/0x180
[ 2327.054574]  ? sock_diag_bind+0x60/0x60
[ 2327.058850]  sock_diag_rcv+0x28/0x40
[ 2327.062837]  netlink_unicast+0x2e7/0x3b0
[ 2327.067212]  ? netlink_attachskb+0x330/0x330
[ 2327.071975]  ? kasan_check_write+0x14/0x20
[ 2327.076544]  netlink_sendmsg+0x5be/0x730
[ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.085486]  ? kasan_check_write+0x14/0x20
[ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
[ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.099678]  sock_sendmsg+0x74/0x80
[ 2327.103567]  ___sys_sendmsg+0x520/0x530
[ 2327.107844]  ? __get_locked_pte+0x178/0x200
[ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
[ 2327.117660]  ? vm_insert_page+0x360/0x360
[ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
[ 2327.126895]  ? vm_insert_pfn+0x32/0x40
[ 2327.131077]  ? vvar_fault+0x71/0xd0
[ 2327.134968]  ? special_mapping_fault+0x69/0x110
[ 2327.140022]  ? __do_fault+0x42/0x120
[ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
[ 2327.148965]  ? __fget_light+0xa7/0xc0
[ 2327.153049]  __sys_sendmsg+0xcb/0x150
[ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
[ 2327.161409]  ? SyS_shutdown+0x140/0x140
[ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
[ 2327.170646]  ? __do_page_fault+0x55d/0x620
[ 2327.175216]  ? __sys_sendmsg+0x150/0x150
[ 2327.179591]  SyS_sendmsg+0x12/0x20
[ 2327.183384]  do_syscall_64+0xe3/0x230
[ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
[ 2327.192622] RIP: 0033:0x7f41d18fa3b0
[ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
[ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
[ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
[ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
[ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
[ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
[ 2327.251953] Allocated:
[ 2327.254581] PID = 9484
[ 2327.257215]  save_stack_trace+0x1b/0x20
[ 2327.261485]  save_stack+0x46/0xd0
[ 2327.265179]  kasan_kmalloc+0xad/0xe0
[ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
[ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
[ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
[ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
[ 2327.288455]  inet_bind+0x5f/0x3a0
[ 2327.292151]  SYSC_bind+0x1a4/0x1e0
[ 2327.295944]  SyS_bind+0xe/0x10
[ 2327.299349]  do_syscall_64+0xe3/0x230
[ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.308194] Freed:
[ 2327.310434] PID = 4131
[ 2327.313065]  save_stack_trace+0x1b/0x20
[ 2327.317344]  save_stack+0x46/0xd0
[ 2327.321040]  kasan_slab_free+0x73/0xc0
[ 2327.325220]  kfree+0x96/0x1a0
[ 2327.328530]  dynamic_kobj_release+0x15/0x40
[ 2327.333195]  kobject_release+0x99/0x1e0
[ 2327.337472]  kobject_put+0x38/0x70
[ 2327.341266]  free_notes_attrs+0x66/0x80
[ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
[ 2327.350211]  free_module+0x20/0x2a0
[ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
[ 2327.358667]  do_syscall_64+0xe3/0x230
[ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.367510] Memory state around the buggy address:
[ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
[ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
[ 2327.397031]                                ^
[ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
[ 2327.417907] ==================================================================

This fixes CVE-2017-7558.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
Florian Fainelli
4d8ee1935b fsl/man: Inherit parent device and of_node
[ Upstream commit a1a50c8e4c241a505b7270e1a3c6e50d94e794b1 ]

Junote Cai reported that he was not able to get a DSA setup involving the
Freescale DPAA/FMAN driver to work and narrowed it down to
of_find_net_device_by_node(). This function requires the network device's
device reference to be correctly set which is the case here, though we have
lost any device_node association there.

The problem is that dpaa_eth_add_device() allocates a "dpaa-ethernet" platform
device, and later on dpaa_eth_probe() is called but SET_NETDEV_DEV() won't be
propagating &pdev->dev.of_node properly. Fix this by inherenting both the parent
device and the of_node when dpaa_eth_add_device() creates the platform device.

Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:54 +02:00
Eric Dumazet
1e39e5c6a2 udp: on peeking bad csum, drop packets even if not at head
[ Upstream commit fd6055a806edc4019be1b9fb7d25262599bca5b1 ]

When peeking, if a bad csum is discovered, the skb is unlinked from
the queue with __sk_queue_drop_skb and the peek operation restarted.

__sk_queue_drop_skb only drops packets that match the queue head.

This fails if the skb was found after the head, using SO_PEEK_OFF
socket option. This causes an infinite loop.

We MUST drop this problematic skb, and we can simply check if skb was
already removed by another thread, by looking at skb->next :

This pointer is set to NULL by the  __skb_unlink() operation, that might
have happened only under the spinlock protection.

Many thanks to syzkaller team (and particularly Dmitry Vyukov who
provided us nice C reproducers exhibiting the lockup) and Willem de
Bruijn who provided first version for this patch and a test program.

Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:53 +02:00
Sabrina Dubroca
4b4a194a10 macsec: add genl family module alias
[ Upstream commit 78362998f58c7c271e2719dcd0aaced435c801f9 ]

This helps tools such as wpa_supplicant can start even if the macsec
module isn't loaded yet.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:53 +02:00
Wei Wang
43c792a848 ipv6: fix sparse warning on rt6i_node
[ Upstream commit 4e587ea71bf924f7dac621f1351653bd41e446cb ]

Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:53 +02:00
Wei Wang
7f8f23fc80 ipv6: add rcu grace period before freeing fib6_node
[ Upstream commit c5cff8561d2d0006e972bd114afd51f082fee77c ]

We currently keep rt->rt6i_node pointing to the fib6_node for the route.
And some functions make use of this pointer to dereference the fib6_node
from rt structure, e.g. rt6_check(). However, as there is neither
refcount nor rcu taken when dereferencing rt->rt6i_node, it could
potentially cause crashes as rt->rt6i_node could be set to NULL by other
CPUs when doing a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it takes rcu_read_lock().

Note: there is no "Fixes" tag because this bug was there in a very
early stage.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:53 +02:00
Stefano Brivio
dccb31be7e ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
[ Upstream commit 3de33e1ba0506723ab25734e098cf280ecc34756 ]

A packet length of exactly IPV6_MAXPLEN is allowed, we should
refuse parsing options only if the size is 64KiB or more.

While at it, remove one extra variable and one assignment which
were also introduced by the commit that introduced the size
check. Checking the sum 'offset + len' and only later adding
'len' to 'offset' doesn't provide any advantage over directly
summing to 'offset' and checking it.

Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-20 08:19:52 +02:00
Greg Kroah-Hartman
4ad5dcaca7 Linux 4.9.50 2017-09-13 14:13:54 -07:00
Richard Wareing
5b82e0e938 xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
commit b31ff3cdf540110da4572e3e29bd172087af65cc upstream.

If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.

This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: xfs_blkdev_issue_flush+0xd/0x20
  .....
  Call Trace:
    xfs_file_fsync+0x188/0x1c0
    vfs_fsync_range+0x3b/0xa0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    do_syscall_64+0x4d/0xb0
    entry_SYSCALL64_slow_path+0x25/0x25

Setting RT inode flags does not require special privileges so any
unprivileged user can cause this oops to occur.  To reproduce, confirm
kernel is compiled with CONFIG_XFS_RT=y and run:

  # mkfs.xfs -f /dev/pmem0
  # mount /dev/pmem0 /mnt/test
  # mkdir /mnt/test/foo
  # xfs_io -c 'chattr +t' /mnt/test/foo
  # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar

Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.

Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.

Fixes: f538d4da8d52 ("[XFS] write barrier support")
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:37 -07:00
tarangg@amazon.com
3885bc68ae NFS: Sync the correct byte range during synchronous writes
commit e973b1a5999e57da677ab50da5f5479fdc0f0c31 upstream.

Since commit 18290650b1c8 ("NFS: Move buffered I/O locking into
nfs_file_write()") nfs_file_write() has not flushed the correct byte
range during synchronous writes.  generic_write_sync() expects that
iocb->ki_pos points to the right edge of the range rather than the
left edge.

To replicate the problem, open a file with O_DSYNC, have the client
write at increasing offsets, and then print the successful offsets.
Block port 2049 partway through that sequence, and observe that the
client application indicates successful writes in advance of what the
server received.

Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Jacob Strauss <jsstraus@amazon.com>
Signed-off-by: Tarang Gupta <tarangg@amazon.com>
Tested-by: Tarang Gupta <tarangg@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:37 -07:00
Trond Myklebust
a70912a6bf NFS: Fix 2 use after free issues in the I/O code
commit 196639ebbe63a037fe9a80669140bd292d8bcd80 upstream.

The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free()

Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:37 -07:00
Mark Rutland
301d91e03c ARM: 8692/1: mm: abort uaccess retries upon fatal signal
commit 746a272e44141af24a02f6c9b0f65f4c4598ed42 upstream.

When there's a fatal signal pending, arm's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.

However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:37 -07:00
Marc Zyngier
b40aa8b047 ARM64: dts: marvell: armada-37xx: Fix GIC maintenance interrupt
commit 95696d292e204073433ed2ef3ff4d3d8f42a8248 upstream.

The GIC-500 integrated in the Armada-37xx SoCs is compliant with
the GICv3 architecture, and thus provides a maintenance interrupt
that is required for hypervisors to function correctly.

With the interrupt provided in the DT, KVM now works as it should.
Tested on an Espressobin system.

Fixes: adbc3695d9e4 ("arm64: dts: add the Marvell Armada 3700 family and a development board")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Ben Seri
6300c8bfaf Bluetooth: Properly check L2CAP config option output buffer length
commit e860d2c904d1a9f38a24eb44c9f34b8f915a6ea3 upstream.

Validate the output buffer length for L2CAP config requests and responses
to avoid overflowing the stack buffer used for building the option blocks.

Signed-off-by: Ben Seri <ben@armis.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Takashi Iwai
03bea515b9 ALSA: msnd: Optimize / harden DSP and MIDI loops
commit 20e2b791796bd68816fa115f12be5320de2b8021 upstream.

The ISA msnd drivers have loops fetching the ring-buffer head, tail
and size values inside the loops.  Such codes are inefficient and
fragile.

This patch optimizes it, and also adds the sanity check to avoid the
endless loops.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196131
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196133
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: grygorii tertychnyi <gtertych@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Yang Shi
d21f3eaa09 locktorture: Fix potential memory leak with rw lock test
commit f4dbba591945dc301c302672adefba9e2ec08dc5 upstream.

When running locktorture module with the below commands with kmemleak enabled:

$ modprobe locktorture torture_type=rw_lock_irq
$ rmmod locktorture

The below kmemleak got caught:

root@10:~# echo scan > /sys/kernel/debug/kmemleak
[  323.197029] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@10:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffffc07592d500 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c3 7b 02 00 00 00 00 00  .........{......
    00 00 00 00 00 00 00 00 d7 9b 02 00 00 00 00 00  ................
  backtrace:
    [<ffffff80081e5a88>] create_object+0x110/0x288
    [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
    [<ffffff80081d5acc>] __kmalloc+0x234/0x318
    [<ffffff80006fa130>] 0xffffff80006fa130
    [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
    [<ffffff800817e28c>] do_init_module+0x68/0x1cc
    [<ffffff800811c848>] load_module+0x1a68/0x22e0
    [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
    [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffffffc07592d480 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 3b 6f 01 00 00 00 00 00  ........;o......
    00 00 00 00 00 00 00 00 23 6a 01 00 00 00 00 00  ........#j......
  backtrace:
    [<ffffff80081e5a88>] create_object+0x110/0x288
    [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
    [<ffffff80081d5acc>] __kmalloc+0x234/0x318
    [<ffffff80006fa22c>] 0xffffff80006fa22c
    [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
    [<ffffff800817e28c>] do_init_module+0x68/0x1cc
    [<ffffff800811c848>] load_module+0x1a68/0x22e0
    [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
    [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
    [<ffffffffffffffff>] 0xffffffffffffffff

It is because cxt.lwsa and cxt.lrsa don't get freed in module_exit, so free
them in lock_torture_cleanup() and free writer_tasks if reader_tasks is
failed at memory allocation.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: 石洋 <yang.s@alibaba-inc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Laurent Dufour
3c8381df2a mm/memory.c: fix mem_cgroup_oom_disable() call missing
commit de0c799bba2610a8e1e9a50d76a28614520a4cd4 upstream.

Seen while reading the code, in handle_mm_fault(), in the case
arch_vma_access_permitted() is failing the call to
mem_cgroup_oom_disable() is not made.

To fix that, move the call to mem_cgroup_oom_enable() after calling
arch_vma_access_permitted() as it should not have entered the memcg OOM.

Link: http://lkml.kernel.org/r/1504625439-31313-1-git-send-email-ldufour@linux.vnet.ibm.com
Fixes: bae473a423f6 ("mm: introduce fault_env")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Andy Lutomirski
ebf381be01 selftests/x86/fsgsbase: Test selectors 1, 2, and 3
commit 23d98c204386a98d9ef9f9e744f41443ece4929f upstream.

Those are funny cases.  Make sure they work.

(Something is screwy with signal handling if a selector is 1, 2, or 3.
Anyone who wants to dive into that rabbit hole is welcome to do so.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Aleksa Sarai
0f7dbc4d5b btrfs: resume qgroup rescan on rw remount
commit 6c6b5a39c4bf3dbd8cf629c9f5450e983c19dbb9 upstream.

Several distributions mount the "proper root" as ro during initrd and
then remount it as rw before pivot_root(2). Thus, if a rescan had been
aborted by a previous shutdown, the rescan would never be resumed.

This issue would manifest itself as several btrfs ioctl(2)s causing the
entire machine to hang when btrfs_qgroup_wait_for_completion was hit
(due to the fs_info->qgroup_rescan_running flag being set but the rescan
itself not being resumed). Notably, Docker's btrfs storage driver makes
regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT
(causing this problem to be manifested on boot for some machines).

Cc: Jeff Mahoney <jeffm@suse.com>
Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Daniel Verkamp
f52a535c84 nvme-fabrics: generate spec-compliant UUID NQNs
commit 40a5fce495715c48c2e02668144e68a507ac5a30 upstream.

The default host NQN, which is generated based on the host's UUID,
does not follow the UUID-based NQN format laid out in the NVMe 1.3
specification.  Remove the "NVMf:" portion of the NQN to match the spec.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Abhishek Sahu
b276bc66d4 mtd: nand: qcom: fix config error for BCH
commit 10777de570016471fd929869c7830a7772893e39 upstream.

The configuration for BCH is not correct in the current driver.
The ECC_CFG_ECC_DISABLE bit defines whether to enable or disable the
BCH ECC in which

	0x1 : BCH_DISABLED
	0x0 : BCH_ENABLED

But currently host->bch_enabled is being assigned to BCH_DISABLED.

Fixes: c76b78d8ec05a ("mtd: nand: Qualcomm NAND controller driver")
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Reviewed-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Abhishek Sahu
f4a272d578 mtd: nand: qcom: fix read failure without complete bootchain
commit d8a9b320a26c1ea28e51e4f3ecfb593d5aac2910 upstream.

The NAND page read fails without complete boot chain since
NAND_DEV_CMD_VLD value is not proper. The default power on reset
value for this register is

    0xe - ERASE_START_VALID | WRITE_START_VALID | READ_STOP_VALID

The READ_START_VALID should be enabled for sending PAGE_READ
command. READ_STOP_VALID should be cleared since normal NAND
page read does not require READ_STOP command.

Fixes: c76b78d8ec05a ("mtd: nand: Qualcomm NAND controller driver")
Reviewed-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Boris Brezillon
865162031c mtd: nand: mxc: Fix mxc_v1 ooblayout
commit 3bff08dffe3115a25ce04b95ea75f6d868572c60 upstream.

Commit a894cf6c5a82 ("mtd: nand: mxc: switch to mtd_ooblayout_ops")
introduced a bug in the OOB layout description. Even if the driver claims
that 3 ECC bytes are reserved to protect 512 bytes of data, it's actually
5 ECC bytes to protect 512+6 bytes of data (some OOB bytes are also
protected using extra ECC bytes).

Fix the mxc_v1_ooblayout_{free,ecc}() functions to reflect this behavior.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: a894cf6c5a82 ("mtd: nand: mxc: switch to mtd_ooblayout_ops")
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00