637952 Commits

Author SHA1 Message Date
Gilad Ben-Yossef
d4783eb9f0 crypto: gcm - wait for crypto op not signal safe
commit f3ad587070d6bd961ab942b3fd7a85d00dfc934b upstream.

crypto_gcm_setkey() was using wait_for_completion_interruptible() to
wait for completion of async crypto op but if a signal occurs it
may return before DMA ops of HW crypto provider finish, thus
corrupting the data buffer that is kfree'ed in this case.

Resolve this by using wait_for_completion() instead.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:55 +02:00
Gilad Ben-Yossef
2d0280070e crypto: drbg - wait for crypto op not signal safe
commit a5dfefb1c3f3db81662556393fd9283511e08430 upstream.

drbg_kcapi_sym_ctr() was using wait_for_completion_interruptible() to
wait for completion of async crypto op but if a signal occurs it
may return before DMA ops of HW crypto provider finish, thus
corrupting the output buffer.

Resolve this by using wait_for_completion() instead.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:55 +02:00
Eric Biggers
d24c1c1977 KEYS: encrypted: avoid encrypting/decrypting stack buffers
commit e9ff56ac352446f55141aaef1553cee662b2e310 upstream.

Since v4.9, the crypto API cannot (normally) be used to encrypt/decrypt
stack buffers because the stack may be virtually mapped.  Fix this for
the padding buffers in encrypted-keys by using ZERO_PAGE for the
encryption padding and by allocating a temporary heap buffer for the
decryption padding.

Tested with CONFIG_DEBUG_SG=y:
	keyctl new_session
	keyctl add user master "abcdefghijklmnop" @s
	keyid=$(keyctl add encrypted desc "new user:master 25" @s)
	datablob="$(keyctl pipe $keyid)"
	keyctl unlink $keyid
	keyid=$(keyctl add encrypted desc "load $datablob" @s)
	datablob2="$(keyctl pipe $keyid)"
	[ "$datablob" = "$datablob2" ] && echo "Success!"

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:55 +02:00
Eric Biggers
2436976102 KEYS: fix freeing uninitialized memory in key_update()
commit 63a0b0509e700717a59f049ec6e4e04e903c7fe2 upstream.

key_update() freed the key_preparsed_payload even if it was not
initialized first.  This would cause a crash if userspace called
keyctl_update() on a key with type like "asymmetric" that has a
->preparse() method but not an ->update() method.  Possibly it could
even be triggered for other key types by racing with keyctl_setperm() to
make the KEY_NEED_WRITE check fail (the permission was already checked,
so normally it wouldn't fail there).

Reproducer with key type "asymmetric", given a valid cert.der:

keyctl new_session
keyid=$(keyctl padd asymmetric desc @s < cert.der)
keyctl setperm $keyid 0x3f000000
keyctl update $keyid data

[  150.686666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[  150.687601] IP: asymmetric_key_free_kids+0x12/0x30
[  150.688139] PGD 38a3d067
[  150.688141] PUD 3b3de067
[  150.688447] PMD 0
[  150.688745]
[  150.689160] Oops: 0000 [#1] SMP
[  150.689455] Modules linked in:
[  150.689769] CPU: 1 PID: 2478 Comm: keyctl Not tainted 4.11.0-rc4-xfstests-00187-ga9f6b6b8cd2f #742
[  150.690916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[  150.692199] task: ffff88003b30c480 task.stack: ffffc90000350000
[  150.692952] RIP: 0010:asymmetric_key_free_kids+0x12/0x30
[  150.693556] RSP: 0018:ffffc90000353e58 EFLAGS: 00010202
[  150.694142] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
[  150.694845] RDX: ffffffff81ee3920 RSI: ffff88003d4b0700 RDI: 0000000000000001
[  150.697569] RBP: ffffc90000353e60 R08: ffff88003d5d2140 R09: 0000000000000000
[  150.702483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  150.707393] R13: 0000000000000004 R14: ffff880038a4d2d8 R15: 000000000040411f
[  150.709720] FS:  00007fcbcee35700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  150.711504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  150.712733] CR2: 0000000000000001 CR3: 0000000039eab000 CR4: 00000000003406e0
[  150.714487] Call Trace:
[  150.714975]  asymmetric_key_free_preparse+0x2f/0x40
[  150.715907]  key_update+0xf7/0x140
[  150.716560]  ? key_default_cmp+0x20/0x20
[  150.717319]  keyctl_update_key+0xb0/0xe0
[  150.718066]  SyS_keyctl+0x109/0x130
[  150.718663]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  150.719440] RIP: 0033:0x7fcbce75ff19
[  150.719926] RSP: 002b:00007ffd5d167088 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
[  150.720918] RAX: ffffffffffffffda RBX: 0000000000404d80 RCX: 00007fcbce75ff19
[  150.721874] RDX: 00007ffd5d16785e RSI: 000000002866cd36 RDI: 0000000000000002
[  150.722827] RBP: 0000000000000006 R08: 000000002866cd36 R09: 00007ffd5d16785e
[  150.723781] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000404d80
[  150.724650] R13: 00007ffd5d16784d R14: 00007ffd5d167238 R15: 000000000040411f
[  150.725447] Code: 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 ff 74 23 55 48 89 e5 53 48 89 fb <48> 8b 3f e8 06 21 c5 ff 48 8b 7b 08 e8 fd 20 c5 ff 48 89 df e8
[  150.727489] RIP: asymmetric_key_free_kids+0x12/0x30 RSP: ffffc90000353e58
[  150.728117] CR2: 0000000000000001
[  150.728430] ---[ end trace f7f8fe1da2d5ae8d ]---

Fixes: 4d8c0250b841 ("KEYS: Call ->free_preparse() even after ->preparse() returns an error")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Eric Biggers
1b253e023f KEYS: fix dereferencing NULL payload with nonzero length
commit 5649645d725c73df4302428ee4e02c869248b4c5 upstream.

sys_add_key() and the KEYCTL_UPDATE operation of sys_keyctl() allowed a
NULL payload with nonzero length to be passed to the key type's
->preparse(), ->instantiate(), and/or ->update() methods.  Various key
types including asymmetric, cifs.idmap, cifs.spnego, and pkcs7_test did
not handle this case, allowing an unprivileged user to trivially cause a
NULL pointer dereference (kernel oops) if one of these key types was
present.  Fix it by doing the copy_from_user() when 'plen' is nonzero
rather than when '_payload' is non-NULL, causing the syscall to fail
with EFAULT as expected when an invalid buffer is specified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Gilad Ben-Yossef
0e479742e8 crypto: asymmetric_keys - handle EBUSY due to backlog correctly
commit e68368aed56324e2e38d4f6b044bb8cf82077fc2 upstream.

public_key_verify_signature() was passing the CRYPTO_TFM_REQ_MAY_BACKLOG
flag to akcipher_request_set_callback() but was not handling correctly
the case where a -EBUSY error could be returned from the call to
crypto_akcipher_verify() if backlog was used, possibly casuing
data corruption due to use-after-free of buffers.

Resolve this by handling -EBUSY correctly.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Eric W. Biederman
7c24a70c70 ptrace: Properly initialize ptracer_cred on fork
commit c70d9d809fdeecedb96972457ee45c49a232d97f upstream.

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <tiwai@suse.de>
Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Johan Hovold
3802abc6e0 serial: ifx6x60: fix use-after-free on module unload
commit 1e948479b3d63e3ac0ecca13cbf4921c7d17c168 upstream.

Make sure to deregister the SPI driver before releasing the tty driver
to avoid use-after-free in the SPI remove callback where the tty
devices are deregistered.

Fixes: 72d4724ea54c ("serial: ifx6x60: Add modem power off function in the platform reboot process")
Cc: Jun Chen <jun.d.chen@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Jane Chu
04ac452dad arch/sparc: support NR_CPUS = 4096
[ Upstream commit c79a13734d104b5b147d7cb0870276ccdd660dae ]

Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.

Orabug: 25505750

Signed-off-by: Jane Chu <jane.chu@oracle.com>

Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Pavel Tatashin
433a50e681 sparc64: delete old wrap code
[ Upstream commit 0197e41ce70511dc3b71f7fefa1a676e2b5cd60b ]

The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:54 +02:00
Pavel Tatashin
b6bb22de0c sparc64: new context wrap
[ Upstream commit a0582f26ec9dfd5360ea2f35dd9a1b026f8adda0 ]

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Pavel Tatashin
975f3cdc39 sparc64: add per-cpu mm of secondary contexts
[ Upstream commit 7a5b4bbf49fe86ce77488a70c5dccfe2d50d7a2d ]

The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Pavel Tatashin
7932bfad0e sparc64: redefine first version
[ Upstream commit c4415235b2be0cc791572e8e7f7466ab8f73a2bf ]

CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Pavel Tatashin
65e3443b61 sparc64: combine activate_mm and switch_mm
[ Upstream commit 14d0334c6748ff2aedb3f2f7fdc51ee90a9b54e7 ]

The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Pavel Tatashin
e7590a1b15 sparc64: reset mm cpumask after wrap
[ Upstream commit 588974857359861891f478a070b1dc7ae04a3880 ]

After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
James Clarke
b3ad7a3e57 sparc: Machine description indices can vary
[ Upstream commit c982aa9c304bf0b9a7522fd118fed4afa5a0263c ]

VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Mike Kravetz
8d665e039e sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
[ Upstream commit 654f4807624a657f364417c2a7454f0df9961734 ]

When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used.  As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB.  When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB.  We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
David S. Miller
4b684e6474 sparc64: Add __multi3 for gcc 7.x and later.
[ Upstream commit 1b4af13ff2cc6897557bb0b8d9e2fad4fa4d67aa ]

Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:53 +02:00
Nikolay Aleksandrov
0255284edd net: bridge: start hello timer only if device is up
[ Upstream commit aeb073241fe7a2b932e04e20c60e47718332877f ]

When the transition of NO_STP -> KERNEL_STP was fixed by always calling
mod_timer in br_stp_start, it introduced a new regression which causes
the timer to be armed even when the bridge is down, and since we stop
the timers in its ndo_stop() function, they never get disabled if the
device is destroyed before it's upped.

To reproduce:
$ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on;
ip l del br0; done;

CC: Xin Long <lucien.xin@gmail.com>
CC: Ivan Vecera <cera@cera.cz>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 6d18c732b95c ("bridge: start hello_timer when enabling KERNEL_STP in br_stp_start")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Niklas Cassel
3dd4daf112 net: stmmac: fix completely hung TX when using TSO
[ Upstream commit 426849e6611f2092553f8d53372ae310818a6292 ]

stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.

This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.

When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.

When the DMA hangs, we get a tx timeout, however,
since stmmac does not do a complete reset of the IP
in stmmac_tx_timeout, we end up in a state with
completely hung TX.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Max Filippov
a83564d128 net: ethoc: enable NAPI before poll may be scheduled
[ Upstream commit d220b942a4b6a0640aee78841608f4aa5e8e185e ]

ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in the ethoc_open, which results in
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in the ethoc_open to fix that.

Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Richard Haines
a97f807363 net/ipv6: Fix CALIPSO causing GPF with datagram support
[ Upstream commit e3ebdb20fddacded2740a333ff66781e0d28b05c ]

When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
IP header may have moved.

Also update the payload length after adding the CALIPSO option.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Eric Dumazet
0aa89f1b07 net: ping: do not abuse udp_poll()
[ Upstream commit 77d4b1d36926a9b8387c6b53eeba42bcaaffcea3 ]

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
David S. Miller
599a4478d8 ipv6: Fix leak in ipv6_gso_segment().
[ Upstream commit e3e86b5119f81e5e2499bea7ea1ebe8ac6aab789 ]

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Mark Bloch
c242e1a814 vxlan: fix use-after-free on deletion
[ Upstream commit a53cb29b0af346af44e4abf13d7e59f807fba690 ]

Adding a vxlan interface to a socket isn't symmetrical, while adding
is done in vxlan_open() the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.

We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Acked-by: Jiri Benc <jbenc@redhat.com>
Tested-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Yuchung Cheng
3ee35b9682 tcp: disallow cwnd undo when switching congestion control
[ Upstream commit 44abafc4cc094214a99f860f778c48ecb23422fc ]

When the sender switches its congestion control during loss
recovery, if the recovery is spurious then it may incorrectly
revert cwnd and ssthresh to the older values set by a previous
congestion control. Consider a congestion control (like BBR)
that does not use ssthresh and keeps it infinite: the connection
may incorrectly revert cwnd to an infinite value when switching
from BBR to another congestion control.

This patch fixes it by disallowing such cwnd undo operation
upon switching congestion control.  Note that undo_marker
is not reset s.t. the packets that were incorrectly marked
lost would be corrected. We only avoid undoing the cwnd in
tcp_undo_cwnd_reduction().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Ganesh Goudar
61c92d5a53 cxgb4: avoid enabling napi twice to the same queue
[ Upstream commit e7519f9926f1d0d11c776eb0475eb098c7760f68 ]

Take uld mutex to avoid race between cxgb_up() and
cxgb4_register_uld() to enable napi for the same uld
queue.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:52 +02:00
Ben Hutchings
abbcb731d6 ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
[ Upstream commit 6e80ac5cc992ab6256c3dae87f7e57db15e1a58c ]

xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:51 +02:00
Lance Richardson
b5e9b7ad0d vxlan: eliminate cached dst leak
[ Upstream commit 35cf2845563c1aaa01d27bd34d64795c4ae72700 ]

After commit 0c1d70af924b ("net: use dst_cache for vxlan device"),
cached dst entries could be leaked when more than one remote was
present for a given vxlan_fdb entry, causing subsequent netns
operations to block indefinitely and "unregister_netdevice: waiting
for lo to become free." messages to appear in the kernel log.

Fix by properly releasing cached dst and freeing resources in this
case.

Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:51 +02:00
Mintz, Yuval
96d145216b bnx2x: Fix Multi-Cos
[ Upstream commit 3968d38917eb9bd0cd391265f6c9c538d9b33ffa ]

Apparently multi-cos isn't working for bnx2x quite some time -
driver implements ndo_select_queue() to allow queue-selection
for FCoE, but the regular L2 flow would cause it to modulo the
fallback's result by the number of queues.
The fallback would return a queue matching the needed tc
[via __skb_tx_hash()], but since the modulo is by the number of TSS
queues where number of TCs is not accounted, transmission would always
be done by a queue configured into using TC0.

Fixes: ada7c19e6d27 ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:05:51 +02:00
Greg Kroah-Hartman
f1aa865ae5 Linux 4.9.31 2017-06-07 12:08:04 +02:00
Jan Kara
11214bd292 xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
commit d7fd24257aa60316bf81093f7f909dc9475ae974 upstream.

There is an off-by-one error in loop termination conditions in
xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:53 +02:00
Eric Sandeen
75c5afd58d xfs: fix unaligned access in xfs_btree_visit_blocks
commit a4d768e702de224cc85e0c8eac9311763403b368 upstream.

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:53 +02:00
Darrick J. Wong
7fb8ab8f0a xfs: avoid mount-time deadlock in CoW extent recovery
commit 3ecb3ac7b950ff8f6c6a61e8b7b0d6e3546429a0 upstream.

If a malicious user corrupts the refcount btree to cause a cycle between
different levels of the tree, the next mount attempt will deadlock in
the CoW recovery routine while grabbing buffer locks.  We can use the
ability to re-grab a buffer that was previous locked to a transaction to
avoid deadlocks, so do that here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Christoph Hellwig
e40c145c02 xfs: xfs_trans_alloc_empty
This is a partial cherry-pick of commit e89c041338
("xfs: implement the GETFSMAP ioctl"), which also adds this helper, and
a great example of why feature patches should be properly split into
their parts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from the larger patch for -stable]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-07 12:07:52 +02:00
Zorro Lang
0e542792a0 xfs: bad assertion for delalloc an extent that start at i_size
commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream.

By run fsstress long enough time enough in RHEL-7, I find an
assertion failure (harder to reproduce on linux-4.11, but problem
is still there):

  XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c

The assertion is in xfs_getbmap() funciton:

  if (map[i].br_startblock == DELAYSTARTBLOCK &&
-->   map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
          ASSERT((iflags & BMV_IF_DELALLOC) != 0);

When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
startoff is just at EOF. But we only need to make sure delalloc
extents that are within EOF, not include EOF.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Darrick J. Wong
f60d76efa9 xfs: BMAPX shouldn't barf on inline-format directories
commit 6eadbf4c8ba816c10d1c97bed9aa861d9fd17809 upstream.

When we're fulfilling a BMAPX request, jump out early if the data fork
is in local format.  This prevents us from hitting a debugging check in
bmapi_read and barfing errors back to userspace.  The on-disk extent
count check later isn't sufficient for IF_DELALLOC mode because da
extents are in memory and not on disk.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Brian Foster
53c44c236f xfs: fix indlen accounting error on partial delalloc conversion
commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream.

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Eryu Guan
54894ea3c5 xfs: fix use-after-free in xfs_finish_page_writeback
commit 161f55efba5ddccc690139fae9373cafc3447a97 upstream.

Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.

This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.

[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975]  dump_stack+0x63/0x84
[ 2747.924673]  kasan_object_err+0x21/0x70
[ 2747.928950]  kasan_report+0x271/0x530
[ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409]  ? end_page_writeback+0xce/0x110
[ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
[ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197]  process_one_work+0x5ff/0x1000
[ 2747.962766]  worker_thread+0xe4/0x10e0
[ 2747.966946]  kthread+0x2d3/0x3d0
[ 2747.970546]  ? process_one_work+0x1000/0x1000
[ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706]  ? do_page_fault+0x30/0x80
[ 2747.989887]  ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411]  save_stack_trace+0x1b/0x20
[ 2748.010688]  save_stack+0x46/0xd0
[ 2748.014383]  kasan_kmalloc+0xad/0xe0
[ 2748.018370]  kasan_slab_alloc+0x12/0x20
[ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024]  alloc_buffer_head+0x22/0xc0
[ 2748.031399]  alloc_page_buffers+0xd1/0x250
[ 2748.035968]  create_empty_buffers+0x30/0x410
[ 2748.040730]  create_page_buffers+0x120/0x1b0
[ 2748.045493]  __block_write_begin_int+0x17a/0x1800
[ 2748.050740]  iomap_write_begin+0x100/0x2f0
[ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362]  iomap_apply+0x157/0x270
[ 2748.064347]  iomap_zero_range+0x5a/0x80
[ 2748.068624]  iomap_truncate_page+0x6b/0xa0
[ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838]  vfs_fallocate+0x2cf/0x780
[ 2748.093021]  SyS_fallocate+0x48/0x80
[ 2748.097006]  do_syscall_64+0x18a/0x430
[ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816]  save_stack_trace+0x1b/0x20
[ 2748.115093]  save_stack+0x46/0xd0
[ 2748.118788]  kasan_slab_free+0x73/0xc0
[ 2748.122969]  kmem_cache_free+0x7a/0x200
[ 2748.127247]  free_buffer_head+0x41/0x80
[ 2748.131524]  try_to_free_buffers+0x178/0x250
[ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563]  try_to_release_page+0x100/0x180
[ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462]  vfs_fallocate+0x2cf/0x780
[ 2748.172642]  SyS_fallocate+0x48/0x80
[ 2748.176629]  do_syscall_64+0x18a/0x430
[ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a

Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Darrick J. Wong
d457f82281 xfs: reserve enough blocks to handle btree splits when remapping
commit fe0be23e68200573de027de9b8cc2b27e7fce35e upstream.

In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent.  This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.

Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.

With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Brian Foster
0ba833fe73 xfs: wait on new inodes during quotaoff dquot release
commit e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 upstream.

The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Brian Foster
2ea882d8eb xfs: update ag iterator to support wait on new inodes
commit ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 upstream.

The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:52 +02:00
Brian Foster
e86b616b5b xfs: support ability to wait on new inodes
commit 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f upstream.

Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Brian Foster
10f0b2c3c2 xfs: fix up quotacheck buffer list error handling
commit 20e8a063786050083fe05b4f45be338c60b49126 upstream.

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Brian Foster
95487d4be1 xfs: prevent multi-fsb dir readahead from reading random blocks
commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream.

Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.

This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.

Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.

Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Eric Sandeen
93bd169845 xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream.

Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Christoph Hellwig
99226b890d xfs: fix integer truncation in xfs_bmap_remap_alloc
commit 52813fb13ff90bd9c39a93446cbf1103c290b6e9 upstream.

bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Brian Foster
4e2762878a xfs: drop iolock from reclaim context to appease lockdep
commit 3b4683c294095b5f777c03307ef8c60f47320e12 upstream.

Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.

The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Darrick J. Wong
4e8163fc81 xfs: actually report xattr extents via iomap
commit 84358536dc355a9c8978ee425f87e116186bed16 upstream.

Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00
Darrick J. Wong
de417ea6b0 xfs: fix over-copying of getbmap parameters from userspace
commit be6324c00c4d1e0e665f03ed1fc18863a88da119 upstream.

In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:07:51 +02:00