[ Upstream commit e5e5732d81 ]
After revoking atomic write, related LBA can be reused by others, so we
need to wait page writeback before reusing the LBA, in order to avoid
interference between old atomic written in-flight IO and new IO.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4071e67cff ]
This patch disables loading of the f2fs module on architectures
which have PAGE_SIZE > 4096, since it is impossible to mount f2fs on
such architectures. The log messages are:
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/vdiskb1, missing codepage or helper program, or other error.
/dev/vdiskb1: F2FS filesystem,
UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 1th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 2th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
which was introduced by git commit 5c9b469295
tested on git kernel 4.17.0-rc6-00309-gec30dcf7f425
with patch applied:
modprobe: ERROR: could not insert 'f2fs': Invalid argument
May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096
Signed-off-by: Anatoly Pugachev <matorola@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
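The load-time check described in commit 4071e67cff above can be sketched in user space. This is only an illustration under stated assumptions: `f2fs_check_page_size()` and its message buffer are hypothetical names, and the constant simply mirrors the 4096 quoted in the log output.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define F2FS_BLKSIZE 4096UL /* the only page size accepted, per the log above */

/*
 * Hypothetical stand-in for the module init check: refuse to "load"
 * when the page size is not 4096 and format a message similar to the
 * one the patched kernel logs.
 */
static int f2fs_check_page_size(unsigned long page_size, char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    msg[0] = '\0';
    if (page_size != F2FS_BLKSIZE) {
        snprintf(msg, len, "F2FS not supported on PAGE_SIZE(%lu) != %lu",
                 page_size, F2FS_BLKSIZE);
        return -EINVAL; /* surfaces to modprobe as "Invalid argument" */
    }
    return 0;
}
```

Failing at load time gives a single clear error instead of the repeated per-superblock mount failures shown in the log.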
[ Upstream commit ae55e59da0 ]
If the server recalls the layout that was just handed out, we risk hitting
a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we
release the sequence slot after processing the LAYOUTGET operation that
was sent as part of the OPEN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e37d07983a ]
When cleaning up buffer entries as we wrap up, their state should be
"completed". If any of the entries is in "submitted" state, it means
that something bad has happened. Trigger a warning immediately instead of
waiting for the state flag to eventually be updated, which would hide the
issue.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 926bc2f100 ]
The stores to update the SLB shadow area must be made as they appear
in the C code, so that the hypervisor does not see an entry with
mismatched vsid and esid. Use WRITE_ONCE for this.
GCC has been observed to elide the first store to esid in the update,
which means that if the hypervisor interrupts the guest after storing
to vsid, it could see an entry with old esid and new vsid, which may
possibly result in memory corruption.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
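As a rough illustration of the store ordering described in commit 926bc2f100 above, the sketch below approximates WRITE_ONCE() in user space with a volatile store. The struct layout and the assumption that the first store clears esid are simplifications for this example, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space approximation of WRITE_ONCE(): a volatile store that the
 * compiler may not elide or reorder against other volatile accesses.
 * Relies on the GNU __typeof__ extension. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, simplified shadow entry. */
struct slb_shadow_entry {
    uint64_t esid;
    uint64_t vsid;
};

static void slb_shadow_update(struct slb_shadow_entry *e,
                              uint64_t esid, uint64_t vsid)
{
    assert(e != NULL);
    WRITE_ONCE(e->esid, 0);    /* 1. invalidate the entry (assumed) */
    WRITE_ONCE(e->vsid, vsid); /* 2. install the new vsid */
    WRITE_ONCE(e->esid, esid); /* 3. publish the matching esid */
}
```

With plain stores the compiler is free to drop step 1 as a dead store, which is the elision the message reports.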
[ Upstream commit 447808bf50 ]
time_init() will set up tb_ticks_per_usec based on reality.
time_init() is called *after* udbg_init_opal_common() during boot.
from arch/powerpc/kernel/time.c:
unsigned long tb_ticks_per_usec = 100; /* sane default */
Currently, all powernv systems have a timebase frequency of 512 MHz
(512000000/1000000 == 0x200) - although there's nothing written
down anywhere that I can find saying that we couldn't make that
different based on the requirements in the ISA.
So, we've been (accidentally) thwacking the (currently) correct
(for powernv at least) value for tb_ticks_per_usec earlier than
we otherwise would have.
The "sane default" seems to be adequate for our purposes between
udbg_init_opal_common() and time_init() being called, and if it isn't,
then we should probably be setting it somewhere that isn't hvc_opal.c!
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46d4be41b9 ]
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.
In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three, and the reference isn't
taken at all if it isn't needed.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a6b3964ad7 ]
A no-op form of ori (or immediate of 0 into r31 and the result stored
in r31) has been re-tasked as a speculation barrier. The instruction
only acts as a barrier on newer machines with appropriate firmware
support. On older CPUs it remains a harmless no-op.
Implement barrier_nospec using this instruction.
mpe: The semantics of the instruction are believed to be that it
prevents execution of subsequent instructions until preceding branches
have been fully resolved and are no longer executing speculatively.
There is no further documentation available at this time.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1128bb7813 ]
Commit 87a156fb18 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops.
A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%.
Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless of the number of loops, removing
those useless nops improves the test by 5683 TB ticks.
Fixes: 87a156fb18 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb2595c139 ]
ucma_process_join() will free the newly allocated "mc" struct
if there is any error after that, especially in copy_to_user().
But in parallel, ucma_leave_multicast() could find this "mc"
through idr_find() before ucma_process_join() frees it, since it
is already published.
So "mc" could be used in ucma_leave_multicast() after it has been
allocated and freed in ucma_process_join(), since we don't refcnt
it.
Fix this by separating "publish" from ID allocation, so that we
can get an ID first and publish it later after copy_to_user().
Fixes: c8f6a362bf ("RDMA/cma: Add multicast communication support")
Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
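The "reserve an ID first, publish later" idea from commit cb2595c139 above can be sketched with a tiny table. Everything here (`id_reserve()`, `id_publish()`, `id_find()`, `id_release()`, the fixed-size arrays) is hypothetical and single-threaded; it is not the kernel's IDR API, and real code would also need locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 8

static void *id_obj[MAX_IDS]; /* object visible to lookups, or NULL */
static int id_used[MAX_IDS];  /* 1 once the ID has been handed out */

/* Reserve an ID without making any object visible yet. */
static int id_reserve(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_used[i]) {
            id_used[i] = 1;
            id_obj[i] = NULL;
            return i;
        }
    }
    return -1;
}

/* Publish the object only after every fallible step has succeeded. */
static void id_publish(int id, void *obj)
{
    assert(id >= 0 && id < MAX_IDS && id_used[id]);
    id_obj[id] = obj;
}

/* Lookup: a reserved but unpublished ID yields NULL. */
static void *id_find(int id)
{
    if (id < 0 || id >= MAX_IDS || !id_used[id])
        return NULL;
    return id_obj[id];
}

/* Release the ID, e.g. on an error path before publishing. */
static void id_release(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    id_used[id] = 0;
    id_obj[id] = NULL;
}
```

Because a concurrent lookup sees NULL until `id_publish()` runs, the error path can free the object without another caller holding a pointer to it.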
[ Upstream commit fff200caf6 ]
There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().
Commit 83129b37ef ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.
Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.
Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 44ee54aabf ]
The DA9063 watchdog has only one register field to store the timeout value
and to enable the watchdog. The watchdog gets enabled if the value is
not zero. There is no issue if the watchdog is already running, but it
leads to problems if the watchdog is disabled.
If the watchdog is disabled and only the timeout value should be prepared,
the watchdog gets enabled too. Add a check to get the current watchdog
state and update the watchdog timeout value on the hardware side only if
the watchdog is already active.
Fixes: 5e9c16e376 ("watchdog: Add DA9063 PMIC watchdog driver.")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
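The behaviour described in commit 44ee54aabf above can be modelled in a few lines. The struct and function names below are hypothetical; the model only assumes what the message states, namely that one register field both selects the timeout and enables the watchdog when nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a watchdog with a single combined field. */
struct wdt_model {
    unsigned int hw_timeout_sel; /* register field; nonzero means running */
    unsigned int timeout_sel;    /* value to program when started */
};

static int wdt_is_running(const struct wdt_model *w)
{
    return w->hw_timeout_sel != 0;
}

/* Remember the new timeout; touch the hardware only if already active. */
static void wdt_set_timeout(struct wdt_model *w, unsigned int sel)
{
    assert(w != NULL && sel != 0);
    w->timeout_sel = sel;
    if (wdt_is_running(w))
        w->hw_timeout_sel = sel;
}

/* Starting the watchdog programs the remembered timeout. */
static void wdt_start(struct wdt_model *w)
{
    assert(w != NULL && w->timeout_sel != 0);
    w->hw_timeout_sel = w->timeout_sel;
}
```

Setting a timeout on a stopped watchdog therefore only records the value, and the hardware stays disabled until an explicit start.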
[ Upstream commit bd975e6914 ]
When listing sets with timeout support, there's a probability that
entries which are just timing out are listed/saved with a "0" timeout
value. However, when restoring the saved list, a zero timeout value
means permanent elements.
The new behaviour is that timing out entries are listed with "timeout 1"
instead of zero.
Fixes netfilter bugzilla #1258.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
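A minimal sketch of the listing rule from commit bd975e6914 above, assuming the remaining time is tracked in milliseconds and reported in whole seconds. The struct, the `permanent` flag, and `timeout_for_listing()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a set entry for this sketch. */
struct set_entry {
    int permanent;              /* nonzero: entry never expires */
    unsigned long remaining_ms; /* time left; unused when permanent */
};

/*
 * Value to print when listing/saving the entry. Zero is reserved for
 * permanent entries, so an entry about to expire is reported as 1.
 */
static unsigned int timeout_for_listing(const struct set_entry *e)
{
    unsigned int secs;

    assert(e != NULL);
    if (e->permanent)
        return 0;
    secs = (unsigned int)(e->remaining_ms / 1000);
    return secs == 0 ? 1 : secs;
}
```

This keeps a save/restore round trip from turning a nearly expired entry into a permanent one.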
[ Upstream commit cbdebe481a ]
Userspace `ipset` command forbids family option for hash:mac type:
ipset create test hash:mac family inet4
ipset v6.30: Unknown argument: `family'
However, this check is not done in the kernel itself. When someone uses
external netlink applications (the pyroute2 Python library, for example),
one can create hash:mac with an invalid family and get inconsistent
results from userspace (the `ipset` command cannot read the set content
anymore).
This patch enforces the logic in the kernel, and forbids insertion of
hash:mac with a family set.
Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no
impact on other hash:* sets.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit abfdff44bc ]
When using RTC_ALM_SET or RTC_WKALM_SET with rtc_wkalrm.enabled not set,
rtc_timer_enqueue() is not called and rtc_set_alarm() may succeed but the
subsequent RTC_AIE_ON ioctl will fail. RTC_ALM_READ would also fail in that
case.
Ensure rtc_set_alarm() fails when alarms are not supported to avoid letting
programs think the alarms are working for a particular RTC when they are
not.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 401c636a0e ]
When we get a hung task it can often be valuable to see _all_ the hung
tasks on the system before calling panic().
Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5316056503549952
----------------------------------------
INFO: task syz-executor0:6540 blocked for more than 120 seconds.
Not tainted 4.16.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor0 D23560 6540 4521 0x80000004
Call Trace:
context_switch kernel/sched/core.c:2848 [inline]
__schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490
schedule+0xf5/0x430 kernel/sched/core.c:3549
schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060
isofs_get_last_session fs/isofs/inode.c:567 [inline]
isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660
mount_bdev+0x2b7/0x370 fs/super.c:1119
isofs_mount+0x34/0x40 fs/isofs/inode.c:1560
mount_fs+0x66/0x2d0 fs/super.c:1222
vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:2514 [inline]
do_new_mount fs/namespace.c:2517 [inline]
do_mount+0xea4/0x2b90 fs/namespace.c:2847
ksys_mount+0xab/0x120 fs/namespace.c:3063
SYSC_mount fs/namespace.c:3077 [inline]
SyS_mount+0x39/0x50 fs/namespace.c:3074
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
(...snipped...)
Showing all locks held in the system:
(...snipped...)
2 locks held by syz-executor0/6540:
#0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline]
#0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */
#1: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
(...snipped...)
3 locks held by syz-executor7/6541:
#0: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
#1: 000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
#2: 00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */
----------------------------------------
When reporting an AB-BA deadlock like the one shown above, it would be
nice if the trace of PID=6541 were printed as well as the trace of
PID=6540 before calling panic().
Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay
calling panic() but normally there should not be so many hung tasks.
Link: http://lkml.kernel.org/r/201804050705.BHE57833.HVFOFtSOMQJFOL@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
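The change in commit 401c636a0e above amounts to deferring the panic until the whole task list has been scanned. The sketch below models that in user space; `struct task_model`, `report_hung_tasks()`, and the `call_panic` out-parameter are hypothetical names, and the real detector's details are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical task record for this sketch. */
struct task_model {
    int pid;
    unsigned long blocked_secs;
};

/*
 * Report every task blocked longer than timeout_secs, up to
 * max_warnings reports, and return the number reported. Rather than
 * panicking at the first hit, record the decision in *call_panic so
 * the caller can panic once the scan is complete.
 */
static size_t report_hung_tasks(const struct task_model *tasks, size_t n,
                                unsigned long timeout_secs,
                                size_t max_warnings, int panic_on_hung,
                                int *call_panic)
{
    size_t reported = 0;

    assert(tasks != NULL && call_panic != NULL);
    *call_panic = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].blocked_secs <= timeout_secs)
            continue;
        if (reported < max_warnings) {
            printf("INFO: task %d blocked for more than %lu seconds.\n",
                   tasks[i].pid, timeout_secs);
            reported++;
        }
        if (panic_on_hung)
            *call_panic = 1; /* defer: keep scanning the other tasks */
    }
    return reported;
}
```

With the AB-BA example above, both PID 6540 and PID 6541 would be reported before the caller acts on `call_panic`.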
[ Upstream commit 48d8476b41 ]
MAP_DMA ioctls might be called from various threads within a process,
for example when using QEMU, the vCPU threads are often generating
these calls and we therefore take a reference to that vCPU task.
However, QEMU also supports vCPU hotplug on some machines and the task
that called MAP_DMA may have exited by the time UNMAP_DMA is called,
resulting in the mm_struct pointer being NULL and thus a failure to
match against the existing mapping.
To resolve this, we instead take a reference to the thread
group_leader, which has the same mm_struct and resource limits, but
is less likely to exit, at least in the QEMU case. A difficulty here is
guaranteeing that the capabilities of the group_leader match that of
the calling thread, which we resolve by tracking CAP_IPC_LOCK at the
time of calling rather than at an indeterminate time in the future.
Potentially this also results in better efficiency as this is now
recorded once per MAP_DMA ioctl.
Reported-by: Xu Yandong <xuyandong2@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 002fe996f6 ]
When we create an mdev device, we check for duplicates against the
parent device and return -EEXIST if found, but the mdev device
namespace is global since we'll link all devices from the bus. We do
catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
with it comes a kernel warning and stack trace for trying to create
duplicate sysfs links, which makes it an undesirable response.
Therefore we should really be looking for duplicates across all mdev
parent devices, or as implemented here, against our mdev device list.
Using mdev_list to prevent duplicates means that we can remove
mdev_parent.lock, but in order not to serialize mdev device creation
and removal globally, we add mdev_device.active which allows UUIDs to
be reserved such that we can drop the mdev_list_lock before the mdev
device is fully in place.
Two behavioral notes; first, mdev_parent.lock had the side-effect of
serializing mdev create and remove ops per parent device. This was
an implementation detail, not an intentional guarantee provided to
the mdev vendor drivers. Vendor drivers can trivially provide this
serialization internally if necessary. Second, review comments note
the new -EAGAIN behavior when the device, and in particular the remove
attribute, becomes visible in sysfs. If a remove is triggered prior
to completion of mdev_device_create() the user will see a -EAGAIN
error. While the errno is different, receiving an error during this
period is not: the previous implementation returned -ENODEV for the
same condition. Furthermore, the consistency to the user is improved
in the case where mdev_device_remove_ops() returns an error. Previously
concurrent calls to mdev_device_remove() could see the device
disappear with -ENODEV and return in the case of error. Now a user
would see -EAGAIN while the device is in this transitory state.
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3171822fdc ]
When running a fuzz tester against a KASAN-enabled kernel, the following
splat periodically occurs.
The problem occurs when the test sends a GETDEVICEINFO request with a
malformed xdr array (size but no data) for gdia_notify_types and the
array size is > 0x3fffffff, which results in an overflow in the value of
nbytes which is passed to read_buf().
If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
the overflow occurs, the value of nbytes is 0, and when that happens the
pointer returned by read_buf() points to the end of the xdr data (i.e.
argp->end) when really it should be returning NULL.
Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
value is arbitrary, but it's the same threshold used by
nfsd4_decode_bitmap()... it could really be any value >= 1 since it's
expected to get at most a single bitmap in gdia_notify_types).
[ 119.256854] ==================================================================
[ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
[ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 119.261202] Call Trace:
[ 119.262265] dump_stack+0x71/0xab
[ 119.263371] print_address_description+0x6a/0x270
[ 119.264609] kasan_report+0x258/0x380
[ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd]
[ 119.274918] svc_process+0x161c/0x31a0 [sunrpc]
[ 119.276172] ? svc_printk+0x190/0x190 [sunrpc]
[ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc]
[ 119.278622] nfsd+0x2b9/0x430 [nfsd]
[ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[ 119.281157] kthread+0x2db/0x390
[ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 119.283756] ret_from_fork+0x35/0x40
[ 119.286041] Allocated by task 436:
[ 119.287525] kasan_kmalloc+0xa0/0xd0
[ 119.288685] kmem_cache_alloc+0xe9/0x1f0
[ 119.289900] get_empty_filp+0x7b/0x410
[ 119.291037] path_openat+0xca/0x4220
[ 119.292242] do_filp_open+0x182/0x280
[ 119.293411] do_sys_open+0x216/0x360
[ 119.294555] do_syscall_64+0xa0/0x2f0
[ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.298068] Freed by task 436:
[ 119.299271] __kasan_slab_free+0x130/0x180
[ 119.300557] kmem_cache_free+0x78/0x210
[ 119.301823] rcu_process_callbacks+0x35b/0xbd0
[ 119.303162] __do_softirq+0x192/0x5ea
[ 119.305443] The buggy address belongs to the object at ffff880113ada000
which belongs to the cache filp of size 256
[ 119.308556] The buggy address is located 0 bytes inside of
256-byte region [ffff880113ada000, ffff880113ada100)
[ 119.311376] The buggy address belongs to the page:
[ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[ 119.314428] flags: 0x17ffe000000100(slab)
[ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[ 119.319050] page dumped because: kasan: bad access detected
[ 119.321652] Memory state around the buggy address:
[ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.327547] ^
[ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 119.331740] ==================================================================
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
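The overflow arithmetic in commit 3171822fdc above can be checked with a short sketch. It assumes the byte count is computed in 32 bits as four bytes per array element; the helper names are hypothetical, and -1 merely stands in for NFS4ERR_BAD_XDR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NOTIFY_WORDS 1000u /* threshold quoted in the message above */

/* Byte count for `count` 4-byte XDR words, computed in 32 bits, so it
 * wraps modulo 2^32 for counts above 0x3fffffff. */
static uint32_t xdr_words_to_bytes(uint32_t count)
{
    return count << 2;
}

/* Bounded variant: reject oversized arrays before computing nbytes.
 * Returns 0 on success, -1 for a bad array size. */
static int checked_words_to_bytes(uint32_t count, uint32_t *nbytes)
{
    assert(nbytes != NULL);
    if (count > MAX_NOTIFY_WORDS)
        return -1;
    *nbytes = xdr_words_to_bytes(count);
    return 0;
}
```

The three sizes named in the message are exactly the multiples of 2^30 that fit in 32 bits, so multiplying by four wraps each of them to zero.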
[ Upstream commit f9312a5410 ]
If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP,
then it thinks we're trying to replay an existing request. If so, then
let's just bump the sequence ID and retry the operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 93b7f7ad20 ]
Currently, when IO to DS fails, the client returns the layout and
retries against the MDS. However, then on umounting (inode eviction)
it returns the layout again.
This is because pnfs_return_layout() was changed in
commit d78471d32b ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR")
to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned
the layout, it will be returned again. Instead, let's also check
if we have already marked the layout invalid.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7bf7bb37f1 ]
When finding the parent netvsc device, the search needs to be across
all netvsc device instances (independent of network namespace).
Find parent device of VF using upper_dev_get routine which
searches only adjacent list.
Fixes: e8ff40d4bf ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
netns aware byref
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 57f230ab04 ]
The max number of slots used in xennet_get_responses() is set to
MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
difference results in frequent "too many slots" messages and a
reduced network throughput for some workloads (factor 10 below that of
a kernel-xen based guest).
Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
the max number of slots to use solves that problem (tests showed no
more messages "too many slots" and throughput was as high as with the
kernel-xen based guest system).
Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
netfront_tx_slot_available() for making it clearer what is really being
tested without actually modifying the tested value.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
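The slot arithmetic in commit 57f230ab04 above can be worked through as follows. The values of XEN_NETIF_NR_SLOTS_MIN and RX_COPY_THRESHOLD are assumptions made for this sketch, as is the reading that the TX check subtracts both terms from the ring size; the helper names are hypothetical.

```c
#include <assert.h>

#define MAX_SKB_FRAGS          17  /* current value, per the message above */
#define XEN_NETIF_NR_SLOTS_MIN 18  /* assumed value for this sketch */
#define RX_COPY_THRESHOLD      256 /* assumed value for this sketch */

/* RX slot limit before the patch: 17, plus 1 for small responses. */
static int max_slots_before(int rx_status)
{
    return MAX_SKB_FRAGS + (rx_status <= RX_COPY_THRESHOLD);
}

/* RX slot limit after the patch: 18, plus 1 for small responses. */
static int max_slots_after(int rx_status)
{
    return XEN_NETIF_NR_SLOTS_MIN + (rx_status <= RX_COPY_THRESHOLD);
}

/* TX availability threshold, old spelling (assumed expression). */
static int tx_limit_before(int ring_size)
{
    assert(ring_size > MAX_SKB_FRAGS + 2);
    return ring_size - MAX_SKB_FRAGS - 2;
}

/* TX availability threshold, new spelling: same value, clearer intent. */
static int tx_limit_after(int ring_size)
{
    assert(ring_size > XEN_NETIF_NR_SLOTS_MIN + 1);
    return ring_size - XEN_NETIF_NR_SLOTS_MIN - 1;
}
```

Under these assumptions the RX limit grows by one slot while the TX threshold is numerically unchanged, which matches the message's description.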
[ Upstream commit c9484b986e ]
Patch series "kcov: fix unexpected faults".
These patches fix a few issues where KCOV code could trigger recursive
faults, discovered while debugging a patch enabling KCOV for arch/arm:
* On CONFIG_PREEMPT kernels, there's a small race window where
__sanitizer_cov_trace_pc() can see a bogus kcov_area.
* Lazy faulting of the vmalloc area can cause mutual recursion between
fault handling code and __sanitizer_cov_trace_pc().
* During the context switch, switching the mm can cause the kcov_area to
be transiently unmapped.
These are prerequisites for enabling KCOV on arm, but the issues
themselves are generic -- we just happen to avoid them by chance rather
than design on x86-64 and arm64.
This patch (of 3):
For kernels built with CONFIG_PREEMPT, some C code may execute before or
after the interrupt handler, while the hardirq count is zero. In these
cases, in_task() can return true.
A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
resets the task's kcov data via kcov_task_init(). Instrumented code
executed during this period will call __sanitizer_cov_trace_pc(), and as
in_task() returns true, will inspect t->kcov_mode before trying to write
to t->kcov_area.
In kcov_task_init() we update t->kcov_{mode,area,size} with plain stores,
which may be re-ordered, torn, etc. Thus __sanitizer_cov_trace_pc() may
see bogus values for any of these fields, and may attempt to write to
memory which is not mapped.
Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
barrier() to ensure this is ordered before we clear t->kcov_{area,size}.
This ensures that any code executed while kcov_task_init() is preempted
will either see valid values for t->kcov_{area,size}, or will see that
t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
t->kcov_area.
Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
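The ordering described in commit c9484b986e above can be sketched in user space. The macros below approximate WRITE_ONCE(), READ_ONCE(), and barrier() with GNU C extensions; `struct task_model`, the enum values, and `trace_pc()` are simplified, hypothetical stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space approximations; they rely on GNU C extensions. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define barrier()          __asm__ __volatile__("" ::: "memory")

enum kcov_mode { KCOV_MODE_DISABLED = 0, KCOV_MODE_TRACE_PC = 1 };

/* Hypothetical subset of per-task kcov state. */
struct task_model {
    enum kcov_mode kcov_mode;
    unsigned int kcov_size;
    unsigned long *kcov_area;
};

/* Disable first, then clear: a preempting tracer never sees an enabled
 * mode together with a cleared area. */
static void kcov_task_init(struct task_model *t)
{
    assert(t != NULL);
    WRITE_ONCE(t->kcov_mode, KCOV_MODE_DISABLED);
    barrier(); /* compiler barrier: mode store precedes the clears */
    t->kcov_size = 0;
    t->kcov_area = NULL;
}

/* Tracer side: check the mode before touching the area. Returns 1 if
 * the pc was recorded; area[0] holds the number of recorded entries. */
static int trace_pc(struct task_model *t, unsigned long pc)
{
    unsigned long pos;

    assert(t != NULL);
    if (READ_ONCE(t->kcov_mode) != KCOV_MODE_TRACE_PC)
        return 0;
    barrier(); /* read area and size only after the mode check */
    pos = t->kcov_area[0] + 1;
    if (pos >= t->kcov_size)
        return 0;
    t->kcov_area[pos] = pc;
    t->kcov_area[0] = pos;
    return 1;
}
```

A compiler barrier is enough in this model because the tracer runs on the same CPU as the interrupted task, so only compiler reordering and tearing have to be ruled out.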
[ Upstream commit 9e25826ffc ]
Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:
[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617] mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748] mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290] mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238] pci_device_remove+0x31/0xb0
[13578.650244] device_release_driver_internal+0x14f/0x220
[13578.655562] driver_detach+0x32/0x70
[13578.659183] bus_remove_driver+0x47/0xa0
[13578.663134] pci_unregister_driver+0x1e/0x80
[13578.667486] mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207] __x64_sys_delete_module+0x13b/0x1e0
[13578.677888] ? exit_to_usermode_loop+0x78/0x80
[13578.682374] do_syscall_64+0x39/0xe0
[13578.685976] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.
Fixes: b3529af6bb ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
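The reference-count imbalance fixed by commit 9e25826ffc above can be modelled with a small counter. The struct and functions below are hypothetical simplifications; they only illustrate why a flag-only update has to drop the reference taken by the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified port VLAN object. */
struct port_vlan {
    unsigned int ref_count;
    int in_bridge;      /* nonzero once the VLAN has joined the bridge */
    unsigned int flags;
};

static struct port_vlan *port_vlan_get(struct port_vlan *pv)
{
    assert(pv != NULL);
    pv->ref_count++;
    return pv;
}

static void port_vlan_put(struct port_vlan *pv)
{
    assert(pv != NULL && pv->ref_count > 0);
    pv->ref_count--;
}

/* Handles both a clean addition and a flag change on an existing VLAN. */
static void bridge_port_vlan_add(struct port_vlan *pv, unsigned int flags)
{
    port_vlan_get(pv); /* the lookup implicitly takes a reference */
    pv->flags = flags;
    if (pv->in_bridge) {
        /* Flag-only change: the join already holds a reference. */
        port_vlan_put(pv);
        return;
    }
    pv->in_bridge = 1; /* first join keeps the reference */
}

static void bridge_port_vlan_del(struct port_vlan *pv)
{
    assert(pv != NULL && pv->in_bridge);
    pv->in_bridge = 0;
    port_vlan_put(pv);
}
```

Without the put in the flag-only branch, each flag update would leave one extra reference behind, and removal would never bring the count back to zero.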
commit 7b0eb6b41a upstream.
Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:
/git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
/git/arm-soc/include/linux/compiler.h:357:38: error: call to
'__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)
The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
#else
#define LAST_CPUPID_WIDTH 0
#endif
#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
#endif
which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an estimate used to size the
VMEMMAP page array according to address space and struct page size.
However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.
Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.
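As a rough illustration of restricting a compile-time size check to the one configuration that relies on it, consider the standalone sketch below. It is not the arm64 code: struct page_model, MODEL_VMEMMAP, vmemmap_bytes() and the value chosen for STRUCT_PAGE_MAX_SHIFT are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define STRUCT_PAGE_MAX_SHIFT 6	/* hypothetical value for this model */
#define MODEL_VMEMMAP 1		/* hypothetical switch; 0 models a !VMEMMAP build */

/* Hypothetical stand-in for struct page. */
struct page_model {
	unsigned long flags;
	void *mapping;
	unsigned long index;
	unsigned long private_data;
};

#if MODEL_VMEMMAP
/*
 * Only a VMEMMAP build sizes an array from this bound, so only a
 * VMEMMAP build needs the size check.
 */
_Static_assert(sizeof(struct page_model) <= (1UL << STRUCT_PAGE_MAX_SHIFT),
	       "struct page_model exceeds the VMEMMAP size estimate");

/* Upper bound on the bytes reserved for an array of nr_pages entries. */
static size_t vmemmap_bytes(size_t nr_pages)
{
	return nr_pages << STRUCT_PAGE_MAX_SHIFT;
}
#endif
```

With MODEL_VMEMMAP set to 0, neither the assertion nor the sizing helper is compiled, so a larger structure no longer breaks a build that never uses the estimate.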
Says Arnd:
Further experiments show that the build error already existed before,
but was only triggered with larger values of CONFIG_NR_CPUS and/or
CONFIG_NODES_SHIFT that might be used in actual configurations but
not in randconfig builds.
With longer CPU and node masks, I could recreate the problem with
kernels as old as linux-4.7 when arm64 NUMA support got added.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2519c1bbe3 upstream.
Commit 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on an
earlier if statement having initialized the "link" variable. gcc cannot see
that dependency and gives the warning:
"warning: 'link' may be used uninitialized in this function"
It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.
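The pattern can be sketched in user-space C as follows. This is only an illustration under assumptions: warn_on_once() is a simplified stand-in for the kernel's WARN_ON_ONCE() (which keeps its "once" state per call site), and enable_probe() with its parameters is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct event_link { int id; };

/* Simplified stand-in: report a true condition once, return its value. */
static bool warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "warning: unexpected NULL link\n");
	}
	return cond;
}

static int enable_probe(bool have_file, bool enable_fails, int *released)
{
	static struct event_link storage;
	struct event_link *link = NULL;	/* explicit initial value quiets the warning */

	if (have_file)
		link = &storage;	/* the only place link is assigned */

	if (!enable_fails)
		return 0;

	if (have_file) {
		/* Guard the use the compiler could not prove safe. */
		if (!warn_on_once(link == NULL))
			(*released)++;	/* stands in for undoing the link */
	}
	return -1;
}
```

The NULL initialization silences the diagnostic, and the guard turns a real logic error, should one ever be introduced, into a one-time warning rather than a NULL dereference.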
Cc: stable@vger.kernel.org
Fixes: 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e536e222f upstream.
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
    creator                     other
    vsnprintf:
      fill (not terminated)
      count the rest            trace_sched_waking(p):
      ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
      write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Fix this by formatting into a local buffer with vsnprintf() and then calling set_task_comm().
This way, there won't be a window where comm is not terminated.
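A user-space sketch of that approach follows. It is not the kernel code: set_comm() is a hypothetical stand-in for set_task_comm() with no locking, COMM_LEN mirrors TASK_COMM_LEN, and name_thread() is an invented helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16	/* mirrors TASK_COMM_LEN */

/*
 * Hypothetical stand-in for set_task_comm(): copies at most
 * COMM_LEN - 1 bytes, so the last byte of comm only ever receives '\0'.
 */
static void set_comm(char *comm, const char *name)
{
	size_t n = strlen(name);

	if (n > COMM_LEN - 1)
		n = COMM_LEN - 1;
	memcpy(comm, name, n);
	comm[n] = '\0';
}

static void name_thread(char *comm, const char *fmt, ...)
{
	char local[COMM_LEN];
	va_list ap;

	/* Format into a private buffer that no other thread can read. */
	va_start(ap, fmt);
	vsnprintf(local, sizeof(local), fmt, ap);
	va_end(ap);

	/* Publish the finished, terminated name. */
	set_comm(comm, local);
}
```

The truncation and the write of the terminator both happen in the private buffer, so the shared comm array is never filled to its last byte with name characters.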
Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15cc78644d upstream.
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.
Code inspection found that event_enable_trigger_func() has the same issue,
but it is harder to hit because it requires failures that are themselves
harder to trigger. It needs a slightly different fix because there is more
to clean up when the reg() function fails.
Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 7862ad1846 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1863c38725 upstream.
Running the following:
# cd /sys/kernel/debug/tracing
# echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
# echo snapshot > events/sched/sched_switch/trigger
Triggers the following bug:
------------[ cut here ]------------
kernel BUG at mm/slub.c:296!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:kfree+0x16c/0x180
Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
Call Trace:
event_trigger_callback+0xee/0x1d0
event_trigger_write+0xfc/0x1a0
__vfs_write+0x33/0x190
? handle_mm_fault+0x115/0x230
? _cond_resched+0x16/0x40
vfs_write+0xb0/0x190
ksys_write+0x52/0xc0
do_syscall_64+0x5a/0x160
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f363e16ab50
Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
---[ end trace d301afa879ddfa25 ]---
The cause is that the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. On return, the
function that called register_snapshot_trigger() sees that it
failed to register, frees the trigger_data again, and causes
a double free.
By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.
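The reference-count bracket described above can be modelled in user-space C. The names below (trigger_init(), trigger_free(), register_trigger(), add_trigger() and the frees counter) are hypothetical simplifications, not the tracing code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct trigger_data { int ref; };

static int frees;	/* counts releases so a double free would be visible */

static void trigger_init(struct trigger_data *d)
{
	d->ref++;
}

static void trigger_free(struct trigger_data *d)
{
	assert(d->ref > 0);
	if (--d->ref == 0) {
		free(d);
		frees++;
	}
}

/* Registration takes a reference; its failure path drops that reference. */
static bool register_trigger(struct trigger_data *d, bool alloc_fails)
{
	trigger_init(d);
	if (alloc_fails) {
		trigger_free(d);	/* models unregistering on failure */
		return false;
	}
	return true;
}

static struct trigger_data *add_trigger(bool alloc_fails)
{
	struct trigger_data *d = calloc(1, sizeof(*d));
	bool ok;

	if (!d)
		return NULL;

	trigger_init(d);			/* caller's own reference */
	ok = register_trigger(d, alloc_fails);
	trigger_free(d);			/* frees d only if registration failed */

	return ok ? d : NULL;
}
```

Because the caller holds its own reference across the registration call, the failure path inside registration can only lower the count, and the single real release happens in the caller's trigger_free().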
Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b512719f77 upstream.
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
Commit c96f5471ce ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operate on @p->delays. If
%current succeeded init while @p didn't, it leads to the following
crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
RIP: 0010:__delayacct_blkio_end+0xc/0x40
Call Trace:
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
Fix it by updating delayacct_blkio_end() to check @p->delays instead.
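In a reduced form, the corrected check looks like the sketch below. The structures and blkio_end() are hypothetical stand-ins for the task and delay-accounting types, kept only to show which pointer has to be tested.

```c
#include <assert.h>
#include <stddef.h>

struct task_delays { unsigned int blkio_count; };

/* delays stays NULL when the accounting init failed at fork time. */
struct task { struct task_delays *delays; };

static void blkio_end(struct task *p)
{
	/* Test the task being updated, not the caller's own task. */
	if (p->delays)
		p->delays->blkio_count++;
}
```

Testing the caller's pointer and then dereferencing the target's pointer is safe only when both tasks had the same init outcome, which is the assumption that broke in the crash above.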
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
Fixes: c96f5471ce ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <dsj@fb.com>
Debugged-by: Dave Jones <dsj@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Snyder <joshs@netflix.com>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>