IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 678cce4019d746da6c680c48ba9e6d417803e127 upstream.
The x86_64 implementation of Poly1305 produces the wrong result on some
inputs because poly1305_4block_avx2() incorrectly assumes that when
partially reducing the accumulator, the bits carried from limb 'd4' to
limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic
which processes only one block at a time. However, it's not true for
the AVX2 implementation, which processes 4 blocks at a time and
therefore can produce intermediate limbs about 4x larger.
Fix it by making the relevant calculations use 64-bit arithmetic rather
than 32-bit. Note that most of the carries already used 64-bit
arithmetic, but the d4 -> h0 carry was different for some reason.
To be safe I also made the same change to the corresponding SSE2 code,
though that only operates on 1 or 2 blocks at a time. I don't think
it's really needed for poly1305_block_sse2(), but it doesn't hurt
because it's already x86_64 code. It *might* be needed for
poly1305_2block_sse2(), but overflows aren't easy to reproduce there.
This bug was originally detected by my patches that improve testmgr to
fuzz algorithms against their generic implementation. But also add a
test vector which reproduces it directly (in the AVX2 case).
Fixes: b1ccc8f4b631 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64")
Fixes: c70f4abef07a ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64")
Cc: <stable@vger.kernel.org> # v4.3+
Cc: Martin Willi <martin@strongswan.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.
The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a58038b9e420276157785afa0a0bbb4b9bc2265 upstream.
This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57.
As Oren Twaig pointed out the old discussion:
https://patchwork.kernel.org/patch/8292231/
that the change coud potentially cause an extra IPI to be sent to
the destination vcpu because the AVIC hardware already set the IRR bit
before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
Since writting to ICR and ICR2 will also set the IRR. If something triggers
the destination vcpu to get scheduled before the emulation finishes, then
this could result in an additional IPI.
Also, the issue mentioned in the commit bb218fbcfaaa was misdiagnosed.
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Oren Twaig <oren@scalemp.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0228034d8e5915b98c33db35a98f5e909e848ae9 upstream.
This patch clears FC_RP_STARTED flag during logoff, because of this
re-login(flogi) didn't happen to the switch.
This reverts commit 1550ec458e0cf1a40a170ab1f4c46e3f52860f65.
Fixes: 1550ec458e0c ("scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO")
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Hannes Reinecke <hare@#suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be549d49115422f846b6d96ee8fd7173a5f7ceb0 upstream.
When SCSI blk-mq is enabled, there is a bug in handling errors in
scsi_queue_rq. Specifically, the bug is not setting result field of
scsi_request correctly when the dispatch of the command has been
failed. Since the upper layer code including the sg_io ioctl expects to
receive any error status from result field of scsi_request, the error is
silently ignored and this could cause data corruptions for some
applications.
Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a3f7221acddfe1caa9ff09b3a8158c39b2fdeac upstream.
There is a small race window in the card disconnection code that
allows the registration of another card with the very same card id.
This leads to a warning in procfs creation as caught by syzkaller.
The problem is that we delete snd_cards and snd_cards_lock entries at
the very beginning of the disconnection procedure. This makes the
slot available to be assigned for another card object while the
disconnection procedure is being processed. Then it becomes possible
to issue a procfs registration with the existing file name although we
check the conflict beforehand.
The fix is simply to move the snd_cards and snd_cards_lock clearances
at the end of the disconnection procedure. The references to these
entries are merely either from the global proc files like
/proc/asound/cards or from the card registration / disconnection, so
it should be fine to shift at the very end.
Reported-by: syzbot+48df349490c36f9f54ab@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b26e36b7ef36a8a3a147b1609b2505f8a4ecf511 upstream.
We have two Dell laptops which have the codec 10ec0236 and 10ec0256
respectively, the headset mic on them can't work, need to apply the
quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin
configurations in the pin quirk table.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af4b54a2e5ba18259ff9aac445bf546dd60d037e upstream.
`ni6501_alloc_usb_buffers()` is called from `ni6501_auto_attach()` to
allocate RX and TX buffers for USB transfers. It allocates
`devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the
allocation of `devpriv->usb_tx_buf` fails, it frees
`devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an
error. Later, `ni6501_detach()` will be called from the core comedi
module code to clean up. `ni6501_detach()` also frees both
`devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but
`devpriv->usb_rx_buf` may have already beed freed, leading to a
double-free error. Fix it bu removing the call to
`kfree(devpriv->usb_rx_buf)` from `ni6501_alloc_usb_buffers()`, relying
on `ni6501_detach()` to free the memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 660cf4ce9d0f3497cc7456eaa6d74c8b71d6282c upstream.
If `ni6501_auto_attach()` returns an error, the core comedi module code
will call `ni6501_detach()` to clean up. If `ni6501_auto_attach()`
successfully allocated the comedi device private data, `ni6501_detach()`
assumes that a `struct mutex mut` contained in the private data has been
initialized and uses it. Unfortunately, there are a couple of places
where `ni6501_auto_attach()` can return an error after allocating the
device private data but before initializing the mutex, so this
assumption is invalid. Fix it by initializing the mutex just after
allocating the private data in `ni6501_auto_attach()` before any other
errors can be retturned. Also move the call to `usb_set_intfdata()`
just to keep the code a bit neater (either position for the call is
fine).
I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=cf4f2b6c24aff0a3edf6>:
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
usb 1-1: string descriptor 0 read error: -71
comedi comedi0: Wrong number of endpoints
ni6501 1-1:0.233: driver 'ni6501' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 585 Comm: kworker/0:3 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe8/0x16e lib/dump_stack.c:113
assign_lock_key kernel/locking/lockdep.c:786 [inline]
register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
__lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
__mutex_lock_common kernel/locking/mutex.c:925 [inline]
__mutex_lock+0xfe/0x12b0 kernel/locking/mutex.c:1072
ni6501_detach+0x5b/0x110 drivers/staging/comedi/drivers/ni_usb6501.c:567
comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
hub_port_connect drivers/usb/core/hub.c:5089 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
port_event drivers/usb/core/hub.c:5350 [inline]
hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
kthread+0x313/0x420 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Reported-by: syzbot+cf4f2b6c24aff0a3edf6@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 663d294b4768bfd89e529e069bffa544a830b5bf upstream.
`vmk80xx_alloc_usb_buffers()` is called from `vmk80xx_auto_attach()` to
allocate RX and TX buffers for USB transfers. It allocates
`devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the
allocation of `devpriv->usb_tx_buf` fails, it frees
`devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an
error. Later, `vmk80xx_detach()` will be called from the core comedi
module code to clean up. `vmk80xx_detach()` also frees both
`devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but
`devpriv->usb_rx_buf` may have already been freed, leading to a
double-free error. Fix it by removing the call to
`kfree(devpriv->usb_rx_buf)` from `vmk80xx_alloc_usb_buffers()`, relying
on `vmk80xx_detach()` to free the memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08b7c2f9208f0e2a32159e4e7a4831b7adb10a3e upstream.
If `vmk80xx_auto_attach()` returns an error, the core comedi module code
will call `vmk80xx_detach()` to clean up. If `vmk80xx_auto_attach()`
successfully allocated the comedi device private data,
`vmk80xx_detach()` assumes that a `struct semaphore limit_sem` contained
in the private data has been initialized and uses it. Unfortunately,
there are a couple of places where `vmk80xx_auto_attach()` can return an
error after allocating the device private data but before initializing
the semaphore, so this assumption is invalid. Fix it by initializing
the semaphore just after allocating the private data in
`vmk80xx_auto_attach()` before any other errors can be returned.
I believe this was the cause of the following syzbot crash report
<https://syzkaller.appspot.com/bug?extid=54c2f58f15fe6876b6ad>:
usb 1-1: config 0 has no interface number 0
usb 1-1: New USB device found, idVendor=10cf, idProduct=8068, bcdDevice=e6.8d
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe8/0x16e lib/dump_stack.c:113
assign_lock_key kernel/locking/lockdep.c:786 [inline]
register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
__lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152
down+0x12/0x80 kernel/locking/semaphore.c:58
vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829
comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156
comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190
comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880
comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
really_probe+0x2da/0xb10 drivers/base/dd.c:509
driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
__device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
__device_attach+0x223/0x3a0 drivers/base/dd.c:844
bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
device_add+0xad2/0x16e0 drivers/base/core.c:2106
usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
hub_port_connect drivers/usb/core/hub.c:5089 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
port_event drivers/usb/core/hub.c:5350 [inline]
hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
kthread+0x313/0x420 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Reported-by: syzbot+54c2f58f15fe6876b6ad@syzkaller.appspotmail.com
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe2d3df639a7940a125a33d6460529b9689c5406 upstream.
On some laptops, kxcjk1013 is powered off when system enters S3. We need
restore the range regiter during resume. Otherwise, the sensor doesn't
work properly after S3.
Signed-off-by: he, bo <bo.he@intel.com>
Signed-off-by: Chen, Hu <hu1.chen@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f75591fc5a123929a29636834d1bcb8b5c9fee3 upstream.
This fixes a possible circular locking dependency detected warning seen
with:
- CONFIG_PROVE_LOCKING=y
- consumer/provider IIO devices (ex: "voltage-divider" consumer of "adc")
When using the IIO consumer interface, e.g. iio_channel_get(), the consumer
device will likely call iio_read_channel_raw() or similar that rely on
'info_exist_lock' mutex.
typically:
...
mutex_lock(&chan->indio_dev->info_exist_lock);
if (chan->indio_dev->info == NULL) {
ret = -ENODEV;
goto err_unlock;
}
ret = do_some_ops()
err_unlock:
mutex_unlock(&chan->indio_dev->info_exist_lock);
return ret;
...
Same mutex is also hold in iio_device_unregister().
The following deadlock warning happens when:
- the consumer device has called an API like iio_read_channel_raw()
at least once.
- the consumer driver is unregistered, removed (unbind from sysfs)
======================================================
WARNING: possible circular locking dependency detected
4.19.24 #577 Not tainted
------------------------------------------------------
sh/372 is trying to acquire lock:
(kn->count#30){++++}, at: kernfs_remove_by_name_ns+0x3c/0x84
but task is already holding lock:
(&dev->info_exist_lock){+.+.}, at: iio_device_unregister+0x18/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&dev->info_exist_lock){+.+.}:
__mutex_lock+0x70/0xa3c
mutex_lock_nested+0x1c/0x24
iio_read_channel_raw+0x1c/0x60
iio_read_channel_info+0xa8/0xb0
dev_attr_show+0x1c/0x48
sysfs_kf_seq_show+0x84/0xec
seq_read+0x154/0x528
__vfs_read+0x2c/0x15c
vfs_read+0x8c/0x110
ksys_read+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbedefb60
-> #0 (kn->count#30){++++}:
lock_acquire+0xd8/0x268
__kernfs_remove+0x288/0x374
kernfs_remove_by_name_ns+0x3c/0x84
remove_files+0x34/0x78
sysfs_remove_group+0x40/0x9c
sysfs_remove_groups+0x24/0x34
device_remove_attrs+0x38/0x64
device_del+0x11c/0x360
cdev_device_del+0x14/0x2c
iio_device_unregister+0x24/0x60
release_nodes+0x1bc/0x200
device_release_driver_internal+0x1a0/0x230
unbind_store+0x80/0x130
kernfs_fop_write+0x100/0x1e4
__vfs_write+0x2c/0x160
vfs_write+0xa4/0x17c
ksys_write+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbe906840
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->info_exist_lock);
lock(kn->count#30);
lock(&dev->info_exist_lock);
lock(kn->count#30);
*** DEADLOCK ***
...
cdev_device_del() can be called without holding the lock. It should be safe
as info_exist_lock prevents kernelspace consumers to use the exported
routines during/after provider removal. cdev_device_del() is for userspace.
Help to reproduce:
See example: Documentation/devicetree/bindings/iio/afe/voltage-divider.txt
sysv {
compatible = "voltage-divider";
io-channels = <&adc 0>;
output-ohms = <22>;
full-ohms = <222>;
};
First, go to iio:deviceX for the "voltage-divider", do one read:
$ cd /sys/bus/iio/devices/iio:deviceX
$ cat in_voltage0_raw
Then, unbind the consumer driver. It triggers above deadlock warning.
$ cd /sys/bus/platform/drivers/iio-rescale/
$ echo sysv > unbind
Note I don't actually expect stable will pick this up all the
way back into IIO being in staging, but if's probably valid that
far back.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Fixes: ac917a81117c ("staging:iio:core set the iio_dev.info pointer to null on unregister")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09c6bdee51183a575bf7546890c8c137a75a2b44 upstream.
Having a brief look at at91_adc_read_raw() it is obvious that in the case
of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR registers is
omitted. If 2 different channels are queried we can end up with a
situation where two interrupts are enabled, but only one interrupt is
cleared in the interrupt handler. Resulting in a interrupt loop and a
system hang.
Signed-off-by: Georg Ottinger <g.ottinger@abatec.at>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20ea39ef9f2f911bd01c69519e7d69cfec79fde3 upstream.
The trialmask is expected to have all bits set to 0 after allocation.
Currently kmalloc_array() is used which does not zero the memory and so
random bits are set. This results in random channels being enabled when
they shouldn't. Replace kmalloc_array() with kcalloc() which has the same
interface but zeros the memory.
Note the fix is actually required earlier than the below fixes tag, but
will require a manual backport due to move from kmalloc to kmalloc_array.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Fixes commit 057ac1acdfc4 ("iio: Use kmalloc_array() in iio_scan_mask_set()").
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06003531502d06bc89d32528f6ec96bf978790f9 upstream.
When issuing the write DAC register and write eeprom command, the two
powerdown bits (PD0 and PD1) are assumed by the chip to be present in
the bytes sent. Leaving them at 0 implies "powerdown disabled" which is
a different state that the current one. By adding the current state of
the powerdown in the i2c write, the chip will correctly power-on exactly
like as it is at the moment of store_eeprom call.
This is documented in MCP4725's datasheet, FIGURE 6-2: "Write Commands
for DAC Input Register and EEPROM" and MCP4726's datasheet, FIGURE 6-3:
"Write All Memory Command".
Signed-off-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com>
Acked-by: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fccfb9ce70ed4ea7a145f77b86de62e38178517f upstream.
The desired channel has to be selected in order to correctly fill the
buffer with the corresponding data.
The `ad_sd_write_reg()` already does this, but for the
`ad_sd_read_reg_raw()` this was omitted.
Fixes: af3008485ea03 ("iio:adc: Add common code for ADI Sigma Delta devices")
Signed-off-by: Dragos Bogdan <dragos.bogdan@analog.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d02d7082e5823598090530c3988a35f69689943 upstream.
Calculation did not use IIO_DEGREE_TO_RAD and implemented a variant to
avoid precision loss as we aim a nano value. The offset added to avoid
rounding error, though, doesn't give us a close result to the expected
value. E.g.
For 1000dps, the result should be:
(1000 * pi ) / 180 >> 15 ~= 0.000532632218
But with current calculation we get
$ cat scale
0.000547890
Fix the calculation by just doing the maths involved for a nano value
val * pi * 10e12 / (180 * 2^15)
so we get a closer result.
$ cat scale
0.000532632
Fixes: c14dca07a31d ("iio: cros_ec_sensors: add ChromeOS EC Contiguous Sensors driver")
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40a7198a4a01037003c7ca714f0d048a61e729ac upstream.
Standard unit for temperature is millidegrees Celcius, whereas this driver
was reporting in degrees. Fix the scale factor in the driver.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 409a51e0a4a5f908763191fae2c29008632eb712 upstream.
According to the datasheet, the last bit of CHIP_ID register controls
I2C bus, and the first one is unused. Handle this correctly.
Note that there are chips out there that have a value such that
the id check currently fails.
Signed-off-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce0f216221856a17fc4934b39284678a5fef2e9 upstream.
This patch fixes the differential channels addresses for the ad7193.
Signed-off-by: Mircea Caprioru <mircea.caprioru@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a8a29be499cbb67df79370aaf5109085509feb8 upstream.
This patch fixes an obvious typo, which will cause erroneously returning the Peak
Voltage instead of the Peak Current.
Signed-off-by: Leonard Pollak <leonardp@tr-host.de>
Cc: <Stable@vger.kernel.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99c221796a810055974b54c02e8f53297e48d146 upstream.
I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
shows that we're sometimes able to deliver a few but never all.
When we're trying to inject an NMI we may fail to do so immediately for
various reasons, however, we still need to inject it so enable_nmi_window()
arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
for pending NMIs before entering the guest and unless there's a different
event to process, the NMI will never get delivered.
Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
pending NMIs are checked and possibly injected.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f4dc2e77cdfaf7e644ef29693fa229db29ee1de upstream.
Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
state area, i.e. don't save/restore EFER across SMM transitions. KVM
somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
guest doesn't support long mode. But during RSM, KVM unconditionally
clears EFER so that it can get back to pure 32-bit mode in order to
start loading CRs with their actual non-SMM values.
Clear EFER only when it will be written when loading the non-SMM state
so as to preserve bits that can theoretically be set on 32-bit vCPUs,
e.g. KVM always emulates EFER_SCE.
And because CR4.PAE is cleared only to play nice with EFER, wrap that
code in the long mode check as well. Note, this may result in a
compiler warning about cr4 being consumed uninitialized. Re-read CR4
even though it's technically unnecessary, as doing so allows for more
readable code and RSM emulation is not a performance critical path.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b98749cac4a695f084a5ff076f4510b23e353ecd upstream.
In the oplock break handler, writing pending changes from pages puts
the FileInfo handle. If the refcount reaches zero it closes the handle
and waits for any oplock break handler to return, thus causing a deadlock.
To prevent this situation:
* We add a wait flag to cifsFileInfo_put() to decide whether we should
wait for running/pending oplock break handlers
* We keep an additionnal reference of the SMB FileInfo handle so that
for the rest of the handler putting the handle won't close it.
- The ref is bumped everytime we queue the handler via the
cifs_queue_oplock_break() helper.
- The ref is decremented at the end of the handler
This bug was triggered by xfstest 464.
Also important fix to address the various reports of
oops in smb2_push_mandatory_locks
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1f227d16083b2e280b7dde4ca78883d75593f2fd ]
The thunderx driver forbids to load an eBPF program if the MTU is too high,
but this can be circumvented by loading the eBPF, then raising the MTU.
Fix this by limiting the MTU if an eBPF program is already loaded.
Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ee15c101f29e0093ffb5448773ccbc786eb313b ]
The thunderx driver splits frames bigger than 1530 bytes to multiple
pages, making impossible to run an eBPF program on it.
This leads to a maximum MTU of 1508 if QinQ is in use.
The thunderx driver forbids to load an eBPF program if the MTU is higher
than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
protocols which need some more headroom.
Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed0de45a1008991fdaa27a0152befcb74d126a8b ]
Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.
Refer to the commit 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6fd) for a similar issue.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 813dbeb656d6c90266f251d8bd2b02d445afa63f ]
We used to accept zero size iova range which will lead a infinite loop
in translate_desc(). Fixing this by failing the request in this case.
Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 43c2adb9df7ddd6560fd3546d925b42cef92daa0 ]
After adding a team interface to bridge, the team interface will enter
promisc mode. Then if we add a new slave to team0, the slave will keep
promisc off. Fix it by setting slave to promisc on if team master is
already in promisc mode, also do the same for allmulti.
v2: add promisc and allmulti checking when delete ports
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 50ce163a72d817a99e8974222dcf2886d5deb1ae ]
For some reason, tcp_grow_window() correctly tests if enough room
is present before attempting to increase tp->rcv_ssthresh,
but does not prevent it to grow past tcp_space()
This is causing hard to debug issues, like failing
the (__tcp_select_window(sk) >= tp->rcv_wnd) test
in __tcp_ack_snd_check(), causing ACK delays and possibly
slow flows.
Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio,
we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000"
after about 60 round trips, when the active side no longer sends
immediate acks.
This bug predates git history.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 988dc4a9a3b66be75b30405a5494faf0dc7cffb6 ]
gue tunnels run iptunnel_pull_offloads on received skbs. This can
determine a possible use-after-free accessing guehdr pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5b493ce192bd7a4e7bd073b5685aad121eeef82 ]
br_multicast_start_querier() walks over the port list but it can be
called from a timer with only multicast_lock held which doesn't protect
the port list, so use RCU to walk over it.
Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3b2e2904deb314cc77a2192f506f2fd44e3d10d0 ]
When the commit below was introduced it changed two visible things:
- the skb was no longer passed through the protocol handlers with the
original device
- the skb was passed up the stack with skb->dev = bridge
The first change broke af_packet sockets on bridge ports. For example we
use them for hostapd which listens for ETH_P_PAE packets on the ports.
We discussed two possible fixes:
- create a clone and pass it through NF_HOOK(), act on the original skb
based on the result
- somehow signal to the caller from the okfn() that it was called,
meaning the skb is ok to be passed, which this patch is trying to
implement via returning 1 from the bridge link-local okfn()
Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
drop/error would return < 0 thus the okfn() is called only when the
return was 1, so we signal to the caller that it was called by preserving
the return value from nf_hook().
Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 899537b73557aafbdd11050b501cf54b4f5c45af ]
arg is controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/atm/lec.c:715 lec_mcast_attach() warn: potential spectre issue 'dev_lec' [r] (local cap)
Fix this by sanitizing arg before using it to index dev_lec.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 92480b3977fd3884649d404cbbaf839b70035699 ]
When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.
This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.
Reproducer:
ip link add bondL type bond
ip link add bondU type bond
ip link set bondL master bondU
ip link del bondL
No "Fixes:" tag, this code is older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 27da0d2ef998e222a876c0cec72aa7829a626266 ]
A bugfix just broke compilation of appletalk when CONFIG_SYSCTL
is disabled:
In file included from net/appletalk/ddp.c:65:
net/appletalk/ddp.c: In function 'atalk_init':
include/linux/atalk.h:164:34: error: expected expression before 'do'
#define atalk_register_sysctl() do { } while(0)
^~
net/appletalk/ddp.c:1934:7: note: in expansion of macro 'atalk_register_sysctl'
rc = atalk_register_sysctl();
This is easier to avoid by using conventional inline functions
as stubs rather than macros. The header already has inline
functions for other purposes, so I'm changing over all the
macros for consistency.
Fixes: 6377f787aeb9 ("appletalk: Fix use-after-free in atalk_proc_exit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This was fixed in upstream by commit 7d9e6c5afab6 ("net: stmmac: Integrate
XGMAC into main driver flow") that is a new feature commit.
We found a race condition in the DMA init sequence that hits if the
PHY already has link up during stmmac_hw_setup. Since the ring length
was programmed after enabling the RX path, we might receive a packet
before the correct ring length is programmed. When that happened we
could not get reliable interrupts for DMA RX and the MTL complained
about RX FIFO overrun.
Signed-off-by: Lars Persson <larper@axis.com>
Cc: stable@vger.kernel.org # 4.14.x
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The changes to fix the CVE 2019-7308 make the bpf verifier stricter
with respect to operations that were allowed earlier in unprivileged
mode. Fixup the test cases so that the error messages now correctly
reflect pointer arithmetic going out of range for tests.
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3612af783cf52c74a031a2f11b82247b2599d3cd upstream.
Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:
[...]
uint64_t a = bpf_ktime_get_ns();
uint64_t b = bpf_ktime_get_ns();
uint64_t delta = b - a;
if ((int64_t)delta > 0) {
[...]
Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).
Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Reported-by: Arthur Fabre <afabre@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0803278b0b4d8eeb2b461fb698785df65a725d9e upstream.
Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug.
Call trace:
dump_stack+0xbf/0x12e
print_address_description+0x6a/0x280
kasan_report+0x237/0x360
sanitize_ptr_alu+0x85a/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fault injection trace:
kfree+0xea/0x290
free_func_state+0x4a/0x60
free_verifier_state+0x61/0xe0
push_stack+0x216/0x2f0 <- inject failslab
sanitize_ptr_alu+0x2b1/0x8d0
adjust_ptr_min_max_vals+0x8f2/0x1ca0
adjust_reg_min_max_vals+0x8ed/0x22e0
do_check+0x1ca6/0x5d00
bpf_check+0x9ca/0x2570
bpf_prog_load+0xc91/0x1030
__se_sys_bpf+0x61e/0x1f00
do_syscall_64+0xc8/0x550
entry_SYSCALL_64_after_hwframe+0x49/0xbe
When kzalloc() fails in push_stack(), free_verifier_state() will free
current verifier state. As push_stack() returns, dst_reg was restored
if ptr_is_dst_reg is false. However, as member of the cur_state,
dst_reg is also freed, and error occurs when dereferencing dst_reg.
Simply fix it by testing ret of push_stack() before restoring dst_reg.
Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3bd7413e0ca40b60cf60d4003246d067cafdeda upstream.
While 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer
arithmetic") took care of rejecting alu op on pointer when e.g. pointer
came from two different map values with different map properties such as
value size, Jann reported that a case was not covered yet when a given
alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
different branches where we would incorrectly try to sanitize based
on the pointer's limit. Catch this corner case and reject the program
instead.
Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 979d63d50c0c0f7bc537bf821e056cc9fe5abd38 upstream.
Jann reported that the original commit back in b2157399cc98
("bpf: prevent out-of-bounds speculation") was not sufficient
to stop CPU from speculating out of bounds memory access:
While b2157399cc98 only focussed on masking array map access
for unprivileged users for tail calls and data access such
that the user provided index gets sanitized from BPF program
and syscall side, there is still a more generic form affected
from BPF programs that applies to most maps that hold user
data in relation to dynamic map access when dealing with
unknown scalars or "slow" known scalars as access offset, for
example:
- Load a map value pointer into R6
- Load an index into R7
- Do a slow computation (e.g. with a memory dependency) that
loads a limit into R8 (e.g. load the limit from a map for
high latency, then mask it to make the verifier happy)
- Exit if R7 >= R8 (mispredicted branch)
- Load R0 = R6[R7]
- Load R0 = R6[R0]
For unknown scalars there are two options in the BPF verifier
where we could derive knowledge from in order to guarantee
safe access to the memory: i) While </>/<=/>= variants won't
allow to derive any lower or upper bounds from the unknown
scalar where it would be safe to add it to the map value
pointer, it is possible through ==/!= test however. ii) another
option is to transform the unknown scalar into a known scalar,
for example, through ALU ops combination such as R &= <imm>
followed by R |= <imm> or any similar combination where the
original information from the unknown scalar would be destroyed
entirely leaving R with a constant. The initial slow load still
precedes the latter ALU ops on that register, so the CPU
executes speculatively from that point. Once we have the known
scalar, any compare operation would work then. A third option
only involving registers with known scalars could be crafted
as described in [0] where a CPU port (e.g. Slow Int unit)
would be filled with many dependent computations such that
the subsequent condition depending on its outcome has to wait
for evaluation on its execution port and thereby executing
speculatively if the speculated code can be scheduled on a
different execution port, or any other form of mistraining
as described in [1], for example. Given this is not limited
to only unknown scalars, not only map but also stack access
is affected since both is accessible for unprivileged users
and could potentially be used for out of bounds access under
speculation.
In order to prevent any of these cases, the verifier is now
sanitizing pointer arithmetic on the offset such that any
out of bounds speculation would be masked in a way where the
pointer arithmetic result in the destination register will
stay unchanged, meaning offset masked into zero similar as
in array_index_nospec() case. With regards to implementation,
there are three options that were considered: i) new insn
for sanitation, ii) push/pop insn and sanitation as inlined
BPF, iii) reuse of ax register and sanitation as inlined BPF.
Option i) has the downside that we end up using from reserved
bits in the opcode space, but also that we would require
each JIT to emit masking as native arch opcodes meaning
mitigation would have slow adoption till everyone implements
it eventually which is counter-productive. Option ii) and iii)
have both in common that a temporary register is needed in
order to implement the sanitation as inlined BPF since we
are not allowed to modify the source register. While a push /
pop insn in ii) would be useful to have in any case, it
requires once again that every JIT needs to implement it
first. While possible, amount of changes needed would also
be unsuitable for a -stable patch. Therefore, the path which
has fewer changes, less BPF instructions for the mitigation
and does not require anything to be changed in the JITs is
option iii) which this work is pursuing. The ax register is
already mapped to a register in all JITs (modulo arm32 where
it's mapped to stack as various other BPF registers there)
and used in constant blinding for JITs-only so far. It can
be reused for verifier rewrites under certain constraints.
The interpreter's tmp "register" has therefore been remapped
into extending the register set with hidden ax register and
reusing that for a number of instructions that needed the
prior temporary variable internally (e.g. div, mod). This
allows for zero increase in stack space usage in the interpreter,
and enables (restricted) generic use in rewrites otherwise as
long as such a patchlet does not make use of these instructions.
The sanitation mask is dynamic and relative to the offset the
map value or stack pointer currently holds.
There are various cases that need to be taken under consideration
for the masking, e.g. such operation could look as follows:
ptr += val or val += ptr or ptr -= val. Thus, the value to be
sanitized could reside either in source or in destination
register, and the limit is different depending on whether
the ALU op is addition or subtraction and depending on the
current known and bounded offset. The limit is derived as
follows: limit := max_value_size - (smin_value + off). For
subtraction: limit := umax_value + off. This holds because
we do not allow any pointer arithmetic that would
temporarily go out of bounds or would have an unknown
value with mixed signed bounds where it is unclear at
verification time whether the actual runtime value would
be either negative or positive. For example, we have a
derived map pointer value with constant offset and bounded
one, so limit based on smin_value works because the verifier
requires that statically analyzed arithmetic on the pointer
must be in bounds, and thus it checks if resulting
smin_value + off and umax_value + off is still within map
value bounds at time of arithmetic in addition to time of
access. Similarly, for the case of stack access we derive
the limit as follows: MAX_BPF_STACK + off for subtraction
and -off for the case of addition where off := ptr_reg->off +
ptr_reg->var_off.value. Subtraction is a special case for
the masking which can be in form of ptr += -val, ptr -= -val,
or ptr -= val. In the first two cases where we know that
the value is negative, we need to temporarily negate the
value in order to do the sanitation on a positive value
where we later swap the ALU op, and restore original source
register if the value was in source.
The sanitation of pointer arithmetic alone is still not fully
sufficient as is, since a scenario like the following could
happen ...
PTR += 0x1000 (e.g. K-based imm)
PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
PTR += 0x1000
PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
[...]
... which under speculation could end up as ...
PTR += 0x1000
PTR -= 0 [ truncated by mitigation ]
PTR += 0x1000
PTR -= 0 [ truncated by mitigation ]
[...]
... and therefore still access out of bounds. To prevent such
case, the verifier is also analyzing safety for potential out
of bounds access under speculative execution. Meaning, it is
also simulating pointer access under truncation. We therefore
"branch off" and push the current verification state after the
ALU operation with known 0 to the verification stack for later
analysis. Given the current path analysis succeeded it is
likely that the one under speculation can be pruned. In any
case, it is also subject to existing complexity limits and
therefore anything beyond this point will be rejected. In
terms of pruning, it needs to be ensured that the verification
state from speculative execution simulation must never prune
a non-speculative execution path, therefore, we mark verifier
state accordingly at the time of push_stack(). If verifier
detects out of bounds access under speculative execution from
one of the possible paths that includes a truncation, it will
reject such program.
Given we mask every reg-based pointer arithmetic for
unprivileged programs, we've been looking into how it could
affect real-world programs in terms of size increase. As the
majority of programs are targeted for privileged-only use
case, we've unconditionally enabled masking (with its alu
restrictions on top of it) for privileged programs for the
sake of testing in order to check i) whether they get rejected
in its current form, and ii) by how much the number of
instructions and size will increase. We've tested this by
using Katran, Cilium and test_l4lb from the kernel selftests.
For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
we've used test_l4lb.o as well as test_l4lb_noinline.o. We
found that none of the programs got rejected by the verifier
with this change, and that impact is rather minimal to none.
balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
7,797 bytes JITed before and after the change. Most complex
program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
and 18,538 bytes JITed before and after and none of the other
tail call programs in bpf_lxc.o had any changes either. For
the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
after the change. Other programs from that object file had
similar small increase. Both test_l4lb.o had no change and
remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
JITed and for test_l4lb_noinline.o constant at 5,080 bytes
(634 insns) xlated and 3,313 bytes JITed. This can be explained
in that LLVM typically optimizes stack based pointer arithmetic
by using K-based operations and that use of dynamic map access
is not overly frequent. However, in future we may decide to
optimize the algorithm further under known guarantees from
branch and value speculation. Latter seems also unclear in
terms of prediction heuristics that today's CPUs apply as well
as whether there could be collisions in e.g. the predictor's
Value History/Pattern Table for triggering out of bounds access,
thus masking is performed unconditionally at this point but could
be subject to relaxation later on. We were generally also
brainstorming various other approaches for mitigation, but the
blocker was always lack of available registers at runtime and/or
overhead for runtime tracking of limits belonging to a specific
pointer. Thus, we found this to be minimally intrusive under
given constraints.
With that in place, a simple example with sanitized access on
unprivileged load at post-verification time looks as follows:
# bpftool prog dump xlated id 282
[...]
28: (79) r1 = *(u64 *)(r7 +0)
29: (79) r2 = *(u64 *)(r7 +8)
30: (57) r1 &= 15
31: (79) r3 = *(u64 *)(r0 +4608)
32: (57) r3 &= 1
33: (47) r3 |= 1
34: (2d) if r2 > r3 goto pc+19
35: (b4) (u32) r11 = (u32) 20479 |
36: (1f) r11 -= r2 | Dynamic sanitation for pointer
37: (4f) r11 |= r2 | arithmetic with registers
38: (87) r11 = -r11 | containing bounded or known
39: (c7) r11 s>>= 63 | scalars in order to prevent
40: (5f) r11 &= r2 | out of bounds speculation.
41: (0f) r4 += r11 |
42: (71) r4 = *(u8 *)(r4 +0)
43: (6f) r4 <<= r1
[...]
For the case where the scalar sits in the destination register
as opposed to the source register, the following code is emitted
for the above example:
[...]
16: (b4) (u32) r11 = (u32) 20479
17: (1f) r11 -= r2
18: (4f) r11 |= r2
19: (87) r11 = -r11
20: (c7) r11 s>>= 63
21: (5f) r2 &= r11
22: (0f) r2 += r0
23: (61) r0 = *(u32 *)(r2 +0)
[...]
JIT blinding example with non-conflicting use of r10:
[...]
d5: je 0x0000000000000106 _
d7: mov 0x0(%rax),%edi |
da: mov $0xf153246,%r10d | Index load from map value and
e0: xor $0xf153259,%r10 | (const blinded) mask with 0x1f.
e7: and %r10,%rdi |_
ea: mov $0x2f,%r10d |
f0: sub %rdi,%r10 | Sanitized addition. Both use r10
f3: or %rdi,%r10 | but do not interfere with each
f6: neg %r10 | other. (Neither do these instructions
f9: sar $0x3f,%r10 | interfere with the use of ax as temp
fd: and %r10,%rdi | in interpreter.)
100: add %rax,%rdi |_
103: mov 0x0(%rdi),%eax
[...]
Tested that it fixes Jann's reproducer, and also checked that test_verifier
and test_progs suite with interpreter, JIT and JIT with hardening enabled
on x86-64 and arm64 runs successfully.
[0] Speculose: Analyzing the Security Implications of Speculative
Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
https://arxiv.org/pdf/1801.04084.pdf
[1] A Systematic Evaluation of Transient Execution Attacks and
Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
Dmitry Evtyushkin, Daniel Gruss,
https://arxiv.org/pdf/1811.05441.pdf
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
[some checkpatch cleanups and backported to 4.14 by sblbir]
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7137c4eab85c1cf3d46acdde90ce1163b28c873 upstream.
In check_map_access() we probe actual bounds through __check_map_access()
with offset of reg->smin_value + off for lower bound and offset of
reg->umax_value + off for the upper bound. However, even though the
reg->smin_value could have a negative value, the final result of the
sum with off could be positive when pointer arithmetic with known and
unknown scalars is combined. In this case we reject the program with
an error such as "R<x> min value is negative, either use unsigned index
or do a if (index >=0) check." even though the access itself would be
fine. Therefore extend the check to probe whether the actual resulting
reg->smin_value + off is less than zero.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[backported to 4.14 sblbir]
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d7eceede769f90b66cfa06ad5b357140d5141ed upstream.
For unknown scalars of mixed signed bounds, meaning their smin_value is
negative and their smax_value is positive, we need to reject arithmetic
with pointer to map value. For unprivileged the goal is to mask every
map pointer arithmetic and this cannot reliably be done when it is
unknown at verification time whether the scalar value is negative or
positive. Given this is a corner case, the likelihood of breaking should
be very small.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[backported to 4.14 sblbir]
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4298d25830a866cc0f427d4bccb858e76715859 upstream.
Restrict stack pointer arithmetic for unprivileged users in that
arithmetic itself must not go out of bounds as opposed to the actual
access later on. Therefore after each adjust_ptr_min_max_vals() with
a stack pointer as a destination we simulate a check_stack_access()
of 1 byte on the destination and once that fails the program is
rejected for unprivileged program loads. This is analog to map
value pointer arithmetic and needed for masking later on.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[backported to 4.14 sblbir]
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>