IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 745e10146c31b1c6ed3326286704ae251b17f663 ]
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),
do {
set_store_user_clean(cachep);
drain_cpu_caches(cachep);
...
} while (!is_store_user_clean(cachep));
For example,
do_drain
slabs_destroy
slab_destroy
kmem_cache_free
__cache_free
___cache_free
kmemleak_free_recursive
delete_object_full
__delete_object
put_object
free_object_rcu
kmem_cache_free
cache_free_debugcheck --> dirty kmemleak_object
One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to set store_user_clean
after drain_cpu_caches() which leaves a small window between
drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
be dirty again lead to slightly wrong information has been stored but
could also speed up things significantly which sounds like a good
compromise. For example,
# cat /proc/slab_allocators
0m42.778s # 1st approach
0m0.737s # 2nd approach
[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
Fixes: d31676dfde25 ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0fd50504a54f5548eb666dc16ddf8394e44e4b7 ]
If not find zero bit in find_next_zero_bit(), it will return the size
parameter passed in, so the start bit should be compared with bitmap_maxno
rather than cma->count. Although getting maxchunk is working fine due to
zero value of order_per_bit currently, the operation will be stuck if
order_per_bit is set as non-zero.
Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b59e01a3aa665f751d1410b99fae9336bd424e1 ]
Currently one bit in cma bitmap represents number of pages rather than
one page, cma->count means cma size in pages. So to find available pages
via find_next_zero_bit()/find_next_bit() we should use cma size not in
pages but in bits although current free pages number is correct due to
zero value of order_per_bit. Once order_per_bit is changed the bitmap
status will be incorrect.
The size input in cma_debug_show_areas() is not correct. It will
affect the available pages at some position to debug the failure issue.
This is an example with order_per_bit = 1
Before this change:
[ 4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages
After this change:
[ 4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages
Obviously the bitmap status before is incorrect.
Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1df3a339074e31db95c4790ea9236874b13ccd87 ]
f022d8cb7ec7 ("mm: cma: Don't crash on allocation if CMA area can't be
activated") fixes the crash issue when activation fails via setting
cma->count as 0, same logic exists if bitmap allocation fails.
Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 299c83dce9ea3a79bb4b5511d2cb996b6b8e5111 ]
342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror option") and
later patches rewrote the calculation of node spanned pages.
e506b99696a2 ("mem-hotplug: fix node spanned pages when we have a movable
node"), but the current code still has problems,
When we have a node with only zone_movable and the node id is not zero,
the size of node spanned pages is double added.
That's because we have an empty normal zone, and zone_start_pfn or
zone_end_pfn is not between arch_zone_lowest_possible_pfn and
arch_zone_highest_possible_pfn, so we need to use clamp to constrain the
range just like the commit <96e907d13602> (bootmem: Reimplement
__absent_pages_in_range() using for_each_mem_pfn_range()).
e.g.
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000023fffffff]
Movable zone start for each node
Node 0: 0x0000000100000000
Node 1: 0x0000000140000000
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x00000000bffdffff]
node 0: [mem 0x0000000100000000-0x000000013fffffff]
node 1: [mem 0x0000000140000000-0x000000023fffffff]
node 0 DMA spanned:0xfff present:0xf9e absent:0x61
node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020
node 0 Normal spanned:0 present:0 absent:0
node 0 Movable spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA spanned:0 present:0 absent:0
node 1 DMA32 spanned:0 present:0 absent:0
node 1 Normal spanned:0x100000 present:0x100000 absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 2097152
node_spanned_pages:2097152
Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K
cma-reserved)
It shows that the current memory of node 1 is double added.
After this patch, the problem is fixed.
node 0 DMA spanned:0xfff present:0xf9e absent:0x61
node 0 DMA32 spanned:0xff000 present:0xbefe0 absent:0x40020
node 0 Normal spanned:0 present:0 absent:0
node 0 Movable spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA spanned:0 present:0 absent:0
node 1 DMA32 spanned:0 present:0 absent:0
node 1 Normal spanned:0 present:0 absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 1048576
node_spanned_pages:1048576
memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K
cma-reserved)
Link: http://lkml.kernel.org/r/1554178276-10372-1-git-send-email-fanglinxu@huawei.com
Signed-off-by: Linxu Fang <fanglinxu@huawei.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0919e1b69ab459e06df45d3ba6658d281962db80 ]
When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation. When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored. PagePrivate
being set at free huge page time mostly happens on error paths.
When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem. If so,
pages are also reserved within the filesystem. The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem. However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd. Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G -4.0M 4.1G - /opt/hugepool
To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.
I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.
Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 734fb89968900b5c5f8edd5038bd4cdeab8c61d2 ]
To avoid random config build issue, select mmu notifier when HMM is
selected. In any cases when HMM get selected it will be by users that
will also wants the mmu notifier.
Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be167862ae7dd85c56d385209a4890678e1b0488 ]
Patch series "compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING", v3.
This patch (of 11):
When function tracing for IPIs is enabled, we get a warning for an
overflow of the ipi_types array with the IPI_CPU_BACKTRACE type as
triggered by raise_nmi():
arch/arm/kernel/smp.c: In function 'raise_nmi':
arch/arm/kernel/smp.c:489:2: error: array subscript is above array bounds [-Werror=array-bounds]
trace_ipi_raise(target, ipi_types[ipinr]);
This is a correct warning as we actually overflow the array here.
This patch raise_nmi() to call __smp_cross_call() instead of
smp_cross_call(), to avoid calling into ftrace. For clarification, I'm
also adding a two new code comments describing how this one is special.
The warning appears to have shown up after commit e7273ff49acf ("ARM:
8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI"), which changed the
number assignment from '15' to '8', but as far as I can tell has existed
since the IPI tracepoints were first introduced. If we decide to
backport this patch to stable kernels, we probably need to backport
e7273ff49acf as well.
[yamada.masahiro@socionext.com: rebase on v5.1-rc1]
Link: http://lkml.kernel.org/r/20190423034959.13525-2-yamada.masahiro@socionext.com
Fixes: e7273ff49acf ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI")
Fixes: 365ec7b17327 ("ARM: add IPI tracepoints") # v3.17
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e260ad01f0aa9e96b5386d5cd7184afd949dc457 ]
Currently when userspace gives us a values that overflow e.g. file-max
and other callers of __do_proc_doulongvec_minmax() we simply ignore the
new value and leave the current value untouched.
This can be problematic as it gives the illusion that the limit has
indeed be bumped when in fact it failed. This commit makes sure to
return EINVAL when an overflow is detected. Please note that this is a
userspace facing change.
Link: http://lkml.kernel.org/r/20190210203943.8227-4-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd8309de0d60838eef6fb575b0c4c7e95841cf73 ]
fsync() needs to make sure the data & meta-data of file are persistent
after the return of fsync(), even when a power-failure occurs later. In
the case of fat-fs, the FAT belongs to the meta-data of file, so we need
to issue a flush after the writeback of FAT instead before.
Also bail out early when any stage of fsync fails.
Link: http://lkml.kernel.org/r/20190409030158.136316-1-houtao1@huawei.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f9fc54d313fab2834f44f516459cdc8ac91d797f upstream.
The return type for get_regs_len in struct ethtool_ops is int,
the hns3 driver may return error when failing to get the regs
len by sending cmd to firmware.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 upstream.
Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is disabled.
Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream.
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.
This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.
The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.
However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
is doing so irregardless of whether a file is seekable or not.
See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/https://lwn.net/Articles/180387https://lwn.net/Articles/180396
for historic context.
The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:
kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...
Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.
Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:
See
https://github.com/libfuse/osspdhttps://lwn.net/Articles/308445https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:
https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.
2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.
3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.
4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
steam_open if that bit is present in filesystem open reply.
It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).
This will allow to patch OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
in their open handler and this way avoid the deadlock on all kernel
versions. This should work because fs/fuse/ ignores unknown open
flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel
that is not aware of FOPEN_STREAM will be < v3.14 where just
FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
write deadlock.
This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.
Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.
The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4bf63482c309287ca84d91770ffa7dcc18e37eb upstream.
Most, if not all, Quectel devices use dynamic interface numbers, and
users are able to change the USB configuration at will. Matching on for
example interface number is therefore not possible.
Instead, the QMI device can be identified by looking at the interface
class, subclass and protocol (all 0xff), as well as the number of
endpoints. The reason we need to look at the number of endpoints, is
that the diagnostic port interface has the same class, subclass and
protocol as QMI. However, the diagnostic port only has two endpoints,
while QMI has three.
Until now, we have identified the QMI device by combining a match on
class, subclass and protocol, with a call to the function
quectel_diag_detect(). In quectel_diag_detect(), we check if the number
of endpoints matches for known Quectel vendor/product ids.
Adding new vendor/product ids to quectel_diag_detect() is not a good
long-term solution. This commit replaces the function with a quirk, and
applies the quirk to affected Quectel devices that I have been able to
test the change with (EP06, EM12 and EC25). If the quirk is set and the
number of endpoints equal two, we return from qmi_wwan_probe() with
-ENODEV.
[In order for this patch to apply cleanly to 4.14, two minor changes had
to be made. First, the original work-around (quectel_diag_detect()) for
the dynamic interface numbers was never backported to 4.14, so there is
no need to remove this code. Second, support for the EM12 was also not
backported to 4.14. Since supporting EM12 is a trivial change (just
another VID/PID match), and the match for EM12 is changed by this patch,
I chose to not submit adding EM12-support as a separate patch.]
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4cdd17ba1dff20ffc99fdbd2e6f0201fc7fe67df upstream.
We need to compute the uart state only on the first open. This is
usually what is done in the ->install hook. serial_core used to do this
in ->open on every open. So move it to ->install.
As a side effect, it ensures the state is set properly in the window
after tty_init_dev is called, but before uart_open. This fixes a bunch
of races between tty_open and flush_to_ldisc we were dealing with
recently.
One of such bugs was attempted to fix in commit fedb5760648a (serial:
fix race between flush_to_ldisc and tty_open), but it only took care of
a couple of functions (uart_start and uart_unthrottle). I was able to
reproduce the crash on a SLE system, but in uart_write_room which is
also called from flush_to_ldisc via process_echoes. I was *unable* to
reproduce the bug locally. It is due to having this patch in my queue
since 2012!
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L 4.12.14-396-default #1 SLE15-SP1 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
task: ffff8800427d8040 task.stack: ffff8800427f0000
RIP: 0010:uart_write_room+0xc4/0x590
RSP: 0018:ffff8800427f7088 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90
RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400
R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12
R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178
FS: 0000000000000000(0000) GS:ffff880043300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0
Call Trace:
tty_write_room+0x6d/0xc0
__process_echoes+0x55/0x870
n_tty_receive_buf_common+0x105e/0x26d0
tty_ldisc_receive_buf+0xb7/0x1c0
tty_port_default_receive_buf+0x107/0x180
flush_to_ldisc+0x35d/0x5c0
...
0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is
tried to be dereferenced (0x178 >> 3 is 0x2f in rdx) at
uart_write_room+0xc4. 0x178 is exactly (struct uart_state *)NULL->refcount
used in uart_port_lock from uart_write_room.
So revert the upstream commit here as my local patch should fix the
whole family.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Wang Li <wangli39@baidu.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 396dd8143bdd94bd1c358a228a631c8c895a1126 upstream.
On many (all?) the Gemini Lake systems we work with, there is frequent
momentary graphical corruption at the top of the screen, and it seems
that disabling framebuffer compression can avoid this.
The ticket was reported 6 months ago and has already affected a
multitude of users, without any real progress being made. So, lets
disable framebuffer compression on GeminiLake until a solution is found.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108085
Fixes: fd7d6c5c8f3e ("drm/i915: enable FBC on gen9+ too")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.11+
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190423092810.28359-1-jian-hong@endlessm.com
(cherry picked from commit 1d25724b41fad7eeb2c3058a5c8190d6ece73e08)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d90c06d57027203f73021bb7ddb30b800d65c636 upstream.
This was supposed to be a mask of all known rings, but it is being used
by execbuffer to filter out invalid rings, and so is instead mapping high
unused values onto valid rings. Instead of a mask of all known rings,
we need it to be the mask of all possible rings.
Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring")
Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e26ccb119bde03584be53406bbd22e711b0d6e6 upstream.
Instead of the closest reference divider prefer the lowest,
this fixes flickering issues on HP Compaq nx9420.
Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108514
Suggested-by: Paul Dufresne <dufresnep@gmail.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d6fea5744d6798353f37ac42a8a653a2607ca69 upstream.
In case we need to use them for GPU reset prior initializing the
asic. Fixes a crash if the driver attempts to reset the GPU at driver
load time.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b30a43ac7132cdda833ac4b13dd1ebd35ace14b7 upstream.
There was a nouveau DDX that relied on legacy context ioctls to work,
but we fixed it years ago, give distros that have a modern DDX the
option to break the uAPI and close the mess of holes that legacy
context support is.
Full context of the story:
commit 0e975980d435d58df2d430d688b8c18778b42218
Author: Peter Antoine <peter.antoine@intel.com>
Date: Tue Jun 23 08:18:49 2015 +0100
drm: Turn off Legacy Context Functions
The context functions are not used by the i915 driver and should not
be used by modeset drivers. These driver functions contain several bugs
and security holes. This change makes these functions optional can be
turned on by a setting, they are turned off by default for modeset
driver with the exception of the nouvea driver that may require them with
an old version of libdrm.
The previous attempt was
commit 7c510133d93dd6f15ca040733ba7b2891ed61fd1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Aug 8 15:41:21 2013 +0200
drm: mark context support as a legacy subsystem
but this had to be reverted
commit c21eb21cb50d58e7cbdcb8b9e7ff68b85cfa5095
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Sep 20 08:32:59 2013 +1000
Revert "drm: mark context support as a legacy subsystem"
v2: remove returns from void function, and formatting (Daniel Vetter)
v3:
- s/Nova/nouveau/ in the commit message, and add references to the
previous attempts
- drop the part touching the drm hw lock, that should be a separate
patch.
Signed-off-by: Peter Antoine <peter.antoine@intel.com> (v2)
Cc: Peter Antoine <peter.antoine@intel.com> (v2)
Reviewed-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: move DRM_VM dependency into legacy config.
v3: fix missing dep (kbuild robot)
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd17cc5a20ae9aaa3ed775f360b75ff93cd66a1d upstream.
The limit here is supposed to be how much of the page is left, but it's
just using PAGE_SIZE as the limit.
The other thing to remember is that snprintf() returns the number of
bytes which would have been copied if we had had enough room. So that
means that if we run out of space then this code would end up passing a
negative value as the limit and the kernel would print an error message.
I have change the code to use scnprintf() which returns the number of
bytes that were successfully printed (not counting the NUL terminator).
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 110080cea0d0e4dfdb0b536e7f8a5633ead6a781 upstream.
There are a couple potential integer overflows here.
round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);
The first thing is that the "m->size + (...)" addition could overflow,
and the second is that round_up() overflows to zero if the result is
within PAGE_SIZE of the type max.
In this code, the "m->size" variable is an u64 but we're saving the
result in "map_size" which is an unsigned long and genwqe_user_vmap()
takes an unsigned long as well. So I have used ULONG_MAX as the upper
bound. From a practical perspective unsigned long is fine/better than
trying to change all the types to u64.
Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 9547d81ac3bc0d2b9729a28e7dd610007144a837 which is
commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream.
Petr writes:
Karl has reported to me today, that he's experiencing weird
reboot hang on his devices with 4.9.180 kernel and that he has
bisected it down to my backported patch.
I would like to kindly ask you for removal of this patch. This
patch should be reverted from all stable kernels up to 5.1,
because perf counters were not broken on those kernels, and this
patch won't work on the ath79 legacy IRQ code anyway, it needs
new irqchip driver which was enabled on ath79 with commit
51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code").
Reported-by: Petr Štetiar <ynezz@true.cz>
Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Cc: John Crispin <john@phrozen.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4f2d1af7163becb181419af9dece9206001e0a6 upstream.
The pistachio platform uses the U-Boot bootloader & generally boots a
kernel in the uImage format. As such it's useful to build one when
building the kernel, but to do so currently requires the user to
manually specify a uImage target on the make command line.
Make uImage.gz the pistachio platform's default build target, so that
the default is to build a kernel image that we can actually boot on a
board such as the MIPS Creator Ci40.
Marked for stable backport as far as v4.1 where pistachio support was
introduced. This is primarily useful for CI systems such as kernelci.org
which will benefit from us building a suitable image which can then be
booted as part of automated testing, extending our test coverage to the
affected stable branches.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
URL: https://groups.io/g/kernelci/message/388
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 074a1e1167afd82c26f6d03a9a8b997d564bb241 upstream.
The virt_addr_valid() function is meant to return true iff
virt_to_page() will return a valid struct page reference. This is true
iff the address provided is found within the unmapped address range
between PAGE_OFFSET & MAP_BASE, but we don't currently check for that
condition. Instead we simply mask the address to obtain what will be a
physical address if the virtual address is indeed in the desired range,
shift it to form a PFN & then call pfn_valid(). This can incorrectly
return true if called with a virtual address which, after masking,
happens to form a physical address corresponding to a valid PFN.
For example we may vmalloc an address in the kernel mapped region
starting a MAP_BASE & obtain the virtual address:
addr = 0xc000000000002000
When masked by virt_to_phys(), which uses __pa() & in turn CPHYSADDR(),
we obtain the following (bogus) physical address:
addr = 0x2000
In a common system with PHYS_OFFSET=0 this will correspond to a valid
struct page which should really be accessed by virtual address
PAGE_OFFSET+0x2000, causing virt_addr_valid() to incorrectly return 1
indicating that the original address corresponds to a struct page.
This is equivalent to the ARM64 change made in commit ca219452c6b8
("arm64: Correctly bounds check virt_addr_valid").
This fixes fallout when hardened usercopy is enabled caused by the
related commit 517e1fbeb65f ("mm/usercopy: Drop extra
is_vmalloc_or_module() check") which removed a check for the vmalloc
range that was present from the introduction of the hardened usercopy
feature.
Signed-off-by: Paul Burton <paul.burton@mips.com>
References: ca219452c6b8 ("arm64: Correctly bounds check virt_addr_valid")
References: 517e1fbeb65f ("mm/usercopy: Drop extra is_vmalloc_or_module() check")
Reported-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: YunQiang Su <ysu@wavecomp.com>
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929366
Cc: stable@vger.kernel.org # v4.12+
Cc: linux-mips@vger.kernel.org
Cc: Yunqiang Su <ysu@wavecomp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49b809586730a77b57ce620b2f9689de765d790b upstream.
This driver does not support reading more than 255 bytes at once because
the register for storing the number of bytes to read is only 8 bits. Add
a max_read_len quirk to enforce this.
This was found when using this driver with the SFP driver, which was
previously reading all 256 bytes in the SFP EEPROM in one transaction.
This caused a bunch of hard-to-debug errors in the xiic driver since the
driver/logic was treating the number of bytes to read as zero.
Rejecting transactions that aren't supported at least allows the problem
to be diagnosed more easily.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec527c318036a65a083ef68d8ba95789d2212246 upstream.
As explained in
0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.
That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine
reboots.
Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.
Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.
Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8880fa32c557600f5f624084152668ed3c2ea51e upstream.
The ram pstore backend has always had the crash dumper frontend enabled
unconditionally. However, it was possible to effectively disable it
by setting a record_size=0. All the machinery would run (storing dumps
to the temporary crash buffer), but 0 bytes would ultimately get stored
due to there being no przs allocated for dumps. Commit 89d328f637b9
("pstore/ram: Correctly calculate usable PRZ bytes"), however, assumed
that there would always be at least one allocated dprz for calculating
the size of the temporary crash buffer. This was, of course, not the
case when record_size=0, and would lead to a NULL deref trying to find
the dprz buffer size:
BUG: unable to handle kernel NULL pointer dereference at (null)
...
IP: ramoops_probe+0x285/0x37e (fs/pstore/ram.c:808)
cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
Instead, we need to only enable the frontends based on the success of the
prz initialization and only take the needed actions when those zones are
available. (This also fixes a possible error in detecting if the ftrace
frontend should be enabled.)
Reported-and-tested-by: Yaro Slav <yaro330@gmail.com>
Fixes: 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea84b580b95521644429cc6748b6c2bf27c8b0f3 upstream.
Instead of running with interrupts disabled, use a semaphore. This should
make it easier for backends that may need to sleep (e.g. EFI) when
performing a write:
|BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
|in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
|Preemption disabled at:
|[<ffffffff99d60512>] pstore_dump+0x72/0x330
|CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45
|Call Trace:
| dump_stack+0x4f/0x6a
| ___might_sleep.cold.91+0xd3/0xe4
| __might_sleep+0x50/0x90
| wait_for_completion+0x32/0x130
| virt_efi_query_variable_info+0x14e/0x160
| efi_query_variable_store+0x51/0x1a0
| efivar_entry_set_safe+0xa3/0x1b0
| efi_pstore_write+0x109/0x140
| pstore_dump+0x11c/0x330
| kmsg_dump+0xa4/0xd0
| oops_exit+0x22/0x30
...
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b77fa617a2ff4d6beccad3d3d4b3a1f2d10368aa upstream.
Since the console writer does not use the preallocated crash dump buffer
any more, there is no reason to perform locking around it.
Fixes: 70ad35db3321 ("pstore: Convert console write to use ->write_buf")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream.
Do the proper cleanup in case the size check fails.
Tested with xfstests:generic/228
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate")
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63923d2c3800919774f5c651d503d1dd2adaddd5 upstream.
We only support I/O to kernel space. Using %sr1 to load the coherence
index may be racy unless interrupts are disabled. This patch changes the
code used to load the coherence index to use implicit space register
selection. This saves one instruction and eliminates the race.
Tested on rp3440, c8000 and c3750.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream.
Herbert Xu pointed out that commit bb73c52bad36 ("rcu: Don't disable
preemption for Tiny and Tree RCU readers") was incorrect in making the
preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
a no-op, but still is a compiler barrier.
And RCU locking still _needs_ that compiler barrier.
It is simply fundamentally not true that RCU locking would be a complete
no-op: we still need to guarantee (for example) that things that can
trap and cause preemption cannot migrate into the RCU locked region.
The way we do that is by making it a barrier.
See for example commit 386afc91144b ("spinlocks and preemption points
need to be at least compiler barriers") from back in 2013 that had
similar issues with spinlocks that become no-ops on UP: they must still
constrain the compiler from moving other operations into the critical
region.
Now, it is true that a lot of RCU operations already use READ_ONCE() and
WRITE_ONCE() (which in practice likely would never be re-ordered wrt
anything remotely interesting), but it is also true that that is not
globally the case, and that it's not even necessarily always possible
(ie bitfields etc).
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
Cc: stable@kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4970b42d5c362bf873982db7d93245c5281e58f4 ]
This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849.
Nathan reported the new behaviour breaks Android, as Android just add
new rules and delete old ones.
If we return 0 without adding dup rules, Android will remove the new
added rules and causing system to soft-reboot.
Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Yaro Slav <yaro330@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 691306ebd18f945e44b4552a4bfcca3475e5d957 as the
patch that this "fixes" is about to be reverted...
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b7999b07726c16974ba9ca3bb9fe98ecbec5f81c ]
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by Commit 93531c674315 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 28e74a7cfd6403f0d1c0f8b10b45d6fae37b227e ]
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time. This behaviour is not specified
in the SFP MSAs, which specifies:
"The serial interface uses the 2-wire serial CMOS E2PROM protocol
defined for the ATMEL AT24C01A/02/04 family of components."
and
"As long as the SFP+ receives an acknowledge, it shall serially clock
out sequential data words. The sequence is terminated when the host
responds with a NACK and a STOP instead of an acknowledge."
We must avoid breaking a read across a 16-bit quantity in the diagnostic
page, thankfully all 16-bit quantities in that page are naturally
aligned.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ]
As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ]
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.
The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.
Only eat these 2 bytes if hdrincl == 0.
Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 720f1de4021f09898b8c8443f3b3e995991b6e3a ]
Currently, the process issuing a "start" command on the pktgen procfs
interface, acquires the pktgen thread lock and never release it, until
all pktgen threads are completed. The above can blocks indefinitely any
other pktgen command and any (even unrelated) netdevice removal - as
the pktgen netdev notifier acquires the same lock.
The issue is demonstrated by the following script, reported by Matteo:
ip -b - <<'EOF'
link add type dummy
link add type veth
link set dummy0 up
EOF
modprobe pktgen
echo reset >/proc/net/pktgen/pgctrl
{
echo rem_device_all
echo add_device dummy0
} >/proc/net/pktgen/kpktgend_0
echo count 0 >/proc/net/pktgen/dummy0
echo start >/proc/net/pktgen/pgctrl &
sleep 1
rmmod veth
Fix the above releasing the thread lock around the sleep call.
Additionally we must prevent racing with forcefull rmmod - as the
thread lock no more protects from them. Instead, acquire a self-reference
before waiting for any thread. As a side effect, running
rmmod pktgen
while some thread is running now fails with "module in use" error,
before this patch such command hanged indefinitely.
Note: the issue predates the commit reported in the fixes tag, but
this fix can't be applied before the mentioned commit.
v1 -> v2:
- no need to check for thread existence after flipping the lock,
pktgen threads are freed only at net exit time
-
Fixes: 6146e6a43b35 ("[PKTGEN]: Removes thread_{un,}lock() macros.")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]
When the following tests last for several hours, the problem will occur.
Server:
rds-stress -r 1.1.1.16 -D 1M
Client:
rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
The following will occur.
"
Starting up....
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu
%
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
"
>From vmcore, we can find that clean_list is NULL.
>From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.
In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
/* more than one entry in llist nodes */
if (clean_nodes->next)
llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
"
When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
instead of clean_nodes is added in clean_list.
So clean_nodes is discarded. It can not be used again.
The workqueue is executed periodically. So more and more clean_nodes are
discarded. Finally the clean_list is NULL.
Then this problem will occur.
Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ]
Querying EEPROM high pages data for SFP module is currently
not supported by our driver but is still tried, resulting in
invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module to
limit the reading for page 0 only and prevent invalid FW queries.
Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to
INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not
updated to use the altered key. The result is that every packet Tx does
a lookup on the gateway address which does not find an entry, a new one
is created only to find the existing one in the table right before the
insert since arp_constructor was updated to reset the primary key. This
is seen in the allocs and destroys counters:
ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY")
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a8dd9f67cd0da7dc284f48b032ce00db1a68791 ]
syzbot found the following leak in sctp_process_init
BUG: memory leak
unreferenced object 0xffff88810ef68400 (size 1024):
comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
hex dump (first 32 bytes):
1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25 ..(........h...%
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55
[inline]
[<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
[<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
[<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
[<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
[<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
[<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20
net/sctp/sm_make_chunk.c:2437
[<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682
[inline]
[<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384
[inline]
[<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194
[inline]
[<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
[<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200
net/sctp/associola.c:1074
[<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
[<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
[<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
[<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
[<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
[<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
[<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
[<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
[<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
[<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
[<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
[<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
[<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
[<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3
The problem was that the peer.cookie value points to an skb allocated
area on the first pass through this function, at which point it is
overwritten with a heap allocated value, but in certain cases, where a
COOKIE_ECHO chunk is included in the packet, a second pass through
sctp_process_init is made, where the cookie value is re-allocated,
leaking the first allocation.
Fix is to always allocate the cookie value, and free it when we are done
using it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ee4e76937d69128a6a66861ba393ebdc2ffc8a2 ]
ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
and pass it to the kernel driver via ops->get_regs() for filling.
There is no restriction about what the kernel drivers can or cannot do
with the open ethtool_regs structure. They usually set regs->version
and ignore regs->len or set it to the same size as ops->get_regs_len().
But if userspace allocates a smaller buffer for the registers dump,
we would cause a userspace buffer overflow in the final copy_to_user()
call, which uses the regs.len value potentially reset by the driver.
To fix this, make this case obvious and store regs.len before calling
ops->get_regs(), to only copy as much data as requested by userspace,
up to the value returned by ops->get_regs_len().
While at it, remove the redundant check for non-null regbuf.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>