IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 504c76979bccec66e4c2e41f6a006e49e284466f ]
There is no check that allocation in axp20x_funcs_groups_from_mask
is successful.
The patch adds corresponding check and return values.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 66110abc4c931f879d70e83e1281f891699364bf ]
PG_checked flag will be set on data page during GC, later, we can
recognize such page by the flag and migrate page to cold segment.
But previously, we don't clear this flag when invalidating data page,
after page redirtying, we will write it into wrong log.
Let's clear PG_checked flag in set_page_dirty() to avoid this.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 92aef4675d5b1b55404e1532379e343bed0e5cf2 ]
Currently when virtio_find_single_vq fails, we go through del_vqs which
throws a warning (Trying to free already-free IRQ). Skip del_vqs if vq
allocation failed.
Link: http://lkml.kernel.org/r/20180524101021.49880-1-jean-philippe.brucker@arm.com
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f476d7c540cb57556d3cc7e78704e6cd5100f5f ]
It may be possible to run p9_fd_cancel() with a deleted req->req_list
and incur in a double del. To fix hold the client->lock while changing
the status, so the other threads will be synchronized.
Link: http://lkml.kernel.org/r/20180723184253.6682-1-tomasbortoli@gmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com
To: Eric Van Hensbergen <ericvh@gmail.com>
To: Ron Minnich <rminnich@sandia.gov>
To: Latchesar Ionkov <lucho@ionkov.net>
Cc: Yiwen Jiang <jiangyiwen@huwei.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0702bc4d2fe793018ad9aa0eb14bff7f526c4095 ]
When compiling bmips with SMP disabled, the build fails with:
drivers/irqchip/irq-bcm7038-l1.o: In function `bcm7038_l1_cpu_offline':
drivers/irqchip/irq-bcm7038-l1.c:242: undefined reference to `irq_set_affinity_locked'
make[5]: *** [vmlinux] Error 1
Fix this by adding and setting bcm7038_l1_cpu_offline only when actually
compiling for SMP. It wouldn't have been used anyway, as it requires
CPU_HOTPLUG, which in turn requires SMP.
Fixes: 34c535793bcb ("irqchip/bcm7038-l1: Implement irq_cpu_offline() callback")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4096165d55218a6f58b6c2ebc5d2428aa0aa70e4 ]
If there are any errors in stm32_exti_host_init() then it leads to a
NULL dereference in the callers. The function should clean up after
itself.
Fixes: f9fc1745501e ("irqchip/stm32: Add host and driver data structures")
Reviewed-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4938c79bd0f5f3650c8c2cd4cdc972f0a6962ce4 ]
If you use a 64-bit compiler to build a 32-bit kernel then you'll get an
error when building the vDSO due to a library mismatch. The happens
because the relevant "-march" argument isn't supplied to the GCC run
that generates one of the vDSO intermediate files.
I'm not actually sure what the right thing to do here is as I'm not
particularly familiar with the kernel build system. I poked the
documentation and it appears that KCFLAGS is the correct thing to do
(it's suggested that should be used when building modules), but we set
KBUILD_CFLAGS in arch/riscv/Makefile.
This does at least fix the build error.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c7079853c859c910b9d047a37891b4aafb8f8dd7 ]
Thread A Background GC
- f2fs_zero_range
- truncate_pagecache_range
- gc_data_segment
- get_read_data_page
- move_data_page
- set_page_dirty
- set_cold_data
- f2fs_do_zero_range
- dn->data_blkaddr = NEW_ADDR;
- f2fs_set_data_blkaddr
Actually, we don't need to set dirty & checked flag on the page, since
all valid data in the page should be zeroed by zero_range().
Use i_gc_rwsem[WRITE] to avoid such race condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f4417d693b43fa240ac8bde4487f67745ca23d8 ]
The argument to nsinfo__copy() was assumed to be valid, but some code paths
exist that will lead to NULL being passed.
In particular, running 'perf script -D' on a perf.data file containing an
PERF_RECORD_MMAP event associating the '[vdso]' dso with pid 0 earlier in
the event stream will lead to a segfault.
Since all calling code is already checking for a non-null return value,
just return NULL for this case as well.
Signed-off-by: Benno Evers <bevers@mesosphere.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180810133614.9925-1-bevers@mesosphere.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 512ddf7d7db056edfed3159ea7cb4e4a5eefddd4 ]
If coccicheck fails, it should return an error code distinct from zero
to signal about an internal problem. Current code instead of exiting with
the tool's error code returns the error code of 'echo "coccicheck failed"'
which is almost always equals to zero, thus failing the original intention
of alerting about a problem. This patch fixes the code.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov <efremov@linux.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dddc0557e3a02499ce336b1e2e67f5afaecccc80 ]
[Why]
A null pointer deference can occur if crtc is null in
amdgpu_dm_crtc_handle_crc_irq. This can happen if get_crtc_by_otg_inst
returns NULL during dm_crtc_high_irq, leading to a hang in some IGT
test cases.
[How]
Check that CRTC is non-null before accessing its fields.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f0e89359775ee21fe1ea732e34edb52aef5addf ]
In commit 27d868b5e6cf ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.
Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below). This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.
One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.
Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 60081dcc4fce385ade26d3145b2479789df0b7e5 ]
For marvell phy m88e1510, bit SUPPORTED_FIBRE of phydev->supported
is default on. Both phy_resume() and phy_suspend() will check the
SUPPORTED_FIBRE bit and write register of fibre page.
Currently in hns3 driver, the SUPPORTED_FIBRE bit will be cleared
after phy_connect_direct() finished. Because phy_resume() is called
in phy_connect_direct(), and phy_suspend() is called when disconnect
phy device, so the operation for fibre page register is not symmetrical.
It will cause phy link issue when reload hns3 driver.
This patch fixes it by disable the SUPPORTED_FIBRE before connecting
phy.
Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b089cfd95d32638335c551651a8e00fd2c4edb0b ]
Don't warn for a flush issued to a read-only device. It's not strictly
a writable command, as it doesn't change any on-media data by itself.
Reported-by: Stefan Agner <stefan@agner.ch>
Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c39d5278e62956238a681e4cfc69fae5507fc57 ]
According to the functional specification of hardware, the first
descriptor of response from command 'lookup vlan talbe' is not valid.
Currently, the first descriptor is parsed as normal value, which will
cause an expected error.
This patch fixes this problem by skipping the first descriptor.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 344353366591acf659a0d0dea498611da78d67e2 ]
The auxtrace init variable 'err' was not being initialized, leading perf
to abort early in an SPE record command when there was no explicit
error, rather only based whatever memory contents were on the stack.
Initialize it explicitly on getting an SPE successfully, the same way
cs-etm does.
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dongjiu Geng <gengdongjiu@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: ffd3d18c20b8 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support")
Link: http://lkml.kernel.org/r/20180810174512.52900813e57cbccf18ce99a2@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f016b19a9275089a2ab06c2144567c2ad8d5d6ad ]
The value coming from acpi_hw_read() should not be used if it
returns an error code, so check the status returned by it before
using that value in two places in acpi_hw_register_read().
Reported-by: Mark Gross <mark.gross@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]
hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno,
fix that by making a proper conversion.
Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 25677478474a91fa1b46f19a4a591a9848bca6fb ]
We cannot do it last, otherwithse it will be skipped for dynamic
volumes.
Reported-by: Lachmann, Juergen <juergen.lachmann@harman.com>
Fixes: 34653fd8c46e ("ubi: fastmap: Check each mapping only once")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 037b0b86ecf5646f8eae777d8b52ff8b401692ec ]
Lets not turn the TCP ULP lookup into an arbitrary module loader as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:
[root@bar]# cat foo.c
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tcp.h>
#include <linux/in.h>
int main(void)
{
int sock = socket(PF_INET, SOCK_STREAM, 0);
setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
return 0;
}
[root@bar]# gcc foo.c -O2 -Wall
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
sctp 1077248 4
libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
[root@bar]#
Fix it by adding module alias to TCP ULP modules, so probing module
via request_module() will be limited to tcp-ulp-[name]. The existing
modules like kTLS will load fine given tcp-ulp-tls alias, but others
will fail to load:
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
[root@bar]#
Sockmap is not affected from this since it's either built-in or not.
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e673b23b541b8e7f773b2d378d6eb99831741cd ]
Shaochun Chen points out we leak dumper filter state allocations
stored in dump_control->data in case there is an error before netlink sets
cb_running (after which ->done will be called at some point).
In order to fix this, add .start functions and move allocations there.
Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6
("netfilter: nf_tables: move dumper state allocation into ->start").
Reported-by: shaochun chen <cscnull@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 880b29ac107d15644bf4da228376ba3cd6af6d71 ]
Add entry to WMI keymap for lid flip event on Asus UX360.
On Asus Zenbook ux360 flipping lid from/to tablet mode triggers
keyscan code 0xfa which cannot be handled and results in kernel
log message "Unknown key fa pressed".
Signed-off-by: Aleh Filipovich<aleh@appnexus.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a148ce15375fc664ad64762c751c0c2aecb2cafe ]
eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc()
in xt_alloc_table_info()") has unintentionally fortified
xt_alloc_table_info allocation when __GFP_RETRY has been dropped from
the vmalloc fallback. Later on there was a syzbot report that this
can lead to OOM killer invocations when tables are too large and
0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
has been merged to restore the original behavior. Georgi Nikolov however
noticed that he is not able to install his iptables anymore so this can
be seen as a regression.
The primary argument for 0537250fdc6c was that this allocation path
shouldn't really trigger the OOM killer and kill innocent tasks. On the
other hand the interface requires root and as such should allow what the
admin asks for. Root inside a namespaces makes this more complicated
because those might be not trusted in general. If they are not then such
namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY
and replace it by __GFP_ACCOUNT to enfore memcg constrains on it.
Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
Reported-by: Georgi Nikolov <gnikolov@icdsoft.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a53b42c11815d2357e31a9403ae3950517525894 ]
We came across infinite loop in ipvs when using ipvs in docker
env.
When ipvs receives new packets and cannot find an ipvs connection,
it will create a new connection, then if the dest is unavailable
(i.e. IP_VS_DEST_F_AVAILABLE), the packet will be dropped sliently.
But if the dropped packet is the first packet of this connection,
the connection control timer never has a chance to start and the
ipvs connection cannot be released. This will lead to memory leak, or
infinite loop in cleanup_net() when net namespace is released like
this:
ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
__ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
ops_exit_list at ffffffff81567a49
cleanup_net at ffffffff81568b40
process_one_work at ffffffff810a851b
worker_thread at ffffffff810a9356
kthread at ffffffff810b0b6f
ret_from_fork at ffffffff81697a18
race condition:
CPU1 CPU2
ip_vs_in()
ip_vs_conn_new()
ip_vs_del_dest()
__ip_vs_unlink_dest()
~IP_VS_DEST_F_AVAILABLE
cp->dest && !IP_VS_DEST_F_AVAILABLE
__ip_vs_conn_put
...
cleanup_net ---> infinite looping
Fix this by checking whether the timer already started.
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d2e7075b87181ed0c675e4936e20bdadba02e1f ]
The vmcoreinfo of a crashed system is potentially fragmented. Thus the
crash kernel has an intermediate step where the vmcoreinfo is copied into a
temporary, continuous buffer in the crash kernel memory. This temporary
buffer is never freed. Free it now to prevent the memleak.
While at it replace all occurrences of "VMCOREINFO" by its corresponding
macro to prevent potential renaming issues.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit da786717e0894886301ed2536843c13f9e8fd53e ]
Roman reports that DHCPv6 client no longer sees replies from server
due to
ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP
rule. We need to set the F_IFACE flag for linklocal addresses, they
are scoped per-device.
Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
Reported-by: Roman Mamedov <rm@romanrm.net>
Tested-by: Roman Mamedov <rm@romanrm.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 817b89beb9d8876450fcde9155e17425c329569d ]
It is common XDP practice to unload/deattach the XDP bpf program,
when the XDP sample program is Ctrl-C interrupted (SIGINT) or
killed (SIGTERM).
The samples/bpf programs xdp_redirect_cpu and xdp_rxq_info,
forgot to trap signal SIGTERM (which is the default signal used
by the kill command).
This was discovered by Red Hat QA, which automated scripts depend
on killing the XDP sample program after a timeout period.
Fixes: fad3917e361b ("samples/bpf: add cpumap sample program xdp_redirect_cpu")
Fixes: 0fca931a6f21 ("samples/bpf: program demonstrating access to xdp_rxq_info")
Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d40b0116c94bd8fc2b63aae35ce8e66bb53bba42 ]
While working on sockmap I noticed that we do not always kfree the
struct smap_psock_map_entry list elements which track psocks attached
to maps. In the case of sock_hash_ctx_update_elem(), these map entries
are allocated outside of __sock_map_ctx_update_elem() with their
linkage to the socket hash table filled. In the case of sock array,
the map entries are allocated inside of __sock_map_ctx_update_elem()
and added with their linkage to the psock->maps. Both additions are
under psock->maps_lock each.
Now, we drop these elements from their psock->maps list in a few
occasions: i) in sock array via smap_list_map_remove() when an entry
is either deleted from the map from user space, or updated via
user space or BPF program where we drop the old socket at that map
slot, or the sock array is freed via sock_map_free() and drops all
its elements; ii) for sock hash via smap_list_hash_remove() in exactly
the same occasions as just described for sock array; iii) in the
bpf_tcp_close() where we remove the elements from the list via
psock_map_pop() and iterate over them dropping themselves from either
sock array or sock hash; and last but not least iv) once again in
smap_gc_work() which is a callback for deferring the work once the
psock refcount hit zero and thus the socket is being destroyed.
Problem is that the only case where we kfree() the list entry is
in case iv), which at that point should have an empty list in
normal cases. So in cases from i) to iii) we unlink the elements
without freeing where they go out of reach from us. Hence fix is
to properly kfree() them as well to stop the leakage. Given these
are all handled under psock->maps_lock there is no need for deferred
RCU freeing.
I later also ran with kmemleak detector and it confirmed the finding
as well where in the state before the fix the object goes unreferenced
while after the patch no kmemleak report related to BPF showed up.
[...]
unreferenced object 0xffff880378eadae0 (size 64):
comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 PMu]............
backtrace:
[<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
[<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
[<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
[<000000002ef89e83>] 0xffffffffffffffff
unreferenced object 0xffff880378ead240 (size 64):
comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00 .Du]............
backtrace:
[<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
[<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
[<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
[<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
[<0000000000763660>] do_syscall_64+0x9a/0x300
[<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<000000002ef89e83>] 0xffffffffffffffff
[...]
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 585f5a6252ee43ec8feeee07387e3fcc7e8bb292 ]
The current code in sock_map_ctx_update_elem() allows for BPF_EXIST
and BPF_NOEXIST map update flags. While on array-like maps this approach
is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others
enforce map update flags to be BPF_ANY such that xchg() can be used
directly, the current implementation in sock map does not guarantee
that such operation with BPF_EXIST / BPF_NOEXIST is atomic.
The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
socket from the slot which is then tested for NULL / non-NULL. However
later after __sock_map_ctx_update_elem(), the actual update is done
through osock = xchg(&stab->sock_map[i], sock). Problem is that in
the meantime a different CPU could have updated / deleted a socket
on that specific slot and thus flag contraints won't hold anymore.
I've been thinking whether best would be to just break UAPI and do
an enforcement of BPF_ANY to check if someone actually complains,
however trouble is that already in BPF kselftest we use BPF_NOEXIST
for the map update, and therefore it might have been copied into
applications already. The fix to keep the current behavior intact
would be to add a map lock similar to the sock hash bucket lock only
for covering the whole map.
Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 90545cdc3f2b2ea700e24335610cd181e73756da ]
I found that in BPF sockmap programs once we either delete a socket
from the map or we updated a map slot and the old socket was purged
from the map that these socket can never get reattached into a map
even though their related psock has been dropped entirely at that
point.
Reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops
intact, so that on the next tcp_set_ulp_id() the kernel returns an
-EEXIST thinking there is still some active ULP attached.
BPF sockmap is the only one that has this issue as the other user,
kTLS, only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock() whereas
sockmap semantics allow dropping the socket from the map with all
related psock state being cleaned up.
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 166ab6f0a0702fdd4d865ad5090bf3094ed83428 ]
The smap_start_sock() and smap_stop_sock() are each protected under
the sock->sk_callback_lock from their call-sites except in the case
of sock_map_delete_elem() where we drop the old socket from the map
slot. This is racy because the same sock could be part of multiple
sock maps, so we run smap_stop_sock() in parallel, and given at that
point psock->strp_enabled might be true on both CPUs, we might for
example wrongly restore the sk->sk_data_ready / sk->sk_write_space.
Therefore, hold the sock->sk_callback_lock as well on delete. Looks
like 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add
multi-map support") had this right, but later on e9db4ef6bf4c ("bpf:
sockhash fix omitted bucket lock in sock_close") removed it again
from delete leaving this smap_stop_sock() instance unprotected.
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6cd00a01f0c1ae6a852b09c59b8dd55cc6c35d1d ]
Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes
are initialized at __d_alloc(), we can't copy the whole size
unconditionally.
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
636f6e66696766732e746d70000000000010000000000000020000000188ffff
i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
^
RIP: 0010:take_dentry_name_snapshot+0x28/0x50
RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
take_dentry_name_snapshot+0x28/0x50
vfs_rename+0x128/0x870
SyS_rename+0x3b2/0x3d0
entry_SYSCALL_64_fastpath+0x1a/0xa4
0xffffffffffffffff
Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d39f8fb4b7776dcb09ec3bf7a321547083078ee3 ]
The deferred memory initialization relies on section definitions, e.g
PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on
most architectures.
Initially DEFERRED_STRUCT_PAGE_INIT depended on explicit
ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since
the commit 2e3ca40f03bb13709df4 ("mm: relax deferred struct page
requirements") this requirement was relaxed and now it is possible to
enable DEFERRED_STRUCT_PAGE_INIT on architectures that support
DISCONTINGMEM and NO_BOOTMEM which causes build failures.
For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc
causes the following build failure:
CC mm/page_alloc.o
mm/page_alloc.c: In function 'update_defer_init':
mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION'
undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
^~~~~~~~~~~~~~~~~
USEC_PER_SEC
mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/cache.h:5:0,
from include/linux/printk.h:9,
from include/linux/kernel.h:14,
from include/asm-generic/bug.h:18,
from arch/arc/include/asm/bug.h:32,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'deferred_grow_zone':
mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
^
include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
^~~~
include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
^~~~~~~~~~~~~~
mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
^~~~~
In file included from include/asm-generic/bug.h:18:0,
from arch/arc/include/asm/bug.h:32,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^
include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
^
include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
__builtin_choose_expr(__safe_cmp(x, y), \
^~~~~~~~~~
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^~~~~
include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
^~~~~
scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failed
Let's make the DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM
as the systems that support DISCONTIGMEM do not seem to have that huge
amounts of memory that would make DEFERRED_STRUCT_PAGE_INIT relevant.
Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a718e28f538441a3b6612da9ff226973376cdf0f ]
Signed integer overflow is undefined according to the C standard. The
overflow in ksys_fadvise64_64() is deliberate, but since it is signed
overflow, UBSAN complains:
UBSAN: Undefined behaviour in mm/fadvise.c:76:10
signed integer overflow:
4 + 9223372036854775805 cannot be represented in type 'long long int'
Use unsigned types to do math. Unsigned overflow is defined so UBSAN
will not complain about it. This patch doesn't change generated code.
[akpm@linux-foundation.org: add comment explaining the casts]
Link: http://lkml.kernel.org/r/20180629184453.7614-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: <icytxw@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2ea62630681027c455117aa471ea3ab8bb099ead ]
On a shared LPAR, Phyp will not update the CPU associativity at boot
time. Just after the boot system does recognize itself as a shared
LPAR and trigger a request for correct CPU associativity. But by then
the scheduler would have already created/destroyed its sched domains.
This causes
- Broken load balance across Nodes causing islands of cores.
- Performance degradation esp if the system is lightly loaded
- dmesg to wrongly report all CPUs to be in Node 0.
- Messages in dmesg saying borken topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
node sched domain"), can cause rcu stalls at boot up.
The sched_domains_numa_masks table which is used to generate cpumasks
is only created at boot time just before creating sched domains and
never updated. Hence, its better to get the topology correct before
the sched domains are created.
For example on 64 core Power 8 shared LPAR, dmesg reports
Brought up 512 CPUs
Node 0 CPUs: 0-511
Node 1 CPUs:
Node 2 CPUs:
Node 3 CPUs:
Node 4 CPUs:
Node 5 CPUs:
Node 6 CPUs:
Node 7 CPUs:
Node 8 CPUs:
Node 9 CPUs:
Node 10 CPUs:
Node 11 CPUs:
...
BUG: arch topology borken
the DIE domain not a subset of the NUMA domain
BUG: arch topology borken
the DIE domain not a subset of the NUMA domain
numactl/lscpu output will still be correct with cores spreading across
all nodes:
Socket(s): 64
NUMA node(s): 12
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s): 152-159,248-255,344-351,440-447
NUMA node11 CPU(s): 160-167,256-263,352-359,448-455
Currently on this LPAR, the scheduler detects 2 levels of Numa and
created numa sched domains for all CPUs, but it finds a single DIE
domain consisting of all CPUs. Hence it deletes all numa sched
domains.
To address this, detect the shared processor and update topology soon
after CPUs are setup so that correct topology is updated just before
scheduler creates sched domain.
With the fix, dmesg reports:
numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
and lscpu also reports:
Socket(s): 64
NUMA node(s): 12
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s): 152-159,248-255,344-351,440-447
NUMA node11 CPU(s): 160-167,256-263,352-359,448-455
Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Trim / format change log]
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b96e9eb62841c519ba1db32d036628be3cdef91f ]
Current clock name looks like this:
/soc/bus@ffd00000/pwm@1b000#mux0
This is bad because CCF uses the clock to create a directory in clk debugfs.
With such name, the directory creation (silently) fails and the debugfs
entry end up being created at the debugfs root.
With this change, the clock name will now be:
ffd1b000.pwm#mux0
This matches the clock naming scheme used in the ethernet and mmc driver.
It also fixes the problem with debugfs.
Fixes: 36af66a79056 ("pwm: Convert to using %pOF instead of full_name")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]
If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.
This can lead to the following crash:
divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
------------ 3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
hfi1_init_dd+0x14b3/0x27a0 [hfi1]
? pcie_capability_write_word+0x46/0x70
? hfi1_pcie_init+0xc0/0x200 [hfi1]
do_init_one+0x153/0x4c0 [hfi1]
? sched_clock_cpu+0x85/0xc0
init_one+0x1b5/0x260 [hfi1]
local_pci_probe+0x4a/0xb0
work_for_cpu_fn+0x1a/0x30
process_one_work+0x17f/0x440
worker_thread+0x278/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork+0x77/0xb0
? insert_kthread_work+0x40/0x40
If the BIOS is not supplying NUMA information:
- set the default table count to 1 for all possible nodes
- select node 0 (instead of current NUMA) node to get consistent
performance
- generate an error indicating that the BIOS should be upgraded
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a30446c0dca3483c384b54a431cc951e15f7e79 ]
Currently acpi_gsb_i2c_read_bytes() directly returns i2c_transfer's return
value. i2c_transfer returns a value < 0 on error and 2 (for 2 successfully
executed transfers) on success. But the ACPI code expects 0 on success, so
currently acpi_gsb_i2c_read_bytes()'s caller does:
if (status > 0)
status = 0;
This commit makes acpi_gsb_i2c_read_bytes() return a value which can be
directly consumed by the ACPI code, mirroring acpi_gsb_i2c_write_bytes(),
this commit also makes acpi_gsb_i2c_read_bytes() explitcly check that
i2c_transfer returns 2, rather then accepting any value > 0.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 704ae091b061082b37a9968621af4c290c641d50 ]
Without linux/irq.h, there is no declaration of notifier_block, leading to
a build warning:
In file included from arch/x86/kernel/cpu/mcheck/threshold.c:10:
arch/x86/include/asm/mce.h:151:46: error: 'struct notifier_block' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
It's sufficient to declare the struct tag here, which avoids pulling in
more header files.
Fixes: 447ae3166702 ("x86: Don't include linux/irq.h from asm/hardirq.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicolai Stange <nstange@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180817100156.3009043-1-arnd@arndb.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69599206ea9a3f8f2e94d46580579cbf9d08ad6c ]
Legacy PCI over virtio uses a 32bit PFN for the queue. If the
queue pfn is too large to fit in 32bits, which we could hit on
arm64 systems with 52bit physical addresses (even with 64K page
size), we simply miss out a proper link to the other side of
the queue.
Add a check to validate the PFN, rather than silently breaking
the devices.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Cc: Peter Maydel <peter.maydell@linaro.org>
Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a6b29230ec336189bab32498df3f06c8a6944d8 ]
We should return error pointers in this function. Returning NULL
results in a NULL dereference in the caller.
Fixes: 73688d1ed0b8 ("apparmor: refactor prepare_ns() and make usable from different views")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 87915adc3f0acdf03c776df42e308e5a155c19af ]
In flush_work(), we need to create a lockdep dependency so that
the following scenario is appropriately tagged as a problem:
work_function()
{
mutex_lock(&mutex);
...
}
other_function()
{
mutex_lock(&mutex);
flush_work(&work); // or cancel_work_sync(&work);
}
This is a problem since the work might be running and be blocked
on trying to acquire the mutex.
Similarly, in flush_workqueue().
These were removed after cross-release partially caught these
problems, but now cross-release was reverted anyway. IMHO the
removal was erroneous anyway though, since lockdep should be
able to catch potential problems, not just actual ones, and
cross-release would only have caught the problem when actually
invoking wait_for_completion().
Fixes: fd1a5b04dfb8 ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d6e89786bed977f37f55ffca11e563f6d2b1e3b5 ]
In cancel_work_sync(), we can only have one of two cases, even
with an ordered workqueue:
* the work isn't running, just cancelled before it started
* the work is running, but then nothing else can be on the
workqueue before it
Thus, we need to skip the lockdep workqueue dependency handling,
otherwise we get false positive reports from lockdep saying that
we have a potential deadlock when the workqueue also has other
work items with locking, e.g.
work1_function() { mutex_lock(&mutex); ... }
work2_function() { /* nothing */ }
other_function() {
queue_work(ordered_wq, &work1);
queue_work(ordered_wq, &work2);
mutex_lock(&mutex);
cancel_work_sync(&work2);
}
As described above, this isn't a problem, but lockdep will
currently flag it as if cancel_work_sync() was flush_work(),
which *is* a problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df865e8337c397471b95f51017fea559bc8abb4a ]
elf_kcore_store_hdr() uses __pa() to find the physical address of
KCORE_RAM or KCORE_TEXT entries exported as program headers.
This trips CONFIG_DEBUG_VIRTUAL's checks, as the KCORE_TEXT entries are
not in the linear map.
Handle these two cases separately, using __pa_symbol() for the KCORE_TEXT
entries.
Link: http://lkml.kernel.org/r/20180711131944.15252-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>