linux

iv/linux

Author	SHA1	Message	Date
Jason A. Donenfeld	53e054b3cd	siphash: add cryptographically secure PRF commit 2c956a60778cbb6a27e0c7a8a52a91378c90e1d1 upstream. SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining. For the first usage: There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then preceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases. There are a modicum of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix it. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate. While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk. For the second usage: A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster and more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straight-forward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy. Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.9 as dependency of commits df453700e8d8 "inet: switch IP ID generator to siphash" and 3c79107631db "netfilter: ctnetlink: don't use conntrack/expect object addresses as id"] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:42 +02:00
Jason Wang	02b40edda9	vhost: scsi: add weight support commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream. This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable. This addresses CVE-2019-3900. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [bwh: Backported to 4.9: - Drop changes in vhost_scsi_ctl_handle_vq() - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:42 +02:00
Jason Wang	4b58628857	vhost_net: fix possible infinite loop commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream. When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet: while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT \| MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... } This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like: 1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when: - userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers This addresses CVE-2019-3900. Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [bwh: Backported to 4.9: - Both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:42 +02:00
Jason Wang	66c8d9d53e	vhost: introduce vhost_exceeds_weight() commit e82b9b0727ff6d665fff2d326162b460dded554d upstream. We used to have vhost_exceeds_weight() for vhost-net to: - prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed. Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [bwh: Backported to 4.9: - In vhost_net, both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:41 +02:00
Jason Wang	6214511f22	vhost_net: introduce vhost_exceeds_weight() commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:41 +02:00
Paolo Abeni	73f768b768	vhost_net: use packet weight for rx handler, too commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream. Similar to commit a2ac99905f1e ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling. The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit. The selected limit has been validated with PVP[1] performance test with different queue sizes: queue size 256 512 1024 baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418 A packet weight of 256 gives peek performances in under all the tested scenarios. No measurable regression in unidirectional performance tests has been detected. [1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:41 +02:00
haibinzhang(张海斌)	43f7e9b81a	vhost-net: set packet weight of tx polling to 2 * vq size commit a2ac99905f1ea8b15997a6ec39af69aa28a3653b upstream. handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy polling udp packets with small length(e.g. 1byte udp payload), because setting VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length. Ping-Latencies shown below were tested between two Virtual Machines using netperf (UDP_STREAM, len=1), and then another machine pinged the client: vq size=256 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 3.319 18.489 57.303 64 1.643 2.021 2.552 128 1.825 2.600 3.224 256 1.997 2.710 4.295 512 1.860 3.171 4.631 1024 2.002 4.173 9.056 2048 2.257 5.650 9.688 4096 2.093 8.508 15.943 vq size=512 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.537 29.177 66.245 64 2.798 3.614 4.403 128 2.861 3.820 4.775 256 3.008 4.018 4.807 512 3.254 4.523 5.824 1024 3.079 5.335 7.747 2048 3.944 8.201 12.762 4096 4.158 11.057 19.985 Seems pretty consistent, a small dip at 2 VQ sizes. Ring size is a hint from device about a burst size it can tolerate. Based on benchmarks, set the weight to 2 * vq size. To evaluate this change, another tests were done using netperf(RR, TX) between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was tweaked through qemu. Results shown below does not show obvious changes. vq size=256 TCP_RR vq size=512 TCP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -7%/ -2% 1/ 1/ 0%/ -2% 1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% 1/ 8/ +1%/ -2% 1/ 8/ 0%/ +1% 64/ 1/ -6%/ 0% 64/ 1/ +7%/ +3% 64/ 4/ 0%/ +2% 64/ 4/ -1%/ +1% 64/ 8/ 0%/ 0% 64/ 8/ -1%/ -2% 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2% 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2% 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1% vq size=256 UDP_RR vq size=512 UDP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2% 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2% 1/ 8/ -1%/ -1% 1/ 8/ -1%/ 0% 64/ 1/ -2%/ -3% 64/ 1/ +1%/ +1% 64/ 4/ -5%/ -1% 64/ 4/ +2%/ 0% 64/ 8/ 0%/ -1% 64/ 8/ -2%/ +1% 256/ 1/ +7%/ +1% 256/ 1/ -7%/ 0% 256/ 4/ +1%/ +1% 256/ 4/ -3%/ -4% 256/ 8/ +2%/ +2% 256/ 8/ +1%/ +1% vq size=256 TCP_STREAM vq size=512 TCP_STREAM size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 64/ 1/ 0%/ -3% 64/ 1/ 0%/ 0% 64/ 4/ +3%/ -1% 64/ 4/ -2%/ +4% 64/ 8/ +9%/ -4% 64/ 8/ -1%/ +2% 256/ 1/ +1%/ -4% 256/ 1/ +1%/ +1% 256/ 4/ -1%/ -1% 256/ 4/ -3%/ 0% 256/ 8/ +7%/ +5% 256/ 8/ -3%/ 0% 512/ 1/ +1%/ 0% 512/ 1/ -1%/ -1% 512/ 4/ +1%/ -1% 512/ 4/ 0%/ 0% 512/ 8/ +7%/ -5% 512/ 8/ +6%/ -1% 1024/ 1/ 0%/ -1% 1024/ 1/ 0%/ +1% 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% 1024/ 8/ +8%/ +5% 1024/ 8/ -1%/ 0% 2048/ 1/ +2%/ +2% 2048/ 1/ -1%/ 0% 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/ -1% 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/ -1% 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% 4096/ 8/ +9%/ -2% 4096/ 8/ -5%/ -1% Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com> Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:41 +02:00
Daniel Borkmann	c98446e1ba	bpf: add bpf_jit_limit knob to restrict unpriv allocations commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3 upstream. Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:41 +02:00
Daniel Borkmann	4d04757054	bpf: restrict access to core bpf sysctls commit 2e4a30983b0f9b19b59e38bbf7427d7fdd480d98 upstream. Given BPF reaches far beyond just networking these days, it was never intended to allow setting and in some cases reading those knobs out of a user namespace root running without CAP_SYS_ADMIN, thus tighten such access. Also the bpf_jit_enable = 2 debugging mode should only be allowed if kptr_restrict is not set since it otherwise can leak addresses to the kernel log. Dump a note to the kernel log that this is for debugging JITs only when enabled. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> [bwh: Backported to 4.9: - We don't have bpf_dump_raw_ok(), so drop the condition based on it. This condition only made it a bit harder for a privileged user to do something silly. - Drop change to bpf_jit_kallsyms] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:40 +02:00
Daniel Borkmann	5124abda30	bpf: get rid of pure_initcall dependency to enable jits commit fa9dd599b4dae841924b022768354cfde9affecb upstream. Having a pure_initcall() callback just to permanently enable BPF JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave a small race window in future where JIT is still disabled on boot. Since we know about the setting at compilation time anyway, just initialize it properly there. Also consolidate all the individual bpf_jit_enable variables into a single one and move them under one location. Moreover, don't allow for setting unspecified garbage values on them. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> [bwh: Backported to 4.9 as dependency of commit 2e4a30983b0f "bpf: restrict access to core bpf sysctls": - Drop change in arch/mips/net/ebpf_jit.c - Drop change to bpf_jit_kallsyms - Adjust filenames, context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:40 +02:00
Miles Chen	43729e6fea	mm/memcontrol.c: fix use after free in mem_cgroup_iter() commit 54a83d6bcbf8f4700013766b974bf9190d40b689 upstream. This patch is sent to report an use after free in mem_cgroup_iter() after merging commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()"). I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged to the trees. However, I can still observe use after free issues addressed in the commit be2657752e9e. (on low-end devices, a few times this month) backtrace: css_tryget <- crash here mem_cgroup_iter shrink_node shrink_zones do_try_to_free_pages try_to_free_pages __perform_reclaim __alloc_pages_direct_reclaim __alloc_pages_slowpath __alloc_pages_nodemask To debug, I poisoned mem_cgroup before freeing it: static void __mem_cgroup_free(struct mem_cgroup memcg) for_each_node(node) free_mem_cgroup_per_node_info(memcg, node); free_percpu(memcg->stat); + / poison memcg before freeing it / + memset(memcg, 0x78, sizeof(struct mem_cgroup)); kfree(memcg); } The coredump shows the position=0xdbbc2a00 is freed. (gdb) p/x ((struct mem_cgroup_per_node )0xe5009e00)->iter[8] $13 = {position = 0xdbbc2a00, generation = 0x2efd} 0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100 0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000 0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000 0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878 0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0 0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000 0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000 0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000 0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000 0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878 0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878 0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878 In the reclaim path, try_to_free_pages() does not setup sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ..., shrink_node(). In mem_cgroup_iter(), root is set to root_mem_cgroup because sc->target_mem_cgroup is NULL. It is possible to assign a memcg to root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter(). try_to_free_pages struct scan_control sc = {...}, target_mem_cgroup is 0x0; do_try_to_free_pages shrink_zones shrink_node mem_cgroup root = sc->target_mem_cgroup; memcg = mem_cgroup_iter(root, NULL, &reclaim); mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); My device uses memcg non-hierarchical mode. When we release a memcg: invalidate_reclaim_iterators() reaches only dead_memcg and its parents. If non-hierarchical mode is used, invalidate_reclaim_iterators() never reaches root_mem_cgroup. static void invalidate_reclaim_iterators(struct mem_cgroup dead_memcg) { struct mem_cgroup *memcg = dead_memcg; for (; memcg; memcg = parent_mem_cgroup(memcg) ... } So the use after free scenario looks like: CPU1 CPU2 try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); invalidate_reclaim_iterators(memcg); ... __mem_cgroup_free() kfree(memcg); try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id); iter = &mz->iter[reclaim->priority]; pos = READ_ONCE(iter->position); css_tryget(&pos->css) <- use after free To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in invalidate_reclaim_iterators(). [cai@lca.pw: fix -Wparentheses compilation warning] Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting") Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:40 +02:00
Isaac J. Manjarres	faf6760c93	mm/usercopy: use memory range to be accessed for wraparound check commit 951531691c4bcaa59f56a316e018bc2ff1ddf855 upstream. Currently, when checking to see if accessing n bytes starting at address "ptr" will cause a wraparound in the memory addresses, the check in check_bogus_address() adds an extra byte, which is incorrect, as the range of addresses that will be accessed is [ptr, ptr + (n - 1)]. This can lead to incorrectly detecting a wraparound in the memory address, when trying to read 4 KB from memory that is mapped to the the last possible page in the virtual address space, when in fact, accessing that range of memory would not cause a wraparound to occur. Use the memory range that will actually be accessed when considering if accessing a certain amount of bytes will cause the memory address to wrap around. Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org Fixes: f5509cc18daa ("mm: Hardened usercopy") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Trilok Soni <tsoni@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [kees: backport to v4.9] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:40 +02:00
Gustavo A. R. Silva	694457ee8c	sh: kernel: hw_breakpoint: Fix missing break in switch statement commit 1ee1119d184bb06af921b48c3021d921bbd85bac upstream. Add missing break statement in order to prevent the code from falling through to case SH_BREAKPOINT_WRITE. Fixes: 09a072947791 ("sh: hw-breakpoints: Add preliminary support for SH-4A UBC.") Cc: stable@vger.kernel.org Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:40 +02:00
Suganath Prabu	718ce1eb74	scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA commit df9a606184bfdb5ae3ca9d226184e9489f5c24f7 upstream. Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as per hardware design, if DMA-able range contains all 64-bits set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault. E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000 bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then HBA will fault the firmware. Driver will set 63-bit DMA mask to ensure the above address will not be used. Cc: <stable@vger.kernel.org> # 5.1.20+ Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:39 +02:00
Emmanuel Grumbach	0d7ed7f429	iwlwifi: don't unmap as page memory that was mapped as single commit 87e7e25aee6b59fef740856f4e86d4b60496c9e1 upstream. In order to remember how to unmap a memory (as single or as page), we maintain a bit per Transmit Buffer (TBs) in the meta data (structure iwl_cmd_meta). We maintain a bitmap: 1 bit per TB. If the TB is set, we will free the memory as a page. This bitmap was never cleared. Fix this. Cc: stable@vger.kernel.org Fixes: 3cd1980b0cdf ("iwlwifi: pcie: introduce new tfd and tb formats") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:39 +02:00
Brian Norris	fefc9e1015	mwifiex: fix 802.11n/WPA detection commit df612421fe2566654047769c6852ffae1a31df16 upstream. Commit 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant vendor IEs") adjusted the ieee_types_vendor_header struct, which inadvertently messed up the offsets used in mwifiex_is_wpa_oui_present(). Add that offset back in, mirroring mwifiex_is_rsn_oui_present(). As it stands, commit 63d7ef36103d breaks compatibility with WPA (not WPA2) 802.11n networks, since we hit the "info: Disable 11n if AES is not supported by AP" case in mwifiex_is_network_compatible(). Fixes: 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant vendor IEs") Cc: <stable@vger.kernel.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:39 +02:00
Steve French	92225e4100	smb3: send CAP_DFS capability during session setup commit 8d33096a460d5b9bd13300f01615df5bb454db10 upstream. We had a report of a server which did not do a DFS referral because the session setup Capabilities field was set to 0 (unlike negotiate protocol where we set CAP_DFS). Better to send it session setup in the capabilities as well (this also more closely matches Windows client behavior). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:39 +02:00
Pavel Shilovsky	84bdf5e258	SMB3: Fix deadlock in validate negotiate hits reconnect commit e99c63e4d86d3a94818693147b469fa70de6f945 upstream. Currently we skip SMB2_TREE_CONNECT command when checking during reconnect because Tree Connect happens when establishing an SMB session. For SMB 3.0 protocol version the code also calls validate negotiate which results in SMB2_IOCL command being sent over the wire. This may deadlock on trying to acquire a mutex when checking for reconnect. Fix this by skipping SMB2_IOCL command when doing the reconnect check. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:38 +02:00
Brian Norris	c7b1a1e984	mac80211: don't WARN on short WMM parameters from AP commit 05aaa5c97dce4c10a9e7eae2f1569a684e0c5ced upstream. In a very similar spirit to commit c470bdc1aaf3 ("mac80211: don't WARN on bad WMM parameters from buggy APs"), an AP may not transmit a fully-formed WMM IE. For example, it may miss or repeat an Access Category. The above loop won't catch that and will instead leave one of the four ACs zeroed out. This triggers the following warning in drv_conf_tx() wlan0: invalid CW_min/CW_max: 0/0 and it may leave one of the hardware queues unconfigured. If we detect such a case, let's just print a warning and fall back to the defaults. Tested with a hacked version of hostapd, intentionally corrupting the IEs in hostapd_eid_wmm(). Cc: stable@vger.kernel.org Signed-off-by: Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20190726224758.210953-1-briannorris@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:38 +02:00
Takashi Iwai	542233bd73	ALSA: hda - Don't override global PCM hw info flag commit c1c6c877b0c79fd7e05c931435aa42211eaeebaf upstream. The commit bfcba288b97f ("ALSA - hda: Add support for link audio time reporting") introduced the conditional PCM hw info setup, but it overwrites the global azx_pcm_hw object. This will cause a problem if any other HD-audio controller, as it'll inherit the same bit flag although another controller doesn't support that feature. Fix the bug by setting the PCM hw info flag locally. Fixes: bfcba288b97f ("ALSA - hda: Add support for link audio time reporting") Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:38 +02:00
Wenwen Wang	dbb4f2d59f	ALSA: firewire: fix a memory leak bug commit 1be3c1fae6c1e1f5bb982b255d2034034454527a upstream. In iso_packets_buffer_init(), 'b->packets' is allocated through kmalloc_array(). Then, the aligned packet size is checked. If it is larger than PAGE_SIZE, -EINVAL will be returned to indicate the error. However, the allocated 'b->packets' is not deallocated on this path, leading to a memory leak. To fix the above issue, free 'b->packets' before returning the error code. Fixes: 31ef9134eb52 ("ALSA: add LaCie FireWire Speakers/Griffin FireWave Surround driver") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Cc: <stable@vger.kernel.org> # v2.6.39+ Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:38 +02:00
Guenter Roeck	d3d1b67ffd	hwmon: (nct7802) Fix wrong detection of in4 presence commit 38ada2f406a9b81fb1249c5c9227fa657e7d5671 upstream. The code to detect if in4 is present is wrong; if in4 is not present, the in4_input sysfs attribute is still present. In detail: - Ihen RTD3_MD=11 (VSEN3 present), everything is as expected (no bug). - If we have RTD3_MD!=11 (no VSEN3), we unexpectedly have a in4_input file under /sys and the "sensors" command displays in4_input. But as expected, we have no in4_min, in4_max, in4_alarm, in4_beep. Fix is_visible function to detect and report in4_input visibility as expected. Reported-by: Gilles Buloz <Gilles.Buloz@kontron.com> Cc: Gilles Buloz <Gilles.Buloz@kontron.com> Cc: stable@vger.kernel.org Fixes: 3434f37835804 ("hwmon: Driver for Nuvoton NCT7802Y") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:37 +02:00
Tomas Bortoli	127ab64c38	can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices commit 30a8beeb3042f49d0537b7050fd21b490166a3d9 upstream. Uninitialized Kernel memory can leak to USB devices. Fix by using kzalloc() instead of kmalloc() on the affected buffers. Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+513e4d0985298538bf9b@syzkaller.appspotmail.com Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:37 +02:00
Tomas Bortoli	0cad79bfb5	can: peak_usb: pcan_usb_pro: Fix info-leaks to USB devices commit ead16e53c2f0ed946d82d4037c630e2f60f4ab69 upstream. Uninitialized Kernel memory can leak to USB devices. Fix by using kzalloc() instead of kmalloc() on the affected buffers. Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+d6a5a1a3657b596ef132@syzkaller.appspotmail.com Fixes: f14e22435a27 ("net: can: peak_usb: Do not do dma on the stack") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:36 +02:00
Leonard Crestez	ac370f1ea5	perf/core: Fix creating kernel counters for PMUs that override event->cpu [ Upstream commit 4ce54af8b33d3e21ca935fc1b89b58cbba956051 ] Some hardware PMU drivers will override perf_event.cpu inside their event_init callback. This causes a lockdep splat when initialized through the kernel API: WARNING: CPU: 0 PID: 250 at kernel/events/core.c:2917 ctx_sched_out+0x78/0x208 pc : ctx_sched_out+0x78/0x208 Call trace: ctx_sched_out+0x78/0x208 __perf_install_in_context+0x160/0x248 remote_function+0x58/0x68 generic_exec_single+0x100/0x180 smp_call_function_single+0x174/0x1b8 perf_install_in_context+0x178/0x188 perf_event_create_kernel_counter+0x118/0x160 Fix this by calling perf_install_in_context with event->cpu, just like perf_event_open Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frank Li <Frank.li@nxp.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/c4ebe0503623066896d7046def4d6b1e06e0eb2e.1563972056.git.leonard.crestez@nxp.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:35 +02:00
Peter Zijlstra	a59873b5fe	tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop [ Upstream commit 952041a8639a7a3a73a2b6573cb8aa8518bc39f8 ] While reviewing rwsem down_slowpath, Will noticed ldsem had a copy of a bug we just found for rwsem. X = 0; CPU0 CPU1 rwsem_down_read() for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); X = 1; rwsem_up_write(); rwsem_mark_wake() atomic_long_add(adjustment, &sem->count); smp_store_release(&waiter->task, NULL); if (!waiter.task) break; ... } r = X; Allows 'r == 0'. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4898e640caf0 ("tty: Add timed, writer-prioritized rw semaphore") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:35 +02:00
Hannes Reinecke	5bf03ad9c4	scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG [ Upstream commit 20122994e38aef0ae50555884d287adde6641c94 ] Retrying immediately after we've received a 'transitioning' sense code is pretty much pointless, we should always use a delay before retrying. So ensure the default delay is applied before retrying. Signed-off-by: Hannes Reinecke <hare@suse.com> Tested-by: Zhangguanghui <zhang.guanghui@h3c.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:34 +02:00
Tyrel Datwyler	b3a0c297e6	scsi: ibmvfc: fix WARN_ON during event pool release [ Upstream commit 5578257ca0e21056821e6481bd534ba267b84e58 ] While removing an ibmvfc client adapter a WARN_ON like the following WARN_ON is seen in the kernel log: WARNING: CPU: 6 PID: 5421 at ./include/linux/dma-mapping.h:541 ibmvfc_free_event_pool+0x12c/0x1f0 [ibmvfc] CPU: 6 PID: 5421 Comm: rmmod Tainted: G E 4.17.0-rc1-next-20180419-autotest #1 NIP: d00000000290328c LR: d00000000290325c CTR: c00000000036ee20 REGS: c000000288d1b7e0 TRAP: 0700 Tainted: G E (4.17.0-rc1-next-20180419-autotest) MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 44008828 XER: 20000000 CFAR: c00000000036e408 SOFTE: 1 GPR00: d00000000290325c c000000288d1ba60 d000000002917900 c000000289d75448 GPR04: 0000000000000071 c0000000ff870000 0000000018040000 0000000000000001 GPR08: 0000000000000000 c00000000156e838 0000000000000001 d00000000290c640 GPR12: c00000000036ee20 c00000001ec4dc00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 00000100276901e0 0000000010020598 GPR20: 0000000010020550 0000000010020538 0000000010020578 00000000100205b0 GPR24: 0000000000000000 0000000000000000 0000000010020590 5deadbeef0000100 GPR28: 5deadbeef0000200 d000000002910b00 0000000000000071 c0000002822f87d8 NIP [d00000000290328c] ibmvfc_free_event_pool+0x12c/0x1f0 [ibmvfc] LR [d00000000290325c] ibmvfc_free_event_pool+0xfc/0x1f0 [ibmvfc] Call Trace: [c000000288d1ba60] [d00000000290325c] ibmvfc_free_event_pool+0xfc/0x1f0 [ibmvfc] (unreliable) [c000000288d1baf0] [d000000002909390] ibmvfc_abort_task_set+0x7b0/0x8b0 [ibmvfc] [c000000288d1bb70] [c0000000000d8c68] vio_bus_remove+0x68/0x100 [c000000288d1bbb0] [c0000000007da7c4] device_release_driver_internal+0x1f4/0x2d0 [c000000288d1bc00] [c0000000007da95c] driver_detach+0x7c/0x100 [c000000288d1bc40] [c0000000007d8af4] bus_remove_driver+0x84/0x140 [c000000288d1bcb0] [c0000000007db6ac] driver_unregister+0x4c/0xa0 [c000000288d1bd20] [c0000000000d6e7c] vio_unregister_driver+0x2c/0x50 [c000000288d1bd50] [d00000000290ba0c] cleanup_module+0x24/0x15e0 [ibmvfc] [c000000288d1bd70] [c0000000001dadb0] sys_delete_module+0x220/0x2d0 [c000000288d1be30] [c00000000000b284] system_call+0x58/0x6c Instruction dump: e8410018 e87f0068 809f0078 e8bf0080 e8df0088 2fa30000 419e008c e9230200 2fa90000 419e0080 894d098a 794a07e0 <0b0a0000> e9290008 2fa90000 419e0028 This is tripped as a result of irqs being disabled during the call to dma_free_coherent() by ibmvfc_free_event_pool(). At this point in the code path we have quiesced the adapter and its overly paranoid anyways to be holding the host lock. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:33 +02:00
Junxiao Bi	d80a03c747	scsi: megaraid_sas: fix panic on loading firmware crashdump [ Upstream commit 3b5f307ef3cb5022bfe3c8ca5b8f2114d5bf6c29 ] While loading fw crashdump in function fw_crash_buffer_show(), left bytes in one dma chunk was not checked, if copying size over it, overflow access will cause kernel panic. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:32 +02:00
Arnd Bergmann	67b14bd258	ARM: davinci: fix sleep.S build error on ARMv4 [ Upstream commit d64b212ea960db4276a1d8372bd98cb861dfcbb0 ] When building a multiplatform kernel that includes armv4 support, the default target CPU does not support the blx instruction, which leads to a build failure: arch/arm/mach-davinci/sleep.S: Assembler messages: arch/arm/mach-davinci/sleep.S:56: Error: selected processor does not support `blx ip' in ARM mode Add a .arch statement in the sources to make this file build. Link: https://lore.kernel.org/r/20190722145211.1154785-1-arnd@arndb.de Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:31 +02:00
Lorenzo Pieralisi	ab1ab88e93	ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id() [ Upstream commit 5a46d3f71d5e5a9f82eabc682f996f1281705ac7 ] Static analysis identified that index comparison against ITS entries in iort_dev_find_its_id() is off by one. Update the comparison condition and clarify the resulting error message. Fixes: 4bf2efd26d76 ("ACPI: Add new IORT functions to support MSI domain handling") Link: https://lore.kernel.org/linux-arm-kernel/20190613065410.GB16334@mwanda/ Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Will Deacon <will@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:31 +02:00
Arnd Bergmann	dfaf368e67	drbd: dynamically allocate shash descriptor [ Upstream commit 77ce56e2bfaa64127ae5e23ef136c0168b818777 ] Building with clang and KASAN, we get a warning about an overly large stack frame on 32-bit architectures: drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect' [-Werror,-Wframe-larger-than=] We already allocate other data dynamically in this function, so just do the same for the shash descriptor, which makes up most of this memory. Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/ Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:30 +02:00
Arnaldo Carvalho de Melo	217c52217e	perf probe: Avoid calling freeing routine multiple times for same pointer [ Upstream commit d95daf5accf4a72005daa13fbb1d1bd8709f2861 ] When perf_add_probe_events() we call cleanup_perf_probe_events() for the pev pointer it receives, then, as part of handling this failure the main 'perf probe' goes on and calls cleanup_params() and that will again call cleanup_perf_probe_events()for the same pointer, so just set nevents to zero when handling the failure of perf_add_probe_events() to avoid the double free. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-x8qgma4g813z96dvtw9w219q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:29 +02:00
Charles Keepax	13810c478b	ALSA: compress: Be more restrictive about when a drain is allowed [ Upstream commit 3b8179944cb0dd53e5223996966746cdc8a60657 ] Draining makes little sense in the situation of hardware overrun, as the hardware will have consumed all its available samples. Additionally, draining whilst the stream is paused would presumably get stuck as no data is being consumed on the DSP side. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:28 +02:00
Charles Keepax	cbc76c3b9d	ALSA: compress: Don't allow paritial drain operations on capture streams [ Upstream commit a70ab8a8645083f3700814e757f2940a88b7ef88 ] Partial drain and next track are intended for gapless playback and don't really have an obvious interpretation for a capture stream, so makes sense to not allow those operations on capture streams. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:27 +02:00
Charles Keepax	8d25080f4d	ALSA: compress: Prevent bypasses of set_params [ Upstream commit 26c3f1542f5064310ad26794c09321780d00c57d ] Currently, whilst in SNDRV_PCM_STATE_OPEN it is possible to call snd_compr_stop, snd_compr_drain and snd_compr_partial_drain, which allow a transition to SNDRV_PCM_STATE_SETUP. The stream should only be able to move to the setup state once it has received a SNDRV_COMPRESS_SET_PARAMS ioctl. Fix this issue by not allowing those ioctls whilst in the open state. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:27 +02:00
Charles Keepax	daa60696bd	ALSA: compress: Fix regression on compressed capture streams [ Upstream commit 4475f8c4ab7b248991a60d9c02808dbb813d6be8 ] A previous fix to the stop handling on compressed capture streams causes some knock on issues. The previous fix updated snd_compr_drain_notify to set the state back to PREPARED for capture streams. This causes some issues however as the handling for snd_compr_poll differs between the two states and some user-space applications were relying on the poll failing after the stream had been stopped. To correct this regression whilst still fixing the original problem the patch was addressing, update the capture handling to skip the PREPARED state rather than skipping the SETUP state as it has done until now. Fixes: 4f2ab5e1d13d ("ALSA: compress: Fix stop handling on compressed capture streams") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:26 +02:00
Julian Wiedmann	657c648a19	s390/qdio: add sanity checks to the fast-requeue path [ Upstream commit a6ec414a4dd529eeac5c3ea51c661daba3397108 ] If the device driver were to send out a full queue's worth of SBALs, current code would end up discovering the last of those SBALs as PRIMED and erroneously skip the SIGA-w. This immediately stalls the queue. Add a check to not attempt fast-requeue in this case. While at it also make sure that the state of the previous SBAL was successfully extracted before inspecting it. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:25 +02:00
Wen Yang	260718adc6	cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init() [ Upstream commit e0a12445d1cb186d875410d093a00d215bec6a89 ] The cpu variable is still being used in the of_get_property() call after the of_node_put() call, which may result in use-after-free. Fixes: a9acc26b75f6 ("cpufreq/pasemi: fix possible object reference leak") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:24 +02:00
Björn Gerhart	71faaeb11d	hwmon: (nct6775) Fix register address and added missed tolerance for nct6106 [ Upstream commit f3d43e2e45fd9d44ba52d20debd12cd4ee9c89bf ] Fixed address of third NCT6106_REG_WEIGHT_DUTY_STEP, and added missed NCT6106_REG_TOLERANCE_H. Fixes: 6c009501ff200 ("hwmon: (nct6775) Add support for NCT6102D/6106D") Signed-off-by: Bjoern Gerhart <gerhart@posteo.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:24 +02:00
Brian Norris	760b12a726	mac80211: don't warn about CW params when not using them [ Upstream commit d2b3fe42bc629c2d4002f652b3abdfb2e72991c7 ] ieee80211_set_wmm_default() normally sets up the initial CW min/max for each queue, except that it skips doing this if the driver doesn't support ->conf_tx. We still end up calling drv_conf_tx() in some cases (e.g., ieee80211_reconfig()), which also still won't do anything useful...except it complains here about the invalid CW parameters. Let's just skip the WARN if we weren't going to do anything useful with the parameters. Signed-off-by: Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20190718015712.197499-1-briannorris@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:23 +02:00
Thomas Tai	2b9ac213b6	iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND [ Upstream commit 94bccc34071094c165c79b515d21b63c78f7e968 ] iscsi_ibft can use ACPI to find the iBFT entry during bootup, currently, ISCSI_IBFT depends on ISCSI_IBFT_FIND which is a X86 legacy way to find the iBFT by searching through the low memory. This patch changes the dependency so that other arch like ARM64 can use ISCSI_IBFT as long as the arch supports ACPI. ibft_init() needs to use the global variable ibft_addr declared in iscsi_ibft_find.c. A #ifndef CONFIG_ISCSI_IBFT_FIND is needed to declare the variable if CONFIG_ISCSI_IBFT_FIND is not selected. Moving ibft_addr into the iscsi_ibft.c does not work because if ISCSI_IBFT is selected as a module, the arch/x86/kernel/setup.c won't be able to find the variable at compile time. Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:23 +02:00
Florian Westphal	43e4ea5b3a	netfilter: nfnetlink: avoid deadlock due to synchronous request_module [ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ] Thomas and Juliana report a deadlock when running: (rmmod nf_conntrack_netlink/xfrm_user) conntrack -e NEW -E & modprobe -v xfrm_user They provided following analysis: conntrack -e NEW -E netlink_bind() netlink_lock_table() -> increases "nl_table_users" nfnetlink_bind() # does not unlock the table as it's locked by netlink_bind() __request_module() call_usermodehelper_exec() This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind() won't return until modprobe process is done. "modprobe xfrm_user": xfrm_user_init() register_pernet_subsys() -> grab pernet_ops_rwsem .. netlink_table_grab() calls schedule() as "nl_table_users" is non-zero so modprobe is blocked because netlink_bind() increased nl_table_users while also holding pernet_ops_rwsem. "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink: ctnetlink_init() register_pernet_subsys() -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module both modprobe processes wait on one another -- neither can make progress. Switch netlink_bind() to "nowait" modprobe -- this releases the netlink table lock, which then allows both modprobe instances to complete. Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-08-25 10:51:22 +02:00
Stephane Grosjean	13350c2269	can: peak_usb: fix potential double kfree_skb() commit fee6a8923ae0d318a7f7950c6c6c28a96cea099b upstream. When closing the CAN device while tx skbs are inflight, echo skb could be released twice. By calling close_candev() before unlinking all pending tx urbs, then the internal echo_skb[] array is fully and correctly cleared before the USB write callback and, therefore, can_get_echo_skb() are called, for each aborted URB. Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core") Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:22 +02:00
Suzuki K Poulose	e253114f73	usb: yurex: Fix use-after-free in yurex_delete commit fc05481b2fcabaaeccf63e32ac1baab54e5b6963 upstream. syzbot reported the following crash [0]: BUG: KASAN: use-after-free in usb_free_coherent+0x79/0x80 drivers/usb/core/usb.c:928 Read of size 8 at addr ffff8881b18599c8 by task syz-executor.4/16007 CPU: 0 PID: 16007 Comm: syz-executor.4 Not tainted 5.3.0-rc2+ #23 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x6a/0x32c mm/kasan/report.c:351 __kasan_report.cold+0x1a/0x33 mm/kasan/report.c:482 kasan_report+0xe/0x12 mm/kasan/common.c:612 usb_free_coherent+0x79/0x80 drivers/usb/core/usb.c:928 yurex_delete+0x138/0x330 drivers/usb/misc/yurex.c:100 kref_put include/linux/kref.h:65 [inline] yurex_release+0x66/0x90 drivers/usb/misc/yurex.c:392 __fput+0x2d7/0x840 fs/file_table.c:280 task_work_run+0x13f/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop+0x1d2/0x200 arch/x86/entry/common.c:163 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline] syscall_return_slowpath arch/x86/entry/common.c:274 [inline] do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x413511 Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007ffc424ea2e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000413511 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006 RBP: 0000000000000001 R08: 0000000029a2fc22 R09: 0000000029a2fc26 R10: 00007ffc424ea3c0 R11: 0000000000000293 R12: 000000000075c9a0 R13: 000000000075c9a0 R14: 0000000000761938 R15: ffffffffffffffff Allocated by task 2776: save_stack+0x1b/0x80 mm/kasan/common.c:69 set_track mm/kasan/common.c:77 [inline] __kasan_kmalloc mm/kasan/common.c:487 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:460 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:748 [inline] usb_alloc_dev+0x51/0xf95 drivers/usb/core/usb.c:583 hub_port_connect drivers/usb/core/hub.c:5004 [inline] hub_port_connect_change drivers/usb/core/hub.c:5213 [inline] port_event drivers/usb/core/hub.c:5359 [inline] hub_event+0x15c0/0x3640 drivers/usb/core/hub.c:5441 process_one_work+0x92b/0x1530 kernel/workqueue.c:2269 worker_thread+0x96/0xe20 kernel/workqueue.c:2415 kthread+0x318/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Freed by task 16007: save_stack+0x1b/0x80 mm/kasan/common.c:69 set_track mm/kasan/common.c:77 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:449 slab_free_hook mm/slub.c:1423 [inline] slab_free_freelist_hook mm/slub.c:1470 [inline] slab_free mm/slub.c:3012 [inline] kfree+0xe4/0x2f0 mm/slub.c:3953 device_release+0x71/0x200 drivers/base/core.c:1064 kobject_cleanup lib/kobject.c:693 [inline] kobject_release lib/kobject.c:722 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x171/0x280 lib/kobject.c:739 put_device+0x1b/0x30 drivers/base/core.c:2213 usb_put_dev+0x1f/0x30 drivers/usb/core/usb.c:725 yurex_delete+0x40/0x330 drivers/usb/misc/yurex.c:95 kref_put include/linux/kref.h:65 [inline] yurex_release+0x66/0x90 drivers/usb/misc/yurex.c:392 __fput+0x2d7/0x840 fs/file_table.c:280 task_work_run+0x13f/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop+0x1d2/0x200 arch/x86/entry/common.c:163 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline] syscall_return_slowpath arch/x86/entry/common.c:274 [inline] do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881b1859980 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 72 bytes inside of 2048-byte region [ffff8881b1859980, ffff8881b185a180) The buggy address belongs to the page: page:ffffea0006c61600 refcount:1 mapcount:0 mapping:ffff8881da00c000 index:0x0 compound_mapcount: 0 flags: 0x200000000010200(slab\|head) raw: 0200000000010200 0000000000000000 0000000100000001 ffff8881da00c000 raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881b1859880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881b1859900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff8881b1859980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881b1859a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881b1859a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== A quick look at the yurex_delete() shows that we drop the reference to the usb_device before releasing any buffers associated with the device. Delay the reference drop until we have finished the cleanup. [0] https://lore.kernel.org/lkml/0000000000003f86d8058f0bd671@google.com/ Fixes: 6bc235a2e24a5e ("USB: add driver for Meywa-Denki & Kayac YUREX") Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tomoki Sekiyama <tomoki.sekiyama@gmail.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: andreyknvl@google.com Cc: gregkh@linuxfoundation.org Cc: Alan Stern <stern@rowland.harvard.edu> Cc: syzkaller-bugs@googlegroups.com Cc: dtor@chromium.org Reported-by: syzbot+d1fedb1c1fdb07fca507@syzkaller.appspotmail.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190805111528.6758-1-suzuki.poulose@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:21 +02:00
Thomas Richter	0d7f710286	perf record: Fix module size on s390 commit 12a6d2940b5f02b4b9f71ce098e3bb02bc24a9ea upstream. On s390 the modules loaded in memory have the text segment located after the GOT and Relocation table. This can be seen with this output: [root@m35lp76 perf]# fgrep qeth /proc/modules qeth 151552 1 qeth_l2, Live 0x000003ff800b2000 ... [root@m35lp76 perf]# cat /sys/module/qeth/sections/.text 0x000003ff800b3990 [root@m35lp76 perf]# There is an offset of 0x1990 bytes. The size of the qeth module is 151552 bytes (0x25000 in hex). The location of the GOT/relocation table at the beginning of a module is unique to s390. commit 203d8a4aa6ed ("perf s390: Fix 'start' address of module's map") adjusts the start address of a module in the map structures, but does not adjust the size of the modules. This leads to overlapping of module maps as this example shows: [root@m35lp76 perf] # ./perf report -D 0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x25000) @ 0]: x /lib/modules/.../qeth.ko.xz 0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x8000) @ 0]: x /lib/modules/.../ip6_tables.ko.xz The module qeth.ko has an adjusted start address modified to b3990, but its size is unchanged and the module ends at 0x3ff800d8990. This end address overlaps with the next modules start address of 0x3ff800d85a0. When the size of the leading GOT/Relocation table stored in the beginning of the text segment (0x1990 bytes) is subtracted from module qeth end address, there are no overlaps anymore: 0x3ff800d8990 - 0x1990 = 0x0x3ff800d7000 which is the same as 0x3ff800b2000 + 0x25000 = 0x0x3ff800d7000. To fix this issue, also adjust the modules size in function arch__fix_module_text_start(). Add another function parameter named size and reduce the size of the module when the text segment start address is changed. Output after: 0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x23670) @ 0]: x /lib/modules/.../qeth.ko.xz 0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x7a60) @ 0]: x /lib/modules/.../ip6_tables.ko.xz Reported-by: Stefan Liebler <stli@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org Fixes: 203d8a4aa6ed ("perf s390: Fix 'start' address of module's map") Link: http://lkml.kernel.org/r/20190724122703.3996-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:21 +02:00
Adrian Hunter	708eac3bdb	perf db-export: Fix thread__exec_comm() commit 3de7ae0b2a1d86dbb23d0cb135150534fdb2e836 upstream. Threads synthesized from /proc have comms with a start time of zero, and not marked as "exec". Currently, there can be 2 such comms. The first is created by processing a synthesized fork event and is set to the parent's comm string, and the second by processing a synthesized comm event set to the thread's current comm string. In the absence of an "exec" comm, thread__exec_comm() picks the last (oldest) comm, which, in the case above, is the parent's comm string. For a main thread, that is very probably wrong. Use the second-to-last in that case. This affects only db-export because it is the only user of thread__exec_comm(). Example: $ sudo perf record -a -o pt-a-sleep-1 -e intel_pt//u -- sleep 1 $ sudo chown ahunter pt-a-sleep-1 Before: $ perf script -i pt-a-sleep-1 --itrace=bep -s tools/perf/scripts/python/export-to-sqlite.py pt-a-sleep-1.db branches calls $ sqlite3 -header -column pt-a-sleep-1.db 'select * from comm_threads_view' comm_id command thread_id pid tid ---------- ---------- ---------- ---------- ---------- 1 swapper 1 0 0 2 rcu_sched 2 10 10 3 kthreadd 3 78 78 5 sudo 4 15180 15180 5 sudo 5 15180 15182 7 kworker/4: 6 10335 10335 8 kthreadd 7 55 55 10 systemd 8 865 865 10 systemd 9 865 875 13 perf 10 15181 15181 15 sleep 10 15181 15181 16 kworker/3: 11 14179 14179 17 kthreadd 12 29376 29376 19 systemd 13 746 746 21 systemd 14 401 401 23 systemd 15 879 879 23 systemd 16 879 945 25 kthreadd 17 556 556 27 kworker/u1 18 14136 14136 28 kworker/u1 19 15021 15021 29 kthreadd 20 509 509 31 systemd 21 836 836 31 systemd 22 836 967 33 systemd 23 1148 1148 33 systemd 24 1148 1163 35 kworker/2: 25 17988 17988 36 kworker/0: 26 13478 13478 After: $ perf script -i pt-a-sleep-1 --itrace=bep -s tools/perf/scripts/python/export-to-sqlite.py pt-a-sleep-1b.db branches calls $ sqlite3 -header -column pt-a-sleep-1b.db 'select * from comm_threads_view' comm_id command thread_id pid tid ---------- ---------- ---------- ---------- ---------- 1 swapper 1 0 0 2 rcu_sched 2 10 10 3 kswapd0 3 78 78 4 perf 4 15180 15180 4 perf 5 15180 15182 6 kworker/4: 6 10335 10335 7 kcompactd0 7 55 55 8 accounts-d 8 865 865 8 accounts-d 9 865 875 10 perf 10 15181 15181 12 sleep 10 15181 15181 13 kworker/3: 11 14179 14179 14 kworker/1: 12 29376 29376 15 haveged 13 746 746 16 systemd-jo 14 401 401 17 NetworkMan 15 879 879 17 NetworkMan 16 879 945 19 irq/131-iw 17 556 556 20 kworker/u1 18 14136 14136 21 kworker/u1 19 15021 15021 22 kworker/u1 20 509 509 23 thermald 21 836 836 23 thermald 22 836 967 25 unity-sett 23 1148 1148 25 unity-sett 24 1148 1163 27 kworker/2: 25 17988 17988 28 kworker/0: 26 13478 13478 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: 65de51f93ebf ("perf tools: Identify which comms are from exec") Link: http://lkml.kernel.org/r/20190808064823.14846-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:20 +02:00
Thomas Richter	624c2b3696	perf record: Fix wrong size in perf_record_mmap for last kernel module commit 9ad4652b66f19a60f07e63b942b80b5c2d7465bf upstream. During work on perf report for s390 I ran into the following issue: 0 0x318 [0x78]: PERF_RECORD_MMAP -1/0: [0x3ff804d6990(0xfffffc007fb2966f) @ 0]: x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid module size for qeth_l2.ko (the s390 ethernet device driver). Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory. It turned out that this wrong size is created by the perf record command. What happens is this function call sequence from __cmd_record(): perf_session__new(): perf_session__create_kernel_maps(): machine__create_kernel_maps(): machine__create_modules(): Creates map for all loaded kernel modules. modules__parse(): Reads /proc/modules and extracts module name and load address (1st and last column) machine__create_module(): Called for every module found in /proc/modules. Creates a new map for every module found and enters module name and start address into the map. Since the module end address is unknown it is set to zero. This ends up with a kernel module map list sorted by module start addresses. All module end addresses are zero. Last machine__create_kernel_maps() calls function map_groups__fixup_end(). This function iterates through the maps and assigns each map entry's end address the successor map entry start address. The last entry of the map group has no successor, so ~0 is used as end to consume the remaining memory. Later __cmd_record calls function record__synthesize() which in turn calls perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules() to create PERF_REPORT_MMAP entries into the perf.data file. On s390 this results in the last module qeth_l2.ko (which has highest start address, see module table: [root@s8360047 perf]# cat /proc/modules qeth_l2 86016 1 - Live 0x000003ff804d6000 qeth 266240 1 qeth_l2, Live 0x000003ff80296000 ccwgroup 24576 1 qeth, Live 0x000003ff80218000 vmur 36864 0 - Live 0x000003ff80182000 qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000 [root@s8360047 perf]# ) to be the last entry and its map has an end address of ~0. When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko its start address and length is written. The length is calculated in line: event->mmap.len = pos->end - pos->start; and results in 0xffffffffffffffff - 0x3ff804d6990() = 0xfffffc007fb2966f () On s390 the module start address is actually determined by a __weak function named arch__fix_module_text_start() in machine__create_module(). I think this improvable. We can use the module size (2nd column of /proc/modules) to get each loaded kernel module size and calculate its end address. Only for map entries which do not have a valid end address (end is still zero) we can use the heuristic we have now, that is use successor start address or ~0. Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com> LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Daz <daniel.diaz@linaro.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:20 +02:00
Joerg Roedel	6c8f40d2c2	mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() commit 3f8fd02b1bf1d7ba964485a56f2f4b53ae88c167 upstream. On x86-32 with PTI enabled, parts of the kernel page-tables are not shared between processes. This can cause mappings in the vmalloc/ioremap area to persist in some page-tables after the region is unmapped and released. When the region is re-used the processes with the old mappings do not fault in the new mappings but still access the old ones. This causes undefined behavior, in reality often data corruption, kernel oopses and panics and even spontaneous reboots. Fix this problem by activly syncing unmaps in the vmalloc/ioremap area to all page-tables in the system before the regions can be re-used. References: https://bugzilla.suse.com/show_bug.cgi?id=1118689 Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:20 +02:00
Joerg Roedel	ffd85e35d6	x86/mm: Sync also unmappings in vmalloc_sync_all() commit 8e998fc24de47c55b47a887f6c95ab91acd4a720 upstream. With huge-page ioremap areas the unmappings also need to be synced between all page-tables. Otherwise it can cause data corruption when a region is unmapped and later re-used. Make the vmalloc_sync_one() function ready to sync unmappings and make sure vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD is found. Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-25 10:51:19 +02:00

1 2 3 4 5 ...

648962 Commits