linux

iv/linux

Author	SHA1	Message	Date
Liping Zhang	91af6ba7ff	netfilter: ebt_nflog: fix unexpected truncated packet "struct nf_loginfo li;" is a local variable, so we should set the flags to 0 explicitly, else, packets maybe truncated unexpectedly when copied to the userspace. Fixes: 7643507fe8b5 ("netfilter: xt_NFLOG: nflog-range does not truncate packets") Cc: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-29 18:47:02 +02:00
Liping Zhang	deaa0a976b	netfilter: nf_ct_dccp/sctp: fix memory leak after netns cleanup After running the following commands for a while, kmemleak reported that "1879 new suspected memory leaks" happened: # while : ; do ip netns add test ip netns delete test done unreferenced object 0xffff88006342fa38 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d598a>] dccp_init_net+0x8a/0x160 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... unreferenced object 0xffff88006342da58 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): 10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d6a9d>] sctp_init_net+0x5d/0x130 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... This is because we forgot to implement the get_net_proto for sctp and dccp, so we won't invoke the nf_ct_unregister_sysctl to free the ctl_table when do netns cleanup. Also note, we will fail to register the sysctl for dccp/sctp either due to the lack of get_net_proto. Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP") Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP") Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-29 18:47:01 +02:00
Mateusz Jurczyk	d2ecfa765d	Bluetooth: Add sockaddr length checks before accessing sa_family in bind and connect handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the Bluetooth sockets. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2017-06-29 14:37:57 +02:00
Tejun Heo	29e2dd0d56	bluetooth: remove WQ_MEM_RECLAIM from hci workqueues Bluetooth hci uses ordered HIGHPRI, MEM_RECLAIM workqueues. It's likely that the flags came from mechanical conversion from create_singlethread_workqueue(). Bluetooth shouldn't be depended upon for memory reclaim and the spurious MEM_RECLAIM flag can trigger the following warning. Remove WQ_MEM_RECLAIM and convert to alloc_ordered_workqueue() while at it. workqueue: WQ_MEM_RECLAIM hci0:hci_power_off is flushing !WQ_MEM_RECLAIM events:btusb_work ------------[ cut here ]------------ WARNING: CPU: 2 PID: 14231 at /home/brodo/local/kernel/git/linux/kernel/workqueue.c:2423 check_flush_dependency+0xb3/0x100 Modules linked in: CPU: 2 PID: 14231 Comm: kworker/u9:4 Not tainted 4.12.0-rc6+ #3 Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 Workqueue: hci0 hci_power_off task: ffff9432dad58000 task.stack: ffff986d43790000 RIP: 0010:check_flush_dependency+0xb3/0x100 RSP: 0018:ffff986d43793c90 EFLAGS: 00010086 RAX: 000000000000005a RBX: ffff943316810820 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000001 RBP: ffff986d43793cb0 R08: 0000000000000775 R09: ffffffff85bdd5c0 R10: 0000000000000040 R11: 0000000000000000 R12: ffffffff84d596e0 R13: ffff9432dad58000 R14: ffff94321c640320 R15: ffff9432dad58000 FS: 0000000000000000(0000) GS:ffff94331f500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007b8bca242000 CR3: 000000014f60a000 CR4: 00000000003406e0 Call Trace: flush_work+0x8a/0x1c0 ? flush_work+0x184/0x1c0 ? skb_free_head+0x21/0x30 __cancel_work_timer+0x124/0x1b0 ? hci_dev_do_close+0x2a4/0x4d0 cancel_work_sync+0x10/0x20 btusb_close+0x23/0x100 hci_dev_do_close+0x2ca/0x4d0 hci_power_off+0x1e/0x50 process_one_work+0x184/0x3e0 worker_thread+0x4a/0x3a0 ? preempt_count_sub+0x9b/0x100 ? preempt_count_sub+0x9b/0x100 kthread+0x125/0x140 ? process_one_work+0x3e0/0x3e0 ? __kthread_create_on_node+0x1a0/0x1a0 ? do_syscall_64+0x58/0xd0 ret_from_fork+0x27/0x40 Code: 00 75 bf 49 8b 56 18 48 8d 8b b0 00 00 00 48 81 c6 b0 00 00 00 4d 89 e0 48 c7 c7 20 23 6b 85 c6 05 83 cd 31 01 01 e8 bf c4 0c 00 <0f> ff eb 93 80 3d 74 cd 31 01 00 75 a5 65 48 8b 04 25 00 c5 00 ---[ end trace b88fd2f77754bfec ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2017-06-29 14:36:38 +02:00
Paolo Abeni	67a51780ae	ipv6: udp: leverage scratch area helpers The commit b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") leveraged the scratched area helpers for UDP v4 but I forgot to update accordingly the IPv6 code path. This change extends the scratch area usage to the IPv6 code, synching the two implementations and giving some performance benefit. IPv6 is again almost on the same level of IPv4, performance-wide. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-27 15:43:57 -04:00
Paolo Abeni	b26bbdae46	udp: move scratch area helpers into the include file So that they can be later used by the IPv6 code, too. Also lift the comments a bit. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-27 15:43:56 -04:00
Dave Watson	d97af30f61	tcp: fix null ptr deref in getsockopt(..., TCP_ULP, ...) If icsk_ulp_ops is unset, it dereferences a null ptr. Add a null ptr check. BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline] BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057 Read of size 4 at addr 0000000000000020 by task syz-executor1/15452 Signed-off-by: Dave Watson <davejwatson@fb.com> Reported-by: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-27 15:39:11 -04:00
Eric Dumazet	6f64ec7451	net: prevent sign extension in dev_get_stats() Similar to the fix provided by Dominik Heidler in commit 9b3dc0a17d73 ("l2tp: cast l2tp traffic counter to unsigned") we need to take care of 32bit kernels in dev_get_stats(). When using atomic_long_read(), we add a 'long' to u64 and might misinterpret high order bit, unless we cast to unsigned. Fixes: caf586e5f23ce ("net: add a core netdev->rx_dropped counter") Fixes: 015f0688f57ca ("net: net: add a core netdev->tx_dropped counter") Fixes: 6e7333d315a76 ("net: add rx_nohandler stat counter") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-27 14:45:12 -04:00
Jeffy Chen	5da8e47d84	Bluetooth: hidp: fix possible might sleep error in hidp_session_thread It looks like hidp_session_thread has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Tested-by: AL Yu-Chen Cho <acho@suse.com> Tested-by: Rohit Vaswani <rvaswani@nvidia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2017-06-27 19:32:11 +02:00
Jeffy Chen	f06d977309	Bluetooth: cmtp: fix possible might sleep error in cmtp_session It looks like cmtp_session has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Reviewed-by: AL Yu-Chen Cho <acho@suse.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2017-06-27 19:32:11 +02:00
Jeffy Chen	25717382c1	Bluetooth: bnep: fix possible might sleep error in bnep_session It looks like bnep_session has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Reviewed-by: AL Yu-Chen Cho <acho@suse.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2017-06-27 19:32:11 +02:00
Matthias Schiffer	d116ffc770	net: add netlink_ext_ack argument to rtnl_link_ops.slave_validate Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	17dd0ec470	net: add netlink_ext_ack argument to rtnl_link_ops.slave_changelink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	a8b8a889e3	net: add netlink_ext_ack argument to rtnl_link_ops.validate Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	ad744b223c	net: add netlink_ext_ack argument to rtnl_link_ops.changelink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:22 -04:00
Matthias Schiffer	7a3f4a1851	net: add netlink_ext_ack argument to rtnl_link_ops.newlink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-26 23:13:21 -04:00
Marcelo Ricardo Leitner	a02d036c02	sctp: adjust ssthresh when transport is idle RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd because otherwise it could cause the transport to lock into congestion avoidance phase specially if ssthresh was previously reduced by some packet drop, leading to poor performance. The Errata says to adjust ssthresh to cwnd only once, though the same goal is achieved by updating it every time we update cwnd too. The caveat is that we could take longer to get back up to speed but that should be compensated by the fact that we don't adjust on RTO basis (as RFC says) but based on Heartbeats, which are usually way longer. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 14:43:53 -04:00
Marcelo Ricardo Leitner	4ccbd0b0b9	sctp: adjust cwnd increase in Congestion Avoidance phase RFC4960 Errata 3.26 identified that at the same time RFC4960 states that cwnd should never grow more than 1*MTU per RTT, Section 7.2.2 was underspecified and as described could allow increasing cwnd more than that. This patch updates it so partial_bytes_acked is maxed to cwnd if flight_size doesn't reach cwnd, protecting it from such case. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 14:43:53 -04:00
Marcelo Ricardo Leitner	e56f777af8	sctp: allow increasing cwnd regardless of ctsn moving or not As per RFC4960 Errata 3.22, this condition is not needed anymore as it could cause the partial_bytes_acked to not consider the TSNs acked in the Gap Ack Blocks although they were received by the peer successfully. This patch thus drops the check for new Cumulative TSN Ack Point, leaving just the flight_size < cwnd one. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 14:43:53 -04:00
Marcelo Ricardo Leitner	d0b53f4097	sctp: update order of adjustments of partial_bytes_acked and cwnd RFC4960 Errata 3.12 says RFC4960 is unclear about the order of adjustments applied to partial_bytes_acked and cwnd in the congestion avoidance phase, and that the actual order should be: partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd is increased by MTU. We were first increasing cwnd, and then subtracting the new value pba, which leads to a different result as pba is smaller than what it should and could cause cwnd to not grow as much. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 14:43:53 -04:00
WANG Cong	d747a7a51b	tcp: reset sk_rx_dst in tcp_disconnect() We have to reset the sk->sk_rx_dst when we disconnect a TCP connection, because otherwise when we re-connect it this dst reference is simply overridden in tcp_finish_connect(). This fixes a dst leak which leads to a loopback dev refcnt leak. It is a long-standing bug, Kevin reported a very similar (if not same) bug before. Thanks to Andrei for providing such a reliable reproducer which greatly narrows down the problem. Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Reported-by: Andrei Vagin <avagin@gmail.com> Reported-by: Kevin Xu <kaiwen.xu@hulu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 12:23:07 -04:00
Wei Wang	85cb73ff9b	net: ipv6: reset daddr and dport in sk if connect() fails In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->dport if error occurs. In udp_v6_early_demux(), check for sk_state to make sure it is in TCP_ESTABLISHED state. Together, it makes sure unconnected UDP socket won't be considered as a valid candidate for early demux. v3: add TCP_ESTABLISHED state check in udp_v6_early_demux() v2: fix compilation error Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 11:46:56 -04:00
Mateusz Jurczyk	e3c42b61ff	af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the AF_IUCV socket. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in iucv_sock_bind()") Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> [jwi: removed unneeded null-check for addr] Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 11:42:58 -04:00
Hans Wippel	2e56c26b39	net/iucv: improve endianness handling Use proper endianness conversion for an skb protocol assignment. Given that IUCV is only available on big endian systems (s390), this simply avoids an endianness warning reported by sparse. Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 11:42:58 -04:00
Jakub Kicinski	3fcece12bc	net: store port/representator id in metadata_dst Switches and modern SR-IOV enabled NICs may multiplex traffic from Port representators and control messages over single set of hardware queues. Control messages and muxed traffic may need ordered delivery. Those requirements make it hard to comfortably use TC infrastructure today unless we have a way of attaching metadata to skbs at the upper device. Because single set of queues is used for many netdevs stopping TC/sched queues of all of them reliably is impossible and lower device has to retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on the fastpath. This patch attempts to enable port/representative devs to attach metadata to skbs which carry port id. This way representatives can be queueless and all queuing can be performed at the lower netdev in the usual way. Traffic arriving on the port/representative interfaces will be have metadata attached and will subsequently be queued to the lower device for transmission. The lower device should recognize the metadata and translate it to HW specific format which is most likely either a special header inserted before the network headers or descriptor/metadata fields. Metadata is associated with the lower device by storing the netdev pointer along with port id so that if TC decides to redirect or mirror the new netdev will not try to interpret it. This is mostly for SR-IOV devices since switches don't have lower netdevs today. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-25 11:42:01 -04:00
Ingo Molnar	1bc3cd4dfa	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-24 08:57:20 +02:00
Dan Carpenter	ac55cd6193	tls: return -EFAULT if copy_to_user() fails The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT here. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:19:27 -04:00
David S. Miller	93bbbfbb4a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-06-23 1) Use memdup_user to spmlify xfrm_user_policy. From Geliang Tang. 2) Make xfrm_dev_register static to silence a sparse warning. From Wei Yongjun. 3) Use crypto_memneq to check the ICV in the AH protocol. From Sabrina Dubroca. 4) Remove some unused variables in esp6. From Stephen Hemminger. 5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port. From Antony Antony. 6) Include the UDP encapsulation port to km_migrate announcements. From Antony Antony. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:17:31 -04:00
David S. Miller	43b786c676	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-06-23 1) Fix xfrm garbage collecting when unregistering a netdevice. From Hangbin Liu. 2) Fix NULL pointer derefernce when exiting a network namespace. From Hangbin Liu. 3) Fix some error codes in pfkey to prevent a NULL pointer derefernce. From Dan Carpenter. 4) Fix NULL pointer derefernce on allocation failure in pfkey. From Dan Carpenter. 5) Adjust IPv6 payload_len to include extension headers. Otherwise we corrupt the packets when doing ESP GRO on transport mode. From Yossi Kuperman. 6) Set nhoff to the proper offset of the IPv6 nexthdr when doing ESP GRO. From Yossi Kuperman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:11:26 -04:00
Jakub Kicinski	926f38e974	tcp: fix out-of-bounds access in ULP sysctl KASAN reports out-of-bound access in proc_dostring() coming from proc_tcp_available_ulp() because in case TCP ULP list is empty the buffer allocated for the response will not have anything printed into it. Set the first byte to zero to avoid strlen() going out-of-bounds. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:10:05 -04:00
WANG Cong	0ccc22f425	sit: use __GFP_NOWARN for user controlled allocation The memory allocation size is controlled by user-space, if it is too large just fail silently and return NULL, not to mention there is a fallback allocation later. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:08:40 -04:00
Yonghong Song	239946314e	bpf: possibly avoid extra masking for narrower load in verifier Commit 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") permits narrower load for certain ctx fields. The commit however will already generate a masking even if the prog-specific ctx conversion produces the result with narrower size. For example, for __sk_buff->protocol, the ctx conversion loads the data into register with 2-byte load. A narrower 2-byte load should not generate masking. For __sk_buff->vlan_present, the conversion function set the result as either 0 or 1, essentially a byte. The narrower 2-byte or 1-byte load should not generate masking. To avoid unnecessary masking, prog-specific _is_valid_access now passes converted_op_size back to verifier, which indicates the valid data width after perceived future conversion. Based on this information, verifier is able to avoid unnecessary marking. Since we want more information back from prog-specific _is_valid_access checking, all of them are packed into one data structure for more clarity. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:04:11 -04:00
Jakub Kicinski	ce158e580a	xdp: add reporting of offload mode Extend the XDP_ATTACHED_* values to include offloaded mode. Let drivers report whether program is installed in the driver or the HW by changing the prog_attached field from bool to u8 (type of the netlink attribute). Exploit the fact that the value of XDP_ATTACHED_DRV is 1, therefore since all drivers currently assign the mode with double negation: mode = !!xdp_prog; no drivers have to be modified. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 13:42:20 -04:00
Jakub Kicinski	ee5d032f7d	xdp: add HW offload mode flag for installing programs Add an installation-time flag for requesting that the program be installed only if it can be offloaded to HW. Internally new command for ndo_xdp is added, this way we avoid putting checks into drivers since they all return -EINVAL on an unknown command. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 13:42:19 -04:00
Jakub Kicinski	32d602771b	xdp: pass XDP flags into install handlers Pass XDP flags to the xdp ndo. This will allow drivers to look at the mode flags and make decisions about offload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 13:42:18 -04:00
Michal Kubeček	a5cb659bbc	net: account for current skb length when deciding about UFO Our customer encountered stuck NFS writes for blocks starting at specific offsets w.r.t. page boundary caused by networking stack sending packets via UFO enabled device with wrong checksum. The problem can be reproduced by composing a long UDP datagram from multiple parts using MSG_MORE flag: sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 3000, 0, ...); Assume this packet is to be routed via a device with MTU 1500 and NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), this condition is tested (among others) to decide whether to call ip_ufo_append_data(): ((length + fragheaderlen) > mtu) \|\| (skb && skb_is_gso(skb)) At the moment, we already have skb with 1028 bytes of data which is not marked for GSO so that the test is false (fragheaderlen is usually 20). Thus we append second 1000 bytes to this skb without invoking UFO. Third sendto(), however, has sufficient length to trigger the UFO path so that we end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() uses udp_csum() to calculate the checksum but that assumes all fragments have correct checksum in skb->csum which is not true for UFO fragments. When checking against MTU, we need to add skb->len to length of new segment if we already have a partially filled skb and fragheaderlen only if there isn't one. In the IPv6 case, skb can only be null if this is the first segment so that we have to use headersize (length of the first IPv6 header) rather than fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 13:29:38 -04:00
Paolo Abeni	9bd780f5e0	udp: fix poll() Michael reported an UDP breakage caused by the commit b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue"). The function __first_packet_length() can update the checksum bits of the pending skb, making the scratched area out-of-sync, and setting skb->csum, if the skb was previously in need of checksum validation. On later recvmsg() for such skb, checksum validation will be invoked again - due to the wrong udp_skb_csum_unnecessary() value - and will fail, causing the valid skb to be dropped. This change addresses the issue refreshing the scratch area in __first_packet_length() after the possible checksum update. Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 11:18:43 -04:00
Mateusz Jurczyk	f6a5885fc4	NFC: Add sockaddr length checks before accessing sa_family in bind handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() handlers of the AF_NFC socket. Since the syscall doesn't enforce a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:38:31 +02:00
Gustavo A. R. Silva	03036184e9	nfc: nci: remove unnecessary null check Remove unnecessary NULL check for pointer conn_info. conn_info is set in list_for_each_entry() using container_of(), which is never NULL. Addresses-Coverity-ID: 1362349 Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:34:18 +02:00
Mateusz Jurczyk	a0323b979f	nfc: Ensure presence of required attributes in the activate_target handler Check that the NFC_ATTR_TARGET_INDEX and NFC_ATTR_PROTOCOLS attributes (in addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to accessing them. This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:28:39 +02:00
Mateusz Jurczyk	608c4adfca	nfc: Fix the sockaddr length sanitization in llcp_sock_connect Fix the sockaddr length verification in the connect() handler of NFC/LLCP sockets, to compare against the size of the actual structure expected on input (sockaddr_nfc_llcp) instead of its shorter version (sockaddr_nfc). Both structures are defined in include/uapi/linux/nfc.h. The fields specific to the _llcp extended struct are as follows: 276 __u8 dsap; /* Destination SAP, if known / 277 __u8 ssap; / Source SAP to be bound to / 278 char service_name[NFC_LLCP_MAX_SERVICE_NAME]; / Service name URI */; 279 size_t service_name_len; If the caller doesn't provide a sufficiently long sockaddr buffer, these fields remain uninitialized (and they currently originate from the stack frame of the top-level sys_connect handler). They are then copied by llcp_sock_connect() into internal storage (nfc_llcp_sock structure), and could be subsequently read back through the user-mode getsockname() function (handled by llcp_sock_getname()). This would result in the disclosure of up to ~70 uninitialized bytes from the kernel stack to user-mode clients capable of creating AFC_NFC sockets. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:26:19 +02:00
Mark Greer	1b609e4384	NFC: digital: NFC-DEP Target WT(nfcdep,max) is now 14 Version 1.1 of the NFC Forum's NFC Digital Protocol Technical Specification dated 2014-07-14 specifies that the NFC-DEP Protocol's Target WT(nfcdep,max) value is 14. In version 1.0 it was 8 so change the value in the Linux NFC-DEP Protocol code accordingly. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:19:59 +02:00
Mark Greer	7f9f171336	NFC: digital: NFC-A SEL_RES must be one byte Section 4.8.2 (SEL_RES Response) of NFC Forum's NFC Digital Protocol Technical Specification dated 2010-11-17 clearly states that the size of a SEL_RES Response is one byte. Enforce this restriction in the code. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:19:59 +02:00
Markus Elfring	dcfca27faf	NFC: digital: Delete an error message for memory allocation failure Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:14:12 +02:00
Markus Elfring	ae72c9910b	NFC: digital: Improve a size determination in four functions Replace the specification of four data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-23 00:07:04 +02:00
Paolo Abeni	4b943faedf	udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb() very similar to commit dd99e425be23 ("udp: prefetch rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache miss when the BH is bottle-neck for UDP over ipv6 packet processing, e.g. for small packets when a single RX NIC ingress queue is in use. Performances under flood when multiple NIC RX queues used are unaffected, but when a single NIC rx queue is in use, this gives ~8% performance improvement. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-22 13:44:04 -04:00
WANG Cong	60abc0be96	ipv6: avoid unregistering inet6_dev for loopback The per netns loopback_dev->ip6_ptr is unregistered and set to NULL when its mtu is set to smaller than IPV6_MIN_MTU, this leads to that we could set rt->rt6i_idev NULL after a rt6_uncached_list_flush_dev() and then crash after another call. In this case we should just bring its inet6_dev down, rather than unregistering it, at least prior to commit 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") we always override the case for loopback. Thanks a lot to Andrey for finding a reliable reproducer. Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-22 13:21:44 -04:00
Sowmini Varadhan	c14b036681	rds: tcp: set linger to 1 when unloading a rds-tcp If we are unloading the rds_tcp module, we can set linger to 1 and drop pending packets to accelerate reconnect. The peer will end up resetting the connection based on new generation numbers of the new incarnation, so hanging on to unsent TCP packets via linger is mostly pointless in this case. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Jenny Xu <jenny.x.xu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-22 11:34:04 -04:00
Sowmini Varadhan	69b92b5b74	rds: tcp: send handshake ping-probe from passive endpoint The RDS handshake ping probe added by commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg() before the first data packet is sent to a peer. If the conversation is not bidirectional (i.e., one side is always passive and never invokes rds_sendmsg()) and the passive side restarts its rds_tcp module, a new HS ping probe needs to be sent, so that the number of paths can be re-established. This patch achieves that by sending a HS ping probe from rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done a handshake probe with this peer yet). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Jenny Xu <jenny.x.xu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-22 11:34:04 -04:00
Chenbo Feng	8fac365f63	tcp: Add a tcp_filter hook before handle ack packet Currently in both ipv4 and ipv6 code path, the ack packet received when sk at TCP_NEW_SYN_RECV state is not filtered by socket filter or cgroup filter since it is handled from tcp_child_process and never reaches the tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hooks here can make sure all the ingress tcp packet can be correctly filtered. Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-22 11:13:56 -04:00

... 3 4 5 6 7 ...

47071 Commits