linux

iv/linux

Author	SHA1	Message	Date
Eyal Birger	d549699048	net/packet: fix packet receive on L3 devices without visible hard header In the patchset merged by commit b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:29:36 -08:00
Jakub Kicinski	cc69837fca	net: don't include ethtool.h from netdevice.h linux/netdevice.h is included in very many places, touching any of its dependecies causes large incremental builds. Drop the linux/ethtool.h include, linux/netdevice.h just needs a forward declaration of struct ethtool_ops. Fix all the places which made use of this implicit include. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:27:04 -08:00
Kurt Kanzenbach	8551fad63c	net: dsa: tag_hellcreek: Cleanup includes Remove unused and add needed includes. No functional change. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:57:21 -08:00
Stefano Garzarella	3fe356d58e	vsock/virtio: discard packets only when socket is really closed Starting from commit 8692cefc433f ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt"), we discard packets in virtio_transport_recv_pkt() if the socket has been released. When the socket is connected, we schedule a delayed work to wait the RST packet from the other peer, also if SHUTDOWN_MASK is set in sk->sk_shutdown. This is done to complete the virtio-vsock shutdown algorithm, releasing the port assigned to the socket definitively only when the other peer has consumed all the packets. If we discard the RST packet received, the socket will be closed only when the VSOCK_CLOSE_TIMEOUT is reached. Sergio discovered the issue while running ab(1) HTTP benchmark using libkrun [1] and observing a latency increase with that commit. To avoid this issue, we discard packet only if the socket is really closed (SOCK_DONE flag is set). We also set SOCK_DONE in virtio_transport_release() when we don't need to wait any packets from the other peer (we didn't schedule the delayed work). In this case we remove the socket from the vsock lists, releasing the port assigned. [1] https://github.com/containers/libkrun Fixes: 8692cefc433f ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt") Cc: justin.he@arm.com Reported-by: Sergio Lopez <slp@redhat.com> Tested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20201120104736.73749-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:36:29 -08:00
Ricardo Dias	01770a1661	tcp: fix race condition when creating child sockets from syncookies When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:32:33 -08:00
Paul Moore	3df98d7921	lsm,selinux: pass flowi_common instead of flowi to the LSM hooks As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-11-23 18:36:21 -05:00
Jason Gunthorpe	ed92f6a52b	Linux 5.10-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+69egeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGTSYH/ifRBlaxy5UiHFc0 2zdR7pkjWrYfDTTT3sazIAhdlzzcfnkUqgFxOP45F4ZIqeTzunH3sUY+5UlT9IX7 liUgnLxQ/1R9Gx8kPGQfu+tLCey78xVFydGsqJoW9sPRw2R+apMdGGa/lOrk+OXz DXIN+dDnGFqwCCNJpK+rxQQhFf++IPpSI8z6Y23moOFhsDZrEziHuVFy2FGyRM6z prZ/us/tcobE8ptCk1RmOxLoJ1DR6UxpA2vLimTE+JD8siOsSWPbjE0KudnWCnd5 BLqIjrsPJbSxyuzzK3v9dnO5wMv7tMDuMIuYM/MQTXDttNwtsqt/aP6gdnUCym7N 5eHEj5g= =MuO1 -----END PGP SIGNATURE----- Merge tag 'v5.10-rc5' into rdma.git for-next For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-23 16:50:59 -04:00
David Howells	d7d775b1ff	rxrpc: Ask the security class how much space to allow in a packet Ask the security class how much header and trailer space to allow for when allocating a packet, given how much data is remaining. This will allow the rxgk security class to stick both a trailer in as well as a header as appropriate in the future. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 19:53:11 +00:00
David Howells	ceff522db2	rxrpc: rxkad: Don't use pskb_pull() to advance through the response packet In the rxkad security class, don't use pskb_pull() to advance through the contents of the response packet. There's no point, especially as the next and last access to the skbuff still has to allow for the wire header in the offset (which we didn't advance over). Better to just add the displacement to the next offset. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	521bb3049c	rxrpc: Organise connection security to use a union Organise the security information in the rxrpc_connection struct to use a union to allow for different data for different security classes. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	f4bdf3d683	rxrpc: Don't reserve security header in Tx DATA skbuff Insert the security header into the skbuff representing a DATA packet to be transmitted rather than using skb_reserve() when the packet is allocated. This makes it easier to apply crypto that spans the security header and the data, particularly in the upcoming RxGK class where we have a common encrypt-and-checksum function that is used in a number of circumstances. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	8d47a43c48	rxrpc: Merge prime_packet_security into init_connection_security Merge the ->prime_packet_security() into the ->init_connection_security() hook as they're always called together. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	177b898966	rxrpc: Fix example key name in a comment Fix an example of an rxrpc key name in a comment. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	9a0e6464f4	rxrpc: Ignore unknown tokens in key payload unless no known tokens When parsing a payload for an rxrpc-type key, ignore any tokens that are not of a known type and don't give an error for them - unless there are no tokens of a known type. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	4c20c33340	rxrpc: Make the parsing of xdr payloads more coherent Make the parsing of xdr-encoded payloads, as passed to add_key, more coherent. Shuttling back and forth between various variables was a bit hard to follow. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	d5953f6543	rxrpc: Allow security classes to give more info on server keys Allow a security class to give more information on an rxrpc_s-type key when it is viewed in /proc/keys. This will allow the upcoming RxGK security class to show the enctype name here. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	d2ae4e9182	rxrpc: Don't leak the service-side session key to userspace Don't let someone reading a service-side rxrpc-type key get access to the session key that was exchanged with the client. The server application will, at some point, need to be able to read the information in the ticket, but this probably shouldn't include the key material. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	12da59fcab	rxrpc: Hand server key parsing off to the security class Hand responsibility for parsing a server key off to the security class. We can determine which class from the description. This is necessary as rxgk server keys have different lookup requirements and different content requirements (dependent on crypto type) to those of rxkad server keys. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	ca7fb10059	rxrpc: Split the server key type (rxrpc_s) into its own file Split the server private key type (rxrpc_s) out into its own file rather than mingling it with the authentication/client key type (rxrpc) since they don't really bear any relation. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	ec832bd06d	rxrpc: Don't retain the server key in the connection Don't retain a pointer to the server key in the connection, but rather get it on demand when the server has to deal with a response packet. This is necessary to implement RxGK (GSSAPI-mediated transport class), where we can't know which key we'll need until we've challenged the client and got back the response. This also means that we don't need to do a key search in the accept path in softirq mode. Also, whilst we're at it, allow the security class to ask for a kvno and encoding-type variant of a server key as RxGK needs different keys for different encoding types. Keys of this type have an extra bit in the description: "<service-id>:<security-index>:<kvno>:<enctype>" Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	41057ebde0	rxrpc: Support keys with multiple authentication tokens rxrpc-type keys can have multiple tokens attached for different security classes. Currently, rxrpc always picks the first one, whether or not the security class it indicates is supported. Add preliminary support for choosing which security class will be used (this will need to be directed from a higher layer) and go through the tokens to find one that's supported. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	0727d3ec38	rxrpc: List the held token types in the key description in /proc/keys When viewing an rxrpc-type key through /proc/keys, display a list of held token types. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
David Howells	8a5dc32115	rxrpc: Remove the rxk5 security class as it's now defunct Remove the rxrpc rxk5 security class as it's now defunct and nothing uses it anymore. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:29 +00:00
Jens Axboe	b713c195d5	net: provide __sys_shutdown_sock() that takes a socket No functional changes in this patch, needed to provide io_uring support for shutdown(2). Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-11-23 09:15:15 -07:00
Marek Majtyka	178648916e	xsk: Fix incorrect netdev reference count Fix incorrect netdev reference count in xsk_bind operation. Incorrect reference count of the device appears when a user calls bind with the XDP_ZEROCOPY flag on an interface which does not support zero-copy. In such a case, an error is returned but the reference count is not decreased. This change fixes the fault, by decreasing the reference count in case of such an error. The problem being corrected appeared in '162c820ed896' for the first time, and the code was moved to new file location over the time with commit 'c2d3d6a47462'. This specific patch applies to all version starting from 'c2d3d6a47462'. The same solution should be applied but on different file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev) for versions from '162c820ed896' to 'c2d3d6a47462' excluded. Fixes: 162c820ed896 ("xdp: hold device for umem regardless of zero-copy mode") Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201120151443.105903-1-marekx.majtyka@intel.com	2020-11-23 13:19:33 +01:00
Yejune Deng	988187e881	ipvs: replace atomic_add_return() atomic_inc_return() looks better Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-22 13:45:52 +01:00
Randy Dunlap	fd2d6bc4c2	netfilter: nft_reject_bridge: fix build errors due to code movement Fix build errors in net/bridge/netfilter/nft_reject_bridge.ko by selecting NF_REJECT_IPV4, which provides the missing symbols. ERROR: modpost: "nf_reject_skb_v4_tcp_reset" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! ERROR: modpost: "nf_reject_skb_v4_unreach" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! Fixes: fa538f7cf05a ("netfilter: nf_reject: add reject skbuff creation helpers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-22 13:44:51 +01:00
Jakub Kicinski	5e08723967	linux-can-next-for-5.11-20201120 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl+3q+gTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqYdjB/46Qv583QBIbM8YaXN+XTSfxSufp0Ku G0R88mJKdOalgeiD58TaZ3hAegjOW44nCMmZP64mQB7XNvRiyNk8j69VDsRy+4Zy r86Hi0TIlpeM5oNHb84SpvIO01elmuYNlaoU9t7NQV5YlKA2Lq+kcmMLGa7Z1zVk IZ2KF2eF2V83p/4A5t1cgQV6LRlDBgAeoe/9pyG0mNX3Fv8lI3WIqjRkuevSm6iJ XMEuc7kjUZ8Yd59Vvnipm89teD5lCyIpxeJ2hrOsiUlkwyosdCxewMeohJ596a2L t5DoiT+3nJLOBQ/IKCYe5RoCf7sFZ8nfjqcNYeOm7Yw1CbXg9KwGZTU7 =BedD -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.11-20201120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-11-20 The first patch is by Yegor Yefremov and he improves the j1939 documentaton by adding tables for the CAN identifier and its fields. Then there are 8 patches by Oliver Hartkopp targeting the CAN driver infrastructure and drivers. These add support for optional DLC element to the Classical CAN frame structure. See patch ea7800565a12 ("can: add optional DLC element to Classical CAN frame structure") for details. Oliver's last patch adds len8_dlc support to several drivers. Stefan Mätje provides a patch to add len8_dlc support to the esd_usb2 driver. The next patch is by Oliver Hartkopp, too and adds support for modification of Classical CAN DLCs to CAN GW sockets. The next 3 patches target the nxp,flexcan DT bindings. One patch by my adds the missing uint32 reference to the clock-frequency property. Joakim Zhang's patches fix the fsl,clk-source property and add the IMX_SC_R_CAN() macro to the imx firmware header file, which will be used in the flexcan driver later. Another patch by Joakim Zhang prepares the flexcan driver for SCU based stop-mode, by giving the existing, GPR based stop-mode, a _GPR postfix. The next 5 patches are by me, target the flexcan driver, and clean up the .ndo_open and .ndo_stop callbacks. These patches try to fix a sporadically hanging flexcan_close() during simultanious ifdown, sending of CAN messages and probably open CAN bus. I was never able to reproduce, but these seem to fix the problem at the reporting user. As these changes are rather big, I'd like to mainline them via net-next/master. The next patches are by Jimmy Assarsson and Christer Beskow, they add support for new USB devices to the existing kvaser_usb driver. The last patch is by Kaixu Xia and simplifies the return in the mcp251xfd_chip_softreset() function in the mcp251xfd driver. * tag 'linux-can-next-for-5.11-20201120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (25 commits) can: mcp251xfd: remove useless code in mcp251xfd_chip_softreset can: kvaser_usb: Add new Kvaser hydra devices can: kvaser_usb: kvaser_usb_hydra: Add support for new device variant can: kvaser_usb: Add new Kvaser Leaf v2 devices can: kvaser_usb: Add USB_{LEAF,HYDRA}_PRODUCT_ID_END defines can: flexcan: flexcan_close(): change order if commands to properly shut down the controller can: flexcan: flexcan_open(): completely initialize controller before requesting IRQ can: flexcan: flexcan_rx_offload_setup(): factor out mailbox and rx-offload setup into separate function can: flexcan: move enabling/disabling of interrupts from flexcan_chip_{start,stop}() to callers can: flexcan: factor out enabling and disabling of interrupts into separate function can: flexcan: rename macro FLEXCAN_QUIRK_SETUP_STOP_MODE -> FLEXCAN_QUIRK_SETUP_STOP_MODE_GPR dt-bindings: firmware: add IMX_SC_R_CAN(x) macro for CAN dt-bindings: can: fsl,flexcan: fix fsl,clk-source property dt-bindings: can: fsl,flexcan: add uint32 reference to clock-frequency property can: gw: support modification of Classical CAN DLCs can: drivers: add len8_dlc support for esd_usb2 CAN adapter can: drivers: add len8_dlc support for various CAN adapters can: drivers: introduce helpers to access Classical CAN DLC values can: update documentation for DLC usage in Classical CAN can: rename CAN FD related can_len2dlc and can_dlc2len helpers ... ==================== Link: https://lore.kernel.org/r/20201120133318.3428231-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:58:12 -08:00
Julian Wiedmann	c5dab0941f	net/af_iucv: set correct sk_protocol for child sockets Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*), instead of the PF_IUCV protocol that the parent was created with in iucv_sock_create(). We're currently not using sk->sk_protocol ourselves, so this shouldn't have much impact (except eg. getting the output in skb_dump() right). Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:43:45 -08:00
Heiner Kallweit	7609ecb2ed	net: bridge: switch to net core statistics counters handling Use netdev->tstats instead of a member of net_bridge for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:40:50 -08:00
Alexander Duyck	55472017a4	tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control When setting congestion control via a BPF program it is seen that the SYN/ACK for packets within a given flow will not include the ECT0 flag. A bit of simple printk debugging shows that when this is configured without BPF we will see the value INET_ECN_xmit value initialized in tcp_assign_congestion_control however when we configure this via BPF the socket is in the closed state and as such it isn't configured, and I do not see it being initialized when we transition the socket into the listen state. The result of this is that the ECT0 bit is configured based on whatever the default state is for the socket. Any easy way to reproduce this is to monitor the following with tcpdump: tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca Without this patch the SYN/ACK will follow whatever the default is. If dctcp all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0 will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK bit matches the value seen on the other packets in the given stream. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:09:47 -08:00
Alexander Duyck	861602b577	tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header An issue was recently found where DCTCP SYN/ACK packets did not have the ECT bit set in the L3 header. A bit of code review found that the recent change referenced below had gone though and added a mask that prevented the ECN bits from being populated in the L3 header. This patch addresses that by rolling back the mask so that it is only applied to the flags coming from the incoming TCP request instead of applying it to the socket tos/tclass field. Doing this the ECT bits were restored in the SYN/ACK packets in my testing. One thing that is not addressed by this patch set is the fact that tcp_reflect_tos appears to be incompatible with ECN based congestion avoidance algorithms. At a minimum the feature should likely be documented which it currently isn't. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:09:47 -08:00
Paolo Abeni	ea4ca586b1	mptcp: refine MPTCP-level ack scheduling Send timely MPTCP-level ack is somewhat difficult when the insertion into the msk receive level is performed by the worker. It needs TCP-level dup-ack to notify the MPTCP-level ack_seq increase, as both the TCP-level ack seq and the rcv window are unchanged. We can actually avoid processing incoming data with the worker, and let the subflow or recevmsg() send ack as needed. When recvmsg() moves the skbs inside the msk receive queue, the msk space is still unchanged, so tcp_cleanup_rbuf() could end-up skipping TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked, a known amount of bytes is going to be consumed soon: we update rcv wnd computation taking them in account. Additionally we need to explicitly trigger tcp_cleanup_rbuf() when recvmsg() consumes a significant amount of the receive buffer. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:25 -08:00
Florian Westphal	fa3fe2b150	mptcp: track window announced to peer OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:25 -08:00
Geliang Tang	84dfe3677a	mptcp: send out dedicated ADD_ADDR packet When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets. It will not fit when other MPTCP suboptions are included in this packet, e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only ADD_ADDR suboption, no other MPTCP suboptions. In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR. If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet. In mptcp_established_options_add_addr, we check whether this is a pure ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions in this packet, only put ADD_ADDR suboption in it. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:25 -08:00
Geliang Tang	d91d322a72	mptcp: change add_addr_signal type This patch changed the 'add_addr_signal' type from bool to char, so that we could encode the addr type there. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:25 -08:00
Paolo Abeni	0397c6d85f	mptcp: keep unaccepted MPC subflow into join list This will simplify all operation dealing with subflows before accept time (e.g. data fin processing, add_addr). The join list is already flushed by mptcp_stream_accept() before returning the newly created msk to the user space. This also fixes an potential bug present into the old code: conn_list was manipulated without helding the msk lock in mptcp_stream_accept(). Tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:25 -08:00
Florian Westphal	860975c6f8	mptcp: skip to next candidate if subflow has unacked data In case a subflow path is blocked, MPTCP-level retransmit may not take place anymore because such subflow is likely to have unacked data left in its write queue. Ignore subflows that have experienced loss and test next candidate. Fixes: 3b1d6210a95773691 ("mptcp: implement and use MPTCP-level retransmission") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:24 -08:00
Paolo Abeni	26aa231439	mptcp: fix state tracking for fallback socket We need to cope with some more state transition for fallback sockets, or could still end-up moving to TCP_CLOSE too early and avoid spooling some pending data Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:24 -08:00
Paolo Abeni	b2771d2419	mptcp: drop WORKER_RUNNING status bit Only mptcp_close() can actually cancel the workqueue, no need to add and use this flag. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:33:24 -08:00
Christian Eggers	30abc9cd9c	net: dsa: avoid potential use-after-free error If dsa_switch_ops::port_txtstamp() returns false, clone will be freed immediately. Shouldn't store a pointer to freed memory. Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:02:50 -08:00
Vadim Fedorenko	20ffc7adf5	net/tls: missing received data after fast remote close In case when tcp socket received FIN after some data and the parser haven't started before reading data caller will receive an empty buffer. This behavior differs from plain TCP socket and leads to special treating in user-space. The flow that triggers the race is simple. Server sends small amount of data right after the connection is configured to use TLS and closes the connection. In this case receiver sees TLS Handshake data, configures TLS socket right after Change Cipher Spec record. While the configuration is in process, TCP socket receives small Application Data record, Encrypted Alert record and FIN packet. So the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with SK_DONE bit set. The received data is not parsed upon arrival and is never sent to user-space. Patch unpauses parser directly if we have unparsed data in tcp receive queue. Fixes: fcf4793e278e ("tls: check RCV_SHUTDOWN in tls_wait_data") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:25:26 -08:00
Anmol Karn	3b3fd068c5	rose: Fix Null pointer dereference in rose_send_frame() rose_send_frame() dereferences `neigh->dev` when called from rose_transmit_clear_request(), and the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh`, and it is initialized in rose_add_loopback_neigh() as NULL. i.e when `rose_loopback_neigh` used in rose_loopback_timer() its `->dev` was still NULL and rose_loopback_timer() was calling rose_rx_call_request() without checking for NULL. - net/rose/rose_link.c This bug seems to get triggered in this line: rose_call = (ax25_address *)neigh->dev->dev_addr; Fix it by adding NULL checking for `rose_loopback_neigh->dev` in rose_loopback_timer(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reported-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Tested-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=9d2a7ca8c7f2e4b682c97578dfa3f236258300b3 Signed-off-by: Anmol Karn <anmol.karan123@gmail.com> Link: https://lore.kernel.org/r/20201119191043.28813-1-anmol.karan123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:04:58 -08:00
Paolo Abeni	12f4bd8622	net: add annotation for sock_{lock,unlock}_fast The static checker is fooled by the non-static locking scheme implemented by the mentioned helpers. Let's make its life easier adding some unconditional annotation so that the helpers are now interpreted as a plain spinlock from sparse. v1 -> v2: - add __releases() annotation to unlock_sock_fast() Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/6ed7ae627d8271fb7f20e0a9c6750fbba1ac2635.1605634911.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:00:54 -08:00
Numan Siddique	e2ef5203c8	net: openvswitch: Be liberal in tcp conntrack. There is no easy way to distinguish if a conntracked tcp packet is marked invalid because of tcp_in_window() check error or because it doesn't belong to an existing connection. With this patch, openvswitch sets liberal tcp flag for the established sessions so that out of window packets are not marked invalid. A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which sets this flag for both the directions of the nf_conn. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 09:53:48 -08:00
Magnus Karlsson	537cf4e3cc	xsk: Fix umem cleanup bug at socket destruct Fix a bug that is triggered when a partially setup socket is destroyed. For a fully setup socket, a socket that has been bound to a device, the cleanup of the umem is performed at the end of the buffer pool's cleanup work queue item. This has to be performed in a work queue, and not in RCU cleanup, as it is doing a vunmap that cannot execute in interrupt context. However, when a socket has only been partially set up so that a umem has been created but the buffer pool has not, the code erroneously directly calls the umem cleanup function instead of using a work queue, and this leads to a BUG_ON() in vunmap(). As there in this case is no buffer pool, we cannot use its work queue, so we need to introduce a work queue for the umem and schedule this for the cleanup. So in the case there is no pool, we are going to use the umem's own work queue to schedule the cleanup. But if there is a pool, the cleanup of the umem is still being performed by the pool's work queue, as it is important that the umem is cleaned up after the pool. Fixes: e5e1a4bc916d ("xsk: Fix possible memory leak at socket close") Reported-by: Marek Majtyka <marekx.majtyka@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Marek Majtyka <marekx.majtyka@intel.com> Link: https://lore.kernel.org/bpf/1605873219-21629-1-git-send-email-magnus.karlsson@gmail.com	2020-11-20 15:52:39 +01:00
Oliver Hartkopp	94c23097f9	can: gw: support modification of Classical CAN DLCs Add support for data length code modifications for Classical CAN. The netlink configuration interface always allowed to pass any value that fits into a byte, therefore only the modification process had to be extended to handle the raw DLC represenation of Classical CAN frames. When a DLC value from 0 .. F is provided for Classical CAN frame modifications the 'len' value is modified as-is with the exception that potentially existing 9 .. F DLC values in the len8_dlc element are moved to the 'len' element for the modification operation by mod_retrieve_ccdlc(). After the modification the Classical CAN frame DLC information is brought back into the correct format by mod_store_ccdlc() which is filling 'len' and 'len8_dlc' accordingly. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201119084921.2621-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-11-20 12:05:14 +01:00
Oliver Hartkopp	c7b7496779	can: replace can_dlc as variable/element for payload length The naming of can_dlc as element of struct can_frame and also as variable name is misleading as it claims to be a 'data length CODE' but in reality it always was a plain data length. With the indroduction of a new 'len' element in struct can_frame we can now remove can_dlc as name and make clear which of the former uses was a plain length (-> 'len') or a data length code (-> 'dlc') value. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201120100444.3199-1-socketcan@hartkopp.net [mkl: gs_usb: keep struct gs_host_frame::can_dlc as is] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-11-20 12:04:12 +01:00
Bhaumik Bhatt	2ca7e30d3b	net: qrtr: Unprepare MHI channels during remove Reset MHI device channels when driver remove is called due to module unload or any crash scenario. This will make sure that MHI channels no longer remain enabled for transfers since the MHI stack does not take care of this anymore after the auto-start channels feature was removed. Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2020-11-20 11:55:17 +05:30
Oliver Herms	6b13d8f71f	IPv6: RTM_GETROUTE: Add RTA_ENCAP to result This patch adds an IPv6 routes encapsulation attribute to the result of netlink RTM_GETROUTE requests (i.e. ip route get 2001:db8::). Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201118230651.GA8861@tws Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:56:20 -08:00

... 7 8 9 10 11 ...

63495 Commits