linux

iv/linux

Author	SHA1	Message	Date
Sergio Prado	3e2a5e1539	net: macb: add wake-on-lan support via magic packet Tested on Acqua A5 SoM (http://www.acmesystems.it/acqua). Signed-off-by: Sergio Prado <sergio.prado@e-labworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:56:02 -05:00
Amitoj Kaur Chawla	e651520320	net: hamradio: baycom_ser_fdx: Replace timeval with timespec64 32 bit systems using 'struct timeval' will break in the year 2038, so we replace the code appropriately. However, this driver is not broken in 2038 since we are only using microseconds portion of the time. This patch replaces 'struct timeval' with 'struct timespec64'. We only need to find elapsed microseconds rather than absolute time, so it's better to use monotonic time, so using ktime_get_ts64() makes the code more efficient and more robust against concurrent settimeofday() calls. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:54:54 -05:00
Tycho Andersen	4a92602aa1	openvswitch: allow management from inside user namespaces Operations with the GENL_ADMIN_PERM flag fail permissions checks because this flag means we call netlink_capable, which uses the init user ns. Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations which should be allowed inside a user namespace. The motivation for this is to be able to run openvswitch in unprivileged containers. I've tested this and it seems to work, but I really have no idea about the security consequences of this patch, so thoughts would be much appreciated. v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one massive one Reported-by: James Page <james.page@canonical.com> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> CC: Eric Biederman <ebiederm@xmission.com> CC: Pravin Shelar <pshelar@ovn.org> CC: Justin Pettit <jpettit@nicira.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:53:19 -05:00
Michael S. Tsirkin	4456ed04ea	ethtool: future-proof interface for speed extensions Many virtual and not quite virtual devices allow any speed to be set through ethtool. In particular, this applies to the virtio-net devices. Document this fact to make sure people don't assume the enum lists all possible values. Reserve values greater than INT_MAX for future extension and to avoid conflict with SPEED_UNKNOWN. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:52:12 -05:00
stephen hemminger	809dc75e9b	vrf: duplicate include of rtnetlink.h Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:45:24 -05:00
stephen hemminger	40d29af057	vxlan: udp_tunnel duplicate include net/udp_tunnel.h Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:45:24 -05:00
stephen hemminger	f48e72318a	rds: duplicate include net/tcp.h Duplicate include detected. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:45:24 -05:00
Amitoj Kaur Chawla	6d9b6f424d	bonding: Return correct error code The return value of kzalloc on failure of allocation of memory should be -ENOMEM and not -1. Found using Coccinelle. A simplified version of the semantic patch used is: //<smpl> @@ expression *e; @@ e = kzalloc(...); if (e == NULL) { ... return - -1 + -ENOMEM ; } //</smpl> The single call site only checks that the return value is not 0, hence no change is required at the call site. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 09:43:25 -05:00
David S. Miller	e7e9956d8f	Merge branch 'gso-checksums' Alexander Duyck says: ==================== Add GSO support for outer checksum w/ inner checksum offloads This patch series updates the existing segmentation offload code for tunnels to make better use of existing and updated GSO checksum computation. This is done primarily through two mechanisms. First we maintain a separate checksum in the GSO context block of the sk_buff. This allows us to maintain two checksum values, one offloaded with values stored in csum_start and csum_offset, and one computed and tracked in SKB_GSO_CB(skb)->csum. By maintaining these two values we are able to take advantage of the same sort of math used in local checksum offload so that we can provide both inner and outer checksums with minimal overhead. Below is the performance for a netperf session between an ixgbe PF and VF on the same host but in different namespaces. As can be seen a significant gain in performance can be had from allowing the use of Tx checksum offload on the inner headers while performing a software offload on the outer header computation: Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB Before: 87380 16384 16384 10.00 12844.38 9.30 -1.00 0.712 -1.00 After: 87380 16384 16384 10.00 13216.63 6.78 -1.00 0.504 -1.000 Changes from v1: * Dropped use of CHECKSUM_UNNECESSARY for remote checksum offload * Left encap_hdr_csum as it will likely be needed in future for SCTP GSO * Broke the changes out over many more patches * Updated GRE segmentation to more closely match UDP tunnel segmentation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:42 -05:00
Alexander Duyck	f245d079c1	net: Allow tunnels to use inner checksum offloads with outer checksums needed This patch enables us to use inner checksum offloads if provided by hardware with outer checksums computed by software. It basically reduces encap_hdr_csum to an advisory flag for now, but based on the fact that SCTP may be getting segmentation support before long I thought we may want to keep it as it is possible we may need to support CRC32c and 1's compliment checksum in the same packet at some point in the future. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	dbef491ebe	udp: Use uh->len instead of skb->len to compute checksum in segmentation The segmentation code was having to do a bunch of work to pull the skb->len and strip the udp header offset before the value could be used to adjust the checksum. Instead of doing all this work we can just use the value that goes into uh->len since that is the correct value with the correct byte order that we need anyway. By using this value we can save ourselves a bunch of pain as there is no need to do multiple byte swaps. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	fdaefd62fd	udp: Clean up the use of flags in UDP segmentation offload This patch goes though and cleans up the logic related to several of the control flags used in UDP segmentation. Specifically the use of dont_encap isn't really needed as we can just check the skb for CHECKSUM_PARTIAL and if it isn't set then we don't need to update the internal headers. As such we can just drop that value. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	3872035241	gre: Use inner_proto to obtain inner header protocol Instead of parsing headers to determine the inner protocol we can just pull the value from inner_proto. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	2e598af713	gre: Use GSO flags to determine csum need instead of GRE flags This patch updates the gre checksum path to follow something much closer to the UDP checksum path. By doing this we can avoid needing to do as much header inspection and can just make use of the fields we were already reading in the sk_buff structure. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	ddff00d420	net: Move skb_has_shared_frag check out of GRE code and into segmentation The call skb_has_shared_frag is used in the GRE path and skb_checksum_help to verify that no frags can be modified by an external entity. This check really doesn't belong in the GRE path but in the skb_segment function itself. This way any protocol that might be segmented will be performing this check before attempting to offload a checksum to software. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:34 -05:00
Alexander Duyck	08b64fcca9	net: Store checksum result for offloaded GSO checksums This patch makes it so that we can offload the checksums for a packet up to a certain point and then begin computing the checksums via software. Setting this up is fairly straight forward as all we need to do is reset the values stored in csum and csum_start for the GSO context block. One complication for this is remote checksum offload. In order to allow the inner checksums to be offloaded while computing the outer checksum manually we needed to have some way of indicating that the offload wasn't real. In order to do that I replaced CHECKSUM_PARTIAL with CHECKSUM_UNNECESSARY in the case of us computing checksums for the outer header while skipping computing checksums for the inner headers. We clean up the ip_summed flag and set it to either CHECKSUM_PARTIAL or CHECKSUM_NONE once we hand the packet off to the next lower level. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:33 -05:00
Alexander Duyck	7fbeffed77	net: Update remote checksum segmentation to support use of GSO checksum This patch addresses two main issues. First in the case of remote checksum offload we were avoiding dealing with scatter-gather issues. As a result it would be possible to assemble a series of frames that used frags instead of being linearized as they should have if remote checksum offload was enabled. Second I have updated the code so that we now let GSO take care of doing the checksum on the data itself and drop the special case that was added for remote checksum offload. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:33 -05:00
Alexander Duyck	7644345622	net: Move GSO csum into SKB_GSO_CB This patch moves the checksum maintained by GSO out of skb->csum and into the GSO context block in order to allow for us to work on outer checksums while maintaining the inner checksum offsets in the case of the inner checksum being offloaded, while the outer checksums will be computed. While updating the code I also did a minor cleanu-up on gso_make_checksum. The change is mostly to make it so that we store the values and compute the checksum instead of computing the checksum and then storing the values we needed to update. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:33 -05:00
Alexander Duyck	bef3c6c937	net: Drop unecessary enc_features variable from tunnel segmentation functions The enc_features variable isn't necessary since features isn't used anywhere after we create enc_features so instead just use a destructive AND on features itself and save ourselves the variable declaration. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 08:55:33 -05:00
sixiao@microsoft.com	a060679c6b	hv_netvsc: cleanup netdev feature flags for netvsc 1. Adding NETIF_F_TSO6 feature flag; 2. Adding NETIF_F_HW_CSUM. NETIF_F_IPV6_CSUM and NETIF_F_IP_CSUM are being deprecated; 3. Cleanup the coding style of flag assignment by using macro. Signed-off-by: Simon Xiao <sixiao@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:17:58 -05:00
David S. Miller	c1e9a49110	Merge branch 'ethtool-nfc-ipv6' Edward Cree says: ==================== IPv6 NFC This series adds support for steering IPv6 flows using the ethtool NFC interface, and implements it for sfc devices. Tested using an in-development patch to the ethtool utility. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:16:29 -05:00
Edward Cree	a7ad40d00a	sfc: implement IPv6 NFC (and IPV4_USER_FLOW) Signed-off-by: Edward Cree <ecree@solarflare.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:16:18 -05:00
Edward Cree	72bb68721f	ethtool: add IPv6 to the NFC API Signed-off-by: Edward Cree <ecree@solarflare.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:16:18 -05:00
David S. Miller	84e7f67aac	Merge branch 'cxgb4-tos' Hariprasad Shenai says: ==================== Add TOS support and some cleanup This series adds TOS support for iWARP and also does some cleanup to make code more readable. Patch series is created against infiniband tree and includes patches on iw_cxgb4 and cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:13:29 -05:00
Hariprasad Shenai	ba9cee6aa6	cxgb4/iw_cxgb4: TOS support This series provides support for iWARP applications to specify a TOS value and have that map to a VLAN Priority for iw_cxgb4 iWARP connections. In iw_cxgb4, when allocating an L2T entry, pass the skb_priority based on the tos value in the cm_id. Also pass the correct tos value during connection setup so the passive side gets the client's desired tos. When sending the FLOWC work request to FW, if the egress device is in a vlan, then use the vlan priority bits as the scheduling class. This allows associating RDMA connections with scheduling classes to provide traffic shaping per flow. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:13:23 -05:00
Hariprasad Shenai	6102c66eeb	iw_cxgb4: remove false error log entry Don't log errors if a listening endpoint is going away when procesing a PASS_ACCEPT_REQ message. This can happen. Change the error printk to a PDBG() debug log entry Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:13:23 -05:00
Hariprasad Shenai	9d3053ef57	iw_cxgb4: make queue allocation code more readable Rename local mm* variables to more meaningful names Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 07:13:23 -05:00
David S. Miller	f773abf63c	Merge branch 'fec-next' Troy Kisky says: ==================== net: fec: cleanup/fixes V2 is a rebase on top of johannes endian-safe patch and is only the 1st eight patches. The testing for this series was done on a nitrogen6x. The base commit was commit `b45efa30a6` Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Testing showed no change in performance. Testing used imx_v6_v7_defconfig + CONFIG_MICREL_PHY. The processor was running at 996Mhz. The following commands were used to get the transfer rates. On an x86 ubunto system, iperf -s -i.5 -u On a nitrogen6x board, running via SD Card. I first stopped some background processes stop cron stop upstart-file-bridge stop upstart-socket-bridge stop upstart-udev-bridge stop rsyslog stop dbus killall dhclient echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor taskset 0x2 iperf -c 192.168.0.201 -u -t60 -b500M -r There is a branch available on github with this series, and the rest of my fec patches, for those who would like to test it. https://github.com:boundarydevices/linux-imx6.git branch net-next_master ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:15:00 -05:00
Troy Kisky	fc75ba5159	net: fec: improve error handling Unmap initial buffer on error. Don't free skb until it has been unmapped. Move cbd_bufaddr assignment closer to the mapping function. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	be293467b8	net: fec: don't transfer ownership until descriptor write is complete If you don't own it, you shouldn't write to it. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	80dc6a9f8e	net: fec: don't disable FEC_ENET_TS_TIMER interrupt Only the interrupt routine processes this condition. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	53bb20d1fa	net: fec: add variable reg_desc_active to speed things up There is no need for complex macros every time we need to activate a queue. Also, no need to call skb_get_queue_mapping when we already know which queue it is using. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	7355f27606	net: fec: add struct bufdesc_prop This reduces code and gains speed. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	93c595f7b8	net: fec: fix fec_enet_get_free_txdesc_num When first initialized, cur_tx points to the 1st entry in the queue, and dirty_tx points to the last. At this point, fec_enet_get_free_txdesc_num will return tx_ring_size -2. If tx_ring_size -2 entries are now queued, then fec_enet_get_free_txdesc_num should return 0, but it returns tx_ring_size instead. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	095098e194	net: fec: fix rx error counts On an overrun, the other flags are not valid, so don't check them. Also, don't pass bad frames up the stack. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Troy Kisky	55cd48c821	net: fec: stop the "rcv is not +last, " error messages Setting the FTRL register will stop the fec from trying to use multiple receive buffers. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 06:14:51 -05:00
Nikolay Aleksandrov	7f20cd2521	bonding: 3ad: allow to set ad_actor settings while the bond is up No need to require the bond down while changing these settings, the change will be reflected immediately and the 3ad mode will sort itself out. For faster convergence set port->ntt to true in order to generate new LACPDUs immediately. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:31:11 -05:00
Johannes Berg	7a02bf892d	ipv6: add option to drop unsolicited neighbor advertisements In certain 802.11 wireless deployments, there will be NA proxies that use knowledge of the network to correctly answer requests. To prevent unsolicitd advertisements on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_unsolicited_na". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:27:36 -05:00
Johannes Berg	abbc30436d	ipv6: add option to drop unicast encapsulated in L2 multicast In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv6 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:27:36 -05:00
Johannes Berg	97daf33145	ipv4: add option to drop gratuitous ARP packets In certain 802.11 wireless deployments, there will be ARP proxies that use knowledge of the network to correctly answer requests. To prevent gratuitous ARP frames on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_gratuitous_arp". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:27:35 -05:00
Johannes Berg	12b74dfadb	ipv4: add option to drop unicast encapsulated in L2 multicast In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv4 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Additionally, enabling this option provides compliance with a SHOULD clause of RFC 1122. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:27:35 -05:00
Siva Reddy Kallam	ccad099356	MAINTAINERS: Update tg3 maintainer Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:25:48 -05:00
Wei Tang	1490d2bd1b	bpf_dbg: do not initialise statics to 0 This patch fixes the checkpatch.pl error to bpf_dbg.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:24:45 -05:00
David Ahern	dc599f76c2	net: Add support for filtering link dump by master device and kind Add support for filtering link dumps by master device and kind, similar to the filtering implemented for neighbor dumps. Each net_device that exists adds between 1196 bytes (eth) and 1556 bytes (bridge) to the link dump. As the number of interfaces increases so does the amount of data pushed to user space for a link list. If the user only wants to see a list of specific devices (e.g., interfaces enslaved to a specific bridge or a list of VRFs) most of that data is thrown away. Passing the filters to the kernel to have only relevant data returned makes the dump more efficient. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 04:18:26 -05:00
David S. Miller	fd1914b290	Merge branch 'tcp-fast-so_reuseport' Craig Gallek says: ==================== Faster SO_REUSEPORT for TCP This patch series complements an earlier series (`6a5ef90c58`) which added faster SO_REUSEPORT lookup for UDP sockets by extending the feature to TCP sockets. It uses the same array-based data structure which allows for socket selection after finding the first listening socket that matches an incoming packet. Prior to this feature, every socket in the reuseport group needed to be found and examined before a selection could be made. With this series the SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF socket options now work for TCP sockets as well. The test at the end of the series includes an example of how to use these options to select a reuseport socket based on the cpu core id handling the incoming packet. There are several refactoring patches that precede the feature implementation. Only the last two patches in this series should result in any behavioral changes. v4 - Fix build issue when compiling IPv6 as a module. This required moving the ipv6_rcv_saddr_equal into an object that is included as a built-in object. I included this change in the second patch which adds inet6_hash since that is where ipv6_rcv_saddr_equal will later be called from non-module code. v3: - Another warning in the first patch caught by a build bot. Return 0 in the no-op UDP hash function. v2: - In the first patched I missed a couple of hash functions that should now be returning int instead of void. I missed these the first time through as it only generated a warning and not an error :\ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:23 -05:00
Craig Gallek	4b2a6aed21	soreuseport: BPF selection functional test for TCP Unfortunately the existing test relied on packet payload in order to map incoming packets to sockets. In order to get this to work with TCP, TCP_FASTOPEN needed to be used. Since the fast open path is slightly different than the standard TCP path, I created a second test which sends to reuseport group members based on receiving cpu core id. This will probably serve as a better real-world example use as well. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:15 -05:00
Craig Gallek	c125e80b88	soreuseport: fast reuseport TCP socket selection This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socket from the group must be found in the listener list before any socket in the group can be used to receive a packet. Previously, every socket in the group needed to be considered before handing off the incoming packet. This feature also exposes the ability to use a BPF program when selecting a socket from a reuseport group. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:15 -05:00
Craig Gallek	fa46349767	soreuseport: Prep for fast reuseport TCP socket selection Both of the lines in this patch probably should have been included in the initial implementation of this code for generic socket support, but weren't technically necessary since only UDP sockets were supported. First, the sk_reuseport_cb points to a structure which assumes each socket in the group has this pointer assigned at the same time it's added to the array in the structure. The sk_clone_lock function breaks this assumption. Since a child socket shouldn't implicitly be in a reuseport group, the simple fix is to clear the field in the clone. Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that SO_REUSEPORT also be set first. For UDP sockets, this is easily enforced at bind-time since that process both puts the socket in the appropriate receive hlist and updates the reuseport structures. Since these operations can happen at two different times for TCP sockets (bind and listen) it must be explicitly checked to enforce the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the setsockopt call. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:15 -05:00
Craig Gallek	a583636a83	inet: refactor inet[6]_lookup functions to take skb This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations where it will be used in a following patch. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:14 -05:00
Craig Gallek	d9b3fca273	tcp: __tcp_hdrlen() helper tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr. This splits the size calculation into a helper function that can be used if a struct tcphdr is already available. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:14 -05:00

1 2 3 4 5 ...

574145 Commits