iv/linux

Author	SHA1	Message	Date
Jiri Benc	00a93babd0	openvswitch: add tunnel protocol to sw_flow_key Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now also acts as an indicator whether the flow contains tunnel data (this was previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses in an union with IPv4 ones this won't work anymore). The new field was added to a hole in sw_flow_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:17:59 -07:00
Nikolay Aleksandrov	4917a1548f	bridge: netlink: make br_fill_info's frame size smaller When KASAN is enabled the frame size grows > 2048 bytes and we get a warning, so make it smaller. net/bridge/br_netlink.c: In function 'br_fill_info': >> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes >> is larger than 2048 bytes [-Wframe-larger-than=] Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:15:57 -07:00
David Ahern	16660f0bd9	net: Add support for filtering neigh dump by device index Add support for filtering neighbor dumps by device by adding the NDA_IFINDEX attribute to the dump request. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:12:02 -07:00
David S. Miller	e892406f00	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-10-03 This series contains updates to i40e and i40evf, some of which are to resolve more Red Hat bugzilla issues. Jiang Liu updates the i40e and i40evf drivers to use numa_mem_id() instead of numa_node_id() to get the nearest node with memory, which better supports memoryless nodes. Anjali fixes an issue from Dan Carpenter <dan.carpenter@oracle.com>, to resolve a memory leak in X722 RSS configuration path, where we should free the memory allocated before exiting. Shannon modifies the drivers to ensure we have the spinlocks before we clear the ARQ and ASQ management registers. In addition, we widen the locked portion and insert a sanity check to ensure we are working with safe register values. Mitch fixes an issue where under certain circumstances, we can get an extra VF_RESOURCES message from the PF driver at runtime. When this occurs, we need to parse it because our VSI may have changed and that will affect the relationship with the PF driver. But this parsing also blows away our current MAC address, so resolve the issue by restoring the current MAC address from the netdev struct after we parse the resource message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 03:01:53 -07:00
Russell King	d25b8e7429	net: dsa: better error reporting Add additional error reporting to the generic DSA code, so it's easier to debug when things go wrong. This was useful when initially bringing up 88e6176 on a new board. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 02:58:49 -07:00
Russell King	4bac50bace	net: dsa: mv88e6xxx: remove link polling The link status is polled by the generic phy layer, there's no need to duplicate that polling with additional polling. This additional polling adds additional MDIO traffic, and races with the generic phy layer, resulting in missing or duplicated link status messages. Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 02:58:47 -07:00
David S. Miller	21c4c073f1	Revert "regmap: Allow installing custom reg_update_bits function" This reverts commit `7741c373cf`.	2015-10-06 06:25:43 -07:00
David S. Miller	6a27a6c3be	Revert "net: Microchip encx24j600 driver" This reverts commit `04fbfce7a2`.	2015-10-06 06:25:36 -07:00
David S. Miller	c664bc6d94	Revert "net: encx24j600_exit() can be static" This reverts commit `9886ce2b9d`.	2015-10-06 06:25:29 -07:00
Peter Nørlund	0a837fe472	ipv4: Fix compilation errors in fib_rebalance This fixes net/built-in.o: In function `fib_rebalance': fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3' and net/built-in.o: In function `fib_rebalance': net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod' Fixes: `0e884c78ee` ("ipv4: L3 hash-based multipath") Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 23:48:09 -07:00
Daniel Borkmann	0cdf5640e4	ebpf: include perf_event only where really needed Commit `ea317b267e` ("bpf: Add new bpf map type to store the pointer to struct perf_event") added perf_event.h to the main eBPF header, so it gets included for all users. perf_event.h is actually only needed from array map side, so lets sanitize this a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Kaixu Xia <xiakaixu@huawei.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 07:04:08 -07:00
Nicolas Schichan	4560cdff03	ARM: net: support BPF_ALU | BPF_MOD instructions in the BPF JIT. For ARMv7 with UDIV instruction support, generate a UDIV instruction followed by an MLS instruction. For other ARM variants, generate code calling a C wrapper similar to the jit_udiv() function used for BPF_ALU | BPF_DIV instructions. Some performance numbers reported by the test_bpf module (the duration per filter run is reported in nanoseconds, between "jitted:<x>" and "PASS"): ARMv7 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS ARMv7 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 104 PASS ARMv5 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS ARMv5 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS ARMv5 kirkwood nojit: test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS ARMv5 kirkwood jit: test_bpf: #3 DIV_MOD_KX jited:1 311 PASS Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 07:02:42 -07:00
David S. Miller	df7b601542	Merge branch 'asix-rx-mem-handling' Mark Craske says: ==================== Improve ASIX RX memory allocation error handling The ASIX RX handler algorithm is weak on error handling. There is a design flaw in the ASIX RX handler algorithm because the implementation for handling RX Ethernet frames for the DUB-E100 C1 can have Ethernet frames spanning multiple URBs. This means that payload data from more than 1 URB is sometimes needed to fill the socket buffer with a complete Ethernet frame. When the URB with the start of an Ethernet frame is received then an attempt is made to allocate a socket buffer. If the memory allocation fails then the algorithm sets the buffer pointer member to NULL and the function exits (no crash yet). Subsequently, the RX handler is called again to process the next URB which assumes there is a socket buffer available and the kernel crashes when there is no buffer. This patchset implements an improvement to the RX handling algorithm to avoid a crash when no memory is available for the socket buffer. The patchset will apply cleanly to the net-next master branch but the created kernel has not been tested. The driver was tested on ARM kernels v3.8 and v3.14 for a commercial product. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:51 -07:00
Dean Jenkins	6a570814cd	asix: Continue processing URB if no RX netdev buffer Avoid a loss of synchronisation of the Ethernet Data header 32-bit word due to a failure to get a netdev socket buffer. The ASIX RX handling algorithm returned 0 upon a failure to get an allocation of a netdev socket buffer. This causes the URB processing to stop which potentially causes a loss of synchronisation with the Ethernet Data header 32-bit word. Therefore, subsequent processing of URBs may be rejected due to a loss of synchronisation. This may cause additional good Ethernet frames to be discarded along with outputting of synchronisation error messages. Implement a solution which checks whether a netdev socket buffer has been allocated before trying to copy the Ethernet frame into the netdev socket buffer. But continue to process the URB so that synchronisation is maintained. Therefore, only a single Ethernet frame is discarded when no netdev socket buffer is available. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Mark Craske <Mark_Craske@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:43 -07:00
Dean Jenkins	3f30b158eb	asix: On RX avoid creating bad Ethernet frames When RX Ethernet frames span multiple URB socket buffers, the data stream may suffer a discontinuity which will cause the current Ethernet frame in the netdev socket buffer to be incomplete. This frame needs to be discarded instead of appending unrelated data from the current URB socket buffer to the Ethernet frame in the netdev socket buffer. This avoids creating a corrupted Ethernet frame in the netdev socket buffer. A discontinuity can occur when the previous URB socket buffer held an incomplete Ethernet frame due to truncation or a URB socket buffer containing the end of the Ethernet frame was missing. Therefore, add a sanity test for when an Ethernet frame spans multiple URB socket buffers to check that the remaining bytes of the currently received Ethernet frame point to a good Data header 32-bit word of the next Ethernet frame. Upon error, reset the remaining bytes variable to zero and discard the current netdev socket buffer. Assume that the Data header is located at the start of the current socket buffer and attempt to process the next Ethernet frame from there. This avoids unnecessarily discarding a good URB socket buffer that contains a new Ethernet frame. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Mark Craske <Mark_Craske@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:43 -07:00
Dean Jenkins	9a5ccd8e03	asix: Simplify asix_rx_fixup_internal() netdev alloc The code is checking that the Ethernet frame will fit into a netdev allocated socket buffer within the constraints of MTU size, Ethernet header length plus VLAN header length. The original code was checking rx->remaining each loop of the while loop that processes multiple Ethernet frames per URB and/or Ethernet frames that span across URBs. rx->remaining decreases per while loop so there is no point in potentially checking multiple times that the Ethernet frame (remaining part) will fit into the netdev socket buffer. The modification checks that the size of the Ethernet frame will fit the netdev socket buffer before allocating the netdev socket buffer. This avoids grabbing memory and then deciding that the Ethernet frame is too big and then freeing the memory. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Mark Craske <Mark_Craske@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:41 -07:00
Dean Jenkins	3bfc69abf8	asix: Tidy-up 32-bit header word synchronisation Tidy-up the Data header 32-bit word synchronisation logic in asix_rx_fixup_internal() by removing redundant logic tests. The code is looking at the following cases of the Data header 32-bit word that is present before each Ethernet frame: a) all 32 bits of the Data header word are in the URB socket buffer b) first 16 bits of the Data header word are at the end of the URB socket buffer c) last 16 bits of the Data header word are at the start of the URB socket buffer eg. split_head = true Note that the lifetime of rx->split_head exists outside of the function call and is accessed per processing of each URB. Therefore, split_head being true acts on the next URB to be processed. To check for b) the offset will be 16 bits (2 bytes) from the end of the buffer then indicate split_head is true. To check for c) split_head must be true because the first 16 bits have been found. To check for a) else c) Note that the || logic of the old code included the state (skb->len - offset == sizeof(u16) && rx->split_head) which is not possible because the split_head cannot be true whilst checking for b). This is because the split_head indicates that the first 16 bits have been found and that is not possible whilst checking for the first 16 bits. Therefore simplify the logic. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Mark Craske <Mark_Craske@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:40 -07:00
Dean Jenkins	7b0378f517	asix: Rename remaining and size for clarity The Data header synchronisation is easier to understand if the variables "remaining" and "size" are renamed. Therefore, the lifetime of the "remaining" variable exists outside of asix_rx_fixup_internal() and is used to indicate any remaining pending bytes of the Ethernet frame that need to be obtained from the next socket buffer. This allows an Ethernet frame to span across multiple socket buffers. "size" is now local to asix_rx_fixup_internal() and contains the size read from the Data header 32-bit word. Add "copy_length" to hold the number of the Ethernet frame bytes (maybe a part of a full frame) that are to be copied out of the socket buffer. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Mark Craske <Mark_Craske@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:58:38 -07:00
Daniel Borkmann	bab1899187	bpf, seccomp: prepare for upcoming criu support The current ongoing effort to dump existing cBPF seccomp filters back to user space requires to hold the pre-transformed instructions like we do in case of socket filters from sk_attach_filter() side, so they can be reloaded in original form at a later point in time by utilities such as criu. To prepare for this, simply extend the bpf_prog_create_from_user() API to hold a flag that tells whether we should store the original or not. Also, fanout filters could make use of that in future for things like diag. While fanout filters already use bpf_prog_destroy(), move seccomp over to them as well to handle original programs when present. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:47:05 -07:00
WANG Cong	0a15afd2ea	vrf: fix a kernel warning This fixes: tried to remove device ip6gre0 from (null) ------------[ cut here ]------------ kernel BUG at net/core/dev.c:5219! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 3 PID: 161 Comm: kworker/u8:2 Not tainted 4.3.0-rc2+ #1142 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff8800d784a9c0 ti: ffff8800d74a4000 task.ti: ffff8800d74a4000 RIP: 0010:[<ffffffff817f0797>] [<ffffffff817f0797>] __netdev_adjacent_dev_remove+0x40/0xec RSP: 0018:ffff8800d74a7a98 EFLAGS: 00010282 RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88011adcf701 RSI: ffff88011adccbf8 RDI: ffff88011adccbf8 RBP: ffff8800d74a7ab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff81d190ff R11: 00000000ffffffff R12: ffff8800d599e7c0 R13: 0000000000000000 R14: ffff8800d599e890 R15: ffffffff82385e00 FS: 0000000000000000(0000) GS:ffff88011ac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007ffd6f003000 CR3: 000000000220c000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff8800d599e7c0 0000000000000b00 ffff8800d599e8a0 ffff8800d74a7ad8 ffffffff817f0861 0000000000000000 ffff8800d599e7c0 ffff8800d74a7af8 ffffffff817f088f 0000000000000000 ffff8800d599e7c0 Call Trace: [<ffffffff817f0861>] __netdev_adjacent_dev_unlink+0x1e/0x35 [<ffffffff817f088f>] __netdev_adjacent_dev_unlink_neighbour+0x17/0x41 [<ffffffff817f56e6>] netdev_upper_dev_unlink+0x6c/0x13d [<ffffffff81674a3d>] vrf_del_slave+0x26/0x7d [<ffffffff81674ac3>] vrf_device_event+0x2f/0x34 [<ffffffff81098c40>] notifier_call_chain+0x75/0x9c [<ffffffff81098fa2>] raw_notifier_call_chain+0x14/0x16 [<ffffffff817ee129>] call_netdevice_notifiers_info+0x52/0x59 [<ffffffff817f179d>] call_netdevice_notifiers+0x13/0x15 [<ffffffff817f6f18>] rollback_registered_many+0x14f/0x24f [<ffffffff817f70f2>] unregister_netdevice_many+0x19/0x64 [<ffffffff819a2455>] ip6gre_exit_net+0x163/0x177 [<ffffffff817eb019>] ops_exit_list+0x44/0x55 [<ffffffff817ebcb7>] cleanup_net+0x193/0x226 [<ffffffff81091e1c>] process_one_work+0x26c/0x4d8 [<ffffffff81091d20>] ? process_one_work+0x170/0x4d8 [<ffffffff81092296>] worker_thread+0x1df/0x2c2 [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f [<ffffffff81097a20>] kthread+0xd4/0xdc [<ffffffff810bc523>] ? trace_hardirqs_on_caller+0x17d/0x199 [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83 [<ffffffff81a5240f>] ret_from_fork+0x3f/0x70 [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83 Fixes: `93a7e7e837` ("net: Remove the now unused vrf_ptr") Cc: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:35:51 -07:00
kbuild test robot	9886ce2b9d	net: encx24j600_exit() can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:43 -07:00
Jon Ringle	04fbfce7a2	net: Microchip encx24j600 driver This ethernet driver supports the Microchip enc424j600/626j600 Ethernet controller over an SPI bus interface. This driver makes use of the regmap API to optimize access to registers by caching registers where possible. Datasheet: http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:41 -07:00
Jon Ringle	7741c373cf	regmap: Allow installing custom reg_update_bits function This commit allows installing a custom reg_update_bits function for cases where the hardware provides a mechanism to set or clear register bits without a read/modify/write cycle. Such is the case with the Microchip ENCX24J600. Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:40 -07:00
Govindarajulu Varadarajan	937317c7c1	enic: do hang reset only in case of tx timeout The current code invokes hang reset in case of error interrupt. We should hang reset only in case of tx timeout. This is because of the way hang reset is implemented in firmware. Hang reset takes more firmware resources than soft reset. Adaptor does not generate error interrupt in case of tx timeout. Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:51:35 -07:00
Govindarajulu Varadarajan	cc809237e1	enic: handle spurious error interrupt Some of the enic adaptors are known to generate spurious interrupts. When an error interrupt is generated, the driver just resets the device. This patch resets the device only when an error has occurred. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:51:33 -07:00
David S. Miller	2905f5bb1c	Merge branch 'cxgb4-next' Hariprasad Shenai says: ==================== cxgb4: Trivial fixes for cxgb4 Fixes the following issues: Don't read nonexistent T4/T5/T6 adapter registers for ethtool dump. For T4, don't read mailbox control registers. Adds a new devlog facility and reports the correct link speed for unsupported ones. This patch series has been created against net-next tree and includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:45 -07:00
Hariprasad Shenai	85412255ef	cxgb4: Report correct link speed for unsupported ones When we get garbage from the firmware with weird Port Speeds, etc. we should emit a warning regarding unsupported speeds rather than use the bogus default of "10Mbps" which isn't even an option in the firmware Port Information message Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:41 -07:00
Hariprasad Shenai	da4976e17b	cxgb4: Adds a new Device Log Facility FW_DEVLOG_FACILITY_CF The firmware team added a new Device Log Facility FW_DEVLOG_FACILITY_CF, but the driver has been decoding Device Log messages with that Facility as "(NULL)", fixing it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:41 -07:00
Hariprasad Shenai	b3695540ba	cxgb4: For T4, don't read the Firmware Mailbox Control register T4 doesn't have the Shadow copy of the register which we can read without side effect. So don't read mbox control register for T4 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:40 -07:00
Hariprasad Shenai	8119c01800	cxgb4 : Update T4/T5/T6 register ranges Update T4/T5/T6 adapter register ranges so that it doesn't read non existent registers when dumped using ethtool Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:48:39 -07:00
David S. Miller	40e106801e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next Eric W. Biederman says: ==================== net: Pass net through ip fragmentation This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. This round focuses on passing net through ip fragmentation which we seem to call from about everywhere. That is the main ip output paths, the bridge netfilter code, and openvswitch. This has to happen at once across the tree as function pointers are involved. First some prep work is done, then ipv4 and ipv6 are converted and then temporary helper functions are removed. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:39:31 -07:00
David S. Miller	7e2832f17f	Merge branch 'rds-perf' Sowmini Varadhan says: ==================== RDS: RDS-TCP perf enhancements A 3-part patchset that (a) improves current RDS-TCP perf by 2X-3X and (b) refactors earlier robustness code for better observability/scaling. Patch 1 is an enhancement of earlier robustness fixes that had used separate sockets for client and server endpoints to resolve race conditions. It is possible to have an equivalent solution that does not use 2 sockets. The benefit of a single socket solution is that it results in more predictable and observable behavior for the underlying TCP pipe of an RDS connection. Patches 2 and 3 are simple, straightforward perf bug fixes that align the RDS TCP socket with other parts of the kernel stack. v2: fix kbuild-test-robot warnings, comments from Sergei Shtylov and Santosh Shilimkar. ==================== Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:35:29 -07:00
Sowmini Varadhan	76b29ef120	RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit For the same reasons as commit `2f53384424` ("tcp: allow splice() to build full TSO packets") and commit `35f9c09fe9` ("tcp: tcp_sendpages() should call tcp_push() once"), rds_tcp_xmit may have multiple pages to send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to tcp_sendpage() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:53 -07:00
Sowmini Varadhan	1edd6a14d2	RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K) clobbers efficient use of TSO because it inflates the size_goal that is computed in tcp_sendmsg/tcp_sendpage and skews packet latency, and the default values for these parameters actually results in significantly better performance. In request-response tests using rds-stress with a packet size of 100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16) between a single pair of IP addresses achieves a throughput of 6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under equivalent conditions on these platforms. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:53 -07:00
Sowmini Varadhan	3b20fc3897	RDS: Use a single TCP socket for both send and receive. Commit `f711a6ae06` ("net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.") modified rds-tcp so that an incoming SYN would ignore an existing "client" TCP connection which had the local port set to the transient port. The motivation for ignoring the existing "client" connection in `f711a6ae` was to avoid race conditions and an endless duel of reconnect attempts triggered by a restart/abort of one of the nodes in the TCP connection. However, having separate sockets for active and passive sides is avoidable, and the simpler model of a single TCP socket for both send and receives of all RDS connections associated with that tcp socket makes for easier observability. We avoid the race conditions from `f711a6ae` by attempting reconnects in rds_conn_shutdown if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP. The c_outgoing bit is initialized in __rds_conn_create(). A side-effect of re-using the client rds_connection for an incoming SYN is the potential of encountering duelling SYNs, i.e., we have an outgoing RDS_CONN_CONNECTING socket when we get the incoming SYN. The logic to arbitrate this criss-crossing SYN exchange in rds_tcp_accept_one() has been modified to emulate the BGP state machine: the smaller IP address should back off from the connection attempt. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:34:51 -07:00
David S. Miller	393159e917	Merge branch 'xgbe-next' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2015-09-30 The following patches are included in this driver update series: - Remove unneeded semi-colon - Follow the DT/ACPI precedence used by the device_ APIs - Add ethtool support for getting and setting the msglevel - Add ethtool support error and debug messages - Simplify the hardware FIFO assignment calculations - Add receive buffer unavailable statistic - Use the device workqueue instead of the system workqueue - Remove the use of a link state bit This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:40 -07:00
Lendacky, Thomas	50789845cf	amd-xgbe: Remove the XGBE_LINK state bit The XGBE_LINK bit is used just to determine whether to call the netif_carrier_on/off functions. Rather than define and use this bit, just call the functions. The netif_carrier_ok function can be used in place of checking the XGBE_LINK bit in the future. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:27 -07:00
Lendacky, Thomas	afb43e8a0a	amd-xgbe: Use device workqueue instead of system workqueue The driver creates, flushes and destroys a device workqueue but queues work to the system workqueue. Switch from using the system workqueue to the device workqueue. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:26 -07:00
Lendacky, Thomas	72c9ac4e1f	amd-xgbe: Add receive buffer unavailable statistic Add a statistic that tracks how many times an interrupt is generated for a receive buffer not being available to the hardware which prevents the hardware from being able to DMA the received data. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:26 -07:00
Lendacky, Thomas	9c439e4b73	amd-xgbe: Simplify calculation and setting of queue fifos The calculation of the Tx and Rx fifo sizes can be calculated rather than hardcoded in a switch statement. Additionally, the per-queue fifo sizes can be calculated rather than hardcoded using if/else if statements that can possibly underutilize the available fifo area. Change the code to calculate the fifo sizes and the per-queue fifo sizes to simplify the code and make best use of the available fifo. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:25 -07:00
Lendacky, Thomas	e5dd8b8110	amd-xgbe: Add ethtool error and debug messages Add error and dynamic debug messages to various ethtool functions in the driver while also removing the DBGPR debug print calls. Also, change the message level for some error messages from alert to err. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:25 -07:00
Lendacky, Thomas	349fb2d700	amd-xgbe: Add ethtool support for setting the msglevel Provide the ethtool functions to support getting and setting the msglevel for the driver. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:23 -07:00
Lendacky, Thomas	47f2e6c275	amd-xgbe: Use proper DT / ACPI precedence checking Device tree presence takes precedence over ACPI in the device_* APIs. The amd-xgbe driver should follow the same precedence. Update the check on whether to use DT / ACPI to follow this. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:22 -07:00
Lendacky, Thomas	3947d78a54	amd-xgbe: Remove an unneeded semicolon on a switch statement Remove an unneeded semicolon at the end of a switch statement block. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:23:22 -07:00
Eric Dumazet	ac8cfc7bb8	tcp: restore fastopen operations I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc() while max_qlen can be set before listen() is called, using TCP_FASTOPEN socket option for example. Fixes: `0536fcc039` ("tcp: prepare fastopen code for upcoming listener changes") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:19:06 -07:00
David S. Miller	77946de51b	Merge branch 'net-y2038' Arnd Bergmann says: ==================== net: assorted y2038 changes This is a set of changes for network drivers and core code to get rid of the use of time_t and derived data structures. I have a longer set of patches that enables me to build kernels with the time_t definition removed completely as a help to find y2038 overflow issues. This is the subset for networking that contains all code that has a reasonable way of fixing at the moment and that is either commonly used (in one of the defconfigs) or that blocks building a whole subsystem. Most of the patches in this series should be noncontroversial, but the last two that I marked [RFC] are a bit tricky and need input from people that are more familiar with the code than I am. All 12 patches are independent of one another and can be applied in any order, so feel free to pick all that look good. Patches that are not included here are: - disabling less common device drivers that I don't have a fix for yet, this includes drivers/net/ethernet/brocade/bna/bfa_ioc.c drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c drivers/net/ethernet/tile/tilegx.c drivers/net/hamradio/baycom_ser_fdx.c drivers/net/wireless/ath/ath10k/core.h drivers/net/wireless/ath/ath9k/ drivers/net/wireless/ath/ath9k/ drivers/net/wireless/atmel.c drivers/net/wireless/prism54/isl_38xx.c drivers/net/wireless/rt2x00/rt2x00debug.c drivers/net/wireless/rtlwifi/ drivers/net/wireless/ti/wlcore/ drivers/staging/ozwpan/ net/atm/mpoa_caches.c net/atm/mpoa_proc.c net/dccp/probe.c net/ipv4/tcp_probe.c net/netfilter/nfnetlink_queue_core.c net/netfilter/nfnetlink_queue_core.c net/netfilter/xt_time.c net/openvswitch/flow.c net/sctp/probe.c net/sunrpc/auth_gss/ net/sunrpc/svcauth_unix.c net/vmw_vsock/af_vsock.c We'll get there eventually, or we an add a dependency to ensure they are not built on 32-bit kernels that need to survive beyond 2038. Most of these should be really easy to fix. 
- recvmmsg/sendmmsg system calls: patches have been sent out as part of the syscall series, need a little more work and review - SIOCGSTAMP/SIOCGSTAMPNS ioctl calls: tricky, need to discuss with some folks at kernel summit - SO_RCVTIMEO/SO_SNDTIMEO/SO_TIMESTAMP/SO_TIMESTAMPNS socket opt: similar and related to the ioctl - mmapped packet socket: need to create v4 of the API, nontrivial - pktgen: sends 32-bit timestamps over network, need to find out if using unsigned stamps is good enough - af_rxrpc: similar to pktgen, uses 32-bit times for deadlines - ppp ioctl: patch is being worked on, nontrivial but doable ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:49 -07:00
Arnd Bergmann	3ef0a25bf9	net: sctp: avoid incorrect time_t use We want to avoid using time_t in the kernel because of the y2038 overflow problem. The use in sctp is not for storing seconds at all, but instead uses microseconds and is passed as 32-bit on all machines. This patch changes the type to u32, which better fits the use. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: linux-sctp@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:48 -07:00
Arnd Bergmann	3dd7669f1f	ipv6: use ktime_t for internal timestamps The ipv6 mip6 implementation is one of only a few users of the skb_get_timestamp() function in the kernel, which is both unsafe on 32-bit architectures because of the 2038 overflow, and slightly less efficient than the skb_get_ktime() based approach. This converts the function call and the mip6_report_rate_limiter structure that stores the time stamp, eliminating all uses of timeval in the ipv6 code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:47 -07:00
Arnd Bergmann	f6389ecbc5	nfnetlink: use y2038 safe timestamp The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:47 -07:00
Arnd Bergmann	70ba07b675	atm: remove 'struct zatm_t_hist' The zatm_t_hist structure is not used anywhere in the kernel, but is exported to user space. As we are trying to eliminate uses of time_t in the kernel for y2038 compatibility, the current definition triggers checking tools because it contains 'struct timeval'. As pointed out by Chas Williams, the only user of this structure was the ZATM_GETHIST ioctl command that has been removed a long time ago, and we can remove the structure as well without breaking any user space. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Chas Williams <3chas3@gmail.com> Cc: linux-atm-general@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:46 -07:00


548080 Commits