iv/linux

Author	SHA1	Message	Date
David S. Miller	8ae3c911b9	Merge branch 'cxgb4-net' Anish Bhatt says: ==================== cxgb4 : DCBx fixes for apps/host lldp agents This patchset contains some minor fixes for cxgb4 DCBx code. Chiefly, cxgb4 was not cleaning up any apps added to kernel app table when link was lost. Disabling DCBx in firmware would automatically set DCBx state to host-managed and enabled; we now wait for an explicit enable call from an lldp agent instead. First patch was originally sent to net-next, but considering it applies to correcting behaviour of code already in net, I think it qualifies as a bug fix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:00:16 -04:00
Anish Bhatt	3bb062613b	cxgb4 : Handle dcb enable correctly Disabling DCBx in firmware automatically enables DCBx for control via host lldp agents. Wait for an explicit setstate call from an lldp agent to enable DCBx instead. Fixes: `76bcb31efc` ("cxgb4 : Add DCBx support codebase and dcbnl_ops") Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:00:09 -04:00
Anish Bhatt	2376c879b8	cxgb4 : Improve handling of DCB negotiation or loss thereof Clear out any DCB apps we might have added to kernel table when we lose DCB sync (or IEEE equivalent event). These were previously left behind and not cleaned up correctly. IEEE allows individual components to work independently, so improve check for IEEE completion by specifying individual components. Fixes: `10b0046685` ("cxgb4: IEEE fixes for DCBx state machine") Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:00:09 -04:00
David S. Miller	5d26b1f50a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Allow a TCP port to be recycled in conntrack when it changes role from server to client, from Marcelo Leitner. 2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch from Dan Carpenter. 3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables chain statistic updates. From Sabrina Dubroca. 4) Don't compile ip options in bridge netfilter, this mangles the packet and bridge should not alter layer >= 3 headers when forwarding packets. Patch from Herbert Xu and tested by Florian Westphal. 5) Account the final NLMSG_DONE message when calculating the size of the nflog netlink batches. Patch from Florian Westphal. 6) Fix a possible netlink attribute length overflow with large packets. Again from Florian Westphal. 7) Release the skbuff if nfnetlink_log fails to put the final NLMSG_DONE message. This fixes a leak on error. This shouldn't ever happen though, otherwise this means we miscalculate the netlink batch size, so emit a warning if this ever happens so we can track down the problem. This patch from Houcheng Lin. 8) Look at the right list when recycling targets in the nft_compat, patch from Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 18:47:40 -04:00
Arturo Borrero	7965ee9371	netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-27 22:17:46 +01:00
Eric Dumazet	93a35f59f1	net: napi_reuse_skb() should check pfmemalloc Do not reuse skb if it was pfmemalloc tainted, otherwise future frame might be dropped anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-26 22:47:23 -04:00
David S. Miller	aa9c557915	Merge branch 'mellanox' Eli Cohen says: ==================== irq sync fixes This two-patch series fixes a race where an interrupt handler could access freed memory. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-26 22:46:08 -04:00
Eli Cohen	bf1bac5b78	net/mlx4_core: Call synchronize_irq() before freeing EQ buffer After moving the EQ ownership to software effectively destroying it, call synchronize_irq() to ensure that any handler routines running on other CPU cores finish execution. Only then free the EQ buffer. The same thing is done when we destroy a CQ which is one of the sources generating interrupts. In the case of CQ we want to avoid completion handlers on a CQ that was destroyed. In the EQ case we do the same to avoid receiving asynchronous events after the EQ has been destroyed and its buffers freed. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-26 22:46:04 -04:00
Eli Cohen	96e4be06cb	net/mlx5_core: Call synchronize_irq() before freeing EQ buffer After destroying the EQ, the object responsible for generating interrupts, call synchronize_irq() to ensure that any handler routines running on other CPU cores finish execution. Only then free the EQ buffer. This patch solves a very rare case when we get panic on driver unload. The same thing is done when we destroy a CQ which is one of the sources generating interrupts. In the case of CQ we want to avoid completion handlers on a CQ that was destroyed. In the EQ case we do the same to avoid receiving asynchronous events after the EQ has been destroyed and its buffers freed. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-26 22:46:03 -04:00
Geert Uytterhoeven	b71e821de5	drivers: net: xgene: Rewrite buggy loop in xgene_enet_ecc_init() drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c: In function ‘xgene_enet_ecc_init’: drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:126: warning: ‘data’ may be used uninitialized in this function Depending on the arbitrary value on the stack, the loop may terminate too early, and cause a bogus -ENODEV failure. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 17:05:20 -04:00
Dan Carpenter	013f6579c6	i40e: _MASK vs _SHIFT typo in i40e_handle_mdd_event() We accidentally mask by the _SHIFT variable. It means that "event" is always zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 16:50:56 -04:00
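The _MASK vs _SHIFT mix-up described in the commit above can be illustrated with a minimal C sketch. The register layout and the DEMO_* names are hypothetical and are not taken from the i40e driver; they only show why masking with the shift count yields zero.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: an 8-bit "event"
 * field stored in bits 8..15 of a 32-bit register. */
#define DEMO_EVENT_SHIFT 8
#define DEMO_EVENT_MASK  (0xFFu << DEMO_EVENT_SHIFT)

/* Buggy form: the value is masked with the shift count (8) rather
 * than the field mask, so only bit 3 survives and is then shifted out. */
uint32_t demo_event_buggy(uint32_t reg)
{
    return (reg & DEMO_EVENT_SHIFT) >> DEMO_EVENT_SHIFT;
}

/* Intended form: mask the field first, then shift it down. */
uint32_t demo_event_fixed(uint32_t reg)
{
    return (reg & DEMO_EVENT_MASK) >> DEMO_EVENT_SHIFT;
}
```

With this assumed layout, demo_event_buggy() returns 0 for every input, which matches the symptom the commit message reports.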
Eric Dumazet	fe0ca7328d	macvlan: fix a race on port dismantle and possible skb leaks We need to cancel the work queue after rcu grace period, otherwise it can be rescheduled by incoming packets. We need to purge queue if some skbs are still in it. We can use __skb_queue_head_init() variant in macvlan_process_broadcast() Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `412ca1550c` ("macvlan: Move broadcasts into a work queue") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 16:24:02 -04:00
Eric Dumazet	349ce993ac	tcp: md5: do not use alloc_percpu() percpu tcp_md5sig_pool contains memory blobs that ultimately go through sg_set_buf(). -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); This requires that whole area is in a physically contiguous portion of memory. And that @buf is not backed by vmalloc(). Given that alloc_percpu() can use vmalloc() areas, this does not fit the requirements. Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool is small anyway, there is no gain to dynamically allocate it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `765cf9976e` ("tcp: md5: remove one indirection level in tcp_md5sig_pool") Reported-by: Crestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 16:10:04 -04:00
David S. Miller	4cc40af080	Merge branch 'xen-netback' David Vrabel says: ==================== xen-netback: guest Rx queue drain and stall fixes This series fixes two critical xen-netback bugs. 1. Netback may consume all of host memory by queuing an unlimited number of skb on the internal guest Rx queue. This behaviour is guest triggerable. 2. Carrier flapping under high traffic rates which reduces performance. The first patch is a prerequisite. Removing support for frontends with feature-rx-notify makes it easier to reason about the correctness of netback since it no longer has to support this outdated and broken mode. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 14:15:25 -04:00
David Vrabel	ecf08d2dbb	xen-netback: reintroduce guest Rx stall detection If a frontend is not receiving packets it is useful to detect this and turn off the carrier so packets are dropped early instead of being queued and drained when they expire. A to-guest queue is stalled if it doesn't have enough free slots for an extended period of time (default 60 s). If at least one queue is stalled, the carrier is turned off (in the expectation that the other queues will soon stall as well). The carrier is only turned on once all queues are ready. When the frontend connects, all the queues start in the stalled state and only become ready once the frontend queues enough Rx requests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 14:15:20 -04:00
David Vrabel	f48da8b14d	xen-netback: fix unlimited guest Rx internal queue and carrier flapping Netback needs to discard old to-guest skb's (guest Rx queue drain) and it needs to detect guest Rx stalls (to disable the carrier so packets are discarded earlier), but the current implementation is very broken. 1. The check in hard_start_xmit of the slot availability did not consider the number of packets that were already in the guest Rx queue. This could allow the queue to grow without bound. The guest stops consuming packets and the ring was allowed to fill leaving S slots free. Netback queues a packet requiring more than S slots (ensuring that the ring stays with S slots free). Netback then queues packets indefinitely, provided that they require S or fewer slots. 2. The Rx stall detection is not triggered in this case since the (host) Tx queue is not stopped. 3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback will consider this an Rx purge event which may result in it taking the carrier down unnecessarily. It also considers a queue with only 1 slot free as unstalled (even though the next packet might not fit in this). The internal guest Rx queue is limited by a byte length (to 512 KiB, enough for half the ring). The (host) Tx queue is stopped and started based on this limit. This sets an upper bound on the amount of memory used by packets on the internal queue. This allows the estimation of the number of slots for an skb to be removed (it wasn't a very good estimate anyway). Instead, the guest Rx thread just waits for enough free slots for a maximum sized packet. skbs queued on the internal queue have an 'expires' time (set to the current time plus the drain timeout). The guest Rx thread will detect when the skb at the head of the queue has expired and discard expired skbs. This sets a clear upper bound on the length of time an skb can be queued for. 
For a guest being destroyed the maximum time needed to wait for all the packets it sent to be dropped is still the drain timeout (10 s) since it will not be sending new packets. Rx stall detection is reintroduced in a later commit. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 14:15:20 -04:00
David Vrabel	bc96f648df	xen-netback: make feature-rx-notify mandatory Frontends that do not provide feature-rx-notify may stall because netback depends on the notification from frontend to wake the guest Rx thread (even if can_queue is false). This could be fixed but feature-rx-notify was introduced in 2006 and I am not aware of any frontends that do not implement this. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-25 14:15:20 -04:00
Richard Cochran	5345c1d417	ptp: restore the makefile for building the test program. This patch brings back the makefile called testptp.mk which was removed in commit `adb19fb66e` (Documentation: add makefiles for more targets). While the idea of that commit was to improve build coverage of the examples, the new Makefile is unable to cross compile the testptp program. In contrast, the deleted makefile was able to do this just fine. This patch fixes the regression by restoring the original makefile. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Peter Foley <pefoley2@pefoley.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 16:07:10 -04:00
Houcheng Lin	b51d3fa364	netfilter: nf_log: release skbuff on nlmsg put failure The kernel should reserve enough room in the skb so that the DONE message can always be appended. However, in case of e.g. new attribute erroneously not being size-accounted for, __nfulnl_send() will still try to put next nlmsg into this full skbuf, causing the skb to be stuck forever and blocking delivery of further messages. Fix issue by releasing skb immediately after nlmsg_put error and WARN() so we can track down the cause of such size mismatch. [ fw@strlen.de: add tailroom/len info to WARN ] Signed-off-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:34:11 +02:00
Florian Westphal	c1e7dc91ee	netfilter: nfnetlink_log: fix maximum packet length logged to userspace don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to `9cefbbc9c8` (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:32:27 +02:00
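The 16-bit length overflow described in the commit above can be sketched in plain C. This is an illustrative user-space model, not the nfnetlink_log code: DEMO_NLA_HDRLEN and both function names are hypothetical, and the sketch assumes a 4-byte attribute header whose 16-bit length field counts header plus payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4-byte attribute header; the length field is 16 bits wide
 * and covers the header as well as the payload. */
#define DEMO_NLA_HDRLEN 4u

/* Value that would land in a 16-bit length field for a given payload.
 * Anything above 0xFFFF - DEMO_NLA_HDRLEN wraps around. */
uint16_t demo_nla_len(size_t payload_len)
{
    return (uint16_t)(DEMO_NLA_HDRLEN + payload_len);
}

/* Clamp the payload so header plus payload still fits in 16 bits. */
size_t demo_clamp_payload(size_t payload_len)
{
    const size_t max = 0xFFFFu - DEMO_NLA_HDRLEN;
    return payload_len > max ? max : payload_len;
}
```

Under these assumptions a 0xFFFF-byte payload produces a stored length of 3, while the clamped payload produces exactly 0xFFFF.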
Florian Westphal	9dfa1dfe4d	netfilter: nf_log: account for size of NLMSG_DONE attribute We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:30:15 +02:00
Herbert Xu	7677e86843	bridge: Do not compile options in br_parse_ip_options Commit `462fb2af97` bridge : Sanitize skb before it enters the IP stack broke when IP options are actually used because it mangles the skb as if it entered the IP stack which is wrong because the bridge is supposed to operate below the IP stack. Since nobody has actually requested for parsing of IP options this patch fixes it by simply reverting to the previous approach of ignoring all IP options, i.e., zeroing the IPCB. If and when somebody who uses IP options and actually needs them to be parsed by the bridge complains then we can revisit this. Reported-by: David Newall <davidn@davidnewall.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:24:03 +02:00
Haiyang Zhang	942396b019	hyperv: Fix the total_data_buflen in send path total_data_buflen is used by netvsc_send() to decide if a packet can be put into send buffer. It should also include the size of RNDIS message before the Ethernet frame. Otherwise, a message with total size bigger than send_section_size may be copied into the send buffer, and cause data corruption. [Request to include this patch to the Stable branches] Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 17:58:50 -04:00
David S. Miller	f765678e32	Merge branch 'amd-xgbe' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver fixes 2014-10-22 The following series of patches includes fixes to the driver. - Properly handle feature changes via ethtool by using correctly sized variables - Perform proper napi packet counting and budget checking This patch series is based on net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 17:50:39 -04:00
Lendacky, Thomas	55ca6bcd73	amd-xgbe: Fix napi Rx budget accounting Currently the amd-xgbe driver increments the packets processed counter each time a descriptor is processed. Since a packet can be represented by more than one descriptor incrementing the counter in this way is not appropriate. Also, since multiple descriptors cause the budget check to be short circuited, sometimes the returned value from the poll function would be larger than the budget value resulting in a WARN_ONCE being triggered. Update the polling logic to properly account for the number of packets processed and exit when the budget value is reached. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 17:50:31 -04:00
Lendacky, Thomas	386f1c9650	amd-xgbe: Properly handle feature changes via ethtool The ndo_set_features callback function was improperly using an unsigned int to save the current feature value for features such as NETIF_F_RXCSUM. Since that feature is in the upper 32 bits of a 64 bit variable the result was always 0 making it not possible to actually turn off the hardware RX checksum support. Change the unsigned int type to the netdev_features_t type in order to properly capture the current value and perform the proper operation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 17:50:31 -04:00
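The truncation described in the commit above can be reproduced with a small C sketch. The type and flag names are hypothetical stand-ins (demo_features_t for netdev_features_t, and an assumed bit 40 for a feature flag in the upper 32 bits); uint32_t is used in place of unsigned int so the narrowing is explicit.

```c
#include <stdint.h>

typedef uint64_t demo_features_t;                 /* stand-in for a 64-bit feature set */
#define DEMO_F_RXCSUM ((demo_features_t)1 << 40)  /* assumed bit position above bit 31 */

/* Buggy form: the masked value is stored in a 32-bit variable, so a
 * flag in the upper half is silently dropped. */
int demo_rxcsum_buggy(demo_features_t features)
{
    uint32_t rxcsum = (uint32_t)(features & DEMO_F_RXCSUM);
    return rxcsum != 0;
}

/* Fixed form: keep the full 64-bit type for the saved value. */
int demo_rxcsum_fixed(demo_features_t features)
{
    demo_features_t rxcsum = features & DEMO_F_RXCSUM;
    return rxcsum != 0;
}
```

With the flag set, the buggy variant still reports it as clear, which is why the driver could not detect a change to that feature.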
Philipp Zabel	81f35ffde0	net: fec: ptp: fix NULL pointer dereference if ptp_clock is not set Since commit `278d240478` (net: fec: ptp: Enable PPS output based on ptp clock) fec_enet_interrupt calls fec_ptp_check_pps_event unconditionally, which calls into ptp_clock_event. If fep->ptp_clock is NULL, ptp_clock_event tries to dereference the NULL pointer. Since on i.MX53 fep->bufdesc_ex is not set, fec_ptp_init is never called, and fep->ptp_clock is NULL, which reliably causes a kernel panic. This patch adds a check for fep->ptp_clock == NULL in fec_enet_interrupt. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 17:48:06 -04:00
Sathya Perla	9e7ceb0607	net: fix saving TX flow hash in sock for outgoing connections The commit "net: Save TX flow hash in sock and set in skbuf on xmit" introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate and record flow hash(sk_txhash) in the socket structure. sk_txhash is used to set skb->hash which is used to spread flows across multiple TXQs. But, the above routines are invoked before the source port of the connection is created. Because of this all outgoing connections that just differ in the source port get hashed into the same TXQ. This patch fixes this problem for IPv4/6 by invoking the above routines after the source port is available for the socket. Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit") Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 16:14:29 -04:00
Li RongQing	789f202326	xfrm6: fix a potential use after free in xfrm6_policy.c pskb_may_pull() may change skb->data and make the nh and exthdr pointers obsolete, so recompute nh and exthdr Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 15:38:48 -04:00
LEROY Christophe	8751b12cd9	net: fs_enet: set back promiscuity mode after restart After interface restart (eg: after link disconnection/reconnection), the bridge function doesn't work anymore. This is due to the promiscuous mode being cleared by the restart. The mac-fcc already includes code to set the promiscuous mode back during the restart. This patch adds the same handling to mac-fec and mac-scc. Tested with bridge function on MPC885 with FEC. Reported-by: Germain Montoies <germain.montoies@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 15:33:13 -04:00
Karl Beldan	a63ba13eec	net: tso: fix unaligned access to crafted TCP header in helper API The crafted header start address is from a driver supplied buffer, which one can reasonably expect to be aligned on a 4-bytes boundary. However ATM the TSO helper API is only used by ethernet drivers and the tcp header will then be aligned to a 2-bytes only boundary from the header start address. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 12:52:55 -04:00
Jon Cooper	8fc963515e	sfc: remove incorrect EFX_BUG_ON_PARANOID check write_count and insert_count can wrap around, making > check invalid. Fixes: `70b33fb0dd` ("sfc: add support for skb->xmit_more"). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-22 12:51:16 -04:00
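The commit above removes the check outright. The C sketch below only illustrates why an ordering comparison between two free-running unsigned counters stops being valid once one of them wraps; the function names are hypothetical and the code is not taken from the sfc driver.

```c
#include <stdint.h>

/* Naive ordering check between two free-running 32-bit counters.
 * It gives the wrong answer once insert_count has wrapped past zero
 * while write_count has not. */
int demo_ahead_naive(uint32_t insert_count, uint32_t write_count)
{
    return insert_count >= write_count;
}

/* Distance between the counters using modular unsigned subtraction,
 * which stays correct across a wrap. */
uint32_t demo_pending(uint32_t insert_count, uint32_t write_count)
{
    return insert_count - write_count;
}
```

For example, with insert_count at 2 (just wrapped) and write_count at 0xFFFFFFFE, the naive check claims insert_count is behind, while the modular difference correctly reports 4 pending entries.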
Sabrina Dubroca	c123bb7163	netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation alloc_percpu returns NULL on failure, not a negative error code. Fixes: `ff3cd7b3c9` ("netfilter: nf_tables: refactor chain statistic routines") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:51 +02:00
Dan Carpenter	0f9f5e1b83	netfilter: ipset: off by one in ip_set_nfnl_get_byindex() The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:50 +02:00
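The off-by-one in the commit above follows a common pattern that a short C sketch can show. The array, its size, and the function name are hypothetical; only the shape of the bounds check mirrors the commit message.

```c
#include <stddef.h>

#define DEMO_MAX 4
static const int demo_list[DEMO_MAX] = {10, 20, 30, 40};

/* Returns a pointer to the entry, or NULL when the index is out of
 * range. The array has DEMO_MAX elements, so valid indices are
 * 0..DEMO_MAX-1 and the rejection test must be ">=". Using ">" would
 * admit index == DEMO_MAX and read one element past the end. */
const int *demo_get_byindex(size_t index)
{
    if (index >= DEMO_MAX)
        return NULL;
    return &demo_list[index];
}
```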
Marcelo Leitner	e37ad9fd63	netfilter: nf_conntrack: allow server to become a client in TW handling When a port that was used to listen for inbound connections gets closed and reused for outgoing connections (like rsh ends up doing for stderr flow), currently we may reject the SYN/ACK packet for the new connection because tcp_conntracks states forbid a port to become a client while there is still a TIME_WAIT entry in there for it. As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout for it is 120s, there is a ~60s window that the application can end up opening a port that conntrack will end up blocking. This patch fixes this by simply allowing such state transition: if we see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note that the rest of the code already handles this situation, more specifically in tcp_packet(), first switch clause. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-22 14:12:50 +02:00
Sabrina Dubroca	7c1c97d54f	net: sched: initialize bstats syncp Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp. Fixes: `22e0f8b932` "net: sched: make bstats per cpu and estimator RCU safe" Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 21:45:21 -04:00
Alexei Starovoitov	32bf08a625	bpf: fix bug in eBPF verifier while comparing for verifier state equivalency the comparison was missing a check for uninitialized register. Make sure it does so and add a testcase. Fixes: `f1bca824da` ("bpf: add search pruning optimization to verifier") Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 21:43:46 -04:00
Thomas Graf	78fd1d0ab0	netlink: Re-add locking to netlink_lookup() and seq walker The synchronize_rcu() in netlink_release() introduces unacceptable latency. Reintroduce minimal lookup so we can drop the synchronize_rcu() until socket destruction has been RCUfied. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 21:34:49 -04:00
Ying Xue	1a194c2d59	tipc: fix lockdep warning when intra-node messages are delivered When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] 
lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 1109.998762] * DEADLOCK * [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? 
tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. 
Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running in BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so deadlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Ying Xue	7b8613e0a1	tipc: fix a potential deadlock Locking dependency detected below possible unsafe locking scenario:
         CPU0                              CPU1
T0:  tipc_named_rcv()                  tipc_rcv()
T1:  [grab nametbl write lock]*        [grab node lock]*
T2:  tipc_update_nametbl()             tipc_node_link_up()
T3:  tipc_nodesub_subscribe()          tipc_nametbl_publish()
T4:  [grab node lock]*                 [grab nametbl write lock]*
The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
David S. Miller	73829bf6fe	Merge branch 'enic' Govindarajulu Varadarajan says: ==================== enic: Bug fixes This series fixes the following problem. Please apply this to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:24:30 -04:00
Govindarajulu Varadarajan	39dc90c159	enic: Do not call napi_disable when preemption is disabled. In enic_stop, we disable preemption using local_bh_disable(). We disable preemption to wait for busy_poll to finish. napi_disable should not be called here as it might sleep. Move the napi_disable() call outside of local_bh_disable(). BUG: sleeping function called from invalid context at include/linux/netdevice.h:477 in_atomic(): 1, irqs_disabled(): 0, pid: 443, name: ifconfig INFO: lockdep is turned off. Preemption disabled at:[<ffffffffa029c5c4>] enic_rfs_flw_tbl_free+0x34/0xd0 [enic] CPU: 31 PID: 443 Comm: ifconfig Not tainted 3.17.0-netnext-05504-g59f35b8 #268 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffff8800dac10000 ffff88020b8dfcb8 ffffffff8148a57c 0000000000000000 ffff88020b8dfcd0 ffffffff8107e253 ffff8800dac12a40 ffff88020b8dfd10 ffffffffa029305b ffff88020b8dfd48 ffff8800dac10000 ffff88020b8dfd48 Call Trace: [<ffffffff8148a57c>] dump_stack+0x4e/0x7a [<ffffffff8107e253>] __might_sleep+0x123/0x1a0 [<ffffffffa029305b>] enic_stop+0xdb/0x4d0 [enic] [<ffffffff8138ed7d>] __dev_close_many+0x9d/0xf0 [<ffffffff8138ef81>] __dev_close+0x31/0x50 [<ffffffff813974a8>] __dev_change_flags+0x98/0x160 [<ffffffff81397594>] dev_change_flags+0x24/0x60 [<ffffffff814085fd>] devinet_ioctl+0x63d/0x710 [<ffffffff81139c16>] ? might_fault+0x56/0xc0 [<ffffffff81409ef5>] inet_ioctl+0x65/0x90 [<ffffffff813768e0>] sock_do_ioctl+0x20/0x50 [<ffffffff81376ebb>] sock_ioctl+0x20b/0x2e0 [<ffffffff81197250>] do_vfs_ioctl+0x2e0/0x500 [<ffffffff81492619>] ? sysret_check+0x22/0x5d [<ffffffff81285f23>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8109fe19>] ? trace_hardirqs_on_caller+0x119/0x270 [<ffffffff811974ac>] SyS_ioctl+0x3c/0x80 [<ffffffff814925ed>] system_call_fastpath+0x1a/0x1f Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:24:25 -04:00
Govindarajulu Varadarajan	b6931c9ba7	enic: fix possible deadlock in enic_stop/ enic_rfs_flw_tbl_free The following warning is shown when spinlock debug is enabled. This occurs when the enic_flow_may_expire timer function is running and enic_stop is called on the same CPU. Fix this by using spin_lock_bh(). ================================= [ INFO: inconsistent lock state ] 3.17.0-netnext-05504-g59f35b8 #268 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. ifconfig/443 [HC0[0]:SC0[0]:HE1:SE1] takes: (&(&enic->rfs_h.lock)->rlock){+.?...}, at: enic_rfs_flw_tbl_free+0x34/0xd0 [enic] {IN-SOFTIRQ-W} state was registered at: [<ffffffff810a25af>] __lock_acquire+0x83f/0x21c0 [<ffffffff810a45f2>] lock_acquire+0xa2/0xd0 [<ffffffff814913fc>] _raw_spin_lock+0x3c/0x80 [<ffffffffa029c3d5>] enic_flow_may_expire+0x25/0x130[enic] [<ffffffff810bcd07>] call_timer_fn+0x77/0x100 [<ffffffff810bd8e3>] run_timer_softirq+0x1e3/0x270 [<ffffffff8105f9ae>] __do_softirq+0x14e/0x280 [<ffffffff8105fdae>] irq_exit+0x8e/0xb0 [<ffffffff8103da0f>] smp_apic_timer_interrupt+0x3f/0x50 [<ffffffff81493742>] apic_timer_interrupt+0x72/0x80 [<ffffffff81018143>] default_idle+0x13/0x20 [<ffffffff81018a6a>] arch_cpu_idle+0xa/0x10 [<ffffffff81097676>] cpu_startup_entry+0x2c6/0x330 [<ffffffff8103b7ad>] start_secondary+0x21d/0x290 irq event stamp: 2997 hardirqs last enabled at (2997): [<ffffffff81491865>] _raw_spin_unlock_irqrestore+0x65/0x90 hardirqs last disabled at (2996): [<ffffffff814915e6>] _raw_spin_lock_irqsave+0x26/0x90 softirqs last enabled at (2968): [<ffffffff813b57a3>] dev_deactivate_many+0x213/0x260 softirqs last disabled at (2966): [<ffffffff813b5783>] dev_deactivate_many+0x1f3/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&enic->rfs_h.lock)->rlock); <Interrupt> lock(&(&enic->rfs_h.lock)->rlock); * DEADLOCK * Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:24:25 -04:00
David S. Miller	d10845fc85	Merge branch 'gso_encap_fixes' Florian Westphal says: ==================== net: minor gso encapsulation fixes The following series fixes a minor bug in the gso segmentation handlers when encapsulation offload is used. Theoretically this could cause kernel panic when the stack tries to software-segment such a GRE offload packet, but it looks like there is only one affected call site (tbf scheduler) and it handles NULL return value. I've included a followup patch to add IS_ERR_OR_NULL checks where needed. While looking into this, I also found that size computation of the individual segments is incorrect if skb->encapsulation is set. Please see individual patches for delta vs. v1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:19 -04:00
Florian Westphal	f993bc25e5	net: core: handle encapsulation offloads when computing segment lengths if ->encapsulation is set we have to use inner_tcp_hdrlen and add the size of the inner network headers too. This is 'mostly harmless'; tbf might send skb that is slightly over quota or drop skb even if it would have fit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:13 -04:00
Florian Westphal	330966e501	net: make skb_gso_segment error handling more robust skb_gso_segment has three possible return values: 1. a pointer to the first segmented skb 2. an errno value (IS_ERR()) 3. NULL. This can happen when GSO is used for header verification. However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL and would oops when NULL is returned. Note that these call sites should never actually see such a NULL return value; all callers mask out the GSO bits in the feature argument. However, there have been issues with some protocol handlers erroneously not respecting the specified feature mask in some cases. It is preferable to get 'have to turn off hw offloading, else slow' reports rather than 'kernel crashes'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:13 -04:00
Florian Westphal	1e16aa3ddf	net: gso: use feature flag argument in all protocol gso handlers skb_gso_segment() has a 'features' argument representing offload features available to the output path. A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use those instead of the provided ones when handling encapsulation/tunnels. Depending on dev->hw_enc_features of the output device skb_gso_segment() can then return NULL even when the caller has disabled all GSO feature bits, as segmentation of the inner header thinks the device will take care of segmentation. This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs that did not fit the remaining token quota as the segmentation does not work when the device supports corresponding hw offload capabilities. Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 12:38:12 -04:00
David S. Miller	ce8ec48967	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains netfilter fixes for your net tree, they are: 1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules. 2) Restrict nat and masq expressions to the nat chain type. Otherwise, users may crash their kernel if they attach a nat/masq rule to a non nat chain. 3) Fix hook validation in nft_compat when non-base chains are used. Basically, initialize hook_mask to zero. 4) Make sure you use match/targets in nft_compat from the right chain type. The existing validation relies on the table name which can be avoided by 5) Better netlink attribute validation in nft_nat. This expression has to reject the configuration when no address and proto configurations are specified. 6) Interpret NFTA_NAT_REG__MAX only if NFTA_NAT_REG__MIN is set. Yet another sanity check to reject incorrect configurations from userspace. 7) Conditional NAT attribute dumping depending on the existing configuration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 11:57:47 -04:00
Ian Morgan	95ff886887	ax88179_178a: fix bonding failure The following patch fixes a bug which causes the ax88179_178a driver to be incapable of being added to a bond. When I brought up the issue with the bonding maintainers, they indicated that the real problem was with the NIC driver which must return zero for success (of setting the MAC address). I see that several other NIC drivers follow that pattern by either simply always returning zero, or by passing through a negative (error) result while rewriting any positive return code to zero. With that same philosophy applied to the ax88179_178a driver, it allows it to work correctly with the bonding driver. I believe this is suitable for queuing in -stable, as it's a small, simple, and obvious fix that corrects a defect with no other known workaround. This patch is against vanilla 3.17(.0). Signed-off-by: Ian Morgan <imorgan@primordial.ca> drivers/net/usb/ax88179_178a.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-20 00:53:30 -04:00
Linus Torvalds	61ed53deb1	Add support for Haswell NTB split BARs, a debugfs entry for basic debugging info, and some code clean-ups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJUQQXsAAoJEG5mS6x6i9IjtlcP/1kB9TzJ0XGGEhIMv98X989X jzOuDHandfhfB6f0ch+1HUOji34Ig+hX2GmKRnUMOJJvXTdwEl9qpbEo3tUNl6S+ pMf0oNzuMJGfrWsHbDa7MkCdnm6aLQrK/NxwHWY42g5yB2gc6vgcr6vD4bFEODBC fPgQIuogbLlVexFJFuVH6XbxNUhEXWiKf4vdq2m3tlgatLpe5NdVPQrq3qu8PLRa sbyNHWgCAVussI2LJI2t4dJwIUMnGFGTWBVNpxQo6ceWxziRFGg9Yo5068NnWU6j nDGL/TiQTM+Zp51U39Sd6eg5apnQMgrHTXumghSRDLnFe+odiVsRXX5IEqEAqXJS g6Et+sh7/CPO/nWM2dpIH5sxKuV4HAV7cSGsGjQD0pw6gKW8CT0rQI4gWufkorKH oR2etKjywN9j9+ofUN4w4FfKPlxeTgKp+zVbtTlUMOveJ4NIYBDDMvUS9KS9ZyMt s2X5/EJ864higC0VhTU/DxfqGc/g3+nz/EtWynxn9q2VPWlz/Zp0lLWvx4J4qGjP g4Pbv6EzWZqR1pia9L5alwR9nEYtn+bkwFKvvygp0tA+bor0G75qoRVy75KhUIPn 1oQllKGiFhSLlNq9q6OMnlLv/BMXgvSnm7NL291/TY2RWfiWatlwGaMvnmkLWKKm wskyofD4yYOs4UJXgUJ8 =JVjj -----END PGP SIGNATURE----- Merge tag 'ntb-3.18' of git://github.com/jonmason/ntb Pull ntb (non-transparent bridge) updates from Jon Mason: "Add support for Haswell NTB split BARs, a debugfs entry for basic debugging info, and some code clean-ups" * tag 'ntb-3.18' of git://github.com/jonmason/ntb: ntb: Adding split BAR support for Haswell platforms ntb: use errata flag set via DID to implement workaround ntb: conslidate reading of PPD to move platform detection earlier ntb: move platform detection to separate function NTB: debugfs device entry	2014-10-19 12:58:22 -07:00
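
Note on commit 330966e501 above: the three return values it lists for skb_gso_segment (a valid pointer, an errno encoded as a pointer, or NULL) can be illustrated with a small user-space sketch. This is not kernel code. The ERR_PTR-style helpers below are simplified stand-ins written for this example, and `struct seg`, `fake_segment()` and `count_segments()` are hypothetical names; treating NULL as "nothing to send" is an assumption made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* True only for pointers in the top MAX_ERRNO addresses; NULL is NOT an error. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
    return !ptr || IS_ERR(ptr);
}

/* Hypothetical stand-in for a segmented skb list. */
struct seg {
    struct seg *next;
};

/* Hypothetical segmenter: mode 0 = one segment, 1 = errno, 2 = NULL. */
static struct seg *fake_segment(int mode)
{
    static struct seg first;

    assert(mode >= 0 && mode <= 2);
    if (mode == 1)
        return ERR_PTR(-ENOMEM);
    if (mode == 2)
        return NULL;
    first.next = NULL;
    return &first;
}

/*
 * Caller that survives all three cases. Checking only IS_ERR() would let
 * the NULL case fall through to the list walk and dereference NULL.
 * Returns the segment count, 0 for NULL (assumed "drop"), or a negative errno.
 */
static int count_segments(int mode)
{
    struct seg *segs = fake_segment(mode);
    int n = 0;

    if (IS_ERR_OR_NULL(segs))
        return segs ? (int)PTR_ERR(segs) : 0;

    for (; segs; segs = segs->next)
        n++;
    return n;
}
```

Calling `count_segments()` with modes 0, 1 and 2 exercises the valid-pointer, errno and NULL paths in turn.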


480337 Commits