linux

iv/linux

Author	SHA1	Message	Date
Liping Zhang	590025a27f	netfilter: nft_ct: fix unpaired nf_connlabels_get/put call We only get nf_connlabels if the user add ct label set expr successfully, but we will also put nf_connlabels if the user delete ct lable get expr. This is mismathced, and will cause ct label expr cannot work properly. Also, if we init something fail, we should put nf_connlabels back. Otherwise, we may waste to alloc the memory that will never be used. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-19 19:52:03 +02:00
Florian Westphal	f4dc77713f	netfilter: x_tables: speed up jump target validation The dummy ruleset I used to test the original validation change was broken, most rules were unreachable and were not tested by mark_source_chains(). In some cases rulesets that used to load in a few seconds now require several minutes. sample ruleset that shows the behaviour: echo "*filter" for i in $(seq 0 100000);do printf ":chain_%06x - [0:0]\n" $i done for i in $(seq 0 100000);do printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i done echo COMMIT [ pipe result into iptables-restore ] This ruleset will be about 74mbyte in size, with ~500k searches though all 500k[1] rule entries. iptables-restore will take forever (gave up after 10 minutes) Instead of always searching the entire blob for a match, fill an array with the start offsets of every single ipt_entry struct, then do a binary search to check if the jump target is present or not. After this change ruleset restore times get again close to what one gets when reverting 36472341017529e (~3 seconds on my workstation). [1] every user-defined rule gets an implicit RETURN, so we get 300k jumps + 100k userchains + 100k returns -> 500k rule entries Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps") Reported-by: Jeff Wu <wujiafu@gmail.com> Tested-by: Jeff Wu <wujiafu@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-18 21:35:23 +02:00
Liping Zhang	3101e0fc1f	netfilter: conntrack: protect early_drop by rcu read lock User can add ct entry via nfnetlink(IPCTNL_MSG_CT_NEW), and if the total number reach the nf_conntrack_max, we will try to drop some ct entries. But in this case(the main function call path is ctnetlink_create_conntrack -> nf_conntrack_alloc -> early_drop), rcu_read_lock is not held, so race with hash resize will happen. Fixes: 242922a02717 ("netfilter: conntrack: simplify early_drop") Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-12 16:24:22 +02:00
Toby DiPasquale	c2b9b4fee8	netfilter: nf_conntrack_h323: fix off-by-one in DecodeQ931 This patch corrects an off-by-one error in the DecodeQ931 function in the nf_conntrack_h323 module. This error could result in reading off the end of a Q.931 frame. Signed-off-by: Toby DiPasquale <toby@cbcg.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:32:45 +02:00
Pablo Neira Ayuso	c080b460df	Merge tag 'ipvs-for-v4.8' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== IPVS Updates for v4.8 please consider these enhancements to the IPVS. This alters the behaviour of the "least connection" schedulers such that pre-established connections are included in the active connection count. This avoids overloading servers when a large number of new connections arrive in a short space of time - e.g. when clients reconnect after a node or network failure. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:16:34 +02:00
Pablo Neira Ayuso	42a5576913	netfilter: nf_tables: get rid of possible_net_t from set and basechain We can pass the netns pointer as parameter to the functions that need to gain access to it. From basechains, I didn't find any client for this field anymore so let's remove this too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:16:04 +02:00
Liping Zhang	3f8b61b7f9	netfilter: nft_ct: make byte/packet expr more friendly If we want to use ct packets expr, and add a rule like follows: # nft add rule filter input ct packets gt 1 counter We will find that no packets will hit it, because nf_conntrack_acct is disabled by default. So It will not work until we enable it manually via "echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct". This is not friendly, so like xt_connbytes do, if the user want to use ct byte/packet expr, enable nf_conntrack_acct automatically. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:16:02 +02:00
Hangbin Liu	47c7445625	netfilter: physdev: physdev-is-out should not work with OUTPUT chain physdev_mt() will check skb->nf_bridge first, which was alloced in br_nf_pre_routing. So if we want to use --physdev-out and physdev-is-out, we need to match it in FORWARD or POSTROUTING chain. physdev_mt_check() only checked physdev-out and missed physdev-is-out. Fix it and update the debug message to make it clearer. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Marcelo R Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:16:01 +02:00
Florian Westphal	870190a9ec	netfilter: nat: convert nat bysrc hash to rhashtable It did use a fixed-size bucket list plus single lock to protect add/del. Unlike the main conntrack table we only need to add and remove keys. Convert it to rhashtable to get table autosizing and per-bucket locking. The maximum number of entries is -- as before -- tied to the number of conntracks so we do not need another upperlimit. The change does not handle rhashtable_remove_fast error, only possible "error" is -ENOENT, and that is something that can happen legitimetely, e.g. because nat module was inserted at a later time and no src manip took place yet. Tested with http-client-benchmark + httpterm with DNAT and SNAT rules in place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 12:07:57 +02:00
Florian Westphal	7c96643519	netfilter: move nat hlist_head to nf_conn The nat extension structure is 32bytes in size on x86_64: struct nf_conn_nat { struct hlist_node bysource; /* 0 16 / struct nf_conn ct; /* 16 8 / union nf_conntrack_nat_help help; / 24 4 / int masq_index; / 28 4 / / size: 32, cachelines: 1, members: 4 / / last cacheline: 32 bytes */ }; The hlist is needed to quickly check for possible tuple collisions when installing a new nat binding. Storing this in the extension area has two drawbacks: 1. We need ct backpointer to get the conntrack struct from the extension. 2. When reallocation of extension area occurs we need to fixup the bysource hash head via hlist_replace_rcu. We can avoid both by placing the hlist_head in nf_conn and place nf_conn in the bysource hash rather than the extenstion. We can also remove the ->move support; no other extension needs it. Moving the entire nat extension into nf_conn would be possible as well but then we have to add yet another callback for deletion from the bysource hash table rather than just using nat extension ->destroy hook for this. nf_conn size doesn't increase due to aligment, followup patch replaces hlist_node with single pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 11:47:50 +02:00
Florian Westphal	242922a027	netfilter: conntrack: simplify early_drop We don't need to acquire the bucket lock during early drop, we can use lockless traveral just like ____nf_conntrack_find. The timer deletion serves as synchronization point, if another cpu attempts to evict same entry, only one will succeed with timer deletion. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 11:46:22 +02:00
Liping Zhang	8786a9716d	netfilter: nf_ct_helper: unlink helper again when hash resize happen From: Liping Zhang <liping.zhang@spreadtrum.com> Similar to ctnl_untimeout, when hash resize happened, we should try to do unhelp from the 0# bucket again. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 11:44:34 +02:00
Liping Zhang	474803d37e	netfilter: cttimeout: unlink timeout obj again when hash resize happen Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing ctnl_untimeout, and iterate on 3000# bucket. Meanwhile, another user try to reduce hash size to 2048, then all nf_conn are removed to the new hashtable. When this hash resize operation finished, we still try to itreate ct begin from 3000# bucket, find nothing to do and just return. We may miss unlinking some timeout objects. And later we will end up with invalid references to timeout object that are already gone. So when we find that hash resize happened, try to unlink timeout objects from the 0# bucket again. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 11:39:08 +02:00
Liping Zhang	64b87639c9	netfilter: conntrack: fix race between nf_conntrack proc read and hash resize When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows： BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [<ffffffff81261a1c>] seq_read+0x2cc/0x390 [<ffffffff812a8d62>] proc_reg_read+0x42/0x70 [<ffffffff8123bee7>] __vfs_read+0x37/0x130 [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0 [<ffffffff8123cf75>] vfs_read+0x95/0x140 [<ffffffff8123e475>] SyS_read+0x55/0xc0 [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f98175 ("netfilter: conntrack: fix lookup race during hash resize"). And add a wrapper function nf_conntrack_get_ht to get hash and hsize suggested by Florian Westphal. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-11 11:38:57 +02:00
Michal Kubecek	be2cef4990	ipvs: count pre-established TCP states as active Some users observed that "least connection" distribution algorithm doesn't handle well bursts of TCP connections from reconnecting clients after a node or network failure. This is because the algorithm counts active connection as worth 256 inactive ones where for TCP, "active" only means TCP connections in ESTABLISHED state. In case of a connection burst, new connections are handled before previous ones have finished the three way handshaking so that all are still counted as "inactive", i.e. cheap ones. The become "active" quickly but at that time, all of them are already assigned to one real server (or few), resulting in highly unbalanced distribution. Address this by counting the "pre-established" states as "active". Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-07-07 20:30:52 +02:00
David S. Miller	a90a6e55f3	One more set of new features: * beacon report (for radio measurement) support in cfg80211/mac80211 * hwsim: allow wmediumd in namespaces * mac80211: extend 160MHz workaround to CSA IEs * mesh: properly encrypt group-addressed privacy action frames * mesh: allow setting peer AID * first steps for MU-MIMO monitor mode * along with various other cleanups and improvements -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXfSCuAAoJEGt7eEactAAdaugP/ilrcELQRsIN5ZXCAKYZuwXV T01JPwgOaWL9ILu7h1SfG/+j9kzMnyk4WmRpeoj2FGNcyfG2AvULWSLpQJ2abwgQ 8o/emuLinQwRENevaMUTRSOE0HkXoFPCbbq37+a2i6bAv1QSYY3A0xvWpcU5fZ4D 7CKYDYPBAdMXYwEwy1g4nYWfDAYqS4rthr3l3rS1Cy7Q2T1ZlMlD90GjD7oeQAEw orKulhkkDSzvxfvZCYTzXmUoBQE8sNXGDD+OFsJyowyt+ugM/xan+2tmhCaHSnda HpdCS2aRj779UBn9cOfELjffTNpS++PM6KFd8ZDaPcJSMginn/BAHTOeNfNUfL0Q +Enu59I82qMDzbG2z1Qezzjv7OTzyEvyvYzNbLOqljTBSBklDa3rHwhyk+g1oVBH +4xX1Vk5QBLde+Q0NS0gTkcqOQK8KT5+HEqiUfgLSNDETN0lSGsKbtvMfU/ikE1Y aLRkTp7nzUd03qjIFLS6RMf7JjucWWzH1ZXTHvbpDFAG7riOhYRD3Sw+0e7madTd +HXjH9dnOGnGPDL+FyDwtW6iclYwNjcIPQiNdOjwWfMA2Wmr7iq+aFUptRCwQTHB WtgJ3f8OHax2JXcm4grYfxELZip5vbWJJHUC84Drvmzw3X7FRITf+OEWjdNOzsRD Fc7w5ceThh9Id1BvcH2+ =R1qP -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== One more set of new features: * beacon report (for radio measurement) support in cfg80211/mac80211 * hwsim: allow wmediumd in namespaces * mac80211: extend 160MHz workaround to CSA IEs * mesh: properly encrypt group-addressed privacy action frames * mesh: allow setting peer AID * first steps for MU-MIMO monitor mode * along with various other cleanups and improvements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-06 22:32:15 -07:00
David S. Miller	30d0844bdc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-06 10:35:22 -07:00
David S. Miller	ae3e4562e2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Don't use userspace datatypes in bridge netfilter code, from Tobin Harding. 2) Iterate only once over the expectation table when removing the helper module, instead of once per-netns, from Florian Westphal. 3) Extra sanitization in xt_hook_ops_alloc() to return error in case we ever pass zero hooks, xt_hook_ops_alloc(): 4) Handle NFPROTO_INET from the logging core infrastructure, from Liping Zhang. 5) Autoload loggers when TRACE target is used from rules, this doesn't change the behaviour in case the user already selected nfnetlink_log as preferred way to print tracing logs, also from Liping Zhang. 6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields by cache lines, increases the size of entries in 11% per entry. From Florian Westphal. 7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian. 8) Remove useless defensive check in nf_logger_find_get() from Shivani Bhardwaj. 9) Remove zone extension as place it in the conntrack object, this is always include in the hashing and we expect more intensive use of zones since containers are in place. Also from Florian Westphal. 10) Owner match now works from any namespace, from Eric Bierdeman. 11) Make sure we only reply with TCP reset to TCP traffic from nf_reject_ipv4, patch from Liping Zhang. 12) Introduce --nflog-size to indicate amount of network packet bytes that are copied to userspace via log message, from Vishwanath Pai. This obsoletes --nflog-range that has never worked, it was designed to achieve this but it has never worked. 13) Introduce generic macros for nf_tables object generation masks. 14) Use generation mask in table, chain and set objects in nf_tables. This allows fixes interferences with ongoing preparation phase of the commit protocol and object listings going on at the same time. This update is introduced in three patches, one per object. 15) Check if the object is active in the next generation for element deactivation in the rbtree implementation, given that deactivation happens from the commit phase path we have to observe the future status of the object. 16) Support for deletion of just added elements in the hash set type. 17) Allow to resize hashtable from /proc entry, not only from the obscure /sys entry that maps to the module parameter, from Florian Westphal. 18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised anymore since we tear down the ruleset whenever the netdevice goes away. 19) Support for matching inverted set lookups, from Arturo Borrero. 20) Simplify the iptables_mangle_hook() by removing a superfluous extra branch. 21) Introduce ether_addr_equal_masked() and use it from the netfilter codebase, from Joe Perches. 22) Remove references to "Use netfilter MARK value as routing key" from the Netfilter Kconfig description given that this toggle doesn't exists already for 10 years, from Moritz Sichert. 23) Introduce generic NF_INVF() and use it from the xtables codebase, from Joe Perches. 24) Setting logger to NONE via /proc was not working unless explicit nul-termination was included in the string. This fixes seems to leave the former behaviour there, so we don't break backward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-06 09:15:15 -07:00
Masashi Honma	7d27a0ba7a	cfg80211: Add mesh peer AID setting API Previously, mesh power management functionality works only with kernel MPM. Because user space MPM did not report mesh peer AID to kernel, the kernel could not identify the bit in TIM element. So this patch adds mesh peer AID setting API. Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 15:04:52 +02:00
Johannes Berg	92b3a28a2b	mac80211: parse wide bandwidth channel switch IE with workaround Continuing the workaround implemented in commit 23665aaf9170 ("mac80211: Interoperability workaround for 80+80 and 160 MHz channels") use the same code to parse the Wide Bandwidth Channel Switch element by converting to VHT Operation element since the spec also just refers to that for parsing semantics, particularly with the workaround. While at it, remove some dead code - the IEEE80211_STA_DISABLE_40MHZ flag can never be set at this point since it's checked earlier and the wide_bw_chansw_ie pointer is set to NULL if it's set. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:55:04 +02:00
Johannes Berg	7d10f6b179	mac80211: report failure to start (partial) scan as scan abort Rather than reporting the scan as having completed, report it as being aborted. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:54:38 +02:00
Avraham Stern	7947d3e075	mac80211: Add support for beacon report radio measurement Add the following to support beacon report radio measurement with the measurement mode field set to passive or active: 1. Propagate the required scan duration to the device 2. Report the scan start time (in terms of TSF) 3. Report each BSS's detection time (also in terms of TSF) TSF times refer to the BSS that the interface that requested the scan is connected to. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com> [changed ath9k/10k, at76c59x-usb, iwlegacy, wl1251 and wlcore to match the new API] Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:53:19 +02:00
Avraham Stern	1d76250bd3	nl80211: support beacon report scanning Beacon report radio measurement requires reporting observed BSSs on the channels specified in the beacon request. If the measurement mode is set to passive or active, it requires actually performing a scan (passive or active, accordingly), and reporting the time that the scan was started and the time each beacon/probe was received (both in terms of TSF of the BSS of the requesting AP). If the request mode is table, this information is optional. In addition, the radio measurement request specifies the channel dwell time for the measurement. In order to use scan for beacon report when the mode is active or passive, add a parameter to scan request that specifies the channel dwell time, and add scan start time and beacon received time to scan results information. Supporting beacon report is required for Multi Band Operation (MBO). Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:51:31 +02:00
Aviya Erenfeld	c6e6a0c8be	nl80211: Add API to support VHT MU-MIMO air sniffer add API to support VHT MU-MIMO air sniffer. in MU-MIMO there are parallel frames on the air while the HW has only one RX. add the capability to sniff one of the MU-MIMO parallel frames by giving the sniffer additional information so it'll know which of the parallel frames it shall follow. Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting a MU-MIMO groupID in order to monitor packets from that group using VHT MU-MIMO. And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing MAC address to monitor mode. that option will be used by VHT MU-MIMO air sniffer to follow a station according to it's MAC address using VHT MU-MIMO. Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:46:04 +02:00
Johannes Berg	f89e07d4cf	mac80211: agg-rx: refuse ADDBA Request with timeout update The current implementation of handling ADDBA Request while a session is already active with the peer is wrong - in case the peer is using the existing session's dialog token this should be treated as update to the session, which can update the timeout value. We don't really have a good way of supporting that, so reject, but implement the required behaviour in the spec of "Even if the updated ADDBA Request frame is not accepted, the original Block ACK setup remains active." (802.11-2012 10.5.4) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-07-06 14:44:14 +02:00
Martin KaFai Lau	903ce4abdf	ipv6: Fix mem leak in rt6i_pcpu It was first reported and reproduced by Petr (thanks!) in https://bugzilla.kernel.org/show_bug.cgi?id=119581 free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy(). However, after fixing a deadlock bug in commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"), free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL. It is worth to note that rt6i_pcpu is protected by table->tb6_lock. kmemleak somehow did not report it. We nailed it down by observing the pcpu entries in /proc/vmallocinfo (first suggested by Hannes, thanks!). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Tested-by: Petr Novopashenniy <pety@rusnet.ru> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 14:09:23 -07:00
Vegard Nossum	ab58298cf4	net: fix decnet rtnexthop parsing dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0 (i.e. if userspace passes a malformed netlink message). Let's use the helpers from net/nexthop.h which take care of all this stuff. We can do exactly the same as e.g. fib_count_nexthops() and fib_get_nhs() from net/ipv4/fib_semantics.c. This fixes the softlockup for me. Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 14:08:47 -07:00
Ido Schimmel	2a4501ae18	neigh: Send a notification when DELAY_PROBE_TIME changes When the data plane is offloaded the traffic doesn't go through the networking stack. Therefore, after first resolving a neighbour the NUD state machine will transition it from REACHABLE to STALE until it's finally deleted by the garbage collector. To prevent such situations the offloading driver should notify the NUD state machine on any neighbours that were recently used. The driver's polling interval should be set so that the NUD state machine can function as if the traffic wasn't offloaded. Currently, there are no in-tree drivers that can report confirmation for a neighbour, but only 'used' indication. Therefore, the polling interval should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will transition from REACHABLE state to DELAY (instead of STALE) if "a packet was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861). Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via netlink or sysctl - so that offloading drivers can correctly set their polling interval. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 09:06:29 -07:00
Jiri Pirko	18bfb924f0	net: introduce default neigh_construct/destroy ndo calls for L2 upper devices L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 09:06:28 -07:00
Jiri Pirko	503eebc265	net: add dev arg to ndo_neigh_construct/destroy As the following patch will allow upper devices to follow the call down lower devices, we need to add dev here and not rely on n->dev. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 09:06:28 -07:00
Pavel Tikhomirov	c6ac37d8d8	netfilter: nf_log: fix error on write NONE to logger choice sysctl It is hard to unbind nf-logger: echo NONE > /proc/sys/net/netfilter/nf_log/0 bash: echo: write error: No such file or directory sysctl -w net.netfilter.nf_log.0=NONE sysctl: setting key "net.netfilter.nf_log.0": No such file or directory net.netfilter.nf_log.0 = NONE You need explicitly send '\0', for instance like: echo -e "NONE\0" > /proc/sys/net/netfilter/nf_log/0 That seem to be strange, so fix it using proc_dostring. Now it works fine: modprobe nfnetlink_log echo nfnetlink_log > /proc/sys/net/netfilter/nf_log/0 cat /proc/sys/net/netfilter/nf_log/0 nfnetlink_log echo NONE > /proc/sys/net/netfilter/nf_log/0 cat /proc/sys/net/netfilter/nf_log/0 NONE v2: add missed error check for proc_dostring Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-05 14:57:57 +02:00
David S. Miller	b77af26a79	This feature patchset includes the following changes: - Cleanup work by Markus Pargmann and Sven Eckelmann (six patches) - Initial Netlink support by Matthias Schiffer (two patches) - Throughput Meter implementation by Antonio Quartulli, a kernel-space traffic generator to estimate link speeds. This feature is useful on low-end WiFi APs where running iperf or netperf from userspace gives wrong results due to heavy userspace/kernelspace overhead. (two patches) - API clean-up work by Antonio Quartulli (one patch) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJXel3cAAoJEKEr45hCkp6hxs0P/jQZBJ37Bd4EHRGdhvCJWwsO j79zr7mIECub8a6PMkO1GI87ksJNtRdRw7XAIbLKTwsKEsUE0Gpv/MLLKgv/nD7X zatcoI4DujkgSojZIcOV/061+M9FAnCtAYv13jIS8nbXdqfGPxPfLua6Zbvx1GS2 z/Rqg/WbB2qDtDlUrp0W/8oXQ+k6062G7GigroPLmjdWd5lF0H6ly4loWsxFyr0U GVl44HM4nOj7DwkVlrGoOXnAbjpz9TNC/aA5TIS/tLFZkm5dvJjjKLDbxo5NM9aq hRhFy8Gbe0TmOxV3mKZUT1oHuaHgFDY2tADLiLF2g/ijgaTetXCBJ6DXQ7BkiZnh +t1fnutOB1D05+cZGDmlfb2bFXO6vdDwNzKYuPdeW0tUOVwzNIaMK+US1HffUD3F ciK/cALsLbfJ3QkUHJclE57baMuB2c7YWJUxGdp2r4lKHak6tc8+BsornI6lB6qY kcxip6EEaT7edjT66Qjq8GtFK7WIri5nHI9n5Js+Wwl1QENvkLmZRQ6qZexwSplS RTZmmO+i+Y4rGDa3xoVSlC+CEUO7D4VwhET2Jir7KJrVS+pFNRAmfpUNWxeauAls D1xWgBrWjjOYu5i3LjwC6cHl4eTWxBwWmBUaxLUUeyoR22ulIs6bXCQMWOLMbupd q8k2B5BW9waTAgb4Tam9 =PFHu -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20160704' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - Cleanup work by Markus Pargmann and Sven Eckelmann (six patches) - Initial Netlink support by Matthias Schiffer (two patches) - Throughput Meter implementation by Antonio Quartulli, a kernel-space traffic generator to estimate link speeds. This feature is useful on low-end WiFi APs where running iperf or netperf from userspace gives wrong results due to heavy userspace/kernelspace overhead. (two patches) - API clean-up work by Antonio Quartulli (one patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 23:33:59 -07:00
Jiri Pirko	7ce856aaaf	mlxsw: spectrum: Add couple of lower device helper functions Add functions that iterate over lower devices and find port device. As a dependency add netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macro with netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers. Also, add functions to return mlxsw struct according to lower device found and mlxsw_port struct with a reference to lower device. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 18:25:15 -07:00
Vegard Nossum	3dad5424ad	RDS: fix rds_tcp_init() error path If register_pernet_subsys() fails, we shouldn't try to call unregister_pernet_subsys(). Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Cc: stable@vger.kernel.org Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 16:09:49 -07:00
Daniel Borkmann	13c5c240f7	bpf: add bpf_get_hash_recalc helper If skb_clear_hash() was invoked due to mangling of relevant headers and BPF program needs skb->hash later on, we can add a helper to trigger hash recalculation via bpf_get_hash_recalc(). The helper will return the newly retrieved hash directly, but later access can also be done via skb context again through skb->hash directly (inline) without needing to call the helper once more. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 16:08:40 -07:00
John Fastabend	0967f24459	net: pktgen: support injecting packets for qdisc testing Add another xmit_mode to pktgen to allow testing xmit functionality of qdiscs. The new mode "queue_xmit" injects packets at __dev_queue_xmit() so that qdisc is called. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 16:07:34 -07:00
Jamal Hadi Salim	61cc535de3	net sched actions: skbedit convert to use more modern nla_put_xxx Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 15:11:14 -07:00
Jamal Hadi Salim	ff202ee1ed	net sched actions: skbedit add support for mod-ing skb pkt_type Extremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Example usage: tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \ match ip src 5.5.5.5/32 \ flowid 1:5 action skbedit ptype host This will tag all packets incoming from 5.5.5.5 with type PACKET_HOST Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 15:11:14 -07:00
Jamal Hadi Salim	8b10cab64c	net: simplify and make pkt_type_ok() available for other users Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-04 15:11:13 -07:00
Antonio Quartulli	29824a55c0	batman-adv: split routing API data structure in subobjects The routing API data structure contains several function pointers that can easily be grouped together based on the component they work with. Split the API in subobjects in order to improve definition readability. At the same time, remove the "bat_" prefix from the API object and its fields names. These are batman-adv private structs and there is no need to always prepend such prefix, which only makes function invocations much much longer. Signed-off-by: Antonio Quartulli <a@unstable.cc> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2016-07-04 12:37:19 +02:00
Antonio Quartulli	33a3bb4a33	batman-adv: throughput meter implementation The throughput meter module is a simple, kernel-space replacement for throughtput measurements tool like iperf and netperf. It is intended to approximate TCP behaviour. It is invoked through batctl: the protocol is connection oriented, with cumulative acknowledgment and a dynamic-size sliding window. The test can be interrupted by batctl. A receiver side timeout avoids unlimited waitings for sender packets: after one second of inactivity, the receiver abort the ongoing test. Based on a prototype from Edo Monticelli <montik@autistici.org> Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2016-07-04 12:37:18 +02:00
Antonio Quartulli	f50ca95a69	batman-adv: return netdev status in the TX path Return the proper netdev TX status along the TX path so that the tp_meter can understand when the queue is full and should stop sending packets. Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2016-07-04 12:37:18 +02:00
Matthias Schiffer	5da0aef5e9	batman-adv: add netlink command to query generic mesh information files BATADV_CMD_GET_MESH_INFO is used to query basic information about a batman-adv softif (name, index and MAC address for both the softif and the primary hardif; routing algorithm; batman-adv version). Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Reduce the number of changes to BATADV_CMD_GET_MESH_INFO, add missing kerneldoc, add policy for attributes] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2016-07-04 12:37:17 +02:00
Matthias Schiffer	09748a22f4	batman-adv: add generic netlink family for batman-adv debugfs is currently severely broken virtually everywhere in the kernel where files are dynamically added and removed (see http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02196.html for some details). In addition to that, debugfs is not namespace-aware. Instead of adding new debugfs entries, the whole infrastructure should be moved to netlink. This will fix the long standing problem of large buffers for debug tables and hard to parse text files. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Strip down patch to only add genl family, add missing kerneldoc] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2016-07-04 12:37:17 +02:00
Joe Perches	c37a2dfa67	netfilter: Convert FWINV<[foo]> macros and uses to NF_INVF netfilter uses multiple FWINV #defines with identical form that hide a specific structure variable and dereference it with a invflags member. $ git grep "#define FWINV" include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg)) net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg)) net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg))) net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg))) net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg))) net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg))) Consolidate these macros into a single NF_INVF macro. Miscellanea: o Neaten the alignment around these uses o A few lines are > 80 columns for intelligibility Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-03 10:55:07 +02:00
Or Gerlitz	08f4b5918b	net/devlink: Add E-Switch mode control Add the commands to set and show the mode of SRIOV E-Switch, two modes are supported: * legacy: operating in the "old" L2 based mode (DMAC --> VF vport) * switchdev: the E-Switch is referred to as whitebox switch configured using standard tools such as tc, bridge, openvswitch etc. To allow working with the tools, for each VF, a VF representor netdevice is created by the E-Switch manager vendor device driver instance (e.g PF). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-02 14:40:40 -04:00
David S. Miller	3ea00443f1	This feature patchset includes the following changes: - two patches with minimal clean up work by Antonio Quartulli and Simon Wunderlich - eight patches of B.A.T.M.A.N. V, API and documentation clean up work, by Antonio Quartulli and Marek Lindner - Andrew Lunn fixed the skb priority adoption when forwarding fragmented packets (two patches) - Multicast optimization support is now enabled for bridges which comes with some protocol updates, by Linus Luessing -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJXdmhpAAoJEKEr45hCkp6h/R0P/1K0mZjs1lk15j1oc0EeE/lJ a47nwLQiAj9O790SUhQuonUYtbw5jhxZq5P1zYvg44ngRdUhsH9yiwr+Yado40CW 5ek+EdfGfkwNThGG++knVrbhLPrGxSC9Q2qJCApergt4wViWvvovSJOZsKVcanei Qv9uGn6TIhZW3FFhvYk6/xgseZhjRISgxPkO1N20tMcC3f0w4YgM5QcCPGG2KB3N CYq1qryyl4Jf6NeNap/lXiTo6JA5qOHYe68ziotJTtlsrsYQ3WitJvuKO+bWuQIv zOU/jQ7qUwuabLT5TnzZKxQJvhrqfA5V20OM3yD4lnhdgvqVsHgHoIRy6RpN4U8M rFlgROZvm+k0ATnL8AcUtIY7EA/EA0ifHN2fdTAfQ0XNc0VxTXEWB4qTTBJu3+se N0+QyIjpXzgHqKxjIpr8Sm3tBO/ANCui/gWl5SToGXis3xDsRivvTMWNSNYjgDcP WdyLtx9h7RLNOh64Idwsq4yDHt+/P86z7xJQdlkmrshkjqL/HNgS93U5CeAC3mN0 S6N5PgZG07EYnGxzxDid+6x1UP1VA7dyqHJpxpYY7qbw+/aDVlq5XH/vI/9Lbq5i vu/54L8cVG5nBe54SZBeUib5W7Wkgf3POWzt+rrRwbHY+gAE1zUPQNNzgDtLHH0N K/XJwdcoGQzA5LEynGE7 =js/J -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20160701' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - two patches with minimal clean up work by Antonio Quartulli and Simon Wunderlich - eight patches of B.A.T.M.A.N. V, API and documentation clean up work, by Antonio Quartulli and Marek Lindner - Andrew Lunn fixed the skb priority adoption when forwarding fragmented packets (two patches) - Multicast optimization support is now enabled for bridges which comes with some protocol updates, by Linus Luessing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 17:05:00 -04:00
Richard Alpe	55e77a3e82	tipc: fix nl compat regression for link statistics Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes of the link name where left out. Making the output of tipc-config -ls look something like: Link statistics: dcast-link 1:data0-1.1.2:data0 1:data0-1.1.3:data0 Also, for the record, the patch that introduce this regression claims "Sending the whole object out can cause a leak". Which isn't very likely as this is a compat layer, where the data we are parsing is generated by us and we know the string to be NULL terminated. But you can of course never be to secure. Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 16:47:38 -04:00
Sowmini Varadhan	11bb62f7c0	RDS: Do not send a pong to an incoming ping with 0 src port RDS ping messages are sent with a non-zero src port to a zero dst port, so that the rds pong messages can be sent back to the originators src port. However if a confused/malicious sender sends a ping with a 0 src port, we'd have an infinite ping-pong loop. To avoid this, the receiver should ignore ping messages with a 0 src port. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 16:45:18 -04:00
Sowmini Varadhan	8315011ad6	RDS: TCP: Simplify reconnect to avoid duelling reconnnect attempts When reconnecting, the peer with the smaller IP address will initiate the reconnect, to avoid needless duelling SYN issues. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 16:45:17 -04:00

1 2 3 4 5 ...

42581 Commits