linux

iv/linux

Author	SHA1	Message	Date
Eric Dumazet	89527be8d8	net: add IFLA_TSO_{MAX_SIZE\|SEGS} attributes New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS are used to report to user-space the device TSO limits. ip -d link sh dev eth1 ... tso_max_size 65536 tso_max_segs 65535 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:55 +01:00
David S. Miller	1a01a07517	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is v2 including deadlock fix in conntrack ecache rework reported by Jakub Kicinski. The following patchset contains Netfilter updates for net-next, mostly updates to conntrack from Florian Westphal. 1) Add a dedicated list for conntrack event redelivery. 2) Include event redelivery list in conntrack dumps of dying type. 3) Remove per-cpu dying list for event redelivery, not used anymore. 4) Add netns .pre_exit to cttimeout to zap timeout objects before synchronize_rcu() call. 5) Remove nf_ct_unconfirmed_destroy. 6) Add generation id for conntrack extensions for conntrack timeout and helpers. 7) Detach timeout policy from conntrack on cttimeout module removal. 8) Remove __nf_ct_unconfirmed_destroy. 9) Remove unconfirmed list. 10) Remove unconditional local_bh_disable in init_conntrack(). 11) Consolidate conntrack iterator nf_ct_iterate_cleanup(). 12) Detect if ctnetlink listeners exist to short-circuit event path early. 13) Un-inline nf_ct_ecache_ext_add(). 14) Add nf_conntrack_events autodetect ctnetlink listener mode and make it default. 15) Add nf_ct_ecache_exist() to check for event cache extension. 16) Extend flowtable reverse route lookup to include source, iif, tos and mark, from Sven Auhagen. 17) Do not verify zero checksum UDP packets in nf_reject, from Kevin Mitchell. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:10:37 +01:00
Jonas Jelonek	569cf386ec	mac80211: minstrel_ht: support ieee80211_rate_status This patch adds support for the new struct ieee80211_rate_status and its annotation in struct ieee80211_tx_status in minstrel_ht. In minstrel_ht_tx_status, a check for the presence of instances of the new struct in ieee80211_tx_status is added. Based on this, minstrel_ht then gets and updates internal rate stats with either struct ieee80211_rate_status or ieee80211_tx_info->status.rates. Adjusted variants of minstrel_ht_txstat_valid, minstrel_ht_get_stats, minstrel_{ht/vht}_get_group_idx are added which use struct ieee80211_rate_status and struct rate_info instead of the legacy structs. struct rate_info from cfg80211.h does not provide whether short preamble was used for the transmission. So we retrieve this information from VIF and STA configuration and cache it in a new flag in struct minstrel_ht_sta per rate control instance. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Link: https://lore.kernel.org/r/20220509173958.1398201-3-jelonek.jonas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:07:58 +02:00
Jonas Jelonek	44fa75f207	mac80211: extend current rate control tx status API This patch adds the new struct ieee80211_rate_status and replaces 'struct rate_info rate' in ieee80211_tx_status with pointer and length annotation. The struct ieee80211_rate_status allows to: (1) receive tx power status feedback for transmit power control (TPC) per packet or packet retry (2) dynamic mapping of wifi chip specific multi-rate retry (mrr) chains with different lengths (3) increase the limit of annotatable rate indices to support IEEE802.11ac rate sets and beyond ieee80211_tx_info, control and status buffer, and ieee80211_tx_rate cannot be used to achieve these goals due to fixed size limitations. Our new struct contains a struct rate_info to annotate the rate that was used, retry count of the rate and tx power. It is intended for all information related to RC and TPC that needs to be passed from driver to mac80211 and its RC/TPC algorithms like Minstrel_HT. It corresponds to one stage in an mrr. Multiple subsequent instances of this struct can be included in struct ieee80211_tx_status via a pointer and a length variable. Those instances can be allocated on-stack. The former reference to a single instance of struct rate_info is replaced with our new annotation. An extension is introduced to struct ieee80211_hw. There are two new members called 'tx_power_levels' and 'max_txpwr_levels_idx' acting as a tx power level table. When a wifi device is registered, the driver shall supply all supported power levels in this list. This allows to support several quirks like differing power steps in power level ranges or alike. TPC can use this for algorithm and thus be designed more abstract instead of handling all possible step widths individually. Further mandatory changes in status.c, mt76 and ath11k drivers due to the removal of 'struct rate_info rate' are also included. status.c already uses the information in ieee80211_tx_status->rate in radiotap, this is now changed to use ieee80211_rate_status->rate_idx. mt76 driver already uses struct rate_info to pass the tx rate to status path. The new members of the ieee80211_tx_status are set to NULL and 0 because the previously passed rate is not relevant to rate control and accurate information is passed via tx_info->status.rates. For ath11k, the txrate can be passed via this struct because ath11k uses firmware RC and thus the information does not interfere with software RC. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Link: https://lore.kernel.org/r/20220509173958.1398201-2-jelonek.jonas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:05:02 +02:00
Peter Seiderer	ee0e16ab75	mac80211: minstrel_ht: fill all requested rates Fill all requested rates (in case of ath9k 4 rate slots are available, so fill all 4 instead of only 3), improves throughput in noisy environment. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Link: https://lore.kernel.org/r/20220402153014.31332-2-ps.report@gmx.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:03:39 +02:00
Lavanya Suresh	195b9a0fd5	mac80211: disable BSS color collision detection in case of no free colors AP may run out of BSS color after color collision detection event from driver. Disable BSS color collision detection if no free colors are available based on bss color disabled bit sent as a part of NL80211_ATTR_HE_BSS_COLOR attribute sent in NL80211_CMD_SET_BEACON. It can be reenabled once new color is available. Signed-off-by: Lavanya Suresh <lavaks@codeaurora.org> Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1649867295-7204-3-git-send-email-quic_ramess@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:46:30 +02:00
Rameshkumar Sundaram	3d48cb7481	nl80211: Parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon NL80211_ATTR_HE_BSS_COLOR attribute can be included in both NL80211_CMD_START_AP and NL80211_CMD_SET_BEACON commands. Move he_bss_color from cfg80211_ap_settings to cfg80211_beacon_data and parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon() to have bss color settings parsed for both start ap and set beacon commands. Add a new flag he_bss_color_valid to indicate whether NL80211_ATTR_HE_BSS_COLOR attribute is included. Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1649867295-7204-2-git-send-email-quic_ramess@quicinc.com [fix build ...] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:45:21 +02:00
Eyal Birger	e6175a2ed1	xfrm: fix "disable_policy" flag use when arriving from different devices In IPv4 setting the "disable_policy" flag on a device means no policy should be enforced for traffic originating from the device. This was implemented by seting the DST_NOPOLICY flag in the dst based on the originating device. However, dsts are cached in nexthops regardless of the originating devices, in which case, the DST_NOPOLICY flag value may be incorrect. Consider the following setup: +------------------------------+ \| ROUTER \| +-------------+ \| +-----------------+ \| \| ipsec src \|----\|-\|ipsec0 \| \| +-------------+ \| \|disable_policy=0 \| +----+ \| \| +-----------------+ \|eth1\|-\|----- +-------------+ \| +-----------------+ +----+ \| \| noipsec src \|----\|-\|eth0 \| \| +-------------+ \| \|disable_policy=1 \| \| \| +-----------------+ \| +------------------------------+ Where ROUTER has a default route towards eth1. dst entries for traffic arriving from eth0 would have DST_NOPOLICY and would be cached and therefore can be reused by traffic originating from ipsec0, skipping policy check. Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead of the DST in IN/FWD IPv4 policy checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-05-16 09:31:26 +02:00
Johannes Berg	5dfad10812	mac80211: mlme: track assoc_bss/associated separately We currently track whether we're associated and which the BSS is in the same variable (ifmgd->associated), but for MLD we'll need to move the BSS pointer to be per link, while the question whether we're associated or not is for the whole interface. Add ifmgd->assoc_bss that stores the pointer and change ifmgd->associated to be just a bool, so the question of whether we're associated can continue working after MLD rework, without requiring changes, while the BSS pointer will have to be changed/used checked per link. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:16:20 +02:00
Johannes Berg	16d0364c72	mac80211: remove useless bssid copy We don't need to copy this locally, we now only use the variable to print before doing other things. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:15:19 +02:00
Johannes Berg	53da4c45ca	mac80211: remove unused argument to ieee80211_sta_connection_lost() We never use the bssid argument to ieee80211_sta_connection_lost() so we might as well just remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:15:04 +02:00
Johannes Berg	926101d2b7	mac80211: mlme: use local SSID copy There's no need to look it up from the ifmgd->associated BSS configuration, we already maintain a local copy since commit b0140fda626e ("mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc"). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:14:51 +02:00
Johannes Berg	c8fe4b0b37	mac80211: use ifmgd->bssid instead of ifmgd->associated->bssid Since we always track the BSSID there when we get associated, these are equivalent, but ifmgd->bssid saves a dereference and thus makes the code a bit smaller, and more readable. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:13:22 +02:00
Johannes Berg	f344c58c25	mac80211: mlme: move in RSSI reporting code This code is tightly coupled to the sdata->u.mgd data structure, so there's no reason for it to be in utils. Move it to mlme.c. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:12:34 +02:00
Johannes Berg	97f7a47024	mac80211: unify CCMP/GCMP AAD construction Ping-Ke's previous patch adjusted the CCMP AAD construction to properly take the order bit into account, but failed to update the (identical) GCMP AAD construction as well. Unify the AAD construction between the two cases. Reported-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20220506105150.51d66e2a6f3c.I65f12be82c112365169e8a9f48c7a71300e814b9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:10:38 +02:00
Paolo Abeni	95d6865178	mptcp: fix subflow accounting on close If the PM closes a fully established MPJ subflow or the subflow creation errors out in it's early stage the subflows counter is not bumped accordingly. This change adds the missing accounting, additionally taking care of updating accordingly the 'accept_subflow' flag. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-13 17:04:30 -07:00
Linus Torvalds	6dd5884d1d	NFS client bugfixes for Linux 5.18 Highlights include: Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJ+j5kACgkQZwvnipYK APIQag/+JXtDt9xo0SsGTMJ2PJArnRoyd3QcZjJtoabaZTylZyILZ20sxvt/jgby miOKrI+bbsykbrwQijRLF/Yys1G+iMBzSWy4lpJ9eH4AVkfjr5qqauRbPobo/ZiI i5fDLR84FlzKYWBY1Nv6YE5VREukIlXbrq7KWe/HoS7/hSAamhkUv+a0M8iNLGO4 QtH7+M0iBZTI9yM4gFAcMAANV21SxvjqP1z62kbCp00qO2mL0PgF/2pxC9WgX24/ EZX037ykzKFkjgWfzT8+/eIfCQGIPi/9e6Ir4Bc99psVFOYd1YxkTLBycNwm1VOz 5RLORbURDVMPQH2/qZ57u7B0gJF76UZM4pH3gv+i9nFhoUqf3kFZAOy48MZEFz3X sPiQZLck63mvO9Bd3QX6pFZc0datiYmhuXRknjxV6Bz/Y41y3NzeJeX4r88s4q7l tisZDmaIm0Q+H07QOTL0aCk456amP6XLnO1+nu/PSR/3ImwJSaOpypHx/BxDJUTG TvpY1ouBZ8irfT6JrbfVnSdbedIZx4c+6btVw6mlT40edZF6M3r+3s8s2TU7x+co uiBMB4Qj/C19zqcf/DziiL1PZEJPm3lk0fBHc6JIWCV4I3eYxQ5J4nFJsy8/jPm1 lTgreoHWfnYC3WhlxUk+N3+9X/9+iFEJUxqGYfANf8GFaGqcYVQ= =/Yg+ -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "One more pull request. There was a bug in the fix to ensure that gss- proxy continues to work correctly after we fixed the AF_LOCAL socket leak in the RPC code. This therefore reverts that broken patch, and replaces it with one that works correctly. Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option" * tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix broken handling of the softreval mount option SUNRPC: Ensure that the gssproxy client can start in a connected state Revert "SUNRPC: Ensure gss-proxy connects on setup"	2022-05-13 11:04:37 -07:00
Jakub Kicinski	2c5f153647	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-05-13 1) Cleanups for the code behind the XFRM offload API. This is a preparation for the extension of the API for policy offload. From Leon Romanovsky. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: drop not needed flags variable in XFRM offload struct net/mlx5e: Use XFRM state direction instead of flags netdevsim: rely on XFRM state direction instead of flags ixgbe: propagate XFRM offload state direction instead of flags xfrm: store and rely on direction to construct offload flags xfrm: rename xfrm_state_offload struct to allow reuse xfrm: delete not used number of external headers xfrm: free not used XFRM_ESP_NO_TRAILER flag ==================== Link: https://lore.kernel.org/r/20220513151218.4010119-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-13 10:25:08 -07:00
Kevin Mitchell	4f9bd53084	netfilter: conntrack: skip verification of zero UDP checksum The checksum is optional for UDP packets. However nf_reject would previously require a valid checksum to elicit a response such as ICMP_DEST_UNREACH. Add some logic to nf_reject_verify_csum to determine if a UDP packet has a zero checksum and should therefore not be verified. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Sven Auhagen	3412e16418	netfilter: flowtable: nft_flow_route use more data for reverse route When creating a flow table entry, the reverse route is looked up based on the current packet. There can be scenarios where the user creates a custom ip rule to route the traffic differently. In order to support those scenarios, the lookup needs to add more information based on the current packet. The patch adds multiple new information to the route lookup. Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Florian Westphal	90d1daa458	netfilter: conntrack: add nf_conntrack_events autodetect mode This adds the new nf_conntrack_events=2 mode and makes it the default. This leverages the earlier flag in struct net to allow to avoid the event extension as long as no event listener is active in the namespace. This avoids, for most cases, allocation of ct->ext area. A followup patch will take further advantage of this by avoiding calls down into the event framework if the extension isn't present. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Florian Westphal	b0a7ab4a77	netfilter: conntrack: un-inline nf_ct_ecache_ext_add Only called when new ct is allocated or the extension isn't present. This function will be extended, place this in the conntrack module instead of inlining. The callers already depend on nf_conntrack module. Return value is changed to bool, noone used the returned pointer. Make sure that the core drops the newly allocated conntrack if the extension is requested but can't be added. This makes it necessary to ifdef the section, as the stub always returns false we'd drop every new conntrack if the the ecache extension is disabled in kconfig. Add from data path (xt_CT, nft_ct) is unchanged. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Florian Westphal	2794cdb0b9	netfilter: nfnetlink: allow to detect if ctnetlink listeners exist At this time, every new conntrack gets the 'event cache extension' enabled for it. This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults to 1. Changing the default to 0 means that commands that rely on the event notification extension, e.g. 'conntrack -E' or conntrackd, stop working. We COULD detect if there is a listener by means of 'nfnetlink_has_listeners()' and only add the extension if this is true. The downside is a dependency from conntrack module to nfnetlink module. This adds a different way: inc/dec a counter whenever a ctnetlink group is being (un)subscribed and toggle a flag in struct net. Next patches will take advantage of this and will only add the event extension if the flag is set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Pablo Neira Ayuso	8169ff5840	netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup() This patch adds a structure to collect all the context data that is passed to the cleanup iterator. struct nf_ct_iter_data { struct net net; void *data; u32 portid; int report; }; There is a netns field that allows to clean up conntrack entries specifically owned by the specified netns. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:27 +02:00
Florian Westphal	0bcfbafbcd	netfilter: conntrack: avoid unconditional local_bh_disable Now that the conntrack entry isn't placed on the pcpu list anymore the bh only needs to be disabled in the 'expectation present' case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:27 +02:00
Florian Westphal	8a75a2c174	netfilter: conntrack: remove unconfirmed list It has no function anymore and can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:53:27 +02:00
Florian Westphal	ace53fdc26	netfilter: conntrack: remove __nf_ct_unconfirmed_destroy Its not needed anymore: A. If entry is totally new, then the rcu-protected resource must already have been removed from global visibility before call to nf_ct_iterate_destroy. B. If entry was allocated before, but is not yet in the hash table (uncofirmed case), genid gets incremented and synchronize_rcu() call makes sure access has completed. C. Next attempt to peek at extension area will fail for unconfirmed conntracks, because ext->genid != genid. D. Conntracks in the hash are iterated as before. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:17 +02:00
Florian Westphal	42df4fb9b1	netfilter: cttimeout: decouple unlink and free on netns destruction Increment the extid on module removal; this makes sure that even in extreme cases any old uncofirmed entry that happened to be kept e.g. on nfnetlink_queue list will not trip over a stale timeout reference. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	c56716c69c	netfilter: extensions: introduce extension genid count Multiple netfilter extensions store pointers to external data in their extension area struct. Examples: 1. Timeout policies 2. Connection tracking helpers. No references are taken for these. When a helper or timeout policy is removed, the conntrack table gets traversed and affected extensions are cleared. Conntrack entries not yet in the hashtable are referenced via a special list, the unconfirmed list. On removal of a policy or connection tracking helper, the unconfirmed list gets traversed an all entries are marked as dying, this prevents them from getting committed to the table at insertion time: core checks for dying bit, if set, the conntrack entry gets destroyed at confirm time. The disadvantage is that each new conntrack has to be added to the percpu unconfirmed list, and each insertion needs to remove it from this list. The list is only ever needed when a policy or helper is removed -- a rare occurrence. Add a generation ID count: Instead of adding to the list and then traversing that list on policy/helper removal, increment a counter that is stored in the extension area. For unconfirmed conntracks, the extension has the genid valid at ct allocation time. Removal of a helper/policy etc. increments the counter. At confirmation time, validate that ext->genid == global_id. If the stored number is not the same, do not allow the conntrack insertion, just like as if a confirmed-list traversal would have flagged the entry as dying. After insertion, the genid is no longer relevant (conntrack entries are now reachable via the conntrack table iterators and is set to 0. This allows removal of the percpu unconfirmed list. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	17438b42ce	netfilter: remove nf_ct_unconfirmed_destroy helper This helper tags connections not yet in the conntrack table as dying. These nf_conn entries will be dropped instead when the core attempts to insert them from the input or postrouting 'confirm' hook. After the previous change, the entries get unlinked from the list earlier, so that by the time the actual exit hook runs, new connections no longer have a timeout policy assigned. Its enough to walk the hashtable instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	78222bacfc	netfilter: cttimeout: decouple unlink and free on netns destruction Make it so netns pre_exit unlinks the objects from the pernet list, so they cannot be found anymore. netns core issues a synchronize_rcu() before calling the exit hooks so any the time the exit hooks run unconfirmed nf_conn entries have been free'd or they have been committed to the hashtable. The exit hook still tags unconfirmed entries as dying, this can now be removed in a followup change. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	1397af5bfd	netfilter: conntrack: remove the percpu dying list Its no longer needed. Entries that need event redelivery are placed on the new pernet dying list. The advantage is that there is no need to take additional spinlock on conntrack removal unless event redelivery failed or the conntrack entry was never added to the table in the first place (confirmed bit not set). The IPS_CONFIRMED bit now needs to be set as soon as the entry has been unlinked from the unconfirmed list, else the destroy function may attempt to unlink it a second time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	0d3cc504ba	netfilter: conntrack: include ecache dying list in dumps The new pernet dying list includes conntrack entries that await delivery of the 'destroy' event via ctnetlink. The old percpu dying list will be removed soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:52:16 +02:00
Florian Westphal	2ed3bf188b	netfilter: ecache: use dedicated list for event redelivery This disentangles event redelivery and the percpu dying list. Because entries are now stored on a dedicated list, all entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries still have confirmed bit set -- the reference count is at least 1. The 'struct net' back-pointer can be removed as well. The pcpu dying list will be removed eventually, it has no functionality. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:51:28 +02:00
Eric Dumazet	04c494e68a	Revert "tcp/dccp: get rid of inet_twsk_purge()" This reverts commits: 0dad4087a86a2cbe177404dc73f18ada26a2c390 ("tcp/dccp: get rid of inet_twsk_purge()") d507204d3c5cc57d9a8bdf0a477615bb59ea1611 ("tcp/dccp: add tw->tw_bslot") As Leonard pointed out, a newly allocated netns can happen to reuse a freed 'struct net'. While TCP TW timers were covered by my patches, other things were not: 1) Lookups in rx path (INET_MATCH() and INET6_MATCH()), as they look at 4-tuple plus the 'struct net' pointer. 2) /proc/net/tcp[6] and inet_diag, same reason. 3) hashinfo->bhash[], same reason. Fixing all this seems risky, lets instead revert. In the future, we might have a per netns tcp hash table, or a per netns list of timewait sockets... Fixes: 0dad4087a86a ("tcp/dccp: get rid of inet_twsk_purge()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Leonard Crestez <cdleonard@gmail.com> Tested-by: Leonard Crestez <cdleonard@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-13 12:24:12 +01:00
Luiz Augusto von Dentz	3b42055388	Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan When suspending the passive scanning _must_ have its filter_policy set to 0x01 to use the accept list otherwise _any_ advertise report would end up waking up the system. In order to fix the filter_policy the code now checks for hdev->suspended && HCI_CONN_FLAG_REMOTE_WAKEUP first, since the MGMT_OP_SET_DEVICE_FLAGS will reject any attempt to set HCI_CONN_FLAG_REMOTE_WAKEUP when it cannot be programmed in the acceptlist, so it can return success causing the proper filter_policy to be used. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215768 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:22:29 +02:00
Luiz Augusto von Dentz	a9a347655d	Bluetooth: MGMT: Add conditions for setting HCI_CONN_FLAG_REMOTE_WAKEUP HCI_CONN_FLAG_REMOTE_WAKEUP can only be set if device can be programmed in the allowlist which in case of device using RPA requires LL Privacy support to be enabled. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215768 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:22:29 +02:00
Eric Dumazet	4915d50e30	inet: add READ_ONCE(sk->sk_bound_dev_if) in INET_MATCH() INET_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET_MATCH() an inline function to ease our changes. v2: We remove the 32bit version of it, as modern compilers should generate the same code really, no need to try to be smarter. Also make 'struct net *net' the first argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-13 12:17:25 +01:00
Vasyl Vavrychuk	ff7f292611	Bluetooth: core: Fix missing power_on work cancel on HCI close Move power_on work cancel to hci_dev_close_sync to ensure that power_on work is canceled after HCI interface down, power off, rfkill, etc. For example, if hciconfig hci0 down is done early enough during boot, it may run before power_on work. Then, power_on work will actually bring up interface despite above hciconfig command. Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@opensynergy.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:49 +02:00
Niels Dossche	5e2b6064cb	Bluetooth: protect le accept and resolv lists with hdev->lock Concurrent operations from events on le_{accept,resolv}_list are currently unprotected by hdev->lock. Most existing code do already protect the lists with that lock. This can be observed in hci_debugfs and hci_sync. Add the protection for these events too. Fixes: b950aa88638c ("Bluetooth: Add definitions and track LE resolve list modification") Fixes: 0f36b589e4ee ("Bluetooth: Track LE white list modification via HCI commands") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:49 +02:00
Niels Dossche	fb048cae51	Bluetooth: use hdev lock for accept_list and reject_list in conn req All accesses (both reads and modifications) to hdev->{accept,reject}_list are protected by hdev lock, except the ones in hci_conn_request_evt. This can cause a race condition in the form of a list corruption. The solution is to protect these lists in hci_conn_request_evt as well. I was unable to find the exact commit that introduced the issue for the reject list, I was only able to find it for the accept list. Fixes: a55bd29d5227 ("Bluetooth: Add white list lookup for incoming connection requests") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:49 +02:00
Niels Dossche	50a3633ae5	Bluetooth: use hdev lock in activate_scan for hci_is_adv_monitoring hci_is_adv_monitoring's function documentation states that it must be called under the hdev lock. Paths that leads to an unlocked call are: discov_update => start_discovery => interleaved_discov => active_scan and: discov_update => start_discovery => active_scan The solution is to take the lock in active_scan during the duration of the call to hci_is_adv_monitoring. Fixes: c32d624640fd ("Bluetooth: disable filter dup when scan for adv monitor") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:49 +02:00
Luiz Augusto von Dentz	6b5c1cdac4	Bluetooth: Print broken quirks This prints warnings for controllers setting broken quirks to increase their visibility and warn about broken controllers firmware that probably needs updates to behave properly. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:48 +02:00
Luiz Augusto von Dentz	05abad8572	Bluetooth: HCI: Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk This adds HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk which can be used to mark HCI_Enhanced_Setup_Synchronous_Connection as broken even if its support command bit are set since some controller report it as supported but the command don't work properly with some configurations (e.g. BT_VOICE_TRANSPARENT/mSBC). Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:48 +02:00
Brian Gix	31396dd53f	Bluetooth: Keep MGMT pending queue ordered FIFO Small change to add new commands to tail of the list, and find/remove them from the head of the list. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:48 +02:00
Ying Hsu	7aa1e7d15f	Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout Connecting the same socket twice consecutively in sco_sock_connect() could lead to a race condition where two sco_conn objects are created but only one is associated with the socket. If the socket is closed before the SCO connection is established, the timer associated with the dangling sco_conn object won't be canceled. As the sock object is being freed, the use-after-free problem happens when the timer callback function sco_sock_timeout() accesses the socket. Here's the call trace: dump_stack+0x107/0x163 ? refcount_inc+0x1c/ print_address_description.constprop.0+0x1c/0x47e ? refcount_inc+0x1c/0x7b kasan_report+0x13a/0x173 ? refcount_inc+0x1c/0x7b check_memory_region+0x132/0x139 refcount_inc+0x1c/0x7b sco_sock_timeout+0xb2/0x1ba process_one_work+0x739/0xbd1 ? cancel_delayed_work+0x13f/0x13f ? __raw_spin_lock_init+0xf0/0xf0 ? to_kthread+0x59/0x85 worker_thread+0x593/0x70e kthread+0x346/0x35a ? drain_workqueue+0x31a/0x31a ? kthread_bind+0x4b/0x4b ret_from_fork+0x1f/0x30 Link: https://syzkaller.appspot.com/bug?extid=2bef95d3ab4daa10155b Reported-by: syzbot+2bef95d3ab4daa10155b@syzkaller.appspotmail.com Fixes: e1dee2c1de2b ("Bluetooth: fix repeated calls to sco_sock_kill") Signed-off-by: Ying Hsu <yinghsu@chromium.org> Reviewed-by: Joseph Hwang <josephsih@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-05-13 13:05:48 +02:00
Jie Wang	0f6deac3a0	net: page_pool: add page allocation stats for two fast page allocate path Currently If use page pool allocation stats to analysis a RX performance degradation problem. These stats only count for pages allocate from page_pool_alloc_pages. But nic drivers such as hns3 use page_pool_dev_alloc_frag to allocate pages, so page stats in this API should also be counted. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-13 11:28:55 +01:00
Martin KaFai Lau	cae3873c5b	net: inet: Retire port only listening_hash The listen sk is currently stored in two hash tables, listening_hash (hashed by port) and lhash2 (hashed by port and address). After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the TCP-SYN lookup fast path does not use listening_hash. The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2") also moved the seq_file (/proc/net/tcp) iteration usage from listening_hash to lhash2. There are still a few listening_hash usages left. One of them is inet_reuseport_add_sock() which uses the listening_hash to search a listen sk during the listen() system call. This turns out to be very slow on use cases that listen on many different VIPs at a popular port (e.g. 443). [ On top of the slowness in adding to the tail in the IPv6 case ]. The latter patch has a selftest to demonstrate this case. This patch takes this chance to move all remaining listening_hash usages to lhash2 and then retire listening_hash. Since most changes need to be done together, it is hard to cut the listening_hash to lhash2 switch into small patches. The changes in this patch is highlighted here for the review purpose. 1. Because of the listening_hash removal, lhash2 can use the sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node. This will also keep the sk_unhashed() check to work as is after stop adding sk to listening_hash. The union is removed from inet_listen_hashbucket because only nulls_head is needed. 2. icsk->icsk_listen_portaddr_node and its helpers are removed. 3. The current lhash2 users needs to iterate with sk_nulls_node instead of icsk_listen_portaddr_node. One case is in the inet[6]_lhash2_lookup(). Another case is the seq_file iterator in tcp_ipv4.c. One thing to note is sk_nulls_next() is needed because the old inet_lhash2_for_each_icsk_continue() does a "next" first before iterating. 4. Move the remaining listening_hash usage to lhash2 inet_reuseport_add_sock() which this series is trying to improve. inet_diag.c and mptcp_diag.c are the final two remaining use cases and is moved to lhash2 now also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:52:18 -07:00
Martin KaFai Lau	e8d0059000	net: inet: Open code inet_hash2 and inet_unhash2 This patch folds lhash2 related functions into __inet_hash and inet_unhash. This will make the removal of the listening_hash in a latter patch easier to review. First, this patch folds inet_hash2 into __inet_hash. For unhash, the current call sequence is like inet_unhash() => __inet_unhash() => inet_unhash2(). The specific testing cases in __inet_unhash() are mostly related to TCP_LISTEN sk and its caller inet_unhash() already has the TCP_LISTEN test, so this patch folds both __inet_unhash() and inet_unhash2() into inet_unhash(). Note that all listening_hash users also have lhash2 initialized, so the !h->lhash2 check is no longer needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:52:17 -07:00
Martin KaFai Lau	8ea1eebb49	net: inet: Remove count from inet_listen_hashbucket After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the count is no longer used. This patch removes it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:52:17 -07:00

... 2 3 4 5 6 ...

69390 Commits