IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
* remove a double mutex-unlock
* fix a leak in an error path
* NULL pointer check
* include if_vlan.h where needed
* avoid RCU list traversal when not under RCU
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5UGA8ACgkQB8qZga/f
l8Qk4g/8DstHoC7R74x4jrOyGffw7ss7XC8YJi2x3IENg7mAYdEFvnUC7GQmrCso
aMypkOWac0H4cAw2BId8CFevbWZbanJ9uvC4JicQpaNa9bZkqBaDMFwO3k66S1hA
JBvmOqPy8FNuFRsXTicMPw7/I6QazR+sW7Ed68L7ygikkklvTeovH9qkL+zHHMcK
dAVS6ZKB2jSkEOtnlG01QMRwdIijTkD6KjGDECVMzau9XC+fGowQeuy+xuAU0THq
Myef9CpzU2Q47Sa61fEyXVC54Izp9B1ZuohM2wfrN2zC3SKxsmPM1lm4GhrJOfUF
wJTQY75z1kAmCAJDKpwBrFkyIEfUPQV1v9HHit0LX4y11NYNWL22liLhPv+kZDmD
9ht1IxAP1Nw/DSVKaXUmwk1tFkETfk2BvCkNjVrwKlt88QgvQAL/uzgKoz5E6VIM
hU1hp0mJCzsmyt/COWOf0l/QpO2nN3/64YHZgFiqOGnhXr5r0L74Jf3lcdWyaXRU
hcJ0M/EcfGLWOlHFcRkNdWCkBFk5Daf4/zzQ3vO9weAlb7wOLDo9mHoXMBgP6R9l
kCnxjxgIqmpQLvMo/sBt2cLaf3bJU40p/BYU7hqI7CerODFc3JKkCs1ODZPbYDaR
Fk60mlElkBD3P4EU8/6nbXHOfG9FvEiZjhT7kRcUdPx0dLA/KMk=
=7JH4
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-net-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg
====================
A few fixes:
* remove a double mutex-unlock
* fix a leak in an error path
* NULL pointer check
* include if_vlan.h where needed
* avoid RCU list traversal when not under RCU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* lots of small documentation fixes, from Jérôme Pouiller
* beacon protection (BIGTK) support from Jouni Malinen
* some initial code for TID configuration, from Tamizh chelvam
* I reverted some new API before it's actually used, because
it's wrong to mix controlled port and preauth
* a few other cleanups/fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl5UFlsACgkQB8qZga/f
l8SwNQ//Y/dELGODumDxY03l/Bj/+XL7HYl3Yn+J8mt2mYPC3zjTjvcRJBApiAog
1Oyd75fd+EqjxDUT+ngN8uJLQ/yCVsdwfpZhwV7VanAOuLI9aYkIXnai/Qs96rj3
yDIlrBVsmfsaxf/2e9UmsLmeUSm7C5s1EzaAIwoRGvcUH0pbH9WYAXF1QV+8fmXa
yoXuHV5Bv+wOW2xWqWJsFpoV109AW24pwJm0vlILcpFP/jno2GsRvwEpnC/GJhEA
4+jfEj0KlFkOewp0/HcqrUJp4yDEBhnhTTYgDL3hSWgKRVorKqY4/QmpKQCmpVQk
Qrb6k+TrnLmKBQKdqfd+PKAEC9U/9Wjg0KLPyc9btBGFNSUG3QoDigzxxIvSlW6w
2vyanDW6780FTIi8sA7sq1cBLosIyoFG44YYwbMidVtxhBk1LMRvetNAOnAJeycp
Abbp/A2EdvzM+ZMNMRwWlsgig6WkGY7jy/zpcmQUdALM+yT2o7D5ZwxE5pa2ggds
jf0eER1vVCEpOL70swNSZbAnDNCzBzTN64GX9gQIjdVT+nMUYQXwuOgEmho1FshD
bgZz4PcaOCCSTice9GYCC3C9OqXBsE2DyBwytYYzahyDiQH13Iz6wEq/+tIIjzCN
KKRdD12TXaPobLwv5zI3kogJ1I4P1fa6RZYpkngLpMbQrQ7qiDI=
=x0rf
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-net-next-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A new set of changes:
* lots of small documentation fixes, from Jérôme Pouiller
* beacon protection (BIGTK) support from Jouni Malinen
* some initial code for TID configuration, from Tamizh chelvam
* I reverted some new API before it's actually used, because
it's wrong to mix controlled port and preauth
* a few other cleanups/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a pr_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Bareudp tunnel module provides a generic L3 encapsulation
tunnelling module for tunnelling different protocols like MPLS,
IP,NSH etc inside a UDP tunnel.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning unix_wait_for_peer()
warning: context imbalance in unix_wait_for_peer() - unexpected unlock
The root cause is the missing annotation at unix_wait_for_peer()
Add the missing annotation __releases(&unix_sk(other)->lock)
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at dccp_child_process()
warning: context imbalance in dccp_child_process() - unexpected unlock
The root cause is the missing annotation at dccp_child_process()
Add the missing __releases(child) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_neigh_stop()
warning: context imbalance in nr_neigh_stop() - unexpected unlock
The root cause is the missing annotation at nr_neigh_stop()
Add the missing __releases(&nr_neigh_list_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_neigh_start()
warning: context imbalance in nr_neigh_start() - wrong count at exit
The root cause is the missing annotation at nr_neigh_start()
Add the missing __acquires(&nr_neigh_list_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_node_stop()
warning: context imbalance in nr_node_stop() - wrong count at exit
The root cause is the missing annotation at nr_node_stop()
Add the missing __releases(&nr_node_list_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_node_start()
warning: context imbalance in nr_node_start() - wrong count at exit
The root cause is the missing annotation at nr_node_start()
Add the missing __acquires(&nr_node_list_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_info_stop()
warning: context imbalance in nr_info_stop() - unexpected unlock
The root cause is the missing annotation at nr_info_stop()
Add the missing __releases(&nr_list_lock)
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at nr_info_start()
warning: context imbalance in nr_info_start() - wrong count at exit
The root cause is the missing annotation at nr_info_start()
Add the missing __acquires(&nr_list_lock)
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at llc_seq_start()
warning: context imbalance in llc_seq_start() - wrong count at exit
The root cause is the msiing annotation at llc_seq_start()
Add the missing __acquires(RCU) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at sctp_transport_walk_stop()
warning: context imbalance in sctp_transport_walk_stop
- wrong count at exit
The root cause is the missing annotation at sctp_transport_walk_stop()
Add the missing __releases(RCU) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at sctp_transport_walk_start()
warning: context imbalance in sctp_transport_walk_start
- wrong count at exit
The root cause is the missing annotation at sctp_transport_walk_start()
Add the missing __acquires(RCU) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse reports a warning at sctp_err_finish()
warning: context imbalance in sctp_err_finish() - unexpected unlock
The root cause is a missing annotation at sctp_err_finish()
Add the missing __releases(&((__sk)->sk_lock.slock)) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6mr_for_each_table() macro uses list_for_each_entry_rcu()
for traversing outside an RCU read side critical section
but under the protection of rtnl_mutex. Hence add the
corresponding lockdep expression to silence the following
false-positive warnings:
[ 4.319479] =============================
[ 4.319480] WARNING: suspicious RCU usage
[ 4.319482] 5.5.4-stable #17 Tainted: G E
[ 4.319483] -----------------------------
[ 4.319485] net/ipv6/ip6mr.c:1243 RCU-list traversed in non-reader section!!
[ 4.456831] =============================
[ 4.456832] WARNING: suspicious RCU usage
[ 4.456834] 5.5.4-stable #17 Tainted: G E
[ 4.456835] -----------------------------
[ 4.456837] net/ipv6/ip6mr.c:1582 RCU-list traversed in non-reader section!!
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
md5sig->head maybe traversed using hlist_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of socket lock.
Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_ulp_list is traversed using list_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of tcp_ulp_list_lock.
Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.t
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add packet traps that can report packets that were dropped during ACL
processing.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
if the packet has the vlan header inside (e.g. bridge with disabled
tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
to extract the vid before filtering which in turn calls pskb_may_pull()
and we may end up with a stale eth pointer. Moreover the cached eth header
pointer will generally be wrong after that operation. Remove the eth header
caching and just use eth_hdr() directly, the compiler does the right thing
and calculates it only once so we don't lose anything.
Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
generic_xdp_tx and xdp_do_generic_redirect are only used by builtin
code, so remove the EXPORT_SYMBOL_GPL for them.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement drv_set_tid_config api to allow TID specific
configuration and drv_reset_tid_config api to reset peer
specific TID configuration. This per-TID onfiguration
will be applied for all the connected stations when MAC is NULL.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-7-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID RTSCTS control
configuration to enable/disable through the
NL80211_TID_CONFIG_ATTR_RTSCTS_CTRL attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-5-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID AMPDU control
configuration to enable/disable aggregation through the
NL80211_TID_CONFIG_ATTR_AMPDU_CTRL attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-4-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds support to configure per TID retry configuration
through the NL80211_TID_CONFIG_ATTR_RETRY_SHORT and
NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes. This TID specific
retry configuration will have more precedence than phy level
configuration.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-3-git-send-email-tamizhr@codeaurora.org
[rebase completely on top of my previous API changes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make some changes to the TID-config API:
* use u16 in nl80211 (only, and restrict to using 8 bits for now),
to avoid issues in the future if we ever want to use higher TIDs.
* reject empty TIDs mask (via netlink policy)
* change feature advertising to not use extended feature flags but
have own mechanism for this, which simplifies the code
* fix all variable names from 'tid' to 'tids' since it's a mask
* change to cfg80211_ name prefixes, not ieee80211_
* fix some minor docs/spelling things.
Change-Id: Ia234d464b3f914cdeab82f540e018855be580dce
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the new NL80211_CMD_SET_TID_CONFIG command to support
data TID specific configuration. Per TID configuration is
passed in the nested NL80211_ATTR_TID_CONFIG attribute.
This patch adds support to configure per TID noack policy
through the NL80211_TID_CONFIG_ATTR_NOACK attribute.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1579506687-18296-2-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
local->sta_mtx is held in __ieee80211_check_fast_rx_iface().
No need to use list_for_each_entry_rcu() as it also requires
a cond argument to avoid false lockdep warnings when not used in
RCU read-side section (with CONFIG_PROVE_RCU_LIST).
Therefore use list_for_each_entry();
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds support for mac80211 to verify that received Beacon frames
have a valid MME in station mode when a BIGTK is configured.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-6-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds support for mac80211 to add an MME into Beacon frames in AP
mode when a BIGTK is configured.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-5-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When BIP is used to protect Beacon frames, the Timestamp field is masked
to zero. Otherwise, the BIP processing is identical to the way it was
already used with group-addressed Robust Management frames.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-4-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Extend mac80211 key configuration to support the new BIGTK with key
index values 6 and 7. Support for actually protecting Beacon frames
(adding the MME in AP mode and checking it in STA mode) is covered in
separate commits.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-3-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IEEE P802.11-REVmd/D3.0 adds support for protecting Beacon frames using
a new set of keys (BIGTK; key index 6..7) similarly to the way
group-addressed Robust Management frames are protected (IGTK; key index
4..5). Extend cfg80211 and nl80211 to allow the new BIGTK to be
configured. Add an extended feature flag to indicate driver support for
the new key index values to avoid array overflows in driver
implementations and also to indicate to user space when this
functionality is available.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-2-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These were helpful while working with extensions to NL80211_CMD_NEW_KEY,
so add more explicit error reporting for additional cases that can fail
while that command is being processed.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200222132548.20835-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 8c3ed7aa2b9ef666195b789e9b02e28383243fa8.
As Jouni points out, there's really no need for this, since the
RSN pre-authentication frames are normal data frames, not port
control frames (locally).
We can still revert this now since it hasn't actually gone beyond
-next.
Fixes: 8c3ed7aa2b9e ("nl80211: add src and dst addr attributes for control port tx/rx")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200224101910.b746e263287a.I9eb15d6895515179d50964dec3550c9dc784bb93@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 9b125c27998719288e4dcf2faf54511039526692.
As Jouni points out, there's really no need for this, since the
RSN pre-authentication frames are normal data frames, not port
control frames (locally).
Fixes: 9b125c279987 ("mac80211: support NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211_MAC_ADDRS")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200224101910.b87da63a3cd6.Ic94bc51a370c4aa7d19fbca9b96d90ab703257dc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
devlink_dpipe_table_find() should be called under either
rcu_read_lock() or devlink->lock. devlink_dpipe_table_register()
calls devlink_dpipe_table_find() without holding the lock
and acquires it later. Therefore hold the devlink->lock
from the beginning of devlink_dpipe_table_register().
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 2690048c01f3 ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval"), they
are not used now
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a rare corner case the new logic for undo of SYNACK RTO could
result in triggering the warning in tcp_fastretrans_alert() that says:
WARN_ON(tp->retrans_out != 0);
The warning looked like:
WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270
The sequence that tickles this bug is:
- Fast Open server receives TFO SYN with data, sends SYNACK
- (client receives SYNACK and sends ACK, but ACK is lost)
- server app sends some data packets
- (N of the first data packets are lost)
- server receives client ACK that has a TS ECR matching first SYNACK,
and also SACKs suggesting the first N data packets were lost
- server performs TS undo of SYNACK RTO, then immediately
enters recovery
- buggy behavior then performed a *second* undo that caused
the connection to be in CA_Open with retrans_out != 0
Basically, the incoming ACK packet with SACK blocks causes us to first
undo the cwnd reduction from the SYNACK RTO, but then immediately
enters fast recovery, which then makes us eligible for undo again. And
then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
using a "mash-up" of state from two different loss recovery phases: it
uses the timestamp info from the ACK of the original SYNACK, and the
undo_marker from the fast recovery.
This fix refines the logic to only invoke the tcp_try_undo_loss()
inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
CA_Loss. If peer SACKs triggered fast recovery, then
tcp_rcv_synrecv_state_fastopen() can't safely undo.
Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if attribute parsing fails and the genl family
does not support parallel operation, the error code returned
by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse().
Be sure to report the error for all genl families.
Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function")
Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in
ipv4_link_failure()"), __ip_options_compile() must be called under rcu
protection.
Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reuseable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reuseable entries.
Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-02-21
The following pull-request contains BPF updates for your *net-next* tree.
We've added 25 non-merge commits during the last 4 day(s) which contain
a total of 33 files changed, 2433 insertions(+), 161 deletions(-).
The main changes are:
1) Allow for adding TCP listen sockets into sock_map/hash so they can be used
with reuseport BPF programs, from Jakub Sitnicki.
2) Add a new bpf_program__set_attach_target() helper for adding libbpf support
to specify the tracepoint/function dynamically, from Eelco Chaudron.
3) Add bpf_read_branch_records() BPF helper which helps use cases like profile
guided optimizations, from Daniel Xu.
4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu.
5) Relax BTF mandatory check if only used for libbpf itself e.g. to process
BTF defined maps, from Andrii Nakryiko.
6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has
been observed that former fails in envs with low memlock, from Yonghong Song.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose
reuseport_lock") has introduced lazy generation of reuseport group IDs that
survive group resize.
By comparing the identifier we check if BPF reuseport program is not trying
to select a socket from a BPF map that belongs to a different reuseport
group than the one the packet is for.
Because SOCKARRAY used to be the only BPF map type that can be used with
reuseport BPF, it was possible to delay the generation of reuseport group
ID until a socket from the group was inserted into BPF map for the first
time.
Now that SOCK{MAP,HASH} can be used with reuseport BPF we have two options,
either generate the reuseport ID on map update, like SOCKARRAY does, or
allocate an ID from the start when reuseport group gets created.
This patch takes the latter approach to keep sockmap free of calls into
reuseport code. This streamlines the reuseport_id access as its lifetime
now matches the longevity of reuseport object.
The cost of this simplification, however, is that we allocate reuseport IDs
for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their
setups. With the way identifiers are currently generated, we can have at
most S32_MAX reuseport groups, which hopefully is sufficient. If we ever
get close to the limit, we can switch an u64 counter like sk_cookie.
Another change is that we now always call into SOCKARRAY logic to unlink
the socket from the map when unhashing or closing the socket. Previously we
did it only when at least one socket from the group was in a BPF map.
It is worth noting that this doesn't conflict with sockmap tear-down in
case a socket is in a SOCK{MAP,HASH} and belongs to a reuseport
group. sockmap tear-down happens first:
prot->unhash
`- tcp_bpf_unhash
|- tcp_bpf_remove
| `- while (sk_psock_link_pop(psock))
| `- sk_psock_unlink
| `- sock_map_delete_from_link
| `- __sock_map_delete
| `- sock_map_unref
| `- sk_psock_put
| `- sk_psock_drop
| `- rcu_assign_sk_user_data(sk, NULL)
`- inet_unhash
`- reuseport_detach_sock
`- bpf_sk_reuseport_detach
`- WRITE_ONCE(sk->sk_user_data, NULL)
Suggested-by: Martin Lau <kafai@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200218171023.844439-10-jakub@cloudflare.com
SOCKMAP & SOCKHASH now support storing references to listening
sockets. Nothing keeps us from using these map types a collection of
sockets to select from in BPF reuseport programs. Whitelist the map types
with the bpf_sk_select_reuseport helper.
The restriction that the socket has to be a member of a reuseport group
still applies. Sockets in SOCKMAP/SOCKHASH that don't have sk_reuseport_cb
set are not a valid target and we signal it with -EINVAL.
The main benefit from this change is that, in contrast to
REUSEPORT_SOCKARRAY, SOCK{MAP,HASH} don't impose a restriction that a
listening socket can be just one BPF map at the same time.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200218171023.844439-9-jakub@cloudflare.com