linux/net/sched
Yunsheng Lin a90c57f2ce net: sched: fix packet stuck problem for lockless qdisc
Lockless qdisc has below concurrent problem:
    cpu0                 cpu1
     .                     .
q->enqueue                 .
     .                     .
qdisc_run_begin()          .
     .                     .
dequeue_skb()              .
     .                     .
sch_direct_xmit()          .
     .                     .
     .                q->enqueue
     .             qdisc_run_begin()
     .            return and do nothing
     .                     .
qdisc_run_end()            .

cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().

Lockless qdisc has below another concurrent problem when
tx_action is involved:

cpu0(serving tx_action)     cpu1             cpu2
          .                   .                .
          .              q->enqueue            .
          .            qdisc_run_begin()       .
          .              dequeue_skb()         .
          .                   .            q->enqueue
          .                   .                .
          .             sch_direct_xmit()      .
          .                   .         qdisc_run_begin()
          .                   .       return and do nothing
          .                   .                .
 clear __QDISC_STATE_SCHED    .                .
 qdisc_run_begin()            .                .
 return and do nothing        .                .
          .                   .                .
          .            qdisc_run_end()         .

This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
   not set, set STATE_MISSED and retry another spin_trylock() in
   case other CPU may not see STATE_MISSED after it releases the
   lock.
2. reschedule if STATE_MISSED is set after the lock is released
   at the end of qdisc_run_end().

For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.

Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.

Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().

The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:

 threads  without+this_patch   with+this_patch      delta
    1        2.61Mpps            2.60Mpps           -0.3%
    2        3.97Mpps            3.82Mpps           -3.7%
    4        5.62Mpps            5.59Mpps           -0.5%
    8        2.78Mpps            2.77Mpps           -0.3%
   16        2.22Mpps            2.22Mpps           -0.0%

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:05:46 -07:00
..
act_api.c net: sched: fix err handler in tcf_action_init() 2021-04-08 13:47:33 -07:00
act_bpf.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
act_connmark.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_csum.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ct.c net/sched: act_ct: Remove redundant ct get and check 2021-04-28 13:51:18 -07:00
act_ctinfo.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_gact.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_gate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_ife.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ipt.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
act_mpls.c net/sched: act_mpls: ensure LSE is pullable before reading it 2020-12-03 11:13:37 -08:00
act_nat.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_pedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_police.c net/sched: act_police: add support for packet-per-second policing 2021-03-13 14:18:09 -08:00
act_sample.c psample: Encapsulate packet metadata in a struct 2021-03-14 15:00:43 -07:00
act_simple.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_skbedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_skbmod.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_tunnel_key.c net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels 2020-10-20 21:10:41 -07:00
act_vlan.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
cls_api.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-09 20:48:35 -07:00
cls_basic.c
cls_bpf.c
cls_cgroup.c
cls_flow.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
cls_flower.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-03-25 15:31:22 -07:00
cls_fw.c
cls_matchall.c net: qos offload add flow status with dropped count 2020-06-19 12:53:30 -07:00
cls_route.c net_sched: cls_route: remove the right filter from hashtable 2020-03-16 01:59:32 -07:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
cls_tcindex.c net_sched: avoid shift-out-of-bounds in tcindex_set_parms() 2021-01-15 18:11:10 -08:00
cls_u32.c net/sched: cls_u32: simplify the return expression of u32_reoffload_knode() 2020-12-08 16:22:53 -08:00
em_canid.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
em_cmp.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_nbyte.c net: sched: Return the correct errno code 2021-02-06 11:15:28 -08:00
em_text.c
em_u32.c
ematch.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
Kconfig net: sched: incorrect Kconfig dependencies on Netfilter modules 2020-12-09 15:49:29 -08:00
Makefile net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
sch_api.c net: sched: avoid duplicates in classes dump 2021-03-04 14:27:47 -08:00
sch_atm.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
sch_cbq.c net: sched: Mundane typo fixes 2021-03-24 15:09:11 -07:00
sch_cbs.c net: don't include ethtool.h from netdevice.h 2020-11-23 17:27:04 -08:00
sch_choke.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_codel.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_drr.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_dsmark.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_etf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_ets.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fifo.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fq_codel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_fq_pie.c net/sched: fq_pie: initialize timer earlier in fq_pie_init() 2020-12-04 14:15:01 -08:00
sch_fq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_frag.c net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets 2021-04-29 15:31:53 -07:00
sch_generic.c net: sched: fix packet stuck problem for lockless qdisc 2021-05-14 15:05:46 -07:00
sch_gred.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_hfsc.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_hhf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_htb.c sch_htb: fix null pointer dereference on a null new_q 2021-03-30 13:50:03 -07:00
sch_ingress.c
sch_mq.c
sch_mqprio.c
sch_multiq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_netem.c netem: fix zero division in tabledist 2020-10-29 11:45:47 -07:00
sch_pie.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
sch_plug.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_prio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_qfq.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_red.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_sfb.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_sfq.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_skbprio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_taprio.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-26 12:00:00 -07:00
sch_tbf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_teql.c net: sched: sch_teql: fix null-pointer dereference 2021-04-08 14:14:42 -07:00