linux/net/sched
Eric Dumazet 4a8e320c92 net: sched: use pinned timers
While using an MQ + NETEM setup, I confirmed that the default
timer migration (/proc/sys/kernel/timer_migration) is killing us.

Installing this on the receiver side of a TCP_STREAM test (the NIC has
8 TX queues):

EST="est 1sec 4sec"
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
 tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
 tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
 tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
 tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
 tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
 tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
 tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done

We can see that timers get migrated onto a single cpu, presumably one
that was idle at the time the timers were set up.
All qdisc dequeues then run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.
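
To check whether timer migration is enabled on a test machine, something
like the following sketch could be used. It assumes the sysctl file named
above exists on the running kernel; the helper name timer_migration_state
and its optional path argument are hypothetical, added only for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the timer migration setting.
# The path defaults to the sysctl file named in the commit message;
# an optional first argument overrides it (handy for testing).
timer_migration_state() {
    file="${1:-/proc/sys/kernel/timer_migration}"
    if [ ! -r "$file" ]; then
        # File missing or unreadable: the setting cannot be determined.
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Print the state for the running kernel; ignore a missing file.
timer_migration_state || true
```

A result of "enabled" corresponds to the default behaviour described
above, where timers may be moved away from the cpu that armed them.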

    39.24%  [kernel]          [k] _raw_spin_lock
     2.65%  [kernel]          [k] netem_enqueue
     1.80%  [kernel]          [k] netem_dequeue
     1.63%  [kernel]          [k] copy_user_enhanced_fast_string
     1.45%  [kernel]          [k] _raw_spin_lock_bh

By pinning qdisc timers on the cpu running the qdisc, we respect the
XPS settings and remove this lock contention.

     5.84%  [kernel]          [k] netem_enqueue
     4.83%  [kernel]          [k] _raw_spin_lock
     2.92%  [kernel]          [k] copy_user_enhanced_fast_string
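
Since the fix relies on the XPS settings, it can help to look at the
per-queue cpu masks on the test machine. The sketch below assumes the
usual sysfs layout /sys/class/net/<dev>/queues/tx-<n>/xps_cpus; the
helper name show_xps and its optional sysfs root argument are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the XPS cpu mask of every TX queue of a device.
# $1 = device name, $2 = sysfs net root (defaults to /sys/class/net).
show_xps() {
    dev="$1"
    root="${2:-/sys/class/net}"
    for q in "$root/$dev"/queues/tx-*; do
        # Skip queues without a readable xps_cpus file (or an unmatched glob).
        [ -r "$q/xps_cpus" ] || continue
        printf '%s %s\n' "${q##*/}" "$(cat "$q/xps_cpus")"
    done
}

# Same device as in the test setup above; prints nothing if it is absent.
show_xps eth1
```

Each output line pairs a queue name (tx-0, tx-1, ...) with its cpu mask,
which shows which cpus are expected to run each per-queue qdisc.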

Qdiscs that currently benefit from this change are:

	netem, cbq, fq, hfsc, tbf, htb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:26:48 -04:00
act_api.c net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
act_csum.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_gact.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_ipt.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_mirred.c net_sched: hold tcf_lock in netdevice notifier 2014-07-20 20:31:42 -07:00
act_nat.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_pedit.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_police.c net: use ktime_get_ns() and ktime_get_real_ns() helpers 2014-08-22 19:57:23 -07:00
act_simple.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
act_skbedit.c net_sched: act: move tcf_hashinfo_init() into tcf_register_action() 2014-02-12 19:23:32 -05:00
cls_api.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
cls_basic.c net: sched: cls_basic use RCU 2014-09-13 12:30:25 -04:00
cls_bpf.c net_sched: fix suspicious RCU usage in cls_bpf_classify() 2014-09-15 17:42:08 -04:00
cls_cgroup.c net: sched: cls_cgroup need tcf_exts_init in all cases 2014-09-16 16:26:39 -04:00
cls_flow.c net: sched: cls_flow use RCU 2014-09-13 12:30:26 -04:00
cls_fw.c net: sched: cls_fw: add missing tcf_exts_init call in fw_change() 2014-09-16 15:59:36 -04:00
cls_route.c net: sched: RCU cls_route 2014-09-13 12:30:26 -04:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net: sched: rcu'ify cls_rsvp 2014-09-13 12:30:26 -04:00
cls_tcindex.c net_sched: fix a null pointer dereference in tcindex_set_parms() 2014-09-16 15:20:09 -04:00
cls_u32.c net: sched: fix compile warning in cls_u32 2014-09-22 16:47:19 -04:00
em_canid.c net: em_canid: remove useless statements from em_canid_change 2014-06-21 15:40:22 -07:00
em_cmp.c
em_ipset.c em_ipset: use dev_net() accessor 2013-10-18 16:23:06 -04:00
em_meta.c net: Change skb_get_rxhash to skb_get_hash 2013-12-17 16:36:21 -05:00
em_nbyte.c
em_text.c
em_u32.c
ematch.c
Kconfig net: pkt_sched: PIE AQM scheme 2014-01-06 15:13:01 -05:00
Makefile net: pkt_sched: PIE AQM scheme 2014-01-06 15:13:01 -05:00
sch_api.c net: sched: use pinned timers 2014-09-26 00:26:48 -04:00
sch_atm.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_blackhole.c
sch_cbq.c net: sched: use pinned timers 2014-09-26 00:26:48 -04:00
sch_choke.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
sch_codel.c
sch_drr.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_dsmark.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_fifo.c
sch_fq_codel.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_fq.c net: use ktime_get_ns() and ktime_get_real_ns() helpers 2014-08-22 19:57:23 -07:00
sch_generic.c net: sched: use __skb_queue_head_init() where applicable 2014-09-19 16:32:10 -04:00
sch_gred.c net_sched: replace pr_warning with pr_warn 2013-12-31 13:50:56 -05:00
sch_hfsc.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_hhf.c net: use the new API kvfree() 2014-06-05 00:49:51 -07:00
sch_htb.c net: sched: use pinned timers 2014-09-26 00:26:48 -04:00
sch_ingress.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_mq.c pkt_sched: give visibility to mq slave qdiscs 2013-12-09 19:54:47 -05:00
sch_mqprio.c net: qdisc: use rcu prefix and silence sparse warnings 2014-09-13 12:30:25 -04:00
sch_multiq.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_netem.c net: use the new API kvfree() 2014-06-05 00:49:51 -07:00
sch_pie.c net: sched: Cleanup PIE comments 2014-02-13 18:29:58 -05:00
sch_plug.c
sch_prio.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_qfq.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_red.c
sch_sfb.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_sfq.c net: rcu-ify tcf_proto 2014-09-13 12:30:25 -04:00
sch_tbf.c net: use ktime_get_ns() and ktime_get_real_ns() helpers 2014-08-22 19:57:23 -07:00
sch_teql.c net: qdisc: use rcu prefix and silence sparse warnings 2014-09-13 12:30:25 -04:00