linux/net/sched
Vladimir Oltean 1461d212ab net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
taprio can only operate as root qdisc, and to that end, there exists the
following check in taprio_init(), just as in mqprio:

	if (sch->parent != TC_H_ROOT)
		return -EOPNOTSUPP;

And indeed, when we try to attach taprio to an mqprio child, it fails as
expected:

$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Error: sch_taprio: Can only be attached as root qdisc.

(extack message added by me)

But when we try to attach a taprio child to a taprio root qdisc,
surprisingly it doesn't fail:

$ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI

This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.

In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.

Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.

When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.

The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.

In fact, since commit 13511704f8 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.

To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).

Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.

Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 11:41:14 -07:00
..
act_api.c net/sched: act_api: Notify user space if any actions were flushed before error 2022-06-27 21:51:23 -07:00
act_bpf.c bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress 2022-03-03 14:38:48 +00:00
act_connmark.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_csum.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_ct.c net/sched: act_ct: set 'net' pointer when creating new nf_flow_table 2022-07-11 16:25:14 +02:00
act_ctinfo.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_gact.c net/sched: act_gact: Add extack messages for offload failure 2022-04-08 13:45:43 +01:00
act_gate.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_ife.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_ipt.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
act_mpls.c net/sched: act_mpls: Add extack messages for offload failure 2022-04-08 13:45:43 +01:00
act_nat.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_pedit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-05-19 11:23:59 -07:00
act_police.c net/sched: act_police: allow 'continue' action offload 2022-07-06 12:44:39 +01:00
act_sample.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_simple.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_skbedit.c net: sched: support hash selecting tx queue 2022-04-19 12:20:45 +02:00
act_skbmod.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_tunnel_key.c net/sched: act_tunnel_key: Add extack message for offload failure 2022-04-08 13:45:43 +01:00
act_vlan.c net/sched: act_vlan: Add extack message for offload failure 2022-04-08 13:45:43 +01:00
cls_api.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-07-21 13:03:39 -07:00
cls_basic.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_bpf.c bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress 2022-03-03 14:38:48 +00:00
cls_cgroup.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_flow.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_flower.c net/sched: flower: Add PPPoE filter 2022-07-26 10:20:29 -07:00
cls_fw.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_matchall.c net/sched: matchall: Avoid overwriting error messages 2022-04-08 13:45:43 +01:00
cls_route.c net_sched: cls_route: disallow handle of 0 2022-08-15 11:46:30 +01:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_tcindex.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_u32.c net/sched: cls_u32: fix possible leak in u32_init_knode() 2022-04-15 14:26:11 -07:00
em_canid.c
em_cmp.c
em_ipset.c
em_ipt.c
em_meta.c net_sched: em_meta: add READ_ONCE() in var_sk_bound_if() 2022-05-16 10:31:06 +01:00
em_nbyte.c
em_text.c
em_u32.c
ematch.c
Kconfig
Makefile
sch_api.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
sch_atm.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_blackhole.c
sch_cake.c Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" 2022-08-31 20:02:28 -07:00
sch_cbq.c net/sched: sch_cbq: change the type of cbq_set_lss to void 2022-07-27 18:30:18 -07:00
sch_cbs.c
sch_choke.c
sch_codel.c
sch_drr.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_dsmark.c net/sched: store the last executed chain also for clsact egress 2021-07-29 22:17:37 +01:00
sch_etf.c
sch_ets.c net/sched: sch_ets: don't remove idle classes from the round-robin list 2021-12-13 12:30:23 +00:00
sch_fifo.c net_sched: fix NULL deref in fifo_set_limit() 2021-10-01 14:59:10 -07:00
sch_fq_codel.c fq_codel: generalise ce_threshold marking for subset of traffic 2021-10-20 15:24:36 -07:00
sch_fq_pie.c net/sched: fq_pie: prevent dismantle issue 2021-12-09 08:01:00 -08:00
sch_fq.c
sch_frag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
sch_generic.c net/sched: fix netdevice reference leaks in attach_default_qdiscs() 2022-08-30 15:10:08 +02:00
sch_gred.c net: sched: gred: dynamically allocate tc_gred_qopt_offload 2021-10-27 12:06:52 -07:00
sch_hfsc.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_hhf.c
sch_htb.c sch_htb: Fail on unsupported parameters when offload is requested 2022-01-25 20:00:02 -08:00
sch_ingress.c
sch_mq.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_mqprio.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_multiq.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_netem.c net/sched: sch_netem: Fix arithmetic in netem_dump() for 32-bit platforms 2022-06-17 20:29:38 -07:00
sch_pie.c
sch_plug.c
sch_prio.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_qfq.c sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc 2022-01-04 12:36:51 +00:00
sch_red.c
sch_sfb.c sch_sfb: Also store skb len before calling child enqueue 2022-09-08 11:12:58 +02:00
sch_sfq.c net/sched: store the last executed chain also for clsact egress 2021-07-29 22:17:37 +01:00
sch_skbprio.c
sch_taprio.c net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs 2022-09-20 11:41:14 -07:00
sch_tbf.c net: sched: tbf: don't call qdisc_put() while holding tree lock 2022-08-30 11:41:24 +02:00
sch_teql.c