net: cache for same cpu skb_attempt_defer_free

Optimise skb_attempt_defer_free() when run on the same CPU the skb was
allocated on. Instead of __kfree_skb() -> kmem_cache_free(), we can
disable softirqs and put the buffer into the CPU-local caches.

CPU-bound TCP ping-pong style benchmarking (i.e. netbench) showed a 1%
throughput increase (392.2 -> 396.4 Krps). Cross-checking with profiles,
the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
I'd expect the win to double with rx-only benchmarks, as the optimisation
is for the receive path, but this test spends >55% of its CPU time doing writes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 7cb31c46b9, parent 9b9fd45869
Pavel Begunkov, 2024-04-10 02:28:09 +01:00, committed by Jakub Kicinski


@@ -6974,6 +6974,19 @@ free_now:
 EXPORT_SYMBOL(__skb_ext_put);
 #endif /* CONFIG_SKB_EXTENSIONS */
 
+static void kfree_skb_napi_cache(struct sk_buff *skb)
+{
+	/* if SKB is a clone, don't handle this case */
+	if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	local_bh_disable();
+	__napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+	local_bh_enable();
+}
+
 /**
  * skb_attempt_defer_free - queue skb for remote freeing
  * @skb: buffer
@@ -6992,7 +7005,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 	if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
 	    !cpu_online(cpu) ||
 	    cpu == raw_smp_processor_id()) {
-nodefer:	__kfree_skb(skb);
+nodefer:	kfree_skb_napi_cache(skb);
 		return;
 	}