License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it.
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". Results of that were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
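The tagging script is not part of this series, so the following is only a hypothetical C sketch of its central decision: choosing the comment style of the SPDX line from the file name. It assumes the conventions documented in the kernel's Documentation/process/license-rules.rst (a `//` comment in .c files, a block comment in headers, `#` in Makefiles and scripts); `spdx_tag()` and `has_suffix()` are illustrative names, not the script's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: does path end with suffix?
static int has_suffix(const char *path, const char *suffix)
{
	size_t lp = strlen(path), ls = strlen(suffix);

	return lp >= ls && strcmp(path + lp - ls, suffix) == 0;
}

// Hypothetical: write the SPDX tag line for path into buf.
// .c files get a line comment, .h files a block comment, and anything
// else (Makefile, Kconfig, scripts) a '#' comment.
// Returns the line length, or -1 if buf is too small.
static int spdx_tag(const char *path, const char *expr, char *buf, size_t len)
{
	const char *open = "#", *close = "";
	int n;

	assert(path && expr && buf);
	if (has_suffix(path, ".c")) {
		open = "//";
	} else if (has_suffix(path, ".h")) {
		open = "/*";
		close = " */";
	}

	n = snprintf(buf, len, "%s SPDX-License-Identifier: %s%s",
		     open, expr, close);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, `spdx_tag("include/net/neighbour.h", "GPL-2.0", buf, sizeof(buf))` produces the same first line this header carries.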
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _NET_NEIGHBOUR_H
#define _NET_NEIGHBOUR_H

#include <linux/neighbour.h>

/*
 *	Generic neighbour manipulation
 *
 *	Authors:
 *	Pedro Roque		<roque@di.fc.ul.pt>
 *	Alexey Kuznetsov	<kuznet@ms2.inr.ac.ru>
 *
 *	Changes:
 *
 *	Harald Welte:		<laforge@gnumonks.org>
 *		- Add neighbour cache statistics like rtstat
 */

#include <linux/atomic.h>
#include <linux/refcount.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/rcupdate.h>
#include <linux/seq_file.h>
#include <linux/bitmap.h>

#include <linux/err.h>
#include <linux/sysctl.h>
#include <linux/workqueue.h>
#include <net/rtnetlink.h>

/*
 * NUD stands for "neighbor unreachability detection"
 */

#define NUD_IN_TIMER	(NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
#define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
#define NUD_CONNECTED	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

struct neighbour;
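The three masks group single-state bits, so testing whether an entry is, say, valid or connected is a bitwise AND. A minimal user-space sketch; the NUD_* values are assumed to match include/uapi/linux/neighbour.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match include/uapi/linux/neighbour.h. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_STALE	0x04
#define NUD_DELAY	0x08
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

/* The masks exactly as defined in this header. */
#define NUD_IN_TIMER	(NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
#define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
#define NUD_CONNECTED	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

/* Hypothetical helpers; nud_state normally holds one state bit. */
static bool nud_is_valid(uint8_t state)      { return state & NUD_VALID; }
static bool nud_is_connected(uint8_t state)  { return state & NUD_CONNECTED; }
static bool nud_timer_running(uint8_t state) { return state & NUD_IN_TIMER; }
```

A STALE entry, for instance, counts as valid but not connected, while an INCOMPLETE one is neither but has its timer running.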

enum {
	NEIGH_VAR_MCAST_PROBES,
	NEIGH_VAR_UCAST_PROBES,
	NEIGH_VAR_APP_PROBES,
	NEIGH_VAR_MCAST_REPROBES,
	NEIGH_VAR_RETRANS_TIME,
	NEIGH_VAR_BASE_REACHABLE_TIME,
	NEIGH_VAR_DELAY_PROBE_TIME,
	NEIGH_VAR_INTERVAL_PROBE_TIME_MS,
	NEIGH_VAR_GC_STALETIME,
	NEIGH_VAR_QUEUE_LEN_BYTES,
	NEIGH_VAR_PROXY_QLEN,
	NEIGH_VAR_ANYCAST_DELAY,
	NEIGH_VAR_PROXY_DELAY,
	NEIGH_VAR_LOCKTIME,
#define NEIGH_VAR_DATA_MAX (NEIGH_VAR_LOCKTIME + 1)
	/* Following are used as a second way to access one of the above */
	NEIGH_VAR_QUEUE_LEN, /* same data as NEIGH_VAR_QUEUE_LEN_BYTES */
	NEIGH_VAR_RETRANS_TIME_MS, /* same data as NEIGH_VAR_RETRANS_TIME */
	NEIGH_VAR_BASE_REACHABLE_TIME_MS, /* same data as NEIGH_VAR_BASE_REACHABLE_TIME */
	/* Following are used by "default" only */
	NEIGH_VAR_GC_INTERVAL,
	NEIGH_VAR_GC_THRESH1,
	NEIGH_VAR_GC_THRESH2,
	NEIGH_VAR_GC_THRESH3,
	NEIGH_VAR_MAX
};

struct neigh_parms {
	possible_net_t net;
	struct net_device *dev;
	netdevice_tracker dev_tracker;
	struct list_head list;
	int	(*neigh_setup)(struct neighbour *);
	struct neigh_table *tbl;
	void	*sysctl_table;
	int dead;
	refcount_t refcnt;
	struct rcu_head rcu_head;
	int	reachable_time;
net: neigh: decrement the family specific qlen
Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit
per-device") introduced the length counter qlen in struct neigh_parms.
There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and
while the family specific qlen is incremented in pneigh_enqueue(), the
mentioned commit always decrements the IPv4/ARP specific qlen,
regardless of the currently processed family, in pneigh_queue_purge()
and neigh_proxy_process().
As a result, with IPv6/ND, the family specific qlen is only incremented
(and never decremented) until it exceeds PROXY_QLEN, and then, according
to the check in pneigh_enqueue(), neighbor solicitations are not
answered anymore. As an example, this is noted when using the
subnet-router anycast address to access a Linux router. After a certain
amount of time (in the observed case, qlen exceeded PROXY_QLEN after two
days), the Linux router stops answering neighbor solicitations for its
subnet-router anycast address and effectively becomes unreachable.
Another result with IPv6/ND is that the IPv4/ARP specific qlen is
decremented more often than incremented. This leads to negative qlen
values, as a signed integer has been used for the length counter qlen,
and potentially to an integer overflow.
Fix this by introducing the helper function neigh_parms_qlen_dec(),
which decrements the family specific qlen. Thereby, make use of the
existing helper function neigh_get_dev_parms_rcu(), whose definition
therefore needs to be placed earlier in neighbour.c. Take the family
member from struct neigh_table to determine the currently processed
family and appropriately call neigh_parms_qlen_dec() from
pneigh_queue_purge() and neigh_proxy_process().
Additionally, use an unsigned integer for the length counter qlen.
Fixes: 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
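A simplified model of the bug and the fix, assuming one counter per address family on a single device. `struct dev_model`, `model_dequeue_buggy()`, `model_dequeue_fixed()` and `model_would_drop()` are hypothetical stand-ins; only the idea of decrementing the counter of the family being processed comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one neigh_parms-like counter per family on a device. */
enum fam { FAM_ARP, FAM_ND, FAM_MAX };

struct parms_model {
	uint32_t qlen;		/* u32, as after the fix */
};

struct dev_model {
	struct parms_model parms[FAM_MAX];
};

/* As in pneigh_enqueue(): the counter of the packet's family goes up. */
static void model_enqueue(struct dev_model *d, enum fam f)
{
	assert(f < FAM_MAX);
	d->parms[f].qlen++;
}

/* Before the fix: the ARP counter is decremented whatever the family. */
static void model_dequeue_buggy(struct dev_model *d, enum fam f)
{
	(void)f;
	d->parms[FAM_ARP].qlen--;	/* wraps here; went negative as an int */
}

/* After the fix: decrement the counter of the family being processed. */
static void model_dequeue_fixed(struct dev_model *d, enum fam f)
{
	assert(f < FAM_MAX && d->parms[f].qlen > 0);
	d->parms[f].qlen--;
}

/* Shape of the admission check: drop once qlen exceeds PROXY_QLEN. */
static bool model_would_drop(const struct dev_model *d, enum fam f,
			     uint32_t proxy_qlen)
{
	return d->parms[f].qlen > proxy_qlen;
}
```

With the buggy dequeue, every ND packet leaves the ND counter one higher, so after enough traffic the check starts dropping solicitations, which matches the symptom described above.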
	u32	qlen;
	int	data[NEIGH_VAR_DATA_MAX];
	DECLARE_BITMAP(data_state, NEIGH_VAR_DATA_MAX);
};

static inline void neigh_var_set(struct neigh_parms *p, int index, int val)
{
	set_bit(index, p->data_state);
	p->data[index] = val;
}

#define NEIGH_VAR(p, attr) ((p)->data[NEIGH_VAR_ ## attr])

/* In ndo_neigh_setup, NEIGH_VAR_INIT should be used.
 * In other cases, NEIGH_VAR_SET should be used.
 */
#define NEIGH_VAR_INIT(p, attr, val) (NEIGH_VAR(p, attr) = val)
#define NEIGH_VAR_SET(p, attr, val) neigh_var_set(p, NEIGH_VAR_ ## attr, val)

static inline void neigh_parms_data_state_setall(struct neigh_parms *p)
{
	bitmap_fill(p->data_state, NEIGH_VAR_DATA_MAX);
}

static inline void neigh_parms_data_state_cleanall(struct neigh_parms *p)
{
	bitmap_zero(p->data_state, NEIGH_VAR_DATA_MAX);
}
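NEIGH_VAR_SET records in `data_state` that a value was set explicitly, while NEIGH_VAR_INIT only writes the value. One plausible use, stated here as an assumption, is propagating a changed table default only to parameters that were never set explicitly. The sketch models that with a single `unsigned long` as the bitmap; `var_set()`, `var_init()` and `copy_default()` are hypothetical.

```c
#include <assert.h>

#define VAR_MAX 4	/* stand-in for NEIGH_VAR_DATA_MAX; must fit in a long */

struct parms_model {
	int data[VAR_MAX];
	unsigned long data_state;	/* bit i set: data[i] was set explicitly */
};

/* Like neigh_var_set(): mark the variable, then store it. */
static void var_set(struct parms_model *p, int index, int val)
{
	assert(index >= 0 && index < VAR_MAX);
	p->data_state |= 1UL << index;
	p->data[index] = val;
}

/* Like NEIGH_VAR_INIT: store the value without touching data_state. */
static void var_init(struct parms_model *p, int index, int val)
{
	assert(index >= 0 && index < VAR_MAX);
	p->data[index] = val;
}

/* Hypothetical: push a new default to every parms not explicitly set. */
static void copy_default(struct parms_model *all, int n, int index, int val)
{
	for (int i = 0; i < n; i++)
		if (!(all[i].data_state & (1UL << index)))
			all[i].data[index] = val;
}
```

This also suggests why ndo_neigh_setup should use NEIGH_VAR_INIT: a driver's initial value would then still follow later changes to the default.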

struct neigh_statistics {
	unsigned long allocs;		/* number of allocated neighs */
	unsigned long destroys;		/* number of destroyed neighs */
	unsigned long hash_grows;	/* number of hash resizes */
	unsigned long res_failed;	/* number of failed resolutions */
	unsigned long lookups;		/* number of lookups */
	unsigned long hits;		/* number of hits (among lookups) */
	unsigned long rcv_probes_mcast;	/* number of received mcast ipv6 */
	unsigned long rcv_probes_ucast;	/* number of received ucast ipv6 */
	unsigned long periodic_gc_runs;	/* number of periodic GC runs */
	unsigned long forced_gc_runs;	/* number of forced GC runs */
	unsigned long unres_discards;	/* number of unresolved drops */
	unsigned long table_fulls;	/* times even gc couldn't help */
};

#define NEIGH_CACHE_STAT_INC(tbl, field) this_cpu_inc((tbl)->stats->field)

struct neighbour {
	struct neighbour __rcu	*next;
	struct neigh_table	*tbl;
	struct neigh_parms	*parms;
	unsigned long		confirmed;
	unsigned long		updated;
	rwlock_t		lock;
	refcount_t		refcnt;
neigh: new unresolved queue limits
On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.
Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.
[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.
$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
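The commit replaces the per-neighbour frame limit with a byte budget tracked in `arp_queue_len_bytes`. A user-space sketch of that policy, assuming the oldest queued frames are dropped until the new one fits and that a frame is still queued when the queue is empty; the ring buffer, its fixed capacity and all names are artifacts of the model.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 16		/* model-only ring capacity */

struct unres_queue {
	unsigned int len[QCAP];	/* "truesize" of each queued frame */
	size_t head, count;
	unsigned int bytes;	/* models neigh->arp_queue_len_bytes */
	unsigned long discards;	/* models the unres_discards statistic */
};

/* Queue a frame, dropping the oldest ones until it fits the byte budget. */
static void unres_enqueue(struct unres_queue *q, unsigned int truesize,
			  unsigned int limit_bytes)
{
	while (q->count &&
	       (q->bytes + truesize > limit_bytes || q->count == QCAP)) {
		q->bytes -= q->len[q->head];
		q->head = (q->head + 1) % QCAP;
		q->count--;
		q->discards++;
	}
	assert(q->count < QCAP);
	q->len[(q->head + q->count) % QCAP] = truesize;
	q->count++;
	q->bytes += truesize;
}
```

Because the budget is in bytes, several fragments of one large datagram can wait for resolution together, where a small frame-count limit could drop some of them.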
	unsigned int		arp_queue_len_bytes;
	struct sk_buff_head	arp_queue;
neigh: reorder struct neighbour fields
On Tuesday 12 October 2010 at 00:02 +0200, Eric Dumazet wrote:
> Here is the followup patch.
>
> Thanks !
>
Oops, this was an old version, the up2date ones also took care of "used"
field.
I guess it's time for a sleep, sorry again.
[PATCH net-next V2] neigh: reorder struct neighbour fields
(refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
be placed on separate cache lines.
refcnt can be written often, while the other fields are mostly read.
This gave me good results on a stress test:
before:
real	0m45.570s
user	0m15.525s
sys	9m56.669s
after:
real	0m41.841s
user	0m15.261s
sys	8m45.949s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
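The goal is to keep the often-written `refcnt` and the read-mostly fields on different cache lines, so that refcount updates do not keep invalidating the line that readers use. The patch achieved this by reordering fields; the sketch below swaps in explicit C11 `_Alignas` to make the split visible, assumes 64-byte cache lines, and uses a hypothetical struct rather than struct neighbour.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumption: 64-byte cache lines */

/* Hypothetical layout, not struct neighbour itself. */
struct layout_demo {
	int refcnt;					/* written often */
	_Alignas(CACHELINE) unsigned char ha[32];	/* read-mostly group */
	void *output;
	const void *ops;
};

/* Cache line an offset falls on, for a line-aligned object. */
static size_t line_of(size_t offset)
{
	return offset / CACHELINE;
}
```

The offsets only map to real cache lines if the object itself is allocated on a line boundary; the struct's alignment requirement guarantees that for static and stack objects, not necessarily for plain malloc().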
	struct timer_list	timer;
	unsigned long		used;
	atomic_t		probes;
	u8			nud_state;
	u8			type;
	u8			dead;
	u8			protocol;
	u32			flags;
	seqlock_t		ha_lock;
	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))] __aligned(8);
	struct hh_cache		hh;
	int			(*output)(struct neighbour *, struct sk_buff *);
	const struct neigh_ops	*ops;
neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:
1. The gc algorithm will not evict PERMANENT entries as those entries
are managed by userspace, yet the existing algorithm walks the entire
hash table which means it always considers PERMANENT entries when
looking for entries to evict. In some use cases (e.g., EVPN) there
can be tens of thousands of PERMANENT entries leading to wasted
CPU cycles when gc kicks in. As an example, with 32k permanent
entries, neigh_alloc has been observed taking more than 4 msec per
invocation.
2. Currently, when the number of neighbor entries hits gc_thresh2 and
the last flush for the table was more than 5 seconds ago, gc kicks in and
walks the entire hash table evicting *all* entries not in PERMANENT
or REACHABLE state and not marked as externally learned. There is no
discriminator on when the neigh entry was created or if it just moved
from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).
It is possible for entries to be created or for established neighbor
entries to be moved to STALE (e.g., an external node sends an ARP
request) right before the 5 second window lapses:
-----|---------x|----------|-----
    t-5         t         t+5
If that happens those entries are evicted during gc causing unnecessary
thrashing on neighbor entries and userspace caches trying to track them.
Further, this contradicts the description of gc_thresh2 which says
"Entries older than 5 seconds will be cleared".
One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
whole point of having separate thresholds.
3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
when gc_thresh2 is exceeded is overkill and contributes to thrashing,
especially during startup.
This patch addresses these problems as follows:
1. Use of a separate list_head to track entries that can be garbage
collected along with a separate counter. PERMANENT entries are not
added to this list.
The gc_thresh parameters are only compared to the new counter, not the
total entries in the table. The forced_gc function is updated to only
walk this new gc_list looking for entries to evict.
2. Entries are added to the list head at the tail and removed from the
front.
3. Entries are only evicted if they were last updated more than 5 seconds
ago, adhering to the original intent of gc_thresh2.
4. Forced gc is stopped once the number of gc_entries drops below
gc_thresh2.
5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
when allocating a new neighbor for a PERMANENT entry. By extension this
means there are no explicit limits on the number of PERMANENT entries
that can be created, but this is no different than FIB entries or FDB
entries.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
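A simplified model of the forced-GC walk described above: only entries on `gc_list` are candidates (PERMANENT entries are never added), the list is scanned from the front, entries updated within the last 5 seconds are kept, and the walk stops once the counter is back down to `gc_thresh2`. Time is in seconds rather than jiffies, skipping REACHABLE entries is an assumption carried over from the description of the old algorithm, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct gc_entry {
	unsigned long updated;	/* last update time, in seconds in this model */
	bool reachable;		/* assumed: REACHABLE entries are skipped */
	bool evicted;
};

/*
 * Walk the gc list from the front (oldest first) and evict stale entries
 * until *gc_entries is no longer above gc_thresh2.
 * Returns the number of entries evicted.
 */
static int forced_gc_model(struct gc_entry *list, int n, int *gc_entries,
			   int gc_thresh2, unsigned long now)
{
	int shrunk = 0;

	for (int i = 0; i < n && *gc_entries > gc_thresh2; i++) {
		struct gc_entry *e = &list[i];

		if (e->evicted || e->reachable)
			continue;
		if (now - e->updated <= 5)	/* updated within 5 s: keep */
			continue;
		e->evicted = true;
		(*gc_entries)--;
		shrunk++;
	}
	assert(*gc_entries >= 0);
	return shrunk;
}
```

Because the walk touches only GC candidates and stops early, its cost no longer grows with the number of PERMANENT entries.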
	struct list_head	gc_list;
	struct list_head	managed_list;
	struct rcu_head		rcu;
	struct net_device	*dev;
	netdevice_tracker	dev_tracker;
	u8			primary_key[0];
} __randomize_layout;

struct neigh_ops {
	int			family;
	void			(*solicit)(struct neighbour *, struct sk_buff *);
	void			(*error_report)(struct neighbour *, struct sk_buff *);
	int			(*output)(struct neighbour *, struct sk_buff *);
	int			(*connected_output)(struct neighbour *, struct sk_buff *);
};

struct pneigh_entry {
	struct pneigh_entry	*next;
	possible_net_t		net;
	struct net_device	*dev;
	netdevice_tracker	dev_tracker;
	u32			flags;
	u8			protocol;
	u32			key[];
};

/*
 *	neighbour table manipulation
 */

#define NEIGH_NUM_HASH_RND	4

struct neigh_hash_table {
	struct neighbour __rcu	**hash_buckets;
	unsigned int		hash_shift;
	__u32			hash_rnd[NEIGH_NUM_HASH_RND];
	struct rcu_head		rcu;
};

struct neigh_table {
	int			family;
	unsigned int		entry_size;
	unsigned int		key_len;
	__be16			protocol;
	__u32			(*hash)(const void *pkey,
					const struct net_device *dev,
					__u32 *hash_rnd);
	bool			(*key_eq)(const struct neighbour *, const void *pkey);
	int			(*constructor)(struct neighbour *);
	int			(*pconstructor)(struct pneigh_entry *);
	void			(*pdestructor)(struct pneigh_entry *);
	void			(*proxy_redo)(struct sk_buff *skb);
	int			(*is_multicast)(const void *pkey);
	bool			(*allow_add)(const struct net_device *dev,
					     struct netlink_ext_ack *extack);
	char			*id;
	struct neigh_parms	parms;
	struct list_head	parms_list;
	int			gc_interval;
	int			gc_thresh1;
	int			gc_thresh2;
	int			gc_thresh3;
	unsigned long		last_flush;
	struct delayed_work	gc_work;
	struct delayed_work	managed_work;
	struct timer_list	proxy_timer;
	struct sk_buff_head	proxy_queue;
	atomic_t		entries;
	atomic_t		gc_entries;
	struct list_head	gc_list;
	struct list_head	managed_list;
	rwlock_t		lock;
	unsigned long		last_rand;
	struct neigh_statistics	__percpu *stats;
	struct neigh_hash_table __rcu *nht;
	struct pneigh_entry	**phash_buckets;
};

enum {
	NEIGH_ARP_TABLE = 0,
	NEIGH_ND_TABLE = 1,
	NEIGH_DN_TABLE = 2,
	NEIGH_NR_TABLES,
	NEIGH_LINK_TABLE = NEIGH_NR_TABLES /* Pseudo table for neigh_xmit */
};

static inline int neigh_parms_family(struct neigh_parms *p)
{
	return p->tbl->family;
}

#define NEIGH_PRIV_ALIGN	sizeof(long long)
#define NEIGH_ENTRY_SIZE(size)	ALIGN((size), NEIGH_PRIV_ALIGN)

static inline void *neighbour_priv(const struct neighbour *n)
{
	return (char *)n + n->tbl->entry_size;
}
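neighbour_priv() returns the memory that follows the first `entry_size` bytes of the entry, which is why entry sizes are rounded up to NEIGH_PRIV_ALIGN. A user-space model, assuming the entry and its private area come from a single allocation; `ALIGN_UP` re-implements the kernel's ALIGN() for power-of-two alignments, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NEIGH_PRIV_ALIGN	sizeof(long long)
/* User-space stand-in for the kernel's ALIGN(); a must be a power of two. */
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((size_t)(a) - 1))
#define NEIGH_ENTRY_SIZE(size)	ALIGN_UP((size), NEIGH_PRIV_ALIGN)

struct tbl_model {
	size_t entry_size;	/* already rounded with NEIGH_ENTRY_SIZE() */
};

struct neigh_model {
	struct tbl_model *tbl;
	unsigned char primary_key[4];
};

/* Same arithmetic as neighbour_priv(): skip the fixed-size entry. */
static void *neigh_model_priv(const struct neigh_model *n)
{
	assert(n->tbl->entry_size % NEIGH_PRIV_ALIGN == 0);
	return (char *)n + n->tbl->entry_size;
}

/* One allocation holds the entry followed by priv_len private bytes. */
static struct neigh_model *neigh_model_alloc(struct tbl_model *tbl,
					     size_t priv_len)
{
	struct neigh_model *n = calloc(1, tbl->entry_size + priv_len);

	if (n)
		n->tbl = tbl;
	return n;
}
```

Rounding `entry_size` keeps the private area aligned for any `long long`-sized field a user might store there.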

/* flags for neigh_update() */
#define NEIGH_UPDATE_F_OVERRIDE			BIT(0)
#define NEIGH_UPDATE_F_WEAK_OVERRIDE		BIT(1)
#define NEIGH_UPDATE_F_OVERRIDE_ISROUTER	BIT(2)
#define NEIGH_UPDATE_F_USE			BIT(3)
#define NEIGH_UPDATE_F_MANAGED			BIT(4)
#define NEIGH_UPDATE_F_EXT_LEARNED		BIT(5)
#define NEIGH_UPDATE_F_ISROUTER			BIT(6)
#define NEIGH_UPDATE_F_ADMIN			BIT(7)

/* In-kernel representation for NDA_FLAGS_EXT flags: */
#define NTF_OLD_MASK		0xff
#define NTF_EXT_SHIFT		8
#define NTF_EXT_MASK		(NTF_EXT_MANAGED)

#define NTF_MANAGED		(NTF_EXT_MANAGED << NTF_EXT_SHIFT)
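The in-kernel `flags` word keeps the legacy 8-bit NTF_* flags in its low byte and the NDA_FLAGS_EXT bits shifted up by NTF_EXT_SHIFT. A sketch of merging and splitting the two, assuming NTF_EXT_MANAGED is bit 0 as in the uapi header; `ntf_merge()` and `ntf_split()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NTF_EXT_MANAGED	(1U << 0)	/* assumed uapi value */
#define NTF_OLD_MASK	0xff
#define NTF_EXT_SHIFT	8
#define NTF_EXT_MASK	(NTF_EXT_MANAGED)
#define NTF_MANAGED	(NTF_EXT_MANAGED << NTF_EXT_SHIFT)

/* Combine ndm_flags and an NDA_FLAGS_EXT value into one in-kernel word. */
static bool ntf_merge(uint8_t ndm_flags, uint32_t ext, uint32_t *out)
{
	if (ext & ~NTF_EXT_MASK)	/* reject unknown extended bits */
		return false;
	*out = ndm_flags | (ext << NTF_EXT_SHIFT);
	return true;
}

/* Split the in-kernel word back into the two netlink fields. */
static void ntf_split(uint32_t flags, uint8_t *ndm_flags, uint32_t *ext)
{
	*ndm_flags = flags & NTF_OLD_MASK;
	*ext = flags >> NTF_EXT_SHIFT;
}
```

Keeping both halves in one word lets in-kernel code test NTF_MANAGED with the same kind of mask test it uses for the legacy flags.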

extern const struct nla_policy nda_policy[];

static inline bool neigh_key_eq32(const struct neighbour *n, const void *pkey)
{
	return *(const u32 *)n->primary_key == *(const u32 *)pkey;
}

static inline bool neigh_key_eq128(const struct neighbour *n, const void *pkey)
{
	const u32 *n32 = (const u32 *)n->primary_key;
	const u32 *p32 = pkey;

	return ((n32[0] ^ p32[0]) | (n32[1] ^ p32[1]) |
		(n32[2] ^ p32[2]) | (n32[3] ^ p32[3])) == 0;
}
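neigh_key_eq128() compares two 128-bit keys without a branch per word: each XOR is zero only for equal words, and OR-ing the four results is zero only if all of them are. A user-space equivalent on plain byte buffers; it uses memcpy() instead of the kernel's pointer casts, which rely on the keys being suitably aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same branch-free comparison as neigh_key_eq128(), on raw 16-byte keys. */
static bool key_eq128(const void *a, const void *b)
{
	uint32_t x[4], y[4];

	memcpy(x, a, sizeof(x));	/* avoids alignment and aliasing issues */
	memcpy(y, b, sizeof(y));
	return ((x[0] ^ y[0]) | (x[1] ^ y[1]) |
		(x[2] ^ y[2]) | (x[3] ^ y[3])) == 0;
}
```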

static inline struct neighbour *___neigh_lookup_noref(
	struct neigh_table *tbl,
	bool (*key_eq)(const struct neighbour *n, const void *pkey),
	__u32 (*hash)(const void *pkey,
		      const struct net_device *dev,
		      __u32 *hash_rnd),
	const void *pkey,
	struct net_device *dev)
{
	struct neigh_hash_table *nht = rcu_dereference(tbl->nht);
	struct neighbour *n;
	u32 hash_val;

	hash_val = hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
	for (n = rcu_dereference(nht->hash_buckets[hash_val]);
	     n != NULL;
	     n = rcu_dereference(n->next)) {
		if (n->dev == dev && key_eq(n, pkey))
			return n;
	}

	return NULL;
}

static inline struct neighbour *__neigh_lookup_noref(struct neigh_table *tbl,
						     const void *pkey,
						     struct net_device *dev)
{
	return ___neigh_lookup_noref(tbl, tbl->key_eq, tbl->hash, pkey, dev);
}
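___neigh_lookup_noref() uses the top `hash_shift` bits of a 32-bit hash as the bucket index, then matches both the device and the key along the chain. A single-threaded sketch of the same lookup with the RCU accessors dropped; `toy_hash()` is an illustrative multiplicative hash, not the kernel's per-protocol hash, and the structs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct entry {
	struct entry *next;
	int dev;		/* stand-in for struct net_device * */
	uint32_t key;		/* IPv4-style 32-bit key */
};

struct table {
	struct entry **buckets;	/* 1 << hash_shift buckets */
	unsigned int hash_shift;
	uint32_t hash_rnd;
};

/* Illustrative multiplicative hash: its top bits are the best mixed. */
static uint32_t toy_hash(uint32_t key, int dev, uint32_t rnd)
{
	return (key ^ (uint32_t)dev ^ rnd) * 0x9E3779B1u;
}

static size_t bucket_of(const struct table *t, uint32_t key, int dev)
{
	assert(t->hash_shift >= 1 && t->hash_shift <= 31);
	return toy_hash(key, dev, t->hash_rnd) >> (32 - t->hash_shift);
}

static void table_insert(struct table *t, struct entry *e)
{
	size_t b = bucket_of(t, e->key, e->dev);

	e->next = t->buckets[b];
	t->buckets[b] = e;
}

static struct entry *table_lookup(const struct table *t, uint32_t key, int dev)
{
	for (struct entry *n = t->buckets[bucket_of(t, key, dev)]; n; n = n->next)
		if (n->dev == dev && n->key == key)	/* same device and key */
			return n;
	return NULL;
}
```

The device check matters because the same protocol address can have separate neighbour entries on different interfaces.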

static inline void neigh_confirm(struct neighbour *n)
{
	if (n) {
		unsigned long now = jiffies;

		/* avoid dirtying neighbour */
		if (READ_ONCE(n->confirmed) != now)
			WRITE_ONCE(n->confirmed, now);
	}
}

void neigh_table_init(int index, struct neigh_table *tbl);
int neigh_table_clear(int index, struct neigh_table *tbl);
struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey,
			       struct net_device *dev);
struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
				 struct net_device *dev, bool want_ref);
static inline struct neighbour *neigh_create(struct neigh_table *tbl,
					     const void *pkey,
					     struct net_device *dev)
{
	return __neigh_create(tbl, pkey, dev, true);
}
void neigh_destroy(struct neighbour *neigh);
2022-02-01 22:39:42 +03:00
int __neigh_event_send ( struct neighbour * neigh , struct sk_buff * skb ,
const bool immediate_ok ) ;
2017-03-20 08:01:28 +03:00
int neigh_update ( struct neighbour * neigh , const u8 * lladdr , u8 new , u32 flags ,
u32 nlmsg_pid ) ;
2013-12-11 16:48:20 +04:00
void __neigh_set_probe_once ( struct neighbour * neigh ) ;
2017-06-02 19:01:49 +03:00
bool neigh_remove_one ( struct neighbour * ndel , struct neigh_table * tbl ) ;
2013-08-01 04:31:35 +04:00
void neigh_changeaddr ( struct neigh_table * tbl , struct net_device * dev ) ;
int neigh_ifdown ( struct neigh_table * tbl , struct net_device * dev ) ;
2018-10-12 06:33:49 +03:00
int neigh_carrier_down ( struct neigh_table * tbl , struct net_device * dev ) ;
2013-08-01 04:31:35 +04:00
int neigh_resolve_output ( struct neighbour * neigh , struct sk_buff * skb ) ;
int neigh_connected_output ( struct neighbour * neigh , struct sk_buff * skb ) ;
int neigh_direct_output ( struct neighbour * neigh , struct sk_buff * skb ) ;
struct neighbour * neigh_event_ns ( struct neigh_table * tbl ,
2005-04-17 02:20:36 +04:00
u8 * lladdr , void * saddr ,
struct net_device * dev ) ;
2013-08-01 04:31:35 +04:00
struct neigh_parms * neigh_parms_alloc ( struct net_device * dev ,
struct neigh_table * tbl ) ;
void neigh_parms_release ( struct neigh_table * tbl , struct neigh_parms * parms ) ;
2008-03-25 21:49:59 +03:00
static inline
2013-08-01 04:31:35 +04:00
struct net * neigh_parms_net ( const struct neigh_parms * parms )
2008-03-25 21:49:59 +03:00
{
2008-11-12 11:54:54 +03:00
return read_pnet ( & parms - > net ) ;
2008-03-25 21:49:59 +03:00
}

unsigned long neigh_rand_reach_time(unsigned long base);

void pneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p,
		    struct sk_buff *skb);
struct pneigh_entry *pneigh_lookup(struct neigh_table *tbl, struct net *net,
				   const void *key, struct net_device *dev,
				   int creat);
struct pneigh_entry *__pneigh_lookup(struct neigh_table *tbl, struct net *net,
				     const void *key, struct net_device *dev);
int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *key,
		  struct net_device *dev);

static inline struct net *pneigh_net(const struct pneigh_entry *pneigh)
{
	return read_pnet(&pneigh->net);
}

void neigh_app_ns(struct neighbour *n);
void neigh_for_each(struct neigh_table *tbl,
		    void (*cb)(struct neighbour *, void *), void *cookie);
void __neigh_for_each_release(struct neigh_table *tbl,
			      int (*cb)(struct neighbour *));
int neigh_xmit(int fam, struct net_device *, const void *, struct sk_buff *);

struct neigh_seq_state {
	struct seq_net_private p;
	struct neigh_table *tbl;
	struct neigh_hash_table *nht;
	void *(*neigh_sub_iter)(struct neigh_seq_state *state,
				struct neighbour *n, loff_t *pos);
	unsigned int bucket;
	unsigned int flags;
#define NEIGH_SEQ_NEIGH_ONLY	0x00000001
#define NEIGH_SEQ_IS_PNEIGH	0x00000002
#define NEIGH_SEQ_SKIP_NOARP	0x00000004
};

void *neigh_seq_start(struct seq_file *, loff_t *, struct neigh_table *,
		      unsigned int);
void *neigh_seq_next(struct seq_file *, void *, loff_t *);
void neigh_seq_stop(struct seq_file *, void *);

int neigh_proc_dointvec(struct ctl_table *ctl, int write,
			void *buffer, size_t *lenp, loff_t *ppos);
int neigh_proc_dointvec_jiffies(struct ctl_table *ctl, int write,
				void *buffer,
				size_t *lenp, loff_t *ppos);
int neigh_proc_dointvec_ms_jiffies(struct ctl_table *ctl, int write,
				   void *buffer, size_t *lenp, loff_t *ppos);

int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
			  proc_handler *proc_handler);
void neigh_sysctl_unregister(struct neigh_parms *p);

static inline void __neigh_parms_put(struct neigh_parms *parms)
{
	refcount_dec(&parms->refcnt);
}

static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
{
	refcount_inc(&parms->refcnt);
	return parms;
}

/*
 *	Neighbour references
 */

static inline void neigh_release(struct neighbour *neigh)
{
	if (refcount_dec_and_test(&neigh->refcnt))
		neigh_destroy(neigh);
}

static inline struct neighbour *neigh_clone(struct neighbour *neigh)
{
	if (neigh)
		refcount_inc(&neigh->refcnt);
	return neigh;
}

#define neigh_hold(n)	refcount_inc(&(n)->refcnt)
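neigh_clone() and neigh_hold() take a reference on a neighbour. neigh_release() drops one and calls neigh_destroy() when the last reference goes away. Below is a userspace model of that lifecycle. It assumes a C11 `atomic_uint` is a reasonable stand-in for `refcount_t`, and it leaves out refcount_t's saturation and underflow checks. `struct toy_obj`, `toy_clone()`, `toy_release()` and `toy_destroy()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object; refcount_t is modelled with a C11 atomic_uint. */
struct toy_obj {
	atomic_uint refcnt;
	bool destroyed;
};

static void toy_destroy(struct toy_obj *o)
{
	o->destroyed = true;	/* a real destructor would free the object */
}

/* Like neigh_clone(): NULL-tolerant, takes a reference, returns the object. */
static struct toy_obj *toy_clone(struct toy_obj *o)
{
	if (o)
		atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
	return o;
}

/* Like neigh_release(): drop a reference and destroy on the last one. */
static void toy_release(struct toy_obj *o)
{
	/* fetch_sub returns the old value; 1 means this was the last reference */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_acq_rel) == 1)
		toy_destroy(o);
}
```

As with neigh_clone(), toy_clone() accepts NULL, so callers can clone a lookup result without checking it first.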

static __always_inline int neigh_event_send_probe(struct neighbour *neigh,
						  struct sk_buff *skb,
						  const bool immediate_ok)
{
	unsigned long now = jiffies;

	if (READ_ONCE(neigh->used) != now)
		WRITE_ONCE(neigh->used, now);
	if (!(READ_ONCE(neigh->nud_state) & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE)))
		return __neigh_event_send(neigh, skb, immediate_ok);
	return 0;
}

static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
{
	return neigh_event_send_probe(neigh, skb, true);
}
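neigh_event_send_probe() takes the slow path through __neigh_event_send() only when the entry is in none of the NUD_CONNECTED, NUD_DELAY or NUD_PROBE states. The sketch below isolates that test. The state bits are copied from the uapi `<linux/neighbour.h>`, and NUD_CONNECTED is the composite defined in include/net/neighbour.h. `needs_slow_path()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits as defined in the uapi <linux/neighbour.h>, copied here. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_STALE	0x04
#define NUD_DELAY	0x08
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80
#define NUD_NONE	0x00

/* Composite mask as defined in include/net/neighbour.h. */
#define NUD_CONNECTED	(NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE)

/*
 * Mirrors the test in neigh_event_send_probe(): true means the caller
 * would have to call __neigh_event_send().
 */
static bool needs_slow_path(unsigned char nud_state)
{
	return !(nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE));
}
```

Under this test, a STALE, INCOMPLETE, FAILED or NONE entry goes to the slow path. DELAY and PROBE entries skip it, because a timer-driven transition is already pending for them.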

#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
static inline int neigh_hh_bridge(struct hh_cache *hh, struct sk_buff *skb)
{
	unsigned int seq, hh_alen;

	do {
		seq = read_seqbegin(&hh->hh_lock);
		hh_alen = HH_DATA_ALIGN(ETH_HLEN);
		memcpy(skb->data - hh_alen, hh->hh_data, ETH_ALEN + hh_alen - ETH_HLEN);
	} while (read_seqretry(&hh->hh_lock, seq));
	return 0;
}
#endif
net: output path optimizations
1) Avoid dirtying neighbour's confirmed field.
TCP workloads hit this cache line for each incoming ACK.
Let's write n->confirmed only if jiffies has changed.
2) Optimize neigh_hh_output() for the common Ethernet case, where
hh_len is at most 16 bytes. Replace the memcpy() call
by two inlined 64-bit loads/stores on x86_64.
Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)
24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.
before : 2.247s, 2.235s, 2.247s, 2.318s
after : 1.884s, 1.905s, 1.891s, 1.895s
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
{
neighbour: Avoid writing before skb->head in neigh_hh_output()
While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.
In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.
Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.
v2:
- instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
(Eric Dumazet)
- if we avoid the panic, though, we need to explicitly check the headroom
before the memcpy(), otherwise we'll have corrupted slabs on a running
kernel, after we warn
- use __skb_push() instead of skb_push(), as the headroom check is
already implemented here explicitly (Eric Dumazet)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
	unsigned int hh_alen = 0;
	unsigned int seq;
	unsigned int hh_len;

	do {
		seq = read_seqbegin(&hh->hh_lock);
		hh_len = READ_ONCE(hh->hh_len);
		if (likely(hh_len <= HH_DATA_MOD)) {
			hh_alen = HH_DATA_MOD;

			/* skb_push() would proceed silently if we have room for
			 * the unaligned size but not for the aligned size:
			 * check headroom explicitly.
			 */
			if (likely(skb_headroom(skb) >= HH_DATA_MOD)) {
				/* this is inlined by gcc */
				memcpy(skb->data - HH_DATA_MOD, hh->hh_data,
				       HH_DATA_MOD);
			}
		} else {
			hh_alen = HH_DATA_ALIGN(hh_len);

			if (likely(skb_headroom(skb) >= hh_alen)) {
				memcpy(skb->data - hh_alen, hh->hh_data,
				       hh_alen);
			}
		}
	} while (read_seqretry(&hh->hh_lock, seq));

	if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {
		kfree_skb(skb);
		return NET_XMIT_DROP;
	}

	__skb_push(skb, hh_len);
	return dev_queue_xmit(skb);
}
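The "net: output path optimizations" and "neighbour: Avoid writing before skb->head" commit messages both concern neigh_hh_output(). The cached header is copied as a whole HH_DATA_MOD-aligned block, so the headroom check has to use the aligned length hh_alen, not hh_len. The userspace sketch below reproduces the failure case from the second message: 14 bytes of headroom, a 14-byte header and a 16-byte copy. HH_DATA_MOD and HH_DATA_ALIGN() are copied from `<linux/netdevice.h>`. `struct toy_skb`, `struct toy_hh` and `toy_hh_output()` are hypothetical simplifications of sk_buff and hh_cache, and the sketch leaves out the seqlock retry loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same definitions as HH_DATA_MOD / HH_DATA_ALIGN in <linux/netdevice.h>. */
#define HH_DATA_MOD 16
#define HH_DATA_ALIGN(__len) \
	(((__len) + (HH_DATA_MOD - 1)) & ~(HH_DATA_MOD - 1))

/* Hypothetical stand-in for struct sk_buff: headroom == data_off. */
struct toy_skb {
	unsigned char buf[128];
	size_t data_off;
	size_t len;
};

/*
 * Hypothetical stand-in for struct hh_cache: the header is stored
 * right-aligned, so its last byte ends the aligned block.
 */
struct toy_hh {
	unsigned int hh_len;
	unsigned char hh_data[2 * HH_DATA_MOD];
};

/* Returns 0 after prepending the header, -1 if the packet must be dropped. */
static int toy_hh_output(const struct toy_hh *hh, struct toy_skb *skb)
{
	unsigned int hh_len = hh->hh_len;
	unsigned int hh_alen = hh_len <= HH_DATA_MOD ? HH_DATA_MOD
						      : HH_DATA_ALIGN(hh_len);

	/*
	 * Check against the aligned size, because that is what memcpy()
	 * writes; checking only hh_len would allow a write before buf[0].
	 */
	if (hh_alen > sizeof(hh->hh_data) || skb->data_off < hh_alen)
		return -1;

	memcpy(skb->buf + skb->data_off - hh_alen, hh->hh_data, hh_alen);

	/* Like __skb_push(skb, hh_len): expose only the real header bytes. */
	skb->data_off -= hh_len;
	skb->len += hh_len;
	return 0;
}
```

With 16 bytes of headroom the copy fits. The data pointer then moves back by only 14 bytes, so it lands on the first byte of the right-aligned header.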

static inline int neigh_output(struct neighbour *n, struct sk_buff *skb,
			       bool skip_cache)
{
	const struct hh_cache *hh = &n->hh;

	/* n->nud_state and hh->hh_len could be changed under us.
	 * neigh_hh_output() is taking care of the race later.
	 */
	if (!skip_cache &&
	    (READ_ONCE(n->nud_state) & NUD_CONNECTED) &&
	    READ_ONCE(hh->hh_len))
		return neigh_hh_output(hh, skb);

	return READ_ONCE(n->output)(n, skb);
}

static inline struct neighbour *
__neigh_lookup(struct neigh_table *tbl, const void *pkey, struct net_device *dev, int creat)
{
	struct neighbour *n = neigh_lookup(tbl, pkey, dev);

	if (n || !creat)
		return n;

	n = neigh_create(tbl, pkey, dev);
	return IS_ERR(n) ? NULL : n;
}

static inline struct neighbour *
__neigh_lookup_errno(struct neigh_table *tbl, const void *pkey,
		     struct net_device *dev)
{
	struct neighbour *n = neigh_lookup(tbl, pkey, dev);

	if (n)
		return n;

	return neigh_create(tbl, pkey, dev);
}
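__neigh_lookup() and __neigh_lookup_errno() differ only in how they report a failed neigh_create(). The first collapses the error pointer to NULL. The second passes it through, so the caller can recover the errno with PTR_ERR(). Below is a userspace re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() convention from `<linux/err.h>`, assuming the usual MAX_ERRNO of 4095, together with a hypothetical four-slot table. `toy_lookup_or_create()` and `toy_lookup_errno()` follow the shape of the two helpers, and every `toy_*` name is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's ERR_PTR() convention. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical table: up to 4 integer keys. */
struct toy_entry { int key; int used; };
static struct toy_entry toy_tbl[4];

static struct toy_entry *toy_lookup(int key)
{
	for (size_t i = 0; i < 4; i++)
		if (toy_tbl[i].used && toy_tbl[i].key == key)
			return &toy_tbl[i];
	return NULL;
}

static struct toy_entry *toy_create(int key)
{
	for (size_t i = 0; i < 4; i++)
		if (!toy_tbl[i].used) {
			toy_tbl[i].used = 1;
			toy_tbl[i].key = key;
			return &toy_tbl[i];
		}
	return ERR_PTR(-ENOBUFS);	/* table full */
}

/* Shape of __neigh_lookup(): errors become NULL. */
static struct toy_entry *toy_lookup_or_create(int key, int creat)
{
	struct toy_entry *e = toy_lookup(key);

	if (e || !creat)
		return e;
	e = toy_create(key);
	return IS_ERR(e) ? NULL : e;
}

/* Shape of __neigh_lookup_errno(): errors are passed through. */
static struct toy_entry *toy_lookup_errno(int key)
{
	struct toy_entry *e = toy_lookup(key);

	return e ? e : toy_create(key);
}
```

Callers of the NULL-returning form cannot tell "not found and not created" apart from "creation failed". The errno form keeps that distinction.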

struct neighbour_cb {
	unsigned long sched_next;
	unsigned int flags;
};

#define LOCALLY_ENQUEUED 0x1

#define NEIGH_CB(skb)	((struct neighbour_cb *)(skb)->cb)

static inline void neigh_ha_snapshot(char *dst, const struct neighbour *n,
				     const struct net_device *dev)
{
	unsigned int seq;

	do {
		seq = read_seqbegin(&n->ha_lock);
		memcpy(dst, n->ha, dev->addr_len);
	} while (read_seqretry(&n->ha_lock, seq));
}
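neigh_ha_snapshot() is a seqlock reader, and so are neigh_hh_output() and neigh_hh_bridge(). Each one samples the sequence count, copies the protected data, and retries if a writer was active (odd count) or the count changed during the copy. The single-threaded sketch below shows only that retry decision. The `toy_*` names are hypothetical. Unlike read_seqbegin(), toy_read_begin() does not spin on an odd count; toy_read_retry() rejects the odd snapshot instead. A concurrent version would also need the kernel's barrier semantics and race-free data accesses, and this sketch does not attempt either.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical seqcount: even = stable, odd = writer in progress. */
struct toy_seq {
	atomic_uint seq;
	char ha[6];		/* protected data, e.g. a hardware address */
};

static void toy_write_begin(struct toy_seq *s) { atomic_fetch_add(&s->seq, 1); }
static void toy_write_end(struct toy_seq *s)   { atomic_fetch_add(&s->seq, 1); }

static unsigned toy_read_begin(struct toy_seq *s)
{
	return atomic_load(&s->seq);
}

/* Retry if a writer was active at the start or has run since. */
static bool toy_read_retry(struct toy_seq *s, unsigned start)
{
	return (start & 1) || atomic_load(&s->seq) != start;
}

/* Same loop shape as neigh_ha_snapshot(). */
static void toy_snapshot(char *dst, struct toy_seq *s)
{
	unsigned start;

	do {
		start = toy_read_begin(s);
		memcpy(dst, s->ha, sizeof(s->ha));
	} while (toy_read_retry(s, start));
}
```

Readers never block writers here. A reader that overlaps an update throws away its copy and tries again, which suits small, rarely written data such as a neighbour's hardware address.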

static inline void neigh_update_is_router(struct neighbour *neigh, u32 flags,
					  int *notify)
{
	u8 ndm_flags = 0;

	ndm_flags |= (flags & NEIGH_UPDATE_F_ISROUTER) ? NTF_ROUTER : 0;
	if ((neigh->flags ^ ndm_flags) & NTF_ROUTER) {
		if (ndm_flags & NTF_ROUTER)
			neigh->flags |= NTF_ROUTER;
		else
			neigh->flags &= ~NTF_ROUTER;
		*notify = 1;
	}
}
#endif
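neigh_update_is_router() makes the NTF_ROUTER bit in neigh->flags follow NEIGH_UPDATE_F_ISROUTER, and it sets `*notify` only when the bit actually changes. The XOR picks out the bits where the two values disagree. Below is a userspace sketch of the same logic. The flag values are placeholders chosen for this example, and `toy_update_is_router()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for this sketch, not the kernel's definitions. */
#define TOY_NTF_ROUTER		0x80
#define TOY_UPDATE_F_ISROUTER	0x01

/*
 * Same logic as neigh_update_is_router(): make the ROUTER bit in
 * *neigh_flags follow the update flag, and report whether it changed.
 */
static void toy_update_is_router(uint8_t *neigh_flags, uint32_t flags,
				 int *notify)
{
	uint8_t ndm_flags = (flags & TOY_UPDATE_F_ISROUTER) ? TOY_NTF_ROUTER : 0;

	/* XOR is non-zero in the ROUTER bit only if the two disagree */
	if ((*neigh_flags ^ ndm_flags) & TOY_NTF_ROUTER) {
		if (ndm_flags & TOY_NTF_ROUTER)
			*neigh_flags |= TOY_NTF_ROUTER;
		else
			*neigh_flags &= (uint8_t)~TOY_NTF_ROUTER;
		*notify = 1;
	}
}
```

As in the original helper, `*notify` is never cleared. A caller can therefore collect several reasons to notify in one variable.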