linux/include
David Ahern 58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
..
acpi pci-v4.20-changes 2018-10-25 06:50:48 -07:00
asm-generic s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
clocksource
crypto KEYS: asym_tpm: extract key size & public key [ver #2] 2018-10-26 09:30:46 +01:00
drm drm, i915, amdgpu, bridge + core quirk 2018-11-02 10:58:20 -07:00
dt-bindings This time it looks like a quieter release cycle in the clk tree. I guess that's 2018-10-31 11:08:30 -07:00
keys KEYS: Move trusted.h to include/keys [ver #2] 2018-10-26 09:30:47 +01:00
kvm
linux bridge: Add br_fdb_clear_offload() 2018-12-07 12:59:08 -08:00
math-emu
media media: Rename vb2_m2m_request_queue -> v4l2_m2m_request_queue 2018-11-06 05:24:22 -05:00
memory
misc
net neighbor: Improve garbage collection 2018-12-07 16:03:10 -08:00
pcmcia
ras
rdma First merge window pull request 2018-10-26 07:38:19 -07:00
scsi
soc soc/qman: add return value to interrupt coalesce changing APIs 2018-11-23 11:17:06 -08:00
sound ASoC: Updates for v5.0/v4.20 2018-10-22 23:26:37 +02:00
target
trace net: Add trace events for all receive exit points 2018-11-30 13:23:25 -08:00
uapi devlink: Add 'fw_load_policy' generic parameter 2018-12-03 13:55:43 -08:00
video
xen CONFIG_XEN_PV breaks xen_create_contiguous_region on ARM 2018-11-02 17:18:34 +01:00