linux/kernel/bpf
Yonghong Song dc0988bbe1 bpf: Do not use bucket_lock for hashmap iterator
Currently, for hashmap, the bpf iterator will grab a bucket lock, a
spinlock, before traversing the elements in the bucket. This can ensure
all bpf visted elements are valid. But this mechanism may cause
deadlock if update/deletion happens to the same bucket of the
visited map in the program. For example, if we added bpf_map_update_elem()
call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
we will have the following deadlock:

  ============================================
  WARNING: possible recursive locking detected
  5.9.0-rc1+ #841 Not tainted
  --------------------------------------------
  test_progs/1750 is trying to acquire lock:
  ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410

  but task is already holding lock:
  ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&htab->buckets[i].raw_lock);
    lock(&htab->buckets[i].raw_lock);

   *** DEADLOCK ***
   ...
  Call Trace:
   dump_stack+0x78/0xa0
   __lock_acquire.cold.74+0x209/0x2e3
   lock_acquire+0xba/0x380
   ? htab_map_update_elem+0x1cf/0x410
   ? __lock_acquire+0x639/0x20c0
   _raw_spin_lock_irqsave+0x3b/0x80
   ? htab_map_update_elem+0x1cf/0x410
   htab_map_update_elem+0x1cf/0x410
   ? lock_acquire+0xba/0x380
   bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
   ? find_held_lock+0x34/0xa0
   bpf_iter_run_prog+0x81/0x16e
   __bpf_hash_map_seq_show+0x145/0x180
   bpf_seq_read+0xff/0x3d0
   vfs_read+0xad/0x1c0
   ksys_read+0x5f/0xe0
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ...

The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(),
and then grabbed again in htab_map_update_elem() in the bpf program, causing
deadlocks.

Actually, we do not need bucket_lock here, we can just use rcu_read_lock()
similar to netlink iterator where the rcu_read_{lock,unlock} likes below:
 seq_ops->start():
     rcu_read_lock();
 seq_ops->next():
     rcu_read_unlock();
     /* next element */
     rcu_read_lock();
 seq_ops->stop();
     rcu_read_unlock();

Compared to old bucket_lock mechanism, if concurrent updata/delete happens,
we may visit stale elements, miss some elements, or repeat some elements.
I think this is a reasonable compromise. For users wanting to avoid
stale, missing/repeated accesses, bpf_map batch access syscall interface
can be used.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
2020-09-03 17:36:41 -07:00
..
arraymap.c bpf: Implement bpf iterator for array maps 2020-07-25 20:16:33 -07:00
bpf_iter.c bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator 2020-08-18 17:36:23 -07:00
bpf_lru_list.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
bpf_lru_list.h bpf: Fix a typo "inacitve" -> "inactive" 2020-04-06 21:54:10 +02:00
bpf_lsm.c bpf: Use tracing helpers for lsm programs 2020-06-01 15:08:04 -07:00
bpf_struct_ops_types.h bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
bpf_struct_ops.c bpf: Set map_btf_{name, id} for all map types 2020-06-22 22:22:58 +02:00
btf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
cgroup.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
core.c bpf: Delete repeated words in comments 2020-08-07 18:57:24 +02:00
cpumap.c bpf: cpumap: Fix possible rcpu kthread hung 2020-07-21 09:15:28 -07:00
devmap.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-07-04 17:48:34 -07:00
disasm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
disasm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
dispatcher.c bpf: Remove bpf_image tree 2020-03-13 12:49:52 -07:00
hashtab.c bpf: Do not use bucket_lock for hashmap iterator 2020-09-03 17:36:41 -07:00
helpers.c bpf: Implement BPF ring buffer and verifier support for it 2020-06-01 14:38:22 -07:00
inode.c bpf: Create file bpf iterator 2020-05-09 17:05:26 -07:00
local_storage.c bpf/local_storage: Fix build without CONFIG_CGROUP 2020-07-25 20:16:36 -07:00
lpm_trie.c bpf: Remove redundant synchronize_rcu. 2020-07-01 08:07:13 -07:00
Makefile bpf: Add bpf_prog iterator 2020-07-25 20:16:32 -07:00
map_in_map.c bpf: Implement CAP_BPF 2020-05-15 17:29:41 +02:00
map_in_map.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
map_iter.c bpf: Change uapi for bpf iterator map elements 2020-08-06 16:39:14 -07:00
net_namespace.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
offload.c bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-17 16:53:49 +01:00
percpu_freelist.c bpf: Dont iterate over possible CPUs with interrupts disabled 2020-02-24 16:18:20 -08:00
percpu_freelist.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Remove redundant synchronize_rcu. 2020-07-01 08:07:13 -07:00
reuseport_array.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
ringbuf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
stackmap.c bpf: Iterate through all PT_NOTE sections when looking for build id 2020-08-12 18:14:49 -07:00
syscall.c bpf: Fix a buffer out-of-bound access when filling raw_tp link_info 2020-08-24 21:03:07 -07:00
sysfs_btf.c bpf: Support llvm-objcopy for vmlinux BTF 2020-03-19 12:32:38 +01:00
task_iter.c bpf: Avoid visit same object multiple times 2020-08-18 17:36:23 -07:00
tnum.c bpf: Verifier, do explicit ALU32 bounds tracking 2020-03-30 14:59:53 -07:00
trampoline.c bpf: lsm: Implement attach, detach and execution 2020-03-30 01:34:00 +02:00
verifier.c bpf: Delete repeated words in comments 2020-08-07 18:57:24 +02:00