829 Commits

Author SHA1 Message Date
Alexander Aring
3428785a65 dlm: use __le types for dlm header
This patch changes to use __le types directly in the dlm header
structure which is casted at the right dlm message buffer positions.

The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:28 -05:00
Alexander Aring
d9efd005fd dlm: use __le types for options header
This patch changes to use __le types directly in the dlm option headers
structures which are casted at the right dlm message buffer positions.

Currently only midcomms.c using those headers which already was calling
endian conversions on-the-fly without using in/out functionality like
other endianness handling in dlm. Using __le types now will hopefully get
useful warnings in future if we do comparison against host byte order
values.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:24 -05:00
Alexander Aring
a8449f232e dlm: add __CHECKER__ for false positives
This patch will adds #ifndef __CHECKER__ for false positives warnings
about an imbalance lock/unlock srcu handling. Which are shown by running
sparse checks:

fs/dlm/midcomms.c:1065:20: warning: context imbalance in 'dlm_midcomms_get_mhandle' - wrong count at exit

Using __CHECKER__ will tell sparse to ignore these sections.

Those imbalances are false positive because from upper layer it is
always required to call a function in sequence, e.g. if
dlm_midcomms_get_mhandle() is successful there must be a
dlm_midcomms_commit_mhandle() call afterwards.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:20 -05:00
Alexander Aring
314a5540ff dlm: move global to static inits
Instead of init global module at module loading time we can move the
initialization of those global variables at memory initialization of the
module loader.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:16 -05:00
Alexander Aring
16d58904df dlm: remove unnecessary INIT_LIST_HEAD()
There is no need to call INIT_LIST_HEAD() when it's set directly
afterwards by list_add_tail().

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:13 -05:00
Alexander Aring
bcfad4265c dlm: improve plock logging if interrupted
This patch changes the log level if a plock is removed when interrupted
from debug to info. Additional it signals now that the plock entity was
removed to let the user know what's happening.

If on a dev_write() a pending plock cannot be find it will signal that
it might have been removed because wait interruption.

Before this patch there might be a "dev_write no op ..." info message
and the users can only guess that the plock was removed before because
the wait interruption. To be sure that is the case we log both messages
on the same log level.

Let both message be logged on info layer because it should not happened
a lot and if it happens it should be clear why the op was not found.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:09 -05:00
Alexander Aring
a800ba77fd dlm: rearrange async condition return
This patch moves the return of FILE_LOCK_DEFERRED a little bit earlier
than checking afterwards again if the request was an asynchronous request.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:05 -05:00
Alexander Aring
bcbb4ba6c9 dlm: cleanup plock_op vs plock_xop
Lately the different casting between plock_op and plock_xop and list
holders which was involved showed some issues which were hard to see.
This patch removes the "plock_xop" structure and introduces a
"struct plock_async_data". This structure will be set in "struct plock_op"
in case of asynchronous lock handling as the original "plock_xop" was
made for. There is no need anymore to cast pointers around for
additional fields in case of asynchronous lock handling.  As disadvantage
another allocation was introduces but only needed in the asynchronous
case which is currently only used in combination with nfs lockd.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:02 -05:00
Alexander Aring
a559790caa dlm: replace sanity checks with WARN_ON
There are several sanity checks and recover handling if they occur in
the dlm plock handling. From my understanding those operation can't run
in parallel with any list manipulation which involved setting the list
holder of plock_op, if so we have a bug which this sanity check will
warn about. Previously if such sanity check occurred the dlm plock
handling was trying to recover from it by deleting the plock_op from a
list which the holder was set to. However there is a bug in the dlm
plock handling if this case ever happens. To make such bugs are more
visible for further investigations we add a WARN_ON() on those sanity
checks and remove the recovering handling because other possible side
effects.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:01:58 -05:00
Alexander Aring
42252d0d2a dlm: fix plock invalid read
This patch fixes an invalid read showed by KASAN. A unlock will allocate a
"struct plock_op" and a followed send_op() will append it to a global
send_list data structure. In some cases a followed dev_read() moves it
to recv_list and dev_write() will cast it to "struct plock_xop" and access
fields which are only available in those structures. At this point an
invalid read happens by accessing those fields.

To fix this issue the "callback" field is moved to "struct plock_op" to
indicate that a cast to "plock_xop" is allowed and does the additional
"plock_xop" handling if set.

Example of the KASAN output which showed the invalid read:

[ 2064.296453] ==================================================================
[ 2064.304852] BUG: KASAN: slab-out-of-bounds in dev_write+0x52b/0x5a0 [dlm]
[ 2064.306491] Read of size 8 at addr ffff88800ef227d8 by task dlm_controld/7484
[ 2064.308168]
[ 2064.308575] CPU: 0 PID: 7484 Comm: dlm_controld Kdump: loaded Not tainted 5.14.0+ #9
[ 2064.310292] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2064.311618] Call Trace:
[ 2064.312218]  dump_stack_lvl+0x56/0x7b
[ 2064.313150]  print_address_description.constprop.8+0x21/0x150
[ 2064.314578]  ? dev_write+0x52b/0x5a0 [dlm]
[ 2064.315610]  ? dev_write+0x52b/0x5a0 [dlm]
[ 2064.316595]  kasan_report.cold.14+0x7f/0x11b
[ 2064.317674]  ? dev_write+0x52b/0x5a0 [dlm]
[ 2064.318687]  dev_write+0x52b/0x5a0 [dlm]
[ 2064.319629]  ? dev_read+0x4a0/0x4a0 [dlm]
[ 2064.320713]  ? bpf_lsm_kernfs_init_security+0x10/0x10
[ 2064.321926]  vfs_write+0x17e/0x930
[ 2064.322769]  ? __fget_light+0x1aa/0x220
[ 2064.323753]  ksys_write+0xf1/0x1c0
[ 2064.324548]  ? __ia32_sys_read+0xb0/0xb0
[ 2064.325464]  do_syscall_64+0x3a/0x80
[ 2064.326387]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2064.327606] RIP: 0033:0x7f807e4ba96f
[ 2064.328470] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 39 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 7c 87 f8 ff 48
[ 2064.332902] RSP: 002b:00007ffd50cfe6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 2064.334658] RAX: ffffffffffffffda RBX: 000055cc3886eb30 RCX: 00007f807e4ba96f
[ 2064.336275] RDX: 0000000000000040 RSI: 00007ffd50cfe7e0 RDI: 0000000000000010
[ 2064.337980] RBP: 00007ffd50cfe7e0 R08: 0000000000000000 R09: 0000000000000001
[ 2064.339560] R10: 000055cc3886eb30 R11: 0000000000000293 R12: 000055cc3886eb80
[ 2064.341237] R13: 000055cc3886eb00 R14: 000055cc3886f590 R15: 0000000000000001
[ 2064.342857]
[ 2064.343226] Allocated by task 12438:
[ 2064.344057]  kasan_save_stack+0x1c/0x40
[ 2064.345079]  __kasan_kmalloc+0x84/0xa0
[ 2064.345933]  kmem_cache_alloc_trace+0x13b/0x220
[ 2064.346953]  dlm_posix_unlock+0xec/0x720 [dlm]
[ 2064.348811]  do_lock_file_wait.part.32+0xca/0x1d0
[ 2064.351070]  fcntl_setlk+0x281/0xbc0
[ 2064.352879]  do_fcntl+0x5e4/0xfe0
[ 2064.354657]  __x64_sys_fcntl+0x11f/0x170
[ 2064.356550]  do_syscall_64+0x3a/0x80
[ 2064.358259]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2064.360745]
[ 2064.361511] Last potentially related work creation:
[ 2064.363957]  kasan_save_stack+0x1c/0x40
[ 2064.365811]  __kasan_record_aux_stack+0xaf/0xc0
[ 2064.368100]  call_rcu+0x11b/0xf70
[ 2064.369785]  dlm_process_incoming_buffer+0x47d/0xfd0 [dlm]
[ 2064.372404]  receive_from_sock+0x290/0x770 [dlm]
[ 2064.374607]  process_recv_sockets+0x32/0x40 [dlm]
[ 2064.377290]  process_one_work+0x9a8/0x16e0
[ 2064.379357]  worker_thread+0x87/0xbf0
[ 2064.381188]  kthread+0x3ac/0x490
[ 2064.383460]  ret_from_fork+0x22/0x30
[ 2064.385588]
[ 2064.386518] Second to last potentially related work creation:
[ 2064.389219]  kasan_save_stack+0x1c/0x40
[ 2064.391043]  __kasan_record_aux_stack+0xaf/0xc0
[ 2064.393303]  call_rcu+0x11b/0xf70
[ 2064.394885]  dlm_process_incoming_buffer+0x47d/0xfd0 [dlm]
[ 2064.397694]  receive_from_sock+0x290/0x770 [dlm]
[ 2064.399932]  process_recv_sockets+0x32/0x40 [dlm]
[ 2064.402180]  process_one_work+0x9a8/0x16e0
[ 2064.404388]  worker_thread+0x87/0xbf0
[ 2064.406124]  kthread+0x3ac/0x490
[ 2064.408021]  ret_from_fork+0x22/0x30
[ 2064.409834]
[ 2064.410599] The buggy address belongs to the object at ffff88800ef22780
[ 2064.410599]  which belongs to the cache kmalloc-96 of size 96
[ 2064.416495] The buggy address is located 88 bytes inside of
[ 2064.416495]  96-byte region [ffff88800ef22780, ffff88800ef227e0)
[ 2064.422045] The buggy address belongs to the page:
[ 2064.424635] page:00000000b6bef8bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xef22
[ 2064.428970] flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
[ 2064.432515] raw: 000fffffc0000200 ffffea0000d68b80 0000001400000014 ffff888001041780
[ 2064.436110] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 2064.439813] page dumped because: kasan: bad access detected
[ 2064.442548]
[ 2064.443310] Memory state around the buggy address:
[ 2064.445988]  ffff88800ef22680: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2064.449444]  ffff88800ef22700: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2064.452941] >ffff88800ef22780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
[ 2064.456383]                                                     ^
[ 2064.459386]  ffff88800ef22800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[ 2064.462788]  ffff88800ef22880: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2064.466239] ==================================================================

reproducer in python:

import argparse
import struct
import fcntl
import os

parser = argparse.ArgumentParser()

parser.add_argument('-f', '--file',
		    help='file to use fcntl, must be on dlm lock filesystem e.g. gfs2')

args = parser.parse_args()

f = open(args.file, 'wb+')

lockdata = struct.pack('hhllhh', fcntl.F_WRLCK,0,0,0,0,0)
fcntl.fcntl(f, fcntl.F_SETLK, lockdata)
lockdata = struct.pack('hhllhh', fcntl.F_UNLCK,0,0,0,0,0)
fcntl.fcntl(f, fcntl.F_SETLK, lockdata)

Fixes: 586759f03e2e ("gfs2: nfs lock support for gfs2")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:01:53 -05:00
Alexander Aring
67e4d8c51d dlm: fix missing check in validate_lock_args
This patch adds a additional check if lkb->lkb_wait_count is non zero as
it is done in validate_unlock_args() to check if any operation is in
progress. While on it add a comment taken from validate_unlock_args() to
signal what the check is doing.

There might be no changes because if lkb->lkb_wait_type is non zero
implies that lkb->lkb_wait_count is non zero. However we should add the
check as it does validate_unlock_args().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:01:49 -05:00
Dan Carpenter
1f4f10845e dlm: uninitialized variable on error in dlm_listen_for_all()
The "sock" variable is not initialized on this error path.

Cc: stable@vger.kernel.org
Fixes: 2dc6b1158c28 ("fs: dlm: introduce generic listen")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:01:44 -05:00
Linus Torvalds
6dc69d3d0d driver core changes for 5.17-rc1
Here is the set of changes for the driver core for 5.17-rc1.
 
 Lots of little things here, including:
 	- kobj_type cleanups
 	- auxiliary_bus documentation updates
 	- auxiliary_device conversions for some drivers (relevant
 	  subsystems all have provided acks for these)
 	- kernfs lock contention reduction for some workloads
 	- other tiny cleanups and changes.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R
 VRZJypgNfU/Pt0ykstZD
 =CO9J
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of changes for the driver core for 5.17-rc1.

  Lots of little things here, including:

   - kobj_type cleanups

   - auxiliary_bus documentation updates

   - auxiliary_device conversions for some drivers (relevant subsystems
     all have provided acks for these)

   - kernfs lock contention reduction for some workloads

   - other tiny cleanups and changes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits)
  kobject documentation: remove default_attrs information
  drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
  debugfs: lockdown: Allow reading debugfs files that are not world readable
  driver core: Make bus notifiers in right order in really_probe()
  driver core: Move driver_sysfs_remove() after driver_sysfs_add()
  firmware: edd: remove empty default_attrs array
  firmware: dmi-sysfs: use default_groups in kobj_type
  qemu_fw_cfg: use default_groups in kobj_type
  firmware: memmap: use default_groups in kobj_type
  sh: sq: use default_groups in kobj_type
  headers/uninline: Uninline single-use function: kobject_has_children()
  devtmpfs: mount with noexec and nosuid
  driver core: Simplify async probe test code by using ktime_ms_delta()
  nilfs2: use default_groups in kobj_type
  kobject: remove kset from struct kset_uevent_ops callbacks
  driver core: make kobj_type constant.
  driver core: platform: document registration-failure requirement
  vdpa/mlx5: Use auxiliary_device driver data helpers
  net/mlx5e: Use auxiliary_device driver data helpers
  soundwire: intel: Use auxiliary_device driver data helpers
  ...
2022-01-12 11:11:34 -08:00
Alexander Aring
feae43f8aa fs: dlm: print cluster addr if non-cluster node connects
This patch prints the cluster node address if a non-cluster node
(according to the dlm config setting) tries to connect. The current
hexdump call will print in a different loglevel and only available if
dynamic debug is enabled. Additional we using the ip address format
strings to print an IETF ip4/6 string represenation.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-01-04 09:40:24 -06:00
Greg Kroah-Hartman
cf6299b610 kobject: remove kset from struct kset_uevent_ops callbacks
There is no need to pass the pointer to the kset in the struct
kset_uevent_ops callbacks as no one uses it, so just remove that pointer
entirely.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com>
Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-28 11:26:18 +01:00
Alexander Aring
e4dc81ed5a fs: dlm: memory cache for lowcomms hotpath
This patch introduces a kmem cache for dlm_msg handles which are used
always if dlm sends a message out. Even if their are covered by midcomms
layer or not.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
3af2326ca0 fs: dlm: memory cache for writequeue_entry
This patch introduces a kmem cache for writequeue entry. A writequeue
entry get quite a lot allocated if dlm transmit messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
6c547f2640 fs: dlm: memory cache for midcomms hotpath
This patch will introduce a kmem cache for allocating message handles
which are needed for midcomms layer to take track of lowcomms messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
be3b0400ed fs: dlm: remove wq_alloc mutex
This patch cleanups the code for allocating a new buffer in the dlm
writequeue mechanism. There was a possible tuneup to allow scheduling
while a new writequeue entry needs to be allocated because either no
sending page is available or are full. To avoid multiple concurrent
users checking at the same time if an entry is available or full
alloc_wq was introduce that those are waiting if there is currently a
new writequeue entry in process to be queued so possible further users
will check on the new allocated writequeue entry if it's full.

To simplify the code we just remove this mutex and switch that the
already introduced spin lock will be held during writequeue check,
allocation and queueing. So other users can never check on available
writequeues while there is a new one in process but not queued yet.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
21d9ac1a53 fs: dlm: use event based wait for pending remove
This patch will use an event based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of doing busy waiting.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
bcbfea41e1 fs: dlm: check for pending users filling buffers
Currently we don't care if the DLM application stack is filling buffers
(not committed yet) while we transmit some already committed buffers.
By checking on active writequeue users before dequeue a writequeue entry
we know there is coming more data and do nothing. We wait until the send
worker will be triggered again if the writequeue entry users hit zero.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
f70813d6a5 fs: dlm: use list_empty() to check last iteration
This patch will use list_empty(&ls->ls_cb_delay) to check for last list
iteration. In case of a multiply count of MAX_CB_QUEUE and the list is
empty we do a extra goto more which we can avoid by checking on
list_empty().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
1b9beda83e fs: dlm: fix build with CONFIG_IPV6 disabled
This patch will surround the AF_INET6 case in sk_error_report() of dlm
with a #if IS_ENABLED(CONFIG_IPV6). The field sk->sk_v6_daddr is not
defined when CONFIG_IPV6 is disabled. If CONFIG_IPV6 is disabled, the
socket creation with AF_INET6 should already fail because a runtime
check if AF_INET6 is registered. However if there is the possibility
that AF_INET6 is set as sk_family the sk_error_report() callback will
print then an invalid family type error.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 4c3d90570bcc ("fs: dlm: don't call kernel_getpeername() in error_report()")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-17 08:46:23 -06:00
Alexander Aring
92c4460538 fs: dlm: replace use of socket sk_callback_lock with sock_lock
This patch will replace the use of socket sk_callback_lock lock and uses
socket lock instead. Some users like sunrpc, see commit ea9afca88bbe
("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") moving
from sk_callback_lock to sock_lock which seems to be held when the socket
callbacks are called.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15 11:49:18 -06:00
Alexander Aring
4c3d90570b fs: dlm: don't call kernel_getpeername() in error_report()
In some cases kernel_getpeername() will held the socket lock which is
already held when the socket layer calls error_report() callback. Since
commit 9dfc685e0262 ("inet: remove races in inet{6}_getname()") this
problem becomes more likely because the socket lock will be held always.
You will see something like:

bob9-u5 login: [  562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0
[  562.318562]  lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7
[  562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135
[  562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
[  562.321277] Call Trace:
[  562.321529]  <IRQ>
[  562.321734]  dump_stack_lvl+0x33/0x42
[  562.322282]  do_raw_spin_lock+0x8b/0xc0
[  562.322674]  lock_sock_nested+0x1e/0x50
[  562.323057]  inet_getname+0x39/0x110
[  562.323425]  ? sock_def_readable+0x80/0x80
[  562.323838]  lowcomms_error_report+0x63/0x260 [dlm]
[  562.324338]  ? wait_for_completion_interruptible_timeout+0xd2/0x120
[  562.324949]  ? lock_timer_base+0x67/0x80
[  562.325330]  ? do_raw_spin_unlock+0x49/0xc0
[  562.325735]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
[  562.326218]  ? del_timer+0x54/0x80
[  562.326549]  sk_error_report+0x12/0x70
[  562.326919]  tcp_validate_incoming+0x3c8/0x530
[  562.327347]  ? kvm_clock_read+0x14/0x30
[  562.327718]  ? ktime_get+0x3b/0xa0
[  562.328055]  tcp_rcv_established+0x121/0x660
[  562.328466]  tcp_v4_do_rcv+0x132/0x260
[  562.328835]  tcp_v4_rcv+0xcea/0xe20
[  562.329173]  ip_protocol_deliver_rcu+0x35/0x1f0
[  562.329615]  ip_local_deliver_finish+0x54/0x60
[  562.330050]  ip_local_deliver+0xf7/0x110
[  562.330431]  ? inet_rtm_getroute+0x211/0x840
[  562.330848]  ? ip_protocol_deliver_rcu+0x1f0/0x1f0
[  562.331310]  ip_rcv+0xe1/0xf0
[  562.331603]  ? ip_local_deliver+0x110/0x110
[  562.332011]  __netif_receive_skb_core+0x46a/0x1040
[  562.332476]  ? inet_gro_receive+0x263/0x2e0
[  562.332885]  __netif_receive_skb_list_core+0x13b/0x2c0
[  562.333383]  netif_receive_skb_list_internal+0x1c8/0x2f0
[  562.333896]  ? update_load_avg+0x7e/0x5e0
[  562.334285]  gro_normal_list.part.149+0x19/0x40
[  562.334722]  napi_complete_done+0x67/0x160
[  562.335134]  virtnet_poll+0x2ad/0x408 [virtio_net]
[  562.335644]  __napi_poll+0x28/0x140
[  562.336012]  net_rx_action+0x23d/0x300
[  562.336414]  __do_softirq+0xf2/0x2ea
[  562.336803]  irq_exit_rcu+0xc1/0xf0
[  562.337173]  common_interrupt+0xb9/0xd0

It is and was always forbidden to call kernel_getpeername() in context
of error_report(). To get rid of the problem we access the destination
address for the peer over the socket structure. While on it we fix to
print out the destination port of the inet socket.

Fixes: 1a31833d085a ("DLM: Replace nodeid_to_addr with kernel_getpeername")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15 11:49:18 -06:00
Alexander Aring
6a628fa438 fs: dlm: fix potential buffer overflow
This patch fixes an potential overflow in sscanf and the maximum
declared string parsing length which seems to be excluding the null
termination symbol. This patch will just add one byte to be prepared on
a string with length of DLM_RESNAME_MAXLEN including the null
termination symbol.

Fixes: 5054e79de999 ("fs: dlm: add lkb debugfs functionality")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-12 09:38:19 -06:00
Zhang Mingyu
c8b9f34e22 fs: dlm:Remove unneeded semicolon
Eliminate the following coccinelle check warning:
fs/dlm/midcomms.c:972:2-3

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-05 13:03:34 -05:00
Alexander Aring
b87b1883ef fs: dlm: remove double list_first_entry call
This patch removes a list_first_entry() call which is already done by
the previous con_next_wq() call.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-03 16:07:20 -05:00
Alexander Aring
6c2e3bf68f fs: dlm: filter user dlm messages for kernel locks
This patch fixes the following crash by receiving a invalid message:

[  160.672220] ==================================================================
[  160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[  160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[  160.681447]
[  160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[  160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[  160.685574] Workqueue: dlm_recv process_recv_sockets
[  160.686721] Call Trace:
[  160.687310]  dump_stack_lvl+0x56/0x6f
[  160.688169]  ? dlm_user_add_ast+0xc3/0x370
[  160.689116]  kasan_report.cold.14+0x116/0x11b
[  160.690138]  ? dlm_user_add_ast+0xc3/0x370
[  160.690832]  dlm_user_add_ast+0xc3/0x370
[  160.691502]  _receive_unlock_reply+0x103/0x170
[  160.692241]  _receive_message+0x11df/0x1ec0
[  160.692926]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.693700]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.694427]  ? lock_acquire+0x175/0x400
[  160.695058]  ? do_purge.isra.51+0x200/0x200
[  160.695744]  ? lock_acquired+0x360/0x5d0
[  160.696400]  ? lock_contended+0x6a0/0x6a0
[  160.697055]  ? lock_release+0x21d/0x5e0
[  160.697686]  ? lock_is_held_type+0xe0/0x110
[  160.698352]  ? lock_is_held_type+0xe0/0x110
[  160.699026]  ? ___might_sleep+0x1cc/0x1e0
[  160.699698]  ? dlm_wait_requestqueue+0x94/0x140
[  160.700451]  ? dlm_process_requestqueue+0x240/0x240
[  160.701249]  ? down_write_killable+0x2b0/0x2b0
[  160.701988]  ? do_raw_spin_unlock+0xa2/0x130
[  160.702690]  dlm_receive_buffer+0x1a5/0x210
[  160.703385]  dlm_process_incoming_buffer+0x726/0x9f0
[  160.704210]  receive_from_sock+0x1c0/0x3b0
[  160.704886]  ? dlm_tcp_shutdown+0x30/0x30
[  160.705561]  ? lock_acquire+0x175/0x400
[  160.706197]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.706941]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.707681]  process_recv_sockets+0x32/0x40
[  160.708366]  process_one_work+0x55e/0xad0
[  160.709045]  ? pwq_dec_nr_in_flight+0x110/0x110
[  160.709820]  worker_thread+0x65/0x5e0
[  160.710423]  ? process_one_work+0xad0/0xad0
[  160.711087]  kthread+0x1ed/0x220
[  160.711628]  ? set_kthread_struct+0x80/0x80
[  160.712314]  ret_from_fork+0x22/0x30

The issue is that we received a DLM message for a user lock but the
destination lock is a kernel lock. Note that the address which is trying
to derefence is 00000000deadbeef, which is in a kernel lock
lkb->lkb_astparam, this field should never be derefenced by the DLM
kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua
(memory is shared by a union field). The struct lkb_ua will be handled
by the DLM kernel stack but on a kernel lock it will contain invalid
data and ends in most likely crashing the kernel.

It can be reproduced with two cluster nodes.

node 2:
dlm_tool join test
echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters

node 1:
dlm_tool join test

python:
foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
          m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
newFile.write(bytes(foo))

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
63eab2b00b fs: dlm: add lkb waiters debugfs functionality
This patch adds functionality to put a lkb to the waiters state. It can
be useful to combine this feature with the "rawmsg" debugfs
functionality. It will bring the DLM lkb into a state that a message
will be parsed by the kernel.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
5054e79de9 fs: dlm: add lkb debugfs functionality
This patch adds functionality to add an lkb during runtime. This is a
highly debugging feature only, wrong input can crash the kernel. It is a
early state feature as well. The goal is to provide a user interface for
manipulate dlm state and combine it with the rawmsg feature. It is
debugfs functionality, we don't care about UAPI breakage. Even it's
possible to add lkb's/rsb's which could never be exists in such wat by
using normal DLM operation. The user of this interface always need to
think before using this feature, not every crash which happens can really
occur during normal dlm operation.

Future there should be more functionality to add a more realistic lkb
which reflects normal DLM state inside the kernel. For now this is
enough.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
75d25ffe38 fs: dlm: allow create lkb with specific id range
This patch adds functionality to add a lkb with a specific id range.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
9af5b8f0ea fs: dlm: add debugfs rawmsg send functionality
This patch adds a dlm functionality to send a raw dlm message to a
specific cluster node. This raw message can be build by user space and
send out by writing the message to "rawmsg" dlm debugfs file.

There is a in progress scapy dlm module which provides a easy build of
DLM messages in user space. For example:

DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...)

The goal is to provide an easy reproducable state to crash DLM or to
fuzz the DLM kernel stack if there are possible ways to crash it.

Note: that if the sequence number is zero and dlm version is not set to
3.1 the kernel will automatic will set a right sequence number, otherwise
DLM stack testing is not possible.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
5c16febbc1 fs: dlm: let handle callback data as void
This patch changes the dlm_lowcomms_new_msg() function pointer private data
from "struct mhandle *" to "void *" to provide different structures than
just "struct mhandle".

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
3cb5977c52 fs: dlm: ls_count busy wait to event based wait
This patch changes the ls_count busy wait to use atomic counter values
and wait_event() to wait until ls_count reach zero. It will slightly
reduce the number of holding lslist_lock. At remove lockspace we need to
retry the wait because it a lockspace get could interefere between
wait_event() and holding the lock which deletes the lockspace list entry.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
164d88abd7 fs: dlm: requestqueue busy wait to event based wait
This patch changes the requestqueue busy waiting algorithm to use
atomic counter values and wait_event() to wait until the requestqueue is
empty. It will slightly reduce the number of holding ls_requestqueue_mutex
mutex.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
92732376fd fs: dlm: trace socket handling
This patch adds tracepoints for dlm socket receive and send
functionality. We can use it to track how much data was send or received
to or from a specific nodeid.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
f1d3b8f91d fs: dlm: initial support for tracepoints
This patch adds initial support for dlm tracepoints. It will introduce
tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their
complete ast() callback or blocking bast() callback.

The lock/unlock functionality has a start and end tracepoint, this is
because there exists a race in case if would have a tracepoint at the
end position only the complete/blocking callbacks could occur before. To
work with eBPF tracing and using their lookup hash functionality there
could be problems that an entry was not inserted yet. However use the
start functionality for hash insert and check again in end functionality
if there was an dlm internal error so there is no ast callback. In further
it might also that locks with local masters will occur those callbacks
immediately so we must have such functionality.

I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal datastructures if it's aware of the
struct definitions of the running kernel instance. We still can change
it, if you do eBPF experiments e.g. time measurements between lock and
callback functionality you can simple use the local lkb_id field as hash
value in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simple use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
2f05ec4327 fs: dlm: make dlm_callback_resume quite
This patch makes dlm_callback_resume info printout less noisy by
accumulate all callback queues into one printout not in 25 times steps.
It seems this printout became lately quite noisy in relationship with
gfs2.

Before:

[241767.849302] dlm: bin: dlm_callback_resume 25
[241767.854846] dlm: bin: dlm_callback_resume 25
[241767.860373] dlm: bin: dlm_callback_resume 25
...
[241767.865920] dlm: bin: dlm_callback_resume 25
[241767.871352] dlm: bin: dlm_callback_resume 25
[241767.876733] dlm: bin: dlm_callback_resume 25

After the patch:

[  385.485728] dlm: gfs2: dlm_callback_resume 175

if zero it will not be printed out.

Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
e10249b190 fs: dlm: use dlm_recovery_stopped in condition
This patch will change to evaluate the dlm_recovery_stopped() in the
condition of the if branch instead fetch it before evaluating the
condition. As this is an atomic test-set operation it should be
evaluated in the condition itself.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
3e9736713d fs: dlm: use dlm_recovery_stopped instead of test_bit
This patch will change to use dlm_recovery_stopped() which is the dlm way
to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper.
It is an atomic operation but the check is still as before to fetch the
value if ls_recover_lock is held. There might be more further
investigations if the value can be changed afterwards and if it has any
side effects.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
658bd576f9 fs: dlm: move version conversion to compile time
This patch moves version conversion to little endian from a runtime
variable to compile time constant.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
fe93367541 fs: dlm: remove check SCTP is loaded message
Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try
load the sctp module before we try to create a sctp kernel socket. That
a socket creation fails now has more likely other reasons. This patch
removes the part of error to load the sctp module and instead printout
the error code.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
1aafd9c231 fs: dlm: debug improvements print nodeid
This patch improves the debug output for midcomms layer by also printing
out the nodeid where users counter belongs to.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
bb6866a5bd fs: dlm: fix small lockspace typo
This patch fixes a typo from lockspace to lockspace.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
dea450c90f fs: dlm: remove obsolete INBUF define
This patch removes an obsolete define for some length for an temporary
buffer which is not being used anymore. The use of this define is not
necessary anymore since commit 4798cbbfbd00 ("fs: dlm: rework receive
handling").

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
ecd9567314 fs: dlm: avoid comms shutdown delay in release_lockspace
When dlm_release_lockspace does active shutdown on connections to
other nodes, the active shutdown will wait for any exisitng passive
shutdowns to be resolved.  But, the sequence of operations during
dlm_release_lockspace can prevent the normal resolution of passive
shutdowns (processed normally by way of lockspace recovery.)
This disruption of passive shutdown handling can cause the active
shutdown to wait for a full timeout period, delaying the completion
of dlm_release_lockspace.

To fix this, make dlm_release_lockspace resolve existing passive
shutdowns (by calling dlm_clear_members earlier), before it does
active shutdowns.  The active shutdowns will not find any passive
shutdowns to wait for, and will not be delayed.

Reported-by: Chris Mackowski <cmackows@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-09-01 11:29:14 -05:00
Alexander Aring
aee742c992 fs: dlm: fix return -EINTR on recovery stopped
This patch will return -EINTR instead of 1 if recovery is stopped. In
case of ping_members() the return value will be checked if the error is
-EINTR for signaling another recovery was triggered and the whole
recovery process will come to a clean end to process the next one.
Returning 1 will abort the recovery process and can leave the recovery
in a broken state.

It was reported with the following kernel log message attached and a gfs2
mount stopped working:

"dlm: bobvirt1: dlm_recover_members error 1"

whereas 1 was returned because of a conversion of "dlm_recovery_stopped()"
to an errno was missing which this patch will introduce. While on it all
other possible missing errno conversions at other places were added as
they are done as in other places.

It might be worth to check the error case at this recovery level,
because some of the functionality also returns -ENOBUFS and check why
recovery ends in a broken state. However this will fix the issue if
another recovery was triggered at some points of recovery handling.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-08-19 11:33:03 -05:00
Alexander Aring
b97f85259f fs: dlm: implement delayed ack handling
This patch changes that we don't ack each message. Lowcomms will take
care about to send an ack back after a bulk of messages was processed.
Currently it's only when the whole receive buffer was processed, there
might better positions to send an ack back but only the lowcomms
implementation know when there are more data to receive. This patch has
also disadvantages that we might retransmit more on errors, however this
is a very rare case.

Tested with make_panic on gfs2 with three nodes by running:

trace-cmd record -p function -l 'dlm_send_ack' sleep 100

and

trace-cmd report | wc -l

Before patch:
- 20548
- 21376
- 21398

After patch:
- 18338
- 20679
- 19949

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-08-19 11:33:03 -05:00
Alexander Aring
62699b3f0a fs: dlm: move receive loop into receive handler
This patch moves the kernel_recvmsg() loop call into the
receive_from_sock() function instead of doing the loop outside the
function and abort the loop over it's return value.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-07-19 11:56:36 -05:00