IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Now the last users of show_stack() got converted to use an explicit log
level, show_stack_loglvl() can drop it's redundant suffix and become once
again well known show_stack().
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200418201944.482088-51-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Print the stack trace with KERN_EMERG - it should be always visible.
Playing with console_loglevel is a bad idea as there may be more messages
printed than wanted. Also the stack trace might be not printed at all if
printk() was deferred and console_loglevel was raised back before the
trace got flushed.
Unfortunately, after rebasing on commit 2277b49258 ("kdb: Fix stack
crawling on 'running' CPUs that aren't the master"), kdb_show_stack() uses
now kdb_dump_stack_on_cpu(), which for now won't be converted as it uses
dump_stack() instead of show_stack().
Convert for now the branch that uses show_stack() and remove
console_loglevel exercise from that case.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20200418201944.482088-48-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the tty and serial driver updates for 5.8-rc1
Nothing huge at all, just a lot of little serial driver fixes, updates
for new devices and features, and other small things. Full details are
in the shortlog.
Note, you will get a conflict merging with your tree in the
Documentation/devicetree/bindings/serial/rs485.yaml file, but it should
be pretty obvious what to do. If not, I'm sure Rob will clean it all up
afterwards :)
All of these have been in linux-next with no issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXtzpCg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylRxACgjGtOKPjahONL4lWd0F8ZYEcyw7sAn34woBCO
BDUV3kolrRQ4OYNJWsHP
=TvqG
-----END PGP SIGNATURE-----
Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the tty and serial driver updates for 5.8-rc1
Nothing huge at all, just a lot of little serial driver fixes, updates
for new devices and features, and other small things. Full details are
in the shortlog.
All of these have been in linux-next with no issues for a while"
* tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits)
tty: serial: qcom_geni_serial: Add 51.2MHz frequency support
tty: serial: imx: clear Ageing Timer Interrupt in handler
serial: 8250_fintek: Add F81966 Support
sc16is7xx: Add flag to activate IrDA mode
dt-bindings: sc16is7xx: Add flag to activate IrDA mode
serial: 8250: Support rs485 bus termination GPIO
serial: 8520_port: Fix function param documentation
dt-bindings: serial: Add binding for rs485 bus termination GPIO
vt: keyboard: avoid signed integer overflow in k_ascii
serial: 8250: Enable 16550A variants by default on non-x86
tty: hvc_console, fix crashes on parallel open/close
serial: imx: Initialize lock for non-registered console
sc16is7xx: Read the LSR register for basic device presence check
sc16is7xx: Allow sharing the IRQ line
sc16is7xx: Use threaded IRQ
sc16is7xx: Always use falling edge IRQ
tty: n_gsm: Fix bogus i++ in gsm_data_kick
tty: n_gsm: Remove unnecessary test in gsm_print_packet()
serial: stm32: add no_console_suspend support
tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP
...
* The remainder of the code necessary to support the Kendryte K210.
* Support for building device trees into the kernel, as the K210 doesn't
have a bootloader that provides one.
* A K210 device tree and the associated defconfig update.
* Support for skipping PMP initialization on systems that trap on PMP
accesses rather than treating them as WARL.
* Support for KGDB.
* Improvements to text patching.
* Some cleanups to the SiFive L2 cache driver.
I may have a second part, but I wanted to get this out earlier rather than
later as they've been ready to go for a while now.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl7YRtMTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYif+oD/4kaoQqmiw0VvyFjBlPT/TC7OPbaoXO
5WBegSEAl+oOq+fUtjHomzn6HaTW2cPNexsqC/KuGP1/QL0OQ23T5O18LaCmrGMd
CvELeAcvu6G4kbRuHwF0U16hBI2A1xIYVUJ8qmbuHPpEwVel3AsP2RIjQ+bOvm7g
fVTIrlzl1s1eTsOXdOJb3sFZtISHb7o3lHZmcplDva3N13x8E0FrjuoJhHv0f2mj
kcmZcy3sr8luAkwAJmbqmdPuwYTlteMnucXdCUv/kG2SaPkBwU5TS+MN7No2ylXH
v4vxjkuSyJ1db72pla6uUB/cYD+mh2YI4tjjvH93s3k4I/GZMCrqivXZW2QtxBdt
na++GxK8e4FrBG+VCCCQvx2mqugpMFMDnZDt36N2BqwjOtEzO3W8GqjrKCovXtQf
farlgAiuFNVcoo7VFdpNZE+N5elQa0GWf57csCqqMNN4J8JLaOgyS7ByozJ9b9/A
1x4g7kQANnMvYiYAeV+C2YfBcK6x8gHEPlWio1QZqirW5la34VUEWa18+rQeSRMz
DArKPauCR1GuYZZVUkkDI8cXj0Gs5fTXsYmn+WsIpjFHnDYeYVe6Ocmczxyb3o3O
lQwm7+drHlSmMvgrOlWJMrIUbdiomgelDU/iD4PiZzKozC52yKDS6A5gzoG6uZSM
UleeAIeVG/TDCw==
=keTq
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.8-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The remainder of the code necessary to support the Kendryte K210:
* Support for building device trees into the kernel, as the K210
doesn't have a bootloader that provides one
* A K210 device tree and the associated defconfig update
* Support for skipping PMP initialization on systems that trap on
PMP accesses rather than treating them as WARL
- Support for KGDB
- Improvements to text patching
- Some cleanups to the SiFive L2 cache driver
* tag 'riscv-for-linus-5.8-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
soc: sifive: l2 cache: Mark l2_get_priv_group as static
soc: sifive: l2 cache: Eliminate an unsigned zero compare warning
riscv: Add support to determine no. of L2 cache way enabled
riscv: cacheinfo: Implement cache_get_priv_group with a generic ops structure
riscv: Use text_mutex instead of patch_lock
riscv: Use NOKPROBE_SYMBOL() instead of __krpobes annotation
riscv: Remove the 'riscv_' prefix of function name
riscv: Add SW single-step support for KDB
riscv: Use the XML target descriptions to report 3 system registers
riscv: Add KGDB support
kgdb: Add kgdb_has_hit_break function
RISC-V: Skip setting up PMPs on traps
riscv: K210: Update defconfig
riscv: K210: Add a built-in device tree
riscv: Allow device trees to be built into the kernel
Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined
by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG'
is set, and the user can define an environment variable named 'KDBFLAGS'
too. These are puzzling indeed.
After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature.
So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we
wrote into. After this modification, we can use `md4c1 kdb_flags` instead,
to observe the state flags.
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20200521072125.21103-1-liwei391@huawei.com
[daniel.thompson@linaro.org: Make kdb_flags unsigned to avoid arithmetic
right shift]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
From code inspection the math in handle_ctrl_cmd() looks super sketchy
because it subjects -1 from cmdptr and then does a "%
KDB_CMD_HISTORY_COUNT". It turns out that this code works because
"cmdptr" is unsigned and KDB_CMD_HISTORY_COUNT is a nice power of 2.
Let's make this a little less sketchy.
This patch should be a no-op.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200507161125.1.I2cce9ac66e141230c3644b8174b6c15d4e769232@changeid
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
When I combined kgdboc_earlycon with an inflight patch titled ("soc:
qcom-geni-se: Add interconnect support to fix earlycon crash") [1]
things went boom. Specifically I got a crash during the transition
between kgdboc_earlycon and the main kgdboc that looked like this:
Call trace:
__schedule_bug+0x68/0x6c
__schedule+0x75c/0x924
schedule+0x8c/0xbc
schedule_timeout+0x9c/0xfc
do_wait_for_common+0xd0/0x160
wait_for_completion_timeout+0x54/0x74
rpmh_write_batch+0x1fc/0x23c
qcom_icc_bcm_voter_commit+0x1b4/0x388
qcom_icc_set+0x2c/0x3c
apply_constraints+0x5c/0x98
icc_set_bw+0x204/0x3bc
icc_put+0x30/0xf8
geni_remove_earlycon_icc_vote+0x6c/0x9c
qcom_geni_serial_earlycon_exit+0x10/0x1c
kgdboc_earlycon_deinit+0x38/0x58
kgdb_register_io_module+0x11c/0x194
configure_kgdboc+0x108/0x174
kgdboc_probe+0x38/0x60
platform_drv_probe+0x90/0xb0
really_probe+0x130/0x2fc
...
The problem was that we were holding the "kgdb_registration_lock"
while calling into code that didn't expect to be called in spinlock
context.
Let's slightly defer when we call the deinit code so that it's not
done under spinlock.
NOTE: this does mean that the "deinit" call of the old kgdb IO module
is now made _after_ the init of the new IO module, but presumably
that's OK.
[1] https://lkml.kernel.org/r/1588919619-21355-3-git-send-email-akashast@codeaurora.org
Fixes: 220995622d ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200526142001.1.I523dc33f96589cb9956f5679976d402c8cda36fa@changeid
[daniel.thompson@linaro.org: Resolved merge issues by hand]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The break instruction in RISC-V does not have an immediate value field, so
the kernel cannot identify the purpose of each trap exception through the
opcode. This makes the existing identification schemes in other
architecture unsuitable for the RISC-V kernel. To solve this problem, this
patch adds kgdb_has_hit_break(), which can help RISC-V kernel identify
the KGDB trap exception.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
We want to enable kgdb to debug the early parts of the kernel.
Unfortunately kgdb normally is a client of the tty API in the kernel
and serial drivers don't register to the tty layer until fairly late
in the boot process.
Serial drivers do, however, commonly register a boot console. Let's
enable the kgdboc driver to work with boot consoles to provide early
debugging.
This change co-opts the existing read() function pointer that's part
of "struct console". It's assumed that if a boot console (with the
flag CON_BOOT) has implemented read() that both the read() and write()
function are polling functions. That means they work without
interrupts and read() will return immediately (with 0 bytes read) if
there's nothing to read. This should be a safe assumption since it
appears that no current boot consoles implement read() right now and
there seems no reason to do so unless they wanted to support
"kgdboc_earlycon".
The normal/expected way to make all this work is to use
"kgdboc_earlycon" and "kgdboc" together. You should point them both
to the same physical serial connection. At boot time, as the system
transitions from the boot console to the normal console (and registers
a tty), kgdb will switch over.
One awkward part of all this, though, is that there can be a window
where the boot console goes away and we can't quite transtion over to
the main kgdboc that uses the tty layer. There are two main problems:
1. The act of registering the tty doesn't cause any call into kgdboc
so there is a window of time when the tty is there but kgdboc's
init code hasn't been called so we can't transition to it.
2. On some serial drivers the normal console inits (and replaces the
boot console) quite early in the system. Presumably these drivers
were coded up before earlycon worked as well as it does today and
probably they don't need to do this anymore, but it causes us
problems nontheless.
Problem #1 is not too big of a deal somewhat due to the luck of probe
ordering. kgdboc is last in the tty/serial/Makefile so its probe gets
right after all other tty devices. It's not fun to rely on this, but
it does work for the most part.
Problem #2 is a big deal, but only for some serial drivers. Other
serial drivers end up registering the console (which gets rid of the
boot console) and tty at nearly the same time.
The way we'll deal with the window when the system has stopped using
the boot console and the time when we're setup using the tty is to
keep using the boot console. This may sound surprising, but it has
been found to work well in practice. If it doesn't work, it shouldn't
be too hard for a given serial driver to make it keep working.
Specifically, it's expected that the read()/write() function provided
in the boot console should be the same (or nearly the same) as the
normal kgdb polling functions. That means continuing to use them
should work just fine. To make things even more likely to work work
we'll also trap the recently added exit() function in the boot console
we're using and delay any calls to it until we're all done with the
boot console.
NOTE: there could be ways to use all this in weird / unexpected ways.
If you do something like this, it's a bit of a buyer beware situation.
Specifically:
- If you specify only "kgdboc_earlycon" but not "kgdboc" then
(depending on your serial driver) things will probably work OK, but
you'll get a warning printed the first time you use kgdb after the
boot console is gone. You'd only be able to do this, of course, if
the serial driver you're running atop provided an early boot console.
- If your "kgdboc_earlycon" and "kgdboc" devices are not the same
device things should work OK, but it'll be your job to switch over
which device you're monitoring (including figuring out how to switch
over gdb in-flight if you're using it).
When trying to enable "kgdboc_earlycon" it should be noted that the
names that are registered through the boot console layer and the tty
layer are not the same for the same port. For example when debugging
on one board I'd need to pass "kgdboc_earlycon=qcom_geni
kgdboc=ttyMSM0" to enable things properly. Since digging up the boot
console name is a pain and there will rarely be more than one boot
console enabled, you can provide the "kgdboc_earlycon" parameter
without specifying the name of the boot console. In this case we'll
just pick the first boot that implements read() that we find.
This new "kgdboc_earlycon" parameter should be contrasted to the
existing "ekgdboc" parameter. While both provide a way to debug very
early, the usage and mechanisms are quite different. Specifically
"kgdboc_earlycon" is meant to be used in tandem with "kgdboc" and
there is a transition from one to the other. The "ekgdboc" parameter,
on the other hand, replaces the "kgdboc" parameter. It runs the same
logic as the "kgdboc" parameter but just relies on your TTY driver
being present super early. The only known usage of the old "ekgdboc"
parameter is documented as "ekgdboc=kbd earlyprintk=vga". It should
be noted that "kbd" has special treatment allowing it to init early as
a tty device.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/20200507130644.v4.8.I8fba5961bf452ab92350654aa61957f23ecf0100@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If we detect that we recursively entered the debugger we should hack
our I/O ops to NULL so that the panic() in the next line won't
actually cause another recursion into the debugger. The first line of
kgdb_panic() will check this and return.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20200507130644.v4.6.I89de39f68736c9de610e6f241e68d8dbc44bc266@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Using kgdb requires at least some level of architecture-level
initialization. If nothing else, it relies on the architecture to
pass breakpoints / crashes onto kgdb.
On some architectures this all works super early, specifically it
starts working at some point in time before Linux parses
early_params's. On other architectures it doesn't. A survey of a few
platforms:
a) x86: Presumably it all works early since "ekgdboc" is documented to
work here.
b) arm64: Catching crashes works; with a simple patch breakpoints can
also be made to work.
c) arm: Nothing in kgdb works until
paging_init() -> devicemaps_init() -> early_trap_init()
Let's be conservative and, by default, process "kgdbwait" (which tells
the kernel to drop into the debugger ASAP at boot) a bit later at
dbg_late_init() time. If an architecture has tested it and wants to
re-enable super early debugging, they can select the
ARCH_HAS_EARLY_DEBUG KConfig option. We'll do this for x86 to start.
It should be noted that dbg_late_init() is still called quite early in
the system.
Note that this patch doesn't affect when kgdb runs its init. If kgdb
is set to initialize early it will still initialize when parsing
early_param's. This patch _only_ inhibits the initial breakpoint from
"kgdbwait". This means:
* Without any extra patches arm64 platforms will at least catch
crashes after kgdb inits.
* arm platforms will catch crashes (and could handle a hardcoded
kgdb_breakpoint()) any time after early_trap_init() runs, even
before dbg_late_init().
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200507130644.v4.4.I3113aea1b08d8ce36dc3720209392ae8b815201b@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
In commit 81eaadcae8 ("kgdboc: disable the console lock when in
kgdb") we avoided the WARN_CONSOLE_UNLOCKED() yell when we were in
kgdboc. That still works fine, but it turns out that we get a similar
yell when using other I/O drivers. One example is the "I/O driver"
for the kgdb test suite (kgdbts). When I enabled that I again got the
same yells.
Even though "kgdbts" doesn't actually interact with the user over the
console, using it still causes kgdb to print to the consoles. That
trips the same warning:
con_is_visible+0x60/0x68
con_scroll+0x110/0x1b8
lf+0x4c/0xc8
vt_console_print+0x1b8/0x348
vkdb_printf+0x320/0x89c
kdb_printf+0x68/0x90
kdb_main_loop+0x190/0x860
kdb_stub+0x2cc/0x3ec
kgdb_cpu_enter+0x268/0x744
kgdb_handle_exception+0x1a4/0x200
kgdb_compiled_brk_fn+0x34/0x44
brk_handler+0x7c/0xb8
do_debug_exception+0x1b4/0x228
Let's increment/decrement the "ignore_console_lock_warning" variable
all the time when we enter the debugger.
This will allow us to later revert commit 81eaadcae8 ("kgdboc:
disable the console lock when in kgdb").
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20200507130644.v4.1.Ied2b058357152ebcc8bf68edd6f20a11d98d7d4e@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
With earlier commits, the API no longer discards the const-ness of the
sysrq_key_op. As such we can add the notation.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: kgdb-bugreport@lists.sourceforge.net
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-9-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kernel doc does not understand POD variables to be referred to.
.../debug_core.c:73: warning: cannot understand function prototype:
'int kgdb_connected; '
Convert kernel doc to pure comment.
Fixes: dc7d552705 ("kgdb: core")
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Currently the PROMPT variable could be abused to provoke the printf()
machinery to read outside the current stack frame. Normally this
doesn't matter becaues md is already a much better tool for reading
from memory.
However the md command can be disabled by not setting KDB_ENABLE_MEM_READ.
Let's also prevent PROMPT from being modified in these circumstances.
Whilst adding a comment to help future code reviewers we also remove
the #ifdef where PROMPT in consumed. There is no problem passing an
unused (0) to snprintf when !CONFIG_SMP.
argument
Reported-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Currently the code to manage the kdb history buffer uses strncpy() to
copy strings to/and from the history and exhibits the classic "but
nobody ever told me that strncpy() doesn't always terminate strings"
bug. Modern gcc compilers recognise this bug and issue a warning.
In reality these calls will only abridge the copied string if kdb_read()
has *already* overflowed the command buffer. Thus the use of counted
copies here is only used to reduce the secondary effects of a bug
elsewhere in the code.
Therefore transitioning these calls into strscpy() (without checking
the return code) is appropriate.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
This reverts commit bbfceba15f.
When DBG_MAX_REG_NUM is zero then a number of symbols are conditionally
defined. It is therefore not possible to check it using C expressions.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Acked-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Replace open coded single-linked list iteration loop with for_each_console()
helper in use.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The point bp is assigned a value that is never read, it is being
re-assigned later to bp = &kdb_breakpoints[lowbp] in a for-loop.
Remove the redundant assignment.
Addresses-Coverity ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20191128130753.181246-1-colin.king@canonical.com
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If you switch to a sleeping task with the "pid" command and then type
"rd", kdb tells you this:
No current kdb registers. You may need to select another task
diag: -17: Invalid register name
The first message makes sense, but not the second. Fix it by just
returning 0 after commands accessing the current registers finish if
we've already printed the "No current kdb registers" error.
While fixing kdb_rd(), change the function to use "if" rather than
"ifdef". It cleans the function up a bit and any modern compiler will
have no trouble handling still producing good code.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111624.5.I121f4c6f0c19266200bf6ef003de78841e5bfc3d@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Some (but not all?) of the kdb backtrace paths would cause the
kdb_current_task and kdb_current_regs to remain changed. As discussed
in a review of a previous patch [1], this doesn't seem intuitive, so
let's fix that.
...but, it turns out that there's actually no longer any reason to set
the current task / current regs while backtracing anymore anyway. As
of commit 2277b49258 ("kdb: Fix stack crawling on 'running' CPUs
that aren't the master") if we're backtracing on a task running on a
CPU we ask that CPU to do the backtrace itself. Linux can do that
without anything fancy. If we're doing backtrace on a sleeping task
we can also do that fine without updating globals. So this patch
mostly just turns into deleting a bunch of code.
[1] https://lore.kernel.org/r/20191010150735.dhrj3pbjgmjrdpwr@holly.lan
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111624.4.Ibc3d982bbeb9e46872d43973ba808cd4c79537c7@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The kdb_current_task variable has been declared in
"kernel/debug/kdb/kdb_private.h" since 2010 when kdb was added to the
mainline kernel. This is not a public header. There should be no
reason that kdb_current_task should be exported and there are no
in-kernel users that need it. Remove the export.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111623.3.I14b22b5eb15ca8f3812ab33e96621231304dc1f7@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
As of the patch ("MIPS: kdb: Remove old workaround for backtracing on
other CPUs") there is no reason for kdb_current_regs to be in the
public "kdb.h". Let's move it next to kdb_current_task.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111623.2.Iadbfb484e90b557cc4b5ac9890bfca732cd99d77@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Currently if sequences such as "\ehelp\r" are delivered to the console then
the h gets eaten by the escape handling code. Since pressing escape
becomes something of a nervous twitch for vi users (and that escape doesn't
have much effect at a shell prompt) it is more helpful to emit the 'h' than
the '\e'.
We don't simply choose to emit the final character for all escape sequences
since that will do odd things for unsupported escape sequences (in
other words we retain the existing behaviour once we see '\e[').
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191025073328.643-6-daniel.thompson@linaro.org
Currently if an escape timer is interrupted by a character from a
different input source then the new character is discarded and the
function returns '\e' (which will be discarded by the level above).
It is hard to see why this would ever be the desired behaviour.
Fix this to return the new character rather than the '\e'.
This is a bigger refactor than might be expected because the new
character needs to go through escape sequence detection.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191025073328.643-5-daniel.thompson@linaro.org
kdb_read() contains special case logic to force it exit after reading
a single character. We can remove all the special case logic by directly
calling the function to read a single character instead. This also
allows us to tidy up the function prototype which, because it now matches
getchar(), we can also rename in order to make its role clearer.
This does involve some extra code to handle btaprompt properly but we
don't mind the new lines of code here because the old code had some
interesting problems (bad newline handling, treating unexpected
characters like <cr>).
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191025073328.643-4-daniel.thompson@linaro.org
Currently kdb_read_get_key() contains complex control flow that, on
close inspection, turns out to be unnecessary. In particular:
1. It is impossible to enter the branch conditioned on (escape_delay == 1)
except when the loop enters with (escape_delay == 2) allowing us to
combine the branches.
2. Most of the code conditioned on (escape_delay == 2) simply modifies
local data and then breaks out of the loop causing the function to
return escape_data[0].
3. Based on #2 there is not actually any need to ever explicitly set
escape_delay to 2 because we it is much simpler to directly return
escape_data[0] instead.
4. escape_data[0] is, for all but one exit path, known to be '\e'.
Simplify the code based on these observations.
There is a subtle (and harmless) change of behaviour resulting from this
simplification: instead of letting the escape timeout after ~1998
milliseconds we now timeout after ~2000 milliseconds
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191025073328.643-3-daniel.thompson@linaro.org
kdb_read_get_key() has extremely complex break/continue control flow
managed by state variables and is very hard to review or modify. In
particular the way the escape sequence handling interacts with the
general control flow is hard to follow. Separate out the escape key
handling, without changing the control flow. This makes the main body of
the code easier to review.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191025073328.643-2-daniel.thompson@linaro.org
Recent versions of gcc (reported on gcc-7.4) issue array subscript
warnings for builds where SMP is not enabled.
kernel/debug/debug_core.c: In function 'kdb_dump_stack_on_cpu':
kernel/debug/debug_core.c:452:17: warning: array subscript is outside array
+bounds [-Warray-bounds]
if (!(kgdb_info[cpu].exception_state & DCPU_IS_SLAVE)) {
~~~~~~~~~^~~~~
kernel/debug/debug_core.c:469:33: warning: array subscript is outside array
+bounds [-Warray-bounds]
kgdb_info[cpu].exception_state |= DCPU_WANT_BT;
kernel/debug/debug_core.c:470:18: warning: array subscript is outside array
+bounds [-Warray-bounds]
while (kgdb_info[cpu].exception_state & DCPU_WANT_BT)
There is no bug here but there is scope to improve the code
generation for non-SMP systems (whilst also silencing the warning).
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 2277b49258 ("kdb: Fix stack crawling on 'running' CPUs that aren't the master")
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20191021101057.23861-1-daniel.thompson@linaro.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
give you the right info. Specifically on many architectures
(including arm64, where I tested) you can't dump the stack of a
"running" process that isn't the process running on the current CPU.
This can be seen by this:
echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
# wait 2 seconds
<sysrq>g
Here's what I see now on rk3399-gru-kevin. I see the stack crawl for
the CPU that handled the sysrq but everything else just shows me stuck
in __switch_to() which is bogus:
======
[0]kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0, 1-3(I), 4, 5(I)
Stack traceback for pid 0
0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0
Call trace:
dump_backtrace+0x0/0x138
...
kgdb_compiled_brk_fn+0x34/0x44
...
sysrq_handle_dbg+0x34/0x5c
Stack traceback for pid 0
0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65616c0
Stack traceback for pid 0
0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65806c0
Stack traceback for pid 0
0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f659f6c0
Stack traceback for pid 1474
0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash
Call trace:
__switch_to+0x1e4/0x240
__schedule+0x464/0x618
0xffffffc0dde8b040
Stack traceback for pid 0
0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65dd6c0
===
The problem is that 'btc' eventually boils down to
show_stack(task_struct, NULL);
...and show_stack() doesn't work for "running" CPUs because their
registers haven't been stashed.
On x86 things might work better (I haven't tested) because kdb has a
special case for x86 in kdb_show_stack() where it passes the stack
pointer to show_stack(). This wouldn't work on arm64 where the stack
crawling function seems needs the "fp" and "pc", not the "sp" which is
presumably why arm64's show_stack() function totally ignores the "sp"
parameter.
NOTE: we _can_ get a good stack dump for all the cpus if we manually
switch each one to the kdb master and do a back trace. AKA:
cpu 4
bt
...will give the expected trace. That's because now arm64's
dump_backtrace will now see that "tsk == current" and go through a
different path.
In this patch I fix the problems by catching a request to stack crawl
a task that's running on a CPU and then I ask that CPU to do the stack
crawl.
NOTE: this will (presumably) change what stack crawls are printed for
x86 machines. Now kdb functions will show up in the stack crawl.
Presumably this is OK but if it's not we can go back and add a special
case for x86 again.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
I noticed that when I did "btc <cpu>" and the CPU I passed in hadn't
rounded up that I'd crash. I was going to copy the same fix from
commit 162bc7f5af ("kdb: Don't back trace on a cpu that didn't round
up") into the "not all the CPUs" case, but decided it'd be better to
clean things up a little bit.
This consolidates the two code paths. It is _slightly_ wasteful in in
that the checks for "cpu" being too small or being offline isn't
really needed when we're iterating over all online CPUs, but that
really shouldn't hurt. Better to have the same code path.
While at it, eliminate at least one slightly ugly (and totally
needless) recursive use of kdb_parse().
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The kdb_bt1() had a mysterious "argcount" parameter passed in (always
the number 5, by the way) and never used. Presumably this is just old
cruft. Remove it. While at it, upgrade the btaprompt parameter to a
full fledged bool instead of an int.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
From doing a 'git log --patch kernel/debug', it looks as if DCPU_SSTEP
has never been used. Presumably it used to be used back when kgdb was
out of tree and nobody thought to delete the definition when the usage
went away. Delete.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Right now kgdb/kdb hooks up to debug panics by registering for the panic
notifier. This works OK except that it means that kgdb/kdb gets called
_after_ the CPUs in the system are taken offline. That means that if
anything important was happening on those CPUs (like something that might
have contributed to the panic) you can't debug them.
Specifically I ran into a case where I got a panic because a task was
"blocked for more than 120 seconds" which was detected on CPU 2. I nicely
got shown stack traces in the kernel log for all CPUs including CPU 0,
which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of
__mmc_switch().
I then ended up at the kdb prompt where switched over to kgdb to try to
look at local variables of the process on CPU 0. I found that I couldn't.
Digging more, I found that I had no info on any tasks running on CPUs
other than CPU 2 and that asking kdb for help showed me "Error: no saved
data for this cpu". This was because all the CPUs were offline.
Let's move the entry of kdb/kgdb to a direct call from panic() and stop
using the generic notifier. Putting a direct call in allows us to order
things more properly and it also doesn't seem like we're breaking any
abstractions by calling into the debugger from the panic function.
Daniel said:
: This patch changes the way kdump and kgdb interact with each other.
: However it would seem rather odd to have both tools simultaneously armed
: and, even if they were, the user still has the option to use panic_timeout
: to force a kdump to happen. Thus I think the change of order is
: acceptable.
Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment that says that module_event() is not static is clearly
wrong.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
strncmp(str, const, len) is error-prone.
We had better use newly introduced
str_has_prefix() instead of it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The strncpy() function may leave the destination string buffer
unterminated, better use strscpy() instead.
This fixes the following warning with gcc 8.2:
kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:449:3: warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
strncpy(kdb_prompt_str, prompt, CMD_BUFLEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The "whichcpu" comes from argv[3]. The cpu_online() macro looks up the
cpu in a bitmap of online cpus, but if the value is too high then it
could read beyond the end of the bitmap and possibly Oops.
Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If you drop into kdb and type "summary", it prints out a line that
says this:
ccversion CCVERSION
...and I don't mean that it actually prints out the version of the C
compiler. It literally prints out the string "CCVERSION".
The version of the C Compiler is already printed at boot up and it
doesn't seem useful to replicate this in kdb. Let's just delete it.
We can also delete the bit of the Makefile that called the C compiler
in an attempt to pass this into kdb. This will remove one extra call
to the C compiler at Makefile parse time and (very slightly) speed up
builds.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The strcpy() function is being deprecated. Replace it by the safer
strscpy() and fix the following Coverity warning:
"You might overrun the 129-character fixed-size string ks_namebuf
by copying name without checking the length."
Addresses-Coverity-ID: 138995 ("Copy into fixed size buffer")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The strcpy() function is being deprecated. Replace it by the safer
strscpy() and fix the following Coverity warning:
"You might overrun the 1024-character fixed-size string remcom_in_buffer
by copying cmd without checking the length."
Addresses-Coverity-ID: 138999 ("Copy into fixed size buffer")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
kernel/debug/gdbstub.c: In function ‘gdb_serial_stub’:
kernel/debug/gdbstub.c:1031:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (remcom_in_buffer[1] == '\0') {
^
kernel/debug/gdbstub.c:1036:3: note: here
case 'C': /* Exception passing */
^~~~
kernel/debug/gdbstub.c:1040:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (tmp == 0)
^
kernel/debug/gdbstub.c:1043:3: note: here
case 'c': /* Continue packet */
^~~~
kernel/debug/gdbstub.c:1050:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
dbg_activate_sw_breakpoints();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/debug/gdbstub.c:1052:3: note: here
default:
^~~~~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
defcmd_in_progress is the state trace for command group processing
- within a command group or not - usable is an indicator if a command
set is valid (allocated/non-empty) - so use a bool for those binary
indication here.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If you have a CPU that fails to round up and then run 'btc' you'll end
up crashing in kdb becaue we dereferenced NULL. Let's add a check.
It's wise to also set the task to NULL when leaving the debugger so
that if we fail to round up on a later entry into the debugger we
won't backtrace a stale task.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If we're using the default implementation of kgdb_roundup_cpus() that
uses smp_call_function_single_async() we can end up hanging
kgdb_roundup_cpus() if we try to round up a CPU that failed to round
up before.
Specifically smp_call_function_single_async() will try to wait on the
csd lock for the CPU that we're trying to round up. If the previous
round up never finished then that lock could still be held and we'll
just sit there hanging.
There's not a lot of use trying to round up a CPU that failed to round
up before. Let's keep a flag that indicates whether the CPU started
but didn't finish to round up before. If we see that flag set then
we'll skip the next round up.
In general we have a few goals here:
- We never want to end up calling smp_call_function_single_async()
when the csd is still locked. This is accomplished because
flush_smp_call_function_queue() unlocks the csd _before_ invoking
the callback. That means that when kgdb_nmicallback() runs we know
for sure the the csd is no longer locked. Thus when we set
"rounding_up = false" we know for sure that the csd is unlocked.
- If there are no timeouts rounding up we should never skip a round
up.
NOTE #1: In general trying to continue running after failing to round
up CPUs doesn't appear to be supported in the debugger. When I
simulate this I find that kdb reports "Catastrophic error detected"
when I try to continue. I can overrule and continue anyway, but it
should be noted that we may be entering the land of dragons here.
Possibly the "Catastrophic error detected" was added _because_ of the
future failure to round up, but even so this is an area of the code
that hasn't been strongly tested.
NOTE #2: I did a bit of testing before and after this change. I
introduced a 10 second hang in the kernel while holding a spinlock
that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...".
Before this change if I did:
- Invoke hang
- Enter debugger
- g (which warns about Catastrophic error, g again to go anyway)
- g
- Enter debugger
...I'd hang the rest of the 10 seconds without getting a debugger
prompt. After this change I end up in the debugger the 2nd time after
only 1 second with the standard warning about 'Timed out waiting for
secondary CPUs.'
I'll also note that once the CPU finished waiting I could actually
debug it (aka "btc" worked)
I won't promise that everything works perfectly if the errant CPU
comes back at just the wrong time (like as we're entering or exiting
the debugger) but it certainly seems like an improvement.
NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in
kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some
implementations override kgdb_call_nmi_hook(). It shouldn't hurt to
have it in kgdb_nmicallback() in any case.
NOTE #4: this logic is really only needed because there is no API call
like "smp_try_call_function_single_async()" or "smp_csd_is_locked()".
If such an API existed then we'd use it instead, but it seemed a bit
much to add an API like this just for kgdb.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system. Specifically it hit:
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
Specifically it looked like this:
sysrq: SysRq : DEBUG
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
pstate: 604003c9 (nZCv DAIF +PAN -UAO)
pc : lockdep_hardirqs_on+0xf0/0x160
...
Call trace:
lockdep_hardirqs_on+0xf0/0x160
trace_hardirqs_on+0x188/0x1ac
kgdb_roundup_cpus+0x14/0x3c
kgdb_cpu_enter+0x53c/0x5cc
kgdb_handle_exception+0x180/0x1d4
kgdb_compiled_brk_fn+0x30/0x3c
brk_handler+0x134/0x178
do_debug_exception+0xfc/0x178
el1_dbg+0x18/0x78
kgdb_breakpoint+0x34/0x58
sysrq_handle_dbg+0x54/0x5c
__handle_sysrq+0x114/0x21c
handle_sysrq+0x30/0x3c
qcom_geni_serial_isr+0x2dc/0x30c
...
...
irq event stamp: ...45
hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
---[ end trace adf21f830c46e638 ]---
Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.
Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs. Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.
In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.
Looking at all the people who previously had copies of this code,
there were a few variants. I've attempted to keep the variants
working like they used to. Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
get_irq_regs().
* For arch/mips there was a bit of extra code around
kgdb_nmicallback()
NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before. We'll try to round it up
again and potentially hang when we try to grab the csd lock. That's
not new behavior but we'll still try to do better in a future patch.
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:
> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().
Nobody used those flags. Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them. So we can definitely remove the flags.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>