2019-05-19 15:08:55 +03:00
// SPDX-License-Identifier: GPL-2.0-only
2005-04-17 02:20:36 +04:00
/*
* linux/kernel/printk.c
*
* Copyright (C) 1991, 1992 Linus Torvalds
*
* Modified to make sys_syslog() more flexible: added commands to
* return the last 4k of kernel messages, regardless of whether
* they've been read or not. Added option to suppress kernel printk's
* to the console. Added hook for sending the console messages
* elsewhere, in preparation for a serial line console (someday).
* Ted Ts'o, 2/11/93.
* Modified for sysctl support, 1/8/97, Chris Horn.
2005-10-31 02:02:46 +03:00
* Fixed SMP synchronization, 08/08/99, Manfred Spraul
2006-01-15 04:43:54 +03:00
* manfred@colorfullife.com
2005-04-17 02:20:36 +04:00
* Rewrote bits to get rid of console_lock
2008-10-16 09:01:59 +04:00
* 01Mar01 Andrew Morton
2005-04-17 02:20:36 +04:00
*/
2018-09-29 19:45:52 +03:00
# define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
2005-04-17 02:20:36 +04:00
# include <linux/kernel.h>
# include <linux/mm.h>
# include <linux/tty.h>
# include <linux/tty_driver.h>
# include <linux/console.h>
# include <linux/init.h>
2007-10-16 12:23:46 +04:00
# include <linux/jiffies.h>
# include <linux/nmi.h>
2005-04-17 02:20:36 +04:00
# include <linux/module.h>
2006-06-25 16:48:15 +04:00
# include <linux/moduleparam.h>
2005-04-17 02:20:36 +04:00
# include <linux/delay.h>
# include <linux/smp.h>
# include <linux/security.h>
2011-05-25 04:13:20 +04:00
# include <linux/memblock.h>
2005-04-17 02:20:36 +04:00
# include <linux/syscalls.h>
crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
reuse crashkernel parameter for fadump", v4.
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
The crashkernel parameter can be reused, for memory reservation, by such
architecture specific infrastructure.
This patchset removes dependency with CONFIG_KEXEC for crashkernel
parameter and vmcoreinfo related code as it can be reused without kexec
support. Also, crashkernel parameter is reused instead of
fadump_reserve_mem to reserve memory for fadump.
The first patch moves crashkernel parameter parsing and vmcoreinfo
related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
second patch reuses the definitions of append_elf_note() & final_note()
functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
in powerpc. The next patch reuses crashkernel parameter for reserving
memory for fadump, instead of the fadump_reserve_mem parameter. This
has the advantage of using all syntaxes crashkernel parameter supports,
for fadump as well. The last patch updates fadump kernel documentation
about use of crashkernel parameter.
This patch (of 5):
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
The crashkernel parameter can be reused, for memory reservation, by such
architecture specific infrastructure.
But currently, code related to vmcoreinfo and parsing of crashkernel
parameter is built under CONFIG_KEXEC_CORE. This patch introduces
CONFIG_CRASH_CORE and moves the above mentioned code under this config,
allowing code reuse without dependency on CONFIG_KEXEC. There is no
functional change with this patch.
Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09 01:56:18 +03:00
# include <linux/crash_core.h>
2009-09-22 18:18:09 +04:00
# include <linux/ratelimit.h>
2009-10-16 16:09:18 +04:00
# include <linux/kmsg_dump.h>
2010-02-04 02:36:43 +03:00
# include <linux/syslog.h>
2010-06-04 09:11:25 +04:00
# include <linux/cpu.h>
2011-01-13 03:59:43 +03:00
# include <linux/rculist.h>
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records are overwritten faster than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
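The record layout shown above can be picked apart in userspace with a few lines of C. The following is only a minimal sketch, not code from this commit: `struct kmsg_header`, `kmsg_parse_header()` and `kmsg_lost_between()` are hypothetical names chosen for illustration. It assumes the first field is a syslog-style prefix (facility in the upper bits, level in the low three bits), which matches the `30,340,...` udevd line above, and that the third field is a timestamp in microseconds as described in the kernel's dev-kmsg ABI documentation.

```c
#include <stdio.h>

/* Hypothetical container for the fields in front of the ';'. */
struct kmsg_header {
	unsigned int facility;        /* syslog facility: prefix >> 3 */
	unsigned int level;           /* syslog level: prefix & 7 */
	unsigned long long seq;       /* record sequence number */
	unsigned long long ts_usec;   /* timestamp (assumed microseconds) */
};

/*
 * Read the "<prefix>,<seq>,<timestamp>" fields of one record line.
 * Returns 0 on success, -1 if the three numeric fields cannot be read
 * (for example on a " KEY=value" continuation line).
 */
static int kmsg_parse_header(const char *line, struct kmsg_header *h)
{
	unsigned int prefix;

	if (sscanf(line, "%u,%llu,%llu", &prefix, &h->seq, &h->ts_usec) != 3)
		return -1;
	h->facility = prefix >> 3;
	h->level = prefix & 7;
	return 0;
}

/*
 * Number of records missing between two sequence numbers that were
 * read one after the other, e.g. after read() returned -EPIPE.
 */
static unsigned long long kmsg_lost_between(unsigned long long prev_seq,
					    unsigned long long next_seq)
{
	return next_seq > prev_seq + 1 ? next_seq - prev_seq - 1 : 0;
}
```

A log consumer would call `kmsg_parse_header()` on each line returned by read() and feed consecutive `seq` values to `kmsg_lost_between()` to count overwritten records.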
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-03 04:29:41 +04:00
# include <linux/poll.h>
2012-10-12 20:00:23 +04:00
# include <linux/irq_work.h>
2014-08-07 03:09:08 +04:00
# include <linux/ctype.h>
2015-02-22 19:58:50 +03:00
# include <linux/uio.h>
2017-02-01 18:36:40 +03:00
# include <linux/sched/clock.h>
2017-02-08 20:51:35 +03:00
# include <linux/sched/debug.h>
2017-02-08 20:51:37 +03:00
# include <linux/sched/task_stack.h>
2005-04-17 02:20:36 +04:00
2016-12-24 22:46:01 +03:00
# include <linux/uaccess.h>
2016-08-03 00:03:59 +03:00
# include <asm/sections.h>
2005-04-17 02:20:36 +04:00
2018-03-23 03:33:28 +03:00
# include <trace/events/initcall.h>
2011-11-24 23:03:08 +04:00
# define CREATE_TRACE_POINTS
# include <trace/events/printk.h>
2020-07-09 16:23:44 +03:00
# include "printk_ringbuffer.h"
2013-08-01 00:53:44 +04:00
# include "console_cmdline.h"
2013-08-01 00:53:45 +04:00
# include "braille.h"
2016-05-21 03:00:33 +03:00
# include "internal.h"
2013-08-01 00:53:44 +04:00
2005-04-17 02:20:36 +04:00
int console_printk [ 4 ] = {
2014-06-05 03:11:46 +04:00
CONSOLE_LOGLEVEL_DEFAULT , /* console_loglevel */
2014-08-07 03:09:01 +04:00
MESSAGE_LOGLEVEL_DEFAULT , /* default_message_loglevel */
2014-06-05 03:11:46 +04:00
CONSOLE_LOGLEVEL_MIN , /* minimum_console_loglevel */
CONSOLE_LOGLEVEL_DEFAULT , /* default_console_loglevel */
2005-04-17 02:20:36 +04:00
} ;
2019-02-08 21:24:49 +03:00
EXPORT_SYMBOL_GPL ( console_printk ) ;
2005-04-17 02:20:36 +04:00
2018-07-31 14:06:57 +03:00
atomic_t ignore_console_lock_warning __read_mostly = ATOMIC_INIT ( 0 ) ;
EXPORT_SYMBOL ( ignore_console_lock_warning ) ;
2023-04-13 13:08:59 +03:00
EXPORT_TRACEPOINT_SYMBOL_GPL ( console ) ;
2005-04-17 02:20:36 +04:00
/*
2007-02-17 22:10:16 +03:00
* Low level drivers may need that to know if they can schedule in
2005-04-17 02:20:36 +04:00
* their unblank() callback or not. So let's export it.
*/
int oops_in_progress ;
EXPORT_SYMBOL ( oops_in_progress ) ;
/*
2022-11-21 14:10:12 +03:00
* console_mutex protects console_list updates and console->flags updates.
* The flags are synchronized only for consoles that are registered, i.e.
* accessible via the console list.
*/
static DEFINE_MUTEX ( console_mutex ) ;
2005-04-17 02:20:36 +04:00
/*
2023-07-17 22:46:06 +03:00
* console_sem protects updates to console->seq
2022-11-16 19:21:51 +03:00
* and also provides serialization for console printing.
2005-04-17 02:20:36 +04:00
*/
2023-03-29 13:14:42 +03:00
static DEFINE_SEMAPHORE ( console_sem , 1 ) ;
2022-11-16 19:21:14 +03:00
HLIST_HEAD ( console_list ) ;
EXPORT_SYMBOL_GPL ( console_list ) ;
2022-11-16 19:21:15 +03:00
DEFINE_STATIC_SRCU ( console_srcu ) ;
2008-06-02 15:19:08 +04:00
panic: avoid the extra noise dmesg
When kernel panic happens, it will first print the panic call stack,
then the ending msg like:
[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------
The above messages are very useful for debugging.
But if system is configured to not reboot on panic, say the
"panic_timeout" parameter equals 0, it will likely print out many noisy
messages like WARN() call stacks for each and every CPU except the panic
one, messages like below:
WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
<IRQ>
try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
For people working in console mode, the screen will first show the panic
call stack, which is immediately overwritten by these noisy extra messages,
which makes debugging much more difficult, as the original context gets
lost on screen.
Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real
panic call stack and context.
Adding a flag "suppress_printk" which gets set in panic() to avoid those
noisy messages, without changing current kernel behavior that both panic
blinking and sysrq magic key can work as is, suggested by Petr Mladek.
To verify this, make sure kernel is not configured to reboot on panic and
in console
# echo c > /proc/sysrq-trigger
to see if console only prints out the panic call stack.
Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-15 01:45:34 +03:00
/*
* System may need to suppress printk message under certain
* circumstances, like after kernel panic happens.
*/
int __read_mostly suppress_printk ;
console: implement lockdep support for console_lock
Dave Airlie recently discovered a locking bug in the fbcon layer,
where a timer_del_sync (for the blinking cursor) deadlocks with the
timer itself, since both (want to) hold the console_lock:
https://lkml.org/lkml/2012/8/21/36
Unfortunately the console_lock isn't a plain mutex and hence has no
lockdep support. This resulted in a few days wasted tracking down
this bug (complicated by the fact that printk doesn't show anything
when the console is locked) instead of noticing the bug much earlier
with the lockdep splat.
Hence I've figured I need to fix that for the next deadlock involving
console_lock - and with kms/drm growing ever more complex locking
that'll eventually happen.
Now the console_lock has rather funky semantics, so after a quick irc
discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
the original idea of switching to a real mutex (since it won't work)
and instead opted to annotate the console_lock with lockdep
information manually.
There are a few special cases:
- The console_lock state is protected by the console_sem, and usually
grabbed/dropped at _lock/_unlock time. But the suspend/resume code
drops the semaphore without dropping the console_lock (see
suspend_console/resume_console). But since the same thread that did
the suspend will do the resume, we don't need to fix up anything.
- In the printk code there's a special trylock, only used to kick off
the logbuffer printk'ing in console_unlock. But all that happens
while lockdep is disabled (since printk does a few other evil
tricks). So no issue there, either.
- The console_lock can also be acquired from irq context (but only
with a trylock). lockdep already handles that.
This all leaves us with annotating the normal console_lock, _unlock
and _trylock functions.
And yes, it works - simply unloading a drm kms driver resulted in
lockdep complaining about the deadlock in fbcon_deinit:
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc2+ #552 Not tainted
-------------------------------------------------------
kms-reload/3577 is trying to acquire lock:
((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
but task is already holding lock:
(console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (console_lock){+.+.+.}:
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81040190>] console_lock+0x59/0x5b
[<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
[<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
[<ffffffff810584a2>] worker_thread+0x1a7/0x24b
[<ffffffff8105ca29>] kthread+0x7f/0x87
[<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
-> #0 ((&info->queue)){+.+...}:
[<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81058cab>] wait_on_work+0x3b/0xa7
[<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
[<ffffffff81058e33>] cancel_work_sync+0xb/0xd
[<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
[<ffffffff81264793>] bind_con_driver+0x145/0x263
[<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
[<ffffffff8126540c>] store_bind+0x1ad/0x1c1
[<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
[<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
[<ffffffff811145b2>] vfs_write+0x9b/0xfd
[<ffffffff811147b7>] sys_write+0x3e/0x6b
[<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(console_lock);
lock((&info->queue));
lock(console_lock);
lock((&info->queue));
*** DEADLOCK ***
v2: Mark the lockdep_map static, noticed by Jani Nikula.
Cc: Dave Airlie <airlied@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-22 21:52:11 +04:00
# ifdef CONFIG_LOCKDEP
static struct lockdep_map console_lock_dep_map = {
.name = "console_lock"
} ;
2022-11-21 14:10:12 +03:00
void lockdep_assert_console_list_lock_held ( void )
{
lockdep_assert_held ( & console_mutex ) ;
}
EXPORT_SYMBOL ( lockdep_assert_console_list_lock_held ) ;
2012-09-22 21:52:11 +04:00
# endif
2022-11-16 19:21:15 +03:00
# ifdef CONFIG_DEBUG_LOCK_ALLOC
bool console_srcu_read_lock_is_held ( void )
{
return srcu_read_lock_held ( & console_srcu ) ;
}
2023-01-12 19:12:13 +03:00
EXPORT_SYMBOL ( console_srcu_read_lock_is_held ) ;
2012-09-22 21:52:11 +04:00
# endif
2016-08-03 00:04:07 +03:00
enum devkmsg_log_bits {
__DEVKMSG_LOG_BIT_ON = 0 ,
__DEVKMSG_LOG_BIT_OFF ,
__DEVKMSG_LOG_BIT_LOCK ,
} ;
enum devkmsg_log_masks {
DEVKMSG_LOG_MASK_ON = BIT ( __DEVKMSG_LOG_BIT_ON ) ,
DEVKMSG_LOG_MASK_OFF = BIT ( __DEVKMSG_LOG_BIT_OFF ) ,
DEVKMSG_LOG_MASK_LOCK = BIT ( __DEVKMSG_LOG_BIT_LOCK ) ,
} ;
/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
# define DEVKMSG_LOG_MASK_DEFAULT 0
static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT ;
static int __control_devkmsg ( char * str )
{
2019-08-09 10:10:34 +03:00
size_t len ;
2016-08-03 00:04:07 +03:00
if ( ! str )
return - EINVAL ;
2019-08-09 10:10:34 +03:00
len = str_has_prefix(str, "on");
if ( len ) {
2016-08-03 00:04:07 +03:00
devkmsg_log = DEVKMSG_LOG_MASK_ON ;
2019-08-09 10:10:34 +03:00
return len ;
}
len = str_has_prefix(str, "off");
if ( len ) {
2016-08-03 00:04:07 +03:00
devkmsg_log = DEVKMSG_LOG_MASK_OFF ;
2019-08-09 10:10:34 +03:00
return len ;
}
len = str_has_prefix(str, "ratelimit");
if ( len ) {
2016-08-03 00:04:07 +03:00
devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT ;
2019-08-09 10:10:34 +03:00
return len ;
2016-08-03 00:04:07 +03:00
}
2019-08-09 10:10:34 +03:00
2016-08-03 00:04:07 +03:00
return - EINVAL ;
}
static int __init control_devkmsg ( char * str )
{
2022-03-01 01:05:56 +03:00
if ( __control_devkmsg ( str ) < 0 ) {
pr_warn("printk.devkmsg: bad option string '%s'\n", str);
2016-08-03 00:04:07 +03:00
return 1 ;
2022-03-01 01:05:56 +03:00
}
2016-08-03 00:04:07 +03:00
/*
* Set sysctl string accordingly :
*/
2018-01-19 07:39:01 +03:00
if (devkmsg_log == DEVKMSG_LOG_MASK_ON)
strcpy(devkmsg_log_str, "on");
else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF)
strcpy(devkmsg_log_str, "off");
2016-08-03 00:04:07 +03:00
/* else "ratelimit" which is set by default. */
/*
* Sysctl cannot change it anymore. The kernel command line setting of
* this parameter is to force the setting to be permanent throughout the
* runtime of the system. This is a precaution measure against userspace
* trying to be a smarta** and attempting to change it up on us.
*/
devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;
2022-03-01 01:05:56 +03:00
return 1 ;
2016-08-03 00:04:07 +03:00
}
__setup("printk.devkmsg=", control_devkmsg);
char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";
2022-01-22 09:13:34 +03:00
# if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
2016-08-03 00:04:07 +03:00
int devkmsg_sysctl_set_loglvl ( struct ctl_table * table , int write ,
2020-04-24 09:43:38 +03:00
void * buffer , size_t * lenp , loff_t * ppos )
2016-08-03 00:04:07 +03:00
{
char old_str [ DEVKMSG_STR_MAX_SIZE ] ;
unsigned int old ;
int err ;
if ( write ) {
if ( devkmsg_log & DEVKMSG_LOG_MASK_LOCK )
return - EINVAL ;
old = devkmsg_log ;
strncpy ( old_str , devkmsg_log_str , DEVKMSG_STR_MAX_SIZE ) ;
}
err = proc_dostring ( table , write , buffer , lenp , ppos ) ;
if ( err )
return err ;
if ( write ) {
err = __control_devkmsg ( devkmsg_log_str ) ;
/*
* Do not accept an unknown string OR a known string with
* trailing crap...
*/
if (err < 0 || (err + 1 != *lenp)) {
/* ... and restore old setting. */
devkmsg_log = old ;
strncpy ( devkmsg_log_str , old_str , DEVKMSG_STR_MAX_SIZE ) ;
return - EINVAL ;
}
}
return 0 ;
}
2022-01-22 09:13:34 +03:00
# endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
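The keyword matching and the "trailing crap" check above can be tried out in userspace. What follows is a rough sketch rather than the kernel code: `has_prefix()` is a stand-in for the kernel's `str_has_prefix()` (which returns the prefix length on a match and 0 otherwise), and `enum devkmsg_mode`, `parse_devkmsg()` and `devkmsg_write_ok()` are hypothetical names. It assumes the sysctl write carries exactly one extra byte after the keyword (typically the newline from `echo`), which is what the `err + 1 != *lenp` comparison implies.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for str_has_prefix(): prefix length on match, else 0. */
static size_t has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) == 0 ? len : 0;
}

/* Hypothetical replacement for the devkmsg_log bit mask. */
enum devkmsg_mode { MODE_RATELIMIT, MODE_ON, MODE_OFF };

/* Returns the length of the matched keyword, or -1 if none matches. */
static int parse_devkmsg(const char *str, enum devkmsg_mode *mode)
{
	size_t len;

	if (!str)
		return -1;

	len = has_prefix(str, "on");
	if (len) {
		*mode = MODE_ON;
		return (int)len;
	}
	len = has_prefix(str, "off");
	if (len) {
		*mode = MODE_OFF;
		return (int)len;
	}
	len = has_prefix(str, "ratelimit");
	if (len) {
		*mode = MODE_RATELIMIT;
		return (int)len;
	}
	return -1;
}

/*
 * Mirror of the sysctl check: accept only if the number of bytes
 * written equals the keyword length plus one trailing byte.
 */
static int devkmsg_write_ok(const char *str, size_t written)
{
	enum devkmsg_mode mode;
	int len = parse_devkmsg(str, &mode);

	return len >= 0 && (size_t)len + 1 == written;
}
```

Because only a prefix is compared, a string such as "onward" still matches "on" in `parse_devkmsg()`; it is the length comparison in `devkmsg_write_ok()` that rejects it.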
2016-08-03 00:04:07 +03:00
2022-11-21 14:10:12 +03:00
/**
* console_list_lock - Lock the console list
*
* For console list or console->flags updates
*/
void console_list_lock ( void )
{
/*
2022-11-16 19:21:44 +03:00
* In unregister_console() and console_force_preferred_locked(),
* synchronize_srcu() is called with the console_list_lock held.
* Therefore it is not allowed that the console_list_lock is taken
* with the srcu_lock held.
2022-11-21 14:10:12 +03:00
*
* Detecting if this context is really in the read - side critical
* section is only possible if the appropriate debug options are
* enabled .
*/
WARN_ON_ONCE(debug_lockdep_rcu_enabled() &&
srcu_read_lock_held(&console_srcu));
mutex_lock ( & console_mutex ) ;
}
EXPORT_SYMBOL ( console_list_lock ) ;
/**
* console_list_unlock - Unlock the console list
*
* Counterpart to console_list_lock ( )
*/
void console_list_unlock ( void )
{
mutex_unlock ( & console_mutex ) ;
}
EXPORT_SYMBOL ( console_list_unlock ) ;
2022-11-16 19:21:15 +03:00
/**
* console_srcu_read_lock - Register a new reader for the
* SRCU-protected console list
*
* Use for_each_console_srcu() to iterate the console list
*
* Context: Any context.
2022-11-22 11:55:23 +03:00
* Return: A cookie to pass to console_srcu_read_unlock().
2022-11-16 19:21:15 +03:00
*/
int console_srcu_read_lock ( void )
{
return srcu_read_lock_nmisafe ( & console_srcu ) ;
}
EXPORT_SYMBOL ( console_srcu_read_lock ) ;
/**
* console_srcu_read_unlock - Unregister an old reader from
* the SRCU-protected console list
2022-11-22 11:55:23 +03:00
* @cookie: cookie returned from console_srcu_read_lock()
2022-11-16 19:21:15 +03:00
*
* Counterpart to console_srcu_read_lock ( )
*/
void console_srcu_read_unlock ( int cookie )
{
srcu_read_unlock_nmisafe ( & console_srcu , cookie ) ;
}
EXPORT_SYMBOL ( console_srcu_read_unlock ) ;
2014-06-05 03:11:36 +04:00
/*
* Helper macros to handle lockdep when locking / unlocking console_sem . We use
* macros instead of functions so that _RET_IP_ contains useful information .
*/
# define down_console_sem() do { \
down ( & console_sem ) ; \
mutex_acquire ( & console_lock_dep_map , 0 , 0 , _RET_IP_ ) ; \
} while ( 0 )
static int __down_trylock_console_sem ( unsigned long ip )
{
2016-12-27 17:16:09 +03:00
int lock_failed ;
unsigned long flags ;
/*
* Here and in __up_console_sem() we need to be in safe mode,
* because spindump/WARN/etc from under console->lock will
* deadlock in printk()->down_trylock_console_sem() otherwise.
*/
printk_safe_enter_irqsave ( flags ) ;
lock_failed = down_trylock ( & console_sem ) ;
printk_safe_exit_irqrestore ( flags ) ;
if ( lock_failed )
2014-06-05 03:11:36 +04:00
return 1 ;
mutex_acquire ( & console_lock_dep_map , 0 , 1 , ip ) ;
return 0 ;
}
# define down_trylock_console_sem() __down_trylock_console_sem(_RET_IP_)
2016-12-27 17:16:09 +03:00
static void __up_console_sem ( unsigned long ip )
{
unsigned long flags ;
2019-09-19 19:09:40 +03:00
mutex_release ( & console_lock_dep_map , ip ) ;
2016-12-27 17:16:09 +03:00
printk_safe_enter_irqsave ( flags ) ;
up ( & console_sem ) ;
printk_safe_exit_irqrestore ( flags ) ;
}
# define up_console_sem() __up_console_sem(_RET_IP_)
2014-06-05 03:11:36 +04:00
2022-02-02 20:18:18 +03:00
static bool panic_in_progress ( void )
{
return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
}
2022-06-23 17:51:54 +03:00
/*
* This is used for debugging the mess that is the VT code by
* keeping track if we have the console semaphore held. It's
* definitely not the perfect debug tool (we don't know if _WE_
* hold it and are racing, but it helps tracking those weird code
* paths in the console code where we end up in places I want
* locked without the console semaphore held).
*/
2023-07-17 22:46:06 +03:00
static int console_locked ;
2005-04-17 02:20:36 +04:00
/*
* Array of consoles built from command line options (console=)
*/
# define MAX_CMDLINECONSOLES 8
static struct console_cmdline console_cmdline [ MAX_CMDLINECONSOLES ] ;
2013-08-01 00:53:44 +04:00
2005-04-17 02:20:36 +04:00
static int preferred_console = - 1 ;
xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long as tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc117dff26058fbac316b8acfc2cb5f7
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 02:31:07 +04:00
int console_set_on_cmdline ;
EXPORT_SYMBOL ( console_set_on_cmdline ) ;
2005-04-17 02:20:36 +04:00
/* Flag: console code may call schedule() */
static int console_may_schedule ;
printk: add console_msg_format command line option
0day and kernelCI automatically parse kernel log - basically some sort
of grepping using the pre-defined text patterns - in order to detect
and report regressions/errors. There are several sources they get the
kernel logs from:
a) dmesg or /proc/kmsg
This is the preferred way. Because `dmesg --raw' (see later Note)
and /proc/kmsg output contains facility and log level, which greatly
simplifies grepping for EMERG/ALERT/CRIT/ERR messages.
b) serial consoles
This option is harder to maintain, because serial console messages
don't contain facility and log level.
This patch introduces a `console_msg_format=' command line option,
to switch between different message formatting on serial consoles.
For the time being we have just two options - default and syslog.
The "default" option just keeps the existing format. While the
"syslog" option makes serial console messages appear in syslog
format [syslog() syscall], matching the `dmesg -S --raw' and
`cat /proc/kmsg' output formats:
- facility and log level
- time stamp (depends on printk_time/PRINTK_TIME)
- message
<%u>[time stamp] text\n
NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually
"dmesg -S --raw" that always prints messages in syslog format [per
Petr Mladek]. Running "dmesg --raw" may produce output in non-syslog
format sometimes. console_msg_format=syslog enables syslog format,
thus in documentation we mention "dmesg -S --raw", not "dmesg --raw".
Per Kevin Hilman:
: Right now we can get this info from a "dmesg --raw" after bootup,
: but it would be really nice in certain automation frameworks to
: have a kernel command-line option to enable printing of loglevels
: in default boot log.
:
: This is especially useful when ingesting kernel logs into advanced
: search/analytics frameworks (I'm playing with an ELK stack: Elastic
: Search, Logstash, Kibana).
:
: The other important reason for having this on the command line is that
: for testing linux-next (and other bleeding edge developer branches),
: it's common that we never make it to userspace, so can't even run
: "dmesg --raw" (or equivalent.) So we really want this on the primary
: boot (serial) console.
Per Fengguang Wu, 0day scripts should quickly benefit from that
feature, because they will be able to switch to a more reliable
parsing, based on messages' facility and log levels [1]:
`#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )'`
instead of doing text pattern matching
`#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} |
grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern |
grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist`
[1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb
Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-12-21 08:41:49 +03:00
enum con_msg_format_flags {
MSG_FORMAT_DEFAULT = 0 ,
MSG_FORMAT_SYSLOG = ( 1 < < 0 ) ,
} ;
static int console_msg_format = MSG_FORMAT_DEFAULT ;
2012-05-03 04:29:13 +04:00
/*
2020-07-09 16:23:44 +03:00
* The printk log buffer consists of a sequenced collection of records , each
2020-09-21 14:18:45 +03:00
* containing variable length message text . Every record also contains its
* own meta - data ( @ info ) .
2012-05-03 04:29:13 +04:00
*
2020-07-09 16:23:44 +03:00
* Every record meta - data carries the timestamp in nanoseconds , as well as
* the standard userspace syslog level and syslog facility . The usual kernel
* messages use LOG_KERN ; userspace - injected messages always carry a matching
* syslog facility , by default LOG_USER . The origin of every message can be
* reliably determined that way .
2012-05-03 04:29:13 +04:00
*
2020-07-09 16:23:44 +03:00
* The human readable log message of a record is available in @ text , the
* length of the message text in @ text_len . The stored message is not
* terminated .
2012-05-03 04:29:13 +04:00
*
2020-07-09 16:23:44 +03:00
* Optionally , a record can carry a dictionary of properties ( key / value
2020-09-21 14:18:45 +03:00
* pairs ) , to provide userspace with a machine - readable message context .
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get overwritten faster than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-03 04:29:41 +04:00
*
* Examples for well - defined , commonly used property names are :
* DEVICE = b12 : 8 device identifier
* b12 : 8 block dev_t
* c127 : 3 char dev_t
* n8 netdev ifindex
* + sound : card0 subsystem : devname
* SUBSYSTEM = pci driver - core subsystem name
*
2020-09-21 14:18:45 +03:00
* Valid characters in property names are [ a - zA - Z0 - 9. - _ ] . Property names
* and values are terminated by a ' \0 ' character .
2012-05-03 04:29:41 +04:00
*
2020-07-09 16:23:44 +03:00
* Example of record values :
2020-09-21 14:18:45 +03:00
* record . text_buf = " it's a line " ( unterminated )
* record . info . seq = 56
* record . info . ts_nsec = 36863
* record . info . text_len = 11
* record . info . facility = 0 ( LOG_KERN )
* record . info . flags = 0
* record . info . level = 3 ( LOG_ERR )
* record . info . caller_id = 299 ( task 299 )
* record . info . dev_info . subsystem = " pci " ( terminated )
* record . info . dev_info . device = " +pci:0000:00:01.0 " ( terminated )
2020-07-09 16:23:44 +03:00
*
* The ' struct printk_info ' buffer must never be directly exported to
2012-05-03 04:29:41 +04:00
* userspace , it is a kernel - private implementation detail that might
* need to be changed in the future , when the requirements change .
*
* / dev / kmsg exports the structured data in the following line format :
2015-07-01 00:59:03 +03:00
* " <level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text> \n "
*
* Users of the export format should ignore possible additional values
* separated by ' , ' , and find the message after the ' ; ' character .
2012-05-03 04:29:41 +04:00
*
* The optional key / value pairs are attached as continuation lines starting
* with a space character and terminated by a newline . All possible
* non - printable characters are escaped in the " \xff " notation .
2012-05-03 04:29:13 +04:00
*/
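The export format described in the comment above can be consumed from userspace roughly as follows. This is a minimal sketch, not code from this file: struct kmsg_fields and kmsg_parse_line() are hypothetical names, and it assumes that the first field carries facility and level combined (facility << 3 | level), which is what the value 30 in the sample /dev/kmsg output suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fixed prefix fields of one record. */
struct kmsg_fields {
	unsigned int level;		/* low 3 bits of the first field */
	unsigned int facility;		/* remaining bits of the first field */
	unsigned long long seq;		/* sequence number */
	unsigned long long ts_usec;	/* timestamp as exported */
	const char *text;		/* points into the input, after ';' */
};

/*
 * Parse "<prio>,<seq>,<timestamp>[,...];<text>".
 * Returns 0 on success, -1 if the line does not match.
 * Additional comma-separated values before the ';' are ignored.
 */
static int kmsg_parse_line(const char *line, struct kmsg_fields *out)
{
	unsigned int prio;
	const char *semi;

	assert(line != NULL && out != NULL);

	/* The message text starts after the first ';'. */
	semi = strchr(line, ';');
	if (!semi)
		return -1;

	if (sscanf(line, "%u,%llu,%llu", &prio, &out->seq, &out->ts_usec) != 3)
		return -1;

	out->level = prio & 0x07;
	out->facility = (prio >> 3) & 0xff;
	out->text = semi + 1;
	return 0;
}
```

A caller would read one record per read() from /dev/kmsg and pass the buffer to kmsg_parse_line(); continuation lines starting with a space (the key/value pairs) are not handled by this sketch.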
2021-03-03 13:15:23 +03:00
/* syslog_lock protects syslog_* variables and write access to clear_seq. */
2021-07-15 22:33:58 +03:00
static DEFINE_MUTEX ( syslog_lock ) ;
2021-03-03 13:15:23 +03:00
2012-07-17 05:35:29 +04:00
# ifdef CONFIG_PRINTK
2023-09-20 18:52:38 +03:00
/*
* During panic , heavy printk by other CPUs can delay the
* panic and risk deadlock on console resources .
*/
static int __read_mostly suppress_panic_printk ;
2013-03-23 02:04:39 +04:00
DECLARE_WAIT_QUEUE_HEAD ( log_wait ) ;
2021-03-03 13:15:23 +03:00
/* All 3 protected by @syslog_lock. */
2012-05-09 03:37:51 +04:00
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq ;
2012-07-09 21:05:10 +04:00
static size_t syslog_partial ;
2018-12-04 13:00:01 +03:00
static bool syslog_time ;
2012-05-03 04:29:13 +04:00
2021-03-03 13:15:21 +03:00
struct latched_seq {
seqcount_latch_t latch ;
u64 val [ 2 ] ;
} ;
/*
* The next printk record to read after the last ' clear ' command . There are
* two copies ( updated with seqcount_latch ) so that reads can locklessly
2021-03-03 13:15:23 +03:00
* access a valid value . Writers are synchronized by @ syslog_lock .
2021-03-03 13:15:21 +03:00
*/
static struct latched_seq clear_seq = {
. latch = SEQCNT_LATCH_ZERO ( clear_seq . latch ) ,
. val [ 0 ] = 0 ,
. val [ 1 ] = 0 ,
} ;
2012-05-03 04:29:13 +04:00
2015-11-07 03:30:38 +03:00
# define LOG_LEVEL(v) ((v) & 0x07)
# define LOG_FACILITY(v) ((v) >> 3 & 0xff)
2012-05-09 03:37:51 +04:00
/* record buffer */
2020-07-09 16:23:44 +03:00
# define LOG_ALIGN __alignof__(unsigned long)
2012-05-09 03:37:51 +04:00
# define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
2018-09-29 19:45:53 +03:00
# define LOG_BUF_LEN_MAX ((u32)1 << 31)
2012-05-11 02:14:33 +04:00
static char __log_buf [ __LOG_BUF_LEN ] __aligned ( LOG_ALIGN ) ;
2012-05-09 03:37:51 +04:00
static char * log_buf = __log_buf ;
static u32 log_buf_len = __LOG_BUF_LEN ;
2020-07-09 16:23:44 +03:00
/*
* Define the average message size . This only affects the number of
* descriptors that will be available . Underestimating is better than
* overestimating ( too many available descriptors is better than not enough ) .
*/
# define PRB_AVGBITS 5 /* 32 character average length */
# if CONFIG_LOG_BUF_SHIFT <= PRB_AVGBITS
# error CONFIG_LOG_BUF_SHIFT value too small.
# endif
_DEFINE_PRINTKRB ( printk_rb_static , CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS ,
2020-09-19 01:34:21 +03:00
PRB_AVGBITS , & __log_buf [ 0 ] ) ;
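The relation between the buffer size and the number of descriptors can be checked with a small userspace calculation. This is only an illustrative sketch: prb_desc_count() is a hypothetical helper, and it assumes that the second argument of _DEFINE_PRINTKRB is the number of descriptor bits, so that the descriptor count is 1 << (CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS).

```c
#include <assert.h>

#define PRB_AVGBITS 5	/* 32 character average length, as above */

/*
 * Hypothetical helper: number of descriptors for a given log buffer
 * shift, assuming count = 2^(shift - PRB_AVGBITS).
 */
static unsigned long prb_desc_count(unsigned int log_buf_shift)
{
	/* Mirrors the #error check: the shift must exceed PRB_AVGBITS. */
	assert(log_buf_shift > PRB_AVGBITS);
	return 1UL << (log_buf_shift - PRB_AVGBITS);
}
```

With an example shift of 17 (a 128 KiB text buffer) this gives 4096 descriptors, i.e. one descriptor per 32 bytes of text.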
2020-07-09 16:23:44 +03:00
static struct printk_ringbuffer printk_rb_dynamic ;
2023-09-16 22:20:05 +03:00
struct printk_ringbuffer * prb = & printk_rb_static ;
2020-07-09 16:23:44 +03:00
printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For the same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.
The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).
Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.
Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-03 14:30:02 +03:00
/*
* We cannot access per - CPU data ( e . g . per - CPU flush irq_work ) before
* per_cpu_areas are initialised . This variable is set to true when
* it ' s safe to access per - CPU data .
*/
2022-09-24 03:04:41 +03:00
static bool __printk_percpu_data_ready __ro_after_init ;
2020-03-03 14:30:02 +03:00
bool printk_percpu_data_ready ( void )
{
return __printk_percpu_data_ready ;
}
2021-03-03 13:15:23 +03:00
/* Must be called under syslog_lock. */
2021-03-03 13:15:21 +03:00
static void latched_seq_write ( struct latched_seq * ls , u64 val )
{
raw_write_seqcount_latch ( & ls - > latch ) ;
ls - > val [ 0 ] = val ;
raw_write_seqcount_latch ( & ls - > latch ) ;
ls - > val [ 1 ] = val ;
}
/* Can be called from any context. */
static u64 latched_seq_read_nolock ( struct latched_seq * ls )
{
unsigned int seq ;
unsigned int idx ;
u64 val ;
do {
seq = raw_read_seqcount_latch ( & ls - > latch ) ;
idx = seq & 0x1 ;
val = ls - > val [ idx ] ;
2023-05-19 13:20:59 +03:00
} while ( raw_read_seqcount_latch_retry ( & ls - > latch , seq ) ) ;
2021-03-03 13:15:21 +03:00
return val ;
}
2014-08-09 09:45:30 +04:00
/* Return log buffer address */
char * log_buf_addr_get ( void )
{
return log_buf ;
}
/* Return log buffer size */
u32 log_buf_len_get ( void )
{
return log_buf_len ;
}
2014-06-05 03:11:32 +04:00
/*
* Define how much of the log buffer we could take at maximum . The value
* must be greater than two . Note that only half of the buffer is available
* when the index points to the middle .
*/
# define MAX_LOG_TAKE_PART 4
static const char trunc_msg [ ] = " <truncated> " ;
2020-07-09 16:23:44 +03:00
static void truncate_msg ( u16 * text_len , u16 * trunc_msg_len )
2014-06-05 03:11:32 +04:00
{
/*
* The message should not take the whole buffer . Otherwise , it might
* get removed too soon .
*/
u32 max_text_len = log_buf_len / MAX_LOG_TAKE_PART ;
2020-07-09 16:23:44 +03:00
2014-06-05 03:11:32 +04:00
if ( * text_len > max_text_len )
* text_len = max_text_len ;
2020-07-09 16:23:44 +03:00
/* enable the warning message (if there is room) */
2014-06-05 03:11:32 +04:00
* trunc_msg_len = strlen ( trunc_msg ) ;
2020-07-09 16:23:44 +03:00
if ( * text_len > = * trunc_msg_len )
* text_len - = * trunc_msg_len ;
else
* trunc_msg_len = 0 ;
2014-06-05 03:11:32 +04:00
}
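The arithmetic in truncate_msg() can be followed with a standalone version. This is a sketch for illustration only: truncate_msg_sketch() is a hypothetical name, and the buffer length is passed as a parameter instead of being read from log_buf_len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LOG_TAKE_PART 4

static const char trunc_msg[] = "<truncated>";

/*
 * Clamp *text_len to a quarter of the buffer, then reserve room for the
 * "<truncated>" marker if the clamped text is at least that long.
 */
static void truncate_msg_sketch(uint32_t buf_len, uint16_t *text_len,
				uint16_t *trunc_msg_len)
{
	uint32_t max_text_len = buf_len / MAX_LOG_TAKE_PART;

	assert(text_len != NULL && trunc_msg_len != NULL);

	if (*text_len > max_text_len)
		*text_len = (uint16_t)max_text_len;

	/* enable the warning message (if there is room) */
	*trunc_msg_len = (uint16_t)strlen(trunc_msg);
	if (*text_len >= *trunc_msg_len)
		*text_len -= *trunc_msg_len;
	else
		*trunc_msg_len = 0;
}
```

For an assumed 1024 byte buffer and a 300 byte message, the text is first clamped to 256 bytes and then shortened by the 11 byte marker to 245 bytes.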
2014-08-07 03:09:05 +04:00
int dmesg_restrict = IS_ENABLED ( CONFIG_SECURITY_DMESG_RESTRICT ) ;
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg; however, /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because older versions of util-linux dmesg(1)
default to the syslog method for access. Newer versions of util-linux
dmesg(1) default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 01:04:39 +04:00
static int syslog_action_restricted ( int type )
{
if ( dmesg_restrict )
return 1 ;
/*
* Unless restricted , we allow " read all " and " get buffer size "
* for everybody .
*/
return type ! = SYSLOG_ACTION_READ_ALL & &
type ! = SYSLOG_ACTION_SIZE_BUFFER ;
}
2017-08-10 07:11:00 +03:00
static int check_syslog_permissions ( int type , int source )
2013-06-13 01:04:39 +04:00
{
/*
* If this is from / proc / kmsg and we ' ve already opened it , then we ' ve
* already done the capabilities checks at open time .
*/
2015-06-26 01:01:47 +03:00
if ( source = = SYSLOG_FROM_PROC & & type ! = SYSLOG_ACTION_OPEN )
2015-06-26 01:01:44 +03:00
goto ok ;
2013-06-13 01:04:39 +04:00
if ( syslog_action_restricted ( type ) ) {
if ( capable ( CAP_SYSLOG ) )
2015-06-26 01:01:44 +03:00
goto ok ;
2013-06-13 01:04:39 +04:00
		/*
		 * For historical reasons, accept CAP_SYS_ADMIN too, with
		 * a warning.
		 */
		if (capable(CAP_SYS_ADMIN)) {
			pr_warn_once("%s (%d): Attempt to access syslog with "
				     "CAP_SYS_ADMIN but no CAP_SYSLOG "
				     "(deprecated).\n",
				     current->comm, task_pid_nr(current));
2015-06-26 01:01:44 +03:00
			goto ok;
2013-06-13 01:04:39 +04:00
		}
		return -EPERM;
	}
2015-06-26 01:01:44 +03:00
ok:
2013-06-13 01:04:39 +04:00
	return security_syslog(type);
}
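The access rules listed in the commit message above can be modelled in userspace as a small decision function. This is only a sketch under stated assumptions: the `model_*` names are hypothetical, the capability and sysctl state are passed in as plain booleans, and the security_syslog() LSM hook that the kernel calls on success is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few SYSLOG_ACTION_* values. */
enum model_action {
	MODEL_ACTION_OPEN,
	MODEL_ACTION_READ,		/* destructive, single reader */
	MODEL_ACTION_READ_ALL,		/* non-destructive */
	MODEL_ACTION_CLEAR,
	MODEL_ACTION_SIZE_BUFFER,	/* non-destructive */
};

/* Hypothetical stand-ins for SYSLOG_FROM_READER / SYSLOG_FROM_PROC. */
enum model_source { MODEL_FROM_READER, MODEL_FROM_PROC };

/* Caller state that the kernel would query itself (assumption: booleans). */
struct model_caller {
	bool cap_syslog;
	bool cap_sys_admin;
	bool dmesg_restrict;
};

/* Returns 0 when the action is allowed, -EPERM otherwise. */
static int model_check_syslog(const struct model_caller *c,
			      enum model_action action,
			      enum model_source source)
{
	/* /proc/kmsg: only the open is checked, later actions ride on it. */
	if (source == MODEL_FROM_PROC && action != MODEL_ACTION_OPEN)
		return 0;

	/* Non-destructive queries are unrestricted while dmesg_restrict == 0. */
	if (!c->dmesg_restrict &&
	    (action == MODEL_ACTION_READ_ALL ||
	     action == MODEL_ACTION_SIZE_BUFFER))
		return 0;

	if (c->cap_syslog)
		return 0;

	/* Legacy fallback; the kernel additionally warns once here. */
	if (c->cap_sys_admin)
		return 0;

	return -EPERM;
}
```

In this model an open of /dev/kmsg corresponds to `MODEL_ACTION_READ_ALL` with `MODEL_FROM_READER`, which is why it succeeds for unprivileged callers only while dmesg_restrict is 0.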
2015-06-26 01:01:24 +03:00
static void append_char(char **pp, char *e, char c)
{
	if (*pp < e)
		*(*pp)++ = c;
}
2013-06-13 01:04:39 +04:00
2020-07-09 16:23:44 +03:00
static ssize_t info_print_ext_header(char *buf, size_t size,
				     struct printk_info *info)
2015-06-26 01:01:27 +03:00
{
2020-07-09 16:23:44 +03:00
	u64 ts_usec = info->ts_nsec;
printk: Add caller information to printk() output.
Sometimes we want to print a series of printk() messages to consoles
without being disturbed by concurrent printk() from interrupts and/or
other threads. But we can't force printk() callers to use their local
buffers because that would require too many changes on their side. Also,
even buffering up to one line inside printk() might fail to emit
an important clue in a critical situation.
Therefore, instead of trying to help buffering, let's try to help
reconstructing messages by saving caller information as of calling
log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
upon printing to consoles.
Some examples for console output:
[ 1.222773][ T1] x86: Booting SMP configuration:
[ 2.779635][ T1] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 5.069193][ T268] Fusion MPT base driver 3.04.20
[ 9.316504][ C2] random: fast init done
[ 13.413336][ T3355] Initialized host personality
Some examples for /dev/kmsg output:
6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
SUBSYSTEM=pci
DEVICE=+pci:0000:00:01.0
6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
5,1526,9316504,-,caller=C2;random: fast init done
6,1575,13413336,-,caller=T3355;Initialized host personality
Note that this patch changes max length of messages which can be printed
by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
based on an assumption that userspace won't try to write messages hitting
that border line to /dev/kmsg interface.
Link: http://lkml.kernel.org/r/93f19e57-5051-c67d-9af4-b17624062d44@i-love.sakura.ne.jp
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-12-18 00:05:04 +03:00
	char caller[20];
#ifdef CONFIG_PRINTK_CALLER
2020-07-09 16:23:44 +03:00
	u32 id = info->caller_id;
2018-12-18 00:05:04 +03:00
	snprintf(caller, sizeof(caller), ",caller=%c%u",
		 id & 0x80000000 ? 'C' : 'T', id & ~0x80000000);
#else
	caller[0] = '\0';
#endif
2015-06-26 01:01:27 +03:00
	do_div(ts_usec, 1000);
2018-12-18 00:05:04 +03:00
	return scnprintf(buf, size, "%u,%llu,%llu,%c%s;",
2020-07-09 16:23:44 +03:00
			 (info->facility << 3) | info->level, info->seq,
			 ts_usec, info->flags & LOG_CONT ? 'c' : '-', caller);
2015-06-26 01:01:27 +03:00
}
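To make the header layout concrete, here is a userspace sketch that produces the same `prio,seq,timestamp,flag,caller=...;` prefix with standard snprintf(). It assumes the CONFIG_PRINTK_CALLER branch is enabled; `struct ext_header_fields` and `format_ext_header()` are hypothetical names that stand in for struct printk_info and the kernel function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the record metadata used by the header. */
struct ext_header_fields {
	unsigned int facility;
	unsigned int level;
	uint64_t seq;
	uint64_t ts_nsec;
	int cont;		/* non-zero models the LOG_CONT flag */
	uint32_t caller_id;	/* top bit set: CPU id, else thread id */
};

/* Returns what snprintf() returns: the untruncated header length. */
static int format_ext_header(char *buf, size_t size,
			     const struct ext_header_fields *f)
{
	char caller[20];

	/* Top bit selects 'C' (processor) or 'T' (thread), the rest is the id. */
	snprintf(caller, sizeof(caller), ",caller=%c%" PRIu32,
		 (f->caller_id & 0x80000000u) ? 'C' : 'T',
		 (uint32_t)(f->caller_id & ~0x80000000u));

	/* The timestamp is stored in nanoseconds and exported in microseconds. */
	return snprintf(buf, size, "%u,%" PRIu64 ",%" PRIu64 ",%c%s;",
			(f->facility << 3) | f->level, f->seq,
			f->ts_nsec / 1000, f->cont ? 'c' : '-', caller);
}
```

Feeding it level 6, sequence 496, a timestamp of 1222773000 ns and caller id 1 yields the prefix of the first /dev/kmsg sample in the commit message, `6,496,1222773,-,caller=T1;`.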
2020-09-21 14:18:45 +03:00
static ssize_t msg_add_ext_text(char *buf, size_t size,
				const char *text, size_t text_len,
				unsigned char endc)
2015-06-26 01:01:27 +03:00
{
	char *p = buf, *e = buf + size;
	size_t i;
	/* escape non-printable characters */
	for (i = 0; i < text_len; i++) {
		unsigned char c = text[i];
		if (c < ' ' || c >= 127 || c == '\\')
			p += scnprintf(p, e - p, "\\x%02x", c);
		else
			append_char(&p, e, c);
	}
2020-09-21 14:18:45 +03:00
	append_char(&p, e, endc);
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
	return p - buf;
}
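The escaping rule can be tried outside the kernel with a small sketch. It assumes standard snprintf() in place of scnprintf(), so the return value is clamped by hand to the number of bytes actually stored; `escape_ext_text()` is a hypothetical name, and like the kernel function it does not NUL-terminate the result.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy text into buf, replacing control bytes, bytes >= 127 and
 * backslashes with \xNN, then append endc. Returns the bytes stored.
 */
static size_t escape_ext_text(char *buf, size_t size,
			      const char *text, size_t text_len, char endc)
{
	char *p = buf, *e = buf + size;
	size_t i;

	for (i = 0; i < text_len; i++) {
		unsigned char c = (unsigned char)text[i];

		if (c < ' ' || c >= 127 || c == '\\') {
			size_t room = (size_t)(e - p);

			if (room > 0) {
				/* snprintf reports the untruncated length. */
				int n = snprintf(p, room, "\\x%02x",
						 (unsigned int)c);

				p += ((size_t)n < room) ? (size_t)n : room - 1;
			}
		} else if (p < e) {
			*p++ = (char)c;
		}
	}
	if (p < e)
		*p++ = endc;

	return (size_t)(p - buf);
}
```

Escaping the backslash itself is what keeps the output unambiguous: a reader can decode every `\xNN` sequence without confusing it with literal text.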
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
static ssize_t msg_add_dict_text(char *buf, size_t size,
				 const char *key, const char *val)
{
	size_t val_len = strlen(val);
	ssize_t len;
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
	if (!val_len)
		return 0;
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
	len = msg_add_ext_text(buf, size, "", 0, ' ');	/* dict prefix */
	len += msg_add_ext_text(buf + len, size - len, key, strlen(key), '=');
	len += msg_add_ext_text(buf + len, size - len, val, val_len, '\n');
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
	return len;
}
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
static ssize_t msg_print_ext_body(char *buf, size_t size,
				  char *text, size_t text_len,
				  struct dev_printk_info *dev_info)
{
	ssize_t len;
2015-06-26 01:01:27 +03:00
2020-09-21 14:18:45 +03:00
	len = msg_add_ext_text(buf, size, text, text_len, '\n');
	if (!dev_info)
		goto out;
	len += msg_add_dict_text(buf + len, size - len, "SUBSYSTEM",
				 dev_info->subsystem);
	len += msg_add_dict_text(buf + len, size - len, "DEVICE",
				 dev_info->device);
out:
	return len;
2015-06-26 01:01:27 +03:00
}
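On the reading side, the body produced above is one message line followed by zero or more dictionary lines, each starting with a single space. A minimal userspace sketch of a lookup over such a record follows; `kmsg_dict_lookup()` is a hypothetical helper, it expects a NUL-terminated record, and it returns the value still in escaped form.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find "KEY=value" among the space-prefixed lines after the message line.
 * Returns a pointer to the value and stores its length, or NULL if absent.
 */
static const char *kmsg_dict_lookup(const char *record, const char *key,
				    size_t *val_len)
{
	size_t key_len = strlen(key);
	const char *line = strchr(record, '\n');	/* end of message line */

	while (line && line[1] == ' ') {
		const char *entry = line + 2;
		const char *end = strchr(entry, '\n');

		if (!end)
			end = entry + strlen(entry);

		if ((size_t)(end - entry) > key_len &&
		    strncmp(entry, key, key_len) == 0 &&
		    entry[key_len] == '=') {
			*val_len = (size_t)(end - entry) - key_len - 1;
			return entry + key_len + 1;
		}
		/* Move on to the next dictionary line, if there is one. */
		line = (*end == '\n') ? end : NULL;
	}
	return NULL;
}
```

Because newlines inside the message or a value are escaped as `\x0a`, a raw newline always marks a line boundary, which is what makes this simple scan possible.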
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get overwritten faster than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-03 04:29:41 +04:00
/* /dev/kmsg - userspace message inject/listen interface */
struct devkmsg_user {
2021-03-03 13:15:22 +03:00
	atomic64_t seq;
2016-08-03 00:04:07 +03:00
	struct ratelimit_state rs;
2012-05-03 04:29:41 +04:00
	struct mutex lock;
2023-01-09 13:07:59 +03:00
	struct printk_buffers pbufs;
2012-05-03 04:29:41 +04:00
} ;
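A userspace consumer of /dev/kmsg has to split each record into its header fields and message text. The following is a sketch of such a parser, not the kernel's or util-linux's code; `struct kmsg_header` and `kmsg_parse_header()` are hypothetical names. It accepts both the three-field prefix shown in the commit message samples and the longer prefix with a flag and further fields such as `caller=`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed prefix of one record. */
struct kmsg_header {
	unsigned int facility;
	unsigned int level;
	uint64_t seq;
	uint64_t ts_usec;
	char flag;	/* '-' when the record carries no flag field */
};

/* Returns a pointer to the message text after ';', or NULL if malformed. */
static const char *kmsg_parse_header(const char *rec, struct kmsg_header *h)
{
	char *end;
	unsigned long prio = strtoul(rec, &end, 10);

	if (end == rec || *end != ',')
		return NULL;
	/* The lower 3 bits are the level, the rest is the facility. */
	h->facility = (unsigned int)(prio >> 3);
	h->level = (unsigned int)(prio & 7);

	rec = end + 1;
	h->seq = strtoull(rec, &end, 10);
	if (end == rec || *end != ',')
		return NULL;

	rec = end + 1;
	h->ts_usec = strtoull(rec, &end, 10);
	if (end == rec)
		return NULL;

	h->flag = '-';
	if (*end == ',')
		h->flag = end[1];

	/* Skip any further comma-separated fields up to the ';'. */
	end = strchr(end, ';');
	return end ? end + 1 : NULL;
}
```

Comparing `seq` between consecutive records gives the number of lost messages after a read() that failed with -EPIPE, as the commit message describes.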
2018-11-24 07:10:25 +03:00
static __printf(3, 4) __cold
int devkmsg_emit(int facility, int level, const char *fmt, ...)
{
	va_list args;
	int r;
	va_start(args, fmt);
2020-09-21 14:18:45 +03:00
	r = vprintk_emit(facility, level, NULL, fmt, args);
2018-11-24 07:10:25 +03:00
	va_end(args);
	return r;
}
2014-08-23 20:23:53 +04:00
static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
2012-05-03 04:29:41 +04:00
{
	char *buf, *line;
	int level = default_message_loglevel;
	int facility = 1;	/* LOG_USER */
2016-08-03 00:04:07 +03:00
	struct file *file = iocb->ki_filp;
	struct devkmsg_user *user = file->private_data;
2015-02-11 21:56:46 +03:00
	size_t len = iov_iter_count(from);
2012-05-03 04:29:41 +04:00
	ssize_t ret = len;
2023-03-20 10:02:01 +03:00
	if (len > PRINTKRB_RECORD_MAX)
2012-05-03 04:29:41 +04:00
		return -EINVAL;
2016-08-03 00:04:07 +03:00
	/* Ignore when user logging is disabled. */
	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
		return len;
	/* Ratelimit when not explicitly enabled. */
	if (!(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
		if (!___ratelimit(&user->rs, current->comm))
			return ret;
	}
2012-05-03 04:29:41 +04:00
	buf = kmalloc(len + 1, GFP_KERNEL);
	if (buf == NULL)
		return -ENOMEM;
2014-08-23 20:23:53 +04:00
	buf[len] = '\0';
2016-11-02 05:09:04 +03:00
	if (!copy_from_iter_full(buf, len, from)) {
2014-08-23 20:23:53 +04:00
		kfree(buf);
		return -EFAULT;
2012-05-03 04:29:41 +04:00
	}
	/*
	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
	 * the decimal value represents 32bit, the lower 3 bit are the log
	 * level, the rest are the log facility.
	 *
	 * If no prefix or no userspace facility is specified, we
	 * enforce LOG_USER, to be able to reliably distinguish
	 * kernel-generated messages from userspace-injected ones.
	 */
	line = buf;
	if (line[0] == '<') {
		char *endp = NULL;
2015-11-07 03:30:38 +03:00
		unsigned int u;
2012-05-03 04:29:41 +04:00
2015-11-07 03:30:38 +03:00
		u = simple_strtoul(line + 1, &endp, 10);
2012-05-03 04:29:41 +04:00
		if (endp && endp[0] == '>') {
2015-11-07 03:30:38 +03:00
			level = LOG_LEVEL(u);
			if (LOG_FACILITY(u) != 0)
				facility = LOG_FACILITY(u);
2012-05-03 04:29:41 +04:00
			endp++;
			line = endp;
		}
	}
2018-11-24 07:10:25 +03:00
	devkmsg_emit(facility, level, "%s", line);
2012-05-03 04:29:41 +04:00
	kfree(buf);
	return ret;
}
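The prefix handling at the end of devkmsg_write() above can be mirrored in user space. The following is a minimal sketch, not the kernel's code: the names `split_syslog_prefix` and `struct prefix_result` are hypothetical, the defaults are passed in by the caller, and the facility is taken as everything above the low 3 bits without further masking (an assumption of this sketch).

```c
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct prefix_result {
	unsigned int level;	/* low 3 bits of the prefix value */
	unsigned int facility;	/* remaining bits of the prefix value */
	const char *text;	/* message text after the prefix, if any */
};

/*
 * Split an optional "<N>" prefix off a message.
 * def_level and def_facility are used when no valid prefix is present;
 * a facility of 0 in the prefix also falls back to def_facility.
 */
static struct prefix_result split_syslog_prefix(const char *line,
						unsigned int def_level,
						unsigned int def_facility)
{
	struct prefix_result res = { def_level, def_facility, line };

	if (line[0] == '<') {
		char *endp = NULL;
		unsigned long u = strtoul(line + 1, &endp, 10);

		/* Only accept the prefix if the number is closed by '>'. */
		if (endp && endp[0] == '>') {
			res.level = (unsigned int)(u & 7);
			if ((u >> 3) != 0)
				res.facility = (unsigned int)(u >> 3);
			res.text = endp + 1;
		}
	}
	return res;
}
```

With a default facility of 1 (LOG_USER), the input `<30>udevd[80]: starting version 181` splits into level 6 and facility 3, because 30 = 3 * 8 + 6, while `<6>hello` keeps the default facility because its facility bits are zero.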
static ssize_t devkmsg_read(struct file *file, char __user *buf,
			    size_t count, loff_t *ppos)
{
	struct devkmsg_user *user = file->private_data;
2023-01-09 13:07:59 +03:00
	char *outbuf = &user->pbufs.outbuf[0];
	struct printk_message pmsg = {
		.pbufs = &user->pbufs,
	};
2012-05-03 04:29:41 +04:00
	ssize_t ret;
printk: use mutex lock to stop syslog_seq from going wild
Although syslog_seq and log_next_seq stuff are protected by logbuf_lock
spin log, it's not enough. Say we have two processes A and B, and let
syslog_seq = N, while log_next_seq = N + 1, and the two processes both
come to syslog_print at almost the same time. And No matter which
process get the spin lock first, it will increase syslog_seq by one,
then release spin lock; thus later, another process increase syslog_seq
by one again. In this case, syslog_seq is bigger than syslog_next_seq.
And latter, it would make:
wait_event_interruptiable(log_wait, syslog != log_next_seq)
don't wait any more even there is no new write comes. Thus it introduce
a infinite loop reading.
I can easily see this kind of issue by the following steps:
# cat /proc/kmsg # at meantime, I don't kill rsyslog
# So they are the two processes.
# xinit # I added drm.debug=6 in the kernel parameter line,
# so that it will produce lots of message and let that
# issue happen
It's 100% reproducable on my side. And my disk will be filled up by
/var/log/messages in a quite short time.
So, introduce a mutex_lock to stop syslog_seq from going wild just like
what devkmsg_read() does. It does fix this issue as expected.
v2: use mutex_lock_interruptiable() instead (comments from Kay)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-By: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-16 17:21:51 +04:00
	ret = mutex_lock_interruptible(&user->lock);
	if (ret)
		return ret;
2016-12-27 17:16:11 +03:00
2023-01-09 13:07:59 +03:00
	if (!printk_get_next_message(&pmsg, atomic64_read(&user->seq), true, false)) {
2012-05-03 04:29:41 +04:00
		if (file->f_flags & O_NONBLOCK) {
			ret = -EAGAIN;
			goto out;
		}
2022-04-22 00:22:38 +03:00
		/*
		 * Guarantee this task is visible on the waitqueue before
		 * checking the wake condition.
		 *
		 * The full memory barrier within set_current_state() of
		 * prepare_to_wait_event() pairs with the full memory barrier
		 * within wq_has_sleeper().
		 *
2022-04-22 00:22:40 +03:00
		 * This pairs with __wake_up_klogd:A.
2022-04-22 00:22:38 +03:00
		 */
2012-05-03 04:29:41 +04:00
		ret = wait_event_interruptible(log_wait,
2023-01-09 13:07:59 +03:00
				printk_get_next_message(&pmsg, atomic64_read(&user->seq), true,
							false)); /* LMM(devkmsg_read:A) */
2012-05-03 04:29:41 +04:00
		if (ret)
			goto out;
	}
2023-01-09 13:07:59 +03:00
	if (pmsg.dropped) {
2012-05-03 04:29:41 +04:00
		/* our last seen message is gone, return error and reset */
2023-01-09 13:07:59 +03:00
		atomic64_set(&user->seq, pmsg.seq);
2012-05-03 04:29:41 +04:00
		ret = -EPIPE;
		goto out;
	}
2023-01-09 13:07:59 +03:00
	atomic64_set(&user->seq, pmsg.seq + 1);
2012-05-03 04:29:41 +04:00
2023-01-09 13:07:59 +03:00
	if (pmsg.outbuf_len > count) {
2012-05-03 04:29:41 +04:00
		ret = -EINVAL;
		goto out;
	}
2023-01-09 13:07:59 +03:00
	if (copy_to_user(buf, outbuf, pmsg.outbuf_len)) {
2012-05-03 04:29:41 +04:00
		ret = -EFAULT;
		goto out;
	}
2023-01-09 13:07:59 +03:00
	ret = pmsg.outbuf_len;
2012-05-03 04:29:41 +04:00
out:
	mutex_unlock(&user->lock);
	return ret;
}
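Each successful read() from devkmsg_read() returns one whole record, and after -EPIPE the reader position has already been moved to the next available record, so a consumer can retry and compare sequence numbers to count what was lost. The following is a rough user-space sketch under stated assumptions: the names `kmsg_header`, `kmsg_parse_header` and `kmsg_lost` are hypothetical, and the record layout `prio,seq,timestamp;message` is taken from the sample output in the commit message. Any additional comma-separated fields before the `;` are simply ignored here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one /dev/kmsg record header. */
struct kmsg_header {
	unsigned int prio;		/* (facility << 3) | level */
	unsigned long long seq;		/* record sequence number */
	unsigned long long ts;		/* timestamp field as printed */
	const char *msg;		/* text after the ';' separator */
};

/* Parse "prio,seq,timestamp[,...];message". Returns 0 on success, -1 on error. */
static int kmsg_parse_header(const char *rec, struct kmsg_header *h)
{
	const char *sep = strchr(rec, ';');

	if (!sep)
		return -1;
	if (sscanf(rec, "%u,%llu,%llu", &h->prio, &h->seq, &h->ts) != 3)
		return -1;
	h->msg = sep + 1;
	return 0;
}

/* Number of records missing between the last seen record and the next one. */
static unsigned long long kmsg_lost(unsigned long long last_seq,
				    unsigned long long next_seq)
{
	return next_seq > last_seq + 1 ? next_seq - last_seq - 1 : 0;
}
```

For the sample records with sequence numbers 160 and 339, `kmsg_lost(160, 339)` reports 178 records that were not seen in between.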
2020-07-10 20:44:23 +03:00
/*
 * Be careful when modifying this function!!!
 *
 * Only a few operations are supported because the device works only with
 * entire variable-length messages (records). Non-standard values are
 * returned in the other cases, and it has been this way for quite some time.
 * User space applications might depend on this behavior.
 */
2012-05-03 04:29:41 +04:00
static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
	struct devkmsg_user *user = file->private_data;
	loff_t ret = 0;
	if (offset)
		return -ESPIPE;
	switch (whence) {
	case SEEK_SET:
		/* the first record */
2021-03-03 13:15:22 +03:00
		atomic64_set(&user->seq, prb_first_valid_seq(prb));
2012-05-03 04:29:41 +04:00
		break;
	case SEEK_DATA:
		/*
		 * The first record after the last SYSLOG_ACTION_CLEAR,
		 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
		 * changes no global state, and does not clear anything.
		 */
2021-03-03 13:15:22 +03:00
		atomic64_set(&user->seq, latched_seq_read_nolock(&clear_seq));
2012-05-03 04:29:41 +04:00
		break;
	case SEEK_END:
		/* after the last record */
2021-03-03 13:15:22 +03:00
		atomic64_set(&user->seq, prb_next_seq(prb));
2012-05-03 04:29:41 +04:00
break ;
default :
ret = - EINVAL ;
}
return ret ;
}
2017-07-03 07:42:43 +03:00
static __poll_t devkmsg_poll ( struct file * file , poll_table * wait )
2012-05-03 04:29:41 +04:00
{
struct devkmsg_user * user = file - > private_data ;
2021-02-11 20:31:52 +03:00
struct printk_info info ;
2017-07-03 07:42:43 +03:00
__poll_t ret = 0 ;
2012-05-03 04:29:41 +04:00
poll_wait ( file , & log_wait , wait ) ;
2021-03-03 13:15:22 +03:00
if ( prb_read_valid_info ( prb , atomic64_read ( & user - > seq ) , & info , NULL ) ) {
2012-05-03 04:29:41 +04:00
/* return error when data has vanished underneath us */
2021-03-03 13:15:22 +03:00
if ( info . seq ! = atomic64_read ( & user - > seq ) )
2018-02-12 01:34:03 +03:00
ret = EPOLLIN | EPOLLRDNORM | EPOLLERR | EPOLLPRI ;
2013-04-30 03:17:20 +04:00
else
2018-02-12 01:34:03 +03:00
ret = EPOLLIN | EPOLLRDNORM ;
2012-05-03 04:29:41 +04:00
}
return ret ;
}
static int devkmsg_open ( struct inode * inode , struct file * file )
{
struct devkmsg_user * user ;
int err ;
2016-08-03 00:04:07 +03:00
if ( devkmsg_log & DEVKMSG_LOG_MASK_OFF )
return - EPERM ;
2012-05-03 04:29:41 +04:00
2016-08-03 00:04:07 +03:00
/* write-only does not need any file context */
if ( ( file - > f_flags & O_ACCMODE ) ! = O_WRONLY ) {
err = check_syslog_permissions ( SYSLOG_ACTION_READ_ALL ,
SYSLOG_FROM_READER ) ;
if ( err )
return err ;
}
2012-05-03 04:29:41 +04:00
2021-08-30 10:17:01 +03:00
user = kvmalloc ( sizeof ( struct devkmsg_user ) , GFP_KERNEL ) ;
2012-05-03 04:29:41 +04:00
if ( ! user )
return - ENOMEM ;
2016-08-03 00:04:07 +03:00
ratelimit_default_init ( & user - > rs ) ;
ratelimit_set_flags ( & user - > rs , RATELIMIT_MSG_ON_RELEASE ) ;
2012-05-03 04:29:41 +04:00
mutex_init ( & user - > lock ) ;
2021-03-03 13:15:22 +03:00
atomic64_set ( & user - > seq , prb_first_valid_seq ( prb ) ) ;
2012-05-03 04:29:41 +04:00
file - > private_data = user ;
return 0 ;
}
static int devkmsg_release ( struct inode * inode , struct file * file )
{
struct devkmsg_user * user = file - > private_data ;
2016-08-03 00:04:07 +03:00
ratelimit_state_exit ( & user - > rs ) ;
2012-05-03 04:29:41 +04:00
mutex_destroy ( & user - > lock ) ;
2021-08-30 10:17:01 +03:00
kvfree ( user ) ;
2012-05-03 04:29:41 +04:00
return 0 ;
}
const struct file_operations kmsg_fops = {
. open = devkmsg_open ,
. read = devkmsg_read ,
2014-08-23 20:23:53 +04:00
. write_iter = devkmsg_write ,
2012-05-03 04:29:41 +04:00
. llseek = devkmsg_llseek ,
. poll = devkmsg_poll ,
. release = devkmsg_release ,
} ;
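The record format shown in the /dev/kmsg commit message (priority, sequence number, timestamp, then `;` and the message text) can be consumed from user space with a small parser. The following is a minimal sketch, not the kernel's or any library's code: `struct kmsg_hdr`, `kmsg_parse()` and `kmsg_lost()` are hypothetical names introduced here, and the timestamp is assumed to be in microseconds. The sketch reads only the first three comma-separated fields, so any additional header fields before the `;` are ignored.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space view of one /dev/kmsg record header. */
struct kmsg_hdr {
	unsigned int prio;   /* syslog priority value from the first field */
	uint64_t seq;        /* record sequence number */
	uint64_t ts_usec;    /* timestamp, assumed to be microseconds */
	const char *text;    /* points into the input line, after the ';' */
};

/*
 * Parse "prio,seq,timestamp[,...];text". Returns 0 on success and -1 if
 * the line is not a record header (for example a continuation line such
 * as "SUBSYSTEM=acpi").
 */
static int kmsg_parse(const char *line, struct kmsg_hdr *out)
{
	const char *semi = strchr(line, ';');

	if (!semi)
		return -1;
	if (sscanf(line, "%u,%" SCNu64 ",%" SCNu64,
		   &out->prio, &out->seq, &out->ts_usec) != 3)
		return -1;
	out->text = semi + 1;
	return 0;
}

/*
 * Number of records missing between the last sequence number a reader
 * saw and the next one it received. Consecutive numbers mean no loss.
 */
static uint64_t kmsg_lost(uint64_t last_seq, uint64_t next_seq)
{
	return next_seq > last_seq + 1 ? next_seq - last_seq - 1 : 0;
}
```

A reader that gets -EPIPE from read() could remember the last sequence number it parsed, read again, and pass both numbers to `kmsg_lost()` to report how many records were overwritten in between.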
2017-05-09 01:56:18 +03:00
# ifdef CONFIG_CRASH_CORE
2009-04-03 03:58:57 +04:00
/*
2013-11-13 03:08:54 +04:00
* This appends the listed symbols to / proc / vmcore
2009-04-03 03:58:57 +04:00
*
2013-11-13 03:08:54 +04:00
* / proc / vmcore is used by various utilities , like crash and makedumpfile to
2009-04-03 03:58:57 +04:00
* obtain access to symbols that are otherwise very difficult to locate . These
* symbols are specifically used so that utilities can access and extract the
* dmesg log from a vmcore file after a crash .
*/
2017-05-09 01:56:18 +03:00
void log_buf_vmcoreinfo_setup ( void )
2009-04-03 03:58:57 +04:00
{
2020-09-21 14:18:45 +03:00
struct dev_printk_info * dev_info = NULL ;
2020-07-09 16:23:44 +03:00
VMCOREINFO_SYMBOL ( prb ) ;
VMCOREINFO_SYMBOL ( printk_rb_static ) ;
VMCOREINFO_SYMBOL ( clear_seq ) ;
2012-07-18 21:18:12 +04:00
/*
2020-07-09 16:23:44 +03:00
* Export struct size and field offsets . User space tools can
2012-07-18 21:18:12 +04:00
* parse it and detect any changes to structure down the line .
*/
2020-07-09 16:23:44 +03:00
VMCOREINFO_STRUCT_SIZE ( printk_ringbuffer ) ;
VMCOREINFO_OFFSET ( printk_ringbuffer , desc_ring ) ;
VMCOREINFO_OFFSET ( printk_ringbuffer , text_data_ring ) ;
VMCOREINFO_OFFSET ( printk_ringbuffer , fail ) ;
VMCOREINFO_STRUCT_SIZE ( prb_desc_ring ) ;
VMCOREINFO_OFFSET ( prb_desc_ring , count_bits ) ;
VMCOREINFO_OFFSET ( prb_desc_ring , descs ) ;
2020-09-19 01:34:19 +03:00
VMCOREINFO_OFFSET ( prb_desc_ring , infos ) ;
2020-07-09 16:23:44 +03:00
VMCOREINFO_OFFSET ( prb_desc_ring , head_id ) ;
VMCOREINFO_OFFSET ( prb_desc_ring , tail_id ) ;
VMCOREINFO_STRUCT_SIZE ( prb_desc ) ;
VMCOREINFO_OFFSET ( prb_desc , state_var ) ;
VMCOREINFO_OFFSET ( prb_desc , text_blk_lpos ) ;
VMCOREINFO_STRUCT_SIZE ( prb_data_blk_lpos ) ;
VMCOREINFO_OFFSET ( prb_data_blk_lpos , begin ) ;
VMCOREINFO_OFFSET ( prb_data_blk_lpos , next ) ;
VMCOREINFO_STRUCT_SIZE ( printk_info ) ;
VMCOREINFO_OFFSET ( printk_info , seq ) ;
VMCOREINFO_OFFSET ( printk_info , ts_nsec ) ;
VMCOREINFO_OFFSET ( printk_info , text_len ) ;
VMCOREINFO_OFFSET ( printk_info , caller_id ) ;
2020-09-21 14:18:45 +03:00
VMCOREINFO_OFFSET ( printk_info , dev_info ) ;
VMCOREINFO_STRUCT_SIZE ( dev_printk_info ) ;
VMCOREINFO_OFFSET ( dev_printk_info , subsystem ) ;
VMCOREINFO_LENGTH ( printk_info_subsystem , sizeof ( dev_info - > subsystem ) ) ;
VMCOREINFO_OFFSET ( dev_printk_info , device ) ;
VMCOREINFO_LENGTH ( printk_info_device , sizeof ( dev_info - > device ) ) ;
2020-07-09 16:23:44 +03:00
VMCOREINFO_STRUCT_SIZE ( prb_data_ring ) ;
VMCOREINFO_OFFSET ( prb_data_ring , size_bits ) ;
VMCOREINFO_OFFSET ( prb_data_ring , data ) ;
VMCOREINFO_OFFSET ( prb_data_ring , head_lpos ) ;
VMCOREINFO_OFFSET ( prb_data_ring , tail_lpos ) ;
VMCOREINFO_SIZE ( atomic_long_t ) ;
VMCOREINFO_TYPE_OFFSET ( atomic_long_t , counter ) ;
2021-03-03 13:15:21 +03:00
VMCOREINFO_STRUCT_SIZE ( latched_seq ) ;
VMCOREINFO_OFFSET ( latched_seq , val ) ;
2009-04-03 03:58:57 +04:00
}
# endif
2011-05-25 04:13:20 +04:00
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len ;
2014-08-07 03:08:52 +04:00
/* we practice scaling the ring buffer by powers of 2 */
2018-09-29 19:45:53 +03:00
static void __init log_buf_len_update ( u64 size )
2005-04-17 02:20:36 +04:00
{
2018-09-29 19:45:53 +03:00
if ( size > ( u64 ) LOG_BUF_LEN_MAX ) {
size = ( u64 ) LOG_BUF_LEN_MAX ;
pr_err ( " log_buf over 2G is not supported. \n " ) ;
}
2005-04-17 02:20:36 +04:00
if ( size )
size = roundup_pow_of_two ( size ) ;
2011-05-25 04:13:20 +04:00
if ( size > log_buf_len )
2018-09-29 19:45:53 +03:00
new_log_buf_len = ( unsigned long ) size ;
2014-08-07 03:08:52 +04:00
}
/* save requested log_buf_len since it's too early to process it */
static int __init log_buf_len_setup ( char * str )
{
2018-09-29 19:45:53 +03:00
u64 size ;
2018-09-29 19:45:50 +03:00
if ( ! str )
return - EINVAL ;
size = memparse ( str , & str ) ;
2014-08-07 03:08:52 +04:00
log_buf_len_update ( size ) ;
2011-05-25 04:13:20 +04:00
return 0 ;
2005-04-17 02:20:36 +04:00
}
2011-05-25 04:13:20 +04:00
early_param ( " log_buf_len " , log_buf_len_setup ) ;
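The clamping and rounding done by log_buf_len_update() can be tried out in user space. This is a minimal sketch under stated assumptions, not the kernel code: the `sketch_` names are hypothetical, the 2 GiB cap is inferred from the "log_buf over 2G is not supported" message, and the current buffer length is passed in as a parameter instead of being read from log_buf_len.

```c
#include <stdint.h>

/* Assumed cap, taken from the "over 2G is not supported" message. */
#define SKETCH_LOG_BUF_LEN_MAX ((uint64_t)1 << 31)

/* Smallest power of two that is >= n, for 1 <= n <= 2^31. */
static uint64_t sketch_roundup_pow_of_two(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Returns the new buffer length that would be recorded for a request of
 * `size` bytes, or 0 if the request does not exceed the current length
 * `cur_len` and is therefore ignored.
 */
static uint64_t sketch_log_buf_len_request(uint64_t size, uint64_t cur_len)
{
	if (size > SKETCH_LOG_BUF_LEN_MAX)
		size = SKETCH_LOG_BUF_LEN_MAX;
	if (size)
		size = sketch_roundup_pow_of_two(size);
	return size > cur_len ? size : 0;
}
```

With a current length of 128 KiB, a request for 1000000 bytes is rounded up to 1 MiB, while a request for 100000 bytes rounds to 128 KiB and is ignored because it is not larger than the existing buffer.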
2014-10-14 02:51:11 +04:00
# ifdef CONFIG_SMP
# define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
2014-08-07 03:08:56 +04:00
static void __init log_buf_add_cpu ( void )
{
unsigned int cpu_extra ;
/*
* archs should set up cpu_possible_bits properly with
* set_cpu_possible ( ) after setup_arch ( ) but just in
* case let's ensure this is valid .
*/
if ( num_possible_cpus ( ) = = 1 )
return ;
cpu_extra = ( num_possible_cpus ( ) - 1 ) * __LOG_CPU_MAX_BUF_LEN ;
/* by default this will only continue through for large > 64 CPUs */
if ( cpu_extra < = __LOG_BUF_LEN / 2 )
return ;
pr_info ( " log_buf_len individual max cpu contribution: %d bytes \n " ,
__LOG_CPU_MAX_BUF_LEN ) ;
pr_info ( " log_buf_len total cpu_extra contributions: %d bytes \n " ,
cpu_extra ) ;
pr_info ( " log_buf_len min size: %d bytes \n " , __LOG_BUF_LEN ) ;
log_buf_len_update ( cpu_extra + __LOG_BUF_LEN ) ;
}
2014-10-14 02:51:11 +04:00
# else /* !CONFIG_SMP */
static inline void log_buf_add_cpu ( void ) { }
# endif /* CONFIG_SMP */
2014-08-07 03:08:56 +04:00
printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.
The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).
Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.
Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-03 14:30:02 +03:00
static void __init set_percpu_data_ready(void)
{
	__printk_percpu_data_ready = true;
}
2020-07-09 16:23:44 +03:00
static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
				     struct printk_record *r)
{
	struct prb_reserved_entry e;
	struct printk_record dest_r;
2020-09-19 01:34:21 +03:00
	prb_rec_init_wr(&dest_r, r->info->text_len);
2020-07-09 16:23:44 +03:00
	if (!prb_reserve(&e, rb, &dest_r))
		return 0;
2020-09-14 15:33:51 +03:00
	memcpy(&dest_r.text_buf[0], &r->text_buf[0], r->info->text_len);
	dest_r.info->text_len = r->info->text_len;
2020-07-09 16:23:44 +03:00
	dest_r.info->facility = r->info->facility;
	dest_r.info->level = r->info->level;
	dest_r.info->flags = r->info->flags;
	dest_r.info->ts_nsec = r->info->ts_nsec;
	dest_r.info->caller_id = r->info->caller_id;
2020-09-21 14:18:45 +03:00
	memcpy(&dest_r.info->dev_info, &r->info->dev_info, sizeof(dest_r.info->dev_info));
2020-07-09 16:23:44 +03:00
2020-09-14 15:33:54 +03:00
	prb_final_commit(&e);
2020-07-09 16:23:44 +03:00
	return prb_record_text_space(&e);
}
printk: adjust string limit macros
The various internal size limit macros have names and/or values that
do not fit well to their current usage.
Rename the macros so that their purpose is clear and, if needed,
provide a more appropriate value. In general, the new macros and
values will lead to less memory usage. The new macros are...
PRINTK_MESSAGE_MAX:
This is the maximum size for a formatted message on a console,
devkmsg, or syslog. It does not matter which format the message has
(normal or extended). It replaces the use of CONSOLE_EXT_LOG_MAX for
console and devkmsg. It replaces the use of CONSOLE_LOG_MAX for
syslog.
Historically, normal messages have been allowed to print up to 1kB,
whereas extended messages have been allowed to print up to 8kB.
However, the difference in lengths of these message types is not
significant and in multi-line records, normal messages are probably
larger. Also, because 1kB is only slightly above the allowed record
size, multi-line normal messages could be easily truncated during
formatting.
This new macro should be significantly larger than the allowed
record size to allow sufficient space for extended or multi-line
prefix text. A value of 2kB should be plenty of space. For normal
messages this represents a doubling of the historically allowed
amount. For extended messages it reduces the excessive 8kB size,
thus reducing memory usage needed for message formatting.
PRINTK_PREFIX_MAX:
This is the maximum size allowed for a record prefix (used by
console and syslog). It replaces PREFIX_MAX. The value is left
unchanged.
PRINTKRB_RECORD_MAX:
This is the maximum size allowed to be reserved for a record in the
ringbuffer. It is used by all readers and writers with the printk
ringbuffer. It replaces LOG_LINE_MAX.
Previously this was set to "1kB - PREFIX_MAX", which makes some
sense if 1kB is the limit for normal message output and prefixes are
enabled. However, with the allowance of larger output and the
existence of multi-line records, the value is rather bizarre.
Round the value up to 1kB.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230109100800.1085541-9-john.ogness@linutronix.de
2023-01-09 13:08:00 +03:00
static char setup_text_buf[PRINTKRB_RECORD_MAX] __initdata;
2020-07-09 16:23:44 +03:00
2011-05-25 04:13:20 +04:00
void __init setup_log_buf(int early)
{
2020-09-19 01:34:19 +03:00
	struct printk_info *new_infos;
2020-07-09 16:23:44 +03:00
	unsigned int new_descs_count;
	struct prb_desc *new_descs;
	struct printk_info info;
	struct printk_record r;
2021-07-15 22:33:56 +03:00
	unsigned int text_size;
2020-07-09 16:23:44 +03:00
	size_t new_descs_size;
2020-09-19 01:34:19 +03:00
	size_t new_infos_size;
2011-05-25 04:13:20 +04:00
	unsigned long flags;
	char *new_log_buf;
2018-10-10 14:33:08 +03:00
	unsigned int free;
2020-07-09 16:23:44 +03:00
	u64 seq;
2011-05-25 04:13:20 +04:00
2020-03-03 14:30:02 +03:00
	/*
	 * Some archs call setup_log_buf() multiple times - first is very
	 * early, e.g. from setup_arch(), and second - when percpu_areas
	 * are initialised.
	 */
	if (!early)
		set_percpu_data_ready();
2014-08-07 03:08:56 +04:00
	if (log_buf != __log_buf)
		return;
	if (!early && !new_log_buf_len)
		log_buf_add_cpu();
2011-05-25 04:13:20 +04:00
	if (!new_log_buf_len)
		return;
2005-04-17 02:20:36 +04:00
2020-07-09 16:23:44 +03:00
	new_descs_count = new_log_buf_len >> PRB_AVGBITS;
	if (new_descs_count == 0) {
		pr_err("new_log_buf_len: %lu too small\n", new_log_buf_len);
		return;
	}
2019-03-12 09:30:42 +03:00
	new_log_buf = memblock_alloc(new_log_buf_len, LOG_ALIGN);
2011-05-25 04:13:20 +04:00
	if (unlikely(!new_log_buf)) {
2020-07-09 16:23:44 +03:00
		pr_err("log_buf_len: %lu text bytes not available\n",
		       new_log_buf_len);
2011-05-25 04:13:20 +04:00
		return;
	}
2020-07-09 16:23:44 +03:00
	new_descs_size = new_descs_count * sizeof(struct prb_desc);
	new_descs = memblock_alloc(new_descs_size, LOG_ALIGN);
	if (unlikely(!new_descs)) {
		pr_err("log_buf_len: %zu desc bytes not available\n",
		       new_descs_size);
2020-09-19 01:34:21 +03:00
		goto err_free_log_buf;
2020-09-19 01:34:19 +03:00
	}
	new_infos_size = new_descs_count * sizeof(struct printk_info);
	new_infos = memblock_alloc(new_infos_size, LOG_ALIGN);
	if (unlikely(!new_infos)) {
		pr_err("log_buf_len: %zu info bytes not available\n",
		       new_infos_size);
		goto err_free_descs;
2020-07-09 16:23:44 +03:00
	}
2020-09-19 01:34:21 +03:00
	prb_rec_init_rd(&r, &info, &setup_text_buf[0], sizeof(setup_text_buf));
2020-07-09 16:23:44 +03:00
	prb_init(&printk_rb_dynamic,
		 new_log_buf, ilog2(new_log_buf_len),
2020-09-19 01:34:19 +03:00
		 new_descs, ilog2(new_descs_count),
		 new_infos);
2020-07-09 16:23:44 +03:00
2021-07-15 22:33:56 +03:00
	local_irq_save(flags);
2020-07-09 16:23:44 +03:00
2011-05-25 04:13:20 +04:00
	log_buf_len = new_log_buf_len;
	log_buf = new_log_buf;
	new_log_buf_len = 0;
2020-07-09 16:23:44 +03:00
	free = __LOG_BUF_LEN;
2021-07-15 22:33:56 +03:00
	prb_for_each_record(0, &printk_rb_static, seq, &r) {
		text_size = add_to_rb(&printk_rb_dynamic, &r);
		if (text_size > free)
			free = 0;
		else
			free -= text_size;
	}
2020-07-09 16:23:44 +03:00
	prb = &printk_rb_dynamic;
2021-07-15 22:33:56 +03:00
	local_irq_restore(flags);
	/*
	 * Copy any remaining messages that might have appeared from
	 * NMI context after copying but before switching to the
	 * dynamic buffer.
	 */
	prb_for_each_record(seq, &printk_rb_static, seq, &r) {
		text_size = add_to_rb(&printk_rb_dynamic, &r);
		if (text_size > free)
			free = 0;
		else
			free -= text_size;
	}
2011-05-25 04:13:20 +04:00
2020-07-09 16:23:44 +03:00
	if (seq != prb_next_seq(&printk_rb_static)) {
		pr_err("dropped %llu messages\n",
		       prb_next_seq(&printk_rb_static) - seq);
	}
2018-09-29 19:45:53 +03:00
	pr_info("log_buf_len: %u bytes\n", log_buf_len);
	pr_info("early log buf free: %u(%u%%)\n",
2011-05-25 04:13:20 +04:00
		free, (free * 100) / __LOG_BUF_LEN);
2020-09-19 01:34:19 +03:00
	return;
err_free_descs:
2021-11-05 23:43:22 +03:00
	memblock_free(new_descs, new_descs_size);
2020-09-19 01:34:19 +03:00
err_free_log_buf:
2021-11-05 23:43:22 +03:00
	memblock_free(new_log_buf, new_log_buf_len);
2011-05-25 04:13:20 +04:00
}
2005-04-17 02:20:36 +04:00
2012-12-18 03:59:56 +04:00
static bool __read_mostly ignore_loglevel;
static int __init ignore_loglevel_setup(char *str)
{
2014-08-07 03:09:12 +04:00
	ignore_loglevel = true;
2013-11-13 03:08:50 +04:00
	pr_info("debug: ignoring loglevel setting.\n");
2012-12-18 03:59:56 +04:00
	return 0;
}
early_param("ignore_loglevel", ignore_loglevel_setup);
module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
2015-02-13 02:01:34 +03:00
MODULE_PARM_DESC(ignore_loglevel,
		 "ignore loglevel setting (prints all kernel messages to the console)");
2012-12-18 03:59:56 +04:00
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such a console register_console() resets the
console_seq, console_idx, console_prev
and sets a `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
static bool suppress_message_printing(int level)
{
	return (level >= console_loglevel && !ignore_loglevel);
}
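The point of the commit message above is that the loglevel test is cheap and can run before any formatting work. The following userspace sketch illustrates that ordering under stated assumptions: console_loglevel and ignore_loglevel are passed as parameters rather than read from kernel globals, and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same predicate as suppress_message_printing(), with explicit inputs. */
static bool sketch_suppress(int level, int console_loglevel,
			    bool ignore_loglevel)
{
	return level >= console_loglevel && !ignore_loglevel;
}

/*
 * Count the records that would reach the (expensive) formatting step
 * when the level check is done first.
 */
static size_t sketch_count_formatted(const int *levels, size_t n,
				     int console_loglevel,
				     bool ignore_loglevel)
{
	size_t i, formatted = 0;

	for (i = 0; i < n; i++) {
		if (sketch_suppress(levels[i], console_loglevel,
				    ignore_loglevel))
			continue;	/* dropped before any formatting */
		formatted++;
	}
	return formatted;
}
```

For levels {3, 4, 6, 7} and a console loglevel of 4, only the level 3 record is formatted; with ignore_loglevel set, all four are.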
2007-10-16 12:23:46 +04:00
#ifdef CONFIG_BOOT_PRINTK_DELAY
2010-10-27 01:22:48 +04:00
static int boot_delay; /* msecs delay after each printk during bootup */
2009-09-23 03:43:31 +04:00
static unsigned long long loops_per_msec;	/* based on boot_delay */
2007-10-16 12:23:46 +04:00
static int __init boot_delay_setup(char *str)
{
	unsigned long lpj;
	lpj = preset_lpj ? preset_lpj : 1000000;	/* some guess */
	loops_per_msec = (unsigned long long)lpj / 1000 * HZ;
	get_option(&str, &boot_delay);
	if (boot_delay > 10 * 1000)
		boot_delay = 0;
2009-09-23 03:43:31 +04:00
	pr_debug("boot_delay: %u, preset_lpj: %ld, lpj: %lu, "
		 "HZ: %d, loops_per_msec: %llu\n",
		 boot_delay, preset_lpj, lpj, HZ, loops_per_msec);
2013-11-13 03:08:53 +04:00
	return 0;
2007-10-16 12:23:46 +04:00
}
2013-11-13 03:08:53 +04:00
early_param("boot_delay", boot_delay_setup);
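The arithmetic in boot_delay_setup() converts loops per jiffy into loops per millisecond, and boot_delay_msec() later multiplies that by the requested delay. A small userspace sketch of the same integer arithmetic follows; the sketch_* names are hypothetical, and lpj and HZ are passed in as plain parameters instead of being taken from the kernel.

```c
/*
 * loops per jiffy * jiffies per second / 1000 = loops per millisecond.
 * The division is done first, as in boot_delay_setup(), so the result
 * truncates for small lpj values.
 */
static unsigned long long sketch_loops_per_msec(unsigned long lpj,
						unsigned int hz)
{
	return (unsigned long long)lpj / 1000 * hz;
}

/* Busy-loop iterations used for one delay of boot_delay_ms milliseconds. */
static unsigned long long sketch_delay_loops(unsigned long lpj,
					     unsigned int hz,
					     int boot_delay_ms)
{
	return sketch_loops_per_msec(lpj, hz) * boot_delay_ms;
}
```

With the fallback guess of lpj = 1000000 and HZ = 250, this gives 250000 loops per millisecond, so boot_delay=10 spins for up to 2500000 iterations unless the jiffies timeout ends the loop first. Because the division comes first, lpj = 1500 with HZ = 100 yields 100 rather than 150.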
2007-10-16 12:23:46 +04:00
2012-12-18 03:59:56 +04:00
static void boot_delay_msec(int level)
2007-10-16 12:23:46 +04:00
{
	unsigned long long k;
	unsigned long timeout;
2017-05-16 21:42:45 +03:00
	if ((boot_delay == 0 || system_state >= SYSTEM_RUNNING)
2016-08-03 00:03:56 +03:00
		|| suppress_message_printing(level)) {
2007-10-16 12:23:46 +04:00
		return;
2012-12-18 03:59:56 +04:00
	}
2007-10-16 12:23:46 +04:00
2009-09-23 03:43:31 +04:00
	k = (unsigned long long)loops_per_msec * boot_delay;
2007-10-16 12:23:46 +04:00
	timeout = jiffies + msecs_to_jiffies(boot_delay);
	while (k) {
		k--;
		cpu_relax();
		/*
		 * use (volatile) jiffies to prevent
		 * compiler reduction; loop termination via jiffies
		 * is secondary and may or may not happen.
		 */
		if (time_after(jiffies, timeout))
			break;
		touch_nmi_watchdog();
	}
}
#else
2012-12-18 03:59:56 +04:00
static inline void boot_delay_msec(int level)
2007-10-16 12:23:46 +04:00
{
}
#endif
2014-08-07 03:09:05 +04:00
static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
2012-05-03 04:29:13 +04:00
module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
2018-12-11 12:49:05 +03:00
static size_t print_syslog(unsigned int level, char *buf)
2012-06-28 11:38:53 +04:00
{
2018-12-11 12:49:05 +03:00
	return sprintf(buf, "<%u>", level);
}
printk: fix incorrect length from print_time() when seconds > 99999
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e42 ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05 03:35:50 +04:00
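The fix described in the commit message above relies on snprintf(NULL, 0, ...) returning the number of characters the formatted output would need. A minimal userspace sketch of that technique follows, using the "[%5lu.%06lu] " format quoted in the commit message; the sketch_time_prefix_len() name is hypothetical and this is not the kernel function.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Return the length of the time prefix without writing it anywhere.
 * With a NULL buffer and size 0, snprintf() only reports how many
 * characters would have been produced.
 */
static size_t sketch_time_prefix_len(unsigned long sec, unsigned long usec)
{
	int n = snprintf(NULL, 0, "[%5lu.%06lu] ", sec, usec);

	return n < 0 ? 0 : (size_t)n;
}
```

For any seconds value below 100000 the result is the old constant 15; at 100000 seconds it becomes 16, which is the case the constant got wrong.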
2012-06-28 11:38:53 +04:00
static size_t print_time(u64 ts, char *buf)
{
2018-12-04 13:00:01 +03:00
	unsigned long rem_nsec = do_div(ts, 1000000000);
2012-06-28 11:38:53 +04:00
printk: Add caller information to printk() output.
Sometimes we want to print a series of printk() messages to consoles
without being disturbed by concurrent printk() from interrupts and/or
other threads. But we can't enforce printk() callers to use their local
buffers because we need to ask them to make too much changes. Also, even
buffering up to one line inside printk() might cause failing to emit
an important clue under critical situation.
Therefore, instead of trying to help buffering, let's try to help
reconstructing messages by saving caller information as of calling
log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
upon printing to consoles.
Some examples for console output:
[ 1.222773][ T1] x86: Booting SMP configuration:
[ 2.779635][ T1] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 5.069193][ T268] Fusion MPT base driver 3.04.20
[ 9.316504][ C2] random: fast init done
[ 13.413336][ T3355] Initialized host personality
Some examples for /dev/kmsg output:
6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
SUBSYSTEM=pci
DEVICE=+pci:0000:00:01.0
6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
5,1526,9316504,-,caller=C2;random: fast init done
6,1575,13413336,-,caller=T3355;Initialized host personality
Note that this patch changes max length of messages which can be printed
by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
based on an assumption that userspace won't try to write messages hitting
that border line to /dev/kmsg interface.
Link: http://lkml.kernel.org/r/93f19e57-5051-c67d-9af4-b17624062d44@i-love.sakura.ne.jp
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-12-18 00:05:04 +03:00
	return sprintf(buf, "[%5lu.%06lu]",
2012-06-28 11:38:53 +04:00
		       (unsigned long)ts, rem_nsec / 1000);
}
2018-12-18 00:05:04 +03:00
#ifdef CONFIG_PRINTK_CALLER
static size_t print_caller(u32 id, char *buf)
{
	char caller[12];
	snprintf(caller, sizeof(caller), "%c%u",
		 id & 0x80000000 ? 'C' : 'T', id & ~0x80000000);
	return sprintf(buf, "[%6s]", caller);
}
#else
#define print_caller(id, buf) 0
#endif
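print_caller() packs two cases into one 32-bit id: the top bit selects a processor id ("C") over a thread id ("T"), and the remaining 31 bits carry the number. The following userspace sketch reproduces that formatting under stated assumptions: sketch_print_caller() is a hypothetical name and it takes an explicit buffer size so it can use snprintf() throughout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a caller id as "[%6s]" around "C<cpu>" or "T<tid>".
 * 12 bytes hold one letter, up to 10 digits and the terminating NUL.
 * Returns the number of characters the prefix needs.
 */
static size_t sketch_print_caller(uint32_t id, char *buf, size_t size)
{
	char caller[12];
	int n;

	snprintf(caller, sizeof(caller), "%c%u",
		 (id & 0x80000000u) ? 'C' : 'T',
		 (unsigned int)(id & ~0x80000000u));
	n = snprintf(buf, size, "[%6s]", caller);
	return n < 0 ? 0 : (size_t)n;
}
```

An id of 1 produces a right-aligned "T1" inside the brackets and 0x80000002 produces "C2", matching the T1 and C2 callers in the console examples quoted in the commit message.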
2020-07-09 16:23:44 +03:00
static size_t info_print_prefix(const struct printk_info *info, bool syslog,
				bool time, char *buf)
2012-05-10 06:30:45 +04:00
{
2012-05-14 01:30:46 +04:00
	size_t len = 0;
2012-05-10 06:30:45 +04:00
2018-12-11 12:49:05 +03:00
	if (syslog)
2020-07-09 16:23:44 +03:00
		len = print_syslog((info->facility << 3) | info->level, buf);
2018-12-18 00:05:04 +03:00
2018-12-04 13:00:01 +03:00
	if (time)
2020-07-09 16:23:44 +03:00
		len += print_time(info->ts_nsec, buf + len);
2018-12-18 00:05:04 +03:00
2020-07-09 16:23:44 +03:00
	len += print_caller(info->caller_id, buf + len);
printk: Add caller information to printk() output.
Sometimes we want to print a series of printk() messages to consoles
without being disturbed by concurrent printk() from interrupts and/or
other threads. But we can't enforce printk() callers to use their local
buffers because we need to ask them to make too much changes. Also, even
buffering up to one line inside printk() might cause failing to emit
an important clue under critical situation.
Therefore, instead of trying to help buffering, let's try to help
reconstructing messages by saving caller information as of calling
log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
upon printing to consoles.
Some examples for console output:
[ 1.222773][ T1] x86: Booting SMP configuration:
[ 2.779635][ T1] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 5.069193][ T268] Fusion MPT base driver 3.04.20
[ 9.316504][ C2] random: fast init done
[ 13.413336][ T3355] Initialized host personality
Some examples for /dev/kmsg output:
6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
SUBSYSTEM=pci
DEVICE=+pci:0000:00:01.0
6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
5,1526,9316504,-,caller=C2;random: fast init done
6,1575,13413336,-,caller=T3355;Initialized host personality
Note that this patch changes max length of messages which can be printed
by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
based on an assumption that userspace won't try to write messages hitting
that border line to /dev/kmsg interface.
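The caller tag described in this commit message can be illustrated with a small userspace sketch. It is not the kernel's implementation: the function name format_caller(), the CALLER_IS_CPU flag, and the fixed field width of 6 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: top bit set means the id is a processor number. */
#define CALLER_IS_CPU 0x80000000u

/*
 * Format a caller id as "[T<thread_id>]" or "[C<processor_id>]", with the
 * tag right-aligned in a 6-character field (the width is an assumption).
 * Returns the length snprintf() reports, or 0 on a formatting error.
 */
static size_t format_caller(uint32_t id, char *buf, size_t size)
{
	char caller[12]; /* one letter + up to 10 digits + terminator */
	int n;

	assert(buf != NULL);

	snprintf(caller, sizeof(caller), "%c%u",
		 (id & CALLER_IS_CPU) ? 'C' : 'T',
		 (unsigned int)(id & ~CALLER_IS_CPU));
	n = snprintf(buf, size, "[%6s]", caller);

	return n < 0 ? 0 : (size_t)n;
}
```

With a 16-byte buffer, an id of 1 would be rendered as a thread tag and an id with the top bit set as a processor tag, which mirrors the T/C distinction in the examples above.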
Link: http://lkml.kernel.org/r/93f19e57-5051-c67d-9af4-b17624062d44@i-love.sakura.ne.jp
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-12-18 00:05:04 +03:00
if ( IS_ENABLED ( CONFIG_PRINTK_CALLER ) | | time ) {
buf [ len + + ] = ' ' ;
buf [ len ] = ' \0 ' ;
}
2012-05-14 01:30:46 +04:00
return len ;
2012-05-10 06:30:45 +04:00
}
2020-07-09 16:23:44 +03:00
/*
* Prepare the record for printing . The text is shifted within the given
* buffer to avoid a need for another one . The following operations are
* done :
*
* - Add prefix for each line .
2021-01-14 20:04:12 +03:00
* - Drop truncated lines that no longer fit into the buffer .
2020-07-09 16:23:44 +03:00
* - Add the trailing newline that has been removed in vprintk_store ( ) .
2021-01-14 20:04:12 +03:00
* - Add a string terminator .
*
* Since the produced string is always terminated , the maximum possible
* return value is @ r - > text_buf_size - 1 ;
2020-07-09 16:23:44 +03:00
*
* Return : The length of the updated / prepared text , including the added
2021-01-14 20:04:12 +03:00
* prefixes and the newline . The terminator is not counted . The dropped
* line ( s ) are not counted .
2020-07-09 16:23:44 +03:00
*/
static size_t record_print_text ( struct printk_record * r , bool syslog ,
bool time )
2012-05-03 04:29:13 +04:00
{
2020-07-09 16:23:44 +03:00
size_t text_len = r - > info - > text_len ;
size_t buf_size = r - > text_buf_size ;
char * text = r - > text_buf ;
printk: adjust string limit macros
The various internal size limit macros have names and/or values that
do not fit well to their current usage.
Rename the macros so that their purpose is clear and, if needed,
provide a more appropriate value. In general, the new macros and
values will lead to less memory usage. The new macros are...
PRINTK_MESSAGE_MAX:
This is the maximum size for a formatted message on a console,
devkmsg, or syslog. It does not matter which format the message has
(normal or extended). It replaces the use of CONSOLE_EXT_LOG_MAX for
console and devkmsg. It replaces the use of CONSOLE_LOG_MAX for
syslog.
Historically, normal messages have been allowed to print up to 1kB,
whereas extended messages have been allowed to print up to 8kB.
However, the difference in lengths of these message types is not
significant and in multi-line records, normal messages are probably
larger. Also, because 1kB is only slightly above the allowed record
size, multi-line normal messages could be easily truncated during
formatting.
This new macro should be significantly larger than the allowed
record size to allow sufficient space for extended or multi-line
prefix text. A value of 2kB should be plenty of space. For normal
messages this represents a doubling of the historically allowed
amount. For extended messages it reduces the excessive 8kB size,
thus reducing memory usage needed for message formatting.
PRINTK_PREFIX_MAX:
This is the maximum size allowed for a record prefix (used by
console and syslog). It replaces PREFIX_MAX. The value is left
unchanged.
PRINTKRB_RECORD_MAX:
This is the maximum size allowed to be reserved for a record in the
ringbuffer. It is used by all readers and writers with the printk
ringbuffer. It replaces LOG_LINE_MAX.
Previously this was set to "1kB - PREFIX_MAX", which makes some
sense if 1kB is the limit for normal message output and prefixes are
enabled. However, with the allowance of larger output and the
existence of multi-line records, the value is rather bizarre.
Round the value up to 1kB.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230109100800.1085541-9-john.ogness@linutronix.de
2023-01-09 13:08:00 +03:00
char prefix [ PRINTK_PREFIX_MAX ] ;
2020-07-09 16:23:44 +03:00
bool truncated = false ;
size_t prefix_len ;
size_t line_len ;
size_t len = 0 ;
char * next ;
2012-05-14 01:30:46 +04:00
2020-09-30 12:01:33 +03:00
/*
* If the message was truncated because the buffer was not large
* enough , treat the available text as if it were the full text .
*/
if ( text_len > buf_size )
text_len = buf_size ;
2012-05-14 01:30:46 +04:00
2020-07-09 16:23:44 +03:00
prefix_len = info_print_prefix ( r - > info , syslog , time , prefix ) ;
2012-05-14 01:30:46 +04:00
2020-07-09 16:23:44 +03:00
/*
* @ text_len : bytes of unprocessed text
* @ line_len : bytes of current line _without_ newline
* @ text : pointer to beginning of current line
* @ len : number of bytes prepared in r - > text_buf
*/
for ( ; ; ) {
next = memchr ( text , ' \n ' , text_len ) ;
2012-05-14 01:30:46 +04:00
if ( next ) {
2020-07-09 16:23:44 +03:00
line_len = next - text ;
2012-05-14 01:30:46 +04:00
} else {
2020-07-09 16:23:44 +03:00
/* Drop truncated line(s). */
if ( truncated )
break ;
line_len = text_len ;
2012-05-14 01:30:46 +04:00
}
2012-05-03 04:29:13 +04:00
2020-07-09 16:23:44 +03:00
/*
* Truncate the text if there is not enough space to add the
2021-01-14 20:04:12 +03:00
* prefix and a trailing newline and a terminator .
2020-07-09 16:23:44 +03:00
*/
2021-01-14 20:04:12 +03:00
if ( len + prefix_len + text_len + 1 + 1 > buf_size ) {
2020-07-09 16:23:44 +03:00
/* Drop even the current line if no space. */
2021-01-14 20:04:12 +03:00
if ( len + prefix_len + line_len + 1 + 1 > buf_size )
2012-05-14 01:30:46 +04:00
break ;
2012-05-03 04:29:13 +04:00
2021-01-14 20:04:12 +03:00
text_len = buf_size - len - prefix_len - 1 - 1 ;
2020-07-09 16:23:44 +03:00
truncated = true ;
2012-05-14 01:30:46 +04:00
}
2012-05-03 04:29:13 +04:00
2020-07-09 16:23:44 +03:00
memmove ( text + prefix_len , text , text_len ) ;
memcpy ( text , prefix , prefix_len ) ;
2021-01-14 20:04:12 +03:00
/*
* Increment the prepared length to include the text and
* prefix that were just moved + copied . Also increment for the
* newline at the end of this line . If this is the last line ,
* there is no newline , but it will be added immediately below .
*/
2020-07-09 16:23:44 +03:00
len + = prefix_len + line_len + 1 ;
if ( text_len = = line_len ) {
/*
2021-01-14 20:04:12 +03:00
* This is the last line . Add the trailing newline
* removed in vprintk_store ( ) .
2020-07-09 16:23:44 +03:00
*/
text [ prefix_len + line_len ] = ' \n ' ;
break ;
}
/*
* Advance beyond the added prefix and the related line with
* its newline .
*/
text + = prefix_len + line_len + 1 ;
/*
* The remaining text has only decreased by the line with its
* newline .
*
* Note that @ text_len can become zero . It happens when @ text
* ended with a newline ( either due to truncation or the
* original string ending with " \n \n " ) . The loop is correctly
* repeated and ( if not truncated ) an empty line with a prefix
* will be prepared .
*/
text_len - = line_len + 1 ;
}
2012-05-03 04:29:13 +04:00
2021-01-14 20:04:12 +03:00
/*
* If a buffer was provided , it will be terminated . Space for the
* string terminator is guaranteed to be available . The terminator is
* not counted in the return value .
*/
if ( buf_size > 0 )
2021-01-24 23:27:28 +03:00
r - > text_buf [ len ] = 0 ;
2021-01-14 20:04:12 +03:00
2012-05-03 04:29:13 +04:00
return len ;
}
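The in-place shifting done by record_print_text() can be tried outside the kernel with a sketch like the following. It follows the same loop structure, but takes a plain buffer and a ready-made prefix string instead of a struct printk_record; the name prefix_lines() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Insert @prefix before every line of the @text_len bytes stored in @buf,
 * add a trailing newline and a terminator, all within @buf_size bytes.
 * Lines that no longer fit are dropped. Returns the prepared length,
 * not counting the terminator.
 */
static size_t prefix_lines(char *buf, size_t buf_size, size_t text_len,
			   const char *prefix)
{
	size_t prefix_len;
	size_t line_len;
	size_t len = 0;
	bool truncated = false;
	char *text = buf;
	char *next;

	assert(buf != NULL && prefix != NULL);
	prefix_len = strlen(prefix);

	if (text_len > buf_size)
		text_len = buf_size;

	for (;;) {
		next = memchr(text, '\n', text_len);
		if (next) {
			line_len = (size_t)(next - text);
		} else {
			/* Drop the remainder of a truncated line. */
			if (truncated)
				break;
			line_len = text_len;
		}

		/* Need room for prefix, text, newline and terminator. */
		if (len + prefix_len + text_len + 1 + 1 > buf_size) {
			/* Drop even the current line if it cannot fit. */
			if (len + prefix_len + line_len + 1 + 1 > buf_size)
				break;
			text_len = buf_size - len - prefix_len - 1 - 1;
			truncated = true;
		}

		/* Shift the remaining text and put the prefix in the gap. */
		memmove(text + prefix_len, text, text_len);
		memcpy(text, prefix, prefix_len);
		len += prefix_len + line_len + 1;

		if (text_len == line_len) {
			/* Last line: add the trailing newline. */
			text[prefix_len + line_len] = '\n';
			break;
		}

		text += prefix_len + line_len + 1;
		text_len -= line_len + 1;
	}

	if (buf_size > 0)
		buf[len] = '\0';

	return len;
}
```

For a 64-byte buffer holding the four bytes "a\nbc" and the prefix "<6>", the sketch should leave "<6>a\n<6>bc\n" in the buffer. Shifting the text with memmove() is what lets a single buffer serve as both input and output.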
2020-07-09 16:23:44 +03:00
static size_t get_record_print_text_size ( struct printk_info * info ,
unsigned int line_count ,
bool syslog , bool time )
{
2023-01-09 13:08:00 +03:00
char prefix [ PRINTK_PREFIX_MAX ] ;
2020-07-09 16:23:44 +03:00
size_t prefix_len ;
prefix_len = info_print_prefix ( info , syslog , time , prefix ) ;
/*
* Each line will be preceded with a prefix . The intermediate
* newlines are already within the text , but a final trailing
* newline will be added .
*/
return ( ( prefix_len * line_count ) + info - > text_len + 1 ) ;
}
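The size estimate above is simple arithmetic, and a standalone sketch may make it easier to check by hand. The helper names count_lines_in() and printed_size() are hypothetical, and the assumption that a record with k embedded newlines counts as k + 1 lines is stated here rather than taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line counting: one line, plus one more per embedded newline. */
static unsigned int count_lines_in(const char *text, size_t text_len)
{
	unsigned int lines = 1;
	size_t i;

	assert(text != NULL);
	for (i = 0; i < text_len; i++) {
		if (text[i] == '\n')
			lines++;
	}
	return lines;
}

/*
 * One prefix per line, the text itself (intermediate newlines included),
 * plus one byte for the trailing newline that is added when printing.
 */
static size_t printed_size(size_t prefix_len, unsigned int line_count,
			   size_t text_len)
{
	return prefix_len * line_count + text_len + 1;
}
```

For the 4-byte text "a\nbc" with a 3-byte prefix this gives 3 * 2 + 4 + 1 = 11 bytes. The terminator is not part of the estimate.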
2021-03-03 13:15:19 +03:00
/*
* Beginning with @ start_seq , find the first record where it and all following
* records up to ( but not including ) @ max_seq fit into @ size .
*
* @ max_seq is simply an upper bound and does not need to exist . If the caller
* does not require an upper bound , - 1 can be used for @ max_seq .
*/
static u64 find_first_fitting_seq ( u64 start_seq , u64 max_seq , size_t size ,
bool syslog , bool time )
{
struct printk_info info ;
unsigned int line_count ;
size_t len = 0 ;
u64 seq ;
/* Determine the size of the records up to @max_seq. */
prb_for_each_info ( start_seq , prb , seq , & info , & line_count ) {
if ( info . seq > = max_seq )
break ;
len + = get_record_print_text_size ( & info , line_count , syslog , time ) ;
}
/*
* Adjust the upper bound for the next loop to avoid subtracting
* lengths that were never added .
*/
if ( seq < max_seq )
max_seq = seq ;
/*
* Move first record forward until length fits into the buffer . Ignore
* newest messages that were not counted in the above cycle . Messages
* might appear and get lost in the meantime . This is a best effort
* that prevents an infinite loop that could occur with a retry .
*/
prb_for_each_info ( start_seq , prb , seq , & info , & line_count ) {
if ( len < = size | | info . seq > = max_seq )
break ;
len - = get_record_print_text_size ( & info , line_count , syslog , time ) ;
}
return seq ;
}
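The two passes of find_first_fitting_seq() can be sketched over a plain array of record sizes. This is a simplified model: it ignores the upper bound and the fact that ringbuffer records can appear or vanish between the passes, and the name first_fitting_index() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the first index at or after @start such that the sizes from that
 * index up to @count sum to at most @budget. Returns @count if even the
 * last record alone does not fit.
 */
static size_t first_fitting_index(const size_t *sizes, size_t count,
				  size_t start, size_t budget)
{
	size_t total = 0;
	size_t i;

	assert(sizes != NULL || count == 0);

	/* First pass: total size of everything from @start onwards. */
	for (i = start; i < count; i++)
		total += sizes[i];

	/* Second pass: drop the oldest records until the rest fits. */
	for (i = start; i < count; i++) {
		if (total <= budget)
			break;
		total -= sizes[i];
	}

	return i;
}
```

With sizes {10, 20, 30, 40} and a budget of 70, the first two records are skipped and index 2 is returned. Summing first and subtracting afterwards keeps the work linear in the number of records.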
2021-07-15 22:33:59 +03:00
/* The caller is responsible for making sure @size is greater than 0. */
2012-05-03 04:29:13 +04:00
static int syslog_print ( char __user * buf , int size )
{
2020-07-09 16:23:44 +03:00
struct printk_info info ;
struct printk_record r ;
2012-05-03 04:29:13 +04:00
char * text ;
2012-06-22 19:36:09 +04:00
int len = 0 ;
2021-07-15 22:33:59 +03:00
u64 seq ;
2012-05-03 04:29:13 +04:00
2023-01-09 13:08:00 +03:00
text = kmalloc ( PRINTK_MESSAGE_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
2023-01-09 13:08:00 +03:00
prb_rec_init_rd ( & r , & info , text , PRINTK_MESSAGE_MAX ) ;
2020-07-09 16:23:44 +03:00
2021-07-15 22:33:59 +03:00
mutex_lock ( & syslog_lock ) ;
/*
* Wait for the @ syslog_seq record to be available . @ syslog_seq may
* change while waiting .
*/
do {
seq = syslog_seq ;
mutex_unlock ( & syslog_lock ) ;
2022-04-22 00:22:38 +03:00
/*
* Guarantee this task is visible on the waitqueue before
* checking the wake condition .
*
* The full memory barrier within set_current_state ( ) of
* prepare_to_wait_event ( ) pairs with the full memory barrier
* within wq_has_sleeper ( ) .
*
2022-04-22 00:22:40 +03:00
* This pairs with __wake_up_klogd : A .
2022-04-22 00:22:38 +03:00
*/
len = wait_event_interruptible ( log_wait ,
prb_read_valid ( prb , seq , NULL ) ) ; /* LMM(syslog_print:A) */
2021-07-15 22:33:59 +03:00
mutex_lock ( & syslog_lock ) ;
if ( len )
goto out ;
} while ( syslog_seq ! = seq ) ;
/*
* Copy records that fit into the buffer . The above cycle makes sure
* that the first record is always available .
*/
do {
2012-06-22 19:36:09 +04:00
size_t n ;
2012-07-09 21:05:10 +04:00
size_t skip ;
2021-07-15 22:33:59 +03:00
int err ;
2012-06-22 19:36:09 +04:00
2021-07-15 22:33:59 +03:00
if ( ! prb_read_valid ( prb , syslog_seq , & r ) )
2012-06-22 19:36:09 +04:00
break ;
2021-07-15 22:33:59 +03:00
2020-07-09 16:23:44 +03:00
if ( r . info - > seq ! = syslog_seq ) {
/* message is gone, move to next valid one */
syslog_seq = r . info - > seq ;
syslog_partial = 0 ;
}
2012-07-09 21:05:10 +04:00
2018-12-04 13:00:01 +03:00
/*
* To keep reading / counting partial line consistent ,
* use printk_time value as of the beginning of a line .
*/
if ( ! syslog_partial )
syslog_time = printk_time ;
2012-07-09 21:05:10 +04:00
skip = syslog_partial ;
2020-07-09 16:23:44 +03:00
n = record_print_text ( & r , true , syslog_time ) ;
2012-07-09 21:05:10 +04:00
if ( n - syslog_partial < = size ) {
/* message fits into buffer, move forward */
2020-07-09 16:23:44 +03:00
syslog_seq = r . info - > seq + 1 ;
2012-07-09 21:05:10 +04:00
n - = syslog_partial ;
syslog_partial = 0 ;
} else if ( ! len ) {
/* partial read(), remember position */
n = size ;
syslog_partial + = n ;
2012-06-22 19:36:09 +04:00
} else
n = 0 ;
if ( ! n )
break ;
2021-07-15 22:33:59 +03:00
mutex_unlock ( & syslog_lock ) ;
err = copy_to_user ( buf , text + skip , n ) ;
mutex_lock ( & syslog_lock ) ;
if ( err ) {
2012-06-22 19:36:09 +04:00
if ( ! len )
len = - EFAULT ;
break ;
}
2012-07-09 21:05:10 +04:00
len + = n ;
size - = n ;
buf + = n ;
2021-07-15 22:33:59 +03:00
} while ( size ) ;
out :
mutex_unlock ( & syslog_lock ) ;
2012-05-03 04:29:13 +04:00
kfree ( text ) ;
return len ;
}
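The partial-read bookkeeping in syslog_print() (the interplay of syslog_seq, syslog_partial and the skip offset) can be modelled in isolation. The sketch below is a simplified userspace model with no locking and no copy_to_user(); struct log_reader and consume() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reader state: current record and bytes already returned. */
struct log_reader {
	unsigned long long seq;
	size_t partial;
};

/*
 * Decide how many bytes of the current formatted record (@msg_len bytes)
 * to hand out when @room bytes are left in the user buffer. @first is true
 * while nothing has been copied in this call yet. The offset to copy from
 * is stored in @skip. Returns 0 when the caller should stop.
 */
static size_t consume(struct log_reader *rd, size_t msg_len, size_t room,
		      bool first, size_t *skip)
{
	size_t n;

	assert(rd != NULL && skip != NULL && rd->partial <= msg_len);
	*skip = rd->partial;

	if (msg_len - rd->partial <= room) {
		/* The rest of the record fits: move to the next record. */
		n = msg_len - rd->partial;
		rd->seq++;
		rd->partial = 0;
	} else if (first) {
		/* Partial read: fill the buffer and remember the position. */
		n = room;
		rd->partial += n;
	} else {
		/* Something was already copied: do not split this record. */
		n = 0;
	}

	return n;
}
```

Reading a 10-byte record through a 4-byte buffer would take three calls returning 4, 4 and 2 bytes, with the sequence number advancing only on the last one. A record is split only when it is the first thing returned by a read.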
static int syslog_print_all ( char __user * buf , int size , bool clear )
{
2020-07-09 16:23:44 +03:00
struct printk_info info ;
struct printk_record r ;
2012-05-03 04:29:13 +04:00
char * text ;
int len = 0 ;
2018-06-20 16:56:19 +03:00
u64 seq ;
2018-12-04 13:00:01 +03:00
bool time ;
2018-06-20 16:56:19 +03:00
2023-01-09 13:08:00 +03:00
text = kmalloc ( PRINTK_MESSAGE_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
2018-12-04 13:00:01 +03:00
time = printk_time ;
2018-06-20 16:56:19 +03:00
/*
* Find first record that fits , including all following records ,
* into the user - provided buffer for this dump .
*/
2021-03-03 13:15:21 +03:00
seq = find_first_fitting_seq ( latched_seq_read_nolock ( & clear_seq ) , - 1 ,
size , true , time ) ;
2012-05-03 04:29:13 +04:00
2023-01-09 13:08:00 +03:00
prb_rec_init_rd ( & r , & info , text , PRINTK_MESSAGE_MAX ) ;
2012-05-03 04:29:13 +04:00
2020-07-09 16:23:44 +03:00
prb_for_each_record ( seq , prb , seq , & r ) {
int textlen ;
2012-05-03 04:29:13 +04:00
2020-07-09 16:23:44 +03:00
textlen = record_print_text ( & r , true , time ) ;
if ( len + textlen > size ) {
seq - - ;
break ;
}
2012-05-03 04:29:13 +04:00
2018-06-20 16:56:19 +03:00
if ( copy_to_user ( buf + len , text , textlen ) )
len = - EFAULT ;
else
len + = textlen ;
2012-05-03 04:29:13 +04:00
2020-07-09 16:23:44 +03:00
if ( len < 0 )
break ;
2012-05-03 04:29:13 +04:00
}
2021-03-03 13:15:23 +03:00
if ( clear ) {
2021-07-15 22:33:58 +03:00
mutex_lock ( & syslog_lock ) ;
2021-03-03 13:15:21 +03:00
latched_seq_write ( & clear_seq , seq ) ;
2021-07-15 22:33:58 +03:00
mutex_unlock ( & syslog_lock ) ;
2021-03-03 13:15:23 +03:00
}
2012-05-03 04:29:13 +04:00
kfree ( text ) ;
return len ;
}
2018-06-27 18:06:41 +03:00
static void syslog_clear ( void )
{
2021-07-15 22:33:58 +03:00
mutex_lock ( & syslog_lock ) ;
2021-03-03 13:15:21 +03:00
latched_seq_write ( & clear_seq , prb_next_seq ( prb ) ) ;
2021-07-15 22:33:58 +03:00
mutex_unlock ( & syslog_lock ) ;
2021-03-03 13:15:23 +03:00
}
2015-06-26 01:01:47 +03:00
int do_syslog ( int type , char __user * buf , int len , int source )
2005-04-17 02:20:36 +04:00
{
2021-02-11 20:31:52 +03:00
struct printk_info info ;
2012-05-03 04:29:13 +04:00
bool clear = false ;
2014-12-11 02:50:15 +03:00
static int saved_console_loglevel = LOGLEVEL_DEFAULT ;
2011-02-11 04:53:55 +03:00
int error ;
2005-04-17 02:20:36 +04:00
2015-06-26 01:01:47 +03:00
error = check_syslog_permissions ( type , source ) ;
2011-02-11 04:53:55 +03:00
if ( error )
2017-07-30 06:36:36 +03:00
return error ;
2010-11-16 02:36:29 +03:00
2005-04-17 02:20:36 +04:00
switch ( type ) {
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_CLOSE : /* Close log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_OPEN : /* Open log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_READ : /* Read from log */
2005-04-17 02:20:36 +04:00
if ( ! buf | | len < 0 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( ! len )
2017-07-30 06:36:36 +03:00
return 0 ;
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted in this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 05:57:57 +03:00
if ( ! access_ok ( buf , len ) )
2017-07-30 06:36:36 +03:00
return - EFAULT ;
2012-05-03 04:29:13 +04:00
error = syslog_print ( buf , len ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR :
2012-05-03 04:29:13 +04:00
clear = true ;
2020-10-03 01:46:27 +03:00
fallthrough ;
2010-02-04 02:37:13 +03:00
/* Read last kernel messages */
case SYSLOG_ACTION_READ_ALL :
2005-04-17 02:20:36 +04:00
if ( ! buf | | len < 0 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( ! len )
2017-07-30 06:36:36 +03:00
return 0 ;
2019-01-04 05:57:57 +03:00
if ( ! access_ok ( buf , len ) )
2017-07-30 06:36:36 +03:00
return - EFAULT ;
2012-05-03 04:29:13 +04:00
error = syslog_print_all ( buf , len , clear ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR :
2018-06-27 18:06:41 +03:00
syslog_clear ( ) ;
2012-06-23 01:12:19 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel == LOGLEVEL_DEFAULT )
2009-07-06 15:31:48 +04:00
saved_console_loglevel = console_loglevel ;
2005-04-17 02:20:36 +04:00
console_loglevel = minimum_console_loglevel ;
break ;
2010-02-04 02:37:13 +03:00
/* Enable logging to console */
case SYSLOG_ACTION_CONSOLE_ON :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel != LOGLEVEL_DEFAULT ) {
2009-07-06 15:31:48 +04:00
console_loglevel = saved_console_loglevel ;
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2009-07-06 15:31:48 +04:00
}
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Set level of messages printed to console */
case SYSLOG_ACTION_CONSOLE_LEVEL :
2005-04-17 02:20:36 +04:00
if ( len < 1 || len > 8 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( len < minimum_console_loglevel )
len = minimum_console_loglevel ;
console_loglevel = len ;
2009-07-06 15:31:48 +04:00
/* Implicitly re-enable logging to console */
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD :
2021-07-15 22:33:58 +03:00
mutex_lock ( & syslog_lock ) ;
2021-02-11 20:31:52 +03:00
if ( ! prb_read_valid_info ( prb , syslog_seq , & info , NULL ) ) {
/* No unread messages. */
2021-07-15 22:33:58 +03:00
mutex_unlock ( & syslog_lock ) ;
2021-02-11 20:31:52 +03:00
return 0 ;
}
if ( info . seq != syslog_seq ) {
2012-05-03 04:29:13 +04:00
/* messages are gone, move to first one */
2021-02-11 20:31:52 +03:00
syslog_seq = info . seq ;
2012-07-09 21:05:10 +04:00
syslog_partial = 0 ;
2012-05-03 04:29:13 +04:00
}
2015-06-26 01:01:47 +03:00
if ( source == SYSLOG_FROM_PROC ) {
2012-05-03 04:29:13 +04:00
/*
* Short-cut for poll(/"proc/kmsg") which simply checks
* for pending data , not the size ; return the count of
* records , not the length .
*/
2020-07-09 16:23:44 +03:00
error = prb_next_seq ( prb ) - syslog_seq ;
2012-05-03 04:29:13 +04:00
} else {
2018-12-04 13:00:01 +03:00
bool time = syslog_partial ? syslog_time : printk_time ;
2020-07-09 16:23:44 +03:00
unsigned int line_count ;
u64 seq ;
prb_for_each_info ( syslog_seq , prb , seq , & info ,
& line_count ) {
error += get_record_print_text_size ( & info , line_count ,
true , time ) ;
2018-12-04 13:00:01 +03:00
time = printk_time ;
2012-05-03 04:29:13 +04:00
}
2012-07-09 21:05:10 +04:00
error -= syslog_partial ;
2012-05-03 04:29:13 +04:00
}
2021-07-15 22:33:58 +03:00
mutex_unlock ( & syslog_lock ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER :
2005-04-17 02:20:36 +04:00
error = log_buf_len ;
break ;
default :
error = - EINVAL ;
break ;
}
2017-07-30 06:36:36 +03:00
2005-04-17 02:20:36 +04:00
return error ;
}
2009-01-14 16:14:29 +03:00
SYSCALL_DEFINE3 ( syslog , int , type , char __user * , buf , int , len )
2005-04-17 02:20:36 +04:00
{
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. Newer versions of util-linux
dmesg(1) default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 01:04:39 +04:00
return do_syslog ( type , buf , len , SYSLOG_FROM_READER ) ;
2005-04-17 02:20:36 +04:00
}
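The access rules listed in the commit message above can be sketched as a small user-space model. This is an illustration of the policy as the message states it, not the kernel's actual permission check: the function name `model_syslog_permitted` and the two boolean parameters are hypothetical, and LSM hooks and other capability fallbacks are deliberately left out. The action and source numbers are assumed to match include/linux/syslog.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Action and source numbers, assumed to match include/linux/syslog.h. */
#define SYSLOG_ACTION_OPEN        1
#define SYSLOG_ACTION_READ        2
#define SYSLOG_ACTION_READ_ALL    3
#define SYSLOG_ACTION_CLEAR       5
#define SYSLOG_ACTION_SIZE_BUFFER 10

#define SYSLOG_FROM_READER 0
#define SYSLOG_FROM_PROC   1

/*
 * Hypothetical model of the policy described in the commit message.
 * Returns 0 when the action is allowed and -EPERM otherwise.
 */
static int model_syslog_permitted(int type, int source,
				  bool has_cap_syslog, bool dmesg_restrict)
{
	/* /proc/kmsg: once the file is open, every later action is allowed. */
	if (source == SYSLOG_FROM_PROC && type != SYSLOG_ACTION_OPEN)
		return 0;

	/* A caller with CAP_SYSLOG may do anything. */
	if (has_cap_syslog)
		return 0;

	/* Unprivileged callers get the two non-destructive actions only. */
	if (!dmesg_restrict &&
	    (type == SYSLOG_ACTION_READ_ALL ||
	     type == SYSLOG_ACTION_SIZE_BUFFER))
		return 0;

	return -EPERM;
}
```

In this model an unprivileged dmesg(1) keeps working while dmesg_restrict is 0, and a syslog daemon that opened /proc/kmsg with CAP_SYSLOG can keep issuing destructive reads after dropping privileges.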
2018-01-12 19:08:37 +03:00
/*
* Special console_lock variants that help to reduce the risk of soft-lockups .
* They allow passing console_lock to another printk ( ) call using a busy wait .
*/
# ifdef CONFIG_LOCKDEP
static struct lockdep_map console_owner_dep_map = {
. name = "console_owner"
} ;
# endif
static DEFINE_RAW_SPINLOCK ( console_owner_lock ) ;
static struct task_struct * console_owner ;
static bool console_waiter ;
/**
* console_lock_spinning_enable - mark beginning of code where another
* thread might safely busy wait
*
* This basically converts console_lock into a spinlock . This marks
* the section where the console_lock owner can not sleep , because
* there may be a waiter spinning ( like a spinlock ) . Also it must be
* ready to hand over the lock at the end of the section .
*/
static void console_lock_spinning_enable ( void )
{
raw_spin_lock ( & console_owner_lock ) ;
console_owner = current ;
raw_spin_unlock ( & console_owner_lock ) ;
/* The waiter may spin on us after setting console_owner */
spin_acquire ( & console_owner_dep_map , 0 , 0 , _THIS_IP_ ) ;
}
/**
* console_lock_spinning_disable_and_check - mark end of code where another
* thread was able to busy wait and check if there is a waiter
2023-01-16 15:56:34 +03:00
* @ cookie : cookie returned from console_srcu_read_lock ( )
2018-01-12 19:08:37 +03:00
*
* This is called at the end of the section where spinning is allowed .
* It has two functions . First , it is a signal that it is no longer
* safe to start busy waiting for the lock . Second , it checks if
* there is a busy waiter and passes the lock rights to her .
*
2022-11-16 19:21:27 +03:00
* Important : Callers lose both the console_lock and the SRCU read lock if
* there was a busy waiter . They must not touch items synchronized by
* console_lock or SRCU read lock in this case .
2018-01-12 19:08:37 +03:00
*
* Return : 1 if the lock rights were passed , 0 otherwise .
*/
2022-11-16 19:21:27 +03:00
static int console_lock_spinning_disable_and_check ( int cookie )
2018-01-12 19:08:37 +03:00
{
int waiter ;
raw_spin_lock ( & console_owner_lock ) ;
waiter = READ_ONCE ( console_waiter ) ;
console_owner = NULL ;
raw_spin_unlock ( & console_owner_lock ) ;
if ( ! waiter ) {
2019-09-19 19:09:40 +03:00
spin_release ( & console_owner_dep_map , _THIS_IP_ ) ;
2018-01-12 19:08:37 +03:00
return 0 ;
}
/* The waiter is now free to continue */
WRITE_ONCE ( console_waiter , false ) ;
2019-09-19 19:09:40 +03:00
spin_release ( & console_owner_dep_map , _THIS_IP_ ) ;
2018-01-12 19:08:37 +03:00
2022-11-16 19:21:27 +03:00
/*
* Preserve lockdep lock ordering . Release the SRCU read lock before
* releasing the console_lock .
*/
console_srcu_read_unlock ( cookie ) ;
2018-01-12 19:08:37 +03:00
/*
* Hand off console_lock to waiter . The waiter will perform
* the up ( ) . After this , the waiter is the console_lock owner .
*/
2019-09-19 19:09:40 +03:00
mutex_release ( & console_lock_dep_map , _THIS_IP_ ) ;
2018-01-12 19:08:37 +03:00
return 1 ;
}
/**
* console_trylock_spinning - try to get console_lock by busy waiting
*
* This allows busy waiting for the console_lock when the current
* owner is running in specially marked sections . It means that
* the current owner is running and cannot reschedule until it
* is ready to lose the lock .
*
* Return : 1 if we got the lock , 0 otherwise
*/
static int console_trylock_spinning ( void )
{
struct task_struct * owner = NULL ;
bool waiter ;
bool spin = false ;
unsigned long flags ;
if ( console_trylock ( ) )
return 1 ;
2022-02-02 20:18:19 +03:00
/*
* It's unsafe to spin once a panic has begun . If we are the
* panic CPU , we may have already halted the owner of the
* console_sem . If we are not the panic CPU , then we should
* avoid taking console_sem , so the panic CPU has a better
* chance of cleanly acquiring it later .
*/
if ( panic_in_progress ( ) )
return 0 ;
2018-01-12 19:08:37 +03:00
printk_safe_enter_irqsave ( flags ) ;
raw_spin_lock ( & console_owner_lock ) ;
owner = READ_ONCE ( console_owner ) ;
waiter = READ_ONCE ( console_waiter ) ;
if ( ! waiter && owner && owner != current ) {
WRITE_ONCE ( console_waiter , true ) ;
spin = true ;
}
raw_spin_unlock ( & console_owner_lock ) ;
/*
* If there is an active printk ( ) writing to the
* consoles , instead of having it write our data too ,
* see if we can offload that load from the active
* printer , and do some printing ourselves .
* Go into a spin only if there isn't already a waiter
* spinning , and there is an active printer , and
* that active printer isn't us ( recursive printk ? ) .
*/
if ( ! spin ) {
printk_safe_exit_irqrestore ( flags ) ;
return 0 ;
}
/* We spin waiting for the owner to release us */
spin_acquire ( & console_owner_dep_map , 0 , 0 , _THIS_IP_ ) ;
/* Owner will clear console_waiter on hand off */
while ( READ_ONCE ( console_waiter ) )
cpu_relax ( ) ;
2019-09-19 19:09:40 +03:00
spin_release ( & console_owner_dep_map , _THIS_IP_ ) ;
2018-01-12 19:08:37 +03:00
printk_safe_exit_irqrestore ( flags ) ;
/*
* The owner passed the console lock to us .
* Since we did not spin on console lock , annotate
* this as a trylock . Otherwise lockdep will
* complain .
*/
mutex_acquire ( & console_lock_dep_map , 0 , 1 , _THIS_IP_ ) ;
return 1 ;
}
2021-07-15 22:33:55 +03:00
/*
* Recursion is tracked separately on each CPU . If NMIs are supported , an
* additional NMI context per CPU is also separately tracked . Until per - CPU
* is available , a separate " early tracking " is performed .
*/
static DEFINE_PER_CPU ( u8 , printk_count ) ;
static u8 printk_count_early ;
# ifdef CONFIG_HAVE_NMI
static DEFINE_PER_CPU ( u8 , printk_count_nmi ) ;
static u8 printk_count_nmi_early ;
# endif
/*
* Recursion is limited to keep the output sane . printk ( ) should not require
* more than 1 level of recursion ( allowing , for example , printk ( ) to trigger
* a WARN ) , but a higher value is used in case some printk - internal errors
* exist , such as the ringbuffer validation checks failing .
*/
# define PRINTK_MAX_RECURSION 3
/*
* Return a pointer to the dedicated counter for the CPU + context of the
* caller .
*/
static u8 * __printk_recursion_counter ( void )
{
# ifdef CONFIG_HAVE_NMI
if ( in_nmi ( ) ) {
if ( printk_percpu_data_ready ( ) )
return this_cpu_ptr ( & printk_count_nmi ) ;
return & printk_count_nmi_early ;
}
# endif
if ( printk_percpu_data_ready ( ) )
return this_cpu_ptr ( & printk_count ) ;
return & printk_count_early ;
}
/*
* Enter recursion tracking . Interrupts are disabled to simplify tracking .
* The caller must check the boolean return value to see if the recursion is
* allowed . On failure , interrupts are not disabled .
*
* @ recursion_ptr must be a variable of type ( u8 * ) and is the same variable
* that is passed to printk_exit_irqrestore ( ) .
*/
# define printk_enter_irqsave(recursion_ptr, flags) \
( { \
bool success = true ; \
\
typecheck ( u8 * , recursion_ptr ) ; \
local_irq_save ( flags ) ; \
( recursion_ptr ) = __printk_recursion_counter ( ) ; \
if ( * ( recursion_ptr ) > PRINTK_MAX_RECURSION ) { \
local_irq_restore ( flags ) ; \
success = false ; \
} else { \
( * ( recursion_ptr ) ) ++ ; \
} \
success ; \
} )
/* Exit recursion tracking, restoring interrupts. */
# define printk_exit_irqrestore(recursion_ptr, flags) \
do { \
typecheck ( u8 * , recursion_ptr ) ; \
( * ( recursion_ptr ) ) -- ; \
local_irq_restore ( flags ) ; \
} while ( 0 )
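The effect of the `>` comparison against PRINTK_MAX_RECURSION can be checked with a small user-space model. This is a sketch only: it leaves out the interrupt handling and the per-CPU/NMI counter selection, and every `model_` name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_RECURSION 3	/* mirrors PRINTK_MAX_RECURSION above */

/* Model of the enter macro: refuse once the counter exceeds the limit. */
static bool model_enter(uint8_t *count)
{
	if (*count > MODEL_MAX_RECURSION)
		return false;
	(*count)++;
	return true;
}

/* Model of the exit macro: drop one nesting level. */
static void model_exit(uint8_t *count)
{
	(*count)--;
}

/* Nest until an entry is refused and report how many entries succeeded. */
static int model_max_depth(void)
{
	uint8_t count = 0;
	int depth = 0;
	int i;

	while (model_enter(&count))
		depth++;
	for (i = 0; i < depth; i++)
		model_exit(&count);
	return depth;
}
```

Because the test is "greater than" rather than "greater than or equal", four nested entries succeed in this model: the outermost call plus three levels of recursion.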
2009-09-23 03:43:33 +04:00
int printk_delay_msec __read_mostly ;
2022-04-22 00:22:42 +03:00
static inline void printk_delay ( int level )
2009-09-23 03:43:33 +04:00
{
2022-04-22 00:22:42 +03:00
boot_delay_msec ( level ) ;
2009-09-23 03:43:33 +04:00
if ( unlikely ( printk_delay_msec ) ) {
int m = printk_delay_msec ;
while ( m-- ) {
mdelay ( 1 ) ;
touch_nmi_watchdog ( ) ;
}
}
}
2019-02-16 13:59:33 +03:00
static inline u32 printk_caller_id ( void )
{
return in_task ( ) ? task_pid_nr ( current ) :
2022-04-22 00:22:41 +03:00
0x80000000 + smp_processor_id ( ) ;
2019-02-16 13:59:33 +03:00
}
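The caller ID packs two kinds of value into one u32: a PID in task context, or a CPU number with the top bit set otherwise. A minimal user-space sketch of that encoding and a matching decoder follows. The `model_` helpers are hypothetical, and the decoder assumes PIDs never reach 0x80000000.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CALLER_CPU_FLAG 0x80000000u

/* Encode: a plain PID in task context, flag + CPU number otherwise. */
static uint32_t model_caller_id(bool in_task, uint32_t pid, uint32_t cpu)
{
	return in_task ? pid : MODEL_CALLER_CPU_FLAG + cpu;
}

/* True when the ID names a CPU rather than a task. */
static bool model_caller_is_cpu(uint32_t id)
{
	return (id & MODEL_CALLER_CPU_FLAG) != 0;
}

/* The PID or CPU number with the flag bit stripped. */
static uint32_t model_caller_value(uint32_t id)
{
	return id & ~MODEL_CALLER_CPU_FLAG;
}
```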
printk: remove logbuf_lock writer-protection of ringbuffer
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock writer-protection of
the ringbuffer. The reader-protection is not removed because some
variables, used by readers, are using @logbuf_lock for synchronization:
@syslog_seq, @syslog_time, @syslog_partial, @console_seq,
struct kmsg_dumper.
For PRINTK_NMI_DIRECT_CONTEXT_MASK, @logbuf_lock usage is not removed
because it may be used for dumper synchronization.
Without @logbuf_lock synchronization of vprintk_store() it is no
longer possible to use the single static buffer for temporarily
sprint'ing the message. Instead, use vsnprintf() to determine the
length and perform the real vscnprintf() using the area reserved from
the ringbuffer. This leads to suboptimal packing of the message data,
but will result in less wasted storage than multiple per-cpu buffers
to support lockless temporary sprint'ing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201209004453.17720-3-john.ogness@linutronix.de
2020-12-09 03:44:53 +03:00
/**
2021-06-15 19:52:51 +03:00
* printk_parse_prefix - Parse level and control flags .
2020-12-09 03:44:53 +03:00
*
* @ text : The terminated text message .
* @ level : A pointer to the current level value , will be updated .
2021-06-15 19:52:48 +03:00
* @ flags : A pointer to the current printk_info flags , will be updated .
2020-12-09 03:44:53 +03:00
*
* @ level may be NULL if the caller is not interested in the parsed value .
* Otherwise the variable pointed to by @ level must be set to
* LOGLEVEL_DEFAULT in order to be updated with the parsed value .
*
2021-06-15 19:52:48 +03:00
* @ flags may be NULL if the caller is not interested in the parsed value .
* Otherwise the variable pointed to by @ flags will be OR'd with the parsed
2020-12-09 03:44:53 +03:00
* value .
*
* Return : The length of the parsed level and control flags .
*/
2021-06-15 19:52:51 +03:00
u16 printk_parse_prefix ( const char * text , int * level ,
2021-06-15 19:52:48 +03:00
enum printk_info_flags * flags )
2016-10-09 08:02:09 +03:00
{
2020-12-09 03:44:53 +03:00
u16 prefix_len = 0 ;
int kern_level ;
2019-02-16 13:59:33 +03:00
2020-12-09 03:44:53 +03:00
while ( * text ) {
kern_level = printk_get_level ( text ) ;
if ( ! kern_level )
break ;
2020-09-14 15:33:54 +03:00
2020-12-09 03:44:53 +03:00
switch ( kern_level ) {
case '0' ... '7' :
if ( level && * level == LOGLEVEL_DEFAULT )
* level = kern_level - '0' ;
break ;
case 'c' : /* KERN_CONT */
2021-06-15 19:52:48 +03:00
if ( flags )
* flags |= LOG_CONT ;
2020-12-09 03:44:53 +03:00
}
prefix_len += 2 ;
text += 2 ;
}
return prefix_len ;
}
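The prefix loop above can be exercised in user space with a small model. This is a sketch under stated assumptions: each prefix is taken to be a two-byte sequence of the SOH byte '\001' followed by a level digit or 'c', `model_get_level` stands in for printk_get_level() and may not match it exactly, and the `MODEL_` constants are hypothetical stand-ins for LOGLEVEL_DEFAULT and LOG_CONT.

```c
#include <stddef.h>

#define MODEL_SOH		'\001'	/* assumed prefix marker byte */
#define MODEL_LEVEL_DEFAULT	(-1)	/* stand-in for LOGLEVEL_DEFAULT */
#define MODEL_LOG_CONT		8u	/* stand-in for LOG_CONT */

/* Return the level character of a prefix at @s, or 0 if there is none. */
static int model_get_level(const char *s)
{
	if (s[0] == MODEL_SOH && s[1]) {
		if ((s[1] >= '0' && s[1] <= '7') || s[1] == 'c')
			return s[1];
	}
	return 0;
}

/* Consume all leading prefixes; only the first level digit is recorded. */
static unsigned int model_parse_prefix(const char *text, int *level,
				       unsigned int *flags)
{
	unsigned int prefix_len = 0;
	int kern_level;

	while (*text) {
		kern_level = model_get_level(text);
		if (!kern_level)
			break;

		if (kern_level == 'c') {
			if (flags)
				*flags |= MODEL_LOG_CONT;
		} else if (level && *level == MODEL_LEVEL_DEFAULT) {
			*level = kern_level - '0';
		}

		prefix_len += 2;
		text += 2;
	}
	return prefix_len;
}
```

With an input made of a level-3 prefix, a continuation prefix and the text "hello", the model reports a prefix length of 4, sets the level to 3 and ORs in the continuation flag. A caller that only needs the length can pass NULL for both pointers, as printk_sprint() does.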
2021-09-27 17:22:03 +03:00
__printf ( 5 , 0 )
2021-06-15 19:52:48 +03:00
static u16 printk_sprint ( char * text , u16 size , int facility ,
enum printk_info_flags * flags , const char * fmt ,
va_list args )
2020-12-09 03:44:53 +03:00
{
u16 text_len ;
text_len = vscnprintf ( text , size , fmt , args ) ;
/* Mark and strip a trailing newline. */
if ( text_len && text [ text_len - 1 ] == '\n' ) {
text_len -- ;
2021-06-15 19:52:48 +03:00
* flags |= LOG_NEWLINE ;
2020-12-09 03:44:53 +03:00
}
/* Strip log level and control flags. */
if ( facility == 0 ) {
u16 prefix_len ;
2021-06-15 19:52:51 +03:00
prefix_len = printk_parse_prefix ( text , NULL , NULL ) ;
2020-12-09 03:44:53 +03:00
if ( prefix_len ) {
text_len -= prefix_len ;
memmove ( text , text + prefix_len , text_len ) ;
2020-09-14 15:33:54 +03:00
}
2016-10-09 08:02:09 +03:00
}
2023-01-12 22:43:39 +03:00
trace_console ( text , text_len ) ;
printk, tracing: fix console tracepoint
The original intent of the 'console' tracepoint per the commit 95100358491a
("printk/tracing: Add console output tracing") had been to "[...] record
any printk messages into the trace, regardless of the current console
loglevel. This can help correlate (existing) printk debugging with other
tracing."
Petr points out [1] that calling trace_console_rcuidle() in
call_console_driver() had been the wrong thing for a while, because
"printk() always used console_trylock() and the message was flushed to
the console only when the trylock succeeded. And it was always deferred
in NMI or when printed via printk_deferred()."
With the commit 09c5ba0aa2fc ("printk: add kthread console printers"),
things only got worse, and calls to call_console_driver() no longer
happen with typical printk() calls but always appear deferred [2].
As such, the tracepoint can no longer serve its purpose to clearly
correlate printk() calls and other tracing, as well as breaks usecases
that expect every printk() call to result in a callback of the console
tracepoint. Notably, the KFENCE and KCSAN test suites, which want to
capture console output and assume a printk() immediately gives us a
callback to the console tracepoint.
Fix the console tracepoint by moving it into printk_sprint() [3].
One notable difference is that by moving tracing into printk_sprint(),
the 'text' will no longer include the "header" (loglevel and timestamp),
but only the raw message. Arguably this is less of a problem now that
the console tracepoint happens on the printk() call and isn't delayed.
Link: https://lore.kernel.org/all/Ym+WqKStCg%2FEHfh3@alley/ [1]
Link: https://lore.kernel.org/all/CA+G9fYu2kS0wR4WqMRsj2rePKV9XLgOU1PiXnMvpT+Z=c2ucHA@mail.gmail.com/ [2]
Link: https://lore.kernel.org/all/87fslup9dx.fsf@jogness.linutronix.de/ [3]
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Marco Elver <elver@google.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220503073844.4148944-1-elver@google.com
2022-05-03 10:38:44 +03:00
printk: remove logbuf_lock writer-protection of ringbuffer
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock writer-protection of
the ringbuffer. The reader-protection is not removed because some
variables, used by readers, are using @logbuf_lock for synchronization:
@syslog_seq, @syslog_time, @syslog_partial, @console_seq,
struct kmsg_dumper.
For PRINTK_NMI_DIRECT_CONTEXT_MASK, @logbuf_lock usage is not removed
because it may be used for dumper synchronization.
Without @logbuf_lock synchronization of vprintk_store() it is no
longer possible to use the single static buffer for temporarily
sprint'ing the message. Instead, use vsnprintf() to determine the
length and perform the real vscnprintf() using the area reserved from
the ringbuffer. This leads to suboptimal packing of the message data,
but will result in less wasted storage than multiple per-cpu buffers
to support lockless temporary sprint'ing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201209004453.17720-3-john.ogness@linutronix.de
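The two-pass approach described above (measure with vsnprintf(), then
format into the reserved area) can be sketched in userspace as follows.
This is only an illustration under stated assumptions: format_two_pass()
is a hypothetical helper, and malloc() stands in for reserving space in
the ringbuffer. It is not the kernel implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userspace helper. malloc() plays the role of the
 * ringbuffer reservation; the kernel uses its own reserve/commit calls.
 */
static char *format_two_pass(size_t *len_out, const char *fmt, ...)
{
	va_list args, args2;
	char probe[8];	/* small scratch buffer, only used for measuring */
	int needed;
	char *buf;

	va_start(args, fmt);

	/* Pass 1: measure on a copy, because a va_list cannot be reused. */
	va_copy(args2, args);
	needed = vsnprintf(probe, sizeof(probe), fmt, args2);
	va_end(args2);
	if (needed < 0) {
		va_end(args);
		return NULL;
	}

	/* +1: the returned length does not count the terminating '\0'. */
	buf = malloc((size_t)needed + 1);
	if (buf) {
		/* Pass 2: format for real into the "reserved" area. */
		vsnprintf(buf, (size_t)needed + 1, fmt, args);
		*len_out = (size_t)needed;
	}
	va_end(args);
	return buf;
}
```

The arguments are formatted twice, which costs time but avoids a shared
static buffer that would need a lock.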
2020-12-09 03:44:53 +03:00
return text_len ;
2016-10-09 08:02:09 +03:00
}
2020-12-09 03:44:53 +03:00
__printf ( 4 , 0 )
2018-06-27 17:08:15 +03:00
int vprintk_store ( int facility , int level ,
2020-09-21 14:18:45 +03:00
const struct dev_printk_info * dev_info ,
2018-06-27 17:08:15 +03:00
const char * fmt , va_list args )
2005-04-17 02:20:36 +04:00
{
2020-12-09 03:44:52 +03:00
struct prb_reserved_entry e ;
2021-06-15 19:52:48 +03:00
enum printk_info_flags flags = 0 ;
2020-12-09 03:44:52 +03:00
struct printk_record r ;
2021-07-15 22:33:55 +03:00
unsigned long irqflags ;
2020-12-09 03:44:52 +03:00
u16 trunc_msg_len = 0 ;
2020-12-09 03:44:53 +03:00
char prefix_buf [ 8 ] ;
2021-07-15 22:33:55 +03:00
u8 * recursion_ptr ;
2020-12-09 03:44:53 +03:00
u16 reserve_size ;
va_list args2 ;
2022-04-22 00:22:41 +03:00
u32 caller_id ;
2020-12-09 03:44:52 +03:00
u16 text_len ;
2021-07-15 22:33:55 +03:00
int ret = 0 ;
2020-12-09 03:44:52 +03:00
u64 ts_nsec ;
2007-10-16 12:23:46 +04:00
2022-04-22 00:22:41 +03:00
if ( ! printk_enter_irqsave ( recursion_ptr , irqflags ) )
return 0 ;
2012-05-03 04:29:13 +04:00
/*
2020-12-09 03:44:52 +03:00
* Since the duration of printk ( ) can vary depending on the message
* and state of the ringbuffer , grab the timestamp now so that it is
* close to the call of printk ( ) . This provides a more deterministic
* timestamp with respect to the caller .
2012-05-03 04:29:13 +04:00
*/
2020-12-09 03:44:52 +03:00
ts_nsec = local_clock ( ) ;
2009-06-16 21:57:02 +04:00
2022-04-22 00:22:41 +03:00
caller_id = printk_caller_id ( ) ;
2021-07-15 22:33:55 +03:00
2012-05-03 04:29:13 +04:00
/*
2020-12-09 03:44:53 +03:00
* The sprintf needs to come first since the syslog prefix might be
* passed in as a parameter . An extra byte must be reserved so that
* later the vscnprintf ( ) into the reserved buffer has room for the
* terminating '\0' , which is not counted by vsnprintf ( ) .
2012-05-03 04:29:13 +04:00
*/
2020-12-09 03:44:53 +03:00
va_copy ( args2 , args ) ;
reserve_size = vsnprintf ( & prefix_buf [ 0 ] , sizeof ( prefix_buf ) , fmt , args2 ) + 1 ;
va_end ( args2 ) ;
2011-03-13 05:19:51 +03:00
printk: adjust string limit macros
The various internal size limit macros have names and/or values that
do not fit well to their current usage.
Rename the macros so that their purpose is clear and, if needed,
provide a more appropriate value. In general, the new macros and
values will lead to less memory usage. The new macros are...
PRINTK_MESSAGE_MAX:
This is the maximum size for a formatted message on a console,
devkmsg, or syslog. It does not matter which format the message has
(normal or extended). It replaces the use of CONSOLE_EXT_LOG_MAX for
console and devkmsg. It replaces the use of CONSOLE_LOG_MAX for
syslog.
Historically, normal messages have been allowed to print up to 1kB,
whereas extended messages have been allowed to print up to 8kB.
However, the difference in lengths of these message types is not
significant and in multi-line records, normal messages are probably
larger. Also, because 1kB is only slightly above the allowed record
size, multi-line normal messages could be easily truncated during
formatting.
This new macro should be significantly larger than the allowed
record size to allow sufficient space for extended or multi-line
prefix text. A value of 2kB should be plenty of space. For normal
messages this represents a doubling of the historically allowed
amount. For extended messages it reduces the excessive 8kB size,
thus reducing memory usage needed for message formatting.
PRINTK_PREFIX_MAX:
This is the maximum size allowed for a record prefix (used by
console and syslog). It replaces PREFIX_MAX. The value is left
unchanged.
PRINTKRB_RECORD_MAX:
This is the maximum size allowed to be reserved for a record in the
ringbuffer. It is used by all readers and writers with the printk
ringbuffer. It replaces LOG_LINE_MAX.
Previously this was set to "1kB - PREFIX_MAX", which makes some
sense if 1kB is the limit for normal message output and prefixes are
enabled. However, with the allowance of larger output and the
existence of multi-line records, the value is rather bizarre.
Round the value up to 1kB.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230109100800.1085541-9-john.ogness@linutronix.de
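As a rough illustration of how the record limit is applied, the sketch
below clamps a measured message length to the record maximum. The 1kB
and 2kB values come from the message above; clamp_reserve_size() is a
hypothetical helper name, and the real code works on kernel types rather
than size_t.

```c
#include <stddef.h>

/* Values as described in the commit message: 2kB and 1kB. */
#define PRINTK_MESSAGE_MAX	2048
#define PRINTKRB_RECORD_MAX	1024

/* The formatting buffer is meant to be larger than any single record. */
_Static_assert(PRINTK_MESSAGE_MAX > PRINTKRB_RECORD_MAX,
	       "message buffer must exceed the record limit");

/*
 * Hypothetical helper: take the formatted length (without '\0'), add
 * one byte for the terminator, and bound the result to the record limit.
 */
static size_t clamp_reserve_size(size_t formatted_len)
{
	size_t reserve_size = formatted_len + 1;

	if (reserve_size > PRINTKRB_RECORD_MAX)
		reserve_size = PRINTKRB_RECORD_MAX;
	return reserve_size;
}
```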
2023-01-09 13:08:00 +03:00
if ( reserve_size > PRINTKRB_RECORD_MAX )
reserve_size = PRINTKRB_RECORD_MAX ;
2012-07-31 01:40:19 +04:00
2020-12-09 03:44:53 +03:00
/* Extract log level or control flags. */
if ( facility == 0 )
2021-06-15 19:52:51 +03:00
printk_parse_prefix ( & prefix_buf [ 0 ] , & level , & flags ) ;
2009-06-16 21:57:02 +04:00
2014-12-11 02:50:15 +03:00
if ( level == LOGLEVEL_DEFAULT )
2012-05-14 22:46:27 +04:00
level = default_message_loglevel ;
2011-03-13 05:19:51 +03:00
2020-09-21 14:18:45 +03:00
if ( dev_info )
2021-06-15 19:52:48 +03:00
flags |= LOG_NEWLINE ;
2011-03-13 05:19:51 +03:00
2021-06-15 19:52:48 +03:00
if ( flags & LOG_CONT ) {
2020-12-09 03:44:53 +03:00
prb_rec_init_wr ( & r , reserve_size ) ;
2023-01-09 13:08:00 +03:00
if ( prb_reserve_in_last ( & e , prb , & r , caller_id , PRINTKRB_RECORD_MAX ) ) {
2020-12-09 03:44:53 +03:00
text_len = printk_sprint ( & r . text_buf [ r . info -> text_len ] , reserve_size ,
2021-06-15 19:52:48 +03:00
facility , & flags , fmt , args ) ;
2020-12-09 03:44:52 +03:00
r . info -> text_len += text_len ;
2012-07-31 01:40:19 +04:00
2021-06-15 19:52:48 +03:00
if ( flags & LOG_NEWLINE ) {
2020-12-09 03:44:52 +03:00
r . info -> flags |= LOG_NEWLINE ;
prb_final_commit ( & e ) ;
} else {
prb_commit ( & e ) ;
2012-07-31 01:40:19 +04:00
}
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
474925277671 ("printk: add KERN_CONT annotation").
That continuation marker was initially an empty string, and didn't
actually make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Over all, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one record, the continuation lines need to be
attached to the previous record properly.
However the explicit continuation record marker that is actually useful
for this exact case was actually removed at around the same time by commit
61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will need to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at least mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
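The need to iterate over potentially multiple markers can be sketched
as below. This is a userspace illustration, not the kernel's parser. It
assumes the two-byte marker encoding used by current kernels (an SOH
byte followed by a level digit '0'..'7' or 'c' for continuation), which
the message above does not spell out. parse_prefix(), struct
parsed_prefix and LEVEL_DEFAULT are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define SOH_ASCII	'\001'	/* assumed byte that opens each marker */
#define LEVEL_DEFAULT	(-1)	/* stand-in for "no level given" */

struct parsed_prefix {
	int level;		/* 0..7, or LEVEL_DEFAULT */
	bool cont;		/* continuation marker seen */
	size_t prefix_len;	/* bytes consumed by markers */
};

/* Consume every leading marker, not just the first one. */
static struct parsed_prefix parse_prefix(const char *text)
{
	struct parsed_prefix p = { LEVEL_DEFAULT, false, 0 };

	while (text[0] == SOH_ASCII && text[1] != '\0') {
		char c = text[1];

		if (c >= '0' && c <= '7')
			p.level = c - '0';
		else if (c == 'c')
			p.cont = true;
		else
			break;	/* unknown marker: treat as message text */

		text += 2;
		p.prefix_len += 2;
	}
	return p;
}
```

With this loop, a continuation marker followed by an informational level
yields both the continuation flag and level 6, which a single-marker
parser would miss.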
2016-10-09 06:32:40 +03:00
2021-07-15 22:33:55 +03:00
ret = text_len ;
goto out ;
2009-06-16 21:57:02 +04:00
}
}
2020-12-09 03:44:52 +03:00
/*
* Explicitly initialize the record before every prb_reserve ( ) call .
* prb_reserve_in_last ( ) and prb_reserve ( ) purposely invalidate the
* structure when they fail .
*/
2020-12-09 03:44:53 +03:00
prb_rec_init_wr ( & r , reserve_size ) ;
2020-12-09 03:44:52 +03:00
if ( ! prb_reserve ( & e , prb , & r ) ) {
/* truncate the message if it is too long for empty buffer */
2020-12-09 03:44:53 +03:00
truncate_msg ( & reserve_size , & trunc_msg_len ) ;
2011-03-13 05:19:51 +03:00
2020-12-09 03:44:53 +03:00
prb_rec_init_wr ( & r , reserve_size + trunc_msg_len ) ;
2020-12-09 03:44:52 +03:00
if ( ! prb_reserve ( & e , prb , & r ) )
2021-07-15 22:33:55 +03:00
goto out ;
2020-12-09 03:44:52 +03:00
}
/* fill message */
2021-06-15 19:52:48 +03:00
text_len = printk_sprint ( & r . text_buf [ 0 ] , reserve_size , facility , & flags , fmt , args ) ;
2020-12-09 03:44:52 +03:00
if ( trunc_msg_len )
memcpy ( & r . text_buf [ text_len ] , trunc_msg , trunc_msg_len ) ;
r . info -> text_len = text_len + trunc_msg_len ;
r . info -> facility = facility ;
r . info -> level = level & 7 ;
2021-06-15 19:52:48 +03:00
r . info -> flags = flags & 0x1f ;
2020-12-09 03:44:52 +03:00
r . info -> ts_nsec = ts_nsec ;
r . info -> caller_id = caller_id ;
2020-09-21 14:18:45 +03:00
if ( dev_info )
2020-12-09 03:44:52 +03:00
memcpy ( & r . info -> dev_info , dev_info , sizeof ( r . info -> dev_info ) ) ;
2008-05-12 23:21:04 +04:00
2020-12-09 03:44:52 +03:00
/* A message without a trailing newline can be continued. */
2021-06-15 19:52:48 +03:00
if ( ! ( flags & LOG_NEWLINE ) )
2020-12-09 03:44:52 +03:00
prb_commit ( & e ) ;
else
prb_final_commit ( & e ) ;
2021-07-15 22:33:55 +03:00
ret = text_len + trunc_msg_len ;
out :
printk_exit_irqrestore ( recursion_ptr , irqflags ) ;
return ret ;
2018-06-27 17:08:15 +03:00
}
2005-04-17 02:20:36 +04:00
2018-06-27 17:08:15 +03:00
asmlinkage int vprintk_emit ( int facility , int level ,
2020-09-21 14:18:45 +03:00
const struct dev_printk_info * dev_info ,
2018-06-27 17:08:15 +03:00
const char * fmt , va_list args )
{
int printed_len ;
2020-07-09 16:23:43 +03:00
bool in_sched = false ;
2018-06-27 17:08:15 +03:00
panic: avoid the extra noise dmesg
When kernel panic happens, it will first print the panic call stack,
then the ending msg like:
[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------
The above messages are very useful for debugging.
But if the system is configured not to reboot on panic, say because the
"panic_timeout" parameter equals 0, it will likely print out many noisy
messages, such as a WARN() call stack for each and every CPU except the
panicking one, like the messages below:
WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
<IRQ>
try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
For people working in console mode, the screen will first show the panic
call stack, but it is immediately overwritten by these noisy extra
messages, which makes debugging much more difficult, as the original
context gets lost on screen.
Also these noisy messages confuse some users: I have seen many bug
reporters post the noisy messages to bugzilla instead of the real
panic call stack and context.
Add a flag "suppress_printk", which gets set in panic(), to avoid those
noisy messages without changing current kernel behavior: both panic
blinking and the sysrq magic key still work as is. Suggested by Petr Mladek.
To verify this, make sure the kernel is not configured to reboot on panic
and run in the console
# echo c > /proc/sysrq-trigger
to see if the console only prints out the panic call stack.
Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-15 01:45:34 +03:00
/* Suppress unimportant messages after panic happens */
if ( unlikely ( suppress_printk ) )
return 0 ;
2022-02-02 20:18:20 +03:00
if ( unlikely ( suppress_panic_printk ) & &
atomic_read ( & panic_cpu ) ! = raw_smp_processor_id ( ) )
return 0 ;
2018-06-27 17:08:15 +03:00
if ( level = = LOGLEVEL_SCHED ) {
level = LOGLEVEL_DEFAULT ;
in_sched = true ;
}
2022-04-22 00:22:42 +03:00
printk_delay ( level ) ;
2018-06-27 17:08:15 +03:00
2020-09-21 14:18:45 +03:00
printed_len = vprintk_store ( facility , level , dev_info , fmt , args ) ;
2014-06-05 03:11:37 +04:00
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
/* If called from the scheduler, we can not call up(). */
2022-06-23 17:51:56 +03:00
if ( ! in_sched ) {
printk: Never set console_may_schedule in console_trylock()
This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].
Another issue is that preemption under console_sem also
interferes with Steven Rostedt's hand off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. Preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.
Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.
[1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-16 07:47:16 +03:00
/*
2022-04-22 00:22:44 +03:00
* The caller may be holding system - critical or
2022-06-23 17:51:56 +03:00
* timing - sensitive locks . Disable preemption during
2022-04-22 00:22:44 +03:00
* printing of all remaining records to all consoles so that
* this context can return as soon as possible . Hopefully
* another printk ( ) caller will take over the printing .
2018-01-16 07:47:16 +03:00
*/
preempt_disable ( ) ;
2014-07-03 02:22:38 +04:00
/*
* Try to acquire and then immediately release the console
2022-04-22 00:22:44 +03:00
* semaphore . The release will print out buffers . With the
* spinning variant , this context tries to take over the
* printing from another printing context .
2014-07-03 02:22:38 +04:00
*/
2018-01-12 19:08:37 +03:00
if ( console_trylock_spinning ( ) )
2014-07-03 02:22:38 +04:00
console_unlock ( ) ;
2018-01-16 07:47:16 +03:00
preempt_enable ( ) ;
2014-07-03 02:22:38 +04:00
}
2006-06-25 16:47:40 +04:00
2023-07-17 22:46:05 +03:00
if ( in_sched )
defer_console_output ( ) ;
else
wake_up_klogd ( ) ;
2005-04-17 02:20:36 +04:00
return printed_len ;
}
2012-05-03 04:29:13 +04:00
EXPORT_SYMBOL ( vprintk_emit ) ;
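The decision logic at the top of vprintk_emit() can be modelled in user space. The following is a minimal sketch, not kernel code: the model_* names, the struct, and the enum are hypothetical, and the actual storing and console locking are reduced to a returned action. It only mirrors the order of the checks shown above: the two suppress flags first, then the LOGLEVEL_SCHED remapping that selects deferred output.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel log levels used by the model. */
#define MODEL_LOGLEVEL_SCHED   (-2)
#define MODEL_LOGLEVEL_DEFAULT (-1)

/* What the modelled emit path decides to do with a message. */
enum model_action {
	MODEL_DROPPED,        /* suppressed, nothing is stored */
	MODEL_PRINT_DIRECT,   /* stored, then console trylock/unlock is attempted */
	MODEL_PRINT_DEFERRED, /* stored, console output is left to deferred work */
};

/* Hypothetical snapshot of the global state the checks look at. */
struct model_state {
	bool suppress_printk;       /* set in panic(), per the commit message */
	bool suppress_panic_printk; /* when set, only the panic CPU may print */
	int panic_cpu;              /* CPU that owns the panic, or -1 */
};

/*
 * Mirrors the order of checks in vprintk_emit(): both suppress flags are
 * tested before anything is stored, and a scheduler-context message is
 * remapped to the default level and marked for deferred console output.
 */
static enum model_action model_emit(const struct model_state *s,
				    int this_cpu, int *level)
{
	if (s->suppress_printk)
		return MODEL_DROPPED;

	if (s->suppress_panic_printk && s->panic_cpu != this_cpu)
		return MODEL_DROPPED;

	if (*level == MODEL_LOGLEVEL_SCHED) {
		*level = MODEL_LOGLEVEL_DEFAULT;
		return MODEL_PRINT_DEFERRED;
	}

	return MODEL_PRINT_DIRECT;
}
```

In this model a caller passes the current CPU number and a pointer to the level, and the level is rewritten in place exactly as the real function rewrites its local copy.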
2016-08-09 20:48:18 +03:00
int vprintk_default ( const char * fmt , va_list args )
2014-06-20 01:33:31 +04:00
{
2020-09-21 14:18:45 +03:00
return vprintk_emit ( 0 , LOGLEVEL_DEFAULT , NULL , fmt , args ) ;
2014-06-20 01:33:31 +04:00
}
EXPORT_SYMBOL_GPL ( vprintk_default ) ;
printk: Userspace format indexing support
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.
As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
2021-06-15 19:52:53 +03:00
asmlinkage __visible int _printk ( const char * fmt , . . . )
2012-05-03 04:29:13 +04:00
{
va_list args ;
int r ;
va_start ( args , fmt ) ;
printk: rename vprintk_func to vprintk
The printk code is already hard enough to understand. Remove an
unnecessary indirection by renaming vprintk_func to vprintk (adding
the asmlinkage annotation), and removing the vprintk definition from
printk.c. That way, printk is implemented in terms of vprintk as one
would expect, and there's no "vprintk_func, what's that? Some function
pointer that gets set where?"
The declaration of vprintk in linux/printk.h already has the
__printf(1,0) attribute, there's no point repeating that with the
definition - it's for diagnostics in callers.
linux/printk.h already contains a static inline {return 0;} definition
of vprintk when !CONFIG_PRINTK.
Since the corresponding stub definition of vprintk_func was not marked
"static inline", any translation unit including internal.h would get a
definition of vprintk_func - it just so happens that for
!CONFIG_PRINTK, there is precisely one such TU, namely printk.c. Had
there been more, it would be a link error; now it's just a silly waste
of a few bytes of .text, which one must assume are rather precious to
anyone disabling PRINTK.
$ objdump -dr kernel/printk/printk.o
00000330 <vprintk_func>:
330: 31 c0 xor %eax,%eax
332: c3 ret
333: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
33a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210323144201.486050-1-linux@rasmusvillemoes.dk
2021-03-23 17:42:01 +03:00
r = vprintk ( fmt , args ) ;
2012-05-03 04:29:13 +04:00
va_end ( args ) ;
return r ;
}
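The "Userspace format indexing support" commit message describes a line format, `<level[,flags]> filename:line function "format"`, that userspace tools are expected to parse. Below is a minimal user-space sketch of such a parser. The struct, the function name, and the buffer sizes are assumptions made for illustration; they are not part of the kernel or of any existing tool, and the flags field is skipped rather than interpreted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed index line. */
struct index_entry {
	int level;
	char file[128];
	int line;
	char func[64];
	char fmt[256];
};

/*
 * Parse one line of the index output.
 * Returns 0 on success and -1 if the line does not have the expected
 * layout (for example the "# <level[,flags]> ..." header line).
 */
static int parse_index_line(const char *s, struct index_entry *e)
{
	int consumed = 0;
	const char *q, *end;
	size_t n;

	/* "<level", optionally followed by ",flags", up to the closing '>'. */
	if (sscanf(s, "<%d%n", &e->level, &consumed) != 1)
		return -1;
	q = strchr(s + consumed, '>');
	if (!q)
		return -1;
	q++;

	/* " filename:line function " with bounded string conversions. */
	consumed = 0;
	if (sscanf(q, " %127[^:]:%d %63s %n",
		   e->file, &e->line, e->func, &consumed) != 3)
		return -1;
	q += consumed;

	/* The format is everything between the first and the last quote. */
	if (*q != '"')
		return -1;
	end = strrchr(q + 1, '"');
	if (!end)
		return -1;
	n = (size_t)(end - (q + 1));
	if (n >= sizeof(e->fmt))
		return -1;
	memcpy(e->fmt, q + 1, n);
	e->fmt[n] = '\0';
	return 0;
}
```

A monitoring tool could compare the parsed `fmt` field against the format string its classification rule was written for, and raise an alert when the entry changes or disappears after a kernel upgrade.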
2021-06-15 19:52:53 +03:00
EXPORT_SYMBOL ( _printk ) ;
2012-05-09 03:37:51 +04:00
2022-09-24 03:04:37 +03:00
static bool pr_flush ( int timeout_ms , bool reset_on_progress ) ;
2022-04-22 00:22:46 +03:00
static bool __pr_flush ( struct console * con , int timeout_ms , bool reset_on_progress ) ;
2012-07-17 05:35:29 +04:00
# else /* CONFIG_PRINTK */
2005-05-01 19:59:02 +04:00
2018-12-04 13:00:01 +03:00
# define printk_time false
2014-08-07 03:09:08 +04:00
2020-07-09 16:23:44 +03:00
# define prb_read_valid(rb, seq, r) false
# define prb_first_valid_seq(rb) 0
2022-04-22 00:22:44 +03:00
# define prb_next_seq(rb) 0
2020-07-09 16:23:44 +03:00
2012-07-17 05:35:29 +04:00
static u64 syslog_seq ;
2020-07-09 16:23:44 +03:00
2022-09-24 03:04:37 +03:00
static bool pr_flush ( int timeout_ms , bool reset_on_progress ) { return true ; }
2022-04-22 00:22:46 +03:00
static bool __pr_flush ( struct console * con , int timeout_ms , bool reset_on_progress ) { return true ; }
2005-05-01 19:59:02 +04:00
2012-05-09 03:37:51 +04:00
# endif /* CONFIG_PRINTK */
2005-05-01 19:59:02 +04:00
2013-04-30 03:17:18 +04:00
# ifdef CONFIG_EARLY_PRINTK
struct console * early_console ;
2014-05-02 02:44:38 +04:00
asmlinkage __visible void early_printk ( const char * fmt , . . . )
2013-04-30 03:17:18 +04:00
{
va_list ap ;
2014-12-11 02:45:53 +03:00
char buf [ 512 ] ;
int n ;
if ( ! early_console )
return ;
2013-04-30 03:17:18 +04:00
va_start ( ap , fmt ) ;
2014-12-11 02:45:53 +03:00
n = vscnprintf ( buf , sizeof ( buf ) , fmt , ap ) ;
2013-04-30 03:17:18 +04:00
va_end ( ap ) ;
2014-12-11 02:45:53 +03:00
early_console - > write ( early_console , buf , n ) ;
2013-04-30 03:17:18 +04:00
}
# endif
2022-02-16 13:41:38 +03:00
static void set_user_specified ( struct console_cmdline * c , bool user_specified )
{
if ( ! user_specified )
return ;
/*
* @ c console was defined by the user on the command line .
* Do not clear when added twice also by SPCR or the device tree .
*/
c - > user_specified = true ;
/* At least one console defined by the user on the command line. */
console_set_on_cmdline = 1 ;
}
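The user_specified flag set here is what the "Fix preferred console selection with multiple matches" commit message relies on: register_console() should first scan the entries that came from the command line and only then the ones added by the architecture. The following user-space sketch illustrates that two-pass idea under stated assumptions. The model_* names and the table layout are hypothetical, and the alias rule ("uart" matching "ttyS") is hard-coded only to reproduce the example from the commit message.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of one console_cmdline entry. */
struct model_cmdline {
	const char *name;
	int index;
	bool user_specified;
};

/* Alias rule taken from the commit message example: "uart" aliases "ttyS". */
static bool model_name_matches(const char *entry, const char *driver)
{
	if (strcmp(entry, driver) == 0)
		return true;
	return strcmp(driver, "ttyS") == 0 && strcmp(entry, "uart") == 0;
}

/* One pass over the table, restricted to entries of the requested origin. */
static int model_scan(const struct model_cmdline *tab, int n,
		      const char *driver, int index, bool user_specified)
{
	for (int i = 0; i < n; i++) {
		if (tab[i].user_specified != user_specified)
			continue;
		if (tab[i].index != index)
			continue;
		if (model_name_matches(tab[i].name, driver))
			return i;
	}
	return -1;
}

/* Command-line entries win; platform-provided entries are the fallback. */
static int model_find_console(const struct model_cmdline *tab, int n,
			      const char *driver, int index)
{
	int i = model_scan(tab, n, driver, index, true);

	if (i < 0)
		i = model_scan(tab, n, driver, index, false);
	return i;
}
```

With a table holding "uart0" from SPCR at entry 0, followed by "tty0" and "ttyS0" from the command line, the lookup for the ttyS driver returns entry 2, whereas a single pass that stops at the first match would stop at entry 0 because of the alias.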
2023-10-12 09:42:57 +03:00
static int __add_preferred_console ( const char * name , const short idx , char * options ,
printk: Fix preferred console selection with multiple matches
In the following circumstances, the rule of selecting the console
corresponding to the last "console=" entry on the command line as
the preferred console (CON_CONSDEV, ie, /dev/console) fails. This
is a specific example, but it could happen with different consoles
that have a similar name aliasing mechanism.
- The kernel command line has both console=tty0 and console=ttyS0
in that order (the latter with speed etc... arguments).
This is common with some cloud setups such as Amazon Linux.
- add_preferred_console is called early to register "uart0". In
our case that happens from acpi_parse_spcr() on arm64 since the
"enable_console" argument is true on that architecture. This causes
"uart0" to become entry 0 of the console_cmdline array.
Now, because of the above, what happens is:
- add_preferred_console is called by the cmdline parsing for tty0
and ttyS0 respectively, thus occupying entries 1 and 2 of the
console_cmdline array (since this happens after ACPI SPCR parsing).
At that point preferred_console is set to 2 as expected.
- When the tty layer kicks in, it will call register_console for tty0.
This will match entry 1 in console_cmdline array. It isn't our
preferred console but because it's our only console at this point,
it will end up "first" in the consoles list.
- When 8250 probes the actual serial port later on, it calls
register_console for ttyS0. At that point the loop in register_console
tries to match it with the entries in the console_cmdline array.
Ideally this should match ttyS0 in entry 2, which is preferred, causing
it to be inserted first and to replace tty0 as CONSDEV. However, 8250
provides a "match" hook in its struct console, and that hook will match
"uart" as an alias to "ttyS". So we match uart0 at entry 0 in the array,
which is not the preferred console, and never match entry 2, which is,
since we break out of the loop on the first match. As a result,
we don't set CONSDEV and insert it second rather than first in
the console list.
As a result, we end up with tty0 remaining first in the array, and thus
/dev/console going there instead of the last user specified one which
is ttyS0.
This tentative fix changes register_console() to scan first for consoles
specified on the command line, and only if none is found, to then
scan for consoles specified by the architecture.
Link: https://lore.kernel.org/r/20200213095133.23176-3-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2020-02-13 12:51:32 +03:00
char * brl_options , bool user_specified )
2008-04-30 11:54:51 +04:00
{
struct console_cmdline * c ;
int i ;
2023-10-12 09:42:56 +03:00
/*
* We use a signed short index for struct console for device drivers to
* indicate a not yet assigned index or port . However , a negative index
* value is not valid for preferred console .
*/
if ( idx < 0 )
return - EINVAL ;
2008-04-30 11:54:51 +04:00
/*
* See if this tty is not yet registered , and
* if we have a slot free .
*/
Revert "printk: fix double printing with earlycon"
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e.
The commit caused a regression for users that define both console=ttyS1
and console=ttyS0 on the command line, see
https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain
The kernel log messages always appeared only on one serial port. It is
even documented in Documentation/admin-guide/serial-console.rst:
"Note that you can only define one console per device type (serial,
video)."
The above mentioned commit changed the order in which the command line
parameters are searched. As a result, the kernel log messages go to
the last mentioned ttyS* instead of the first one.
We long thought that using two console=ttyS* on the command line
did not make sense. But then we realized that console= parameters
were handled also by systemd, see
http://0pointer.de/blog/projects/serial-console.html
"By default systemd will instantiate one serial-getty@.service on
the main kernel console, if it is not a virtual terminal."
where
"[4] If multiple kernel consoles are used simultaneously, the main
console is the one listed first in /sys/class/tty/console/active,
which is the last one listed on the kernel command line."
This puts the original report into another light. The system is running
in qemu. The first serial port is used to store the messages into a file.
The second one is used to login to the system via a socket. It depends
on systemd and the historic kernel behavior.
In other words, because of systemd it makes sense to define both
console=ttyS1 console=ttyS0 on the command line. The kernel fix
caused a regression related to userspace (systemd) and needs to be
reverted.
In addition, it turned out that the fix helped only partially.
The messages still were duplicated when the boot console was
removed early by late_initcall(printk_late_init). Then the entire
log was replayed when the same console was registered as a normal one.
Link: 20170606160339.GC7604@pathway.suse.cz
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-06-08 13:01:30 +03:00
for ( i = 0 , c = console_cmdline ;
i < MAX_CMDLINECONSOLES & & c - > name [ 0 ] ;
i + + , c + + ) {
2013-08-01 00:53:46 +04:00
if ( strcmp ( c - > name , name ) = = 0 & & c - > index = = idx ) {
2017-06-08 13:01:30 +03:00
if ( ! brl_options )
preferred_console = i ;
2022-02-16 13:41:38 +03:00
set_user_specified ( c , user_specified ) ;
2013-08-01 00:53:46 +04:00
return 0 ;
2008-04-30 11:54:51 +04:00
}
2013-08-01 00:53:46 +04:00
}
2008-04-30 11:54:51 +04:00
if ( i = = MAX_CMDLINECONSOLES )
return - E2BIG ;
if ( ! brl_options )
2017-03-15 13:28:51 +03:00
preferred_console = i ;
2022-11-30 11:01:41 +03:00
strscpy ( c - > name , name , sizeof ( c - > name ) ) ;
2008-04-30 11:54:51 +04:00
c - > options = options ;
2022-02-16 13:41:38 +03:00
set_user_specified ( c , user_specified ) ;
2013-08-01 00:53:45 +04:00
braille_set_options ( c , brl_options ) ;
2008-04-30 11:54:51 +04:00
c - > index = idx ;
return 0 ;
}
printk: add console_msg_format command line option
0day and kernelCI automatically parse kernel log - basically some sort
of grepping using the pre-defined text patterns - in order to detect
and report regressions/errors. There are several sources they get the
kernel logs from:
a) dmesg or /proc/kmsg
This is the preferred way, because `dmesg --raw' (see later Note)
and /proc/kmsg output contain facility and log level, which greatly
simplifies grepping for EMERG/ALERT/CRIT/ERR messages.
b) serial consoles
This option is harder to maintain, because serial console messages
don't contain facility and log level.
This patch introduces a `console_msg_format=' command line option,
to switch between different message formatting on serial consoles.
For the time being we have just two options - default and syslog.
The "default" option just keeps the existing format, while the
"syslog" option makes serial console messages appear in syslog
format [syslog() syscall], matching the `dmesg -S --raw' and
`cat /proc/kmsg' output formats:
- facility and log level
- time stamp (depends on printk_time/PRINTK_TIME)
- message
<%u>[time stamp] text\n
NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually
"dmesg -S --raw" that always prints messages in syslog format [per
Petr Mladek]. Running "dmesg --raw" may produce output in non-syslog
format sometimes. console_msg_format=syslog enables syslog format,
thus in documentation we mention "dmesg -S --raw", not "dmesg --raw".
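The syslog layout described above can be illustrated with a small userspace sketch. It is not the kernel's formatting code: format_syslog_line() is a hypothetical helper, and both the way facility and level are packed into the "<%u>" prefix and the timestamp width are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (not a kernel function): build one console
 * line in the "<%u>[time stamp] text\n" layout described above.
 */
int format_syslog_line(char *out, size_t out_sz, unsigned int facility,
		       unsigned int level, unsigned long long ts_nsec,
		       const char *text)
{
	/* Assumption: the prefix packs the fields as (facility << 3) | level. */
	unsigned int prefix = (facility << 3) | (level & 7);
	unsigned long long sec = ts_nsec / 1000000000ULL;
	unsigned long long usec = (ts_nsec % 1000000000ULL) / 1000ULL;

	assert(out != NULL && text != NULL);

	/* Returns the length snprintf() would write, excluding the NUL. */
	return snprintf(out, out_sz, "<%u>[%5llu.%06llu] %s\n",
			prefix, sec, usec, text);
}
```

A line produced this way for a kernel-facility error starts with "<3>", so a pattern such as '^<[0123]>' selects it without any text matching on the message body.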
Per Kevin Hilman:
: Right now we can get this info from a "dmesg --raw" after bootup,
: but it would be really nice in certain automation frameworks to
: have a kernel command-line option to enable printing of loglevels
: in default boot log.
:
: This is especially useful when ingesting kernel logs into advanced
: search/analytics frameworks (I'm playing with an ELK stack: Elastic
: Search, Logstash, Kibana).
:
: The other important reason for having this on the command line is that
: for testing linux-next (and other bleeding edge developer branches),
: it's common that we never make it to userspace, so can't even run
: "dmesg --raw" (or equivalent.) So we really want this on the primary
: boot (serial) console.
Per Fengguang Wu, 0day scripts should quickly benefit from that
feature, because they will be able to switch to a more reliable
parsing, based on messages' facility and log levels [1]:
`#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )'
instead of doing text pattern matching
`#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} |
grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern |
grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist`
[1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb
Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-12-21 08:41:49 +03:00
static int __init console_msg_format_setup ( char * str )
{
if ( ! strcmp ( str , " syslog " ) )
console_msg_format = MSG_FORMAT_SYSLOG ;
if ( ! strcmp ( str , " default " ) )
console_msg_format = MSG_FORMAT_DEFAULT ;
return 1 ;
}
__setup ( " console_msg_format= " , console_msg_format_setup ) ;
2006-03-24 14:18:19 +03:00
/*
2014-08-07 03:09:03 +04:00
* Set up a console . Called via do_early_param ( ) in init / main . c
* for each " console= " parameter in the boot command line .
2006-03-24 14:18:19 +03:00
*/
static int __init console_setup ( char * str )
{
2014-08-07 03:09:03 +04:00
char buf [ sizeof ( console_cmdline [ 0 ] . name ) + 4 ] ; /* 4 for "ttyS" */
2008-04-30 11:54:51 +04:00
char * s , * options , * brl_options = NULL ;
2006-03-24 14:18:19 +03:00
int idx ;
2020-11-11 16:54:50 +03:00
/*
* console = " " or console = null have been suggested as a way to
* disable console output . Use ttynull that has been created
2021-03-28 07:39:32 +03:00
* for exactly this purpose .
2020-11-11 16:54:50 +03:00
*/
if ( str [ 0 ] = = 0 | | strcmp ( str , " null " ) = = 0 ) {
__add_preferred_console ( " ttynull " , 0 , NULL , NULL , true ) ;
2020-05-22 09:53:06 +03:00
return 1 ;
2020-11-11 16:54:50 +03:00
}
2020-05-22 09:53:06 +03:00
2013-08-01 00:53:45 +04:00
if ( _braille_console_setup ( & str , & brl_options ) )
return 1 ;
2008-04-30 11:54:51 +04:00
2006-03-24 14:18:19 +03:00
/*
* Decode str into name , index , options .
*/
if ( str [ 0 ] > = ' 0 ' & & str [ 0 ] < = ' 9 ' ) {
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS " ) ;
strncpy ( buf + 4 , str , sizeof ( buf ) - 5 ) ;
2006-03-24 14:18:19 +03:00
} else {
2007-07-16 10:37:27 +04:00
strncpy ( buf , str , sizeof ( buf ) - 1 ) ;
2006-03-24 14:18:19 +03:00
}
2007-07-16 10:37:27 +04:00
buf [ sizeof ( buf ) - 1 ] = 0 ;
2014-08-07 03:09:08 +04:00
options = strchr ( str , ' , ' ) ;
if ( options )
2006-03-24 14:18:19 +03:00
* ( options + + ) = 0 ;
# ifdef __sparc__
if ( ! strcmp ( str , " ttya " ) )
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS0 " ) ;
2006-03-24 14:18:19 +03:00
if ( ! strcmp ( str , " ttyb " ) )
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS1 " ) ;
2006-03-24 14:18:19 +03:00
# endif
2007-07-16 10:37:27 +04:00
for ( s = buf ; * s ; s + + )
2014-08-07 03:09:08 +04:00
if ( isdigit ( * s ) | | * s = = ' , ' )
2006-03-24 14:18:19 +03:00
break ;
idx = simple_strtoul ( s , NULL , 10 ) ;
* s = 0 ;
printk: Fix preferred console selection with multiple matches
In the following circumstances, the rule of selecting the console
corresponding to the last "console=" entry on the command line as
the preferred console (CON_CONSDEV, ie, /dev/console) fails. This
is a specific example, but it could happen with different consoles
that have a similar name aliasing mechanism.
- The kernel command line has both console=tty0 and console=ttyS0
in that order (the latter with speed etc... arguments).
This is common with some cloud setups such as Amazon Linux.
- add_preferred_console is called early to register "uart0". In
our case that happens from acpi_parse_spcr() on arm64 since the
"enable_console" argument is true on that architecture. This causes
"uart0" to become entry 0 of the console_cmdline array.
Now, because of the above, what happens is:
- add_preferred_console is called by the cmdline parsing for tty0
and ttyS0 respectively, thus occupying entries 1 and 2 of the
console_cmdline array (since this happens after ACPI SPCR parsing).
At that point preferred_console is set to 2 as expected.
- When the tty layer kicks in, it will call register_console for tty0.
This will match entry 1 in console_cmdline array. It isn't our
preferred console but because it's our only console at this point,
it will end up "first" in the consoles list.
- When 8250 probes the actual serial port later on, it calls
register_console for ttyS0. At that point the loop in register_console
tries to match it with the entries in the console_cmdline array.
Ideally this should match ttyS0 in entry 2, which is preferred, causing
it to be inserted first and to replace tty0 as CONSDEV. However, 8250
provides a "match" hook in its struct console, and that hook will match
"uart" as an alias to "ttyS". So we match uart0 at entry 0 in the array
which is not the preferred console, and never reach entry 2, which is,
since we break out of the loop on the first match. Consequently,
we don't set CONSDEV and insert the console second, not first, in
the console list.
As a result, we end up with tty0 remaining first in the array, and thus
/dev/console going there instead of the last user specified one which
is ttyS0.
This tentative fix changes register_console() to scan first for consoles
specified on the command line, and only if none is found, to then
scan for consoles specified by the architecture.
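The difference between the single scan and the two-pass scan can be sketched in userspace as follows. This is a simplified model, not the register_console() code: struct cmdline_entry, names_match(), find_first_match() and find_two_pass() are hypothetical names, and the alias rule only imitates the "uart"/"ttyS" case from this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for an entry of console_cmdline[]. */
struct cmdline_entry {
	const char *name;
	int index;
	bool user_specified; /* true if it came from console= on the command line */
};

/* Models a driver "match" hook that accepts "uart" as an alias of "ttyS". */
static bool names_match(const char *entry, const char *driver)
{
	if (strcmp(entry, driver) == 0)
		return true;
	return strcmp(entry, "uart") == 0 && strcmp(driver, "ttyS") == 0;
}

/* Single scan: stop at the first matching entry, whoever added it. */
int find_first_match(const struct cmdline_entry *tab, size_t n,
		     const char *driver, int index)
{
	assert(tab != NULL && driver != NULL);
	for (size_t i = 0; i < n; i++)
		if (tab[i].index == index && names_match(tab[i].name, driver))
			return (int)i;
	return -1;
}

/* One pass restricted to either user-specified or platform entries. */
static int find_pass(const struct cmdline_entry *tab, size_t n,
		     const char *driver, int index, bool user_specified)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].user_specified != user_specified)
			continue;
		if (tab[i].index == index && names_match(tab[i].name, driver))
			return (int)i;
	}
	return -1;
}

/* Two passes: command-line entries first, platform entries only as fallback. */
int find_two_pass(const struct cmdline_entry *tab, size_t n,
		  const char *driver, int index)
{
	int i;

	assert(tab != NULL && driver != NULL);
	i = find_pass(tab, n, driver, index, true);
	if (i >= 0)
		return i;
	return find_pass(tab, n, driver, index, false);
}
```

With the table from the report (uart0 added by the platform, then tty0 and ttyS0 from the command line), the single scan stops at entry 0 while the two-pass scan reaches entry 2, the preferred one.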
Link: https://lore.kernel.org/r/20200213095133.23176-3-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2020-02-13 12:51:32 +03:00
__add_preferred_console ( buf , idx , options , brl_options , true ) ;
2006-03-24 14:18:19 +03:00
return 1 ;
}
__setup ( " console= " , console_setup ) ;
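The decoding done by console_setup() above can be tried outside the kernel with a sketch like the one below. It assumes a hypothetical struct console_arg and parse_console_arg(), leaves out the braille, "null" and sparc special cases, and uses snprintf() in place of the kernel's string helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the kernel stores this in struct console_cmdline. */
struct console_arg {
	char name[16];       /* driver name without the index, e.g. "ttyS" */
	int idx;             /* device index, e.g. 0 */
	const char *options; /* points into the caller's string, or NULL */
};

/* Decode "name<idx>[,options]"; modifies str in place, as console_setup() does. */
void parse_console_arg(char *str, struct console_arg *out)
{
	char buf[sizeof(out->name) + 4]; /* 4 extra bytes for a "ttyS" prefix */
	char *s, *options;

	assert(str != NULL && out != NULL);

	/* A bare number such as "1" is shorthand for "ttyS1". */
	if (isdigit((unsigned char)str[0]))
		snprintf(buf, sizeof(buf), "ttyS%s", str);
	else
		snprintf(buf, sizeof(buf), "%s", str);

	/* Split off the options at the first comma. */
	options = strchr(str, ',');
	if (options)
		*options++ = '\0';

	/* The index starts at the first digit; a comma also ends the name. */
	for (s = buf; *s; s++)
		if (isdigit((unsigned char)*s) || *s == ',')
			break;
	out->idx = (int)strtoul(s, NULL, 10);
	*s = '\0';

	/* Copy the bare name, truncating if it does not fit. */
	snprintf(out->name, sizeof(out->name), "%s", buf);
	out->options = options;
}
```

For "ttyS0,115200n8" this yields the name "ttyS", index 0 and options "115200n8"; the same three values are what console_setup() hands to __add_preferred_console().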
2005-05-17 08:53:47 +04:00
/**
* add_preferred_console - add a device to the list of preferred consoles .
2005-11-14 03:08:14 +03:00
* @ name : device name
* @ idx : device index
* @ options : options for this console
2005-05-17 08:53:47 +04:00
*
* The last preferred console added will be used for kernel messages
* and stdin / out / err for init . Normally this is used by console_setup
* above to handle user - supplied console arguments ; however it can also
* be used by arch - specific code either to override the user or more
* commonly to provide a default console ( ie from PROM variables ) when
* the user has not supplied one .
*/
2023-10-12 09:42:57 +03:00
int add_preferred_console ( const char * name , const short idx , char * options )
2005-05-17 08:53:47 +04:00
{
2020-02-13 12:51:32 +03:00
return __add_preferred_console ( name , idx , options , NULL , false ) ;
2005-05-17 08:53:47 +04:00
}
2014-08-07 03:09:12 +04:00
bool console_suspend_enabled = true ;
2007-10-18 14:04:50 +04:00
EXPORT_SYMBOL ( console_suspend_enabled ) ;
static int __init console_suspend_disable ( char * str )
{
2014-08-07 03:09:12 +04:00
console_suspend_enabled = false ;
2007-10-18 14:04:50 +04:00
return 1 ;
}
__setup ( " no_console_suspend " , console_suspend_disable ) ;
2011-11-01 04:11:27 +04:00
module_param_named ( console_suspend , console_suspend_enabled ,
bool , S_IRUGO | S_IWUSR ) ;
MODULE_PARM_DESC ( console_suspend , " suspend console during suspend "
" and hibernate operations " ) ;
2007-10-18 14:04:50 +04:00
2021-07-27 16:06:35 +03:00
static bool printk_console_no_auto_verbose ;
void console_verbose ( void )
{
if ( console_loglevel & & ! printk_console_no_auto_verbose )
console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH ;
}
EXPORT_SYMBOL_GPL ( console_verbose ) ;
module_param_named ( console_no_auto_verbose , printk_console_no_auto_verbose , bool , 0644 ) ;
MODULE_PARM_DESC ( console_no_auto_verbose , " Disable console loglevel raise to highest on oops/panic/etc " ) ;
2006-06-20 05:16:01 +04:00
/**
* suspend_console - suspend the console subsystem
*
* This disables printk ( ) while we go into suspend states
*/
void suspend_console ( void )
{
2023-07-17 22:46:06 +03:00
struct console * con ;
2007-10-18 14:04:50 +04:00
if ( ! console_suspend_enabled )
return ;
2018-03-22 16:58:33 +03:00
pr_info ( " Suspending console(s) (use no_console_suspend to debug) \n " ) ;
2022-04-22 00:22:46 +03:00
pr_flush ( 1000 , true ) ;
2023-07-17 22:46:06 +03:00
console_list_lock ( ) ;
for_each_console ( con )
console_srcu_write_flags ( con , con - > flags | CON_SUSPENDED ) ;
console_list_unlock ( ) ;
/*
* Ensure that all SRCU list walks have completed . All printing
* contexts must be able to see that they are suspended so that it
* is guaranteed that all printing has stopped when this function
* completes .
*/
synchronize_srcu ( & console_srcu ) ;
2006-06-20 05:16:01 +04:00
}
void resume_console ( void )
{
2023-07-17 22:46:06 +03:00
struct console * con ;
2007-10-18 14:04:50 +04:00
if ( ! console_suspend_enabled )
return ;
2023-07-17 22:46:06 +03:00
console_list_lock ( ) ;
for_each_console ( con )
console_srcu_write_flags ( con , con - > flags & ~ CON_SUSPENDED ) ;
console_list_unlock ( ) ;
/*
* Ensure that all SRCU list walks have completed . All printing
* contexts must be able to see they are no longer suspended so
* that they are guaranteed to wake up and resume printing .
*/
synchronize_srcu ( & console_srcu ) ;
2022-04-22 00:22:46 +03:00
pr_flush ( 1000 , true ) ;
2006-06-20 05:16:01 +04:00
}
2010-06-04 09:11:25 +04:00
/**
* console_cpu_notify - print deferred console messages after CPU hotplug
2016-11-03 17:49:58 +03:00
* @ cpu : unused
2010-06-04 09:11:25 +04:00
*
* If printk ( ) is called from a CPU that is not online yet , the messages
2017-01-21 13:47:29 +03:00
* will be printed on the console only if there are CON_ANYTIME consoles .
* This function is called when a new CPU comes online ( or fails to come
* up ) or goes offline .
2010-06-04 09:11:25 +04:00
*/
2016-11-03 17:49:58 +03:00
static int console_cpu_notify ( unsigned int cpu )
{
2016-11-17 19:31:55 +03:00
if ( ! cpuhp_tasks_frozen ) {
2017-01-21 13:47:29 +03:00
/* If trylock fails, someone else is doing the printing */
if ( console_trylock ( ) )
console_unlock ( ) ;
2010-06-04 09:11:25 +04:00
}
2016-11-03 17:49:58 +03:00
return 0 ;
2010-06-04 09:11:25 +04:00
}
2023-07-17 22:46:03 +03:00
/*
2023-07-17 22:46:07 +03:00
* Return true if a panic is in progress on a remote CPU .
*
* On true , the local CPU should immediately release any printing resources
* that may be needed by the panic CPU .
2023-07-17 22:46:03 +03:00
*/
2023-07-17 22:46:07 +03:00
bool other_cpu_in_panic ( void )
2023-07-17 22:46:03 +03:00
{
if ( ! panic_in_progress ( ) )
return false ;
/*
* We can use raw_smp_processor_id ( ) here because it is impossible for
* the task to be migrated to the panic_cpu , or away from it . If
* panic_cpu has already been set , and we ' re not currently executing on
* that CPU , then we never will be .
*/
return atomic_read ( & panic_cpu ) ! = raw_smp_processor_id ( ) ;
}
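The ownership test in other_cpu_in_panic() can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: the CPU number is passed in explicitly instead of being read with raw_smp_processor_id(), model_panic_begin() is a hypothetical stand-in for whatever claims panic_cpu in the panic path, and the -1 sentinel is an assumption of this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sentinel meaning "no CPU has started a panic". */
#define MODEL_PANIC_CPU_INVALID (-1)

static atomic_int model_panic_cpu = MODEL_PANIC_CPU_INVALID;

/* Hypothetical: the first caller claims panic ownership, later callers fail. */
bool model_panic_begin(int this_cpu)
{
	int expected = MODEL_PANIC_CPU_INVALID;

	assert(this_cpu >= 0);
	return atomic_compare_exchange_strong(&model_panic_cpu, &expected,
					      this_cpu);
}

/* True only if a panic is in progress and this CPU is not its owner. */
bool model_other_cpu_in_panic(int this_cpu)
{
	int owner = atomic_load(&model_panic_cpu);

	if (owner == MODEL_PANIC_CPU_INVALID)
		return false;
	return owner != this_cpu;
}
```

Once one CPU has claimed the panic, every other CPU sees true and is expected to back off from printing resources, while the owner keeps seeing false.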
2005-04-17 02:20:36 +04:00
/**
2022-11-16 19:21:51 +03:00
* console_lock - block the console subsystem from printing
2005-04-17 02:20:36 +04:00
*
2022-11-16 19:21:51 +03:00
* Acquires a lock which guarantees that no consoles will
* be in or enter their write ( ) callback .
2005-04-17 02:20:36 +04:00
*
* Can sleep , returns nothing .
*/
2011-01-26 02:07:35 +03:00
void console_lock ( void )
2005-04-17 02:20:36 +04:00
{
2012-09-18 03:03:31 +04:00
might_sleep ( ) ;
2023-07-17 22:46:03 +03:00
/* On panic, the console_lock must be left to the panic cpu. */
2023-07-17 22:46:07 +03:00
while ( other_cpu_in_panic ( ) )
2023-07-17 22:46:03 +03:00
msleep ( 1000 ) ;
2014-06-05 03:11:36 +04:00
down_console_sem ( ) ;
2022-06-23 17:51:54 +03:00
console_locked = 1 ;
2005-04-17 02:20:36 +04:00
console_may_schedule = 1 ;
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_lock ) ;
2005-04-17 02:20:36 +04:00
2011-01-26 02:07:35 +03:00
/**
2022-11-16 19:21:51 +03:00
* console_trylock - try to block the console subsystem from printing
2011-01-26 02:07:35 +03:00
*
2022-11-16 19:21:51 +03:00
* Try to acquire a lock which guarantees that no consoles will
* be in or enter their write ( ) callback .
2011-01-26 02:07:35 +03:00
*
* returns 1 on success , and 0 on failure to acquire the lock .
*/
int console_trylock ( void )
2005-04-17 02:20:36 +04:00
{
2023-07-17 22:46:03 +03:00
/* On panic, the console_lock must be left to the panic cpu. */
2023-07-17 22:46:07 +03:00
if ( other_cpu_in_panic ( ) )
2011-01-26 02:07:35 +03:00
return 0 ;
2014-06-05 03:11:36 +04:00
if ( down_trylock_console_sem ( ) )
2011-01-26 02:07:35 +03:00
return 0 ;
2022-06-23 17:51:54 +03:00
console_locked = 1 ;
printk: Never set console_may_schedule in console_trylock()
This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].
Another issue is that preemption under console_sem also
interferes with Steven Rostedt's hand off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. Preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.
Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.
[1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-16 07:47:16 +03:00
console_may_schedule = 0 ;
2011-01-26 02:07:35 +03:00
return 1 ;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_trylock ) ;
2005-04-17 02:20:36 +04:00
int is_console_locked ( void )
{
2022-06-23 17:51:55 +03:00
return console_locked ;
2005-04-17 02:20:36 +04:00
}
2018-06-28 16:20:27 +03:00
EXPORT_SYMBOL ( is_console_locked ) ;
2005-04-17 02:20:36 +04:00
2022-06-23 17:51:56 +03:00
/*
* Check if the given console is currently capable and allowed to print
* records .
*
2022-11-16 19:21:29 +03:00
* Requires the console_srcu_read_lock .
2022-06-23 17:51:56 +03:00
*/
static inline bool console_is_usable ( struct console * con )
2022-04-22 00:22:44 +03:00
{
2022-11-16 19:21:29 +03:00
short flags = console_srcu_read_flags ( con ) ;
if ( ! ( flags & CON_ENABLED ) )
2022-06-23 17:51:56 +03:00
return false ;
2023-07-17 22:46:06 +03:00
if ( ( flags & CON_SUSPENDED ) )
return false ;
2022-06-23 17:51:56 +03:00
if ( ! con - > write )
2022-04-22 00:22:44 +03:00
return false ;
/*
* Console drivers may assume that per - cpu resources have been
* allocated . So unless they ' re explicitly marked as being able to
* cope ( CON_ANYTIME ) don ' t call them until this CPU is officially up .
*/
2022-11-16 19:21:29 +03:00
if ( ! cpu_online ( raw_smp_processor_id ( ) ) & & ! ( flags & CON_ANYTIME ) )
2022-04-22 00:22:44 +03:00
return false ;
return true ;
}
static void __console_unlock ( void )
{
2022-06-23 17:51:54 +03:00
console_locked = 0 ;
2022-04-22 00:22:44 +03:00
up_console_sem ( ) ;
}
2023-09-20 18:52:38 +03:00
# ifdef CONFIG_PRINTK
2022-04-22 00:22:44 +03:00
/*
2023-01-09 13:07:58 +03:00
* Prepend the message in @ pmsg - > pbufs - > outbuf with a " dropped message " . This
* is achieved by shifting the existing message over and inserting the dropped
* message .
*
* @ pmsg is the printk message to prepend .
2022-04-22 00:22:44 +03:00
*
2023-01-09 13:07:58 +03:00
* @ dropped is the dropped count to report in the dropped message .
2022-04-22 00:22:45 +03:00
*
2023-01-09 13:07:58 +03:00
* If the message text in @ pmsg - > pbufs - > outbuf does not have enough space for
* the dropped message , the message text will be sufficiently truncated .
2022-04-22 00:22:45 +03:00
*
2023-01-09 13:07:58 +03:00
* If @ pmsg - > pbufs - > outbuf is modified , @ pmsg - > outbuf_len is updated .
*/
2023-09-16 22:20:06 +03:00
void console_prepend_dropped ( struct printk_message * pmsg , unsigned long dropped )
2023-01-09 13:07:58 +03:00
{
struct printk_buffers * pbufs = pmsg - > pbufs ;
const size_t scratchbuf_sz = sizeof ( pbufs - > scratchbuf ) ;
const size_t outbuf_sz = sizeof ( pbufs - > outbuf ) ;
char * scratchbuf = & pbufs - > scratchbuf [ 0 ] ;
char * outbuf = & pbufs - > outbuf [ 0 ] ;
size_t len ;
2023-01-17 19:10:31 +03:00
len = scnprintf ( scratchbuf , scratchbuf_sz ,
2023-01-09 13:07:58 +03:00
" ** %lu printk messages dropped ** \n " , dropped ) ;
/*
* Make sure outbuf is sufficiently large before prepending .
* Keep at least the prefix when the message must be truncated .
* It is a rather theoretical problem when someone tries to
* use a minimalist buffer .
*/
printk: adjust string limit macros
The various internal size limit macros have names and/or values that
do not fit well to their current usage.
Rename the macros so that their purpose is clear and, if needed,
provide a more appropriate value. In general, the new macros and
values will lead to less memory usage. The new macros are...
PRINTK_MESSAGE_MAX:
This is the maximum size for a formatted message on a console,
devkmsg, or syslog. It does not matter which format the message has
(normal or extended). It replaces the use of CONSOLE_EXT_LOG_MAX for
console and devkmsg. It replaces the use of CONSOLE_LOG_MAX for
syslog.
Historically, normal messages have been allowed to print up to 1kB,
whereas extended messages have been allowed to print up to 8kB.
However, the difference in lengths of these message types is not
significant and in multi-line records, normal messages are probably
larger. Also, because 1kB is only slightly above the allowed record
size, multi-line normal messages could be easily truncated during
formatting.
This new macro should be significantly larger than the allowed
record size to allow sufficient space for extended or multi-line
prefix text. A value of 2kB should be plenty of space. For normal
messages this represents a doubling of the historically allowed
amount. For extended messages it reduces the excessive 8kB size,
thus reducing memory usage needed for message formatting.
PRINTK_PREFIX_MAX:
This is the maximum size allowed for a record prefix (used by
console and syslog). It replaces PREFIX_MAX. The value is left
unchanged.
PRINTKRB_RECORD_MAX:
This is the maximum size allowed to be reserved for a record in the
ringbuffer. It is used by all readers and writers with the printk
ringbuffer. It replaces LOG_LINE_MAX.
Previously this was set to "1kB - PREFIX_MAX", which makes some
sense if 1kB is the limit for normal message output and prefixes are
enabled. However, with the allowance of larger output and the
existence of multi-line records, the value is rather bizarre.
Round the value up to 1kB.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230109100800.1085541-9-john.ogness@linutronix.de
2023-01-09 13:08:00 +03:00
	if (WARN_ON_ONCE(len + PRINTK_PREFIX_MAX >= outbuf_sz))
2023-01-09 13:07:58 +03:00
		return;
	if (pmsg->outbuf_len + len >= outbuf_sz) {
		/* Truncate the message, but keep it terminated. */
		pmsg->outbuf_len = outbuf_sz - (len + 1);
		outbuf[pmsg->outbuf_len] = 0;
	}
	memmove(outbuf + len, outbuf, pmsg->outbuf_len + 1);
	memcpy(outbuf, scratchbuf, len);
	pmsg->outbuf_len += len;
}
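The truncate-then-prepend step above can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: demo_prepend_dropped(), its parameters and the 64-byte notice buffer are hypothetical names chosen for illustration, and the size check is simplified (the kernel code additionally reserves PRINTK_PREFIX_MAX bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Prepend a "dropped" notice to the NUL-terminated message in @buf.
 * @buf_sz is the capacity of @buf and @msg_len the current message length.
 * If the result would not fit, the tail of the message is cut off.
 * Returns the new message length (unchanged if the notice cannot fit).
 */
static size_t demo_prepend_dropped(char *buf, size_t buf_sz, size_t msg_len,
				   unsigned long dropped)
{
	char notice[64];	/* illustrative scratch buffer */
	size_t len;
	int n;

	assert(msg_len < buf_sz && buf[msg_len] == '\0');

	n = snprintf(notice, sizeof(notice),
		     "** %lu printk messages dropped **\n", dropped);
	if (n < 0 || (size_t)n >= sizeof(notice))
		return msg_len;
	len = (size_t)n;

	/* No room for the notice plus a terminator: leave the message alone. */
	if (len + 1 > buf_sz)
		return msg_len;

	if (msg_len + len >= buf_sz) {
		/* Truncate the message, but keep it terminated. */
		msg_len = buf_sz - (len + 1);
		buf[msg_len] = '\0';
	}

	/* Shift the message (including its terminator), then copy the notice. */
	memmove(buf + len, buf, msg_len + 1);
	memcpy(buf, notice, len);

	return msg_len + len;
}
```

memmove() is used for the shift because source and destination overlap; the notice itself comes from a separate buffer, so memcpy() is sufficient there.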
2022-04-22 00:22:44 +03:00
/*
2023-01-09 13:07:57 +03:00
 * Read and format the specified record (or a later record if the specified
 * record is not available).
2022-04-22 00:22:44 +03:00
 *
2023-01-09 13:07:57 +03:00
 * @pmsg will contain the formatted result. @pmsg->pbufs must point to a
 * struct printk_buffers.
2022-11-16 19:21:27 +03:00
 *
2023-01-09 13:07:57 +03:00
 * @seq is the record to read and format. If it is not available, the next
 * valid record is read.
2022-04-22 00:22:44 +03:00
 *
2023-01-09 13:07:57 +03:00
 * @is_extended specifies if the message should be formatted for extended
 * console output.
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() may call cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality has been present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
the console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- the can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if a console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from a !online cpu
by setting the CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out
if there is a CON_ANYTIME console; otherwise console_unlock() must be
avoided.
can_use_console() ensures that the console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
        CPU0 online                        CPU1 !online
                                         console_trylock()
                                         ...
                                         console_unlock()
                                           console_cont_flush
                                             spin_lock logbuf_lock
                                             if (!cont.len) {
                                                spin_unlock logbuf_lock
                                                return
                                             }
                                           for (;;) {
  vprintk_emit
   spin_lock logbuf_lock
   log_store
   spin_unlock logbuf_lock
                                             spin_lock logbuf_lock
   !console_trylock_for_printk               msg_print_text
  return                                     console_idx = log_next()
                                             console_seq++
                                             console_prev = msg->flags
                                             spin_unlock logbuf_lock
                                             call_console_drivers()
                                               for_each_console(con) {
                                                 if (!cpu_online() &&
                                                     !(con->flags & CON_ANYTIME))
                                                         continue;
                                               }
                                           /*
                                            * no message printed, we lost it
                                            */
  vprintk_emit
   spin_lock logbuf_lock
   log_store
   spin_unlock logbuf_lock
   !console_trylock_for_printk
  return
                                           /*
                                            * go to the beginning of the loop,
                                            * find out there are new messages,
                                            * lose it
                                            */
                                           }
A console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when a CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from the offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
*
2023-01-09 13:07:59 +03:00
 * @may_suppress specifies if records may be skipped based on loglevel.
 *
2023-01-09 13:07:57 +03:00
 * Returns false if no record is available. Otherwise true and all fields
 * of @pmsg are valid. (See the documentation of struct printk_message
 * for information about the @pmsg fields.)
2016-03-18 00:21:20 +03:00
*/
2023-09-16 22:20:06 +03:00
bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
			     bool is_extended, bool may_suppress)
2016-03-18 00:21:20 +03:00
{
2022-06-23 17:51:55 +03:00
	static int panic_console_dropped;
2023-01-09 13:07:56 +03:00
2023-01-09 13:07:57 +03:00
	struct printk_buffers *pbufs = pmsg->pbufs;
	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
	const size_t outbuf_sz = sizeof(pbufs->outbuf);
	char *scratchbuf = &pbufs->scratchbuf[0];
	char *outbuf = &pbufs->outbuf[0];
2022-04-22 00:22:44 +03:00
	struct printk_info info;
	struct printk_record r;
2023-01-09 13:07:57 +03:00
	size_t len = 0;
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:56 +03:00
	/*
	 * Formatting extended messages requires a separate buffer, so use the
	 * scratch buffer to read in the ringbuffer text.
	 *
	 * Formatting normal messages is done in-place, so read the ringbuffer
	 * text directly into the output buffer.
	 */
	if (is_extended)
		prb_rec_init_rd(&r, &info, scratchbuf, scratchbuf_sz);
	else
		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:57 +03:00
	if (!prb_read_valid(prb, seq, &r))
2022-04-22 00:22:44 +03:00
		return false;
2023-01-09 13:07:57 +03:00
	pmsg->seq = r.info->seq;
	pmsg->dropped = r.info->seq - seq;
	/*
	 * Check for dropped messages in panic here so that printk
	 * suppression can occur as early as possible if necessary.
	 */
	if (pmsg->dropped &&
	    panic_in_progress() &&
	    panic_console_dropped++ > 10) {
		suppress_panic_printk = 1;
		pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
2022-04-22 00:22:44 +03:00
	}
	/* Skip record that has level above the console loglevel. */
2023-01-09 13:07:59 +03:00
	if (may_suppress && suppress_message_printing(r.info->level))
2023-01-09 13:07:57 +03:00
		goto out;
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:56 +03:00
	if (is_extended) {
		len = info_print_ext_header(outbuf, outbuf_sz, r.info);
		len += msg_print_ext_body(outbuf + len, outbuf_sz - len,
2022-04-22 00:22:44 +03:00
					  &r.text_buf[0], r.info->text_len, &r.info->dev_info);
	} else {
		len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
	}
2023-01-09 13:07:57 +03:00
out:
	pmsg->outbuf_len = len;
	return true;
}
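The "read the specified record or a later one" behaviour and the resulting dropped count can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the printk ringbuffer API: struct demo_ring, struct demo_msg and demo_get_next_message() are hypothetical, and the ring is reduced to the half-open range of sequence numbers that are still readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ringbuffer: records [first_seq, next_seq) are valid. */
struct demo_ring {
	uint64_t first_seq;	/* oldest record still available */
	uint64_t next_seq;	/* sequence number of the next record to be written */
};

/* Hypothetical reduced message: only the fields needed for the accounting. */
struct demo_msg {
	uint64_t seq;		/* record that was actually read */
	uint64_t dropped;	/* records skipped between the request and @seq */
};

/*
 * Read record @seq, or the next valid record if @seq was already overwritten.
 * Returns false if no record at or after @seq exists yet.
 */
static bool demo_get_next_message(const struct demo_ring *rb, uint64_t seq,
				  struct demo_msg *msg)
{
	uint64_t found = seq;

	assert(rb->first_seq <= rb->next_seq);

	/* The requested record is gone: fall forward to the oldest one. */
	if (found < rb->first_seq)
		found = rb->first_seq;

	/* Nothing has been written at or after the requested position. */
	if (found >= rb->next_seq)
		return false;

	msg->seq = found;
	msg->dropped = found - seq;	/* same arithmetic as pmsg->dropped */
	return true;
}
```

A caller that keeps its own position, as a console does with con->seq, would continue from msg.seq + 1 and add msg.dropped to its running total.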
2023-09-16 22:20:02 +03:00
/*
 * Used as the printk buffers for non-panic, serialized console printing.
 * This is for legacy (!CON_NBCON) as well as all boot (CON_BOOT) consoles.
 * Its usage requires the console_lock held.
*/
struct printk_buffers printk_shared_pbufs ;
2023-01-09 13:07:57 +03:00
/*
 * Print one record for the given console. The record printed is whatever
 * record is the next available record for the given console.
2022-04-22 00:22:45 +03:00
 *
2022-04-22 00:22:44 +03:00
 * @handover will be set to true if a printk waiter has taken over the
2022-11-16 19:21:27 +03:00
 * console_lock, in which case the caller is no longer holding both the
 * console_lock and the SRCU read lock. Otherwise it is set to false.
 *
 * @cookie is the cookie from the SRCU read lock.
2022-04-22 00:22:44 +03:00
 *
 * Returns false if the given console has no next record to print, otherwise
 * true.
2016-03-18 00:21:20 +03:00
*
2022-11-16 19:21:27 +03:00
 * Requires the console_lock and the SRCU read lock.
2016-03-18 00:21:20 +03:00
*/
2023-01-09 13:07:57 +03:00
static bool console_emit_next_record(struct console *con, bool *handover, int cookie)
2016-03-18 00:21:20 +03:00
{
2023-01-09 13:07:57 +03:00
	bool is_extended = console_srcu_read_flags(con) & CON_EXTENDED;
2023-09-16 22:20:02 +03:00
	char *outbuf = &printk_shared_pbufs.outbuf[0];
2023-01-09 13:07:57 +03:00
	struct printk_message pmsg = {
2023-09-16 22:20:02 +03:00
		.pbufs = &printk_shared_pbufs,
2023-01-09 13:07:57 +03:00
	};
	unsigned long flags;
2022-04-22 00:22:44 +03:00
2022-06-23 17:51:55 +03:00
	*handover = false;
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:59 +03:00
	if (!printk_get_next_message(&pmsg, con->seq, is_extended, true))
2022-04-22 00:22:44 +03:00
		return false;
2023-01-09 13:07:57 +03:00
	con->dropped += pmsg.dropped;
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:57 +03:00
	/* Skip messages of formatted length 0. */
	if (pmsg.outbuf_len == 0) {
		con->seq = pmsg.seq + 1;
2022-04-22 00:22:44 +03:00
		goto skip;
	}
2023-01-09 13:07:58 +03:00
	if (con->dropped && !is_extended) {
		console_prepend_dropped(&pmsg, con->dropped);
		con->dropped = 0;
2022-04-22 00:22:44 +03:00
	}
2022-06-23 17:51:55 +03:00
	/*
	 * While actively printing out messages, if another printk()
	 * were to occur on another CPU, it may wait for this one to
	 * finish. This task can not be preempted if there is a
	 * waiter waiting to take over.
	 *
	 * Interrupts are disabled because the hand over to a waiter
	 * must not be interrupted until the hand over is completed
	 * (@console_waiter is cleared).
	 */
	printk_safe_enter_irqsave(flags);
	console_lock_spinning_enable();
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:58 +03:00
	/* Do not trace print latency. */
	stop_critical_timings();
	/* Write everything out to the hardware. */
	con->write(con, outbuf, pmsg.outbuf_len);
2022-06-23 17:51:55 +03:00
	start_critical_timings();
2022-04-22 00:22:44 +03:00
2023-01-09 13:07:57 +03:00
	con->seq = pmsg.seq + 1;
2022-04-22 00:22:44 +03:00
2022-11-16 19:21:27 +03:00
	*handover = console_lock_spinning_disable_and_check(cookie);
2022-06-23 17:51:55 +03:00
	printk_safe_exit_irqrestore(flags);
2022-04-22 00:22:44 +03:00
skip:
	return true;
}
2023-09-20 18:52:38 +03:00
# else
static bool console_emit_next_record(struct console *con, bool *handover, int cookie)
{
	*handover = false;
	return false;
}
# endif /* CONFIG_PRINTK */
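The dropped-message step in console_emit_next_record() above can be pictured with a small userspace model. This is only a sketch under stated assumptions: prepend_dropped() is a hypothetical helper, not the kernel's console_prepend_dropped(), the notice wording is illustrative, and when the notice does not fit the model simply leaves the buffer untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model: put a "dropped" notice in front of the
 * text already stored in buf. Returns the new text length, or the old
 * length unchanged if the notice would not fit.
 */
static size_t prepend_dropped(char *buf, size_t buf_size, size_t len,
			      unsigned long dropped)
{
	char notice[64];
	int n = snprintf(notice, sizeof(notice),
			 "** %lu printk messages dropped **\n", dropped);

	/* Give up if formatting failed or the notice was truncated. */
	if (n < 0 || (size_t)n >= sizeof(notice))
		return len;
	/* Need room for the notice, the existing text and a terminator. */
	if ((size_t)n + len + 1 > buf_size)
		return len;

	memmove(buf + n, buf, len);		/* shift existing text right */
	memcpy(buf, notice, (size_t)n);		/* place the notice in front */
	buf[(size_t)n + len] = '\0';
	return (size_t)n + len;
}
```

A caller would mirror the check in the function above: only prepend when the dropped count is non-zero and the console is not an extended one, then reset the count.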
2022-04-22 00:22:44 +03:00
/*
 * Print out all remaining records to all consoles.
 *
 * @do_cond_resched is set by the caller. It can be true only in schedulable
 * context.
 *
 * @next_seq is set to the sequence number after the last available record.
 * The value is valid only when this function returns true. It means that all
 * usable consoles are completely flushed.
 *
 * @handover will be set to true if a printk waiter has taken over the
 * console_lock, in which case the caller is no longer holding the
 * console_lock. Otherwise it is set to false.
 *
 * Returns true when there was at least one usable console and all messages
 * were flushed to all usable consoles. A returned false informs the caller
 * that not everything was flushed (either there were no usable consoles or
 * another context has taken over printing or it is a panic situation and this
2022-06-23 17:51:56 +03:00
 * is not the panic CPU). Regardless of the reason, the caller should assume it
 * is not useful to immediately try again.
2022-04-22 00:22:44 +03:00
 *
 * Requires the console_lock.
 */
static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
{
	bool any_usable = false;
	struct console *con;
	bool any_progress;
2022-11-16 19:21:27 +03:00
	int cookie;
2022-04-22 00:22:44 +03:00
	*next_seq = 0;
	*handover = false;
	do {
		any_progress = false;
2022-11-16 19:21:27 +03:00
		cookie = console_srcu_read_lock();
		for_each_console_srcu(con) {
2022-04-22 00:22:44 +03:00
			bool progress;
			if (!console_is_usable(con))
				continue;
			any_usable = true;
2023-01-09 13:07:56 +03:00
			progress = console_emit_next_record(con, handover, cookie);
2022-11-16 19:21:27 +03:00
			/*
			 * If a handover has occurred, the SRCU read lock
			 * is already released.
			 */
2022-04-22 00:22:44 +03:00
			if (*handover)
				return false;
			/* Track the next of the highest seq flushed. */
			if (con->seq > *next_seq)
				*next_seq = con->seq;
			if (!progress)
				continue;
			any_progress = true;
			/* Allow panic_cpu to take over the consoles safely. */
2023-07-17 22:46:07 +03:00
			if (other_cpu_in_panic())
2022-11-16 19:21:27 +03:00
				goto abandon;
2022-04-22 00:22:44 +03:00
			if (do_cond_resched)
				cond_resched();
		}
2022-11-16 19:21:27 +03:00
		console_srcu_read_unlock(cookie);
2022-04-22 00:22:44 +03:00
	} while (any_progress);
	return any_usable;
2022-11-16 19:21:27 +03:00
abandon:
	console_srcu_read_unlock(cookie);
	return false;
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() may call cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality has been present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from !online cpu
by setting CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find
out if there is a CON_ANYTIME console, otherwise console_unlock() must be
avoided.
can_use_console() ensures that console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online                     CPU1 !online
                                console_trylock()
                                ...
                                console_unlock()
                                  console_cont_flush
                                    spin_lock logbuf_lock
                                    if (!cont.len) {
                                      spin_unlock logbuf_lock
                                      return
                                    }
                                  for (;;) {
vprintk_emit
  spin_lock logbuf_lock
  log_store
  spin_unlock logbuf_lock
                                    spin_lock logbuf_lock
  !console_trylock_for_printk       msg_print_text
  return                            console_idx = log_next()
                                    console_seq++
                                    console_prev = msg->flags
                                    spin_unlock logbuf_lock
                                    call_console_drivers()
                                      for_each_console(con) {
                                        if (!cpu_online() &&
                                            !(con->flags & CON_ANYTIME))
                                          continue;
                                      }
                                    /*
                                     * no message printed, we lost it
                                     */
vprintk_emit
  spin_lock logbuf_lock
  log_store
  spin_unlock logbuf_lock
  !console_trylock_for_printk
  return
                                    /*
                                     * go to the beginning of the loop,
                                     * find out there are new messages,
                                     * lose it
                                     */
                                  }
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from the offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
}
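The round-robin shape of console_flush_all() above (one record per usable console per pass, repeated until a full pass makes no progress) can be tried out in userspace. This is a minimal sketch with hypothetical names (model_console, model_emit_next, model_flush_all); it assumes a fixed log head and leaves out SRCU, handover and the panic checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct console: only what the loop needs. */
struct model_console {
	bool usable;
	uint64_t seq;		/* next record this console will print */
	unsigned int printed;	/* records emitted so far */
};

/* Emit one record if the console is behind log_head. */
static bool model_emit_next(struct model_console *con, uint64_t log_head)
{
	if (con->seq >= log_head)
		return false;
	con->seq++;
	con->printed++;
	return true;
}

/* Returns true if at least one console was usable, like the original. */
static bool model_flush_all(struct model_console *cons, size_t n,
			    uint64_t log_head, uint64_t *next_seq)
{
	bool any_usable = false;
	bool any_progress;

	*next_seq = 0;
	do {
		any_progress = false;
		for (size_t i = 0; i < n; i++) {
			if (!cons[i].usable)
				continue;
			any_usable = true;
			bool progress = model_emit_next(&cons[i], log_head);
			/* Track the highest seq reached by any console. */
			if (cons[i].seq > *next_seq)
				*next_seq = cons[i].seq;
			if (progress)
				any_progress = true;
		}
	} while (any_progress);
	return any_usable;
}
```

Emitting a single record per console per pass keeps a slow console from starving the others, which is presumably why the loop is structured this way rather than draining each console in turn.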
2005-04-17 02:20:36 +04:00
/**
2022-11-16 19:21:51 +03:00
 * console_unlock - unblock the console subsystem from printing
2005-04-17 02:20:36 +04:00
 *
2022-11-16 19:21:51 +03:00
 * Releases the console_lock which the caller holds to block printing of
 * the console subsystem.
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * While the console_lock was held, console output may have been buffered
 * by printk(). If this is the case, console_unlock() emits
 * the output prior to releasing the lock.
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * console_unlock() may be called from any context.
2005-04-17 02:20:36 +04:00
 */
2011-01-26 02:07:35 +03:00
void console_unlock(void)
2005-04-17 02:20:36 +04:00
{
2022-04-22 00:22:44 +03:00
	bool do_cond_resched;
	bool handover;
	bool flushed;
	u64 next_seq;
2005-04-17 02:20:36 +04:00
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console device, for example a serial
console, the outputting task may occupy the cpu for a very long time,
long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile on more messages, sometimes enough to trigger the next cycle
of warnings, incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 03:58:24 +03:00
	/*
2017-03-24 19:14:05 +03:00
	 * Console drivers are called with interrupts disabled, so
2016-01-16 03:58:24 +03:00
	 * @console_may_schedule should be cleared before; however, we may
	 * end up dumping a lot of lines, for example, if called from
	 * console registration path, and should invoke cond_resched()
	 * between lines if allowable. Not doing so can cause a very long
	 * scheduling stall on a slow console leading to RCU stall and
	 * softlockup warnings which exacerbate the issue with more
2022-04-22 00:22:44 +03:00
	 * messages practically incapacitating the system. Therefore, create
	 * a local to use for the printing loop.
2016-01-16 03:58:24 +03:00
	 */
	do_cond_resched = console_may_schedule;
2006-08-05 23:14:16 +04:00
2022-04-22 00:22:44 +03:00
	do {
		console_may_schedule = 0;
2012-05-10 06:30:45 +04:00
2022-04-22 00:22:44 +03:00
		flushed = console_flush_all(do_cond_resched, &next_seq, &handover);
		if (!handover)
			__console_unlock();
2018-09-13 15:34:06 +03:00
2020-07-09 16:23:44 +03:00
		/*
2022-04-22 00:22:44 +03:00
		 * Abort if there was a failure to flush all messages to all
		 * usable consoles. Either it is not possible to flush (in
		 * which case it would be an infinite loop of retrying) or
		 * another context has taken over printing.
2020-07-09 16:23:44 +03:00
		 */
2022-04-22 00:22:44 +03:00
		if (!flushed)
			break;
2012-05-03 04:29:13 +04:00
printk: Add console owner and waiter logic to load balance console writes
This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.
Here's the design again:
I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.
There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.
In printk() when it tries to write to the consoles, we have:
if (console_trylock())
console_unlock();
Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.
When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.
If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.
Then the waiter calls console_unlock() and continues to write to the
consoles.
If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
By Petr Mladek about possible new deadlocks:
The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."
We could look at it from this side. The possible deadlock would
look like:
CPU0                            CPU1
console_unlock()
  console_owner = current;
                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)
    call_console_drivers()
      spin_lockA()
This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.
But if the above is true then the following scenario was
already possible before:
CPU0
spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()
In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.
By Steven Rostedt:
To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>
static bool stop_testing;
static unsigned int loops = 1;
static void preempt_printk_workfn(struct work_struct *work)
{
	int i;
	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 " XXX NOPREEMPT");
			preempt_enable();
		}
		msleep(1);
	}
}
static struct work_struct __percpu *works;
static void finish(void)
{
	int cpu;
	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}
static int __init test_init(void)
{
	int cpu;
	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;
	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);
		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}
	return 0;
}
static void __exit test_exit(void)
{
	finish();
}
module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-10 16:24:17 +03:00
		/*
2022-04-22 00:22:44 +03:00
		 * Some context may have added new records after
		 * console_flush_all() but before unlocking the console.
		 * Re-check if there is a new record to flush. If the trylock
		 * fails, another context is already handling the printing.
2018-01-10 16:24:17 +03:00
		 */
2022-04-22 00:22:44 +03:00
	} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_unlock ) ;
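The re-check at the end of console_unlock() above closes a window in which a record is stored after the flush but before the lock is released. A single-threaded model can make that window visible. This is only a sketch: model_log, model_trylock, model_flush and model_unlock are hypothetical names, and the late_writers counter is an assumption that simulates records arriving inside the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_log {
	bool locked;			/* stands in for the console_lock */
	uint64_t head;			/* seq of the next record to be stored */
	uint64_t con_seq;		/* seq of the next record to print */
	unsigned int late_writers;	/* records that arrive mid-unlock */
};

static bool model_trylock(struct model_log *log)
{
	if (log->locked)
		return false;
	log->locked = true;
	return true;
}

/* Print everything currently stored and report the seq reached. */
static void model_flush(struct model_log *log, uint64_t *next_seq)
{
	log->con_seq = log->head;
	*next_seq = log->con_seq;
}

/* Caller must hold the lock. Returns the number of flush passes. */
static unsigned int model_unlock(struct model_log *log)
{
	unsigned int passes = 0;
	uint64_t next_seq;

	assert(log->locked);
	do {
		model_flush(log, &next_seq);
		passes++;

		/* A record sneaks in after the flush, before the unlock. */
		if (log->late_writers) {
			log->late_writers--;
			log->head++;
		}
		log->locked = false;
		/* Re-check: is there an unprinted record, and is the lock free? */
	} while (log->head > next_seq && model_trylock(log));
	return passes;
}
```

With two late writers the model needs three passes before the log is fully printed; without the re-check the late records would sit unprinted until some later printk() call.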
2005-04-17 02:20:36 +04:00
2005-11-14 03:08:14 +03:00
/**
 * console_conditional_schedule - yield the CPU if required
2005-04-17 02:20:36 +04:00
 *
 * If the console code is currently allowed to sleep, and
 * if this CPU should yield the CPU to another task, do
 * so here.
 *
2011-01-26 02:07:35 +03:00
 * Must be called within console_lock().
2005-04-17 02:20:36 +04:00
 */
void __sched console_conditional_schedule(void)
{
	if (console_may_schedule)
		cond_resched();
}
EXPORT_SYMBOL ( console_conditional_schedule ) ;
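The rule behind console_may_schedule (set to 1 by console_lock(), set to 0 by console_trylock()) can be modelled in a few lines. This is a sketch with hypothetical model_* names: the lock is a plain flag, the blocking lock assumes it is free, and a counter stands in for cond_resched().

```c
#include <assert.h>
#include <stdbool.h>

static bool model_locked;
static int model_may_schedule;
static unsigned int model_yields;

static void model_console_lock(void)
{
	/* A real lock would sleep here; the model assumes it is free. */
	assert(!model_locked);
	model_locked = true;
	model_may_schedule = 1;	/* taken via lock: sleepable context */
}

static bool model_console_trylock(void)
{
	if (model_locked)
		return false;
	model_locked = true;
	model_may_schedule = 0;	/* taken via trylock: may be atomic */
	return true;
}

static void model_console_unlock(void)
{
	model_locked = false;
}

static void model_conditional_schedule(void)
{
	if (model_may_schedule)
		model_yields++;	/* stands in for cond_resched() */
}
```

Only the path that took the lock with model_console_lock() yields; after a trylock the same call is a no-op.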
void console_unblank(void)
{
2023-07-17 22:46:02 +03:00
	bool found_unblank = false;
2005-04-17 02:20:36 +04:00
	struct console *c;
2022-11-16 19:21:30 +03:00
	int cookie;
2005-04-17 02:20:36 +04:00
2023-07-17 22:46:02 +03:00
	/*
	 * First check if there are any consoles implementing the unblank()
	 * callback. If not, there is no reason to continue and take the
	 * console lock, which in particular can be dangerous if
	 * @oops_in_progress is set.
	 */
	cookie = console_srcu_read_lock();
	for_each_console_srcu(c) {
		if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank) {
			found_unblank = true;
			break;
		}
	}
	console_srcu_read_unlock(cookie);
	if (!found_unblank)
		return;
2005-04-17 02:20:36 +04:00
	/*
2022-11-16 19:21:30 +03:00
	 * Stop console printing because the unblank() callback may
	 * assume the console is not within its write() callback.
	 *
	 * If @oops_in_progress is set, this may be an atomic context.
	 * In that case, attempt a trylock as best-effort.
2005-04-17 02:20:36 +04:00
	 */
	if (oops_in_progress) {
2023-07-17 22:46:02 +03:00
		/* Semaphores are not NMI-safe. */
		if (in_nmi())
			return;
		/*
		 * Attempting to trylock the console lock can deadlock
		 * if another CPU was stopped while modifying the
		 * semaphore. "Hope and pray" that this is not the
		 * current situation.
		 */
2014-06-05 03:11:36 +04:00
		if (down_trylock_console_sem() != 0)
2005-04-17 02:20:36 +04:00
			return;
	} else
2011-01-26 02:07:35 +03:00
		console_lock();
2005-04-17 02:20:36 +04:00
2022-06-23 17:51:54 +03:00
	console_locked = 1;
2005-04-17 02:20:36 +04:00
	console_may_schedule = 0;
2022-11-16 19:21:30 +03:00
	cookie = console_srcu_read_lock();
	for_each_console_srcu(c) {
		if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank)
2005-04-17 02:20:36 +04:00
			c->unblank();
2022-11-16 19:21:30 +03:00
	}
	console_srcu_read_unlock(cookie);
2011-01-26 02:07:35 +03:00
	console_unlock();
2022-04-22 00:22:46 +03:00
	if (!oops_in_progress)
		pr_flush(1000, true);
2005-04-17 02:20:36 +04:00
}
2016-01-16 03:58:24 +03:00
/**
 * console_flush_on_panic - flush console content on panic
2019-05-18 00:31:50 +03:00
 * @mode: flush all messages in buffer or just the pending ones
2016-01-16 03:58:24 +03:00
*
* Immediately output all pending messages no matter what .
*/
2019-05-18 00:31:50 +03:00
void console_flush_on_panic ( enum con_flush_mode mode )
2016-01-16 03:58:24 +03:00
{
2023-07-17 22:46:04 +03:00
bool handover ;
u64 next_seq ;
2016-01-16 03:58:24 +03:00
/*
2023-07-17 22:46:04 +03:00
* Ignore the console lock and flush out the messages . Attempting a
* trylock would not be useful because :
*
* - if it is contended , it must be ignored anyway
* - console_lock ( ) and console_trylock ( ) block and fail
* respectively in panic for non - panic CPUs
* - semaphores are not NMI - safe
*/
/*
* If another context is holding the console lock ,
* @ console_may_schedule might be set . Clear it so that
* this context does not call cond_resched ( ) while flushing .
2016-01-16 03:58:24 +03:00
*/
console_may_schedule = 0 ;
2019-05-18 00:31:50 +03:00
2022-04-22 00:22:44 +03:00
if ( mode = = CONSOLE_REPLAY_ALL ) {
struct console * c ;
2023-09-16 22:20:05 +03:00
short flags ;
2022-11-16 19:21:31 +03:00
int cookie ;
2022-04-22 00:22:44 +03:00
u64 seq ;
seq = prb_first_valid_seq ( prb ) ;
2022-11-16 19:21:31 +03:00
cookie = console_srcu_read_lock ( ) ;
for_each_console_srcu ( c ) {
2023-09-16 22:20:05 +03:00
flags = console_srcu_read_flags ( c ) ;
if ( flags & CON_NBCON ) {
nbcon_seq_force ( c , seq ) ;
} else {
/*
* This is an unsynchronized assignment . On
* panic legacy consoles are only best effort .
*/
c - > seq = seq ;
}
2022-11-16 19:21:31 +03:00
}
console_srcu_read_unlock ( cookie ) ;
2022-04-22 00:22:44 +03:00
}
2023-07-17 22:46:04 +03:00
console_flush_all ( false , & next_seq , & handover ) ;
2016-01-16 03:58:24 +03:00
}
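The relationship between @console_may_schedule and the panic flush can be sketched in userspace. This is an illustration under stated assumptions, not kernel code: flush_lines() and emit_line() are hypothetical names, and sched_yield() stands in for the kernel's cond_resched(). It shows that a flush yields between lines only when the flag is set, which is why the panic path clears the flag first.

```c
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for writing one record to a slow console. */
static void emit_line(int n)
{
	printf("line %d\n", n);
}

/*
 * Emit @count lines and return how many times the caller yielded.
 * Yielding happens between lines only when @may_schedule is true;
 * sched_yield() is a userspace substitute for cond_resched().
 */
static int flush_lines(int count, bool may_schedule)
{
	int yields = 0;

	for (int i = 0; i < count; i++) {
		emit_line(i);
		if (may_schedule) {
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```

Calling flush_lines(3, true) models a sleepable caller that yields after every line, while flush_lines(3, false) models the panic-style flush that never yields.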
2005-04-17 02:20:36 +04:00
/*
* Return the console tty driver structure and its associated index
*/
struct tty_driver * console_device ( int * index )
{
struct console * c ;
struct tty_driver * driver = NULL ;
2022-11-16 19:21:32 +03:00
int cookie ;
2005-04-17 02:20:36 +04:00
2022-11-16 19:21:32 +03:00
/*
* Take console_lock to serialize device ( ) callback with
* other console operations . For example , fg_console is
* modified under console_lock when switching vt .
*/
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2022-11-16 19:21:32 +03:00
cookie = console_srcu_read_lock ( ) ;
for_each_console_srcu ( c ) {
2005-04-17 02:20:36 +04:00
if ( ! c - > device )
continue ;
driver = c - > device ( c , index ) ;
if ( driver )
break ;
}
2022-11-16 19:21:32 +03:00
console_srcu_read_unlock ( cookie ) ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
return driver ;
}
/*
* Prevent further output on the passed console device so that ( for example )
* serial drivers can disable console output before suspending a port , and can
* re - enable output afterwards .
*/
void console_stop ( struct console * console )
{
2022-04-22 00:22:46 +03:00
__pr_flush ( console , 1000 , true ) ;
2022-11-21 14:10:12 +03:00
console_list_lock ( ) ;
2022-11-16 19:21:24 +03:00
console_srcu_write_flags ( console , console - > flags & ~ CON_ENABLED ) ;
2022-11-21 14:10:12 +03:00
console_list_unlock ( ) ;
2022-11-16 19:21:15 +03:00
/*
* Ensure that all SRCU list walks have completed . All contexts must
* be able to see that this console is disabled so that ( for example )
* the caller can suspend the port without risk of another context
* using the port .
*/
synchronize_srcu ( & console_srcu ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_stop ) ;
void console_start ( struct console * console )
{
2022-11-21 14:10:12 +03:00
console_list_lock ( ) ;
2022-11-16 19:21:24 +03:00
console_srcu_write_flags ( console , console - > flags | CON_ENABLED ) ;
2022-11-21 14:10:12 +03:00
console_list_unlock ( ) ;
2022-04-22 00:22:46 +03:00
__pr_flush ( console , 1000 , true ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_start ) ;
2011-03-23 02:34:20 +03:00
static int __read_mostly keep_bootcon ;
static int __init keep_bootcon_setup ( char * str )
{
keep_bootcon = 1 ;
2013-11-13 03:08:50 +04:00
pr_info ( " debug: skip boot console de-registration. \n " ) ;
2011-03-23 02:34:20 +03:00
return 0 ;
}
early_param ( " keep_bootcon " , keep_bootcon_setup ) ;
2020-02-13 12:51:31 +03:00
/*
* This is called by register_console ( ) to try to match
* the newly registered console with any of the ones selected
* by either the command line or add_preferred_console ( ) and
* setup / enable it .
*
* Care needs to be taken with consoles that are statically
* enabled such as netconsole
*/
2021-11-22 16:26:45 +03:00
static int try_enable_preferred_console ( struct console * newcon ,
bool user_specified )
2020-02-13 12:51:31 +03:00
{
struct console_cmdline * c ;
2020-06-18 19:47:50 +03:00
int i , err ;
2020-02-13 12:51:31 +03:00
for ( i = 0 , c = console_cmdline ;
i < MAX_CMDLINECONSOLES & & c - > name [ 0 ] ;
i + + , c + + ) {
printk: Fix preferred console selection with multiple matches
In the following circumstances, the rule of selecting the console
corresponding to the last "console=" entry on the command line as
the preferred console (CON_CONSDEV, ie, /dev/console) fails. This
is a specific example, but it could happen with different consoles
that have a similar name aliasing mechanism.
- The kernel command line has both console=tty0 and console=ttyS0
in that order (the latter with speed etc... arguments).
This is common with some cloud setups such as Amazon Linux.
- add_preferred_console is called early to register "uart0". In
our case that happens from acpi_parse_spcr() on arm64 since the
"enable_console" argument is true on that architecture. This causes
"uart0" to become entry 0 of the console_cmdline array.
Now, because of the above, what happens is:
- add_preferred_console is called by the cmdline parsing for tty0
and ttyS0 respectively, thus occupying entries 1 and 2 of the
console_cmdline array (since this happens after ACPI SPCR parsing).
At that point preferred_console is set to 2 as expected.
- When the tty layer kicks in, it will call register_console for tty0.
This will match entry 1 in console_cmdline array. It isn't our
preferred console but because it's our only console at this point,
it will end up "first" in the consoles list.
- When 8250 probes the actual serial port later on, it calls
register_console for ttyS0. At that point the loop in register_console
tries to match it with the entries in the console_cmdline array.
Ideally this should match ttyS0 in entry 2, which is preferred, causing
it to be inserted first and to replace tty0 as CONSDEV. However, 8250
provides a "match" hook in its struct console, and that hook will match
"uart" as an alias to "ttyS". So we match uart0 at entry 0 in the array
which is not the preferred console and will not match entry 2 which is
since we break out of the loop on the first match. As a result,
we don't set CONSDEV and don't insert it first, but second in
the console list.
As a result, we end up with tty0 remaining first in the array, and thus
/dev/console going there instead of the last user specified one which
is ttyS0.
This tentative fix changes register_console() to scan first for consoles
specified on the command line, and only if none is found, to then
scan for consoles specified by the architecture.
Link: https://lore.kernel.org/r/20200213095133.23176-3-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2020-02-13 12:51:32 +03:00
if ( c - > user_specified ! = user_specified )
continue ;
2020-02-13 12:51:31 +03:00
if ( ! newcon - > match | |
newcon - > match ( newcon , c - > name , c - > index , c - > options ) ! = 0 ) {
/* default matching */
BUILD_BUG_ON ( sizeof ( c - > name ) ! = sizeof ( newcon - > name ) ) ;
if ( strcmp ( c - > name , newcon - > name ) ! = 0 )
continue ;
if ( newcon - > index > = 0 & &
newcon - > index ! = c - > index )
continue ;
if ( newcon - > index < 0 )
newcon - > index = c - > index ;
if ( _braille_register_console ( newcon , c ) )
return 0 ;
if ( newcon - > setup & &
2020-06-18 19:47:50 +03:00
( err = newcon - > setup ( newcon , c - > options ) ) ! = 0 )
return err ;
2020-02-13 12:51:31 +03:00
}
newcon - > flags | = CON_ENABLED ;
2021-11-22 16:26:47 +03:00
if ( i = = preferred_console )
2020-02-13 12:51:31 +03:00
newcon - > flags | = CON_CONSDEV ;
return 0 ;
}
/*
* Some consoles , such as pstore and netconsole , can be enabled even
2020-02-13 12:51:32 +03:00
* without matching . Accept the pre - enabled consoles only when match ( )
2020-06-18 19:47:51 +03:00
* and setup ( ) had a chance to be called .
2020-02-13 12:51:31 +03:00
*/
2020-02-13 12:51:32 +03:00
if ( newcon - > flags & CON_ENABLED & & c - > user_specified = = user_specified )
2020-02-13 12:51:31 +03:00
return 0 ;
return - ENOENT ;
}
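The two-pass scan that the @user_specified argument makes possible can be sketched as follows. This is a simplified userspace model under stated assumptions: struct cmdline_entry, alias_match(), scan(), and find_entry() are hypothetical names, and the "uart" to "ttyS" aliasing only imitates the kind of match() hook described in the commit message. The caller is assumed to try command-line entries first and to fall back to platform-provided entries only when nothing matched.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced command-line entry (not the kernel struct). */
struct cmdline_entry {
	const char *name;
	int index;
	bool user_specified;
};

/* Hypothetical alias-aware match: "uart" is treated as an alias of "ttyS". */
static bool alias_match(const char *con_name, int con_index,
			const struct cmdline_entry *e)
{
	const char *name = strcmp(e->name, "uart") == 0 ? "ttyS" : e->name;

	return strcmp(con_name, name) == 0 && con_index == e->index;
}

/* One pass over the table, restricted to one class of entries. */
static int scan(const struct cmdline_entry *tbl, int n, const char *con_name,
		int con_index, bool user_specified)
{
	for (int i = 0; i < n; i++) {
		if (tbl[i].user_specified != user_specified)
			continue;
		if (alias_match(con_name, con_index, &tbl[i]))
			return i;
	}
	return -1;
}

/* Command-line entries first; platform-provided entries only as fallback. */
static int find_entry(const struct cmdline_entry *tbl, int n,
		      const char *con_name, int con_index)
{
	int i = scan(tbl, n, con_name, con_index, true);

	return i >= 0 ? i : scan(tbl, n, con_name, con_index, false);
}
```

With a table holding "uart0" (platform), "tty0" (user), and "ttyS0" (user) in that order, find_entry() resolves ttyS0 to the user-specified entry instead of stopping at the aliased platform entry.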
2021-11-22 16:26:45 +03:00
/* Try to enable the console unconditionally */
static void try_enable_default_console ( struct console * newcon )
{
if ( newcon - > index < 0 )
newcon - > index = 0 ;
if ( newcon - > setup & & newcon - > setup ( newcon , NULL ) ! = 0 )
return ;
newcon - > flags | = CON_ENABLED ;
printk/console: Remove need_default_console variable
The variable @need_default_console is used to decide whether a newly
registered console should get enabled by default.
The logic is complicated. It can be modified in a register_console()
call. But it is always re-evaluated in the next call by the following
condition:
if (need_default_console || bcon || !console_drivers)
need_default_console = preferred_console < 0;
In short, the value is updated when any of the following conditions is true:
+ the value is still, or again, "true"
+ boot/early console is still the first in @console_drivers list
+ @console_drivers list is empty
The value is updated according to @preferred_console. In particular,
it is set to "false" when a @preferred_console was set by
__add_preferred_console(). This happens when a non-braille console
was added via the command line, device tree, or SPCR.
It is far from clear what all of this means together. Let's look at
@need_default_console from another angle:
1. The value is "true" by default. It means that it is always set
according to @preferred_console during the first register_console()
call.
In other words, the first register_console() call will register
the console by default only when no non-braille console was defined
via the command line, device tree, or SPCR.
2. The value will always stay "false" when @preferred_console is set.
In other words, try_enable_default_console() will never get called
when a non-braille console is explicitly required.
4. The value might be set to "false" in try_enable_default_console()
when a console with tty binding (driver) gets enabled.
In this case CON_CONSDEV is set as well. As a result, the console
is inserted first into the @console_drivers list. It might
be either a real or a boot/early console.
5. The value will be set _back_ to "true" in the next register_console()
call when:
+ The console added by the previous register_console() had been
a boot/early one.
+ The last console has been unregistered in the meantime and
a boot/early console became first in @console_drivers list
again. Or the list became empty.
In other words, the value will stay "false" only when the last
registered console was real, had tty binding, and was not removed
in the meantime.
The main logic looks clear:
+ Consoles are enabled by default only when no one is preferred
via the command line, device tree, or SPCR.
+ By default, any console is enabled until a real console
with tty binding gets registered.
The behavior when the real console with tty binding is later removed
is a bit unclear:
+ By default, any new console is registered again only when there
is no console or the first console in the list is a boot one.
The question is why the code is suddenly happy when a real console
without tty binding is the first in the list. It looks like an oversight
and a bug.
Conclusion:
The state of @preferred_console and the first console in @console_drivers
list should be enough to decide whether we need to enable the given console
by default.
The rules are simple. New consoles are _not_ enabled by default
when either of the following conditions is true:
+ @preferred_console is set. It means that a non-braille console
is explicitly configured via the command line, device tree, or SPCR.
+ A real console with tty binding is registered. Such a console will
have CON_CONSDEV flag set and will always be the first in
@console_drivers list.
Note:
The new code does not use @bcon variable. The meaning of the variable
is far from clear. The direct check of the first console in the list
makes it more clear that only real console fulfills requirements
of the default console.
Behavior change:
As discussed above, there was one situation where the original
code behaved strangely. Let's have:
+ console A: real console without tty binding
+ console B: real console with tty binding
and do:
register_console(A); /* 1st step */
register_console(B); /* 2nd step */
unregister_console(B); /* 3rd step */
register_console(B); /* 4th step */
The original code will not register the console B in the 4th step.
@need_default_console is set to "false" in 2nd step. The real console
with tty binding (driver) is then removed in the 3rd step.
But @need_default_console will stay "false" in the 4th step because
there is no boot/early console and @registered_consoles list is not
empty.
The new code will register the console B in the 4th step because
it checks whether the first console has tty binding (->driver)
This behavior change should be acceptable:
1. The scenario requires manual intervention (console removal).
The system should boot with the same consoles as before.
2. Console B is registered again probably because the user wants
to use it. The most likely scenario is that the related
module is reloaded.
3. It makes the behavior more consistent and predictable.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-5-pmladek@suse.com
2021-11-22 16:26:48 +03:00
if ( newcon - > device )
2021-11-22 16:26:45 +03:00
newcon - > flags | = CON_CONSDEV ;
}
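The decision rule summarized in the commit message above can be sketched as a small predicate. This is a minimal model under stated assumptions: struct sketch_console and enable_by_default() are hypothetical names, and the fields only mirror the properties the message talks about (tty binding via ->device, boot versus real console).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced console descriptor (not the kernel struct). */
struct sketch_console {
	bool has_tty_binding;	/* models a non-NULL ->device callback */
	bool is_boot;		/* models the CON_BOOT flag */
};

/*
 * Decide whether a newly registered console should be enabled by default.
 * @preferred_console: index of the preferred console, or -1 if none is set
 * @first: first console in the list, or NULL if the list is empty
 */
static bool enable_by_default(int preferred_console,
			      const struct sketch_console *first)
{
	if (preferred_console >= 0)
		return false;	/* a console was explicitly configured */
	if (first && first->has_tty_binding && !first->is_boot)
		return false;	/* a real console with tty binding exists */
	return true;
}
```

In the A/B scenario from the message, once console B (with tty binding) has been unregistered and console A (without tty binding) is first in the list, the predicate returns true again, so B would be enabled on re-registration.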
2022-11-16 19:21:18 +03:00
static void console_init_seq ( struct console * newcon , bool bootcon_registered )
2022-11-16 19:21:17 +03:00
{
2022-11-16 19:21:18 +03:00
struct console * con ;
bool handover ;
if ( newcon - > flags & ( CON_PRINTBUFFER | CON_BOOT ) ) {
2022-11-16 19:21:17 +03:00
/* Get a consistent copy of @syslog_seq. */
mutex_lock ( & syslog_lock ) ;
newcon - > seq = syslog_seq ;
mutex_unlock ( & syslog_lock ) ;
} else {
2022-11-16 19:21:18 +03:00
/* Begin with next message added to ringbuffer. */
2022-11-16 19:21:17 +03:00
newcon - > seq = prb_next_seq ( prb ) ;
2022-11-16 19:21:18 +03:00
/*
* If any enabled boot consoles are due to be unregistered
* shortly , some may not be caught up and may be the same
* device as @ newcon . Since it is not known which boot console
* is the same device , flush all consoles and , if necessary ,
* start with the message of the enabled boot console that is
* the furthest behind .
*/
if ( bootcon_registered & & ! keep_bootcon ) {
2022-11-16 19:21:51 +03:00
/*
* Hold the console_lock to stop console printing and
* guarantee safe access to console - > seq .
*/
console_lock ( ) ;
2022-11-16 19:21:18 +03:00
/*
* Flush all consoles and set the console to start at
* the next unprinted sequence number .
*/
if ( ! console_flush_all ( true , & newcon - > seq , & handover ) ) {
/*
* Flushing failed . Just choose the lowest
* sequence of the enabled boot consoles .
*/
/*
* If there was a handover , this context no
* longer holds the console_lock .
*/
if ( handover )
console_lock ( ) ;
newcon - > seq = prb_next_seq ( prb ) ;
for_each_console ( con ) {
if ( ( con - > flags & CON_BOOT ) & &
( con - > flags & CON_ENABLED ) & &
con - > seq < newcon - > seq ) {
newcon - > seq = con - > seq ;
}
}
}
2022-11-16 19:21:51 +03:00
console_unlock ( ) ;
2022-11-16 19:21:18 +03:00
}
2022-11-16 19:21:17 +03:00
}
}
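The fallback taken when flushing fails, choosing the lowest sequence number among the enabled boot consoles, can be sketched in isolation. This is a userspace illustration under stated assumptions: struct seq_console and fallback_start_seq() are hypothetical names, and the array replaces the kernel's console list walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced console state (not the kernel struct). */
struct seq_console {
	uint64_t seq;	/* next record this console would print */
	bool boot;	/* models CON_BOOT */
	bool enabled;	/* models CON_ENABLED */
};

/*
 * Start from @next_seq (the next record to be added to the ringbuffer)
 * and move back to the enabled boot console that is furthest behind.
 */
static uint64_t fallback_start_seq(const struct seq_console *cons, size_t n,
				   uint64_t next_seq)
{
	uint64_t seq = next_seq;

	for (size_t i = 0; i < n; i++) {
		if (cons[i].boot && cons[i].enabled && cons[i].seq < seq)
			seq = cons[i].seq;
	}
	return seq;
}
```

Disabled consoles and non-boot consoles are ignored, so the new console may repeat some records already printed by a boot console but does not skip any that the slowest enabled boot console still had pending.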
2022-11-16 19:21:14 +03:00
# define console_first() \
hlist_entry ( console_list . first , struct console , node )
2022-11-21 14:10:12 +03:00
static int unregister_console_locked ( struct console * console ) ;
2005-04-17 02:20:36 +04:00
/*
* The console driver calls this routine during kernel initialization
* to register the console printing procedure with printk ( ) and to
* print any messages that were printed by the kernel before the
* console driver was initialized .
printk: Enable the use of more than one CON_BOOT (early console)
Today, register_console() assumes the following usage:
- The first console to register with a flag set to CON_BOOT
is the one and only bootconsole.
- If another register_console() is called with an additional
CON_BOOT, it is silently rejected.
- As soon as a console without CON_BOOT set
registers, the bootconsole is automatically unregistered.
- Once there is a "real" console - register_console() will
silently reject any consoles with its CON_BOOT flag set.
In many systems (alpha, blackfin, microblaze, mips, powerpc,
sh, & x86), there are early_printk implementations, which use
CON_BOOT and output to serial ports, vga, usb, & memory
buffers.
In many embedded systems, it would be nice to have two
bootconsoles - in case the primary fails, you always have
access to a backup memory buffer - but this requires at least
two CON_BOOT consoles...
This patch enables that functionality.
With the change applied, on boot you get (if you try to
re-enable a boot console after the "real" console has been
registered):
root:/> dmesg | grep console
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
Kernel command line: root=/dev/mtdblock0 rw earlyprintk=serial,uart0,57600 console=ttyBF0,57600 nmi_debug=regs
console handover:boot [early_BFuart0] boot [early_shadow0] -> real [ttyBF0]
Too late to register bootconsole early_shadow0
or:
root:/> dmesg | grep console
Kernel command line: root=/dev/mtdblock0 rw console=ttyBF0,57600
console [ttyBF0] enabled
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Cc: "Paul Mundt" <lethal@linux-sh.org>
LKML-Reference: <200907012108.38030.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 05:08:37 +04:00
*
* This can happen pretty early during the boot process ( because of
* early_printk ) - sometimes before setup_arch ( ) completes - be careful
* of what kernel features are used - they may not be initialised yet .
*
* There are two types of consoles - bootconsoles ( early_printk ) and
* " real " consoles ( everything which is not a bootconsole ) which are
* handled differently .
* - Any number of bootconsoles can be registered at any time .
* - As soon as a " real " console is registered , all bootconsoles
* will be unregistered automatically .
* - Once a " real " console is registered , any attempt to register a
* bootconsole will be rejected
2005-04-17 02:20:36 +04:00
*/
2009-07-02 05:08:37 +04:00
void register_console(struct console *newcon)
2005-04-17 02:20:36 +04:00
{
printk/console: Clean up boot console handling in register_console()
The variable @bcon has two meanings. It is used several times for iterating
the list of registered consoles. At the same time, it holds the information
whether a boot console is first in the @console_drivers list.
The information about the 1st console driver used to be important for
the decision whether to install the new console by default or not.
It allowed re-evaluating the variable @need_default_console when
a real console with tty binding had been unregistered in the meantime.
The decision about the default console is no longer affected by the @bcon
variable. The current code directly checks whether the first driver is real
and has tty binding.
The information about the first console is still used for two more
decisions:
1. It prevents duplicate output on non-boot consoles with
CON_CONSDEV flag set.
2. Early/boot consoles are unregistered when a real console with
CON_CONSDEV is registered and @keep_bootcon is not set.
The behavior in real life is far from obvious. @bcon is set according
to the first console in the @console_drivers list. But the first position in
the list is special:
1. Consoles with CON_CONSDEV flag are put at the beginning of
the list. It is either the preferred console or any console
with tty binding registered by default.
2. Another console might become the first in the list when
the first console in the list is unregistered. It might
happen either explicitly or automatically when boot
consoles are unregistered.
There is one more important rule:
+ Boot consoles can't be registered when any real console
is already registered.
It is a puzzle. The main complication is the dependency on the first
position in the list and the complicated rules around it.
Let's try to make it easier:
1. Add variable @bootcon_enabled and set it by iterating all registered
consoles. The variable has an obvious meaning and more predictable
behavior. Any speed optimization and other tricks are not worth it.
2. Use a generic name for the variable that is used to iterate
the list of registered console drivers.
Behavior change:
No, maybe surprisingly, there is _no_ behavior change!
Let's prove it by contradiction. Both operations, duplicate
output prevention and boot console removal, are done only when
the newly added console has the CON_CONSDEV flag set. The behavior
would change only if the new @bootcon_enabled had a different value
than the original @bcon.
In other words, the behavior would change when the following conditions
are all true:
+ a console with CON_CONSDEV flag is added
+ a real (non-boot) console is the first in the list
+ a boot console is later in the list
Now, a real console might be first in the list only when:
+ It was the first registered console. In this case, there can't be
any boot console because any later ones were rejected.
+ It was put at the first position because it had the CON_CONSDEV flag
set. It was either the preferred console or it was a console with
tty binding registered by default. We are interested only in
real consoles here. And a real console with tty binding fulfills
the conditions of the default console.
Now, there is always only one console that is either preferred
or fulfills the conditions of the default console. It can't be already
in the list and being registered at the same time.
As a result, the above three conditions could never be "true" at
the same time. Therefore the behavior can't change.
Final dilemma:
OK, the new code has the same behavior. But is the change in the right
direction? What if the handling of @console_drivers is updated in
the future?
OK, let's look at it from another angle:
1. The ordering of @console_drivers list is important only in
console_device() function. The first console driver with tty
binding gets associated with /dev/console.
2. CON_CONSDEV flag is shown in /proc/consoles. And it should be set
for the driver that is returned by console_device().
3. A boot console is removed and the duplicated output is prevented
when the real console with CON_CONSDEV flag is registered.
Now, in the ideal world:
+ The driver associated with /dev/console should be either a console
preferred via the command line, device tree, or SPCR. Or it should
be the first real console with tty binding registered by default.
+ The code should match the related boot and real console drivers.
It should unregister only the obsolete boot driver. And the duplicated
output should be prevented only on the related real driver.
It is clear that this is not guaranteed by the current code. Instead,
the current code looks like a maze of heuristics that try to achieve
the above.
It is the result of adding several features over the last few decades. For example,
the possibility to register more consoles, unregister consoles, boot
consoles, consoles without tty binding, device tree, SPCR, braille
consoles.
Anyway, there is no reason why the decision about removing boot consoles
and preventing duplicated output should depend on the first console
in the list. The current code makes these decisions primarily by the CON_CONSDEV
flag that is used for the preferred console. It looks like a
good compromise. And the change seems to be in the right direction.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-6-pmladek@suse.com
2021-11-22 16:26:49 +03:00
struct console *con;
2022-11-16 19:21:16 +03:00
bool bootcon_registered = false;
bool realcon_registered = false;
2020-02-13 12:51:31 +03:00
int err;
2005-04-17 02:20:36 +04:00
2022-11-21 14:10:12 +03:00
console_list_lock();
2021-11-22 16:26:49 +03:00
for_each_console(con) {
if (WARN(con == newcon, "console '%s%d' already registered\n",
2022-11-21 14:10:12 +03:00
con->name, con->index)) {
goto unlock;
}
2013-08-02 14:23:34 +04:00
2021-11-22 16:26:49 +03:00
if (con->flags & CON_BOOT)
2022-11-16 19:21:16 +03:00
bootcon_registered = true;
2021-11-22 16:26:49 +03:00
else
2022-11-16 19:21:16 +03:00
realcon_registered = true;
2007-05-08 11:26:49 +04:00
}
2021-11-22 16:26:49 +03:00
/* Do not register boot consoles when there already is a real one. */
2022-11-16 19:21:16 +03:00
if ((newcon->flags & CON_BOOT) && realcon_registered) {
printk/console: Clean up boot console handling in register_console()
The variable @bcon has two meanings. It is used several times for iterating
the list of registered consoles. In the meantime, it holds the information
whether a boot console is first in @console_drivers list.
The information about the 1st console driver used to be important for
the decision whether to install the new console by default or not.
It allowed to re-evaluate the variable @need_default_console when
a real console with tty binding has been unregistered in the meantime.
The decision about the default console is not longer affected by @bcon
variable. The current code checks whether the first driver is real
and has tty binding directly.
The information about the first console is still used for two more
decisions:
1. It prevents duplicate output on non-boot consoles with
CON_CONSDEV flag set.
2. Early/boot consoles are unregistered when a real console with
CON_CONSDEV is registered and @keep_bootcon is not set.
The behavior in the real life is far from obvious. @bcon is set according
to the first console @console_drivers list. But the first position in
the list is special:
1. Consoles with CON_CONSDEV flag are put at the beginning of
the list. It is either the preferred console or any console
with tty binding registered by default.
2. Another console might become the first in the list when
the first console in the list is unregistered. It might
happen either explicitly or automatically when boot
consoles are unregistered.
There is one more important rule:
+ Boot consoles can't be registered when any real console
is already registered.
It is a puzzle. The main complication is the dependency on the first
position is the list and the complicated rules around it.
Let's try to make it easier:
1. Add variable @bootcon_enabled and set it by iterating all registered
consoles. The variable has obvious meaning and more predictable
behavior. Any speed optimization and other tricks are not worth it.
2. Use a generic name for the variable that is used to iterate
the list on registered console drivers.
Behavior change:
No, maybe surprisingly, there is _no_ behavior change!
Let's provide the proof by contradiction. Both operations, duplicate
output prevention and boot consoles removal, are done only when
the newly added console has CON_CONSDEV flag set. The behavior
would change when the new @bootcon_enabled has different value
than the original @bcon.
By other words, the behavior would change when the following conditions
are true:
+ a console with CON_CONSDEV flag is added
+ a real (non-boot) console is the first in the list
+ a boot console is later in the list
Now, a real console might be first in the list only when:
+ It was the first registered console. In this case, there can't be
any boot console because any later ones were rejected.
+ It was put at the first position because it had CON_CONSDEV flag
set. It was either the preferred console or it was a console with
tty binding registered by default. We are interested only in
a real consoles here. And real console with tty binding fulfills
conditions of the default console.
Now, there is always only one console that is either preferred
or fulfills conditions of the default console. It can't be already
in the list and being registered at the same time.
As a result, the above three conditions could newer be "true" at
the same time. Therefore the behavior can't change.
Final dilemma:
OK, the new code has the same behavior. But is the change in the right
direction? What if the handling of @console_drivers is updated in
the future?
OK, let's look at it from another angle:
1. The ordering of @console_drivers list is important only in
console_device() function. The first console driver with tty
binding gets associated with /dev/console.
2. CON_CONSDEV flag is shown in /proc/consoles. And it should be set
for the driver that is returned by console_device().
3. A boot console is removed and the duplicated output is prevented
when the real console with CON_CONSDEV flag is registered.
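Point 1 can be illustrated with a small lookup. This is a simplified sketch, not the kernel's console_device(): struct toy_console, its has_tty field and toy_console_device() are hypothetical names, and the real function calls each driver's ->device() callback instead of testing a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_console {
	const char *name;
	int has_tty;                 /* models a usable ->device() callback */
	struct toy_console *next;
};

/* Return the first console in list order that has a tty binding. */
static const struct toy_console *
toy_console_device(const struct toy_console *head)
{
	const struct toy_console *con;

	for (con = head; con; con = con->next) {
		if (con->has_tty)
			return con;
	}
	return NULL;
}

static void toy_console_device_selfcheck(void)
{
	struct toy_console ttyS0 = { "ttyS", 1, NULL };
	struct toy_console tty0 = { "tty", 1, &ttyS0 };
	struct toy_console notty = { "notty", 0, &tty0 };
	const struct toy_console *con = toy_console_device(&notty);

	/* The console without tty binding is skipped; list order decides. */
	assert(con != NULL);
	assert(strcmp(con->name, "tty") == 0);
}
```

Because only list order matters in this lookup, whichever console is inserted first effectively owns /dev/console.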
Now, in the ideal world:
+ The driver associated with /dev/console should be either a console
preferred via the command line, device tree, or SPCR. Or it should
be the first real console with tty binding registered by default.
+ The code should match the related boot and real console drivers.
It should unregister only the obsolete boot driver. And the duplicated
output should be prevented only on the related real driver.
It is clear that it is not guaranteed by the current code. Instead,
the current code looks like a maze of heuristics that try to achieve
the above.
It is the result of adding several features over the last few decades. For example,
a possibility to register more consoles, unregister consoles, boot
consoles, consoles without tty binding, device tree, SPCR, braille
consoles.
Anyway, there is no reason why the decision about removing boot consoles
and preventing duplicated output should depend on the first console
in the list. The current code makes the decisions primarily by CON_CONSDEV
flag that is used for the preferred console. It looks like a
good compromise. And the change seems to be in the right direction.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-6-pmladek@suse.com
2021-11-22 16:26:49 +03:00
pr_info("Too late to register bootconsole %s%d\n",
	newcon->name, newcon->index);
2022-11-21 14:10:12 +03:00
goto unlock;
2021-11-22 16:26:49 +03:00
}
printk: Enable the use of more than one CON_BOOT (early console)
Today, register_console() assumes the following usage:
- The first console to register with a flag set to CON_BOOT
is the one and only bootconsole.
- If another register_console() is called with an additional
CON_BOOT, it is silently rejected.
- As soon as a console without CON_BOOT set registers,
the bootconsole is automatically unregistered.
- Once there is a "real" console, register_console() will
silently reject any console with its CON_BOOT flag set.
In many systems (alpha, blackfin, microblaze, mips, powerpc,
sh, and x86), there are early_printk implementations that use
CON_BOOT consoles and send their output to serial ports, VGA, USB,
or memory buffers.
In many embedded systems, it would be nice to have two
bootconsoles - in case the primary fails, you always have
access to a backup memory buffer - but this requires at least
two CON_BOOT consoles...
This patch enables that functionality.
With the change applied, on boot you get (if you try to
re-enable a boot console after the "real" console has been
registered):
root:/> dmesg | grep console
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
Kernel command line: root=/dev/mtdblock0 rw earlyprintk=serial,uart0,57600 console=ttyBF0,57600 nmi_debug=regs
console handover:boot [early_BFuart0] boot [early_shadow0] -> real [ttyBF0]
Too late to register bootconsole early_shadow0
or:
root:/> dmesg | grep console
Kernel command line: root=/dev/mtdblock0 rw console=ttyBF0,57600
console [ttyBF0] enabled
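The registration rule after this change can be sketched as a tiny state machine. This is a hedged toy model, not the kernel code: struct toy_state, toy_register() and TOY_CON_BOOT are hypothetical names, and the model ignores details such as keep_bootcon and the CON_CONSDEV condition on the handover.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CON_BOOT (1u << 0)   /* illustrative value only */

struct toy_state {
	int nr_boot;   /* registered boot consoles */
	int nr_real;   /* registered real consoles */
};

/* Returns true when the console was accepted. */
static bool toy_register(struct toy_state *st, unsigned int flags)
{
	if (flags & TOY_CON_BOOT) {
		/* "Too late to register bootconsole": a real console exists. */
		if (st->nr_real > 0)
			return false;
		st->nr_boot++;   /* more than one boot console is allowed */
		return true;
	}

	/* Handover: the real console replaces every boot console. */
	st->nr_boot = 0;
	st->nr_real++;
	return true;
}

static void toy_register_selfcheck(void)
{
	struct toy_state st = { 0, 0 };
	bool ok;

	ok = toy_register(&st, TOY_CON_BOOT);   /* early_shadow0 */
	assert(ok);
	ok = toy_register(&st, TOY_CON_BOOT);   /* early_BFuart0 */
	assert(ok && st.nr_boot == 2);

	ok = toy_register(&st, 0);              /* ttyBF0, handover */
	assert(ok && st.nr_boot == 0 && st.nr_real == 1);

	ok = toy_register(&st, TOY_CON_BOOT);   /* early_shadow0 again */
	assert(!ok);
}
```

The sequence in the self-check mirrors the first dmesg excerpt above: two boot consoles, a handover to the real console, then a rejected late boot console.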
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Cc: "Paul Mundt" <lethal@linux-sh.org>
LKML-Reference: <200907012108.38030.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 05:08:37 +04:00
2023-09-16 22:20:03 +03:00
if (newcon->flags & CON_NBCON) {
	/*
	 * Ensure the nbcon console buffers can be allocated
	 * before modifying any global data.
	 */
	if (!nbcon_alloc(newcon))
		goto unlock;
}
2005-04-17 02:20:36 +04:00
/*
printk/console: Remove need_default_console variable
The variable @need_default_console is used to decide whether a newly
registered console should get enabled by default.
The logic is complicated. It can be modified in a register_console()
call. But it is always re-evaluated in the next call by the following
condition:
if (need_default_console || bcon || !console_drivers)
need_default_console = preferred_console < 0;
In short, the value is updated when any of the following conditions is true:
+ the value is still, or again, "true"
+ boot/early console is still the first in @console_drivers list
+ @console_drivers list is empty
The value is updated according to @preferred_console. In particular,
it is set to "false" when a @preferred_console was set by
__add_preferred_console(). This happens when a non-braille console
was added via the command line, device tree, or SPCR.
It is far from clear what this all means together. Let's look at
@need_default_console from another angle:
1. The value is "true" by default. It means that it is always set
according to @preferred_console during the first register_console()
call.
In other words, the first register_console() call will register
the console by default only when no non-braille console was defined
via the command line, device tree, or SPCR.
2. The value will always stay "false" when @preferred_console is set.
In other words, try_enable_default_console() will never get called
when a non-braille console is explicitly required.
4. The value might be set to "false" in try_enable_default_console()
when a console with tty binding (driver) gets enabled.
In this case CON_CONSDEV is set as well. As a result, the console
will be inserted first into the @console_drivers list. It might
be either a real or a boot/early console.
5. The value will be set _back_ to "true" in the next register_console()
call when:
+ The console added by the previous register_console() had been
a boot/early one.
+ The last console has been unregistered in the meantime and
a boot/early console became first in @console_drivers list
again. Or the list became empty.
In other words, the value will stay "false" only when the last
registered console was real, had tty binding, and was not removed
in the meantime.
The main logic looks clear:
+ Consoles are enabled by default only when none is preferred
via the command line, device tree, or SPCR.
+ By default, any console is enabled until a real console
with tty binding gets registered.
The behavior when the real console with tty binding is later removed
is a bit unclear:
+ By default, any new console is registered again only when there
is no console or the first console in the list is a boot one.
The question is why the code is suddenly happy when a real console
without tty binding is the first in the list. It looks like an oversight
and a bug.
Conclusion:
The state of @preferred_console and the first console in @console_drivers
list should be enough to decide whether we need to enable the given console
by default.
The rules are simple. New consoles are _not_ enabled by default
when either of the following conditions is true:
+ @preferred_console is set. It means that a non-braille console
is explicitly configured via the command line, device tree, or SPCR.
+ A real console with tty binding is registered. Such a console will
have CON_CONSDEV flag set and will always be the first in
@console_drivers list.
Note:
The new code does not use @bcon variable. The meaning of the variable
is far from clear. The direct check of the first console in the list
makes it clearer that only a real console fulfills the requirements
of the default console.
Behavior change:
As already discussed above, there was one situation where the original
code worked in a strange way. Let's have:
+ console A: real console without tty binding
+ console B: real console with tty binding
and do:
register_console(A); /* 1st step */
register_console(B); /* 2nd step */
unregister_console(B); /* 3rd step */
register_console(B); /* 4th step */
The original code will not register the console B in the 4th step.
@need_default_console is set to "false" in 2nd step. The real console
with tty binding (driver) is then removed in the 3rd step.
But @need_default_console will stay "false" in the 4th step because
there is no boot/early console and @console_drivers list is not
empty.
The new code will register the console B in the 4th step because
it checks whether the first console has tty binding (->driver).
This behavior change should be acceptable:
1. The scenario requires manual intervention (console removal).
The system should boot with the same consoles as before.
2. Console B is registered again probably because the user wants
to use it. The most likely scenario is that the related
module is reloaded.
3. It makes the behavior more consistent and predictable.
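The four-step scenario can be replayed with a toy model of both decision rules. Everything here is a simplified sketch built from the description above: the toy_* names and struct fields are hypothetical, @preferred_console is assumed to be unset (< 0), no console matches a command line entry, and list insertion is reduced to "tty binding goes first, everything else goes last".

```c
#include <assert.h>
#include <stdbool.h>

struct toy_console {
	bool has_tty;    /* models a non-NULL ->device callback */
	bool is_boot;    /* models CON_BOOT */
};

#define TOY_MAX 4

struct toy_list {
	const struct toy_console *con[TOY_MAX];   /* con[0] is the list head */
	int n;
	bool need_default_console;                 /* old logic only */
};

static void toy_insert(struct toy_list *l, const struct toy_console *c)
{
	int i;

	if (l->n >= TOY_MAX)
		return;
	if (c->has_tty) {                /* CON_CONSDEV: goes to the head */
		for (i = l->n; i > 0; i--)
			l->con[i] = l->con[i - 1];
		l->con[0] = c;
	} else {
		l->con[l->n] = c;
	}
	l->n++;
}

static void toy_remove(struct toy_list *l, const struct toy_console *c)
{
	int i, j;

	for (i = 0; i < l->n; i++) {
		if (l->con[i] != c)
			continue;
		for (j = i; j < l->n - 1; j++)
			l->con[j] = l->con[j + 1];
		l->n--;
		return;
	}
}

/* Old rule: sticky @need_default_console, re-evaluated conditionally. */
static bool toy_register_old(struct toy_list *l, const struct toy_console *c)
{
	bool bcon = l->n > 0 && l->con[0]->is_boot;

	if (l->need_default_console || bcon || l->n == 0)
		l->need_default_console = true;   /* preferred_console < 0 */
	if (!l->need_default_console)
		return false;                     /* console stays disabled */
	if (c->has_tty)
		l->need_default_console = false;
	toy_insert(l, c);
	return true;
}

/* New rule: look directly at the first console in the list. */
static bool toy_register_new(struct toy_list *l, const struct toy_console *c)
{
	if (l->n > 0 && l->con[0]->has_tty && !l->con[0]->is_boot)
		return false;
	toy_insert(l, c);
	return true;
}

/* Replays steps 1-4 and reports whether step 4 registers console B. */
static bool toy_scenario(bool use_new)
{
	struct toy_console a = { .has_tty = false, .is_boot = false };
	struct toy_console b = { .has_tty = true, .is_boot = false };
	struct toy_list l = { .n = 0, .need_default_console = true };
	bool (*reg)(struct toy_list *, const struct toy_console *) =
		use_new ? toy_register_new : toy_register_old;

	reg(&l, &a);            /* 1st step */
	reg(&l, &b);            /* 2nd step */
	toy_remove(&l, &b);     /* 3rd step */
	return reg(&l, &b);     /* 4th step */
}

static void toy_default_selfcheck(void)
{
	assert(!toy_scenario(false));   /* old rule: B is not registered */
	assert(toy_scenario(true));     /* new rule: B is registered */
}
```

In the model the old rule fails only because the sticky flag outlives the console that cleared it, while the new rule derives the answer from the current list state each time.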
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-5-pmladek@suse.com
2021-11-22 16:26:48 +03:00
* See if we want to enable this console driver by default.
*
* Nope when a console is preferred by the command line, device
* tree, or SPCR.
*
* The first real console with tty binding (driver) wins. More
* consoles might get enabled before the right one is found.
*
* Note that a console with tty binding will have CON_CONSDEV
* flag set and will be first in the list.
2005-04-17 02:20:36 +04:00
*/
2021-11-22 16:26:48 +03:00
if (preferred_console < 0) {
2022-11-16 19:21:14 +03:00
	if (hlist_empty(&console_list) || !console_first()->device ||
	    console_first()->flags & CON_BOOT) {
2021-11-22 16:26:48 +03:00
		try_enable_default_console(newcon);
}
}
2005-04-17 02:20:36 +04:00
printk: Fix preferred console selection with multiple matches
In the following circumstances, the rule of selecting the console
corresponding to the last "console=" entry on the command line as
the preferred console (CON_CONSDEV, i.e., /dev/console) fails. This
is a specific example, but it could happen with different consoles
that have a similar name aliasing mechanism.
- The kernel command line has both console=tty0 and console=ttyS0
in that order (the latter with speed etc... arguments).
This is common with some cloud setups such as Amazon Linux.
- add_preferred_console is called early to register "uart0". In
our case that happens from acpi_parse_spcr() on arm64 since the
"enable_console" argument is true on that architecture. This causes
"uart0" to become entry 0 of the console_cmdline array.
Now, because of the above, what happens is:
- add_preferred_console is called by the cmdline parsing for tty0
and ttyS0 respectively, thus occupying entries 1 and 2 of the
console_cmdline array (since this happens after ACPI SPCR parsing).
At that point preferred_console is set to 2 as expected.
- When the tty layer kicks in, it will call register_console for tty0.
This will match entry 1 in console_cmdline array. It isn't our
preferred console but because it's our only console at this point,
it will end up "first" in the consoles list.
- When 8250 probes the actual serial port later on, it calls
register_console for ttyS0. At that point the loop in register_console
tries to match it with the entries in the console_cmdline array.
Ideally this should match ttyS0 in entry 2, which is preferred, causing
it to be inserted first and to replace tty0 as CONSDEV. However, 8250
provides a "match" hook in its struct console, and that hook will match
"uart" as an alias to "ttyS". So we match uart0 at entry 0 in the array,
which is not the preferred console, and never reach entry 2, which is,
since we break out of the loop on the first match. As a result,
we don't set CONSDEV and don't insert it first, but second in
the console list.
As a result, we end up with tty0 remaining first in the list, and thus
/dev/console going there instead of the last user specified one which
is ttyS0.
This tentative fix changes register_console() to scan first for consoles
specified on the command line, and only if none is found, to then
scan for consoles specified by the architecture.
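The failure and the fix can be replayed on the three-entry table from the example. This is a minimal sketch under stated assumptions: struct toy_cmdline, its user_specified field and the toy_* helpers are hypothetical names for illustration, and toy_match() only imitates the "uart" to "ttyS" aliasing described for the 8250 match hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_cmdline {
	const char *name;
	int index;
	bool user_specified;   /* true: "console=" entry, false: platform */
};

/* Imitates a driver match hook that treats "uart" as an alias of "ttyS". */
static bool toy_match(const struct toy_cmdline *c, const char *name, int index)
{
	if (c->index != index)
		return false;
	if (strcmp(c->name, name) == 0)
		return true;
	return strcmp(name, "ttyS") == 0 && strcmp(c->name, "uart") == 0;
}

/* Old behavior: the first matching entry wins, whatever its origin. */
static int toy_find_single_pass(const struct toy_cmdline *tbl, int n,
				const char *name, int index)
{
	int i;

	for (i = 0; i < n; i++) {
		if (toy_match(&tbl[i], name, index))
			return i;
	}
	return -1;
}

/* One pass restricted to entries of the requested origin. */
static int toy_find(const struct toy_cmdline *tbl, int n,
		    const char *name, int index, bool user_specified)
{
	int i;

	for (i = 0; i < n; i++) {
		if (tbl[i].user_specified != user_specified)
			continue;
		if (toy_match(&tbl[i], name, index))
			return i;
	}
	return -1;
}

/* Fixed behavior: command line entries first, platform entries second. */
static int toy_find_two_pass(const struct toy_cmdline *tbl, int n,
			     const char *name, int index)
{
	int i = toy_find(tbl, n, name, index, true);

	return i >= 0 ? i : toy_find(tbl, n, name, index, false);
}

static void toy_match_selfcheck(void)
{
	/* Entry 0 comes from SPCR; entries 1 and 2 from the command line. */
	const struct toy_cmdline tbl[] = {
		{ "uart", 0, false },
		{ "tty",  0, true  },
		{ "ttyS", 0, true  },
	};
	const int preferred = 2;

	assert(toy_find_single_pass(tbl, 3, "ttyS", 0) == 0);
	assert(toy_find_two_pass(tbl, 3, "ttyS", 0) == preferred);
}
```

With the single pass, ttyS0 lands on the SPCR entry and misses the preferred index; with two passes it reaches entry 2 and can become CONSDEV.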
Link: https://lore.kernel.org/r/20200213095133.23176-3-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2020-02-13 12:51:32 +03:00
/* See if this console matches one we selected on the command line */
2021-11-22 16:26:45 +03:00
err = try_enable_preferred_console(newcon, true);
2013-08-01 00:53:45 +04:00
2020-02-13 12:51:32 +03:00
/* If not, try to match against the platform default(s) */
if (err == -ENOENT)
2021-11-22 16:26:45 +03:00
	err = try_enable_preferred_console(newcon, false);
2005-04-17 02:20:36 +04:00
2020-02-13 12:51:31 +03:00
/* printk() messages are not printed to the Braille console. */
2023-09-16 22:20:03 +03:00
if (err || newcon->flags & CON_BRL) {
	if (newcon->flags & CON_NBCON)
		nbcon_free(newcon);
2022-11-21 14:10:12 +03:00
	goto unlock;
2023-09-16 22:20:03 +03:00
}
2005-04-17 02:20:36 +04:00
printk: Ensure that "console enabled" messages are printed on the console
Today, when a console is registered without CON_PRINTBUFFER,
end users never see the announcement of it being added, and
never know if they missed something or whether the console is really
at the start or not, which just leads to general confusion.
This re-orders existing code, to make sure the console is
added, before the "console [%s%d] enabled" is printed out -
ensuring that this message is _always_ seen.
This has the desired/intended side effect of making sure that
"console enabled:" messages are printed on the bootconsole, and
the real console. This does cause the same line to be printed
twice if the bootconsole and real console are the same device,
but if they are on different devices, the message is printed to
both consoles.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
LKML-Reference: <200907091308.37370.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-09 21:08:37 +04:00
	/*
	 * If we have a bootconsole, and are switching to a real console,
	 * don't print everything out again, since when the boot console, and
	 * the real console are the same physical device, it's annoying to
	 * see the beginning boot messages twice
	 */
2022-11-16 19:21:16 +03:00
	if (bootcon_registered &&
printk/console: Clean up boot console handling in register_console()
The variable @bcon has two meanings. It is used several times for iterating
the list of registered consoles. At the same time, it holds the information
whether a boot console is first in the @console_drivers list.
The information about the 1st console driver used to be important for
the decision whether to install the new console by default or not.
It allowed re-evaluating the variable @need_default_console when
a real console with tty binding has been unregistered in the meantime.
The decision about the default console is no longer affected by the @bcon
variable. The current code checks whether the first driver is real
and has tty binding directly.
The information about the first console is still used for two more
decisions:
1. It prevents duplicate output on non-boot consoles with
CON_CONSDEV flag set.
2. Early/boot consoles are unregistered when a real console with
CON_CONSDEV is registered and @keep_bootcon is not set.
The behavior in real life is far from obvious. @bcon is set according
to the first console in the @console_drivers list. But the first position in
the list is special:
1. Consoles with CON_CONSDEV flag are put at the beginning of
the list. It is either the preferred console or any console
with tty binding registered by default.
2. Another console might become the first in the list when
the first console in the list is unregistered. It might
happen either explicitly or automatically when boot
consoles are unregistered.
There is one more important rule:
+ Boot consoles can't be registered when any real console
is already registered.
It is a puzzle. The main complication is the dependency on the first
position in the list and the complicated rules around it.
Let's try to make it easier:
1. Add variable @bootcon_enabled and set it by iterating all registered
consoles. The variable has obvious meaning and more predictable
behavior. Any speed optimization and other tricks are not worth it.
2. Use a generic name for the variable that is used to iterate
the list of registered console drivers.
Behavior change:
No, maybe surprisingly, there is _no_ behavior change!
Let's provide the proof by contradiction. Both operations, duplicate
output prevention and boot consoles removal, are done only when
the newly added console has CON_CONSDEV flag set. The behavior
would change when the new @bootcon_enabled has a different value
than the original @bcon.
In other words, the behavior would change when the following conditions
are true:
+ a console with CON_CONSDEV flag is added
+ a real (non-boot) console is the first in the list
+ a boot console is later in the list
Now, a real console might be first in the list only when:
+ It was the first registered console. In this case, there can't be
any boot console because any later ones were rejected.
+ It was put at the first position because it had CON_CONSDEV flag
set. It was either the preferred console or it was a console with
tty binding registered by default. We are interested only in
real consoles here. And a real console with tty binding fulfills
the conditions of the default console.
Now, there is always only one console that is either preferred
or fulfills conditions of the default console. It can't be already
in the list and being registered at the same time.
As a result, the above three conditions could never be "true" at
the same time. Therefore the behavior can't change.
Final dilemma:
OK, the new code has the same behavior. But is the change in the right
direction? What if the handling of @console_drivers is updated in
the future?
OK, let's look at it from another angle:
1. The ordering of @console_drivers list is important only in
console_device() function. The first console driver with tty
binding gets associated with /dev/console.
2. CON_CONSDEV flag is shown in /proc/consoles. And it should be set
for the driver that is returned by console_device().
3. A boot console is removed and the duplicated output is prevented
when the real console with CON_CONSDEV flag is registered.
Now, in the ideal world:
+ The driver associated with /dev/console should be either a console
preferred via the command line, device tree, or SPCR. Or it should
be the first real console with tty binding registered by default.
+ The code should match the related boot and real console drivers.
It should unregister only the obsolete boot driver. And the duplicated
output should be prevented only on the related real driver.
It is clear that it is not guaranteed by the current code. Instead,
the current code looks like a maze of heuristics that try to achieve
the above.
It is the result of adding several features over the last few decades. For example,
a possibility to register more consoles, unregister consoles, boot
consoles, consoles without tty binding, device tree, SPCR, braille
consoles.
Anyway, there is no reason why the decision about removing boot consoles
and preventing duplicated output should depend on the first console
in the list. The current code makes these decisions primarily based on the
CON_CONSDEV flag that is used for the preferred console. It looks like a
good compromise. And the change seems to be in the right direction.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-6-pmladek@suse.com
2021-11-22 16:26:49 +03:00
	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV)) {
printk: Enable the use of more than one CON_BOOT (early console)
Today, register_console() assumes the following usage:
- The first console to register with a flag set to CON_BOOT
is the one and only bootconsole.
- If another register_console() is called with an additional
CON_BOOT, it is silently rejected.
- As soon as a console without CON_BOOT set
registers, the bootconsole is automatically unregistered.
- Once there is a "real" console - register_console() will
silently reject any consoles with its CON_BOOT flag set.
In many systems (alpha, blackfin, microblaze, mips, powerpc,
sh, & x86), there are early_printk implementations, which use
CON_BOOT and output to serial ports, vga, usb, & memory
buffers.
In many embedded systems, it would be nice to have two
bootconsoles - in case the primary fails, you always have
access to a backup memory buffer - but this requires at least
two CON_BOOT consoles...
This patch enables that functionality.
With the change applied, on boot you get (if you try to
re-enable a boot console after the "real" console has been
registered):
root:/> dmesg | grep console
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
Kernel command line: root=/dev/mtdblock0 rw earlyprintk=serial,uart0,57600 console=ttyBF0,57600 nmi_debug=regs
console handover:boot [early_BFuart0] boot [early_shadow0] -> real [ttyBF0]
Too late to register bootconsole early_shadow0
or:
root:/> dmesg | grep console
Kernel command line: root=/dev/mtdblock0 rw console=ttyBF0,57600
console [ttyBF0] enabled
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Cc: "Paul Mundt" <lethal@linux-sh.org>
LKML-Reference: <200907012108.38030.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 05:08:37 +04:00
		newcon->flags &= ~CON_PRINTBUFFER;
2021-11-22 16:26:49 +03:00
	}
2005-04-17 02:20:36 +04:00
2022-11-16 19:21:15 +03:00
	newcon->dropped = 0;
2022-11-16 19:21:18 +03:00
	console_init_seq(newcon, bootcon_registered);
2022-11-16 19:21:15 +03:00
2023-09-16 22:20:00 +03:00
	if (newcon->flags & CON_NBCON)
		nbcon_init(newcon);
2005-04-17 02:20:36 +04:00
	/*
2022-11-16 19:21:14 +03:00
	 * Put this console in the list - keep the
	 * preferred driver at the head of the list.
2005-04-17 02:20:36 +04:00
	 */
2022-11-16 19:21:14 +03:00
	if (hlist_empty(&console_list)) {
		/* Ensure CON_CONSDEV is always set for the head. */
2020-02-13 12:51:33 +03:00
		newcon->flags |= CON_CONSDEV;
2022-11-16 19:21:15 +03:00
		hlist_add_head_rcu(&newcon->node, &console_list);
2022-11-16 19:21:14 +03:00
	} else if (newcon->flags & CON_CONSDEV) {
		/* Only the new head can have CON_CONSDEV set. */
2022-11-16 19:21:24 +03:00
		console_srcu_write_flags(console_first(), console_first()->flags & ~CON_CONSDEV);
2022-11-16 19:21:15 +03:00
		hlist_add_head_rcu(&newcon->node, &console_list);
2015-06-26 01:01:30 +03:00
2022-04-22 00:22:44 +03:00
	} else {
2022-11-16 19:21:15 +03:00
		hlist_add_behind_rcu(&newcon->node, console_list.first);
2005-04-17 02:20:36 +04:00
	}
2022-11-16 19:21:15 +03:00
	/*
	 * No need to synchronize SRCU here! The caller does not rely
	 * on all contexts being able to see the new console before
	 * register_console() completes.
	 */
2010-12-01 20:51:05 +03:00
	console_sysfs_notify();
2009-07-09 21:08:37 +04:00
	/*
	 * By unregistering the bootconsoles after we enable the real console
	 * we get the "console xxx enabled" message on all the consoles -
	 * boot consoles, real consoles, etc - this is to ensure that end
	 * users know there might be something in the kernel's log buffer that
	 * went to the bootconsole (that they do not see on the real console)
	 */
2022-04-22 00:22:43 +03:00
	con_printk(KERN_INFO, newcon, "enabled\n");
2022-11-16 19:21:16 +03:00
	if (bootcon_registered &&
2011-03-23 02:34:20 +03:00
	    ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV) &&
	    !keep_bootcon) {
2022-11-16 19:21:14 +03:00
		struct hlist_node *tmp;
		hlist_for_each_entry_safe(con, tmp, &console_list, node) {
2021-11-22 16:26:49 +03:00
			if (con->flags & CON_BOOT)
2022-11-21 14:10:12 +03:00
				unregister_console_locked(con);
2022-11-16 19:21:14 +03:00
		}
2009-07-09 21:08:37 +04:00
	}
2022-11-21 14:10:12 +03:00
unlock:
	console_list_unlock();
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL(register_console);
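
The two boot console decisions in register_console() above (suppressing the log replay and dropping the boot consoles) both rest on the same mask test. The following is a minimal user-space sketch of that test, not kernel code. The flag values follow the historical console.h excerpt quoted in a commit message in this file and are only illustrative; the helper names is_real_preferred(), adjust_flags() and should_drop_bootcons() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, as quoted in the historical console.h excerpt. */
enum {
	CON_PRINTBUFFER = 1,
	CON_CONSDEV     = 2,
	CON_ENABLED     = 4,
	CON_BOOT        = 8,
};

/* True only when CON_CONSDEV is set and CON_BOOT is clear. */
static bool is_real_preferred(unsigned int flags)
{
	return (flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV;
}

/* Hypothetical helper: flags a new console keeps before it is listed. */
static unsigned int adjust_flags(unsigned int flags, bool bootcon_registered)
{
	if (bootcon_registered && is_real_preferred(flags))
		flags &= ~CON_PRINTBUFFER;	/* do not replay the boot log */
	return flags;
}

/* Hypothetical helper: should boot consoles be unregistered afterwards? */
static bool should_drop_bootcons(unsigned int flags, bool bootcon_registered,
				 bool keep_bootcon)
{
	return bootcon_registered && is_real_preferred(flags) && !keep_bootcon;
}
```

Comparing the masked value against CON_CONSDEV, rather than testing each bit separately, checks "preferred and not a boot console" in a single expression.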
2022-11-21 14:10:12 +03:00
/* Must be called under console_list_lock(). */
static int unregister_console_locked(struct console *console)
2005-04-17 02:20:36 +04:00
{
2013-08-01 00:53:45 +04:00
	int res;
2005-04-17 02:20:36 +04:00
2022-11-21 14:10:12 +03:00
	lockdep_assert_console_list_lock_held();
2022-04-22 00:22:43 +03:00
	con_printk(KERN_INFO, console, "disabled\n");
2013-11-13 03:08:49 +04:00
2013-08-01 00:53:45 +04:00
	res = _braille_unregister_console(console);
2020-02-03 16:31:28 +03:00
	if (res < 0)
2013-08-01 00:53:45 +04:00
		return res;
2020-02-03 16:31:28 +03:00
	if (res > 0)
		return 0;
2008-04-30 11:54:51 +04:00
2022-11-16 19:21:14 +03:00
	/* Disable it unconditionally */
2022-11-16 19:21:24 +03:00
	console_srcu_write_flags(console, console->flags & ~CON_ENABLED);
2022-11-16 19:21:14 +03:00
2022-11-16 19:21:51 +03:00
	if (!console_is_registered_locked(console))
2022-11-16 19:21:14 +03:00
		return -ENODEV;
2005-10-31 02:02:46 +03:00
2022-11-16 19:21:15 +03:00
	hlist_del_init_rcu(&console->node);
2020-02-03 16:31:29 +03:00
2007-05-08 11:26:49 +04:00
	/*
2022-11-16 19:21:14 +03:00
	 * <HISTORICAL>
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:
86 #define CON_PRINTBUFFER (1)
87 #define CON_CONSDEV (2) /* Last on the command line */
88 #define CON_ENABLED (4)
89 #define CON_BOOT (8)
This does not currently happen if there is more than one console specified
on the boot commandline. Instead, it gets set on the first console on the
command line. This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.
Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.
Example (from sn2 ia64):
elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).
Attached patch against bk fixes both issues for the cases I looked at. It
uses selected_console (which gets incremented for each console specified on
the command line) as the indicator of which console to set CON_CONSDEV on.
When adding the console to the list, if the previous one had CON_CONSDEV
set, it masks it out. Tested on ia64 and x86.
The problem with the current behavior is it breaks overriding the default from
the boot line. In the ia64 case, there may be a global append line defining
console=a in elilo.conf. Then you want to boot your kernel, and want to
override the default by passing console=b on the boot line. elilo constructs
the kernel cmdline by starting with the value of the global append line, then
tacks on whatever else you specify, which puts console=b last.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 11:09:05 +04:00
	 * If this isn't the last console and it has CON_CONSDEV set, we
	 * need to set it on the next preferred console.
2022-11-16 19:21:14 +03:00
	 * </HISTORICAL>
	 *
	 * The above makes no sense as there is no guarantee that the next
	 * console has any device attached. Oh well....
2005-04-17 02:20:36 +04:00
	 */
2022-11-16 19:21:14 +03:00
	if (!hlist_empty(&console_list) && console->flags & CON_CONSDEV)
2022-11-16 19:21:24 +03:00
		console_srcu_write_flags(console_first(), console_first()->flags | CON_CONSDEV);
2005-04-17 02:20:36 +04:00
2022-11-16 19:21:15 +03:00
	/*
	 * Ensure that all SRCU list walks have completed. All contexts
	 * must not be able to see this console in the list so that any
	 * exit/cleanup routines can be performed safely.
	 */
	synchronize_srcu(&console_srcu);
2005-04-17 02:20:36 +04:00
2023-09-16 22:20:00 +03:00
	if (console->flags & CON_NBCON)
2023-09-16 22:20:03 +03:00
		nbcon_free(console);
2023-09-16 22:20:00 +03:00
2010-12-01 20:51:05 +03:00
	console_sysfs_notify();
2020-02-03 16:31:29 +03:00
2020-02-03 16:31:30 +03:00
	if (console->exit)
		res = console->exit(console);
2020-02-03 16:31:29 +03:00
	return res;
2005-04-17 02:20:36 +04:00
}
2020-02-03 16:31:29 +03:00
2022-11-21 14:10:12 +03:00
int unregister_console(struct console *console)
{
	int res;
2020-02-03 16:31:29 +03:00
2022-11-21 14:10:12 +03:00
	console_list_lock();
	res = unregister_console_locked(console);
	console_list_unlock();
2005-04-17 02:20:36 +04:00
	return res;
}
EXPORT_SYMBOL(unregister_console);
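
The list ordering rules used by register_console() and unregister_console_locked() above can be modelled in user space as follows. This is a simplified sketch under stated assumptions: a plain singly linked list stands in for the RCU-protected hlist, locking and SRCU synchronization are left out, and struct con, con_head, model_register() and model_unregister() are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

enum { CON_CONSDEV = 2 };	/* illustrative value from the quoted header excerpt */

struct con {
	const char *name;
	unsigned int flags;
	struct con *next;
};

static struct con *con_head;	/* stand-in for console_list */

/* Mirrors the three-way insertion at the end of register_console(). */
static void model_register(struct con *c)
{
	assert(c != NULL);
	if (!con_head) {
		c->flags |= CON_CONSDEV;	/* the head always carries CON_CONSDEV */
		c->next = NULL;
		con_head = c;
	} else if (c->flags & CON_CONSDEV) {
		con_head->flags &= ~CON_CONSDEV;	/* only the new head keeps it */
		c->next = con_head;
		con_head = c;
	} else {
		c->next = con_head->next;	/* insert right behind the head */
		con_head->next = c;
	}
}

/* Mirrors the CON_CONSDEV hand-over in unregister_console_locked(). */
static int model_unregister(struct con *c)
{
	struct con **pp;

	for (pp = &con_head; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			if (con_head && (c->flags & CON_CONSDEV))
				con_head->flags |= CON_CONSDEV;
			return 0;
		}
	}
	return -1;	/* not registered; the kernel returns -ENODEV here */
}
```

As in the kernel code, a console registered without CON_CONSDEV goes directly behind the head rather than to the tail, so the most recently registered non-preferred console is the one that inherits CON_CONSDEV when the head is removed.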
2005-05-01 19:59:02 +04:00
2022-11-16 19:21:44 +03:00
/**
 * console_force_preferred_locked - force a registered console preferred
 * @con: The registered console to force preferred.
 *
 * Must be called under console_list_lock().
 */
void console_force_preferred_locked(struct console *con)
{
	struct console *cur_pref_con;

	if (!console_is_registered_locked(con))
		return;

	cur_pref_con = console_first();

	/* Already preferred? */
	if (cur_pref_con == con)
		return;

	/*
	 * Delete, but do not re-initialize the entry. This allows the console
	 * to continue to appear registered (via any hlist_unhashed_lockless()
	 * checks), even though it was briefly removed from the console list.
	 */
	hlist_del_rcu(&con->node);

	/*
	 * Ensure that all SRCU list walks have completed so that the console
	 * can be added to the beginning of the console list and its forward
	 * list pointer can be re-initialized.
	 */
	synchronize_srcu(&console_srcu);

	con->flags |= CON_CONSDEV;
	WARN_ON(!con->device);

	/* Only the new head can have CON_CONSDEV set. */
	console_srcu_write_flags(cur_pref_con, cur_pref_con->flags & ~CON_CONSDEV);
	hlist_add_head_rcu(&con->node, &console_list);
}
EXPORT_SYMBOL(console_force_preferred_locked);
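
The move-to-head step performed by console_force_preferred_locked() can be sketched without RCU as shown below. This is an illustrative model only: it uses a plain singly linked list, omits the SRCU grace period between unlinking and relinking, and the names struct con and model_force_preferred() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { CON_CONSDEV = 2 };	/* illustrative value from the quoted header excerpt */

struct con {
	unsigned int flags;
	struct con *next;
};

/* Move @c to the front of the list at *@headp and hand CON_CONSDEV over to it. */
static void model_force_preferred(struct con **headp, struct con *c)
{
	struct con *old;
	struct con **pp;

	assert(headp != NULL);
	old = *headp;
	if (!old || old == c)
		return;			/* empty list or already preferred */

	for (pp = headp; *pp && *pp != c; pp = &(*pp)->next)
		;
	if (!*pp)
		return;			/* not registered */

	*pp = c->next;			/* unlink from the current position */
	c->flags |= CON_CONSDEV;
	old->flags &= ~CON_CONSDEV;	/* only the new head keeps the flag */
	c->next = old;			/* relink at the head */
	*headp = c;
}
```

In the kernel version the unlink and relink are separated by synchronize_srcu(), because lockless readers may still be walking through the node's forward pointer; the single-threaded model does not need that step.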
2017-04-13 01:37:14 +03:00
/*
 * Initialize the console device. This is called *early*, so
 * we can't necessarily depend on lots of kernel help here.
 * Just do some early initializations, and do the complex setup
 * later.
 */
void __init console_init(void)
{
2018-03-23 03:33:28 +03:00
	int ret;
2018-08-22 07:56:13 +03:00
	initcall_t call;
	initcall_entry_t *ce;
2017-04-13 01:37:14 +03:00
	/* Setup the default TTY line discipline. */
	n_tty_init();
	/*
	 * set up the console device so that later boot sequences can
	 * inform about problems etc..
	 */
2018-08-22 07:56:13 +03:00
	ce = __con_initcall_start;
2018-03-23 03:33:28 +03:00
	trace_initcall_level("console");
2018-08-22 07:56:13 +03:00
	while (ce < __con_initcall_end) {
		call = initcall_from_entry(ce);
		trace_initcall_start(call);
		ret = call();
		trace_initcall_finish(call, ret);
		ce++;
2017-04-13 01:37:14 +03:00
	}
}
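
The loop in console_init() walks a table of init functions delimited by start and end markers. A rough user-space analogue is sketched below, assuming plain function pointers in an ordinary array in place of the linker-provided __con_initcall_start/__con_initcall_end section. The names con_initcalls, init_a, init_b and run_con_initcalls() are hypothetical, and counting failures is an addition for illustration; the kernel loop only passes the return value to the trace hook.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

static int calls_made;

static int init_a(void) { calls_made++; return 0; }
static int init_b(void) { calls_made++; return -1; }	/* a failure does not stop the walk */

/* Stand-in for the linker-provided console initcall section. */
static const initcall_t con_initcalls[] = { init_a, init_b };

/* Call every entry in [start, end); return how many returned non-zero. */
static int run_con_initcalls(const initcall_t *start, const initcall_t *end)
{
	const initcall_t *ce;
	int failed = 0;

	assert(start <= end);
	for (ce = start; ce < end; ce++)
		if ((*ce)() != 0)
			failed++;
	return failed;
}
```

The half-open [start, end) range matches the `while (ce < __con_initcall_end)` condition above, so an empty table results in no calls at all.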
2016-01-16 03:58:21 +03:00
/*
 * Some boot consoles access data that is in the init section and which will
 * be discarded after the initcalls have been run. To make sure that no code
 * will access this data, unregister the boot consoles in a late initcall.
 *
 * If for some reason, such as deferred probe or the driver being a loadable
 * module, the real console hasn't registered yet at this point, there will
 * be a brief interval in which no messages are logged to the console, which
 * makes it difficult to diagnose problems that occur during this time.
 *
 * To mitigate this problem somewhat, only unregister consoles whose memory
2017-07-14 15:51:12 +03:00
 * intersects with the init section. Note that all other boot consoles will
2021-03-28 07:39:32 +03:00
 * get unregistered when the real preferred console is registered.
2016-01-16 03:58:21 +03:00
 */
2010-06-04 09:11:25 +04:00
static int __init printk_late_init(void)
2007-08-20 23:22:47 +04:00
{
2022-11-16 19:21:14 +03:00
	struct hlist_node *tmp;
2009-07-02 05:08:37 +04:00
struct console * con ;
2016-11-03 17:49:58 +03:00
int ret ;
2009-07-02 05:08:37 +04:00
2022-11-21 14:10:12 +03:00
console_list_lock ( ) ;
2022-11-16 19:21:14 +03:00
hlist_for_each_entry_safe ( con , tmp , & console_list , node ) {
2017-07-14 15:51:13 +03:00
if (!(con->flags & CON_BOOT))
continue;
/* Check addresses that might be used for enabled consoles. */
if (init_section_intersects(con, sizeof(*con)) ||
init_section_contains(con->write, 0) ||
init_section_contains(con->read, 0) ||
init_section_contains(con->device, 0) ||
init_section_contains(con->unblank, 0) ||
init_section_contains(con->data, 0)) {
2016-01-16 03:58:21 +03:00
/*
2017-07-14 15:51:12 +03:00
* Please , consider moving the reported consoles out
* of the init section .
2016-01-16 03:58:21 +03:00
*/
2017-07-14 15:51:12 +03:00
pr_warn("bootconsole [%s%d] uses init memory and must be disabled even before the real one is ready\n",
con->name, con->index);
2022-11-21 14:10:12 +03:00
unregister_console_locked ( con ) ;
2007-08-22 07:14:58 +04:00
}
2007-08-20 23:22:47 +04:00
}
2022-11-21 14:10:12 +03:00
console_list_unlock ( ) ;
2022-11-16 19:21:14 +03:00
2016-11-03 17:49:58 +03:00
ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
console_cpu_notify);
WARN_ON(ret < 0);
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
console_cpu_notify, NULL);
WARN_ON(ret < 0);
2022-01-22 09:12:33 +03:00
printk_sysctl_init ( ) ;
2007-08-20 23:22:47 +04:00
return 0 ;
}
2010-06-04 09:11:25 +04:00
late_initcall ( printk_late_init ) ;
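The boot console check in printk_late_init() combines two kinds of range test: an overlap test for the console structure itself and a containment test (with size 0) for each callback or data pointer. A minimal user-space sketch of those two tests follows. The struct region type and both helper names are hypothetical stand-ins, and the exact kernel semantics of init_section_intersects() and init_section_contains() are assumed rather than shown in this file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region [begin, end); stands in for the kernel init section. */
struct region {
    uintptr_t begin;
    uintptr_t end;
};

/* True if any byte of [obj, obj + size) overlaps the region. */
static bool region_intersects(const struct region *r, uintptr_t obj, size_t size)
{
    assert(r->begin <= r->end);
    return obj < r->end && obj + size > r->begin;
}

/* True if [obj, obj + size) lies entirely inside the region. */
static bool region_contains(const struct region *r, uintptr_t obj, size_t size)
{
    assert(r->begin <= r->end);
    return obj >= r->begin && obj + size <= r->end;
}
```

With size 0, the containment test reduces to "does this address fall inside the region", which is how the callback pointers are checked above.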
2007-08-20 23:22:47 +04:00
2008-02-08 15:21:25 +03:00
# if defined CONFIG_PRINTK
2022-04-22 00:22:46 +03:00
/* If @con is specified, only wait for that console. Otherwise wait for all. */
static bool __pr_flush ( struct console * con , int timeout_ms , bool reset_on_progress )
{
2023-10-06 11:21:51 +03:00
unsigned long timeout_jiffies = msecs_to_jiffies ( timeout_ms ) ;
unsigned long remaining_jiffies = timeout_jiffies ;
2022-04-22 00:22:46 +03:00
struct console * c ;
u64 last_diff = 0 ;
u64 printk_seq ;
2023-09-16 22:20:05 +03:00
short flags ;
2022-11-16 19:21:28 +03:00
int cookie ;
2022-04-22 00:22:46 +03:00
u64 diff ;
u64 seq ;
might_sleep ( ) ;
seq = prb_next_seq ( prb ) ;
2023-10-06 11:21:50 +03:00
/* Flush the consoles so that records up to @seq are printed. */
console_lock ( ) ;
console_unlock ( ) ;
2022-04-22 00:22:46 +03:00
for ( ; ; ) {
2023-10-06 11:21:51 +03:00
unsigned long begin_jiffies ;
unsigned long slept_jiffies ;
2022-04-22 00:22:46 +03:00
diff = 0 ;
2022-11-16 19:21:28 +03:00
/*
* Hold the console_lock to guarantee safe access to
2023-10-06 11:21:50 +03:00
* console - > seq . Releasing console_lock flushes more
* records in case @ seq is still not printed on all
* usable consoles .
2022-11-16 19:21:28 +03:00
*/
2022-04-22 00:22:46 +03:00
console_lock ( ) ;
2022-07-15 09:10:42 +03:00
2022-11-16 19:21:28 +03:00
cookie = console_srcu_read_lock ( ) ;
for_each_console_srcu ( c ) {
2022-04-22 00:22:46 +03:00
if (con && con != c)
continue;
2023-09-16 22:20:05 +03:00
flags = console_srcu_read_flags ( c ) ;
2023-07-17 22:46:06 +03:00
/*
* If consoles are not usable , it cannot be expected
* that they make forward progress , so only increment
* @ diff for usable consoles .
*/
2022-04-22 00:22:46 +03:00
if ( ! console_is_usable ( c ) )
continue ;
2023-09-16 22:20:05 +03:00
if ( flags & CON_NBCON ) {
printk_seq = nbcon_seq_read ( c ) ;
} else {
printk_seq = c - > seq ;
}
2022-04-22 00:22:46 +03:00
if (printk_seq < seq)
diff += seq - printk_seq;
}
2022-11-16 19:21:28 +03:00
console_srcu_read_unlock ( cookie ) ;
2022-04-22 00:22:46 +03:00
2023-07-17 22:46:06 +03:00
if (diff != last_diff && reset_on_progress)
2023-10-06 11:21:51 +03:00
remaining_jiffies = timeout_jiffies ;
2022-04-22 00:22:46 +03:00
2022-07-15 09:10:42 +03:00
console_unlock ( ) ;
2023-07-17 22:46:06 +03:00
/* Note: @diff is 0 if there are no usable consoles. */
2023-10-06 11:21:51 +03:00
if (diff == 0 || remaining_jiffies == 0)
2022-04-22 00:22:46 +03:00
break ;
2023-10-06 11:21:51 +03:00
/* msleep(1) might sleep much longer. Check time by jiffies. */
begin_jiffies = jiffies;
msleep(1);
slept_jiffies = jiffies - begin_jiffies;
remaining_jiffies -= min(slept_jiffies, remaining_jiffies);
2022-04-22 00:22:46 +03:00
last_diff = diff ;
}
return (diff == 0);
}
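The timeout bookkeeping in __pr_flush() can be modeled in user space as a small state machine: the budget is refilled when the backlog changes (if requested), the loop stops when the backlog is empty or the budget is spent, and the budget is reduced by the time actually slept, clamped so it never underflows. The sketch below is an illustration only; struct flush_model and its helpers are hypothetical names, and ticks stand in for jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the wait-loop state; ticks stand in for jiffies. */
struct flush_model {
    unsigned long timeout;          /* full budget */
    unsigned long remaining;        /* budget still left */
    unsigned long long last_diff;   /* backlog seen on the previous pass */
};

static void flush_model_init(struct flush_model *m, unsigned long timeout)
{
    assert(m != NULL);
    m->timeout = timeout;
    m->remaining = timeout;
    m->last_diff = 0;
}

/*
 * One pass of the loop. @diff is the current backlog, @slept is how long
 * the sleep really took. Returns true if the caller should keep waiting.
 */
static bool flush_model_step(struct flush_model *m, unsigned long long diff,
                             unsigned long slept, bool reset_on_progress)
{
    assert(m != NULL);

    /* Any change in the backlog refills the budget. */
    if (diff != m->last_diff && reset_on_progress)
        m->remaining = m->timeout;

    if (diff == 0 || m->remaining == 0)
        return false;

    /* Clamp so an oversleep cannot wrap the unsigned counter. */
    m->remaining -= (slept < m->remaining) ? slept : m->remaining;
    m->last_diff = diff;
    return true;
}
```

The clamp mirrors the min() in the loop above: a sleep that overshoots the remaining budget leaves it at zero instead of wrapping to a huge value.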
/**
* pr_flush ( ) - Wait for printing threads to catch up .
*
* @ timeout_ms : The maximum time ( in ms ) to wait .
* @ reset_on_progress : Reset the timeout if forward progress is seen .
*
* A value of 0 for @ timeout_ms means no waiting will occur . A value of - 1
* represents infinite waiting .
*
* If @ reset_on_progress is true , the timeout will be reset whenever any
* printer has been seen to make some forward progress .
*
* Context : Process context . May sleep while acquiring console lock .
2023-07-17 22:46:06 +03:00
* Return : true if all usable printers are caught up .
2022-04-22 00:22:46 +03:00
*/
2022-09-24 03:04:37 +03:00
static bool pr_flush ( int timeout_ms , bool reset_on_progress )
2022-04-22 00:22:46 +03:00
{
return __pr_flush ( NULL , timeout_ms , reset_on_progress ) ;
}
2013-03-23 02:04:39 +04:00
/*
* Delayed printk version , for scheduler - internal messages :
*/
2022-06-23 17:51:56 +03:00
# define PRINTK_PENDING_WAKEUP 0x01
# define PRINTK_PENDING_OUTPUT 0x02
2013-03-23 02:04:39 +04:00
static DEFINE_PER_CPU ( int , printk_pending ) ;
static void wake_up_klogd_work_func ( struct irq_work * irq_work )
{
2022-02-11 14:23:37 +03:00
int pending = this_cpu_xchg ( printk_pending , 0 ) ;
2013-03-23 02:04:39 +04:00
2022-06-23 17:51:56 +03:00
if ( pending & PRINTK_PENDING_OUTPUT ) {
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
/* If trylock fails, someone else is doing the printing */
if ( console_trylock ( ) )
console_unlock ( ) ;
2013-03-23 02:04:39 +04:00
}
if ( pending & PRINTK_PENDING_WAKEUP )
2022-05-26 23:30:56 +03:00
wake_up_interruptible ( & log_wait ) ;
2013-03-23 02:04:39 +04:00
}
2020-06-15 12:51:29 +03:00
static DEFINE_PER_CPU ( struct irq_work , wake_up_klogd_work ) =
IRQ_WORK_INIT_LAZY ( wake_up_klogd_work_func ) ;
2013-03-23 02:04:39 +04:00
2022-04-22 00:22:40 +03:00
static void __wake_up_klogd ( int val )
2013-03-23 02:04:39 +04:00
{
printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.
The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).
Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.
Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-03 14:30:02 +03:00
if ( ! printk_percpu_data_ready ( ) )
return ;
2013-03-23 02:04:39 +04:00
preempt_disable ( ) ;
2022-04-22 00:22:38 +03:00
/*
* Guarantee any new records can be seen by tasks preparing to wait
* before this context checks if the wait queue is empty .
*
* The full memory barrier within wq_has_sleeper ( ) pairs with the full
* memory barrier within set_current_state ( ) of
* prepare_to_wait_event ( ) , which is called after ___wait_event ( ) adds
* the waiter but before it has checked the wait condition .
*
2022-06-23 17:51:56 +03:00
* This pairs with devkmsg_read : A and syslog_print : A .
2022-04-22 00:22:38 +03:00
*/
2022-04-22 00:22:40 +03:00
if (wq_has_sleeper(&log_wait) || /* LMM(__wake_up_klogd:A) */
2022-06-23 17:51:56 +03:00
( val & PRINTK_PENDING_OUTPUT ) ) {
2022-04-22 00:22:40 +03:00
this_cpu_or ( printk_pending , val ) ;
2014-08-17 21:30:24 +04:00
irq_work_queue ( this_cpu_ptr ( & wake_up_klogd_work ) ) ;
2013-03-23 02:04:39 +04:00
}
preempt_enable ( ) ;
}
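The pending-work handshake between __wake_up_klogd() and wake_up_klogd_work_func() follows a simple pattern: producers OR request bits into a word, and the consumer atomically swaps the word with zero so that each request is handled exactly once. A user-space sketch of that pattern follows. It uses C11 atomics on a single global as a stand-in for the per-CPU variable, which is an assumption made for illustration; the names pending_model, pending_set and pending_take are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_PENDING_WAKEUP 0x01
#define MODEL_PENDING_OUTPUT 0x02

/* Stand-in for the per-CPU printk_pending word. */
static atomic_int pending_model;

/* Producer side: record that some kind of work is wanted. */
static void pending_set(int val)
{
    assert((val & ~(MODEL_PENDING_WAKEUP | MODEL_PENDING_OUTPUT)) == 0);
    atomic_fetch_or(&pending_model, val);
}

/* Consumer side: fetch all requested work and clear it in one step. */
static int pending_take(void)
{
    return atomic_exchange(&pending_model, 0);
}
```

Because the consumer takes and clears in a single exchange, a request posted after the exchange is not lost: it stays set for the next run of the work function.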
2008-07-25 12:45:58 +04:00
2023-07-17 22:46:05 +03:00
/**
* wake_up_klogd - Wake kernel logging daemon
*
* Use this function when new records have been added to the ringbuffer
* and the console printing of those records has already occurred or is
* known to be handled by some other context . This function will only
* wake the logging daemon .
*
* Context : Any context .
*/
2022-04-22 00:22:40 +03:00
void wake_up_klogd ( void )
2012-03-15 15:35:37 +04:00
{
2022-04-22 00:22:40 +03:00
__wake_up_klogd ( PRINTK_PENDING_WAKEUP ) ;
}
2020-03-03 14:30:02 +03:00
2023-07-17 22:46:05 +03:00
/**
* defer_console_output - Wake kernel logging daemon and trigger
* console printing in a deferred context
*
* Use this function when new records have been added to the ringbuffer ,
* this context is responsible for console printing those records , but
* the current context is not allowed to perform the console printing .
* Trigger an irq_work context to perform the console printing . This
* function also wakes the logging daemon .
*
* Context : Any context .
*/
2022-04-22 00:22:40 +03:00
void defer_console_output ( void )
{
/*
* New messages may have been added directly to the ringbuffer
* using vprintk_store ( ) , so wake any waiters as well .
*/
2022-06-23 17:51:56 +03:00
__wake_up_klogd ( PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT ) ;
2018-06-27 17:08:16 +03:00
}
2021-11-07 07:51:16 +03:00
void printk_trigger_flush ( void )
{
defer_console_output ( ) ;
}
2018-06-27 17:08:16 +03:00
int vprintk_deferred ( const char * fmt , va_list args )
{
2023-07-17 22:46:05 +03:00
return vprintk_emit ( 0 , LOGLEVEL_SCHED , NULL , fmt , args ) ;
2012-03-15 15:35:37 +04:00
}
printk: Userspace format indexing support
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.
As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
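The index line format quoted in the commit message above, `<level[,flags]> filename:line function "format"`, is regular enough to split with sscanf(). The following is a rough user-space sketch under stated assumptions: struct index_entry, its field sizes, and parse_index_line() are hypothetical, the level and flags are kept as one uninterpreted string, and the format text is assumed to run to the final double quote on the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed index line; sizes are arbitrary. */
struct index_entry {
    char level[16];     /* text between '<' and '>', flags included */
    char file[256];
    unsigned int line;
    char func[128];
    char fmt[512];      /* format text without the surrounding quotes */
};

/* Returns 0 on success, -1 if the line does not match the expected shape. */
static int parse_index_line(const char *s, struct index_entry *e)
{
    size_t n;

    assert(s != NULL && e != NULL);

    if (sscanf(s, "<%15[^>]> %255[^:]:%u %127s \"%511[^\n]",
               e->level, e->file, &e->line, e->func, e->fmt) != 5)
        return -1;

    /* The last conversion swallowed the closing quote; strip it. */
    n = strlen(e->fmt);
    if (n == 0 || e->fmt[n - 1] != '"')
        return -1;
    e->fmt[n - 1] = '\0';
    return 0;
}
```

A monitoring tool could run such a parser over each line and compare the recovered format strings against the patterns it matches on. The header line starting with `#` is rejected because it does not begin with `<`.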
2021-06-15 19:52:53 +03:00
int _printk_deferred(const char *fmt, ...)
2017-04-20 11:52:31 +03:00
{
va_list args ;
int r ;
va_start ( args , fmt ) ;
r = vprintk_deferred ( fmt , args ) ;
va_end ( args ) ;
return r ;
}
2005-04-17 02:20:36 +04:00
/*
* printk rate limiting , lifted from the networking subsystem .
*
2008-07-30 09:33:38 +04:00
* This enforces a rate limit : not more than 10 kernel messages
* every 5 s to make a denial - of - service attack impossible .
2005-04-17 02:20:36 +04:00
*/
2008-07-30 09:33:38 +04:00
DEFINE_RATELIMIT_STATE ( printk_ratelimit_state , 5 * HZ , 10 ) ;
2009-10-23 16:58:11 +04:00
int __printk_ratelimit ( const char * func )
2005-04-17 02:20:36 +04:00
{
2009-10-23 16:58:11 +04:00
return ___ratelimit ( & printk_ratelimit_state , func ) ;
2005-04-17 02:20:36 +04:00
}
2009-10-23 16:58:11 +04:00
EXPORT_SYMBOL ( __printk_ratelimit ) ;
2006-11-03 09:07:16 +03:00
/**
* printk_timed_ratelimit - caller - controlled printk ratelimiting
* @ caller_jiffies : pointer to caller ' s state
* @ interval_msecs : minimum interval between prints
*
* printk_timed_ratelimit ( ) returns true if more than @ interval_msecs
* milliseconds have elapsed since the last time printk_timed_ratelimit ( )
* returned true .
*/
bool printk_timed_ratelimit ( unsigned long * caller_jiffies ,
unsigned int interval_msecs )
{
2014-08-07 03:09:08 +04:00
unsigned long elapsed = jiffies - *caller_jiffies;
if (*caller_jiffies && elapsed <= msecs_to_jiffies(interval_msecs))
return false;
*caller_jiffies = jiffies;
return true;
2006-11-03 09:07:16 +03:00
}
EXPORT_SYMBOL ( printk_timed_ratelimit ) ;
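The caller-owned state used by printk_timed_ratelimit() can be exercised outside the kernel by passing the current time in explicitly. The sketch below is a user-space model, not the kernel function: timed_ratelimit_model() is a hypothetical name, and @now and @interval are plain tick counts standing in for jiffies and msecs_to_jiffies(interval_msecs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * @last:     caller-owned timestamp of the last permitted print (0 = never)
 * @now:      current time in ticks
 * @interval: minimum number of ticks between permitted prints
 */
static bool timed_ratelimit_model(unsigned long *last, unsigned long now,
                                  unsigned long interval)
{
    unsigned long elapsed;

    assert(last != NULL);

    /* Unsigned subtraction stays correct across counter wraparound. */
    elapsed = now - *last;

    if (*last && elapsed <= interval)
        return false;

    *last = now;
    return true;
}
```

Two properties are visible in the model: a zero state always permits the next print, and the comparison is inclusive, so a call exactly @interval ticks after the last permitted one is still suppressed.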
2009-10-16 16:09:18 +04:00
static DEFINE_SPINLOCK ( dump_list_lock ) ;
static LIST_HEAD ( dump_list ) ;
/**
* kmsg_dump_register - register a kernel log dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Adds a kernel log dumper to the system . The dump callback in the
* structure will be called when the kernel oopses or panics and must be
* set . Returns zero on success and % - EINVAL or % - EBUSY otherwise .
*/
int kmsg_dump_register ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EBUSY ;
/* The dump callback needs to be set */
if (!dumper->dump)
return - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
/* Don't allow registering multiple times */
if ( ! dumper - > registered ) {
dumper - > registered = 1 ;
2011-01-13 03:59:43 +03:00
list_add_tail_rcu ( & dumper - > list , & dump_list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_register ) ;
/**
* kmsg_dump_unregister - unregister a kmsg dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Removes a dump device from the system . Returns zero on success and
* % - EINVAL otherwise .
*/
int kmsg_dump_unregister ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
if ( dumper - > registered ) {
dumper - > registered = 0 ;
2011-01-13 03:59:43 +03:00
list_del_rcu ( & dumper - > list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
2011-01-13 03:59:43 +03:00
synchronize_rcu ( ) ;
2009-10-16 16:09:18 +04:00
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_unregister ) ;
2012-05-03 04:29:13 +04:00
static bool always_kmsg_dump ;
module_param_named ( always_kmsg_dump , always_kmsg_dump , bool , S_IRUGO | S_IWUSR ) ;
2020-05-08 05:36:22 +03:00
const char * kmsg_dump_reason_str ( enum kmsg_dump_reason reason )
{
switch (reason) {
case KMSG_DUMP_PANIC:
return "Panic";
case KMSG_DUMP_OOPS:
return "Oops";
case KMSG_DUMP_EMERG:
return "Emergency";
case KMSG_DUMP_SHUTDOWN:
return "Shutdown";
default:
return "Unknown";
}
}
EXPORT_SYMBOL_GPL ( kmsg_dump_reason_str ) ;
2009-10-16 16:09:18 +04:00
/**
* kmsg_dump - dump kernel log to kernel message dumpers .
* @ reason : the reason ( oops , panic etc ) for dumping
*
2012-06-15 16:07:51 +04:00
* Call each of the registered dumper ' s dump ( ) callback , which can
* retrieve the kmsg records with kmsg_dump_get_line ( ) or
* kmsg_dump_get_buffer ( ) .
2009-10-16 16:09:18 +04:00
*/
void kmsg_dump ( enum kmsg_dump_reason reason )
{
struct kmsg_dumper * dumper ;
2012-06-15 16:07:51 +04:00
rcu_read_lock ( ) ;
list_for_each_entry_rcu ( dumper , & dump_list , list ) {
2020-05-05 18:45:06 +03:00
enum kmsg_dump_reason max_reason = dumper - > max_reason ;
/*
* If client has not provided a specific max_reason , default
* to KMSG_DUMP_OOPS , unless always_kmsg_dump was set .
*/
if (max_reason == KMSG_DUMP_UNDEF) {
max_reason = always_kmsg_dump ? KMSG_DUMP_MAX :
KMSG_DUMP_OOPS;
}
if (reason > max_reason)
2012-06-15 16:07:51 +04:00
continue ;
/* invoke dumper which will iterate over records */
dumper - > dump ( dumper , reason ) ;
}
rcu_read_unlock ( ) ;
}
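The per-dumper filter in kmsg_dump() depends on the numeric order of the dump reasons. A small user-space model makes the default behavior easier to see. The enum below is a local stand-in whose order (undefined, panic, oops, emergency, shutdown, max) is assumed to mirror enum kmsg_dump_reason; dumper_wants() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in; the ordering is an assumption about enum kmsg_dump_reason. */
enum model_reason {
    MODEL_UNDEF,
    MODEL_PANIC,
    MODEL_OOPS,
    MODEL_EMERG,
    MODEL_SHUTDOWN,
    MODEL_MAX
};

/* True if a dumper with @max_reason would be invoked for @reason. */
static bool dumper_wants(enum model_reason reason, enum model_reason max_reason,
                         bool always_dump)
{
    assert(reason > MODEL_UNDEF && reason < MODEL_MAX);

    /* No explicit limit: oops and worse, or everything if forced. */
    if (max_reason == MODEL_UNDEF)
        max_reason = always_dump ? MODEL_MAX : MODEL_OOPS;

    return reason <= max_reason;
}
```

Under this ordering, a dumper that leaves max_reason unset sees panics and oopses only, unless always_kmsg_dump is enabled.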
/**
2021-03-03 13:15:27 +03:00
* kmsg_dump_get_line - retrieve one kmsg log line
2021-03-03 13:15:25 +03:00
* @ iter : kmsg dump iterator
2012-06-15 16:07:51 +04:00
* @ syslog : include the " <4> " prefixes
* @ line : buffer to copy the line to
* @ size : maximum size of the buffer
* @ len : length of line placed into buffer
*
* Start at the beginning of the kmsg buffer , with the oldest kmsg
* record , and copy one record into the provided buffer .
*
* Consecutive calls will return the next available record moving
* towards the end of the buffer with the youngest messages .
*
* A return value of FALSE indicates that there are no more records to
* read .
*/
2021-03-03 13:15:27 +03:00
bool kmsg_dump_get_line ( struct kmsg_dump_iter * iter , bool syslog ,
char * line , size_t size , size_t * len )
2012-06-15 16:07:51 +04:00
{
2021-03-03 13:15:25 +03:00
u64 min_seq = latched_seq_read_nolock ( & clear_seq ) ;
2020-07-09 16:23:44 +03:00
struct printk_info info ;
unsigned int line_count ;
struct printk_record r ;
2012-06-15 16:07:51 +04:00
size_t l = 0 ;
bool ret = false ;
2021-03-03 13:15:25 +03:00
if (iter->cur_seq < min_seq)
iter->cur_seq = min_seq;
2020-09-19 01:34:21 +03:00
prb_rec_init_rd ( & r , & info , line , size ) ;
2020-07-09 16:23:44 +03:00
/* Read text or count text lines? */
if ( line ) {
2021-03-03 13:15:25 +03:00
if ( ! prb_read_valid ( prb , iter - > cur_seq , & r ) )
2020-07-09 16:23:44 +03:00
goto out ;
l = record_print_text ( & r , syslog , printk_time ) ;
} else {
2021-03-03 13:15:25 +03:00
if ( ! prb_read_valid_info ( prb , iter - > cur_seq ,
2020-07-09 16:23:44 +03:00
& info , & line_count ) ) {
goto out ;
}
l = get_record_print_text_size ( & info , line_count , syslog ,
printk_time ) ;
2009-10-16 16:09:18 +04:00
2020-07-09 16:23:44 +03:00
}
2012-06-15 16:07:51 +04:00
2021-03-03 13:15:25 +03:00
iter->cur_seq = r.info->seq + 1;
2012-06-15 16:07:51 +04:00
ret = true ;
out :
if ( len )
* len = l ;
return ret ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_get_line ) ;
/**
* kmsg_dump_get_buffer - copy kmsg log lines
2021-03-03 13:15:25 +03:00
* @ iter : kmsg dump iterator
2012-06-15 16:07:51 +04:00
* @ syslog : include the " <4> " prefixes
2012-07-01 02:37:24 +04:00
* @ buf : buffer to copy the line to
2012-06-15 16:07:51 +04:00
* @ size : maximum size of the buffer
2021-03-03 13:15:18 +03:00
* @ len_out : length of line placed into buffer
2012-06-15 16:07:51 +04:00
*
* Start at the end of the kmsg buffer and fill the provided buffer
2020-08-07 06:32:27 +03:00
* with as many of the *youngest* kmsg records as fit into it.
2012-06-15 16:07:51 +04:00
* If the buffer is large enough , all available kmsg records will be
* copied with a single call .
*
* Consecutive calls will fill the buffer with the next block of
* available older records , not including the earlier retrieved ones .
*
* A return value of FALSE indicates that there are no more records to
* read .
*/
2021-03-03 13:15:25 +03:00
bool kmsg_dump_get_buffer ( struct kmsg_dump_iter * iter , bool syslog ,
2021-03-03 13:15:18 +03:00
char * buf , size_t size , size_t * len_out )
2012-06-15 16:07:51 +04:00
{
2021-03-03 13:15:25 +03:00
u64 min_seq = latched_seq_read_nolock ( & clear_seq ) ;
2020-07-09 16:23:44 +03:00
struct printk_info info ;
struct printk_record r ;
2012-06-15 16:07:51 +04:00
u64 seq ;
u64 next_seq ;
2021-03-03 13:15:18 +03:00
size_t len = 0 ;
2012-06-15 16:07:51 +04:00
bool ret = false ;
2018-12-04 13:00:01 +03:00
bool time = printk_time ;
2012-06-15 16:07:51 +04:00
2021-03-03 13:15:24 +03:00
if (!buf || !size)
2012-06-15 16:07:51 +04:00
goto out ;
2021-03-03 13:15:25 +03:00
if (iter->cur_seq < min_seq)
iter->cur_seq = min_seq;
if (prb_read_valid_info(prb, iter->cur_seq, &info, NULL)) {
if (info.seq != iter->cur_seq) {
2021-02-11 20:31:52 +03:00
/* messages are gone, move to first available one */
2021-03-03 13:15:25 +03:00
iter - > cur_seq = info . seq ;
2021-02-11 20:31:52 +03:00
}
2012-06-15 16:07:51 +04:00
}
/* last entry */
2021-07-15 22:33:56 +03:00
if (iter->cur_seq >= iter->next_seq)
2012-06-15 16:07:51 +04:00
goto out ;
2021-03-03 13:15:18 +03:00
/*
* Find first record that fits , including all following records ,
2021-03-03 13:15:19 +03:00
* into the user - provided buffer for this dump . Pass in size - 1
* because this function ( by way of record_print_text ( ) ) will
* not write more than size - 1 bytes of text into @ buf .
2021-03-03 13:15:18 +03:00
*/
2021-03-03 13:15:25 +03:00
seq = find_first_fitting_seq ( iter - > cur_seq , iter - > next_seq ,
2021-03-03 13:15:19 +03:00
size - 1 , syslog , time ) ;
2012-06-15 16:07:51 +04:00
2021-03-03 13:15:18 +03:00
/*
* The next kmsg_dump_get_buffer ( ) invocation will dump the block of
* older records stored right before this one .
*/
2012-06-15 16:07:51 +04:00
next_seq = seq ;
2021-03-03 13:15:18 +03:00
prb_rec_init_rd ( & r , & info , buf , size ) ;
prb_for_each_record ( seq , prb , seq , & r ) {
2021-03-03 13:15:25 +03:00
if ( r . info - > seq > = iter - > next_seq )
2020-07-09 16:23:44 +03:00
break ;
2021-03-03 13:15:18 +03:00
len + = record_print_text ( & r , syslog , time ) ;
2020-07-09 16:23:44 +03:00
2021-03-03 13:15:18 +03:00
/* Adjust record to store to remaining buffer space. */
prb_rec_init_rd ( & r , & info , buf + len , size - len ) ;
2012-06-15 16:07:51 +04:00
}
2021-03-03 13:15:25 +03:00
iter - > next_seq = next_seq ;
2012-06-15 16:07:51 +04:00
ret = true ;
out :
2021-03-03 13:15:18 +03:00
if ( len_out )
* len_out = len ;
2012-06-15 16:07:51 +04:00
return ret ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_get_buffer ) ;
2009-10-16 16:09:18 +04:00
2012-06-15 16:07:51 +04:00
/**
2020-04-18 14:35:36 +03:00
* kmsg_dump_rewind - reset the iterator
2021-03-03 13:15:25 +03:00
* @ iter : kmsg dump iterator
2012-06-15 16:07:51 +04:00
*
* Reset the dumper ' s iterator so that kmsg_dump_get_line ( ) and
* kmsg_dump_get_buffer ( ) can be called again and used multiple
* times within the same dumper . dump ( ) callback .
*/
2021-03-03 13:15:25 +03:00
void kmsg_dump_rewind ( struct kmsg_dump_iter * iter )
2012-06-15 16:07:51 +04:00
{
2021-03-03 13:15:27 +03:00
iter - > cur_seq = latched_seq_read_nolock ( & clear_seq ) ;
iter - > next_seq = prb_next_seq ( prb ) ;
2009-10-16 16:09:18 +04:00
}
2012-06-15 16:07:51 +04:00
EXPORT_SYMBOL_GPL ( kmsg_dump_rewind ) ;
2013-05-01 02:27:12 +04:00
2008-02-08 15:21:25 +03:00
# endif
2021-06-17 12:50:50 +03:00
# ifdef CONFIG_SMP
2022-04-22 00:22:36 +03:00
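/* CPU that currently owns the lock, or -1 if the lock is not owned. */
static atomic_t printk_cpu_sync_owner = ATOMIC_INIT ( - 1 ) ;
/* Number of nested acquisitions made by the owning CPU. */
static atomic_t printk_cpu_sync_nested = ATOMIC_INIT ( 0 ) ;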
2021-06-17 12:50:50 +03:00
/**
2022-04-22 00:22:36 +03:00
* __printk_cpu_sync_wait ( ) - Busy wait until the printk cpu - reentrant
* spinning lock is not owned by any CPU .
2021-06-17 12:50:50 +03:00
*
* Context : Any context .
*/
2022-04-22 00:22:36 +03:00
void __printk_cpu_sync_wait ( void )
2021-06-17 12:50:50 +03:00
{
do {
cpu_relax ( ) ;
2022-04-22 00:22:36 +03:00
} while ( atomic_read ( & printk_cpu_sync_owner ) ! = - 1 ) ;
2021-06-17 12:50:50 +03:00
}
2022-04-22 00:22:36 +03:00
EXPORT_SYMBOL ( __printk_cpu_sync_wait ) ;
2021-06-17 12:50:50 +03:00
/**
2022-04-22 00:22:36 +03:00
* __printk_cpu_sync_try_get ( ) - Try to acquire the printk cpu - reentrant
* spinning lock .
2021-06-17 12:50:50 +03:00
*
* If no processor has the lock , the calling processor takes the lock and
* becomes the owner . If the calling processor is already the owner of the
* lock , this function succeeds immediately .
*
* Context : Any context . Expects interrupts to be disabled .
* Return : 1 on success , otherwise 0.
*/
2022-04-22 00:22:36 +03:00
int __printk_cpu_sync_try_get ( void )
2021-06-17 12:50:50 +03:00
{
int cpu ;
int old ;
cpu = smp_processor_id ( ) ;
2021-06-17 12:50:51 +03:00
/*
* Guarantee loads and stores from this CPU when it is the lock owner
* are _not_ visible to the previous lock owner . This pairs with
2022-04-22 00:22:36 +03:00
* __printk_cpu_sync_put : B .
2021-06-17 12:50:51 +03:00
*
* Memory barrier involvement :
*
2022-04-22 00:22:36 +03:00
* If __printk_cpu_sync_try_get : A reads from __printk_cpu_sync_put : B ,
* then __printk_cpu_sync_put : A can never read from
* __printk_cpu_sync_try_get : B .
2021-06-17 12:50:51 +03:00
*
* Relies on :
*
2022-04-22 00:22:36 +03:00
* RELEASE from __printk_cpu_sync_put : A to __printk_cpu_sync_put : B
2021-06-17 12:50:51 +03:00
* of the previous CPU
* matching
2022-04-22 00:22:36 +03:00
* ACQUIRE from __printk_cpu_sync_try_get : A to
* __printk_cpu_sync_try_get : B of this CPU
2021-06-17 12:50:51 +03:00
*/
2022-04-22 00:22:36 +03:00
old = atomic_cmpxchg_acquire ( & printk_cpu_sync_owner , - 1 ,
cpu ) ; /* LMM(__printk_cpu_sync_try_get:A) */
2021-06-17 12:50:50 +03:00
if ( old = = - 1 ) {
2021-06-17 12:50:51 +03:00
/*
* This CPU is now the owner and begins loading / storing
2022-04-22 00:22:36 +03:00
* data : LMM ( __printk_cpu_sync_try_get : B )
2021-06-17 12:50:51 +03:00
*/
2021-06-17 12:50:50 +03:00
return 1 ;
2021-06-17 12:50:51 +03:00
2021-06-17 12:50:50 +03:00
} else if ( old = = cpu ) {
/* This CPU is already the owner. */
2022-04-22 00:22:36 +03:00
atomic_inc ( & printk_cpu_sync_nested ) ;
2021-06-17 12:50:50 +03:00
return 1 ;
}
return 0 ;
}
2022-04-22 00:22:36 +03:00
EXPORT_SYMBOL ( __printk_cpu_sync_try_get ) ;
2021-06-17 12:50:50 +03:00
/**
2022-04-22 00:22:36 +03:00
* __printk_cpu_sync_put ( ) - Release the printk cpu - reentrant spinning lock .
2021-06-17 12:50:50 +03:00
*
* The calling processor must be the owner of the lock .
*
* Context : Any context . Expects interrupts to be disabled .
*/
2022-04-22 00:22:36 +03:00
void __printk_cpu_sync_put ( void )
2021-06-17 12:50:50 +03:00
{
2022-04-22 00:22:36 +03:00
/* Nested acquisition: drop one nesting level and keep ownership. */
if ( atomic_read ( & printk_cpu_sync_nested ) ) {
atomic_dec ( & printk_cpu_sync_nested ) ;
2021-06-17 12:50:50 +03:00
return ;
}
2021-06-17 12:50:51 +03:00
/*
* This CPU is finished loading / storing data :
2022-04-22 00:22:36 +03:00
* LMM ( __printk_cpu_sync_put : A )
2021-06-17 12:50:51 +03:00
*/
/*
* Guarantee loads and stores from this CPU when it was the
* lock owner are visible to the next lock owner . This pairs
2022-04-22 00:22:36 +03:00
* with __printk_cpu_sync_try_get : A .
2021-06-17 12:50:51 +03:00
*
* Memory barrier involvement :
*
2022-04-22 00:22:36 +03:00
* If __printk_cpu_sync_try_get : A reads from __printk_cpu_sync_put : B ,
* then __printk_cpu_sync_try_get : B reads from __printk_cpu_sync_put : A .
2021-06-17 12:50:51 +03:00
*
* Relies on :
*
2022-04-22 00:22:36 +03:00
* RELEASE from __printk_cpu_sync_put : A to __printk_cpu_sync_put : B
2021-06-17 12:50:51 +03:00
* of this CPU
* matching
2022-04-22 00:22:36 +03:00
* ACQUIRE from __printk_cpu_sync_try_get : A to
* __printk_cpu_sync_try_get : B of the next CPU
2021-06-17 12:50:51 +03:00
*/
2022-04-22 00:22:36 +03:00
atomic_set_release ( & printk_cpu_sync_owner ,
- 1 ) ; /* LMM(__printk_cpu_sync_put:B) */
2021-06-17 12:50:50 +03:00
}
2022-04-22 00:22:36 +03:00
EXPORT_SYMBOL ( __printk_cpu_sync_put ) ;
2021-06-17 12:50:50 +03:00
# endif /* CONFIG_SMP */
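/*
 * Illustrative userspace model (not kernel code) of the cpu-reentrant
 * spinning lock above, assuming C11 <stdatomic.h>. All model_* names and
 * the explicit cpu argument are hypothetical: the kernel functions read
 * the CPU from smp_processor_id() and expect interrupts to be disabled.
 * The sketch only mirrors the owner/nested bookkeeping: an acquire
 * cmpxchg of the owner from -1, a nesting counter for re-entry by the
 * owner, and a release store of -1 by the outermost holder.
 */
#include <stdatomic.h>

static atomic_int model_owner = -1;	/* owning CPU id, or -1 if unowned */
static atomic_int model_nested;		/* extra acquisitions by the owner */

/* Returns 1 if cpu now holds (or already held) the lock, otherwise 0. */
static int model_sync_try_get(int cpu)
{
	int old = -1;

	/* Acquire ordering on success pairs with the release store below. */
	if (atomic_compare_exchange_strong_explicit(&model_owner, &old, cpu,
						    memory_order_acquire,
						    memory_order_relaxed))
		return 1;	/* lock was free; cpu is the new owner */

	if (old == cpu) {
		/* Re-entry by the owner: count the nesting level. */
		atomic_fetch_add_explicit(&model_nested, 1,
					  memory_order_relaxed);
		return 1;
	}

	return 0;		/* another CPU owns the lock */
}

/* Must only be called by the current owner. */
static void model_sync_put(void)
{
	if (atomic_load_explicit(&model_nested, memory_order_relaxed)) {
		/* Still held by an outer acquisition on the same CPU. */
		atomic_fetch_sub_explicit(&model_nested, 1,
					  memory_order_relaxed);
		return;
	}

	/* Outermost put: publish prior stores, then mark the lock free. */
	atomic_store_explicit(&model_owner, -1, memory_order_release);
}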