/*
 *  linux/kernel/printk.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 * Modified to make sys_syslog() more flexible: added commands to
 * return the last 4k of kernel messages, regardless of whether
 * they've been read or not.  Added option to suppress kernel printk's
 * to the console.  Added hook for sending the console messages
 * elsewhere, in preparation for a serial line console (someday).
 * Ted Ts'o, 2/11/93.
 * Modified for sysctl support, 1/8/97, Chris Horn.
 * Fixed SMP synchronization, 08/08/99, Manfred Spraul
 *     manfred@colorfullife.com
 * Rewrote bits to get rid of console_lock
 *	01Mar01 Andrew Morton
 */

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/tty.h>
#include <linux/tty_driver.h>
#include <linux/console.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/nmi.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/delay.h>
#include <linux/smp.h>
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/memblock.h>
#include <linux/syscalls.h>
#include <linux/crash_core.h>
#include <linux/kdb.h>
#include <linux/ratelimit.h>
#include <linux/kmsg_dump.h>
#include <linux/syslog.h>
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/rculist.h>
#include <linux/poll.h>
#include <linux/irq_work.h>
#include <linux/utsname.h>
#include <linux/ctype.h>
#include <linux/uio.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>

#include <linux/uaccess.h>
#include <asm/sections.h>

#define CREATE_TRACE_POINTS
#include <trace/events/printk.h>

#include "console_cmdline.h"
#include "braille.h"
#include "internal.h"

int console_printk[4] = {
	CONSOLE_LOGLEVEL_DEFAULT,	/* console_loglevel */
	MESSAGE_LOGLEVEL_DEFAULT,	/* default_message_loglevel */
	CONSOLE_LOGLEVEL_MIN,		/* minimum_console_loglevel */
	CONSOLE_LOGLEVEL_DEFAULT,	/* default_console_loglevel */
};
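
/*
 * Illustrative sketch, kept out of the build with #if 0. The first entry
 * of console_printk[] acts as a threshold: a message reaches the console
 * only when its level is numerically lower (more severe) than
 * console_loglevel. A message without an explicit level first takes
 * default_message_loglevel. The demo_* names and the values 7, 4, 1, 7
 * (a commonly seen default set) are assumptions for illustration only.
 * The ignore_loglevel override is not modelled.
 */
#if 0
```c
#include <stdbool.h>

/* Hypothetical copy of console_printk[] holding common default values. */
static int demo_console_printk[4] = { 7, 4, 1, 7 };

/* Stands in for LOGLEVEL_DEFAULT, i.e. "no level given by the caller". */
#define DEMO_LOGLEVEL_DEFAULT	(-1)

/*
 * Returns true when a message at @level would be printed to the console.
 * Index 0 is console_loglevel, index 1 is default_message_loglevel.
 */
static bool demo_reaches_console(int level)
{
	if (level == DEMO_LOGLEVEL_DEFAULT)
		level = demo_console_printk[1];
	return level < demo_console_printk[0];
}
```
#endif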

/*
 * Low level drivers may need that to know if they can schedule in
 * their unblank() callback or not. So let's export it.
 */
int oops_in_progress;
EXPORT_SYMBOL(oops_in_progress);

/*
 * console_sem protects the console_drivers list, and also
 * provides serialisation for access to the entire console
 * driver system.
 */
static DEFINE_SEMAPHORE(console_sem);
struct console *console_drivers;
EXPORT_SYMBOL_GPL(console_drivers);

#ifdef CONFIG_LOCKDEP
static struct lockdep_map console_lock_dep_map = {
	.name = "console_lock"
};
#endif

enum devkmsg_log_bits {
	__DEVKMSG_LOG_BIT_ON = 0,
	__DEVKMSG_LOG_BIT_OFF,
	__DEVKMSG_LOG_BIT_LOCK,
};

enum devkmsg_log_masks {
	DEVKMSG_LOG_MASK_ON	= BIT(__DEVKMSG_LOG_BIT_ON),
	DEVKMSG_LOG_MASK_OFF	= BIT(__DEVKMSG_LOG_BIT_OFF),
	DEVKMSG_LOG_MASK_LOCK	= BIT(__DEVKMSG_LOG_BIT_LOCK),
};

/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
#define DEVKMSG_LOG_MASK_DEFAULT	0

static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;

static int __control_devkmsg(char *str)
{
	if (!str)
		return -EINVAL;

	if (!strncmp(str, "on", 2)) {
		devkmsg_log = DEVKMSG_LOG_MASK_ON;
		return 2;
	} else if (!strncmp(str, "off", 3)) {
		devkmsg_log = DEVKMSG_LOG_MASK_OFF;
		return 3;
	} else if (!strncmp(str, "ratelimit", 9)) {
		devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
		return 9;
	}
	return -EINVAL;
}

static int __init control_devkmsg(char *str)
{
	if (__control_devkmsg(str) < 0)
		return 1;

	/*
	 * Set sysctl string accordingly:
	 */
	if (devkmsg_log == DEVKMSG_LOG_MASK_ON)
		strcpy(devkmsg_log_str, "on");
	else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF)
		strcpy(devkmsg_log_str, "off");
	/* else "ratelimit" which is set by default. */

	/*
	 * Sysctl cannot change it anymore. The kernel command line setting of
	 * this parameter is to force the setting to be permanent throughout the
	 * runtime of the system. This is a precautionary measure against
	 * userspace trying to be a smarta** and attempting to change it up on us.
	 */
	devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;

	/* A __setup handler returns 1 to report the option as handled. */
	return 1;
}
__setup("printk.devkmsg=", control_devkmsg);
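
/*
 * Illustrative sketch, kept out of the build with #if 0. This is a
 * userspace model of the two paths above. The parser matches a known
 * keyword by prefix and reports how many characters it consumed. The
 * boot-time path then sets the lock bit, so later runtime changes are
 * refused. The demo_* names and DEMO_MASK_* values are hypothetical
 * stand-ins for the DEVKMSG_LOG_MASK_* bits.
 */
#if 0
```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace copies of the devkmsg mode bits. */
enum {
	DEMO_MASK_ON	= 1 << 0,
	DEMO_MASK_OFF	= 1 << 1,
	DEMO_MASK_LOCK	= 1 << 2,
};

static unsigned int demo_devkmsg_log;	/* 0 == ratelimit (default) */

/* Prefix match: returns the characters consumed, or -1 if unknown. */
static int demo_parse(const char *str)
{
	if (!str)
		return -1;
	if (!strncmp(str, "on", 2)) {
		demo_devkmsg_log = DEMO_MASK_ON;
		return 2;
	} else if (!strncmp(str, "off", 3)) {
		demo_devkmsg_log = DEMO_MASK_OFF;
		return 3;
	} else if (!strncmp(str, "ratelimit", 9)) {
		demo_devkmsg_log = 0;
		return 9;
	}
	return -1;
}

/* Boot-time path: a valid value is applied and then locked. */
static bool demo_boot_param(const char *str)
{
	if (demo_parse(str) < 0)
		return false;
	demo_devkmsg_log |= DEMO_MASK_LOCK;
	return true;
}

/* Runtime path: refused once the boot parameter has locked the mode. */
static bool demo_runtime_set(const char *str)
{
	if (demo_devkmsg_log & DEMO_MASK_LOCK)
		return false;
	return demo_parse(str) >= 0;
}
```
#endif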

char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";

int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
			      void __user *buffer, size_t *lenp, loff_t *ppos)
{
	char old_str[DEVKMSG_STR_MAX_SIZE];
	unsigned int old;
	int err;

	if (write) {
		if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK)
			return -EINVAL;

		old = devkmsg_log;
		strncpy(old_str, devkmsg_log_str, DEVKMSG_STR_MAX_SIZE);
	}

	err = proc_dostring(table, write, buffer, lenp, ppos);
	if (err)
		return err;

	if (write) {
		err = __control_devkmsg(devkmsg_log_str);

		/*
		 * Do not accept an unknown string OR a known string with
		 * trailing crap...
		 */
		if (err < 0 || (err + 1 != *lenp)) {

			/* ... and restore old setting. */
			devkmsg_log = old;
			strncpy(devkmsg_log_str, old_str, DEVKMSG_STR_MAX_SIZE);

			return -EINVAL;
		}
	}

	return 0;
}
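
/*
 * Illustrative sketch, kept out of the build with #if 0. The sysctl
 * handler accepts a write only when the recognised keyword plus one extra
 * byte (typically the newline that "echo" appends) covers the whole write
 * length. That rejects values such as "onion". Judged by this check alone,
 * a write without a trailing newline is rejected as well. The demo_*
 * helpers are hypothetical.
 */
#if 0
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the keyword recognised at the start of @s, or -1. */
static int demo_keyword_len(const char *s)
{
	if (!strncmp(s, "on", 2))
		return 2;
	if (!strncmp(s, "off", 3))
		return 3;
	if (!strncmp(s, "ratelimit", 9))
		return 9;
	return -1;
}

/*
 * Mirrors the "err + 1 != *lenp" test: @len is the number of bytes the
 * writer passed, so exactly one byte beyond the keyword is tolerated.
 */
static bool demo_sysctl_write_ok(const char *buf, size_t len)
{
	int n = demo_keyword_len(buf);

	return n >= 0 && (size_t)n + 1 == len;
}
```
#endif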

/*
 * Number of registered extended console drivers.
 *
 * If extended consoles are present, in-kernel cont reassembly is disabled
 * and each fragment is stored as a separate log entry with proper
 * continuation flag so that every emitted message has full metadata.  This
 * doesn't change the result for regular consoles or /proc/kmsg.  For
 * /dev/kmsg, as long as the reader concatenates messages according to
 * consecutive continuation flags, the end result should be the same too.
 */
static int nr_ext_console_drivers;
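
/*
 * Illustrative sketch, kept out of the build with #if 0. This is the
 * reader-side concatenation described above. It assumes the '-', 'c' and
 * '+' markers of the /dev/kmsg <contflag> field, with 'c' taken to open a
 * continued line and '+' to extend it. struct demo_rec and
 * demo_reassemble() are hypothetical.
 */
#if 0
```c
#include <stddef.h>
#include <string.h>

#define DEMO_LINE_MAX 128

/* Hypothetical record: only the fields this sketch needs. */
struct demo_rec {
	char contflag;		/* '-', 'c' or '+' from the <contflag> field */
	const char *text;
};

/*
 * Joins fragments back into whole lines. A '+' record directly after a
 * 'c' or '+' record extends the current line; any other record starts a
 * new one. Returns the number of lines stored in @out (at most @max).
 * Overlong lines are truncated to DEMO_LINE_MAX - 1 bytes.
 */
static size_t demo_reassemble(const struct demo_rec *recs, size_t n,
			      char out[][DEMO_LINE_MAX], size_t max)
{
	size_t lines = 0;
	char prev = '-';

	for (size_t i = 0; i < n; i++) {
		char flag = recs[i].contflag;
		int extend = flag == '+' && lines > 0 &&
			     (prev == 'c' || prev == '+');

		if (!extend) {
			if (lines == max)
				break;
			out[lines++][0] = '\0';
		}
		/* Append to the current line without overflowing it. */
		strncat(out[lines - 1], recs[i].text,
			DEMO_LINE_MAX - 1 - strlen(out[lines - 1]));
		prev = flag;
	}
	return lines;
}
```
#endif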

/*
 * Helper macros to handle lockdep when locking/unlocking console_sem. We use
 * macros instead of functions so that _RET_IP_ contains useful information.
 */
#define down_console_sem() do { \
	down(&console_sem);\
	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);\
} while (0)

static int __down_trylock_console_sem(unsigned long ip)
{
	int lock_failed;
	unsigned long flags;

	/*
	 * Here and in __up_console_sem() we need to be in safe mode,
	 * because spindump/WARN/etc from under console->lock will
	 * deadlock in printk()->down_trylock_console_sem() otherwise.
	 */
	printk_safe_enter_irqsave(flags);
	lock_failed = down_trylock(&console_sem);
	printk_safe_exit_irqrestore(flags);

	if (lock_failed)
		return 1;
	mutex_acquire(&console_lock_dep_map, 0, 1, ip);
	return 0;
}
#define down_trylock_console_sem() __down_trylock_console_sem(_RET_IP_)

static void __up_console_sem(unsigned long ip)
{
	unsigned long flags;

	mutex_release(&console_lock_dep_map, 1, ip);

	printk_safe_enter_irqsave(flags);
	up(&console_sem);
	printk_safe_exit_irqrestore(flags);
}
#define up_console_sem() __up_console_sem(_RET_IP_)
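
/*
 * Illustrative sketch, kept out of the build with #if 0. It shows the
 * shape of __down_trylock_console_sem(): down_trylock() reports 0 when
 * the semaphore was taken and 1 when it was already held, and the lockdep
 * annotation is recorded only after a successful acquisition. The toy
 * semaphore below is a hypothetical C11-atomics stand-in that does not
 * sleep or queue waiters. demo_annotations merely counts where
 * mutex_acquire() would run.
 */
#if 0
```c
#include <stdatomic.h>

/* Hypothetical toy binary semaphore: 1 == free, 0 == held. */
struct demo_sem {
	atomic_int count;
};

/* Counts successful acquisitions, standing in for mutex_acquire(). */
static int demo_annotations;

static void demo_sem_init(struct demo_sem *s)
{
	atomic_init(&s->count, 1);
}

/* Same return convention as down_trylock(): 0 on success, 1 if held. */
static int demo_down_trylock(struct demo_sem *s)
{
	int expected = 1;

	/* Take the semaphore only if it is currently free. */
	return atomic_compare_exchange_strong(&s->count, &expected, 0) ? 0 : 1;
}

static void demo_up(struct demo_sem *s)
{
	atomic_store(&s->count, 1);
}

/* Annotate only when the trylock actually succeeded. */
static int demo_trylock_console_sem(struct demo_sem *s)
{
	if (demo_down_trylock(s))
		return 1;
	demo_annotations++;
	return 0;
}
```
#endif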

/*
 * This is used for debugging the mess that is the VT code by
 * keeping track of whether we have the console semaphore held. It's
 * definitely not the perfect debug tool (we don't know if _WE_
 * hold it and are racing, but it helps tracking those weird code
 * paths in the console code where we end up in places I want
 * locked without the console semaphore held).
 */
static int console_locked, console_suspended;

/*
 * If exclusive_console is non-NULL then only this console is to be printed to.
 */
static struct console *exclusive_console;

/*
 *	Array of consoles built from command line options (console=)
 */

#define MAX_CMDLINECONSOLES 8

static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];

static int preferred_console = -1;
int console_set_on_cmdline;
EXPORT_SYMBOL(console_set_on_cmdline);

/* Flag: console code may call schedule() */
static int console_may_schedule;

enum con_msg_format_flags {
	MSG_FORMAT_DEFAULT	= 0,
	MSG_FORMAT_SYSLOG	= (1 << 0),
};

static int console_msg_format = MSG_FORMAT_DEFAULT;
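
/*
 * Illustrative sketch, kept out of the build with #if 0. MSG_FORMAT_SYSLOG
 * selects syslog-style console lines of the form
 * "<N>[time stamp] text\n". This sketch assumes N = facility * 8 + level
 * (the usual syslog priority encoding) and a "[%5llu.%06llu]"
 * seconds.microseconds timestamp taken from a nanosecond value.
 * demo_format_syslog() is hypothetical and is not the kernel's own
 * formatting code.
 */
#if 0
```c
#include <stdio.h>

/*
 * Formats one line as "<PRI>[seconds.microseconds] text\n".
 * Returns what snprintf() returns: the length of the full line.
 */
static int demo_format_syslog(char *buf, size_t size, unsigned int facility,
			      unsigned int level, unsigned long long ts_nsec,
			      const char *text)
{
	unsigned int pri = (facility << 3) | (level & 7);

	return snprintf(buf, size, "<%u>[%5llu.%06llu] %s\n", pri,
			ts_nsec / 1000000000ULL,
			(ts_nsec % 1000000000ULL) / 1000ULL, text);
}
```
#endif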

/*
 * The printk log buffer consists of a chain of concatenated variable
 * length records. Every record starts with a record header, containing
 * the overall length of the record.
 *
 * The heads to the first and last entry in the buffer, as well as the
 * sequence numbers of these entries, are maintained when messages are
 * stored.
 *
 * If the heads indicate available messages, the length in the header
 * tells the start of the next message. A length == 0 for the next message
 * indicates a wrap-around to the beginning of the buffer.
 *
 * Every record carries the monotonic timestamp in nanoseconds, as well as
 * the standard userspace syslog level and syslog facility. The usual
 * kernel messages use LOG_KERN; userspace-injected messages always carry
 * a matching syslog facility, by default LOG_USER. The origin of every
 * message can be reliably determined that way.
 *
 * The human readable log message directly follows the message header. The
 * length of the message text is stored in the header, the stored message
 * is not terminated.
 *
 * Optionally, a message can carry a dictionary of properties (key/value pairs),
 * to provide userspace with a machine-readable message context.
 *
 * Examples of well-defined, commonly used property names are:
 *   DEVICE=b12:8               device identifier
 *                                b12:8         block dev_t
 *                                c127:3        char dev_t
 *                                n8            netdev ifindex
 *                                +sound:card0  subsystem:devname
 *   SUBSYSTEM=pci              driver-core subsystem name
 *
 * Valid characters in property names are [a-zA-Z0-9.-_]. The plain text value
 * follows directly after a '=' character. Every property is terminated by
 * a '\0' character. The last property is not terminated.
 *
 * Example of a message structure:
 *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
 *   0008  34 00                        record is 52 bytes long
 *   000a        0b 00                  text is 11 bytes long
 *   000c              16 00            dictionary is 22 bytes long
 *   000e                    03 00      LOG_KERN (facility) LOG_ERR (level)
 *   0010  69 74 27 73 20 61 20 6c      "it's a l"
 *         69 6e 65                     "ine"
 *   001b           44 45 56 49 43      "DEVIC"
 *         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
 *         52 49 56 45 52 3d 62 75      "RIVER=bu"
 *         67                           "g"
 *   0031     00 00 00                  padding to next message header
 *
 * The 'struct printk_log' buffer header must never be directly exported to
 * userspace, it is a kernel-private implementation detail that might
 * need to be changed in the future, when the requirements change.
 *
 * /dev/kmsg exports the structured data in the following line format:
 *   "<level>,<seqnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text>\n"
 *
 * Users of the export format should ignore possible additional values
 * separated by ',', and find the message after the ';' character.
 *
 * The optional key/value pairs are attached as continuation lines starting
 * with a space character and terminated by a newline. All possible
 * non-printable characters are escaped in the "\xff" notation.
 */
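
/*
 * Illustrative sketch, kept out of the build with #if 0. This is a small
 * userspace parser for the /dev/kmsg line format described above. It
 * assumes one record per buffer, which is what a single read() of
 * /dev/kmsg returns. It also treats the first field as a syslog-style
 * prefix: the low three bits are the level and the remaining bits are the
 * facility, so a value of 30 decodes as facility 3, level 6. Extra values
 * after the flag are skipped as the comment asks. struct demo_kmsg and
 * demo_parse_kmsg() are hypothetical.
 */
#if 0
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one /dev/kmsg record. */
struct demo_kmsg {
	unsigned int level;		/* low 3 bits of the first field */
	unsigned int facility;		/* remaining bits of the first field */
	unsigned long long seq;
	unsigned long long ts_usec;
	char contflag;
	const char *text;		/* rest of the buffer after ';' */
};

/* Returns 0 on success, -1 on malformed input. */
static int demo_parse_kmsg(const char *line, struct demo_kmsg *m)
{
	char *end;
	unsigned long prefix = strtoul(line, &end, 10);

	if (end == line || *end != ',')
		return -1;
	m->level = prefix & 7;
	m->facility = prefix >> 3;

	line = end + 1;
	m->seq = strtoull(line, &end, 10);
	if (end == line || *end != ',')
		return -1;

	line = end + 1;
	m->ts_usec = strtoull(line, &end, 10);
	if (end == line || *end != ',')
		return -1;

	m->contflag = end[1];
	if (m->contflag == '\0')
		return -1;

	/* Skip any additional values; the message starts after ';'. */
	end = strchr(end + 1, ';');
	if (!end)
		return -1;
	m->text = end + 1;
	return 0;
}
```
#endif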

enum log_flags {
	LOG_NOCONS	= 1,	/* already flushed, do not print to console */
	LOG_NEWLINE	= 2,	/* text ended with a newline */
	LOG_PREFIX	= 4,	/* text started with a prefix */
	LOG_CONT	= 8,	/* text is a fragment of a continuation line */
};

struct printk_log {
	u64 ts_nsec;		/* timestamp in nanoseconds */
	u16 len;		/* length of entire record */
	u16 text_len;		/* length of text buffer */
	u16 dict_len;		/* length of dictionary buffer */
	u8 facility;		/* syslog facility */
	u8 flags:5;		/* internal record flags */
	u8 level:3;		/* syslog level */
}
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
__packed __aligned(4)
#endif
;
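
/*
 * Illustrative sketch, kept out of the build with #if 0. It checks the
 * record-size arithmetic behind the "Example of a message structure"
 * comment. A 16-byte header, the 11-byte text "it's a line" and the
 * 22-byte dictionary spelled out in that dump total 49 bytes, which pad
 * to 52. The sketch assumes 4-byte record alignment (matching the
 * __aligned(4) variant above) and GCC/Clang attribute syntax.
 * demo_printk_log and demo_record_size() are hypothetical stand-ins,
 * not kernel interfaces.
 */
#if 0
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the packed struct printk_log. */
struct demo_printk_log {
	uint64_t ts_nsec;
	uint16_t len;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t facility;
	uint8_t flags:5;
	uint8_t level:3;
} __attribute__((packed, aligned(4)));

#define DEMO_LOG_ALIGN 4u	/* assumed record alignment */

/* Header + text + dictionary, rounded up to the record alignment. */
static uint32_t demo_record_size(uint16_t text_len, uint16_t dict_len)
{
	uint32_t size = sizeof(struct demo_printk_log) + text_len + dict_len;
	uint32_t pad = (-size) & (DEMO_LOG_ALIGN - 1);

	return size + pad;
}
```
#endif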

/*
2014-06-05 03:11:38 +04:00
 * The logbuf_lock protects the kmsg buffer, indices and counters. This can be
 * taken within the scheduler's rq lock. It must be released before calling
 * console_unlock() or anything else that might wake up a process.
2012-05-03 04:29:13 +04:00
*/
2016-05-21 03:00:42 +03:00
DEFINE_RAW_SPINLOCK(logbuf_lock);
2005-05-01 19:59:02 +04:00
2016-12-27 17:16:11 +03:00
/*
 * Helper macros to lock/unlock logbuf_lock and switch between
 * printk-safe/unsafe modes.
 */
#define logbuf_lock_irq()				\
	do {						\
		printk_safe_enter_irq();		\
		raw_spin_lock(&logbuf_lock);		\
	} while (0)
#define logbuf_unlock_irq()				\
	do {						\
		raw_spin_unlock(&logbuf_lock);		\
		printk_safe_exit_irq();			\
	} while (0)
#define logbuf_lock_irqsave(flags)			\
	do {						\
		printk_safe_enter_irqsave(flags);	\
		raw_spin_lock(&logbuf_lock);		\
	} while (0)
#define logbuf_unlock_irqrestore(flags)		\
	do {						\
		raw_spin_unlock(&logbuf_lock);		\
		printk_safe_exit_irqrestore(flags);	\
	} while (0)
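The helpers above use the usual do { ... } while (0) idiom so that the two-step "enter printk-safe mode, then take the lock" sequence behaves like a single statement, and they undo the steps in reverse order. A minimal userspace sketch of the same shape follows; demo_safe_depth, demo_lock_held and the demo_* macros are hypothetical stand-ins for the printk-safe context and the raw spinlock, not kernel APIs.

```c
#include <stdio.h>

/* Hypothetical stand-ins: a "printk-safe" nesting counter and a lock flag. */
static int demo_safe_depth;
static int demo_lock_held;

/* Enter the safe context first, then take the (fake) lock ... */
#define demo_lock()			\
	do {				\
		demo_safe_depth++;	\
		demo_lock_held = 1;	\
	} while (0)

/* ... and release in the reverse order. */
#define demo_unlock()			\
	do {				\
		demo_lock_held = 0;	\
		demo_safe_depth--;	\
	} while (0)

/* The do/while (0) wrapper lets the macro sit in an unbraced if/else. */
static void demo_maybe_lock(int want)
{
	if (want)
		demo_lock();
	else
		puts("not locking");
}
```

With a bare { } block instead of do/while (0), the semicolon after demo_lock() would terminate the if statement and the else branch would fail to compile.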
2012-07-17 05:35:29 +04:00
#ifdef CONFIG_PRINTK
2013-03-23 02:04:39 +04:00
DECLARE_WAIT_QUEUE_HEAD(log_wait);
2012-05-09 03:37:51 +04:00
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static u32 syslog_idx;
2012-07-09 21:05:10 +04:00
static size_t syslog_partial;
2012-05-03 04:29:13 +04:00
/* index and sequence number of the first record stored in the buffer */
static u64 log_first_seq;
static u32 log_first_idx;
/* index and sequence number of the next record to store in the buffer */
static u64 log_next_seq;
static u32 log_next_idx;
2012-07-17 05:35:30 +04:00
/* the next printk record to write to the console */
static u64 console_seq;
static u32 console_idx;
2012-05-03 04:29:13 +04:00
/* the next printk record to read after the last 'clear' command */
static u64 clear_seq;
static u32 clear_idx;
2012-07-17 05:35:29 +04:00
#define PREFIX_MAX		32
2014-08-07 03:09:08 +04:00
#define LOG_LINE_MAX		(1024 - PREFIX_MAX)
2012-05-09 03:37:51 +04:00
2015-11-07 03:30:38 +03:00
#define LOG_LEVEL(v)		((v) & 0x07)
#define LOG_FACILITY(v)		((v) >> 3 & 0xff)
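LOG_LEVEL() and LOG_FACILITY() split a syslog prefix value into its low three level bits and the facility stored above them. The check below copies the two macros verbatim and adds a hypothetical demo_syslog_prefix() for the reverse direction; it shows that the prefix 30 seen in /dev/kmsg output such as "30,340,5690716;udevd[80]: starting version 181" decodes to facility 3 (daemon) and level 6 (info).

```c
/* Copied from the definitions above. */
#define LOG_LEVEL(v)		((v) & 0x07)
#define LOG_FACILITY(v)		((v) >> 3 & 0xff)

/* Hypothetical inverse: the same "(facility << 3) | level" encoding
 * that is printed at the start of each /dev/kmsg record. */
static unsigned int demo_syslog_prefix(unsigned int facility, unsigned int level)
{
	return (facility << 3) | LOG_LEVEL(level);
}
```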
2012-05-09 03:37:51 +04:00
/* record buffer */
2013-08-01 00:53:47 +04:00
#define LOG_ALIGN __alignof__(struct printk_log)
2012-05-09 03:37:51 +04:00
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
2012-05-11 02:14:33 +04:00
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
2012-05-09 03:37:51 +04:00
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
2014-08-09 09:45:30 +04:00
/* Return log buffer address */
char *log_buf_addr_get(void)
{
	return log_buf;
}
/* Return log buffer size */
u32 log_buf_len_get(void)
{
	return log_buf_len;
}
2012-05-03 04:29:13 +04:00
/* human readable text of the record */
2013-08-01 00:53:47 +04:00
static char *log_text(const struct printk_log *msg)
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
	return (char *)msg + sizeof(struct printk_log);
2012-05-03 04:29:13 +04:00
}
/* optional key/value pair dictionary attached to the record */
2013-08-01 00:53:47 +04:00
static char *log_dict(const struct printk_log *msg)
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
	return (char *)msg + sizeof(struct printk_log) + msg->text_len;
2012-05-03 04:29:13 +04:00
}
/* get record by index; idx must point to valid msg */
2013-08-01 00:53:47 +04:00
static struct printk_log *log_from_idx(u32 idx)
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
	struct printk_log *msg = (struct printk_log *)(log_buf + idx);
2012-05-03 04:29:13 +04:00
	/*
	 * A length == 0 record is the end of buffer marker. Wrap around and
	 * read the message at the start of the buffer.
	 */
	if (!msg->len)
2013-08-01 00:53:47 +04:00
		return (struct printk_log *)log_buf;
2012-05-03 04:29:13 +04:00
	return msg;
}
/* get next record; idx must point to valid msg */
static u32 log_next(u32 idx)
{
2013-08-01 00:53:47 +04:00
	struct printk_log *msg = (struct printk_log *)(log_buf + idx);
2012-05-03 04:29:13 +04:00
	/* length == 0 indicates the end of the buffer; wrap */
	/*
	 * A length == 0 record is the end of buffer marker. Wrap around and
	 * read the message at the start of the buffer as *this* one, and
	 * return the one after that.
	 */
	if (!msg->len) {
2013-08-01 00:53:47 +04:00
		msg = (struct printk_log *)log_buf;
2012-05-03 04:29:13 +04:00
		return msg->len;
	}
	return idx + msg->len;
}
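log_from_idx() and log_next() treat a zero-length header as a wrap marker. The userspace miniature below assumes a hypothetical layout in which each record starts with a 4-byte length instead of a full struct printk_log; it lays out two 8-byte records followed by a marker and walks them. memcpy() reads the length so the sketch does not depend on buffer alignment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature: each record begins with a 4-byte length,
 * and a length of 0 means "wrap to offset 0". */
#define DEMO_BUF_LEN 32
static unsigned char demo_buf[DEMO_BUF_LEN];

static uint32_t demo_len_at(uint32_t idx)
{
	uint32_t len;

	memcpy(&len, demo_buf + idx, sizeof(len));
	return len;
}

static void demo_put_len(uint32_t idx, uint32_t len)
{
	memcpy(demo_buf + idx, &len, sizeof(len));
}

/* Mirrors log_from_idx(): a zero-length record redirects to offset 0. */
static uint32_t demo_from_idx(uint32_t idx)
{
	return demo_len_at(idx) ? idx : 0;
}

/* Mirrors log_next(): on a marker, treat the record at offset 0 as
 * "this" one and return the offset of the record after it. */
static uint32_t demo_next(uint32_t idx)
{
	uint32_t len = demo_len_at(idx);

	if (!len)
		return demo_len_at(0);
	return idx + len;
}
```

Stepping from the marker at offset 16 yields 8, the record after the first one, exactly as the comment in log_next() describes.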
2014-06-05 03:11:30 +04:00
/*
 * Check whether there is enough free space for the given message.
 *
 * The same values of first_idx and next_idx mean that the buffer
 * is either empty or full.
 *
 * If the buffer is empty, we must respect the position of the indexes.
 * They cannot be reset to the beginning of the buffer.
 */
static int logbuf_has_space(u32 msg_size, bool empty)
2014-06-05 03:11:28 +04:00
{
	u32 free;
2014-06-05 03:11:30 +04:00
	if (log_next_idx > log_first_idx || empty)
2014-06-05 03:11:28 +04:00
		free = max(log_buf_len - log_next_idx, log_first_idx);
	else
		free = log_first_idx - log_next_idx;
	/*
	 * We also need space for an empty header that signals wrapping
	 * of the buffer.
	 */
	return free >= msg_size + sizeof(struct printk_log);
}
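The free-space rule in logbuf_has_space() depends only on the two indexes, the buffer length and the header size, so it can be checked as a pure function. In this sketch, demo_has_space() is hypothetical and takes the globals as parameters; hdr_size stands in for sizeof(struct printk_log).

```c
#include <stdbool.h>
#include <stdint.h>

/* Pure-function version of logbuf_has_space(); all inputs are explicit. */
static bool demo_has_space(uint32_t buf_len, uint32_t first_idx,
			   uint32_t next_idx, uint32_t msg_size,
			   uint32_t hdr_size, bool empty)
{
	uint32_t free;

	if (next_idx > first_idx || empty)
		/* room is either after next_idx or, after a wrap, before first_idx */
		free = (buf_len - next_idx > first_idx) ? buf_len - next_idx
							: first_idx;
	else
		/* writer is behind the reader: only the gap between them is free */
		free = first_idx - next_idx;

	/* keep room for the zero-length wrap header as well */
	return free >= msg_size + hdr_size;
}
```

For example, with a 64-byte buffer and a 16-byte header, an empty buffer accepts a 48-byte record but not a 49-byte one.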
2014-06-05 03:11:30 +04:00
static int log_make_free_space(u32 msg_size)
2014-06-05 03:11:28 +04:00
{
2016-03-18 00:21:30 +03:00
	while (log_first_seq < log_next_seq &&
	       !logbuf_has_space(msg_size, false)) {
2014-08-07 03:09:03 +04:00
		/* drop old messages until we have enough contiguous space */
2014-06-05 03:11:28 +04:00
		log_first_idx = log_next(log_first_idx);
		log_first_seq++;
	}
2014-06-05 03:11:30 +04:00
2016-03-18 00:21:30 +03:00
	if (clear_seq < log_first_seq) {
		clear_seq = log_first_seq;
		clear_idx = log_first_idx;
	}
2014-06-05 03:11:30 +04:00
	/* sequence numbers are equal, so the log buffer is empty */
2016-03-18 00:21:30 +03:00
	if (logbuf_has_space(msg_size, log_first_seq == log_next_seq))
2014-06-05 03:11:30 +04:00
		return 0;
	return -ENOMEM;
2014-06-05 03:11:28 +04:00
}
2014-06-05 03:11:31 +04:00
/* compute the message size including the padding bytes */
static u32 msg_used_size(u16 text_len, u16 dict_len, u32 *pad_len)
{
	u32 size;
	size = sizeof(struct printk_log) + text_len + dict_len;
	*pad_len = (-size) & (LOG_ALIGN - 1);
	size += *pad_len;
	return size;
}
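msg_used_size() rounds each record up to the next multiple of LOG_ALIGN with the expression (-size) & (LOG_ALIGN - 1), which works because the alignment is a power of two and unsigned negation wraps modulo 2^32. The sketch assumes a 16-byte header and 4-byte alignment; the real values are sizeof(struct printk_log) and __alignof__(struct printk_log), which depend on the architecture and on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

```c
#include <stdint.h>

#define DEMO_HDR_SIZE	16u	/* assumed sizeof(struct printk_log) */
#define DEMO_ALIGN	4u	/* assumed LOG_ALIGN; must be a power of two */

static uint32_t demo_used_size(uint16_t text_len, uint16_t dict_len,
			       uint32_t *pad_len)
{
	uint32_t size = DEMO_HDR_SIZE + text_len + dict_len;

	/* distance from size up to the next multiple of DEMO_ALIGN */
	*pad_len = (-size) & (DEMO_ALIGN - 1);
	return size + *pad_len;
}
```

A 5-byte message gives 16 + 5 = 21 bytes, so 3 padding bytes bring the record to 24.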
2014-06-05 03:11:32 +04:00
/*
 * Define how much of the log buffer we could take at maximum. The value
 * must be greater than two. Note that only half of the buffer is available
 * when the index points to the middle.
 */
#define MAX_LOG_TAKE_PART 4
static const char trunc_msg[] = "<truncated>";
static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
			u16 *dict_len, u32 *pad_len)
{
	/*
	 * The message should not take the whole buffer. Otherwise, it might
	 * get removed too soon.
	 */
	u32 max_text_len = log_buf_len / MAX_LOG_TAKE_PART;
	if (*text_len > max_text_len)
		*text_len = max_text_len;
	/* enable the warning message */
	*trunc_msg_len = strlen(trunc_msg);
	/* disable the "dict" completely */
	*dict_len = 0;
	/* compute the size again, count also the warning message */
	return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
}
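The clamping done by truncate_msg() can be reproduced with concrete numbers. This hypothetical demo_truncate() keeps the quarter-of-the-buffer limit (MAX_LOG_TAKE_PART = 4), drops the dictionary and reserves room for "<truncated>", but as a simplification it returns only the number of text bytes to store rather than the padded record size.

```c
#include <stdint.h>
#include <string.h>

static const char demo_trunc_msg[] = "<truncated>";

/* Returns the text bytes to store (message plus marker), not the
 * padded record size that truncate_msg() returns. */
static uint32_t demo_truncate(uint32_t buf_len, uint16_t *text_len,
			      uint16_t *trunc_len, uint16_t *dict_len)
{
	uint32_t max_text_len = buf_len / 4;	/* MAX_LOG_TAKE_PART */

	if (*text_len > max_text_len)
		*text_len = (uint16_t)max_text_len;
	/* like the kernel, always append the marker once truncation runs */
	*trunc_len = (uint16_t)strlen(demo_trunc_msg);
	*dict_len = 0;
	return (uint32_t)*text_len + *trunc_len;
}
```

With a 128 KiB buffer, a 40000-byte message is cut to 32768 bytes, and the 11-byte marker brings the stored text to 32779 bytes.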
2012-05-03 04:29:13 +04:00
/* insert record into the buffer, discard old ones, update heads */
2014-06-05 03:11:33 +04:00
static int log_store(int facility, int level,
		     enum log_flags flags, u64 ts_nsec,
		     const char *dict, u16 dict_len,
		     const char *text, u16 text_len)
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
	struct printk_log *msg;
2012-05-03 04:29:13 +04:00
	u32 size, pad_len;
2014-06-05 03:11:32 +04:00
	u16 trunc_msg_len = 0;
2012-05-03 04:29:13 +04:00
	/* number of '\0' padding bytes to next message */
2014-06-05 03:11:31 +04:00
	size = msg_used_size(text_len, dict_len, &pad_len);
2012-05-03 04:29:13 +04:00
2014-06-05 03:11:32 +04:00
	if (log_make_free_space(size)) {
		/* truncate the message if it is too long for empty buffer */
		size = truncate_msg(&text_len, &trunc_msg_len,
				    &dict_len, &pad_len);
		/* survive when the log buffer is too small for trunc_msg */
		if (log_make_free_space(size))
2014-06-05 03:11:33 +04:00
			return 0;
2014-06-05 03:11:32 +04:00
	}
2012-05-03 04:29:13 +04:00
2014-04-04 01:48:42 +04:00
	if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
2012-05-03 04:29:13 +04:00
		/*
		 * This message + an additional empty header does not fit
		 * at the end of the buffer. Add an empty header with len == 0
		 * to signify a wrap around.
		 */
2013-08-01 00:53:47 +04:00
		memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
2012-05-03 04:29:13 +04:00
		log_next_idx = 0;
	}
	/* fill message */
2013-08-01 00:53:47 +04:00
	msg = (struct printk_log *)(log_buf + log_next_idx);
2012-05-03 04:29:13 +04:00
	memcpy(log_text(msg), text, text_len);
	msg->text_len = text_len;
2014-06-05 03:11:32 +04:00
	if (trunc_msg_len) {
		memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len);
		msg->text_len += trunc_msg_len;
	}
2012-05-03 04:29:13 +04:00
	memcpy(log_dict(msg), dict, dict_len);
	msg->dict_len = dict_len;
2012-06-28 11:38:53 +04:00
	msg->facility = facility;
	msg->level = level & 7;
	msg->flags = flags & 0x1f;
	if (ts_nsec > 0)
		msg->ts_nsec = ts_nsec;
	else
		msg->ts_nsec = local_clock();
2012-05-03 04:29:13 +04:00
	memset(log_dict(msg) + dict_len, 0, pad_len);
2014-04-04 01:48:43 +04:00
	msg->len = size;
2012-05-03 04:29:13 +04:00
	/* insert message */
	log_next_idx += msg->len;
	log_next_seq++;
2014-06-05 03:11:33 +04:00
	return msg->text_len;
2012-05-03 04:29:13 +04:00
}
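The placement rule in log_store() decides where a record will land: if the record plus one extra header would run past the end of the buffer, the writer leaves a zero-length wrap header and restarts at offset 0. The hypothetical demo_place() below isolates just that decision; hdr_size stands in for sizeof(struct printk_log).

```c
#include <stdint.h>

/* Offset at which a record of `size` bytes is written, following the
 * wrap check in log_store(). The kernel additionally memset()s a zero
 * header at next_idx before wrapping; that side effect is omitted. */
static uint32_t demo_place(uint32_t next_idx, uint32_t size,
			   uint32_t hdr_size, uint32_t buf_len)
{
	if (next_idx + size + hdr_size > buf_len)
		return 0;
	return next_idx;
}
```

Reserving the extra header is what guarantees there is always room for the wrap marker at the old log_next_idx.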
2005-05-01 19:59:02 +04:00
2014-08-07 03:09:05 +04:00
int dmesg_restrict = IS_ENABLED(CONFIG_SECURITY_DMESG_RESTRICT);
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 01:04:39 +04:00
static int syslog_action_restricted(int type)
{
	if (dmesg_restrict)
		return 1;
	/*
	 * Unless restricted, we allow "read all" and "get buffer size"
	 * for everybody.
	 */
	return type != SYSLOG_ACTION_READ_ALL &&
	       type != SYSLOG_ACTION_SIZE_BUFFER;
}
2017-08-10 07:11:00 +03:00
static int check_syslog_permissions(int type, int source)
2013-06-13 01:04:39 +04:00
{
	/*
	 * If this is from /proc/kmsg and we've already opened it, then we've
	 * already done the capabilities checks at open time.
	 */
2015-06-26 01:01:47 +03:00
	if (source == SYSLOG_FROM_PROC && type != SYSLOG_ACTION_OPEN)
2015-06-26 01:01:44 +03:00
		goto ok;
2013-06-13 01:04:39 +04:00
	if (syslog_action_restricted(type)) {
		if (capable(CAP_SYSLOG))
2015-06-26 01:01:44 +03:00
			goto ok;
2013-06-13 01:04:39 +04:00
		/*
		 * For historical reasons, accept CAP_SYS_ADMIN too, with
		 * a warning.
		 */
		if (capable(CAP_SYS_ADMIN)) {
			pr_warn_once("%s (%d): Attempt to access syslog with "
				     "CAP_SYS_ADMIN but no CAP_SYSLOG "
				     "(deprecated).\n",
				     current->comm, task_pid_nr(current));
2015-06-26 01:01:44 +03:00
			goto ok;
2013-06-13 01:04:39 +04:00
		}
		return -EPERM;
	}
2015-06-26 01:01:44 +03:00
ok:
2013-06-13 01:04:39 +04:00
	return security_syslog(type);
}
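The decision made by syslog_action_restricted() and check_syslog_permissions() can be tabulated as a pure function. In this sketch, capable() and the dmesg_restrict sysctl are modelled as fields of a hypothetical demo_ctx, evaluation stops before the security_syslog() LSM hook, and the action numbers are the ones documented in syslog(2) (OPEN = 1, READ = 2, READ_ALL = 3, CLEAR = 5, SIZE_BUFFER = 10).

```c
#include <errno.h>
#include <stdbool.h>

/* Action numbers as documented in syslog(2). */
enum demo_action {
	DEMO_ACTION_OPEN = 1,
	DEMO_ACTION_READ = 2,
	DEMO_ACTION_READ_ALL = 3,
	DEMO_ACTION_CLEAR = 5,
	DEMO_ACTION_SIZE_BUFFER = 10,
};

enum demo_source { DEMO_FROM_READER, DEMO_FROM_PROC };

/* Hypothetical snapshot of the sysctl and the caller's capabilities. */
struct demo_ctx {
	bool dmesg_restrict;
	bool cap_syslog;
	bool cap_sys_admin;
};

static bool demo_restricted(const struct demo_ctx *c, int type)
{
	if (c->dmesg_restrict)
		return true;
	return type != DEMO_ACTION_READ_ALL && type != DEMO_ACTION_SIZE_BUFFER;
}

/* 0 = allowed (before any LSM hook), -EPERM = denied. */
static int demo_check_perms(const struct demo_ctx *c, int type, int source)
{
	/* /proc/kmsg already checked capabilities at open time */
	if (source == DEMO_FROM_PROC && type != DEMO_ACTION_OPEN)
		return 0;
	if (demo_restricted(c, type)) {
		if (c->cap_syslog)
			return 0;
		if (c->cap_sys_admin)	/* deprecated; the kernel warns once */
			return 0;
		return -EPERM;
	}
	return 0;
}
```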
2015-06-26 01:01:24 +03:00
static void append_char(char **pp, char *e, char c)
{
	if (*pp < e)
		*(*pp)++ = c;
}
2013-06-13 01:04:39 +04:00
2015-06-26 01:01:27 +03:00
static ssize_t msg_print_ext_header(char *buf, size_t size,
2016-10-25 21:27:31 +03:00
				    struct printk_log *msg, u64 seq)
2015-06-26 01:01:27 +03:00
{
	u64 ts_usec = msg->ts_nsec;
	do_div(ts_usec, 1000);
	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
2016-10-25 21:27:31 +03:00
		       (msg->facility << 3) | msg->level, seq, ts_usec,
		       msg->flags & LOG_CONT ? 'c' : '-');
2015-06-26 01:01:27 +03:00
}
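The header written by msg_print_ext_header() is "prefix,seq,usec,flag;", where the timestamp is converted from nanoseconds to microseconds and the flag is 'c' for continuation fragments. The userspace sketch below reproduces that format. It assumes snprintf() as a stand-in for the kernel's scnprintf(); the two return different values on truncation, since snprintf() reports the length that would have been written. demo_ext_header() and DEMO_LOG_CONT are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_LOG_CONT 8	/* same value as LOG_CONT in enum log_flags */

static int demo_ext_header(char *buf, size_t size, unsigned int facility,
			   unsigned int level, unsigned int flags,
			   uint64_t seq, uint64_t ts_nsec)
{
	/* the kernel uses do_div(); plain division is fine in userspace */
	unsigned long long ts_usec = ts_nsec / 1000;

	return snprintf(buf, size, "%u,%llu,%llu,%c;",
			(facility << 3) | level, (unsigned long long)seq,
			ts_usec, flags & DEMO_LOG_CONT ? 'c' : '-');
}
```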
static ssize_t msg_print_ext_body(char *buf, size_t size,
				  char *dict, size_t dict_len,
				  char *text, size_t text_len)
{
	char *p = buf, *e = buf + size;
	size_t i;
	/* escape non-printable characters */
	for (i = 0; i < text_len; i++) {
		unsigned char c = text[i];
		if (c < ' ' || c >= 127 || c == '\\')
			p += scnprintf(p, e - p, "\\x%02x", c);
		else
			append_char(&p, e, c);
	}
	append_char(&p, e, '\n');
	if (dict_len) {
		bool line = true;
		for (i = 0; i < dict_len; i++) {
			unsigned char c = dict[i];
			if (line) {
				append_char(&p, e, ' ');
				line = false;
			}
			if (c == '\0') {
				append_char(&p, e, '\n');
				line = true;
				continue;
			}
			if (c < ' ' || c >= 127 || c == '\\') {
				p += scnprintf(p, e - p, "\\x%02x", c);
				continue;
			}
			append_char(&p, e, c);
		}
		append_char(&p, e, '\n');
	}
	return p - buf;
}
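The text half of msg_print_ext_body() can be exercised in isolation. This sketch keeps the same escaping rule (bytes below ' ', bytes >= 127 and the backslash become \xNN, and a newline terminates the text) but, as a simplifying assumption, builds each escape from a hex table through an append helper instead of scnprintf(), and it leaves out the dictionary half. demo_append() and demo_escape_text() are hypothetical; the output is not NUL-terminated, matching the kernel function.

```c
#include <stddef.h>
#include <string.h>

/* Same bounds-checked append as append_char() above. */
static void demo_append(char **pp, const char *e, char c)
{
	if (*pp < e)
		*(*pp)++ = c;
}

/* Escape the text part; returns the number of bytes written. */
static size_t demo_escape_text(char *buf, size_t size,
			       const char *text, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	char *p = buf;
	const char *e = buf + size;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)text[i];

		if (c < ' ' || c >= 127 || c == '\\') {
			demo_append(&p, e, '\\');
			demo_append(&p, e, 'x');
			demo_append(&p, e, hex[c >> 4]);
			demo_append(&p, e, hex[c & 0xf]);
		} else {
			demo_append(&p, e, (char)c);
		}
	}
	demo_append(&p, e, '\n');
	return (size_t)(p - buf);
}
```

Escaping the backslash itself keeps the output unambiguous: a literal "\x41" in a message can never be confused with an escaped byte.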
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get faster overwritten than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-03 04:29:41 +04:00
/* /dev/kmsg - userspace message inject/listen interface */
struct devkmsg_user {
	u64 seq;
	u32 idx;
2016-08-03 00:04:07 +03:00
	struct ratelimit_state rs;
2012-05-03 04:29:41 +04:00
	struct mutex lock;
2015-06-26 01:01:24 +03:00
	char buf[CONSOLE_EXT_LOG_MAX];
2012-05-03 04:29:41 +04:00
};
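The commit message above describes the record format read from /dev/kmsg: prio,seq,timestamp, an optional flags field, then ";" and the text. It also notes that sequence numbers let a consumer count lost records. The sketch below shows one way a userspace consumer might parse that header. struct kmsg_record, parse_kmsg_record() and kmsg_lost() are hypothetical names, not part of the kernel or any library. The sketch assumes one record per string, treats the timestamp as microseconds, and ignores the KEY=value dictionary lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed /dev/kmsg record header. */
struct kmsg_record {
	unsigned int prio;          /* syslog priority: facility << 3 | level */
	unsigned long long seq;     /* record sequence number */
	unsigned long long ts_usec; /* timestamp field, taken as microseconds */
	char flag;                  /* first char of the optional 4th field, 0 if absent */
	const char *text;           /* message text, just past the ';' */
};

/*
 * Parse "prio,seq,timestamp[,flags...];text".
 * Returns 0 on success, -1 if the header is malformed.
 */
static int parse_kmsg_record(const char *line, struct kmsg_record *rec)
{
	char *end;

	assert(line != NULL && rec != NULL);
	rec->prio = (unsigned int)strtoul(line, &end, 10);
	if (*end != ',')
		return -1;
	rec->seq = strtoull(end + 1, &end, 10);
	if (*end != ',')
		return -1;
	rec->ts_usec = strtoull(end + 1, &end, 10);
	rec->flag = 0;
	if (*end == ',') {
		rec->flag = end[1];
		end = strchr(end + 1, ';');   /* skip any further header fields */
		if (end == NULL)
			return -1;
	}
	if (*end != ';')
		return -1;
	rec->text = end + 1;
	return 0;
}

/* Records skipped between two consecutively read records. */
static unsigned long long kmsg_lost(unsigned long long prev_seq,
				    unsigned long long seq)
{
	return seq > prev_seq + 1 ? seq - prev_seq - 1 : 0;
}
```

For example, if one read returns seq 341 and the next returns seq 345, kmsg_lost(341, 345) reports three records that were overwritten in between.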
2014-08-23 20:23:53 +04:00
static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
2012-05-03 04:29:41 +04:00
{
	char *buf, *line;
	int level = default_message_loglevel;
	int facility = 1;	/* LOG_USER */
2016-08-03 00:04:07 +03:00
	struct file *file = iocb->ki_filp;
	struct devkmsg_user *user = file->private_data;
2015-02-11 21:56:46 +03:00
	size_t len = iov_iter_count(from);
2012-05-03 04:29:41 +04:00
	ssize_t ret = len;
2016-08-03 00:04:07 +03:00
	if (!user || len > LOG_LINE_MAX)
2012-05-03 04:29:41 +04:00
		return -EINVAL;
2016-08-03 00:04:07 +03:00
	/* Ignore when user logging is disabled. */
	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
		return len;
	/* Ratelimit when not explicitly enabled. */
	if (!(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
		if (!___ratelimit(&user->rs, current->comm))
			return ret;
	}
2012-05-03 04:29:41 +04:00
	buf = kmalloc(len + 1, GFP_KERNEL);
	if (buf == NULL)
		return -ENOMEM;
2014-08-23 20:23:53 +04:00
	buf[len] = '\0';
2016-11-02 05:09:04 +03:00
	if (!copy_from_iter_full(buf, len, from)) {
2014-08-23 20:23:53 +04:00
		kfree(buf);
		return -EFAULT;
2012-05-03 04:29:41 +04:00
	}
	/*
	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace,
	 * the decimal value is a 32-bit integer: the lowest 3 bits are the
	 * log level and the remaining bits are the log facility.
	 *
	 * If no prefix or no userspace facility is specified, we
	 * enforce LOG_USER, to be able to reliably distinguish
	 * kernel-generated messages from userspace-injected ones.
	 */
	line = buf;
	if (line[0] == '<') {
		char *endp = NULL;
2015-11-07 03:30:38 +03:00
		unsigned int u;
2012-05-03 04:29:41 +04:00
2015-11-07 03:30:38 +03:00
		u = simple_strtoul(line + 1, &endp, 10);
2012-05-03 04:29:41 +04:00
		if (endp && endp[0] == '>') {
2015-11-07 03:30:38 +03:00
			level = LOG_LEVEL(u);
			if (LOG_FACILITY(u) != 0)
				facility = LOG_FACILITY(u);
2012-05-03 04:29:41 +04:00
			endp++;
			len -= endp - line;
			line = endp;
		}
	}
	printk_emit(facility, level, NULL, 0, "%s", line);
	kfree(buf);
	return ret;
}
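The prefix handling in devkmsg_write() can be tried outside the kernel with a userspace mirror of the same steps. This is a sketch under stated assumptions. split_syslog_prefix() and struct syslog_prefix are hypothetical names. strtoul() stands in for the kernel's simple_strtoul(); the two may differ in corner cases such as leading whitespace or signs. The level/facility split follows the comment in the function (low 3 bits are the level, the remaining bits the facility) rather than the exact LOG_LEVEL()/LOG_FACILITY() definitions, which are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result of splitting a userspace "<N>" syslog prefix. */
struct syslog_prefix {
	int level;
	int facility;
	const char *text;   /* message with any valid prefix removed */
};

/*
 * Userspace mirror of the prefix logic in devkmsg_write(): N is decimal,
 * the low 3 bits give the level, the remaining bits the facility.
 * Without a valid prefix, or with facility 0, LOG_USER (1) is kept.
 */
static struct syslog_prefix split_syslog_prefix(const char *line,
						int default_level)
{
	struct syslog_prefix p = { default_level, 1 /* LOG_USER */, line };

	assert(line != NULL);
	if (line[0] == '<') {
		char *endp = NULL;
		unsigned long u = strtoul(line + 1, &endp, 10);

		if (endp && endp[0] == '>') {
			p.level = (int)(u & 7);
			if ((u >> 3) != 0)
				p.facility = (int)(u >> 3);
			p.text = endp + 1;
		}
	}
	return p;
}
```

A prefix of <30> decodes to facility 3 and level 6, since 30 = 3 * 8 + 6. A bare <3> keeps facility 1 (LOG_USER), which is how userspace-injected messages stay distinguishable from kernel-generated ones.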
static ssize_t devkmsg_read(struct file *file, char __user *buf,
			    size_t count, loff_t *ppos)
{
	struct devkmsg_user *user = file->private_data;
2013-08-01 00:53:47 +04:00
	struct printk_log *msg;
2012-05-03 04:29:41 +04:00
	size_t len;
	ssize_t ret;
	if (!user)
		return -EBADF;
printk: use mutex lock to stop syslog_seq from going wild
Although syslog_seq and log_next_seq are protected by the logbuf_lock
spin lock, it's not enough. Say we have two processes A and B, and let
syslog_seq = N, while log_next_seq = N + 1, and the two processes both
come to syslog_print at almost the same time. No matter which
process gets the spin lock first, it will increase syslog_seq by one,
then release the spin lock; later, the other process increases syslog_seq
by one again. In this case, syslog_seq is bigger than log_next_seq.
Later, this makes
wait_event_interruptible(log_wait, syslog_seq != log_next_seq)
stop waiting even when no new write comes in, which leads to
an infinite reading loop.
I can easily reproduce this issue with the following steps:
# cat /proc/kmsg # in the meantime, I don't kill rsyslog
# So they are the two processes.
# xinit # I added drm.debug=6 to the kernel command line,
# so that it produces lots of messages and lets the
# issue happen
It's 100% reproducible on my side, and my disk gets filled up by
/var/log/messages in a short time.
So, introduce a mutex to stop syslog_seq from going wild, just like
devkmsg_read() does. It fixes this issue as expected.
v2: use mutex_lock_interruptible() instead (comments from Kay)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-By: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-16 17:21:51 +04:00
	ret = mutex_lock_interruptible(&user->lock);
	if (ret)
		return ret;
2016-12-27 17:16:11 +03:00
	logbuf_lock_irq();
2012-05-03 04:29:41 +04:00
	while (user->seq == log_next_seq) {
		if (file->f_flags & O_NONBLOCK) {
			ret = -EAGAIN;
2016-12-27 17:16:11 +03:00
			logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
			goto out;
		}
2016-12-27 17:16:11 +03:00
		logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
		ret = wait_event_interruptible(log_wait,
					       user->seq != log_next_seq);
		if (ret)
			goto out;
2016-12-27 17:16:11 +03:00
		logbuf_lock_irq();
2012-05-03 04:29:41 +04:00
	}
	if (user->seq < log_first_seq) {
		/* our last seen message is gone, return error and reset */
		user->idx = log_first_idx;
		user->seq = log_first_seq;
		ret = -EPIPE;
2016-12-27 17:16:11 +03:00
		logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
		goto out;
	}
	msg = log_from_idx(user->idx);
2015-06-26 01:01:27 +03:00
	len = msg_print_ext_header(user->buf, sizeof(user->buf),
2016-10-25 21:27:31 +03:00
				   msg, user->seq);
2015-06-26 01:01:27 +03:00
	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
				  log_dict(msg), msg->dict_len,
				  log_text(msg), msg->text_len);
kmsg - export "continuation record" flag to /dev/kmsg
In some cases we are forced to store individual records for a continuation
line print.
Export a flag to allow the external re-construction of the line. The flag
allows us to apply a similar logic externally which is used internally when
the console, /proc/kmsg or the syslog() output is printed.
$ cat /dev/kmsg
4,165,0,-;Free swap = 0kB
4,166,0,-;Total swap = 0kB
6,167,0,c;[
4,168,0,+;0
4,169,0,+;1
4,170,0,+;2
4,171,0,+;3
4,172,0,+;]
6,173,0,-;[0 1 2 3 ]
6,174,0,-;Console: colour VGA+ 80x25
6,175,0,-;console [tty0] enabled
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 05:35:30 +04:00
2012-05-03 04:29:41 +04:00
	user->idx = log_next(user->idx);
	user->seq++;
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
	if (len > count) {
		ret = -EINVAL;
		goto out;
	}

	if (copy_to_user(buf, user->buf, len)) {
		ret = -EFAULT;
		goto out;
	}
	ret = len;
out:
	mutex_unlock(&user->lock);
	return ret;
}
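The read path above returns one record per read() call, fails with EINVAL when the caller's buffer is smaller than the record, and (per the commit message) reports overwritten records as EPIPE after moving the reader to the next available record. A minimal userspace consumer sketch under those assumptions: the helper name `kmsg_read_record` is hypothetical, and the 8192-byte buffer is an assumption chosen to match what we believe is the kernel's per-reader record buffer size (`CONSOLE_EXT_LOG_MAX` in kernels of this era).

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed record-size ceiling; a smaller buffer may make read() fail with EINVAL. */
#define KMSG_BUF_SIZE 8192

/*
 * Hypothetical helper: read one /dev/kmsg record into buf.
 * Returns the record length, 0 on EOF, or -1 with errno set.
 * Each EPIPE means unread records were overwritten; the kernel has already
 * moved our position to the next available record, so we count the
 * overrun and read again.
 */
ssize_t kmsg_read_record(int fd, char *buf, size_t len, unsigned *overruns)
{
	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n >= 0)
			return n;
		if (errno == EPIPE) {
			if (overruns)
				(*overruns)++;
			continue;
		}
		if (errno == EINTR)
			continue;
		/* EAGAIN (O_NONBLOCK and no new record), EINVAL, ... */
		return -1;
	}
}
```

A consumer would typically open `/dev/kmsg` with `O_RDONLY | O_NONBLOCK`, call this in a loop until it returns -1 with `errno == EAGAIN`, and compare sequence numbers across records to work out how many were lost.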
static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
	struct devkmsg_user *user = file->private_data;
	loff_t ret = 0;

	if (!user)
		return -EBADF;
	if (offset)
		return -ESPIPE;
2016-12-27 17:16:11 +03:00
	logbuf_lock_irq();
2012-05-03 04:29:41 +04:00
	switch (whence) {
	case SEEK_SET:
		/* the first record */
		user->idx = log_first_idx;
		user->seq = log_first_seq;
		break;
	case SEEK_DATA:
		/*
		 * The first record after the last SYSLOG_ACTION_CLEAR,
		 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
		 * changes no global state, and does not clear anything.
		 */
		user->idx = clear_idx;
		user->seq = clear_seq;
		break;
	case SEEK_END:
		/* after the last record */
		user->idx = log_next_idx;
		user->seq = log_next_seq;
		break;
	default:
		ret = -EINVAL;
	}
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
	return ret;
}
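devkmsg_llseek() accepts only a zero offset and maps the three supported whence values onto positions in the record ring: SEEK_SET to the oldest stored record, SEEK_DATA to the first record after the last SYSLOG_ACTION_CLEAR, and SEEK_END past the newest record. The sketch below is a self-contained userspace model of that mapping, not kernel code: `struct kmsg_ring_model`, `struct kmsg_reader_model` and `model_llseek` are hypothetical names, and only sequence numbers are modelled (the kernel also moves the matching buffer index).

```c
#define _GNU_SOURCE /* glibc exposes SEEK_DATA only with _GNU_SOURCE */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux value, fallback if headers were included earlier */
#endif

/* Hypothetical snapshot of the ring's sequence counters. */
struct kmsg_ring_model {
	uint64_t first_seq; /* oldest record still stored */
	uint64_t clear_seq; /* first record after the last clear ('dmesg -c') */
	uint64_t next_seq;  /* sequence number the next record will get */
};

/* Hypothetical per-reader position. */
struct kmsg_reader_model {
	uint64_t seq;
};

/* Returns 0 on success or a negative errno, mirroring devkmsg_llseek(). */
int model_llseek(const struct kmsg_ring_model *ring,
		 struct kmsg_reader_model *reader, long offset, int whence)
{
	if (offset)
		return -ESPIPE; /* only whole-record positioning is supported */

	switch (whence) {
	case SEEK_SET:
		reader->seq = ring->first_seq;
		return 0;
	case SEEK_DATA:
		reader->seq = ring->clear_seq;
		return 0;
	case SEEK_END:
		reader->seq = ring->next_seq;
		return 0;
	default:
		return -EINVAL; /* e.g. SEEK_CUR */
	}
}
```

Because the kernel function returns `ret = 0` on success, a userspace `lseek(fd, 0, SEEK_END)` on /dev/kmsg reports 0 rather than a byte offset; the return value only signals success or failure.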
2017-07-03 07:42:43 +03:00
static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
2012-05-03 04:29:41 +04:00
{
	struct devkmsg_user *user = file->private_data;
2017-07-03 07:42:43 +03:00
	__poll_t ret = 0;
2012-05-03 04:29:41 +04:00
	if (!user)
2018-02-12 01:34:03 +03:00
		return EPOLLERR|EPOLLNVAL;
2012-05-03 04:29:41 +04:00
	poll_wait(file, &log_wait, wait);
2016-12-27 17:16:11 +03:00
	logbuf_lock_irq();
2012-05-03 04:29:41 +04:00
	if (user->seq < log_next_seq) {
		/* return error when data has vanished underneath us */
		if (user->seq < log_first_seq)
2018-02-12 01:34:03 +03:00
			ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
2013-04-30 03:17:20 +04:00
		else
2018-02-12 01:34:03 +03:00
			ret = EPOLLIN|EPOLLRDNORM;
2012-05-03 04:29:41 +04:00
	}
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
	return ret;
}
static int devkmsg_open(struct inode *inode, struct file *file)
{
	struct devkmsg_user *user;
	int err;
2016-08-03 00:04:07 +03:00
	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
		return -EPERM;
2012-05-03 04:29:41 +04:00
2016-08-03 00:04:07 +03:00
	/* write-only does not need any file context */
	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
		err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
					       SYSLOG_FROM_READER);
		if (err)
			return err;
	}
2012-05-03 04:29:41 +04:00
	user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
	if (!user)
		return -ENOMEM;
2016-08-03 00:04:07 +03:00
	ratelimit_default_init(&user->rs);
	ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
2012-05-03 04:29:41 +04:00
	mutex_init(&user->lock);
2016-12-27 17:16:11 +03:00
	logbuf_lock_irq();
2012-05-03 04:29:41 +04:00
	user->idx = log_first_idx;
	user->seq = log_first_seq;
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irq();
2012-05-03 04:29:41 +04:00
	file->private_data = user;
	return 0;
}
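devkmsg_open() skips the syslog read-permission check when the file is opened O_WRONLY, so a process that only injects messages does not need read access to the log; it is still subject to the DEVKMSG_LOG_MASK_OFF check and to the per-open rate limit initialised in `user->rs`. A minimal writer sketch, assuming the usual convention that a leading `<N>` syslog prefix selects the record's level and that each write() becomes one record. `kmsg_format` and `kmsg_write` are hypothetical names, and the 1024-byte message buffer is an arbitrary choice.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: format "<level>msg\n" into out.
 * Returns the formatted length, or -1 if level is not 0..7 or out is too small.
 */
int kmsg_format(char *out, size_t outlen, int level, const char *msg)
{
	if (level < 0 || level > 7)
		return -1;
	int n = snprintf(out, outlen, "<%d>%s\n", level, msg);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return n;
}

/*
 * Hypothetical helper: emit one message with a single write() call,
 * since each write to /dev/kmsg is expected to become one record.
 * Returns 0 on success, -1 on failure (a short write also counts as failure).
 */
int kmsg_write(int fd, int level, const char *msg)
{
	char buf[1024];
	int n = kmsg_format(buf, sizeof(buf), level, msg);

	if (n < 0) {
		errno = EINVAL;
		return -1;
	}
	return write(fd, buf, (size_t)n) == n ? 0 : -1;
}
```

Typical use would be `int fd = open("/dev/kmsg", O_WRONLY | O_CLOEXEC);` followed by `kmsg_write(fd, 5, "service started");`; opening write-only is what lets this path bypass the read-permission check shown above.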
static int devkmsg_release(struct inode *inode, struct file *file)
{
	struct devkmsg_user *user = file->private_data;

	if (!user)
		return 0;
2016-08-03 00:04:07 +03:00
	ratelimit_state_exit(&user->rs);
2012-05-03 04:29:41 +04:00
	mutex_destroy(&user->lock);
	kfree(user);
	return 0;
}

const struct file_operations kmsg_fops = {
	.open = devkmsg_open,
	.read = devkmsg_read,
2014-08-23 20:23:53 +04:00
	.write_iter = devkmsg_write,
2012-05-03 04:29:41 +04:00
	.llseek = devkmsg_llseek,
	.poll = devkmsg_poll,
	.release = devkmsg_release,
};
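Records read through these file operations follow the text layout shown in the commit message above: a comma-separated header (syslog prefix, sequence number, timestamp) terminated by `;`, then the message text, optionally followed by `KEY=value` dictionary lines such as `SUBSYSTEM=` and `DEVICE=`. The prefix packs facility and level as `facility << 3 | level`, so the `30` on the udevd line decodes to facility 3 (daemon) and level 6 (info). A minimal header-parser sketch, assuming the timestamp is in microseconds; `struct kmsg_record` and `kmsg_parse_header` are hypothetical, and any extra header fields between the timestamp and `;` (later kernels add a flags field) are skipped.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded form of one /dev/kmsg record header. */
struct kmsg_record {
	int facility;     /* prefix >> 3 */
	int level;        /* prefix & 7 */
	uint64_t seq;     /* record sequence number */
	uint64_t ts_usec; /* timestamp, assumed to be microseconds */
	const char *text; /* points into the input, just after ';' */
};

/* Parse "PREFIX,SEQ,TS[,...];text". Returns 0 on success, -1 on malformed input. */
int kmsg_parse_header(const char *line, struct kmsg_record *rec)
{
	const char *semi = strchr(line, ';');
	char *end;

	if (!semi)
		return -1;

	unsigned long prefix = strtoul(line, &end, 10);
	if (end == line || *end != ',')
		return -1;
	rec->facility = (int)(prefix >> 3);
	rec->level = (int)(prefix & 7);

	const char *p = end + 1;
	rec->seq = strtoull(p, &end, 10);
	if (end == p || *end != ',')
		return -1;

	p = end + 1;
	rec->ts_usec = strtoull(p, &end, 10);
	if (end == p || (*end != ',' && *end != ';'))
		return -1;

	/* Fields after the timestamp (e.g. flags) are ignored, but must precede ';'. */
	if (end > semi)
		return -1;
	rec->text = semi + 1;
	return 0;
}
```

Dictionary lines that follow a record are left to the caller; this sketch only decodes the header line.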
crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
reuse crashkernel parameter for fadump", v4.
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
crashkernel parameter can be reused, for memory reservation, by such
architecture specific infrastructure.
This patchset removes dependency with CONFIG_KEXEC for crashkernel
parameter and vmcoreinfo related code as it can be reused without kexec
support. Also, crashkernel parameter is reused instead of
fadump_reserve_mem to reserve memory for fadump.
The first patch moves crashkernel parameter parsing and vmcoreinfo
related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
second patch reuses the definitions of append_elf_note() & final_note()
functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
in powerpc. The next patch reuses crashkernel parameter for reserving
memory for fadump, instead of the fadump_reserve_mem parameter. This
has the advantage of using all syntaxes crashkernel parameter supports,
for fadump as well. The last patch updates fadump kernel documentation
about use of crashkernel parameter.
This patch (of 5):
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
crashkernel parameter can be reused, for memory reservation, by such
architecture specific infrastructure.
But currently, code related to vmcoreinfo and parsing of crashkernel
parameter is built under CONFIG_KEXEC_CORE. This patch introduces
CONFIG_CRASH_CORE and moves the above mentioned code under this config,
allowing code reuse without dependency on CONFIG_KEXEC. There is no
functional change with this patch.
Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09 01:56:18 +03:00
#ifdef CONFIG_CRASH_CORE
2009-04-03 03:58:57 +04:00
/*
2013-11-13 03:08:54 +04:00
 * This appends the listed symbols to /proc/vmcore
2009-04-03 03:58:57 +04:00
 *
2013-11-13 03:08:54 +04:00
 * /proc/vmcore is used by various utilities, like crash and makedumpfile to
2009-04-03 03:58:57 +04:00
 * obtain access to symbols that are otherwise very difficult to locate. These
 * symbols are specifically used so that utilities can access and extract the
 * dmesg log from a vmcore file after a crash.
 */
2017-05-09 01:56:18 +03:00
void log_buf_vmcoreinfo_setup ( void )
2009-04-03 03:58:57 +04:00
{
VMCOREINFO_SYMBOL ( log_buf ) ;
VMCOREINFO_SYMBOL ( log_buf_len ) ;
2012-05-03 04:29:13 +04:00
VMCOREINFO_SYMBOL ( log_first_idx ) ;
2016-03-18 00:21:30 +03:00
VMCOREINFO_SYMBOL ( clear_idx ) ;
2012-05-03 04:29:13 +04:00
VMCOREINFO_SYMBOL ( log_next_idx ) ;
2012-07-18 21:18:12 +04:00
/*
2013-08-01 00:53:47 +04:00
* Export struct printk_log size and field offsets. User space tools can
2012-07-18 21:18:12 +04:00
* parse them and detect any changes to the structure down the line.
*/
2013-08-01 00:53:47 +04:00
VMCOREINFO_STRUCT_SIZE ( printk_log ) ;
VMCOREINFO_OFFSET ( printk_log , ts_nsec ) ;
VMCOREINFO_OFFSET ( printk_log , len ) ;
VMCOREINFO_OFFSET ( printk_log , text_len ) ;
VMCOREINFO_OFFSET ( printk_log , dict_len ) ;
2009-04-03 03:58:57 +04:00
}
# endif
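To make the export concrete, here is a user-space sketch of the kind of key=value lines that the VMCOREINFO_SYMBOL(), VMCOREINFO_STRUCT_SIZE() and VMCOREINFO_OFFSET() calls above append to the vmcoreinfo note. The SYMBOL(...)=, SIZE(...)= and OFFSET(...)= formats reflect my understanding of the kernel's crash_core.h macros; struct printk_log_sketch, log_buf_sketch, note and note_append() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct printk_log; only the field layout matters. */
struct printk_log_sketch {
	uint64_t ts_nsec;
	uint16_t len;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t facility;
	uint8_t flags_level;
};

static char log_buf_sketch[64];  /* hypothetical symbol to export */

static char note[512];           /* stand-in for the vmcoreinfo note data */
static size_t note_len;

/* Append one formatted "key=value\n" line, truncating if the note is full. */
static void note_append(const char *fmt, ...)
{
	size_t room;
	va_list ap;
	int n;

	assert(note_len < sizeof(note));
	room = sizeof(note) - note_len;
	va_start(ap, fmt);
	n = vsnprintf(note + note_len, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		note_len += (size_t)n < room ? (size_t)n : room - 1;
}

#define SKETCH_SYMBOL(name) \
	note_append("SYMBOL(%s)=%lx\n", #name, (unsigned long)(uintptr_t)&(name))
#define SKETCH_STRUCT_SIZE(name) \
	note_append("SIZE(%s)=%lu\n", #name, (unsigned long)sizeof(struct name))
#define SKETCH_OFFSET(name, field) \
	note_append("OFFSET(%s.%s)=%lu\n", #name, #field, \
		    (unsigned long)offsetof(struct name, field))

/* Rebuild the note from scratch, like log_buf_vmcoreinfo_setup() does once. */
static void vmcoreinfo_sketch(void)
{
	note_len = 0;
	note[0] = '\0';
	SKETCH_SYMBOL(log_buf_sketch);
	SKETCH_STRUCT_SIZE(printk_log_sketch);
	SKETCH_OFFSET(printk_log_sketch, ts_nsec);
	SKETCH_OFFSET(printk_log_sketch, text_len);
}
```

A dump tool can then read a value such as OFFSET(printk_log_sketch.text_len) from the note instead of hard-coding the struct layout, which is what allows it to notice layout changes.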
2011-05-25 04:13:20 +04:00
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len ;
2014-08-07 03:08:52 +04:00
/* we practice scaling the ring buffer by powers of 2 */
static void __init log_buf_len_update ( unsigned size )
2005-04-17 02:20:36 +04:00
{
if ( size )
size = roundup_pow_of_two ( size ) ;
2011-05-25 04:13:20 +04:00
if ( size > log_buf_len )
new_log_buf_len = size ;
2014-08-07 03:08:52 +04:00
}
/* save requested log_buf_len since it's too early to process it */
static int __init log_buf_len_setup ( char * str )
{
unsigned size = memparse ( str , & str ) ;
log_buf_len_update ( size ) ;
2011-05-25 04:13:20 +04:00
return 0 ;
2005-04-17 02:20:36 +04:00
}
2011-05-25 04:13:20 +04:00
early_param ( " log_buf_len " , log_buf_len_setup ) ;
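The size handling above (parse, round up to a power of two, only ever grow) can be sketched in user space as follows. memparse_sketch() is a simplified, hypothetical stand-in for the kernel's memparse() that only understands K, M and G suffixes, and the current buffer size of 128 KiB is an assumed example value.

```c
#include <assert.h>
#include <stdlib.h>

/* Smallest power of two >= v (0 stays 0, matching the "if (size)" guard). */
static unsigned long long roundup_pow2_sketch(unsigned long long v)
{
	unsigned long long p = 1;

	assert(v <= (1ULL << 63));   /* larger values have no 64-bit power of two */
	if (v == 0)
		return 0;
	while (p < v)
		p <<= 1;
	return p;
}

/* Simplified memparse(): a number plus an optional K/M/G suffix (binary units). */
static unsigned long long memparse_sketch(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

static unsigned long long log_buf_len_sketch = 1ULL << 17;  /* assumed current size */
static unsigned long long new_len_sketch;                   /* like new_log_buf_len */

static void log_buf_len_update_sketch(unsigned long long size)
{
	if (size)
		size = roundup_pow2_sketch(size);
	if (size > log_buf_len_sketch)   /* only ever grow the buffer */
		new_len_sketch = size;
}
```

With these assumptions, log_buf_len=200K requests a 256 KiB buffer, while a request no larger than the current buffer leaves the pending size untouched.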
2014-10-14 02:51:11 +04:00
# ifdef CONFIG_SMP
# define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
2014-08-07 03:08:56 +04:00
static void __init log_buf_add_cpu ( void )
{
unsigned int cpu_extra ;
/*
* archs should set up cpu_possible_bits properly with
* set_cpu_possible() after setup_arch(), but just in
* case, let's ensure this is valid.
*/
if ( num_possible_cpus ( ) = = 1 )
return ;
cpu_extra = ( num_possible_cpus ( ) - 1 ) * __LOG_CPU_MAX_BUF_LEN ;
/* by default this will only continue through for large > 64 CPUs */
if ( cpu_extra < = __LOG_BUF_LEN / 2 )
return ;
pr_info ( " log_buf_len individual max cpu contribution: %d bytes \n " ,
__LOG_CPU_MAX_BUF_LEN ) ;
pr_info ( " log_buf_len total cpu_extra contributions: %d bytes \n " ,
cpu_extra ) ;
pr_info ( " log_buf_len min size: %d bytes \n " , __LOG_BUF_LEN ) ;
log_buf_len_update ( cpu_extra + __LOG_BUF_LEN ) ;
}
2014-10-14 02:51:11 +04:00
# else /* !CONFIG_SMP */
static inline void log_buf_add_cpu ( void ) { }
# endif /* CONFIG_SMP */
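The arithmetic in log_buf_add_cpu() is easier to follow with numbers. This sketch assumes example shift values of 17 for the base buffer and 12 for the per-CPU contribution (both hypothetical here, not read from any config). With them, the extra space is only requested once (CPUs - 1) * 4 KiB exceeds half of 128 KiB, i.e. from 18 possible CPUs upward; other Kconfig values move that threshold.

```c
#include <assert.h>

/* Assumed example values for the two Kconfig shifts; real kernels may differ. */
#define SKETCH_LOG_BUF_LEN     (1u << 17)   /* __LOG_BUF_LEN, 128 KiB */
#define SKETCH_CPU_MAX_BUF_LEN (1u << 12)   /* __LOG_CPU_MAX_BUF_LEN, 4 KiB */

/* Size log_buf_add_cpu() would pass to log_buf_len_update(), or 0 if it
 * returns early and the static buffer is kept. */
static unsigned int cpu_extra_request(unsigned int possible_cpus)
{
	unsigned int cpu_extra;

	assert(possible_cpus >= 1);
	if (possible_cpus == 1)
		return 0;
	cpu_extra = (possible_cpus - 1) * SKETCH_CPU_MAX_BUF_LEN;
	if (cpu_extra <= SKETCH_LOG_BUF_LEN / 2)
		return 0;
	return cpu_extra + SKETCH_LOG_BUF_LEN;
}
```

For 18 CPUs the request is 200704 bytes, which log_buf_len_update() would then round up to 262144.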
2014-08-07 03:08:56 +04:00
2011-05-25 04:13:20 +04:00
void __init setup_log_buf ( int early )
{
unsigned long flags ;
char * new_log_buf ;
int free ;
2014-08-07 03:08:56 +04:00
if ( log_buf ! = __log_buf )
return ;
if ( ! early & & ! new_log_buf_len )
log_buf_add_cpu ( ) ;
2011-05-25 04:13:20 +04:00
if ( ! new_log_buf_len )
return ;
2005-04-17 02:20:36 +04:00
2011-05-25 04:13:20 +04:00
if ( early ) {
2014-01-22 03:50:23 +04:00
new_log_buf =
2014-08-07 03:08:49 +04:00
memblock_virt_alloc ( new_log_buf_len , LOG_ALIGN ) ;
2011-05-25 04:13:20 +04:00
} else {
2014-08-07 03:08:49 +04:00
new_log_buf = memblock_virt_alloc_nopanic ( new_log_buf_len ,
LOG_ALIGN ) ;
2011-05-25 04:13:20 +04:00
}
if ( unlikely ( ! new_log_buf ) ) {
pr_err ( " log_buf_len: %ld bytes not available \n " ,
new_log_buf_len ) ;
return ;
}
2016-12-27 17:16:11 +03:00
logbuf_lock_irqsave ( flags ) ;
2011-05-25 04:13:20 +04:00
log_buf_len = new_log_buf_len ;
log_buf = new_log_buf ;
new_log_buf_len = 0 ;
2012-05-03 04:29:13 +04:00
free = __LOG_BUF_LEN - log_next_idx ;
memcpy ( log_buf , __log_buf , __LOG_BUF_LEN ) ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irqrestore ( flags ) ;
2011-05-25 04:13:20 +04:00
2014-08-07 03:08:54 +04:00
pr_info ( " log_buf_len: %d bytes \n " , log_buf_len ) ;
2011-05-25 04:13:20 +04:00
pr_info ( " early log buf free: %d(%d%%) \n " ,
free , ( free * 100 ) / __LOG_BUF_LEN ) ;
}
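A single-threaded sketch of what setup_log_buf() does once a larger size has been requested: switch away from the static buffer only once, copy the early messages across, and report how much of the static buffer was still free. All names below are hypothetical; calloc() stands in for the memblock allocators, and the logbuf_lock critical section is omitted because nothing runs concurrently here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_STATIC_LEN 64u             /* hypothetical __LOG_BUF_LEN */

static char static_buf[SKETCH_STATIC_LEN]; /* like __log_buf */
static char *cur_buf = static_buf;         /* like log_buf */
static size_t cur_len = SKETCH_STATIC_LEN; /* like log_buf_len */

/* Move to a larger heap buffer once, keeping what was logged so far.
 * next_idx plays the role of log_next_idx. Returns the percentage of the
 * static buffer that was still free, or -1 if nothing changed. */
static int grow_log_buf_sketch(size_t new_len, size_t next_idx)
{
	char *nb;
	size_t free_bytes;

	if (cur_buf != static_buf || new_len == 0)
		return -1;                 /* already switched, or nothing requested */
	assert(new_len >= SKETCH_STATIC_LEN && next_idx <= SKETCH_STATIC_LEN);
	nb = calloc(1, new_len);
	if (!nb)
		return -1;                 /* the kernel logs an error and keeps __log_buf */
	free_bytes = SKETCH_STATIC_LEN - next_idx;
	memcpy(nb, static_buf, SKETCH_STATIC_LEN);
	cur_buf = nb;
	cur_len = new_len;
	return (int)(free_bytes * 100 / SKETCH_STATIC_LEN);
}
```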
2005-04-17 02:20:36 +04:00
2012-12-18 03:59:56 +04:00
static bool __read_mostly ignore_loglevel ;
static int __init ignore_loglevel_setup ( char * str )
{
2014-08-07 03:09:12 +04:00
ignore_loglevel = true ;
2013-11-13 03:08:50 +04:00
pr_info ( " debug: ignoring loglevel setting. \n " ) ;
2012-12-18 03:59:56 +04:00
return 0 ;
}
early_param ( " ignore_loglevel " , ignore_loglevel_setup ) ;
module_param ( ignore_loglevel , bool , S_IRUGO | S_IWUSR ) ;
2015-02-13 02:01:34 +03:00
MODULE_PARM_DESC ( ignore_loglevel ,
" ignore loglevel setting (prints all kernel messages to the console) " ) ;
2012-12-18 03:59:56 +04:00
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such console, register_console() resets
console_seq, console_idx and console_prev,
and sets an `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
static bool suppress_message_printing ( int level )
{
return ( level > = console_loglevel & & ! ignore_loglevel ) ;
}
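The point of the commit above is to filter on level before paying for formatting. A minimal sketch of that ordering, assuming a console_loglevel of 4 and using a hypothetical format_msg() with a call counter in place of msg_print_text() and friends:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int console_loglevel_sketch = 4;  /* assumed: levels 0..3 reach the console */
static bool ignore_loglevel_sketch;      /* like ignore_loglevel */
static int formatted_count;              /* how often the costly step ran */

/* Same test as suppress_message_printing() above. */
static bool suppress_sketch(int level)
{
	return level >= console_loglevel_sketch && !ignore_loglevel_sketch;
}

/* Stand-in for msg_print_text() and friends: the work worth avoiding. */
static int format_msg(char *buf, size_t size, int level, const char *text)
{
	formatted_count++;
	return snprintf(buf, size, "<%d>%s\n", level, text);
}

/* Check the level first; format only messages that will be printed. */
static void flush_sketch(const int *levels, const char *const *texts, int n)
{
	char buf[128];

	for (int i = 0; i < n; i++) {
		if (suppress_sketch(levels[i]))
			continue;
		int len = format_msg(buf, sizeof(buf), levels[i], texts[i]);
		assert(len >= 0);
		/* a real console would now call its ->write() with buf, len */
	}
}
```

Messages at levels 6 and 7 are skipped without any formatting; setting ignore_loglevel_sketch makes every message reach format_msg().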
2007-10-16 12:23:46 +04:00
# ifdef CONFIG_BOOT_PRINTK_DELAY
2010-10-27 01:22:48 +04:00
static int boot_delay ; /* msecs delay after each printk during bootup */
2009-09-23 03:43:31 +04:00
static unsigned long long loops_per_msec ; /* based on boot_delay */
2007-10-16 12:23:46 +04:00
static int __init boot_delay_setup ( char * str )
{
unsigned long lpj ;
lpj = preset_lpj ? preset_lpj : 1000000 ; /* some guess */
loops_per_msec = ( unsigned long long ) lpj / 1000 * HZ ;
get_option ( & str , & boot_delay ) ;
if ( boot_delay > 10 * 1000 )
boot_delay = 0 ;
2009-09-23 03:43:31 +04:00
pr_debug ( " boot_delay: %u, preset_lpj: %ld, lpj: %lu, "
" HZ: %d, loops_per_msec: %llu \n " ,
boot_delay , preset_lpj , lpj , HZ , loops_per_msec ) ;
2013-11-13 03:08:53 +04:00
return 0 ;
2007-10-16 12:23:46 +04:00
}
2013-11-13 03:08:53 +04:00
early_param ( " boot_delay " , boot_delay_setup ) ;
2007-10-16 12:23:46 +04:00
2012-12-18 03:59:56 +04:00
static void boot_delay_msec ( int level )
2007-10-16 12:23:46 +04:00
{
unsigned long long k ;
unsigned long timeout ;
2017-05-16 21:42:45 +03:00
if ( ( boot_delay = = 0 | | system_state > = SYSTEM_RUNNING )
2016-08-03 00:03:56 +03:00
| | suppress_message_printing ( level ) ) {
2007-10-16 12:23:46 +04:00
return ;
2012-12-18 03:59:56 +04:00
}
2007-10-16 12:23:46 +04:00
2009-09-23 03:43:31 +04:00
k = ( unsigned long long ) loops_per_msec * boot_delay ;
2007-10-16 12:23:46 +04:00
timeout = jiffies + msecs_to_jiffies ( boot_delay ) ;
while ( k ) {
k - - ;
cpu_relax ( ) ;
/*
* use (volatile) jiffies to prevent
* compiler reduction; loop termination via jiffies
* is secondary and may or may not happen.
*/
if ( time_after ( jiffies , timeout ) )
break ;
touch_nmi_watchdog ( ) ;
}
}
# else
2012-12-18 03:59:56 +04:00
static inline void boot_delay_msec ( int level )
2007-10-16 12:23:46 +04:00
{
}
# endif
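The boot_delay arithmetic, pulled out into plain functions. The function names and the lpj/HZ inputs used below are hypothetical examples; the formulas follow boot_delay_setup() and boot_delay_msec() above.

```c
#include <assert.h>

/* lpj is loops per jiffy and hz is jiffies per second, so lpj * hz is loops
 * per second and dividing by 1000 gives loops per millisecond. The division
 * happens first, as in boot_delay_setup(), so the low three digits of lpj
 * are dropped. */
static unsigned long long loops_per_msec_sketch(unsigned long lpj, unsigned int hz)
{
	return (unsigned long long)lpj / 1000 * hz;
}

/* boot_delay values above 10 seconds are treated as "no delay". */
static int boot_delay_clamp_sketch(int ms)
{
	return ms > 10 * 1000 ? 0 : ms;
}

/* Busy-loop iterations boot_delay_msec() budgets for one message. */
static unsigned long long boot_delay_loops_sketch(unsigned long lpj,
						  unsigned int hz, int ms)
{
	assert(ms >= 0);
	return loops_per_msec_sketch(lpj, hz) *
	       (unsigned long long)boot_delay_clamp_sketch(ms);
}
```

Because boot_delay_msec() also checks time_after(jiffies, timeout), the loop count is an upper bound on the spin; the jiffies timeout can end it earlier.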
2014-08-07 03:09:05 +04:00
static bool printk_time = IS_ENABLED ( CONFIG_PRINTK_TIME ) ;
2012-05-03 04:29:13 +04:00
module_param_named ( time , printk_time , bool , S_IRUGO | S_IWUSR ) ;
2012-06-28 11:38:53 +04:00
static size_t print_time ( u64 ts , char * buf )
{
unsigned long rem_nsec ;
if ( ! printk_time )
return 0 ;
printk: fix incorrect length from print_time() when seconds > 99999
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e42 ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05 03:35:50 +04:00
rem_nsec = do_div ( ts , 1000000000 ) ;
2012-06-28 11:38:53 +04:00
if ( ! buf )
2013-01-05 03:35:50 +04:00
return snprintf ( NULL , 0 , " [%5lu.000000] " , ( unsigned long ) ts ) ;
2012-06-28 11:38:53 +04:00
return sprintf ( buf , " [%5lu.%06lu] " ,
( unsigned long ) ts , rem_nsec / 1000 ) ;
}
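The length bug described in the commit message can be checked with a user-space sketch of print_time() that follows the fix: when no buffer is given, the length comes from snprintf(NULL, 0, ...) instead of a hard-coded 15. print_time_sketch() is a hypothetical name, and unlike the kernel version it takes an explicit buffer size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* With buf == NULL, only report the prefix length, computed so that a
 * seconds field wider than five digits is counted correctly. */
static size_t print_time_sketch(uint64_t ts_nsec, char *buf, size_t size)
{
	unsigned long secs = (unsigned long)(ts_nsec / 1000000000ULL);
	unsigned long rem_nsec = (unsigned long)(ts_nsec % 1000000000ULL);

	if (!buf)
		return (size_t)snprintf(NULL, 0, "[%5lu.000000] ", secs);
	assert(size > 0);
	return (size_t)snprintf(buf, size, "[%5lu.%06lu] ", secs, rem_nsec / 1000);
}
```

At 1.5 s of uptime the prefix is "[    1.500000] " (15 bytes); at 123456 s it grows to 16 bytes, so the old constant under-reported every such record by at least one byte.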
2013-08-01 00:53:47 +04:00
static size_t print_prefix ( const struct printk_log * msg , bool syslog , char * buf )
2012-05-10 06:30:45 +04:00
{
2012-05-14 01:30:46 +04:00
size_t len = 0 ;
2012-07-06 20:50:09 +04:00
unsigned int prefix = ( msg - > facility < < 3 ) | msg - > level ;
2012-05-10 06:30:45 +04:00
2012-05-14 01:30:46 +04:00
if ( syslog ) {
if ( buf ) {
2012-07-06 20:50:09 +04:00
len + = sprintf ( buf , " <%u> " , prefix ) ;
2012-05-14 01:30:46 +04:00
} else {
len + = 3 ;
2012-07-06 20:50:09 +04:00
if ( prefix > 999 )
len + = 3 ;
else if ( prefix > 99 )
len + = 2 ;
else if ( prefix > 9 )
2012-05-14 01:30:46 +04:00
len + + ;
}
}
2012-05-10 06:30:45 +04:00
2012-06-28 11:38:53 +04:00
len + = print_time ( msg - > ts_nsec , buf ? buf + len : NULL ) ;
2012-05-14 01:30:46 +04:00
return len ;
2012-05-10 06:30:45 +04:00
}
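The syslog branch of print_prefix() estimates the length of "<N>" by counting digits instead of formatting it. This sketch (syslog_prefix() is a hypothetical name) does both, so the estimate can be checked against snprintf(). It assumes the facility fits in 8 bits, which as far as I know is how struct printk_log stores it; that keeps the prefix value at 2047 or below, i.e. at most four digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<N>" into buf, or with buf == NULL return its length by
 * counting digits, as print_prefix() does. */
static size_t syslog_prefix(unsigned int facility, unsigned int level,
			    char *buf, size_t size)
{
	unsigned int prefix = (facility << 3) | level;
	size_t len = 3;                 /* '<', one digit, '>' */

	assert(facility < 256 && level < 8);
	if (buf)
		return (size_t)snprintf(buf, size, "<%u>", prefix);
	if (prefix > 999)
		len += 3;
	else if (prefix > 99)
		len += 2;
	else if (prefix > 9)
		len++;
	return len;
}
```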
2016-10-25 21:27:31 +03:00
static size_t msg_print_text ( const struct printk_log * msg , bool syslog , char * buf , size_t size )
2012-05-03 04:29:13 +04:00
{
2012-05-14 01:30:46 +04:00
const char * text = log_text ( msg ) ;
size_t text_size = msg - > text_len ;
size_t len = 0 ;
do {
const char * next = memchr ( text , ' \n ' , text_size ) ;
size_t text_len ;
if ( next ) {
text_len = next - text ;
next + + ;
text_size - = next - text ;
} else {
text_len = text_size ;
}
2012-05-03 04:29:13 +04:00
2012-05-14 01:30:46 +04:00
if ( buf ) {
if ( print_prefix ( msg , syslog , NULL ) +
2012-07-17 05:35:29 +04:00
text_len + 1 > = size - len )
2012-05-14 01:30:46 +04:00
break ;
2012-05-03 04:29:13 +04:00
2016-10-25 21:27:31 +03:00
len + = print_prefix ( msg , syslog , buf + len ) ;
2012-05-14 01:30:46 +04:00
memcpy ( buf + len , text , text_len ) ;
len + = text_len ;
2016-10-25 21:27:31 +03:00
buf [ len + + ] = ' \n ' ;
2012-05-14 01:30:46 +04:00
} else {
/* SYSLOG_ACTION_* buffer size only calculation */
2016-10-25 21:27:31 +03:00
len + = print_prefix ( msg , syslog , NULL ) ;
2012-07-09 23:15:42 +04:00
len + = text_len ;
2016-10-25 21:27:31 +03:00
len + + ;
2012-05-14 01:30:46 +04:00
}
2012-05-03 04:29:13 +04:00
2012-05-14 01:30:46 +04:00
text = next ;
} while ( text ) ;
2012-05-03 04:29:13 +04:00
return len ;
}
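msg_print_text() treats one record as several lines, repeating the prefix before each. A user-space sketch of the same loop, with a fixed prefix string standing in for print_prefix() (print_text_sketch() is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix each '\n'-separated line of text and terminate it with '\n'.
 * With buf == NULL only the total length is computed. */
static size_t print_text_sketch(const char *prefix, const char *text,
				size_t text_size, char *buf, size_t size)
{
	size_t plen = strlen(prefix);
	size_t len = 0;

	assert(text != NULL);
	do {
		const char *next = memchr(text, '\n', text_size);
		size_t line_len;

		if (next) {
			line_len = (size_t)(next - text);
			next++;
			text_size -= (size_t)(next - text);
		} else {
			line_len = text_size;
		}

		if (buf) {
			/* stop rather than emit a truncated line */
			if (plen + line_len + 1 >= size - len)
				break;
			memcpy(buf + len, prefix, plen);
			len += plen;
			memcpy(buf + len, text, line_len);
			len += line_len;
			buf[len++] = '\n';
		} else {
			len += plen + line_len + 1;
		}
		text = next;
	} while (text);

	return len;
}
```

Passing buf == NULL is the size-only mode used by the SYSLOG_ACTION_SIZE_UNREAD and syslog_print_all() length calculations, and the ">=" check always leaves at least one byte of the buffer unused.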
static int syslog_print ( char __user * buf , int size )
{
char * text ;
2013-08-01 00:53:47 +04:00
struct printk_log * msg ;
2012-06-22 19:36:09 +04:00
int len = 0 ;
2012-05-03 04:29:13 +04:00
2012-07-17 05:35:29 +04:00
text = kmalloc ( LOG_LINE_MAX + PREFIX_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
2012-06-22 19:36:09 +04:00
while ( size > 0 ) {
size_t n ;
2012-07-09 21:05:10 +04:00
size_t skip ;
2012-06-22 19:36:09 +04:00
2016-12-27 17:16:11 +03:00
logbuf_lock_irq ( ) ;
2012-06-22 19:36:09 +04:00
if ( syslog_seq < log_first_seq ) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq ;
syslog_idx = log_first_idx ;
2012-07-09 21:05:10 +04:00
syslog_partial = 0 ;
2012-06-22 19:36:09 +04:00
}
if ( syslog_seq = = log_next_seq ) {
2016-12-27 17:16:11 +03:00
logbuf_unlock_irq ( ) ;
2012-06-22 19:36:09 +04:00
break ;
}
2012-07-09 21:05:10 +04:00
skip = syslog_partial ;
2012-06-22 19:36:09 +04:00
msg = log_from_idx ( syslog_idx ) ;
2016-10-25 21:27:31 +03:00
n = msg_print_text ( msg , true , text , LOG_LINE_MAX + PREFIX_MAX ) ;
2012-07-09 21:05:10 +04:00
if ( n - syslog_partial < = size ) {
/* message fits into buffer, move forward */
2012-06-22 19:36:09 +04:00
syslog_idx = log_next ( syslog_idx ) ;
syslog_seq + + ;
2012-07-09 21:05:10 +04:00
n - = syslog_partial ;
syslog_partial = 0 ;
} else if ( ! len ) {
/* partial read(), remember position */
n = size ;
syslog_partial + = n ;
2012-06-22 19:36:09 +04:00
} else
n = 0 ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irq ( ) ;
2012-06-22 19:36:09 +04:00
if ( ! n )
break ;
2012-07-09 21:05:10 +04:00
if ( copy_to_user ( buf , text + skip , n ) ) {
2012-06-22 19:36:09 +04:00
if ( ! len )
len = - EFAULT ;
break ;
}
2012-07-09 21:05:10 +04:00
len + = n ;
size - = n ;
buf + = n ;
2012-05-03 04:29:13 +04:00
}
kfree ( text ) ;
return len ;
}
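The syslog_partial bookkeeping is the subtle part of syslog_print(): a record larger than the reader's buffer is handed out in slices, but only when it is the first record of the read. A hypothetical model that reduces each record to its formatted length:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state, mirroring syslog_seq and syslog_partial. */
struct reader_sketch {
	size_t seq;      /* next record to return */
	size_t partial;  /* bytes of that record already returned */
};

/* Bytes a single read() of `size` bytes would return; rec_len[i] is the
 * formatted length of record i. */
static size_t read_sketch(struct reader_sketch *r, const size_t *rec_len,
			  size_t nrec, size_t size)
{
	size_t len = 0;

	while (size > 0 && r->seq < nrec) {
		size_t n = rec_len[r->seq];

		assert(r->partial < n);
		if (n - r->partial <= size) {
			/* the rest of the record fits: consume it */
			n -= r->partial;
			r->partial = 0;
			r->seq++;
		} else if (len == 0) {
			/* nothing returned yet: hand out a slice, remember where */
			n = size;
			r->partial += n;
		} else {
			break;   /* leave the record whole for the next read() */
		}
		len += n;
		size -= n;
	}
	return len;
}
```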
static int syslog_print_all ( char __user * buf , int size , bool clear )
{
char * text ;
int len = 0 ;
2012-07-17 05:35:29 +04:00
text = kmalloc ( LOG_LINE_MAX + PREFIX_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
2016-12-27 17:16:11 +03:00
logbuf_lock_irq ( ) ;
2012-05-03 04:29:13 +04:00
if ( buf ) {
u64 next_seq ;
u64 seq ;
u32 idx ;
/*
* Find first record that fits, including all following records,
* into the user-provided buffer for this dump.
2012-06-15 16:07:51 +04:00
*/
2012-05-03 04:29:13 +04:00
seq = clear_seq ;
idx = clear_idx ;
while ( seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2016-10-25 21:27:31 +03:00
len + = msg_print_text ( msg , true , NULL , 0 ) ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
}
2012-06-15 16:07:51 +04:00
/* move first record forward until length fits into the buffer */
2012-05-03 04:29:13 +04:00
seq = clear_seq ;
idx = clear_idx ;
while ( len > size & & seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2016-10-25 21:27:31 +03:00
len - = msg_print_text ( msg , true , NULL , 0 ) ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
}
2012-06-15 16:07:51 +04:00
/* last message fitting into this dump */
2012-05-03 04:29:13 +04:00
next_seq = log_next_seq ;
len = 0 ;
while ( len > = 0 & & seq < next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-03 04:29:13 +04:00
int textlen ;
2016-10-25 21:27:31 +03:00
textlen = msg_print_text ( msg , true , text ,
2012-07-17 05:35:29 +04:00
LOG_LINE_MAX + PREFIX_MAX ) ;
2012-05-03 04:29:13 +04:00
if ( textlen < 0 ) {
len = textlen ;
break ;
}
idx = log_next ( idx ) ;
seq + + ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irq ( ) ;
2012-05-03 04:29:13 +04:00
if ( copy_to_user ( buf + len , text , textlen ) )
len = - EFAULT ;
else
len + = textlen ;
2016-12-27 17:16:11 +03:00
logbuf_lock_irq ( ) ;
2012-05-03 04:29:13 +04:00
if ( seq < log_first_seq ) {
/* messages are gone, move to next one */
seq = log_first_seq ;
idx = log_first_idx ;
}
}
}
if ( clear ) {
clear_seq = log_next_seq ;
clear_idx = log_next_idx ;
}
2016-12-27 17:16:11 +03:00
logbuf_unlock_irq ( ) ;
2012-05-03 04:29:13 +04:00
kfree ( text ) ;
return len ;
}
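syslog_print_all() keeps the newest records that fit: it sums the lengths from clear_seq onward, then drops records from the front until the total fits. That selection step, modeled with a plain array of formatted lengths (first_fitting_record() is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first record that will be copied; *total gets
 * the number of bytes from that record to the end. */
static size_t first_fitting_record(const size_t *rec_len, size_t first,
				   size_t nrec, size_t size, size_t *total)
{
	size_t len = 0, i;

	assert(first <= nrec);
	for (i = first; i < nrec; i++)
		len += rec_len[i];

	/* move the first record forward until the rest fits */
	for (i = first; len > size && i < nrec; i++)
		len -= rec_len[i];

	*total = len;
	return i;
}
```

The kernel then walks forward from that record, dropping logbuf_lock around each copy_to_user(), which is why it re-checks seq against log_first_seq after retaking the lock.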
2015-06-26 01:01:47 +03:00
int do_syslog ( int type , char __user * buf , int len , int source )
2005-04-17 02:20:36 +04:00
{
2012-05-03 04:29:13 +04:00
bool clear = false ;
2014-12-11 02:50:15 +03:00
static int saved_console_loglevel = LOGLEVEL_DEFAULT ;
2011-02-11 04:53:55 +03:00
int error ;
2005-04-17 02:20:36 +04:00
2015-06-26 01:01:47 +03:00
error = check_syslog_permissions ( type , source ) ;
2011-02-11 04:53:55 +03:00
if ( error )
2017-07-30 06:36:36 +03:00
return error ;
2010-11-16 02:36:29 +03:00
2005-04-17 02:20:36 +04:00
switch ( type ) {
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_CLOSE : /* Close log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_OPEN : /* Open log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_READ : /* Read from log */
2005-04-17 02:20:36 +04:00
if ( ! buf | | len < 0 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( ! len )
2017-07-30 06:36:36 +03:00
return 0 ;
if ( ! access_ok ( VERIFY_WRITE , buf , len ) )
return - EFAULT ;
2005-10-31 02:02:46 +03:00
error = wait_event_interruptible ( log_wait ,
2012-05-03 04:29:13 +04:00
syslog_seq ! = log_next_seq ) ;
kmsg: properly handle concurrent non-blocking read() from /proc/kmsg
The /proc/kmsg read() interface is internally simply wired up to a sequence
of syslog() syscalls, which are racy between their checks and actions,
regarding concurrency.
In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on
usual O_NONBLOCK behavior, the recently introduced mutex might block an
O_NONBLOCK reader in read(), when poll() returns for it, but another process
has already read the data in the meantime. We've seen that while running
artificial test setups and tools that "fight" about /proc/kmsg data.
This restores the original /proc/kmsg behavior, where in case of concurrent
read()s, poll() might wake up but the read() syscall will just return 0 to
the caller, while another process has "stolen" the data.
This is in the general case not the expected behavior, but it is the exact
same one, that can easily be triggered with a 3.4 kernel, and some tools
might just rely on it.
The mutex is not needed, the original integrity issue which introduced it,
is in the meantime covered by:
"fill buffer with more than a single message for SYSLOG_ACTION_READ"
116e90b23f74d303e8d607c7a7d54f60f14ab9f2
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-06 20:50:09 +04:00
if ( error )
2017-07-30 06:36:36 +03:00
return error ;
2012-05-03 04:29:13 +04:00
error = syslog_print ( buf , len ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR :
2012-05-03 04:29:13 +04:00
clear = true ;
2005-04-17 02:20:36 +04:00
/* FALL THRU */
2010-02-04 02:37:13 +03:00
/* Read last kernel messages */
case SYSLOG_ACTION_READ_ALL :
2005-04-17 02:20:36 +04:00
if ( ! buf | | len < 0 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( ! len )
2017-07-30 06:36:36 +03:00
return 0 ;
if ( ! access_ok ( VERIFY_WRITE , buf , len ) )
return - EFAULT ;
2012-05-03 04:29:13 +04:00
error = syslog_print_all ( buf , len , clear ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR :
2012-05-03 04:29:13 +04:00
syslog_print_all ( NULL , 0 , true ) ;
2012-06-23 01:12:19 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel = = LOGLEVEL_DEFAULT )
2009-07-06 15:31:48 +04:00
saved_console_loglevel = console_loglevel ;
2005-04-17 02:20:36 +04:00
console_loglevel = minimum_console_loglevel ;
break ;
2010-02-04 02:37:13 +03:00
/* Enable logging to console */
case SYSLOG_ACTION_CONSOLE_ON :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel ! = LOGLEVEL_DEFAULT ) {
2009-07-06 15:31:48 +04:00
console_loglevel = saved_console_loglevel ;
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2009-07-06 15:31:48 +04:00
}
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Set level of messages printed to console */
case SYSLOG_ACTION_CONSOLE_LEVEL :
2005-04-17 02:20:36 +04:00
if ( len < 1 | | len > 8 )
2017-07-30 06:36:36 +03:00
return - EINVAL ;
2005-04-17 02:20:36 +04:00
if ( len < minimum_console_loglevel )
len = minimum_console_loglevel ;
console_loglevel = len ;
2009-07-06 15:31:48 +04:00
/* Implicitly re-enable logging to console */
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD :
2016-12-27 17:16:11 +03:00
logbuf_lock_irq ( ) ;
2012-05-03 04:29:13 +04:00
if ( syslog_seq < log_first_seq ) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq ;
syslog_idx = log_first_idx ;
2012-07-09 21:05:10 +04:00
syslog_partial = 0 ;
2012-05-03 04:29:13 +04:00
}
2015-06-26 01:01:47 +03:00
if ( source = = SYSLOG_FROM_PROC ) {
2012-05-03 04:29:13 +04:00
/*
* Short-cut for poll("/proc/kmsg") which simply checks
* for pending data, not the size; return the count of
* records, not the length.
*/
2014-08-07 03:08:59 +04:00
error = log_next_seq - syslog_seq ;
2012-05-03 04:29:13 +04:00
} else {
2012-07-09 23:15:42 +04:00
u64 seq = syslog_seq ;
u32 idx = syslog_idx ;
2012-05-03 04:29:13 +04:00
while ( seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2016-10-25 21:27:31 +03:00
error + = msg_print_text ( msg , true , NULL , 0 ) ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
}
2012-07-09 21:05:10 +04:00
error - = syslog_partial ;
2012-05-03 04:29:13 +04:00
}
2016-12-27 17:16:11 +03:00
logbuf_unlock_irq ( ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER :
2005-04-17 02:20:36 +04:00
error = log_buf_len ;
break ;
default :
error = - EINVAL ;
break ;
}
2017-07-30 06:36:36 +03:00
2005-04-17 02:20:36 +04:00
return error ;
}
2009-01-14 16:14:29 +03:00
SYSCALL_DEFINE3 ( syslog , int , type , char __user * , buf , int , len )
2005-04-17 02:20:36 +04:00
{
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg; however, /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. Newer util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	return do_syslog(type, buf, len, SYSLOG_FROM_READER);
}
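The commit message above spells out which actions each interface allows. A simplified model of those rules, written as one predicate, might look like the following. The SK_ACTION_* values follow the numbering documented in syslog(2); sk_syslog_permitted() and enum sk_source are hypothetical stand-ins for SYSLOG_FROM_READER and SYSLOG_FROM_PROC, and the sketch does not attempt to reproduce the kernel's actual permission helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Action numbers as documented in syslog(2). */
enum {
	SK_ACTION_OPEN = 1,
	SK_ACTION_READ = 2,
	SK_ACTION_READ_ALL = 3,
	SK_ACTION_CLEAR = 5,
	SK_ACTION_SIZE_BUFFER = 10,
};

/* Hypothetical names mirroring SYSLOG_FROM_READER and SYSLOG_FROM_PROC. */
enum sk_source { SK_FROM_READER, SK_FROM_PROC };

/* Returns 0 if the action is allowed, -EPERM otherwise. */
static int sk_syslog_permitted(int type, enum sk_source source,
			       bool cap_syslog, bool dmesg_restrict)
{
	bool restricted;

	/* /proc/kmsg: only the open is checked; later actions are trusted. */
	if (source == SK_FROM_PROC && type != SK_ACTION_OPEN)
		return 0;

	/* Unprivileged callers get the two non-destructive actions only. */
	restricted = dmesg_restrict ||
		     (type != SK_ACTION_READ_ALL && type != SK_ACTION_SIZE_BUFFER);
	if (restricted && !cap_syslog)
		return -EPERM;
	return 0;
}
```

Opening /dev/kmsg under the rules above corresponds to the SK_FROM_READER, SK_ACTION_READ_ALL case: it is allowed with CAP_SYSLOG or when dmesg_restrict is 0.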
/*
 * Special console_lock variants that help to reduce the risk of soft-lockups.
 * They allow passing console_lock to another printk() call using a busy wait.
 */

#ifdef CONFIG_LOCKDEP
static struct lockdep_map console_owner_dep_map = {
	.name = "console_owner"
};
#endif

static DEFINE_RAW_SPINLOCK(console_owner_lock);
static struct task_struct *console_owner;
static bool console_waiter;

/**
 * console_lock_spinning_enable - mark beginning of code where another
 *	thread might safely busy wait
 *
 * This basically converts console_lock into a spinlock. This marks
 * the section where the console_lock owner can not sleep, because
 * there may be a waiter spinning (like a spinlock). Also it must be
 * ready to hand over the lock at the end of the section.
 */
static void console_lock_spinning_enable(void)
{
	raw_spin_lock(&console_owner_lock);
	console_owner = current;
	raw_spin_unlock(&console_owner_lock);

	/* The waiter may spin on us after setting console_owner */
	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
}

/**
 * console_lock_spinning_disable_and_check - mark end of code where another
 *	thread was able to busy wait and check if there is a waiter
 *
 * This is called at the end of the section where spinning is allowed.
 * It has two functions. First, it is a signal that it is no longer
 * safe to start busy waiting for the lock. Second, it checks if
 * there is a busy waiter and passes the lock rights to her.
 *
 * Important: Callers lose the lock if there was a busy waiter.
 *	They must not touch items synchronized by console_lock
 *	in this case.
 *
 * Return: 1 if the lock rights were passed, 0 otherwise.
 */
static int console_lock_spinning_disable_and_check(void)
{
	int waiter;

	raw_spin_lock(&console_owner_lock);
	waiter = READ_ONCE(console_waiter);
	console_owner = NULL;
	raw_spin_unlock(&console_owner_lock);

	if (!waiter) {
		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
		return 0;
	}

	/* The waiter is now free to continue */
	WRITE_ONCE(console_waiter, false);

	spin_release(&console_owner_dep_map, 1, _THIS_IP_);

	/*
	 * Hand off console_lock to waiter. The waiter will perform
	 * the up(). After this, the waiter is the console_lock owner.
	 */
	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
	return 1;
}

/**
 * console_trylock_spinning - try to get console_lock by busy waiting
 *
 * This allows busy waiting for the console_lock when the current
 * owner is running in specially marked sections. It means that
 * the current owner is running and cannot reschedule until it
 * is ready to lose the lock.
 *
 * Return: 1 if we got the lock, 0 otherwise
 */
static int console_trylock_spinning(void)
{
	struct task_struct *owner = NULL;
	bool waiter;
	bool spin = false;
	unsigned long flags;

	if (console_trylock())
		return 1;

	printk_safe_enter_irqsave(flags);

	raw_spin_lock(&console_owner_lock);
	owner = READ_ONCE(console_owner);
	waiter = READ_ONCE(console_waiter);
	if (!waiter && owner && owner != current) {
		WRITE_ONCE(console_waiter, true);
		spin = true;
	}
	raw_spin_unlock(&console_owner_lock);

	/*
	 * If there is an active printk() writing to the
	 * consoles, instead of having it write our data too,
	 * see if we can offload that load from the active
	 * printer, and do some printing ourselves.
	 * Go into a spin only if there isn't already a waiter
	 * spinning, and there is an active printer, and
	 * that active printer isn't us (recursive printk?).
	 */
	if (!spin) {
		printk_safe_exit_irqrestore(flags);
		return 0;
	}

	/* We spin waiting for the owner to release us */
	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
	/* Owner will clear console_waiter on hand off */
	while (READ_ONCE(console_waiter))
		cpu_relax();
	spin_release(&console_owner_dep_map, 1, _THIS_IP_);

	printk_safe_exit_irqrestore(flags);
	/*
	 * The owner passed the console lock to us.
	 * Since we did not spin on console lock, annotate
	 * this as a trylock. Otherwise lockdep will
	 * complain.
	 */
	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);

	return 1;
}
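The three helpers above form a small handoff protocol: the printing owner advertises itself in console_owner, at most one other task sets console_waiter and spins, and on leaving the section the owner either keeps console_lock or passes it to the spinner. The single-threaded sketch below models just that state machine so the decisions can be checked in isolation. Tasks are plain integers and all names are hypothetical; there is no real locking, lockdep annotation, or memory ordering, so it is not a substitute for the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: tasks are small positive integers, 0 means "none". */
struct sk_handoff {
	int owner;	/* task printing in a spinning-enabled section */
	bool waiter;	/* true while one task busy-waits for the handoff */
};

/* Owner side: corresponds to console_lock_spinning_enable(). */
static void sk_spinning_enable(struct sk_handoff *h, int self)
{
	assert(self != 0);
	h->owner = self;
}

/* Waiter side: the decision made in console_trylock_spinning(). */
static bool sk_try_become_waiter(struct sk_handoff *h, int self)
{
	if (!h->waiter && h->owner != 0 && h->owner != self) {
		h->waiter = true;	/* this task will now spin */
		return true;
	}
	return false;	/* no owner, a waiter already exists, or recursion */
}

/* Owner side: corresponds to console_lock_spinning_disable_and_check(). */
static int sk_spinning_disable_and_check(struct sk_handoff *h)
{
	bool had_waiter = h->waiter;

	h->owner = 0;
	if (!had_waiter)
		return 0;	/* owner keeps console_lock */
	h->waiter = false;	/* releases the spinner, which now owns the lock */
	return 1;
}
```

Note how the model rejects a second waiter and a recursive printk from the owner itself, matching the conditions described in the comment inside console_trylock_spinning().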
/*
 * Call the console drivers, asking them to write out the text they are
 * given (ext_text instead of text for CON_EXTENDED consoles).
 * The console_lock must be held.
 */
static void call_console_drivers(const char *ext_text, size_t ext_len,
				 const char *text, size_t len)
{
	struct console *con;

	trace_console_rcuidle(text, len);

	if (!console_drivers)
		return;

	for_each_console(con) {
		if (exclusive_console && con != exclusive_console)
			continue;
		if (!(con->flags & CON_ENABLED))
			continue;
		if (!con->write)
			continue;
		if (!cpu_online(smp_processor_id()) &&
		    !(con->flags & CON_ANYTIME))
			continue;
		if (con->flags & CON_EXTENDED)
			con->write(con, ext_text, ext_len);
		else
			con->write(con, text, len);
	}
}
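call_console_drivers() applies a fixed sequence of filters before writing to each console. The sketch below reproduces that filtering over a hypothetical linked list of consoles whose write hook just records the last string it received. The SK_CON_* bits are made-up values for illustration (the real CON_* flags are defined in <linux/console.h>), and the cpu_online parameter replaces the cpu_online(smp_processor_id()) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only. */
#define SK_CON_ENABLED	(1u << 0)
#define SK_CON_ANYTIME	(1u << 1)
#define SK_CON_EXTENDED	(1u << 2)

struct sk_console {
	unsigned int flags;
	void (*write)(struct sk_console *con, const char *s, size_t n);
	char last[64];			/* last text this console received */
	struct sk_console *next;
};

/* Write hook that simply remembers what it was given. */
static void sk_record(struct sk_console *con, const char *s, size_t n)
{
	assert(n < sizeof(con->last));
	memcpy(con->last, s, n);
	con->last[n] = '\0';
}

static void sk_call_console_drivers(struct sk_console *list,
				    struct sk_console *exclusive,
				    bool cpu_online,
				    const char *ext_text, size_t ext_len,
				    const char *text, size_t len)
{
	for (struct sk_console *con = list; con; con = con->next) {
		if (exclusive && con != exclusive)
			continue;	/* only the exclusive console is fed */
		if (!(con->flags & SK_CON_ENABLED) || !con->write)
			continue;
		/* A CPU that is not online may only use ANYTIME consoles. */
		if (!cpu_online && !(con->flags & SK_CON_ANYTIME))
			continue;
		if (con->flags & SK_CON_EXTENDED)
			con->write(con, ext_text, ext_len);
		else
			con->write(con, text, len);
	}
}
```

With no console marked SK_CON_ANYTIME, a call made with cpu_online set to false reaches no console at all, which is the message-loss scenario the can_use_console() discussion later in this file is concerned with.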
int printk_delay_msec __read_mostly;

static inline void printk_delay(void)
{
	if (unlikely(printk_delay_msec)) {
		int m = printk_delay_msec;

		while (m--) {
			mdelay(1);
			touch_nmi_watchdog();
		}
	}
}
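printk_delay() splits the configured delay into 1 ms steps and touches the NMI watchdog after each one, presumably so that a large printk_delay_msec never leaves the watchdog unattended for the whole delay. The sketch below replaces mdelay() and touch_nmi_watchdog() with hypothetical functions on a simulated clock and tracks the longest stretch between watchdog touches; nothing actually sleeps.

```c
#include <assert.h>

/* Hypothetical simulated clock replacing mdelay() and touch_nmi_watchdog(). */
static unsigned long sk_ms_since_touch;	/* time since the last watchdog touch */
static unsigned long sk_max_gap_ms;	/* longest such stretch observed */

static void sk_mdelay(unsigned long ms)
{
	sk_ms_since_touch += ms;
	if (sk_ms_since_touch > sk_max_gap_ms)
		sk_max_gap_ms = sk_ms_since_touch;
}

static void sk_touch_watchdog(void)
{
	sk_ms_since_touch = 0;
}

static void sk_printk_delay(int delay_msec)
{
	assert(delay_msec >= 0);	/* assumed non-negative in this sketch */
	while (delay_msec--) {
		sk_mdelay(1);
		sk_touch_watchdog();
	}
}
```

After sk_printk_delay(500) the longest gap is 1 ms, whereas a single sk_mdelay(500) would leave a 500 ms gap.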
/*
 * Continuation lines are buffered, and not committed to the record buffer
 * until the line is complete, or a race forces it. Consoles are written
 * only from the record buffer, so a partial line does not reach the
 * console until it has been flushed.
 */
static struct cont {
	char buf[LOG_LINE_MAX];
	size_t len;			/* length == 0 means unused buffer */
	struct task_struct *owner;	/* task of first print*/
	u64 ts_nsec;			/* time of first print */
	u8 level;			/* log level of first message */
	u8 facility;			/* log facility of first message */
	enum log_flags flags;		/* prefix, newline flags */
} cont;

static void cont_flush(void)
{
	if (cont.len == 0)
		return;
printk: remove console flushing special cases for partial buffered lines
It actively hurts proper merging, and makes for a lot of special cases.
There was a good(ish) reason for doing it originally, but it's getting
too painful to maintain. And most of the original reasons for it are
long gone.
So instead of having special code to flush partial lines to the console
(as opposed to the record buffers), do _all_ the console writing from
the record buffer, and be done with it.
If an oops happens (or some other synchronous event), we will flush the
partial lines due to the oops printing activity, so this does not affect
that. It does mean that if you have a completely hung machine, a
partial preceding line may not have been printed out.
That was some of the original reason for this complexity, in fact, back
when we used to test for the historical i386 "halt" instruction problem
by doing
    pr_info("Checking 'hlt' instruction... ");
    if (!boot_cpu_data.hlt_works_ok) {
        pr_cont("disabled\n");
        return;
    }
    halt();
    halt();
    halt();
    halt();
    pr_cont("OK\n");
and that model no longer works (if the 'hlt' instruction kills the
machine, the partial line won't have been flushed, so you won't even see
it).
Of course, that was also back in the days when people actually had
textual console output rather than a graphical splash-screen at bootup.
How times change..
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
		  NULL, 0, cont.buf, cont.len);
	cont.len = 0;
}
static bool cont_add(int facility, int level, enum log_flags flags, const char *text, size_t len)
{
	/*
	 * If ext consoles are present, flush and skip in-kernel
	 * continuation. See nr_ext_console_drivers definition. Also, if
	 * the line gets too long, split it up in separate records.
	 */
	if (nr_ext_console_drivers || cont.len + len > sizeof(cont.buf)) {
		cont_flush();
		return false;
	}

	if (!cont.len) {
		cont.facility = facility;
		cont.level = level;
		cont.owner = current;
		cont.ts_nsec = local_clock();
		cont.flags = flags;
	}

	memcpy(cont.buf + cont.len, text, len);
	cont.len += len;

	// The original flags come from the first line,
	// but later continuations can add a newline.
	if (flags & LOG_NEWLINE) {
		cont.flags |= LOG_NEWLINE;
		cont_flush();
	}

	if (cont.len > (sizeof(cont.buf) * 80) / 100)
		cont_flush();

	return true;
}
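cont_add() appends a fragment to the single continuation buffer and flushes it to the record log when a newline arrives, when the next fragment would not fit, or once the buffer is more than 80% full. The sketch below keeps those three rules in user space with a deliberately small, hypothetical 32-byte buffer (so "more than 80%" means more than 25 bytes in integer arithmetic). The record log is a fixed array, all names are hypothetical, and the nr_ext_console_drivers check, facility, level and timestamps are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_LINE_MAX	32	/* hypothetical; the kernel uses LOG_LINE_MAX */
#define SK_MAX_RECORDS	8

static struct {
	char buf[SK_LINE_MAX];
	size_t len;		/* 0 means the buffer is unused */
	int owner;		/* hypothetical task id of the first fragment */
} sk_cont;

/* Stand-in for the record buffer that log_store() appends to. */
static char sk_records[SK_MAX_RECORDS][SK_LINE_MAX + 1];
static int sk_nrecords;

static void sk_cont_flush(void)
{
	if (sk_cont.len == 0)
		return;
	assert(sk_nrecords < SK_MAX_RECORDS);
	memcpy(sk_records[sk_nrecords], sk_cont.buf, sk_cont.len);
	sk_records[sk_nrecords][sk_cont.len] = '\0';
	sk_nrecords++;
	sk_cont.len = 0;
}

/* Returns false if the fragment did not fit; the caller then stores it. */
static bool sk_cont_add(int owner, bool newline, const char *text, size_t len)
{
	if (sk_cont.len + len > sizeof(sk_cont.buf)) {
		sk_cont_flush();
		return false;
	}
	if (sk_cont.len == 0)
		sk_cont.owner = owner;
	memcpy(sk_cont.buf + sk_cont.len, text, len);
	sk_cont.len += len;

	/* A newline completes the line; a nearly full buffer is flushed early. */
	if (newline || sk_cont.len > (sizeof(sk_cont.buf) * 80) / 100)
		sk_cont_flush();
	return true;
}
```

Feeding "Checking " and then "OK" with a newline produces the single record "Checking OK", in the spirit of the pr_info()/pr_cont() example in the commit message above.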
static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
{
	/*
	 * If an earlier line was buffered, and we're a continuation
	 * write from the same process, try to add it to the buffer.
	 */
	if (cont.len) {
		if (cont.owner == current && (lflags & LOG_CONT)) {
			if (cont_add(facility, level, lflags, text, text_len))
				return text_len;
		}
		/* Otherwise, make sure it's flushed */
		cont_flush();
	}

	/* Skip empty continuation lines that couldn't be added - they just flush */
	if (!text_len && (lflags & LOG_CONT))
		return 0;

	/* If it doesn't end in a newline, try to buffer the current line */
	if (!(lflags & LOG_NEWLINE)) {
		if (cont_add(facility, level, lflags, text, text_len))
			return text_len;
	}

	/* Store it in the record log */
	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
}
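log_output() decides where each printk() fragment goes, and the order of its checks matters. The decision can be expressed as a pure function, which makes the precedence easier to see. This sketch assumes cont_add() always has room; in the real code a failed cont_add() falls through to the next step. enum sk_route and sk_route_line() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_route {
	SK_APPEND_TO_CONT,	/* joined to the pending continuation buffer */
	SK_SKIP_EMPTY,		/* empty LOG_CONT fragment: nothing to store */
	SK_START_CONT,		/* no newline yet: buffer it as a new line */
	SK_STORE_RECORD,	/* complete line: goes straight to log_store() */
};

static enum sk_route sk_route_line(size_t cont_len, bool same_owner,
				   bool is_cont, bool has_newline,
				   size_t text_len, bool *flushed_first)
{
	*flushed_first = false;
	if (cont_len) {
		if (same_owner && is_cont)
			return SK_APPEND_TO_CONT;
		*flushed_first = true;	/* unrelated output ends the pending line */
	}
	if (text_len == 0 && is_cont)
		return SK_SKIP_EMPTY;
	if (!has_newline)
		return SK_START_CONT;
	return SK_STORE_RECORD;
}
```

A fragment from a different task, or one without LOG_CONT, always flushes the pending buffer first, which is why interleaved output from two tasks cannot end up merged into one record.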
asmlinkage int vprintk_emit(int facility, int level,
			    const char *dict, size_t dictlen,
			    const char *fmt, va_list args)
{
	static char textbuf[LOG_LINE_MAX];
	char *text = textbuf;
	size_t text_len;
	enum log_flags lflags = 0;
	unsigned long flags;
	int printed_len;
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that are
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There are a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	bool in_sched = false;

	if (level == LOGLEVEL_SCHED) {
		level = LOGLEVEL_DEFAULT;
		in_sched = true;
	}

	boot_delay_msec(level);
	printk_delay();
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() allows to cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from !online cpu
by setting CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out if
there is a CON_ANYTIME console, otherwise console_unlock() must be
avoided.
can_use_console() ensures that console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online CPU1 !online
console_trylock()
...
console_unlock()
console_cont_flush
spin_lock logbuf_lock
if (!cont.len) {
spin_unlock logbuf_lock
return
}
for (;;) {
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
spin_lock logbuf_lock
!console_trylock_for_printk msg_print_text
return console_idx = log_next()
console_seq++
console_prev = msg->flags
spin_unlock logbuf_lock
call_console_drivers()
for_each_console(con) {
if (!cpu_online() &&
!(con->flags & CON_ANYTIME))
continue;
}
/*
* no message printed, we lost it
*/
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
!console_trylock_for_printk
return
/*
* go to the beginning of the loop,
* find out there are new messages,
* lose it
*/
}
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	/* This stops the holder of console_sem just where we want him */
	logbuf_lock_irqsave(flags);
	/*
	 * The printf needs to come first; we need the syslog
	 * prefix which might be passed-in as a parameter.
	 */
	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);

	/* mark and strip a trailing newline */
	if (text_len && text[text_len-1] == '\n') {
		text_len--;
		lflags |= LOG_NEWLINE;
	}

	/* strip kernel syslog prefix and extract log level or control flags */
	if (facility == 0) {
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
474925277671 ("printk: add KERN_CONT annotation").
That continuation marker was initially an empty string, and didn't
actually make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Overall, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one record, the continuation lines need to be
attached to the previous record properly.
However the explicit continuation record marker that is actually useful
for this exact case was actually removed at around the same time by commit
61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will need to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at least mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
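The loop in vprintk_emit() strips every leading level marker, so a message may carry both KERN_CONT and an explicit level, as the commit message describes. A user-space sketch of that parsing, assuming the marker format is KERN_SOH (the byte 0x01) followed by one of '0'..'7', 'd' or 'c', is shown here. sk_get_level() is intended to mirror the checks of the kernel's printk_get_level(); sk_strip_prefix() and struct sk_parsed are hypothetical, and the KERN_DEFAULT ('d') marker is stripped without recording anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_SOH '\001'	/* KERN_SOH is the ASCII SOH byte */

/* Returns the marker character after SOH, or 0 if there is no marker. */
static int sk_get_level(const char *s)
{
	if (s[0] == SK_SOH && s[1]) {
		switch (s[1]) {
		case '0': case '1': case '2': case '3':
		case '4': case '5': case '6': case '7':
		case 'd':	/* KERN_DEFAULT */
		case 'c':	/* KERN_CONT */
			return s[1];
		}
	}
	return 0;
}

struct sk_parsed {
	int level;		/* -1 if no explicit level was found */
	int is_cont;		/* a KERN_CONT marker was present */
	const char *body;	/* text after all markers */
};

/* Strips every leading marker, as the loop in vprintk_emit() does. */
static struct sk_parsed sk_strip_prefix(const char *text)
{
	struct sk_parsed p = { -1, 0, text };
	int c;

	while ((c = sk_get_level(p.body)) != 0) {
		if (c >= '0' && c <= '7' && p.level < 0)
			p.level = c - '0';	/* first explicit level wins */
		else if (c == 'c')
			p.is_cont = 1;
		p.body += 2;			/* each marker is SOH + one char */
	}
	return p;
}
```

A message such as KERN_INFO KERN_CONT "done" therefore yields level 6, the continuation flag, and the body "done", so the level is still available if the continuation cannot be attached to its predecessor.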
		int kern_level;

2016-10-09 06:32:40 +03:00
while ((kern_level = printk_get_level(text)) != 0) {
2012-07-31 01:40:19 +04:00
switch (kern_level) {
case '0' ... '7':
2014-12-11 02:50:15 +03:00
if (level == LOGLEVEL_DEFAULT)
2012-07-31 01:40:19 +04:00
level = kern_level - '0';
2014-12-11 02:50:15 +03:00
/* fallthrough */
2012-07-31 01:40:19 +04:00
case 'd': /* KERN_DEFAULT */
lflags |= LOG_PREFIX;
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
474925277671 ("printk: add KERN_CONT annotation").
That continuation marker was initially an empty string, and didn't
actually make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Overall, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one record, the continuation lines need to be
attached to the previous record properly.
However, the explicit continuation record marker that is actually useful
for this exact case was removed at around the same time by commit
61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will need to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at least mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09 06:32:40 +03:00
break;
case 'c': /* KERN_CONT */
lflags |= LOG_CONT;
2012-07-31 01:40:19 +04:00
}
2016-10-09 06:32:40 +03:00
text_len -= 2;
text += 2;
2009-06-16 21:57:02 +04:00
}
}
2014-12-11 02:50:15 +03:00
if (level == LOGLEVEL_DEFAULT)
2012-05-14 22:46:27 +04:00
level = default_message_loglevel;
2011-03-13 05:19:51 +03:00
2012-07-09 23:15:42 +04:00
if (dict)
lflags |= LOG_PREFIX | LOG_NEWLINE;
2008-05-12 23:21:04 +04:00
2017-07-11 09:40:55 +03:00
printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
2005-04-17 02:20:36 +04:00
2016-12-27 17:16:11 +03:00
logbuf_unlock_irqrestore(flags);
2014-06-05 03:11:37 +04:00
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that are
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There are a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
/* If called from the scheduler, we can not call up(). */
2014-07-03 02:22:38 +04:00
if (!in_sched) {
printk: Never set console_may_schedule in console_trylock()
This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].
Another issue is that preemption under console_sem also
messes up Steven Rostedt's hand off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. A preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.
Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.
[1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-16 07:47:16 +03:00
/*
* Disable preemption to avoid being preempted while holding
* console_sem which would prevent anyone from printing to
* console
*/
preempt_disable();
2014-07-03 02:22:38 +04:00
/*
* Try to acquire and then immediately release the console
* semaphore. The release will print out buffers and wake up
* /dev/kmsg and syslog() users.
*/
2018-01-12 19:08:37 +03:00
if (console_trylock_spinning())
2014-07-03 02:22:38 +04:00
console_unlock();
2018-01-16 07:47:16 +03:00
preempt_enable();
2014-07-03 02:22:38 +04:00
}
2006-06-25 16:47:40 +04:00
2005-04-17 02:20:36 +04:00
return printed_len;
}
2012-05-03 04:29:13 +04:00
EXPORT_SYMBOL(vprintk_emit);
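The marker loop in vprintk_emit() accepts several level markers in a row, so a message may carry both KERN_CONT and a real level such as KERN_INFO. The following is a minimal standalone sketch of that parsing, assuming the kernel's encoding of a marker as the SOH byte (\001) followed by one character; the sketch_* names and the flag values are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed local stand-ins for the kernel definitions (illustrative values). */
#define SKETCH_SOH '\001'
#define SKETCH_LOGLEVEL_DEFAULT (-1)
#define SKETCH_LOG_PREFIX 4u
#define SKETCH_LOG_CONT 8u

/* Return the character after a SOH byte if it is a known marker, else 0. */
static int sketch_get_level(const char *text)
{
	if (text[0] == SKETCH_SOH) {
		switch (text[1]) {
		case '0': case '1': case '2': case '3':
		case '4': case '5': case '6': case '7':
		case 'd': /* KERN_DEFAULT */
		case 'c': /* KERN_CONT */
			return text[1];
		}
	}
	return 0;
}

/*
 * Strip every leading marker, in the spirit of the loop in vprintk_emit().
 * Returns the start of the message body and updates *len, *level, *flags.
 */
static const char *sketch_strip_markers(const char *text, size_t *len,
					int *level, unsigned *flags)
{
	int kern_level;

	while ((kern_level = sketch_get_level(text)) != 0) {
		if (kern_level >= '0' && kern_level <= '7') {
			/* The first explicit level wins, like the LOGLEVEL_DEFAULT check. */
			if (*level == SKETCH_LOGLEVEL_DEFAULT)
				*level = kern_level - '0';
			*flags |= SKETCH_LOG_PREFIX;
		} else if (kern_level == 'd') {
			*flags |= SKETCH_LOG_PREFIX;
		} else {
			/* 'c': continuation of the previous record */
			*flags |= SKETCH_LOG_CONT;
		}
		*len -= 2;
		text += 2;
	}
	return text;
}
```

Fed the string "\001" "c" "\001" "6" "continued info", the sketch reports level 6 with both the continuation and prefix flags set, and returns a pointer to "continued info". A string with no markers comes back unchanged with the level still at the default.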
asmlinkage int vprintk(const char *fmt, va_list args)
{
2016-12-27 17:16:04 +03:00
return vprintk_func(fmt, args);
2012-05-03 04:29:13 +04:00
}
2005-04-17 02:20:36 +04:00
EXPORT_SYMBOL(vprintk);
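The in_sched branch in vprintk_emit() skips only the console flush. Per the printk_sched commit message, the message still goes into the shared log buffer and the console_trylock()/console_unlock() step is left to queued work. Below is a single-threaded model of that idea with hypothetical names (sketch_log, sketch_printk, sketch_tick) and no real locking; it is an illustration, not the kernel mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one shared log buffer; not kernel code. */
struct sketch_log {
	char buf[256];
	size_t stored;      /* bytes appended to the shared buffer */
	size_t printed;     /* bytes already pushed to the "console" */
	bool flush_pending; /* set instead of flushing when called from the scheduler */
};

/* Stands in for console_trylock() + console_unlock(). */
static void sketch_flush(struct sketch_log *lb)
{
	lb->printed = lb->stored;
	lb->flush_pending = false;
}

static void sketch_printk(struct sketch_log *lb, const char *msg, bool in_sched)
{
	size_t n = strlen(msg);

	/* All callers append to the same buffer, so nothing gets overwritten. */
	if (n > sizeof(lb->buf) - lb->stored)
		n = sizeof(lb->buf) - lb->stored;
	memcpy(lb->buf + lb->stored, msg, n);
	lb->stored += n;

	if (in_sched)
		lb->flush_pending = true; /* defer: later work does the flush */
	else
		sketch_flush(lb);
}

/* Stands in for the queued work that runs later, outside scheduler locks. */
static void sketch_tick(struct sketch_log *lb)
{
	if (lb->flush_pending)
		sketch_flush(lb);
}
```

Two deferred calls before a tick both keep their text in the buffer, which addresses the overwrite problem listed as issue 1 in the commit message.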
2012-05-03 04:29:13 +04:00
asmlinkage int printk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, ...)
{
va_list args;
int r;
va_start(args, fmt);
r = vprintk_emit(facility, level, dict, dictlen, fmt, args);
va_end(args);
return r;
}
EXPORT_SYMBOL(printk_emit);
2016-08-09 20:48:18 +03:00
int vprintk_default(const char *fmt, va_list args)
2014-06-20 01:33:31 +04:00
{
int r;
#ifdef CONFIG_KGDB_KDB
kdb: call vkdb_printf() from vprintk_default() only when wanted
kdb_trap_printk allows to pass normal printk() messages to kdb via
vkdb_printf(). For example, it is used to get a backtrace using the
classic show_stack(), see kdb_show_stack().
vkdb_printf() tries to avoid a potential infinite loop by disabling the
trap. But this approach is racy, for example:
CPU1 CPU2
vkdb_printf()
// assume that kdb_trap_printk == 0
saved_trap_printk = kdb_trap_printk;
kdb_trap_printk = 0;
kdb_show_stack()
kdb_trap_printk++;
Problem1: Now, a nested printk() on CPU0 calls vkdb_printf()
even when it should have been disabled. It will not
cause a deadlock but...
// using the outdated saved value: 0
kdb_trap_printk = saved_trap_printk;
kdb_trap_printk--;
Problem2: Now, kdb_trap_printk == -1 and will stay like this.
It means that all messages will get passed to kdb from
now on.
This patch removes the racy saved_trap_printk handling. Instead, the
recursion is prevented by a check for the locked CPU.
The solution is still kind of racy. A non-related printk(), from
another process, might get trapped by vkdb_printf(). And the wanted
printk() might not get trapped because kdb_printf_cpu is assigned. But
this problem existed even with the original code.
A proper solution would be to get_cpu() before setting kdb_trap_printk
and trap messages only from this CPU. I am not sure if it is worth the
effort, though.
In fact, the race is very theoretical. When kdb is running any of the
commands that use kdb_trap_printk there is a single active CPU and the
other CPUs should be in a holding pen inside kgdb_cpu_enter().
The only time this is violated is when there is a timeout waiting for
the other CPUs to report to the holding pen.
Finally, note that the situation is a bit schizophrenic. vkdb_printf()
explicitly allows recursion but only from KDB code that calls
kdb_printf() directly. On the other hand, the generic printk()
recursion is not allowed because it might cause an infinite loop. This
is why we could not hide the decision inside vkdb_printf() easily.
Link: http://lkml.kernel.org/r/1480412276-16690-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15 02:05:58 +03:00
/* Allow to pass printk() to kdb but avoid a recursion. */
if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
2014-11-07 21:37:57 +03:00
r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
2014-06-20 01:33:31 +04:00
return r;
}
#endif
2016-08-09 20:48:18 +03:00
r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
2014-06-20 01:33:31 +04:00
return r;
}
EXPORT_SYMBOL_GPL(vprintk_default);
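The kdb commit message describes a race in which the trap counter ends up at -1. The sketch below replays that interleaving step by step, assuming the ordering shown there. replay_old_race() is a hypothetical helper that walks through the old save/restore scheme in one thread. should_trap() is also hypothetical; it restates the new condition used in vprintk_default().

```c
#include <assert.h>

/* Hypothetical stand-in for the shared kdb_trap_printk counter. */
static int sketch_trap_printk;

/* Replays the interleaving from the commit message in a single thread. */
static int replay_old_race(void)
{
	int saved_trap_printk;

	sketch_trap_printk = 0;                 /* assume the trap starts off */

	/* CPU1, vkdb_printf(): save the counter and disable the trap. */
	saved_trap_printk = sketch_trap_printk; /* saved value is 0 */
	sketch_trap_printk = 0;

	/* CPU2, kdb_show_stack(): enable the trap. */
	sketch_trap_printk++;                   /* counter is 1 */

	/* CPU1: restore the outdated saved value. */
	sketch_trap_printk = saved_trap_printk; /* counter is 0 */

	/* CPU2: disable the trap again. */
	sketch_trap_printk--;                   /* counter is -1 */

	return sketch_trap_printk;
}

/* The newer test: trap only while the flag is set and no CPU owns kdb printing. */
static int should_trap(int trap_printk, int printf_cpu)
{
	return trap_printk && printf_cpu < 0;
}
```

should_trap(-1, -1) still returns true. This is Problem2 from the message: a counter stuck at -1 keeps passing every message to kdb. The new check removes the save/restore that caused it.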
2012-05-03 04:29:13 +04:00
/**
* printk - print a kernel message
* @fmt: format string
*
* This is printk(). It can be called from any context. We want it to work.
*
* We try to grab the console_lock. If we succeed, it's easy - we log the
* output and call the console drivers. If we fail to get the semaphore, we
* place the output into the log buffer and return. The current holder of
* the console_sem will notice the new output in console_unlock(); and will
* send it to the consoles before releasing the lock.
*
* One effect of this deferred printing is that code which calls printk() and
* then changes console_loglevel may break. This is because console_loglevel
* is inspected when the actual printing occurs.
*
* See also:
* printf(3)
*
* See the vsnprintf() documentation for format string extensions over C99.
*/
2014-05-02 02:44:38 +04:00
asmlinkage __visible int printk(const char *fmt, ...)
2012-05-03 04:29:13 +04:00
{
va_list args;
int r;
va_start(args, fmt);
2016-08-09 20:48:18 +03:00
r = vprintk_func(fmt, args);
2012-05-03 04:29:13 +04:00
va_end(args);
return r;
}
EXPORT_SYMBOL(printk);
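The deferred printing described in the printk() comment, including the console_loglevel caveat, can be modelled without threads. The following is a minimal sketch that uses a plain bool in place of console_sem and hypothetical sketch_* names. A record's level is compared with the console level only when the lock holder drains the buffer, not when the record is stored.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_RECORDS 16

/* Hypothetical model of the log buffer and console state; not kernel code. */
struct sketch_console {
	int levels[SKETCH_MAX_RECORDS]; /* message levels stored in the log buffer */
	int nrecords;                   /* records stored so far */
	int next;                       /* first record not yet handed to the console */
	int shown;                      /* records the console actually displayed */
	bool locked;                    /* stands in for console_sem */
	int loglevel;                   /* stands in for console_loglevel */
};

/* Like console_unlock(): drain all pending records, checking levels only now. */
static void sketch_con_unlock(struct sketch_console *c)
{
	while (c->next < c->nrecords) {
		if (c->levels[c->next] < c->loglevel)
			c->shown++;
		c->next++;
	}
	c->locked = false;
}

/* Store the record, then print only if the "trylock" succeeds. */
static void sketch_con_printk(struct sketch_console *c, int level)
{
	if (c->nrecords < SKETCH_MAX_RECORDS)
		c->levels[c->nrecords++] = level;
	if (!c->locked) {
		c->locked = true;
		sketch_con_unlock(c);
	}
	/* Otherwise the current holder prints it in sketch_con_unlock(). */
}
```

In this model, suppose a level-4 message is stored while another caller holds the lock, and the caller then lowers loglevel from 5 to 3. The message is dropped when the holder drains the buffer. This is the breakage the comment warns about.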
2012-05-09 03:37:51 +04:00
2012-07-17 05:35:29 +04:00
#else /* CONFIG_PRINTK */
2005-05-01 19:59:02 +04:00
2012-07-17 05:35:29 +04:00
#define LOG_LINE_MAX 0
#define PREFIX_MAX 0
2014-08-07 03:09:08 +04:00
2012-07-17 05:35:29 +04:00
static u64 syslog_seq;
static u32 syslog_idx;
2012-07-17 05:35:30 +04:00
static u64 console_seq;
static u32 console_idx;
2012-07-17 05:35:29 +04:00
static u64 log_first_seq;
static u32 log_first_idx;
static u64 log_next_seq;
2015-06-26 01:01:30 +03:00
static char *log_text(const struct printk_log *msg) { return NULL; }
static char *log_dict(const struct printk_log *msg) { return NULL; }
2013-08-01 00:53:47 +04:00
static struct printk_log *log_from_idx(u32 idx) { return NULL; }
2012-05-09 03:37:51 +04:00
static u32 log_next(u32 idx) { return 0; }
2015-06-26 01:01:30 +03:00
static ssize_t msg_print_ext_header(char *buf, size_t size,
2016-10-25 21:27:31 +03:00
struct printk_log *msg,
u64 seq) { return 0; }
2015-06-26 01:01:30 +03:00
static ssize_t msg_print_ext_body(char *buf, size_t size,
char *dict, size_t dict_len,
char *text, size_t text_len) { return 0; }
2018-01-12 19:08:37 +03:00
static void console_lock_spinning_enable(void) { }
static int console_lock_spinning_disable_and_check(void) { return 0; }
2016-12-24 17:09:01 +03:00
static void call_console_drivers(const char *ext_text, size_t ext_len,
2015-06-26 01:01:30 +03:00
const char *text, size_t len) { }
2016-10-25 21:27:31 +03:00
static size_t msg_print_text(const struct printk_log *msg,
2012-07-09 23:15:42 +04:00
bool syslog, char *buf, size_t size) { return 0; }
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such console, register_console() resets the
console_seq, console_idx, console_prev
and sets an `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
static bool suppress_message_printing(int level) { return false; }
2005-05-01 19:59:02 +04:00
2012-05-09 03:37:51 +04:00
#endif /* CONFIG_PRINTK */
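The suppress_message_printing() commit moves the loglevel test in front of the formatting work, so dropped messages no longer pay for sprintf/memcpy. The following is a minimal sketch of that ordering. It assumes hypothetical sketch_* globals standing in for console_loglevel and ignore_loglevel, plus a counter that records how often the formatting step runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for console_loglevel / ignore_loglevel. */
static int sketch_console_loglevel = 4;
static bool sketch_ignore_loglevel;
static int sketch_formatted; /* counts the (expensive) formatting calls */

/* Same test as the commit: drop if level >= console_loglevel, unless ignored. */
static bool sketch_suppress(int level)
{
	return level >= sketch_console_loglevel && !sketch_ignore_loglevel;
}

static int sketch_format(char *buf, size_t size, int level, const char *text)
{
	sketch_formatted++;
	return snprintf(buf, size, "<%d>%s", level, text);
}

/* Replay records: skip suppressed ones before any formatting happens. */
static int sketch_replay(const int *levels, const char *const *texts, int n)
{
	char buf[128];
	int emitted = 0;

	for (int i = 0; i < n; i++) {
		if (sketch_suppress(levels[i]))
			continue; /* dropped without paying for formatting */
		sketch_format(buf, sizeof(buf), levels[i], texts[i]);
		emitted++;    /* a real console->write() would go here */
	}
	return emitted;
}
```

With levels {3, 6, 7, 1} and a console level of 4, only two records are formatted. Setting the ignore flag formats all four. The old ordering would have formatted all four in both cases.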
2005-05-01 19:59:02 +04:00
2013-04-30 03:17:18 +04:00
#ifdef CONFIG_EARLY_PRINTK
struct console *early_console;
2014-05-02 02:44:38 +04:00
asmlinkage __visible void early_printk(const char *fmt, ...)
2013-04-30 03:17:18 +04:00
{
va_list ap;
2014-12-11 02:45:53 +03:00
char buf[512];
int n;
if (!early_console)
return;
2013-04-30 03:17:18 +04:00
va_start(ap, fmt);
2014-12-11 02:45:53 +03:00
n = vscnprintf(buf, sizeof(buf), fmt, ap);
2013-04-30 03:17:18 +04:00
va_end(ap);
2014-12-11 02:45:53 +03:00
early_console->write(early_console, buf, n);
2013-04-30 03:17:18 +04:00
}
#endif
2008-04-30 11:54:51 +04:00
static int __add_preferred_console(char *name, int idx, char *options,
char *brl_options)
{
struct console_cmdline *c;
int i;
/*
* See if this tty is not yet registered, and
* if we have a slot free.
*/
Revert "printk: fix double printing with earlycon"
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e.
The commit caused a regression for users that define both console=ttyS1
and console=ttyS0 on the command line, see
https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain
The kernel log messages always appeared only on one serial port. It is
even documented in Documentation/admin-guide/serial-console.rst:
"Note that you can only define one console per device type (serial,
video)."
The above mentioned commit changed the order in which the command line
parameters are searched. As a result, the kernel log messages go to
the last mentioned ttyS* instead of the first one.
We long thought that using two console=ttyS* on the command line
did not make sense. But then we realized that console= parameters
were handled also by systemd, see
http://0pointer.de/blog/projects/serial-console.html
"By default systemd will instantiate one serial-getty@.service on
the main kernel console, if it is not a virtual terminal."
where
"[4] If multiple kernel consoles are used simultaneously, the main
console is the one listed first in /sys/class/tty/console/active,
which is the last one listed on the kernel command line."
This puts the original report in another light. The system is running
in qemu. The first serial port is used to store the messages into a file.
The second one is used to login to the system via a socket. It depends
on systemd and the historic kernel behavior.
In other words, systemd makes it sensible to define both
console=ttyS1 console=ttyS0 on the command line. The kernel fix
caused a regression related to userspace (systemd) and needs to be
reverted.
In addition, it turned out that the fix helped only partially.
The messages still were duplicated when the boot console was
removed early by late_initcall(printk_late_init). Then the entire
log was replayed when the same console was registered as a normal one.
Link: 20170606160339.GC7604@pathway.suse.cz
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-06-08 13:01:30 +03:00
	for (i = 0, c = console_cmdline;
	     i < MAX_CMDLINECONSOLES && c->name[0];
	     i++, c++) {
2013-08-01 00:53:46 +04:00
		if (strcmp(c->name, name) == 0 && c->index == idx) {
Revert "printk: fix double printing with earlycon"
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e.
2017-06-08 13:01:30 +03:00
			if (!brl_options)
				preferred_console = i;
2013-08-01 00:53:46 +04:00
			return 0;
2008-04-30 11:54:51 +04:00
		}
2013-08-01 00:53:46 +04:00
	}
2008-04-30 11:54:51 +04:00
	if (i == MAX_CMDLINECONSOLES)
		return -E2BIG;
	if (!brl_options)
2017-03-15 13:28:51 +03:00
		preferred_console = i;
2008-04-30 11:54:51 +04:00
	strlcpy(c->name, name, sizeof(c->name));
	c->options = options;
2013-08-01 00:53:45 +04:00
	braille_set_options(c, brl_options);
2008-04-30 11:54:51 +04:00
	c->index = idx;
	return 0;
}
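The lookup-then-append behaviour of __add_preferred_console() can be exercised outside the kernel with a small model. This is a sketch under stated assumptions, not kernel code. struct cmdline_console, the table size of 8 and the helper add_cmdline_console() are hypothetical stand-ins. strlcpy() is replaced by snprintf(), and the braille handling is left out. The model shows three things: repeating a name/index pair reuses the existing slot, the console named most recently becomes preferred, and a full table yields -E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDLINECONSOLES 8	/* assumed table size for this sketch */

/* Simplified, hypothetical stand-in for struct console_cmdline. */
struct cmdline_console {
	char name[16];		/* driver name without index, e.g. "ttyS" */
	int index;		/* device index, e.g. 0 for ttyS0 */
	const char *options;	/* e.g. "115200n8", may be NULL */
};

static struct cmdline_console table[MAX_CMDLINECONSOLES];
static int preferred = -1;

/*
 * Hypothetical helper mirroring __add_preferred_console(): reuse a
 * matching entry or append a new one. Either way the entry becomes
 * the preferred console.
 */
static int add_cmdline_console(const char *name, int idx, const char *options)
{
	struct cmdline_console *c;
	int i;

	assert(name != NULL);

	for (i = 0, c = table; i < MAX_CMDLINECONSOLES && c->name[0]; i++, c++) {
		if (strcmp(c->name, name) == 0 && c->index == idx) {
			preferred = i;	/* already listed: just re-prefer it */
			return 0;
		}
	}
	if (i == MAX_CMDLINECONSOLES)
		return -E2BIG;		/* no free slot left */
	preferred = i;
	snprintf(c->name, sizeof(c->name), "%s", name);
	c->options = options;
	c->index = idx;
	return 0;
}
```

For example, adding ttyS1 and then ttyS0 leaves slot 1 (ttyS0) preferred. This is the "last console= wins" behaviour that the systemd documentation quoted in the revert message relies on.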
printk: add console_msg_format command line option
0day and kernelCI automatically parse kernel logs, basically some sort
of grepping using pre-defined text patterns, in order to detect
and report regressions/errors. There are several sources they get the
kernel logs from:
a) dmesg or /proc/kmsg
This is the preferred way, because `dmesg --raw' (see the Note below)
and /proc/kmsg output contain facility and log level, which greatly
simplifies grepping for EMERG/ALERT/CRIT/ERR messages.
b) serial consoles
This option is harder to maintain, because serial console messages
don't contain facility and log level.
This patch introduces a `console_msg_format=' command line option,
to switch between different message formatting on serial consoles.
For the time being we have just two options: default and syslog.
The "default" option just keeps the existing format, while the
"syslog" option makes serial console messages appear in syslog
format [syslog() syscall], matching the `dmesg -S --raw' and
`cat /proc/kmsg' output formats:
- facility and log level
- time stamp (depends on printk_time/PRINTK_TIME)
- message
<%u>[time stamp] text\n
NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually
"dmesg -S --raw" that always prints messages in syslog format [per
Petr Mladek]. Running "dmesg --raw" may produce output in non-syslog
format sometimes. console_msg_format=syslog enables syslog format,
thus in documentation we mention "dmesg -S --raw", not "dmesg --raw".
Per Kevin Hilman:
: Right now we can get this info from a "dmesg --raw" after bootup,
: but it would be really nice in certain automation frameworks to
: have a kernel command-line option to enable printing of loglevels
: in default boot log.
:
: This is especially useful when ingesting kernel logs into advanced
: search/analytics frameworks (I'm playing with an ELK stack: Elastic
: Search, Logstash, Kibana).
:
: The other important reason for having this on the command line is that
: for testing linux-next (and other bleeding edge developer branches),
: it's common that we never make it to userspace, so can't even run
: "dmesg --raw" (or equivalent.) So we really want this on the primary
: boot (serial) console.
Per Fengguang Wu, 0day scripts should quickly benefit from that
feature, because they will be able to switch to more reliable
parsing, based on the messages' facility and log levels [1]:
`#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )'
instead of doing text pattern matching
`#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} |
grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern |
grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist`
[1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb
Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-12-21 08:41:49 +03:00
static int __init console_msg_format_setup(char *str)
{
	if (!strcmp(str, "syslog"))
		console_msg_format = MSG_FORMAT_SYSLOG;
	if (!strcmp(str, "default"))
		console_msg_format = MSG_FORMAT_DEFAULT;
	return 1;
}
__setup("console_msg_format=", console_msg_format_setup);
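With console_msg_format=syslog, console lines carry the `<%u>[time stamp] text` prefix described in the commit message. The sketch below assumes the conventional syslog priority encoding, in which the number is (facility << 3) | level. It shows how to produce such a prefix, and it implements the level test that the 0day pattern '^<[0123]>' performs. Both helper names are hypothetical, and this is not the kernel's own formatting code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<N>" where N = (facility << 3) | level,
 * the usual syslog priority encoding. Returns the snprintf() result.
 */
static int format_syslog_prefix(char *buf, size_t size,
				unsigned int facility, unsigned int level)
{
	return snprintf(buf, size, "<%u>", (facility << 3) | level);
}

/*
 * Hypothetical helper: the same test as grep -E '^<[0123]>', i.e. a
 * facility-0 (kernel) message at levels 0-3 (EMERG, ALERT, CRIT, ERR).
 * Short-circuit evaluation stops before reading past a '\0'.
 */
static int kern_err_or_worse(const char *line)
{
	return line[0] == '<' && line[1] >= '0' && line[1] <= '3' &&
	       line[2] == '>';
}
```

Because the pattern requires a single digit followed by '>', a non-kernel facility such as `<11>` (facility 1, level 3) is not matched. The helper keeps that behaviour.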
2006-03-24 14:18:19 +03:00
/*
2014-08-07 03:09:03 +04:00
 * Set up a console. Called via do_early_param() in init/main.c
 * for each "console=" parameter in the boot command line.
2006-03-24 14:18:19 +03:00
 */
static int __init console_setup(char *str)
{
2014-08-07 03:09:03 +04:00
	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
2008-04-30 11:54:51 +04:00
	char *s, *options, *brl_options = NULL;
2006-03-24 14:18:19 +03:00
	int idx;
2013-08-01 00:53:45 +04:00
	if (_braille_console_setup(&str, &brl_options))
		return 1;
2008-04-30 11:54:51 +04:00
2006-03-24 14:18:19 +03:00
	/*
	 * Decode str into name, index, options.
	 */
	if (str[0] >= '0' && str[0] <= '9') {
2007-07-16 10:37:27 +04:00
		strcpy(buf, "ttyS");
		strncpy(buf + 4, str, sizeof(buf) - 5);
2006-03-24 14:18:19 +03:00
	} else {
2007-07-16 10:37:27 +04:00
		strncpy(buf, str, sizeof(buf) - 1);
2006-03-24 14:18:19 +03:00
	}
2007-07-16 10:37:27 +04:00
	buf[sizeof(buf) - 1] = 0;
2014-08-07 03:09:08 +04:00
	options = strchr(str, ',');
	if (options)
2006-03-24 14:18:19 +03:00
		*(options++) = 0;
#ifdef __sparc__
	if (!strcmp(str, "ttya"))
2007-07-16 10:37:27 +04:00
		strcpy(buf, "ttyS0");
2006-03-24 14:18:19 +03:00
	if (!strcmp(str, "ttyb"))
2007-07-16 10:37:27 +04:00
		strcpy(buf, "ttyS1");
2006-03-24 14:18:19 +03:00
#endif
2007-07-16 10:37:27 +04:00
	for (s = buf; *s; s++)
2014-08-07 03:09:08 +04:00
		if (isdigit(*s) || *s == ',')
2006-03-24 14:18:19 +03:00
			break;
	idx = simple_strtoul(s, NULL, 10);
	*s = 0;
2008-04-30 11:54:51 +04:00
	__add_preferred_console(buf, idx, options, brl_options);
xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long as tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc117dff26058fbac316b8acfc2cb5f7
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 02:31:07 +04:00
	console_set_on_cmdline = 1;
2006-03-24 14:18:19 +03:00
	return 1;
}
__setup("console=", console_setup);
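You can check in user space how console_setup() splits a console= argument into a driver name, an index and an options string. The sketch copies the decoding steps above under these assumptions:

- The hypothetical struct console_arg and parse_console_arg() replace the kernel's local buffer.
- The name field is sized as if console_cmdline[0].name were 16 bytes.
- strtoul() stands in for simple_strtoul().
- The sparc ttya/ttyb aliases and braille options are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; 16 + 4 mirrors buf[] in console_setup(),
 * assuming console_cmdline[0].name is 16 bytes. */
struct console_arg {
	char name[16 + 4];
	int index;
	char *options;		/* points into the caller's string, or NULL */
};

/*
 * Decode "name[index][,options]" the way console_setup() does.
 * Modifies str in place: the ',' before the options becomes '\0'.
 */
static void parse_console_arg(char *str, struct console_arg *out)
{
	char *buf = out->name;
	size_t size = sizeof(out->name);
	char *s;

	assert(str != NULL && out != NULL);

	/* A bare number such as "0" is shorthand for ttyS0. */
	if (str[0] >= '0' && str[0] <= '9') {
		strcpy(buf, "ttyS");
		strncpy(buf + 4, str, size - 5);
	} else {
		strncpy(buf, str, size - 1);
	}
	buf[size - 1] = '\0';

	/* Split off the options after the first ','. */
	out->options = strchr(str, ',');
	if (out->options)
		*(out->options++) = '\0';

	/* The index starts at the first digit (or ',') in the copy. */
	for (s = buf; *s; s++)
		if (isdigit((unsigned char)*s) || *s == ',')
			break;
	out->index = (int)strtoul(s, NULL, 10);
	*s = '\0';
}
```

As in the original, the copy into the name buffer happens before the ',' is cut off. This is why the scan also stops at ','. An argument with options but no index, such as the illustrative "lp,foo", yields the name "lp" with index 0.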
2005-05-17 08:53:47 +04:00
/**
 * add_preferred_console - add a device to the list of preferred consoles.
2005-11-14 03:08:14 +03:00
 * @name: device name
 * @idx: device index
 * @options: options for this console
2005-05-17 08:53:47 +04:00
 *
 * The last preferred console added will be used for kernel messages
 * and stdin/out/err for init. Normally this is used by console_setup
 * above to handle user-supplied console arguments; however it can also
 * be used by arch-specific code either to override the user or more
 * commonly to provide a default console (i.e. from PROM variables) when
 * the user has not supplied one.
 */
2007-12-29 12:19:49 +03:00
int add_preferred_console(char *name, int idx, char *options)
2005-05-17 08:53:47 +04:00
{
2008-04-30 11:54:51 +04:00
	return __add_preferred_console(name, idx, options, NULL);
2005-05-17 08:53:47 +04:00
}
2014-08-07 03:09:12 +04:00
bool console_suspend_enabled = true;
2007-10-18 14:04:50 +04:00
EXPORT_SYMBOL(console_suspend_enabled);
static int __init console_suspend_disable(char *str)
{
2014-08-07 03:09:12 +04:00
	console_suspend_enabled = false;
2007-10-18 14:04:50 +04:00
	return 1;
}
__setup("no_console_suspend", console_suspend_disable);
2011-11-01 04:11:27 +04:00
module_param_named(console_suspend, console_suspend_enabled,
		bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(console_suspend, "suspend console during suspend"
	" and hibernate operations");
2007-10-18 14:04:50 +04:00
2006-06-20 05:16:01 +04:00
/**
 * suspend_console - suspend the console subsystem
 *
 * This disables printk() while we go into suspend states
 */
void suspend_console(void)
{
2007-10-18 14:04:50 +04:00
	if (!console_suspend_enabled)
		return;
2008-07-24 08:28:32 +04:00
	printk("Suspending console(s) (use no_console_suspend to debug)\n");
2011-01-26 02:07:35 +03:00
	console_lock();
2006-06-20 05:16:01 +04:00
	console_suspended = 1;
2014-06-05 03:11:36 +04:00
	up_console_sem();
2006-06-20 05:16:01 +04:00
}
void resume_console(void)
{
2007-10-18 14:04:50 +04:00
	if (!console_suspend_enabled)
		return;
2014-06-05 03:11:36 +04:00
	down_console_sem();
2006-06-20 05:16:01 +04:00
	console_suspended = 0;
2011-01-26 02:07:35 +03:00
	console_unlock();
2006-06-20 05:16:01 +04:00
}
2010-06-04 09:11:25 +04:00
/**
 * console_cpu_notify - print deferred console messages after CPU hotplug
2016-11-03 17:49:58 +03:00
 * @cpu: unused
2010-06-04 09:11:25 +04:00
 *
 * If printk() is called from a CPU that is not online yet, the messages
2017-01-21 13:47:29 +03:00
 * will be printed on the console only if there are CON_ANYTIME consoles.
 * This function is called when a new CPU comes online (or fails to come
 * up) or goes offline.
2010-06-04 09:11:25 +04:00
 */
2016-11-03 17:49:58 +03:00
static int console_cpu_notify(unsigned int cpu)
{
2016-11-17 19:31:55 +03:00
	if (!cpuhp_tasks_frozen) {
2017-01-21 13:47:29 +03:00
		/* If trylock fails, someone else is doing the printing */
		if (console_trylock())
			console_unlock();
2010-06-04 09:11:25 +04:00
	}
2016-11-03 17:49:58 +03:00
	return 0;
2010-06-04 09:11:25 +04:00
}
2005-04-17 02:20:36 +04:00
/**
2011-01-26 02:07:35 +03:00
 * console_lock - lock the console system for exclusive use.
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * Acquires a lock which guarantees that the caller has
2005-04-17 02:20:36 +04:00
 * exclusive access to the console system and the console_drivers list.
 *
 * Can sleep, returns nothing.
 */
2011-01-26 02:07:35 +03:00
void console_lock(void)
2005-04-17 02:20:36 +04:00
{
2012-09-18 03:03:31 +04:00
	might_sleep();
2014-06-05 03:11:36 +04:00
	down_console_sem();
2009-02-14 04:07:24 +03:00
	if (console_suspended)
		return;
2005-04-17 02:20:36 +04:00
	console_locked = 1;
	console_may_schedule = 1;
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL(console_lock);
2005-04-17 02:20:36 +04:00
2011-01-26 02:07:35 +03:00
/**
 * console_trylock - try to lock the console system for exclusive use.
 *
2014-08-07 03:09:03 +04:00
 * Try to acquire a lock which guarantees that the caller has exclusive
 * access to the console system and the console_drivers list.
2011-01-26 02:07:35 +03:00
 *
 * Returns 1 on success, and 0 on failure to acquire the lock.
 */
int console_trylock(void)
2005-04-17 02:20:36 +04:00
{
2014-06-05 03:11:36 +04:00
	if (down_trylock_console_sem())
2011-01-26 02:07:35 +03:00
		return 0;
2009-02-14 04:07:24 +03:00
	if (console_suspended) {
2014-06-05 03:11:36 +04:00
		up_console_sem();
2011-01-26 02:07:35 +03:00
		return 0;
2009-02-14 04:07:24 +03:00
	}
2005-04-17 02:20:36 +04:00
	console_locked = 1;
printk: Never set console_may_schedule in console_trylock()
This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].
Another issue is that preemption under console_sem also
messes up Steven Rostedt's hand-off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand-off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. A preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.
Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.
[1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-16 07:47:16 +03:00
	console_may_schedule = 0;
2011-01-26 02:07:35 +03:00
	return 1;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL(console_trylock);
2005-04-17 02:20:36 +04:00
int is_console_locked(void)
{
	return console_locked;
}
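The interplay of console_sem, console_suspended, console_locked and console_may_schedule across console_lock(), console_trylock(), suspend_console() and resume_console() can be followed in a single-threaded model. This is a sketch only, with these simplifications:

- The semaphore is a plain counter. model_down() asserts rather than sleeping, because sleeping in one thread would deadlock.
- All model_* names are hypothetical.
- console_unlock() is reduced to its early-exit and release steps, with no message flushing.
- The console_suspend_enabled checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Single-threaded model of console_sem and the flags guarded by it.
 * The semaphore is a counter; a real down() would sleep while the
 * count is 0, which would deadlock a single thread, so we assert.
 */
static int sem_count = 1;
static bool console_suspended;
static int console_locked;
static int console_may_schedule;

static void model_down(void)
{
	assert(sem_count == 1);
	sem_count = 0;
}

/* Like down_trylock(): returns 0 on success, nonzero on failure. */
static int model_down_trylock(void)
{
	if (sem_count == 0)
		return 1;
	sem_count = 0;
	return 0;
}

static void model_up(void)
{
	sem_count = 1;
}

static void model_console_lock(void)
{
	model_down();
	if (console_suspended)
		return;			/* semaphore held, but not "locked" */
	console_locked = 1;
	console_may_schedule = 1;	/* sleepable caller */
}

static int model_console_trylock(void)
{
	if (model_down_trylock())
		return 0;		/* someone else holds it */
	if (console_suspended) {
		model_up();
		return 0;
	}
	console_locked = 1;
	console_may_schedule = 0;	/* caller may be atomic */
	return 1;
}

/* Lock bookkeeping only; the real console_unlock() also flushes output. */
static void model_console_unlock(void)
{
	if (console_suspended) {
		model_up();
		return;
	}
	console_locked = 0;
	model_up();
}

static void model_suspend_console(void)
{
	model_console_lock();
	console_suspended = true;
	model_up();
}

static void model_resume_console(void)
{
	model_down();
	console_suspended = false;
	model_console_unlock();
}
```

The model makes two details of the code above visible. First, a trylock never leaves console_may_schedule set. Second, while the console is suspended, console_trylock() reports failure even though the semaphore itself was free.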
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() allows calling cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- the can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if a console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from a !online cpu
by setting the CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out
if there is a CON_ANYTIME console; otherwise console_unlock() must be
avoided.
can_use_console() ensures that the console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online                     CPU1 !online
                                console_trylock()
                                ...
                                console_unlock()
                                 console_cont_flush
                                  spin_lock logbuf_lock
                                  if (!cont.len) {
                                   spin_unlock logbuf_lock
                                   return
                                  }
                                 for (;;) {
vprintk_emit
 spin_lock logbuf_lock
  log_store
 spin_unlock logbuf_lock
                                  spin_lock logbuf_lock
 !console_trylock_for_printk      msg_print_text
  return                          console_idx = log_next()
                                  console_seq++
                                  console_prev = msg->flags
                                  spin_unlock logbuf_lock
                                  call_console_drivers()
                                   for_each_console(con) {
                                    if (!cpu_online() &&
                                        !(con->flags & CON_ANYTIME))
                                     continue;
                                   }
                                  /*
                                   * no message printed, we lost it
                                   */
vprintk_emit
 spin_lock logbuf_lock
  log_store
 spin_unlock logbuf_lock
 !console_trylock_for_printk
  return
                                  /*
                                   * go to the beginning of the loop,
                                   * find out there are new messages,
                                   * lose it
                                   */
                                 }
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from an offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
/*
 * Check if we have any console that is capable of printing while cpu is
 * booting or shutting down. Requires console_sem.
 */
static int have_callable_console(void)
{
	struct console *con;
	for_each_console(con)
2016-03-18 00:21:27 +03:00
		if ((con->flags & CON_ENABLED) &&
				(con->flags & CON_ANYTIME))
printk: move can_use_console() out of console_trylock_for_printk()
2016-03-18 00:21:20 +03:00
			return 1;
	return 0;
}
/*
 * Can we actually use the console at this time on this cpu?
 *
 * Console drivers may assume that per-cpu resources have been allocated. So
 * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
 * call them until this CPU is officially up.
 */
static inline int can_use_console(void)
{
	return cpu_online(raw_smp_processor_id()) || have_callable_console();
}
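The decision made by can_use_console() reduces to one boolean expression over the registered consoles. The sketch below restates it as plain C, with the CPU's online state and the console list passed in explicitly rather than read from kernel state. struct fake_console is hypothetical, and the CON_ENABLED/CON_ANYTIME values are assumptions chosen to resemble include/linux/console.h. As in have_callable_console(), a console that is CON_ANYTIME but not CON_ENABLED does not count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CON_ENABLED	(1 << 2)	/* assumed flag values for this sketch */
#define CON_ANYTIME	(1 << 4)

/* Hypothetical minimal console list node. */
struct fake_console {
	const char *name;
	unsigned short flags;
	struct fake_console *next;
};

/* True if some console is both enabled and safe on a !online CPU. */
static bool have_callable_console(const struct fake_console *list)
{
	for (const struct fake_console *con = list; con; con = con->next)
		if ((con->flags & CON_ENABLED) && (con->flags & CON_ANYTIME))
			return true;
	return false;
}

/* Online CPUs may always print; offline ones need a CON_ANYTIME console. */
static bool can_use_console(bool cpu_online, const struct fake_console *list)
{
	return cpu_online || have_callable_console(list);
}
```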
2005-04-17 02:20:36 +04:00
/**
2011-01-26 02:07:35 +03:00
 * console_unlock - unlock the console system
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * Releases the console_lock which the caller holds on the console system
2005-04-17 02:20:36 +04:00
 * and the console driver list.
 *
2011-01-26 02:07:35 +03:00
 * While the console_lock was held, console output may have been buffered
 * by printk(). If this is the case, console_unlock() emits
 * the output prior to releasing the lock.
2005-04-17 02:20:36 +04:00
 *
2012-05-09 03:37:51 +04:00
 * If there is output waiting, we wake /dev/kmsg and syslog() users.
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * console_unlock() may be called from any context.
2005-04-17 02:20:36 +04:00
 */
2011-01-26 02:07:35 +03:00
void console_unlock(void)
2005-04-17 02:20:36 +04:00
{
2015-06-26 01:01:30 +03:00
	static char ext_text[CONSOLE_EXT_LOG_MAX];
2012-07-17 05:35:29 +04:00
	static char text[LOG_LINE_MAX + PREFIX_MAX];
2012-05-03 04:29:13 +04:00
	static u64 seen_seq;
2005-04-17 02:20:36 +04:00
	unsigned long flags;
2012-05-03 04:29:13 +04:00
	bool wake_klogd = false;
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before it starts outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console device, for example a serial
console, the outputting task may occupy the cpu for a very long time,
long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile up more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 03:58:24 +03:00
	bool do_cond_resched, retry;
2005-04-17 02:20:36 +04:00
2006-06-20 05:16:01 +04:00
	if (console_suspended) {
2014-06-05 03:11:36 +04:00
		up_console_sem();
2006-06-20 05:16:01 +04:00
		return;
	}
2006-08-05 23:14:16 +04:00
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
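The fix described above (yield between lines when @console_may_schedule permits it) can be illustrated with a userspace sketch. It assumes POSIX sched_yield() as a stand-in for the kernel's cond_resched(). The driver `slow_console_write()`, the helper `flush_backlog()` and the two counters are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Counters standing in for observable effects (illustration only). */
static int lines_written;
static int yields;

/* Hypothetical slow console driver; a real one would drive a UART. */
static void slow_console_write(const char *line)
{
	(void)line;
	lines_written++;
}

/*
 * Emit @n pending lines. If @do_cond_resched is set (the caller entered
 * through a sleepable console_lock()-style path), give up the CPU after
 * every line so a long backlog cannot monopolise it.
 */
static void flush_backlog(const char *const *lines, int n, bool do_cond_resched)
{
	for (int i = 0; i < n; i++) {
		slow_console_write(lines[i]);
		if (do_cond_resched) {
			sched_yield();	/* userspace stand-in for cond_resched() */
			yields++;
		}
	}
}
```

The yield happens once per line rather than once per flush. A huge replay, for example on console registration, therefore still gives other tasks a chance to run between every line.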
2016-01-16 03:58:24 +03:00
/*
2017-03-24 19:14:05 +03:00
* Console drivers are called with interrupts disabled, so
printk: do cond_resched() between lines while outputting to consoles
2016-01-16 03:58:24 +03:00
* @console_may_schedule should be cleared before; however, we may
* end up dumping a lot of lines, for example, if called from
* console registration path, and should invoke cond_resched()
* between lines if allowable. Not doing so can cause a very long
* scheduling stall on a slow console leading to RCU stall and
* softlockup warnings which exacerbate the issue with more
* messages practically incapacitating the system.
2017-03-24 19:14:05 +03:00
*
* console_trylock() is not able to detect the preemptive
* context reliably. Therefore the value must be stored before
* and cleared after the "again" goto label.
printk: do cond_resched() between lines while outputting to consoles
2016-01-16 03:58:24 +03:00
*/
do_cond_resched = console_may_schedule;
2017-03-24 19:14:05 +03:00
again:
2006-08-05 23:14:16 +04:00
console_may_schedule = 0;
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() is allowed to cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
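The two rules above can be modelled in a few lines. The model also reflects the console_unlock() comment about storing the value before the "again" label: the permission is copied once and the shared flag is cleared on every pass. All `model_*` names are hypothetical, and this is a single-threaded sketch of the flag's lifecycle, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the printk.c flag, not the kernel variable. */
static bool model_may_schedule;

/* Rule 1: the sleeping lock always permits rescheduling. */
static void model_console_lock(void)
{
	model_may_schedule = true;
}

/* Rule 2 (as listed above): trylock always clears the permission. */
static void model_console_trylock(void)
{
	model_may_schedule = false;
}

/*
 * Prologue of a console_unlock()-like loop: the permission is copied
 * once, before the retry label, and the shared flag is cleared after
 * it. Returns how many of the @passes would be allowed to reschedule.
 */
static int model_console_unlock(int passes)
{
	bool do_cond_resched = model_may_schedule;
	int resched_passes = 0;

again:
	model_may_schedule = false;
	if (do_cond_resched)
		resched_passes++;
	if (--passes > 0)
		goto again;
	return resched_passes;
}
```

Because the copy is taken before `again:`, a retry pass keeps the caller's original permission even though the shared flag has already been cleared.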
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from !online cpu
by setting CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out if
there is a CON_ANYTIME console, otherwise console_unlock() must be
avoided.
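The can_use_console() decision described above can be sketched as follows. An online CPU may always print; an offline one needs at least one console flagged CON_ANYTIME. The `model_console` type, the `MODEL_CON_ANYTIME` value and both helper names are hypothetical stand-ins chosen for this sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag value chosen for this sketch, not taken from kernel headers. */
#define MODEL_CON_ANYTIME (1u << 0)

struct model_console {
	unsigned int flags;
	struct model_console *next;	/* singly linked list of registered consoles */
};

/* Is any registered console declared safe to call from an offline CPU? */
static bool model_have_anytime_console(const struct model_console *list)
{
	for (const struct model_console *con = list; con; con = con->next)
		if (con->flags & MODEL_CON_ANYTIME)
			return true;
	return false;
}

/* Online CPUs may always print; offline ones need a CON_ANYTIME console. */
static bool model_can_use_console(bool cpu_online, const struct model_console *list)
{
	return cpu_online || model_have_anytime_console(list);
}
```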
can_use_console() ensures that console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online                              CPU1 !online
                                         console_trylock()
                                         ...
                                         console_unlock()
                                           console_cont_flush
                                             spin_lock logbuf_lock
                                             if (!cont.len) {
                                               spin_unlock logbuf_lock
                                               return
                                             }
                                           for (;;) {
vprintk_emit
  spin_lock logbuf_lock
  log_store
  spin_unlock logbuf_lock
                                             spin_lock logbuf_lock
  !console_trylock_for_printk                msg_print_text
  return                                     console_idx = log_next()
                                             console_seq++
                                             console_prev = msg->flags
                                             spin_unlock logbuf_lock
                                             call_console_drivers()
                                               for_each_console(con) {
                                                 if (!cpu_online() &&
                                                     !(con->flags & CON_ANYTIME))
                                                   continue;
                                               }
                                             /*
                                              * no message printed, we lost it
                                              */
vprintk_emit
  spin_lock logbuf_lock
  log_store
  spin_unlock logbuf_lock
  !console_trylock_for_printk
  return
                                             /*
                                              * go to the beginning of the loop,
                                              * find out there are new messages,
                                              * lose it
                                              */
                                           }
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from the offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
/*
* We released the console_sem lock, so we need to recheck if
* cpu is online and (if not) is there at least one CON_ANYTIME
* console.
*/
if (!can_use_console()) {
console_locked = 0;
up_console_sem();
return;
}
2012-05-03 04:29:13 +04:00
for (;;) {
2013-08-01 00:53:47 +04:00
struct printk_log *msg;
2015-06-26 01:01:30 +03:00
size_t ext_len = 0;
2012-05-14 01:30:46 +04:00
size_t len;
2012-05-03 04:29:13 +04:00
2016-12-27 17:16:09 +03:00
printk_safe_enter_irqsave(flags);
raw_spin_lock(&logbuf_lock);
2012-05-03 04:29:13 +04:00
if (seen_seq != log_next_seq) {
wake_klogd = true;
seen_seq = log_next_seq;
}
if (console_seq < log_first_seq) {
2017-10-23 19:51:48 +03:00
len = sprintf(text, "** %u printk messages dropped **\n",
2014-06-05 03:11:45 +04:00
(unsigned)(log_first_seq - console_seq));
2012-05-03 04:29:13 +04:00
/* messages are gone, move to first one */
console_seq = log_first_seq;
console_idx = log_first_idx;
2014-06-05 03:11:45 +04:00
} else {
len = 0;
2012-05-03 04:29:13 +04:00
}
2012-06-28 11:38:53 +04:00
skip:
2012-05-03 04:29:13 +04:00
if (console_seq == log_next_seq)
break;
msg = log_from_idx(console_idx);
2016-12-24 17:09:01 +03:00
if (suppress_message_printing(msg->level)) {
2012-06-28 11:38:53 +04:00
/*
* Skip record we have buffered and already printed
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such console, register_console() resets the
console_seq, console_idx, console_prev
and sets an `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
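The reordering described above can be sketched in userspace. The loglevel test (`level >= console_loglevel && !ignore_loglevel`, as in the schematic) runs before any formatting, so suppressed records never reach snprintf(). The record type, the replay() helper and the console_loglevel value of 4 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int console_loglevel = 4;	/* illustrative value */
static bool ignore_loglevel;

/* Same condition as the deferred check in call_console_drivers(). */
static bool suppress_message_printing(int level)
{
	return level >= console_loglevel && !ignore_loglevel;
}

struct model_record {		/* hypothetical log record */
	int level;
	const char *text;
};

/* Replay records; returns how many were actually formatted. */
static int replay(const struct model_record *recs, int n)
{
	char text[128];
	int formatted = 0;

	for (int i = 0; i < n; i++) {
		if (suppress_message_printing(recs[i].level))
			continue;	/* skipped before any sprintf/memcpy work */
		snprintf(text, sizeof(text), "<%d>%s\n", recs[i].level, recs[i].text);
		formatted++;
		/* a real console_unlock() would hand text to the drivers here */
	}
	return formatted;
}
```

For a long CON_PRINTBUFFER replay dominated by debug-level records, the saving is every snprintf() call that would otherwise have produced a line only to be dropped.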
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
* directly to the console when we received it, and
* record that has level above the console loglevel.
2012-06-28 11:38:53 +04:00
*/
console_idx = log_next(console_idx);
console_seq++;
goto skip;
}
2012-05-10 06:30:45 +04:00
printk: add console_msg_format command line option
0day and kernelCI automatically parse kernel log - basically some sort
of grepping using the pre-defined text patterns - in order to detect
and report regressions/errors. There are several sources they get the
kernel logs from:
a) dmesg or /proc/kmsg
This is the preferred way. Because `dmesg --raw' (see later Note)
and /proc/kmsg output contains facility and log level, which greatly
simplifies grepping for EMERG/ALERT/CRIT/ERR messages.
b) serial consoles
This option is harder to maintain, because serial console messages
don't contain facility and log level.
This patch introduces a `console_msg_format=' command line option,
to switch between different message formatting on serial consoles.
For the time being we have just two options - default and syslog.
The "default" option just keeps the existing format, while the
"syslog" option makes serial console messages appear in syslog
format [syslog() syscall], matching the `dmesg -S --raw' and
`cat /proc/kmsg' output formats:
- facility and log level
- time stamp (depends on printk_time/PRINTK_TIME)
- message
<%u>[time stamp] text\n
NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually
"dmesg -S --raw" that always prints messages in syslog format [per
Petr Mladek]. Running "dmesg --raw" may produce output in non-syslog
format sometimes. console_msg_format=syslog enables syslog format,
thus in documentation we mention "dmesg -S --raw", not "dmesg --raw".
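The `<%u>[time stamp] text` layout above can be sketched as a small formatter. It assumes the number in angle brackets is the standard syslog priority, facility * 8 + level. It also assumes a `[%5lu.%06lu] ` timestamp in the style of printk's time prefix. format_syslog_line() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one record as "<PRI>[seconds.micros] text\n".
 * PRI = (facility << 3) | level, i.e. facility * 8 + level.
 * Returns snprintf()'s result (length that would have been written).
 */
static int format_syslog_line(char *buf, size_t size, unsigned int facility,
			      unsigned int level, uint64_t ts_nsec,
			      const char *text)
{
	unsigned long secs = (unsigned long)(ts_nsec / 1000000000u);
	unsigned long usecs = (unsigned long)(ts_nsec % 1000000000u / 1000u);

	return snprintf(buf, size, "<%u>[%5lu.%06lu] %s\n",
			(facility << 3) | level, secs, usecs, text);
}
```

A kernel (facility 0) KERN_ERR (level 3) record logged 1.5 s after boot would come out as `<3>[    1.500000] hello`.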
Per Kevin Hilman:
: Right now we can get this info from a "dmesg --raw" after bootup,
: but it would be really nice in certain automation frameworks to
: have a kernel command-line option to enable printing of loglevels
: in default boot log.
:
: This is especially useful when ingesting kernel logs into advanced
: search/analytics frameworks (I'm playing with an ELK stack: Elastic
: Search, Logstash, Kibana).
:
: The other important reason for having this on the command line is that
: for testing linux-next (and other bleeding edge developer branches),
: it's common that we never make it to userspace, so can't even run
: "dmesg --raw" (or equivalent.) So we really want this on the primary
: boot (serial) console.
Per Fengguang Wu, 0day scripts should quickly benefit from that
feature, because they will be able to switch to a more reliable
parsing, based on messages' facility and log levels [1]:
`#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )'
instead of doing text pattern matching
`#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} |
grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern |
grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist`
[1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb
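The first grep pattern above, `^<[0123]>`, selects lines whose syslog prefix is a single digit from 0 to 3. That means kernel-facility messages at KERN_EMERG through KERN_ERR. A C equivalent is sketched below. is_severe_kernel_line() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same test as the regular expression '^<[0123]>': the line must start
 * with '<', one digit in 0..3, then '>'. Short-circuit evaluation stops
 * before reading past the terminating NUL of a short string.
 */
static bool is_severe_kernel_line(const char *line)
{
	return line[0] == '<' &&
	       line[1] >= '0' && line[1] <= '3' &&
	       line[2] == '>';
}
```

Like the regular expression, this deliberately rejects multi-digit priorities such as `<14>`, which carry a non-kernel facility.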
Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-12-21 08:41:49 +03:00
len += msg_print_text(msg,
console_msg_format & MSG_FORMAT_SYSLOG,
text + len,
sizeof(text) - len);
2015-06-26 01:01:30 +03:00
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(ext_text,
sizeof(ext_text),
2016-10-25 21:27:31 +03:00
msg, console_seq);
2015-06-26 01:01:30 +03:00
ext_len += msg_print_ext_body(ext_text + ext_len,
sizeof(ext_text) - ext_len,
log_dict(msg), msg->dict_len,
log_text(msg), msg->text_len);
}
2012-05-03 04:29:13 +04:00
console_idx = log_next(console_idx);
console_seq++;
2009-07-25 19:50:36 +04:00
raw_spin_unlock(&logbuf_lock);
2012-05-03 04:29:13 +04:00
printk: Add console owner and waiter logic to load balance console writes
This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.
Here's the design again:
I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.
There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.
In printk() when it tries to write to the consoles, we have:
if (console_trylock())
console_unlock();
Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.
When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.
If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.
Then the waiter calls console_unlock() and continues to write to the
consoles.
If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
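The owner/waiter handoff described above can be followed as a state machine. The sketch below is a single-threaded model: calls are sequenced by hand, so there is no real spinning. In the kernel, both fields are protected by the raw spinlock console_owner_lock. The function names loosely echo console_lock_spinning_enable() and console_lock_spinning_disable_and_check(), but `model_task` and all bodies are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	const char *name;
};

static struct model_task *console_owner;	/* task calling the drivers */
static bool console_waiter;			/* someone is spinning for a handoff */

/* Printing task: mark itself as active owner before calling drivers. */
static void owner_enable(struct model_task *t)
{
	console_owner = t;
}

/*
 * printk() after a failed console_trylock(): become the waiter only if
 * another task is actively printing and nobody is waiting yet.
 */
static bool try_become_waiter(struct model_task *t)
{
	if (console_owner == NULL || console_owner == t || console_waiter)
		return false;
	console_waiter = true;	/* the real waiter now spins on this flag */
	return true;
}

/*
 * Owner, after finishing the current message: returns true if a waiter
 * existed. Clearing the flag releases the spinner, which inherits
 * console_sem without it ever being released.
 */
static bool owner_disable_and_check(void)
{
	bool had_waiter = console_waiter;

	console_owner = NULL;
	console_waiter = false;
	return had_waiter;
}
```

When owner_disable_and_check() returns true, the old owner leaves console_unlock() immediately. The waiter then continues printing as the new owner, so the printing work rotates between the tasks that keep calling printk().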
By Petr Mladek about possible new deadlocks:
The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."
We could look at it from this side. The possible deadlock would
look like:
CPU0                              CPU1
console_unlock()
  console_owner = current;
                                  spin_lockA()
                                    printk()
                                      spin = true;
                                      while (...)
  call_console_drivers()
    spin_lockA()
This would be a deadlock. CPU0 would wait for lock A,
while CPU1 would own lock A and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.
But if the above is true then the following scenario was
already possible before:
CPU0
spin_lockA()
printk()
console_unlock()
call_console_drivers()
spin_lockA()
In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.
By Steven Rostedt:
To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>
static bool stop_testing;
static unsigned int loops = 1;
static void preempt_printk_workfn(struct work_struct *work)
{
int i;
while (!READ_ONCE(stop_testing)) {
for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
preempt_disable();
pr_emerg("%5d%-75s\n", smp_processor_id(),
" XXX NOPREEMPT");
preempt_enable();
}
msleep(1);
}
}
static struct work_struct __percpu *works;
static void finish(void)
{
int cpu;
WRITE_ONCE(stop_testing, true);
for_each_online_cpu(cpu)
flush_work(per_cpu_ptr(works, cpu));
free_percpu(works);
}
static int __init test_init(void)
{
int cpu;
works = alloc_percpu(struct work_struct);
if (!works)
return -ENOMEM;
/*
* This is just a test module. This will break if you
* do any CPU hot plugging between loading and
* unloading the module.
*/
for_each_online_cpu(cpu) {
struct work_struct *work = per_cpu_ptr(works, cpu);
INIT_WORK(work, &preempt_printk_workfn);
schedule_work_on(cpu, work);
}
return 0;
}
static void __exit test_exit(void)
{
finish();
}
module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-01-10 16:24:17 +03:00
/*
* While actively printing out messages, if another printk()
* were to occur on another CPU, it may wait for this one to
* finish. This task can not be preempted if there is a
* waiter waiting to take over.
*/
2018-01-12 19:08:37 +03:00
console_lock_spinning_enable();
printk: Add console owner and waiter logic to load balance console writes
2018-01-10 16:24:17 +03:00
2008-05-12 23:20:42 +04:00
stop_critical_timings(); /* don't trace print latency */
2016-12-24 17:09:01 +03:00
call_console_drivers(ext_text, ext_len, text, len);
2008-05-12 23:20:42 +04:00
start_critical_timings();
printk: Add console owner and waiter logic to load balance console writes
2018-01-10 16:24:17 +03:00
2018-01-12 19:08:37 +03:00
if ( console_lock_spinning_disable_and_check ( ) ) {
printk_safe_exit_irqrestore ( flags ) ;
2018-02-26 17:44:20 +03:00
goto out ;
2018-01-12 19:08:37 +03:00
}
2018-01-10 16:24:17 +03:00
2016-12-27 17:16:09 +03:00
printk_safe_exit_irqrestore ( flags ) ;
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before it starts outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with slow console devices, for example a serial
console, the outputting task may occupy the CPU for a very long time,
long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile up more messages, sometimes enough to trigger the next cycle of
warnings, incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
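A user-space sketch of the fix: yield between lines only when the context allows it. POSIX sched_yield() from <sched.h> stands in for cond_resched(); unlike cond_resched(), it yields unconditionally rather than only when a reschedule is pending. slow_console_write() and flush_lines() are hypothetical names, and the logbuf_lock and irq handling that the kernel drops before yielding is omitted.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slow console driver. */
static int lines_written;

static void slow_console_write(const char *line)
{
	(void)line;
	lines_written++;
}

/*
 * Emit pending lines. If the lock was taken in a sleepable context
 * (may_schedule), give up the CPU between lines.
 */
static int flush_lines(const char *const *lines, int n, bool may_schedule)
{
	int yields = 0;

	for (int i = 0; i < n; i++) {
		slow_console_write(lines[i]);
		if (may_schedule) {
			sched_yield();	/* user-space analogue of cond_resched() */
			yields++;
		}
	}
	return yields;
}
```

With may_schedule false, as after a console_trylock() from atomic context, the loop never yields, which is the behavior the commit keeps for non-sleepable callers.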
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 03:58:24 +03:00
if ( do_cond_resched )
cond_resched ( ) ;
2005-04-17 02:20:36 +04:00
}
2018-01-10 16:24:17 +03:00
2005-04-17 02:20:36 +04:00
console_locked = 0 ;
2011-03-23 02:34:21 +03:00
/* Release the exclusive_console once it is used */
if ( unlikely ( exclusive_console ) )
exclusive_console = NULL ;
2009-07-25 19:50:36 +04:00
raw_spin_unlock ( & logbuf_lock ) ;
2011-06-22 13:20:09 +04:00
2014-06-05 03:11:36 +04:00
up_console_sem ( ) ;
2011-06-22 13:20:09 +04:00
/*
* Someone could have filled up the buffer again , so re-check if there's
* something to flush . In case we cannot trylock the console_sem again ,
* there's a new owner and the console_unlock ( ) from them will do the
* flush , no worries .
*/
2009-07-25 19:50:36 +04:00
raw_spin_lock ( & logbuf_lock ) ;
2012-05-03 04:29:13 +04:00
retry = console_seq != log_next_seq ;
2016-12-27 17:16:09 +03:00
raw_spin_unlock ( & logbuf_lock ) ;
printk_safe_exit_irqrestore ( flags ) ;
2011-12-09 02:34:13 +04:00
2011-06-22 13:20:09 +04:00
if ( retry && console_trylock ( ) )
goto again ;
2018-02-26 17:44:20 +03:00
out :
2007-02-10 12:46:19 +03:00
if ( wake_klogd )
wake_up_klogd ( ) ;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_unlock ) ;
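The re-check after up_console_sem() closes a race: a printk() on another CPU may store a record after the print loop has finished but fail console_trylock() because the semaphore is still held, relying on the current owner to print it. The sketch below is a hypothetical single-threaded model of that tail. The late parameter simulates such records, and struct log_model only mirrors console_seq and log_next_seq by name; locking of the log buffer is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the fields only mirror the kernel names. */
struct log_model {
	unsigned long next_seq;		/* like log_next_seq: records stored */
	unsigned long console_seq;	/* like console_seq: records printed */
	bool sem_held;			/* like console_sem */
};

static bool log_trylock(struct log_model *l)
{
	if (l->sem_held)
		return false;
	l->sem_held = true;
	return true;
}

/*
 * Tail of a console_unlock()-style loop. "late" simulates records stored
 * after the print loop but before the semaphore is released; their
 * printk() calls failed to trylock, so they rely on us to print them.
 */
static int model_console_unlock(struct log_model *l, unsigned long late)
{
	int passes = 0;
	bool retry;

again:
	passes++;
	while (l->console_seq != l->next_seq)
		l->console_seq++;	/* "print" one record */

	l->next_seq += late;		/* racing records arrive now */
	late = 0;

	l->sem_held = false;		/* up_console_sem() analogue */

	retry = l->console_seq != l->next_seq;
	if (retry && log_trylock(l))
		goto again;
	return passes;
}
```

Without the retry, the two late records would stay unprinted until some later printk() happened to take the lock.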
2005-04-17 02:20:36 +04:00
2005-11-14 03:08:14 +03:00
/**
* console_conditional_schedule - yield the CPU if required
2005-04-17 02:20:36 +04:00
*
* If the console code is currently allowed to sleep , and
* if this CPU should yield the CPU to another task , do
* so here .
*
2011-01-26 02:07:35 +03:00
* Must be called with console_lock ( ) held .
2005-04-17 02:20:36 +04:00
*/
void __sched console_conditional_schedule ( void )
{
if ( console_may_schedule )
cond_resched ( ) ;
}
EXPORT_SYMBOL ( console_conditional_schedule ) ;
void console_unblank ( void )
{
struct console * c ;
/*
* console_unblank can no longer be called in interrupt context unless
* oops_in_progress is set to 1 .
*/
if ( oops_in_progress ) {
2014-06-05 03:11:36 +04:00
if ( down_trylock_console_sem ( ) != 0 )
2005-04-17 02:20:36 +04:00
return ;
} else
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
console_locked = 1 ;
console_may_schedule = 0 ;
printk: Enable the use of more than one CON_BOOT (early console)
Today, register_console() assumes the following usage:
- The first console to register with a flag set to CON_BOOT
is the one and only bootconsole.
- If another register_console() is called with an additional
CON_BOOT, it is silently rejected.
- As soon as a console without CON_BOOT set registers, the
bootconsole is automatically unregistered.
- Once there is a "real" console, register_console() will
silently reject any consoles with their CON_BOOT flag set.
In many systems (alpha, blackfin, microblaze, mips, powerpc,
sh, & x86), there are early_printk implementations which use
CON_BOOT consoles that output to serial ports, vga, usb, & memory
buffers.
In many embedded systems, it would be nice to have two
bootconsoles - in case the primary fails, you always have
access to a backup memory buffer - but this requires at least
two CON_BOOT consoles...
This patch enables that functionality.
With the change applied, on boot you get (if you try to
re-enable a boot console after the "real" console has been
registered):
root:/> dmesg | grep console
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
Kernel command line: root=/dev/mtdblock0 rw earlyprintk=serial,uart0,57600 console=ttyBF0,57600 nmi_debug=regs
console handover:boot [early_BFuart0] boot [early_shadow0] -> real [ttyBF0]
Too late to register bootconsole early_shadow0
or:
root:/> dmesg | grep console
Kernel command line: root=/dev/mtdblock0 rw console=ttyBF0,57600
console [ttyBF0] enabled
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Cc: "Paul Mundt" <lethal@linux-sh.org>
LKML-Reference: <200907012108.38030.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 05:08:37 +04:00
for_each_console ( c )
2005-04-17 02:20:36 +04:00
if ( ( c->flags & CON_ENABLED ) && c->unblank )
c->unblank ( ) ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
2016-01-16 03:58:24 +03:00
/**
* console_flush_on_panic - flush console content on panic
*
* Immediately output all pending messages no matter what .
*/
void console_flush_on_panic ( void )
{
/*
* If someone else is holding the console lock , trylock will fail
* and may_schedule may be set . Ignore and proceed to unlock so
* that messages are flushed out . As this can be called from any
* context and we don't want to get preempted while flushing ,
* ensure may_schedule is cleared .
*/
console_trylock ( ) ;
console_may_schedule = 0 ;
console_unlock ( ) ;
}
2005-04-17 02:20:36 +04:00
/*
* Return the console tty driver structure and its associated index
*/
struct tty_driver * console_device ( int * index )
{
struct console * c ;
struct tty_driver * driver = NULL ;
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2009-07-02 05:08:37 +04:00
for_each_console ( c ) {
2005-04-17 02:20:36 +04:00
if ( ! c->device )
continue ;
driver = c->device ( c , index ) ;
if ( driver )
break ;
}
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
return driver ;
}
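console_device() walks the console list and returns the first tty driver that a console's ->device() callback reports. Below is a trimmed-down sketch of that search. The struct console_model and struct tty_driver_model types, serial_device() and ttyS are hypothetical stand-ins for the real structures, and the console_lock() held around the walk is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types; the real ones carry many more fields. */
struct tty_driver_model {
	const char *name;
};

struct console_model {
	struct tty_driver_model *(*device)(struct console_model *c, int *index);
	int index;
	struct console_model *next;
};

/* Return the driver of the first console whose ->device() yields one. */
static struct tty_driver_model *
console_device_model(struct console_model *head, int *index)
{
	struct tty_driver_model *driver = NULL;

	for (struct console_model *c = head; c; c = c->next) {
		if (!c->device)
			continue;	/* console has no tty behind it */
		driver = c->device(c, index);
		if (driver)
			break;
	}
	return driver;
}

/* Hypothetical serial console that reports its own index. */
static struct tty_driver_model ttyS = { "ttyS" };

static struct tty_driver_model *serial_device(struct console_model *c, int *index)
{
	*index = c->index;
	return &ttyS;
}
```

A console without a ->device() callback is skipped, and an empty list yields NULL, matching the initial value of driver in the function above.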
/*
* Prevent further output on the passed console device so that ( for example )
* serial drivers can disable console output before suspending a port , and can
* re-enable output afterwards .
*/
void console_stop ( struct console * console )
{
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
console->flags &= ~ CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_stop ) ;
void console_start ( struct console * console )
{
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
console->flags |= CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_start ) ;
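console_stop() and console_start() only clear or set CON_ENABLED; the output path skips consoles without it. A minimal sketch of that gating follows. struct con_model, con_stop(), con_start(), emit() and the flag CON_ENABLED_MODEL are hypothetical (the flag value is arbitrary here), and the console_lock() that the real functions take to serialize the flag change against console output is left out.

```c
#include <assert.h>

#define CON_ENABLED_MODEL 4u	/* hypothetical flag bit for this sketch */

struct con_model {
	unsigned int flags;
	int written;		/* records this console has received */
};

static void con_stop(struct con_model *c)
{
	c->flags &= ~CON_ENABLED_MODEL;
}

static void con_start(struct con_model *c)
{
	c->flags |= CON_ENABLED_MODEL;
}

/* Output path: only enabled consoles receive the record. */
static void emit(struct con_model *cons, int n)
{
	for (int i = 0; i < n; i++)
		if (cons[i].flags & CON_ENABLED_MODEL)
			cons[i].written++;
}
```

A serial driver would call the stop function before suspending its port and the start function after resuming, so records emitted in between simply bypass that console.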
2011-03-23 02:34:20 +03:00
static int __read_mostly keep_bootcon ;
static int __init keep_bootcon_setup ( char * str )
{
keep_bootcon = 1 ;
2013-11-13 03:08:50 +04:00
pr_info ( "debug: skip boot console de-registration.\n" ) ;
2011-03-23 02:34:20 +03:00
return 0 ;
}
early_param ( "keep_bootcon" , keep_bootcon_setup ) ;
2005-04-17 02:20:36 +04:00
/*
* The console driver calls this routine during kernel initialization
* to register the console printing procedure with printk ( ) and to
* print any messages that were printed by the kernel before the
* console driver was initialized .
2009-07-02 05:08:37 +04:00
*
* This can happen pretty early during the boot process ( because of
* early_printk ) - sometimes before setup_arch ( ) completes - be careful
* of what kernel features are used - they may not be initialised yet .
*
* There are two types of consoles - bootconsoles ( early_printk ) and
* "real" consoles ( everything which is not a bootconsole ) which are
* handled differently .
* - Any number of bootconsoles can be registered at any time .
* - As soon as a "real" console is registered , all bootconsoles
* will be unregistered automatically .
* - Once a "real" console is registered , any attempt to register a
* bootconsole will be rejected .
2005-04-17 02:20:36 +04:00
*/
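The three bootconsole rules in the comment above can be expressed as a small registration policy. The sketch below is hypothetical: struct registry, register_model() and count_boot() are illustrative names, a console is reduced to a single boot/real flag, and keep_bootcon models the early parameter of the same name by skipping the automatic bootconsole removal. The real register_console() applies further conditions that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CONSOLES 8

/* Hypothetical registry: each slot records whether it is a bootconsole. */
struct registry {
	bool is_boot[MAX_CONSOLES];
	int count;
	bool have_real;
};

static int count_boot(const struct registry *r)
{
	int n = 0;

	for (int i = 0; i < r->count; i++)
		n += r->is_boot[i];
	return n;
}

/* Returns false if the new console is rejected. */
static bool register_model(struct registry *r, bool boot, bool keep_bootcon)
{
	if (boot && r->have_real)
		return false;		/* too late for a bootconsole */
	if (r->count == MAX_CONSOLES)
		return false;		/* model limit, not a kernel rule */

	if (!boot && !keep_bootcon) {
		/* A real console arrives: drop every bootconsole. */
		int j = 0;

		for (int i = 0; i < r->count; i++)
			if (!r->is_boot[i])
				r->is_boot[j++] = false;
		r->count = j;
	}

	r->is_boot[r->count++] = boot;
	if (!boot)
		r->have_real = true;
	return true;
}
```

Two bootconsoles register side by side, the first real console removes both, and any later bootconsole is turned away, which matches the "Too late to register bootconsole" case in the commit message above.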
2009-07-02 05:08:37 +04:00
void register_console ( struct console * newcon )
2005-04-17 02:20:36 +04:00
{
2005-10-31 02:02:46 +03:00
int i ;
2005-04-17 02:20:36 +04:00
unsigned long flags ;
2009-07-02 05:08:37 +04:00
struct console * bcon = NULL ;
2013-08-01 00:53:46 +04:00
struct console_cmdline * c ;
2017-03-15 13:28:50 +03:00
static bool has_preferred ;
2005-04-17 02:20:36 +04:00
2013-08-02 14:23:34 +04:00
if ( console_drivers )
for_each_console ( bcon )
if ( WARN ( bcon == newcon ,
"console '%s%d' already registered\n" ,
bcon->name , bcon->index ) )
return ;
2009-07-02 05:08:37 +04:00
	/*
	 * before we register a new CON_BOOT console, make sure we don't
	 * already have a valid console
	 */
	if (console_drivers && newcon->flags & CON_BOOT) {
		/* find the last or real console */
		for_each_console(bcon) {
			if (!(bcon->flags & CON_BOOT)) {
2013-11-13 03:08:50 +04:00
				pr_info("Too late to register bootconsole %s%d\n",
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
					newcon->name, newcon->index);
				return;
			}
		}
2007-05-08 11:26:49 +04:00
	}
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
	if (console_drivers && console_drivers->flags & CON_BOOT)
		bcon = console_drivers;
2017-03-15 13:28:50 +03:00
	if (!has_preferred || bcon || !console_drivers)
2017-03-15 13:28:51 +03:00
		has_preferred = preferred_console >= 0;
2005-04-17 02:20:36 +04:00
	/*
	 * See if we want to use this console driver. If we
	 * didn't select a console we take the first one
	 * that registers here.
	 */
2017-03-15 13:28:50 +03:00
	if (!has_preferred) {
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
		if (newcon->index < 0)
			newcon->index = 0;
		if (newcon->setup == NULL ||
		    newcon->setup(newcon, NULL) == 0) {
			newcon->flags |= CON_ENABLED;
			if (newcon->device) {
				newcon->flags |= CON_CONSDEV;
2017-03-15 13:28:50 +03:00
				has_preferred = true;
2008-05-12 23:21:04 +04:00
			}
2005-04-17 02:20:36 +04:00
		}
	}
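When `has_preferred` is false (no console= selection has claimed the preferred slot), the block above enables the first console that registers and whose `setup()` succeeds. It defaults the index to 0 and marks the console CON_CONSDEV if it has a device. Below is a minimal sketch of that fallback on a simplified, hypothetical console type. The `has_device` field stands in for a non-NULL `->device` callback, and `demo_failing_setup()` is an illustrative probe that always fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in include/linux/console.h. */
#define DEMO_CON_CONSDEV 2
#define DEMO_CON_ENABLED 4

/* Hypothetical, simplified console. */
struct demo_console {
	const char *name;
	int index;
	short flags;
	bool has_device;	/* stands in for a non-NULL ->device */
	int (*setup)(struct demo_console *con, char *options);
};

/*
 * Fallback selection: with no preferred console yet, enable newcon if
 * its setup succeeds (or it has none). Returns the new has_preferred.
 */
static bool demo_default_select(struct demo_console *newcon,
				bool has_preferred)
{
	if (has_preferred)
		return true;	/* command line already chose a console */

	if (newcon->index < 0)
		newcon->index = 0;
	if (newcon->setup == NULL || newcon->setup(newcon, NULL) == 0) {
		newcon->flags |= DEMO_CON_ENABLED;
		if (newcon->has_device) {
			newcon->flags |= DEMO_CON_CONSDEV;
			has_preferred = true;
		}
	}
	return has_preferred;
}

/* Illustrative setup callback that reports a failed probe. */
static int demo_failing_setup(struct demo_console *con, char *options)
{
	(void)con;
	(void)options;
	return -1;
}
```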
	/*
Revert "printk: fix double printing with earlycon"
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e.
The commit caused a regression for users that define both console=ttyS1
and console=ttyS0 on the command line, see
https://lkml.kernel.org/r/20170509082915.GA13236@bistromath.localdomain
The kernel log messages always appeared only on one serial port. It is
even documented in Documentation/admin-guide/serial-console.rst:
"Note that you can only define one console per device type (serial,
video)."
The above mentioned commit changed the order in which the command line
parameters are searched. As a result, the kernel log messages go to
the last mentioned ttyS* instead of the first one.
We long thought that using two console=ttyS* on the command line
did not make sense. But then we realized that console= parameters
were also handled by systemd, see
http://0pointer.de/blog/projects/serial-console.html
"By default systemd will instantiate one serial-getty@.service on
the main kernel console, if it is not a virtual terminal."
where
"[4] If multiple kernel consoles are used simultaneously, the main
console is the one listed first in /sys/class/tty/console/active,
which is the last one listed on the kernel command line."
This puts the original report in a different light. The system is running
in qemu. The first serial port is used to store the messages into a file.
The second one is used to login to the system via a socket. It depends
on systemd and the historic kernel behavior.
In other words, systemd makes it sensible to define both
console=ttyS1 console=ttyS0 on the command line. The kernel fix
caused a regression related to userspace (systemd) and needs to be
reverted.
In addition, it turned out that the fix helped only partially.
The messages still were duplicated when the boot console was
removed early by late_initcall(printk_late_init). Then the entire
log was replayed when the same console was registered as a normal one.
Link: 20170606160339.GC7604@pathway.suse.cz
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Nair, Jayachandran" <Jayachandran.Nair@cavium.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-06-08 13:01:30 +03:00
	 * See if this console matches one we selected on
	 * the command line.
2005-04-17 02:20:36 +04:00
	 */
Revert "printk: fix double printing with earlycon"
2017-06-08 13:01:30 +03:00
	for (i = 0, c = console_cmdline;
	     i < MAX_CMDLINECONSOLES && c->name[0];
	     i++, c++) {
2015-03-09 23:27:12 +03:00
		if (!newcon->match ||
		    newcon->match(newcon, c->name, c->index, c->options) != 0) {
			/* default matching */
			BUILD_BUG_ON(sizeof(c->name) != sizeof(newcon->name));
			if (strcmp(c->name, newcon->name) != 0)
				continue;
			if (newcon->index >= 0 &&
			    newcon->index != c->index)
				continue;
			if (newcon->index < 0)
				newcon->index = c->index;
2013-08-01 00:53:45 +04:00
2015-03-09 23:27:12 +03:00
			if (_braille_register_console(newcon, c))
				return;
			if (newcon->setup &&
			    newcon->setup(newcon, c->options) != 0)
				break;
		}
2013-08-01 00:53:45 +04:00
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
		newcon->flags |= CON_ENABLED;
2017-03-15 13:28:51 +03:00
		if (i == preferred_console) {
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
			newcon->flags |= CON_CONSDEV;
2017-03-15 13:28:50 +03:00
			has_preferred = true;
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:
#define CON_PRINTBUFFER (1)
#define CON_CONSDEV (2) /* Last on the command line */
#define CON_ENABLED (4)
#define CON_BOOT (8)
This does not currently happen if there is more than one console specified
on the boot commandline. Instead, it gets set on the first console on the
command line. This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.
Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.
Example (from sn2 ia64):
elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).
Attached patch against bk fixes both issues for the cases I looked at. It
uses selected_console (which gets incremented for each console specified on
the command line) as the indicator of which console to set CON_CONSDEV on.
When adding the console to the list, if the previous one had CON_CONSDEV
set, it masks it out. Tested on ia64 and x86.
The problem with the current behavior is it breaks overriding the default from
the boot line. In the ia64 case, there may be a global append line defining
console=a in elilo.conf. Then you want to boot your kernel, and want to
override the default by passing console=b on the boot line. elilo constructs
the kernel cmdline by starting with the value of the global append line, then
tacks on whatever else you specify, which puts console=b last.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 11:09:05 +04:00
		}
2005-04-17 02:20:36 +04:00
		break;
	}
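The command-line loop above stops at the first `console_cmdline` entry that matches, and a console registered with `index < 0` adopts that entry's index. The revert commit relies on this ordering: with `console=ttyS1 console=ttyS0`, a console that registers with an unset index picks up ttyS1. The sketch below models only the default matching path, without the `->match`, `->setup` or braille hooks. All `demo_*` names are hypothetical. The preferred entry is passed in explicitly, on the assumption, following the CON_CONSDEV commit message, that it is the last console= entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_CMDLINECONSOLES 8
/* Flag values as defined in include/linux/console.h. */
#define DEMO_CON_CONSDEV 2
#define DEMO_CON_ENABLED 4

/* Hypothetical stand-ins for struct console_cmdline and struct console. */
struct demo_cmdline {
	char name[16];
	int index;
};

struct demo_console {
	char name[16];
	int index;
	short flags;
};

/*
 * Scan the command-line table in order; the first entry whose name
 * matches and whose index is compatible wins. An empty name ends the
 * table. Returns the matching entry number, or -1 if none matched.
 */
static int demo_match_cmdline(struct demo_console *newcon,
			      const struct demo_cmdline *tab, int preferred)
{
	for (int i = 0; i < DEMO_MAX_CMDLINECONSOLES && tab[i].name[0]; i++) {
		if (strcmp(tab[i].name, newcon->name) != 0)
			continue;
		if (newcon->index >= 0 && newcon->index != tab[i].index)
			continue;
		if (newcon->index < 0)
			newcon->index = tab[i].index;	/* adopt ttySn's n */

		newcon->flags |= DEMO_CON_ENABLED;
		if (i == preferred)
			newcon->flags |= DEMO_CON_CONSDEV;
		return i;
	}
	return -1;
}
```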
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
	if (!(newcon->flags & CON_ENABLED))
2005-04-17 02:20:36 +04:00
		return;
printk: Ensure that "console enabled" messages are printed on the console
Today, when a console is registered without CON_PRINTBUFFER,
end users never see the announcement of it being added, and
never know if they missed something, if the console is really
at the start or not, and just leads to general confusion.
This re-orders existing code, to make sure the console is
added, before the "console [%s%d] enabled" is printed out -
ensuring that this message is _always_ seen.
This has the desired/intended side effect of making sure that
"console enabled:" messages are printed on the bootconsole, and
the real console. This does cause the same line to be printed
twice if the bootconsole and real console are the same device,
but if they are on different devices, the message is printed to
both consoles.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
LKML-Reference: <200907091308.37370.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-09 21:08:37 +04:00
	/*
	 * If we have a bootconsole, and are switching to a real console,
	 * don't print everything out again, since when the boot console, and
	 * the real console are the same physical device, it's annoying to
	 * see the beginning boot messages twice
	 */
	if (bcon && ((newcon->flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV))
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
		newcon->flags &= ~CON_PRINTBUFFER;
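The mask test in the condition above is true only when CON_CONSDEV is set and CON_BOOT is clear, that is, for a real, preferred console. CON_PRINTBUFFER is dropped only in that case, and only while a bootconsole is active. Here is a hypothetical helper that applies the same mask arithmetic to a plain flag word. The flag values follow `include/linux/console.h`, and `demo_strip_printbuffer()` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/console.h. */
#define DEMO_CON_PRINTBUFFER 1
#define DEMO_CON_CONSDEV     2
#define DEMO_CON_ENABLED     4
#define DEMO_CON_BOOT        8

/*
 * Clear PRINTBUFFER when a bootconsole exists and the new console is
 * CONSDEV without BOOT, so the boot messages are not printed twice.
 */
static short demo_strip_printbuffer(short flags, bool have_bootcon)
{
	if (have_bootcon &&
	    (flags & (DEMO_CON_CONSDEV | DEMO_CON_BOOT)) == DEMO_CON_CONSDEV)
		flags &= ~DEMO_CON_PRINTBUFFER;
	return flags;
}
```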
2005-04-17 02:20:36 +04:00
	/*
	 * Put this console in the list - keep the
	 * preferred driver at the head of the list.
	 */
2011-01-26 02:07:35 +03:00
	console_lock();
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
	if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
		newcon->next = console_drivers;
		console_drivers = newcon;
		if (newcon->next)
			newcon->next->flags &= ~CON_CONSDEV;
2005-04-17 02:20:36 +04:00
	} else {
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
		newcon->next = console_drivers->next;
		console_drivers->next = newcon;
2005-04-17 02:20:36 +04:00
	}
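The list update above keeps the CON_CONSDEV console at the head and strips CON_CONSDEV from the console it displaces. That is the behavior the 2005 CON_CONSDEV fix asks for: with `console=ttyS0 console=ttySG0`, ttySG0 should end up with flags 0x6 rather than 0x4. Below is a minimal sketch of the same insertion on a hypothetical `struct demo_console` list, without the locking.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value as defined in include/linux/console.h. */
#define DEMO_CON_CONSDEV 2

/* Hypothetical, minimal stand-in for struct console. */
struct demo_console {
	const char *name;
	short flags;
	struct demo_console *next;
};

/*
 * A CONSDEV console (or the very first console) becomes the new head
 * and takes CONSDEV away from the old head. Any other console is linked
 * in right after the head, so the preferred console stays first.
 */
static void demo_insert_console(struct demo_console **head,
				struct demo_console *newcon)
{
	if ((newcon->flags & DEMO_CON_CONSDEV) || *head == NULL) {
		newcon->next = *head;
		*head = newcon;
		if (newcon->next)
			newcon->next->flags &= ~DEMO_CON_CONSDEV;
	} else {
		newcon->next = (*head)->next;
		(*head)->next = newcon;
	}
}
```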
2015-06-26 01:01:30 +03:00
	if (newcon->flags & CON_EXTENDED)
		if (!nr_ext_console_drivers++)
			pr_info("printk: continuation disabled due to ext consoles, expect more fragments in /dev/kmsg\n");
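The `!nr_ext_console_drivers++` test reads the counter before incrementing it. The message is therefore printed only for the first console with CON_EXTENDED set, and later extended consoles just bump the count. Here is a sketch of the idiom. `DEMO_CON_EXTENDED` is an arbitrary bit chosen for illustration, not taken from the kernel headers, and `demo_note_console()` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_CON_EXTENDED 0x40	/* illustrative bit, assumed */

static int demo_nr_ext_consoles;

/* Count extended consoles; return 1 only when the first one announces. */
static int demo_note_console(unsigned int flags)
{
	int announced = 0;

	if (flags & DEMO_CON_EXTENDED) {
		/* post-increment: the test sees the old value (0 once) */
		if (!demo_nr_ext_consoles++) {
			printf("printk: continuation disabled due to ext consoles, "
			       "expect more fragments in /dev/kmsg\n");
			announced = 1;
		}
	}
	return announced;
}
```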
printk: Enable the use of more than one CON_BOOT (early console)
2009-07-02 05:08:37 +04:00
if ( newcon - > flags & CON_PRINTBUFFER ) {
2005-04-17 02:20:36 +04:00
/*
2011-01-26 02:07:35 +03:00
* console_unlock ( ) ; will print out the buffered messages
2005-04-17 02:20:36 +04:00
* for us .
*/
2016-12-27 17:16:11 +03:00
logbuf_lock_irqsave ( flags ) ;
2012-05-03 04:29:13 +04:00
console_seq = syslog_seq ;
console_idx = syslog_idx ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irqrestore ( flags ) ;
2011-03-23 02:34:21 +03:00
/*
* We ' re about to replay the log buffer . Only do this to the
* just - registered console to avoid excessive message spam to
* the already - registered consoles .
*/
exclusive_console = newcon ;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2010-12-01 20:51:05 +03:00
console_sysfs_notify ( ) ;
printk: Ensure that "console enabled" messages are printed on the console
Today, when a console is registered without CON_PRINTBUFFER,
end users never see the announcement of it being added, never know
whether they missed something or whether the console really starts
at the beginning, and this just leads to general confusion.
This re-orders existing code, to make sure the console is
added before the "console [%s%d] enabled" message is printed out,
ensuring that this message is _always_ seen.
This has the desired/intended side effect of making sure that
"console enabled:" messages are printed on both the bootconsole and
the real console. This does cause the same line to be printed
twice if the bootconsole and real console are the same device,
but if they are on different devices, the message is printed to
both consoles.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
LKML-Reference: <200907091308.37370.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-09 21:08:37 +04:00
/*
* By unregistering the bootconsoles after we enable the real console
* we get the " console xxx enabled " message on all the consoles -
* boot consoles , real consoles , etc - this is to ensure that end
* users know there might be something in the kernel ' s log buffer that
* went to the bootconsole ( that they do not see on the real console )
*/
2013-11-13 03:08:50 +04:00
pr_info ( " %sconsole [%s%d] enabled \n " ,
2013-11-13 03:08:49 +04:00
( newcon - > flags & CON_BOOT ) ? " boot " : " " ,
newcon - > name , newcon - > index ) ;
2011-03-23 02:34:20 +03:00
if ( bcon & &
( ( newcon - > flags & ( CON_CONSDEV | CON_BOOT ) ) = = CON_CONSDEV ) & &
! keep_bootcon ) {
2013-11-13 03:08:49 +04:00
/* We need to iterate through all boot consoles, to make
* sure we print everything out , before we unregister them .
2009-07-09 21:08:37 +04:00
*/
for_each_console ( bcon )
if ( bcon - > flags & CON_BOOT )
unregister_console ( bcon ) ;
}
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( register_console ) ;
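The boot-console handover at the end of register_console() depends on a single mask-and-compare. It fires only for the preferred console (CON_CONSDEV set) that is not itself a boot console (CON_BOOT clear), and only when keep_bootcon is off. Below is a minimal userspace sketch of that test. It assumes the flag values listed in the CON_CONSDEV commit message in this file. should_drop_bootcons() is a hypothetical helper, not a kernel function, and have_bootcon stands in for "bcon != NULL".

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as listed for include/linux/console.h in this file. */
#define CON_PRINTBUFFER 1
#define CON_CONSDEV     2
#define CON_ENABLED     4
#define CON_BOOT        8

/* Hypothetical helper mirroring the condition that guards the
 * boot-console unregister loop in register_console(). */
static bool should_drop_bootcons(bool have_bootcon, unsigned flags,
                                 bool keep_bootcon)
{
    /* Masking with both bits and comparing against CON_CONSDEV alone
     * requires CON_CONSDEV set AND CON_BOOT clear. */
    return have_bootcon &&
           (flags & (CON_CONSDEV | CON_BOOT)) == CON_CONSDEV &&
           !keep_bootcon;
}
```

Comparing the masked value against one of the two bits is a compact way to demand "this bit set, that bit clear" in a single expression.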
2005-10-31 02:02:46 +03:00
int unregister_console ( struct console * console )
2005-04-17 02:20:36 +04:00
{
2005-10-31 02:02:46 +03:00
struct console * a , * b ;
2013-08-01 00:53:45 +04:00
int res ;
2005-04-17 02:20:36 +04:00
2013-11-13 03:08:50 +04:00
pr_info ( " %sconsole [%s%d] disabled \n " ,
2013-11-13 03:08:49 +04:00
( console - > flags & CON_BOOT ) ? " boot " : " " ,
console - > name , console - > index ) ;
2013-08-01 00:53:45 +04:00
res = _braille_unregister_console ( console ) ;
if ( res )
return res ;
2008-04-30 11:54:51 +04:00
2013-08-01 00:53:45 +04:00
res = 1 ;
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
if ( console_drivers = = console ) {
console_drivers = console - > next ;
res = 0 ;
2005-11-24 00:37:44 +03:00
} else if ( console_drivers ) {
2005-04-17 02:20:36 +04:00
for ( a = console_drivers - > next , b = console_drivers ;
a ; b = a , a = b - > next ) {
if ( a = = console ) {
b - > next = a - > next ;
res = 0 ;
break ;
2005-10-31 02:02:46 +03:00
}
2005-04-17 02:20:36 +04:00
}
}
2005-10-31 02:02:46 +03:00
2015-06-26 01:01:30 +03:00
if ( ! res & & ( console - > flags & CON_EXTENDED ) )
nr_ext_console_drivers - - ;
2007-05-08 11:26:49 +04:00
/*
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:
86 #define CON_PRINTBUFFER (1)
87 #define CON_CONSDEV (2) /* Last on the command line */
88 #define CON_ENABLED (4)
89 #define CON_BOOT (8)
This does not currently happen if there is more than one console specified
on the boot commandline. Instead, it gets set on the first console on the
command line. This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.
Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.
Example (from sn2 ia64):
elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).
Attached patch against bk fixes both issues for the cases I looked at. It
uses selected_console (which gets incremented for each console specified on
the command line) as the indicator of which console to set CON_CONSDEV on.
When adding the console to the list, if the previous one had CON_CONSDEV
set, it masks it out. Tested on ia64 and x86.
The problem with the current behavior is it breaks overriding the default from
the boot line. In the ia64 case, there may be a global append line defining
console=a in elilo.conf. Then you want to boot your kernel, and want to
override the default by passing console=b on the boot line. elilo constructs
the kernel cmdline by starting with the value of the global append line, then
tacks on whatever else you specify, which puts console=b last.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 11:09:05 +04:00
* If this isn ' t the last console and it has CON_CONSDEV set , we
* need to set it on the next preferred console .
2005-04-17 02:20:36 +04:00
*/
2007-05-08 11:26:49 +04:00
if ( console_drivers ! = NULL & & console - > flags & CON_CONSDEV )
2005-06-23 11:09:05 +04:00
console_drivers - > flags | = CON_CONSDEV ;
2005-04-17 02:20:36 +04:00
2014-05-14 02:04:39 +04:00
console - > flags & = ~ CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2010-12-01 20:51:05 +03:00
console_sysfs_notify ( ) ;
2005-04-17 02:20:36 +04:00
return res ;
}
EXPORT_SYMBOL ( unregister_console ) ;
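unregister_console() unlinks the console from the singly linked console_drivers list. It walks a trailing pointer b one node behind a, then hands CON_CONSDEV to whatever console is now at the head. The sketch below is a simplified, single-threaded version of the same steps. struct con and unlink_console() are hypothetical stand-ins. It leaves out console_lock(), the braille hook and the CON_EXTENDED counter.

```c
#include <assert.h>
#include <stddef.h>

#define CON_CONSDEV 2
#define CON_ENABLED 4

/* Hypothetical, minimal stand-in for struct console. */
struct con {
    const char *name;
    unsigned flags;
    struct con *next;
};

/* Returns 0 if c was found and unlinked, 1 otherwise, following the
 * return convention of unregister_console(). */
static int unlink_console(struct con **head, struct con *c)
{
    int res = 1;

    if (*head == c) {
        *head = c->next;
        res = 0;
    } else if (*head) {
        /* b trails one node behind a so that b->next can be patched. */
        for (struct con *b = *head, *a = b->next; a; b = a, a = a->next) {
            if (a == c) {
                b->next = a->next;
                res = 0;
                break;
            }
        }
    }

    /* If the removed console was the preferred one, promote the new head. */
    if (*head != NULL && (c->flags & CON_CONSDEV))
        (*head)->flags |= CON_CONSDEV;

    c->flags &= ~CON_ENABLED;
    return res;
}
```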
2005-05-01 19:59:02 +04:00
2017-04-13 01:37:14 +03:00
/*
* Initialize the console device . This is called * early * , so
* we can ' t necessarily depend on lots of kernel help here .
* Just do some early initializations , and do the complex setup
* later .
*/
void __init console_init ( void )
{
initcall_t * call ;
/* Setup the default TTY line discipline. */
n_tty_init ( ) ;
/*
* set up the console device so that later boot sequences can
* inform about problems etc . .
*/
call = __con_initcall_start ;
while ( call < __con_initcall_end ) {
( * call ) ( ) ;
call + + ;
}
}
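console_init() runs every console init function whose address was placed, at link time, in the table bounded by __con_initcall_start and __con_initcall_end. Below is a minimal userspace sketch of that walk. It assumes a plain array and a one-past-the-end pointer in place of the linker-provided section symbols. All names below are hypothetical.

```c
#include <assert.h>

typedef int (*initcall_t)(void);

static int calls_made;
static int init_a(void) { return ++calls_made; }
static int init_b(void) { return ++calls_made; }

/* Hypothetical stand-ins for the linker-provided section bounds:
 * a plain array and a pointer one past its last element. */
static initcall_t con_initcalls[] = { init_a, init_b };
static initcall_t *const con_initcall_start = con_initcalls;
static initcall_t *const con_initcall_end =
    con_initcalls + sizeof(con_initcalls) / sizeof(con_initcalls[0]);

static void run_con_initcalls(void)
{
    /* Same shape as console_init(): walk the table, ignore return values. */
    for (initcall_t *call = con_initcall_start; call < con_initcall_end; call++)
        (*call)();
}
```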
2016-01-16 03:58:21 +03:00
/*
* Some boot consoles access data that is in the init section and which will
* be discarded after the initcalls have been run . To make sure that no code
* will access this data , unregister the boot consoles in a late initcall .
*
* If for some reason , such as deferred probe or the driver being a loadable
* module , the real console hasn ' t registered yet at this point , there will
* be a brief interval in which no messages are logged to the console , which
* makes it difficult to diagnose problems that occur during this time .
*
* To mitigate this problem somewhat , only unregister consoles whose memory
2017-07-14 15:51:12 +03:00
* intersects with the init section . Note that all other boot consoles will
* get unregistered when the real preferred console is registered .
2016-01-16 03:58:21 +03:00
*/
2010-06-04 09:11:25 +04:00
static int __init printk_late_init ( void )
2007-08-20 23:22:47 +04:00
{
2009-07-02 05:08:37 +04:00
struct console * con ;
2016-11-03 17:49:58 +03:00
int ret ;
2009-07-02 05:08:37 +04:00
for_each_console ( con ) {
2017-07-14 15:51:13 +03:00
if ( ! ( con - > flags & CON_BOOT ) )
continue ;
/* Check addresses that might be used for enabled consoles. */
if ( init_section_intersects ( con , sizeof ( * con ) ) | |
init_section_contains ( con - > write , 0 ) | |
init_section_contains ( con - > read , 0 ) | |
init_section_contains ( con - > device , 0 ) | |
init_section_contains ( con - > unblank , 0 ) | |
init_section_contains ( con - > data , 0 ) ) {
2016-01-16 03:58:21 +03:00
/*
2017-07-14 15:51:12 +03:00
* Please , consider moving the reported consoles out
* of the init section .
2016-01-16 03:58:21 +03:00
*/
2017-07-14 15:51:12 +03:00
pr_warn ( " bootconsole [%s%d] uses init memory and must be disabled even before the real one is ready \n " ,
con - > name , con - > index ) ;
unregister_console ( con ) ;
2007-08-22 07:14:58 +04:00
}
2007-08-20 23:22:47 +04:00
}
2016-11-03 17:49:58 +03:00
ret = cpuhp_setup_state_nocalls ( CPUHP_PRINTK_DEAD , " printk:dead " , NULL ,
console_cpu_notify ) ;
WARN_ON ( ret < 0 ) ;
ret = cpuhp_setup_state_nocalls ( CPUHP_AP_ONLINE_DYN , " printk:online " ,
console_cpu_notify , NULL ) ;
WARN_ON ( ret < 0 ) ;
2007-08-20 23:22:47 +04:00
return 0 ;
}
2010-06-04 09:11:25 +04:00
late_initcall ( printk_late_init ) ;
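printk_late_init() drops a boot console early in two cases: the console structure itself overlaps the init section, or one of its callbacks or its data pointer points into that section. Below is a simplified half-open-interval model of the two checks, assuming plain integer addresses. range_intersects() and range_contains() are hypothetical names. The kernel's own helpers may treat the boundaries slightly differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does [addr, addr + size) overlap [begin, end)? */
static bool range_intersects(uintptr_t begin, uintptr_t end,
                             uintptr_t addr, size_t size)
{
    return addr < end && addr + size > begin;
}

/* Does [addr, addr + size) lie inside [begin, end)? With size 0 this
 * reduces to "addr points into the section". */
static bool range_contains(uintptr_t begin, uintptr_t end,
                           uintptr_t addr, size_t size)
{
    return addr >= begin && addr < end && addr + size <= end;
}
```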
2007-08-20 23:22:47 +04:00
2008-02-08 15:21:25 +03:00
# if defined CONFIG_PRINTK
2013-03-23 02:04:39 +04:00
/*
* Delayed printk version , for scheduler - internal messages :
*/
# define PRINTK_PENDING_WAKEUP 0x01
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one whose rq lock is already held.
What printk_sched() does is save the printk write in a per cpu buffer
and set the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There are a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
# define PRINTK_PENDING_OUTPUT 0x02
2013-03-23 02:04:39 +04:00
static DEFINE_PER_CPU ( int , printk_pending ) ;
static void wake_up_klogd_work_func ( struct irq_work * irq_work )
{
int pending = __this_cpu_xchg ( printk_pending , 0 ) ;
2014-06-05 03:11:38 +04:00
if ( pending & PRINTK_PENDING_OUTPUT ) {
/* If trylock fails, someone else is doing the printing */
if ( console_trylock ( ) )
console_unlock ( ) ;
2013-03-23 02:04:39 +04:00
}
if ( pending & PRINTK_PENDING_WAKEUP )
wake_up_interruptible ( & log_wait ) ;
}
static DEFINE_PER_CPU ( struct irq_work , wake_up_klogd_work ) = {
. func = wake_up_klogd_work_func ,
. flags = IRQ_WORK_LAZY ,
} ;
void wake_up_klogd ( void )
{
preempt_disable ( ) ;
if ( waitqueue_active ( & log_wait ) ) {
this_cpu_or ( printk_pending , PRINTK_PENDING_WAKEUP ) ;
2014-08-17 21:30:24 +04:00
irq_work_queue ( this_cpu_ptr ( & wake_up_klogd_work ) ) ;
2013-03-23 02:04:39 +04:00
}
preempt_enable ( ) ;
}
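The deferred path never takes console_sem directly. Callers only set bits in printk_pending and queue an irq_work. Later, wake_up_klogd_work_func() takes and clears all bits at once with __this_cpu_xchg(). Below is a single-threaded sketch of that set-bits/take-all pattern using C11 atomics. It assumes one global word in place of the per-CPU variable and a direct call in place of irq_work. mark_pending() and run_pending_work() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02

/* One global word stands in for the per-CPU printk_pending variable. */
static atomic_int printk_pending;
static int flushes, wakeups;

/* Producers only record what needs doing; no locks, no wake ups. */
static void mark_pending(int bits)
{
    atomic_fetch_or(&printk_pending, bits);
}

/* The worker takes and clears every pending bit in one atomic step. */
static void run_pending_work(void)
{
    int pending = atomic_exchange(&printk_pending, 0);

    if (pending & PRINTK_PENDING_OUTPUT)
        flushes++;   /* stands in for console_trylock()/console_unlock() */
    if (pending & PRINTK_PENDING_WAKEUP)
        wakeups++;   /* stands in for wake_up_interruptible(&log_wait) */
}
```

Two OUTPUT requests made before the worker runs collapse into a single flush. That is acceptable here because one flush is expected to drain everything queued so far.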
2008-07-25 12:45:58 +04:00
2017-04-20 11:52:31 +03:00
int vprintk_deferred ( const char * fmt , va_list args )
2012-03-15 15:35:37 +04:00
{
int r ;
2014-12-11 02:50:15 +03:00
r = vprintk_emit ( 0 , LOGLEVEL_SCHED , NULL , 0 , fmt , args ) ;
2012-03-15 15:35:37 +04:00
2017-04-20 11:52:31 +03:00
preempt_disable ( ) ;
2014-06-05 03:11:38 +04:00
__this_cpu_or ( printk_pending , PRINTK_PENDING_OUTPUT ) ;
2014-08-17 21:30:24 +04:00
irq_work_queue ( this_cpu_ptr ( & wake_up_klogd_work ) ) ;
2014-06-05 03:11:39 +04:00
preempt_enable ( ) ;
2012-03-15 15:35:37 +04:00
return r ;
}
2017-04-20 11:52:31 +03:00
int printk_deferred ( const char * fmt , . . . )
{
va_list args ;
int r ;
va_start ( args , fmt ) ;
r = vprintk_deferred ( fmt , args ) ;
va_end ( args ) ;
return r ;
}
2005-04-17 02:20:36 +04:00
/*
* printk rate limiting , lifted from the networking subsystem .
*
2008-07-30 09:33:38 +04:00
* This enforces a rate limit : not more than 10 kernel messages
* every 5 s to make a denial - of - service attack impossible .
2005-04-17 02:20:36 +04:00
*/
2008-07-30 09:33:38 +04:00
DEFINE_RATELIMIT_STATE ( printk_ratelimit_state , 5 * HZ , 10 ) ;
2009-10-23 16:58:11 +04:00
int __printk_ratelimit ( const char * func )
2005-04-17 02:20:36 +04:00
{
2009-10-23 16:58:11 +04:00
return ___ratelimit ( & printk_ratelimit_state , func ) ;
2005-04-17 02:20:36 +04:00
}
2009-10-23 16:58:11 +04:00
EXPORT_SYMBOL ( __printk_ratelimit ) ;
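DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10) describes a window of 5 * HZ jiffies in which at most 10 messages pass. Below is a rough userspace model of that interval/burst scheme, assuming time is passed in as a tick count rather than read from jiffies. struct ratelimit and ratelimit_ok() are hypothetical. The real ___ratelimit() additionally serializes access, uses its own boundary test, and can report how many messages were suppressed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval/burst limiter state. */
struct ratelimit {
    unsigned long interval;  /* window length in ticks */
    int burst;               /* messages allowed per window */
    unsigned long begin;     /* start of the current window */
    int printed;
    int missed;
    bool started;
};

static bool ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
    if (!rs->started) {
        rs->started = true;
        rs->begin = now;
    }
    /* Window expired: open a new one and reset the counters. */
    if (now - rs->begin >= rs->interval) {
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

With a hypothetical HZ of 100, the printk settings correspond to interval = 500 and burst = 10.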
2006-11-03 09:07:16 +03:00
/**
* printk_timed_ratelimit - caller - controlled printk ratelimiting
* @ caller_jiffies : pointer to caller ' s state
* @ interval_msecs : minimum interval between prints
*
* printk_timed_ratelimit ( ) returns true if more than @ interval_msecs
* milliseconds have elapsed since the last time printk_timed_ratelimit ( )
* returned true .
*/
bool printk_timed_ratelimit ( unsigned long * caller_jiffies ,
unsigned int interval_msecs )
{
2014-08-07 03:09:08 +04:00
unsigned long elapsed = jiffies - * caller_jiffies ;
if ( * caller_jiffies & & elapsed < = msecs_to_jiffies ( interval_msecs ) )
return false ;
* caller_jiffies = jiffies ;
return true ;
2006-11-03 09:07:16 +03:00
}
EXPORT_SYMBOL ( printk_timed_ratelimit ) ;
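printk_timed_ratelimit() relies on unsigned arithmetic. The expression jiffies - *caller_jiffies yields the correct elapsed time even after the counter wraps, and a stored value of 0 means "never printed". Below is a minimal model with the clock passed in explicitly, assuming msecs_to_jiffies() is the identity. timed_ratelimit() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static bool timed_ratelimit(unsigned long *caller_ticks, unsigned long now,
                            unsigned long interval)
{
    /* Unsigned subtraction stays correct across counter wrap-around. */
    unsigned long elapsed = now - *caller_ticks;

    /* 0 doubles as "never printed", so the first call always succeeds. */
    if (*caller_ticks && elapsed <= interval)
        return false;
    *caller_ticks = now;
    return true;
}
```

This model shares one quirk with the original. If a print happens exactly when the counter reads 0, the next call treats the state as fresh.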
2009-10-16 16:09:18 +04:00
static DEFINE_SPINLOCK ( dump_list_lock ) ;
static LIST_HEAD ( dump_list ) ;
/**
* kmsg_dump_register - register a kernel log dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Adds a kernel log dumper to the system . The dump callback in the
* structure will be called when the kernel oopses or panics and must be
* set . Returns zero on success and % - EINVAL or % - EBUSY otherwise .
*/
int kmsg_dump_register ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EBUSY ;
/* The dump callback needs to be set */
if ( ! dumper - > dump )
return - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
/* Don't allow registering multiple times */
if ( ! dumper - > registered ) {
dumper - > registered = 1 ;
2011-01-13 03:59:43 +03:00
list_add_tail_rcu ( & dumper - > list , & dump_list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_register ) ;
/**
* kmsg_dump_unregister - unregister a kmsg dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Removes a dump device from the system . Returns zero on success and
* % - EINVAL otherwise .
*/
int kmsg_dump_unregister ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
if ( dumper - > registered ) {
dumper - > registered = 0 ;
2011-01-13 03:59:43 +03:00
list_del_rcu ( & dumper - > list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
2011-01-13 03:59:43 +03:00
synchronize_rcu ( ) ;
2009-10-16 16:09:18 +04:00
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_unregister ) ;
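kmsg_dump_register() enforces two rules before linking a dumper in. The dump callback must be set (-EINVAL), and a dumper that is already registered is refused (-EBUSY). kmsg_dump_unregister() returns -EINVAL for a dumper that is not registered. Below is a single-threaded sketch of just those rules, assuming a small fixed array instead of the RCU-protected list. struct dumper, the array size and the -ENOMEM case are hypothetical parts of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct dumper {
    void (*dump)(struct dumper *d, int reason);
    bool registered;
};

#define MAX_DUMPERS 4
static struct dumper *dumpers[MAX_DUMPERS];

static int dumper_register(struct dumper *d)
{
    if (!d->dump)
        return -EINVAL;     /* the callback is mandatory */
    if (d->registered)
        return -EBUSY;      /* no double registration */
    for (int i = 0; i < MAX_DUMPERS; i++) {
        if (!dumpers[i]) {
            dumpers[i] = d;
            d->registered = true;
            return 0;
        }
    }
    return -ENOMEM;         /* model-only limit, not a kernel error path */
}

static int dumper_unregister(struct dumper *d)
{
    if (!d->registered)
        return -EINVAL;
    for (int i = 0; i < MAX_DUMPERS; i++)
        if (dumpers[i] == d)
            dumpers[i] = NULL;
    d->registered = false;
    return 0;
}

static void noop_dump(struct dumper *d, int reason) { (void)d; (void)reason; }
```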
2012-05-03 04:29:13 +04:00
static bool always_kmsg_dump ;
module_param_named ( always_kmsg_dump , always_kmsg_dump , bool , S_IRUGO | S_IWUSR ) ;
2009-10-16 16:09:18 +04:00
/**
* kmsg_dump - dump kernel log to kernel message dumpers .
* @ reason : the reason ( oops , panic etc ) for dumping
*
2012-06-15 16:07:51 +04:00
* Call each of the registered dumper ' s dump ( ) callback , which can
* retrieve the kmsg records with kmsg_dump_get_line ( ) or
* kmsg_dump_get_buffer ( ) .
2009-10-16 16:09:18 +04:00
*/
void kmsg_dump ( enum kmsg_dump_reason reason )
{
struct kmsg_dumper * dumper ;
unsigned long flags ;
2012-03-06 02:59:10 +04:00
if ( ( reason > KMSG_DUMP_OOPS ) & & ! always_kmsg_dump )
return ;
2012-06-15 16:07:51 +04:00
rcu_read_lock ( ) ;
list_for_each_entry_rcu ( dumper , & dump_list , list ) {
if ( dumper - > max_reason & & reason > dumper - > max_reason )
continue ;
/* initialize iterator with data about the stored records */
dumper - > active = true ;
2016-12-27 17:16:11 +03:00
logbuf_lock_irqsave ( flags ) ;
2012-06-15 16:07:51 +04:00
dumper - > cur_seq = clear_seq ;
dumper - > cur_idx = clear_idx ;
dumper - > next_seq = log_next_seq ;
dumper - > next_idx = log_next_idx ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irqrestore ( flags ) ;
2012-06-15 16:07:51 +04:00
/* invoke dumper which will iterate over records */
dumper - > dump ( dumper , reason ) ;
/* reset iterator */
dumper - > active = false ;
}
rcu_read_unlock ( ) ;
}
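kmsg_dump() applies two filters before calling a dumper. First, unless always_kmsg_dump is set, reasons ordered after KMSG_DUMP_OOPS are skipped entirely. Second, each dumper can further cap the reasons it accepts with max_reason, where 0 means no cap. Below is a sketch of that decision, assuming a hypothetical enum that only reproduces the relative order of the reasons.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum: only the relative ordering matters here. */
enum dump_reason { R_UNDEF, R_PANIC, R_OOPS, R_EMERG, R_RESTART, R_HALT };

static bool dumper_should_run(enum dump_reason reason, bool always_dump,
                              enum dump_reason max_reason)
{
    /* Global gate: anything past OOPS needs always_kmsg_dump. */
    if (reason > R_OOPS && !always_dump)
        return false;
    /* Per-dumper cap; R_UNDEF (0) means "accept everything". */
    if (max_reason && reason > max_reason)
        return false;
    return true;
}
```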
/**
2012-07-21 04:28:07 +04:00
* kmsg_dump_get_line_nolock - retrieve one kmsg log line ( unlocked version )
2012-06-15 16:07:51 +04:00
* @ dumper : registered kmsg dumper
* @ syslog : include the " <4> " prefixes
* @ line : buffer to copy the line to
* @ size : maximum size of the buffer
* @ len : length of line placed into buffer
*
* Start at the beginning of the kmsg buffer , with the oldest kmsg
* record , and copy one record into the provided buffer .
*
* Consecutive calls will return the next available record moving
* towards the end of the buffer with the youngest messages .
*
* A return value of FALSE indicates that there are no more records to
* read .
2012-07-21 04:28:07 +04:00
*
* The function is similar to kmsg_dump_get_line ( ) , but grabs no locks .
2012-06-15 16:07:51 +04:00
*/
2012-07-21 04:28:07 +04:00
bool kmsg_dump_get_line_nolock ( struct kmsg_dumper * dumper , bool syslog ,
char * line , size_t size , size_t * len )
2012-06-15 16:07:51 +04:00
{
2013-08-01 00:53:47 +04:00
struct printk_log * msg ;
2012-06-15 16:07:51 +04:00
size_t l = 0 ;
bool ret = false ;
if ( ! dumper - > active )
goto out ;
2012-05-03 04:29:13 +04:00
2012-06-15 16:07:51 +04:00
if ( dumper - > cur_seq < log_first_seq ) {
/* messages are gone, move to first available one */
dumper - > cur_seq = log_first_seq ;
dumper - > cur_idx = log_first_idx ;
}
2009-10-16 16:09:18 +04:00
2012-06-15 16:07:51 +04:00
/* last entry */
2012-07-21 04:28:07 +04:00
if ( dumper - > cur_seq > = log_next_seq )
2012-06-15 16:07:51 +04:00
goto out ;
2009-10-16 16:09:18 +04:00
2012-06-15 16:07:51 +04:00
msg = log_from_idx ( dumper - > cur_idx ) ;
2016-10-25 21:27:31 +03:00
l = msg_print_text ( msg , syslog , line , size ) ;
2012-06-15 16:07:51 +04:00
dumper - > cur_idx = log_next ( dumper - > cur_idx ) ;
dumper - > cur_seq + + ;
ret = true ;
out :
if ( len )
* len = l ;
return ret ;
}
2012-07-21 04:28:07 +04:00
/**
* kmsg_dump_get_line - retrieve one kmsg log line
* @ dumper : registered kmsg dumper
* @ syslog : include the " <4> " prefixes
* @ line : buffer to copy the line to
* @ size : maximum size of the buffer
* @ len : length of line placed into buffer
*
* Start at the beginning of the kmsg buffer , with the oldest kmsg
* record , and copy one record into the provided buffer .
*
* Consecutive calls will return the next available record moving
* towards the end of the buffer with the youngest messages .
*
* A return value of FALSE indicates that there are no more records to
* read .
*/
bool kmsg_dump_get_line ( struct kmsg_dumper * dumper , bool syslog ,
char * line , size_t size , size_t * len )
{
unsigned long flags ;
bool ret ;
2016-12-27 17:16:11 +03:00
logbuf_lock_irqsave ( flags ) ;
2012-07-21 04:28:07 +04:00
ret = kmsg_dump_get_line_nolock ( dumper , syslog , line , size , len ) ;
2016-12-27 17:16:11 +03:00
logbuf_unlock_irqrestore ( flags ) ;
2012-07-21 04:28:07 +04:00
return ret ;
}
2012-06-15 16:07:51 +04:00
EXPORT_SYMBOL_GPL ( kmsg_dump_get_line ) ;
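kmsg_dump_get_line_nolock() treats (cur_seq, cur_idx) as an iterator. If the oldest records it has not yet read were overwritten, it jumps forward to log_first_seq. It returns false once it reaches log_next_seq. Below is a simplified model that tracks sequence numbers only and formats a placeholder line per record. It assumes every seq in [first_seq, next_seq) is still stored. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct log_model {
    uint64_t first_seq;   /* oldest record still in the buffer */
    uint64_t next_seq;    /* seq the next new record will get */
};

struct iter { uint64_t cur_seq; };

/* Copies "msg <seq>" for the next record into line; false at the end. */
static bool get_line(const struct log_model *log, struct iter *it,
                     char *line, size_t size, size_t *len)
{
    size_t l = 0;
    bool ret = false;

    assert(size > 0);
    if (it->cur_seq < log->first_seq)
        it->cur_seq = log->first_seq;   /* messages are gone, skip ahead */
    if (it->cur_seq < log->next_seq) {
        snprintf(line, size, "msg %llu", (unsigned long long)it->cur_seq);
        l = strlen(line);
        it->cur_seq++;
        ret = true;
    }
    if (len)
        *len = l;
    return ret;
}
```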
/**
 * kmsg_dump_get_buffer - copy kmsg log lines
 * @dumper: registered kmsg dumper
 * @syslog: include the "<4>" prefixes
2012-07-01 02:37:24 +04:00
 * @buf: buffer to copy the lines to
2012-06-15 16:07:51 +04:00
 * @size: maximum size of the buffer
 * @len: length of the data placed into the buffer
 *
 * Start at the end of the kmsg buffer and fill the provided buffer
 * with as many of the *youngest* kmsg records as fit into it.
 * If the buffer is large enough, all available kmsg records will be
 * copied with a single call.
 *
 * Consecutive calls will fill the buffer with the next block of
 * available older records, not including the earlier retrieved ones.
 *
 * A return value of FALSE indicates that there are no more records to
 * read.
 */
bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
			  char *buf, size_t size, size_t *len)
{
	unsigned long flags;
	u64 seq;
	u32 idx;
	u64 next_seq;
	u32 next_idx;
	size_t l = 0;
	bool ret = false;
	if (!dumper->active)
		goto out;
2016-12-27 17:16:11 +03:00
	logbuf_lock_irqsave(flags);
2012-06-15 16:07:51 +04:00
	if (dumper->cur_seq < log_first_seq) {
		/* messages are gone, move to first available one */
		dumper->cur_seq = log_first_seq;
		dumper->cur_idx = log_first_idx;
	}
	/* last entry */
	if (dumper->cur_seq >= dumper->next_seq) {
2016-12-27 17:16:11 +03:00
		logbuf_unlock_irqrestore(flags);
2012-06-15 16:07:51 +04:00
		goto out;
	}
	/* calculate length of entire buffer */
	seq = dumper->cur_seq;
	idx = dumper->cur_idx;
	while (seq < dumper->next_seq) {
2013-08-01 00:53:47 +04:00
		struct printk_log *msg = log_from_idx(idx);
2012-06-15 16:07:51 +04:00
2016-10-25 21:27:31 +03:00
		l += msg_print_text(msg, true, NULL, 0);
2012-06-15 16:07:51 +04:00
		idx = log_next(idx);
		seq++;
	}
	/* move first record forward until length fits into the buffer */
	seq = dumper->cur_seq;
	idx = dumper->cur_idx;
	while (l > size && seq < dumper->next_seq) {
2013-08-01 00:53:47 +04:00
		struct printk_log *msg = log_from_idx(idx);
2009-10-16 16:09:18 +04:00
2016-10-25 21:27:31 +03:00
		l -= msg_print_text(msg, true, NULL, 0);
2012-06-15 16:07:51 +04:00
		idx = log_next(idx);
		seq++;
2009-10-16 16:09:18 +04:00
	}
2012-06-15 16:07:51 +04:00
	/* last message in next iteration */
	next_seq = seq;
	next_idx = idx;
	l = 0;
	while (seq < dumper->next_seq) {
2013-08-01 00:53:47 +04:00
		struct printk_log *msg = log_from_idx(idx);
2012-06-15 16:07:51 +04:00
2016-10-25 21:27:31 +03:00
		l += msg_print_text(msg, syslog, buf + l, size - l);
2012-06-15 16:07:51 +04:00
		idx = log_next(idx);
		seq++;
	}
	dumper->next_seq = next_seq;
	dumper->next_idx = next_idx;
	ret = true;
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irqrestore(flags);
2012-06-15 16:07:51 +04:00
out:
	if (len)
		*len = l;
	return ret;
}
EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
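The three passes in kmsg_dump_get_buffer() (measure everything pending, drop the oldest records until the rest fits, copy) and the way next_seq is lowered so that the following call returns the next-older block can be sketched over a plain array of strings. The names struct buf_iter, rec_len() and get_buffer() are hypothetical; each record is assumed to render as its text plus a newline, standing in for msg_print_text(), and no syslog prefix, ring indices or locking are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state over records[0..n), oldest first. */
struct buf_iter {
	size_t cur;	/* oldest record still pending (plays the role of cur_seq) */
	size_t end;	/* one past the youngest pending record (next_seq) */
};

/* Rendered length of one record: its text plus a trailing newline. */
static size_t rec_len(const char *rec)
{
	return strlen(rec) + 1;
}

/*
 * Fill buf with the youngest pending records that fit into size bytes.
 * buf is not NUL-terminated; *len gives the byte count.
 * Assumes size is at least the longest rendered record; otherwise pass 2
 * drops everything, an empty block is returned and it->end does not move.
 */
static bool get_buffer(const char *const *records, struct buf_iter *it,
		       char *buf, size_t size, size_t *len)
{
	size_t seq, next, l = 0;
	bool ret = false;

	if (it->cur >= it->end)		/* nothing left to return */
		goto out;

	/* pass 1: total length of everything still pending */
	for (seq = it->cur; seq < it->end; seq++)
		l += rec_len(records[seq]);

	/* pass 2: skip the oldest records until the remainder fits */
	for (seq = it->cur; l > size && seq < it->end; seq++)
		l -= rec_len(records[seq]);

	/* the skipped records are what the next call will return */
	next = seq;

	/* pass 3: copy the selected records, oldest of them first */
	l = 0;
	for (; seq < it->end; seq++) {
		size_t n = strlen(records[seq]);

		memcpy(buf + l, records[seq], n);
		buf[l + n] = '\n';
		l += n + 1;
	}

	it->end = next;
	ret = true;
out:
	if (len)
		*len = l;
	return ret;
}
```

With records "a", "bb", "ccc" and "dddd" and a 9-byte buffer, the first call yields "ccc\ndddd\n", the second yields "a\nbb\n", and the third returns false, which is the youngest-block-first order the kernel-doc comment describes.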
2009-10-16 16:09:18 +04:00
2012-07-21 04:28:07 +04:00
/**
 * kmsg_dump_rewind_nolock - reset the iterator (unlocked version)
 * @dumper: registered kmsg dumper
 *
 * Reset the dumper's iterator so that kmsg_dump_get_line() and
 * kmsg_dump_get_buffer() can be called again and used multiple
 * times within the same dumper.dump() callback.
 *
 * The function is similar to kmsg_dump_rewind(), but grabs no locks.
 */
void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
	dumper->cur_seq = clear_seq;
	dumper->cur_idx = clear_idx;
	dumper->next_seq = log_next_seq;
	dumper->next_idx = log_next_idx;
}
2012-06-15 16:07:51 +04:00
/**
 * kmsg_dump_rewind - reset the iterator
 * @dumper: registered kmsg dumper
 *
 * Reset the dumper's iterator so that kmsg_dump_get_line() and
 * kmsg_dump_get_buffer() can be called again and used multiple
 * times within the same dumper.dump() callback.
 */
void kmsg_dump_rewind(struct kmsg_dumper *dumper)
{
	unsigned long flags;
2016-12-27 17:16:11 +03:00
	logbuf_lock_irqsave(flags);
2012-07-21 04:28:07 +04:00
	kmsg_dump_rewind_nolock(dumper);
2016-12-27 17:16:11 +03:00
	logbuf_unlock_irqrestore(flags);
2009-10-16 16:09:18 +04:00
}
2012-06-15 16:07:51 +04:00
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
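The two cursors that kmsg_dump_rewind_nolock() resets move in opposite directions: line reads advance cur_seq from the oldest record, while kmsg_dump_get_buffer() lowers next_seq from the youngest. The minimal sketch below, with hypothetical struct and function names and without the *_idx fields, shows that once the two cursors meet nothing is pending, and that a rewind restores the full window from clear_seq up to log_next_seq.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a dumper's two cursors (the *_idx fields are omitted). */
struct dumper_cursor {
	uint64_t cur_seq;	/* advanced by line reads, oldest record first */
	uint64_t next_seq;	/* lowered by block reads, youngest record first */
};

/* Hypothetical snapshot of the global log bounds. */
struct log_bounds {
	uint64_t clear_seq;	/* first record after the last clear */
	uint64_t next_seq;	/* sequence number the next record will get */
};

/* Number of records this dumper has not returned yet. */
static uint64_t cursor_pending(const struct dumper_cursor *d)
{
	return d->cur_seq < d->next_seq ? d->next_seq - d->cur_seq : 0;
}

/* The seq assignments of kmsg_dump_rewind_nolock(), without the idx ones. */
static void cursor_rewind(struct dumper_cursor *d, const struct log_bounds *log)
{
	d->cur_seq = log->clear_seq;
	d->next_seq = log->next_seq;
}
```

For example, with clear_seq at 10 and log_next_seq at 15, two line reads plus one three-record block exhaust the window, and a rewind makes all five records available again.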
2013-05-01 02:27:12 +04:00
2013-05-01 02:27:15 +04:00
static char dump_stack_arch_desc_str[128];
/**
 * dump_stack_set_arch_desc - set arch-specific str to show with task dumps
 * @fmt: printf-style format string
 * @...: arguments for the format string
 *
 * The configured string will be printed right after utsname during task
 * dumps. Usually used to add arch-specific system identifiers. If an
 * arch wants to make use of such an ID string, it should initialize this
 * as soon as possible during boot.
 */
void __init dump_stack_set_arch_desc(const char *fmt, ...)
{
	va_list args;
	va_start(args, fmt);
	vsnprintf(dump_stack_arch_desc_str, sizeof(dump_stack_arch_desc_str),
		  fmt, args);
	va_end(args);
}
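Because dump_stack_set_arch_desc() formats into the fixed 128-byte dump_stack_arch_desc_str with vsnprintf(), an over-long description is truncated rather than overflowing the buffer. The userspace sketch below uses the C library's vsnprintf() and assumes the kernel's implementation truncates and NUL-terminates the same way; arch_desc and set_arch_desc() are hypothetical names, and the sample input simply follows the shape of a "Hardware name:" line.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char arch_desc[128];	/* hypothetical stand-in for dump_stack_arch_desc_str */

static void set_arch_desc(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	/* Writes at most sizeof(arch_desc) bytes, including the terminating NUL. */
	vsnprintf(arch_desc, sizeof(arch_desc), fmt, args);
	va_end(args);
}
```

An input longer than the buffer ends up as its first 127 characters followed by the terminating NUL.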
2013-05-01 02:27:12 +04:00
/**
 * dump_stack_print_info - print generic debug info for dump_stack()
 * @log_lvl: log level
 *
 * Arch-specific dump_stack() implementations can use this function to
 * print out the same debug information as the generic dump_stack().
 */
void dump_stack_print_info(const char *log_lvl)
{
	printk("%sCPU: %d PID: %d Comm: %.20s %s %s %.*s\n",
	       log_lvl, raw_smp_processor_id(), current->pid, current->comm,
	       print_tainted(), init_utsname()->release,
	       (int)strcspn(init_utsname()->version, " "),
	       init_utsname()->version);
2013-05-01 02:27:15 +04:00
	if (dump_stack_arch_desc_str[0] != '\0')
		printk("%sHardware name: %s\n",
		       log_lvl, dump_stack_arch_desc_str);
2013-05-01 02:27:22 +04:00
	print_worker_info(log_lvl, current);
2013-05-01 02:27:12 +04:00
}
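The first line printed by dump_stack_print_info() relies on two printf precision features: %.20s caps the command name at 20 characters, and %.*s combined with strcspn(version, " ") prints the version string only up to its first space. A userspace sketch using snprintf() with caller-supplied values in place of raw_smp_processor_id(), current and init_utsname(); format_header() and the sample version strings are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same format string as dump_stack_print_info(); every field is passed in. */
static int format_header(char *out, size_t size, const char *lvl, int cpu,
			 int pid, const char *comm, const char *tainted,
			 const char *release, const char *version)
{
	return snprintf(out, size, "%sCPU: %d PID: %d Comm: %.20s %s %s %.*s\n",
			lvl, cpu, pid, comm, tainted, release,
			(int)strcspn(version, " "), version);
}
```

Given CPU 0, PID 1, comm "swapper/0", "Not tainted", release "3.9.0-rc1-work+" and a version string starting with "#7 ", this yields the same "CPU:" line as the example BUG() dump in the commit message that follows.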
dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms. This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.
show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.
* Archs which didn't print debug info now do.
alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
um, xtensa
* Already prints debug info. Replaced with show_regs_print_info().
The printed information is a superset of what used to be there.
arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
* s390 is special in that it used to print arch-specific information
along with generic debug info. Heiko and Martin think that the
arch-specific extra isn't worth keeping s390 specific implementation.
Converted to use the generic version.
Note that now all archs print the debug info before actual register
dumps.
An example BUG() dump follows.
kernel BUG at /work/os/work/kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
v2: Typo fix in x86-32.
v3: CPU number dropped from show_regs_print_info() as
dump_stack_print_info() has been updated to print it. s390
specific implementation dropped as requested by s390 maintainers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 02:27:17 +04:00
/**
 * show_regs_print_info - print generic debug info for show_regs()
 * @log_lvl: log level
 *
 * show_regs() implementations can use this function to print out generic
 * debug information.
 */
void show_regs_print_info(const char *log_lvl)
{
	dump_stack_print_info(log_lvl);
}
2008-02-08 15:21:25 +03:00
#endif