2005-04-17 02:20:36 +04:00
/*
* linux / kernel / printk . c
*
* Copyright ( C ) 1991 , 1992 Linus Torvalds
*
* Modified to make sys_syslog ( ) more flexible : added commands to
* return the last 4 k of kernel messages , regardless of whether
* they ' ve been read or not . Added option to suppress kernel printk ' s
* to the console . Added hook for sending the console messages
* elsewhere , in preparation for a serial line console ( someday ) .
* Ted Ts ' o , 2 / 11 / 93.
* Modified for sysctl support , 1 / 8 / 97 , Chris Horn .
2005-10-31 02:02:46 +03:00
* Fixed SMP synchronization , 08 / 08 / 99 , Manfred Spraul
2006-01-15 04:43:54 +03:00
* manfred @ colorfullife . com
2005-04-17 02:20:36 +04:00
* Rewrote bits to get rid of console_lock
2008-10-16 09:01:59 +04:00
* 01 Mar01 Andrew Morton
2005-04-17 02:20:36 +04:00
*/
# include <linux/kernel.h>
# include <linux/mm.h>
# include <linux/tty.h>
# include <linux/tty_driver.h>
# include <linux/console.h>
# include <linux/init.h>
2007-10-16 12:23:46 +04:00
# include <linux/jiffies.h>
# include <linux/nmi.h>
2005-04-17 02:20:36 +04:00
# include <linux/module.h>
2006-06-25 16:48:15 +04:00
# include <linux/moduleparam.h>
2005-04-17 02:20:36 +04:00
# include <linux/delay.h>
# include <linux/smp.h>
# include <linux/security.h>
# include <linux/bootmem.h>
2011-05-25 04:13:20 +04:00
# include <linux/memblock.h>
2005-04-17 02:20:36 +04:00
# include <linux/syscalls.h>
2009-04-03 03:58:57 +04:00
# include <linux/kexec.h>
2010-05-21 06:04:27 +04:00
# include <linux/kdb.h>
2009-09-22 18:18:09 +04:00
# include <linux/ratelimit.h>
2009-10-16 16:09:18 +04:00
# include <linux/kmsg_dump.h>
2010-02-04 02:36:43 +03:00
# include <linux/syslog.h>
2010-06-04 09:11:25 +04:00
# include <linux/cpu.h>
# include <linux/notifier.h>
2011-01-13 03:59:43 +03:00
# include <linux/rculist.h>
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get overwritten faster than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
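The commit message above says that the exported sequence numbers let a log consumer work out how many messages it lost. A minimal sketch of that arithmetic follows, assuming that sequence numbers grow by exactly one per record; the helper name `kmsg_lost_records` is hypothetical and is not part of the kernel or of any library.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of records a reader missed between the last
 * sequence number it saw and the sequence number of the record it just read.
 * Assumes sequence numbers increase by exactly one per stored record.
 */
static uint64_t kmsg_lost_records(uint64_t last_seq, uint64_t cur_seq)
{
	/* Contiguous record, or a reread of an older one: nothing was lost. */
	if (cur_seq <= last_seq + 1)
		return 0;

	return cur_seq - last_seq - 1;
}
```

A reader would typically call this after a read() that returned -EPIPE, once the following read() has delivered the next available record.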
2012-05-03 04:29:41 +04:00
# include <linux/poll.h>
2012-10-12 20:00:23 +04:00
# include <linux/irq_work.h>
2013-05-01 02:27:12 +04:00
# include <linux/utsname.h>
2014-08-07 03:09:08 +04:00
# include <linux/ctype.h>
2015-02-22 19:58:50 +03:00
# include <linux/uio.h>
2005-04-17 02:20:36 +04:00
# include <asm/uaccess.h>
2016-01-16 03:58:21 +03:00
# include <asm-generic/sections.h>
2005-04-17 02:20:36 +04:00
2011-11-24 23:03:08 +04:00
# define CREATE_TRACE_POINTS
# include <trace/events/printk.h>
2013-08-01 00:53:44 +04:00
# include "console_cmdline.h"
2013-08-01 00:53:45 +04:00
# include "braille.h"
2016-05-21 03:00:33 +03:00
# include "internal.h"
2013-08-01 00:53:44 +04:00
2005-04-17 02:20:36 +04:00
int console_printk [ 4 ] = {
2014-06-05 03:11:46 +04:00
CONSOLE_LOGLEVEL_DEFAULT , /* console_loglevel */
2014-08-07 03:09:01 +04:00
MESSAGE_LOGLEVEL_DEFAULT , /* default_message_loglevel */
2014-06-05 03:11:46 +04:00
CONSOLE_LOGLEVEL_MIN , /* minimum_console_loglevel */
CONSOLE_LOGLEVEL_DEFAULT , /* default_console_loglevel */
2005-04-17 02:20:36 +04:00
} ;
/*
2007-02-17 22:10:16 +03:00
* Low level drivers may need that to know if they can schedule in
2005-04-17 02:20:36 +04:00
* their unblank ( ) callback or not . So let ' s export it .
*/
int oops_in_progress ;
EXPORT_SYMBOL ( oops_in_progress ) ;
/*
* console_sem protects the console_drivers list , and also
* provides serialisation for access to the entire console
* driver system .
*/
2010-09-07 18:33:43 +04:00
static DEFINE_SEMAPHORE ( console_sem ) ;
2005-04-17 02:20:36 +04:00
struct console * console_drivers ;
2008-06-02 15:19:08 +04:00
EXPORT_SYMBOL_GPL ( console_drivers ) ;
console: implement lockdep support for console_lock
Dave Airlie recently discovered a locking bug in the fbcon layer,
where a timer_del_sync (for the blinking cursor) deadlocks with the
timer itself, since both (want to) hold the console_lock:
https://lkml.org/lkml/2012/8/21/36
Unfortunately the console_lock isn't a plain mutex and hence has no
lockdep support. Which resulted in a few days wasted of tracking down
this bug (complicated by the fact that printk doesn't show anything
when the console is locked) instead of noticing the bug much earlier
with the lockdep splat.
Hence I've figured I need to fix that for the next deadlock involving
console_lock - and with kms/drm growing ever more complex locking
that'll eventually happen.
Now the console_lock has rather funky semantics, so after a quick irc
discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
the original idea of switching to a real mutex (since it won't work)
and instead opted to annotate the console_lock with lockdep
information manually.
There are a few special cases:
- The console_lock state is protected by the console_sem, and usually
grabbed/dropped at _lock/_unlock time. But the suspend/resume code
drops the semaphore without dropping the console_lock (see
suspend_console/resume_console). But since the same thread that did
the suspend will do the resume, we don't need to fix up anything.
- In the printk code there's a special trylock, only used to kick off
the logbuffer printk'ing in console_unlock. But all that happens
while lockdep is disabled (since printk does a few other evil
tricks). So no issue there, either.
- The console_lock can also be acquired from irq context (but only
with a trylock). lockdep already handles that.
This all leaves us with annotating the normal console_lock, _unlock
and _trylock functions.
And yes, it works - simply unloading a drm kms driver resulted in
lockdep complaining about the deadlock in fbcon_deinit:
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc2+ #552 Not tainted
-------------------------------------------------------
kms-reload/3577 is trying to acquire lock:
((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
but task is already holding lock:
(console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (console_lock){+.+.+.}:
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81040190>] console_lock+0x59/0x5b
[<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
[<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
[<ffffffff810584a2>] worker_thread+0x1a7/0x24b
[<ffffffff8105ca29>] kthread+0x7f/0x87
[<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
-> #0 ((&info->queue)){+.+...}:
[<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81058cab>] wait_on_work+0x3b/0xa7
[<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
[<ffffffff81058e33>] cancel_work_sync+0xb/0xd
[<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
[<ffffffff81264793>] bind_con_driver+0x145/0x263
[<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
[<ffffffff8126540c>] store_bind+0x1ad/0x1c1
[<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
[<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
[<ffffffff811145b2>] vfs_write+0x9b/0xfd
[<ffffffff811147b7>] sys_write+0x3e/0x6b
[<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(console_lock);
lock((&info->queue));
lock(console_lock);
lock((&info->queue));
*** DEADLOCK ***
v2: Mark the lockdep_map static, noticed by Jani Nikula.
Cc: Dave Airlie <airlied@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-22 21:52:11 +04:00
# ifdef CONFIG_LOCKDEP
static struct lockdep_map console_lock_dep_map = {
. name = " console_lock "
} ;
# endif
2015-06-26 01:01:30 +03:00
/*
* Number of registered extended console drivers .
*
* If extended consoles are present , in - kernel cont reassembly is disabled
* and each fragment is stored as a separate log entry with proper
* continuation flag so that every emitted message has full metadata . This
* doesn ' t change the result for regular consoles or / proc / kmsg . For
* / dev / kmsg , as long as the reader concatenates messages according to
* consecutive continuation flags , the end result should be the same too .
*/
static int nr_ext_console_drivers ;
2014-06-05 03:11:36 +04:00
/*
* Helper macros to handle lockdep when locking / unlocking console_sem . We use
* macros instead of functions so that _RET_IP_ contains useful information .
*/
# define down_console_sem() do { \
down ( & console_sem ) ; \
mutex_acquire ( & console_lock_dep_map , 0 , 0 , _RET_IP_ ) ; \
} while ( 0 )
static int __down_trylock_console_sem ( unsigned long ip )
{
if ( down_trylock ( & console_sem ) )
return 1 ;
mutex_acquire ( & console_lock_dep_map , 0 , 1 , ip ) ;
return 0 ;
}
# define down_trylock_console_sem() __down_trylock_console_sem(_RET_IP_)
# define up_console_sem() do { \
mutex_release ( & console_lock_dep_map , 1 , _RET_IP_ ) ; \
up ( & console_sem ) ; \
} while ( 0 )
2005-04-17 02:20:36 +04:00
/*
* This is used for debugging the mess that is the VT code by
* keeping track if we have the console semaphore held . It ' s
* definitely not the perfect debug tool ( we don ' t know if _WE_
2014-08-07 03:09:03 +04:00
* hold it and are racing , but it helps tracking those weird code
* paths in the console code where we end up in places I want
* locked without the console semaphore held ) .
2005-04-17 02:20:36 +04:00
*/
2006-06-20 05:16:01 +04:00
static int console_locked , console_suspended ;
2005-04-17 02:20:36 +04:00
2011-03-23 02:34:21 +03:00
/*
* If exclusive_console is non - NULL then only this console is to be printed to .
*/
static struct console * exclusive_console ;
2005-04-17 02:20:36 +04:00
/*
* Array of consoles built from command line options ( console = )
*/
# define MAX_CMDLINECONSOLES 8
static struct console_cmdline console_cmdline [ MAX_CMDLINECONSOLES ] ;
2013-08-01 00:53:44 +04:00
2005-04-17 02:20:36 +04:00
static int selected_console = - 1 ;
static int preferred_console = - 1 ;
xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long as tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc117dff26058fbac316b8acfc2cb5f7
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 02:31:07 +04:00
int console_set_on_cmdline ;
EXPORT_SYMBOL ( console_set_on_cmdline ) ;
2005-04-17 02:20:36 +04:00
/* Flag: console code may call schedule() */
static int console_may_schedule ;
2012-05-03 04:29:13 +04:00
/*
* The printk log buffer consists of a chain of concatenated variable
* length records . Every record starts with a record header , containing
* the overall length of the record .
*
* The heads to the first and last entry in the buffer , as well as the
2014-08-07 03:09:03 +04:00
* sequence numbers of these entries are maintained when messages are
* stored .
2012-05-03 04:29:13 +04:00
*
* If the heads indicate available messages , the length in the header
* tells the start of the next message . A length = = 0 for the next message
* indicates a wrap - around to the beginning of the buffer .
*
* Every record carries the monotonic timestamp in nanoseconds , as well as
* the standard userspace syslog level and syslog facility . The usual
* kernel messages use LOG_KERN ; userspace - injected messages always carry
* a matching syslog facility , by default LOG_USER . The origin of every
* message can be reliably determined that way .
*
* The human readable log message directly follows the message header . The
* length of the message text is stored in the header , the stored message
* is not terminated .
*
2012-05-03 04:29:41 +04:00
* Optionally , a message can carry a dictionary of properties ( key / value pairs ) ,
* to provide userspace with a machine - readable message context .
*
* Examples for well - defined , commonly used property names are :
*   DEVICE=b12:8               device identifier
*                                b12:8         block dev_t
*                                c127:3        char dev_t
*                                n8            netdev ifindex
*                                +sound:card0  subsystem:devname
*   SUBSYSTEM=pci              driver-core subsystem name
*
* Valid characters in property names are [a-zA-Z0-9.-_] . The plain text value
* follows directly after a '=' character . Consecutive properties are separated
* by a '\0' character ; the last property is not terminated .
*
* Example of a message structure :
*   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
*   0008  34 00                        record is 52 bytes long
*   000a        0b 00                  text is 11 bytes long
*   000c              1f 00            dictionary is 23 bytes long
*   000e                    03 00      LOG_KERN ( facility ) LOG_ERR ( level )
*   0010  69 74 27 73 20 61 20 6c      "it's a l"
*         69 6e 65                     "ine"
*   001b           44 45 56 49 43      "DEVIC"
*         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
*         52 49 56 45 52 3d 62 75      "RIVER=bu"
*         67                           "g"
*   0032     00 00 00                  padding to next message header
*
2013-08-01 00:53:47 +04:00
* The ' struct printk_log ' buffer header must never be directly exported to
2012-05-03 04:29:41 +04:00
* userspace , it is a kernel - private implementation detail that might
* need to be changed in the future , when the requirements change .
*
* / dev / kmsg exports the structured data in the following line format :
2015-07-01 00:59:03 +03:00
* " <level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text> \n "
*
* Users of the export format should ignore possible additional values
* separated by ' , ' , and find the message after the ' ; ' character .
2012-05-03 04:29:41 +04:00
*
* The optional key / value pairs are attached as continuation lines starting
* with a space character and terminated by a newline . All possible
* non - printable characters are escaped in the " \xff " notation .
2012-05-03 04:29:13 +04:00
*/
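The export format described in the comment above can be consumed from userspace roughly as follows. This is only a sketch under stated assumptions: `struct kmsg_line` and `kmsg_parse_line` are hypothetical names, the first field is treated as a plain unsigned number, and every additional comma-separated value before the ';' is skipped, as the comment recommends.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of one exported line. */
struct kmsg_line {
	unsigned int prio;	/* first field of the line */
	uint64_t seq;		/* sequence number */
	uint64_t timestamp;	/* timestamp field, unit as exported */
	const char *text;	/* points into the input, right after ';' */
};

/* Returns 0 on success, -1 if the line does not match the format. */
static int kmsg_parse_line(const char *line, struct kmsg_line *out)
{
	const char *semi = strchr(line, ';');
	char *end;

	if (!semi)
		return -1;

	out->prio = (unsigned int)strtoul(line, &end, 10);
	if (end == line || *end != ',')
		return -1;

	line = end + 1;
	out->seq = strtoull(line, &end, 10);
	if (end == line || *end != ',')
		return -1;

	line = end + 1;
	out->timestamp = strtoull(line, &end, 10);
	if (end == line || (*end != ',' && *end != ';'))
		return -1;

	/* Any further values between the timestamp and ';' are ignored. */
	out->text = semi + 1;
	return 0;
}
```

Because the parser looks for the first ';' and ignores unknown fields before it, it accepts both lines with and without a continuation flag.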
2012-06-28 11:38:53 +04:00
enum log_flags {
2012-07-09 23:15:42 +04:00
LOG_NOCONS = 1 , /* already flushed, do not print to console */
LOG_NEWLINE = 2 , /* text ended with a newline */
LOG_PREFIX = 4 , /* text started with a prefix */
LOG_CONT = 8 , /* text is a fragment of a continuation line */
2012-06-28 11:38:53 +04:00
} ;
2013-08-01 00:53:47 +04:00
struct printk_log {
2012-05-03 04:29:13 +04:00
u64 ts_nsec ; /* timestamp in nanoseconds */
u16 len ; /* length of entire record */
u16 text_len ; /* length of text buffer */
u16 dict_len ; /* length of dictionary buffer */
2012-06-28 11:38:53 +04:00
u8 facility ; /* syslog facility */
u8 flags : 5 ; /* internal record flags */
u8 level : 3 ; /* syslog level */
2016-01-21 02:00:48 +03:00
}
# ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
__packed __aligned ( 4 )
# endif
;
2012-05-03 04:29:13 +04:00
/*
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
* The logbuf_lock protects kmsg buffer , indices , counters . This can be taken
* within the scheduler ' s rq lock . It must be released before calling
* console_unlock ( ) or anything else that might wake up a process .
2012-05-03 04:29:13 +04:00
*/
2016-05-21 03:00:42 +03:00
DEFINE_RAW_SPINLOCK ( logbuf_lock ) ;
2005-05-01 19:59:02 +04:00
2012-07-17 05:35:29 +04:00
# ifdef CONFIG_PRINTK
2013-03-23 02:04:39 +04:00
DECLARE_WAIT_QUEUE_HEAD ( log_wait ) ;
2012-05-09 03:37:51 +04:00
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq ;
static u32 syslog_idx ;
2012-07-09 23:15:42 +04:00
static enum log_flags syslog_prev ;
2012-07-09 21:05:10 +04:00
static size_t syslog_partial ;
2012-05-03 04:29:13 +04:00
/* index and sequence number of the first record stored in the buffer */
static u64 log_first_seq ;
static u32 log_first_idx ;
/* index and sequence number of the next record to store in the buffer */
static u64 log_next_seq ;
static u32 log_next_idx ;
2012-07-17 05:35:30 +04:00
/* the next printk record to write to the console */
static u64 console_seq ;
static u32 console_idx ;
static enum log_flags console_prev ;
2012-05-03 04:29:13 +04:00
/* the next printk record to read after the last 'clear' command */
static u64 clear_seq ;
static u32 clear_idx ;
2012-07-17 05:35:29 +04:00
# define PREFIX_MAX 32
2014-08-07 03:09:08 +04:00
# define LOG_LINE_MAX (1024 - PREFIX_MAX)
2012-05-09 03:37:51 +04:00
2015-11-07 03:30:38 +03:00
# define LOG_LEVEL(v) ((v) & 0x07)
# define LOG_FACILITY(v) ((v) >> 3 & 0xff)
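As an illustration of the two macros above, the following sketch splits a combined syslog prefix value into facility and level. The macros are repeated so the fragment compiles on its own; `format_prio` is a hypothetical helper introduced only for this example.

```c
#include <stdio.h>

/* Same definitions as above, repeated so the sketch is self-contained. */
#define LOG_LEVEL(v)    ((v) & 0x07)
#define LOG_FACILITY(v) ((v) >> 3 & 0xff)

/*
 * Hypothetical helper: write "facility.level" for a prefix value into buf.
 * Returns what snprintf() returns.
 */
static int format_prio(char *buf, size_t len, unsigned int prio)
{
	return snprintf(buf, len, "%u.%u", LOG_FACILITY(prio), LOG_LEVEL(prio));
}
```

For the prefix value 30 this yields facility 3 and level 6; a plain kernel message with value 6 has facility 0.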
2012-05-09 03:37:51 +04:00
/* record buffer */
2013-08-01 00:53:47 +04:00
# define LOG_ALIGN __alignof__(struct printk_log)
2012-05-09 03:37:51 +04:00
# define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
2012-05-11 02:14:33 +04:00
static char __log_buf [ __LOG_BUF_LEN ] __aligned ( LOG_ALIGN ) ;
2012-05-09 03:37:51 +04:00
static char * log_buf = __log_buf ;
static u32 log_buf_len = __LOG_BUF_LEN ;
2014-08-09 09:45:30 +04:00
/* Return log buffer address */
char * log_buf_addr_get ( void )
{
return log_buf ;
}
/* Return log buffer size */
u32 log_buf_len_get ( void )
{
return log_buf_len ;
}
2012-05-03 04:29:13 +04:00
/* human readable text of the record */
2013-08-01 00:53:47 +04:00
static char * log_text ( const struct printk_log * msg )
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
return ( char * ) msg + sizeof ( struct printk_log ) ;
2012-05-03 04:29:13 +04:00
}
/* optional key/value pair dictionary attached to the record */
2013-08-01 00:53:47 +04:00
static char * log_dict ( const struct printk_log * msg )
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
return ( char * ) msg + sizeof ( struct printk_log ) + msg - > text_len ;
2012-05-03 04:29:13 +04:00
}
/* get record by index; idx must point to valid msg */
2013-08-01 00:53:47 +04:00
static struct printk_log * log_from_idx ( u32 idx )
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
struct printk_log * msg = ( struct printk_log * ) ( log_buf + idx ) ;
2012-05-03 04:29:13 +04:00
/*
* A length = = 0 record is the end of buffer marker . Wrap around and
* read the message at the start of the buffer .
*/
if ( ! msg - > len )
2013-08-01 00:53:47 +04:00
return ( struct printk_log * ) log_buf ;
2012-05-03 04:29:13 +04:00
return msg ;
}
/* get next record; idx must point to valid msg */
static u32 log_next ( u32 idx )
{
2013-08-01 00:53:47 +04:00
struct printk_log * msg = ( struct printk_log * ) ( log_buf + idx ) ;
2012-05-03 04:29:13 +04:00
/* length == 0 indicates the end of the buffer; wrap */
/*
* A length = = 0 record is the end of buffer marker . Wrap around and
* read the message at the start of the buffer as * this * one , and
* return the one after that .
*/
if ( ! msg - > len ) {
2013-08-01 00:53:47 +04:00
msg = ( struct printk_log * ) log_buf ;
2012-05-03 04:29:13 +04:00
return msg - > len ;
}
return idx + msg - > len ;
}
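The wrap-around rule used by log_from_idx() and log_next() can be modelled in userspace as shown here. This is a simplified sketch, not the kernel code: it assumes a record header that consists only of a 16-bit length at the start of each record, and the names `rec_len`, `rec_resolve` and `rec_next` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read the 16-bit record length stored at byte offset idx. */
static uint16_t rec_len(const char *buf, uint32_t idx)
{
	uint16_t len;

	memcpy(&len, buf + idx, sizeof(len));
	return len;
}

/*
 * Offset of the record that idx refers to. A length of 0 is the
 * end-of-buffer marker, so the real record sits at offset 0.
 */
static uint32_t rec_resolve(const char *buf, uint32_t idx)
{
	return rec_len(buf, idx) ? idx : 0;
}

/* Offset of the record following the one that idx refers to. */
static uint32_t rec_next(const char *buf, uint32_t idx)
{
	idx = rec_resolve(buf, idx);
	return idx + rec_len(buf, idx);
}
```

As in the functions above, an index that lands on the marker is treated as if it pointed at the first record in the buffer, and stepping past it returns the offset just behind that first record.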
2014-06-05 03:11:30 +04:00
/*
* Check whether there is enough free space for the given message .
*
* The same values of first_idx and next_idx mean that the buffer
* is either empty or full .
*
* If the buffer is empty , we must respect the position of the indexes .
* They cannot be reset to the beginning of the buffer .
*/
static int logbuf_has_space ( u32 msg_size , bool empty )
2014-06-05 03:11:28 +04:00
{
u32 free ;
2014-06-05 03:11:30 +04:00
if ( log_next_idx > log_first_idx | | empty )
2014-06-05 03:11:28 +04:00
free = max ( log_buf_len - log_next_idx , log_first_idx ) ;
else
free = log_first_idx - log_next_idx ;
/*
* We also need space for an empty header that signals wrapping
* of the buffer .
*/
return free > = msg_size + sizeof ( struct printk_log ) ;
}
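To make the free-space rule easier to experiment with, here is a standalone sketch of the same check with the buffer state passed in as parameters. `ring_has_space` and `HDR_SIZE` are hypothetical; the 16-byte header size is an assumption chosen to match the field widths of struct printk_log.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_SIZE 16u	/* assumed size of one record header */

/*
 * Mirror of the check above: when the write index is ahead of the read
 * index (or the buffer is empty), a record must fit contiguously either
 * behind next_idx or in front of first_idx. Otherwise only the gap
 * between the two indexes is usable.
 */
static bool ring_has_space(uint32_t buf_len, uint32_t first_idx,
			   uint32_t next_idx, uint32_t msg_size, bool empty)
{
	uint32_t avail;

	if (next_idx > first_idx || empty) {
		uint32_t tail = buf_len - next_idx;

		avail = tail > first_idx ? tail : first_idx;
	} else {
		avail = first_idx - next_idx;
	}

	/* Reserve room for the empty header that marks a wrap. */
	return avail >= msg_size + HDR_SIZE;
}
```

Note that equal indexes without the empty flag fall into the second branch and report no space, which is the "full" case described in the comment.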
2014-06-05 03:11:30 +04:00
static int log_make_free_space ( u32 msg_size )
2014-06-05 03:11:28 +04:00
{
2016-03-18 00:21:30 +03:00
while ( log_first_seq < log_next_seq & &
! logbuf_has_space ( msg_size , false ) ) {
2014-08-07 03:09:03 +04:00
/* drop old messages until we have enough contiguous space */
2014-06-05 03:11:28 +04:00
log_first_idx = log_next ( log_first_idx ) ;
log_first_seq + + ;
}
2014-06-05 03:11:30 +04:00
2016-03-18 00:21:30 +03:00
if ( clear_seq < log_first_seq ) {
clear_seq = log_first_seq ;
clear_idx = log_first_idx ;
}
2014-06-05 03:11:30 +04:00
/* sequence numbers are equal, so the log buffer is empty */
2016-03-18 00:21:30 +03:00
if ( logbuf_has_space ( msg_size , log_first_seq = = log_next_seq ) )
2014-06-05 03:11:30 +04:00
return 0 ;
return - ENOMEM ;
2014-06-05 03:11:28 +04:00
}
2014-06-05 03:11:31 +04:00
/* compute the message size including the padding bytes */
static u32 msg_used_size ( u16 text_len , u16 dict_len , u32 * pad_len )
{
u32 size ;
size = sizeof ( struct printk_log ) + text_len + dict_len ;
* pad_len = ( - size ) & ( LOG_ALIGN - 1 ) ;
size + = * pad_len ;
return size ;
}
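The expression `(-size) & (LOG_ALIGN - 1)` yields the number of bytes needed to round size up to the next multiple of a power-of-two alignment. A small sketch with the header size and alignment passed in explicitly follows; `record_size` is a hypothetical name, and the values in the note below assume a 16-byte header.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of the size computation: header plus
 * text plus dictionary, rounded up to a power-of-two alignment. The
 * number of padding bytes is returned through pad_len.
 */
static uint32_t record_size(uint32_t hdr_size, uint16_t text_len,
			    uint16_t dict_len, uint32_t align,
			    uint32_t *pad_len)
{
	uint32_t size = hdr_size + text_len + dict_len;

	/* Unsigned negation: distance to the next multiple of align. */
	*pad_len = (0u - size) & (align - 1);
	return size + *pad_len;
}
```

With a 16-byte header, 11 bytes of text, 23 bytes of dictionary and 4-byte alignment, the unpadded size is 50, the padding is 2 and the record length is 52. With 8-byte alignment the same record would take 56 bytes.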
2014-06-05 03:11:32 +04:00
/*
* Define how much of the log buffer we could take at maximum . The value
* must be greater than two . Note that only half of the buffer is available
* when the index points to the middle .
*/
# define MAX_LOG_TAKE_PART 4
static const char trunc_msg [ ] = " <truncated> " ;
static u32 truncate_msg ( u16 * text_len , u16 * trunc_msg_len ,
u16 * dict_len , u32 * pad_len )
{
/*
* The message should not take the whole buffer . Otherwise , it might
* get removed too soon .
*/
u32 max_text_len = log_buf_len / MAX_LOG_TAKE_PART ;
if ( * text_len > max_text_len )
* text_len = max_text_len ;
/* enable the warning message */
* trunc_msg_len = strlen ( trunc_msg ) ;
/* disable the "dict" completely */
* dict_len = 0 ;
/* compute the size again, count also the warning message */
return msg_used_size ( * text_len + * trunc_msg_len , 0 , pad_len ) ;
}
2012-05-03 04:29:13 +04:00
/* insert record into the buffer, discard old ones, update heads */
2014-06-05 03:11:33 +04:00
static int log_store ( int facility , int level ,
enum log_flags flags , u64 ts_nsec ,
const char * dict , u16 dict_len ,
const char * text , u16 text_len )
2012-05-03 04:29:13 +04:00
{
2013-08-01 00:53:47 +04:00
struct printk_log * msg ;
2012-05-03 04:29:13 +04:00
u32 size , pad_len ;
2014-06-05 03:11:32 +04:00
u16 trunc_msg_len = 0 ;
2012-05-03 04:29:13 +04:00
/* number of '\0' padding bytes to next message */
2014-06-05 03:11:31 +04:00
size = msg_used_size ( text_len , dict_len , & pad_len ) ;
2012-05-03 04:29:13 +04:00
2014-06-05 03:11:32 +04:00
if ( log_make_free_space ( size ) ) {
/* truncate the message if it is too long for empty buffer */
size = truncate_msg ( & text_len , & trunc_msg_len ,
& dict_len , & pad_len ) ;
/* survive when the log buffer is too small for trunc_msg */
if ( log_make_free_space ( size ) )
2014-06-05 03:11:33 +04:00
return 0 ;
2014-06-05 03:11:32 +04:00
}
2012-05-03 04:29:13 +04:00
2014-04-04 01:48:42 +04:00
if ( log_next_idx + size + sizeof ( struct printk_log ) > log_buf_len ) {
2012-05-03 04:29:13 +04:00
/*
* This message + an additional empty header does not fit
* at the end of the buffer . Add an empty header with len = = 0
* to signify a wrap around .
*/
2013-08-01 00:53:47 +04:00
memset ( log_buf + log_next_idx , 0 , sizeof ( struct printk_log ) ) ;
2012-05-03 04:29:13 +04:00
log_next_idx = 0 ;
}
/* fill message */
2013-08-01 00:53:47 +04:00
msg = ( struct printk_log * ) ( log_buf + log_next_idx ) ;
2012-05-03 04:29:13 +04:00
memcpy ( log_text ( msg ) , text , text_len ) ;
msg - > text_len = text_len ;
2014-06-05 03:11:32 +04:00
if ( trunc_msg_len ) {
memcpy ( log_text ( msg ) + text_len , trunc_msg , trunc_msg_len ) ;
msg - > text_len + = trunc_msg_len ;
}
2012-05-03 04:29:13 +04:00
memcpy ( log_dict ( msg ) , dict , dict_len ) ;
msg - > dict_len = dict_len ;
2012-06-28 11:38:53 +04:00
msg - > facility = facility ;
msg - > level = level & 7 ;
msg - > flags = flags & 0x1f ;
if ( ts_nsec > 0 )
msg - > ts_nsec = ts_nsec ;
else
msg - > ts_nsec = local_clock ( ) ;
2012-05-03 04:29:13 +04:00
memset ( log_dict ( msg ) + dict_len , 0 , pad_len ) ;
2014-04-04 01:48:43 +04:00
msg - > len = size ;
2012-05-03 04:29:13 +04:00
/* insert message */
log_next_idx + = msg - > len ;
log_next_seq + + ;
2014-06-05 03:11:33 +04:00
return msg - > text_len ;
2012-05-03 04:29:13 +04:00
}
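The wrap-around rule in log_store() can be tried out in userspace. The following is a minimal sketch, not the kernel implementation: `rb_hdr`, `rb_buf`, `rb_store()` and the 64-byte `RB_SIZE` are hypothetical names chosen for illustration, and the sketch does not free old records (the kernel does that in log_make_free_space()). It only shows that a record is written at the start of the buffer, after a zero-length marker header, when the record plus one spare header would not fit at the tail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 64u /* hypothetical tiny buffer, for illustration only */

/* Hypothetical record header; len == 0 means "wrap to offset 0". */
struct rb_hdr {
	uint16_t len;      /* total record size, header and padding included */
	uint16_t text_len; /* payload bytes */
};

static unsigned char rb_buf[RB_SIZE];
static uint32_t rb_next_idx;

/* Size of header plus text, rounded up to the header's alignment. */
static uint32_t rb_used_size(uint16_t text_len)
{
	uint32_t size = (uint32_t)sizeof(struct rb_hdr) + text_len;
	uint32_t align = (uint32_t)_Alignof(struct rb_hdr);

	return (size + align - 1) & ~(align - 1);
}

/* Store one record and return the offset it was written at. */
static uint32_t rb_store(const char *text, uint16_t text_len)
{
	struct rb_hdr hdr;
	uint32_t size = rb_used_size(text_len);
	uint32_t at;

	/* the record plus one spare header must fit in an empty buffer */
	assert(size + sizeof(hdr) <= RB_SIZE);

	if (rb_next_idx + size + sizeof(hdr) > RB_SIZE) {
		/* does not fit at the tail: leave a zeroed header as wrap marker */
		memset(rb_buf + rb_next_idx, 0, sizeof(hdr));
		rb_next_idx = 0;
	}

	at = rb_next_idx;
	hdr.len = (uint16_t)size;
	hdr.text_len = text_len;
	/* memcpy avoids alignment and aliasing assumptions about rb_buf */
	memcpy(rb_buf + at, &hdr, sizeof(hdr));
	memcpy(rb_buf + at + sizeof(hdr), text, text_len);
	rb_next_idx += size;
	return at;
}
```

Because every store leaves room for one more header at the tail, the memset() of the wrap marker can never run past the end of the buffer; this is the reason the check adds an extra header size.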
2005-05-01 19:59:02 +04:00
2014-08-07 03:09:05 +04:00
int dmesg_restrict = IS_ENABLED ( CONFIG_SECURITY_DMESG_RESTRICT ) ;
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg; however, /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because older versions of util-linux dmesg(1)
default to using the syslog method for access. Newer versions of
util-linux dmesg(1) default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 01:04:39 +04:00
static int syslog_action_restricted ( int type )
{
if ( dmesg_restrict )
return 1 ;
/*
* Unless restricted , we allow " read all " and " get buffer size "
* for everybody .
*/
return type ! = SYSLOG_ACTION_READ_ALL & &
type ! = SYSLOG_ACTION_SIZE_BUFFER ;
}
2015-06-26 01:01:47 +03:00
int check_syslog_permissions ( int type , int source )
2013-06-13 01:04:39 +04:00
{
/*
* If this is from / proc / kmsg and we ' ve already opened it , then we ' ve
* already done the capabilities checks at open time .
*/
2015-06-26 01:01:47 +03:00
if ( source = = SYSLOG_FROM_PROC & & type ! = SYSLOG_ACTION_OPEN )
2015-06-26 01:01:44 +03:00
goto ok ;
2013-06-13 01:04:39 +04:00
if ( syslog_action_restricted ( type ) ) {
if ( capable ( CAP_SYSLOG ) )
2015-06-26 01:01:44 +03:00
goto ok ;
2013-06-13 01:04:39 +04:00
/*
* For historical reasons , accept CAP_SYS_ADMIN too , with
* a warning .
*/
if ( capable ( CAP_SYS_ADMIN ) ) {
pr_warn_once ( " %s (%d): Attempt to access syslog with "
" CAP_SYS_ADMIN but no CAP_SYSLOG "
" (deprecated). \n " ,
current - > comm , task_pid_nr ( current ) ) ;
2015-06-26 01:01:44 +03:00
goto ok ;
2013-06-13 01:04:39 +04:00
}
return - EPERM ;
}
2015-06-26 01:01:44 +03:00
ok :
2013-06-13 01:04:39 +04:00
return security_syslog ( type ) ;
}
2015-10-20 10:39:03 +03:00
EXPORT_SYMBOL_GPL ( check_syslog_permissions ) ;
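The decision order in check_syslog_permissions() can be modelled as a pure function. This is a sketch under stated assumptions: the `ACT_*` and `FROM_*` enumerators, `struct caller` and `check_perm()` are hypothetical stand-ins, the capability bits are passed in as plain booleans, and the final security_syslog() LSM hook and the CAP_SYS_ADMIN deprecation warning are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SYSLOG_ACTION_* and SYSLOG_FROM_* constants. */
enum { ACT_OPEN, ACT_READ, ACT_READ_ALL, ACT_CLEAR, ACT_SIZE_BUFFER };
enum { FROM_READER, FROM_PROC };

/* Hypothetical view of the caller's capabilities. */
struct caller {
	bool cap_syslog;
	bool cap_sys_admin;
};

/* Returns 0 if the action is allowed, -EPERM otherwise. */
static int check_perm(int type, int source, bool dmesg_restrict,
		      const struct caller *c)
{
	bool restricted;

	assert(c != NULL);

	/* /proc/kmsg was already checked at open time */
	if (source == FROM_PROC && type != ACT_OPEN)
		return 0;

	/* unless restricted, "read all" and "get buffer size" are open to all */
	restricted = dmesg_restrict ||
		     (type != ACT_READ_ALL && type != ACT_SIZE_BUFFER);
	if (!restricted)
		return 0;

	/* CAP_SYS_ADMIN is accepted for historical reasons only */
	if (c->cap_syslog || c->cap_sys_admin)
		return 0;

	return -EPERM;
}
```

With dmesg_restrict set, an unprivileged SYSLOG_ACTION_READ_ALL is refused, while a read on an already opened /proc/kmsg still succeeds because the check happened at open time.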
2013-06-13 01:04:39 +04:00
2015-06-26 01:01:24 +03:00
static void append_char ( char * * pp , char * e , char c )
{
if ( * pp < e )
* ( * pp ) + + = c ;
}
2013-06-13 01:04:39 +04:00
2015-06-26 01:01:27 +03:00
static ssize_t msg_print_ext_header ( char * buf , size_t size ,
struct printk_log * msg , u64 seq ,
enum log_flags prev_flags )
{
u64 ts_usec = msg - > ts_nsec ;
char cont = ' - ' ;
do_div ( ts_usec , 1000 ) ;
/*
* If we couldn ' t merge continuation line fragments during the print ,
* export the stored flags to allow an optional external merge of the
* records . Merging the records isn ' t always necessarily correct , like
* when we hit a race during printing . In most cases though , it produces
* more readable output . ' c ' in the record flags marks the first
* fragment of a line , ' + ' the following .
*/
if ( msg - > flags & LOG_CONT & & ! ( prev_flags & LOG_CONT ) )
cont = ' c ' ;
else if ( ( msg - > flags & LOG_CONT ) | |
( ( prev_flags & LOG_CONT ) & & ! ( msg - > flags & LOG_PREFIX ) ) )
cont = ' + ' ;
return scnprintf ( buf , size , " %u,%llu,%llu,%c; " ,
( msg - > facility < < 3 ) | msg - > level , seq , ts_usec , cont ) ;
}
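On the consumer side, the header written by msg_print_ext_header() can be split back into its fields. The sketch below assumes the four-field form "prio,seq,ts_usec,flag;" produced by the scnprintf() call above; `struct kmsg_hdr` and `kmsg_parse_header()` are hypothetical names, not part of any kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical decoded form of a /dev/kmsg record header. */
struct kmsg_hdr {
	unsigned int facility;
	unsigned int level;
	unsigned long long seq;
	unsigned long long ts_usec;
	char cont; /* '-', 'c' or '+' */
};

/* Parse "prio,seq,ts_usec,flag;..." and return 0 on success, -1 otherwise. */
static int kmsg_parse_header(const char *rec, struct kmsg_hdr *out)
{
	unsigned int prio;

	assert(rec != NULL && out != NULL);

	if (sscanf(rec, "%u,%llu,%llu,%c;", &prio,
		   &out->seq, &out->ts_usec, &out->cont) != 4)
		return -1;

	/* inverse of (facility << 3) | level */
	out->level = prio & 7;
	out->facility = prio >> 3;
	return 0;
}
```

A record such as "30,340,5690716,-;udevd[80]: starting version 181" decodes to facility 3, level 6, while the older three-field form without the flag is rejected by this sketch.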
static ssize_t msg_print_ext_body ( char * buf , size_t size ,
char * dict , size_t dict_len ,
char * text , size_t text_len )
{
char * p = buf , * e = buf + size ;
size_t i ;
/* escape non-printable characters */
for ( i = 0 ; i < text_len ; i + + ) {
unsigned char c = text [ i ] ;
if ( c < ' ' | | c > = 127 | | c = = ' \\ ' )
p + = scnprintf ( p , e - p , " \\ x%02x " , c ) ;
else
append_char ( & p , e , c ) ;
}
append_char ( & p , e , ' \n ' ) ;
if ( dict_len ) {
bool line = true ;
for ( i = 0 ; i < dict_len ; i + + ) {
unsigned char c = dict [ i ] ;
if ( line ) {
append_char ( & p , e , ' ' ) ;
line = false ;
}
if ( c = = ' \0 ' ) {
append_char ( & p , e , ' \n ' ) ;
line = true ;
continue ;
}
if ( c < ' ' | | c > = 127 | | c = = ' \\ ' ) {
p + = scnprintf ( p , e - p , " \\ x%02x " , c ) ;
continue ;
}
append_char ( & p , e , c ) ;
}
append_char ( & p , e , ' \n ' ) ;
}
return p - buf ;
}
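msg_print_ext_body() escapes control characters, bytes of 127 and above, and the backslash as a backslash followed by 'x' and two hex digits. A reader that wants the raw bytes back has to undo this. The following is a sketch of such a decoder; `hexval()` and `kmsg_unescape()` are hypothetical helper names, and malformed escape sequences are simply copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode backslash-x-hex-hex sequences from src into dst.
 * Returns the number of bytes written; output is cut off at cap bytes.
 */
static size_t kmsg_unescape(const char *src, size_t len, char *dst, size_t cap)
{
	size_t i = 0, n = 0;

	assert(src != NULL && dst != NULL);

	while (i < len && n < cap) {
		if (src[i] == '\\' && i + 3 < len && src[i + 1] == 'x' &&
		    hexval(src[i + 2]) >= 0 && hexval(src[i + 3]) >= 0) {
			/* four input characters collapse into one byte */
			dst[n++] = (char)(hexval(src[i + 2]) * 16 +
					  hexval(src[i + 3]));
			i += 4;
		} else {
			dst[n++] = src[i++];
		}
	}
	return n;
}
```

Since the kernel side escapes the backslash itself, every backslash in a well-formed record starts an escape sequence, which keeps the decoding unambiguous.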
kmsg: export printk records to the /dev/kmsg interface
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get overwritten faster than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-03 04:29:41 +04:00
/* /dev/kmsg - userspace message inject/listen interface */
struct devkmsg_user {
u64 seq ;
u32 idx ;
kmsg - export "continuation record" flag to /dev/kmsg
In some cases we are forced to store individual records for a continuation
line print.
Export a flag to allow the external re-construction of the line. The flag
allows us to apply a similar logic externally which is used internally when
the console, /proc/kmsg or the syslog() output is printed.
$ cat /dev/kmsg
4,165,0,-;Free swap = 0kB
4,166,0,-;Total swap = 0kB
6,167,0,c;[
4,168,0,+;0
4,169,0,+;1
4,170,0,+;2
4,171,0,+;3
4,172,0,+;]
6,173,0,-;[0 1 2 3 ]
6,174,0,-;Console: colour VGA+ 80x25
6,175,0,-;console [tty0] enabled
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 05:35:30 +04:00
enum log_flags prev ;
2012-05-03 04:29:41 +04:00
struct mutex lock ;
2015-06-26 01:01:24 +03:00
char buf [ CONSOLE_EXT_LOG_MAX ] ;
2012-05-03 04:29:41 +04:00
} ;
2014-08-23 20:23:53 +04:00
static ssize_t devkmsg_write ( struct kiocb * iocb , struct iov_iter * from )
2012-05-03 04:29:41 +04:00
{
char * buf , * line ;
int level = default_message_loglevel ;
int facility = 1 ; /* LOG_USER */
2015-02-11 21:56:46 +03:00
size_t len = iov_iter_count ( from ) ;
2012-05-03 04:29:41 +04:00
ssize_t ret = len ;
if ( len > LOG_LINE_MAX )
return - EINVAL ;
buf = kmalloc ( len + 1 , GFP_KERNEL ) ;
if ( buf = = NULL )
return - ENOMEM ;
2014-08-23 20:23:53 +04:00
buf [ len ] = ' \0 ' ;
if ( copy_from_iter ( buf , len , from ) ! = len ) {
kfree ( buf ) ;
return - EFAULT ;
2012-05-03 04:29:41 +04:00
	}
	/*
	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
	 * the decimal value represents 32 bit, the lower 3 bit are the log
	 * level, the rest are the log facility.
	 *
	 * If no prefix or no userspace facility is specified, we
	 * enforce LOG_USER, to be able to reliably distinguish
	 * kernel-generated messages from userspace-injected ones.
	 */
	line = buf;
	if (line[0] == '<') {
		char *endp = NULL;
2015-11-07 03:30:38 +03:00
		unsigned int u;
2012-05-03 04:29:41 +04:00
2015-11-07 03:30:38 +03:00
		u = simple_strtoul(line + 1, &endp, 10);
2012-05-03 04:29:41 +04:00
		if (endp && endp[0] == '>') {
2015-11-07 03:30:38 +03:00
			level = LOG_LEVEL(u);
			if (LOG_FACILITY(u) != 0)
				facility = LOG_FACILITY(u);
2012-05-03 04:29:41 +04:00
			endp++;
			len -= endp - line;
			line = endp;
		}
	}
	printk_emit(facility, level, NULL, 0, "%s", line);
	kfree(buf);
	return ret;
}
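As an illustration of the prefix handling in devkmsg_write() above, here is a minimal userspace sketch in C. It is not the kernel code: sketch_parse_prefix(), struct sketch_prefix and the SKETCH_* macros are hypothetical names, and strtoul() stands in for the kernel's simple_strtoul(). The bit split (lower 3 bits are the level, the rest is the facility) and the LOG_USER fallback follow the comment in the function.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed to mirror the split described in the comment above. */
#define SKETCH_LOG_LEVEL(v)	((v) & 0x07)
#define SKETCH_LOG_FACILITY(v)	(((v) >> 3) & 0xff)
#define SKETCH_LOG_USER		1

struct sketch_prefix {
	int level;		/* log level, 0..7 */
	int facility;		/* syslog facility, LOG_USER if none given */
	const char *text;	/* message text with the "<N>" prefix skipped */
};

/*
 * Parse an optional "<N>" prefix. default_level is used when the line
 * carries no well-formed prefix.
 */
static struct sketch_prefix sketch_parse_prefix(const char *line,
						int default_level)
{
	struct sketch_prefix p = { default_level, SKETCH_LOG_USER, line };

	assert(line != NULL);
	if (line[0] == '<') {
		char *endp = NULL;
		unsigned long u = strtoul(line + 1, &endp, 10);

		/* Only accept the prefix if it is closed by '>'. */
		if (endp && endp[0] == '>') {
			p.level = SKETCH_LOG_LEVEL(u);
			/* Facility 0 (kernel) is never taken from userspace. */
			if (SKETCH_LOG_FACILITY(u) != 0)
				p.facility = SKETCH_LOG_FACILITY(u);
			p.text = endp + 1;
		}
	}
	return p;
}
```

With this sketch, a line starting with "<30>" yields level 6 and facility 3, since 30 = 3 * 8 + 6, while "<6>" keeps the LOG_USER facility because its facility bits are zero.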
static ssize_t devkmsg_read(struct file *file, char __user *buf,
			    size_t count, loff_t *ppos)
{
	struct devkmsg_user *user = file->private_data;
2013-08-01 00:53:47 +04:00
	struct printk_log *msg;
2012-05-03 04:29:41 +04:00
	size_t len;
	ssize_t ret;
	if (!user)
		return -EBADF;
printk: use mutex lock to stop syslog_seq from going wild
Although syslog_seq and log_next_seq are protected by the logbuf_lock
spin lock, that is not enough. Say we have two processes A and B, with
syslog_seq = N and log_next_seq = N + 1, and both processes enter
syslog_print at almost the same time. No matter which process gets the
spin lock first, it will increase syslog_seq by one and then release
the spin lock; the other process then increases syslog_seq by one
again. In this case, syslog_seq is bigger than log_next_seq.
Later, this makes:
wait_event_interruptible(log_wait, syslog_seq != log_next_seq)
stop waiting even though no new write has come in. This introduces
an infinite read loop.
I can easily see this kind of issue with the following steps:
# cat /proc/kmsg # meanwhile, I don't kill rsyslog
# So they are the two processes.
# xinit # I added drm.debug=6 to the kernel parameter line,
# so that it produces lots of messages and lets that
# issue happen
It's 100% reproducible on my side, and my disk gets filled up by
/var/log/messages in quite a short time.
So, introduce a mutex_lock to stop syslog_seq from going wild, just like
what devkmsg_read() does. It does fix this issue as expected.
v2: use mutex_lock_interruptible() instead (comments from Kay)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-By: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-16 17:21:51 +04:00
	ret = mutex_lock_interruptible(&user->lock);
	if (ret)
		return ret;
2012-07-06 20:50:08 +04:00
	raw_spin_lock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	while (user->seq == log_next_seq) {
		if (file->f_flags & O_NONBLOCK) {
			ret = -EAGAIN;
2012-07-06 20:50:08 +04:00
			raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
			goto out;
		}
2012-07-06 20:50:08 +04:00
		raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
		ret = wait_event_interruptible(log_wait,
					       user->seq != log_next_seq);
		if (ret)
			goto out;
2012-07-06 20:50:08 +04:00
		raw_spin_lock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	}
	if (user->seq < log_first_seq) {
		/* our last seen message is gone, return error and reset */
		user->idx = log_first_idx;
		user->seq = log_first_seq;
		ret = -EPIPE;
2012-07-06 20:50:08 +04:00
		raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
		goto out;
	}
	msg = log_from_idx(user->idx);
2015-06-26 01:01:27 +03:00
	len = msg_print_ext_header(user->buf, sizeof(user->buf),
				   msg, user->seq, user->prev);
	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
				  log_dict(msg), msg->dict_len,
				  log_text(msg), msg->text_len);
kmsg - export "continuation record" flag to /dev/kmsg
In some cases we are forced to store individual records for a continuation
line print.
Export a flag to allow the external re-construction of the line. The flag
allows us to apply a similar logic externally which is used internally when
the console, /proc/kmsg or the syslog() output is printed.
$ cat /dev/kmsg
4,165,0,-;Free swap = 0kB
4,166,0,-;Total swap = 0kB
6,167,0,c;[
4,168,0,+;0
4,169,0,+;1
4,170,0,+;2
4,171,0,+;3
4,172,0,+;]
6,173,0,-;[0 1 2 3 ]
6,174,0,-;Console: colour VGA+ 80x25
6,175,0,-;console [tty0] enabled
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 05:35:30 +04:00
	user->prev = msg->flags;
2012-05-03 04:29:41 +04:00
	user->idx = log_next(user->idx);
	user->seq++;
2012-07-06 20:50:08 +04:00
	raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	if (len > count) {
		ret = -EINVAL;
		goto out;
	}
	if (copy_to_user(buf, user->buf, len)) {
		ret = -EFAULT;
		goto out;
	}
	ret = len;
out:
	mutex_unlock(&user->lock);
	return ret;
}
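Each read() on /dev/kmsg returns one record of the form "prio,seq,timestamp[,flag];text", as in the sample output quoted in the commit messages. A hedged userspace sketch of how a consumer might split such a record and estimate how many records were lost after an -EPIPE follows. The names sketch_parse_record(), sketch_lost_between() and struct sketch_record are hypothetical, and treating the timestamp as microseconds is an assumption not stated in this file.

```c
#include <stdio.h>
#include <string.h>

struct sketch_record {
	unsigned int prio;		/* (facility << 3) | level */
	unsigned long long seq;		/* record sequence number */
	unsigned long long ts;		/* timestamp, assumed microseconds */
	char flag;			/* 'c', '+' or '-'; '-' if absent */
	const char *text;		/* points into the input buffer */
};

/* Split one record header; returns 0 on success, -1 on malformed input. */
static int sketch_parse_record(const char *rec, struct sketch_record *out)
{
	int consumed = 0;
	const char *semi;

	if (sscanf(rec, "%u,%llu,%llu%n",
		   &out->prio, &out->seq, &out->ts, &consumed) < 3)
		return -1;

	/* Optional fourth field: the continuation flag. */
	out->flag = '-';
	if (rec[consumed] == ',' && rec[consumed + 1] != '\0')
		out->flag = rec[consumed + 1];

	/* Header fields never contain ';', so the first one ends the header. */
	semi = strchr(rec + consumed, ';');
	if (!semi)
		return -1;
	out->text = semi + 1;
	return 0;
}

/* Number of records missing between the last seen and the next one read. */
static unsigned long long sketch_lost_between(unsigned long long last_seq,
					      unsigned long long next_seq)
{
	return next_seq > last_seq + 1 ? next_seq - last_seq - 1 : 0;
}
```

A consumer would remember the seq of the last record it processed; when read() fails with EPIPE, it reads again and passes the old and new sequence numbers to sketch_lost_between().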
static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
{
	struct devkmsg_user *user = file->private_data;
	loff_t ret = 0;
	if (!user)
		return -EBADF;
	if (offset)
		return -ESPIPE;
2012-07-06 20:50:08 +04:00
	raw_spin_lock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	switch (whence) {
	case SEEK_SET:
		/* the first record */
		user->idx = log_first_idx;
		user->seq = log_first_seq;
		break;
	case SEEK_DATA:
		/*
		 * The first record after the last SYSLOG_ACTION_CLEAR,
		 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
		 * changes no global state, and does not clear anything.
		 */
		user->idx = clear_idx;
		user->seq = clear_seq;
		break;
	case SEEK_END:
		/* after the last record */
		user->idx = log_next_idx;
		user->seq = log_next_seq;
		break;
	default:
		ret = -EINVAL;
	}
2012-07-06 20:50:08 +04:00
	raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	return ret;
}
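From userspace, the three positions accepted by devkmsg_llseek() map onto lseek() with an offset of 0; any other offset is rejected with ESPIPE by the handler above. The following is a small sketch under that reading, with hypothetical names (sketch_kmsg_seek(), enum sketch_kmsg_pos). The fallback value 3 for SEEK_DATA is the Linux definition and is only used if the libc headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes SEEK_DATA under this macro */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3		/* Linux value, fallback only */
#endif

enum sketch_kmsg_pos {
	SKETCH_KMSG_OLDEST,		/* first record still in the buffer */
	SKETCH_KMSG_AFTER_CLEAR,	/* first record after the last clear */
	SKETCH_KMSG_NEWEST,		/* only records logged from now on */
};

/* Reposition an open /dev/kmsg descriptor; 0 on success, -1 with errno set. */
static int sketch_kmsg_seek(int fd, enum sketch_kmsg_pos pos)
{
	int whence;

	switch (pos) {
	case SKETCH_KMSG_OLDEST:
		whence = SEEK_SET;
		break;
	case SKETCH_KMSG_AFTER_CLEAR:
		whence = SEEK_DATA;
		break;
	case SKETCH_KMSG_NEWEST:
		whence = SEEK_END;
		break;
	default:
		errno = EINVAL;
		return -1;
	}
	/* The offset must stay 0 for this device. */
	return lseek(fd, 0, whence) == (off_t)-1 ? -1 : 0;
}
```

A log follower that only cares about new messages would open /dev/kmsg for reading and call sketch_kmsg_seek(fd, SKETCH_KMSG_NEWEST) before its first read().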
static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
{
	struct devkmsg_user *user = file->private_data;
	int ret = 0;
	if (!user)
		return POLLERR|POLLNVAL;
	poll_wait(file, &log_wait, wait);
2012-07-06 20:50:08 +04:00
	raw_spin_lock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
	if (user->seq < log_next_seq) {
		/* return error when data has vanished underneath us */
		if (user->seq < log_first_seq)
			ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
2013-04-30 03:17:20 +04:00
		else
			ret = POLLIN|POLLRDNORM;
2012-05-03 04:29:41 +04:00
	}
2012-07-06 20:50:08 +04:00
	raw_spin_unlock_irq(&logbuf_lock);
2012-05-03 04:29:41 +04:00
return ret ;
}
static int devkmsg_open ( struct inode * inode , struct file * file )
{
struct devkmsg_user * user ;
int err ;
/* write-only does not need any file context */
if ( ( file - > f_flags & O_ACCMODE ) = = O_WRONLY )
return 0 ;
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg, but /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because older versions of util-linux dmesg(1)
default to the syslog method. Newer versions of util-linux dmesg(1)
default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 01:04:39 +04:00
err = check_syslog_permissions ( SYSLOG_ACTION_READ_ALL ,
SYSLOG_FROM_READER ) ;
2012-05-03 04:29:41 +04:00
if ( err )
return err ;
user = kmalloc ( sizeof ( struct devkmsg_user ) , GFP_KERNEL ) ;
if ( ! user )
return - ENOMEM ;
mutex_init ( & user - > lock ) ;
2012-07-06 20:50:08 +04:00
raw_spin_lock_irq ( & logbuf_lock ) ;
2012-05-03 04:29:41 +04:00
user - > idx = log_first_idx ;
user - > seq = log_first_seq ;
2012-07-06 20:50:08 +04:00
raw_spin_unlock_irq ( & logbuf_lock ) ;
2012-05-03 04:29:41 +04:00
file - > private_data = user ;
return 0 ;
}
static int devkmsg_release ( struct inode * inode , struct file * file )
{
struct devkmsg_user * user = file - > private_data ;
if ( ! user )
return 0 ;
mutex_destroy ( & user - > lock ) ;
kfree ( user ) ;
return 0 ;
}
const struct file_operations kmsg_fops = {
. open = devkmsg_open ,
. read = devkmsg_read ,
2014-08-23 20:23:53 +04:00
. write_iter = devkmsg_write ,
2012-05-03 04:29:41 +04:00
. llseek = devkmsg_llseek ,
. poll = devkmsg_poll ,
. release = devkmsg_release ,
} ;
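The /dev/kmsg record lines quoted in the commit message above have the shape "prefix,seq,timestamp;text", where the prefix packs facility and level the same way print_prefix() does ((facility << 3) | level). A minimal userspace sketch of parsing such a header and of using the sequence numbers to count lost records follows. The struct and function names (kmsg_header, kmsg_parse_header, kmsg_lost_between) are hypothetical and not part of the kernel, and the timestamp is assumed to be in microseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace view of one /dev/kmsg record header. */
struct kmsg_header {
	unsigned int facility;	/* prefix >> 3 */
	unsigned int level;	/* prefix & 7 */
	uint64_t seq;		/* record sequence number */
	uint64_t ts_usec;	/* timestamp, assumed to be microseconds */
	const char *text;	/* points just past the ';' separator */
};

/* Parse "prefix,seq,timestamp[,...];text". Returns 0 on success, -1 on bad input. */
static int kmsg_parse_header(const char *rec, struct kmsg_header *out)
{
	unsigned int prefix;
	unsigned long long seq, ts;
	int consumed = 0;
	const char *semi;

	assert(rec && out);
	if (sscanf(rec, "%u,%llu,%llu%n", &prefix, &seq, &ts, &consumed) != 3)
		return -1;

	/* Skip any extra comma-separated fields up to the ';' separator. */
	semi = strchr(rec + consumed, ';');
	if (!semi)
		return -1;

	out->facility = prefix >> 3;
	out->level = prefix & 7;
	out->seq = seq;
	out->ts_usec = ts;
	out->text = semi + 1;
	return 0;
}

/* Number of records missing between two consecutively read sequence numbers. */
static uint64_t kmsg_lost_between(uint64_t last_seq, uint64_t next_seq)
{
	return next_seq > last_seq + 1 ? next_seq - last_seq - 1 : 0;
}
```

A reader that gets -EPIPE from read() could feed the last sequence number it saw and the next one it receives into kmsg_lost_between() to report how many records were overwritten in between.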
2015-09-10 01:38:55 +03:00
# ifdef CONFIG_KEXEC_CORE
2009-04-03 03:58:57 +04:00
/*
2013-11-13 03:08:54 +04:00
* This appends the listed symbols to / proc / vmcore
2009-04-03 03:58:57 +04:00
*
2013-11-13 03:08:54 +04:00
* / proc / vmcore is used by various utilities , like crash and makedumpfile to
2009-04-03 03:58:57 +04:00
* obtain access to symbols that are otherwise very difficult to locate . These
* symbols are specifically used so that utilities can access and extract the
* dmesg log from a vmcore file after a crash .
*/
void log_buf_kexec_setup ( void )
{
VMCOREINFO_SYMBOL ( log_buf ) ;
VMCOREINFO_SYMBOL ( log_buf_len ) ;
2012-05-03 04:29:13 +04:00
VMCOREINFO_SYMBOL ( log_first_idx ) ;
2016-03-18 00:21:30 +03:00
VMCOREINFO_SYMBOL ( clear_idx ) ;
2012-05-03 04:29:13 +04:00
VMCOREINFO_SYMBOL ( log_next_idx ) ;
2012-07-18 21:18:12 +04:00
/*
2013-08-01 00:53:47 +04:00
* Export struct printk_log size and field offsets . User space tools can
2012-07-18 21:18:12 +04:00
* parse it and detect any changes to structure down the line .
*/
2013-08-01 00:53:47 +04:00
VMCOREINFO_STRUCT_SIZE ( printk_log ) ;
VMCOREINFO_OFFSET ( printk_log , ts_nsec ) ;
VMCOREINFO_OFFSET ( printk_log , len ) ;
VMCOREINFO_OFFSET ( printk_log , text_len ) ;
VMCOREINFO_OFFSET ( printk_log , dict_len ) ;
2009-04-03 03:58:57 +04:00
}
# endif
2011-05-25 04:13:20 +04:00
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len ;
2014-08-07 03:08:52 +04:00
/* we practice scaling the ring buffer by powers of 2 */
static void __init log_buf_len_update ( unsigned size )
2005-04-17 02:20:36 +04:00
{
if ( size )
size = roundup_pow_of_two ( size ) ;
2011-05-25 04:13:20 +04:00
if ( size > log_buf_len )
new_log_buf_len = size ;
2014-08-07 03:08:52 +04:00
}
/* save requested log_buf_len since it's too early to process it */
static int __init log_buf_len_setup ( char * str )
{
unsigned size = memparse ( str , & str ) ;
log_buf_len_update ( size ) ;
2011-05-25 04:13:20 +04:00
return 0 ;
2005-04-17 02:20:36 +04:00
}
2011-05-25 04:13:20 +04:00
early_param ( " log_buf_len " , log_buf_len_setup ) ;
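The sizing rule used by log_buf_len_update() can be sketched in plain userspace C: a non-zero request is rounded up to a power of two, and it only takes effect if it is larger than the current buffer. The helper names below (round_up_pow2, requested_log_buf_len) are illustrative stand-ins, not kernel functions, and returning 0 for "no change" is a simplification of leaving new_log_buf_len untouched.

```c
#include <assert.h>

/* Round n up to the next power of two (stand-in for roundup_pow_of_two()). */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0 && n <= (1u << 31));	/* keep the shift from overflowing */
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Return the buffer size that would be requested, or 0 when the request
 * does not exceed the current buffer and is therefore ignored.
 */
static unsigned int requested_log_buf_len(unsigned int current_len,
					  unsigned int size)
{
	if (size)
		size = round_up_pow2(size);
	return size > current_len ? size : 0;
}
```

With a 128 KiB buffer, for example, a request of 200000 bytes becomes 262144, while a request of 65536 bytes is ignored because the buffer never shrinks.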
2014-10-14 02:51:11 +04:00
# ifdef CONFIG_SMP
# define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
2014-08-07 03:08:56 +04:00
static void __init log_buf_add_cpu ( void )
{
unsigned int cpu_extra ;
/*
* archs should set up cpu_possible_bits properly with
* set_cpu_possible ( ) after setup_arch ( ) but just in
* case let's ensure this is valid .
*/
if ( num_possible_cpus ( ) = = 1 )
return ;
cpu_extra = ( num_possible_cpus ( ) - 1 ) * __LOG_CPU_MAX_BUF_LEN ;
/* by default this will only continue through for large > 64 CPUs */
if ( cpu_extra < = __LOG_BUF_LEN / 2 )
return ;
pr_info ( " log_buf_len individual max cpu contribution: %d bytes \n " ,
__LOG_CPU_MAX_BUF_LEN ) ;
pr_info ( " log_buf_len total cpu_extra contributions: %d bytes \n " ,
cpu_extra ) ;
pr_info ( " log_buf_len min size: %d bytes \n " , __LOG_BUF_LEN ) ;
log_buf_len_update ( cpu_extra + __LOG_BUF_LEN ) ;
}
2014-10-14 02:51:11 +04:00
# else /* !CONFIG_SMP */
static inline void log_buf_add_cpu ( void ) { }
# endif /* CONFIG_SMP */
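The per-CPU contribution computed in log_buf_add_cpu() can be reproduced as a small pure function. This is a sketch under stated assumptions: cpu_extra_request is a hypothetical name, the CPU count, shift and base length are passed in as parameters rather than read from the kernel configuration, and the result is the size handed to log_buf_len_update() before any power-of-two rounding.

```c
#include <assert.h>

/*
 * Return the size that would be passed on for rounding, or 0 when the
 * per-CPU contribution is too small to justify growing the buffer.
 */
static unsigned int cpu_extra_request(unsigned int ncpus,
				      unsigned int cpu_shift,
				      unsigned int base_len)
{
	unsigned int cpu_extra;

	assert(cpu_shift < 32);
	if (ncpus <= 1)
		return 0;

	/* every CPU beyond the first contributes (1 << cpu_shift) bytes */
	cpu_extra = (ncpus - 1) * (1u << cpu_shift);

	/* contributions up to half of the static buffer are ignored */
	if (cpu_extra <= base_len / 2)
		return 0;

	return cpu_extra + base_len;
}
```

With illustrative values of a 4096-byte per-CPU contribution and a 131072-byte static buffer, 17 CPUs contribute exactly half the buffer and change nothing, while 18 CPUs produce a request of 200704 bytes.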
2014-08-07 03:08:56 +04:00
2011-05-25 04:13:20 +04:00
void __init setup_log_buf ( int early )
{
unsigned long flags ;
char * new_log_buf ;
int free ;
2014-08-07 03:08:56 +04:00
if ( log_buf ! = __log_buf )
return ;
if ( ! early & & ! new_log_buf_len )
log_buf_add_cpu ( ) ;
2011-05-25 04:13:20 +04:00
if ( ! new_log_buf_len )
return ;
2005-04-17 02:20:36 +04:00
2011-05-25 04:13:20 +04:00
if ( early ) {
2014-01-22 03:50:23 +04:00
new_log_buf =
2014-08-07 03:08:49 +04:00
memblock_virt_alloc ( new_log_buf_len , LOG_ALIGN ) ;
2011-05-25 04:13:20 +04:00
} else {
2014-08-07 03:08:49 +04:00
new_log_buf = memblock_virt_alloc_nopanic ( new_log_buf_len ,
LOG_ALIGN ) ;
2011-05-25 04:13:20 +04:00
}
if ( unlikely ( ! new_log_buf ) ) {
pr_err ( " log_buf_len: %ld bytes not available \n " ,
new_log_buf_len ) ;
return ;
}
2009-07-25 19:50:36 +04:00
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
2011-05-25 04:13:20 +04:00
log_buf_len = new_log_buf_len ;
log_buf = new_log_buf ;
new_log_buf_len = 0 ;
2012-05-03 04:29:13 +04:00
free = __LOG_BUF_LEN - log_next_idx ;
memcpy ( log_buf , __log_buf , __LOG_BUF_LEN ) ;
2009-07-25 19:50:36 +04:00
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
2011-05-25 04:13:20 +04:00
2014-08-07 03:08:54 +04:00
pr_info ( " log_buf_len: %d bytes \n " , log_buf_len ) ;
2011-05-25 04:13:20 +04:00
pr_info ( " early log buf free: %d(%d%%) \n " ,
free , ( free * 100 ) / __LOG_BUF_LEN ) ;
}
2005-04-17 02:20:36 +04:00
2012-12-18 03:59:56 +04:00
static bool __read_mostly ignore_loglevel ;
static int __init ignore_loglevel_setup ( char * str )
{
2014-08-07 03:09:12 +04:00
ignore_loglevel = true ;
2013-11-13 03:08:50 +04:00
pr_info ( " debug: ignoring loglevel setting. \n " ) ;
2012-12-18 03:59:56 +04:00
return 0 ;
}
early_param ( " ignore_loglevel " , ignore_loglevel_setup ) ;
module_param ( ignore_loglevel , bool , S_IRUGO | S_IWUSR ) ;
2015-02-13 02:01:34 +03:00
MODULE_PARM_DESC ( ignore_loglevel ,
" ignore loglevel setting (prints all kernel messages to the console) " ) ;
2012-12-18 03:59:56 +04:00
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The problem is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing messages
that we will eventually drop.
This can be costly when we register a new CON_PRINTBUFFER console, for
instance. For every such console, register_console() resets
console_seq, console_idx and console_prev
and sets an `exclusive console' pointer to replay the log buffer to that
just-registered console. There can be a lot of messages to replay,
most of which, in the worst case, are dropped after the console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move the console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors the loglevel check out into a suppress_message_printing()
function and tests message->level against console_loglevel before the
formatting functions in console_unlock() and console_cont_flush() are
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
static bool suppress_message_printing ( int level )
{
return ( level > = console_loglevel & & ! ignore_loglevel ) ;
}
2007-10-16 12:23:46 +04:00
# ifdef CONFIG_BOOT_PRINTK_DELAY
2010-10-27 01:22:48 +04:00
static int boot_delay ; /* msecs delay after each printk during bootup */
2009-09-23 03:43:31 +04:00
static unsigned long long loops_per_msec ; /* based on boot_delay */
2007-10-16 12:23:46 +04:00
static int __init boot_delay_setup ( char * str )
{
unsigned long lpj ;
lpj = preset_lpj ? preset_lpj : 1000000 ; /* some guess */
loops_per_msec = ( unsigned long long ) lpj / 1000 * HZ ;
get_option ( & str , & boot_delay ) ;
if ( boot_delay > 10 * 1000 )
boot_delay = 0 ;
2009-09-23 03:43:31 +04:00
pr_debug ( " boot_delay: %u, preset_lpj: %ld, lpj: %lu, "
" HZ: %d, loops_per_msec: %llu \n " ,
boot_delay , preset_lpj , lpj , HZ , loops_per_msec ) ;
2013-11-13 03:08:53 +04:00
return 0 ;
2007-10-16 12:23:46 +04:00
}
2013-11-13 03:08:53 +04:00
early_param ( " boot_delay " , boot_delay_setup ) ;
2007-10-16 12:23:46 +04:00
2012-12-18 03:59:56 +04:00
static void boot_delay_msec ( int level )
2007-10-16 12:23:46 +04:00
{
unsigned long long k ;
unsigned long timeout ;
2012-12-18 03:59:56 +04:00
if ( ( boot_delay = = 0 | | system_state ! = SYSTEM_BOOTING )
2016-08-03 00:03:56 +03:00
| | suppress_message_printing ( level ) ) {
2007-10-16 12:23:46 +04:00
return ;
2012-12-18 03:59:56 +04:00
}
2007-10-16 12:23:46 +04:00
2009-09-23 03:43:31 +04:00
k = ( unsigned long long ) loops_per_msec * boot_delay ;
2007-10-16 12:23:46 +04:00
timeout = jiffies + msecs_to_jiffies ( boot_delay ) ;
while ( k ) {
k - - ;
cpu_relax ( ) ;
/*
* use ( volatile ) jiffies to prevent
* compiler reduction ; loop termination via jiffies
* is secondary and may or may not happen .
*/
if ( time_after ( jiffies , timeout ) )
break ;
touch_nmi_watchdog ( ) ;
}
}
# else
2012-12-18 03:59:56 +04:00
static inline void boot_delay_msec ( int level )
2007-10-16 12:23:46 +04:00
{
}
# endif
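The arithmetic behind the boot delay can be checked in isolation. lpj is the number of delay loops per jiffy and HZ is the number of jiffies per second, so lpj / 1000 * HZ approximates loops per millisecond; the busy-loop budget is that value times the delay in milliseconds. The sketch below mirrors those expressions in userspace; the function names are hypothetical and the lpj and HZ values in the usage note are only examples.

```c
#include <assert.h>

/* Loops per millisecond, using the same integer expression as boot_delay_setup(). */
static unsigned long long loops_per_msec_from_lpj(unsigned long lpj,
						  unsigned int hz)
{
	assert(hz > 0);
	return (unsigned long long)lpj / 1000 * hz;
}

/* Delays above 10 seconds are treated as invalid and disabled. */
static int sanitize_boot_delay(int msecs)
{
	return msecs > 10 * 1000 ? 0 : msecs;
}

/* Upper bound on busy-loop iterations for one delayed message. */
static unsigned long long boot_delay_loops(unsigned long lpj, unsigned int hz,
					   int delay_msecs)
{
	return loops_per_msec_from_lpj(lpj, hz) *
	       (unsigned long long)sanitize_boot_delay(delay_msecs);
}
```

For the fallback guess of lpj = 1000000 with HZ = 250, this gives 250000 loops per millisecond. Because the division happens first, an lpj below 1000 yields 0, in which case only the jiffies timeout would bound the delay.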
2014-08-07 03:09:05 +04:00
static bool printk_time = IS_ENABLED ( CONFIG_PRINTK_TIME ) ;
2012-05-03 04:29:13 +04:00
module_param_named ( time , printk_time , bool , S_IRUGO | S_IWUSR ) ;
2012-06-28 11:38:53 +04:00
static size_t print_time ( u64 ts , char * buf )
{
unsigned long rem_nsec ;
if ( ! printk_time )
return 0 ;
printk: fix incorrect length from print_time() when seconds > 99999
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e42 ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05 03:35:50 +04:00
rem_nsec = do_div ( ts , 1000000000 ) ;
2012-06-28 11:38:53 +04:00
if ( ! buf )
2013-01-05 03:35:50 +04:00
return snprintf ( NULL , 0 , " [%5lu.000000] " , ( unsigned long ) ts ) ;
2012-06-28 11:38:53 +04:00
return sprintf ( buf , " [%5lu.%06lu] " ,
( unsigned long ) ts , rem_nsec / 1000 ) ;
}
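The length fix described in the commit message relies on snprintf() with a zero size returning the number of characters that would have been written. A minimal userspace sketch, assuming the "[%5lu.%06lu] " format quoted there; format_time_prefix is a hypothetical name and uses ordinary 64-bit division in place of do_div():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a nanosecond timestamp as "[sssss.uuuuuu] ".
 * With buf == NULL only the required length is computed.
 */
static size_t format_time_prefix(uint64_t ts_nsec, char *buf, size_t size)
{
	unsigned long sec = (unsigned long)(ts_nsec / 1000000000ULL);
	unsigned long usec = (unsigned long)(ts_nsec % 1000000000ULL) / 1000;
	int n = snprintf(buf, buf ? size : 0, "[%5lu.%06lu] ", sec, usec);

	return n < 0 ? 0 : (size_t)n;
}
```

The prefix is 15 characters long while the seconds field fits in five digits and grows to 16 once the uptime reaches 100000 seconds, which is the case a constant length of 15 got wrong.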
2013-08-01 00:53:47 +04:00
static size_t print_prefix ( const struct printk_log * msg , bool syslog , char * buf )
2012-05-10 06:30:45 +04:00
{
2012-05-14 01:30:46 +04:00
size_t len = 0 ;
2012-07-06 20:50:09 +04:00
unsigned int prefix = ( msg - > facility < < 3 ) | msg - > level ;
2012-05-10 06:30:45 +04:00
2012-05-14 01:30:46 +04:00
if ( syslog ) {
if ( buf ) {
2012-07-06 20:50:09 +04:00
len + = sprintf ( buf , " <%u> " , prefix ) ;
2012-05-14 01:30:46 +04:00
} else {
len + = 3 ;
2012-07-06 20:50:09 +04:00
if ( prefix > 999 )
len + = 3 ;
else if ( prefix > 99 )
len + = 2 ;
else if ( prefix > 9 )
2012-05-14 01:30:46 +04:00
len + + ;
}
}
2012-05-10 06:30:45 +04:00
2012-06-28 11:38:53 +04:00
len + = print_time ( msg - > ts_nsec , buf ? buf + len : NULL ) ;
2012-05-14 01:30:46 +04:00
return len ;
2012-05-10 06:30:45 +04:00
}
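The hand-rolled digit counting in the buf == NULL branch of print_prefix() can be cross-checked against snprintf(). The sketch below reproduces the counting logic in userspace; both function names are hypothetical, and facility and level are passed directly instead of coming from a struct printk_log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Length of "<N>" computed by counting digits, as in the NULL-buffer branch. */
static size_t syslog_prefix_len(unsigned int facility, unsigned int level)
{
	unsigned int prefix = (facility << 3) | level;
	size_t len = 3;		/* '<', one digit, '>' */

	assert(level <= 7);
	if (prefix > 999)
		len += 3;
	else if (prefix > 99)
		len += 2;
	else if (prefix > 9)
		len++;
	return len;
}

/* Same length obtained by letting snprintf() count the characters. */
static size_t syslog_prefix_len_snprintf(unsigned int facility,
					 unsigned int level)
{
	int n = snprintf(NULL, 0, "<%u>", (facility << 3) | level);

	return n < 0 ? 0 : (size_t)n;
}
```

For the record "30,340,5690716;udevd[80]: ..." shown earlier, facility 3 and level 6 give the prefix "<30>", which is four characters long.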
2013-08-01 00:53:47 +04:00
static size_t msg_print_text ( const struct printk_log * msg , enum log_flags prev ,
2012-07-09 23:15:42 +04:00
bool syslog , char * buf , size_t size )
2012-05-03 04:29:13 +04:00
{
2012-05-14 01:30:46 +04:00
const char * text = log_text ( msg ) ;
size_t text_size = msg - > text_len ;
2012-07-09 23:15:42 +04:00
bool prefix = true ;
bool newline = true ;
2012-05-14 01:30:46 +04:00
size_t len = 0 ;
2012-07-09 23:15:42 +04:00
if ( ( prev & LOG_CONT ) & & ! ( msg - > flags & LOG_PREFIX ) )
prefix = false ;
if ( msg - > flags & LOG_CONT ) {
if ( ( prev & LOG_CONT ) & & ! ( prev & LOG_NEWLINE ) )
prefix = false ;
if ( ! ( msg - > flags & LOG_NEWLINE ) )
newline = false ;
}
2012-05-14 01:30:46 +04:00
do {
const char * next = memchr ( text , ' \n ' , text_size ) ;
size_t text_len ;
if ( next ) {
text_len = next - text ;
next + + ;
text_size - = next - text ;
} else {
text_len = text_size ;
}
2012-05-03 04:29:13 +04:00
2012-05-14 01:30:46 +04:00
if ( buf ) {
if ( print_prefix ( msg , syslog , NULL ) +
2012-07-17 05:35:29 +04:00
text_len + 1 > = size - len )
2012-05-14 01:30:46 +04:00
break ;
2012-05-03 04:29:13 +04:00
2012-07-09 23:15:42 +04:00
if ( prefix )
len + = print_prefix ( msg , syslog , buf + len ) ;
2012-05-14 01:30:46 +04:00
memcpy ( buf + len , text , text_len ) ;
len + = text_len ;
2012-07-09 23:15:42 +04:00
if ( next | | newline )
buf [ len + + ] = ' \n ' ;
2012-05-14 01:30:46 +04:00
} else {
/* SYSLOG_ACTION_* buffer size only calculation */
2012-07-09 23:15:42 +04:00
if ( prefix )
len + = print_prefix ( msg , syslog , NULL ) ;
len + = text_len ;
if ( next | | newline )
len + + ;
2012-05-14 01:30:46 +04:00
}
2012-05-03 04:29:13 +04:00
2012-07-09 23:15:42 +04:00
prefix = true ;
2012-05-14 01:30:46 +04:00
text = next ;
} while ( text ) ;
2012-05-03 04:29:13 +04:00
return len ;
}
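The core loop of msg_print_text() splits the record text at embedded newlines and emits a prefix in front of every line. A simplified userspace sketch of that loop is shown below. It is not the kernel function: prefix_lines is a hypothetical name, the prefix is a fixed string, the LOG_CONT / LOG_PREFIX / LOG_NEWLINE handling is left out, and the space check uses a plain "does it fit" test rather than the kernel's exact comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy text into buf, writing prefix before every line and '\n' after it.
 * Stops early when the next line would not fit. Returns the number of
 * bytes written; the output is not NUL-terminated.
 */
static size_t prefix_lines(const char *text, size_t text_size,
			   const char *prefix, char *buf, size_t size)
{
	size_t prefix_len = strlen(prefix);
	size_t len = 0;

	assert(text && prefix && buf);
	do {
		const char *next = memchr(text, '\n', text_size);
		size_t line_len;

		if (next) {
			line_len = (size_t)(next - text);
			next++;				/* skip the newline */
			text_size -= (size_t)(next - text);
		} else {
			line_len = text_size;		/* last line */
		}

		/* prefix + line + newline must fit in the remaining space */
		if (prefix_len + line_len + 1 > size - len)
			break;

		memcpy(buf + len, prefix, prefix_len);
		len += prefix_len;
		memcpy(buf + len, text, line_len);
		len += line_len;
		buf[len++] = '\n';

		text = next;
	} while (text);

	return len;
}
```

Calling it on the four-byte text "a\nbc" with the prefix "<6>" produces the two lines "<6>a" and "<6>bc", each terminated by a newline.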
static int syslog_print ( char __user * buf , int size )
{
char * text ;
2013-08-01 00:53:47 +04:00
struct printk_log * msg ;
2012-06-22 19:36:09 +04:00
int len = 0 ;
2012-05-03 04:29:13 +04:00
2012-07-17 05:35:29 +04:00
text = kmalloc ( LOG_LINE_MAX + PREFIX_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
2012-06-22 19:36:09 +04:00
while ( size > 0 ) {
size_t n ;
2012-07-09 21:05:10 +04:00
size_t skip ;
2012-06-22 19:36:09 +04:00
raw_spin_lock_irq ( & logbuf_lock ) ;
if ( syslog_seq < log_first_seq ) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq ;
syslog_idx = log_first_idx ;
2012-07-09 23:15:42 +04:00
syslog_prev = 0 ;
2012-07-09 21:05:10 +04:00
syslog_partial = 0 ;
2012-06-22 19:36:09 +04:00
}
if ( syslog_seq = = log_next_seq ) {
raw_spin_unlock_irq ( & logbuf_lock ) ;
break ;
}
2012-07-09 21:05:10 +04:00
skip = syslog_partial ;
2012-06-22 19:36:09 +04:00
msg = log_from_idx ( syslog_idx ) ;
2012-07-17 05:35:29 +04:00
n = msg_print_text ( msg , syslog_prev , true , text ,
LOG_LINE_MAX + PREFIX_MAX ) ;
2012-07-09 21:05:10 +04:00
if ( n - syslog_partial < = size ) {
/* message fits into buffer, move forward */
2012-06-22 19:36:09 +04:00
syslog_idx = log_next ( syslog_idx ) ;
syslog_seq + + ;
2012-07-09 23:15:42 +04:00
syslog_prev = msg - > flags ;
2012-07-09 21:05:10 +04:00
n - = syslog_partial ;
syslog_partial = 0 ;
} else if ( ! len ) {
/* partial read(), remember position */
n = size ;
syslog_partial + = n ;
2012-06-22 19:36:09 +04:00
} else
n = 0 ;
raw_spin_unlock_irq ( & logbuf_lock ) ;
if ( ! n )
break ;
2012-07-09 21:05:10 +04:00
if ( copy_to_user ( buf , text + skip , n ) ) {
2012-06-22 19:36:09 +04:00
if ( ! len )
len = - EFAULT ;
break ;
}
2012-07-09 21:05:10 +04:00
len + = n ;
size - = n ;
buf + = n ;
2012-05-03 04:29:13 +04:00
}
kfree ( text ) ;
return len ;
}
static int syslog_print_all ( char __user * buf , int size , bool clear )
{
char * text ;
int len = 0 ;
2012-07-17 05:35:29 +04:00
text = kmalloc ( LOG_LINE_MAX + PREFIX_MAX , GFP_KERNEL ) ;
2012-05-03 04:29:13 +04:00
if ( ! text )
return - ENOMEM ;
raw_spin_lock_irq ( & logbuf_lock ) ;
if ( buf ) {
u64 next_seq ;
u64 seq ;
u32 idx ;
2012-07-09 23:15:42 +04:00
enum log_flags prev ;
2012-05-03 04:29:13 +04:00
/*
* Find first record that fits , including all following records ,
* into the user - provided buffer for this dump .
2012-06-15 16:07:51 +04:00
*/
2012-05-03 04:29:13 +04:00
seq = clear_seq ;
idx = clear_idx ;
2012-07-09 23:15:42 +04:00
prev = 0 ;
2012-05-03 04:29:13 +04:00
while ( seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2012-07-09 23:15:42 +04:00
len + = msg_print_text ( msg , prev , true , NULL , 0 ) ;
2012-08-10 23:07:09 +04:00
prev = msg - > flags ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
}
2012-06-15 16:07:51 +04:00
/* move first record forward until length fits into the buffer */
2012-05-03 04:29:13 +04:00
seq = clear_seq ;
idx = clear_idx ;
2012-07-09 23:15:42 +04:00
prev = 0 ;
2012-05-03 04:29:13 +04:00
while ( len > size & & seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2012-07-09 23:15:42 +04:00
len - = msg_print_text ( msg , prev , true , NULL , 0 ) ;
2012-08-10 23:07:09 +04:00
prev = msg - > flags ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
}
2012-06-15 16:07:51 +04:00
/* last message fitting into this dump */
2012-05-03 04:29:13 +04:00
next_seq = log_next_seq ;
len = 0 ;
while ( len > = 0 & & seq < next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-03 04:29:13 +04:00
int textlen ;
2012-07-17 05:35:29 +04:00
textlen = msg_print_text ( msg , prev , true , text ,
LOG_LINE_MAX + PREFIX_MAX ) ;
2012-05-03 04:29:13 +04:00
if ( textlen < 0 ) {
len = textlen ;
break ;
}
idx = log_next ( idx ) ;
seq + + ;
2012-07-09 23:15:42 +04:00
prev = msg - > flags ;
2012-05-03 04:29:13 +04:00
raw_spin_unlock_irq ( & logbuf_lock ) ;
if ( copy_to_user ( buf + len , text , textlen ) )
len = - EFAULT ;
else
len + = textlen ;
raw_spin_lock_irq ( & logbuf_lock ) ;
if ( seq < log_first_seq ) {
/* messages are gone, move to next one */
seq = log_first_seq ;
idx = log_first_idx ;
2012-07-09 23:15:42 +04:00
prev = 0 ;
2012-05-03 04:29:13 +04:00
}
}
}
if ( clear ) {
clear_seq = log_next_seq ;
clear_idx = log_next_idx ;
}
raw_spin_unlock_irq ( & logbuf_lock ) ;
kfree ( text ) ;
return len ;
}
2015-06-26 01:01:47 +03:00
int do_syslog ( int type , char __user * buf , int len , int source )
2005-04-17 02:20:36 +04:00
{
2012-05-03 04:29:13 +04:00
bool clear = false ;
2014-12-11 02:50:15 +03:00
static int saved_console_loglevel = LOGLEVEL_DEFAULT ;
2011-02-11 04:53:55 +03:00
int error ;
2005-04-17 02:20:36 +04:00
2015-06-26 01:01:47 +03:00
error = check_syslog_permissions ( type , source ) ;
2011-02-11 04:53:55 +03:00
if ( error )
goto out ;
2010-11-16 02:36:29 +03:00
2005-04-17 02:20:36 +04:00
switch ( type ) {
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_CLOSE : /* Close log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_OPEN : /* Open log */
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
case SYSLOG_ACTION_READ : /* Read from log */
2005-04-17 02:20:36 +04:00
error = - EINVAL ;
if ( ! buf | | len < 0 )
goto out ;
error = 0 ;
if ( ! len )
goto out ;
if ( ! access_ok ( VERIFY_WRITE , buf , len ) ) {
error = - EFAULT ;
goto out ;
}
2005-10-31 02:02:46 +03:00
error = wait_event_interruptible ( log_wait ,
2012-05-03 04:29:13 +04:00
syslog_seq ! = log_next_seq ) ;
kmsg: properly handle concurrent non-blocking read() from /proc/kmsg
The /proc/kmsg read() interface is internally simply wired up to a sequence
of syslog() syscalls, which might be racy between their checks and actions,
regarding concurrency.
In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on
usual O_NONBLOCK behavior, the recently introduced mutex might block an
O_NONBLOCK reader in read(), when poll() returns for it, but another process
has already read the data in the meantime. We've seen that while running
artificial test setups and tools that "fight" about /proc/kmsg data.
This restores the original /proc/kmsg behavior, where in case of concurrent
read()s, poll() might wake up but the read() syscall will just return 0 to
the caller, while another process has "stolen" the data.
This is in the general case not the expected behavior, but it is the exact
same one, that can easily be triggered with a 3.4 kernel, and some tools
might just rely on it.
The mutex is not needed, the original integrity issue which introduced it,
is in the meantime covered by:
"fill buffer with more than a single message for SYSLOG_ACTION_READ"
116e90b23f74d303e8d607c7a7d54f60f14ab9f2
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
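The restored behavior can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct klog_model`, `model_poll()` and `model_read()` are hypothetical names chosen for the illustration. The point it tries to show is that the check (poll) and the action (read) are separate steps on one shared read position, so a second reader can consume the data in between and the first reader's non-blocking read then returns 0 instead of blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one read position shared by all readers. */
struct klog_model {
	unsigned long next_seq;	/* sequence number of the next record to be written */
	unsigned long read_seq;	/* shared read position */
};

/* poll(): reports pending data, but does not reserve it for the caller. */
static bool model_poll(const struct klog_model *log)
{
	return log->read_seq != log->next_seq;
}

/*
 * Non-blocking read(): consumes every pending record and returns how many
 * were consumed. Returns 0, rather than blocking, when another reader has
 * already taken the data.
 */
static size_t model_read(struct klog_model *log)
{
	size_t n = log->next_seq - log->read_seq;

	log->read_seq = log->next_seq;
	return n;
}
```

If reader A sees `model_poll()` return true and reader B calls `model_read()` first, A's own `model_read()` returns 0, which corresponds to the "stolen data" case described in the commit message.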
2012-07-06 20:50:09 +04:00
if ( error )
2005-04-17 02:20:36 +04:00
goto out ;
2012-05-03 04:29:13 +04:00
error = syslog_print ( buf , len ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR :
2012-05-03 04:29:13 +04:00
clear = true ;
2005-04-17 02:20:36 +04:00
/* FALL THRU */
2010-02-04 02:37:13 +03:00
/* Read last kernel messages */
case SYSLOG_ACTION_READ_ALL :
2005-04-17 02:20:36 +04:00
error = - EINVAL ;
if ( ! buf | | len < 0 )
goto out ;
error = 0 ;
if ( ! len )
goto out ;
if ( ! access_ok ( VERIFY_WRITE , buf , len ) ) {
error = - EFAULT ;
goto out ;
}
2012-05-03 04:29:13 +04:00
error = syslog_print_all ( buf , len , clear ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR :
2012-05-03 04:29:13 +04:00
syslog_print_all ( NULL , 0 , true ) ;
2012-06-23 01:12:19 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel = = LOGLEVEL_DEFAULT )
2009-07-06 15:31:48 +04:00
saved_console_loglevel = console_loglevel ;
2005-04-17 02:20:36 +04:00
console_loglevel = minimum_console_loglevel ;
break ;
2010-02-04 02:37:13 +03:00
/* Enable logging to console */
case SYSLOG_ACTION_CONSOLE_ON :
2014-12-11 02:50:15 +03:00
if ( saved_console_loglevel ! = LOGLEVEL_DEFAULT ) {
2009-07-06 15:31:48 +04:00
console_loglevel = saved_console_loglevel ;
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2009-07-06 15:31:48 +04:00
}
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Set level of messages printed to console */
case SYSLOG_ACTION_CONSOLE_LEVEL :
2005-04-17 02:20:36 +04:00
error = - EINVAL ;
if ( len < 1 | | len > 8 )
goto out ;
if ( len < minimum_console_loglevel )
len = minimum_console_loglevel ;
console_loglevel = len ;
2009-07-06 15:31:48 +04:00
/* Implicitly re-enable logging to console */
2014-12-11 02:50:15 +03:00
saved_console_loglevel = LOGLEVEL_DEFAULT ;
2005-04-17 02:20:36 +04:00
error = 0 ;
break ;
2010-02-04 02:37:13 +03:00
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD :
2012-05-03 04:29:13 +04:00
raw_spin_lock_irq ( & logbuf_lock ) ;
if ( syslog_seq < log_first_seq ) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq ;
syslog_idx = log_first_idx ;
2012-07-09 23:15:42 +04:00
syslog_prev = 0 ;
2012-07-09 21:05:10 +04:00
syslog_partial = 0 ;
2012-05-03 04:29:13 +04:00
}
2015-06-26 01:01:47 +03:00
if ( source = = SYSLOG_FROM_PROC ) {
2012-05-03 04:29:13 +04:00
/*
 * Short-cut for poll("/proc/kmsg") which simply checks
 * for pending data, not the size; return the count of
 * records, not the length.
 */
2014-08-07 03:08:59 +04:00
error = log_next_seq - syslog_seq ;
2012-05-03 04:29:13 +04:00
} else {
2012-07-09 23:15:42 +04:00
u64 seq = syslog_seq ;
u32 idx = syslog_idx ;
enum log_flags prev = syslog_prev ;
2012-05-03 04:29:13 +04:00
error = 0 ;
while ( seq < log_next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-05-14 01:30:46 +04:00
2012-07-09 23:15:42 +04:00
error + = msg_print_text ( msg , prev , true , NULL , 0 ) ;
2012-05-03 04:29:13 +04:00
idx = log_next ( idx ) ;
seq + + ;
2012-07-09 23:15:42 +04:00
prev = msg - > flags ;
2012-05-03 04:29:13 +04:00
}
2012-07-09 21:05:10 +04:00
error - = syslog_partial ;
2012-05-03 04:29:13 +04:00
}
raw_spin_unlock_irq ( & logbuf_lock ) ;
2005-04-17 02:20:36 +04:00
break ;
2010-02-04 02:37:13 +03:00
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER :
2005-04-17 02:20:36 +04:00
error = log_buf_len ;
break ;
default :
error = - EINVAL ;
break ;
}
out :
return error ;
}
2009-01-14 16:14:29 +03:00
SYSCALL_DEFINE3 ( syslog , int , type , char __user * , buf , int , len )
2005-04-17 02:20:36 +04:00
{
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. Newer versions of util-linux
dmesg(1) default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
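The access rules listed in this commit message can be written down as a small decision function. The following is a user-space sketch under stated assumptions: `model_syslog_allowed()` and the `MODEL_*` names are hypothetical, only the action numbers follow syslog(2), and the real check_syslog_permissions() may apply further conditions that are not described in the message.

```c
#include <stdbool.h>

/* Action numbers as used by syslog(2); prefixed to avoid clashing with real headers. */
enum model_action {
	MODEL_ACTION_CLOSE = 0,
	MODEL_ACTION_OPEN = 1,
	MODEL_ACTION_READ = 2,
	MODEL_ACTION_READ_ALL = 3,
	MODEL_ACTION_READ_CLEAR = 4,
	MODEL_ACTION_CLEAR = 5,
	MODEL_ACTION_SIZE_BUFFER = 10,
};

/* Hypothetical stand-ins for SYSLOG_FROM_READER and SYSLOG_FROM_PROC. */
enum model_source { MODEL_FROM_READER, MODEL_FROM_PROC };

static bool model_syslog_allowed(enum model_action type,
				 enum model_source source,
				 bool cap_syslog, bool dmesg_restrict)
{
	/* /proc/kmsg: only the open is checked, everything is allowed afterwards. */
	if (source == MODEL_FROM_PROC && type != MODEL_ACTION_OPEN)
		return true;

	/* With CAP_SYSLOG anything is allowed. */
	if (cap_syslog)
		return true;

	/* Unprivileged callers: non-destructive actions only, if not restricted. */
	if (source == MODEL_FROM_READER && !dmesg_restrict &&
	    (type == MODEL_ACTION_READ_ALL || type == MODEL_ACTION_SIZE_BUFFER))
		return true;

	return false;
}
```

For example, an unprivileged `MODEL_ACTION_READ_ALL` from the reader path is allowed only while `dmesg_restrict` is false, and `MODEL_ACTION_CLEAR` always needs the capability.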
2013-06-13 01:04:39 +04:00
return do_syslog ( type , buf , len , SYSLOG_FROM_READER ) ;
2005-04-17 02:20:36 +04:00
}
/*
* Call the console drivers , asking them to write out
* log_buf [ start ] to log_buf [ end - 1 ] .
2011-01-26 02:07:35 +03:00
* The console_lock must be held .
2005-04-17 02:20:36 +04:00
*/
2015-06-26 01:01:30 +03:00
static void call_console_drivers(int level,
				 const char *ext_text, size_t ext_len,
				 const char *text, size_t len)
2005-04-17 02:20:36 +04:00
{
2012-05-03 04:29:13 +04:00
	struct console *con;
2005-04-17 02:20:36 +04:00
2013-04-30 03:17:16 +04:00
	trace_console(text, len);
2012-05-03 04:29:13 +04:00
	if (!console_drivers)
		return;
	for_each_console(con) {
		if (exclusive_console && con != exclusive_console)
			continue;
		if (!(con->flags & CON_ENABLED))
			continue;
		if (!con->write)
			continue;
		if (!cpu_online(smp_processor_id()) &&
		    !(con->flags & CON_ANYTIME))
			continue;
2015-06-26 01:01:30 +03:00
		if (con->flags & CON_EXTENDED)
			con->write(con, ext_text, ext_len);
		else
			con->write(con, text, len);
2012-05-03 04:29:13 +04:00
	}
2005-04-17 02:20:36 +04:00
}
/*
2015-02-13 02:01:34 +03:00
* Zap console related locks when oopsing .
* To leave time for slow consoles to print a full oops ,
* only zap at most once every 30 seconds .
2005-04-17 02:20:36 +04:00
*/
static void zap_locks(void)
{
	static unsigned long oops_timestamp;
	if (time_after_eq(jiffies, oops_timestamp) &&
2015-02-13 02:01:34 +03:00
	    !time_after(jiffies, oops_timestamp + 30 * HZ))
2005-04-17 02:20:36 +04:00
		return;
	oops_timestamp = jiffies;
2011-06-07 13:17:30 +04:00
	debug_locks_off();
2005-04-17 02:20:36 +04:00
	/* If a crash is occurring, make sure we can't deadlock */
2009-07-25 19:50:36 +04:00
	raw_spin_lock_init(&logbuf_lock);
2005-04-17 02:20:36 +04:00
	/* And make sure that we print immediately */
2010-09-07 18:33:43 +04:00
	sema_init(&console_sem, 1);
2005-04-17 02:20:36 +04:00
}
2009-09-23 03:43:33 +04:00
int printk_delay_msec __read_mostly;
static inline void printk_delay(void)
{
	if (unlikely(printk_delay_msec)) {
		int m = printk_delay_msec;
		while (m--) {
			mdelay(1);
			touch_nmi_watchdog();
		}
	}
}
2012-06-28 11:38:53 +04:00
/*
* Continuation lines are buffered , and not committed to the record buffer
* until the line is complete , or a race forces it . The line fragments
* though , are printed immediately to the consoles to ensure everything has
* reached the console in case of a kernel crash .
*/
static struct cont {
	char buf[LOG_LINE_MAX];
	size_t len;			/* length == 0 means unused buffer */
	size_t cons;			/* bytes written to console */
	struct task_struct *owner;	/* task of first print */
	u64 ts_nsec;			/* time of first print */
	u8 level;			/* log level of first message */
2014-08-07 03:09:03 +04:00
	u8 facility;			/* log facility of first message */
2012-07-17 05:35:30 +04:00
	enum log_flags flags;		/* prefix, newline flags */
2012-06-28 11:38:53 +04:00
	bool flushed:1;			/* buffer sealed and committed */
} cont;
2012-07-17 05:35:29 +04:00
static void cont_flush(enum log_flags flags)
2012-06-28 11:38:53 +04:00
{
	if (cont.flushed)
		return;
	if (cont.len == 0)
		return;
2012-07-17 05:35:30 +04:00
	if (cont.cons) {
		/*
		 * If a fragment of this line was directly flushed to the
		 * console; wait for the console to pick up the rest of the
		 * line. LOG_NOCONS suppresses a duplicated output.
		 */
		log_store(cont.facility, cont.level, flags | LOG_NOCONS,
			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
		cont.flags = flags;
		cont.flushed = true;
	} else {
		/*
		 * If no fragment of this line ever reached the console,
		 * just submit it to the store and free the buffer.
		 */
		log_store(cont.facility, cont.level, flags, 0,
			  NULL, 0, cont.buf, cont.len);
		cont.len = 0;
	}
2012-06-28 11:38:53 +04:00
}
static bool cont_add(int facility, int level, const char *text, size_t len)
{
	if (cont.len && cont.flushed)
		return false;
2015-06-26 01:01:30 +03:00
	/*
	 * If ext consoles are present, flush and skip in-kernel
	 * continuation. See nr_ext_console_drivers definition. Also, if
	 * the line gets too long, split it up in separate records.
	 */
	if (nr_ext_console_drivers || cont.len + len > sizeof(cont.buf)) {
2012-07-17 05:35:29 +04:00
		cont_flush(LOG_CONT);
2012-06-28 11:38:53 +04:00
		return false;
	}
	if (!cont.len) {
		cont.facility = facility;
		cont.level = level;
		cont.owner = current;
		cont.ts_nsec = local_clock();
2012-07-17 05:35:30 +04:00
		cont.flags = 0;
2012-06-28 11:38:53 +04:00
		cont.cons = 0;
		cont.flushed = false;
	}
	memcpy(cont.buf + cont.len, text, len);
	cont.len += len;
2012-07-17 05:35:30 +04:00
	if (cont.len > (sizeof(cont.buf) * 80) / 100)
		cont_flush(LOG_CONT);
2012-06-28 11:38:53 +04:00
	return true;
}
static size_t cont_print_text(char *text, size_t size)
{
	size_t textlen = 0;
	size_t len;
2012-07-17 05:35:30 +04:00
	if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
2012-06-28 11:38:53 +04:00
		textlen += print_time(cont.ts_nsec, text);
		size -= textlen;
	}
	len = cont.len - cont.cons;
	if (len > 0) {
		if (len + 1 > size)
			len = size - 1;
		memcpy(text + textlen, cont.buf + cont.cons, len);
		textlen += len;
		cont.cons = cont.len;
	}
	if (cont.flushed) {
2012-07-17 05:35:30 +04:00
		if (cont.flags & LOG_NEWLINE)
			text[textlen++] = '\n';
2012-06-28 11:38:53 +04:00
		/* got everything, release buffer */
		cont.len = 0;
	}
	return textlen;
}
2012-05-03 04:29:13 +04:00
asmlinkage int vprintk_emit ( int facility , int level ,
const char * dict , size_t dictlen ,
const char * fmt , va_list args )
2005-04-17 02:20:36 +04:00
{
2016-01-16 03:59:23 +03:00
static bool recursion_bug ;
2012-05-03 04:29:13 +04:00
static char textbuf [ LOG_LINE_MAX ] ;
char * text = textbuf ;
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is a quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
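The approach described in this commit message (store into the shared buffer now, touch the console later) can be sketched as a user-space model. All names here (`struct model_printk`, `model_emit()`, `model_deferred_flush()`) are hypothetical and the buffer size is arbitrary; the sketch only illustrates that messages emitted from scheduler context are appended rather than overwritten, and that console output is postponed until deferred work runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_SIZE 256

struct model_printk {
	char buf[MODEL_BUF_SIZE];	/* stands in for the shared printk buffer */
	size_t len;			/* bytes stored so far */
	size_t flushed;			/* bytes already handed to the console */
	bool pending;			/* a deferred console flush was requested */
};

/* Store a message; returns false if it does not fit into the buffer. */
static bool model_emit(struct model_printk *p, const char *msg, bool in_sched)
{
	size_t n = strlen(msg);

	if (n > MODEL_BUF_SIZE - p->len)
		return false;
	memcpy(p->buf + p->len, msg, n);
	p->len += n;

	if (in_sched)
		p->pending = true;	/* must not take the console lock here */
	else
		p->flushed = p->len;	/* normal path: flush right away */
	return true;
}

/* Deferred work: flush whatever is outstanding, return the byte count. */
static size_t model_deferred_flush(struct model_printk *p)
{
	size_t n = p->len - p->flushed;

	p->flushed = p->len;
	p->pending = false;
	return n;
}
```

Two scheduler-context messages emitted back to back are both still present when `model_deferred_flush()` runs, which is the first problem of the per-cpu buffer that the commit message lists.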
2014-06-05 03:11:38 +04:00
size_t text_len = 0 ;
2012-07-09 23:15:42 +04:00
enum log_flags lflags = 0 ;
2008-05-12 23:21:04 +04:00
unsigned long flags ;
2008-01-25 23:07:58 +03:00
int this_cpu ;
2012-05-03 04:29:13 +04:00
int printed_len = 0 ;
2016-05-21 03:00:36 +03:00
int nmi_message_lost ;
2014-06-05 03:11:38 +04:00
bool in_sched = false ;
2014-06-05 03:11:35 +04:00
/* cpu currently holding logbuf_lock in this function */
2014-12-11 02:51:21 +03:00
static unsigned int logbuf_cpu = UINT_MAX ;
2014-06-05 03:11:35 +04:00
2014-12-11 02:50:15 +03:00
if ( level = = LOGLEVEL_SCHED ) {
level = LOGLEVEL_DEFAULT ;
2014-06-05 03:11:38 +04:00
in_sched = true ;
}
2005-04-17 02:20:36 +04:00
2012-12-18 03:59:56 +04:00
boot_delay_msec ( level ) ;
2009-09-23 03:43:33 +04:00
printk_delay ( ) ;
2007-10-16 12:23:46 +04:00
2011-06-07 13:17:30 +04:00
local_irq_save ( flags ) ;
2008-01-25 23:07:58 +03:00
this_cpu = smp_processor_id ( ) ;
/*
* Ouch , printk recursed into itself !
*/
2012-05-03 04:29:13 +04:00
if ( unlikely ( logbuf_cpu = = this_cpu ) ) {
2008-01-25 23:07:58 +03:00
/*
 * If a crash is occurring during printk() on this CPU,
 * then try to get the crash message out but make sure
 * we can't deadlock. Otherwise just return to avoid the
 * recursion, but flag it so that it can be printed at
 * the next appropriate moment:
 */
2011-06-07 13:17:30 +04:00
if ( ! oops_in_progress & & ! lockdep_recursing ( current ) ) {
2016-01-16 03:59:23 +03:00
recursion_bug = true ;
2014-08-07 03:09:10 +04:00
local_irq_restore ( flags ) ;
return 0 ;
2008-01-25 23:07:58 +03:00
}
zap_locks ( ) ;
}
2006-07-03 11:24:58 +04:00
lockdep_off ( ) ;
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() allows cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality has been present since
8d91f8b15361, "printk: do cond_resched() between lines while outputting
to consoles").
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW, the
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- the can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which can cause
problems if a console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from a !online cpu
by setting the CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out
if there is a CON_ANYTIME console; otherwise console_unlock() must be
avoided.
can_use_console() ensures that the console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
   CPU0 online                           CPU1 !online
                                         console_trylock()
                                         ...
                                         console_unlock()
                                           console_cont_flush
                                             spin_lock logbuf_lock
                                             if (!cont.len) {
                                                spin_unlock logbuf_lock
                                                return
                                             }
                                           for (;;) {
   vprintk_emit
    spin_lock logbuf_lock
    log_store
    spin_unlock logbuf_lock
                                             spin_lock logbuf_lock
    !console_trylock_for_printk              msg_print_text
     return                                  console_idx = log_next()
                                             console_seq++
                                             console_prev = msg->flags
                                             spin_unlock logbuf_lock
                                             call_console_drivers()
                                               for_each_console(con) {
                                                 if (!cpu_online() &&
                                                     !(con->flags & CON_ANYTIME))
                                                         continue;
                                               }
                                             /*
                                              * no message printed, we lost it
                                              */
   vprintk_emit
    spin_lock logbuf_lock
    log_store
    spin_unlock logbuf_lock
    !console_trylock_for_printk
     return
                                             /*
                                              * go to the beginning of the loop,
                                              * find out there are new messages,
                                              * lose it
                                              */
                                           }
A console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not
unregistered when a CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from an offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
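The check that this commit moves into console_unlock() can be sketched in user space as follows. This is a simplified model, not the kernel implementation: the `model_*` names and the flag bit values are hypothetical, and requiring the console to be enabled as well as CON_ANYTIME is an assumption made for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for CON_ENABLED and CON_ANYTIME. */
#define MODEL_CON_ENABLED	(1u << 0)
#define MODEL_CON_ANYTIME	(1u << 1)

struct model_console {
	unsigned int flags;
};

/* Is there at least one enabled console that may be called from a !online CPU? */
static bool model_have_callable_console(const struct model_console *cons, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((cons[i].flags & MODEL_CON_ENABLED) &&
		    (cons[i].flags & MODEL_CON_ANYTIME))
			return true;
	}
	return false;
}

/*
 * Console output is safe when the CPU is online, or when some console
 * has declared itself callable at any time.
 */
static bool model_can_use_console(bool cpu_online,
				  const struct model_console *cons, size_t n)
{
	return cpu_online || model_have_callable_console(cons, n);
}
```

In the two-CPU scenario above, CPU1 is offline and no console has the anytime flag, so `model_can_use_console()` returns false and the flush would be skipped instead of silently dropping records.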
2016-03-18 00:21:20 +03:00
/* This stops the holder of console_sem just where we want him */
2009-07-25 19:50:36 +04:00
raw_spin_lock ( & logbuf_lock ) ;
2012-05-03 04:29:13 +04:00
logbuf_cpu = this_cpu ;
2005-04-17 02:20:36 +04:00
2014-09-10 01:50:48 +04:00
	if (unlikely(recursion_bug)) {
2012-05-03 04:29:13 +04:00
		static const char recursion_msg[] =
			"BUG: recent printk recursion!";
2016-01-16 03:59:23 +03:00
		recursion_bug = false;
2012-05-03 04:29:13 +04:00
		/* emit KERN_CRIT message */
2014-06-05 03:11:33 +04:00
		printed_len += log_store(0, 2, LOG_PREFIX | LOG_NEWLINE, 0,
2014-09-10 01:50:48 +04:00
					 NULL, 0, recursion_msg,
					 strlen(recursion_msg));
2008-01-25 23:07:58 +03:00
	}
2005-04-17 02:20:36 +04:00
2016-05-21 03:00:36 +03:00
	nmi_message_lost = get_nmi_message_lost();
	if (unlikely(nmi_message_lost)) {
		text_len = scnprintf(textbuf, sizeof(textbuf),
				     "BAD LUCK: lost %d message(s) from NMI context!",
				     nmi_message_lost);
		printed_len += log_store(0, 2, LOG_PREFIX | LOG_NEWLINE, 0,
					 NULL, 0, textbuf, text_len);
	}
2012-05-03 04:29:13 +04:00
/*
* The printf needs to come first ; we need the syslog
* prefix which might be passed - in as a parameter .
*/
2014-10-14 02:51:13 +04:00
text_len = vscnprintf ( text , sizeof ( textbuf ) , fmt , args ) ;
2009-06-16 21:57:02 +04:00
2012-05-03 04:29:13 +04:00
	/* mark and strip a trailing newline */
2012-05-14 22:46:27 +04:00
	if (text_len && text[text_len - 1] == '\n') {
		text_len--;
2012-07-09 23:15:42 +04:00
		lflags |= LOG_NEWLINE;
2012-05-03 04:29:13 +04:00
	}
2011-03-13 05:19:51 +03:00
2012-07-31 01:40:19 +04:00
	/* strip kernel syslog prefix and extract log level or control flags */
	if (facility == 0) {
		int kern_level = printk_get_level(text);
		if (kern_level) {
			const char *end_of_header = printk_skip_level(text);
			switch (kern_level) {
			case '0' ... '7':
2014-12-11 02:50:15 +03:00
				if (level == LOGLEVEL_DEFAULT)
2012-07-31 01:40:19 +04:00
					level = kern_level - '0';
2014-12-11 02:50:15 +03:00
				/* fallthrough */
2012-07-31 01:40:19 +04:00
			case 'd':	/* KERN_DEFAULT */
				lflags |= LOG_PREFIX;
			}
2014-04-04 01:48:41 +04:00
			/*
			 * No need to check length here because vscnprintf
			 * put '\0' at the end of the string. Only valid and
			 * newly printed level is detected.
			 */
2012-07-31 01:40:19 +04:00
			text_len -= end_of_header - text;
			text = (char *)end_of_header;
2009-06-16 21:57:02 +04:00
		}
	}
2014-12-11 02:50:15 +03:00
if ( level = = LOGLEVEL_DEFAULT )
2012-05-14 22:46:27 +04:00
level = default_message_loglevel ;
2011-03-13 05:19:51 +03:00
2012-07-09 23:15:42 +04:00
if ( dict )
lflags | = LOG_PREFIX | LOG_NEWLINE ;
2008-05-12 23:21:04 +04:00
2012-07-09 23:15:42 +04:00
if ( ! ( lflags & LOG_NEWLINE ) ) {
2012-06-28 11:38:53 +04:00
/*
* Flush the conflicting buffer . An earlier newline was missing ,
* or another task also prints continuation lines .
*/
2012-07-09 23:15:42 +04:00
if ( cont . len & & ( lflags & LOG_PREFIX | | cont . owner ! = current ) )
2012-07-17 05:35:30 +04:00
cont_flush ( LOG_NEWLINE ) ;
2012-05-14 22:46:27 +04:00
2012-06-28 11:38:53 +04:00
/* buffer line if possible, otherwise store it right away */
2014-06-05 03:11:33 +04:00
if ( cont_add ( facility , level , text , text_len ) )
printed_len + = text_len ;
else
printed_len + = log_store ( facility , level ,
lflags | LOG_CONT , 0 ,
dict , dictlen , text , text_len ) ;
2012-05-10 06:32:53 +04:00
} else {
2012-06-28 11:38:53 +04:00
bool stored = false ;
2012-05-14 22:46:27 +04:00
2012-06-28 11:38:53 +04:00
/*
2012-06-29 19:40:11 +04:00
* If an earlier newline was missing and it was the same task ,
* either merge it with the current buffer and flush , or if
* there was a race with interrupts ( prefix = = true ) then just
* flush it out and store this line separately .
2014-01-24 03:54:19 +04:00
* If the preceding printk was from a different task and missed
* a newline , flush and append the newline .
2012-06-28 11:38:53 +04:00
*/
2014-01-24 03:54:19 +04:00
if ( cont . len ) {
if ( cont . owner = = current & & ! ( lflags & LOG_PREFIX ) )
stored = cont_add ( facility , level , text ,
text_len ) ;
2012-07-17 05:35:30 +04:00
cont_flush ( LOG_NEWLINE ) ;
2012-05-14 22:46:27 +04:00
}
2012-06-28 11:38:53 +04:00
2014-06-05 03:11:33 +04:00
if ( stored )
printed_len + = text_len ;
else
printed_len + = log_store ( facility , level , lflags , 0 ,
dict , dictlen , text , text_len ) ;
2005-04-17 02:20:36 +04:00
}
2014-06-05 03:11:35 +04:00
logbuf_cpu = UINT_MAX ;
raw_spin_unlock ( & logbuf_lock ) ;
2014-08-07 03:09:10 +04:00
lockdep_on ( ) ;
local_irq_restore ( flags ) ;
2014-06-05 03:11:37 +04:00
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There are a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
/* If called from the scheduler, we can not call up(). */
2014-07-03 02:22:38 +04:00
if ( ! in_sched ) {
2014-08-07 03:09:10 +04:00
lockdep_off ( ) ;
2014-07-03 02:22:38 +04:00
/*
* Try to acquire and then immediately release the console
* semaphore . The release will print out buffers and wake up
* / dev / kmsg and syslog ( ) users .
*/
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() is allowed to cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality has been present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW, the
console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- the can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which can potentially cause
problems if a console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from a !online cpu
by setting the CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out
if there is a CON_ANYTIME console; otherwise console_unlock() must be
avoided.
can_use_console() ensures that the console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online CPU1 !online
console_trylock()
...
console_unlock()
console_cont_flush
spin_lock logbuf_lock
if (!cont.len) {
spin_unlock logbuf_lock
return
}
for (;;) {
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
spin_lock logbuf_lock
!console_trylock_for_printk msg_print_text
return console_idx = log_next()
console_seq++
console_prev = msg->flags
spin_unlock logbuf_lock
call_console_drivers()
for_each_console(con) {
if (!cpu_online() &&
!(con->flags & CON_ANYTIME))
continue;
}
/*
* no message printed, we lost it
*/
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
!console_trylock_for_printk
return
/*
* go to the beginning of the loop,
* find out there are new messages,
* lose it
*/
}
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from the offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
if ( console_trylock ( ) )
2014-07-03 02:22:38 +04:00
console_unlock ( ) ;
2014-08-07 03:09:10 +04:00
lockdep_on ( ) ;
2014-07-03 02:22:38 +04:00
}
2006-06-25 16:47:40 +04:00
2005-04-17 02:20:36 +04:00
return printed_len ;
}
2012-05-03 04:29:13 +04:00
EXPORT_SYMBOL ( vprintk_emit ) ;
asmlinkage int vprintk ( const char * fmt , va_list args )
{
2014-12-11 02:50:15 +03:00
return vprintk_emit ( 0 , LOGLEVEL_DEFAULT , NULL , 0 , fmt , args ) ;
2012-05-03 04:29:13 +04:00
}
2005-04-17 02:20:36 +04:00
EXPORT_SYMBOL ( vprintk ) ;
2012-05-03 04:29:13 +04:00
asmlinkage int printk_emit ( int facility , int level ,
const char * dict , size_t dictlen ,
const char * fmt , . . . )
{
va_list args ;
int r ;
va_start ( args , fmt ) ;
r = vprintk_emit ( facility , level , dict , dictlen , fmt , args ) ;
va_end ( args ) ;
return r ;
}
EXPORT_SYMBOL ( printk_emit ) ;
2016-08-03 00:03:53 +03:00
# ifdef CONFIG_PRINTK
# define define_pr_level(func, loglevel) \
asmlinkage __visible void func ( const char * fmt , . . . ) \
{ \
va_list args ; \
\
va_start ( args , fmt ) ; \
vprintk_default ( loglevel , fmt , args ) ; \
va_end ( args ) ; \
} \
EXPORT_SYMBOL ( func )
define_pr_level ( __pr_emerg , LOGLEVEL_EMERG ) ;
define_pr_level ( __pr_alert , LOGLEVEL_ALERT ) ;
define_pr_level ( __pr_crit , LOGLEVEL_CRIT ) ;
define_pr_level ( __pr_err , LOGLEVEL_ERR ) ;
define_pr_level ( __pr_warn , LOGLEVEL_WARNING ) ;
define_pr_level ( __pr_notice , LOGLEVEL_NOTICE ) ;
define_pr_level ( __pr_info , LOGLEVEL_INFO ) ;
# endif
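The define_pr_level() macro above stamps out one variadic wrapper per log level. A minimal userspace sketch of the same pattern follows; `demo_vlog()`, the `DEMO_LEVEL_*` values and the `demo_pr_*` names are hypothetical stand-ins, not kernel symbols, and the backend formats into a buffer instead of the kernel log.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's LOGLEVEL_* constants. */
enum { DEMO_LEVEL_ERR = 3, DEMO_LEVEL_INFO = 6 };

static char demo_last_line[128];	/* last formatted message, for inspection */
static int demo_last_level = -1;	/* level of the last message */

/* Hypothetical backend: records the level and formats the text. */
static int demo_vlog(int level, const char *fmt, va_list args)
{
	demo_last_level = level;
	return vsnprintf(demo_last_line, sizeof(demo_last_line), fmt, args);
}

/* One variadic wrapper per level, generated by the macro. */
#define DEMO_DEFINE_PR_LEVEL(func, loglevel)		\
void func(const char *fmt, ...)				\
{							\
	va_list args;					\
							\
	va_start(args, fmt);				\
	demo_vlog(loglevel, fmt, args);			\
	va_end(args);					\
}

DEMO_DEFINE_PR_LEVEL(demo_pr_err, DEMO_LEVEL_ERR)
DEMO_DEFINE_PR_LEVEL(demo_pr_info, DEMO_LEVEL_INFO)
```

Each generated function only forwards its fixed level plus the caller's arguments, so adding a level costs one line.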
int vprintk_default ( int level , const char * fmt , va_list args )
2014-06-20 01:33:31 +04:00
{
int r ;
# ifdef CONFIG_KGDB_KDB
if ( unlikely ( kdb_trap_printk ) ) {
2014-11-07 21:37:57 +03:00
r = vkdb_printf ( KDB_MSGSRC_PRINTK , fmt , args ) ;
2014-06-20 01:33:31 +04:00
return r ;
}
# endif
2016-08-03 00:03:53 +03:00
r = vprintk_emit ( 0 , level , NULL , 0 , fmt , args ) ;
2014-06-20 01:33:31 +04:00
return r ;
}
EXPORT_SYMBOL_GPL ( vprintk_default ) ;
2012-05-03 04:29:13 +04:00
/**
* printk - print a kernel message
* @ fmt : format string
*
* This is printk ( ) . It can be called from any context . We want it to work .
*
* We try to grab the console_lock . If we succeed , it ' s easy - we log the
* output and call the console drivers . If we fail to get the semaphore , we
* place the output into the log buffer and return . The current holder of
* the console_sem will notice the new output in console_unlock ( ) ; and will
* send it to the consoles before releasing the lock .
*
* One effect of this deferred printing is that code which calls printk ( ) and
* then changes console_loglevel may break . This is because console_loglevel
* is inspected when the actual printing occurs .
*
* See also :
* printf ( 3 )
*
* See the vsnprintf ( ) documentation for format string extensions over C99 .
*/
2014-05-02 02:44:38 +04:00
asmlinkage __visible int printk ( const char * fmt , . . . )
2012-05-03 04:29:13 +04:00
{
va_list args ;
int r ;
va_start ( args , fmt ) ;
2016-08-03 00:03:53 +03:00
r = vprintk_func ( LOGLEVEL_DEFAULT , fmt , args ) ;
2012-05-03 04:29:13 +04:00
va_end ( args ) ;
return r ;
}
EXPORT_SYMBOL ( printk ) ;
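The deferred printing described in the printk() comment can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: the `demo_*` names are hypothetical, a plain flag stands in for console_sem, and the log is a small array with no wraparound. It only illustrates the "always store, print only if the lock is free" rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_MSGS 8
#define DEMO_MSG_LEN 32

static char demo_log[DEMO_MAX_MSGS][DEMO_MSG_LEN];	/* stand-in log buffer */
static size_t demo_log_next;		/* next slot to store into */
static size_t demo_console_next;	/* next slot the console will print */
static bool demo_console_held;		/* stand-in for console_sem */
static size_t demo_printed;		/* messages sent to the "console" */

static bool demo_console_trylock(void)
{
	if (demo_console_held)
		return false;
	demo_console_held = true;
	return true;
}

/* The lock holder drains everything stored so far, then releases. */
static void demo_console_unlock(void)
{
	while (demo_console_next < demo_log_next) {
		demo_console_next++;
		demo_printed++;
	}
	demo_console_held = false;
}

/* Always store first; print only if the lock is free. False if full. */
static bool demo_printk(const char *msg)
{
	if (demo_log_next == DEMO_MAX_MSGS)
		return false;
	strncpy(demo_log[demo_log_next], msg, DEMO_MSG_LEN - 1);
	demo_log[demo_log_next][DEMO_MSG_LEN - 1] = '\0';
	demo_log_next++;

	if (demo_console_trylock())
		demo_console_unlock();
	return true;
}
```

A message stored while someone else holds the lock is not lost: the holder picks it up in its own unlock path.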
2012-05-09 03:37:51 +04:00
2012-07-17 05:35:29 +04:00
# else /* CONFIG_PRINTK */
2005-05-01 19:59:02 +04:00
2012-07-17 05:35:29 +04:00
# define LOG_LINE_MAX 0
# define PREFIX_MAX 0
2014-08-07 03:09:08 +04:00
2012-07-17 05:35:29 +04:00
static u64 syslog_seq ;
static u32 syslog_idx ;
2012-07-17 05:35:30 +04:00
static u64 console_seq ;
static u32 console_idx ;
2012-07-17 05:35:29 +04:00
static enum log_flags syslog_prev ;
static u64 log_first_seq ;
static u32 log_first_idx ;
static u64 log_next_seq ;
2012-07-17 05:35:30 +04:00
static enum log_flags console_prev ;
2012-06-28 11:38:53 +04:00
static struct cont {
size_t len ;
size_t cons ;
u8 level ;
bool flushed : 1 ;
} cont ;
2015-06-26 01:01:30 +03:00
static char * log_text ( const struct printk_log * msg ) { return NULL ; }
static char * log_dict ( const struct printk_log * msg ) { return NULL ; }
2013-08-01 00:53:47 +04:00
static struct printk_log * log_from_idx ( u32 idx ) { return NULL ; }
2012-05-09 03:37:51 +04:00
static u32 log_next ( u32 idx ) { return 0 ; }
2015-06-26 01:01:30 +03:00
static ssize_t msg_print_ext_header ( char * buf , size_t size ,
struct printk_log * msg , u64 seq ,
enum log_flags prev_flags ) { return 0 ; }
static ssize_t msg_print_ext_body ( char * buf , size_t size ,
char * dict , size_t dict_len ,
char * text , size_t text_len ) { return 0 ; }
static void call_console_drivers ( int level ,
const char * ext_text , size_t ext_len ,
const char * text , size_t len ) { }
2013-08-01 00:53:47 +04:00
static size_t msg_print_text ( const struct printk_log * msg , enum log_flags prev ,
2012-07-09 23:15:42 +04:00
bool syslog , char * buf , size_t size ) { return 0 ; }
2012-06-28 11:38:53 +04:00
static size_t cont_print_text ( char * text , size_t size ) { return 0 ; }
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such a console register_console() resets the
console_seq, console_idx, console_prev
and sets a `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
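The idea in the commit message above, testing the level before doing any formatting work, can be sketched in userspace as follows. The `demo_*` names and the counter are hypothetical; only the `level >= console_loglevel && !ignore_loglevel` condition is taken from the text.

```c
#include <stdbool.h>

static int demo_console_loglevel = 4;	/* hypothetical current console level */
static bool demo_ignore_loglevel;	/* hypothetical override switch */
static int demo_format_calls;		/* counts expensive formatting work */

/* True when a message of this level would be dropped anyway. */
static bool demo_suppress_message_printing(int level)
{
	return level >= demo_console_loglevel && !demo_ignore_loglevel;
}

/* Returns true if the message was formatted and handed to the console. */
static bool demo_emit(int level)
{
	if (demo_suppress_message_printing(level))
		return false;	/* skip before any formatting is done */
	demo_format_calls++;	/* stands in for msg_print_text() and friends */
	return true;
}
```

With the check hoisted in front, suppressed messages cost one comparison instead of a full format-and-drop cycle.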
2016-08-03 00:03:56 +03:00
static bool suppress_message_printing ( int level ) { return false ; }
2005-05-01 19:59:02 +04:00
2014-11-21 17:16:58 +03:00
/* Still needs to be defined for users */
DEFINE_PER_CPU ( printk_func_t , printk_func ) ;
2012-05-09 03:37:51 +04:00
# endif /* CONFIG_PRINTK */
2005-05-01 19:59:02 +04:00
2013-04-30 03:17:18 +04:00
# ifdef CONFIG_EARLY_PRINTK
struct console * early_console ;
2014-05-02 02:44:38 +04:00
asmlinkage __visible void early_printk ( const char * fmt , . . . )
2013-04-30 03:17:18 +04:00
{
va_list ap ;
2014-12-11 02:45:53 +03:00
char buf [ 512 ] ;
int n ;
if ( ! early_console )
return ;
2013-04-30 03:17:18 +04:00
va_start ( ap , fmt ) ;
2014-12-11 02:45:53 +03:00
n = vscnprintf ( buf , sizeof ( buf ) , fmt , ap ) ;
2013-04-30 03:17:18 +04:00
va_end ( ap ) ;
2014-12-11 02:45:53 +03:00
early_console - > write ( early_console , buf , n ) ;
2013-04-30 03:17:18 +04:00
}
# endif
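early_printk() relies on vscnprintf() returning the number of characters actually stored, so the value can be passed straight to ->write(). Userspace has no vscnprintf(); the sketch below approximates it on top of vsnprintf(). The `demo_*` names are hypothetical and this is not the kernel implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace approximation of vscnprintf(): returns the number of
 * characters actually stored (excluding the trailing NUL), not the
 * length the full output would have had.
 */
static int demo_vscnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
	int n;

	if (size == 0)
		return 0;
	n = vsnprintf(buf, size, fmt, ap);
	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* Variadic front end, shaped like early_printk() but writing to a buffer. */
static int demo_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = demo_vscnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;
}
```

The clamp matters because plain vsnprintf() reports the untruncated length, which would make a later write read past the end of the buffer.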
2008-04-30 11:54:51 +04:00
static int __add_preferred_console ( char * name , int idx , char * options ,
char * brl_options )
{
struct console_cmdline * c ;
int i ;
/*
* See if this tty is not yet registered , and
* if we have a slot free .
*/
2013-08-01 00:53:46 +04:00
for ( i = 0 , c = console_cmdline ;
i < MAX_CMDLINECONSOLES & & c - > name [ 0 ] ;
i + + , c + + ) {
if ( strcmp ( c - > name , name ) = = 0 & & c - > index = = idx ) {
if ( ! brl_options )
selected_console = i ;
return 0 ;
2008-04-30 11:54:51 +04:00
}
2013-08-01 00:53:46 +04:00
}
2008-04-30 11:54:51 +04:00
if ( i = = MAX_CMDLINECONSOLES )
return - E2BIG ;
if ( ! brl_options )
selected_console = i ;
strlcpy ( c - > name , name , sizeof ( c - > name ) ) ;
c - > options = options ;
2013-08-01 00:53:45 +04:00
braille_set_options ( c , brl_options ) ;
2008-04-30 11:54:51 +04:00
c - > index = idx ;
return 0 ;
}
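The slot search in __add_preferred_console() can be tried out in userspace with a reduced table. This is a sketch under assumptions: the `demo_*` names are hypothetical, the table holds two entries, braille and option handling are left out, and -1 stands in for -E2BIG.

```c
#include <string.h>

#define DEMO_MAX_CONSOLES 2

struct demo_console_cmdline {
	char name[16];
	int index;
};

static struct demo_console_cmdline demo_table[DEMO_MAX_CONSOLES];
static int demo_selected = -1;

/* Returns 0 on success, -1 when the table is full. */
static int demo_add_preferred(const char *name, int idx)
{
	struct demo_console_cmdline *c = demo_table;
	int i;

	/* Reuse an existing entry if this name/index pair is already known. */
	for (i = 0; i < DEMO_MAX_CONSOLES && c->name[0]; i++, c++) {
		if (strcmp(c->name, name) == 0 && c->index == idx) {
			demo_selected = i;
			return 0;
		}
	}
	if (i == DEMO_MAX_CONSOLES)
		return -1;

	/* Otherwise claim the first free slot. */
	demo_selected = i;
	strncpy(c->name, name, sizeof(c->name) - 1);
	c->name[sizeof(c->name) - 1] = '\0';
	c->index = idx;
	return 0;
}
```

Because the loop stops at the first empty name, `c` and `i` already point at the free slot when no match is found.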
2006-03-24 14:18:19 +03:00
/*
2014-08-07 03:09:03 +04:00
* Set up a console . Called via do_early_param ( ) in init / main . c
* for each " console= " parameter in the boot command line .
2006-03-24 14:18:19 +03:00
*/
static int __init console_setup ( char * str )
{
2014-08-07 03:09:03 +04:00
char buf [ sizeof ( console_cmdline [ 0 ] . name ) + 4 ] ; /* 4 for "ttyS" */
2008-04-30 11:54:51 +04:00
char * s , * options , * brl_options = NULL ;
2006-03-24 14:18:19 +03:00
int idx ;
2013-08-01 00:53:45 +04:00
if ( _braille_console_setup ( & str , & brl_options ) )
return 1 ;
2008-04-30 11:54:51 +04:00
2006-03-24 14:18:19 +03:00
/*
* Decode str into name , index , options .
*/
if ( str [ 0 ] > = ' 0 ' & & str [ 0 ] < = ' 9 ' ) {
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS " ) ;
strncpy ( buf + 4 , str , sizeof ( buf ) - 5 ) ;
2006-03-24 14:18:19 +03:00
} else {
2007-07-16 10:37:27 +04:00
strncpy ( buf , str , sizeof ( buf ) - 1 ) ;
2006-03-24 14:18:19 +03:00
}
2007-07-16 10:37:27 +04:00
buf [ sizeof ( buf ) - 1 ] = 0 ;
2014-08-07 03:09:08 +04:00
options = strchr ( str , ' , ' ) ;
if ( options )
2006-03-24 14:18:19 +03:00
* ( options + + ) = 0 ;
# ifdef __sparc__
if ( ! strcmp ( str , " ttya " ) )
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS0 " ) ;
2006-03-24 14:18:19 +03:00
if ( ! strcmp ( str , " ttyb " ) )
2007-07-16 10:37:27 +04:00
strcpy ( buf , " ttyS1 " ) ;
2006-03-24 14:18:19 +03:00
# endif
2007-07-16 10:37:27 +04:00
for ( s = buf ; * s ; s + + )
2014-08-07 03:09:08 +04:00
if ( isdigit ( * s ) | | * s = = ' , ' )
2006-03-24 14:18:19 +03:00
break ;
idx = simple_strtoul ( s , NULL , 10 ) ;
* s = 0 ;
2008-04-30 11:54:51 +04:00
__add_preferred_console ( buf , idx , options , brl_options ) ;
xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console). This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).
This is okay as long as tty is a useful console. But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy. In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line. Commit b8c2d3dfbc117dff26058fbac316b8acfc2cb5f7
did that for us.
Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled. But even then we still want to
enable the Xen console as well.
Problem: when tty registers, we can't yet know whether the PVFB is
enabled. By the time we can know (xenstore is up), the console setup
game is over.
Solution: enable console tty by default, but keep hvc as the preferred
console. Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 02:31:07 +04:00
console_set_on_cmdline = 1 ;
2006-03-24 14:18:19 +03:00
return 1 ;
}
__setup ( " console= " , console_setup ) ;
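The decoding step in console_setup() splits a string such as "ttyS1,115200n8" into name, index and options. A userspace sketch of that step follows; `struct demo_console_arg` and `demo_parse_console()` are hypothetical, and the braille and sparc special cases are left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct demo_console_arg {
	char name[16];
	int index;
	const char *options;	/* points into the caller's string, or NULL */
};

/* Splits str in place at the first comma, like console_setup() does. */
static void demo_parse_console(char *str, struct demo_console_arg *out)
{
	char buf[sizeof(out->name) + 4];	/* 4 for "ttyS" */
	char *s, *options;

	/* A bare number is shorthand for a serial port. */
	if (isdigit((unsigned char)str[0])) {
		strcpy(buf, "ttyS");
		strncpy(buf + 4, str, sizeof(buf) - 5);
	} else {
		strncpy(buf, str, sizeof(buf) - 1);
	}
	buf[sizeof(buf) - 1] = '\0';

	options = strchr(str, ',');
	if (options)
		*options++ = '\0';

	/* The name ends at the first digit (or comma). */
	for (s = buf; *s; s++)
		if (isdigit((unsigned char)*s) || *s == ',')
			break;
	out->index = (int)strtoul(s, NULL, 10);
	*s = '\0';

	strncpy(out->name, buf, sizeof(out->name) - 1);
	out->name[sizeof(out->name) - 1] = '\0';
	out->options = options;
}
```

Note that the input must be writable, since the comma is overwritten with a NUL to terminate the device part.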
2005-05-17 08:53:47 +04:00
/**
* add_preferred_console - add a device to the list of preferred consoles .
2005-11-14 03:08:14 +03:00
* @ name : device name
* @ idx : device index
* @ options : options for this console
2005-05-17 08:53:47 +04:00
*
* The last preferred console added will be used for kernel messages
* and stdin / out / err for init . Normally this is used by console_setup
* above to handle user - supplied console arguments ; however it can also
* be used by arch - specific code either to override the user or more
* commonly to provide a default console ( ie from PROM variables ) when
* the user has not supplied one .
*/
2007-12-29 12:19:49 +03:00
int add_preferred_console ( char * name , int idx , char * options )
2005-05-17 08:53:47 +04:00
{
2008-04-30 11:54:51 +04:00
return __add_preferred_console ( name , idx , options , NULL ) ;
2005-05-17 08:53:47 +04:00
}
2014-08-07 03:09:12 +04:00
bool console_suspend_enabled = true ;
2007-10-18 14:04:50 +04:00
EXPORT_SYMBOL ( console_suspend_enabled ) ;
static int __init console_suspend_disable ( char * str )
{
2014-08-07 03:09:12 +04:00
console_suspend_enabled = false ;
2007-10-18 14:04:50 +04:00
return 1 ;
}
__setup ( " no_console_suspend " , console_suspend_disable ) ;
2011-11-01 04:11:27 +04:00
module_param_named ( console_suspend , console_suspend_enabled ,
bool , S_IRUGO | S_IWUSR ) ;
MODULE_PARM_DESC ( console_suspend , " suspend console during suspend "
" and hibernate operations " ) ;
2007-10-18 14:04:50 +04:00
2006-06-20 05:16:01 +04:00
/**
* suspend_console - suspend the console subsystem
*
* This disables printk ( ) while we go into suspend states
*/
void suspend_console ( void )
{
2007-10-18 14:04:50 +04:00
if ( ! console_suspend_enabled )
return ;
2008-07-24 08:28:32 +04:00
printk ( " Suspending console(s) (use no_console_suspend to debug) \n " ) ;
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2006-06-20 05:16:01 +04:00
console_suspended = 1 ;
2014-06-05 03:11:36 +04:00
up_console_sem ( ) ;
2006-06-20 05:16:01 +04:00
}
void resume_console ( void )
{
2007-10-18 14:04:50 +04:00
if ( ! console_suspend_enabled )
return ;
2014-06-05 03:11:36 +04:00
down_console_sem ( ) ;
2006-06-20 05:16:01 +04:00
console_suspended = 0 ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2006-06-20 05:16:01 +04:00
}
2010-06-04 09:11:25 +04:00
/**
* console_cpu_notify - print deferred console messages after CPU hotplug
* @ self : notifier struct
* @ action : CPU hotplug event
* @ hcpu : unused
*
* If printk ( ) is called from a CPU that is not online yet , the messages
* will be spooled but will not show up on the console . This function is
* called when a new CPU comes online ( or fails to come up ) , and ensures
* that any such output gets printed .
*/
2013-06-19 22:53:51 +04:00
static int console_cpu_notify ( struct notifier_block * self ,
2010-06-04 09:11:25 +04:00
unsigned long action , void * hcpu )
{
switch ( action ) {
case CPU_ONLINE :
case CPU_DEAD :
case CPU_DOWN_FAILED :
case CPU_UP_CANCELED :
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
console_unlock ( ) ;
2010-06-04 09:11:25 +04:00
}
return NOTIFY_OK ;
}
2005-04-17 02:20:36 +04:00
/**
2011-01-26 02:07:35 +03:00
* console_lock - lock the console system for exclusive use .
2005-04-17 02:20:36 +04:00
*
2011-01-26 02:07:35 +03:00
* Acquires a lock which guarantees that the caller has
2005-04-17 02:20:36 +04:00
* exclusive access to the console system and the console_drivers list .
*
* Can sleep , returns nothing .
*/
2011-01-26 02:07:35 +03:00
void console_lock ( void )
2005-04-17 02:20:36 +04:00
{
2012-09-18 03:03:31 +04:00
might_sleep ( ) ;
2014-06-05 03:11:36 +04:00
down_console_sem ( ) ;
2009-02-14 04:07:24 +03:00
if ( console_suspended )
return ;
2005-04-17 02:20:36 +04:00
console_locked = 1 ;
console_may_schedule = 1 ;
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_lock ) ;
2005-04-17 02:20:36 +04:00
2011-01-26 02:07:35 +03:00
/**
* console_trylock - try to lock the console system for exclusive use .
*
2014-08-07 03:09:03 +04:00
* Try to acquire a lock which guarantees that the caller has exclusive
* access to the console system and the console_drivers list .
2011-01-26 02:07:35 +03:00
*
* returns 1 on success , and 0 on failure to acquire the lock .
*/
int console_trylock ( void )
2005-04-17 02:20:36 +04:00
{
2014-06-05 03:11:36 +04:00
if ( down_trylock_console_sem ( ) )
2011-01-26 02:07:35 +03:00
return 0 ;
2009-02-14 04:07:24 +03:00
if ( console_suspended ) {
2014-06-05 03:11:36 +04:00
up_console_sem ( ) ;
2011-01-26 02:07:35 +03:00
return 0 ;
2009-02-14 04:07:24 +03:00
}
2005-04-17 02:20:36 +04:00
console_locked = 1 ;
2016-03-18 00:21:23 +03:00
/*
* When PREEMPT_COUNT is disabled we can ' t reliably detect if it ' s
* safe to schedule ( e . g . calling printk while holding a spin_lock ) ,
* because preempt_disable ( ) / preempt_enable ( ) are just barriers there
* and preempt_count ( ) is always 0.
*
* RCU read sections have a separate preemption counter when
* PREEMPT_RCU is enabled , thus we must take extra care and check
* rcu_preempt_depth ( ) ; otherwise RCU read sections modify
* preempt_count ( ) .
*/
console_may_schedule = ! oops_in_progress & &
preemptible ( ) & &
! rcu_preempt_depth ( ) ;
2011-01-26 02:07:35 +03:00
return 1 ;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL ( console_trylock ) ;
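The control flow of console_trylock() (take the semaphore without blocking, back out if the console is suspended) can be modelled with plain flags. This is a single-threaded sketch; the `sketch_*` names are hypothetical and the console_may_schedule computation is omitted.

```c
#include <stdbool.h>

static bool sketch_sem_held;		/* stand-in for console_sem */
static bool sketch_console_suspended;	/* stand-in for console_suspended */
static bool sketch_console_locked;	/* stand-in for console_locked */

/* Returns 1 on success and 0 on failure, matching console_trylock(). */
static int sketch_console_trylock(void)
{
	if (sketch_sem_held)
		return 0;		/* someone else owns the semaphore */
	sketch_sem_held = true;
	if (sketch_console_suspended) {
		sketch_sem_held = false;	/* give it straight back */
		return 0;
	}
	sketch_console_locked = true;
	return 1;
}

static void sketch_console_unlock(void)
{
	sketch_console_locked = false;
	sketch_sem_held = false;
}
```

The suspended case releases the semaphore before returning, so a failed attempt never leaves the lock held.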
2005-04-17 02:20:36 +04:00
int is_console_locked ( void )
{
return console_locked ;
}
2016-03-18 00:21:20 +03:00
/*
* Check if we have any console that is capable of printing while cpu is
* booting or shutting down . Requires console_sem .
*/
static int have_callable_console ( void )
{
struct console * con ;
for_each_console ( con )
2016-03-18 00:21:27 +03:00
if ( ( con - > flags & CON_ENABLED ) & &
( con - > flags & CON_ANYTIME ) )
2016-03-18 00:21:20 +03:00
			return 1;
	return 0;
}
/*
 * Can we actually use the console at this time on this cpu?
 *
 * Console drivers may assume that per-cpu resources have been allocated. So
 * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
 * call them until this CPU is officially up.
 */
static inline int can_use_console(void)
{
	return cpu_online(raw_smp_processor_id()) || have_callable_console();
}
2012-07-17 05:35:30 +04:00
static void console_cont_flush(char *text, size_t size)
{
	unsigned long flags;
	size_t len;
	raw_spin_lock_irqsave(&logbuf_lock, flags);
	if (!cont.len)
		goto out;
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
    console_unlock()
    {
            ...
            for (;;) {
                    ...
                    raw_spin_lock_irqsave(&logbuf_lock, flags);
    skip:
                    msg = log_from_idx(console_idx);
                    if (msg->flags & LOG_NOCONS) {
                            ...
                            goto skip;
                    }
                    level = msg->level;
                    len += msg_print_text();                  >> sprintfs
                                                                 memcpy,
                                                                 etc.
                    if (nr_ext_console_drivers) {
                            ext_len = msg_print_ext_header(); >> scnprintf
                            ext_len += msg_print_ext_body();  >> scnprintfs
                                                                 etc.
                    }
                    ...
                    raw_spin_unlock(&logbuf_lock);
                    call_console_drivers(level, ext_text, ext_len, text, len)
                    {
                            if (level >= console_loglevel &&  >> drop the message
                                            !ignore_loglevel)
                                    return;
                            console->write(...);
                    }
                    local_irq_restore(flags);
            }
            ...
    }
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This waste can be significant when we register a new CON_PRINTBUFFER
console, for instance. For every such console register_console() resets
console_seq, console_idx and console_prev
and sets the `exclusive console' pointer to replay the log buffer to that
just-registered console. There can be a lot of messages to replay, and
in the worst case most of them are dropped after the console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
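The reordering described above can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the kernel code: `struct record`, `format_record()`, `replay()` and the `formatted` counter are hypothetical stand-ins, and the initial `console_loglevel` value is arbitrary. Only the `level >= console_loglevel && !ignore_loglevel` test is taken from the schematic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kernel globals of the same names (assumed values). */
static int console_loglevel = 4;
static bool ignore_loglevel = false;

/* Hypothetical record type; the kernel uses struct printk_log. */
struct record {
	int level;
	const char *text;
};

/* Counts how many records were formatted, to make the saving visible. */
static unsigned int formatted;

/* Same test as in the schematic, factored out so callers can run it early. */
static bool suppress_message_printing(int level)
{
	return level >= console_loglevel && !ignore_loglevel;
}

/* Hypothetical stand-in for msg_print_text(): the expensive step. */
static size_t format_record(const struct record *r, char *buf, size_t size)
{
	int n;

	formatted++;
	n = snprintf(buf, size, "<%d>%s\n", r->level, r->text);
	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Replay records, checking the level before any formatting is done. */
static unsigned int replay(const struct record *recs, size_t n)
{
	char buf[128];
	unsigned int printed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (suppress_message_printing(recs[i].level))
			continue;	/* skipped without formatting */
		if (format_record(&recs[i], buf, sizeof(buf)))
			printed++;
	}
	return printed;
}
```

With the check placed before `format_record()`, the number of formatting calls equals the number of records that can actually reach a console, which is the saving the patch is after when a large log buffer is replayed.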
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 00:03:56 +03:00
	if (suppress_message_printing(cont.level)) {
		cont.cons = cont.len;
		if (cont.flushed)
			cont.len = 0;
		goto out;
	}
2012-07-17 05:35:30 +04:00
	/*
	 * We still queue earlier records, likely because the console was
	 * busy. The earlier ones need to be printed before this one, we
	 * did not flush any fragment so far, so just let it queue up.
	 */
	if (console_seq < log_next_seq && !cont.cons)
		goto out;
	len = cont_print_text(text, size);
	raw_spin_unlock(&logbuf_lock);
	stop_critical_timings();
2015-06-26 01:01:30 +03:00
	call_console_drivers(cont.level, NULL, 0, text, len);
2012-07-17 05:35:30 +04:00
	start_critical_timings();
	local_irq_restore(flags);
	return;
out:
	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
}
2012-05-03 04:29:13 +04:00
2005-04-17 02:20:36 +04:00
/**
2011-01-26 02:07:35 +03:00
 * console_unlock - unlock the console system
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * Releases the console_lock which the caller holds on the console system
2005-04-17 02:20:36 +04:00
 * and the console driver list.
 *
2011-01-26 02:07:35 +03:00
 * While the console_lock was held, console output may have been buffered
 * by printk(). If this is the case, console_unlock() emits
 * the output prior to releasing the lock.
2005-04-17 02:20:36 +04:00
 *
2012-05-09 03:37:51 +04:00
 * If there is output waiting, we wake /dev/kmsg and syslog() users.
2005-04-17 02:20:36 +04:00
 *
2011-01-26 02:07:35 +03:00
 * console_unlock() may be called from any context.
2005-04-17 02:20:36 +04:00
 */
2011-01-26 02:07:35 +03:00
void console_unlock(void)
2005-04-17 02:20:36 +04:00
{
2015-06-26 01:01:30 +03:00
	static char ext_text[CONSOLE_EXT_LOG_MAX];
2012-07-17 05:35:29 +04:00
	static char text[LOG_LINE_MAX + PREFIX_MAX];
2012-05-03 04:29:13 +04:00
	static u64 seen_seq;
2005-04-17 02:20:36 +04:00
	unsigned long flags;
2012-05-03 04:29:13 +04:00
	bool wake_klogd = false;
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console device, for example a serial
console, the outputting task may occupy the cpu for a very long time,
long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile up more messages, sometimes enough to trigger the next cycle of
warnings, incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
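The fix can be pictured with a small user-space model. It is only a sketch under stated assumptions: `flush_lines()`, `emit_line()`, `yield_cpu()` and the two counters are hypothetical names standing in for the output loop, call_console_drivers() and cond_resched(); only the save-then-clear handling of `console_may_schedule` follows the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel flag of the same name. */
static int console_may_schedule;

/* Hypothetical counters used only to observe the behaviour. */
static unsigned int lines_emitted;
static unsigned int yields;

/* Hypothetical stand-in for cond_resched(). */
static void yield_cpu(void)
{
	yields++;
}

/* Hypothetical stand-in for call_console_drivers(). */
static void emit_line(const char *line)
{
	(void)line;
	lines_emitted++;
}

static void flush_lines(const char *const *lines, size_t n)
{
	/*
	 * Remember whether the caller may sleep, then clear the flag so
	 * that drivers called during output do not try to schedule.
	 */
	bool do_cond_resched = console_may_schedule;
	size_t i;

	console_may_schedule = 0;

	for (i = 0; i < n; i++) {
		emit_line(lines[i]);
		/* Between lines no spinlock is held, so yielding is safe. */
		if (do_cond_resched)
			yield_cpu();
	}
}
```

A caller that set `console_may_schedule` to 1 (the console_lock() case) gets one yield opportunity per line; a trylock caller gets none, so atomic callers are unaffected.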
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 03:58:24 +03:00
	bool do_cond_resched, retry;
2005-04-17 02:20:36 +04:00
2006-06-20 05:16:01 +04:00
	if (console_suspended) {
2014-06-05 03:11:36 +04:00
		up_console_sem();
2006-06-20 05:16:01 +04:00
		return;
	}
2006-08-05 23:14:16 +04:00
2016-01-16 03:58:24 +03:00
	/*
	 * Console drivers are called under logbuf_lock, so
	 * @console_may_schedule should be cleared before; however, we may
	 * end up dumping a lot of lines, for example, if called from
	 * console registration path, and should invoke cond_resched()
	 * between lines if allowable. Not doing so can cause a very long
	 * scheduling stall on a slow console leading to RCU stall and
	 * softlockup warnings which exacerbate the issue with more
	 * messages practically incapacitating the system.
	 */
	do_cond_resched = console_may_schedule;
2006-08-05 23:14:16 +04:00
	console_may_schedule = 0;
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() may cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles")).
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption disabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequently enough (IOW,
the console_sem owner always has new data to send to console drivers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in console_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- the can_use_console()
check. The thing is that vprintk_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if a console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from a !online cpu
by setting the CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates over all the consoles to find out
if there is a CON_ANYTIME console; otherwise console_unlock() must be
avoided.
can_use_console() ensures that the console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' in for_each_console(), it may be too late, which can result in
message loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
  CPU0 online                          CPU1 !online
                                       console_trylock()
                                       ...
                                       console_unlock()
                                         console_cont_flush
                                           spin_lock logbuf_lock
                                           if (!cont.len) {
                                             spin_unlock logbuf_lock
                                             return
                                           }
                                         for (;;) {
  vprintk_emit
   spin_lock logbuf_lock
   log_store
   spin_unlock logbuf_lock
                                           spin_lock logbuf_lock
   !console_trylock_for_printk             msg_print_text
    return                                 console_idx = log_next()
                                           console_seq++
                                           console_prev = msg->flags
                                           spin_unlock logbuf_lock
                                           call_console_drivers()
                                             for_each_console(con) {
                                               if (!cpu_online() &&
                                                   !(con->flags & CON_ANYTIME))
                                                 continue;
                                             }
                                           /*
                                            * no message printed, we lost it
                                            */
  vprintk_emit
   spin_lock logbuf_lock
   log_store
   spin_unlock logbuf_lock
   !console_trylock_for_printk
    return
                                           /*
                                            * go to the beginning of the loop,
                                            * find out there are new messages,
                                            * lose it
                                            */
                                         }
The console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not
unregistered when a CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notifications sent from the offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
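The effect of moving the check can be modelled in user space. The sketch below is an assumption-laden illustration rather than the kernel implementation: the reduced `struct console`, the `this_cpu_online` flag and `unlock_and_flush()` are hypothetical, and the flag bit values are illustrative (the real definitions live in <linux/console.h>). It shows only that the unlock path itself refuses to flush when the CPU is offline and no CON_ANYTIME console exists, so records stay queued instead of being consumed and lost.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the real definitions live in <linux/console.h>. */
#define CON_ENABLED	(1 << 2)
#define CON_ANYTIME	(1 << 4)

/* Hypothetical, reduced console descriptor. */
struct console {
	unsigned int flags;
	struct console *next;
};

static struct console *console_drivers;	/* stand-in for the console list */
static bool this_cpu_online;		/* stand-in for cpu_online() */

/* Is there an enabled console that tolerates being called at any time? */
static int have_callable_console(void)
{
	struct console *con;

	for (con = console_drivers; con; con = con->next)
		if ((con->flags & CON_ENABLED) && (con->flags & CON_ANYTIME))
			return 1;
	return 0;
}

static int can_use_console(void)
{
	return this_cpu_online || have_callable_console();
}

/* Returns the number of records flushed; 0 if the console is unusable. */
static unsigned int unlock_and_flush(unsigned int pending)
{
	if (!can_use_console())
		return 0;	/* leave the records for a later, safe caller */
	return pending;
}
```

Because the check sits in the unlock path, it covers every way of taking the lock, not just the printk() one.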
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:21:20 +03:00
again:
	/*
	 * We released the console_sem lock, so we need to recheck if
	 * cpu is online and (if not) is there at least one CON_ANYTIME
	 * console.
	 */
	if (!can_use_console()) {
		console_locked = 0;
		up_console_sem();
		return;
	}
2012-06-28 11:38:53 +04:00
	/* flush buffered message fragment immediately to console */
2012-07-17 05:35:30 +04:00
	console_cont_flush(text, sizeof(text));
2016-03-18 00:21:20 +03:00
2012-05-03 04:29:13 +04:00
	for (;;) {
2013-08-01 00:53:47 +04:00
		struct printk_log *msg;
2015-06-26 01:01:30 +03:00
		size_t ext_len = 0;
2012-05-14 01:30:46 +04:00
		size_t len;
2012-05-03 04:29:13 +04:00
		int level;
2009-07-25 19:50:36 +04:00
		raw_spin_lock_irqsave(&logbuf_lock, flags);
2012-05-03 04:29:13 +04:00
		if (seen_seq != log_next_seq) {
			wake_klogd = true;
			seen_seq = log_next_seq;
		}
		if (console_seq < log_first_seq) {
2014-06-05 03:11:45 +04:00
			len = sprintf(text, "** %u printk messages dropped ** ",
				      (unsigned)(log_first_seq - console_seq));
2012-05-03 04:29:13 +04:00
			/* messages are gone, move to first one */
			console_seq = log_first_seq;
			console_idx = log_first_idx;
2012-07-09 23:15:42 +04:00
			console_prev = 0;
2014-06-05 03:11:45 +04:00
		} else {
			len = 0;
2012-05-03 04:29:13 +04:00
		}
2012-06-28 11:38:53 +04:00
skip:
2012-05-03 04:29:13 +04:00
		if (console_seq == log_next_seq)
			break;
		msg = log_from_idx(console_idx);
2016-08-03 00:03:56 +03:00
		level = msg->level;
		if ((msg->flags & LOG_NOCONS) ||
				suppress_message_printing(level)) {
2012-06-28 11:38:53 +04:00
			/*
			 * Skip record we have buffered and already printed
2016-08-03 00:03:56 +03:00
			 * directly to the console when we received it, and
			 * record that has level above the console loglevel.
2012-06-28 11:38:53 +04:00
			 */
			console_idx = log_next(console_idx);
			console_seq++;
2012-07-06 20:50:09 +04:00
			/*
			 * We will get here again when we register a new
			 * CON_PRINTBUFFER console. Clear the flag so we
			 * will properly dump everything later.
			 */
			msg->flags &= ~LOG_NOCONS;
2012-07-17 05:35:30 +04:00
			console_prev = msg->flags;
2012-06-28 11:38:53 +04:00
			goto skip;
		}
2012-05-10 06:30:45 +04:00
2014-06-05 03:11:45 +04:00
		len += msg_print_text(msg, console_prev, false,
				      text + len, sizeof(text) - len);
2015-06-26 01:01:30 +03:00
		if (nr_ext_console_drivers) {
			ext_len = msg_print_ext_header(ext_text,
						sizeof(ext_text),
						msg, console_seq, console_prev);
			ext_len += msg_print_ext_body(ext_text + ext_len,
						sizeof(ext_text) - ext_len,
						log_dict(msg), msg->dict_len,
						log_text(msg), msg->text_len);
		}
2012-05-03 04:29:13 +04:00
		console_idx = log_next(console_idx);
		console_seq++;
2012-07-09 23:15:42 +04:00
		console_prev = msg->flags;
2009-07-25 19:50:36 +04:00
		raw_spin_unlock(&logbuf_lock);
2012-05-03 04:29:13 +04:00
2008-05-12 23:20:42 +04:00
		stop_critical_timings();	/* don't trace print latency */
2015-06-26 01:01:30 +03:00
		call_console_drivers(level, ext_text, ext_len, text, len);
2008-05-12 23:20:42 +04:00
		start_critical_timings();
2005-04-17 02:20:36 +04:00
		local_irq_restore(flags);
2016-01-16 03:58:24 +03:00
		if (do_cond_resched)
			cond_resched();
2005-04-17 02:20:36 +04:00
	}
	console_locked = 0;
2011-03-23 02:34:21 +03:00
	/* Release the exclusive_console once it is used */
	if (unlikely(exclusive_console))
		exclusive_console = NULL;
2009-07-25 19:50:36 +04:00
	raw_spin_unlock(&logbuf_lock);
2011-06-22 13:20:09 +04:00
2014-06-05 03:11:36 +04:00
	up_console_sem();
2011-06-22 13:20:09 +04:00
	/*
	 * Someone could have filled up the buffer again, so re-check if there's
	 * something to flush. In case we cannot trylock the console_sem again,
	 * there's a new owner and the console_unlock() from them will do the
	 * flush, no worries.
	 */
2009-07-25 19:50:36 +04:00
	raw_spin_lock(&logbuf_lock);
2012-05-03 04:29:13 +04:00
	retry = console_seq != log_next_seq;
2011-12-09 02:34:13 +04:00
	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
2011-06-22 13:20:09 +04:00
	if (retry && console_trylock())
		goto again;
2007-02-10 12:46:19 +03:00
	if (wake_klogd)
		wake_up_klogd();
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
EXPORT_SYMBOL(console_unlock);
2005-04-17 02:20:36 +04:00
2005-11-14 03:08:14 +03:00
/**
 * console_conditional_schedule - yield the CPU if required
2005-04-17 02:20:36 +04:00
 *
 * If the console code is currently allowed to sleep, and
 * if this CPU should yield the CPU to another task, do
 * so here.
 *
2011-01-26 02:07:35 +03:00
 * Must be called within console_lock().
2005-04-17 02:20:36 +04:00
 */
void __sched console_conditional_schedule(void)
{
	if (console_may_schedule)
		cond_resched();
}
EXPORT_SYMBOL(console_conditional_schedule);
void console_unblank(void)
{
	struct console *c;
	/*
	 * console_unblank can no longer be called in interrupt context unless
	 * oops_in_progress is set to 1.
	 */
	if (oops_in_progress) {
2014-06-05 03:11:36 +04:00
		if (down_trylock_console_sem() != 0)
2005-04-17 02:20:36 +04:00
			return;
	} else
2011-01-26 02:07:35 +03:00
		console_lock();
2005-04-17 02:20:36 +04:00
	console_locked = 1;
	console_may_schedule = 0;
printk: Enable the use of more than one CON_BOOT (early console)
Today, register_console() assumes the following usage:
- The first console to register with a flag set to CON_BOOT
is the one and only bootconsole.
- If another register_console() is called with an additional
CON_BOOT, it is silently rejected.
- As soon as a console without CON_BOOT set
registers, the bootconsole is automatically unregistered.
- Once there is a "real" console, register_console() will
silently reject any console with its CON_BOOT flag set.
In many systems (alpha, blackfin, microblaze, mips, powerpc,
sh, & x86), there are early_printk implementations, which use
the CON_BOOT which come out serial ports, vga, usb, & memory
buffers.
In many embedded systems, it would be nice to have two
bootconsoles - in case the primary fails, you always have
access to a backup memory buffer - but this requires at least
two CON_BOOT consoles...
This patch enables that functionality.
With the change applied, on boot you get (if you try to
re-enable a boot console after the "real" console has been
registered):
root:/> dmesg | grep console
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
Kernel command line: root=/dev/mtdblock0 rw earlyprintk=serial,uart0,57600 console=ttyBF0,57600 nmi_debug=regs
console handover:boot [early_BFuart0] boot [early_shadow0] -> real [ttyBF0]
Too late to register bootconsole early_shadow0
or:
root:/> dmesg | grep console
Kernel command line: root=/dev/mtdblock0 rw console=ttyBF0,57600
console [ttyBF0] enabled
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Cc: "Paul Mundt" <lethal@linux-sh.org>
LKML-Reference: <200907012108.38030.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 05:08:37 +04:00
for_each_console ( c )
2005-04-17 02:20:36 +04:00
if ( ( c - > flags & CON_ENABLED ) & & c - > unblank )
c - > unblank ( ) ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
printk: do cond_resched() between lines while outputting to consoles
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console device, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 03:58:24 +03:00
/**
* console_flush_on_panic - flush console content on panic
*
* Immediately output all pending messages no matter what .
*/
void console_flush_on_panic ( void )
{
/*
* If someone else is holding the console lock , trylock will fail
* and may_schedule may be set . Ignore and proceed to unlock so
* that messages are flushed out . As this can be called from any
* context and we don ' t want to get preempted while flushing ,
* ensure may_schedule is cleared .
*/
console_trylock ( ) ;
console_may_schedule = 0 ;
console_unlock ( ) ;
}
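The console_may_schedule bookkeeping described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct console_sem_model and the model_* functions are hypothetical names invented for this sketch, a counter stands in for cond_resched(), and no real locking is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: names mirror printk.c but are not the kernel API. */
struct console_sem_model {
	bool locked;        /* stands in for console_sem being held */
	bool may_schedule;  /* stands in for console_may_schedule */
	int  resched_count; /* how many times the model "yielded" */
};

/* Blocking acquire: the caller may sleep, so yielding is allowed. */
static void model_console_lock(struct console_sem_model *m)
{
	assert(!m->locked); /* a real lock would sleep here instead */
	m->locked = true;
	m->may_schedule = true;
}

/* Non-blocking acquire: may run in atomic context, so never yield. */
static bool model_console_trylock(struct console_sem_model *m)
{
	if (m->locked)
		return false;
	m->locked = true;
	m->may_schedule = false;
	return true;
}

/* Emit nlines pending lines, yielding between lines only when permitted. */
static void model_console_unlock(struct console_sem_model *m, int nlines)
{
	for (int i = 0; i < nlines; i++) {
		/* ... one line would be written to the consoles here ... */
		if (m->may_schedule)
			m->resched_count++; /* stands in for cond_resched() */
	}
	m->locked = false;
	m->may_schedule = false;
}

/* Panic path: try the lock, forbid yielding, then flush regardless. */
static void model_console_flush_on_panic(struct console_sem_model *m,
					 int nlines)
{
	(void)model_console_trylock(m); /* failure is deliberately ignored */
	m->may_schedule = false;
	model_console_unlock(m, nlines);
}
```

In this model a flush that follows model_console_lock() yields once per line, while a flush that follows model_console_trylock() or runs through model_console_flush_on_panic() never yields, which is the distinction the comment in console_flush_on_panic() relies on.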
2005-04-17 02:20:36 +04:00
/*
* Return the console tty driver structure and its associated index
*/
struct tty_driver * console_device ( int * index )
{
struct console * c ;
struct tty_driver * driver = NULL ;
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2009-07-02 05:08:37 +04:00
for_each_console ( c ) {
2005-04-17 02:20:36 +04:00
if ( ! c - > device )
continue ;
driver = c - > device ( c , index ) ;
if ( driver )
break ;
}
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
return driver ;
}
/*
* Prevent further output on the passed console device so that ( for example )
* serial drivers can disable console output before suspending a port , and can
* re - enable output afterwards .
*/
void console_stop ( struct console * console )
{
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
console - > flags & = ~ CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_stop ) ;
void console_start ( struct console * console )
{
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
console - > flags | = CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( console_start ) ;
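The effect of console_stop() and console_start() on the flags word can be shown in isolation. The sketch below is a userspace model, not kernel code: struct console_model and the model_* helpers are hypothetical, locking is omitted, and the flag values are the ones quoted from include/linux/console.h in the CON_CONSDEV commit message in this file's history.

```c
#include <assert.h>

/* Flag values as quoted from include/linux/console.h in this file's history. */
#define CON_PRINTBUFFER	(1)
#define CON_CONSDEV	(2) /* Last on the command line */
#define CON_ENABLED	(4)
#define CON_BOOT	(8)

/* Hypothetical stand-in for struct console; only the flags field is modeled. */
struct console_model {
	short flags;
};

/* Mirrors console_stop(): clear CON_ENABLED, leave every other bit alone. */
static void model_console_stop(struct console_model *con)
{
	con->flags &= ~CON_ENABLED;
}

/* Mirrors console_start(): set CON_ENABLED, leave every other bit alone. */
static void model_console_start(struct console_model *con)
{
	con->flags |= CON_ENABLED;
}

/* Helper for the sketch: report whether output is currently allowed. */
static int model_console_is_enabled(const struct console_model *con)
{
	return (con->flags & CON_ENABLED) != 0;
}
```

A console that starts with flags 0x6 (CON_ENABLED | CON_CONSDEV) drops to 0x2 after the stop helper and returns to 0x6 after the start helper, so CON_CONSDEV survives a suspend and resume cycle in this model.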
2011-03-23 02:34:20 +03:00
static int __read_mostly keep_bootcon ;
static int __init keep_bootcon_setup ( char * str )
{
keep_bootcon = 1 ;
2013-11-13 03:08:50 +04:00
pr_info ( " debug: skip boot console de-registration. \n " ) ;
2011-03-23 02:34:20 +03:00
return 0 ;
}
early_param ( " keep_bootcon " , keep_bootcon_setup ) ;
2005-04-17 02:20:36 +04:00
/*
* The console driver calls this routine during kernel initialization
* to register the console printing procedure with printk ( ) and to
* print any messages that were printed by the kernel before the
* console driver was initialized .
2009-07-02 05:08:37 +04:00
*
* This can happen pretty early during the boot process ( because of
* early_printk ) - sometimes before setup_arch ( ) completes - be careful
* of what kernel features are used - they may not be initialised yet .
*
* There are two types of consoles - bootconsoles ( early_printk ) and
* " real " consoles ( everything which is not a bootconsole ) which are
* handled differently .
* - Any number of bootconsoles can be registered at any time .
* - As soon as a " real " console is registered , all bootconsoles
* will be unregistered automatically .
* - Once a " real " console is registered , any attempt to register a
* bootconsole will be rejected .
2005-04-17 02:20:36 +04:00
*/
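The three registration rules listed in the comment above can be modeled in a few lines of userspace C. This is a rough sketch under stated assumptions: the model_* names, the fixed-size array and MODEL_MAX_CONSOLES are all hypothetical, and the model deliberately ignores keep_bootcon, command-line matching and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CON_BOOT     8 /* same value as CON_BOOT in console.h */
#define MODEL_MAX_CONSOLES 8 /* arbitrary limit chosen for the sketch */

/* Hypothetical stand-in for struct console. */
struct model_console {
	const char *name;
	int flags;
};

/* Hypothetical stand-in for the console_drivers list. */
struct model_console_list {
	const struct model_console *cons[MODEL_MAX_CONSOLES];
	size_t count;
};

static bool model_has_real_console(const struct model_console_list *l)
{
	for (size_t i = 0; i < l->count; i++)
		if (!(l->cons[i]->flags & MODEL_CON_BOOT))
			return true;
	return false;
}

/* Returns true if newcon ends up registered. */
static bool model_register_console(struct model_console_list *l,
				   const struct model_console *newcon)
{
	size_t kept = 0;

	/* A console that is already on the list is not registered twice. */
	for (size_t i = 0; i < l->count; i++)
		if (l->cons[i] == newcon)
			return false;

	/* Too late for a bootconsole once a "real" console exists. */
	if ((newcon->flags & MODEL_CON_BOOT) && model_has_real_console(l))
		return false;

	/* A "real" console displaces every registered bootconsole. */
	if (!(newcon->flags & MODEL_CON_BOOT)) {
		for (size_t i = 0; i < l->count; i++)
			if (!(l->cons[i]->flags & MODEL_CON_BOOT))
				l->cons[kept++] = l->cons[i];
		l->count = kept;
	}

	if (l->count == MODEL_MAX_CONSOLES)
		return false;

	l->cons[l->count++] = newcon;
	return true;
}
```

Registering two bootconsoles and then one real console leaves only the real console on the list, and a later attempt to register a bootconsole is refused, which corresponds to the "Too late to register bootconsole" message printed by register_console().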
2009-07-02 05:08:37 +04:00
void register_console ( struct console * newcon )
2005-04-17 02:20:36 +04:00
{
2005-10-31 02:02:46 +03:00
int i ;
2005-04-17 02:20:36 +04:00
unsigned long flags ;
2009-07-02 05:08:37 +04:00
struct console * bcon = NULL ;
2013-08-01 00:53:46 +04:00
struct console_cmdline * c ;
2005-04-17 02:20:36 +04:00
2013-08-02 14:23:34 +04:00
if ( console_drivers )
for_each_console ( bcon )
if ( WARN ( bcon = = newcon ,
" console '%s%d' already registered \n " ,
bcon - > name , bcon - > index ) )
return ;
2009-07-02 05:08:37 +04:00
/*
* before we register a new CON_BOOT console , make sure we don ' t
* already have a valid console
*/
if ( console_drivers & & newcon - > flags & CON_BOOT ) {
/* find the last or real console */
for_each_console ( bcon ) {
if ( ! ( bcon - > flags & CON_BOOT ) ) {
2013-11-13 03:08:50 +04:00
pr_info ( " Too late to register bootconsole %s%d \n " ,
2009-07-02 05:08:37 +04:00
newcon - > name , newcon - > index ) ;
return ;
}
}
2007-05-08 11:26:49 +04:00
}
2009-07-02 05:08:37 +04:00
if ( console_drivers & & console_drivers - > flags & CON_BOOT )
bcon = console_drivers ;
if ( preferred_console < 0 | | bcon | | ! console_drivers )
2005-04-17 02:20:36 +04:00
preferred_console = selected_console ;
/*
* See if we want to use this console driver . If we
* didn ' t select a console we take the first one
* that registers here .
*/
if ( preferred_console < 0 ) {
2009-07-02 05:08:37 +04:00
if ( newcon - > index < 0 )
newcon - > index = 0 ;
if ( newcon - > setup = = NULL | |
newcon - > setup ( newcon , NULL ) = = 0 ) {
newcon - > flags | = CON_ENABLED ;
if ( newcon - > device ) {
newcon - > flags | = CON_CONSDEV ;
2008-05-12 23:21:04 +04:00
preferred_console = 0 ;
}
2005-04-17 02:20:36 +04:00
}
}
/*
* See if this console matches one we selected on
* the command line .
*/
2013-08-01 00:53:46 +04:00
for ( i = 0 , c = console_cmdline ;
i < MAX_CMDLINECONSOLES & & c - > name [ 0 ] ;
i + + , c + + ) {
2015-03-09 23:27:12 +03:00
if ( ! newcon - > match | |
newcon - > match ( newcon , c - > name , c - > index , c - > options ) ! = 0 ) {
/* default matching */
BUILD_BUG_ON ( sizeof ( c - > name ) ! = sizeof ( newcon - > name ) ) ;
if ( strcmp ( c - > name , newcon - > name ) ! = 0 )
continue ;
if ( newcon - > index > = 0 & &
newcon - > index ! = c - > index )
continue ;
if ( newcon - > index < 0 )
newcon - > index = c - > index ;
2013-08-01 00:53:45 +04:00
2015-03-09 23:27:12 +03:00
if ( _braille_register_console ( newcon , c ) )
return ;
if ( newcon - > setup & &
newcon - > setup ( newcon , c - > options ) ! = 0 )
break ;
}
2013-08-01 00:53:45 +04:00
2009-07-02 05:08:37 +04:00
newcon - > flags | = CON_ENABLED ;
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:
86 #define CON_PRINTBUFFER (1)
87 #define CON_CONSDEV (2) /* Last on the command line */
88 #define CON_ENABLED (4)
89 #define CON_BOOT (8)
This does not currently happen if there is more than one console specified
on the boot commandline. Instead, it gets set on the first console on the
command line. This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.
Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.
Example (from sn2 ia64):
elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).
Attached patch against bk fixes both issues for the cases I looked at. It
uses selected_console (which gets incremented for each console specified on
the command line) as the indicator of which console to set CON_CONSDEV on.
When adding the console to the list, if the previous one had CON_CONSDEV
set, it masks it out. Tested on ia64 and x86.
The problem with the current behavior is it breaks overriding the default from
the boot line. In the ia64 case, there may be a global append line defining
console=a in elilo.conf. Then you want to boot your kernel, and want to
override the default by passing console=b on the boot line. elilo constructs
the kernel cmdline by starting with the value of the global append line, then
tacks on whatever else you specify, which puts console=b last.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 11:09:05 +04:00
if ( i = = selected_console ) {
2009-07-02 05:08:37 +04:00
newcon - > flags | = CON_CONSDEV ;
2005-06-23 11:09:05 +04:00
preferred_console = selected_console ;
}
2005-04-17 02:20:36 +04:00
break ;
}
2009-07-02 05:08:37 +04:00
if ( ! ( newcon - > flags & CON_ENABLED ) )
2005-04-17 02:20:36 +04:00
return ;
printk: Ensure that "console enabled" messages are printed on the console
Today, when a console is registered without CON_PRINTBUFFER,
end users never see the announcement of it being added, so they
cannot tell whether they missed earlier messages or whether the
console output really starts at the beginning. This leads to
general confusion.
This re-orders existing code to make sure the console is
added before the "console [%s%d] enabled" message is printed,
ensuring that this message is _always_ seen.
This has the desired/intended side effect of making sure that
"console enabled:" messages are printed on both the bootconsole
and the real console. The same line is printed twice if the
bootconsole and real console are the same device; if they are
on different devices, the message is printed once on each.
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
LKML-Reference: <200907091308.37370.rgetz@blackfin.uclinux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-09 21:08:37 +04:00
/*
* If we have a bootconsole , and are switching to a real console ,
* don ' t print everything out again , since when the boot console , and
* the real console are the same physical device , it ' s annoying to
* see the beginning boot messages twice
*/
if ( bcon & & ( ( newcon - > flags & ( CON_CONSDEV | CON_BOOT ) ) = = CON_CONSDEV ) )
2009-07-02 05:08:37 +04:00
newcon - > flags & = ~ CON_PRINTBUFFER ;
2005-04-17 02:20:36 +04:00
/*
* Put this console in the list - keep the
* preferred driver at the head of the list .
*/
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2009-07-02 05:08:37 +04:00
if ( ( newcon - > flags & CON_CONSDEV ) | | console_drivers = = NULL ) {
newcon - > next = console_drivers ;
console_drivers = newcon ;
if ( newcon - > next )
newcon - > next - > flags & = ~ CON_CONSDEV ;
2005-04-17 02:20:36 +04:00
} else {
2009-07-02 05:08:37 +04:00
newcon - > next = console_drivers - > next ;
console_drivers - > next = newcon ;
2005-04-17 02:20:36 +04:00
}
2015-06-26 01:01:30 +03:00
if ( newcon - > flags & CON_EXTENDED )
if ( ! nr_ext_console_drivers + + )
pr_info ( " printk: continuation disabled due to ext consoles, expect more fragments in /dev/kmsg \n " ) ;
2009-07-02 05:08:37 +04:00
if ( newcon - > flags & CON_PRINTBUFFER ) {
2005-04-17 02:20:36 +04:00
/*
2011-01-26 02:07:35 +03:00
* console_unlock ( ) ; will print out the buffered messages
2005-04-17 02:20:36 +04:00
* for us .
*/
2009-07-25 19:50:36 +04:00
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
2012-05-03 04:29:13 +04:00
console_seq = syslog_seq ;
console_idx = syslog_idx ;
2012-07-09 23:15:42 +04:00
console_prev = syslog_prev ;
2009-07-25 19:50:36 +04:00
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
2011-03-23 02:34:21 +03:00
/*
* We ' re about to replay the log buffer . Only do this to the
* just - registered console to avoid excessive message spam to
* the already - registered consoles .
*/
exclusive_console = newcon ;
2005-04-17 02:20:36 +04:00
}
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2010-12-01 20:51:05 +03:00
console_sysfs_notify ( ) ;
2009-07-09 21:08:37 +04:00
/*
* By unregistering the bootconsoles after we enable the real console
* we get the " console xxx enabled " message on all the consoles -
* boot consoles , real consoles , etc - this is to ensure that end
* users know there might be something in the kernel ' s log buffer that
* went to the bootconsole ( that they do not see on the real console )
*/
2013-11-13 03:08:50 +04:00
pr_info ( " %sconsole [%s%d] enabled \n " ,
2013-11-13 03:08:49 +04:00
( newcon - > flags & CON_BOOT ) ? " boot " : " " ,
newcon - > name , newcon - > index ) ;
2011-03-23 02:34:20 +03:00
if ( bcon & &
( ( newcon - > flags & ( CON_CONSDEV | CON_BOOT ) ) = = CON_CONSDEV ) & &
! keep_bootcon ) {
2013-11-13 03:08:49 +04:00
/* We need to iterate through all boot consoles, to make
* sure we print everything out , before we unregister them .
2009-07-09 21:08:37 +04:00
*/
for_each_console ( bcon )
if ( bcon - > flags & CON_BOOT )
unregister_console ( bcon ) ;
}
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( register_console ) ;
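The list handling in register_console() can be illustrated with a small user-space model. This is only a sketch: struct model_console, model_drivers and model_register() are hypothetical stand-ins invented for illustration, locking is omitted, and the flag values are the ones quoted from include/linux/console.h in the CON_CONSDEV commit message.

```c
#include <stddef.h>

/* Flag values as quoted from include/linux/console.h in the commit message. */
#define CON_PRINTBUFFER 1
#define CON_CONSDEV     2
#define CON_ENABLED     4
#define CON_BOOT        8

/* Hypothetical, reduced stand-in for struct console. */
struct model_console {
	const char *name;
	unsigned int flags;
	struct model_console *next;
};

/* Hypothetical stand-in for the console_drivers list head. */
static struct model_console *model_drivers;

/*
 * Mirror of the insertion step: a console carrying CON_CONSDEV (or the
 * very first console) goes to the head of the list and strips
 * CON_CONSDEV from the previous head; any other console is linked in
 * directly behind the head.
 */
static void model_register(struct model_console *newcon)
{
	if ((newcon->flags & CON_CONSDEV) || model_drivers == NULL) {
		newcon->next = model_drivers;
		model_drivers = newcon;
		if (newcon->next)
			newcon->next->flags &= ~CON_CONSDEV;
	} else {
		newcon->next = model_drivers->next;
		model_drivers->next = newcon;
	}
}
```

In this model, registering a preferred console after another preferred console moves the new one to the head and leaves exactly one list entry with CON_CONSDEV set.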
2005-10-31 02:02:46 +03:00
int unregister_console ( struct console * console )
2005-04-17 02:20:36 +04:00
{
2005-10-31 02:02:46 +03:00
struct console * a , * b ;
2013-08-01 00:53:45 +04:00
int res ;
2005-04-17 02:20:36 +04:00
2013-11-13 03:08:50 +04:00
pr_info ( " %sconsole [%s%d] disabled \n " ,
2013-11-13 03:08:49 +04:00
( console - > flags & CON_BOOT ) ? " boot " : " " ,
console - > name , console - > index ) ;
2013-08-01 00:53:45 +04:00
res = _braille_unregister_console ( console ) ;
if ( res )
return res ;
2008-04-30 11:54:51 +04:00
2013-08-01 00:53:45 +04:00
res = 1 ;
2011-01-26 02:07:35 +03:00
console_lock ( ) ;
2005-04-17 02:20:36 +04:00
if ( console_drivers = = console ) {
console_drivers = console - > next ;
res = 0 ;
2005-11-24 00:37:44 +03:00
} else if ( console_drivers ) {
2005-04-17 02:20:36 +04:00
for ( a = console_drivers - > next , b = console_drivers ;
a ; b = a , a = b - > next ) {
if ( a = = console ) {
b - > next = a - > next ;
res = 0 ;
break ;
2005-10-31 02:02:46 +03:00
}
2005-04-17 02:20:36 +04:00
}
}
2005-10-31 02:02:46 +03:00
2015-06-26 01:01:30 +03:00
if ( ! res & & ( console - > flags & CON_EXTENDED ) )
nr_ext_console_drivers - - ;
2007-05-08 11:26:49 +04:00
/*
[PATCH] CON_CONSDEV bit not set correctly on last console
According to include/linux/console.h, CON_CONSDEV flag should be set on
the last console specified on the boot command line:
86 #define CON_PRINTBUFFER (1)
87 #define CON_CONSDEV (2) /* Last on the command line */
88 #define CON_ENABLED (4)
89 #define CON_BOOT (8)
This does not currently happen if there is more than one console specified
on the boot commandline. Instead, it gets set on the first console on the
command line. This can cause problems for things like kdb that look for
the CON_CONSDEV flag to see if the console is valid.
Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
preferred console at unregister time if the console being unregistered
currently has that bit set.
Example (from sn2 ia64):
elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
in this case, the flags on ttySG console struct will be 0x4 (should be
0x6).
Attached patch against bk fixes both issues for the cases I looked at. It
uses selected_console (which gets incremented for each console specified on
the command line) as the indicator of which console to set CON_CONSDEV on.
When adding the console to the list, if the previous one had CON_CONSDEV
set, it masks it out. Tested on ia64 and x86.
The problem with the current behavior is it breaks overriding the default from
the boot line. In the ia64 case, there may be a global append line defining
console=a in elilo.conf. Then you want to boot your kernel, and want to
override the default by passing console=b on the boot line. elilo constructs
the kernel cmdline by starting with the value of the global append line, then
tacks on whatever else you specify, which puts console=b last.
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 11:09:05 +04:00
* If this isn ' t the last console and it has CON_CONSDEV set , we
* need to set it on the next preferred console .
2005-04-17 02:20:36 +04:00
*/
2007-05-08 11:26:49 +04:00
if ( console_drivers ! = NULL & & console - > flags & CON_CONSDEV )
2005-06-23 11:09:05 +04:00
console_drivers - > flags | = CON_CONSDEV ;
2005-04-17 02:20:36 +04:00
2014-05-14 02:04:39 +04:00
console - > flags & = ~ CON_ENABLED ;
2011-01-26 02:07:35 +03:00
console_unlock ( ) ;
2010-12-01 20:51:05 +03:00
console_sysfs_notify ( ) ;
2005-04-17 02:20:36 +04:00
return res ;
}
EXPORT_SYMBOL ( unregister_console ) ;
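The unlink step and the CON_CONSDEV hand-off in unregister_console() can be tried out in user space. The following is a minimal sketch under stated assumptions: struct model_console, model_drivers and model_unregister() are hypothetical names, and the braille hook, console_lock() and the sysfs notification are left out.

```c
#include <stddef.h>

/* Flag values as quoted from include/linux/console.h in the commit message. */
#define CON_CONSDEV 2
#define CON_ENABLED 4

/* Hypothetical, reduced stand-in for struct console. */
struct model_console {
	const char *name;
	unsigned int flags;
	struct model_console *next;
};

/* Hypothetical stand-in for the console_drivers list head. */
static struct model_console *model_drivers;

/* Returns 0 if the console was found and unlinked, 1 otherwise. */
static int model_unregister(struct model_console *console)
{
	struct model_console *a, *b;
	int res = 1;

	if (model_drivers == console) {
		model_drivers = console->next;
		res = 0;
	} else if (model_drivers) {
		/* b trails one node behind a so that a can be unlinked. */
		for (a = model_drivers->next, b = model_drivers;
		     a; b = a, a = b->next) {
			if (a == console) {
				b->next = a->next;
				res = 0;
				break;
			}
		}
	}

	/* Hand CON_CONSDEV to the new head if the removed console held it. */
	if (model_drivers != NULL && (console->flags & CON_CONSDEV))
		model_drivers->flags |= CON_CONSDEV;

	console->flags &= ~CON_ENABLED;
	return res;
}
```

As in the function above, the model clears CON_ENABLED even when the console was not found on the list; only the return value tells the two cases apart.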
2005-05-01 19:59:02 +04:00
2016-01-16 03:58:21 +03:00
/*
* Some boot consoles access data that is in the init section and which will
* be discarded after the initcalls have been run . To make sure that no code
* will access this data , unregister the boot consoles in a late initcall .
*
* If for some reason , such as deferred probe or the driver being a loadable
* module , the real console hasn ' t registered yet at this point , there will
* be a brief interval in which no messages are logged to the console , which
* makes it difficult to diagnose problems that occur during this time .
*
* To mitigate this problem somewhat , only unregister consoles whose memory
* intersects with the init section . Note that code exists elsewhere to get
* rid of the boot console as soon as the proper console shows up , so there
* won ' t be side - effects from postponing the removal .
*/
2010-06-04 09:11:25 +04:00
static int __init printk_late_init ( void )
2007-08-20 23:22:47 +04:00
{
2009-07-02 05:08:37 +04:00
struct console * con ;
for_each_console ( con ) {
2011-08-26 02:59:11 +04:00
if ( ! keep_bootcon & & con - > flags & CON_BOOT ) {
2016-01-16 03:58:21 +03:00
/*
* Make sure to unregister boot consoles whose data
* resides in the init section before the init section
* is discarded . Boot consoles whose data will stick
* around will automatically be unregistered when the
* proper console replaces them .
*/
if ( init_section_intersects ( con , sizeof ( * con ) ) )
unregister_console ( con ) ;
2007-08-22 07:14:58 +04:00
}
2007-08-20 23:22:47 +04:00
}
2010-06-04 09:11:25 +04:00
hotcpu_notifier ( console_cpu_notify , 0 ) ;
2007-08-20 23:22:47 +04:00
return 0 ;
}
2010-06-04 09:11:25 +04:00
late_initcall ( printk_late_init ) ;
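printk_late_init() only unregisters boot consoles whose memory overlaps the init section. The overlap test itself is not shown in this file, so the sketch below is a generic half-open interval check written for illustration. The name model_range_intersects() and its parameters are assumptions, and the kernel's init_section_intersects() may be implemented differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the object [start, start + size) overlap
 * the section [sec_begin, sec_end)? Both ranges are half-open, so two
 * ranges that merely touch do not count as overlapping. Address
 * overflow of start + size is not handled in this sketch.
 */
static bool model_range_intersects(uintptr_t start, size_t size,
				   uintptr_t sec_begin, uintptr_t sec_end)
{
	uintptr_t end = start + size;

	return start < sec_end && end > sec_begin;
}
```

With such a check, a console structure placed in permanent memory is left registered until the proper console replaces it, while one living in init memory is dropped before that memory is freed.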
2007-08-20 23:22:47 +04:00
2008-02-08 15:21:25 +03:00
# if defined CONFIG_PRINTK
2013-03-23 02:04:39 +04:00
/*
* Delayed printk version , for scheduler - internal messages :
*/
# define PRINTK_PENDING_WAKEUP 0x01
printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.
What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
There's a couple of issues with this approach.
1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.
2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.
In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.
Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters. This must be avoided while
holding the logbuf_lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 03:11:38 +04:00
# define PRINTK_PENDING_OUTPUT 0x02
2013-03-23 02:04:39 +04:00
static DEFINE_PER_CPU ( int , printk_pending ) ;
static void wake_up_klogd_work_func ( struct irq_work * irq_work )
{
int pending = __this_cpu_xchg ( printk_pending , 0 ) ;
2014-06-05 03:11:38 +04:00
if ( pending & PRINTK_PENDING_OUTPUT ) {
/* If trylock fails, someone else is doing the printing */
if ( console_trylock ( ) )
console_unlock ( ) ;
2013-03-23 02:04:39 +04:00
}
if ( pending & PRINTK_PENDING_WAKEUP )
wake_up_interruptible ( & log_wait ) ;
}
static DEFINE_PER_CPU ( struct irq_work , wake_up_klogd_work ) = {
. func = wake_up_klogd_work_func ,
. flags = IRQ_WORK_LAZY ,
} ;
void wake_up_klogd ( void )
{
preempt_disable ( ) ;
if ( waitqueue_active ( & log_wait ) ) {
this_cpu_or ( printk_pending , PRINTK_PENDING_WAKEUP ) ;
2014-08-17 21:30:24 +04:00
irq_work_queue ( this_cpu_ptr ( & wake_up_klogd_work ) ) ;
2013-03-23 02:04:39 +04:00
}
preempt_enable ( ) ;
}
2008-07-25 12:45:58 +04:00
2014-06-05 03:11:40 +04:00
int printk_deferred ( const char * fmt , . . . )
2012-03-15 15:35:37 +04:00
{
va_list args ;
int r ;
2014-06-05 03:11:39 +04:00
preempt_disable ( ) ;
2012-03-15 15:35:37 +04:00
va_start ( args , fmt ) ;
2014-12-11 02:50:15 +03:00
r = vprintk_emit ( 0 , LOGLEVEL_SCHED , NULL , 0 , fmt , args ) ;
2012-03-15 15:35:37 +04:00
va_end ( args ) ;
2014-06-05 03:11:38 +04:00
__this_cpu_or ( printk_pending , PRINTK_PENDING_OUTPUT ) ;
2014-08-17 21:30:24 +04:00
irq_work_queue ( this_cpu_ptr ( & wake_up_klogd_work ) ) ;
2014-06-05 03:11:39 +04:00
preempt_enable ( ) ;
2012-03-15 15:35:37 +04:00
return r ;
}
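The deferral pattern shared by wake_up_klogd() and printk_deferred() is: set a bit in a pending word, queue the work, and let the work function read and clear the whole word before acting on each bit. A single-threaded user-space sketch follows. model_pending, model_request() and model_work_func() are hypothetical names, a plain int replaces the per-CPU variable, and counters stand in for the console flush and the log_wait wakeup.

```c
#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02

static int model_pending;  /* stands in for the per-CPU printk_pending */
static int model_flushes;  /* counts simulated console flushes */
static int model_wakeups;  /* counts simulated log_wait wakeups */

/* Callers only record what needs doing; no output happens here. */
static void model_request(int bits)
{
	model_pending |= bits;
}

/* Deferred work: fetch and clear all pending bits, then act on each. */
static void model_work_func(void)
{
	int pending = model_pending;

	model_pending = 0;

	if (pending & PRINTK_PENDING_OUTPUT)
		model_flushes++;
	if (pending & PRINTK_PENDING_WAKEUP)
		model_wakeups++;
}
```

Because the requests are bits rather than a queue, several requests made before the work runs collapse into one flush and one wakeup, which is acceptable here since the messages themselves already sit in the log buffer.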
2005-04-17 02:20:36 +04:00
/*
* printk rate limiting , lifted from the networking subsystem .
*
2008-07-30 09:33:38 +04:00
* This enforces a rate limit : not more than 10 kernel messages
* every 5 s to make a denial - of - service attack impossible .
2005-04-17 02:20:36 +04:00
*/
2008-07-30 09:33:38 +04:00
DEFINE_RATELIMIT_STATE ( printk_ratelimit_state , 5 * HZ , 10 ) ;
2009-10-23 16:58:11 +04:00
int __printk_ratelimit ( const char * func )
2005-04-17 02:20:36 +04:00
{
2009-10-23 16:58:11 +04:00
return ___ratelimit ( & printk_ratelimit_state , func ) ;
2005-04-17 02:20:36 +04:00
}
2009-10-23 16:58:11 +04:00
EXPORT_SYMBOL ( __printk_ratelimit ) ;
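The "10 messages every 5 s" policy can be pictured as a fixed window with a burst budget. The sketch below is a simplified model and not the kernel's ___ratelimit(): struct model_ratelimit and model_ratelimit_ok() are hypothetical, time is passed in explicitly as a tick count, and locking is omitted.

```c
#include <stdbool.h>

/* Hypothetical, simplified rate-limit state. */
struct model_ratelimit {
	unsigned long interval; /* window length in ticks */
	int burst;              /* messages allowed per window */
	unsigned long begin;    /* tick at which the current window opened */
	bool started;           /* false until the first call */
	int printed;            /* messages allowed in the current window */
	int missed;             /* messages suppressed so far */
};

/* Returns true if a message may be printed at tick 'now'. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

With interval set to 5000 ticks and burst set to 10, the eleventh call inside one window is refused and the budget is restored when the next window opens.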
2006-11-03 09:07:16 +03:00
/**
* printk_timed_ratelimit - caller - controlled printk ratelimiting
* @ caller_jiffies : pointer to caller ' s state
* @ interval_msecs : minimum interval between prints
*
* printk_timed_ratelimit ( ) returns true if more than @ interval_msecs
* milliseconds have elapsed since the last time printk_timed_ratelimit ( )
* returned true .
*/
bool printk_timed_ratelimit(unsigned long *caller_jiffies,
			    unsigned int interval_msecs)
{
2014-08-07 03:09:08 +04:00
	unsigned long elapsed = jiffies - *caller_jiffies;
	if (*caller_jiffies && elapsed <= msecs_to_jiffies(interval_msecs))
		return false;
	*caller_jiffies = jiffies;
	return true;
2006-11-03 09:07:16 +03:00
}
EXPORT_SYMBOL ( printk_timed_ratelimit ) ;
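The elapsed-time check in printk_timed_ratelimit() can be tried out in user space with a hand-driven tick counter. The sketch below assumes one tick per millisecond; fake_jiffies, fake_msecs_to_jiffies() and timed_ratelimit() are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's jiffies counter. */
static unsigned long fake_jiffies;

/* Assumed tick rate of 1 tick per millisecond, so this is the identity. */
static unsigned long fake_msecs_to_jiffies(unsigned int msecs)
{
	return msecs;
}

/* Same test as printk_timed_ratelimit(), run against the fake counter. */
static bool timed_ratelimit(unsigned long *caller_jiffies,
			    unsigned int interval_msecs)
{
	unsigned long elapsed;

	assert(caller_jiffies != NULL);

	/* Unsigned subtraction stays correct when the counter wraps. */
	elapsed = fake_jiffies - *caller_jiffies;

	/* A stored value of 0 means "never printed", so it always passes. */
	if (*caller_jiffies &&
	    elapsed <= fake_msecs_to_jiffies(interval_msecs))
		return false;

	*caller_jiffies = fake_jiffies;
	return true;
}
```

A caller keeps one unsigned long of state, initialised to 0, and prints only when the function returns true. Because the comparison is `<=`, a call made exactly interval_msecs after the last accepted one is still suppressed.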
2009-10-16 16:09:18 +04:00
static DEFINE_SPINLOCK ( dump_list_lock ) ;
static LIST_HEAD ( dump_list ) ;
/**
* kmsg_dump_register - register a kernel log dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Adds a kernel log dumper to the system . The dump callback in the
* structure will be called when the kernel oopses or panics and must be
* set . Returns zero on success and % - EINVAL or % - EBUSY otherwise .
*/
int kmsg_dump_register ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EBUSY ;
/* The dump callback needs to be set */
if ( ! dumper - > dump )
return - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
/* Don't allow registering multiple times */
if ( ! dumper - > registered ) {
dumper - > registered = 1 ;
2011-01-13 03:59:43 +03:00
list_add_tail_rcu ( & dumper - > list , & dump_list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_register ) ;
/**
* kmsg_dump_unregister - unregister a kmsg dumper .
2009-12-18 02:27:27 +03:00
* @ dumper : pointer to the kmsg_dumper structure
2009-10-16 16:09:18 +04:00
*
* Removes a dump device from the system . Returns zero on success and
* % - EINVAL otherwise .
*/
int kmsg_dump_unregister ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
int err = - EINVAL ;
spin_lock_irqsave ( & dump_list_lock , flags ) ;
if ( dumper - > registered ) {
dumper - > registered = 0 ;
2011-01-13 03:59:43 +03:00
list_del_rcu ( & dumper - > list ) ;
2009-10-16 16:09:18 +04:00
err = 0 ;
}
spin_unlock_irqrestore ( & dump_list_lock , flags ) ;
2011-01-13 03:59:43 +03:00
synchronize_rcu ( ) ;
2009-10-16 16:09:18 +04:00
return err ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_unregister ) ;
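The return-value contract of the register/unregister pair can be summarised with a small user-space model. This sketch leaves out the spinlock and the RCU list entirely; struct model_dumper, model_register(), model_unregister() and noop_dump() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct kmsg_dumper. */
struct model_dumper {
	void (*dump)(struct model_dumper *d);
	bool registered;
};

/* Mirrors kmsg_dump_register(): the callback must be set, no double add. */
static int model_register(struct model_dumper *d)
{
	assert(d != NULL);

	if (!d->dump)
		return -EINVAL;
	if (d->registered)
		return -EBUSY;

	d->registered = true;
	return 0;
}

/* Mirrors kmsg_dump_unregister(): only a registered dumper can be removed. */
static int model_unregister(struct model_dumper *d)
{
	assert(d != NULL);

	if (!d->registered)
		return -EINVAL;

	d->registered = false;
	return 0;
}

/* Placeholder callback so that a dumper can pass the NULL check. */
static void noop_dump(struct model_dumper *d)
{
	(void)d;
}
```

Registering the same dumper twice yields -EBUSY on the second attempt, and unregistering a dumper that is not registered yields -EINVAL.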
2012-05-03 04:29:13 +04:00
static bool always_kmsg_dump ;
module_param_named ( always_kmsg_dump , always_kmsg_dump , bool , S_IRUGO | S_IWUSR ) ;
2009-10-16 16:09:18 +04:00
/**
* kmsg_dump - dump kernel log to kernel message dumpers .
* @ reason : the reason ( oops , panic etc ) for dumping
*
2012-06-15 16:07:51 +04:00
* Call each of the registered dumper ' s dump ( ) callback , which can
* retrieve the kmsg records with kmsg_dump_get_line ( ) or
* kmsg_dump_get_buffer ( ) .
2009-10-16 16:09:18 +04:00
*/
void kmsg_dump ( enum kmsg_dump_reason reason )
{
struct kmsg_dumper * dumper ;
unsigned long flags ;
2012-03-06 02:59:10 +04:00
if ( ( reason > KMSG_DUMP_OOPS ) & & ! always_kmsg_dump )
return ;
2012-06-15 16:07:51 +04:00
rcu_read_lock ( ) ;
list_for_each_entry_rcu ( dumper , & dump_list , list ) {
if ( dumper - > max_reason & & reason > dumper - > max_reason )
continue ;
/* initialize iterator with data about the stored records */
dumper - > active = true ;
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
dumper - > cur_seq = clear_seq ;
dumper - > cur_idx = clear_idx ;
dumper - > next_seq = log_next_seq ;
dumper - > next_idx = log_next_idx ;
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
/* invoke dumper which will iterate over records */
dumper - > dump ( dumper , reason ) ;
/* reset iterator */
dumper - > active = false ;
}
rcu_read_unlock ( ) ;
}
/**
2012-07-21 04:28:07 +04:00
* kmsg_dump_get_line_nolock - retrieve one kmsg log line ( unlocked version )
2012-06-15 16:07:51 +04:00
* @ dumper : registered kmsg dumper
* @ syslog : include the " <4> " prefixes
* @ line : buffer to copy the line to
* @ size : maximum size of the buffer
* @ len : length of line placed into buffer
*
* Start at the beginning of the kmsg buffer , with the oldest kmsg
* record , and copy one record into the provided buffer .
*
* Consecutive calls will return the next available record moving
* towards the end of the buffer with the youngest messages .
*
* A return value of FALSE indicates that there are no more records to
* read .
2012-07-21 04:28:07 +04:00
*
* The function is similar to kmsg_dump_get_line ( ) , but grabs no locks .
2012-06-15 16:07:51 +04:00
*/
2012-07-21 04:28:07 +04:00
bool kmsg_dump_get_line_nolock ( struct kmsg_dumper * dumper , bool syslog ,
char * line , size_t size , size_t * len )
2012-06-15 16:07:51 +04:00
{
2013-08-01 00:53:47 +04:00
struct printk_log * msg ;
2012-06-15 16:07:51 +04:00
size_t l = 0 ;
bool ret = false ;
if ( ! dumper - > active )
goto out ;
2012-05-03 04:29:13 +04:00
2012-06-15 16:07:51 +04:00
if ( dumper - > cur_seq < log_first_seq ) {
/* messages are gone, move to first available one */
dumper - > cur_seq = log_first_seq ;
dumper - > cur_idx = log_first_idx ;
}
2009-10-16 16:09:18 +04:00
2012-06-15 16:07:51 +04:00
/* last entry */
2012-07-21 04:28:07 +04:00
if ( dumper - > cur_seq > = log_next_seq )
2012-06-15 16:07:51 +04:00
goto out ;
2009-10-16 16:09:18 +04:00
2012-06-15 16:07:51 +04:00
msg = log_from_idx ( dumper - > cur_idx ) ;
2012-07-09 23:15:42 +04:00
l = msg_print_text ( msg , 0 , syslog , line , size ) ;
2012-06-15 16:07:51 +04:00
dumper - > cur_idx = log_next ( dumper - > cur_idx ) ;
dumper - > cur_seq + + ;
ret = true ;
out :
if ( len )
* len = l ;
return ret ;
}
2012-07-21 04:28:07 +04:00
/**
* kmsg_dump_get_line - retrieve one kmsg log line
* @ dumper : registered kmsg dumper
* @ syslog : include the " <4> " prefixes
* @ line : buffer to copy the line to
* @ size : maximum size of the buffer
* @ len : length of line placed into buffer
*
* Start at the beginning of the kmsg buffer , with the oldest kmsg
* record , and copy one record into the provided buffer .
*
* Consecutive calls will return the next available record moving
* towards the end of the buffer with the youngest messages .
*
* A return value of FALSE indicates that there are no more records to
* read .
*/
bool kmsg_dump_get_line ( struct kmsg_dumper * dumper , bool syslog ,
char * line , size_t size , size_t * len )
{
unsigned long flags ;
bool ret ;
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
ret = kmsg_dump_get_line_nolock ( dumper , syslog , line , size , len ) ;
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
return ret ;
}
2012-06-15 16:07:51 +04:00
EXPORT_SYMBOL_GPL ( kmsg_dump_get_line ) ;
/**
* kmsg_dump_get_buffer - copy kmsg log lines
* @ dumper : registered kmsg dumper
* @ syslog : include the " <4> " prefixes
2012-07-01 02:37:24 +04:00
* @ buf : buffer to copy the lines to
2012-06-15 16:07:51 +04:00
* @ size : maximum size of the buffer
* @ len : total length of the lines placed into buffer
*
* Start at the end of the kmsg buffer and fill the provided buffer
* with as many of the * youngest * kmsg records that fit into it .
* If the buffer is large enough , all available kmsg records will be
* copied with a single call .
*
* Consecutive calls will fill the buffer with the next block of
* available older records , not including the earlier retrieved ones .
*
* A return value of FALSE indicates that there are no more records to
* read .
*/
bool kmsg_dump_get_buffer ( struct kmsg_dumper * dumper , bool syslog ,
char * buf , size_t size , size_t * len )
{
unsigned long flags ;
u64 seq ;
u32 idx ;
u64 next_seq ;
u32 next_idx ;
2012-07-09 23:15:42 +04:00
enum log_flags prev ;
2012-06-15 16:07:51 +04:00
size_t l = 0 ;
bool ret = false ;
if ( ! dumper - > active )
goto out ;
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
if ( dumper - > cur_seq < log_first_seq ) {
/* messages are gone, move to first available one */
dumper - > cur_seq = log_first_seq ;
dumper - > cur_idx = log_first_idx ;
}
/* last entry */
if ( dumper - > cur_seq > = dumper - > next_seq ) {
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
goto out ;
}
/* calculate length of entire buffer */
seq = dumper - > cur_seq ;
idx = dumper - > cur_idx ;
2012-07-09 23:15:42 +04:00
prev = 0 ;
2012-06-15 16:07:51 +04:00
while ( seq < dumper - > next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-06-15 16:07:51 +04:00
2012-07-09 23:15:42 +04:00
l + = msg_print_text ( msg , prev , true , NULL , 0 ) ;
2012-06-15 16:07:51 +04:00
idx = log_next ( idx ) ;
seq + + ;
2012-07-09 23:15:42 +04:00
prev = msg - > flags ;
2012-06-15 16:07:51 +04:00
}
/* move first record forward until length fits into the buffer */
seq = dumper - > cur_seq ;
idx = dumper - > cur_idx ;
2012-07-09 23:15:42 +04:00
prev = 0 ;
2012-06-15 16:07:51 +04:00
while ( l > size & & seq < dumper - > next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2009-10-16 16:09:18 +04:00
2012-07-09 23:15:42 +04:00
l - = msg_print_text ( msg , prev , true , NULL , 0 ) ;
2012-06-15 16:07:51 +04:00
idx = log_next ( idx ) ;
seq + + ;
2012-07-09 23:15:42 +04:00
prev = msg - > flags ;
2009-10-16 16:09:18 +04:00
}
2012-06-15 16:07:51 +04:00
/* last message in next iteration */
next_seq = seq ;
next_idx = idx ;
l = 0 ;
while ( seq < dumper - > next_seq ) {
2013-08-01 00:53:47 +04:00
struct printk_log * msg = log_from_idx ( idx ) ;
2012-06-15 16:07:51 +04:00
2012-07-09 23:15:42 +04:00
l + = msg_print_text ( msg , prev , syslog , buf + l , size - l ) ;
2012-06-15 16:07:51 +04:00
idx = log_next ( idx ) ;
seq + + ;
2012-07-09 23:15:42 +04:00
prev = msg - > flags ;
2012-06-15 16:07:51 +04:00
}
dumper - > next_seq = next_seq ;
dumper - > next_idx = next_idx ;
ret = true ;
2012-05-03 04:29:13 +04:00
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
2012-06-15 16:07:51 +04:00
out :
if ( len )
* len = l ;
return ret ;
}
EXPORT_SYMBOL_GPL ( kmsg_dump_get_buffer ) ;
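The three passes in kmsg_dump_get_buffer() (measure everything, drop the oldest records until the rest fits, then copy) can be modelled over a plain array of strings. The sketch below is a user-space approximation: struct dump_iter and get_youngest() are hypothetical, record lengths come from strlen() instead of msg_print_text(), and, unlike the kernel code, the sketch returns false when not even the youngest record fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: records[cur..next) are still to be returned. */
struct dump_iter {
	size_t cur;	/* oldest record not yet returned */
	size_t next;	/* one past the youngest record not yet returned */
};

/* Fill buf with as many of the youngest remaining records as fit. */
static bool get_youngest(struct dump_iter *it, const char *const *records,
			 char *buf, size_t size, size_t *len)
{
	size_t total = 0, out = 0, first, i;
	bool ret = false;

	assert(it != NULL && records != NULL && buf != NULL);

	if (it->cur >= it->next)
		goto out;

	/* Pass 1: length of everything that is still pending. */
	for (i = it->cur; i < it->next; i++)
		total += strlen(records[i]);

	/* Pass 2: skip the oldest records until the remainder fits. */
	first = it->cur;
	while (total > size && first < it->next) {
		total -= strlen(records[first]);
		first++;
	}

	/* Sketch-only guard: nothing fits, so report the iterator as drained. */
	if (first == it->next) {
		it->next = it->cur;
		goto out;
	}

	/* Pass 3: copy the records that survived pass 2, oldest first. */
	for (i = first; i < it->next; i++) {
		size_t n = strlen(records[i]);

		memcpy(buf + out, records[i], n);
		out += n;
	}

	/* The next call continues with the records older than 'first'. */
	it->next = first;
	ret = true;
out:
	if (len)
		*len = out;
	return ret;
}
```

Each call therefore returns a block of records in chronological order, while successive calls walk backwards in time. The output is not NUL-terminated; the caller relies on *len.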
2009-10-16 16:09:18 +04:00
2012-07-21 04:28:07 +04:00
/**
* kmsg_dump_rewind_nolock - reset the iterator ( unlocked version )
* @ dumper : registered kmsg dumper
*
* Reset the dumper ' s iterator so that kmsg_dump_get_line ( ) and
* kmsg_dump_get_buffer ( ) can be called again and used multiple
* times within the same dumper . dump ( ) callback .
*
* The function is similar to kmsg_dump_rewind ( ) , but grabs no locks .
*/
void kmsg_dump_rewind_nolock ( struct kmsg_dumper * dumper )
{
dumper - > cur_seq = clear_seq ;
dumper - > cur_idx = clear_idx ;
dumper - > next_seq = log_next_seq ;
dumper - > next_idx = log_next_idx ;
}
2012-06-15 16:07:51 +04:00
/**
* kmsg_dump_rewind - reset the iterator
* @ dumper : registered kmsg dumper
*
* Reset the dumper ' s iterator so that kmsg_dump_get_line ( ) and
* kmsg_dump_get_buffer ( ) can be called again and used multiple
* times within the same dumper . dump ( ) callback .
*/
void kmsg_dump_rewind ( struct kmsg_dumper * dumper )
{
unsigned long flags ;
raw_spin_lock_irqsave ( & logbuf_lock , flags ) ;
2012-07-21 04:28:07 +04:00
kmsg_dump_rewind_nolock ( dumper ) ;
2012-06-15 16:07:51 +04:00
raw_spin_unlock_irqrestore ( & logbuf_lock , flags ) ;
2009-10-16 16:09:18 +04:00
}
2012-06-15 16:07:51 +04:00
EXPORT_SYMBOL_GPL ( kmsg_dump_rewind ) ;
2013-05-01 02:27:12 +04:00
2013-05-01 02:27:15 +04:00
static char dump_stack_arch_desc_str [ 128 ] ;
/**
* dump_stack_set_arch_desc - set arch - specific str to show with task dumps
* @ fmt : printf - style format string
* @ . . . : arguments for the format string
*
* The configured string will be printed right after utsname during task
* dumps . Usually used to add arch - specific system identifiers . If an
* arch wants to make use of such an ID string , it should initialize this
* as soon as possible during boot .
*/
void __init dump_stack_set_arch_desc(const char *fmt, ...)
{
	va_list args;
	va_start(args, fmt);
	vsnprintf(dump_stack_arch_desc_str, sizeof(dump_stack_arch_desc_str),
		  fmt, args);
	va_end(args);
}
2013-05-01 02:27:12 +04:00
/**
* dump_stack_print_info - print generic debug info for dump_stack ( )
* @ log_lvl : log level
*
* Arch - specific dump_stack ( ) implementations can use this function to
* print out the same debug information as the generic dump_stack ( ) .
*/
void dump_stack_print_info(const char *log_lvl)
{
	printk("%sCPU: %d PID: %d Comm: %.20s %s %s %.*s\n",
	       log_lvl, raw_smp_processor_id(), current->pid, current->comm,
	       print_tainted(), init_utsname()->release,
	       (int)strcspn(init_utsname()->version, " "),
	       init_utsname()->version);
2013-05-01 02:27:15 +04:00
	if (dump_stack_arch_desc_str[0] != '\0')
		printk("%sHardware name: %s\n",
		       log_lvl, dump_stack_arch_desc_str);
2013-05-01 02:27:22 +04:00
	print_worker_info(log_lvl, current);
2013-05-01 02:27:12 +04:00
}
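dump_stack_print_info() prints only the first word of the utsname version string by pairing strcspn() with the "%.*s" precision specifier. A minimal user-space sketch of that idiom follows; format_first_word() is a hypothetical helper, and snprintf() stands in for printk().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the part of 'version' that precedes the first space into 'out'.
 * strcspn() gives the length of that prefix, and "%.*s" takes it as an
 * int precision, hence the cast. Returns what snprintf() returns.
 */
static int format_first_word(char *out, size_t size, const char *version)
{
	assert(out != NULL && version != NULL);

	return snprintf(out, size, "%.*s",
			(int)strcspn(version, " "), version);
}
```

If the string contains no space, strcspn() returns its full length and the whole string is printed.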
dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms. This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.
show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.
* Archs which didn't print debug info now do.
alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
um, xtensa
* Already prints debug info. Replaced with show_regs_print_info().
The printed information is a superset of what used to be there.
arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
* s390 is special in that it used to print arch-specific information
along with generic debug info. Heiko and Martin think that the
arch-specific extra isn't worth keeping an s390-specific implementation.
Converted to use the generic version.
Note that now all archs print the debug info before actual register
dumps.
An example BUG() dump follows.
kernel BUG at /work/os/work/kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
v2: Typo fix in x86-32.
v3: CPU number dropped from show_regs_print_info() as
dump_stack_print_info() has been updated to print it. s390
specific implementation dropped as requested by s390 maintainers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 02:27:17 +04:00
/**
* show_regs_print_info - print generic debug info for show_regs ( )
* @ log_lvl : log level
*
* show_regs ( ) implementations can use this function to print out generic
* debug information .
*/
void show_regs_print_info ( const char * log_lvl )
{
dump_stack_print_info ( log_lvl ) ;
2016-07-29 01:48:23 +03:00
printk ( " %stask: %p task.stack: %p \n " ,
log_lvl , current , task_stack_page ( current ) ) ;
2013-05-01 02:27:17 +04:00
}
2008-02-08 15:21:25 +03:00
# endif