Last reviewed: 06/02/2009

HP iLO2 NMI Watchdog Driver
NMI sourcing for iLO2 based ProLiant Servers
Documentation and Driver by
Thomas Mingarelli <thomas.mingarelli@hp.com>

The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and the added benefit of NMI sourcing. Both the
watchdog functionality and the NMI sourcing capability need to be enabled
by the user. Remember that the two modes are not dependent on one another:
a user can have NMI sourcing without the watchdog timer and vice versa.

Watchdog functionality is enabled like that of any other common watchdog
driver: an application must be started that starts the watchdog timer and
keeps refreshing it. A basic application called watchdog-test.c exists in
the Documentation/watchdog/src directory. Simply compile the C file and run
it. If the system gets into a bad state and hangs, the HP ProLiant iLO 2
timer register will not be updated in a timely fashion and a hardware system
reset, also known as an Automatic Server Recovery (ASR) event, will occur.

The hpwdt driver also has three (3) module parameters. They are the following:

soft_margin - allows the user to set the watchdog timer value (in seconds)
allow_kdump - allows the user to save off a kernel dump image after an NMI
nowayout    - basic watchdog parameter that does not allow the timer to
              be stopped once started, so an impending ASR cannot be escaped.

NOTE: More information about watchdog drivers in general, including the ioctl
      interface to /dev/watchdog, can be found in
      Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.

The NMI sourcing capability is disabled when the driver discovers that the
NMI watchdog is turned on (nmi_watchdog=1). This is because the Linux kernel
cannot distinguish between "NMI Watchdog Ticks" and "HW generated NMI
events". As a result, the hpwdt NMI handler code is called each time the NMI
signal fires, which could amount to several thousand NMIs in a matter of
seconds. If a user sees the Linux kernel's "dazed and confused" message in
the logs, or if the system hangs, then the user should reboot with
nmi_watchdog=0.

1. If the kernel has not been booted with nmi_watchdog turned off, then
   edit /boot/grub/menu.lst and place nmi_watchdog=0 at the end of the
   currently booting kernel line.
2. Reboot the server.

Now the hpwdt driver can successfully receive and source the NMI and log a
message that details the reason for the NMI (as determined by the HP BIOS).

Below is a list of NMIs the HP BIOS understands along with the associated
code (reason):

    No source found               00h
    Uncorrectable Memory Error    01h
    ASR NMI                       1Bh
    PCI Parity Error              20h
    NMI Button Press              27h
    SB_BUS_NMI                    28h
    ILO Doorbell NMI              29h
    ILO IOP NMI                   2Ah
    ILO Watchdog NMI              2Bh
    Proc Throt NMI                2Ch
    Front Side Bus NMI            2Dh
    PCI Express Error             2Fh
    DMA controller NMI            30h
    Hypertransport/CSI Error      31h

-- Tom Mingarelli
   (thomas.mingarelli@hp.com)