18bd1963ae
Remove references to deprecated features like NMI sourcing and obsoleted module parameters. Add details concerning new module parameter pretimeout and tips to programming it. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
67 lines
3.0 KiB
Plaintext
67 lines
3.0 KiB
Plaintext
Last reviewed: 08/20/2018
|
|
|
|
HPE iLO NMI Watchdog Driver
|
|
for iLO based ProLiant Servers
|
|
|
|
The HPE iLO NMI Watchdog driver is a kernel module that provides basic
|
|
watchdog functionality and handler for the iLO "Generate NMI to System"
|
|
virtual button.
|
|
|
|
All references to iLO in this document imply it also works on iLO2 and all
|
|
subsequent generations.
|
|
|
|
Watchdog functionality is enabled like any other common watchdog driver. That
|
|
is, an application needs to be started that kicks off the watchdog timer. A
|
|
basic application exists in tools/testing/selftests/watchdog/ named
|
|
watchdog-test.c. Simply compile the C file and kick it off. If the system
|
|
gets into a bad state and hangs, the HPE ProLiant iLO timer register will
|
|
not be updated in a timely fashion and a hardware system reset (also known as
|
|
an Automatic Server Recovery (ASR)) event will occur.
|
|
|
|
The hpwdt driver also has the following module parameters:
|
|
|
|
soft_margin - allows the user to set the watchdog timer value.
|
|
Default value is 30 seconds.
|
|
timeout - an alias of soft_margin.
|
|
pretimeout - allows the user to set the watchdog pretimeout value.
|
|
This is the number of seconds before timeout when an
|
|
NMI is delivered to the system. Setting the value to
|
|
zero disables the pretimeout NMI.
|
|
Default value is 9 seconds.
|
|
nowayout - basic watchdog parameter that does not allow the timer to
|
|
be restarted or an impending ASR to be escaped.
|
|
Default value is set when compiling the kernel. If it is set
|
|
to "Y", then there is no way of disabling the watchdog once
|
|
it has been started.
|
|
|
|
NOTE: More information about watchdog drivers in general, including the ioctl
|
|
interface to /dev/watchdog can be found in
|
|
Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
|
|
|
|
Due to limitations in the iLO hardware, the NMI pretimeout if enabled,
|
|
can only be set to 9 seconds. Attempts to set pretimeout to other
|
|
non-zero values will be rounded, possibly to zero. Users should verify
|
|
the pretimeout value after attempting to set pretimeout or timeout.
|
|
|
|
Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
|
|
panic. This is to allow for a crash dump to be collected. It is incumbent
|
|
upon the user to have properly configured the system for kdump.
|
|
|
|
The default Linux kernel behavior upon panic is to print a kernel tombstone
|
|
and loop forever. This is generally not what a watchdog user wants.
|
|
|
|
For those wishing to learn more please see:
|
|
Documentation/kdump/kdump.txt
|
|
Documentation/admin-guide/kernel-parameters.txt (panic=)
|
|
Your Linux Distribution specific documentation.
|
|
|
|
If the hpwdt does not receive the NMI associated with an expiring timer,
|
|
the iLO will proceed to reset the system at timeout if the timer hasn't
|
|
been updated.
|
|
|
|
--
|
|
|
|
The HPE iLO NMI Watchdog Driver and documentation were originally developed
|
|
by Tom Mingarelli.
|
|
|