===========================
HPE iLO NMI Watchdog Driver
===========================
for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018

The HPE iLO NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and a handler for the iLO "Generate NMI to System"
virtual button.

All references to iLO in this document apply equally to iLO2 and all
subsequent generations.

Watchdog functionality is enabled like any other common watchdog driver. That
is, an application needs to be started that kicks off the watchdog timer and
then keeps updating it. A basic application named watchdog-test.c exists in
tools/testing/selftests/watchdog/. Simply compile the C file and run it. If
the system gets into a bad state and hangs, the HPE ProLiant iLO timer
register will not be updated in a timely fashion and a hardware system reset
(also known as
an Automatic Server Recovery (ASR)) event will occur.
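The loop such an application runs can be sketched roughly as follows. This is
a minimal illustration, not a replacement for watchdog-test.c. It assumes the
generic /dev/watchdog conventions (opening the device starts the timer, any
write updates it, and writing "V" before closing asks the driver to stop the
timer when nowayout is not set). The function name and its parameters are
hypothetical.

```python
import time


def service_watchdog(device="/dev/watchdog", interval=10.0, kicks=None,
                     disarm=True):
    """Open the watchdog device and keep updating its timer.

    kicks=None loops until interrupted; a number limits the updates, which
    is mainly useful for trying the function against an ordinary file.
    Returns the number of updates written.
    """
    count = 0
    # Unbuffered, so every write reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        try:
            while kicks is None or count < kicks:
                wd.write(b"\0")      # any write updates the timer
                count += 1
                time.sleep(interval)
        finally:
            if disarm:
                # Assumed "magic close" convention: request that the timer
                # be stopped on close. Has no effect when nowayout is set.
                wd.write(b"V")
    return count
```

The interval should stay well below the configured timeout (30 seconds by
default), for example `service_watchdog(interval=10)` run as root.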

The hpwdt driver also has the following module parameters:

============  ================================================================
soft_margin   allows the user to set the watchdog timer value.
              Default value is 30 seconds.
timeout       an alias of soft_margin.
pretimeout    allows the user to set the watchdog pretimeout value.
              This is the number of seconds before timeout when an
              NMI is delivered to the system. Setting the value to
              zero disables the pretimeout NMI.
              Default value is 9 seconds.
nowayout      basic watchdog parameter that does not allow the timer to
              be restarted or an impending ASR to be escaped.
              Default value is set when compiling the kernel. If it is set
              to "Y", then there is no way of disabling the watchdog once
              it has been started.
kdumptimeout  Minimum timeout in seconds to apply upon receipt of an NMI
              before calling panic. (-1) disables the watchdog. When value
              is > 0, the timer is reprogrammed with the greater of
              value or current timeout value.
============  ================================================================

NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.

Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
can only be set to 9 seconds. Attempts to set pretimeout to other
non-zero values will be rounded, possibly to zero. Users should verify
the pretimeout value after attempting to set pretimeout or timeout.
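One way to perform that verification is to read the values back through the
generic watchdog ioctl interface. The sketch below is illustrative only: the
request numbers are the `<linux/watchdog.h>` values as encoded on x86 and
arm64, and the helper names and the injectable `ioctl_int` argument are
hypothetical.

```python
import fcntl
import struct

# ioctl request numbers from <linux/watchdog.h> (x86 / arm64 encoding).
WDIOC_SETTIMEOUT = 0xC0045706
WDIOC_GETTIMEOUT = 0x80045707
WDIOC_SETPRETIMEOUT = 0xC0045708
WDIOC_GETPRETIMEOUT = 0x80045709


def _ioctl_int(fd, request, value=0):
    """Issue a watchdog ioctl that passes a C int and return the result."""
    buf = struct.pack("i", value)
    return struct.unpack("i", fcntl.ioctl(fd, request, buf))[0]


def set_and_verify(fd, timeout, pretimeout, ioctl_int=_ioctl_int):
    """Set timeout and pretimeout, then read back what is in effect.

    Returns (actual_timeout, actual_pretimeout, pretimeout_honoured).
    """
    ioctl_int(fd, WDIOC_SETTIMEOUT, timeout)
    ioctl_int(fd, WDIOC_SETPRETIMEOUT, pretimeout)
    actual_timeout = ioctl_int(fd, WDIOC_GETTIMEOUT)
    actual_pretimeout = ioctl_int(fd, WDIOC_GETPRETIMEOUT)
    return actual_timeout, actual_pretimeout, actual_pretimeout == pretimeout
```

Here `fd` would be the file descriptor of an already opened /dev/watchdog. A
caller that asked for a pretimeout other than 0 or 9 should expect the last
element of the result to be False.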

Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
panic. This is to allow for a crash dump to be collected. It is incumbent
upon the user to have properly configured the system for kdump.
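The kdumptimeout module parameter decides what happens to the timer at this
point, before panic is called. The rule given in the parameter table can be
modelled as below. This is a plain restatement of that rule, not driver code.
The function name is hypothetical, and the handling of values other than -1
and positive numbers (leave the timer unchanged) is an assumption, since the
text does not spell it out.

```python
def timer_action_on_nmi(kdumptimeout, current_timeout):
    """Model the timer adjustment made on NMI receipt, before panic.

    Returns a (action, seconds) tuple where action is "stop", "reprogram"
    or "keep".
    """
    if kdumptimeout == -1:
        # (-1) disables the watchdog.
        return ("stop", None)
    if kdumptimeout > 0:
        # Reprogram with the greater of kdumptimeout and the current timeout.
        return ("reprogram", max(kdumptimeout, current_timeout))
    # Assumption: any other value leaves the running timer as it is.
    return ("keep", current_timeout)
```

For example, with a current timeout of 30 seconds and kdumptimeout=120, the
model gives the crash kernel 120 seconds before the iLO resets the system.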

The default Linux kernel behavior upon panic is to print a kernel tombstone
and loop forever. This is generally not what a watchdog user wants.

For those wishing to learn more, please see:
        Documentation/kdump/kdump.rst
        Documentation/admin-guide/kernel-parameters.txt (panic=)
        Your Linux Distribution specific documentation.
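As a quick check of the panic behavior currently in effect, the value set by
panic= can be read back at run time from /proc/sys/kernel/panic. The helper
names below are hypothetical, and the sketch only reports the setting. It does
not change it.

```python
def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Return the current kernel.panic value (the runtime form of panic=)."""
    with open(path) as f:
        return int(f.read().strip())


def describe_panic_timeout(value):
    """Explain what the kernel does after a panic for a given value."""
    if value == 0:
        return "loop forever"
    if value > 0:
        return f"reboot after {value} seconds"
    return "reboot immediately"
```

Calling `describe_panic_timeout(read_panic_timeout())` on a system that still
has the default setting reports "loop forever".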

If the hpwdt driver does not receive the NMI associated with an expiring timer,
the iLO will proceed to reset the system at timeout if the timer hasn't
been updated.
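The relationship between the two events can be illustrated with a small
calculation. The function name is hypothetical, the defaults are the default
values from the parameter table, and the sketch assumes pretimeout is smaller
than timeout.

```python
def expiry_schedule(timeout=30, pretimeout=9):
    """Return (nmi_at, reset_at) in seconds after the last timer update.

    nmi_at is None when the pretimeout NMI is disabled (pretimeout == 0).
    """
    nmi_at = timeout - pretimeout if pretimeout else None
    return nmi_at, timeout
```

With the defaults, the NMI is due 21 seconds after the last update and the
iLO reset follows at 30 seconds if the timer still has not been updated.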

--

The HPE iLO NMI Watchdog Driver and documentation were originally developed
by Tom Mingarelli.