bbbca72352
Presently PAPR doesn't support injecting smart errors on an NVDIMM. This
makes testing the NVDIMM health reporting functionality difficult, as
simulating NVDIMM health related events needs a hacked up qemu version.

To solve this problem this patch proposes simulating a certain set of
NVDIMM health related events in papr_scm, specifically the 'fatal' health
state and the 'dirty' shutdown state. These errors can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
the corresponding ndctl patches the following command flow is expected:

    $ sudo ndctl list -DH -d nmem0
    ...
    "health_state":"ok",
    "shutdown_state":"clean",
    ...
    # inject unsafe shutdown and fatal health error
    $ sudo ndctl inject-smart nmem0 -Uf
    ...
    "health_state":"fatal",
    "shutdown_state":"dirty",
    ...
    # uninject all errors
    $ sudo ndctl inject-smart nmem0 -N
    ...
    "health_state":"ok",
    "shutdown_state":"clean",
    ...

The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv, which is then bitwise ORed into the health bitmap fetched
from the hypervisor. The value of 'health_bitmap_inject_mask' is
accessible from sysfs at nmemX/papr/health_bitmap_inject.

A new PDSM named 'SMART_INJECT' is proposed that accepts the newly
introduced 'struct nd_papr_pdsm_smart_inject' as its payload, which is
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair of 'inject_mask' and 'clear_mask' bitmaps from the
payload and bit-blts them into 'health_bitmap_inject_mask'. This ensures
that, after being fetched from the hypervisor, the health_bitmap reflects
the requested smart-error states.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com
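The mask handling described in the commit message can be modelled in a few lines. The following is a minimal Python sketch, not the driver code: the bit values HEALTH_FATAL and SHUTDOWN_DIRTY, the dict-shaped payload, and all function names are hypothetical and chosen only for illustration. It also assumes that injected bits are forced on in the hypervisor bitmap, which is what the health_bitmap_inject description ("forcibly set specific bits") suggests.

```python
# Hypothetical bit values, for illustration only; the real bit layout is
# defined by PAPR and the papr_scm driver.
HEALTH_FATAL = 1 << 0
SHUTDOWN_DIRTY = 1 << 1

BITS = {"fatal": HEALTH_FATAL, "dirty": SHUTDOWN_DIRTY}


def build_masks(payload):
    """Turn a request into (inject_mask, clear_mask).

    payload maps a state name to True (inject it) or False (clear it).
    States missing from the payload are left untouched.
    """
    inject_mask = 0
    clear_mask = 0
    for name, wanted in payload.items():
        if wanted:
            inject_mask |= BITS[name]
        else:
            clear_mask |= BITS[name]
    return inject_mask, clear_mask


def update_inject_mask(current, inject_mask, clear_mask):
    """Clear the requested bits first, then set the requested bits."""
    return (current & ~clear_mask) | inject_mask


def effective_health(hypervisor_bitmap, health_bitmap_inject_mask):
    """Force the injected bits on top of what the hypervisor reported."""
    return hypervisor_bitmap | health_bitmap_inject_mask
```

In this model, a request such as {"fatal": True, "dirty": True} corresponds to 'ndctl inject-smart nmem0 -Uf', and a request with both values False corresponds to the uninject step.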
76 lines
2.8 KiB
Plaintext
What:		/sys/bus/nd/devices/nmemX/papr/flags
Date:		Apr, 2020
KernelVersion:	v5.8
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Report flags indicating various states of a
		papr-pmem NVDIMM device. Each flag maps to one or
		more bits set in the dimm-health-bitmap retrieved in
		response to the H_SCM_HEALTH hcall. The details of the
		bit flags returned in response to this hcall are
		available at 'Documentation/powerpc/papr_hcalls.rst'.
		Below are the flags reported in this sysfs file:

		* "not_armed"
		  Indicates that NVDIMM contents will not
		  survive a power cycle.
		* "flush_fail"
		  Indicates that NVDIMM contents
		  couldn't be flushed during the last
		  shut-down event.
		* "restore_fail"
		  Indicates that NVDIMM contents
		  couldn't be restored during NVDIMM
		  initialization.
		* "encrypted"
		  NVDIMM contents are encrypted.
		* "smart_notify"
		  There is a health event for the NVDIMM.
		* "scrubbed"
		  Indicates that contents of the
		  NVDIMM have been scrubbed.
		* "locked"
		  Indicates that NVDIMM contents can't
		  be modified until the next power cycle.

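As an illustration of how a user-space tool might consume the flags attribute, here is a minimal Python sketch. It assumes the file holds the flag names separated by whitespace on a single line; that layout is an assumption of this sketch, not something the description above states. The helper names parse_flags and read_flags, and the default DIMM name nmem0, are hypothetical.

```python
from pathlib import Path

# Flag names exactly as listed in the description above.
KNOWN_FLAGS = {
    "not_armed", "flush_fail", "restore_fail", "encrypted",
    "smart_notify", "scrubbed", "locked",
}


def parse_flags(text):
    """Split the attribute contents into a list of flag names.

    Assumes whitespace-separated names; unknown names are kept so that
    flags added by newer kernels are not silently dropped.
    """
    return text.split()


def read_flags(dimm="nmem0"):
    """Read and parse the flags attribute of one NVDIMM (needs papr_scm)."""
    path = Path("/sys/bus/nd/devices") / dimm / "papr" / "flags"
    return parse_flags(path.read_text())
```

A caller could then check, for example, whether "flush_fail" is in read_flags("nmem0") before trusting the contents of the device.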
What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
Date:		May, 2020
KernelVersion:	v5.9
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Report various performance stats related to a papr-scm
		NVDIMM device. This attribute is only available for NVDIMM
		devices that support reporting NVDIMM performance stats. Each
		stat is reported on a new line, with each line composed of a
		stat-identifier followed by its value. Below are the currently
		known dimm performance stats which are reported:

		* "CtlResCt" : Controller Reset Count
		* "CtlResTm" : Controller Reset Elapsed Time
		* "PonSecs " : Power-on Seconds
		* "MemLife " : Life Remaining
		* "CritRscU" : Critical Resource Utilization
		* "HostLCnt" : Host Load Count
		* "HostSCnt" : Host Store Count
		* "HostSDur" : Host Store Duration
		* "HostLDur" : Host Load Duration
		* "MedRCnt " : Media Read Count
		* "MedWCnt " : Media Write Count
		* "MedRDur " : Media Read Duration
		* "MedWDur " : Media Write Duration
		* "CchRHCnt" : Cache Read Hit Count
		* "CchWHCnt" : Cache Write Hit Count
		* "FastWCnt" : Fast Write Count

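The one-stat-per-line layout lends itself to a small parser. The Python sketch below is illustrative only: the description says each line is a stat-identifier followed by its value but does not spell out the separator or number format, so the sketch assumes either "NAME = VALUE" or "NAME VALUE", with the value written as a hexadecimal (0x-prefixed) or plain decimal integer. The function name parse_perf_stats is hypothetical.

```python
def parse_perf_stats(text):
    """Parse perf_stats contents into a dict of {identifier: value}.

    Assumed line formats: "NAME = VALUE" or "NAME VALUE". Identifiers
    padded to eight characters (such as "PonSecs ") are stripped.
    """
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            name, value = line.rsplit(None, 1)
        # Base 0 lets int() accept both "0x..." and plain decimal values.
        stats[name.strip()] = int(value.strip(), 0)
    return stats
```

For example, under the assumed format a line such as "PonSecs  = 0x10" would yield the entry "PonSecs" with the value 16.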
What:		/sys/bus/nd/devices/nmemX/papr/health_bitmap_inject
Date:		Jan, 2022
KernelVersion:	v5.17
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Reports the health inject bitmap that is applied to the
		bitmap received from PowerVM via the H_SCM_HEALTH hcall. It is
		used to forcibly set specific bits in the bitmap returned by
		the hcall. The forced bits are then used to simulate various
		health or shutdown states for an NVDIMM and are set by
		user-space tools like ndctl by issuing a PAPR DSM.
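To check which bits are currently being injected, a tool could decode the value of this attribute. The Python sketch below is a rough illustration under stated assumptions: it assumes the attribute holds a single integer (hexadecimal with a 0x prefix, or decimal), and it numbers bit positions from the least significant bit, which may differ from the bit numbering used in the PAPR documentation. The helper names parse_bitmap and set_bit_positions are hypothetical.

```python
def parse_bitmap(text):
    """Convert the attribute contents (e.g. "0x5\\n") to an integer."""
    return int(text.strip(), 0)


def set_bit_positions(value, width=64):
    """Return the positions of set bits, counted from the least significant bit."""
    return [pos for pos in range(width) if (value >> pos) & 1]


def is_injecting(text):
    """True if at least one health bit is currently being forced."""
    return parse_bitmap(text) != 0
```

A value of zero means no bits are being forced, which is the expected state after all injected errors have been removed.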