linux/drivers/acpi/apei
Shuai Xue a70297d221 ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
There are two major types of uncorrected recoverable (UCR) errors:

 - Synchronous error: The error is detected and raised at the point of
   the consumption in the execution flow, e.g. when a CPU tries to
   access a poisoned cache line. The CPU will take a synchronous error
   exception such as Synchronous External Abort (SEA) on Arm64 and
   Machine Check Exception (MCE) on X86. The OS is required to take action
   (for example, offlining the failing page or killing the failing thread)
   to recover from this uncorrectable error.

 - Asynchronous error: The error is detected out of processor execution
   context, e.g. when an error is detected by a background scrubber.
   Some data in memory is corrupted, but it has not been consumed yet.
   Taking action to recover from this uncorrectable error is optional for
   the OS.

When APEI firmware-first mode is enabled, a platform may describe one error
source for handling synchronous errors (e.g. MCE or SEA notification), or
for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, synchronous errors can be distinguished by
their APEI notification type. For synchronous errors, the kernel kills the
current process, which accessed the poisoned page, by sending SIGBUS with
BUS_MCEERR_AR. For asynchronous errors, the kernel notifies the process
that owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early
kill mode. However, the GHES driver always sets mf_flags to 0, so all
synchronous errors are handled as asynchronous errors in memory failure.

To fix this, set the memory failure flags to MF_ACTION_REQUIRED on
synchronous events.
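
The flag selection described above can be sketched in plain C as follows.
This is a minimal userspace illustration, not the code in ghes.c: every
sketch_/SKETCH_ name is hypothetical, the flag value is illustrative rather
than the kernel's definition of MF_ACTION_REQUIRED, and treating both MCE
and SEA notifications as synchronous is an assumption taken from the
wording of this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the APEI notification types named above. */
enum sketch_notify {
	SKETCH_NOTIFY_SCI,
	SKETCH_NOTIFY_EXTERNAL_INTERRUPT,
	SKETCH_NOTIFY_MCE,
	SKETCH_NOTIFY_SEA,
};

/* Illustrative value only; not the kernel's MF_ACTION_REQUIRED. */
#define SKETCH_MF_ACTION_REQUIRED (1 << 0)

/* Assumption: MCE and SEA notifications report synchronous errors. */
static bool sketch_is_sync_notify(enum sketch_notify type)
{
	return type == SKETCH_NOTIFY_MCE || type == SKETCH_NOTIFY_SEA;
}

/*
 * Synchronous events get the "action required" flag, so the consumer of
 * the poisoned data is handled (SIGBUS with BUS_MCEERR_AR). Asynchronous
 * events keep flags at 0, which was previously used for every event.
 */
static int sketch_mf_flags(enum sketch_notify type)
{
	return sketch_is_sync_notify(type) ? SKETCH_MF_ACTION_REQUIRED : 0;
}
```

With this mapping, only events reported through a synchronous notification
carry the flag; scrubber-style events reported via SCI or an external
interrupt are still treated as action-optional.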

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-21 14:50:25 +01:00
apei-base.c ACPI: APEI: Remove a useless include 2022-12-02 20:18:50 +01:00
apei-internal.h efi: fix missing prototype warnings 2023-05-25 09:26:19 +02:00
bert.c ACPI: APEI: mark bert_disable as __initdata 2023-06-12 19:23:25 +02:00
einj.c ACPI: APEI: EINJ: Add support for vendor defined error types 2023-11-21 21:10:44 +01:00
erst-dbg.c ACPI: APEI: Fix missing ERST record id 2022-04-13 20:29:24 +02:00
erst.c ACPI: APEI: Use ERST timeout for slow devices 2023-10-24 20:50:17 +02:00
ghes.c ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events 2023-12-21 14:50:25 +01:00
hest.c ACPI: APEI: fix return value of __setup handlers 2022-03-08 19:43:39 +01:00
Kconfig ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue 2019-02-07 23:10:45 +01:00
Makefile