IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Slow devices such as flash may not meet the default 1ms timeout value,
so use the ERST max execution time value that they provide as the
timeout if it is larger.
Signed-off-by: Jeshua Smith <jeshuas@nvidia.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ghes_handle_aer() passes AER data to the PCI core for logging and
recovery by calling aer_recover_queue() with a pointer to struct
aer_capability_regs.
The problem was that aer_recover_queue() queues the pointer directly
without copying the aer_capability_regs data. The pointer was to
the ghes->estatus buffer, which could be reused before
aer_recover_work_func() reads the data.
To avoid this problem, allocate a new aer_capability_regs structure
from the ghes_estatus_pool, copy the AER data from the ghes->estatus
buffer into it, pass a pointer to the new struct to
aer_recover_queue(), and free it after aer_recover_work_func() has
processed it.
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since 315bada690e0 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor specific EDAC driver will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present. Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.
Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It's only used inside the __init section. Mark it __initdata.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ghes_estatus_pool_size_request() is unused now, so remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cper.c file needs to include an extra header, and efi_zboot_entry
needs an extern declaration to avoid these 'make W=1' warnings:
drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]
To make this easier, move the cper specific declarations to
include/linux/cper.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
OSPM executes an EXECUTE_OPERATION action to instruct the platform to begin
the injection operation, then executes a GET_COMMAND_STATUS action to
determine the status of the completed operation. The ACPI Specification
documented error codes[1] are:
0 = Success (Linux #define EINJ_STATUS_SUCCESS)
1 = Unknown failure (Linux #define EINJ_STATUS_FAIL)
2 = Invalid Access (Linux #define EINJ_STATUS_INVAL)
The original code report -EBUSY for both "Unknown Failure" and "Invalid
Access" cases. Actually, firmware could do some platform dependent sanity
checks and returns different error codes, e.g. "Invalid Access" to indicate
to the user that the parameters they supplied cannot be used for injection.
To this end, fix to return -EINVAL in the __einj_error_inject() error
handling case instead of always -EBUSY, when explicitly indicated by the
platform in the status of the completed operation.
[1] ACPI Specification 6.5 18.6.1. Error Injection Table
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI 6.5 added six new error types for CXL. See chapter 18
table 18.30.
Add strings for the new types so that Linux will list them in the
/sys/kernel/debug/apei/einj/available_error_types file.
It seems no other changes are needed. Linux already accepts
the CXL codes (on a BIOS that advertises them).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The bit map of error types to inject is 32-bit width [1].
Add parameter check to reflect the fact.
[1] ACPI Specification 6.4, Section 18.6.4. Error Types
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown". After a timer is set to this state, then it can no
longer be re-armed.
The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed. It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.
This was created by using a coccinelle script and the following
commands:
$ cat timer.cocci
@@
expression ptr, slab;
identifier timer, rfield;
@@
(
- del_timer(&ptr->timer);
+ timer_shutdown(&ptr->timer);
|
- del_timer_sync(&ptr->timer);
+ timer_shutdown_sync(&ptr->timer);
)
... when strict
when != ptr->timer
(
kfree_rcu(ptr, rfield);
|
kmem_cache_free(slab, ptr);
|
kfree(ptr);
)
$ spatch timer.cocci . > /tmp/t.patch
$ patch -p1 < /tmp/t.patch
Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drop this forced built-in only configuration by disentangling it from
GHES. Work by Jia He.
- The usual small cleanups and improvements all over EDAC land
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXPvoACgkQEsHwGGHe
VUq98xAAmhz4u9e9pXG0Ixkx25ZtnZ+YxANeQ53Hsa2gWicbcoFgL2E30gi97c1y
X9W361B2Q5dYq+J/YRUnEOXlI/KMWLxzNykSvVipUFNfxXZH+PijEAArz2V35/uE
6ZISRLUYVYEtHEoUXbTogeyBmBUnIaJfYheZCluDQlWPggsDESP1qmE+FTg25OBs
rDl5y+zUZYPxrWustNodVThPyhdMwGyYAUS6qYKCoNs9SNkAjGnrXoPc9j/U+cV+
qMY2dNS3uKnCujKEssQhcHucyWgCEDvmEKWMH4ItryV2UBBjpNRoM6HDe7XFKwVJ
riOKX8VDrpdSdlV1jbCx9KB47BUwFygOYsFdW7gIDJ1hb8usN4nSYQNDIlZKEIQG
cHNpv2XGT+pCSvyc4Iv2Fgyvnp25XensSQwQAtk5Y4/lJL1yrgcPjMOkPmRS+mmH
BclDWNbL+gwqkyWxgfoivDBOetLgwJYTr2ewBr6QbBtwLB8rL4BxXIdomcoFPuxi
jAxixZnTbS+Xq5S7uYK4r6KbaHGcJtwolXMGjx13IHmPfvYtTTQzfRcrBlAtQ/pV
BDLoygmDVlkhSVx6bi5V5QZ06rcWYR4cRsBQ54FnBGMr730ZljgFONOHFtUab28T
C+YUOaeLEYEYI0cIkkyoSuiz6avB6YvQAiyEPM0EdHZrQFwhBBw=
=DFp8
-----END PGP SIGNATURE-----
Merge tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Make ghes_edac a simple module like the rest of the EDAC drivers and
drop the forced built-in only configuration by disentangling it from
GHES (Jia He)
- The usual small cleanups and improvements all over EDAC land
* tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()
EDAC/i5400: Fix typo in comment: vaious -> various
EDAC/mc_sysfs: Increase legacy channel support to 12
MAINTAINERS: Make Mauro EDAC reviewer
MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac
EDAC/igen6: Return the correct error type when not the MC owner
apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
EDAC: Check for GHES preference in the chipset-specific EDAC drivers
EDAC/ghes: Make ghes_edac a proper module
EDAC/ghes: Prepare to make ghes_edac a proper module
EDAC/ghes: Add a notifier for reporting memory errors
efi/cper: Export several helpers for ghes_edac to use
EDAC/i5000: Mark as BROKEN
Move error type descriptions into an array and loop over error types
to improve readability and maintainability.
Replace seq_printf() with seq_puts() as recommended by checkpatch.pl.
Signed-off-by: Jay Lu <jaylu102@amd.com>
Co-developed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Checkpatch reveals warnings and an error due to missing lines and
incorrect indentations. Add the missing lines after declarations and
fix the suspect indentations.
Signed-off-by: Jay Lu <jaylu102@amd.com>
Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This file does not use rcu, so there is no point in including
<linux/rculist.h>.
So just remove it.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Silence the following warnings when make W=1:
| CC drivers/acpi/apei/apei-base.c
| warning: no previous prototype for 'arch_apei_enable_cmcff' [-Wmissing-prototypes]
| int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
| ^
| CC drivers/acpi/apei/apei-base.c
| warning: no previous prototype for 'arch_apei_report_mem_error' [-Wmissing-prototypes]
| void __weak arch_apei_report_mem_error(int sev,
| ^
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:
/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
*/
New elements are added to the cache this way:
if (!ghes_estatus_cached(estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, estatus))
ghes_estatus_cache_add(ghes->generic, estatus);
The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.
Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.
And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.
Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.
Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.
And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.
This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.
Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.
[ bp:
- Flesh out and summarize what was discussed on the thread now
that that cache contraption is understood;
- Touch up code style. ]
Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no
driver is bound") the driver core cares for cleaning driver data, so
don't do it in the driver, too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:
/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
*/
New elements are added to the cache this way:
if (!ghes_estatus_cached(estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, estatus))
ghes_estatus_cache_add(ghes->generic, estatus);
The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.
Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.
And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.
Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.
Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.
And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.
This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.
Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.
[ bp:
- Flesh out and summarize what was discussed on the thread now
that that cache contraption is understood;
- Touch up code style. ]
Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
Commit
dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
introduced a bug leading to ghes_edac_register() to be invoked before
edac_init(). Because at that time the bus "edac" hadn't been even
registered, this created sysfs nodes as /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.
Fix this by turning ghes_edac into a proper module.
The list of GHES devices returned is not protected from being modified
concurrently but it is pretty static as it gets created only during GHES
init and latter is not a module so...
[ bp: Massage. ]
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
To make ghes_edac a proper module, prepare to decouple its dependencies
from GHES.
Move the ghes_edac.force_load parameter to ghes.c in order to
properly control whether ghes_edac should be force-loaded: In
ghes_edac_register() it is too late to set the module flag.
Introduce a helper ghes_get_devices(), which returns the list of GHES
devices which got probed when the platform-check passes on the system.
The previous force_load check is not needed in ghes_edac_unregister()
since it will be checked in the module's init function of ghes_edac
later.
[ bp: Massage. ]
Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
In order to make it a proper module and disentangle it from facilities,
add a notifier for reporting memory errors. Use an atomic notifier
because calls sites like ghes_proc_in_irq() run in interrupt context.
[ bp: Massage commit message. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
If an error is detected as a result of user-space process accessing a
corrupt memory location, the CPU may take an abort. Then the platform
firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA,
NOTIFY_SOFTWARE_DELEGATED, etc.
For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keep track of whether
memory_failure() work was queued, and make task_work pending to flush out
the queue so that the work is processed before return to user-space.
The code use init_mm to check whether the error occurs in user space:
if (current->mm != &init_mm)
The condition is always true, becase _nobody_ ever has "init_mm" as a real
VM any more.
In addition to abort, errors can also be signaled as asynchronous
exceptions, such as interrupt and SError. In such case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the work ghes_kick_task_work deferred to task_work will never
be processed because entry_handler returns to call ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node alloced from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
After around 200 allocations in our platform, the ghes_estatus_pool will
run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
result, the event failed to be processed.
sdei: event 805 on CPU 113 failed with error: -2
Finally, a lot of unhandled events may cause platform firmware to exceed
some threshold and reboot.
The condition should generally just do
if (current->mm)
as described in active_mm.rst documentation.
Then if an asynchronous error is detected when a kernel thread is running,
(e.g. when detected by a background scrubber), do not add task_work to it
as the original patch intends to do.
Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Return the erst_get_record_id_begin() and apei_exec_write_register()
return values directly instead of storing them in redundant local
variables.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Print total number of records found during BERT log parsing.
This also simplify dmesg parser implementation for BERT events.
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a platform marks a memory range as "special purpose" it is not
onlined as System RAM by default. However, it is still suitable for
error injection. Add IORES_DESC_SOFT_RESERVED to einj_error_inject() as
a permissible memory type in the sanity checking of the arguments to
_EINJ.
Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration")
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reported-by: Omar Avelar <omar.avelar@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The fix in commit 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT
table data") does not work as intended on systems where the BIOS has a
fixed size block of memory for the BERT table, relying on s/w to quit
when it finds a record with estatus->block_status == 0. On these systems
all errors are suppressed because the check:
if (region_len < ACPI_BERT_PRINT_MAX_LEN)
always fails.
New scheme skips individual CPER records that are too large, and also
limits the total number of records that will be printed to 5.
Fixes: 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT table data")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Delete the redundant word 'the'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
[ rjw: New subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some validation tests dynamically inject errors into memory used by
applications to check that the system can recover from a variety of
poison consumption sceenarios.
But sometimes the virtual address picked by these tests is mapped to
the zero page.
This causes additional unexpected machine checks as other processes that
map the zero page also consume the poison.
Disallow injection to the zero page.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Read a record is cleared by others, but the deleted record cache entry is
still created by erst_get_record_id_next. When next enumerate the records,
get the cached deleted record, then erst_read() return -ENOENT and try to
get next record, loop back to first ID will return 0 in function
__erst_record_id_cache_add_one and then set record_id as
APEI_ERST_INVALID_RECORD_ID, finished this time read operation.
It will result in read the records just in the cache hereafter.
This patch cleared the deleted record cache, fix the issue that
"./erst-inject -p" shows record counts not equal to "./erst-inject -n".
A reproducer of the problem(retry many times):
[root@localhost erst-inject]# ./erst-inject -c 0xaaaaa00011
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000006
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000007
[root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000008
[root@localhost erst-inject]# ./erst-inject -p
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00012
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00013
rc: 273
rcd sig: CPER
rcd id: 0xaaaaa00014
[root@localhost erst-inject]# ./erst-inject -n
total error record count: 6
Signed-off-by: Liu Xinpeng <liuxp11@chinatelecom.cn>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the variable
being allocated.
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Platforms with large BERT table data can trigger soft lockup errors
while attempting to print the entire BERT table data to the console at
boot:
watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1]
Observed on Ampere Altra systems with a single BERT record of ~250KB.
The original bert driver appears to have assumed relatively small table
data. Since it is impractical to reassemble large table data from
interwoven console messages, and the table data is available in
/sys/firmware/acpi/tables/data/BERT
limit the size for tables printed to the console to 1024 (for no reason
other than it seemed like a good place to kick off the discussion, would
appreciate feedback from existing users in terms of what size would
maintain their current usage model).
Alternatively, we could make printing a CONFIG option, use the
bert_disable boot arg (or something similar), or use a debug log level.
However, all those solutions require extra steps or change the existing
behavior for small table data. Limiting the size preserves existing
behavior on existing platforms with small table data, and eliminates the
soft lockups for platforms with large table data, while still making it
available.
Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__setup() handlers should return 1 to indicate that the boot option
has been handled. Returning 0 causes a boot option to be listed in
the Unknown kernel command line parameters and also added to init's
arg list (if no '=' sign) or environment list (if of the form 'a=b').
Unknown kernel command line parameters "erst_disable
bert_disable hest_disable BOOT_IMAGE=/boot/bzImage-517rc6", will be
passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
erst_disable
bert_disable
hest_disable
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
Fixes: a3e2acc5e37b ("ACPI / APEI: Add Boot Error Record Table (BERT) support")
Fixes: a08f82d08053 ("ACPI, APEI, Error Record Serialization Table (ERST) support")
Fixes: 9dc966641677 ("ACPI, APEI, HEST table parsing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ghes_init() sticks out in acpi_init() because it is the only functions
without an "acpi_" prefix.
Rename ghes_init with an "acpi_" prefix, then all looks fine.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
the estatus memory pool. On the other hand, ghes_init() relies on
sdei_init() to detect the SDEI version and (un)register events. The
dependencies are as follows:
ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init()
ghes_init() => sdei_init()
HEST is not PCI-specific and initcall ordering is implicit and not
well-defined within a level.
Based on above, remove acpi_hest_init() from acpi_pci_root_init() and
convert ghes_init() and sdei_init() from initcalls to explicit calls in the
following order:
acpi_hest_init()
ghes_init()
sdei_init()
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.
Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
SGX reserved memory does not appear in the standard address maps.
Add hook to call into the SGX code to check if an address is located
in SGX memory.
There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-7-tony.luck@intel.com
apei_hest_parse() is only used in hest.c, so mark it static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ rjw: Minor subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When injecting an error into the platform, the OSPM executes an
EXECUTE_OPERATION action to instruct the platform to begin the injection
operation. And then, the OSPM busy waits for a while by continually
executing CHECK_BUSY_STATUS action until the platform indicates that the
operation is complete. More specifically, the platform is limited to
respond within 1 millisecond right now. This is too strict for some
platforms.
For example, in Arm platform, when injecting a Processor Correctable error,
the OSPM will warn:
Firmware does not respond in time.
And a message is printed on the console:
echo: write error: Input/output error
We observe that the waiting time for DDR error injection is about 10 ms and
that for PCIe error injection is about 500 ms in Arm platform.
In this patch, we relax the response timeout to 1 second.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), do_sea() would unconditionally
signal the affected task from the arch code. Since that change,
the GHES driver sends the signals.
This exposes a problem as errors the GHES driver doesn't understand
or doesn't handle effectively are silently ignored. It will cause
the errors get taken again, and circulate endlessly. User-space task
get stuck in this loop.
Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records.
Do memory failure handling for ARM Processor Error Section just like
for Memory Error Section.
Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If ACPI is not enabled but support for ACPI and APEI is enabled in the
kernel, then the following warning is seen on boot ...
WARNING KERN EINJ: ACPI disabled.
For ARM64 platforms, the 'acpi_disabled' variable is true by default
and hence, the above is often seen on ARM64. Given that it can be
normal for ACPI to be disabled, make this an informational print rather
that a warning.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* acpi-misc:
ACPI: dock: fix some coding style issues
ACPI: sysfs: fix some coding style issues
ACPI: PM: add a missed blank line after declarations
ACPI: custom_method: fix a coding style issue
ACPI: CPPC: fix some coding style issues
ACPI: button: fix some coding style issues
ACPI: battery: fix some coding style issues
ACPI: acpi_pad: add a missed blank line after declarations
ACPI: LPSS: add a missed blank line after declarations
ACPI: ipmi: remove useless return statement for void function
ACPI: processor: fix some coding style issues
ACPI: APD: fix a block comment align issue
ACPI: AC: fix some coding style issues
ACPI: fix various typos in comments
The variable rc is being assigned a value that is never read,
the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Eliminate the following coccicheck warning:
./drivers/acpi/apei/erst.c:691:2-3: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
From commit 6915564dc5a8 ("ACPI: OSL: Change the type of
acpi_os_map_generic_address() return value"),
acpi_os_map_generic_address() will return logical address or NULL
for error, but for ACPI_ADR_SPACE_SYSTEM_IO case, it should be also
return 0 as it's a normal case, but now it will return -ENXIO.
So check it out for such case to avoid einj module initialization
fail.
Fixes: 6915564dc5a8 ("ACPI: OSL: Change the type of acpi_os_map_generic_address() return value")
Cc: <stable@vger.kernel.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
J9ILnGAAeQ==
=aVWo
-----END PGP SIGNATURE-----
Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()