linux

iv/linux

Author	SHA1	Message	Date
Yazen Ghannam	5176a93ab2	x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types Add HWID and McaType values for new SMCA bank types, and add their error descriptions to edac_mce_amd. The "PHY" bank types all have the same error descriptions, and the NBIF and SHUB bank types have the same error descriptions. So reuse the same arrays where appropriate. [ bp: Remove useless comments over hwid types. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com	2021-12-22 17:19:18 +01:00
Colin Ian King	567617baac	EDAC/sb_edac: Remove redundant initialization of variable rc The variable rc is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and thus remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211126221848.1125321-1-colin.i.king@gmail.com	2021-12-21 12:02:11 +01:00
Yazen Ghannam	e2be5955a8	EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh Add a new family type for AMD Family 19h Models 10h to 1Fh. Use this new family type for Models A0h to AFh also. Increase the maximum number of controllers from 8 to 12. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-3-yazen.ghannam@amd.com	2021-12-10 12:54:33 +01:00
Yazen Ghannam	f957112423	EDAC: Add RDDR5 and LRDDR5 memory types Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com	2021-12-10 12:51:28 +01:00
Randy Dunlap	ad2c302bc6	EDAC/sifive: Fix non-kernel-doc comment scripts/kernel-doc complains about a comment that begins with "/" but is not in kernel-doc format, so correct it. Prevents this warning: drivers/edac/sifive_edac.c:23: warning: This comment starts with '/', \ but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EDAC error callback Fixes: `91abaeaaff` ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211201030913.10283-1-rdunlap@infradead.org	2021-12-05 19:54:46 +01:00
Dinh Nguyen	f6bc0d8bc2	EDAC/synopsys: Enable the driver on Intel's N5X platform Intel's N5X platform is also using the Synopsys EDAC controller. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-3-dinguyen@kernel.org	2021-11-20 19:41:48 +01:00
Dinh Nguyen	f7824ded41	EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR Add support for version 3.80a of the Synopsys DDR controller. This version of the controller has the following differences: - UE/CE are auto cleared - Interrupts are supported by default Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-2-dinguyen@kernel.org	2021-11-20 19:23:49 +01:00
Dinh Nguyen	bd1d6da17c	EDAC/synopsys: Use the quirk for version instead of ddr version Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org	2021-11-20 18:13:06 +01:00
Yazen Ghannam	70aeb807cf	EDAC/amd64: Add context struct Define an address translation context struct. This will hold values that will be passed between multiple functions. Save return address, Node ID, and the Instance ID number to start. Currently, the UMC number is used as the Instance ID, but future DF versions may use another value. Also include a "tmp" field to use when reading registers. This is to avoid having to define a temporary variable in multiple functions. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com	2021-11-15 12:54:16 +01:00
Yazen Ghannam	448c3d6085	EDAC/amd64: Allow for DF Indirect Broadcast reads The DF Indirect Access method allows for "Broadcast" accesses in which case no specific instance is targeted. Add support using a reserved instance ID of 0xFF to indicate a broadcast access. Set the FICAA register appropriately. Define helpers functions for instance and broadcast reads and use them where appropriate. Drop the "amd_" prefix since these functions are all static. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com	2021-11-15 12:48:55 +01:00
Yazen Ghannam	b3218ae477	x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDAC df_indirect_read() is used only for address translation. Move it to EDAC along with the translation code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com	2021-11-15 12:44:47 +01:00
Yazen Ghannam	0b746e8c1e	x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDAC The address translation code used for current AMD systems is non-architectural. So move it to EDAC. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com	2021-11-15 12:36:32 +01:00
Linus Torvalds	fe354159ca	- amd64_edac: Add support for three-rank interleaving mode which is present on AMD zen2 servers - The usual fixes and cleanups all over EDAC land -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/rQAACgkQEsHwGGHe VUqTsg//QSI/akiJnnZ5Ea3CMsFAWH/LIAZzjWa3zcit6OcxrjRMsym1JGs4llUY 7Uis0lGTUH2je3kFrYdPVGXyoWPNdsQCQi/89Anabl5Jrf2u7N2MmUIKcCnzQWjS PHb4BSSWW2zR8wU8UXOo2I1QKltenKlaO97uE9Te9YZCYZ6PzS0z2qevd0SCPUtF tPQ+AqYNKE0JNNAqlPietSHs7MisyCuxTHqLiwC0QNuKJ3GhCtVhmYGKqaLOMcfr oKdBFaAZOhmtUFT0z8cLka8sKO5WDfqli9Qp6Zphv9E6iz6gJq7NCBaKRz8gOkPi boOjmSOhCi/xJq63DUtfKOogVRM+qhms1Kxni3EgJxyGR/oLwnVX4n0AopKBstb6 AsHPUB8AxrtnEtoBPFM/bO18Eto3HkCZD2yCzztOc5NmtUnVlIh2/qN4oYjBG7pc 20iXwZ49OAYVQwrLdoNvyXQ5BdKGC5tyMgvPWVyOO8Iad2YYfM5iUWDGSycEWSo3 /xZgVljbFRBwjxaTqgoUDxOlBEgQsDg4ENTathCpxuR+meac/kqGZDSLc0zc7hV1 zV2kHB7y6gWGAQiuS3Lq+j0BTNtgbUi43R61J00ecQEIOomsBaWLi6uewNJmvOZ9 Ef7c0t/ree/PMTaXVer0IA8lU3vcHBMTrbNtQsPb4TCRsZOgFhI= =O0zA -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "A small pile of EDAC updates which the autumn wind blew my way. :) - amd64_edac: Add support for three-rank interleaving mode which is present on AMD zen2 servers - The usual fixes and cleanups all over EDAC land" * tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell EDAC/ti: Remove redundant error messages EDAC/amd64: Handle three rank interleaving mode EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned EDAC/al_mc: Make use of the helper function devm_add_action_or_reset() EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()	2021-11-01 15:02:49 -07:00
Hans Potsch	d9b7748ffc	EDAC/armada-xp: Fix output of uncorrectable error counter The number of correctable errors is displayed as uncorrectable errors because the "SBE" error count is passed to both calls of edac_mc_handle_error(). Pass the correct uncorrectable error count to the second edac_mc_handle_error() call when logging uncorrectable errors. [ bp: Massage commit message. ] Fixes: `7f6998a412` ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC") Signed-off-by: Hans Potsch <hans.potsch@nokia.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com	2021-10-14 11:46:03 +02:00
Eric Badger	537bddd069	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell The computation of TOHM is off by one bit. This missed bit results in too low a value for TOHM, which can cause errors in regular memory to incorrectly report: EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory Fixes: `50d1bb9367` ("sb_edac: add support for Haswell based systems") Cc: stable@vger.kernel.org Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com	2021-10-11 08:28:46 -07:00
Tang Bin	0b6d4ab216	EDAC/ti: Remove redundant error messages In the function ti_edac_probe(), devm_ioremap_resource() and platform_get_irq() already issue error messages when they fail so remove the redundant error messages in the EDAC driver. Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210811112626.27848-1-tangbin@cmss.chinamobile.com	2021-10-07 19:16:01 +02:00
Yazen Ghannam	9f4873fb6a	EDAC/amd64: Handle three rank interleaving mode AMD Rome systems and later support interleaving between three identical ranks within a channel. Check for this mode by counting the number of enabled chip selects and comparing their masks. If there are exactly three enabled chip selects and their masks are identical, then three rank interleaving is enabled. The size of a rank is determined from its mask value. However, three rank interleaving doesn't follow the method of swapping an interleave bit with the most significant bit. Rather, the interleave bit is flipped and the most significant bit remains the same. There is only a single interleave bit in this case. Account for this when determining the chip select size by keeping the most significant bit at its original value and ignoring any zero bits. This will return a full bitmask in [MSB:1]. Fixes: `e53a3b267f` ("EDAC/amd64: Find Chip Select memory size using Address Mask") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com	2021-10-07 12:30:53 +02:00
Eric Badger	34417f27b9	EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned This is cosmetically nicer for counts > INT32_MAX, and aligns the MC-scope format with that of the lower layer sysfs counter files. Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20211003181653.GA685515@ebadger-ThinkPad-T590	2021-10-07 12:06:20 +02:00
Cai Huoqing	470b52564c	EDAC/al_mc: Make use of the helper function devm_add_action_or_reset() The helper function devm_add_action_or_reset() will internally call devm_add_action(), and if devm_add_action() fails then it will execute the action mentioned and return the error code. So use devm_add_action_or_reset() instead of devm_add_action() to simplify the error handling, reduce the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Talel Shenhar <talel@amazon.com> Link: https://lkml.kernel.org/r/20210922125924.321-1-caihuoqing@baidu.com	2021-09-28 18:35:11 +02:00
Borislav Petkov	54607282fa	EDAC/dmc520: Assign the proper type to dimm->edac_mode dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Fixes: `1088750d78` ("EDAC: Add EDAC driver for DMC520") Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210916085258.7544-1-bp@alien8.de	2021-09-16 11:00:12 +02:00
Sai Krishna Potthuri	5297cfa6bd	EDAC/synopsys: Fix wrong value type assignment for edac_mode dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Issue caught by Coverity check "enumerated type mixed with another type." [ bp: Rewrite commit message, add tags. ] Fixes: `ae9b56e399` ("EDAC, synps: Add EDAC support for zynq ddr ecc controller") Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com	2021-09-16 10:21:35 +02:00
Len Baker	fca6116564	EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf() strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehavior. The safe replacement is strscpy(). [1][2] However, to simplify and clarify the code, to concatenate labels use the scnprintf() function. This way it is not necessary to check the return value of strscpy() (-E2BIG if the parameter count is 0 or the src was truncated) since scnprintf() always returns the number of chars written into the buffer. This function always returns a nul-terminated string even if it needs to be truncated. While at it, fix all other broken string generation code that wrongly interprets snprintf()'s return code or just uses sprintf(), implement that using scnprintf() here too. Drop breaks in loops around scnprintf() as it is safe now to loop. Moreover, the check is not needed: for the case when the buffer is exhausted, len never gets zero because scnprintf() takes the full buffer length as input parameter, but excludes the trailing '\0' in its return code and thus, 1 is the minimum len. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [2] https://github.com/KSPP/linux/issues/88 [ rric: Replace snprintf() with scnprintf(), rework sprintf() user, drop breaks in loops around scnprintf(), introduce 'end' pointer to reduce pointer arithmetic, use prefix pattern for e->location, adjust subject and description ] Co-developed-by: Joe Perches <joe@perches.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210903150539.7282-1-len.baker@gmx.com	2021-09-15 13:52:58 +02:00
Linus Torvalds	7d6e3fa87e	Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing outstanding MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsnpsTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoS+/EACQdpRkzl3IDIYqThxVZ8KQzp2rKKVn qisAQiWg/6koNJx/yYy62KNAUyKjCIObNtRnWi7OAOx6OvNtQTD2WOLAwkh3Pgw1 8ePYYl55k+yCs8VoITsZM9jYeO+Tk878pU2A6R943zR+g6G7bskGJrxEyZ9TbzIe qKfusNKnRY9/jMQaRALUAAtA9VIVR867GqORX5X8hKz8yE2rqlpb4y+1CFba5BTV Vlxw7cIXvXBn7BKAom5diRqEGDNJEbX+56jJ7yDZshgLo7m11D7QLw72kmb6TNVC g7PchvFi4afpc1ifEAAp0tk4RiSIAQ91nS3n0+jLcLbodOjIkl14eY02ZCJGAP29 uslyzUbmy1wgejG6CA63JtZ4MYdrf/OSMGuoN78qnOKYcIsWFzOvlJmBWWNW34qW LCaUF9QdJ/slXu6B4vIx30GfN9q4myml8bFUobE5q9mBRrEk4R0B7iyBvPu1xKYr ZEan67prI5VEu+afJGpp4r294m4HNVkMLfl3nYmE5+y4MoLeMNKDY3IPTvI9iP4G kaFgoPvQo23WnuclNYpJ+CaA4aRASlB2nTY+oAXIYfehbey9EW5vq4/EK864ek6w oyUTepxxNhE81tG2jpQbf2tR4COsEHy986clxqPP4AvsZXcbypCw8O2FcflpQbHO 5DLEAfTmp7cziQ== =qyll -----END PGP SIGNATURE----- Merge tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...	2021-08-30 14:38:37 -07:00
Linus Torvalds	05b5fdb2a8	- Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it to the Intel SKx drivers - Print additional useful per-channel error information on i10nm, like on SKL - Check whether the AMD decoder is loaded on a guest and if so, don't - The usual round of fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEso2QACgkQEsHwGGHe VUqZRRAAqraSdImfO4a7dV4lHErI43GDx9WZIKhrfNyvEv32pDc7E6dwUqJ+NHa1 ON469MjmV4N3X5Deq+Ecaij6giieBKazfqbbZA9pSbNtX4+ta+TEDTCmO7IX38BC pESwHlWinJbpy8SfopXQh9wgx0pWOHfPysStHS0pmG5mH+o9mIDqsn/20Q0o/kIY Cn+PZ1mJ2f7iJXvjQv7XsGMM/HotLFkJ5RNUoVjZqteOOdTcHRWytuL/r7QY0SpX iWAcmkkW3gcW1qypvo/n53ghxvSydZpqWJhI25QBJ79CGOCVocYfitkIdIYAYOtt xH82GFuC5EtDBTsQXQa2gzbraTwnxPCr6BXKY819a7+xsw/WX2x7MctZc6vCH4R+ UoDRQoUAqgNG2D89lgtQHspy3h3gSSWbGRysLdFCohJlDv3mCU4hk7yyvoCtatqR ki725nkJn0XT1HZAQc+o3p6E+PCmVYd2Km8TtNJ2ZS8z+97rjsZQAV5eeZKLnUC7 lVX57lZ9MciSlrBNe1k9b1er1Ir3/rLs1X33K+yqE8IwwCX1hEw1yPNEf6jSXQwK qce95INCamyl1Z8y0uhFZfgl8aEN5ZbaL/lBYz1Gk5R7IQCSVd9Amq5fqxOUdPb/ PcJk24Ng+h0oPKiBBVHrsDhFiDnZKAv7JGqEPFNmOi3dWZPrNrI= =JYRz -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "The usual EDAC stuff which managed to trickle in for 5.15: - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it to the Intel SKx drivers - Print additional useful per-channel error information on i10nm, like on SKL - Don't load the AMD EDAC decoder in virtual images - The usual round of fixes and cleanups" * tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Retrieve and print retry_rd_err_log registers EDAC/i10nm: Fix NVDIMM detection EDAC/skx_common: Set the memory type correctly for HBM memory EDAC/altera: Skip defining unused structures for specific configs EDAC/mce_amd: Do not load edac_mce_amd module on guests EDAC/mc: Add new HBM2 memory type EDAC/amd64: Use DEVICE_ATTR helper macros	2021-08-30 13:17:29 -07:00
Youquan Song	cf4e6d52f5	EDAC/i10nm: Retrieve and print retry_rd_err_log registers Retrieve and print retry_rd_err_log registers like the earlier change: commit `e80634a75a` ("EDAC, skx: Retrieve and print retry_rd_err_log registers") This is a little trickier than on Skylake because of potential interference with BIOS use of the same registers. The default behavior is to ignore these registers. A module parameter retry_rd_err_log(default=0) controls the mode of operation: - 0=off : Default. - 1=bios : Linux doesn't reset any control bits, but just reports values. This is "no harm" mode, but it may miss reporting some data. - 2=linux: Linux tries to take control and resets mode bits, clears valid/UC bits after reading. This should be more reliable (especially if BIOS interference is reduced by disabling eMCA reporting mode in BIOS setup). Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com	2021-08-23 10:35:36 -07:00
Qiuxu Zhuo	2294a7299f	EDAC/i10nm: Fix NVDIMM detection MCDDRCFG is a per-channel register and uses bit{0,1} to indicate the NVDIMM presence on DIMM slot{0,1}. Current i10nm_edac driver wrongly uses MCDDRCFG as per-DIMM register and fails to detect the NVDIMM. Fix it by reading MCDDRCFG as per-channel register and using its bit{0,1} to check whether the NVDIMM is populated on DIMM slot{0,1}. Fixes: `d4dc89d069` ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Fan Du <fan.du@intel.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-2-tony.luck@intel.com	2021-08-23 10:35:36 -07:00
Qiuxu Zhuo	fd07a4a0d3	EDAC/skx_common: Set the memory type correctly for HBM memory Set the memory type to MEM_HBM2 if it's managed by the HBM2 memory controller. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210720163009.GA1417532@agluck-desk2.amr.corp.intel.com	2021-08-23 10:32:32 -07:00
Krzysztof Kozlowski	7d07deb3b8	EDAC/altera: Skip defining unused structures for specific configs The Altera EDAC driver has several features conditionally built depending on Kconfig options. The edac_device_prv_data structures are conditionally used in of_device_id tables. They reference other functions and structures which can be defined as __maybe_unused. Silence build warnings like: drivers/edac/altera_edac.c:643:37: warning: ‘altr_edac_device_inject_fops’ defined but not used [-Wunused-const-variable=] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lkml.kernel.org/r/20210601092704.203555-1-krzysztof.kozlowski@canonical.com	2021-08-16 20:21:46 +02:00
Marc Zyngier	eecb06813d	EDAC/altera: Convert to generic_handle_domain_irq() Replace generic_handle_irq(irq_linear_revmap()) with a single call to generic_handle_domain_irq(). Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-08-12 11:39:41 +01:00
Smita Koralahalli	767f4b620e	EDAC/mce_amd: Do not load edac_mce_amd module on guests Hypervisors likely do not expose the SMCA feature to the guest and loading this module leads to false warnings. This module should not be loaded in guests to begin with, but people tend to do so, especially when testing kernels in VMs. And then they complain about those false warnings. Do the practical thing and do not load this module when running as a guest to avoid all that complaining. [ bp: Rewrite commit message. ] Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com	2021-08-09 12:35:43 +02:00
Naveen Krishna Chatradhi	e1ca90b7cc	EDAC/mc: Add new HBM2 memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for HBM2 (High Bandwidth Memory Gen 2) new memory type. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com	2021-07-20 09:20:49 -07:00
Randy Dunlap	a1c9ca5f65	EDAC/igen6: fix core dependency AGAIN My previous patch had a typo/thinko which prevents this driver from being enabled: change X64_64 to X86_64. Fixes: `0a9ece9ba1` ("EDAC/igen6: fix core dependency") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac@vger.kernel.org Cc: bowsingbetee <bowsingbetee@protonmail.com> Cc: stable@vger.kernel.org Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-15 11:59:59 -07:00
Dwaipayan Ray	d19faf0e49	EDAC/amd64: Use DEVICE_ATTR helper macros Instead of "open coding" DEVICE_ATTR, use the corresponding helper macros DEVICE_ATTR_{RW,RO,WO} in amd64_edac.c Some function names needed to be changed to match the device conventions <foo>_show and <foo>_store, but the functionality itself is unchanged. The devices using EDAC_DCT_ATTR_SHOW() are left unchanged. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210713065130.2151-1-dwaipayanray1@gmail.com	2021-07-13 09:59:57 -07:00
Linus Torvalds	71bd934101	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...	2021-07-02 12:08:10 -07:00
Andy Shevchenko	f39650de68	kernel.h: split out panic and oops helpers kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-01 11:06:04 -07:00
Linus Torvalds	4b5e35ce07	Various fixes and support for new CPUS - Clean up error messages from thunderx_edac - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload - Use %pR to print resources in aspeed_edac - Add Yazen Ghannam as MAINTAINER for AMD edac drivers - Fix Ice Lake and Sapphire Rapids drivers to report correct "near" or "far" device for errors in 2LM configurations - Add support of on package high bandwidth memory in Sapphire Rapids - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for ICL-NNPI, Tiger Lake and Alder Lake) - Don't even try to load Intel EDAC drivers when running as a guest - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6 -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQW3WBGcnu5yJnSXn0kTJLX0iGMLAUCYNujCRQcdG9ueS5sdWNr QGludGVsLmNvbQAKCRAkTJLX0iGMLCazAPsHoGc9ymStw0hL06Lw71Va/3VyqiUZ ha1OTWXCcV42UgD/QlwhKkHC3UkEI1dTEAI6McXcC88s8+yXQ8fKAPeQyQw= =iTWK -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Tony Luck: "Various fixes and support for new CPUs: - Clean up error messages from thunderx_edac - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload - Use %pR to print resources in aspeed_edac - Add Yazen Ghannam as MAINTAINER for AMD edac drivers - Fix Ice Lake and Sapphire Rapids drivers to report correct "near" or "far" device for errors in 2LM configurations - Add support of on package high bandwidth memory in Sapphire Rapids - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for ICL-NNPI, Tiger Lake and Alder Lake) - Don't even try to load Intel EDAC drivers when running as a guest - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6" * tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/igen6: fix core dependency EDAC/Intel: Do not load EDAC driver when running as a guest EDAC/igen6: Add Intel Alder Lake SoC support EDAC/igen6: Add Intel Tiger Lake SoC support EDAC/igen6: Add Intel ICL-NNPI SoC support EDAC/i10nm: Add support for high bandwidth memory EDAC/i10nm: Add detection of memory levels for ICX/SPR servers EDAC/skx_common: Add new ADXL components for 2-level memory MAINTAINERS: Make Yazen Ghannam maintainer for EDAC-AMD64 EDAC/aspeed: Use proper format string for printing resource EDAC/ti: Add missing MODULE_DEVICE_TABLE EDAC/thunderx: Remove irrelevant variable from error messages	2021-06-30 11:27:49 -07:00
Randy Dunlap	0a9ece9ba1	EDAC/igen6: fix core dependency igen6_edac needs mce_register()/unregister() functions, so it should depend on X86_MCE (or X86_MCE_INTEL). That change prevents these build errors: ld: drivers/edac/igen6_edac.o: in function `igen6_remove': igen6_edac.c:(.text+0x494): undefined reference to `mce_unregister_decode_chain' ld: drivers/edac/igen6_edac.o: in function `igen6_probe': igen6_edac.c:(.text+0xf5b): undefined reference to `mce_register_decode_chain' Fixes: `10590a9d4f` ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210619160203.2026-1-rdunlap@infradead.org	2021-06-20 14:04:48 -07:00
Luck, Tony	f0a029fff4	EDAC/Intel: Do not load EDAC driver when running as a guest There's little to no point in loading an EDAC driver running in a guest: 1) The CPU model reported by CPUID may not represent actual h/w 2) The hypervisor likely does not pass in access to memory controller devices 3) Hypervisors generally do not pass corrected error details to guests Add a check in each of the Intel EDAC drivers for X86_FEATURE_HYPERVISOR and simply return -ENODEV in the init routine. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210615174419.GA1087688@agluck-desk2.amr.corp.intel.com	2021-06-17 18:23:14 -07:00
Qiuxu Zhuo	ad774bd5a8	EDAC/igen6: Add Intel Alder Lake SoC support Alder Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Tiger Lake SoC. Like Tiger Lake, it also has two memory controllers each associated one IBECC instance. The minor differences include the MMIO offset of each memory controller and the type of memory error address logged in the IBECC. So add Alder Lake compute die IDs, adjust the MMIO offset for each memory controller and handle the type of memory error address logged in the IBECC for Alder Lake EDAC support. Tested-by: Vrukesh V Panse <vrukesh.v.panse@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-7-tony.luck@intel.com	2021-06-17 18:20:01 -07:00
Qiuxu Zhuo	0b7338b27e	EDAC/igen6: Add Intel Tiger Lake SoC support Tiger Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Elkhart Lake SoC. The main differences are that Tiger Lake has two memory controllers each associated with one IBECC and uses Machine Check for the memory error notification. So add Tiger Lake compute die IDs, MCE decoding chain registration, and memory slice decoding for Tiger Lake EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-6-tony.luck@intel.com	2021-06-17 18:19:53 -07:00
Qiuxu Zhuo	4e591c0568	EDAC/igen6: Add Intel ICL-NNPI SoC support The Ice Lake Neural Network Processor for Deep Learning Inference (ICL-NNPI) SoC shares the same memory controller and In-Band ECC with Elkhart Lake SoC. Add the ICL-NNPI compute die IDs for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-5-tony.luck@intel.com	2021-06-17 18:19:46 -07:00
Qiuxu Zhuo	c945088384	EDAC/i10nm: Add support for high bandwidth memory A future Xeon processor will include in-package HBM (high bandwidth memory). The in-package HBM memory controller shares the same architecture with the regular DDR memory controller. Add the HBM memory controller devices for EDAC support. Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-4-tony.luck@intel.com	2021-06-17 18:19:39 -07:00
Qiuxu Zhuo	4bd4d32e9a	EDAC/i10nm: Add detection of memory levels for ICX/SPR servers Current i10nm_edac driver is only for system configured in 1-level memory. If the system is configured in 2-level memory, the driver doesn't report the 1st level memory DIMM for the error address, even if the error occurs in the 1st level memory. Both Ice Lake servers and Sapphire Rapids servers can be configured in 2-level memory. Add detection of memory levels to i10nm_edac for the two kinds of servers so that the driver can report the 2nd level memory DIMM or the 1st level memory DIMM according to error source. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-3-tony.luck@intel.com	2021-06-17 18:19:30 -07:00
Qiuxu Zhuo	2f4348e5a8	EDAC/skx_common: Add new ADXL components for 2-level memory Some Intel servers may configure memory in 2 levels, using fast "near" memory (e.g. DDR) as a cache for larger, slower, "far" memory (e.g. 3D X-point). In these configurations the BIOS ADXL address translation for an address in a 2-level memory range will provide details of both the "near" and far components. Current exported ADXL components are only for 1-level memory system or for 2nd level memory of 2-level memory system. So add new ADXL components for 1st level memory of 2-level memory system to fully support 2-level memory system and the detection of memory error source(1st level memory or 2nd level memory). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-2-tony.luck@intel.com	2021-06-17 18:19:22 -07:00
Colin Ian King	429b2ba708	EDAC/mce_amd: Fix typo "FIfo" -> "Fifo" There is an uppercase letter I in one of the MCE error descriptions instead of a lowercase one. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210603103349.79117-1-colin.king@canonical.com	2021-06-04 15:44:25 +02:00
Muralidhara M K	94a311ce24	x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types Add the (HWID, MCATYPE) tuples and names for new SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Also while at it, optimize the string names for some SMCA banks. [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com	2021-05-27 20:08:14 +02:00
Arnd Bergmann	2e2f16d5cd	EDAC/aspeed: Use proper format string for printing resource On ARMv7, resource_size_t can be 64-bit, which breaks printing it as %x: drivers/edac/aspeed_edac.c: In function 'init_csrows': drivers/edac/aspeed_edac.c:257:28: error: format '%x' expects argument of \ type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long \ long unsigned int'} [-Werror=format=] 257 \| dev_dbg(mci->pdev, "dt: /memory node resources: first page \ r.start=0x%x, resource_size=0x%x, PAGE_SHIFT macro=0x%x\n", Use the special %pR format string to pretty-print the entire resource instead. Fixes: `edfc2d73ca` ("EDAC/aspeed: Add support for AST2400 and AST2600") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lkml.kernel.org/r/20210421135500.3518661-1-arnd@kernel.org	2021-05-18 16:33:13 +02:00
Bixuan Cui	0a37f32ba5	EDAC/ti: Add missing MODULE_DEVICE_TABLE The module misses MODULE_DEVICE_TABLE() for of_device_id tables and thus never autoloads on ID matches. Add the missing declaration. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tero Kristo <kristo@kernel.org> Link: https://lkml.kernel.org/r/20210512033727.26701-1-cuibixuan@huawei.com	2021-05-14 11:54:53 +02:00
Christophe JAILLET	89f5f8fb5b	EDAC/thunderx: Remove irrelevant variable from error messages 'ret' is irrelevant (it is 0) for both dev_err() calls, so just remove it from the error message. [ bp: Massage commit message. ] Fixes: `41003396f9` ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/0c046ef5cfb367a3f707ef4270e21a2bcbf44952.1620280098.git.christophe.jaillet@wanadoo.fr	2021-05-10 10:24:43 +02:00
Brijesh Singh	059e5c321a	x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG The SYSCFG MSR continued being updated beyond the K8 family; drop the K8 name from it. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com	2021-05-10 07:51:38 +02:00
Krzysztof Kozlowski	098da961d8	EDAC: altera: merge ARCH_SOCFPGA and ARCH_STRATIX10 Simplify 32-bit and 64-bit Intel SoCFPGA Kconfig options by having only one for both of them. This the common practice for other platforms. Additionally, the ARCH_SOCFPGA is too generic as SoCFPGA designs come from multiple vendors. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>	2021-03-23 11:03:35 -05:00
Borislav Petkov	6118b48893	Merge branch 'edac-misc' into edac-updates-for-v5.12	2021-02-15 10:06:58 +01:00
Borislav Petkov	4cbcb73b1c	EDAC/amd64: Issue probing messages only on properly detected hardware amd64_edac was converted to CPU family autoprobing (from PCI device IDs) to not have to add a new PCI device ID each time a new platform is shipped but to support the whole family out-of-the-box. However, this caused a lot of noise in dmesg even when the machine doesn't have ECC DIMMs or ECC has been disabled in the BIOS: EDAC MC: Ver: 3.0.0 EDAC amd64: F17h detected (node 0). EDAC amd64: Node 0: DRAM ECC disabled. EDAC amd64: F17h detected (node 1). EDAC amd64: Node 1: DRAM ECC disabled. EDAC amd64: F17h detected (node 2). EDAC amd64: Node 2: DRAM ECC disabled. EDAC amd64: F17h detected (node 3). EDAC amd64: Node 3: DRAM ECC disabled. EDAC amd64: F17h detected (node 4). EDAC amd64: Node 4: DRAM ECC disabled. EDAC amd64: F17h detected (node 5). EDAC amd64: Node 5: DRAM ECC disabled. EDAC amd64: F17h detected (node 6). EDAC amd64: Node 6: DRAM ECC disabled. EDAC amd64: F17h detected (node 7). EDAC amd64: Node 7: DRAM ECC disabled. or even $ grep EDAC dmesg.log \| sed 's/\[.*\] //' \| sort \| uniq -c 128 EDAC amd64: F17h detected (node 0). 128 EDAC amd64: Node 0: DRAM ECC disabled. 1 EDAC MC: Ver: 3.0.0 on a big machine. Yap, that's once per CPU for 128 of them. So move the init messages after all probing has succeeded to avoid unnecessary spew in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210119164141.17417-1-bp@alien8.de	2021-01-22 11:13:47 +01:00
Menglong Dong	e26124cd5f	EDAC/xgene: Do not print a failure message to get an IRQ twice Coccinelle reports a redundant error print in xgene_edac_probe() because platform_get_irq() will already print an error message when it is unable to get an IRQ. Use platform_get_irq_optional() instead which avoids the error message and keep the driver-specific one. [ bp: Sanitize commit message. ] Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rric@kernel.org> Link: https://lkml.kernel.org/r/20210112103540.7818-1-dong.menglong@zte.com.cn	2021-01-19 10:22:23 +01:00
Zheng Yongjun	e0e0427412	EDAC/ppc4xx: Convert comma to semicolon Replace a comma between expression statements with a semicolon. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201216131846.14937-1-zhengyongjun3@huawei.com	2020-12-30 09:09:11 +01:00
Borislav Petkov	1865bc71a8	EDAC/amd64: Limit error injection functionality to supported hw Families up to and including 0x16 allow access to the injection hardware. Starting with family 0x17, access to those registers is blocked by security policy. Limit that only on the families which support it. Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201222180013.GD13463@zn.tnic	2020-12-28 19:36:37 +01:00
Borislav Petkov	61810096de	EDAC/amd64: Merge error injection sysfs facilities Merge them into the main driver and put them inside an EDAC_DEBUG ifdeffery to simplify the driver and have all debugging/injection stuff behind a debug build-time switch. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20201215110517.5215-2-bp@alien8.de	2020-12-28 19:36:25 +01:00
Borislav Petkov	2a28ceef00	EDAC/amd64: Merge sysfs debugging attributes setup code There's no need for them to be in a separate file so merge them into the main driver compilation unit like the other EDAC drivers do. Drop now-unneeded function export, make the function static and shorten static function names. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20201215110517.5215-1-bp@alien8.de	2020-12-28 19:36:17 +01:00
Yazen Ghannam	6a4afe3878	EDAC/amd64: Tone down messages about missing PCI IDs Give these messages a debug severity as they are really only useful to the module developers. Also, drop the "(broken BIOS?)" phrase, since this can cause churn for BIOS folks. The PCI IDs needed by the module, at least on modern systems, are fixed in hardware. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201215170131.8496-1-Yazen.Ghannam@amd.com	2020-12-28 19:18:03 +01:00
Borislav Petkov	6c13d7ff81	EDAC/amd64: Do not load on family 0x15, model 0x13 Those were only laptops and are very very unlikely to have ECC memory. Currently, when the driver attempts to load, it issues: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?) because the PCI device is the wrong one (it uses the F15h default one). So do not load the driver on them as that is pointless. Reported-by: Don Curtis <bugrprt21882@online.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Don Curtis <bugrprt21882@online.de> Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1179763 Link: https://lkml.kernel.org/r/20201218160622.20146-1-bp@alien8.de	2020-12-28 12:18:11 +01:00
Linus Torvalds	ac73e3dc8a	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...	2020-12-15 12:53:37 -08:00
Bartosz Golaszewski	af11be05b6	edac: ghes: use krealloc_array() Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/20201109110654.12547-7-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Knig <christian.koenig@amd.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: James Morse <james.morse@arm.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-15 12:13:37 -08:00
Linus Torvalds	0d712978dc	- Save the AMD's physical die ID into cpuinfo_x86.cpu_die_id and convert all code to use it (Yazen Ghannam) - Remove a dead and unused TSEG region remapping workaround on AMD (Arvind Sankar) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XVlYACgkQEsHwGGHe VUpxTA/9F0KsgSyTh66uX+aX5qkQ3WTBVgxbXGFrn5qPvwcALXabU8qObDWTSdwS 1YbiWDjKNBJX+dggWe/fcQgUZxu5DFkM4IKEW1V7MLJEcdfylcqCyc1YNpEI4ySn ebw2Sy4/5iXGAvhz802/WoUU/o3A2uZwe0RFyodHGxof5027HkZhRHeYB27Htw+l z0IsmiYOoPl/4mNuVgr/qieIFSw1SUE9kwjU8RvM6xVWmXWXpM68JHa9s+/51pFt 6BaOz485OyzWUCtSx3/++GEkU2d53bWYOuQ1zTLEiuaBfYC5n5T/kAcT4WJNK6Tf tX7yrzmWm9ecykIxfkgMrhG57G38y2GMJcEg+dFQHeXC062fdHDg+oY6Ql2EkAm5 t5RIQ/cyOmQCLns31rHI/kwQ3RMKc/lfnL/z8lrlfWsC5o755yFJKttbfLJugbTo 3BO1fbs4xgQcgi0KoqXOUETrQtsOLtr9FJwvcArB94XXqcIPClE8Ir7n8T7FCuLr 9litSXIdn46EHwD6hD5QIk7y+Rxwk/jxZFys3eh90jcWDDZTaG2lz3if33RbZ1go XBrS5X3HsMODGZlaMeUjrbFIz3e0Zyoo+RO/TX48w8nzivC6xSNxSNFgIZ1XTF5E SLMGa6lEQ9mLiqRfgFjynNwSYOSlGv3euMkZaVPS3hnNmn+vZbI= =RsCs -----END PGP SIGNATURE----- Merge tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: "Only AMD-specific changes this time: - Save the AMD physical die ID into cpuinfo_x86.cpu_die_id and convert all code to use it (Yazen Ghannam) - Remove a dead and unused TSEG region remapping workaround on AMD (Arvind Sankar)" * tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove dead code for TSEG region remapping x86/topology: Set cpu_die_id only if DIE_TYPE found EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId x86/CPU/AMD: Remove amd_get_nb_id() x86/CPU/AMD: Save AMD NodeId as cpu_die_id	2020-12-14 13:21:33 -08:00
Borislav Petkov	f84b799996	Merge branches 'edac-spr', 'edac-igen6' and 'edac-misc' into edac-updates-for-v5.11 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-12-14 11:51:46 +01:00
Michael Ellerman	0385979a30	EDAC/mv64x60: Remove orphan mv64x60 driver The mv64x60 EDAC driver depends on CONFIG_MV64X60. But that symbol is not user-selectable, and the last code that selected it was removed with the C2K board support in 2018, see: `92c8c16f34` ("powerpc/embedded6xx: Remove C2K board support") That means the driver is now dead code, so remove it. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201207040253.628528-1-mpe@ellerman.id.au	2020-12-07 12:16:02 +01:00
Troy Lee	edfc2d73ca	EDAC/aspeed: Add support for AST2400 and AST2600 Add AST2400 and AST2600 EDAC driver support. Signed-off-by: Troy Lee <troy_lee@aspeedtech.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <sschaeck@cisco.com> Link: https://lkml.kernel.org/r/20201207090013.14145-3-troy_lee@aspeedtech.com	2020-12-07 12:05:41 +01:00
Borislav Petkov	706657b1fe	EDAC/amd64: Fix PCI component registration In order to setup its PCI component, the driver needs any node private instance in order to get a reference to the PCI device and hand that into edac_pci_create_generic_ctl(). For convenience, it uses the 0th memory controller descriptor under the assumption that if any, the 0th will be always present. However, this assumption goes wrong when the 0th node doesn't have memory and the driver doesn't initialize an instance for it: EDAC amd64: F17h detected (node 0). ... EDAC amd64: Node 0: No DIMMs detected. But looking up node instances is not really needed - all one needs is the pointer to the proper device which gets discovered during instance init. So stash that pointer into a variable and use it when setting up the EDAC PCI component. Clear that variable when the driver needs to unwind due to some instances failing init to avoid any registration imbalance. Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de	2020-11-27 11:11:16 +01:00
kernel test robot	77429eebd9	EDAC/igen6: ecclog_llist can be static Fixes: `10590a9d4f` ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201123031850.GA20416@aef56166e5fc Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-23 10:11:08 -08:00
Qiuxu Zhuo	479f58dda2	EDAC/i10nm: Add Intel Sapphire Rapids server support The Sapphire Rapids CPU model shares the same memory controller architecture with Ice Lake server. There are some configurations different from Ice Lake server as below: - The device ID for configuration agent. - The size for per channel memory-mapped I/O. - The DDR5 memory support. So add the above configurations and the Sapphire Rapids CPU model ID for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-19 12:57:26 -08:00
Qiuxu Zhuo	bc1c99a597	EDAC: Add DDR5 new memory type Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for DDR5 new memory type. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-19 12:57:09 -08:00
Qiuxu Zhuo	83ff51c4e3	EDAC/i10nm: Use readl() to access MMIO registers Instead of raw access, use readl() to access MMIO registers of memory controller to avoid possible compiler re-ordering. Fixes: `d4dc89d069` ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Cc: <stable@vger.kernel.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-19 12:53:55 -08:00
Qiuxu Zhuo	2223d8c781	EDAC/igen6: Add debugfs interface for Intel client SoC EDAC driver Add debugfs support to fake memory correctable errors to test the error reporting path and the error address decoding logic in the igen6_edac driver. Please note that the fake errors are also reported to EDAC core and then the CE counter in EDAC sysfs is also increased. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-19 12:52:47 -08:00
Qiuxu Zhuo	10590a9d4f	EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC This driver supports Intel client SoC with integrated memory controller using In-Band ECC(IBECC). The memory correctable and uncorrectable errors are reported via NMIs. The driver handles the NMIs and decodes the memory error address to platform specific address. The first IBECC-supported SoC is Elkhart Lake. [Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-19 12:51:17 -08:00
Yazen Ghannam	8de0c9917c	EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and later systems. This function is used in amd64_edac_mod to do system-specific decoding for DRAM ECC errors. The function takes a "NodeId" as a parameter. In AMD documentation, NodeId is used to identify a physical die in a system. This can be used to identify a node in the AMD_NB code and also it is used with umc_normaddr_to_sysaddr(). However, the input used for decode_dram_ecc() is currently the NUMA node of a logical CPU. In the default configuration, the NUMA node and physical die will be equivalent, so this doesn't have an impact. But the NUMA node configuration can be adjusted with optional memory interleaving modes. This will cause the NUMA node enumeration to not match the physical die enumeration. The mismatch will cause the address translation function to fail or report incorrect results. Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the physical ID is used. Fixes: `fbe63acf62` ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com	2020-11-19 11:43:21 +01:00
Yazen Ghannam	db970bd231	x86/CPU/AMD: Remove amd_get_nb_id() The Last Level Cache ID is returned by amd_get_nb_id(). In practice, this value is the same as the AMD NodeId for callers of this function. The NodeId is saved in struct cpuinfo_x86.cpu_die_id. Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and remove the function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com	2020-11-19 11:43:17 +01:00
Zhang Xiaoxu	61d35648c0	EDAC/synopsys: Return the correct value in mc_probe() Return the error value if the inject sysfs file creation fails, rather than returning 0, to signal to the upper layer that the ->probe function failed. [ bp: Massage. ] Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20201116135810.3130845-1-zhangxiaoxu5@huawei.com	2020-11-18 18:59:47 +01:00
Qiuxu Zhuo	3b20369313	EDAC: Add three new memory types There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2020-11-05 08:30:22 -08:00
Mauro Carvalho Chehab	2426999902	EDAC: Fix some kernel-doc markups Kernel-doc markup should use this format: identifier - description Correct that and also fix some enums' names in the kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1d291393ba58c7b80908a3fedf02d2f53921ffe9.1603469755.git.mchehab+huawei@kernel.org	2020-11-02 20:33:19 +01:00
Borislav Petkov	f30795fb40	EDAC: Do not issue useless debug statements in the polling routine They have been spreading around the subsystem by example so remove them all. Reported-by: Raymond Bennett <raymond.bennett@gmail.com> Suggested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2020-10-26 12:59:56 +01:00
Tom Rix	f09056c1de	EDAC/amd64: Remove unneeded breaks A break is not needed if it is preceded by a return. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rric@kernel.org> Link: https://lkml.kernel.org/r/20201019193524.13391-1-trix@redhat.com	2020-10-26 12:07:50 +01:00
Linus Torvalds	e6412f9833	EFI changes for v5.10: - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV tree. - Relax decompressed image placement rules for 32-bit ARM - Add support for passing MOK certificate table contents via a config table rather than a EFI variable. - Add support for 18 bit DIMM row IDs in the CPER records. - Work around broken Dell firmware that passes the entire Boot#### variable contents as the command line - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can identify it in the memory map listings. - Don't abort the boot on arm64 if the EFI RNG protocol is available but returns with an error - Replace slashes with exclamation marks in efivarfs file names - Split efi-pstore from the deprecated efivars sysfs code, so we can disable the latter on !x86. - Misc fixes, cleanups and updates. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Ec9QRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1inTQ//TYj3kJq/7sWfUAxmAsWnUEC005YCNf0T x3kJQv3zYX4Rl4eEwkff8S1PrqqvwUP5yUZYApp8HD9s9CYvzz5iG5xtf/jX+QaV 06JnTMnkoycx2NaOlbr1cmcIn4/cAhQVYbVCeVrlf7QL8enNTBr5IIQmo4mgP8Lc mauSsO1XU8ZuMQM+JcZSxAkAPxlhz3dbR5GteP4o2K4ShQKpiTCOfOG1J3FvUYba s1HGnhHFlkQr6m3pC+iG8dnAG0YtwHMH1eJVP7mbeKUsMXz944U8OVXDWxtn81pH /Xt/aFZXnoqwlSXythAr6vFTuEEn40n+qoOK6jhtcGPUeiAFPJgiaeAXw3gO0YBe Y8nEgdGfdNOMih94McRd4M6gB/N3vdqAGt+vjiZSCtzE+nTWRyIXSGCXuDVpkvL4 VpEXpPINnt1FZZ3T/7dPro4X7pXALhODE+pl36RCbfHVBZKRfLV1Mc1prAUGXPxW E0MfaM9TxDnVhs3VPWlHmRgavee2MT1Tl/ES4CrRHEoz8ZCcu4MfROQyao8+Gobr VR+jVk+xbyDrykEc6jdAK4sDFXpTambuV624LiKkh6Mc4yfHRhPGrmP5c5l7SnCd aLp+scQ4T7sqkLuYlXpausXE3h4sm5uur5hNIRpdlvnwZBXpDEpkzI8x0C9OYr0Q kvFrreQWPLQ= =ZNI8 -----END PGP SIGNATURE----- Merge tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV tree. - Relax decompressed image placement rules for 32-bit ARM - Add support for passing MOK certificate table contents via a config table rather than a EFI variable. - Add support for 18 bit DIMM row IDs in the CPER records. - Work around broken Dell firmware that passes the entire Boot#### variable contents as the command line - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can identify it in the memory map listings. - Don't abort the boot on arm64 if the EFI RNG protocol is available but returns with an error - Replace slashes with exclamation marks in efivarfs file names - Split efi-pstore from the deprecated efivars sysfs code, so we can disable the latter on !x86. - Misc fixes, cleanups and updates. * tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) efi: mokvar: add missing include of asm/early_ioremap.h efi: efivars: limit availability to X86 builds efi: remove some false dependencies on CONFIG_EFI_VARS efi: gsmi: fix false dependency on CONFIG_EFI_VARS efi: efivars: un-export efivars_sysfs_init() efi: pstore: move workqueue handling out of efivars efi: pstore: disentangle from deprecated efivars module efi: mokvar-table: fix some issues in new code efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure efivarfs: Replace invalid slashes with exclamation marks in dentries. efi: Delete deprecated parameter comments efi/libstub: Fix missing-prototypes in string.c efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it cper,edac,efi: Memory Error Record: bank group/address and chip id edac,ghes,cper: Add Row Extension to Memory Error Record efi/x86: Add a quirk to support command line arguments on Dell EFI firmware efi/libstub: Add efi_warn and *_once logging helpers integrity: Load certs from the EFI MOK config table integrity: Move import of MokListRT certs to a separate routine efi: Support for MOK variable config table ...	2020-10-12 13:26:49 -07:00
Linus Torvalds	ca1b66922a	* Extend the recovery from MCE in kernel space also to processes which encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. * memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. * New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. * Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. * Misc fixes, improvements and cleanups, as always. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EIpUACgkQEsHwGGHe VUouoBAAgwb+NkWZtIqGImV4f+LOyFjhTR/r/7ZyiijXdbhOIuAdc/jQM31mQxug sX2jxaRYnf1n6SLA0ggX99gwr2deRQ/hsNf5Abw55GC+Z1dOxpGL0k59A3ELl1IR H9KYmCAFQIHvzfk38qcdND73XHcgthQoXFBOG9wAPAdgDWnaiWt6lcLAq8OiJTmp D8pInAYhcnL8YXwMGyQQ1KkFn9HwydoWDsK5Ff2shaw2/+dMQqd1zetenbVtjhLb iNYGvV7Bi/RQ8PyMbzmtTWa4kwQJAHC2gptkGxty//2ADGVBbqUQdqF9TjIWCNy5 V6Ldv5zo0/1s7DOzji3htzqkSs/K1Ea6d2LtZjejkJipHKV5x068UC6Fu+PlfS2D VZfcICeapU4G2F3Zvks2DlZ7dVTbHCvoI78Qi7bBgczPUVmk6iqah4xuQaiHyBJc kTFDA4Nnf/026GpoWRiFry9vqdnHBZyLet5A6Y+SoWF0FbhYnCVPpq4MnussYoav lUIi9ZZav6X2RZp9DDM1f9d5xubtKq0DKt93wvzqAhjK0T2DikckJ+riOYkI6N8t fHCBNUkdfgyMzJUTBPAzYQ7RmjbjKWJi7xWP0oz6+GqOJkQfSTVC5/2yEffbb3ya whYRS6iklbl7yshzaOeecXsZcAeK2oGPfoHg34WkHFgXdF5mNgA= =u1Wg -----END PGP SIGNATURE----- Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Extend the recovery from MCE in kernel space also to processes which encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. - memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. - Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. - Misc fixes, improvements and cleanups, as always. * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Allow for copy_mc_fragile symbol checksum to be generated x86/mce: Decode a kernel instruction to determine if it is copying from user x86/mce: Recover from poison found while copying from user space x86/mce: Avoid tail copy when machine check terminated a copy from user x86/mce: Add _ASM_EXTABLE_CPY for copy user access x86/mce: Provide method to find out the type of an exception handler x86/mce: Pass pointer to saved pt_regs to severity calculation routines x86/copy_mc: Introduce copy_mc_enhanced_fast_string() x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list x86/mce: Add Skylake quirk for patrol scrub reported errors RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE() x86/mce: Annotate mce_rd/wrmsrl() with noinstr x86/mce/dev-mcelog: Do not update kflags on AMD systems x86/mce: Stop mce_reign() from re-computing severity for every CPU x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR x86/mce: Increase maximum number of banks to 64 x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check() x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap RAS/CEC: Fix cec_init() prototype	2020-10-12 10:14:38 -07:00
Linus Torvalds	a9a4b7d9a6	* Add Amazon's Annapurna Labs memory controller EDAC driver, by Talel Shenhar. * New AMD CPUs support, by Yazen Ghannam. * The usual misc fixes and cleanups all over the subsystem. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EHJoACgkQEsHwGGHe VUoZwQ//XrPtvR/ADQpFnh++WwcmbbbqA6yruBy8zyCVHveSG9dukl4pITO21CGQ QGvOGdV3aEwDNHyrErqzWEK2o8RF28LYeTpRtaRDdH7ozk7Ua8Jqxoj7JSijz2Mv iMh5Kn0v3+EcOMckbFBBjwf+yIPUw7/HwHDqsQ9K+F7drplnkRxU6mDyurYTTKJJ Y00iOTp/PY7rZeHXmQAgbxvmDfv6/Xoo15ZKtzkHGtG7xW6y3n2NU2tegZizI7kA LDx0g4lmZQZOsv2DR3ZpeLclsq9pYWCYga3P4+DJkVAAdJJ4qtk+XyKFMzyMhVOa AyDJArSRXZsUFzbaMe1hlGknrSeXDTJHlZ7FjlpJHAudAohZlNCOEIS0VQzXwYjN 0FeR9rjinhaSOkMW1S3bBKssciCEs93a0oMASvHbzy4NGAM97YyRDZof1Iu3/jWX 20YRkSXyFhqSSRznJUh58kKk1f85B2ikySB4b3mZatPEiDJvF3I2KS/j3BFKoYE6 9JT+eP1+4FyEcobG6e8VE8BZw7+sw0A6eYisAbCLCV5oaQMdR946BEXZQKaFHDib zKaveB4I/xbiLghYl61mMNSPtGD4czs+JR7/QEP5Q40VvwNJKjh1oHkt7hZQAgQQ yv7JkstTpHQK/B8V6exEfBbCXQekItsVSrryPPW4gfmqncgyEVU= =ROcp -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add Amazon's Annapurna Labs memory controller EDAC driver (Talel Shenhar) - New AMD CPUs support (Yazen Ghannam) - The usual misc fixes and cleanups all over the subsystem * tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location EDAC/aspeed: Use module_platform_driver() to simplify EDAC, sb_edac: Simplify switch statement EDAC/ti: Fix handling of platform_get_irq() error EDAC/aspeed: Fix handling of platform_get_irq() error EDAC/i5100: Fix error handling order in i5100_init_one() EDAC/highbank: Handover Calxeda Highbank maintenance to Andre Przywara EDAC/socfpga: Transfer SoCFPGA EDAC maintainership EDAC/thunderx: Make symbol lmc_dfs_ents static EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver dt-bindings: EDAC: Add Amazon's Annapurna Labs Memory Controller binding EDAC/mce_amd: Add new error descriptions for existing types EDAC: Replace HTTP links with HTTPS ones	2020-10-12 10:12:26 -07:00
Borislav Petkov	1dc32628d6	Merge branch 'edac-drivers' into edac-updates-for-v5.10 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-10-12 11:05:42 +02:00
Yazen Ghannam	b4210eab91	EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models 70h-7Fh. The same family ops and number of channels also apply. Use the Family17h Model 70h family_type and ops for Family 19h Models 20h-2Fh. Update the controller name to match the system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201009171803.3214354-1-Yazen.Ghannam@amd.com	2020-10-09 19:28:14 +02:00
Xiongfeng Wang	e6bbde8b2b	EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location Reading those sysfs entries gives: [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location memory 0 [root@localhost /]# Add newlines after the value it prints for better readability. [ bp: Make len a signed int and change the check to catch wraparound. Increment the pointer p only when the length check passes. Use scnprintf(). ] Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com	2020-09-18 09:14:01 +02:00
Liu Shixin	07def58717	EDAC/aspeed: Use module_platform_driver() to simplify Use module_platform_driver() which makes the code simpler by eliminating boilerplate code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lkml.kernel.org/r/20200914065358.3726216-1-liushixin2@huawei.com	2020-09-18 09:14:01 +02:00
Alex Kluver	612b5d506d	cper,edac,efi: Memory Error Record: bank group/address and chip id Updates to the UEFI 2.8 Memory Error Record allow splitting the bank field into bank address and bank group, and using the last 3 bits of the extended field as a chip identifier. When needed, print correct version of bank field, bank group, and chip identification. Based on UEFI 2.8 Table 299. Memory Error Record. Signed-off-by: Alex Kluver <alex.kluver@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200819143544.155096-3-alex.kluver@hpe.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2020-09-17 10:19:52 +03:00
Alex Kluver	9baf68cc45	edac,ghes,cper: Add Row Extension to Memory Error Record Memory errors could be printed with incorrect row values since the DIMM size has outgrown the 16 bit row field in the CPER structure. UEFI Specification Version 2.8 has increased the size of row by allowing it to use the first 2 bits from a previously reserved space within the structure. When needed, add the extension bits to the row value printed. Based on UEFI 2.8 Table 299. Memory Error Record Signed-off-by: Alex Kluver <alex.kluver@hpe.com> Tested-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200819143544.155096-2-alex.kluver@hpe.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2020-09-17 10:19:52 +03:00
Borislav Petkov	251c54ea26	EDAC/ghes: Check whether the driver is on the safe list correctly With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe, unregister and probe again a driver. When ghes_edac is attempted to be loaded on a system which is not on the safe platforms list, ghes_edac_register() would return early. The unregister counterpart ghes_edac_unregister() would still attempt to unregister and exit early at the refcount test, leading to the refcount underflow below. In order to not do anything on the unregister path too, reuse the force_load parameter and check it on that path too, before fumbling with the refcount. ghes_edac: ghes_edac_register: entry ghes_edac: ghes_edac_register: return -ENODEV ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100 Modules linked in: CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:refcount_warn_saturate+0xb9/0x100 Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c RSP: 0018:ffffc90000037d58 EFLAGS: 00010292 RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0 Call Trace: ghes_edac_unregister ghes_remove platform_drv_remove really_probe driver_probe_device device_driver_attach __driver_attach ? device_driver_attach ? device_driver_attach bus_for_each_dev bus_add_driver driver_register ? bert_init ghes_init do_one_initcall ? rcu_read_lock_sched_held kernel_init_freeable ? rest_init kernel_init ret_from_fork ... ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824 Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200911164950.GB19320@zn.tnic	2020-09-15 09:42:15 +02:00
Borislav Petkov	cd8100f1f3	EDAC/ghes: Clear scanned data on unload Commit `b972fdba86` ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") didn't clear all the information from the scanned system and, more specifically, left ghes_hw.num_dimms to its previous value. On a second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the leftover num_dimms value which is not 0 and thus the 0 check in enumerate_dimms() will get bypassed and it would go directly to the pointer deref: d = &hw->dimms[hw->num_dimms]; which is, of course, NULL: #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:enumerate_dimms.cold+0x7b/0x375 Reset the whole ghes_hw on driver unregister so that no stale values are used on a second system scan. Fixes: `b972fdba86` ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") Cc: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200911164817.GA19320@zn.tnic	2020-09-15 09:41:28 +02:00
Tom Rix	fbd4ab7802	EDAC, sb_edac: Simplify switch statement clang static analyzer reports this problem sb_edac.c:959:2: warning: Undefined or garbage value returned to caller return type; ^~~~~~~~~~~ This is a false positive. However by initializing the type to DEV_UNKNOWN the 3 case can be removed from the switch, saving a comparison and jump. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200907153225.7294-1-trix@redhat.com	2020-09-08 14:56:17 -07:00
Krzysztof Kozlowski	66077adb70	EDAC/ti: Fix handling of platform_get_irq() error platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: `86a18ee21e` ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tero Kristo <t-kristo@ti.com> Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org	2020-09-01 20:43:20 +02:00
Krzysztof Kozlowski	afce699694	EDAC/aspeed: Fix handling of platform_get_irq() error platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: `9b7e6242ee` ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net> Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org	2020-09-01 20:41:27 +02:00
Dinghao Liu	857a3139bd	EDAC/i5100: Fix error handling order in i5100_init_one() When pci_get_device_func() fails, the driver doesn't need to execute pci_dev_put(). mci should still be freed, though, to prevent a memory leak. When pci_enable_device() fails, the error injection PCI device "einj" doesn't need to be disabled either. [ bp: Massage commit message, rename label to "bail_mc_free". ] Fixes: `52608ba205` ("i5100_edac: probe for device 19 function 0") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn	2020-09-01 12:10:19 +02:00
Linus Torvalds	42df60fcdf	A fix to properly clear ghes_edac driver state on driver remove so that a subsequent load can probe the system properly; by Shiju Jose. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9LTFoACgkQEsHwGGHe VUpAxQ//YXxtNRrGxTQa6TGcHE0wRif/EkA20O54zWGPt+kolW9mXmsUFweDi0PH U6EW1To9H7vukfynCxc+TFGNG8oh3B6XXhaMILsNi691m418Z52a+wOHGaf/qLpX fqJe3WIIvo97cFtJJJvdHW90NaswtIo0TUspYqUiyThjwI3a9XXmY6Vh7odtVQaP M/tYt90By60KAR57zROifo6binSOO45zCOJiEO6Qm8X9UphXhShv6oL4TQvlEEwf ydWSz5NZwPnnVPk4MATMIQoptrixNaDXAfmh2kLBuNNmeNLCHKGous9C7ErI+Pqo ZH5oECXGOvJELRL3CJ/6fgb9lKaUg9z0IILsRhmoSqj6AYNcDAreWyi4wjJTaRTs rGNzYGHgB1tlWIpFxHBoxl3u/0EaRatkfvHYnY/i9LlmbIcNQgS+vBxV8LNyCRnk uxzJDIc8adPx+4RwmzQrnjwN6A52d50JfhVXakQykTWxTaKFg2qkEDUgpEJ4bMCr i77H1ytr+nGuR4BL9mferTfZqSWZULiQkeDV+skKAx8sSryOrKloucDXJYUzWFDU 6MfKi4iqfNH+w0LQxDE4cRGfOID1tEz7WWI1ROflI17sD2zW+/WkgYoHj7nD2Vwy xm5DD91Jst4rdDd53PqJhiB3sz/V9VGlDL7o+BsYzYagFP9pyfc= =WMYq -----END PGP SIGNATURE----- Merge tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: "A fix to properly clear ghes_edac driver state on driver remove so that a subsequent load can probe the system properly (Shiju Jose)" * tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()	2020-08-30 10:47:23 -07:00
Shiju Jose	b972fdba86	EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register() After `b9cae27728` ("EDAC/ghes: Scan the system once on driver init") and with CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled, ghes_hw.dimms becomes a NULL pointer after the second ->probe() (aka ghes_edac_register()) which the config option causes to be called. This happens because the static variable which holds down whether the system has been scanned already, doesn't get reset in ghes_edac_unregister(). Then, on the second probe, ghes_scan_system() doesn't get to enumerate the DIMMs, leading to ghes_hw.dimms remaining NULL. Clear the variable and rename it to something more descriptive so that a second probe succeeds. [ bp: Rewrite commit message. ] Fixes: `b9cae27728` ("EDAC/ghes: Scan the system once on driver init") Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200827140450.1620-1-shiju.jose@huawei.com	2020-08-27 18:04:07 +02:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Yazen Ghannam	368d188720	x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap The Extended Error Code Bitmap (xec_bitmap) for a Scalable MCA bank type was intended to be used by the kernel to filter out invalid error codes on a system. However, this is unnecessary after a few product releases because the hardware will only report valid error codes. Thus, there's no need for it with future systems. Remove the xec_bitmap field and all references to it. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200720145353.43924-1-Yazen.Ghannam@amd.com	2020-08-20 10:34:38 +02:00
Tony Luck	45bc6098a3	EDAC/{i7core,sb,pnd2,skx}: Fix error event severity IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto the stack as part of machine check delivery is valid or not. Various drivers copied a code fragment that uses the RIPV bit to determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where RIPV is set as "FATAL"). Reverse the tests so that the error is marked fatal when RIPV is not set. Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com	2020-08-18 15:40:30 +02:00
Wei Yongjun	bd17e0b771	EDAC/thunderx: Make symbol lmc_dfs_ents static Symbol 'lmc_dfs_ents' is not used outside of thunderx_edac.c, so make it static: drivers/edac/thunderx_edac.c:457:22: warning: symbol 'lmc_dfs_ents' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rrichter@marvell.com> Link: https://lkml.kernel.org/r/20200714142308.46612-1-weiyongjun1@huawei.com	2020-08-17 10:35:46 +02:00
Talel Shenhar	e23a7cdeb3	EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver The Amazon's Annapurna Labs Memory Controller EDAC supports ECC capability for error detection and correction (Single bit error correction, Double detection). This driver introduces EDAC driver for that capability. [ bp: Remove "EDAC" string from Kconfig tristate as it is redundant. ] Signed-off-by: Talel Shenhar <talel@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lkml.kernel.org/r/20200816185551.19108-3-talel@amazon.com	2020-08-17 10:10:29 +02:00
Yazen Ghannam	dc7a8476cf	EDAC/mce_amd: Add new error descriptions for existing types A few existing MCA bank types will have new error types in future SMCA systems. Add the descriptions for the new error types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200708153515.1911642-1-Yazen.Ghannam@amd.com	2020-08-17 09:36:00 +02:00
Alexander A. Klimov	7d4c1ea2be	EDAC: Replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w\|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. [ bp: Merge all EDAC patches into a single one. ] Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tero Kristo <t-kristo@ti.com> # ti_edac Link: https://lkml.kernel.org/r/20200708113546.14135-1-grandmaster@al2klimov.de	2020-08-17 09:31:19 +02:00
Linus Torvalds	6ffdcde4ee	Fixes for ie31200 driver that missed the first pull -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQW3WBGcnu5yJnSXn0kTJLX0iGMLAUCXzcrQRQcdG9ueS5sdWNr QGludGVsLmNvbQAKCRAkTJLX0iGMLHeUAP9mEdU4W+7STNaXuzJ/5PyWbDIHPB3m X/KXBlPRyQ5lXgD6A+IRTY3Ob7fBjs9i+sTuH5yMY6oi9YneC4lt1uqmmQc= =fg/o -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull edac fix from Tony Luck: "Fix for the ie31200 driver that missed the first pull" * tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ie31200: Fallback if host bridge device is already initialized	2020-08-15 08:25:41 -07:00
Jason Baron	709ed1bcef	EDAC/ie31200: Fallback if host bridge device is already initialized The Intel uncore driver may claim some of the pci ids from ie31200 which means that the ie31200 edac driver will not initialize them as part of pci_register_driver(). Let's add a fallback for this case to 'pci_get_device()' to get a reference on the device such that it can still be configured. This is similar in approach to other edac drivers. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com	2020-08-10 11:13:06 -07:00
Linus Torvalds	f8851cb2d0	`17ed808ad2` ("EDAC: Fix reference count leaks") `e370f886fe` ("EDAC: Remove edac_get_dimm_by_index()") `b9cae27728` ("EDAC/ghes: Scan the system once on driver init") `b001694d60` ("EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvt") `cb51a371d0` ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") `8807e15597` ("EDAC, {skx,i10nm}: Use CPU stepping macro to pass configurations") `e9ff6636d3` ("EDAC/mc: Call edac_inc_ue_error() before panic") `30bf38e434` ("EDAC, pnd2: Set MCE_PRIO_EDAC priority for pnd2_mce_dec notifier") -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQW3WBGcnu5yJnSXn0kTJLX0iGMLAUCXyNSJxQcdG9ueS5sdWNr QGludGVsLmNvbQAKCRAkTJLX0iGMLLX+AP0RHxHv0kYI8EScLpdHFnoGAKdKJ8Zb UmTTWqz4GwdoFgD/VIQ/RueIfsV+Hlqj+U+84+NJwPEcbJxY6WLo9ZHUkgc= =aFwG -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Tony Luck: "Boris is on vacation and aske me to send you the EDAC changes" * tag 'edac_updates_for_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Fix reference count leaks EDAC: Remove edac_get_dimm_by_index() EDAC/ghes: Scan the system once on driver init EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvt EDAC/ghes: Setup DIMM label from DMI and use it in error reports EDAC, {skx,i10nm}: Use CPU stepping macro to pass configurations EDAC/mc: Call edac_inc_ue_error() before panic EDAC, pnd2: Set MCE_PRIO_EDAC priority for pnd2_mce_dec notifier	2020-08-03 20:01:00 -07:00
Linus Torvalds	e53bc3ff99	Boris is on vacation and he asked us to send you the pending RAS bits: - Print the PPIN field on CPUs that fill them out - Fix an MCE injection bug - Simplify a kzalloc in dev_mcelog_init_device() Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oZ+ERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iIVQ//XLP3qky/X/DOcOhdqw7JEDUpElK+sLdb nFLcSP0vgdoip0jjz0voEf2tb0nRbZRqjnCIuHj5Dq8fTWbM/qcD0zM35N5lk5bJ N+y9cE+FQg2oayU/0O3IezoT0lAW3YzdE2HdzhZ2ZEiDKx3fAj9QoJbKlc4TxshC Tg93IpMq5dmnj6Ge3BfNE56O4WCLIQxs2MCIWHWjBEU8qb4Nry7vEL9hijmFQNRd 85i9pDrOkdnLDKhTagHUR4HbhkAacWeJKe0UoraJFr8/1nCm8Hcii5hI1t5N9KXg /+asjkhLX75uYy2aMmwF7qelSLNbOieMNji7rt42vevBsqrAtX0zuSvbEBb8bNdx f4WS5GcmUTZVHTuppPzAX3YNmYdrGqmecerAY+1j+uHtWFGs07ZCnKx+kQbiK0m4 LqaQmi16KsV/Jj9BYZu8wLd/uIAsRqnY5nJlbe7of0/kFPzlJkRkT8Z3DPhSp/ee qHc3eR1EFMQvNr8F1e17TPCTcq5YjLM9BXFX4JJppaX30YOb6ZH0z5jYMbouJeKD 8qtQ1BtawWKqkoi/0PtbH6khRFXH6oUNWZ+2aAKQqWwfJDdP8Q7fFyIvgN487s79 u7fN4h8qj7w1AeIRJQH5v1G/Es14gdBcotgkM2AflQLF+c7QormnHdCJZad4gTAj /X8Ks5DE51w= =z0MD -----END PGP SIGNATURE----- Merge tag 'ras-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "Boris is on vacation and he asked us to send you the pending RAS bits: - Print the PPIN field on CPUs that fill them out - Fix an MCE injection bug - Simplify a kzalloc in dev_mcelog_init_device()" * tag 'ras-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce, EDAC/mce_amd: Print PPIN in machine check records x86/mce/dev-mcelog: Use struct_size() helper in kzalloc() x86/mce/inject: Fix a wrong assignment of i_mce.status	2020-08-03 17:42:23 -07:00
Smita Koralahalli	bb2de0adca	x86/mce, EDAC/mce_amd: Print PPIN in machine check records Print the Protected Processor Identification Number (PPIN) on processors which support it. [ bp: Massage. ] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200623130059.8870-1-Smita.KoralahalliChannabasappa@amd.com	2020-06-23 17:27:53 +02:00
Borislav Petkov	0f959e19fa	Merge branch 'edac-ghes' into edac-for-next	2020-06-22 15:28:01 +02:00
Borislav Petkov	ee470bb25d	EDAC/amd64: Read back the scrub rate PCI register on F15h Commit: `da92110dfd` ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h") added support for F15h, model 0x60 CPUs but in doing so, missed to read back SCRCTRL PCI config register on F15h CPUs which are not model 0x60. Add that read so that doing $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate can show the previously set DRAM scrub rate. Fixes: `da92110dfd` ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h") Reported-by: Anders Andersson <pipatron@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> #v4.4.. Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com	2020-06-18 20:25:25 +02:00
Qiushi Wu	17ed808ad2	EDAC: Fix reference count leaks When kobject_init_and_add() returns an error, it should be handled because kobject_init_and_add() takes a reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Therefore, replace calling kfree() and call kobject_put() and add a missing kobject_put() in the edac_device_register_sysfs_main_kobj() error path. [ bp: Massage and merge into a single patch. ] Fixes: `b2ed215a33` ("Kobject: change drivers/edac to use kobject_init_and_add") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu	2020-06-17 15:38:35 +02:00
Borislav Petkov	b9cae27728	EDAC/ghes: Scan the system once on driver init Change the hardware scanning and figuring out how many DIMMs a machine has to a single, one-time thing which happens once on driver init. After that scanning completes, struct ghes_hw_desc contains a representation of the hardware which the driver can then use for later initialization. Then, copy the DIMM information into the respective EDAC core representation of those. Get rid of ghes_edac_dimm_fill and use a struct dimm_info array directly. This way, hw detection and further driver initialization is nicely and logically split. Further additions should all be added to ghes_scan_system() and the hw representation extended as needed. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>	2020-06-16 19:25:15 +02:00
Robert Richter	b001694d60	EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_pvt The struct members list and ghes of struct ghes_edac_pvt are unused, remove them. On that occasion, rename it to the shorter name struct ghes_pvt. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200519104443.15673-2-rrichter@marvell.com	2020-06-16 15:32:18 +02:00
Robert Richter	cb51a371d0	EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg \| grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com	2020-06-16 15:22:04 +02:00
Qiuxu Zhuo	8807e15597	EDAC, {skx,i10nm}: Use CPU stepping macro to pass configurations Use the X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro to pass CPU stepping specific configurations to {skx,i10nm}_init(), so can delete the CPU stepping check from 10nm_init(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200509010822.76331-1-qiuxu.zhuo@intel.com	2020-06-15 14:50:39 -07:00
Zhenzhong Duan	e9ff6636d3	EDAC/mc: Call edac_inc_ue_error() before panic By calling edac_inc_ue_error() before panic, we get a correct UE error count for core dump analysis. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200610065846.3626-2-zhenzhong.duan@gmail.com	2020-06-15 11:19:52 -07:00
Zhenzhong Duan	30bf38e434	EDAC, pnd2: Set MCE_PRIO_EDAC priority for pnd2_mce_dec notifier Avoid giving it MCE_PRIO_LOWEST priority by default. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200610065846.3626-1-zhenzhong.duan@gmail.com	2020-06-15 11:19:39 -07:00
Linus Torvalds	6adc19fd13	Kbuild updates for v5.8 (2nd) - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7lBuYVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHvIP/3iErjPshpg/phwH8NTCS4SFkiti BZRM+2lupSn7Qs53BTpVzIkXoHBJQZlJxlQ5HY8ScO+fiz28rKZr+b40us+je1Q+ SkvSPfwZzxjEg7lAZutznG4KgItJLWJKmDyh9T8Y8TAuG4f8WO0hKnXoAp3YorS2 zppEIxso8O5spZPjp+fF/fPbxPjIsabGK7Jp2LpSVFR5pVDHI/ycTlKQS+MFpMEx 6JIpdFRw7TkvKew1dr5uAWT5btWHatEqjSR3JeyVHv3EICTGQwHmcHK67cJzGInK T51+DT7/CpKtmRgGMiTEu/INfMzzoQAKl6Fcu+vMaShTN97Hk9DpdtQyvA6P/h3L 8GA4UBct05J7fjjIB7iUD+GYQ0EZbaFujzRXLYk+dQqEJRbhcCwvdzggGp0WvGRs 1f8/AIpgnQv8JSL/bOMgGMS5uL2dSLsgbzTdr6RzWf1jlYdI1i4u7AZ/nBrwWP+Z iOBkKsVceEoJrTbaynl3eoYqFLtWyDau+//oBc2gUvmhn8ioM5dfqBRiJjxJnPG9 /giRj6xRIqMMEw8Gg8PCG7WebfWxWyaIQwlWBbPok7DwISURK5mvOyakZL+Q25/y 6MBr2H8NEJsf35q0GTINpfZnot7NX4JXrrndJH8NIRC7HEhwd29S041xlQJdP0rs E76xsOr3hrAmBu4P =1NIT -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues	2020-06-13 13:29:16 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Thomas Gleixner	f77d26a9fc	Merge branch 'x86/entry' into ras/core to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow up patches can be applied without creating a horrible merge conflict afterwards.	2020-06-11 15:17:57 +02:00
Borislav Petkov	2a02ca0428	Merge branches 'edac-i10nm' and 'edac-misc' into edac-updates-for-5.8 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-06-01 11:39:15 +02:00
Colin Ian King	f00eb5ff2f	EDAC/amd64: Remove redundant assignment to variable ret in hw_info_get() The variable ret is being assigned with a value that is never read and it is being updated later with a new value. The initialization is redundant so remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200429154847.287001-1-colin.king@canonical.com	2020-05-29 15:15:02 +02:00
Alexander Monakov	b6bea24d41	EDAC/amd64: Add AMD family 17h model 60h PCI IDs Add support for AMD Renoir (4000-series Ryzen CPUs). Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru	2020-05-22 18:43:13 +02:00
Qiuxu Zhuo	1032095053	EDAC/skx: Use the mcmtr register to retrieve close_pg/bank_xor_enable The skx_edac driver wrongly uses the mtr register to retrieve two fields close_pg and bank_xor_enable. Fix it by using the correct mcmtr register to get the two fields. Cc: <stable@vger.kernel.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reported-by: Matthew Riley <mattdr@google.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200515210146.1337-1-tony.luck@intel.com	2020-05-19 15:11:29 -07:00
Qiuxu Zhuo	ce20670828	EDAC/i10nm: Update driver to support different bus number config register offsets The i10nm_edac driver failed to load on Ice Lake and Tremont/Jacobsville servers if their CPU stepping >= 4 and failed on Ice Lake-D servers from stepping 0. The root cause was that for Ice Lake and Tremont/Jacobsville servers with CPU stepping >=4, the offset for bus number configuration register was updated from 0xcc to 0xd0. For Ice Lake-D servers, all the steppings use the updated 0xd0 offset. Fix the issue by using the appropriate offset for bus number configuration register according to the CPU model number and stepping. Reported-by: Jerry Chen <jerry.t.chen@intel.com> Reported-and-tested-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/linux-edac/20200427084022.GC11036@zn.tnic	2020-04-27 09:40:49 -07:00
Qiuxu Zhuo	ee5340abab	EDAC, {skx,i10nm}: Make some configurations CPU model specific The device ID for configuration agent PCI device and the offset for bus number configuration register can be CPU model specific. So add a new structure res_config to make them configurable and pass res_config to {skx,i10nm}_init() and skx_get_all_bus_mappings() for use. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200427083246.GB11036@zn.tnic	2020-04-27 09:29:41 -07:00
Jason Yan	b2f9fb0d67	EDAC/amd8131: Remove defined but not used bridge_str Fix the following gcc warning: drivers/edac/amd8131_edac.c:47:21: warning: ‘bridge_str’ defined but not used [-Wunused-const-variable=] static char * const bridge_str[] = { ^~~~~~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rrichter@marvell.com> Link: https://lkml.kernel.org/r/20200415085006.6732-1-yanaijie@huawei.com	2020-04-24 09:08:47 +02:00
Zou Wei	58d66175d4	EDAC/thunderx: Make symbols static Make a couple of symbols static, as reported by sparse. [ bp: Massage. ] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1587624744-97240-1-git-send-email-zou_wei@huawei.com	2020-04-23 12:07:24 +02:00
Tony Luck	7fc0b9b995	EDAC: Drop the EDAC report status checks When acpi_extlog was added, we were worried that the same error would be reported more than once by different subsystems. But in the ensuing years I've seen complaints that people could not find an error log (because this mechanism suppressed the log they were looking for). Rip it all out. People are smart enough to notice the same address from different reporting mechanisms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-8-tony.luck@intel.com	2020-04-14 16:01:01 +02:00
Tony Luck	23ba710a08	x86/mce: Fix all mce notifiers to update the mce->kflags bitmask If the handler took any action to log or deal with the error, set a bit in mce->kflags so that the default handler on the end of the machine check chain can see what has been done. Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers skip over errors already processed by CEC. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com	2020-04-14 15:59:26 +02:00
Borislav Petkov	3e0fdec858	x86/mce/amd, edac: Remove report_gart_errors ... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce(). Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de	2020-04-14 15:53:46 +02:00
Jason Yan	87a4eca891	EDAC/xgene: Remove set but not used address local var Fix the following gcc warning: drivers/edac/xgene_edac.c:1486:7: warning: variable ‘address’ set but not used [-Wunused-but-set-variable] u32 address; ^~~~~~~ Remove the unused macro RBERRADDR_RD while at it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200409093259.20069-1-yanaijie@huawei.com	2020-04-14 14:35:19 +02:00
Christophe JAILLET	493362dd7b	EDAC/armada_xp: Fix some log messages Fix spelling (s/Aramda/Armada/) in a log message and in a comment. While at it, add a trailing '\n' in messages. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lkml.kernel.org/r/20200413041556.3514-1-christophe.jaillet@wanadoo.fr	2020-04-14 11:28:09 +02:00
Linus Torvalds	9b82f05f86	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - A couple of x86/cpu cleanups and changes were grandfathered in due to patch dependencies. These clean up the set of CPU model/family matching macros with a consistent namespace and C99 initializer style. - A bunch of updates to various low level PMU drivers: * AMD Family 19h L3 uncore PMU * Intel Tiger Lake uncore support * misc fixes to LBR TOS sampling - optprobe fixes - perf/cgroup: optimize cgroup event sched-in processing - misc cleanups and fixes Tooling side changes are to: - perf {annotate,expr,record,report,stat,test} - perl scripting - libapi, libperf and libtraceevent - vendor events on Intel and S390, ARM cs-etm - Intel PT updates - Documentation changes and updates to core facilities - misc cleanups, fixes and other enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits) cpufreq/intel_pstate: Fix wrong macro conversion x86/cpu: Cleanup the now unused CPU match macros hwrng: via_rng: Convert to new X86 CPU match macros crypto: Convert to new CPU match macros ASoC: Intel: Convert to new X86 CPU match macros powercap/intel_rapl: Convert to new X86 CPU match macros PCI: intel-mid: Convert to new X86 CPU match macros mmc: sdhci-acpi: Convert to new X86 CPU match macros intel_idle: Convert to new X86 CPU match macros extcon: axp288: Convert to new X86 CPU match macros thermal: Convert to new X86 CPU match macros hwmon: Convert to new X86 CPU match macros platform/x86: Convert to new CPU match macros EDAC: Convert to new X86 CPU match macros cpufreq: Convert to new X86 CPU match macros ACPI: Convert to new X86 CPU match macros x86/platform: Convert to new CPU match macros x86/kernel: Convert to new CPU match macros x86/kvm: Convert to new CPU match macros x86/perf/events: Convert to new CPU match macros ...	2020-03-30 16:40:08 -07:00
Borislav Petkov	41dac9a2ad	Merge branches 'edac-mc-cleanup', 'edac-misc', 'edac-drivers' and 'edac-urgent' into edac-updates-for-5.7 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-03-30 10:07:58 +02:00
Ingo Molnar	629b3df7ec	Merge branch 'x86/cpu' into perf/core, to resolve conflict Conflicts: arch/x86/events/intel/uncore.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-03-25 15:20:44 +01:00
Thomas Gleixner	298426211c	EDAC: Convert to new X86 CPU match macros The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200320131509.673579000@linutronix.de	2020-03-24 21:32:28 +01:00
Takashi Iwai	215a423cc0	EDAC/armada_xp: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lkml.kernel.org/r/20200311071728.4541-1-tiwai@suse.de	2020-03-17 19:26:09 +01:00
Sherry Sun	2fb3f6e125	EDAC/synopsys: Do not dump uninitialized pinf->col On the ZynqMP platform, zynqmp_get_error_info() is used to read out error information. In this function, the pinf->col parameter is not used (it is only used by the Zynq platform's zynq_get_error_info()). So there's no need to print pinf->col on ZynqMP. In order to differentiate on which platform handle_error() is executed, use DDR_ECC_INTR_SUPPORT as the check condition to distinguish between Zynq and ZynqMP platforms. [ bp: Massage. ] Fixes: `b500b4a029` ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Manish Narani <manish.narani@xilinx.com> Link: https://lkml.kernel.org/r/1584365679-27443-1-git-send-email-sherry.sun@nxp.com	2020-03-17 14:32:31 +01:00
Sherry Sun	dfc6014e3b	EDAC/synopsys: Do not print an error with back-to-back snprintf() calls handle_error() currently calls snprintf() a couple of times in succession to output the message for a CE/UE, therefore overwriting each part of the message which was formatted with the previous snprintf() call. As a result, only the part of the message from the last snprintf() call will be printed. The simplest and most effective way to fix this problem is to combine the whole string into one which to supply to a single snprintf() call. [ bp: Massage. ] Fixes: `b500b4a029` ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Cc: Manish Narani <manish.narani@xilinx.com> Link: https://lkml.kernel.org/r/1582792452-32575-1-git-send-email-sherry.sun@nxp.com	2020-02-27 16:44:25 +01:00
Lei Wang	1088750d78	EDAC: Add EDAC driver for DMC520 The driver supports error detection and correction on devices with an ARM DMC-520 memory controller. Signed-off-by: Lei Wang <leiwang_git@outlook.com> Signed-off-by: Shiping Ji <shiping.linux@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lkml.kernel.org/r/83b48c70-dc06-d0d4-cae9-a2187fca628b@gmail.com	2020-02-19 21:00:27 +01:00
Prarit Bhargava	52cff04a81	EDAC/mce_amd: Print !SMCA processor warning only once This warning is output for every virtual CPU in a guest on an EPYC 2 system because kvm doesn't enable SMCA. Once is enough too. [ bp: Massage. ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200217134627.19765-1-prarit@redhat.com	2020-02-18 17:20:41 +01:00
Robert Richter	4aa92c8646	EDAC/mc: Remove per layer counters Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it turns out that only the leaves in the memory hierarchy are consumed (in sysfs), but not the intermediate layers, e.g.: count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx]; These unused counters only add complexity, remove them. The error counter values are directly stored in struct dimm_info now. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com	2020-02-17 13:37:00 +01:00
Robert Richter	1853ee7299	EDAC/mc: Remove detail[] string and cleanup error string generation The error descriptor is passed to the error reporting functions, so the error details can be directly generated there. Move string generation from edac_raw_mc_handle_error() to edac_ce_error() and edac_ue_error(). The intermediate detail[] string can be removed then. Also, cleanup the string generation by switching to a single variant only using the ternary operator. [ bp: put ternary operators on a separate line for better readability and use the short-form "inline if" in edac_mc_handle_error(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-10-rrichter@marvell.com	2020-02-17 13:36:28 +01:00
Robert Richter	6ab76179ad	EDAC/mc: Pass the error descriptor to error reporting functions Most arguments of error reporting functions are already stored in the struct edac_raw_error_desc error descriptor. Pass the error descriptor to the functions and reduce the functions' argument list. [ bp: Sort function args in reverse fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-9-rrichter@marvell.com	2020-02-17 13:35:30 +01:00
Robert Richter	67792cf958	EDAC/mc: Remove enable_per_layer_report function argument Many functions carry the enable_per_layer_report argument. This is a bool value indicating the error information contains some location data where the error occurred. This can easily being determined by checking the pos[] array for values. Negative values indicate there is no location available. So if the top layer is negative, the error location is unknown. Just check if the top layer is negative and remove enable_per_layer_report as function argument and also from struct edac_raw_error_desc. [ bp: Reflow comments to 80 columns, while at it. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-8-rrichter@marvell.com	2020-02-17 13:13:16 +01:00
Robert Richter	65bb4d1af9	EDAC/mc: Report "unknown memory" on too many DIMM labels found There is a limitation to report only EDAC_MAX_LABELS in e->label of the error descriptor. This is to prevent a potential string overflow. The current implementation falls back to "any memory" in this case and also stops all further processing to find a unique row and channel of the possible error location. Reporting "any memory" is wrong as the memory controller reported an error location for one of the layers. Instead, report "unknown memory" and also do not break early in the loop to further check row and channel for uniqueness. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-7-rrichter@marvell.com	2020-02-17 13:10:14 +01:00
Robert Richter	6334dc4e3f	EDAC/mc: Carve out error increment into a separate function Carve out the error_count increment into a separate function edac_inc_csrow(). This better separates code and reduces the indentation level. Implementation note: The function edac_inc_csrow() counts the same as before, ->ce_count is only incremented if row >= 0. This is esp. true for the case of (!e->enable_per_layer_report). Here, a DIMM was not found, variable row still has a value of -1 and ->ce_count is not incremented. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com	2020-02-17 13:07:50 +01:00
Robert Richter	91b327f672	EDAC/mc: Determine mci pointer from the error descriptor Each struct mci has its own error descriptor. Create a function error_desc_to_mci() to determine the corresponding mci from an error descriptor. This removes @mci from the parameter list of edac_raw_mc_handle_error() as the mci pointer does not need to be passed any longer. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com	2020-02-17 13:05:10 +01:00
Robert Richter	672ef0e568	EDAC: Store error type in struct edac_raw_error_desc Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com	2020-02-17 13:02:30 +01:00
Robert Richter	1f27c79062	EDAC/mc: Reorder functions edac_mc_alloc*() Reorder the new created functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further code changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com	2020-02-17 12:57:18 +01:00
Robert Richter	aad28c6f6b	EDAC/mc: Split edac_mc_alloc() into smaller functions edac_mc_alloc() is huge. Factor out code by moving it to the two new functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not move code yet for better review. [ bp: sort local args in reversed fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com	2020-02-17 12:47:50 +01:00
Robert Richter	bea1bfd5b7	EDAC/mc: Change mci device removal to use put_device() There are dimm and csrow devices linked to the mci device esp. to show up in sysfs. It must be granted that children devices are removed before its mci parent. Thus, the release functions must be called in the correct order and may not miss any child before releasing its parent. In the current implementation this is only granted by the correct order of release functions. A much better approach is to use put_device() that releases the device only after all users are gone. It is the recommended way to release a device and free its memory. The function uses the device's refcount and only frees it if there are no users of it anymore such as children. So implement a mci_release() function to remove mci devices, use put_device() to free them and early initialize the mci device right after its struct has been allocated. Change the release function so that it can be universally used no matter if the device is registered or not. Since subsequent dimm and csrow sysfs links are implemented as children devices, their refcounts will keep the parent mci device from being removed as long as sysfs entries exist and until all users have been unregistered in edac_remove_sysfs_mci_device(). Remove edac_unregister_sysfs() and merge mci sysfs removal into edac_remove_sysfs_mci_device(). There is only a single instance now that removes the sysfs entries. The function can now be used in the error paths for cleanup. Also, create device release functions for all involved devices (dev->release), remove device_type release functions (dev_type-> release) and also use dev->init_name instead of dev_set_name(). [ bp: Massage commit message and comments. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com	2020-02-17 12:32:44 +01:00
Robert Richter	4d59588c09	EDAC/sysfs: Remove csrow objects on errors All created csrow objects must be removed in the error path of edac_create_csrow_objects(). The objects have been added as devices. They need to be removed by doing a device_del() and put_device() call to also free their memory. The missing put_device() leaves a memory leak. Use device_unregister() instead of device_del() which properly unregisters the device doing both. Fixes: `7adc05d2dc` ("EDAC/sysfs: Drop device references properly") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com	2020-02-13 13:29:41 +01:00
Robert Richter	216aa145aa	EDAC/mc: Fix use-after-free and memleaks during device removal A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with `c498afaf7d` ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak \| sort \| uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implemenations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: `c498afaf7d` ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: `faa2ad09c0` ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: `7a623c0390` ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com	2020-02-13 13:28:52 +01:00
Linus Torvalds	6a1000bd27	ioremap changes for 5.6 - remove ioremap_nocache given that is is equivalent to ioremap everywhere -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr +3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9 wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp 25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+ b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei /rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/ Kdg= =TUCJ -----END PGP SIGNATURE----- Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap	2020-01-27 13:03:00 -08:00
Linus Torvalds	30f5a75640	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Misc fixes to the MCE code all over the place, by Jan H. Schönherr. - Initial support for AMD F19h and other cleanups to amd64_edac, by Yazen Ghannam. - Other small cleanups. * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/mce_amd: Make fam_ops static global EDAC/amd64: Drop some family checks for newer systems EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh x86/amd_nb: Add Family 19h PCI IDs EDAC/mce_amd: Always load on SMCA systems x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType x86/mce: Fix use of uninitialized MCE message string x86/mce: Fix mce=nobootlog x86/mce: Take action on UCNA/Deferred errors again x86/mce: Remove mce_inject_log() in favor of mce_log() x86/mce: Pass MCE message to mce_panic() on failed kernel recovery x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused	2020-01-27 09:19:35 -08:00
Linus Torvalds	b62061b82a	A garden variety of small fixes all over the place. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl4uvKUACgkQEsHwGGHe VUoG5RAAoSdq9LvBPWTtJUMckxZ124hAqvtDOqvAbCnde3V2yICZLfOuGZSuWBy0 +ED2Ebwnf6mxzB2vEEI1OQ0+LmmRXvc+UbI2z7SnC/4sLZnS7h9wDrxN41nR0dyB cMx0JB0ClyMW71WJo4Jbp9B+fiB4rBu/gmrXc0kC3L6ShmYvWzGdrPIV31qXPPYn zjqOn+tBENi7J+8omVsFg/KtAK9BTEk7FLuAz2mevLJzZGwycQBk/sigDHTuXJC2 AACPmaQAqZgOYEOWID+F1USgD7x3yfYEZUBQWokqmDJDkN6agvcScQWoYh2PGobR wxJhGZR+Uq3NU0BsdoKlI4OCI/Hf+WYBAhpSdlu1/Wle3Wq4wGc91wisFtnpXU6s 2VfiUtpf9WTROUeyHvhXcKgnjTwmRxSOBmaIhSYSL/aQ34AJaMmaeS8Vj1vpamrp 1dotRIQKmjI8yDRs4JBfsJX3KtMjJ+u1qiz7SfwqBv1dqypSamTkvjpbPhItUrAz Hv9QHWQAOmCLBqWw3Jx2W9gwd7Z1gYmpvo9mqyUy81eF/BDNeJTi1zRkNL2jqElU gu6gaOheYr7UumA9LDsZUqh3cGgGii73pp2Fc1A2QvL9HeD++UVoXOlwjo8VEs1b +u+DRDKNy3OwgCLMGfK0zuWfvTRqgV+OnlPnBH2/2PnYpE9tawU= =fR3M -----END PGP SIGNATURE----- Merge tag 'edac_for_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "A totally boring branch this time around: a garden variety of small fixes all over the place" * tag 'edac_for_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Do not warn when removing instances EDAC/sifive: Fix return value check in ecc_register() EDAC/aspeed: Remove unneeded semicolon EDAC: remove set but not used variable 'ecc_loc' EDAC: skx_common: downgrade message importance on missing PCI device EDAC/Kconfig: Fix Kconfig indentation	2020-01-27 09:16:22 -08:00
Borislav Petkov	7e5d6cf353	EDAC/amd64: Do not warn when removing instances On machines which do not populate all nodes with DIMMs, the driver doesn't initialize an instance there. However, the instance removal remove_one_instance() path will warn unconditionally, which is wrong. Remove the WARN_ON() even if the warning is innocent because it causes a splat in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200117115939.5524-1-bp@alien8.de	2020-01-17 13:00:06 +01:00
Wei Yongjun	6cd18453b6	EDAC/sifive: Fix return value check in ecc_register() In case of error, the function edac_device_alloc_ctl_info() returns a NULL pointer, not ERR_PTR(). Replace the IS_ERR() test in the return value check with a NULL test. Fixes: `91abaeaaff` ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200115150303.112627-1-weiyongjun1@huawei.com	2020-01-17 01:37:51 +01:00
Borislav Petkov	86e9f9d60e	EDAC/mce_amd: Make fam_ops static global ... and do not kmalloc a three-pointer struct. Which simplifies mce_amd_init() a bit. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200116163403.GF27148@zn.tnic	2020-01-16 21:52:48 +01:00
Yazen Ghannam	dcd01394ce	EDAC/amd64: Drop some family checks for newer systems In general, "pvt->umc != NULL" is used to check if the system is Family 17h+. However, there are a few places that are using direct family checks. Replace the remaining family checks with a check for "pvt->umc != NULL". Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-6-Yazen.Ghannam@amd.com	2020-01-16 17:09:29 +01:00
Yazen Ghannam	2eb61c91c3	EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh Add family ops to support AMD Family 19h systems. Existing Family 17h functions can be used. Also, add Family 19h to the list of families to automatically load the module. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-5-Yazen.Ghannam@amd.com	2020-01-16 17:09:23 +01:00
Yazen Ghannam	9f6aef8631	EDAC/mce_amd: Always load on SMCA systems MCA error decoding on SMCA systems is not dependent on family. Return success early if the system supports the SMCA feature. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-3-Yazen.Ghannam@amd.com	2020-01-16 17:09:13 +01:00
Yazen Ghannam	89a76171bf	x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType Add support for a new version of the Load Store unit bank type as indicated by its McaType value, which will be present in future SMCA systems. Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is logically the same to the user. Also, add the new error descriptions to edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com	2020-01-16 17:09:02 +01:00
Yash Shah	13cf4cf030	riscv: move sifive_l2_cache.h to include/soc The commit `9209fb5189` ("riscv: move sifive_l2_cache.c to drivers/soc") moves the sifive L2 cache driver to driver/soc. It did not move the header file along with the driver. Therefore this patch moves the header file to driver/soc Signed-off-by: Yash Shah <yash.shah@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: updated to fix the include guard] Fixes: `9209fb5189` ("riscv: move sifive_l2_cache.c to drivers/soc") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2020-01-12 10:12:44 -08:00
Christoph Hellwig	4bdc0d676a	remove ioremap_nocache and devm_ioremap_nocache ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de>	2020-01-06 09:45:59 +01:00
Christoph Hellwig	9209fb5189	riscv: move sifive_l2_cache.c to drivers/soc The sifive_l2_cache.c is in no way related to RISC-V architecture memory management. It is a little stub driver working around the fact that the EDAC maintainers prefer their drivers to be structured in a certain way that doesn't fit the SiFive SOCs. Move the file to drivers/soc and add a Kconfig option for it, as well as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE. Fixes: `a967a289f1` ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Borislav Petkov <bp@suse.de> [paul.walmsley@sifive.com: keep the MAINTAINERS change specific to the L2$ controller code] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2019-12-20 03:40:24 -08:00
Xu Wang	a651c6c644	EDAC/aspeed: Remove unneeded semicolon Remove unneeded semicolon reported by coccinelle. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andrew Jeffery <andrew@aj.id.au> Cc: James Morse <james.morse@arm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-aspeed@lists.ozlabs.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Stefan Schaeckeler <sschaeck@cisco.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1576648806-1114-1-git-send-email-vulab@iscas.ac.cn	2019-12-19 07:27:09 +01:00
yu kuai	2403ed2f44	EDAC: remove set but not used variable 'ecc_loc' Fixes gcc '-Wunused-but-set-variable' warning: drivers/edac/i5100_edac.c: In function ‘i5100_read_log’: drivers/edac/i5100_edac.c:489:11: warning: variable ‘ecc_loc’ set but not used [-Wunused-but-set-variable] It is never used, and so can be removed. Signed-off-by: yu kuai <yukuai3@huawei.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20191216110121.46698-1-yukuai3@huawei.com	2019-12-16 13:54:02 -08:00
Aristeu Rozanski	854bb48018	EDAC: skx_common: downgrade message importance on missing PCI device Both skx_edac and i10nm_edac drivers are loaded based on the matching CPU being available which leads the module to be automatically loaded in virtual machines as well. That will fail due the missing PCI devices. In both drivers the first function to make use of the PCI devices is skx_get_hi_lo() will simply print EDAC skx: Can't get tolm/tohm for each CPU core, which is noisy. This patch makes it a debug message. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20191204212325.c4k47p5hrnn3vpb5@redhat.com	2019-12-10 14:14:43 -08:00
Krzysztof Kozlowski	a483e22791	EDAC/Kconfig: Fix Kconfig indentation Adjust indentation from spaces to tab (+optional two spaces) as in coding style with a command like: $ sed -e 's/^ /\t/' -i */Kconfig [ bp: make it a single line. ] Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191120134206.15588-1-krzk@kernel.org	2019-12-09 19:07:40 +01:00
Thor Thayer	5781823fd0	EDAC/altera: Use the Altera System Manager driver Simplify by using the Altera System Manager driver that abstracts the differences between ARM32 and ARM64. Also allows the removal of the Arria10 test function since this is handled by the System Manager driver. Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Meng.Li@windriver.com Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1574361048-17572-4-git-send-email-thor.thayer@linux.intel.com	2019-11-22 10:18:29 +01:00
Thor Thayer	08a260d968	EDAC/altera: Cleanup the ECC Manager Cleanup the ECC Manager peripheral test in probe function as suggested by James. Remove the check for Stratix10. Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1573156890-26891-2-git-send-email-thor.thayer@linux.intel.com	2019-11-22 10:16:43 +01:00
Meng Li	56d9e7bd3f	EDAC/altera: Use fast register IO for S10 IRQs When an IRQ occurs, regmap_{read,write,...}() is invoked in atomic context. Regmap must indicate register IO is fast so that a spinlock is used instead of a mutex to avoid sleeping in atomic context: lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_write a10_eccmgr_irq_unmask unmask_irq.part.0 irq_enable __irq_startup irq_startup __setup_irq request_threaded_irq devm_request_threaded_irq altr_sdram_probe Mark it so. [ bp: Massage. ] Fixes: `3dab6bd526` ("EDAC, altera: Add support for Stratix10 SDRAM EDAC") Reported-by: Meng Li <Meng.Li@windriver.com> Signed-off-by: Meng Li <Meng.Li@windriver.com> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: stable <stable@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1574361048-17572-2-git-send-email-thor.thayer@linux.intel.com	2019-11-22 10:14:56 +01:00
Robert Richter	16214bd9e4	EDAC/ghes: Do not warn when incrementing refcount on 0 The following warning from the refcount framework is seen during ghes initialization: EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT) ------------[ cut here ]------------ refcount_t: increment on 0; use-after-free. WARNING: CPU: 36 PID: 1 at lib/refcount.c:156 refcount_inc_checked [...] Call trace: refcount_inc_checked ghes_edac_register ghes_probe ... It warns if the refcount is incremented from zero. This warning is reasonable as a kernel object is typically created with a refcount of one and freed once the refcount is zero. Afterwards the object would be "used-after-free". For GHES, the refcount is initialized with zero, and that is why this message is seen when initializing the first instance. However, whenever the refcount is zero, the device will be allocated and registered. Since the ghes_reg_mutex protects the refcount and serializes allocation and freeing of ghes devices, a use-after-free cannot happen here. Instead of using refcount_inc() for the first instance, use refcount_set(). This can be used here because the refcount is zero at this point and can not change due to its protection by the mutex. Fixes: `23f61b9fc5` ("EDAC/ghes: Fix locking and memory barrier issues") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <huangming23@huawei.com> Cc: James Morse <james.morse@arm.com> Cc: <linuxarm@huawei.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: <tanxiaofei@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <wanghuiqiang@huawei.com> Link: https://lkml.kernel.org/r/20191121213628.21244-1-rrichter@marvell.com	2019-11-22 09:53:08 +01:00
Robert Richter	787d899914	EDAC: Unify the mc_event tracepoint call The code in ghes_edac.c and edac_mc.c for grain_bits calculation and calling trace_mc_event() is now the same. Move it to a single location in edac_raw_mc_handle_error(). The only difference is the missing IS_ENABLED(CONFIG_RAS) switch, but this is needed for ghes too. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-13-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	501eb40d2b	EDAC/ghes: Remove intermediate buffer pvt->detail_location detail_location[] is used to collect two location strings so they can be passed as one to trace_mc_event(). Instead of having an extra copy step, assemble the location string in other_detail[] from the beginning. Using other_detail[] to call trace_mc_event() is now the same as in edac_mc.c and code can be unified. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-12-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	7088e29e04	EDAC/ghes: Fix grain calculation The current code to convert a physical address mask to a grain (defined as granularity in bytes) is: e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); This is broken in several ways: 1) It calculates to wrong grain values. E.g., a physical address mask of ~0xfff should give a grain of 0x1000. Without considering PAGE_MASK, there is an off-by-one. Things are worse when also filtering it with ~PAGE_MASK. This will calculate to a grain with the upper bits set. In the example it even calculates to ~0. 2) The grain does not depend on and is unrelated to the kernel's page-size. The page-size only matters when unmapping memory in memory_failure(). Smaller grains are wrongly rounded up to the page-size, on architectures with a configurable page-size (e.g. arm64) this could round up to the even bigger page-size of the hypervisor. Fix this with: e->grain = ~mem_err->physical_addr_mask + 1; The grain_bits are defined as: grain = 1 << grain_bits; Change also the grain_bits calculation accordingly, it is the same formula as in edac_mc.c now and the code can be unified. The value in ->physical_addr_mask coming from firmware is assumed to be contiguous, but this is not sanity-checked. However, in case the mask is non-contiguous, a conversion to grain_bits effectively converts the grain bit mask to a power of 2 by rounding it up. Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	7c10493170	EDAC/ghes: Use standard kernel macros for page calculations Use standard macros for page calculations. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-10-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	0d8292e003	EDAC/mc: Reduce indentation level in edac_mc_handle_error() Reduce the indentation level in edac_mc_handle_error() a bit. No functional changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	47bec6b4c3	EDAC/mc: Remove needless zero string termination The e string to which this is pointing to has already been cleared earlier in the function so remove the needless zero string termination. [ bp: Correct the commit message. ] Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	d260e8ff51	EDAC/mc: Do not BUG_ON() in edac_mc_alloc() No need to crash the system in case edac_mc_alloc() is called with invalid arguments, just warn and return. This would cause a checkpatch warning when touching the code later, so just fix it. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com	2019-11-10 12:40:14 +01:00
Robert Richter	c498afaf7d	EDAC: Introduce an mci_for_each_dimm() iterator Introduce an mci_for_each_dimm() iterator. It returns a pointer to a struct dimm_info. This makes the declaration and use of an index obsolete and avoids access to internal data of struct mci (direct array access etc). [ bp: push the struct dimm_info *dimm; declaration into the CONFIG_EDAC_DEBUG block. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com	2019-11-10 12:39:40 +01:00
Robert Richter	977b1ce7c1	EDAC: Remove EDAC_DIMM_OFF() macro The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index. Simplify this by storing the index in struct dimm_info to avoid its calculation and remove the EDAC_DIMM_OFF() macro. The index can be directly used then. Another advantage is that edac_mc_alloc() could be used even if the exact size of the layers is unknown. Only the number of DIMMs would be needed. Rename iterator variable to idx, while at it. The name is more handy, esp. when searching for it in the code. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com	2019-11-09 11:23:49 +01:00
Robert Richter	bc9ad9e40d	EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function The EDAC_DIMM_PTR() macro takes 3 arguments from struct mem_ctl_info. Clean up this interface to only pass the mci struct and replace this macro with a new function edac_get_dimm(). Also introduce an edac_get_dimm_by_index() function for later use. This allows it to get a DIMM pointer only by a given index. This can be useful if the DIMM's position within the layers of the memory controller or the exact size of the layers are unknown. Small style changes made for some hunks after applying the semantic patch. Semantic patch used: @@ expression mci, a, b,c; @@ -EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c) +edac_get_dimm(mci, a, b, c) [ bp: Touchups. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tero Kristo <t-kristo@ti.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-2-rrichter@marvell.com	2019-11-09 10:32:32 +01:00
Borislav Petkov	7fdfee926b	EDAC/amd64: Get rid of the ECC disabled long message This message keeps flooding dmesg on boxes where ECC is disabled or the DIMMs do not support ECC but the module gets auto-probed. What's even worse is that autoprobing happens on every CPU due to the CPU-family matching the driver does and uevent being generated for each CPU device. What is more, this message is becoming even more useless on newer systems where forcing ECC is not recommended and it should be done in the BIOS so the BIOS can do all the necessary work, i.e., just setting a bit in an MSR is not enough anymore. So get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20191106160607.GC28380@zn.tnic	2019-11-09 10:06:36 +01:00
Robert Richter	23f61b9fc5	EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted before that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: `0fe5f281f7` ("EDAC, ghes: Model a single, logical memory controller") Fixes: `1e72e673b9` ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com	2019-11-08 16:28:28 +01:00
Yazen Ghannam	582f94b590	EDAC/amd64: Check for memory before fully initializing an instance Return early before checking for ECC if the node does not have any populated memory. Free any cached hardware data before returning. Also, return 0 in this case since this is not a failure. Other nodes may have memory and the module should attempt to load an instance for them. Move printing of hardware information to after the instance is initialized, so that the information is only printed for nodes with memory. Return an error code when ECC is disabled. This check happens after checking for memory. The module should explicitly fail to load if memory is populated on a node and ECC is disabled. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-6-Yazen.Ghannam@amd.com	2019-11-06 11:10:11 +01:00
Yazen Ghannam	1c9b08bac5	EDAC/amd64: Use cached data when checking for ECC ...now that the data is available earlier. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-5-Yazen.Ghannam@amd.com	2019-11-06 11:07:57 +01:00
Yazen Ghannam	5e4c55276a	EDAC/amd64: Save max number of controllers to family type The maximum number of memory controllers is fixed within a family/model group. In most cases, this has been fixed at 2, but some systems may have up to 8. The struct amd64_family_type already contains family/model-specific information, and this can be used rather than adding model checks to various functions. Create a new field in struct amd64_family_type for max_mcs. Set this when setting other family type information, and use this when needing the maximum number of memory controllers possible for a system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com	2019-11-06 11:07:01 +01:00
Yazen Ghannam	80355a3b2d	EDAC/amd64: Gather hardware information early Split out gathering hardware information from init_one_instance() into a separate function hw_info_get(). This is necessary so that the information can be cached earlier and used to check if memory is populated and if ECC is enabled on a node. Also, define a function hw_info_put() to back out changes made in hw_info_get(). Check for an allocated PCI device (Function 0 for Family 17h or Function 1 for pre-Family 17h) before freeing, since hw_info_put() may be called before PCI siblings are reserved. Drop the family check when freeing pvt->umc. This will be NULL on pre-Family 17h systems. However, kfree() is safe and will check for a NULL pointer before freeing. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-3-Yazen.Ghannam@amd.com	2019-11-06 11:04:49 +01:00
Yazen Ghannam	38ddd4d157	EDAC/amd64: Make struct amd64_family_type global The struct amd64_family_type doesn't change between multiple nodes and instances of the module, so make it global. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106012448.243970-2-Yazen.Ghannam@amd.com	2019-11-06 10:58:12 +01:00
Yazen Ghannam	466503d6b1	EDAC/amd64: Set grain per DIMM The following commit introduced a warning on error reports without a non-zero grain value. `3724ace582` ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: `3724ace582` ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com	2019-10-25 15:36:36 +02:00
Markus Elfring	5bbab3cf21	EDAC/aspeed: Use devm_platform_ioremap_resource() in aspeed_probe() Simplify this function implementation by using a known wrapper function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Joel Stanley <joel@jms.id.au> Cc: Andrew Jeffery <andrew@aj.id.au> Cc: James Morse <james.morse@arm.com> Cc: kernel-janitors@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-aspeed@lists.ozlabs.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Stefan Schaeckeler <sschaeck@cisco.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/baabb9e9-a1b2-3a04-9fb6-aa632de5f722@web.de	2019-10-24 11:17:29 +02:00
Tony Luck	e80634a75a	EDAC, skx: Retrieve and print retry_rd_err_log registers Skylake logs some additional useful information in per-channel registers in addition the the architectural status/addr/misc logged in the machine check bank. Pick up this information and add it to the EDAC log: retry_rd_err_[five 32-bit register values] Sorry, no definitions for these registers. OEMs and DIMM vendors will be able to use them to isolate which cells in the DIMM are causing problems. correrrcnt[per rank corrected error counts] Note that if additional errors are logged while these registers are being read, you may see a jumble of values some from earlier errors, others from later errors (since the registers report the most recent logged error). The correrrcnt registers provide error counts per possible rank. If these counts only change by one since the previous error logged for this channel, then it is safe to assume that the registers logged provide a coherent view of one error. With this change EDAC logs look like this: EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 - err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000]) Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2019-10-18 15:27:58 -07:00
Tony Luck	29b8e84fbc	EDAC, skx_common: Refactor so that we initialize "dev" in result of adxl decode. Simplifies the code a little. Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2019-10-18 15:27:48 -07:00
Borislav Petkov	3a5e7ec903	Merge branch 'edac-urgent' into edac-for-next Pick up urgent change into next queue. Signed-off-by: Borislav Petkov <bp@suse.de>	2019-10-17 13:47:12 +02:00
James Morse	1e72e673b9	EDAC/ghes: Fix Use after free in ghes_edac remove path ghes_edac models a single logical memory controller, and uses a global ghes_init variable to ensure only the first ghes_edac_register() will do anything. ghes_edac is registered the first time a GHES entry in the HEST is probed. There may be multiple entries, so subsequent attempts to register ghes_edac are silently ignored as the work has already been done. When a GHES entry is unregistered, it calls ghes_edac_unregister(), which free()s the memory behind the global variables in ghes_edac. But there may be multiple GHES entries, the next call to ghes_edac_unregister() will dereference the free()d memory, and attempt to free it a second time. This may also be triggered on a platform with one GHES entry, if the driver is unbound/re-bound and unbound. The re-bind step will do nothing because of ghes_init, the second unbind will then do the same work as the first. Doing the unregister work on the first call is unsafe, as another CPU may be processing a notification in ghes_edac_report_mem_error(), using the memory we are about to free. ghes_init is already half of the reference counting. We only need to do the register work for the first call, and the unregister work for the last. Add the unregister check. This means we no longer free ghes_edac's memory while there are GHES entries that may receive a notification. This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE. [ bp: merge into a single patch. ] Fixes: `0fe5f281f7` ("EDAC, ghes: Model a single, logical memory controller") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com	2019-10-17 11:27:05 +02:00

... 2 3 4 5 6 ...

1912 Commits