IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit e84077437902ec99eba0a6b516df772653f142c7 upstream.
Fix period calculation in case user sets a value of 1000. The input of
round_jiffies_relative() should be in jiffies and not in milli-seconds.
[ bp: Use the same code pattern as in edac_device_workq_setup() for
clarity. ]
Fixes: c4cf3b454eca ("EDAC: Rework workqueue handling")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221020124458.22153-1-farbere@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8efca92ae509c25e0a4bd5d0a86decea4f0c41e upstream.
Do alignment logic properly and use the "ptr" local variable for
calculating the remainder of the alignment.
This became an issue because struct edac_mc_layer has a size that is not
zero modulo eight, and the next offset that was prepared for the private
data was unaligned, causing an alignment exception.
The patch in Fixes: which broke this actually wanted to "what we
actually care about is the alignment of the actual pointer that's about
to be returned." But it didn't check that alignment.
Use the correct variable "ptr" for that.
[ bp: Massage commit message. ]
Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfd0dfb9a7cc04acf93435b440dd34c2ca7b4424 upstream.
The driver overrides error codes returned by platform_get_irq_optional()
to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
[ bp: Massage commit message. ]
Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 279eb8575fdaa92c314a54c0d583c65e26229107 upstream.
The driver overrides the error codes returned by platform_get_irq() to
-ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
[ bp: Massage commit message. ]
Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 537bddd069c743759addf422d0b8f028ff0f8dbc upstream.
The computation of TOHM is off by one bit. This missed bit results in
too low a value for TOHM, which can cause errors in regular memory to
incorrectly report:
EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory
Fixes: 50d1bb93672f ("sb_edac: add support for Haswell based systems")
Cc: stable@vger.kernel.org
Reported-by: Meeta Saggi <msaggi@purestorage.com>
Signed-off-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0a37f32ba5272b2d4ec8c8d0f6b212b81b578f7e ]
The module misses MODULE_DEVICE_TABLE() for of_device_id tables and thus
never autoloads on ID matches.
Add the missing declaration.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tero Kristo <kristo@kernel.org>
Link: https://lkml.kernel.org/r/20210512033727.26701-1-cuibixuan@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 706657b1febf446a9ba37dc51b89f46604f57ee9 upstream.
In order to setup its PCI component, the driver needs any node private
instance in order to get a reference to the PCI device and hand that
into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
memory controller descriptor under the assumption that if any, the 0th
will be always present.
However, this assumption goes wrong when the 0th node doesn't have
memory and the driver doesn't initialize an instance for it:
EDAC amd64: F17h detected (node 0).
...
EDAC amd64: Node 0: No DIMMs detected.
But looking up node instances is not really needed - all one needs is
the pointer to the proper device which gets discovered during instance
init.
So stash that pointer into a variable and use it when setting up the
EDAC PCI component.
Clear that variable when the driver needs to unwind due to some
instances failing init to avoid any registration imbalance.
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 66077adb70a2a9e92540155b2ace33ec98299c90 ]
platform_get_irq() returns a negative error number on error. In such a
case, comparison to 0 would pass the check therefore check the return
value properly, whether it is negative.
[ bp: Massage commit message. ]
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tero Kristo <t-kristo@ti.com>
Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 857a3139bd8be4f702c030c8ca06f3fd69c1741a ]
When pci_get_device_func() fails, the driver doesn't need to execute
pci_dev_put(). mci should still be freed, though, to prevent a memory
leak. When pci_enable_device() fails, the error injection PCI device
"einj" doesn't need to be disabled either.
[ bp: Massage commit message, rename label to "bail_mc_free". ]
Fixes: 52608ba205461 ("i5100_edac: probe for device 19 function 0")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 709ed1bcef12398ac1a35c149f3e582db04456c2 ]
The Intel uncore driver may claim some of the pci ids from ie31200 which
means that the ie31200 edac driver will not initialize them as part of
pci_register_driver().
Let's add a fallback for this case to 'pci_get_device()' to get a
reference on the device such that it can still be configured. This is
similar in approach to other edac drivers.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace calling kfree() and call kobject_put() and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.
[ bp: Massage and merge into a single patch. ]
Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee470bb25d0dcdf126f586ec0ae6dca66cb340a4 ]
Commit:
da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
added support for F15h, model 0x60 CPUs but in doing so, missed to read
back SCRCTRL PCI config register on F15h CPUs which are *not* model
0x60. Add that read so that doing
$ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate
can show the previously set DRAM scrub rate.
Fixes: da92110dfdfa ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
Reported-by: Anders Andersson <pipatron@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> #v4.4..
Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e846239e5487cbb89ac8192d5f11437d010130e ]
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.
This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.
Fixes: be3518a16ef2 (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc ]
The following commit introduced a warning on error reports without a
non-zero grain value.
3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.
Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.
Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29a0c843973bc385918158c6976e4dbe891df969 ]
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.
I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.
[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]
Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]
The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:
e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
This is broken in several ways:
1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.
2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.
Fix this with:
e->grain = ~mem_err->physical_addr_mask + 1;
The grain_bits are defined as:
grain = 1 << grain_bits;
Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.
The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f6da136046294a1e8d2944336eb97412751f653 ]
The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.
[Tony: These are all "edac_dbg()" messages, so this won't break scripts
that parse console logs.]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ]
Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:
1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]
The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.
Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.
ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.
When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.
But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.
This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.
Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.
ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.
This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.
This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
[ bp: merge into a single patch. ]
Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a2eaab7daf03b23ac902481218034ae2fae5e16 ]
AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.
Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.
Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8be8e5680225ac9caf07d4545f8529b7395327f ]
AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting mci.edac_ctl_cap.
Set the appropriate capability flag based on the device type.
Default to x8 DRAM device when neither the x4 or x16 bits are set.
[ bp: reverse cpk_en check to save an indentation level. ]
Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-3-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29a3388bfcce7a6d087051376ea02bf8326a957b ]
Depending on how BIOS has marked the reserved region containing the 32KB
MCHBAR you can get warnings like:
resource sanity check: requesting [mem 0xfed10000-0xfed1ffff], which spans more than reserved [mem 0xfed10000-0xfed17fff]
caller dnv_rd_reg+0xc8/0x240 [pnd2_edac] mapping multiple BARs
Not all of the mmio regions used in dnv_rd_reg() are the same size. The
MCHBAR window is 32KB and the sideband ports are 64KB. Pass the correct
size to ioremap() depending on which resource we're reading from.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8faa1cf6ed82f33009f63986c3776cc48af1b7b2 ]
Smatch complains about the cast of a u32 pointer to unsigned long:
drivers/edac/altera_edac.c:1878 altr_edac_a10_irq_handler()
warn: passing casted pointer '&irq_status' to 'find_first_bit()'
This code wouldn't work on a 64 bit big endian system because it would
read past the end of &irq_status.
[ bp: massage. ]
Fixes: 13ab8448d2c9 ("EDAC, altera: Add ECC Manager IRQ controller support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624134717.GA1754@mwanda
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3724ace582d9f675134985727fd5e9811f23c059 ]
The grain in EDAC is defined as "minimum granularity for an error
report, in bytes". The following calculation of the grain_bits in
edac_mc is wrong:
grain_bits = fls_long(e->grain) + 1;
Where grain_bits is defined as:
grain = 1 << grain_bits
Example:
grain = 8 # 64 bit (8 bytes)
grain_bits = fls_long(8) + 1
grain_bits = 4 + 1 = 5
grain = 1 << grain_bits
grain = 1 << 5 = 32
Replace it with the correct calculation:
grain_bits = fls_long(e->grain - 1);
The example gives now:
grain_bits = fls_long(8 - 1)
grain_bits = fls_long(7)
grain_bits = 3
grain = 1 << 3 = 8
Also, check if the hardware reports a reasonable grain != 0 and fallback
with a warning to 1 byte granularity otherwise.
[ bp: massage a bit. ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8655e7630dafa88bc37f101640e39c736399771 ]
Commit 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.
Reproducer:
# echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
KASAN report:
BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0xca/0x13e
print_address_description.cold+0x5/0x246
__kasan_report.cold+0x75/0x9a
? edac_set_poll_msec+0x140/0x150
kasan_report+0xe/0x20
edac_set_poll_msec+0x140/0x150
? dimmdev_location_show+0x30/0x30
? vfs_lock_file+0xe0/0xe0
? _raw_spin_lock+0x87/0xe0
param_attr_store+0x1b5/0x310
? param_array_set+0x4f0/0x4f0
module_attr_store+0x58/0x80
? module_attr_show+0x80/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
? sysfs_kf_bin_read+0x270/0x270
? kernfs_notify+0x1f0/0x1f0
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa7caa5e970
Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
The buggy address belongs to the variable:
edac_mc_poll_msec+0x0/0x40
Memory state around the buggy address:
ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
>ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
^
ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.
Reviewed-by: James Morse <james.morse@arm.com>
Fixes: 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 585fb3d93d32dbe89e718b85009f9c322cc554cd ]
In edac_create_csrow_object(), the reference to the object is not
released when adding the device to the device hierarchy fails
(device_add()). This may result in a memory leak.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b8358a951b1e2a534a54924cd8245e58a1c5fb8 ]
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:
ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.
This seems to have been broken since at least
c92132f59806 ("edac/85xx: Add PCIe error interrupt edac support")
(Nov 2013).
[ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
as a module. ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 432de7fd7630c84ad24f1c2acd1e3bb4ce3741ca upstream.
The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8960de4a5ca7980ed1e19e7ca5a774d3b7a55c38 upstream.
Add new device IDs for family 17h, models 10h-2fh.
This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.
Signed-off-by: Michael Jin <mikhail.jin@gmail.com>
Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Extend the driver to check whether segment number and bus number matches
when deciding how to group memory controller PCI devices to CPU sockets.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com
[ Cleanup commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
In the quest to remove all stack VLA usage from the kernel[1], switch to
using a kmalloc-allocated buffer instead of stack space. This should be
fine since the existing routine is allocating memory too.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180629184850.GA37464@beast
Link: https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [1]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before before deregistration.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a30860d ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2d56b109e3a5 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
If regmap_write() fails, we should release some resources as done in all
the other error handling paths of the function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180610174532.22071-1-christophe.jaillet@wanadoo.fr
Fixes: e9918d7fafae ("EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10")
Signed-off-by: Borislav Petkov <bp@suse.de>
ARM machines all have DMI tables so if they request hw error reporting
through GHES, then the driver should be able to detect DIMMs and report
errors successfully (famous last words :)).
Make the platform-based list x86-specific so that ghes_edac can load on
ARM.
Reported-by: Qiang Zheng <zhengqiang10@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Qiang Zheng <zhengqiang10@huawei.com>
Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
The kbuild test robot reported the following warning:
drivers/edac/altera_edac.c: In function 'ocram_free_mem':
drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
gen_pool_free((struct gen_pool *)other, (u32)p, size);
^
After adding support for ARM64 architectures, the unsigned long
parameter is 64 bits and causes a build warning on 64-bit configs. Fix
by casting to the correct size (unsigned long) instead of u32.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c3eea1942a16 ("EDAC, altera: Add Altera L2 cache and OCRAM support")
Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Prevent build error when CONFIG_ACPI_NFIT=m and CONFIG_EDAC_SKX=y by
limiting EDAC_SKX based on how ACPI_NFIT is set.
Fixes this build error:
drivers/edac/skx_edac.o: In function `get_nvdimm_info':
../drivers/edac/skx_edac.c:399: undefined reference to `nfit_get_smbios_id'
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 58ca9ac1463d ("EDAC, skx_edac: Detect non-volatile DIMMs")
Link: http://lkml.kernel.org/r/3af91354-8e19-d2af-1bba-ced8dce053f1@infradead.org
Signed-off-by: Borislav Petkov <bp@suse.de>
... for improved readability. Also, add a local mask variable for the
same reason.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
The ghes_edac driver obtains memory type from SMBIOS type 17, but it
does not recognize DDR4 and NVDIMM types.
Add support of DDR4 and NVDIMM types. NVDIMM type is denoted by memory
type DDR3/4 and non-volatile.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180509222030.9299-1-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On Stratix10, uncorrectable errors are routed to the SError exception
instead of the IRQ exceptions. In Stratix10, uncorrectable SErrors
must be treated as fatal and will cause a panic. Older Altera/Intel
parts printed out a message for UE so do that here using the notifier
framework.
Record the UE in sticky registers that retain the state through a reset.
Check these registers on probe and printout the error on startup.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1526079610-5527-1-git-send-email-thor.thayer@linux.intel.com
[ Remove unused var in s10_edac_dberr_handler(), reorder args. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The use of the @ghes argument was removed in a previous commit, but
function signature was not updated to reflect this.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>