IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC updates from Borislav Petkov:
"Mostly AMD's side of EDAC. It is basically a new family enablement
stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is
trivial cleanups."
* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
mpc85xx_edac: Fix typo
EDAC, MCE, AMD: Remove unneeded exports
EDAC, MCE, AMD: Add MCE decoding support for Family 16h
EDAC, MCE, AMD: Make MC2 decoding per-family
amd64_edac: Remove dead code
5e2af0c09e ("edac: Don't initialize csrow's first_page & friends when
not needed") removed useless initialization of variables but left in the
functions which did that. They're unused now so drop them.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Fix locating sibling memory controller PCI functions by using the
correct PCI domain and use a northbridge descriptor only if found. We
need to at least warn if it wasn't found so that it gets fixed and we
don't go off with wrong results.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com
[Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails]
Signed-off-by: Borislav Petkov <bp@alien8.de>
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.
Fix it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Make sure code pays attention to K8 having only one DCT, reformat and
cleanup code, correct debug messages, remove unused code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Instead of open-coding it, use the DBAM_DIMM macro in
amd64_csrow_nr_pages() which we have already.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This basically reverts 603adaf6b3 ("amd64_edac: fix K8 chip select
reporting") because it was a clumsy workaround for DIMM sizes reporting
on K8 which got superceded by a much more correct one with 41d8bfaba7
("amd64_edac: Improve DRAM address mapping") without removing the prior
one. Remove it now finally.
Reported-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
All families report a valid error address when encountering a DRAM ECC
error so no need to check it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine
does this by injecting the error in the next non-cached access. This
takes relatively long time on a normal system so that in order for us to
expedite it, we disable the caches around the injection.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
amd64_get_dram_hole_info: remove local variable 'base'.
sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize
constants formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.
As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.
Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2
It will be rewritten more robustly soonish.
Reported-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: stable@vger.kernel.org
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
In order to avoid loosing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.
The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.
The changes at the drivers were made by this small script:
$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
print $file;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Remove the arch-dependent parameter, as it were not used,
as the MCE tracepoint weren't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this will
cost more bytes at the tracepoint, and tracepoint is not free.
The changes at the EDAC drivers were done by this small perl script:
$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
print $file;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The EDAC driver name doesn't help to handle EDAC errors. So,
remove it from the EDAC error messages, preserving only the
error_message.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use a more common debugging style.
Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that the EDAC core supports struct device, there's no sense
on having any logic at the EDAC core to simulate it. So, instead
of adding such logic there, change the logic at amd64_edac to
use it.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that all drivers got converted to use the new ABI, we can
drop the old one.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Almost all edac drivers initialize csrow_info->first_page,
csrow_info->last_page and csrow_info->page_mask. Those vars are
used inside the EDAC core, in order to calculate the csrow affected
by an error, by using the routine edac_mc_find_csrow_by_page().
However, very few drivers actually use it:
e752x_edac.c
e7xxx_edac.c
i3000_edac.c
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
r82600_edac.c
There also a few other drivers that have their own calculus
formula internally using those vars.
All the others are just wasting time by initializing those
data.
While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:
"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."
So use DEFINE_PCI_DEVICE_TABLE(x).
Based on PaX and earlier work by Andi Kleen.
Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
While initializing the array of csrow attribute instances, a few csrows
were uninitialized. This happened because the module only performed a
check for DRAM base ctl register0's and not DRAM base ctl register1's
chip select enable bit. There could be systems with DIMMs populated
on only single memory channel whereas the module also assumed that a
dual channel dimm had double the memory size of a single memory channel
instead of checking the memory on each channel.
This patch fixes these above issues.
Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com>
Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When accessing the scrub rate control register (F3x58) on F15h, the DRAM
controller selector (F1x10C[DctCfgSel]) has to point to DCT0 so that the
scrub rate configuration can take effect. See Erratum 505 in the AMD
F15h revision guide for more details.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Drop third nbcfg argument which is old remains and not required anymore.
No functionality change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in the scrub rate
accessors.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.
Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Fix amd64_debug_display_dimm_sizes() arguments order per convention (pvt
is always first). Also, the now second arg denotes the DCT so adjust its
type.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>