Commit Graph

861 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab
3d958823e2 edac: better report error conditions in debug mode
It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab
59b9796d1e i5100_edac: Remove two checkpatch warnings
The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+       if (priv->debugfs)
+               debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+       if (i5100_debugfs)
+               debugfs_remove(i5100_debugfs);

Get rid of them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund
9cbc6d38f2 i5100_edac: connect fault injection to debugfs node
Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund
53ceafd6a2 i5100_edac: add fault injection code
Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.

[1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-
    memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset MemoryController Hub (MCH)
    Doc.Nr: 318082
	http://www.intel.com/assets/pdf/datasheet/318082.pdf

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:33 -03:00
Niklas Söderlund
52608ba205 i5100_edac: probe for device 19 function 0
Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:32 -03:00
Mauro Carvalho Chehab
e7100478fa edac: only create sdram_scrub_rate where supported
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.

Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab
61734e1835 i3200_edac: Fix the logic that detects filled memories
After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.

It seems to be partially caused by the BIOS and partially by the driver.

The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.

However, the BIOS on this particular machine doesn't do it:

[   16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[   16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).

Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).

The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.

The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.

I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:

[   16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[   16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[   16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[   16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[   16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[   16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

and the corresponding sysfs nodes are now properly filled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab
5f466cb025 i3200_edac: Add more debug to the driver
Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.

Add debug for both, as it helps to track issues on the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:22 -03:00
Mauro Carvalho Chehab
1339730e73 Linux 3.8-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q
 GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5
 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+
 Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo
 EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL
 ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M=
 =jjBc
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc7' into next

Linux 3.8-rc7

* tag 'v3.8-rc7': (12052 commits)
  Linux 3.8-rc7
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
  ARM: DMA mapping: fix bad atomic test
  ARM: realview: ensure that we have sufficient IRQs available
  ARM: GIC: fix GIC cpumask initialization
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
  virtio_console: Don't access uninitialized data.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  ...
2013-02-20 15:45:52 -03:00
Joe Perches
d3d09e1820 EDAC: Fix kcalloc argument order
First number, then size.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-30 11:38:25 +01:00
Dan Carpenter
8024c4c0b1 EDAC: Test correct variable in ->store function
We're testing for ->show but calling ->store().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-30 11:38:11 +01:00
Linus Torvalds
57a0c1e2d6 Two error path fixes causing a crash and a Kconfig fix for an issue
which spilled all EDAC suboptions into the 'Device Drivers' menu.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7D0jAAoJEBLB8Bhh3lVKtcgP/3uzBjETQ7KzKG9R+zHxkBom
 7pl6UYqgquqpoqP3tGsn0+oaVyEvY3D11NMp/vhGxUjqJjDnIE3T/TvBMIwaRaq1
 /bPrZ+hE5d2mHWP/TULEVVYMziHhR9UO1SpB+E7vX7FqKcLwzeiS7CMlm4qjq0qC
 duXCG/6gyEKIecT3KPuXFzpojV6pSpSIYvLu03hdS3e8w7Wb6jvomWV282TpaSim
 aFyL909M/f4kexWD0827VnBxogwBWPl1hIGfFVprYGrabgUTtfb/OFhnZut8U1FV
 giDtDgIC7nATp1LzTK+fPE5kx9yZvrE1SV8ZTNF6r0dYH14fnR3YgbQqbVgbLUYF
 icQqCHpyZKD+s/ajymq8lg+ltVtROqCLWdT6YFZlaBlObP0sYltpSj1JRcoC5O6c
 Lx252LQQmxMQZKGnNlBooI6QN73GXlsngTgHD9z3DpFgvRilhA8EmbK2ZFFesdkk
 R4s25yH8fno6ClhxlwyA3+0vwU9B19Ul00cx4/5ZXPdK56Fq7slM/ORTE5SQFrHH
 6QG/pYDz09WXhKvOtX4ZRh2bBudMsY7mgsrxFelGYtsqHikLtYVUzkkyFkccjZMj
 UWcHUHsdSx+gr4DK/jclW1tDgwzh2G17d3FJobujGSgJJIqM2qOo/izON/g4+ojA
 RlKqYTCkAh3fX6DZX1vX
 =Qh4Y
 -----END PGP SIGNATURE-----

Merge tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fixes from Borislav Petkov:
 "Two error path fixes causing a crash and a Kconfig fix for an issue
  which spilled all EDAC suboptions into the 'Device Drivers' menu."

* tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Cleanup device deregistering path
  EDAC: Fix EDAC Kconfig menu
  EDAC: Fix kernel panic on module unloading
2013-01-09 08:43:56 -08:00
Lans Zhang
44d22e2404 EDAC: Cleanup device deregistering path
Use device_unregister to replace put_device + device_del for
cleanup, and fix the potential use after free.

Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07 17:43:00 +01:00
Borislav Petkov
5445166384 EDAC: Fix EDAC Kconfig menu
After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering
the "Device Drivers" toplevel menu in menuconfig, the suboptions behind
EDAC appeared merged with the rest of the device drivers types. This was
because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig
bool which was defined after the menu definition.

When pushing EDAC_SUPPORT up, before the menu definition, the variable
is defined earlier and the above menuconfig artifact doesn't happen.

Drop a useless menuconfig comment while at it.

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07 17:42:59 +01:00
Konstantin Khlebnikov
311bd84247 EDAC: Fix kernel panic on module unloading
This patch fixes use-after-free and double-free bugs in
edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
calls mc_attr_release() which calls kfree(). The following
device_del() works with already released memory. An another kfree() in
edac_mc_sysfs_exit() releses the same memory again. Great.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: stable@vger.kernel.org # 3.[67]
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07 17:42:58 +01:00
Greg Kroah-Hartman
9b3c6e85c2 Drivers: edac: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:03 -08:00
Lans Zhang
1c069100c1 i7core_edac: fix kernel crash on unloading i7core_edac.
It is easy to trigger this crash on 3.7.0:

root@intel_westmere_ep-3:~# modprobe -r i7core_edac
EDAC PCI: Removed device 0 for i7core_edac EDAC PCI controller: DEV 0000:fe:03.0
EDAC MC: Removed device 1 for i7core_edac.c i7 core #1: DEV 0000:fe:03.0
EDAC PCI: Removed device 1 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0
EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
IP: [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80
PGD 1eaae7067 PUD 1e96e4067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: minix acpi_cpufreq freq_table mperf ioatdma processor edac_core(-) iTCO_wdt coretemp evdev hwmon lpc_ich dca mfd_core crc32c_intel ioapic [last unloaded: i7core_edac]
CPU 3
Pid: 1268, comm: modprobe Not tainted 3.7.0-WR5.0.1.0_standard+ #30 Intel Corporation S5520HC/S5520HC
RIP: 0010:[<ffffffff82069ee9>]  [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80
RSP: 0018:ffff8801eb12de28  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000f0 RCX: 00000000ffffffff
RDX: ffff88012b452800 RSI: 0000000000000002 RDI: 00000000000000f0
RBP: ffff8801eb12de68 R08: 0000000000000000 R09: ffffea0004ad1118
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8801eb12dee8 R14: ffff88012b452800 R15: 000000000060e518
FS:  00007f9ea95a9700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000110 CR3: 00000001262f1000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 1268, threadinfo ffff8801eb12c000, task ffff8801e8421690)
Stack:
  ffff88012c802a00 ffff88012b445ec0 ffff88012c802300 ffff88012b452800
  0000000000000000 ffff8801eb12dee8 000000000060e080 000000000060e518
  ffff8801eb12de78 ffffffff82069f56 ffff8801eb12dea8 ffffffff824ead7c
Call Trace:
  [<ffffffff82069f56>] blocking_notifier_call_chain+0x16/0x20
  [<ffffffff824ead7c>] device_del+0x3c/0x1d0
  [<ffffffffa00095a8>] edac_mc_sysfs_exit+0x1c/0x2f [edac_core]
  [<ffffffffa000961c>] edac_exit+0x4f/0x56 [edac_core]
  [<ffffffff820a3d2a>] sys_delete_module+0x17a/0x240
  [<ffffffff8212da7c>] ? vm_munmap+0x5c/0x80
  [<ffffffff82877682>] system_call_fastpath+0x16/0x1b
Code: 90 90 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 31 c0 49 89 d6 48 89 fb <48> 8b 57 20 49 89 f5 41 89 cf 4c 8d 67 20 48 85 d2 74 2c 4c 89
RIP  [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80
  RSP <ffff8801eb12de28>
CR2: 0000000000000110
---[ end trace b69acf12ccad1c0d ]---

Usually, edac_subsys is grabbed one time by pci at initialization.
But edac_subsys may be released several times if multiple pci MCs exist.
The fix just makes the operations balanced.

Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21 08:01:23 -02:00
Niklas Söderlund
c31d34fe92 i7core_edac: fix erroneous size of static array
Remove size from lookup arrays and mark them as const.

Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Niklas Söderlund <niso@kth.se>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21 08:01:20 -02:00
Mauro Carvalho Chehab
da14d93d95 sb_edac: add a missing /n on a debug message
[   17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[   17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21 08:01:18 -02:00
Shaun Ruffell
80f5ab097b edac: edac_mc no longer deals with kobjects directly
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.

Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21 08:00:02 -02:00
Linus Torvalds
cebfa85eb8 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "The MIPS bits for 3.8.  This also includes a bunch fixes that were
  sitting in the linux-mips.org git tree for a long time.  This pull
  request contains updates to several OCTEON drivers and the board
  support code for BCM47XX, BCM63XX, XLP, XLR, XLS, lantiq, Loongson1B,
  updates to the SSB bus support, MIPS kexec code and adds support for
  kdump.

  When pulling this, there are two expected merge conflicts in
  include/linux/bcma/bcma_driver_chipcommon.h which are trivial to
  resolve, just remove the conflict markers and keep both alternatives."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (90 commits)
  MIPS: PMC-Sierra Yosemite: Remove support.
  VIDEO: Newport Fix console crashes
  MIPS: wrppmc: Fix build of PCI code.
  MIPS: IP22/IP28: Fix build of EISA code.
  MIPS: RB532: Fix build of prom code.
  MIPS: PowerTV: Fix build.
  MIPS: IP27: Correct fucked grammar in ops-bridge.c
  MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled
  MIPS: Fix potencial corruption
  MIPS: Fix for warning from FPU emulation code
  MIPS: Handle COP3 Unusable exception as COP1X for FP emulation
  MIPS: Fix poweroff failure when HOTPLUG_CPU configured.
  MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y
  MIPS: Remove unused smvp.h
  MIPS/EDAC: Improve OCTEON EDAC support.
  MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.
  MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h
  ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.
  MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.
  MIPS: Remove usage of CEVT_R4K_LIB config option.
  ...
2012-12-14 14:27:45 -08:00
David Daney
e1ced09797 MIPS/EDAC: Improve OCTEON EDAC support.
Some initialization errors are reported with the existing OCTEON EDAC
support patch.  Also some parts have more than one memory controller.

Fix the errors and add multiple controllers if present.

Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:26 +01:00
Ralf Baechle
f65aad4177 MIPS: Cavium: Add EDAC support.
Drivers for EDAC on Cavium.  Supported subsystems are:

 o CPU primary caches.  These are parity protected only, so only error
   reporting.
 o Second level cache - ECC protected, provides SECDED.
 o Memory: ECC / SECDEC if used with suitable DRAM modules.  The driver will
   will only initialize if ECC is enabled on a system so is safe to run on
   non-ECC memory.
 o PCI: Parity error reporting

Since it is very hard to test this sort of code the implementation is very
conservative and uses polling where possible for now.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
2012-12-12 16:48:49 +01:00
Linus Torvalds
9ada9fd5df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:

 - EDAC core error path fix, from Denis Kirjanov.

 - Generalization of AMD MCE bank names and some minor error reporting
   improvements.

 - EDAC core cleanups and simplifications, from Wei Yongjun.

 - amd64_edac fixes for sysfs-reported values, from Josh Hunt.

 - some heavy amd64_edac error reporting path shaving, leading to
   removing a bunch of code.

 - amd64_edac error injection method improvements.

 - EDAC core cleanups and fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
  EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
  EDAC: Handle error path in edac_mc_sysfs_init() properly
  MCE, AMD: Dump error status
  MCE, AMD: Report decoded error type first
  MCE, AMD: Dump CPU f/m/s triple with the error
  MCE, AMD: Remove functional unit references
  EDAC: Convert to use simple_open()
  EDAC, Calxeda highbank: Convert to use simple_open()
  EDAC: Fix mc size reported in sysfs
  EDAC: Fix csrow size reported in sysfs
  EDAC: Pass mci parent
  EDAC: Add memory controller flags
  amd64_edac: Fix csrows size and pages computation
  amd64_edac: Use DBAM_DIMM macro
  amd64_edac: Fix K8 chip select reporting
  amd64_edac: Reorganize error reporting path
  amd64_edac: Do not check whether error address is valid
  amd64_edac: Improve error injection
  amd64_edac: Cleanup error injection code
  amd64_edac: Small fixlets and cleanups
  ...
2012-12-11 11:28:43 -08:00
Wei Yongjun
3bfe5aae8e EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
Use for_each_pci_dev to simplify the code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
[Boris: cleanup comments and drop loop brackets]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-12-04 08:27:39 +01:00
Linus Torvalds
b52c6402b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC fixes from Mauro Carvalho Chehab:
 "One EDAC core fix, and a few driver fixes (i7300, i9275x, i7core)."

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  i7core_edac: fix panic when accessing sysfs files
  i7300_edac: Fix error flag testing
  edac: Fix the dimm filling for csrows-based layouts
  i82975x_edac: Fix dimm label initialization
2012-12-03 11:16:37 -08:00
Denis Kirjanov
2d56b109e3 EDAC: Handle error path in edac_mc_sysfs_init() properly
Make sure proper deregistration happens on all error paths in
edac_mc_sysfs_init.

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
[ Boris: cleanup and concretize commit message ]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28 11:56:52 +01:00
Borislav Petkov
d5c6770d4c MCE, AMD: Dump error status
Dump error status after decoding the error which describes the error
disposition.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:56:30 +01:00
Borislav Petkov
d824c7718b MCE, AMD: Report decoded error type first
Instead of starting with the error details, report the decoded, readable
error type first.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:56:17 +01:00
Borislav Petkov
f89f8388cd MCE, AMD: Dump CPU f/m/s triple with the error
It is very useful to have the family/model/stepping with the reported
error so dump it. This saves us asking the bug reporter about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:57 +01:00
Borislav Petkov
f05c41a9c6 MCE, AMD: Remove functional unit references
Having the functional unit names in each bank decode is only misleading
as this code supports multiple families and there's no guarantee the
mapping between FUs and MCE banks will stay the same.

And also, knowing the functional unit name doesn't help much since you
end up looking at the respective BKDG anyway.

So drop all FU references and use the MC bank numbers instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:44 +01:00
Wei Yongjun
db7312a295 EDAC: Convert to use simple_open()
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:22 +01:00
Wei Yongjun
f35d852e80 EDAC, Calxeda highbank: Convert to use simple_open()
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:07 +01:00
Josh Hunt
3c0622760a EDAC: Fix mc size reported in sysfs
This is the complement to previous commit "EDAC: Fix csrow size
reported in sysfs". This fixes the memory controller size reporting on
csrow-based memory controllers. The csrow size is already combined for
both channels. Without this patch memory size is reported doubled.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:50 +01:00
Borislav Petkov
16a528ee39 EDAC: Fix csrow size reported in sysfs
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.

Fix it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:40 +01:00
Borislav Petkov
921a689965 EDAC: Pass mci parent
Initialize the mem_ctl_info descriptor of a csrow properly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:23 +01:00
Borislav Petkov
1165276917 EDAC: Add memory controller flags
The first flag is ->csbased and will be used in common EDAC code later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:48:04 +01:00
Borislav Petkov
10de6497a5 amd64_edac: Fix csrows size and pages computation
Make sure code pays attention to K8 having only one DCT, reformat and
cleanup code, correct debug messages, remove unused code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:47:36 +01:00
Borislav Petkov
0a5dfc3140 amd64_edac: Use DBAM_DIMM macro
Instead of open-coding it, use the DBAM_DIMM macro in
amd64_csrow_nr_pages() which we have already.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:46:19 +01:00
Borislav Petkov
bb89f5a054 amd64_edac: Fix K8 chip select reporting
This basically reverts 603adaf6b3 ("amd64_edac: fix K8 chip select
reporting") because it was a clumsy workaround for DIMM sizes reporting
on K8 which got superceded by a much more correct one with 41d8bfaba7
("amd64_edac: Improve DRAM address mapping") without removing the prior
one. Remove it now finally.

Reported-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:46 +01:00
Borislav Petkov
33ca0643c9 amd64_edac: Reorganize error reporting path
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:34 +01:00
Borislav Petkov
c8d1adf092 amd64_edac: Do not check whether error address is valid
All families report a valid error address when encountering a DRAM ECC
error so no need to check it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:11 +01:00
Borislav Petkov
66fed2d464 amd64_edac: Improve error injection
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine
does this by injecting the error in the next non-cached access. This
takes relatively long time on a normal system so that in order for us to
expedite it, we disable the caches around the injection.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:01 +01:00
Borislav Petkov
6e71a870b8 amd64_edac: Cleanup error injection code
Invert kstrtoul return value testing and win one indentation level.
Also, shorten up macro names so that the lines can fit into 80 cols. No
functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:44:35 +01:00
Borislav Petkov
1f31677e0d amd64_edac: Small fixlets and cleanups
amd64_get_dram_hole_info: remove local variable 'base'.
sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize
constants formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:44:12 +01:00
Borislav Petkov
f430d5707a EDAC: Handle empty msg strings when reporting errors
A reported error could look like this

[  226.178315] EDAC MC0: 1 CE  on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)

with two spaces back-to-back due to the msg argument of
edac_mc_handle_error being passed on empty by the specific drivers.
Handle that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:24:12 +01:00
Borislav Petkov
4da1b7bfe7 EDAC: Remove useless assignment of error type
The tracepoint decodes the error type later anyway so remove a useless
assignment to the temporary p which gets overwritten later anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:23:50 +01:00
Borislav Petkov
37929874d4 EDAC: Boundary-check edac_debug_level
Only levels [0:4] are allowed so enforce that. Also, while at it,
massage Kconfig text and add valid debug levels range to the module
parameter description.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:23:32 +01:00
Borislav Petkov
876bb331e2 EDAC: Respect operational state in edac_pci.c
Currently, we unconditionally enable PCI polling and we don't look at
the edac_op_state module parameter. Make this dependent on the parameter
setting supplied on the command line.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:22:47 +01:00
Prarit Bhargava
42709efb3a i7core_edac: fix panic when accessing sysfs files
The i7core_edac addrmatch_dev and chancounts_dev have sysfs files
associated with them.  The sysfs files, however, are coded so that the
parent device is is the mci device.  This is incorrect and the mci struct
should be obtained through the addrmatch_dev and chancounts_dev device's
private data field which is populated in i7core_create_sysfs_devices().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-11-28 06:56:00 -02:00