723 Commits

Author SHA1 Message Date
Hidetoshi Seto
21b6806a8c i7core_edac: Remove unused member channels in i7core_pvt
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
2e5185f7ff i7core_edac: Remove unused arg csrow from get_dimm_config
A local is enough.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
aace42831a i7core_edac: Reduce args of i7core_register_mci
We can check the number of channels in i7core_register_mci.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
1c6edbbe25 i7core_edac: Introduce i7core_unregister_mci
In i7core_probe, when setup of mci for 2nd or later socket failed,
we should cleanup prepared mci for 1st socket or so before "put" of
all devices.

So let have i7core_unregister_mci that can be shared between here
and i7core_remove.

While here fix a typo "hanler".

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
73589c80cd i7core_edac: Use saved pointers
We already have saved pointers.  Use shorter ones.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
71fe01706d i7core_edac: Check probe counter in i7core_remove
Prevent i7core_remove from running multiple times.
Otherwise value proved will be negative and something will be wrong.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
2896637b86 i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
628c5ddfb0 i7core_edac: Fix error path of i7core_register_mci
Release resources properly.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
5939813b9c i7core_edac: Fix order of lines in i7core_register_mci
The flag is_registered is not initialized until mci_bind_devs()
is called.  Refer it properly.

The mci->dev and mci->edac_check is required in edac_mc_add_mc(),
so prepare them just before the call.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
64c10f6e0e i7core_edac: Always do get/put for all devices
We already do 'get' for all sockets at once. So do 'put' in the
same way.

And let args of the 'get' function to void since it handles
only the single, static and known size table pci_dev_table[].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
a3aa0a4ab5 i7core_edac: Introduce i7core_pci_ctl_create/release
Have a couple of method.
while here sort out lines in the i7core_register_mci() a bit.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
2aa9be448d i7core_edac: Introduce free_i7core_dev
Have a method to make a couple with alloc_i7core_dev() previously
introduced.  Using in pair will help proper resource handling.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
848b2f7ed6 i7core_edac: Introduce alloc_i7core_dev
It's nice to have a method for a single purpose.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
b197cba071 i7core_edac: Reduce args of i7core_get_onedevice
Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.

While here make it static.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
45b7c981ae i7core_edac: Fix the logic in i7core_remove()
commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed
the logic for unexplained reasons.  It looks strange that it
can release i7core_dev without calling i7core_put_devices()
that releases i7core_dev->pdev.

Fix the part.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
54a08ab153 i7core_edac: Don't do the legacy PCI probe by default
The legacy PCI probe sometimes cause hangs. Better to have it
disabled by default, and have a parameter to enable it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
accf74fff3 i7core_edac: don't use a freed mci struct
This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[   80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[   80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
[   80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[   80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[   80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[   80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[   80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[   80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
	edac_remove_sysfs_mci_device(mci);
	edac_printk(KERN_INFO, EDAC_MC,
		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
bbc560ae67 edac_core: Print debug messages at release calls
This is important to track a nasty bug at the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
ac99768c53 edac_core: Don't let free(mci) happen while using it
A very nasty bug were happening on edac core, due to the way mci objects are
freed. mci memory is freed when kobject count reaches zero, by
edac_mci_control_release(). However, from the logs, this is clearly happening
before the final usage of mci struct:

[15799.607454] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[15799.618773] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 769: edac_inst_grp_release()
[15799.627326] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 894: edac_remove_mci_instance_attributes() end of seeking for group all_channel_counts
[15799.640887] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 877: edac_remove_mci_instance_attributes() sysfs_attrib = ffffffffa01d7240
[15799.653412] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[15799.663753] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6fe1108f14 edac_core: Do a better job with node removal
Make sure we remove groups at the right order

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
39300e7143 i7core_edac: explicitly remove PCI devices from the devices list
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
41ba6c1058 i7core_edac: MCE NMI handling should stop first
Otherwise, a NMI may happen causing a race condition and a panic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6ee7dd5044 i7core_edac: Initialize all priv vars before start polling
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
3cfd01468b i7core_edac: Improve debug to seek for register/remove errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
e9144601d3 i7core_edac: move #if PAGE_SHIFT to edac_core.h
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:36 -02:00
Mauro Carvalho Chehab
1288c18f48 i7core_edac: Properly mark const static vars as such
There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.

This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.

While here, use const on all places where it is applicable.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_core: use const for constant sysfs arguments

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:14 -02:00
Mauro Carvalho Chehab
18c29002f9 i7core_edac: move static vars to the beginning of the file
While here, don't initialize probed with 0.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Mauro Carvalho Chehab
939747bd68 i7core_edac: Be sure that the edac pci handler will be properly released
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Linus Torvalds
c029e405bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (21 commits)
  EDAC, MCE: Fix shift warning on 32-bit
  EDAC, MCE: Add a BIT_64() macro
  EDAC, MCE: Enable MCE decoding on F12h
  EDAC, MCE: Add F12h NB MCE decoder
  EDAC, MCE: Add F12h IC MCE decoder
  EDAC, MCE: Add F12h DC MCE decoder
  EDAC, MCE: Add support for F11h MCEs
  EDAC, MCE: Enable MCE decoding on F14h
  EDAC, MCE: Fix FR MCEs decoding
  EDAC, MCE: Complete NB MCE decoders
  EDAC, MCE: Warn about LS MCEs on F14h
  EDAC, MCE: Adjust IC decoders to F14h
  EDAC, MCE: Adjust DC decoders to F14h
  EDAC, MCE: Rename files
  EDAC, MCE: Rework MCE injection
  EDAC: Export edac sysfs class to users.
  EDAC, MCE: Pass complete MCE info to decoders
  EDAC, MCE: Sanitize error codes
  EDAC, MCE: Remove unused function parameter
  EDAC, MCE: Add HW_ERR prefix
  ...
2010-10-21 14:04:58 -07:00
Linus Torvalds
2f0384e5fc Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
  x86, amd: Use compute unit information to determine thread siblings
  x86, amd: Extract compute unit information for AMD CPUs
  x86, amd: Add support for CPUID topology extension of AMD CPUs
  x86, nmi: Support NMI watchdog on newer AMD CPU families
  x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
  x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
  x86, k8-gart: Decouple handling of garts and northbridges
  x86, cacheinfo: Fix dependency of AMD L3 CID
  x86, kvm: add new AMD SVM feature bits
  x86, cpu: Fix allowed CPUID bits for KVM guests
  x86, cpu: Update AMD CPUID feature bits
  x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
  x86, AMD: Remove needless CPU family check (for L3 cache info)
  x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21 13:01:08 -07:00
Borislav Petkov
525906bc89 EDAC, MCE: Fix shift warning on 32-bit
Fix

drivers/edac/mce_amd.c:262: warning: left shift count >= width of type

on 32-bit builds.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:07 +02:00
Borislav Petkov
cf1d2200db EDAC, MCE: Add a BIT_64() macro
Add a macro for 64-bit vectors to use when accessing MSR contents.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
fda7561f43 EDAC, MCE: Enable MCE decoding on F12h
Turn on MCE decoding on F12h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
cb9d5ecdff EDAC, MCE: Add F12h NB MCE decoder
F12h is completely covered by the generic path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
e7281eb37d EDAC, MCE: Add F12h IC MCE decoder
... which is the same as for K8 and F10h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
9be0bb1072 EDAC, MCE: Add F12h DC MCE decoder
F12h DC MCE signatures are a subset of F10h's so reuse them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
f0157b3afd EDAC, MCE: Add support for F11h MCEs
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
9530d608ef EDAC, MCE: Enable MCE decoding on F14h
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
fe4ea2623b EDAC, MCE: Fix FR MCEs decoding
Those are N/A on K8, so don't decode them there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
5ce88f6ea6 EDAC, MCE: Complete NB MCE decoders
Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
ded5062328 EDAC, MCE: Warn about LS MCEs on F14h
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
dd53bce4e8 EDAC, MCE: Adjust IC decoders to F14h
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:01 +02:00
Borislav Petkov
888ab8e6eb EDAC, MCE: Adjust DC decoders to F14h
Add a per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
47ca08a40b EDAC, MCE: Rename files
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
9cdeb404a1 EDAC, MCE: Rework MCE injection
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.

Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
30e1f7a812 EDAC: Export edac sysfs class to users.
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
7cfd4a8744 EDAC, MCE: Pass complete MCE info to decoders
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
6337583d7d EDAC, MCE: Sanitize error codes
Clean up error codes names, shorten to mnemonics, add RRRR boundary
checking.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
0ee8efa8f4 EDAC, MCE: Remove unused function parameter
Remove remains from previous functionality.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
c9f281fd96 EDAC, MCE: Add HW_ERR prefix
.. so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00