12781 Commits

Author SHA1 Message Date
Thomas Gleixner
ed972ccf43 x86: ioapic: Split out the nested loop in setup_IO_APIC_irqs()
Two consecutive

    for(...)
    for(...)

lines to avoid an extra indentation are just horrible to read. I had
to look more than once to figure out what the code is doing.

Split out the inner loop into a separate function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-23 17:26:49 +01:00
Thomas Gleixner
c8d6b8fe72 x86: ioapic: Remove silly debug bloat in setup_IOAPIC_irqs()
This is debug code and it does not matter at all whether we print each
not connected pin in an extra line or try to be extra clever.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-23 17:26:48 +01:00
Thomas Gleixner
939d578ecc x86: OLPC: Make OLPC=n build again
Stupid me missed the functions called from setup.c. Add the stubs back
for OLPC=n

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-23 11:54:02 +01:00
Henrik Kretzschmar
1444e0c9da x86: Fix deps of X86_UP_IOAPIC
Since commit 7cd92366a593246650cc7d6198e2c7d3af8c1d8a
lAPIC enabled accidently the IOAPIC, which now gets fixed.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-5-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:38:46 +01:00
Henrik Kretzschmar
7d0f192613 x86: Add dummy functions for compiling without IOAPIC
This patch adds IOAPIC dummy functions for compilation
with local APIC, but without IOAPIC.

The local variable ioapic_entries in enable_IR_x2apic()
does not need initialization anymore, since the dummy
returns NULL.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-4-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:38:46 +01:00
Henrik Kretzschmar
7167d08e78 x86: Rework arch_disable_smp_support() for x86
Currently arch_disable_smp_support() on x86 disables only the
support for the IOAPIC and is also compiled in if SMP-support is
not.

Therefore this function is renamed to disable_ioapic_support(),
which meets its purpose and is only compiled in the kernel
when IOAPIC support is also.

A new arch_disable_smp_support() is created in smpboot.c,
which calls disable_ioapic_support() and gets only compiled
in the kernel when SMP support is also.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-3-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:38:45 +01:00
Henrik Kretzschmar
b6a1432da8 x86: Add dummy mp_save_irq()
This is a dummy function, used when no IOAPIC is compiled in.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-2-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:38:45 +01:00
Henrik Kretzschmar
4e034b2451 x86: Move ioapic_irq_destination_types to apicdef.h
This enum is used by non IOAPIC code, so apicdef.h is
the best place for it.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-1-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23 11:38:44 +01:00
Thomas Gleixner
c2a941fadb x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection
OLPC_OPENFIRMWARE_DT is just there to be selected by OLPC and selects
OF_PROMTREE. So let OLPC select OF_PROMTREE and remove that extra
config indirection. Fixup code and Makefile and use CONFIG_OF_PROMTREE
instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>
2011-02-23 10:40:45 +01:00
Thomas Gleixner
dc3119e700 x86: OLPC: Cleanup config maze completely
Neither CONFIG_OLPC_OPENFIRMWARE nor CONFIG_OLPC_OPENFIRMWARE_DT are
really necessary.

OLPC selects OLPC_OPENFIRMWARE unconditionally, so move the "select
OF" part under OLPC config option and fixup the dependencies in
Makefiles and code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>
2011-02-23 10:40:45 +01:00
Thomas Gleixner
fe239545a1 x86: OLPC: Hide OLPC_OPENFIRMWARE config switch
OLPC selects OLPC_OPENFIRMWARE unconditionally. If OLPC=n then
the OLPC_OPENFIRMWARE functionality is pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>
2011-02-23 10:40:45 +01:00
Thomas Gleixner
540089798d x86: OLPC: Remove redundant !X64_64 config dependency
OLPC is under if X86_32 already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>
2011-02-23 10:40:45 +01:00
Thomas Gleixner
7acdbb3f35 Merge branch 'linus' into x86/platform
Reason: Import mainline device tree changes on which further patches
        depend on or conflict.

Trivial conflict in: drivers/spi/pxa2xx_spi_pci.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-23 09:21:41 +01:00
Zhang, Fengzhe
2f14ddc3a7 xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.
With the hypervisor argument of dom0_mem=X we iterate over the physical
(only for the initial domain) E820 and subtract the the size from each
E820_RAM region the delta so that the cumulative size of all E820_RAM regions
is equal to 'X'. This sometimes ends up with E820_RAM regions with zero size
(which are removed by e820_sanitize) and E820_RAM that are smaller
than physically.

Later on the PCI API looks at the E820 and attempts to set up an
resource region for the "PCI mem". The E820 (assume dom0_mem=1GB is
set) compared to the physical looks as so:

 [    0.000000] BIOS-provided physical RAM map:
 [    0.000000]  Xen: 0000000000000000 - 0000000000097c00 (usable)
 [    0.000000]  Xen: 0000000000097c00 - 0000000000100000 (reserved)
-[    0.000000]  Xen: 0000000000100000 - 00000000defafe00 (usable)
+[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
 [    0.000000]  Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS)
 [    0.000000]  Xen: 00000000defb1ea0 - 00000000e0000000 (reserved)
 [    0.000000]  Xen: 00000000f4000000 - 00000000f8000000 (reserved)
..
And we get
[    0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:9efafe00)

while it should have started at e0000000 (a nice big gap up to
f4000000 exists). The "Allocating PCI" is part of the resource API.

The users that end up using those PCI I/O regions usually supply their
own BARs when calling the resource API (request_resource, or allocate_resource),
but there are exceptions which provide an empty 'struct resource' and
expect the API to provide the 'struct resource' to be populated with valid values.
The one that triggered this bug was the intel AGP driver that requested
a region for the flush page (intel_i9xx_setup_flush).

Before this patch, when running under Xen hypervisor, the 'struct resource'
returned could have (depending on the dom0_mem size) physical ranges of a 'System RAM'
instead of 'I/O' regions. This ended up with the Hypervisor failing a request
to populate PTE's with those PFNs as the domain did not have access to those
'System RAM' regions (rightly so).

After this patch, the left-over E820_RAM region from the truncation, will be
labeled as E820_UNUSABLE. The E820 will look as so:

 [    0.000000] BIOS-provided physical RAM map:
 [    0.000000]  Xen: 0000000000000000 - 0000000000097c00 (usable)
 [    0.000000]  Xen: 0000000000097c00 - 0000000000100000 (reserved)
-[    0.000000]  Xen: 0000000000100000 - 00000000defafe00 (usable)
+[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
+[    0.000000]  Xen: 0000000040000000 - 00000000defafe00 (unusable)
 [    0.000000]  Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS)
 [    0.000000]  Xen: 00000000defb1ea0 - 00000000e0000000 (reserved)
 [    0.000000]  Xen: 00000000f4000000 - 00000000f8000000 (reserved)

For more information:
http://mid.gmane.org/1A42CE6F5F474C41B63392A5F80372B2335E978C@shsmsx501.ccr.corp.intel.com

BugLink: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726

Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-22 12:48:50 -05:00
Thomas Gleixner
695884fb8a Merge branch 'devicetree/for-x86' of git://git.secretlab.ca/git/linux-2.6 into x86/platform
Reason: x86 devicetree support for ce4100 depends on those device tree
	changes scheduled for .39.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-22 18:41:48 +01:00
Joerg Roedel
2c46d2aec0 KVM: SVM: Advance instruction pointer in dr_intercept
In the dr_intercept function a new cpu-feature called
decode-assists is implemented and used when available. This
code-path does not advance the guest-rip causing the guest
to dead-loop over mov-dr instructions. This is fixed by this
patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-02-22 16:01:44 +02:00
Yinghai Lu
2bf50555b0 x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
Alloc code is much bigger the distance setting.  Separate it out into
numa_alloc_distance() for readability.

-v2: Let alloc_numa_distance to return -ENOMEM on failing path,
     requested by tj.

-tj: Description update.  Minor tweaks including function name,
     location and return value check.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-02-22 11:18:49 +01:00
Tejun Heo
90e6b677b4 x86-64, NUMA: Add proper function comments to global functions
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
2011-02-22 11:10:08 +01:00
Tejun Heo
b8ef9172b2 x86-64, NUMA: Move NUMA emulation into numa_emulation.c
Create numa_emulation.c and move all NUMA emulation code there.  The
definitions of struct numa_memblk and numa_meminfo are moved to
numa_64.h.  Also, numa_remove_memblk_from(), numa_cleanup_meminfo(),
numa_reset_distance() along with numa_emulation() are made global.

- v2: Internal declarations moved to numa_internal.h as suggested by
      Yinghai.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
2011-02-22 11:10:08 +01:00
Tejun Heo
fbe99959d1 x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file
Update numa_emulation() such that, it

- takes @numa_meminfo and @numa_dist_cnt instead of directly
  referencing the global variables.

- copies the distance table by iterating each distance with
  node_distance() instead of memcpy'ing the distance table.

- tests emu_cmdline to determine whether emulation is requested and
  fills emu_nid_to_phys[] with identity mapping if emulation is not
  used.  This allows the caller to call numa_emulation()
  unconditionally and makes return value unncessary.

- defines dummy version if CONFIG_NUMA_EMU is disabled.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
2011-02-22 11:10:08 +01:00
Yinghai Lu
69efcc6d90 x86-64, NUMA: Do not scan two times for setup_node_bootmem()
By the time setup_node_bootmem() is called, all the memblocks are
already registered.  As node_data is allocated from these memblocks,
calling it more than once doesn't make any difference.  Drop the loop.

tj: Dropped comment referencing to the old behavior as suggested by
    David and rephrased the description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-02-21 11:23:31 +01:00
Kushal Koolwal
e19e074b15 x86: Fix reboot problem on VersaLogic Menlow boards
VersaLogic Menlow based boards hang on reboot unless reboot=bios
is used. Add quirk to reboot through the BIOS.

Tested on at least four boards.

Signed-off-by: Kushal Koolwal <kushalkoolwal@gmail.com>
LKML-Reference: <1298152563-21594-1-git-send-email-kushalkoolwal@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-21 08:41:26 +01:00
Dan Carpenter
1396fa9cd2 x86, microcode, AMD: Fix signedness bug in generic_load_microcode()
install_equiv_cpu_table() returns type int.  It uses negative
error codes so using an unsigned type breaks the error handling.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: open list:AMD MICROCODE UPD... <amd64-microcode@amd64.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20110218091716.GA4384@bicker>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-20 14:01:32 +01:00
Borislav Petkov
2b15cd96e5 x86, system.h: Drop unused __SAVE/__RESTORE macros
Those are unused since at least the beginning of git history.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1298044056-31104-1-git-send-email-bp@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-20 13:38:47 +01:00
H. Peter Anvin
1c4badbdea x86-64, trampoline: Remove unused variable
Removed unused variable left over from development.

Reported-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <AANLkTik6UJ680mWJcu_W+jerLcqPjwjvaXyxB1jAMaG0@mail.gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
2011-02-18 15:50:36 -08:00
H. Peter Anvin
ee1b06ea6a x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot
The initial version of this patch had %eax being a segment and %ecx
being the mode.  I had changed the interfaces, but not the actual
implementation!

Reported-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <AANLkTikxqk=HEw9R-Du=v-1ti1HDGAY9vaNUep2XARaz@mail.gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
2011-02-18 15:47:42 -08:00
jacob.jun.pan@linux.intel.com
5df91509d3 x86: mrst: Remove apb timer read workaround
APB timer current count was unreliable in the earlier silicon, which
could result in time going backwards. This problem has been fixed in
the current silicon stepping. This patch removes the workaround which
was used to check and prevent timer rolling back when APB timer is
used as clocksource device.

The workaround code was also flawed by potential race condition
around the cached read value last_read. Though a fix can be done
by assigning last_read to a local variable at the beginning of
apbt_read_clocksource(), but this is not necessary anymore.

[ tglx: A sane timer on an Intel chip - I can't believe it ]

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
LKML-Reference: <1298065374-25532-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-18 23:14:54 +01:00
Konrad Rzeszutek Wilk
3d74a539ae pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
This code path is only run when an MSI/MSI-X PCI device is passed
in to PV DomU.

In 2.6.37 time-frame we over-wrote the default cleanup handler for
MSI/MSI-X irq->desc to be "xen_teardown_msi_irqs". That function
calls the the xen-pcifront driver which can tell the backend to
cleanup/take back the MSI/MSI-X device.

However, we forgot to continue the process of free-ing the MSI/MSI-X
device resources (irq->desc) in the PV domU side. Which is what
the default cleanup handler: default_teardown_msi_irqs did.

Hence we would leak IRQ descriptors.

Without this patch, doing "rmmod igbvf;modprobe igbvf" multiple
times ends with abandoned IRQ descriptors:

 28:          5  xen-pirq-pcifront-msi-x
 29:          8  xen-pirq-pcifront-msi-x
...
130:         10  xen-pirq-pcifront-msi-x

with the end result of running out of IRQ descriptors.

Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-18 12:41:53 -05:00
Konrad Rzeszutek Wilk
cc0f89c4a4 pci/xen: Cleanup: convert int** to int[]
Cleanup code. Cosmetic change to make the code look easier
to read.

Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-18 12:41:49 -05:00
Sebastian Andrzej Siewior
13884c6680 x86/pci: Remove unused variable
|arch/x86/pci/ce4100.c: In function `ce4100_conf_read':
|arch/x86/pci/ce4100.c:257:9: warning: unused variable `retval'

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: dirk.brandewie@gmail.com
LKML-Reference: <1292600033-12271-16-git-send-email-bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-18 16:57:18 +01:00
Konrad Rzeszutek Wilk
55cb8cd45e pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
xen_allocate_pirq -> xen_map_pirq_gsi -> PHYSDEVOP_alloc_irq_vector IFF
xen_initial_domain() in addition to the kernel side book-keeping side of
things (set chip and handler, update irq_info etc) whereas
xen_allocate_pirq_msi just does the kernel book keeping.

Also xen_allocate_pirq allocates an IRQ in the 1-1 GSI space whereas
xen_allocate_pirq_msi allocates a dynamic one in the >GSI IRQ space.

All of this is uneccessary as this code path is only executed
when we run as a domU PV guest with an MSI/MSI-X PCI card passed in.
Hence we can jump straight to allocating an dynamic IRQ (and
binding it to the proper PIRQ) and skip the rest.

In short: this change is a cosmetic one.

Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-18 09:26:37 -05:00
Jan Beulich
58bff947e2 x86: Eliminate pointless adjustment attempts in fixup_irqs()
Not only when an IRQ's affinity equals cpu_online_mask is there
no need to actually try to adjust the affinity, but also when
it's a subset thereof. This particularly avoids adjustment
attempts during system shutdown to any IRQs bound to CPU#0.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Gary Hade <garyhade@us.ibm.com>
LKML-Reference: <4D5D52C2020000780003272C@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18 08:58:00 +01:00
Jan Beulich
02ca752e41 x86: Remove die_nmi()
With no caller left, the function and the DIE_NMIWATCHDOG
enumerator can both go away.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Don Zickus <dzickus@redhat.com>
LKML-Reference: <4D5D521C0200007800032702@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18 08:54:05 +01:00
Jan Beulich
fd8fa4d3dd x86: Combine printk()s in show_regs_common()
Printing a single character alone when there's an immediately
following printk() is pretty pointless (and wasteful).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D5D535A0200007800032730@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18 08:52:30 +01:00
Jan Beulich
bb3e6251a6 x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler()
show_regs() already prints two(!) stack traces, no need for a third one.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D5D512902000078000326EE@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18 08:52:29 +01:00
H. Peter Anvin
3d35ac346e x86, reboot: Move the real-mode reboot code to an assembly file
Move the real-mode reboot code out to an assembly file (reboot_32.S)
which is allocated using the common lowmem trampoline allocator.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
2011-02-17 21:05:34 -08:00
H. Peter Anvin
014eea518a x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly
Make the GDT_ENTRY() macro in  <asm/segment.h> safe for use in
assembly code by guarding the ULL suffixes with _AC() macros.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2011-02-17 21:05:13 -08:00
H. Peter Anvin
d1ee433539 x86, trampoline: Use the unified trampoline setup for ACPI wakeup
Use the unified trampoline allocation setup to allocate and install
the ACPI wakeup code in low memory.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2011-02-17 21:05:06 -08:00
H. Peter Anvin
4822b7fc6d x86, trampoline: Common infrastructure for low memory trampolines
Common infrastructure for low memory trampolines.  This code installs
the trampolines permanently in low memory very early.  It also permits
multiple pieces of code to be used for this purpose.

This code also introduces a standard infrastructure for computing
symbol addresses in the trampoline code.

The only change to the actual SMP trampolines themselves is that the
64-bit trampoline has been made reusable -- the previous version would
overwrite the code with a status variable; this moves the status
variable to a separate location.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2011-02-17 21:02:43 -08:00
Len Brown
bfb53ccf1c intel_idle: disable Atom/Lincroft HW C-state auto-demotion
Just as we had to disable auto-demotion for NHM/WSM,
we need to do the same for Atom (Lincroft version).

In particular, auto-demotion will prevent Lincroft
from entering the S0i3 idle power saving state.

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17 17:08:48 -05:00
Len Brown
14796fca2b intel_idle: disable NHM/WSM HW C-state auto-demotion
Hardware C-state auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead demoting to a shallower state,
which is less expensive, but saves less power.

Modern Linux should generally get exactly the states it requests.
In particular, when a CPU is taken off-line, it must not be demoted, else
it can prevent the entire package from reaching deep C-states.

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17 17:08:46 -05:00
Yinghai Lu
6d496f9f23 x86-64, NUMA: Put dummy_numa_init() in the init section
dummy_numa_init() is used only during system boot.  Put it in .init
like other NUMA init functions.

- tj: Description update.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-02-17 15:04:20 +01:00
Yinghai Lu
2ca230baeb x86-64, NUMA: Don't call __pa() with invalid address in numa_reset_distance()
Do not call __pa(numa_distance) if it was not allocated before.
Calling with invalid address triggers VIRTUAL_BUG_ON() in
__phys_addr() if CONFIG_DEBUG_VIRTUAL.

Also reported by Ingo.

 http://thread.gmane.org/gmane.linux.kernel/1101306/focus=1101785

- v2: Change to check existing path as tj requested.
- tj: Description update.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
2011-02-17 15:03:43 +01:00
Akinobu Mita
da1016df85 x86: Use bitmap library functions
Use bitmap_set()/bitmap_clear() to fill/zero a region of a
bitmap instead of doing set_bit()/clear_bit() each bit.

This change has been tested with ioperm() and there's no
change in behavior.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <1297867715-20394-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-17 14:59:22 +01:00
Tejun Heo
e23bba6044 x86-64, NUMA: Unify emulated distance mapping
NUMA emulation needs to update node distance information.  It did it
by remapping apicid to PXM mapping, even when amdtopology is being
used.  There is no reason to go through such convolution.  The generic
code has all the information necessary to transform the distance table
to the emulated nid space.

Implement generic distance table transformation in numa_emulation()
and drop private implementations in srat_64 and amdtopology_64.  This
makes find_node_by_addr() and fake_physnodes() and related functions
unnecessary, drop them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00
Tejun Heo
6b78cb549b x86-64, NUMA: Unify emulated apicid -> node mapping transformation
NUMA emulation changes node mappings and thus apicid -> node mapping
needs to be updated accordingly.  srat_64 and amdtopology_64 did this
separately; however, all the necessary information is the mapping from
emulated nodes to physical nodes which is available in
emu_nid_to_phys[].

Implement common __apicid_to_node[] transformation in numa_emulation()
and drop duplicate implementations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00
Tejun Heo
1cca534073 x86-64, NUMA: Emulate directly from numa_meminfo
NUMA emulation built physnodes[] array which could only represent
configurations from the physical meminfo and emulated nodes using the
information.  There's no reason to take this extra level of
indirection.  Update emulation functions so that they operate directly
on numa_meminfo.  This simplifies the code and makes emulation layout
behave better with interleaved physical nodes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00
Tejun Heo
775ee85d7b x86-64, NUMA: Wrap node ID during emulation
Both emulation layout functions - split_nodes[_size]_interleave() -
didn't wrap emulated nid while laying out the fake nodes and tried to
avoid interating over the specified number of nodes, which is fragile.

Now that the emulation code generates numa_meminfo, the node memblks
don't need to be consecutive and emulated node IDs can simply wrap.
This makes the code more robust and is necessary for updates to better
handle the cases where the physical nodes are interleaved.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00
Tejun Heo
c88aea7a70 x86-64, NUMA: Make emulation code build numa_meminfo and share the registration path
NUMA emulation code built nodes[] array and had its own registration
path to set up the emulated nodes.  Update it such that it generates
emulated numa_meminfo and returns control to initmem_init() and shares
the same registration path with non-emulated cases.

Because {acpi|amd}_fake_nodes() expect nodes[] parameter,
fake_physnodes() now generates nodes[] from numa_meminfo.  This will
go away with further updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00
Tejun Heo
9d073caeb3 x86-64, NUMA: Build and use direct emulated nid -> phys nid mapping
NUMA emulation copied physical NUMA configuration into physnodes[] and
used it to reverse-map emulated nodes to physical nodes, which is
unnecessarily convoluted.  Build emu_nid_to_phys[] array to map
emulated nids directly to the matching physical nids and use it in
numa_add_cpu().

physnodes[] will be removed with further patches.

- v2: Build failure when CONFIG_DEBUG_PER_CPU_MAPS due to missing
  local variable definition fixed.  Reported by Ingo.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16 17:11:10 +01:00