linux

iv/linux

Author	SHA1	Message	Date
Nick Piggin	4827bbb06e	i386: remove bogus comment about memory barrier The comment being removed by this patch is incorrect and misleading. In the following situation: 1. load ... 2. store 1 -> X 3. wmb 4. rmb 5. load a <- Y 6. store ... 4 will only ensure ordering of 1 with 5. 3 will only ensure ordering of 2 with 6. Further, a CPU with strictly in-order stores will still only provide that 2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU with wmb after every store). In all cases, 5 may still be executed before 2 is visible to other CPUs! The additional piece of the puzzle that mb() provides is the store/load ordering, which fundamentally cannot be achieved with any combination of rmb()s and wmb()s. This can be an unexpected result if one expected any sort of global ordering guarantee to barriers (eg. that the barriers themselves are sequentially consistent with other types of barriers). However sfence or lfence barriers need only provide an ordering partial ordering of memory operations -- Consider that wmb may be implemented as nothing more than inserting a special barrier entry in the store queue, or, in the case of x86, it can be a noop as the store queue is in order. And an rmb may be implemented as a directive to prevent subsequent loads only so long as their are no previous outstanding loads (while there could be stores still in store queues). I can actually see the occasional load/store being reordered around lfence on my core2. That doesn't prove my above assertions, but it does show the comment is wrong (unless my program is -- can send it out by request). So: mb() and smp_mb() always have and always will require a full mfence or lock prefixed instruction on x86. And we should remove this comment. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Paul McKenney <paulmck@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-29 09:13:59 -07:00
Len Brown	5a16eff86d	Pull bugzilla-1641 into release branch	2007-08-24 22:19:05 -04:00
Rolf Eike Beer	5ca2481424	PCI: Document pci_iomap() This useful interface is hardly mentioned anywhere in the in-tree documentation. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Tejun Heo <htejun@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-08-22 14:48:40 -07:00
Len Brown	61ec7567db	ACPI: boot correctly with "nosmp" or "maxcpus=0" In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled. However, in ACPI mode, these parameters didn't completely disable the IO APIC initialization code and boot failed. init/main.c: Disable the IO_APIC if "nosmp" or "maxcpus=0" undefine disable_ioapic_setup() when it doesn't apply. i386: delete ioapic_setup(), it was a duplicate of parse_noapic() delete undefinition of disable_ioapic_setup() x86_64: rename disable_ioapic_setup() to parse_noapic() to match i386 define disable_ioapic_setup() in header to match i386 http://bugzilla.kernel.org/show_bug.cgi?id=1641 Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>	2007-08-21 00:33:35 -04:00
Daniel Gollub	0328ecef90	x86_64: Fix to keep watchdog disabled by default for i386/x86_64 Fixed wrong expression which enabled watchdogs even if nmi_watchdog kernel parameter wasn't set. This regression got slightly introduced with commit `b7471c6da9`. Introduced NMI_DISABLED (-1) which allows to switch the value of NMI_DEFAULT without breaking the APIC NMI watchdog code (again). Fixes: https://bugzilla.novell.com/show_bug.cgi?id=298084 http://bugzilla.kernel.org/show_bug.cgi?id=7839 And likely some more nmi_watchdog=0 related issues. Signed-off-by: Daniel Gollub <dgollub@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-18 10:25:25 -07:00
Satyam Sharma	62be90012c	i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert() Use cpu_relax() in the busy loops, as atomic_read() doesn't automatically imply volatility for i386 and x86_64. x86_64 doesn't have this issue because it open-codes the while loop in smpboot.c:smp_callin() itself that already uses cpu_relax(). For i386, however, smpboot.c:smp_callin() calls wait_for_init_deassert() which is buggy for mach-default and mach-es7000 cases. [ I test-built a kernel -- smp_callin() itself got inlined in its only callsite, smpboot.c:start_secondary() -- and the relevant piece of code disassembles to the following: 0xc1019704 <start_secondary+12>: mov 0xc144c4c8,%eax 0xc1019709 <start_secondary+17>: test %eax,%eax 0xc101970b <start_secondary+19>: je 0xc1019709 <start_secondary+17> init_deasserted (at 0xc144c4c8) gets fetched into %eax only once and then we loop over the test of the stale value in the register only, so these look like real bugs to me. With the fix below, this becomes: 0xc1019706 <start_secondary+14>: pause 0xc1019708 <start_secondary+16>: cmpl $0x0,0xc144c4c8 0xc101970f <start_secondary+23>: je 0xc1019706 <start_secondary+14> which looks nice and healthy. ] Thanks to Heiko Carstens for noticing this. Signed-off-by: Satyam Sharma <satyam@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-18 09:54:44 -07:00
Andi Kleen	d3f7eae182	i386: Use global flag to disable broken local apic timer on AMD CPUs. The Averatec 2370 and some other Turion laptop BIOS seems to program the ENABLE_C1E MSR inconsistently between cores. This confuses the lapic use heuristics because when C1E is enabled anywhere it seems to affect the complete chip. Use a global flag instead of a per cpu flag to handle this. If any CPU has C1E enabled disabled lapic use. Thanks to Cal Peake for debugging. Cc: tglx@linutronix.de Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:58:13 -07:00
Andi Kleen	ab144f5ec6	i386: Make patching more robust, fix paravirt issue Commit `19d36ccdc3` "x86: Fix alternatives and kprobes to remap write-protected kernel text" uses code which is being patched for patching. In particular, paravirt_ops does patching in two stages: first it calls paravirt_ops.patch, then it fills any remaining instructions with nop_out(). nop_out calls text_poke() which calls lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val): that call site is one of the places we patch. If we always do patching as one single call to text_poke(), we only need make sure we're not patching the memcpy in text_poke itself. This means the prototype to paravirt_ops.patch needs to change, to marshal the new code into a buffer rather than patching in place as it does now. It also means all patching goes through text_poke(), which is known to be safe (apply_alternatives is also changed to make a single patch). AK: fix compilation on x86-64 (bad rusty!) AK: fix boot on x86-64 (sigh) AK: merged with other patches Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:58:13 -07:00
Muli Ben-Yehuda	73c59afc65	finish i386 and x86-64 sysdata conversion This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix "pci=noacpi" on your laptop? Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: <riku.seppala@kymp.net> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:42 -07:00
Andrew Morton	57d4810ea0	revert "x86, serial: convert legacy COM ports to platform devices" Revert `7e92b4fc34`. It broke Sébastien Dugué's machine and Jeff said (persuasively) This seems like it will break decades-long-working stuff, in favor of breaking new ground in our favorite area, "trusting the BIOS." It's just not worth it for serial ports, IMO. Serial ports are something that just shouldn't break at this late stage in the game. My new Intel platform boxes don't even have serial ports, so I question the value of messing with serial port probing even more... because... just wait a year, and your box won't have a serial port either! :) I certainly don't object to the use of platform devices (or isa_driver), but the probe change seems questionable. That's sorta analagous to rewriting the floppy driver probe routine. Sure you could do it... but why risk all that damage and go through debugging all over again? It seems clear from this report that we cannot, should not, trust BIOS for something (a) so simple and (b) that has been working for over a decade. Much discussion ensued and we've decided to have another go at all of this. Cc: Sébastien Dugué <sebastien.dugue@bull.net> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Len Brown <lenb@kernel.org> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Sascha Sommer <saschasommer@freenet.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
Stephane Eranian	a583f1b542	remove unused TIF_NOTIFY_RESUME flag Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The flag was not used excecpt on IA-64 where the patch replaces it with TIF_PERFMON_WORK. Signed-off-by: stephane eranian <eranian@hpl.hp.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
Rafael J. Wysocki	b0cb1a19d0	Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid confusion (among other things, with CONFIG_SUSPEND introduced in the next patch). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 16:45:38 -07:00
H. Peter Anvin	238b706da1	[x86 setup] Make struct ist_info cross-architecture, and use in setup code Make "struct ist_info" valid on both i386 and x86-64, and use the structure by name in the setup code. Additionally, "Intel SpeedStep IST" is redundant, refer to it as IST consistently. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2007-07-25 12:02:21 -07:00
H. Peter Anvin	f77b1ab383	[x86 setup] Fix typos in struct efi_info Fix missing letters in the structure members of struct efi_info. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2007-07-25 12:02:21 -07:00
Len Brown	e8b2fd0122	ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source As it was a synonym for (CONFIG_ACPI && CONFIG_X86), the ifdefs for it were more clutter than they were worth. For ia64, just add a few stubs in anticipation of future S3 or S4 support. Signed-off-by: Len Brown <len.brown@intel.com>	2007-07-25 01:29:39 -04:00
Andi Kleen	e05aff854c	i386: Use patchable lock prefix in set_64bit Previously lock was unconditionally used, but shouldn't be needed on UP systems. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:03:38 -07:00
Juergen Beisert	f25f64ed5b	x86: Replace NSC/Cyrix specific chipset access macros by inlined functions. Due to index register access ordering problems, when using macros a line like this fails (and does nothing): setCx86(CX86_CCR2, getCx86(CX86_CCR2) \| 0x88); With inlined functions this line will work as expected. Note about a side effect: Seems on Geode GX1 based systems the "suspend on halt power saving feature" was never enabled due to this wrong macro expansion. With inlined functions it will be enabled, but this will stop the TSC when the CPU runs into a HLT instruction. Kernel output something like this: Clocksource tsc unstable (delta = -472746897 ns) This is the 3rd version of this patch. - Adding missed arch/i386/kernel/cpu/mtrr/state.c Thanks to Andres Salomon - Adding some big fat comments into the new header file Suggested by Andi Kleen AK: fixed x86-64 compilation Signed-off-by: Juergen Beisert <juergen@kreuzholzen.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:03:38 -07:00
Andi Kleen	8f4e956b31	x86: Stop MCEs and NMIs during code patching When a machine check or NMI occurs while multiple byte code is patched the CPU could theoretically see an inconsistent instruction and crash. Prevent this by temporarily disabling MCEs and returning early in the NMI handler. Based on discussion with Mathieu Desnoyers. Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:03:37 -07:00
Andi Kleen	19d36ccdc3	x86: Fix alternatives and kprobes to remap write-protected kernel text Reenable kprobes and alternative patching when the kernel text is write protected by DEBUG_RODATA Add a general utility function to change write protected text. The new function remaps the code using vmap to write it and takes care of CPU synchronization. It also does CLFLUSH to make icache recovery faster. There are some limitations on when the function can be used, see the comment. This is a newer version that also changes the paravirt_ops code. text_poke also supports multi byte patching now. Contains bug fixes from Zach Amsden and suggestions from Mathieu Desnoyers. Cc: Jan Beulich <jbeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:03:37 -07:00
Muli Ben-Yehuda	08f1c192c3	x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:14 -07:00
Andres Salomon	f62e518484	i386: basic infrastructure support for AMD geode-class machines This builds upon the existing geode infrastructure, but adds southbridge support, some GPIO functions, and a header file (asm-i386/geode.h) with some useful GX/LX detection tests. The majority of this code was written by Jordan Crouse. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andres Salomon <dilinger@debian.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:14 -07:00
Thomas Gleixner	a2900975ef	i386: move PIT function declarations and constants to correct header file setup_pit_timer is declared in asm-i386/timer.h. Move it to the pit header file, so it can be used by x86_64 as well. Move also the PIT constants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:14 -07:00
Robert P. J. Day	48dd9343d0	i386: replace hard-coded constant with appropriate macro from kernel.h Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:13 -07:00
Andreas Mohr	267eb01a62	i386: add cpu_relax() to cmos_lock() Add cpu_relax() to cmos_lock() inline function for faster operation on SMT CPUs and less power consumption on others in case of lock contention (which probably doesn't happen too often, so admittedly this patch is not too exciting). [akpm@linux-foundation.org: Include the header file for cpu_relax()] Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:13 -07:00
Rafael J. Wysocki	1c10070a55	i386: do not restore reserved memory after hibernation On some systems the ACPI NVS area is located in the first 1 MB of RAM and it is overwritten by the i386 code during the restore after hibernation. This confuses the ACPI platform firmware that doesn't update the AC adapter status appropriately as a result (http://bugzilla.kernel.org/show_bug.cgi?id=7995). The solution is to register the reserved memory in the first 1 MB as 'nosave', so that swsusp doesn't touch it during the restore. Also, this has been done on x86_64 for a long time now, so this patch makes the i386 restore code behave like the x86_64 one. [akpm@linux-foundation.org: build fix] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:12 -07:00
Adrian Bunk	9596017e79	x86: remove support for the Rise CPU The Rise CPUs were only very short-lived, and there are no reports of anyone both owning one and running Linux on it. Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg available online. If it turns out that against all expectations there are actually users reverting this patch would be easy. This patch will make the kernel images smaller by a few bytes for all i386 users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:10 -07:00
Andrew Morton	459029541d	i386: add reference to the arguments Prevent stuff like this: mm/vmalloc.c: In function 'unmap_kernel_range': mm/vmalloc.c:75: warning: unused variable 'start' Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:10 -07:00
Nigel Cunningham	44bf4cea43	x86: PM_TRACE support Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:10 -07:00
Truxton Fulton	8b93789808	i386: fix machine rebooting Commit `59f4e7d572` fixed machine rebooting on Truxton's machine (when no keyboard was present). But it broke it on Lee's machine. The patch reinstates the old (pre-59f4e7d572980a521b7bdba74ab71b21f5995538) code and if that doesn't work out, try the new, post-59f4e7d572980a521b7bdba74ab71b21f5995538 code instead. Cc: Lee Garrett <lee-in-berlin@web.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:10 -07:00
Jan Beulich	d5321abe6a	i386: minor nx handling adjustment Constrain __supported_pte_mask and NX handling to just the PAE kernel. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:09 -07:00
Thomas Gleixner	0655d7c32b	x86: share hpet.h with i386 hpet.h in asm-i386 and asm-x86_64 contain tons of duplicated stuff. Consolidate into one shared header file. AK: Fix i386 compilation with !X86_IO_APIC Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:09 -07:00
Chris Wright	bef9f9de32	i386: remove pit_interrupt_hook Remove pit_interrupt_hook as it adds just an extra layer. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:08 -07:00
Andi Kleen	b520b85a96	i386: Move all simple string operations out of line The compiler generally generates reasonable inline code for the simple cases and for the rest it's better for code size for them to be out of line. Also there they can be potentially optimized more in the future. In fact they probably should be in a .S file because they're all pure assembly, but that's for another day. Also some code style cleanup on them while I was on it (this seems to be the last untouched really early Linux code) This saves ~12k text for a defconfig kernel with gcc 4.1. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:08 -07:00
Thomas Gleixner	82644459c5	NTP: move the cmos update code into ntp.c i386 and sparc64 have the identical code to update the cmos clock. Move it into kernel/time/ntp.c as there are other architectures coming along with the same requirements. [akpm@linux-foundation.org: build fixes] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:15 -07:00
Avi Kivity	2d9ce177e6	i386: Allow KVM on i386 nonpae Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of the cmpxchg64b feature and enables compilation of the set_64bit() family. Since the option is dependent on PAE, and since KVM depends on set_64bit(), this effectively disables KVM on i386 nopae. Simplify by removing the config option altogether: the boot check is made dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed without constraints. It is up to users to check for the feature flag (KVM does not as virtualiation extensions imply its existence). Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 14:37:05 -07:00
Ralf Baechle	c41917df8a	[PATCH] sched: sched_cacheflush is now unused Since Ingo's recent scheduler rewrite which was merged as commit `0437e109e1` sched_cacheflush is unused. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-19 21:28:35 +02:00
Rusty Russell	d7e28ffe6c	lguest: the host code This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Peter Zijlstra	b111757c50	arch: personality independent stack top New arch macro STACK_TOP_MAX it gives the larges valid stack address for the architecture in question. It differs from STACK_TOP in that it will not distinguish between personalities but will always return the largest possible address. This is used to create the initial stack on execve, which we will move down to the proper location once the binfmt code has figured out where that is. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Fenghua Yu	5fb7dc37dc	define new percpu interface for shared data per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Ellerman	9e367d8592	jprobes: remove JPROBE_ENTRY() AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Linus Torvalds	d756d10e24	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: extent macros cleanup Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. ext4: remove extra IS_RDONLY() check ext4: Use is_power_of_2() Use zero_user_page() in ext4 where possible ext4: Remove 65000 subdirectory limit ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields ext4: Add nanosecond timestamps jbd2: Move jbd2-debug file to debugfs jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices ext4: Make extents code sanely handle on-disk corruption ext4: copy i_flags to inode flags on write ext4: Enable extents by default Change on-disk format to support 2^15 uninitialized extents write support for preallocated blocks fallocate support in ext4 sys_fallocate() implementation on i386, x86_64 and powerpc	2007-07-18 10:32:00 -07:00
Jeremy Fitzhardinge	4bac07c993	xen: add the Xenbus sysfs and virtual device hotplug driver This communicates with the machine control software via a registry residing in a controlling virtual machine. This allows dynamic creation, destruction and modification of virtual device configurations (network devices, block devices and CPUS, to name some examples). [ Greg, would you mind giving this a review? Thanks -J ] Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Greg KH <greg@kroah.com>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	5ead97c84f	xen: Core Xen implementation This patch is a rollup of all the core pieces of the Xen implementation, including: - booting and setup - pagetable setup - privileged instructions - segmentation - interrupt flags - upcalls - multicall batching BOOTING AND SETUP The vmlinux image is decorated with ELF notes which tell the Xen domain builder what the kernel's requirements are; the domain builder then constructs the address space accordingly and starts the kernel. Xen has its own entrypoint for the kernel (contained in an ELF note). The ELF notes are set up by xen-head.S, which is included into head.S. In principle it could be linked separately, but it seems to provoke lots of binutils bugs. Because the domain builder starts the kernel in a fairly sane state (32-bit protected mode, paging enabled, flat segments set up), there's not a lot of setup needed before starting the kernel proper. The main steps are: 1. Install the Xen paravirt_ops, which is simply a matter of a structure assignment. 2. Set init_mm to use the Xen-supplied pagetables (analogous to the head.S generated pagetables in a native boot). 3. Reserve address space for Xen, since it takes a chunk at the top of the address space for its own use. 4. Call start_kernel() PAGETABLE SETUP Once we hit the main kernel boot sequence, it will end up calling back via paravirt_ops to set up various pieces of Xen specific state. One of the critical things which requires a bit of extra care is the construction of the initial init_mm pagetable. Because Xen places tight constraints on pagetables (an active pagetable must always be valid, and must always be mapped read-only to the guest domain), we need to be careful when constructing the new pagetable to keep these constraints in mind. It turns out that the easiest way to do this is use the initial Xen-provided pagetable as a template, and then just insert new mappings for memory where a mapping doesn't already exist. This means that during pagetable setup, it uses a special version of xen_set_pte which ignores any attempt to remap a read-only page as read-write (since Xen will map its own initial pagetable as RO), but lets other changes to the ptes happen, so that things like NX are set properly. PRIVILEGED INSTRUCTIONS AND SEGMENTATION When the kernel runs under Xen, it runs in ring 1 rather than ring 0. This means that it is more privileged than user-mode in ring 3, but it still can't run privileged instructions directly. Non-performance critical instructions are dealt with by taking a privilege exception and trapping into the hypervisor and emulating the instruction, but more performance-critical instructions have their own specific paravirt_ops. In many cases we can avoid having to do any hypercalls for these instructions, or the Xen implementation is quite different from the normal native version. The privileged instructions fall into the broad classes of: Segmentation: setting up the GDT and the GDT entries, LDT, TLS and so on. Xen doesn't allow the GDT to be directly modified; all GDT updates are done via hypercalls where the new entries can be validated. This is important because Xen uses segment limits to prevent the guest kernel from damaging the hypervisor itself. Traps and exceptions: Xen uses a special format for trap entrypoints, so when the kernel wants to set an IDT entry, it needs to be converted to the form Xen expects. Xen sets int 0x80 up specially so that the trap goes straight from userspace into the guest kernel without going via the hypervisor. sysenter isn't supported. Kernel stack: The esp0 entry is extracted from the tss and provided to Xen. TLB operations: the various TLB calls are mapped into corresponding Xen hypercalls. Control registers: all the control registers are privileged. The most important is cr3, which points to the base of the current pagetable, and we handle it specially. Another instruction we treat specially is CPUID, even though its not privileged. We want to control what CPU features are visible to the rest of the kernel, and so CPUID ends up going into a paravirt_op. Xen implements this mainly to disable the ACPI and APIC subsystems. INTERRUPT FLAGS Xen maintains its own separate flag for masking events, which is contained within the per-cpu vcpu_info structure. Because the guest kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely ignored (and must be, because even if a guest domain disables interrupts for itself, it can't disable them overall). (A note on terminology: "events" and interrupts are effectively synonymous. However, rather than using an "enable flag", Xen uses a "mask flag", which blocks event delivery when it is non-zero.) There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which are implemented to manage the Xen event mask state. The only thing worth noting is that when events are unmasked, we need to explicitly see if there's a pending event and call into the hypervisor to make sure it gets delivered. UPCALLS Xen needs a couple of upcall (or callback) functions to be implemented by each guest. One is the event upcalls, which is how events (interrupts, effectively) are delivered to the guests. The other is the failsafe callback, which is used to report errors in either reloading a segment register, or caused by iret. These are implemented in i386/kernel/entry.S so they can jump into the normal iret_exc path when necessary. MULTICALL BATCHING Xen provides a multicall mechanism, which allows multiple hypercalls to be issued at once in order to mitigate the cost of trapping into the hypervisor. This is particularly useful for context switches, since the 4-5 hypercalls they would normally need (reload cr3, update TLS, maybe update LDT) can be reduced to one. This patch implements a generic batching mechanism for hypercalls, which gets used in many places in the Xen code. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Ian Pratt <ian.pratt@xensource.com> Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Cc: Adrian Bunk <bunk@stusta.de>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	a42089dd35	xen: Add Xen interface header files Add Xen interface header files. These are taken fairly directly from the Xen tree, but somewhat rearranged to suit the kernel's conventions. Define macros and inline functions for doing hypercalls into the hypervisor. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	688340ea34	Add a sched_clock paravirt_op The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	d572929cdd	paravirt: helper to disable all IO space In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	c70df74376	paravirt: make siblingmap functions visible Paravirt implementations need to set the sibling map on new cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	724faa89cc	paravirt: unstatic smp_store_cpu_info Paravirt implementations need to store cpu info when bringing up cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	5378701324	paravirt: unstatic leave_mm Make globally leave_mm visible, specifically so that Xen can use it to shoot-down lazy uses of cr3. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	03f0c2f950	paravirt: increase IRQ limit When running with CONFIG_PARAVIRT, we may want lots of IRQs even if there's no IO APIC. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>	2007-07-18 08:47:41 -07:00

1 2 3 4 5 ...

782 Commits