linux

iv/linux

Author	SHA1	Message	Date
Paul E. McKenney	cc631fb732	sched: correctly place paranioa memory barriers in synchronize_sched_expedited() The memory barriers must be in the SMP case, not in the !SMP case. Also add a barrier after the atomic_inc() in order to ensure that other CPUs see post-synchronize_sched_expedited() actions as following the expedited grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-05-06 18:49:21 +02:00
Tejun Heo	94458d5ecb	sched: kill paranoia check in synchronize_sched_expedited() The paranoid check which verifies that the cpu_stop callback is actually called on all online cpus is completely superflous. It's guaranteed by cpu_stop facility and if it didn't work as advertised other things would go horribly wrong and trying to recover using synchronize_sched() wouldn't be very meaningful. Kill the paranoid check. Removal of this feature is done as a separate step so that it can serve as a bisection point if something actually goes wrong. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>	2010-05-06 18:49:21 +02:00
Tejun Heo	969c79215a	sched: replace migration_thread with cpu_stop Currently migration_thread is serving three purposes - migration pusher, context to execute active_load_balance() and forced context switcher for expedited RCU synchronize_sched. All three roles are hardcoded into migration_thread() and determining which job is scheduled is slightly messy. This patch kills migration_thread and replaces all three uses with cpu_stop. The three different roles of migration_thread() are splitted into three separate cpu_stop callbacks - migration_cpu_stop(), active_load_balance_cpu_stop() and synchronize_sched_expedited_cpu_stop() - and each use case now simply asks cpu_stop to execute the callback as necessary. synchronize_sched_expedited() was implemented with private preallocated resources and custom multi-cpu queueing and waiting logic, both of which are provided by cpu_stop. synchronize_sched_expedited_count is made atomic and all other shared resources along with the mutex are dropped. synchronize_sched_expedited() also implemented a check to detect cases where not all the callback got executed on their assigned cpus and fall back to synchronize_sched(). If called with cpu hotplug blocked, cpu_stop already guarantees that and the condition cannot happen; otherwise, stop_machine() would break. However, this patch preserves the paranoid check using a cpumask to record on which cpus the stopper ran so that it can serve as a bisection point if something actually goes wrong theree. Because the internal execution state is no longer visible, rcu_expedited_torture_stats() is removed. This patch also renames cpu_stop threads to from "stopper/%d" to "migration/%d". The names of these threads ultimately don't matter and there's no reason to make unnecessary userland visible changes. With this patch applied, stop_machine() and sched now share the same resources. stop_machine() is faster without wasting any resources and sched migration users are much cleaner. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>	2010-05-06 18:49:21 +02:00
Tejun Heo	3fc1f1e27a	stop_machine: reimplement using cpu_stop Reimplement stop_machine using cpu_stop. As cpu stoppers are guaranteed to be available for all online cpus, stop_machine_create/destroy() are no longer necessary and removed. With resource management and synchronization handled by cpu_stop, the new implementation is much simpler. Asking the cpu_stop to execute the stop_cpu() state machine on all online cpus with cpu hotplug disabled is enough. stop_machine itself doesn't need to manage any global resources anymore, so all per-instance information is rolled into struct stop_machine_data and the mutex and all static data variables are removed. The previous implementation created and destroyed RT workqueues as necessary which made stop_machine() calls highly expensive on very large machines. According to Dimitri Sivanich, preventing the dynamic creation/destruction makes booting faster more than twice on very large machines. cpu_stop resources are preallocated for all online cpus and should have the same effect. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>	2010-05-06 18:49:20 +02:00
Tejun Heo	1142d81029	cpu_stop: implement stop_cpu[s]() Implement a simplistic per-cpu maximum priority cpu monopolization mechanism. A non-sleeping callback can be scheduled to run on one or multiple cpus with maximum priority monopolozing those cpus. This is primarily to replace and unify RT workqueue usage in stop_machine and scheduler migration_thread which currently is serving multiple purposes. Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(), stop_cpus() and try_stop_cpus(). This is to allow clean sharing of resources among stop_cpu and all the migration thread users. One stopper thread per cpu is created which is currently named "stopper/CPU". This will eventually replace the migration thread and take on its name. * This facility was originally named cpuhog and lived in separate files but Peter Zijlstra nacked the name and thus got renamed to cpu_stop and moved into stop_machine.c. * Better reporting of preemption leak as per Peter's suggestion. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>	2010-05-06 18:49:20 +02:00
Suresh Siddha	99bd5e2f24	sched: Fix select_idle_sibling() logic in select_task_rq_fair() Issues in the current select_idle_sibling() logic in select_task_rq_fair() in the context of a task wake-up: a) Once we select the idle sibling, we use that domain (spanning the cpu that the task is currently woken-up and the idle sibling that we found) in our wake_affine() decisions. This domain is completely different from the domain(we are supposed to use) that spans the cpu that the task currently woken-up and the cpu where the task previously ran. b) We do select_idle_sibling() check only for the cpu that the task is currently woken-up on. If select_task_rq_fair() selects the previously run cpu for waking the task, doing a select_idle_sibling() check for that cpu also helps and we don't do this currently. c) In the scenarios where the cpu that the task is woken-up is busy but with its HT siblings are idle, we are selecting the task be woken-up on the idle HT sibling instead of a core that it previously ran and currently completely idle. i.e., we are not taking decisions based on wake_affine() but directly selecting an idle sibling that can cause an imbalance at the SMT/MC level which will be later corrected by the periodic load balancer. Fix this by first going through the load imbalance calculations using wake_affine() and once we make a decision of woken-up cpu vs previously-ran cpu, then choose a possible idle sibling for waking up the task on. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-23 11:02:02 +02:00
Peter Zijlstra	669c55e9f9	sched: Pre-compute cpumask_weight(sched_domain_span(sd)) Dave reported that his large SPARC machines spend lots of time in hweight64(), try and optimize some of those needless cpumask_weight() invocations (esp. with the large offstack cpumasks these are very expensive indeed). Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-23 11:02:02 +02:00
Peter Zijlstra	74f5187ac8	sched: Cure load average vs NO_HZ woes Chase reported that due to us decrementing calc_load_task prematurely (before the next LOAD_FREQ sample), the load average could be scewed by as much as the number of CPUs in the machine. This patch, based on Chase's patch, cures the problem by keeping the delta of the CPU going into NO_HZ idle separately and folding that in on the next LOAD_FREQ update. This restores the balance and we get strict LOAD_FREQ period samples. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chase Douglas <chase.douglas@canonical.com> LKML-Reference: <1271934490.1776.343.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-23 11:02:02 +02:00
Mike Galbraith	09a40af524	sched: Fix UP update_avg() build warning update_avg() is only used for SMP builds, move it to the nearest SMP block. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1271309399.14779.17.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-15 09:36:47 +02:00
Ingo Molnar	b257c14ceb	Merge branch 'linus' into sched/core Merge reason: merge the latest fixes, update to -rc4. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-15 09:36:16 +02:00
Linus Torvalds	2ba3abd818	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handling	2010-04-13 17:49:48 -07:00
Linus Torvalds	0fdfe5ad28	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: fix delegated locking NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible NFS: Fix a race with the new commit code NFS: Ensure that writeback_single_inode() calls write_inode() when syncing NFS: Fix the mode calculation in nfs_find_open_context NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR	2010-04-13 15:10:16 -07:00
Linus Torvalds	44d2d371d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Add some more commentary to __raw_local_irq_save() sparc64: Fix memory leak in pci_register_iommu_region(). sparc64: Add kmemleak annotation to sun4v_build_virq() sparc64: Support kmemleak. sparc64: Add function graph tracer support. sparc64: Give a stack frame to the ftrace call sites. sparc64: Use a seperate counter for timer interrupts and NMI checks, like x86. sparc64: Remove profiling from some low-level bits. sparc64: Kill unnecessary static on local var in ftrace_call_replace(). sparc64: Kill CONFIG_STACK_DEBUG code. sparc64: Add HAVE_FUNCTION_TRACE_MCOUNT_TEST and tidy up. sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs. sparc64: Use kstack_valid() in die_if_kernel().	2010-04-13 11:34:05 -07:00
Linus Torvalds	465de2ba71	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits) smc91c92_cs: define multicast_table as unsigned char can: avoids a false warning e1000e: stop cleaning when we reach tx_ring->next_to_use igb: restrict WoL for 82576 ET2 Quad Port Server Adapter virtio_net: missing sg_init_table Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb" iwlwifi: need check for valid qos packet before free tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb udp: fix for unicast RX path optimization myri10ge: fix rx_pause in myri10ge_set_pauseparam net: corrected documentation for hardware time stamping stmmac: use resource_size() x.25 attempts to negotiate invalid throughput x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet. bridge: Fix IGMP3 report parsing cnic: Fix crash during bnx2x MTU change. qlcnic: fix set mac addr r6040: fix r6040_multicast_list vhost-net: fix vq_memory_access_ok error checking ath9k: fix double calls to ath_radio_enable ...	2010-04-13 11:32:48 -07:00
Ken Kawasaki	a6d37024de	smc91c92_cs: define multicast_table as unsigned char smc91c92_cs: * define multicast_table as unsigned char * remove unnecessary "#ifndef final_version" Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 03:03:16 -07:00
Eric Dumazet	4ffa87012e	can: avoids a false warning At this point optlen == sizeof(sfilter) but some compilers are dumb. Reported-by: Németh Márton <nm127@freemail.h Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 03:03:14 -07:00
Terry Loftin	dac876193c	e1000e: stop cleaning when we reach tx_ring->next_to_use Tx ring buffers after tx_ring->next_to_use are volatile and could change, possibly causing a crash. Stop cleaning when we hit tx_ring->next_to_use. Signed-off-by: Terry Loftin <terry.loftin@hp.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 03:03:13 -07:00
Stefan Assmann	d5aa22520d	igb: restrict WoL for 82576 ET2 Quad Port Server Adapter Restrict Wake-on-LAN to first port on 82576 ET2 quad port NICs, as it is only supported there. Signed-off-by: Stefan Assmann <sassmann@redhat.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 03:03:12 -07:00
David S. Miller	c011f80ba0	sparc64: Add some more commentary to __raw_local_irq_save() Suggested by Peter Zijlstra Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 01:50:43 -07:00
David S. Miller	9343af084c	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: lib/Kconfig.debug	2010-04-13 00:28:45 -07:00
David S. Miller	e182c77cc2	sparc64: Fix memory leak in pci_register_iommu_region(). Found by kmemleak. If request_resource() fails, we leak the struct resource we allocated to represent the IOMMU mapping area. This actually happens on sun4v machines because the IOMEM area is only reported sans the IOMMU region, unlike all previous systems. I'll need to fix that at some point, but for now fix the leak. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 23:46:18 -07:00
David S. Miller	25ad403f67	sparc64: Add kmemleak annotation to sun4v_build_virq() The only reference we store to this memory is in the form of a physical address, so kmemleak can't see it. Add a kmemleak_not_leak() annotation. It's probably useful to be able to look at a dump of these things either via debugfs or similar, and thus we could at some point store them in some kind of table and therefore get rid of this annotation. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 23:46:17 -07:00
David S. Miller	8b8d8e2840	sparc64: Support kmemleak. Only missing thing was an _sdata marker in vmlinux.lds.S Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 23:46:17 -07:00
David S. Miller	9960e9e894	sparc64: Add function graph tracer support. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:37:26 -07:00
David S. Miller	a71d1d6bb1	sparc64: Give a stack frame to the ftrace call sites. It's the only way we'll be able to implement the function graph tracer properly. A positive is that we no longer have to worry about the linker over-optimizing the tail call, since we don't use a tail call any more. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:37:15 -07:00
David S. Miller	daecbf58a5	sparc64: Use a seperate counter for timer interrupts and NMI checks, like x86. This keeps us from having to use kstat_irqs_cpu() from the NMI handler, the former of which is a profiled function. Instead we use a currently empty slot in the cpu_data Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:37:07 -07:00
David S. Miller	f8e8a8e8cb	sparc64: Remove profiling from some low-level bits. These include the timer implementation, perf events support, and the performance counter register (pcr) programming layer. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:36:19 -07:00
David S. Miller	d96478d5a2	sparc64: Kill unnecessary static on local var in ftrace_call_replace(). Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:36:10 -07:00
David S. Miller	ddacd0bc70	sparc64: Kill CONFIG_STACK_DEBUG code. The generic stack tracer does this job just as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:36:03 -07:00
David S. Miller	63b7549573	sparc64: Add HAVE_FUNCTION_TRACE_MCOUNT_TEST and tidy up. Check function_trace_stop at ftrace_caller Toss mcount_call and dummy call of ftrace_stub, unnecessary. Document problems we'll have if the final kernel image link ever turns on relaxation. Properly size 'ftrace_call' so it looks right when inspecting instructions under gdb et al. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:35:24 -07:00
David S. Miller	0c25e9e6cb	sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs. If we are in an NMI then doing a plain raw_local_irq_disable() will write PIL_NORMAL_MAX into %pil, which is lower than PIL_NMI, and thus we'll re-enable NMIs and recurse. Doing a simple: %pil = %pil \| PIL_NORMAL_MAX does what we want, if we're already at PIL_NMI (15) we leave it at that setting, else we set it to PIL_NORMAL_MAX (14). This should get the function tracer working on sparc64. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:21:52 -07:00
David S. Miller	cb256aa604	sparc64: Use kstack_valid() in die_if_kernel(). This gets rid of a local function (is_kernel_stack()) which tries to do the same thing, yet poorly in that it doesn't handle IRQ stacks properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:16:22 -07:00
Shirley Ma	0e413f22e4	virtio_net: missing sg_init_table Add missing sg_init_table for sg_set_buf in virtio_net which induced in defer skb patch. Reported-by: Thomas Müller <thomas@mathtm.de> Tested-by: Thomas Müller <thomas@mathtm.de> Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-12 22:00:34 -07:00
Linus Torvalds	0d0fb0f9c5	Linux 2.6.34-rc4	2010-04-12 18:41:35 -07:00
Linus Torvalds	64a8920fab	Merge branch 'anonvma' * anonvma: anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma anon_vma: clone the anon_vma chain in the right order vma_adjust: fix the copying of anon_vma chains Simplify and comment on anon_vma re-use for anon_vma_prepare()	2010-04-12 18:39:58 -07:00
Linus Torvalds	50b88c46f0	Merge master.kernel.org:/home/rmk/linux-2.6-arm * master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits) ARM: Fix ioremap_cached()/ioremap_wc() for SMP platforms ARM: 6043/1: AT91 slow-clock resume: Don't wait for a disabled PLL to lock ARM: 6031/1: fix Thumb-2 decompressor ARM: 6029/1: ep93xx: gpio.c: local functions should be static ARM: 6028/1: ARM: add MAINTAINERS for U300 ARM: 6024/1: bcmring: fix missing down on semaphore in dma.c MXC: mach_armadillo5x0: Add USB Host support. ARM mach-mx3: duplicated include ARM mach-mx3: duplicated include imx31: add watchdog device on litekit board. imx3: Add watchdog platform device support MXC: mach-mx31_3ds: add support for freescale mc13783 power management device. MXC: mach-mx31_3ds: Add SPI1 device support. MXC: mach-mx31_3ds: Add support for on board NAND Flash. MXC: mach-mx31_3ds: Update variable names over recent mach name modification. imx31: fix parent clock for rtc i.MX51: remove NFC AXI static mapping i.MX51: determine silicon revision dynamically i.MX51: map TZIC dynamically i.MX51: Use correct clock for gpt ...	2010-04-12 18:37:34 -07:00
Linus Torvalds	d6cf853d4d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure the chunk allocator doesn't create zero length chunks Btrfs: fix data enospc check overflow	2010-04-12 18:37:04 -07:00
Linus Torvalds	6a945f38be	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix possible dq_flags corruption quota: Hide warnings about writes to the filesystem before quota was turned on ext3: symlink must be handled via filesystem specific operation ext2: symlink must be handled via filesystem specific operation	2010-04-12 18:36:49 -07:00
Linus Torvalds	50fc88cb03	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: add speciffic ->setattr callback udf: potential integer overflow	2010-04-12 18:36:34 -07:00
Linus Torvalds	4505a49389	Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (36 commits) MIPS: Calculate proper ebase value for 64-bit kernels MIPS: Alchemy: DB1200: Remove custom wait implementation MIPS: Big Sur: Make defconfig more useful. MIPS: Fix __vmalloc() etc. on MIPS for non-GPL modules MIPS: Sibyte: Fix M3 TLB exception handler workaround. MIPS: BCM63xx: Fix build failure in board_bcm963xx.c MIPS: uasm: Add OR instruction. MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions. MIPS: BCM63xx: Initialize gpio_out_low & out_high to current value at boot. MIPS: BCM63xx: Register SSB SPROM fallback in board's first stage callback MIPS: BCM63xx: Fix typo in cpu-feature-overrides file. MIPS: BCM63xx: Add support for second uart. MIPS: BCM63xx: Fix double gpio registration. MIPS: BCM63xx: Add DWVS0 board MIPS: BCM63xx: Add the RTA1025W-16 BCM6348-based board to suppported boards. MIPS: BCM63xx: Fix BCM6338 and BCM6345 gpio count MIPS: libgcc.h: Checkpatch cleanup MIPS: Loongson-2F: Flush the branch target history in BTB and RAS MIPS: Move signal trampolines off of the stack. MIPS: Preliminary VDSO ...	2010-04-12 18:36:11 -07:00
Linus Torvalds	fedfb947b2	Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: svcrdma: RDMA support not yet compatible with RPC6	2010-04-12 18:34:56 -07:00
Linus Torvalds	44fa2b4bee	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo "numer" -> "number" in alloc.c nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v() nilfs2: fix a wrong type conversion in nilfs_ioctl()	2010-04-12 18:34:25 -07:00
Linus Torvalds	ea90002b0f	anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma Otherwise we might be mapping in a page in a new mapping, but that page (through the swapcache) would later be mapped into an old mapping too. The page->mapping must be the case that works for everybody, not just the mapping that happened to page it in first. Here's the scenario: - page gets allocated/mapped by process A. Let's call the anon_vma we associate the page with 'A' to keep it easy to track. - Process A forks, creating process B. The anon_vma in B is 'B', and has a chain that looks like 'B' -> 'A'. Everything is fine. - Swapping happens. The page (with mapping pointing to 'A') gets swapped out (perhaps not to disk - it's enough to assume that it's just not mapped any more, and lives entirely in the swap-cache) - Process B pages it in, which goes like this: do_swap_page -> page = lookup_swap_cache(entry); ... set_pte_at(mm, address, page_table, pte); page_add_anon_rmap(page, vma, address); And think about what happens here! In particular, what happens is that this will now be the "first" mapping of that page, so page_add_anon_rmap() used to do if (first) __page_set_anon_rmap(page, vma, address); and notice what anon_vma it will use? It will use the anon_vma for process B! What happens then? Trivial: process 'A' also pages it in (nothing happens, it's not the first mapping), and then process 'B' execve's or exits or unmaps, making anon_vma B go away. End result: process A has a page that points to anon_vma B, but anon_vma B does not exist any more. This can go on forever. Forget about RCU grace periods, forget about locking, forget anything like that. The bug is simply that page->mapping points to an anon_vma that was correct at one point, but was _not_ the one that was shared by all users of that possible mapping. Changing it to always use the deepest anon_vma in the anonvma chain gets us to the safest model. This can be improved in certain cases: if we know the page is private to just this particular mapping (for example, it's a new page, or it is the only swapcache entry), we could pick the top (most specific) anon_vma. But that's a future optimization. Make it _work_ reliably first. Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-12 17:54:13 -07:00
Linus Torvalds	646d87b481	anon_vma: clone the anon_vma chain in the right order We want to walk the chain in reverse order when cloning it, so that the order of the result chain will be the same as the order in the source chain. When we add entries to the chain, they go at the head of the chain, so we want to add the source head last. Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-12 17:54:12 -07:00
Linus Torvalds	287d97ac03	vma_adjust: fix the copying of anon_vma chains When we move the boundaries between two vma's due to things like mprotect, we need to make sure that the anon_vma of the pages that got moved from one vma to another gets properly copied around. And that was not always the case, in this rather hard-to-follow code sequence. Clarify the code, and fix it so that it copies the anon_vma from the right source. Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-12 17:54:11 -07:00
Linus Torvalds	d0e9fe1758	Simplify and comment on anon_vma re-use for anon_vma_prepare() This changes the anon_vma reuse case to require that we only reuse simple anon_vma's - ie the case when the vma only has a single anon_vma associated with it. This means that a reuse of an anon_vma from an adjacent vma will always guarantee that both vma's are associated not only with the same anon_vma, they will also have the same anon_vma chain (of just a single entry in this case). And since anon_vma re-use was the only case where the same anon_vma might be associated with different chains of anon_vma's, we now have the case that every vma that shares the same anon_vma will always also have the same chain. That makes it much easier to think about merging vma's that share the same anon_vma's: you can always just drop the other anon_vma chain in anon_vma_merge() since you know that they are always identical. This also splits up the function to validate the anon_vma re-use, and adds a lot of commentary about the possible races. Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-12 17:53:59 -07:00
Andrew Perepechko	08261673cb	quota: Fix possible dq_flags corruption dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a change done by an atomic operation can be overwritten by a change done by a non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk. Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:12:36 +02:00
Jan Kara	4c5e6c0e70	quota: Hide warnings about writes to the filesystem before quota was turned on For a root filesystem write to the filesystem before quota is turned on happens regularly and there's no way around it because of writes to syslog, /etc/mtab, and similar. So the warning is rather pointless for ordinary users. It's still useful during development so we just hide the warning behind __DQUOT_PARANOIA config option. Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:12:19 +02:00
Dmitry Monakhov	774f03fb2c	ext3: symlink must be handled via filesystem specific operation generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext3_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:11:39 +02:00
Dmitry Monakhov	fc7683a3c3	ext2: symlink must be handled via filesystem specific operation generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext2_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-04-12 21:11:25 +02:00

1 2 3 4 5 ...

189928 Commits