linux

iv/linux

Author	SHA1	Message	Date
Tim Deegan	ae03b63f92	fix jiffy calculations in calibrate_delay_direct to handle overflow commit 70a062286b9dfcbd24d2e11601aecfead5cf709a upstream. Fixes a hang when booting as dom0 under Xen, when jiffies can be quite large by the time the kernel init gets this far. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> [jbeulich@novell.com: !time_after() -> time_before_eq() as suggested by Jiri Slaby] Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:34 -08:00
Suresh Siddha	23665d48ff	x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms commit f7448548a9f32db38f243ccd4271617758ddfe2c upstream. Markus Kohn ran into a hard hang regression on an acer aspire 1310, when acpi is enabled. git bisect showed the following commit as the bad one that introduced the boot regression. commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Aug 19 18:05:36 2009 -0700 x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init Because of the UP configuration of that platform, native_smp_prepare_cpus() bailed out (in smp_sanity_check()) before doing the set_mtrr_aps_delayed_init() Further down the boot path, native_smp_cpus_done() will call the delayed MTRR initialization for the AP's (mtrr_aps_init()) with mtrr_aps_delayed_init not set. This resulted in the boot processor reprogramming its MTRR's to the values seen during the start of the OS boot. While this is not needed ideally, this shouldn't have caused any side-effects. This is because the reprogramming of MTRR's (set_mtrr_state() that gets called via set_mtrr()) will check if the live register contents are different from what is being asked to write and will do the actual write only if they are different. BP's mtrr state is read during the start of the OS boot and typically nothing would have changed when we ask to reprogram it on BP again because of the above scenario on an UP platform. So on a normal UP platform no reprogramming of BP MTRR MSR's happens and all is well. However, on this platform, bios seems to be modifying the fixed mtrr range registers between the start of OS boot and when we double check the live registers for reprogramming BP MTRR registers. And as the live registers are modified, we end up reprogramming the MTRR's to the state seen during the start of the OS boot. During ACPI initialization, something in the bios (probably smi handler?) don't like this fact and results in a hard lockup. We didn't see this boot hang issue on this platform before the commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3, because only the AP's (if any) will program its MTRR's to the value that BP had at the start of the OS boot. Fix this issue by checking mtrr_aps_delayed_init before continuing further in the mtrr_aps_init(). Now, only AP's (if any) will program its MTRR's to the BP values during boot. Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393 [ By the way, this behavior of the bios modifying MTRR's after the start of the OS boot is not common and the kernel is not prepared to handle this situation well. Irrespective of this issue, during suspend/resume, linux kernel will try to reprogram the BP's MTRR values to the values seen during the start of the OS boot. So suspend/resume might be already broken on this platform for all linux kernel versions. ] Reported-and-bisected-by: Markus Kohn <jabber@gmx.org> Tested-by: Markus Kohn <jabber@gmx.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@novell.com> Cc: Rafael Wysocki <rjw@novell.com> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:33 -08:00
Eric W. Biederman	6940b3914f	net: Fix ip link add netns oops commit 13ad17745c2cbd437d9e24b2d97393e0be11c439 upstream. Ed Swierk <eswierk@bigswitch.com> writes: > On 2.6.35.7 > ip link add link eth0 netns 9999 type macvlan > where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang: > [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d > [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0 > [10663.821944] Oops: 0000 [#1] SMP > [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq > [10663.821959] CPU 3 > [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class > [10663.822155] > [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO > [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286 > [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000 > [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000 > [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041 > [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818 > [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000 > [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000 > [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0 > [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0) > [10663.822236] Stack: > [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6 > [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818 > [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413 > [10663.822281] Call Trace: > [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0 > [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90 > [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0 > [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570 > [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570 > [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20 > [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290 > [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290 > [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0 > [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40 > [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0 > [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0 > [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120 > [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150 > [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80 > [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110 > [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70 > [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0 > [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0 > [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560 > [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150 > [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350 > [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b > [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55 > [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822627] RSP <ffff88014aebf7b8> > [10663.822631] CR2: 000000000000006d > [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]--- This bug was introduced in: commit 81adee47dfb608df3ad0b91d230fb3cef75f0060 Author: Eric W. Biederman <ebiederm@aristanetworks.com> Date: Sun Nov 8 00:53:51 2009 -0800 net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Where apparently I forgot to add error handling to the path where we create a new network device in a new network namespace, and pass in an invalid pid. Reported-by: Ed Swierk <eswierk@bigswitch.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:33 -08:00
Tejun Heo	bf7adcf7ea	ptrace: use safer wake up on ptrace_detach() commit 01e05e9a90b8f4c3997ae0537e87720eb475e532 upstream. The wake_up_process() call in ptrace_detach() is spurious and not interlocked with the tracee state. IOW, the tracee could be running or sleeping in any place in the kernel by the time wake_up_process() is called. This can lead to the tracee waking up unexpectedly which can be dangerous. The wake_up is spurious and should be removed but for now reduce its toxicity by only waking up if the tracee is in TRACED or STOPPED state. This bug can possibly be used as an attack vector. I don't think it will take too much effort to come up with an attack which triggers oops somewhere. Most sleeps are wrapped in condition test loops and should be safe but we have quite a number of places where sleep and wakeup conditions are expected to be interlocked. Although the window of opportunity is tiny, ptrace can be used by non-privileged users and with some loading the window can definitely be extended and exploited. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:32 -08:00
Pavel Machek	ee48eb6bd8	serial: unbreak billionton CF card commit d0694e2aeb815042aa0f3e5036728b3db4446f1d upstream. Unbreak Billionton CF bluetooth card. This actually fixes a regression on zaurus. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:32 -08:00
Dario Lombardo	4572634fff	drivers: update to pl2303 usb-serial to support Motorola cables commit 96a3e79edff6f41b0f115a82f1a39d66218077a7 upstream. Added 0x0307 device id to support Motorola cables to the pl2303 usb serial driver. This cable has a modified chip that is a pl2303, but declares itself as 0307. Fixed by adding the right device id to the supported devices list, assigning it the code labeled PL2303_PRODUCT_ID_MOTOROLA. Signed-off-by: Dario Lombardo <dario.lombardo@libero.it> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:32 -08:00
Ben Hutchings	3ccf04a7a7	68360serial: Plumb in rs_360_get_icount() commit 3e517f4b1de4787ecff87a73a9865a0b1aa2b10b upstream. Commit 0587102cf9f427c185bfdeb2cef41e13ee0264b1 replaced a direct implementation of SIOCGICOUNT with an implementation of tty_operations::get_icount, but it did not actually set rs_360_ops.get_icount. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:32 -08:00
Christian Lamparter	653779ea43	carl9170: fix typo in PS code commit 5820de5303f73d48dcc3a053c875d1f0da7eef67 upstream. This patch fixes a off-by-one bug which bugged the driver's PS-POLL capability. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:31 -08:00
Amit Shah	e3c1891dca	virtio: console: Wake up outvq on host notifications commit 2770c5ea501be69989a7acf54ec4cb3554c47191 upstream. The outvq needs to be woken up on host notifications so that buffers consumed by the host can be reclaimed, outvq freed, and application writes may proceed again. The need for this is now finally noticed when I have qemu patches ready to use nonblocking IO and flow control. CC: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:31 -08:00
Bruce Rogers	815df7acb4	virtio_net: Add schedule check to napi_enable call commit 3e9d08ec0a68f6faf718d5a7e050fe5ca0ba004f upstream. Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers <brogers@novell.com> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:31 -08:00
Mark Brown	921cedd310	ASoC: Create an AIF1ADCDAT signal widget to match AIF2 commit 7f94de483f4e37e14d646ad6e85a3c82f66fb487 upstream. Due to the different routing for AIF1 and AIF2 we weren't using a single widget to represent the ADCDAT signal. For consistency add one. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:30 -08:00
Qiao Zhou	fe5b472ca2	ASoC: WM8994: fix wrong value in tristate function commit 78b3fb46753872fc79bffecc8d50355a8622b39b upstream. fix wrong value in wm8994_set_tristate func. when updating reg bits, it should use "value", not "reg". Signed-off-by: Qiao Zhou <zhouqiao@marvell.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:30 -08:00
Mark Brown	79063a02e2	ASoC: Handle low measured DC offsets for wm_hubs devices commit 20a4e7fc7e213365ea3771d7bf1e10a6bab853be upstream. The DC servo codes are actually signed numbers so need to be treated as such. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:30 -08:00
Dmitry Eremin-Solenikov	3ef0ec4ba4	ASoC: correct link specifications for corgi, poodle and spitz commit a3adfa00e8089aa72826c6ba04bcb18cfceaf0a9 upstream. ASoC DAI link descriptions for Corgi, Poodle and Spitz platforms contained incorrect names for cpu_dai and codec, which effectievly disabled sound on theese platforms. Fix that errors. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:29 -08:00
Mark Brown	a6d5413a75	ASoC: Fix AC'97 registration unwind commit 7d8316df44053687625eef792d53b3ac62e82248 upstream. soc_unregister_ac97_dai_link() takes a CODEC as an argument, not a rtd like the registration function, so give it what it's looking for. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Reported-by: Mike Frysinger <vapier.adi@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:29 -08:00
Sudhakar Rajashekhara	65e603adc8	ASoC: da8xx/omap-l1xx: match codec_name with i2c ids commit dc5a460a1bfa44273653700e33d4e7051194cbfd upstream. The codec_name entry for da8xx evm in sound/soc/davinci/davinci-evm.c is not matching with the i2c ids in the board file. Without this fix the soundcard does not get detected on da850/omap-l138/am18x evm. Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com> Tested-by: Dan Sharon <dansharon@nanometrics.ca> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:28 -08:00
Jean Delvare	84d6dfc30b	i2c: Unregister dummy devices last on adapter removal commit 5219bf884b6e2b54e734ca1799b6f0014bb2b4b7 upstream. Remove real devices first and dummy devices last. This gives device driver which instantiated dummy devices themselves a chance to clean them up before we do. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:28 -08:00
Christian Lamparter	99939b7d2b	p54: fix sequence no. accounting off-by-one error commit 3b5c5827d1f80ad8ae844a8b1183f59ddb90fe25 upstream. P54_HDR_FLAG_DATA_OUT_SEQNR is meant to tell the firmware that "the frame's sequence number has already been set by the application." Whereas IEEE80211_TX_CTL_ASSIGN_SEQ is set for frames which lack a valid sequence number and either the driver or firmware has to assign one. Yup, it's the exact opposite! Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:27 -08:00
Shaohua Li	0f212b8754	fix a shutdown regression in intel_idle commit ec30f343d61391ab23705e50a525da1d55395780 upstream. Fix a shutdown regression caused by 2a2d31c8dc6f ("intel_idle: open broadcast clock event"). The clockevent framework can automatically shutdown broadcast timers for hotremove CPUs. And we get a shutdown regression when we shutdown broadcast timer for hot remove CPU, so just delete some code. Also fix some section mismatch. Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:26 -08:00
Shaohua Li	0f076e96ea	intel_idle: open broadcast clock event commit 2a2d31c8dc6f1ebcf5eab1d93a0cb0fb4ed57c7c upstream. Intel_idle driver uses CLOCK_EVT_NOTIFY_BROADCAST_ENTER CLOCK_EVT_NOTIFY_BROADCAST_EXIT for broadcast clock events. The _ENTER/_EXIT doesn't really open broadcast clock events, please see processor_idle.c for an example. In some situation, this will cause boot hang, because some CPUs enters idle but local APIC timer stalls. Reported-and-tested-by: Yan Zheng <zheng.z.yan@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:26 -08:00
Rafael J. Wysocki	dbe128e978	cpuidle: Make cpuidle_enable_device() call poll_idle_init() commit d8c216cfa57e8a579f41729cbb88c97835d9ac8d upstream. The following scenario is possible with the current cpuidle code and the ACPI cpuidle driver: (1) acpi_processor_cst_has_changed() is called, (2) cpuidle_disable_device() is called, (3) cpuidle_remove_state_sysfs() is called to remove the (presumably outdated) states info from sysfs, (3) acpi_processor_get_power_info() is called, the first entry in the pr->power.states[] table is filled with zeros, (4) acpi_processor_setup_cpuidle() is called and it doesn't fill the first entry in pr->power.states[], (5) cpuidle_enable_device() is called, (6) __cpuidle_register_device() is _not_ called, since the device has already been registered, (7) Consequently, poll_idle_init() is _not_ called either, (8) cpuidle_add_state_sysfs() is called to create the sysfs attributes for the new states and it uses the bogus first table entry from acpi_processor_get_power_info() for creating state0. This problem is avoided if cpuidle_enable_device() unconditionally calls poll_idle_init(). Reported-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:26 -08:00
Hans-Christian Egtvedt	db26f7f40d	avr32: use syscall prototypes from asm-generic instead of arch commit 664cb7142ced8b827e92e1851d1ed2cae922f225 upstream. This patch removes the redundant syscalls prototypes in the architecture specific syscalls.h header file. These were identical with the ones in asm-generic/syscalls.h. Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Reported-by: Peter Huewe <PeterHuewe@gmx.de> Reported-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:26 -08:00
Sven Neumann	39ed1e1ac8	ds2760_battery: Fix calculation of time_to_empty_now commit 86af95039b69a90db15294eb1f9c147f1df0a8ea upstream. A check against division by zero was modified in commit b0525b48. Since this change time_to_empty_now is always reported as zero while the battery is discharging and as a negative value while the battery is charging. This is because current is negative while the battery is discharging. Fix the check introduced by commit b0525b48 so that time_to_empty_now is reported correctly during discharge and as zero while charging. Signed-off-by: Sven Neumann <s.neumann@raumfeld.com> Acked-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:26 -08:00
Lars-Peter Clausen	fcd8c1ff60	jz4740-battery: Protect against concurrent battery readings commit 8ec98fe0b4ffdedce4c1caa9fb3d550f52ad1c6b upstream. We can not handle more then one ADC request at a time to the battery. The patch adds a mutex around the ADC read code to ensure this. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:25 -08:00
Milton Miller	ab8ce49d8f	virtio: remove virtio-pci root device commit 8b3bb3ecf1934ac4a7005ad9017de1127e2fbd2f upstream. We sometimes need to map between the virtio device and the given pci device. One such use is OS installer that gets the boot pci device from BIOS and needs to find the relevant block device. Since it can't, installation fails. Instead of creating a top-level devices/virtio-pci directory, create each device under the corresponding pci device node. Symlinks to all virtio-pci devices can be found under the pci driver link in bus/pci/drivers/virtio-pci/devices, and all virtio devices under drivers/bus/virtio/devices. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gleb Natapov <gleb@redhat.com> Tested-by: "Daniel P. Berrange" <berrange@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:25 -08:00
Tejun Heo	69e992c520	PCI: pci-stub: ignore zero-length id parameters commit 99a0fadf561e1f553c08f0a29f8b2578f55dd5f0 upstream. pci-stub uses strsep() to separate list of ids and generates a warning message when it fails to parse an id. However, not specifying the parameter results in ids set to an empty string. strsep() happily returns the empty string as the first token and thus triggers the warning message spuriously. Make the tokner ignore zero length ids. Reported-by: Chris Wright <chrisw@sous-sol.org> Reported-by: Prasad Joshi <P.G.Joshi@student.reading.ac.uk> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:24 -08:00
Paul Mundt	3033aa7eb6	sh: Fix up legacy PTEA space attribute mapping. commit efb3e34b6176d30c4fe8635fa8e1beb6280cc2cd upstream. When p3_ioremap() was converted to ioremap_prot() there was some breakage introduced where the 29-bit segmentation logic would trap the area range and return an identity mapping without having allowed the area specification to force mapping through page tables. This wires up a PCC mask for pgprot verification to work out whether to short-circuit the identity mapping on legacy parts, restoring the previous behaviour. Reported-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:24 -08:00
Eric Dumazet	f0a32b09fd	net_sched: pfifo_head_drop problem [ Upstream commit 44b8288308ac9da27eab7d7bdbf1375a568805c3 ] commit 57dbb2d83d100ea (sched: add head drop fifo queue) introduced pfifo_head_drop, and broke the invariant that sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing counters only) This can break estimators because est_timer() handles unsigned deltas only. A decreasing counter can then give a huge unsigned delta. My mid term suggestion would be to change things so that sch->bstats.bytes and sch->bstats.packets are incremented in dequeue() only, not at enqueue() time. We also could add drop_bytes/drop_packets and provide estimations of drop rates. It would be more sensible anyway for very low speeds, and big bursts. Right now, if we drop packets, they still are accounted in byte/packets abolute counters and rate estimators. Before this mid term change, this patch makes pfifo_head_drop behavior similar to other qdiscs in case of drops : Dont decrement sch->bstats.bytes and sch->bstats.packets Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:23 -08:00
Eric Dumazet	83093a9e87	ipv4: IP defragmentation must be ECN aware [ Upstream commit 6623e3b24a5ebb07e81648c478d286a1329ab891 ] RFC3168 (The Addition of Explicit Congestion Notification to IP) states : 5.3. Fragmentation ECN-capable packets MAY have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken: * Set the CE codepoint on the reassembled packet. However, this MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint. * The packet is dropped, instead of being reassembled, for any other reason. This patch implements this requirement for IPv4, choosing the first action : If one fragment had NO-ECT codepoint reassembled frame has NO-ECT ElIf one fragment had CE codepoint reassembled frame has CE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:23 -08:00
Eric Dumazet	ed12313af4	net: add POLLPRI to sock_def_readable() [ Upstream commit 2c6607c611cb7bf0a6750bcea34a258144e302c5 ] Leonardo Chiquitto found poll() could block forever on tcp sockets and Urgent data was received, if the event flag only contains POLLPRI. He did a bisection and found commit 4938d7e0233 (poll: avoid extra wakeups in select/poll) was the source of the problem. Problem is TCP sockets use standard sock_def_readable() function for their sk_data_ready() handler, and sock_def_readable() doesnt signal POLLPRI. Only TCP is affected by the problem. Adding POLLPRI to the list of flags might trigger unnecessary schedules, but URGENT handling is such a seldom used feature this seems a good compromise. Thanks a lot to Leonardo for providing the bisection result and a test program as well. Reference : http://www.spinics.net/lists/netdev/msg151793.html Reported-and-bisected-by: Leonardo Chiquitto <leonardo.lists@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:23 -08:00
Alexey Kuznetsov	3e7e81d9e8	inet6: prevent network storms caused by linux IPv6 routers [ Upstream commit 72b43d0898e97f588293b4a24b33c58c46633d81 ] Linux IPv6 forwards unicast packets, which are link layer multicasts... The hole was present since day one. I was 100% this check is there, but it is not. The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network. This software resolves IPv6 unicast addresses to multicast MAC addresses. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:23 -08:00
David S. Miller	bdbeaf1567	af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks. [ Upstream commit 3610cda53f247e176bcbb7a7cca64bc53b12acdb ] unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:22 -08:00
David S. Miller	37b4b584a0	atyfb: Fix bootup hangs on sparc64. [ Upstream commit 09798eb9479da3413bdf96e7d22a84d8b21e05e1 ] After commit 25edd6946a1d74e5e77813c2324a0908c68bcf9e ("sparc64: Get rid of indirect p1275 PROM call buffer.") we can't pass virtual addresses >4GB to PROM calls. Largely this is never necessary in drivers because we have a copy of the entire PROM device tree in the kernel and a set of of_*() interfaces to access it. Unfortunately there were some lingering prom calls in the atyfb driver, in particular prom_finddevice() was being called with an on-stack address which could be anywhere. This code is actually probing for information we already have, the PROM choosen console output device is stored in of_console_device so all of this nasty code consolidates into a one-line comparison. Next we have some prom_getintdefault() calls which are trivially transformed into the equivalent of_getintprop_default(). Special thanks to Fabio, who figured out exactly where the bootup was hanging. That made this bug trivial to fix. Reported-by: Fabio M. Di NItto <fabbione@fabbione.net> Reported-by: Sam Ravnborg <sam@ravnborg.org> Reported-by: Frans van Berckel <fberckel@xs4all.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Fabio M. Di NItto <fabbione@fabbione.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:22 -08:00
Thomas Taranowski	69def87c5c	rapidio: fix hang on RapidIO doorbell queue full condition commit 12a4dc43911785f51a596f771ae0701b18d436f1 upstream. In fsl_rio_dbell_handler() the code currently simply acknowledges the QFI queue full interrupt, but does nothing to resolve the queue full condition. Instead, it jumps to the end of the isr. When a queue full condition occurs, the isr is then re-entered immediately and continually, forever. The fix is to just fall through and read out current doorbell entries. Signed-off-by: Thomas Taranowski <tom@baringforge.com> Cc: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:22 -08:00
Eric Sandeen	2be2f98b96	ext4: make grpinfo slab cache names static commit 2892c15ddda6a76dc10b7499e56c0f3b892e5a69 upstream. In 2.6.37 I was running into oopses with repeated module loads & unloads. I tracked this down to: fb1813f4 ext4: use dedicated slab caches for group_info structures (this was in addition to the features advert unload problem) The kstrdup & subsequent kfree of the cache name was causing a double free. In slub, at least, if I read it right it allocates & frees the name itself, slab seems to do something different... so in slub I think we were leaking -our- cachep->name, and double freeing the one allocated by slub. After getting lost in slab/slub/slob a bit, I just looked at other sized-caches that get allocated. jbd2, biovec, sgpool all do it more or less the way jbd2 does. Below patch follows the jbd2 method of dynamically allocating a cache at mount time from a list of static names. (This might also possibly fix a race creating the caches with parallel mounts running). [Folded in a fix from Dan Carpenter which fixed an off-by-one error in the original patch] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:21 -08:00
Curt Wohlgemuth	617d99a8dd	ext4: Fix data corruption with multi-block writepages support commit d50bdd5aa55127635fd8a5c74bd2abb256bd34e3 upstream. This fixes a corruption problem with the multi-block writepages submittal change for ext4, from commit bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio layer instead of buffer layer in mpage_da_submit_io"). (Note that this corruption is not present in 2.6.37 on ext4, because the corruption was detected after the feature was merged in 2.6.37-rc1, and so it was turned off by adding a non-default mount option, mblk_io_submit. With this commit, which hopefully fixes the last of the bugs with this feature, we'll be able to turn on this performance feature by default in 2.6.38, and remove the mblk_io_submit option.) The ext4 code path to bundle multiple pages for writeback in ext4_bio_write_page() had a bug: we should be clearing buffer head dirty flags before we submit the bio, not in the completion routine. The patch below was tested on 2.6.37 under KVM with the postgresql script which was submitted by Jon Nelson as documented in commit 1449032be1. Without the patch, I'd hit the corruption problem about 50-70% of the time. With the patch, I executed the script > 100 times with no corruption seen. I also fixed a bug to make sure ext4_end_bio() doesn't dereference the bio after the bio_put() call. Reported-by: Jon Nelson <jnelson@jamponi.net> Reported-by: Matthias Bayer <jackdachef@gmail.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:21 -08:00
Lukas Czerner	4b01549142	ext4: unregister features interface on module unload commit 8f021222c1e2756ea4c9dde93b23e1d2a0a4ec37 upstream. Ext4 features interface was not properly unregistered which led to problems while unloading/reloading ext4 module. This commit fixes that by adding proper kobject unregistration code into ext4_exit_fs() as well as fail-path of ext4_init_fs() Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:21 -08:00
Eric Sandeen	974bc5d196	ext4: fix panic on module unload when stopping lazyinit thread commit 8f1f745331c1b560f53c0d60e55a4f4f43f7cce5 upstream. https://bugzilla.kernel.org/show_bug.cgi?id=27652 If the lazyinit thread is running, the teardown function ext4_destroy_lazyinit_thread() has problems: ext4_clear_request_list(); while (ext4_li_info->li_task) { wake_up(&ext4_li_info->li_wait_daemon); wait_event(ext4_li_info->li_wait_task, ext4_li_info->li_task == NULL); } Clearing the request list will cause the thread to exit and free ext4_li_info, so then we're waiting on something which is getting freed. Fix this up by making the thread respond to kthread_stop, and exit, without the need to wait for that exit in some other homegrown way. Reported-and-Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:21 -08:00
Theodore Ts'o	8a249a8e34	ext4: fix memory leak in ext4_free_branches commit 1c5b9e9065567876c2d4a7a16d78f0fed154a5bf upstream. Commit 40389687 moved a call to ext4_forget() out of ext4_free_branches and let ext4_free_blocks() handle calling bforget(). But that change unfortunately did not replace the call to ext4_forget() with brelse(), which was needed to drop the in-use count of the indirect block's buffer head, which lead to a memory leak when deleting files that used indirect blocks. Fix this. Thanks to Hugh Dickins for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:20 -08:00
Jan Kara	3d597eb5d4	ext4: fix trimming of a single group commit ca6e909f9bebe709bc65a3ee605ce32969db0452 upstream. When ext4_trim_fs() is called to trim a part of a single group, the logic will wrongly set last block of the interval to 'len' instead of 'first_block + len'. Thus a shorter interval is possibly trimmed. Fix it. CC: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:20 -08:00
Andrew Morton	3d8874cb34	ext4: fix uninitialized variable in ext4_register_li_request commit 6c5a6cb998854f3c579ecb2bc1423d302bcb1b76 upstream. fs/ext4/super.c: In function 'ext4_register_li_request': fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function It looks buggy to me, too. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:20 -08:00
Jan Kara	9b17038d06	writeback: avoid livelocking WB_SYNC_ALL writeback commit b9543dac5bbc4aef0a598965b6b34f6259ab9a9b upstream. When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is usually set to LONG_MAX. The logic in wb_writeback() then calls __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we easily end up with non-positive nr_to_write after the function returns, if the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment. When nr_to_write is <= 0 wb_writeback() decides we need another round of writeback but this is wrong in some cases! For example when a single large file is continuously dirtied, we would never finish syncing it because each pass would be able to write MAX_WRITEBACK_PAGES and inode dirty timestamp never gets updated (as inode is never completely clean). Thus __writeback_inodes_sb() would write the redirtied inode again and again. Fix the issue by setting nr_to_write to LONG_MAX in WB_SYNC_ALL mode. We do not need nr_to_write in WB_SYNC_ALL mode anyway since write_cache_pages() does livelock avoidance using page tagging in WB_SYNC_ALL mode. This makes wb_writeback() call __writeback_inodes_sb() only once on WB_SYNC_ALL. The latter function won't livelock because it works on - a finite set of files by doing queue_io() once at the beginning - a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no longer able to stall sync forever. [fengguang.wu@intel.com: fix locking comment] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:19 -08:00
Jan Kara	fe9cb4c383	writeback: stop background/kupdate works from livelocking other works commit aa373cf550994623efb5d49a4d8775bafd10bbc1 upstream. Background writeback is easily livelockable in a loop in wb_writeback() by a process continuously re-dirtying pages (or continuously appending to a file). This is in fact intended as the target of background writeback is to write dirty pages it can find as long as we are over dirty_background_threshold. But the above behavior gets inconvenient at times because no other work queued in the flusher thread's queue gets processed. In particular, since e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1) can hang forever waiting for flusher thread to do the work. Generally, when a flusher thread has some work queued, someone submitted the work to achieve a goal more specific than what background writeback does. Moreover by working on the specific work, we also reduce amount of dirty pages which is exactly the target of background writeout. So it makes sense to give specific work a priority over a generic page cleaning. Thus we interrupt background writeback if there is some other work to do. We return to the background writeback after completing all the queued work. This may delay the writeback of expired inodes for a while, however the expired inodes will eventually be flushed to disk as long as the other works won't livelock. [fengguang.wu@intel.com: update comment] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:19 -08:00
Jan Kara	84d7acf2c5	writeback: integrated background writeback work commit 6585027a5e8cb490e3a761b2f3f3c3acf722aff2 upstream. Check whether background writeback is needed after finishing each work. When bdi flusher thread finishes doing some work check whether any kind of background writeback needs to be done (either because dirty_background_ratio is exceeded or because we need to start flushing old inodes). If so, just do background write back. This way, bdi_start_background_writeback() just needs to wake up the flusher thread. It will do background writeback as soon as there is no other work. This is a preparatory patch for the next patch which stops background writeback as soon as there is other work to do. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:19 -08:00
Thomas Gleixner	212075b89b	genirq: Prevent irq storm on migration commit f1a06390d013244e721372b3f9b66e39b6429c71 upstream. move_native_irq() masks and unmasks the interrupt line unconditionally, but the interrupt line might be masked due to a threaded oneshot handler in progress. Unmasking the line in that case can lead to interrupt storms. Observed on PREEMPT_RT. Originally-from: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:19 -08:00
Hugh Dickins	42833e7fab	mm: fix hugepage migration commit fd4a4663db293bfd5dc20fb4113977f62895e550 upstream. 2.6.37 added an unmap_and_move_huge_page() for memory failure recovery, but its anon_vma handling was still based around the 2.6.35 conventions. Update it to use page_lock_anon_vma, get_anon_vma, page_unlock_anon_vma, drop_anon_vma in the same way as we're now changing unmap_and_move(). I don't particularly like to propose this for stable when I've not seen its problems in practice nor tested the solution: but it's clearly out of synch at present. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:18 -08:00
Hugh Dickins	e27ef3e9a8	mm: fix migration hangs on anon_vma lock commit 1ce82b69e96c838d007f316b8347b911fdfa9842 upstream. Increased usage of page migration in mmotm reveals that the anon_vma locking in unmap_and_move() has been deficient since 2.6.36 (or even earlier). Review at the time of f18194275c39835cb84563500995e0d503a32d9a ("mm: fix hang on anon_vma->root->lock") missed the issue here: the anon_vma to which we get a reference may already have been freed back to its slab (it is in use when we check page_mapped, but that can change), and so its anon_vma->root may be switched at any moment by reuse in anon_vma_prepare. Perhaps we could fix that with a get_anon_vma_unless_zero(), but let's not: just rely on page_lock_anon_vma() to do all the hard thinking for us, then we don't need any rcu read locking over here. In removing the rcu_unlock label: since PageAnon is a bit in page->mapping, it's impossible for a !page->mapping page to be anon; but insert VM_BUG_ON in case the implementation ever changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:18 -08:00
Mel Gorman	f64d080166	mm: migration: use rcu_dereference_protected when dereferencing the radix tree slot during file page migration commit 29c1f677d424e8c5683a837fc4f03fc9f19201d7 upstream. migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d ("fix rcu_read_lock() in page migraton"). The point of the RCU protection there is part of getting a stable reference to anon_vma and is only held for anon pages as file pages are locked which is sufficient protection against freeing. However, while a file page's mapping is being migrated, the radix tree is double checked to ensure it is the expected page. This uses radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held triggering the following warning. [ 173.674290] =================================================== [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] [ 173.676016] --------------------------------------------------- [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! [ 173.676016] [ 173.676016] other info that might help us debug this: [ 173.676016] [ 173.676016] [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 [ 173.676016] 1 lock held by hugeadm/2899: [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab [ 173.676016] [ 173.676016] stack backtrace: [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild [ 173.676016] Call Trace: [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa This patch introduces radix_tree_deref_slot_protected() which calls rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock that is protecting this dereference. Holding the tree lock protects against parallel updaters of the radix tree meaning that rcu_dereference_protected is allowable. [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Milton Miller <miltonm@bga.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:18 -08:00
Jerome Marchand	c9357b750e	block: fix accounting bug on cross partition merges commit 09e099d4bafea3b15be003d548bdf94b4b6e0e17 upstream. /proc/diskstats would display a strange output as follows. $ cat /proc/diskstats \|grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137 Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE. The detailed root cause is as follows. Assuming that there are two partition, sda1 and sda2. 1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented. \| hd_struct->in_flight --------------------------- sda1 \| -1 sda2 \| 1 --------------------------- The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do. Also add a refcount to struct hd_struct to keep the partition in memory as long as users exist. We use kref_test_and_get() to ensure we don't add a reference to a partition which is going away. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:17 -08:00
Jerome Marchand	07384b42b6	kref: add kref_test_and_get commit e4a683c899cd5a49f8d684a054c95bd115a0c005 upstream. Add kref_test_and_get() function, which atomically add a reference only if refcount is not zero. This prevent to add a reference to an object that is already being removed. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-17 15:14:17 -08:00

1 2 3 4 5 ...

223941 Commits