277035 Commits

Author SHA1 Message Date
Wolfram Sang
2f4478ccff mtd: tests: stresstest: bail out if device has not enough eraseblocks
stresstest needs at least two eraseblocks. Bail out gracefully if that
condition is not met. Fixes the following 'division by zero' OOPS:

[  619.100000] mtd_stresstest: MTD device size 131072, eraseblock size 131072, page size 2048, count of eraseblocks 1, pages per eraseblock 64, OOB size 64
[  619.120000] mtd_stresstest: scanning for bad eraseblocks
[  619.120000] mtd_stresstest: scanned 1 eraseblocks, 0 are bad
[  619.130000] mtd_stresstest: doing operations
[  619.130000] mtd_stresstest: 0 operations done
[  619.140000] Division by zero in kernel.
...

caused by

        /* Read or write up 2 eraseblocks at a time - hence 'ebcnt - 1' */
        eb %= (ebcnt - 1);

Cc: stable@kernel.org
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:12:42 +00:00
Axel Lin
f99640dee2 mtd: convert drivers/mtd/* to use module_platform_driver()
This patch converts the drivers in drivers/mtd/* to use the
module_platform_driver() macro which makes the code smaller and a bit
simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:12:35 +00:00
Robert Jarzmik
1f9327fcff Documentation: add sysfs entries for mtd docg3 chips
Add documentation for MSystems disk-on-chip docg3 chips
sysfs entries, which enable and disable protection areas,
giving or disabling access to the chip's memory.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:12:17 +00:00
Dan Carpenter
b49e345e61 mtd: docg3: dereferencing an ERR_PTR() in docg3_probe()
If doc_probe_device() returned an ERR_PTR, then we accidentally saved
that to docg3_floors[floor] = mtd; which gets derefenced in the error
handling when we call doc_release_device().

I've reworked the error handling to take care of that and hopefully
make it a little simpler.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:32 +00:00
Lars-Peter Clausen
5d3667eee4 mtd: Remove redundant spi driver bus initialization
In ancient times it was necessary to manually initialize the bus field of an
spi_driver to spi_bus_type. These days this is done in spi_driver_register(),
so we can drop the manual assignment.

The patch was generated using the following coccinelle semantic patch:
// <smpl>
@@
identifier _driver;
@@
struct spi_driver _driver = {
	.driver = {
-		.bus = &spi_bus_type,
	},
};
// </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:31 +00:00
Robert Jarzmik
0f769d3f9e mtd: docg3: add protection areas sysfs access
As each docg3 chip has 2 protection areas (DPS0 and DPS1),
and because theses areas can prevent user access to the chip
data, add for each floor the sysfs entries which insert the
protection key into the right DPS.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:30 +00:00
Robert Jarzmik
c3de8a8a5a mtd: docg3: add fast mode
Docg3 chips can work in 3 modes : normal MLC mode, fast
mode and reliable mode. Normally, as docg3 is a MLC chip, it
should be configured to work in normal mode.

In both normal mode, each page is distinct. This
means that writing to page 12 of blocks 14,15 writes only to
that page, and reading from page 12 of blocks 14,15 reads
only from that page.

In reliable and fast modes, pages are coupled by pairs, and
are clones one of each other. This means that the available
capacity of the chip is halved. Pages are coupled in each
block, and page of index 2*n contains the same data as page
2*n+1 of the same block.

In fast mode, the reads occur a bit faster, but are a bit
less reliable that in normal mode.

When reading from page 2*n, the chip reads bytes from both
page 2*n and page 2*n+1, makes a logical and for each byte,
and returns the result. As programming a page means
"clearing bits", even if a bit was not cleared on one page
because the flash is worn out, the other page has the bit
cleared, and the result of the "AND" gives a correct result.

When writing to page 2*n, the chip writes data to both page
2*n and page 2*n+1.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:29 +00:00
Robert Jarzmik
e4b2a96aeb mtd: docg3: add suspend and resume
Add functions to powerdown and powerup from suspend, in
order to save power.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:28 +00:00
Robert Jarzmik
d13d19ece3 mtd: docg3: add ECC correction code
Credit for discovering the BCH algorith parameters, and bit
reversing algorithm is to be give to Mike Dunn and Ivan
Djelic.

The BCH correction code relied upon the BCH library, where
all data and ECC is bit-reversed. The BCH library works
correctly when each input byte is bit-reversed, and
accordingly ECC output is also bit-reversed.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:26 +00:00
Robert Jarzmik
7a7fcf1402 mtd: docg3: map erase and write functions
Map the developped write and erase functions into the mtd
structure.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:25 +00:00
Robert Jarzmik
de03cd716b mtd: docg3: add erase functions
Add erase capability to the docg3 driver. The erase block is
made of 2 physical blocks, as both share all 64 pages. That
makes an erase block of at least 64 kBytes.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:24 +00:00
Robert Jarzmik
fb50b58e48 mtd: docg3: add write functions
Add write capability to the docg3 driver. The writes are
possible on a single page (512 bytes + 16 bytes), even if
that page is split on 2 physical pages on 2 blocks (each on
one plane).

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:23 +00:00
Robert Jarzmik
316e627edc mtd: docg3: add OOB buffer to device structure
Add OOB buffer area to store the OOB data until the actual
page is written, so that it can be completed by hardware ECC
generator.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:22 +00:00
Robert Jarzmik
376fbf2087 mtd: docg3: add registers for erasing and writing
Add the required registers and commands to erase and write
flash pages / blocks.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:21 +00:00
Robert Jarzmik
732b63bd8c mtd: docg3: add OOB layout to mtdinfo
Add OOB layout description for docg3, so that userspace can
use this information to setup the data for write_oob().

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:19 +00:00
Robert Jarzmik
ae9d4934b2 mtd: docg3: add multiple floor support
Add support for multiple floors, ie. cascaded docg3
chips. There might be 4 docg3 chips cascaded, sharing the
same address space, and providing up to 4 times the storage
capacity of a unique chip.

Each floor will be seen as an independant mtd device.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:18 +00:00
Robert Jarzmik
32a50b3a45 mtd: docg3: fix reading oob+data without correction
Fix the docg3 reads to be able to cope with all possible
data buffer / oob buffer / file mode combinations from
docg3_read_oob().
This especially ensures that raw reads do not use ECC
corrections, and AUTOOOB and PLACEOOB do use ECC
correction.

The approach is to empty docg3_read() and make it a wrapper
to docg3_read_oob(). As docg3_read_oob() handles all the
funny cases (no data buffer but oob buffer, data buffer but
no oob buffer, ...), docg3_read() is just a special use of
docg3_read_oob().

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:17 +00:00
Robert Jarzmik
34db8a5a72 mtd: docg3: fix BCH registers
BCH registers are contiguous, not on every byte. Fix the
register definitions.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:16 +00:00
Robert Jarzmik
dbc26d98f8 mtd: docg3: fix protection areas reading
The protection areas boundaries were on 16bit registers, not
8bit. This is consistent with block numbers, which can
extend up to 4096 on bigger chips (and is 2048 on the
docg3).

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:15 +00:00
Robert Jarzmik
84a930581e mtd: docg3: fix tracing of IO in writeb
Writeb was incorrectly traced as a 16 bits write, instead of
a 8 bits write. Fix it by tracing the correct width.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:13 +00:00
Robert Jarzmik
ac48e800c0 mtd: docg3: fix debug log verbosity
Change the NOP debug log verbosity to very verbose to
unburden log analysis.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:12 +00:00
Shubhrajyoti D
12f049bd59 mtd: nand: Making MTD_NAND_OMAP2 depend on ARCH_OMAP2PLUS
Making  MTD_NAND_OMAP2 depend on ARCH_OMAP2PLUS instead of
oring with ARCH2/3/4.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:07:04 +00:00
Aaron Sierra
8e987465a1 mtd: cfi: Allow per-mapping CFI device endianness
This patch allows each CFI device map to use its own endianness. The
globally defined CFI endianness (CONFIG_MTD_CFI_NOSWAP,
CONFIG_MTD_CFI_BE_BYTE_SWAP or CONFIG_MTD_CFI_LE_BYTE_SWAP) becomes the
default value which can be overridden by a driver for a particular device.

Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:06:13 +00:00
Brian Norris
342ff28f5a mtd: mtd_blkdevs: don't increase 'open' count on error path
Some error paths in mtd_blkdevs were fixed in the following commit:

    commit 94735ec4044a6d318b83ad3c5794e931ed168d10
    mtd: mtd_blkdevs: fix error path in blktrans_open

But on these error paths, the block device's `dev->open' count is
already incremented before we check for errors. This meant that, while
the error path was handled correctly on the first time through
blktrans_open(), the device is erroneously considered already open on
the second time through.

This problem can be seen, for instance, when a UBI volume is
simultaneously mounted as a UBIFS partition and read through its
corresponding gluebi mtdblockX device. This results in blktrans_open()
passing its error checks (with `dev->open > 0') without actually having
a handle on the device. Here's a summarized log of the actions and
results with nandsim:

    # modprobe nandsim
    # modprobe mtdblock
    # modprobe gluebi
    # modprobe ubifs
    # ubiattach /dev/ubi_ctrl -m 0
    ...
    # ubimkvol /dev/ubi0 -N test -s 16MiB
    ...
    # mount -t ubifs ubi0:test /mnt
    # ls /dev/mtdblock*
    /dev/mtdblock0  /dev/mtdblock1
    # cat /dev/mtdblock1 > /dev/null
    cat: can't open '/dev/mtdblock4': Device or resource busy
    # cat /dev/mtdblock1 > /dev/null

    CPU 0 Unable to handle kernel paging request at virtual address
    fffffff0, epc == 8031536c, ra == 8031f280
    Oops[#1]:
    ...
    Call Trace:
    [<8031536c>] ubi_leb_read+0x14/0x164
    [<8031f280>] gluebi_read+0xf0/0x148
    [<802edba8>] mtdblock_readsect+0x64/0x198
    [<802ecfe4>] mtd_blktrans_thread+0x330/0x3f4
    [<8005be98>] kthread+0x88/0x90
    [<8000bc04>] kernel_thread_helper+0x10/0x18

Cc: stable@kernel.org [3.0+]
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:04:01 +00:00
Brian Norris
8c34233596 mtd: nand: scan 1st and 2nd page for Macronix SLC
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 17:59:53 +00:00
Brian Norris
c01804edde mtd: nand: add 512 Mbit device code (Macronix)
Macronix MX30LF1208AA is a 512 Mbit NAND with device code 0xF0.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 17:59:44 +00:00
Brian Norris
c1257b4798 mtd: nand: add Macronix manufacturer
Macronix is produing SLC NAND MX30LF1208AA, so add their manufacturer
code to the manufacturer lists.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 17:59:33 +00:00
Linus Torvalds
5f0a6e2d50 Linux 3.2-rc7 2011-12-23 21:51:06 -08:00
Linus Torvalds
a22681fabb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Fix race between CPU hotplug and lglocks
2011-12-23 21:47:28 -08:00
Linus Torvalds
6d451c578c for linus: writeback reason binary tracing format fix
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJO9EbVAAoJECvKgwp+S8JaUG0P/RDICTvG5b6/YD1wwh4cHBTF
 xu4av5o+Okablr282vLt1d9N4nLP6A4Jp2XOxNoLdyUVMtwRNCMjO62vcBetKmqU
 9GJTKh3H72/amqNrfvf9E0Fl3rOv2U71x7k4KTwKVdUvITXEL/U0Vsl8a9WVNUZ0
 mZERzf0vrOCSN6gEzh4iNzMuZpKRSnNNP4iUilkwcD9cXPk85hFCNZx/nyMhKtcF
 9XzhSJgg1wJAwmBc9bdhkEm7jKYvxmslb4nMdQHoQNDGpEjwRbS7jQ/iHuD2AhPH
 DFTQ8LOhxxaTOiDjHJav0z/FRw+q6ZYbrkbLVt2qTOxfMxvHJdlfu7vTglq4PK9n
 Bo02K9zZisCM76uCUTHcp1aMjzU9tsx9tYipBz8YXNPoEuhYn/1F3tbt7FkCGBck
 wwTCe/J0+IKHWiXSAkZMj5PiSeMwliMpF7INdkLExkinwNu719dS6pTZDs/o8CMD
 M/0/M8jYnWOmylYDAbhKyEzAAHbAm0YGuUG7IVGP0H5YJucfmRGJzQMNaBTUUsP7
 pXdFA02rUTodCrSHqXscmA0Lb9ypsFnmAYMbb+YF5UNOW9zcQ9b2J23wmna7prIv
 FNKVAgDEjWk/SpN0mG3zZk7ixUagkbo9DfalZCBZsveZPktq1KZor1KaOIFzkUuB
 DUdtr4+GjhfDqFWywZ9+
 =dOhj
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

for linus: writeback reason binary tracing format fix

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: show writeback reason with __print_symbolic
2011-12-23 20:25:36 -08:00
Linus Torvalds
71448c1f4f Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kconfig: adapt update-po-config to new UML layout
2011-12-23 15:01:24 -08:00
Linus Torvalds
4d18de9449 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes
2011-12-23 14:59:08 -08:00
Linus Torvalds
827fa4c762 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: call d_instantiate after all ops are setup
  Btrfs: fix worker lock misuse in find_worker
2011-12-23 14:58:39 -08:00
Linus Torvalds
5d219c6b9f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().
2011-12-23 14:58:14 -08:00
Linus Torvalds
155d4551bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: xt_connbytes: handle negation correctly
  net: relax rcvbuf limits
  rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()
  net: introduce DST_NOPEER dst flag
  mqprio: Avoid panic if no options are provided
  bridge: provide a mtu() method for fake_dst_ops
2011-12-23 14:57:55 -08:00
David S. Miller
6350323ad8 Merge branch 'nf' of git://1984.lsi.us.es/net 2011-12-23 14:29:20 -05:00
Florian Westphal
0354b48f63 netfilter: xt_connbytes: handle negation correctly
"! --connbytes 23:42" should match if the packet/byte count is not in range.

As there is no explict "invert match" toggle in the match structure,
userspace swaps the from and to arguments
(i.e., as if "--connbytes 42:23" were given).

However, "what <= 23 && what >= 42" will always be false.

Change things so we use "||" in case "from" is larger than "to".

This change may look like it breaks backwards compatibility when "to" is 0.
However, older iptables binaries will refuse "connbytes 42:0",
and current releases treat it to mean "! --connbytes 0:42",
so we should be fine.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23 14:50:19 +01:00
Al Viro
08c422c27f Btrfs: call d_instantiate after all ops are setup
This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-23 08:02:26 -05:00
Chris Mason
8d532b2afb Btrfs: fix worker lock misuse in find_worker
Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-12-23 07:53:00 -05:00
Eric Dumazet
0fd7bac6b6 net: relax rcvbuf limits
skb->truesize might be big even for a small packet.

Its even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.

Reported-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 02:15:14 -05:00
Xi Wang
a0a129f8b6 rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()
Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will
cause a kernel oops due to insufficient bounds checking.

	if (count > 1<<30) {
		/* Enforce a limit to prevent overflow */
		return -EINVAL;
	}
	count = roundup_pow_of_two(count);
	table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count));

Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as:

	... + (count * sizeof(struct rps_dev_flow))

where sizeof(struct rps_dev_flow) is 8.  (1 << 30) * 8 will overflow
32 bits.

This patch replaces the magic number (1 << 30) with a symbolic bound.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 22:34:56 -05:00
Eric Dumazet
e688a60480 net: introduce DST_NOPEER dst flag
Chris Boot reported crashes occurring in ipv6_select_ident().

[  461.457562] RIP: 0010:[<ffffffff812dde61>]  [<ffffffff812dde61>]
ipv6_select_ident+0x31/0xa7

[  461.578229] Call Trace:
[  461.580742] <IRQ>
[  461.582870]  [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2
[  461.589054]  [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155
[  461.595140]  [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b
[  461.601198]  [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e
[nf_conntrack_ipv6]
[  461.608786]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.614227]  [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543
[  461.620659]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.626440]  [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a
[bridge]
[  461.633581]  [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459
[  461.639577]  [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76
[bridge]
[  461.646887]  [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f
[bridge]
[  461.653997]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.659473]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.665485]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.671234]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.677299]  [<ffffffffa0379215>] ?
nf_bridge_update_protocol+0x20/0x20 [bridge]
[  461.684891]  [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
[  461.691520]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.697572]  [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56
[bridge]
[  461.704616]  [<ffffffffa0379031>] ?
nf_bridge_push_encap_header+0x1c/0x26 [bridge]
[  461.712329]  [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95
[bridge]
[  461.719490]  [<ffffffffa037900a>] ?
nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
[  461.727223]  [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
[  461.734292]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.739758]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.746203]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.751950]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.758378]  [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56
[bridge]

This is caused by bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.

Problem is present since commit 87c48fa3b46 (ipv6: make fragment
identifications less predictable)

Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fallback to the 'no peer attached' handling.

Reported-by: Chris Boot <bootc@bootc.net>
Tested-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 22:34:56 -05:00
Thomas Graf
7838f2ce36 mqprio: Avoid panic if no options are provided
Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 22:34:56 -05:00
Eric Dumazet
a13861a28b bridge: provide a mtu() method for fake_dst_ops
Commit 618f9bc74a039da76 (net: Move mtu handling down to the protocol
depended handlers) forgot the bridge netfilter case, adding a NULL
dereference in ip_fragment().

Reported-by: Chris Boot <bootc@bootc.net>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 22:34:56 -05:00
Linus Torvalds
ad1fca2003 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md/bitmap: It is OK to clear bits during recovery.
  md: don't give up looking for spares on first failure-to-add
  md/raid5: ensure correct assessment of drives during degraded reshape.
  md/linear: fix hot-add of devices to linear arrays.
2011-12-22 15:36:17 -08:00
NeilBrown
961902c0f8 md/bitmap: It is OK to clear bits during recovery.
commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a
regression which is annoying but fairly harmless.

When writing to an array that is undergoing recovery (a spare
in being integrated into the array), writing to the array will
set bits in the bitmap, but they will not be cleared when the
write completes.

For bits covering areas that have not been recovered yet this is not a
problem as the recovery will clear the bits.  However bits set in
already-recovered region will stay set and never be cleared.
This doesn't risk data integrity.  The only negatives are:
 - next time there is a crash, more resyncing than necessary will
   be done.
 - the bitmap doesn't look clean, which is confusing.

While an array is recovering we don't want to update the
'events_cleared' setting in the bitmap but we do still want to clear
bits that have very recently been set - providing they were written to
the recovering device.

So split those two needs - which previously both depended on 'success'
and always clear the bit of the write went to all devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:48 +11:00
NeilBrown
60fc13702a md: don't give up looking for spares on first failure-to-add
Before performing a recovery we try to remove any spares that
might not be working, then add any that might have become relevant.

Currently we abort on the first spare that cannot be added.
This is a false optimisation.
It is conceivable that - depending on rules in the personality - a
subsequent spare might be accepted.
Also the loop does other things like count the available spares and
reset the 'recovery_offset' value.

If we abort early these might not happen properly.

So remove the early abort.

In particular if you have an array what is undergoing recovery and
which has extra spares, then the recovery may not restart after as
reboot as the could of 'spares' might end up as zero.

Reported-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:19 +11:00
NeilBrown
30d7a48368 md/raid5: ensure correct assessment of drives during degraded reshape.
While reshaping a degraded array (as when reshaping a RAID0 by first
converting it to a degraded RAID4) we currently get confused about
which devices are in_sync.  In most cases we get it right, but in the
region that is being reshaped we need to treat non-failed devices as
in-sync when we have the data but haven't actually written it out yet.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:00 +11:00
NeilBrown
09cd9270ea md/linear: fix hot-add of devices to linear arrays.
commit d70ed2e4fafdbef0800e73942482bb075c21578b
broke hot-add to a linear array.
After that commit, metadata if not written to devices until they
have been fully integrated into the array as determined by
saved_raid_disk.  That patch arranged to clear that field after
a recovery completed.

However for linear arrays, there is no recovery - the integration is
instantaneous.  So we need to explicitly clear the saved_raid_disk
field.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:56:55 +11:00
David S. Miller
7cc8583372 sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().
This silently was working for many years and stopped working on
Niagara-T3 machines.

We need to set the MSIQ to VALID before we can set it's state to IDLE.

On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
errors.  The hypervisor documentation says, rather ambiguously, that
the MSIQ must be "initialized" before one can set the state.

I previously understood this to mean merely that a successful setconf()
operation has been performed on the MSIQ, which we have done at this
point.  But it seems to also mean that it has been set VALID too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22 13:46:53 -08:00