Commit Graph

200397 Commits

Author SHA1 Message Date
Wengang Wang
a39953dd95 ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
dlm_migration_can_proceed() for that purpose.  However, if the node is
down, dlm_migration_can_proceed() will also return "go ahead".  In this
rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
the BUG_ON() that trips over this condition.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 10:56:30 -07:00
Tao Ma
f5e27b6ddf ocfs2: Don't duplicate pages past i_size during CoW.
During CoW, the pages after i_size don't contain valid data, so there's
no need to read and duplicate them.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 10:54:28 -07:00
Dan Carpenter
e372357ba5 ocfs2: tighten up strlen() checking
This function is only called from one place and it's like this:
	dlm_register_domain(conn->cc_name, dlm_key, &fs_version);

The "conn->cc_name" is 64 characters long.  If strlen(conn->cc_name)
were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
strlen() doesn't count the NULL character.

In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
64 character buffers.  The only exception is nd_name from struct
o2nm_node.

Anyway I looked into it and in this case the domain string comes from
osb->uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
the code works, but I think it makes the code a little cleaner.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-12 13:57:53 -07:00
Tao Ma
121a39bb00 ocfs2: Make xattr reflink work with new local alloc reservation.
The new reservation code in local alloc has add the limitation
that the caller should handle the case that the local alloc
doesn't give use enough contiguous clusters. It make the old
xattr reflink code broken.

So this patch udpate the xattr reflink code so that it can
handle the case that local alloc give us one cluster at a time.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-12 13:57:50 -07:00
Tao Ma
a78f9f4668 ocfs2: make xattr extension work with new local alloc reservation.
The old ocfs2_xattr_extent_allocation is too optimistic about
the clusters we can get. So actually if the file system is
too fragmented, ocfs2_add_clusters_in_btree will return us
with EGAIN and we need to allocate clusters once again.

So this patch change it to a while loop so that we can allocate
clusters until we reach clusters_to_add.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-07-12 13:57:24 -07:00
Tao Ma
0a463b74e7 ocfs2: Remove the redundant cpu_to_le64.
In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno.
But actually bg->bg_blkno is already changed to little endian
in ocfs2_block_group_fill. So remove the extra cpu_to_le64.

Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-12 13:56:18 -07:00
Wengang Wang
f471c9df92 ocfs2/dlm: don't access beyond bitmap size
dlm->recovery_map is defined as
	unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)];

We should treat O2NM_MAX_NODES as the bit map size in bits.
This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-12 13:56:14 -07:00
Joel Becker
693c241a5f ocfs2: No need to zero pages past i_size.
When ocfs2 fills a hole, it does so by allocating clusters.  When a
cluster is larger than the write, ocfs2 must zero the portions of the
cluster outside of the write.  If the clustersize is smaller than a
pagecache page, this is handled by the normal pagecache mechanisms, but
when the clustersize is larger than a page, ocfs2's write code will zero
the pages adjacent to the write.  This makes sure the entire cluster is
zeroed correctly.

Currently ocfs2 behaves exactly the same when writing past i_size.
However, this means ocfs2 is writing zeroed pages for portions of a new
cluster that are beyond i_size.  The page writeback code isn't expecting
this.  It treats all pages past the one containing i_size as left behind
due to a previous truncate operation.

Thankfully, ocfs2 calculates the number of pages it will be working on
up front.  The rest of the write code merely honors the original
calculation.  We can simply trim the number of pages to only cover the
actual file data.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-07-12 13:55:27 -07:00
Joel Becker
5693486bad ocfs2: Zero the tail cluster when extending past i_size.
ocfs2's allocation unit is the cluster.  This can be larger than a block
or even a memory page.  This means that a file may have many blocks in
its last extent that are beyond the block containing i_size.  There also
may be more unwritten extents after that.

When ocfs2 grows a file, it zeros the entire cluster in order to ensure
future i_size growth will see cleared blocks.  Unfortunately,
block_write_full_page() drops the pages past i_size.  This means that
ocfs2 is actually leaking garbage data into the tail end of that last
cluster.  This is a bug.

We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
when a write or truncate is past i_size.  They will use
ocfs2_zero_extend() to ensure the data is properly zeroed.

Older versions of ocfs2_zero_extend() simply zeroed every block between
i_size and the zeroing position.  This presumes three things:

1) There is allocation for all of these blocks.
2) The extents are not unwritten.
3) The extents are not refcounted.

(1) and (2) hold true for non-sparse filesystems, which used to be the
only users of ocfs2_zero_extend().  (3) is another bug.

Since we're now using ocfs2_zero_extend() for sparse filesystems as
well, we teach ocfs2_zero_extend() to check every extent between
i_size and the zeroing position.  If the extent is unwritten, it is
ignored.  If it is refcounted, it is CoWed.  Then it is zeroed.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-07-08 13:25:35 -07:00
Joel Becker
a4bfb4cf11 ocfs2: When zero extending, do it by page.
ocfs2_zero_extend() does its zeroing block by block, but it calls a
function named ocfs2_write_zero_page().  Let's have
ocfs2_write_zero_page() handle the page level.  From
ocfs2_zero_extend()'s perspective, it is now page-at-a-time.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-07-08 13:24:49 -07:00
Tao Ma
1739da4054 ocfs2: Limit default local alloc size within bitmap range.
In commit 6b82021b9e, we increase
our local alloc size and calculate how much megabytes we can
get according to group size and volume size.
But we also need to check the maximum bits a local alloc block
bitmap can have. With a bs=512, cs=32K, local volume with 160G,
it calculate 96MB while the maximum local alloc size is only
76M. So the bitmap will overflow and corrupt the system truncate
log file. See bug
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-06-15 16:50:43 -07:00
Tao Ma
40f165f416 ocfs2: Move orphan scan work to ocfs2_wq.
We used to let orphan scan work in the default work queue,
but there is a corner case which will make the system deadlock.
The scenario is like this:
1. set heartbeat threadshold to 200. this will allow us to have a
   great chance to have a orphan scan work before our quorum decision.
2. mount node 1.
3. after 1~2 minutes, mount node 2(in order to make the bug easier
   to reproduce, better add maxcpus=1 to kernel command line).
4. node 1 do orphan scan work.
5. node 2 do orphan scan work.
6. node 1 do orphan scan work. After this, node 1 hold the orphan scan
   lock while node 2 know node 1 is the master.
7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection).

Now when node 2 begins orphan scan, the system queue is blocked.

The root cause is that both orphan scan work and quorum decision work
will use the system event work queue. orphan scan has a chance of
blocking the event work queue(in dlm_wait_for_node_death) so that there
is no chance for quorum decision work to proceed.

This patch resolve it by moving orphan scan work to ocfs2_wq.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-06-15 15:43:48 -07:00
Julia Lawall
6469272c35 fs/ocfs2/dlm: Add missing spin_unlock
Add a spin_unlock missing on the error path.  Unlock as in the other code
that leads to the leave label.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-06-15 15:43:46 -07:00
Linus Torvalds
7e27d6e778 Linux 2.6.35-rc3 2010-06-11 19:14:04 -07:00
Linus Torvalds
4cea8706c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  wimax/i2400m: fix missing endian correction read in fw loader
  net8139: fix a race at the end of NAPI
  pktgen: Fix accuracy of inter-packet delay.
  pkt_sched: gen_estimator: add a new lock
  net: deliver skbs on inactive slaves to exact matches
  ipv6: fix ICMP6_MIB_OUTERRORS
  r8169: fix mdio_read and update mdio_write according to hw specs
  gianfar: Revive the driver for eTSEC devices (disable timestamping)
  caif: fix a couple range checks
  phylib: Add support for the LXT973 phy.
  net: Print num_rx_queues imbalance warning only when there are allocated queues
2010-06-11 14:20:03 -07:00
Linus Torvalds
7ae1277a52 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / x86: Save/restore MISC_ENABLE register
2010-06-11 14:19:45 -07:00
Linus Torvalds
b25b550bb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: The file argument for fsync() is never null
  Btrfs: handle ERR_PTR from posix_acl_from_xattr()
  Btrfs: avoid BUG when dropping root and reference in same transaction
  Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
  Btrfs: should add a permission check for setfacl
  Btrfs: btrfs_lookup_dir_item() can return ERR_PTR
  Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs
  Btrfs: unwind after btrfs_start_transaction() errors
  Btrfs: btrfs_iget() returns ERR_PTR
  Btrfs: handle kzalloc() failure in open_ctree()
  Btrfs: handle error returns from btrfs_lookup_dir_item()
  Btrfs: Fix BUG_ON for fs converted from extN
  Btrfs: Fix null dereference in relocation.c
  Btrfs: fix remap_file_pages error
  Btrfs: uninitialized data is check_path_shared()
  Btrfs: fix fallocate regression
  Btrfs: fix loop device on top of btrfs
2010-06-11 14:18:47 -07:00
Linus Torvalds
eda054770e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: clear bridge resource range if BIOS assigned bad one
  PCI: hotplug/cpqphp, fix NULL dereference
  Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
  PCI: change resource collision messages from KERN_ERR to KERN_INFO
2010-06-11 14:15:44 -07:00
Yinghai Lu
837c4ef13c PCI: clear bridge resource range if BIOS assigned bad one
Yannick found that video does not work with 2.6.34.  The cause of this
bug was that the BIOS had assigned the wrong range to the PCI bridge
above the video device.  Before 2.6.34 the kernel would have shrunk
the size of the bridge window, but since
  d65245c PCI: don't shrink bridge resources
the kernel will avoid shrinking BIOS ranges.

So zero out the old range if we fail to claim it at boot time; this will
cause us to allocate a new range at startup, restoring the 2.6.34
behavior.

Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009.

Reported-by: Yannick <yannick.roehlly@free.fr>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:24:51 -07:00
Jiri Slaby
a7ef7d1f5e PCI: hotplug/cpqphp, fix NULL dereference
There are devices out there which are PCI Hot-plug controllers with
compaq PCI IDs, but are not bridges, hence have pdev->subordinate
NULL. But cpqphp expects the pointer to be non-NULL.

Add a check to the probe function to avoid oopses like:
BUG: unable to handle kernel NULL pointer dereference at 00000050
IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]
*pdpt = 0000000033779001 *pde = 0000000000000000
...

The device here was:
00:0b.0 PCI Hot-plug controller [0804]: Compaq Computer Corporation PCI Hotplug Controller [0e11:a0f7] (rev 11)
	Subsystem: Compaq Computer Corporation Device [0e11:a2f8]

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:10:21 -07:00
Jesse Barnes
3be434f024 Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
This reverts commit 75568f8094.

Since they're just a convenience anyway, remove these symlinks since
they're causing duplicate filename errors in the wild.

Acked-by: Alex Chiang <achiang@canonical.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:08:37 -07:00
Bjorn Helgaas
f6d440daeb PCI: change resource collision messages from KERN_ERR to KERN_INFO
We can often deal with PCI resource issues by moving devices around.  In
that case, there's no point in alarming the user with messages like these.
There are many bug reports where the message itself is the only problem,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/413419 .

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:08:14 -07:00
Dan Carpenter
6f902af400 Btrfs: The file argument for fsync() is never null
The "file" argument for fsync is never null so we can remove this check.

What drew my attention here is that 7ea8085910: "drop unused dentry
argument to ->fsync" introduced an unconditional dereference at the
start of the function and that generated a smatch warning.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:40 -04:00
Dan Carpenter
834e74759a Btrfs: handle ERR_PTR from posix_acl_from_xattr()
posix_acl_from_xattr() returns both ERR_PTRs and null, but it's OK to
pass null values to set_cached_acl()

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:39 -04:00
Sage Weil
15e7000095 Btrfs: avoid BUG when dropping root and reference in same transaction
If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes
with end_transaction(), the cleaner kthread may come in and
drop the root in the same transaction.  If that's the case, the
root's refs still == 1 in the tree when btrfs_del_root() deletes
the item, because commit_fs_roots() hasn't updated it yet (that
happens during the commit).

This wasn't a problem before only because
btrfs_ioctl_snap_destroy() would commit the transaction before dropping
the dentry reference, so the dead root wouldn't get queued up until
after the fs root item was updated in the btree.

Since it is not an error to drop the root reference and the root in the
same transaction, just drop the BUG_ON() in btrfs_del_root().

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:39 -04:00
Shi Weihua
731e3d1b43 Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
when used Posix File System Test Suite(pjd-fstest) to test btrfs,
some cases about setfacl failed when noacl mount option used.
I simplified used commands in pjd-fstest, and the following steps
can reproduce it.
------------------------
# cd btrfs-part/
# mkdir aaa
# setfacl -m m::rw aaa    <- successed, but not expected by pjd-fstest.
------------------------
I checked ext3, a warning message occured, like as:
  setfacl: aaa/: Operation not supported
Certainly, it's expected by pjd-fstest.

So, i compared acl.c of btrfs and ext3. Based on that, a patch created.
Fortunately, it works.

Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:38 -04:00
Shi Weihua
2f26afba46 Btrfs: should add a permission check for setfacl
On btrfs, do the following
------------------
# su user1
# cd btrfs-part/
# touch aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rw-
  group::rw-
  other::r--
# su user2
# cd btrfs-part/
# setfacl -m u::rwx aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rwx           <- successed to setfacl
  group::rw-
  other::r--
------------------
but we should prohibit it that user2 changing user1's acl.
In fact, on ext3 and other fs, a message occurs:
  setfacl: aaa: Operation not permitted

This patch fixed it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:37 -04:00
Dan Carpenter
cf1e99a4e0 Btrfs: btrfs_lookup_dir_item() can return ERR_PTR
btrfs_lookup_dir_item() can return either ERR_PTRs or null.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:37 -04:00
Dan Carpenter
3140c9a34b Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs
btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a
check for that.  It's not clear to me if it can also return NULL
pointers or not so I left the original NULL pointer check as is.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:36 -04:00
Dan Carpenter
d327099a23 Btrfs: unwind after btrfs_start_transaction() errors
This was added by a22285a6a3: "Btrfs: Integrate metadata reservation
with start_transaction".  If we goto out here then we skip all the
unwinding and there are locks still held etc.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:35 -04:00
Dan Carpenter
4cbd1149fb Btrfs: btrfs_iget() returns ERR_PTR
btrfs_iget() returns an ERR_PTR() on failure and not null.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:35 -04:00
Dan Carpenter
676e4c8639 Btrfs: handle kzalloc() failure in open_ctree()
Unwind and return -ENOMEM if the allocation fails here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:34 -04:00
Dan Carpenter
fb4f6f910c Btrfs: handle error returns from btrfs_lookup_dir_item()
If btrfs_lookup_dir_item() fails, we should can just let the mount fail
with an error.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:57:33 -04:00
Yan, Zheng
3bf84a5a83 Btrfs: Fix BUG_ON for fs converted from extN
Tree blocks can live in data block groups in FS converted from extN.
So it's easy to trigger the BUG_ON.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:48:35 -04:00
Yan, Zheng
046f264f6b Btrfs: Fix null dereference in relocation.c
Fix a potential null dereference in relocation.c

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 15:48:34 -04:00
David S. Miller
e79aa86710 Merge branch 'wimax-2.6.35.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2010-06-11 12:38:23 -07:00
Inaky Perez-Gonzalez
a385a53e65 wimax/i2400m: fix missing endian correction read in fw loader
i2400m_fw_hdr_check() was accessing hardware field
bcf_hdr->module_type (little endian 32) without converting to host
byte sex.

Reported-by: Данилин Михаил <mdanilin@nsg.net.ru>

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2010-06-11 11:51:20 -07:00
Linus Torvalds
891a9894ee Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: Create output directory in Makefile.modbuiltin
  kbuild: Generate modules.builtin in make modules
2010-06-11 09:55:50 -07:00
Linus Torvalds
f1f6ea3522 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: avoid validate_cis failure on CIS override
  pcmcia: dev_node removal bugfix
  pcmcia: yenta_socket.c Remove extra #ifdef CONFIG_YENTA_TI
  pcmcia: only keep saved I365_CSCINT flag if there is no PCI irq
2010-06-11 09:55:21 -07:00
Linus Torvalds
63c70a0d7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: try to send partial cap release on cap message on missing inode
  ceph: release cap on import if we don't have the inode
  ceph: fix misleading/incorrect debug message
  ceph: fix atomic64_t initialization on ia64
  ceph: fix lease revocation when seq doesn't match
  ceph: fix f_namelen reported by statfs
  ceph: fix memory leak in statfs
  ceph: fix d_subdirs ordering problem
2010-06-11 09:52:23 -07:00
Miao Xie
058a457ef0 Btrfs: fix remap_file_pages error
when we use remap_file_pages() to remap a file, remap_file_pages always return
error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 11:46:12 -04:00
Dan Carpenter
0e4dcbef1c Btrfs: uninitialized data is check_path_shared()
refs can be used with uninitialized data if btrfs_lookup_extent_info()
fails on the first pass through the loop.  In the original code if that
happens then check_path_shared() probably returns 1, this patch
changes it to return 1 for safety.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 11:46:12 -04:00
Josef Bacik
8360977972 Btrfs: fix fallocate regression
Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we
dropped passing the mode to the function that actually does the preallocation.
This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 11:46:12 -04:00
Miao Xie
4a001071d3 Btrfs: fix loop device on top of btrfs
We cannot use the loop device which has been connected to a file in the btrf

The reproduce steps is following:
 # dd if=/dev/zero of=vdev0 bs=1M count=1024
 # losetup /dev/loop0 vdev0
 # mkfs.btrfs /dev/loop0
 ...
 failed to zero device start -5

The reason is that the btrfs don't implement either ->write_begin or ->write
the VFS API, so we fix it by setting ->write to do_sync_write().

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-06-11 11:46:11 -04:00
Figo.zhang
349124a007 net8139: fix a race at the end of NAPI
fix a race at the end of NAPI complete processing, it had
better do __napi_complete() first before re-enable interrupt.

Signed-off-by:Figo.zhang <figo1802@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:14:08 -07:00
Daniel Turull
07a0f0f07a pktgen: Fix accuracy of inter-packet delay.
This patch correct a bug in the delay of pktgen. 
It makes sure the inter-packet interval is accurate.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:08:11 -07:00
Eric Dumazet
ae638c47dc pkt_sched: gen_estimator: add a new lock
gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables".

Add a new fine grained lock instead of trying to use RTNL in netfilter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 22:53:52 -07:00
John Fastabend
597a264b1a net: deliver skbs on inactive slaves to exact matches
Currently, the accelerated receive path for VLAN's will
drop packets if the real device is an inactive slave and
is not one of the special pkts tested for in
skb_bond_should_drop().  This behavior is different then
the non-accelerated path and for pkts over a bonded vlan.

For example,

vlanx -> bond0 -> ethx

will be dropped in the vlan path and not delivered to any
packet handlers at all.  However,

bond0 -> vlanx -> ethx

and

bond0 -> ethx

will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_recv_skb() doesn't drop frames but only
delivers them to exact matches.

This patch adds a sk_buff flag which is used for tagging
skbs that would previously been dropped and allows the
skb to continue to skb_netif_recv().  Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly.  This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped out right in the vlan path.

I have tested the following 4 configurations in failover modes
and load balancing modes.

# bond0 -> ethx

# vlanx -> bond0 -> ethx

# bond0 -> vlanx -> ethx

# bond0 -> ethx
            |
  vlanx -> --

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 22:23:34 -07:00
Sage Weil
2b2300d62e ceph: try to send partial cap release on cap message on missing inode
If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately.  This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.

If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:30:25 -07:00
Sage Weil
3d7ded4d81 ceph: release cap on import if we don't have the inode
If we get an IMPORT that give us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:30:07 -07:00