linux

iv/linux

Author	SHA1	Message	Date
Tao Ma	138211515c	ocfs2: Optimize inode allocation by remembering last group In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming new inode, we passed in directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	b80b549c35	ocfs2: re-order ocfs2_empty_dir checks ocfs2_empty_dir() is far more expensive than checking link count. Since both need to be checked at the same time, we can improve performance by checking link count first. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	198a1ca3b7	ocfs2: Increase max links count Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	4ed8a6bb08	ocfs2: Store dir index records inline Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond it's capacity. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	9b7895efac	ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:15 -07:00
Mark Fasheh	4a12ca3a00	ocfs2: Introduce dir lookup helper struct Many directory manipulation calls pass around a tuple of dirent, and it's containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but future patches will add more state. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:15 -07:00
Tiger Yang	d9ae49d6e2	ocfs2: tweak to get the maximum inline data size with xattr Replace max_inline_data with max_inline_data_with_xattr to ensure it correct when xattr inlined. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-03-12 16:45:46 -07:00
Joel Becker	13723d00e3	ocfs2: Use metadata-specific ocfs2_journal_access_() functions. The per-metadata-type ocfs2_journal_access_() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Jan Kara	a90714c150	ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Joel Becker	b657c95c11	ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Tiger Yang	89c38bd0ad	ocfs2: add ocfs2_init_acl in mknod We need to get the parent directories acls and let the new child inherit it. To this, we add additional calculations for data/metadata allocation. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	534eadddc1	ocfs2: add ocfs2_init_security in during file create Security attributes must be set when creating a new inode. We do this in three steps. - First, get security xattr's name and value by security_operation - Calculate and reserve the meta data and clusters needed by this security xattr before starting transaction - Finally, we set it before add_entry Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	f5d362022a	ocfs2: move new inode allocation out of the transaction Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:19 -08:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	b19c2a3b83	CRED: Wrap task credential accesses in the OCFS2 filesystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:59 +11:00
Jan Kara	b99835c168	ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked() and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed space was not released). Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-11-10 09:51:46 -08:00
Jan Kara	87cfa00432	ocfs2: Fix checking of return value of new_inode() new_inode() does not return ERR_PTR() but NULL in case of failure. Correct checking of the return value. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-11-10 09:51:46 -08:00
Joel Becker	0fcaa56a2a	ocfs2: Simplify ocfs2_read_block() More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:51:57 -07:00
Joel Becker	31d33073ca	ocfs2: Require an inode for ocfs2_read_block(s)(). Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:43:29 -07:00
Mark Fasheh	a81cb88b64	ocfs2: Don't check for NULL before brelse() This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh	2008-10-13 17:02:44 -07:00
Tiger Yang	cf1d6c763f	ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:02 -07:00
Tao Ma	0eb8d47e69	ocfs2: Make high level btree extend code generic Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:59 -07:00
Jan Kara	5dabd69515	ocfs2: Improve rename locking ocfs2_rename() was being too aggressive with the rename lock - we only need it for certain forms of directory rename. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:11 -07:00
Tao Ma	4d0ddb2ce2	ocfs2: Add inode stealing for ocfs2_reserve_new_inode Inode allocation is modified to look in other nodes allocators during extreme out of space situations. We retry our own slot when space is freed back to the global bitmap, or whenever we've allocated more than 1024 inodes from another slot. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:10 -07:00
Jan Kara	5fa0613ea5	ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Mark Fasheh	e63aecb651	ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:01 -08:00
Mark Fasheh	34d024f843	ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:34 -08:00
Srinivas Eeda	e325a88f17	ocfs2: fix rename vs unlink race If another node unlinks the destination while ocfs2_rename() is waiting on a cluster lock, ocfs2_rename() simply logs an error and continues. This causes a crash because the renaming node is now trying to delete a non-existent inode. The correct solution is to return -ENOENT. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-06 15:35:40 -08:00
Mark Fasheh	5b6a3a2b4a	ocfs2: Write support for directories with inline data Create all new directories with OCFS2_INLINE_DATA_FL and the inline data bytes formatted as an empty directory. Inode size field reflects the actual amount of inline data available, which makes searching for dirent space very similar to the regular directory search. Inline-data directories are automatically pushed out to extents on any insert request which is too large for the available space. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:41 -07:00
Mark Fasheh	38760e2432	ocfs2: Rename cleanups ocfs2_rename() does direct manipulation of the dirent it's gotten back from a directory search. Wrap this manipulation inside of a function so that we can transparently change directory update behavior in the future. As an added bonus, this gets rid of an ugly macro. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:38 -07:00
Mark Fasheh	be94d11704	ocfs2: Provide convenience function for ino lookup A couple paths which needed to just match a parent dir + name pair to an inode number were a bit messy because they had to deal with ocfs2_find_files_on_disk() which returns a larger number of values. Provide a convenience function, ocfs2_lookup_ino_from_name() which internalizes all the extra accounting. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:38 -07:00
Mark Fasheh	316f4b9f98	ocfs2: Move directory manipulation code into dir.c The code for adding, removing, deleting directory entries was splattered all over namei.c. I'd rather have this all centralized, so that it's easier to make changes for inline dir data, and eventually indexed directories. None of the code in any of the functions was changed. I only removed the static keyword from some prototypes so that they could be exported. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:36 -07:00
Sunil Mushran	480214d71f	ocfs2: Fix rename/extend race If one process is extending a file while another is renaming it, there exists a window when rename could flush the old inode's stale i_size to disk. This patch recognizes the fact that rename is only updating the old inode's ctime, so it ensures only that value is flushed to disk. Signed-off-by: Sunil Mushran <sunil.musran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:27:10 -07:00
Mark Fasheh	2ae99a6037	ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:04 -07:00
Mark Fasheh	1ca1a111b1	ocfs2: fix sparse warnings in fs/ocfs2 None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:08 -07:00
Mark Fasheh	8110b073a9	ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:40 -07:00
Mark Fasheh	4f902c3772	ocfs2: Fix extent lookup to return true size of holes Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:45 -07:00
Mark Fasheh	49cb8d2d49	ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:41 -07:00
Mark Fasheh	363041a5f7	ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:31 -07:00
Mark Fasheh	dcd0538ff4	ocfs2: sparse b-tree support Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:44:03 -07:00
Tiger Yang	500086300e	ocfs2: Remove delete inode vote Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:39:48 -07:00
Mark Fasheh	a9f5f70739	ocfs2: filter more error prints We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:39:08 -07:00
Uwe Kleine-König	1b3c3714cb	Fix typos concerning hierarchy heirarchical, hierachical -> hierarchical heirarchy, hierachy -> hierarchy Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 19:23:03 +01:00
Arjan van de Ven	92e1d5be91	[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Mark Fasheh	592282cf2e	ocfs2: Directory c/mtime update fixes ocfs2 wasn't updating c/mtime on directories during dirent creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and __ocfs2_add_entry() by adding the proper code to update the struct inode and push the change out to disk. This helps rename/unlink on nfs exported file systems in particular as those clients compare directory time values to avoid a full re-reading a directory which hasn't changed. ocfs2_rename() loses some superfluous error handling as a result of this patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-01-21 16:18:49 -08:00
Sunil Mushran	c271c5c22b	ocfs2: local mounts This allows users to format an ocfs2 file system with a special flag, OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it will not use any cluster services, nor will it require a cluster configuration, thus acting like a 'local' file system. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-07 17:37:53 -08:00
Tiger Yang	d38eb8db6a	ocfs2: implement i_op->permission Implement .permission() in ocfs2_file_iops, ocfs2_special_file_iops and ocfs2_dir_iops. This helps us avoid some multi-node races with mode change and vfs operations. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:29:14 -08:00
Mark Fasheh	1fabe1481f	ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straight forward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:28 -08:00
Mark Fasheh	65eff9ccf8	ocfs2: remove handle argument to ocfs2_start_trans() All callers either pass in NULL directly, or a local variable that is already set to NULL. The internals of ocfs2_start_trans() get a nice cleanup as a result. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:23 -08:00
Mark Fasheh	02dc1af44e	ocfs2: pass ocfs2_super * into ocfs2_commit_trans() This sets us up to remove handle->journal. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:08 -08:00

1 2

72 Commits