IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
It's a relatively complete function to accomplish defragmentation for entire
or partial extent, one journal handle was kept during the operation, it was
logically doing one more thing than ocfs2_move_extent() acutally, yes, it's
claiming the new clusters itself;-)
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
The moving range of __ocfs2_move_extent() was within one extent always, it
consists following parts:
1. Duplicates the clusters in pages to new_blkoffset, where extent to be moved.
2. Split the original extent with new extent, coalecse the nearby extents if possible.
3. Append old clusters to truncate log, or decrease_refcount if the extent was refcounted.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
ocfs2_lock_allocators_move_extents() was like the common ocfs2_lock_allocators(),
to lock metadata and data alloctors during extents moving, reserve appropriate
metadata blocks and data clusters, also performa a best- effort to calculate the
credits for journal transaction in one run of movement.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
The original goal of commonizing these funcs is to benefit defraging/extent_moving
codes in the future, based on the fact that reflink and defragmentation having
the same Copy-On-Wrtie mechanism.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
This new code is a bit more complicated than former ones, the goal is to
show user all statistics required to take a deep insight into filesystem
on how the disk is being fragmentaed.
The goal is achieved by scaning global bitmap from (cluster)group to group
to figure out following factors in the filesystem:
- How many free chunks in a fixed size as user requested.
- How many real free chunks in all size.
- Min/Max/Avg size(in) clusters of free chunks.
- How do free chunks distribute(in size) in terms of a histogram,
just like following:
---------------------------------------------------------
Extent Size Range : Free extents Free Clusters Percent
32K... 64K- : 1 1 0.00%
1M... 2M- : 9 288 0.03%
8M... 16M- : 2 831 0.09%
32M... 64M- : 1 2047 0.23%
128M... 256M- : 1 8191 0.92%
256M... 512M- : 2 21706 2.43%
512M... 1024M- : 27 858623 96.29%
---------------------------------------------------------
Userspace ioctl() call eventually gets the above info returned by passing
a 'struct ocfs2_info_freefrag' with the chunk_size being specified first.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
The new code is dedicated to calculate free inodes number of all inode_allocs,
then return the info to userpace in terms of an array.
Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
from userspace, is now going to be involved. setting the flag on means no cluster
coherency considered, usually, userspace tools choose none-coherency strategy by
default for the sake of performace.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
I am working on patch to add quota as a built-in feature for ext4
filesystem. The implementation is based on the design given at
https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4.
This patch reserves the inode numbers 3 and 4 for quota purposes and
also reserves EXT4_FEATURE_RO_COMPAT_QUOTA feature code.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Prevent an ext4 filesystem from being mounted multiple times.
A sequence number is stored on disk and is periodically updated (every 5
seconds by default) by a mounted filesystem.
At mount time, we now wait for s_mmp_update_interval seconds to make sure
that the MMP sequence does not change.
In case of failure, the nodename, bdevname and the time at which the MMP
block was last updated is displayed.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
I found the issue that the number of free blocks went negative.
# stat -f /mnt/mp1/
File: "/mnt/mp1/"
ID: e175ccb83a872efe Namelen: 255 Type: ext2/ext3
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 258022 Free: -15 Available: -13122
Inodes: Total: 65536 Free: 63029
f_bfree in struct statfs will go negative when the filesystem has
few free blocks. Because the number of dirty blocks is bigger than
the number of free blocks in the following two cases.
CASE 1:
ext4_da_writepages
mpage_da_map_and_submit
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_diskspace_used
percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
<--- interrupt statfs systemcall --->
ext4_da_update_reserve_space
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
used + ei->i_allocated_meta_blocks);
CASE 2:
ext4_write_begin
__block_write_begin
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_diskspace_used
percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
<--- interrupt statfs systemcall --->
percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
To avoid the issue, this patch ensures that f_bfree is non-negative.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
We should protect reading bd_info->bb_first_free with the group lock
because otherwise we might miss some free blocks. This is not a big deal
at all, but the change to do right thing is really simple, so lets do
that.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently we are loading buddy ext4_mb_load_buddy() for every block
group we are going through in ext4_trim_fs() in many cases just to find
out that there is not enough space to be bothered with. As Amir Goldstein
suggested we can use bb_free information directly from ext4_group_info.
This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and rather
get the ext4_group_info via ext4_get_group_info() and use the bb_free
information directly from that. This avoids unnecessary call to load
buddy in the case the group does not have enough free space to trim.
Loading buddy is now moved to ext4_trim_all_free().
Tested by me with xfstests 251.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
jbd: Fix comment to match the code in journal_start()
jbd/jbd2: remove obsolete summarise_journal_usage.
jbd: Fix forever sleeping process in do_get_write_access()
ext2: fix error msg when mounting fs with too-large blocksize
jbd: fix fsync() tid wraparound bug
ext3: Fix fs corruption when make_indexed_dir() fails
ext3: Fix lock inversion in ext3_symlink()
jbd2__journal_start() returns an ERR_PTR() value rather than NULL on
failure.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (43 commits)
TOMOYO: Fix wrong domainname validation.
SELINUX: add /sys/fs/selinux mount point to put selinuxfs
CRED: Fix load_flat_shared_library() to initialise bprm correctly
SELinux: introduce path_has_perm
flex_array: allow 0 length elements
flex_arrays: allow zero length flex arrays
flex_array: flex_array_prealloc takes a number of elements, not an end
SELinux: pass last path component in may_create
SELinux: put name based create rules in a hashtable
SELinux: generic hashtab entry counter
SELinux: calculate and print hashtab stats with a generic function
SELinux: skip filename trans rules if ttype does not match parent dir
SELinux: rename filename_compute_type argument to *type instead of *con
SELinux: fix comment to state filename_compute_type takes an objname not a qstr
SMACK: smack_file_lock can use the struct path
LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH
LSM: split LSM_AUDIT_DATA_FS into _PATH and _INODE
SELINUX: Make selinux cache VFS RCU walks safe
SECURITY: Move exec_permission RCU checks into security modules
SELinux: security_read_policy should take a size_t not ssize_t
...
In e9964c10 we change cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure. It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.
Instead, move inodes with dirty to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant). This is cleaner and more
robust.
Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
Signed-off-by: Sage Weil <sage@newdream.net>
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (52 commits)
UBIFS: switch to dynamic printks
UBIFS: fix kernel-doc comments
UBIFS: fix extremely rare mount failure
UBIFS: simplify LEB recovery function further
UBIFS: always cleanup the recovered LEB
UBIFS: clean up LEB recovery function
UBIFS: fix-up free space on mount if flag is set
UBIFS: add the fixup function
UBIFS: add a superblock flag for free space fix-up
UBIFS: share the next_log_lnum helper
UBIFS: expect corruption only in last journal head LEBs
UBIFS: synchronize write-buffer before switching to the next bud
UBIFS: remove BUG statement
UBIFS: change bud replay function conventions
UBIFS: substitute the replay tree with a replay list
UBIFS: simplify replay
UBIFS: store free and dirty space in the bud replay entry
UBIFS: remove unnecessary stack variable
UBIFS: double check that buds are replied in order
UBIFS: make 2 functions static
...
Blocks for the allocation btree are allocated from and released to
the AGFL, and thus frequently reused. Even worse we do not have an
easy way to avoid using an AGFL block when it is discarded due to
the simple FILO list of free blocks, and thus can frequently stall
on blocks that are currently undergoing a discard.
Add a flag to the busy extent tracking structure to skip the discard
for allocation btree blocks. In normal operation these blocks are
reused frequently enough that there is no need to discard them
anyway, but if they spill over to the allocation btree as part of a
balance we "leak" blocks that we would otherwise discard. We could
fix this by adding another flag and keeping these block in the
rbtree even after they aren't busy any more so that we could discard
them when they migrate out of the AGFL. Given that this would cause
significant overhead I don't think it's worthwile for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.
The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard. Note that we don't bother
supporting discard for the non-delaylog mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
jbd2_log_start_commit() returns 1 only when we really start a
transaction. But we also need to wait for a transaction when the
commit is already running. Fix this problem by waiting for
transaction commit unconditionally (which is just a quick check if the
transaction is already committed).
Also we have to be more careful with sending of a barrier because when
transaction is being committed in parallel to ext4_sync_file()
running, we cannot be sure that the barrier the journalling code sends
happens after we wrote all the data for fsync (note that not every
data writeout needs to trigger metadata changes thus commit of some
metadata changes can be running while other data is still written
out). So use jbd2_will_send_data_barrier() helper to detect the common
cases when we can be sure barrier will be issued by the commit code
and issue the barrier ourselves in the remaining cases.
Reported-by: Edward Goggin <egoggin@vmware.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Provide a function which returns whether a transaction with given tid
will send a flush to the filesystem device. The function will be used
by ext4 to detect whether fsync needs to send a separate flush or not.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In data=ordered mode, it's theoretically possible (however rare) that
an inode is filed to transaction's t_inode_list and a flusher thread
writes all the data and inode is reclaimed before the transaction
starts to commit. In such a case, we could erroneously omit sending a
flush to file system device when it is different from the journal
device (because data can still be in disk cache only).
Fix the problem by setting a flag in a transaction when some inode is added
to it and then send disk flush in the commit code when the flag is set.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
To get delayed-extent information, ext4_ext_fiemap_cb() looks up
pagecache, it thus collects information starting from a page's
head block.
If blocksize < pagesize, the beginning blocks of a page may lies
before the request range. So ext4_ext_fiemap_cb() should proceed
ignoring them, because they has been handled before. If no mapped
buffer in the range is found in the 1st page, we need to look up
the 2nd page, otherwise delayed-extents after a hole will be ignored.
Without this patch, xfstests 225 will hung on ext4 with 1K block.
Reported-by: Amir Goldstein <amir73il@users.sourceforge.net>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Change function param_set_dlmfs_capabilities from 'extern' to 'static' since
function param_get_dlmfs_capabilities is also 'static'.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Patch cleans up the gunk added by commit 388c4bcb4e63e88fb1f312a2f5f9eb2623afcf5b.
dlm_is_lockres_migrateable() now returns 1 if lockresource is deemed
migrateable and 0 if not.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Add ocfs2_trim_fs to support trimming freed clusters in the
volume. A range will be given and all the freed clusters greater
than minlen will be discarded to the block layer.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
As ocfs2 supports relatime and strictatime, we need update the
relative document. Atime_quantum need work with strictatime, so only
show it in procfs when mount with strictatime.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: Fix statfs->f_namelen
fat: Replace all printk with fat_msg()
fat: Add fat_msg() function for preformated FAT messages
fat: Convert fat_fs_error to use %pV
fat: Fix possible null deref in fat_cache_add()
fat: use new setup() for ->dir_ops too
journal_start returns an ERR_PTR() value rather than NULL on failure.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: obey minleft values during extent allocation correctly
xfs: reset buffer pointers before freeing them
xfs: avoid getting stuck during async inode flushes
xfs: fix xfs_itruncate_start tracing
xfs: fix duplicate workqueue initialisation
xfs: kill off xfs_printk()
xfs: fix race condition in AIL push trigger
xfs: make AIL target updates and compares 32bit safe.
xfs: always push the AIL to the target
xfs: exit AIL push work correctly when AIL is empty
xfs: ensure reclaim cursor is reset correctly at end of AG
xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE
xfs: fix compiler warning in xfs_trace.h
xfs: cleanup duplicate initializations
xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy
xfs: exact busy extent tracking
xfs: do not immediately reuse busy extent ranges
xfs: optimize AGFL refills
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (48 commits)
serial: 8250_pci: add support for Cronyx Omega PCI multiserial board.
tty/serial: Fix break handling for PORT_TEGRA
tty/serial: Add explicit PORT_TEGRA type
n_tracerouter and n_tracesink ldisc additions.
Intel PTI implementaiton of MIPI 1149.7.
Kernel documentation for the PTI feature.
export kernel call get_task_comm().
tty: Remove to support serial for S5P6442
pch_phub: Support new device ML7223
8250_pci: Add support for the Digi/IBM PCIe 2-port Adapter
ASoC: Update cx20442 for TTY API change
pch_uart: Support new device ML7223 IOH
parport: Use request_muxed_region for IT87 probe and lock
tty/serial: add support for Xilinx PS UART
n_gsm: Use print_hex_dump_bytes
drivers/tty/moxa.c: Put correct tty value
TTY: tty_io, annotate locking functions
TTY: serial_core, remove superfluous set_task_state
TTY: serial_core, remove invalid test
Char: moxa, fix locking in moxa_write
...
Fix up trivial conflicts in drivers/bluetooth/hci_ldisc.c and
drivers/tty/serial/Makefile.
I did the hci_ldisc thing as an evil merge, cleaning things up.
In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
before calling vmtruncate(). This caused any allocated but unwritten
blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
flag to be dropped. This was done to make to make sure that
EOFBLOCKS_FL would not be cleared while still leaving blocks past
i_size allocated. This was not necessary, since ext4_truncate()
guarantees that blocks past i_size will be dropped, even in the case
where truncate() has increased i_size before calling ext4_truncate().
So fix this by removing the EOFBLOCKS_FL special case treatment in
ext4_setattr(). In addition, use truncate_setsize() followed by a
call to ext4_truncate() instead of using vmtruncate(). This is more
efficient since it skips the call to inode_newsize_ok(), which has
been checked already by inode_change_ok(). This is also in a win in
the case where EOFBLOCKS_FL is set since it avoids calling
ext4_truncate() twice.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimers: Reorder clock bases
hrtimers: Avoid touching inactive timer bases
hrtimers: Make struct hrtimer_cpu_base layout less stupid
timerfd: Manage cancelable timers in timerfd
clockevents: Move C3 stop test outside lock
alarmtimer: Drop device refcount after rtc_open()
alarmtimer: Check return value of class_find_device()
timerfd: Allow timers to be cancelled when clock was set
hrtimers: Prepare for cancel on clock was set timers
Add and use wakeup_pipe_readers() to consolidate duplicated codes.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
b43: fix comment typo reqest -> request
Haavard Skinnemoen has left Atmel
cris: typo in mach-fs Makefile
Kconfig: fix copy/paste-ism for dell-wmi-aio driver
doc: timers-howto: fix a typo ("unsgined")
perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
treewide: fix a few typos in comments
regulator: change debug statement be consistent with the style of the rest
Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
audit: acquire creds selectively to reduce atomic op overhead
rtlwifi: don't touch with treewide double semicolon removal
treewide: cleanup continuations and remove logging message whitespace
ath9k_hw: don't touch with treewide double semicolon removal
include/linux/leds-regulator.h: fix syntax in example code
tty: fix typo in descripton of tty_termios_encode_baud_rate
xtensa: remove obsolete BKL kernel option from defconfig
m68k: fix comment typo 'occcured'
arch:Kconfig.locks Remove unused config option.
treewide: remove extra semicolons
...
02e352287a4 (block: rescan partitions on invalidated devices on
-ENOMEDIA too) relocated partition rescan above explicit bd_set_size()
to simplify condition check. As rescan_partitions() does its own bdev
size setting, this doesn't break anything; however,
rescan_partitions() prints out the following messages when adjusting
bdev size, which can be confusing.
sda: detected capacity change from 0 to 146815737856
sdb: detected capacity change from 0 to 146815737856
This patch restores the original order and remove the warning
messages.
stable: Please apply together with 02e352287a4 (block: rescan
partitions on invalidated devices on -ENOMEDIA too).
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tony Luck <tony.luck@gmail.com>
Tested-by: Tony Luck <tony.luck@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow processes blocked on plock requests to be interrupted
when they are killed. This leaves the problem of cleaning
up the lock state in userspace. This has three parts:
1. Add a flag to unlock operations sent to userspace
indicating the file is being closed. Userspace will
then look for and clear any waiting plock operations that
were abandoned by an interrupted process.
2. Queue an unlock-close operation (like in 1) to clean up
userspace from an interrupted plock request. This is needed
because the vfs will not send a cleanup-unlock if it sees no
locks on the file, which it won't if the interrupted operation
was the only one.
3. Do not use replies from userspace for unlock-close operations
because they are unnecessary (they are just cleaning up for the
process which did not make an unlock call). This also simplifies
the new unlock-close generated from point 2.
Signed-off-by: David Teigland <teigland@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Wait properly when flushing the ail list
GFS2: Wipe directory hash table metadata when deallocating a directory