Commit Graph

457 Commits

Author SHA1 Message Date
Chao Yu
4bc8e9bcf5 f2fs: introduce f2fs_has_xattr_block for better readability
This patch introduces a help function f2fs_has_xattr_block for better
readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 09:29:46 +09:00
Chao Yu
90aa6dc9b9 f2fs: print type for each segment in segment_info's show
The original segment_info's show looks out-of-format:
cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 512
512 512 512 512 512 512 512 0 0 512
348 0 263 0 0 512 0 0 512 512
512 512 0 512 512 512 512 512 512 512
512 512 511 328 512 512 512 512 512 512
512 512 512 512 512 512 512 0 0 175

Let's fix this and show type for each segment.
cat /proc/fs/f2fs/loop0/segment_info
format: segment_type|valid_blocks
segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
0    2|0   1|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
10   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
20   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
30   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
40   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
50   3|0   3|0   3|0   3|0   3|0   3|0   3|0   0|0   3|0   3|0
60   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|512
70   3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0   3|0   3|512
80   3|0   3|0   3|0   3|0   3|0   3|512 3|0   3|0   3|512 3|512
90   3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512
100  3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512
110  0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0   4|0   3|512

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 09:27:18 +09:00
Chao Yu
910bb12d29 f2fs: check upper bound of ino value in f2fs_nfs_get_inode
Upper bound checking of ino should be added to f2fs_nfs_get_inode, so unneeded
process before do_read_inode in f2fs_iget could be avoided when ino is invalid.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-12 18:15:38 +09:00
Chao Yu
987c7c3112 f2fs: introduce f2fs_has_inline_xattr for better readability
This patch introduces a help function f2fs_has_inline_xattr for better
readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-12 17:23:35 +09:00
Chao Yu
28cdce0459 f2fs: recover inline xattr data in roll-forward process
Previously we do not recover inline xattr data of inode after power-cut, so
inline xattr data may be lost.
We should recover the data during the roll-forward process.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-11 16:31:06 +09:00
Gu Zheng
d653788a43 f2fs: optimize restore_node_summary slightly
Previously, we ra_sum_pages to pre-read contiguous pages as more
as possible, and if we fail to alloc more pages, an ENOMEM error
will be reported upstream, even though we have alloced some pages
yet. In fact, we can use the available pages to do the job partly,
and continue the rest in the following circle. Only reporting ENOMEM
upstream if we really can not alloc any available page.

And another fix is ignoring dealing with the following pages if an
EIO occurs when reading page from page_list.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: modify the flow for better neat code]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:15 +09:00
Gu Zheng
46c04366bb f2fs: format segment_info's show for better legibility
The original segment_info's show is a bit out-of-format:

[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 [root@guz Demoes]#

so we fix it here for better legibility.
[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1
[root@guz Demoes]#

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:15 +09:00
Gu Zheng
e8512d2e0c f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:14 +09:00
Gu Zheng
b6ce391e61 f2fs: update start nid only once each circle
Integrated a couple of minor changes for better readability suggested by
Chao Yu.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:09 +09:00
Jaegeuk Kim
20f70751c6 f2fs: fix wrong kernel coding style
This patch includes a simple fix to adjust coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-05 10:48:53 +09:00
Jaegeuk Kim
c81bf1c84f f2fs: fix to write node pages with WRITE_SYNC
This patch fixes performance regression of dbench reported by
Alex <hbx7d@yandex.com>.

This issue was revealed by Phoronix tests results:
http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2

It turns out that we need to assign WRITE_SYNC to the node writes, if
fsync is triggered.

The performance numbers are like below, which is measured by Alex.
1. 355MB/s       ext4
2. 225MB/s       f2fs : WRITE for node writes
3. 525MB/s       f2fs : WRITE_SYNC for node writes

Reported-And-Tested-by: Alex <hbx7d@yandex.com>.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-03 11:28:40 +09:00
Chao Yu
9cf3c3898a f2fs: fix dirty page accounting when redirty
We should de-account dirty counters for page when redirty in ->writepage().

Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75':
"writeback: fix dirtied pages accounting on redirty
De-account the accumulative dirty counters on page redirty.

Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)

a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN

This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints)."

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-28 13:09:08 +09:00
Chao Yu
695fd1ed3b f2fs: use existing macro to clean up some codes
This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some
codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 21:09:28 +09:00
Chao Yu
81c1a0f13e f2fs: readahead contiguous SSA blocks for f2fs_gc
If there are multi segments in one section, we will read those SSA blocks which
have contiguous address one by one in f2fs_gc. It may lost performance, let's
read ahead SSA blocks by merge multi read request.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 20:40:36 +09:00
Jaegeuk Kim
ab9fa662e4 f2fs: add an sysfs entry to control the directory level
This patch adds an sysfs entry to control dir_level used by the large directory.

The description of this entry is:

 dir_level                    This parameter controls the directory level to
			      support large directory. If a directory has a
			      number of files, it can reduce the file lookup
			      latency by increasing this dir_level value.
			      Otherwise, it needs to decrease this value to
			      reduce the space overhead. The default value is 0.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 20:31:15 +09:00
Jaegeuk Kim
3843154598 f2fs: introduce large directory support
This patch introduces an i_dir_level field to support large directory.

Previously, f2fs maintains multi-level hash tables to find a dentry quickly
from a bunch of chiild dentries in a directory, and the hash tables consist of
the following tree structure as below.

In Documentation/filesystems/f2fs.txt,

----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------

level #0   | A(2B)
           |
level #1   | A(2B) - A(2B)
           |
level #2   | A(2B) - A(2B) - A(2B) - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.

In order to avoid that, we can configure the hash tables sparsely from level #0
like this.

level #0   | A(2B) - A(2B) - A(2B) - A(2B)

level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

With this structure, we can skip the ineffective tree searches in lower level
hash tables.

This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 19:56:09 +09:00
Jaegeuk Kim
5d0c667121 f2fs: remove costly bit operations for f2fs_find_entry
It turns out that a bit operation like find_next_bit is not always fast enough
for f2fs_find_entry.
Instead, it is pretty much simple and fast to traverse each dentries.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 16:25:20 +09:00
Jaegeuk Kim
8b8343fa9d f2fs: implement a lock-free stat_show
The stat_show is just to show the current status of f2fs.
So, we can remove all the there-in locks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Jaegeuk Kim
8a7ed66aaf f2fs: introduce a radix_tree for the free_nid list
This patch introduces a radix tree for the list of free_nids, which enhances
the performance on free nid management.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Gu Zheng
f978f5a061 f2fs: introduce help macro on_build_free_nids()
Introduce help macro on_build_free_nids() which just uses build_lock
to judge whether the building free nid is going, so that we can remove
the on_build_free_nids field from f2fs_sb_info.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: remove an unnecessary white line removal]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
fffc2a00fc f2fs: fix to mark the checkpointed nat entry correctly
The nat cache entry maintains a status whether it is checkpointed or not.
So, if a new cache entry is loaded from the last checkpoint,
nat_entry->checkpointed should be true.
If the cache entry is modified as being dirty, nat_entry->checkpoint should
be false.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
6437d1b0ad f2fs: fix to do build_stat prior to the recovery procedure
At the end of the recovery procedure, write_checkpoint is called and updates
the cp count which is managed by f2fs stat.
But, previously build_stat() is called after the recovery procedure, which
results in:

BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs]
Call Trace:
 [<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140
 [<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs]
 [<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30
 [<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs]
 [<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0
 [<ffffffff8114915e>] ? __get_free_pages+0xe/0x40
 [<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs]
 [<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffff811aa713>] mount_fs+0x43/0x1b0
 [<ffffffff811c7124>] vfs_kern_mount+0x74/0x160
 [<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60
 [<ffffffff811c9727>] do_mount+0x237/0xb50
 [<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170

So, this patche changes the order of recovery_fsync_data() and
f2fs_build_stats().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:39 +09:00
Jaegeuk Kim
8618b881e9 f2fs: fix not to write data pages on the page reclaiming path
Even if f2fs_write_data_page is called by the page reclaiming path, we should
not write the page to provide enough free segments for the worst case scenario.
Otherwise, f2fs can face with no free segment while gc is conducted, resulting
in:

 ------------[ cut here ]------------
 kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565!
 RIP: 0010:[<ffffffffa02c3b11>]  [<ffffffffa02c3b11>] new_curseg+0x331/0x340 [f2fs]
 Call Trace:
  allocate_segment_by_default+0x204/0x280 [f2fs]
  allocate_data_block+0x108/0x210 [f2fs]
  write_data_page+0x8a/0xc0 [f2fs]
  do_write_data_page+0xe1/0x2a0 [f2fs]
  move_data_page+0x8a/0xf0 [f2fs]
  f2fs_gc+0x446/0x970 [f2fs]
  f2fs_balance_fs+0xb6/0xd0 [f2fs]
  f2fs_write_begin+0x50/0x350 [f2fs]
  ? unlock_page+0x27/0x30
  ? unlock_page+0x27/0x30
  generic_file_buffered_write+0x10a/0x280
  ? file_update_time+0xa3/0xf0
  __generic_file_aio_write+0x1c8/0x3d0
  ? generic_file_aio_write+0x52/0xb0
  ? generic_file_aio_write+0x52/0xb0
  generic_file_aio_write+0x65/0xb0
  do_sync_write+0x5a/0x90
  vfs_write+0xc5/0x1f0
  SyS_write+0x55/0xa0
  system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:33 +09:00
Jaegeuk Kim
b63da15e8b f2fs: fix the calculation of max_nids
Total nids that f2fs can use should not include 0, nid for node inode, and nid
for meta inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Changman Lee
942e0be621 f2fs: show counts of checkpoint in status
This patch shows the counts of checkpoint in f2fs' status.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Chao Yu
662befda25 f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages
This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages
function into ra_meta_pages.
Additionally the new function is used to readahead cp block in
recover_orphan_inodes.

Change log from v1:
 o fix a deadloop bug pointed by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Chao Yu
3375f696bd f2fs: use inode mutex to keep atomicity of f2fs_falloc
Previously without protection of inode mutex, f2fs_falloc and other data
correlated operations will interfere with each other.
So let's use inode mutex to keep atomicity of f2fs_falloc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
1fe54f9dd3 f2fs: clean up redundant function call
This patch integrates inode_[inc|dec]_dirty_dents with inc_page_count to remove
redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
203681f65b f2fs: fix f2fs_write_meta_page at no checkpoint status
If f2fs entered errorneous checkpoint status, it should skip writing meta
pages instead of redirtying the pages out.
Otherwise, it cannot unmount the partition even though f2fs is under read-only
status.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
bd859c6598 f2fs: fix to truncate dentry pages in the error case
When a new directory is allocated, if an error is occurred, we should truncate
preallocated dentry pages too.

This bug was reported by Andrey Tsyvarev after a while as follows.

mkdir()->
 f2fs_add_link()->
  init_inode_metadata()->
    f2fs_init_acl()->
      f2fs_get_acl()->
        f2fs_getxattr()->
          read_all_xattrs() fails.

Also there was a BUG_ON triggered after the fault in
mkdir()->
 f2fs_add_link()->
   init_inode_metadata()->
    remove_inode_page() ->
      f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);

But, previous patch wasn't perfect to resolve that bug, so the following bug
report was also submitted.

kernel BUG at fs/f2fs/inode.c:274!
Call Trace:
 [<ffffffff811fde03>] evict+0xa3/0x1a0
 [<ffffffff811fe615>] iput+0xf5/0x180
 [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
 [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
 [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
 [<ffffffff81680769>] system_call_fastpath+0x16/0x1b

Finally, this patch resolves all the issues like below.

If an error is occurred after make_empty_dir(),
 1. truncate_inode_pages()
   The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which
   means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode.
   But, by calling it here, we can do that.

 2. truncate_blocks()
   Preallocated dentry pages are trucated here to sync i_blocks.

 3. remove_dirty_dir_inode()
   Remove this directory inode from the list.

Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
f6517cfc84 f2fs: fix a build warning
This patch modifies flow a little bit to avoid the following build warnings.

src/fs/f2fs/recovery.c: In function ‘check_index_in_prev_nodes’:
src/fs/f2fs/recovery.c:288:51: warning: ‘sum.<U5390>.<U52f8>.ofs_in_node’ may
	be used uninitialized in this function [-Wmaybe-uninitialized]
src/fs/f2fs/recovery.c:260:23: warning: ‘sum.nid’ may be used uninitialized
	in this function [-Wmaybe-uninitialized]

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
491c0854b4 f2fs: clean up with a macro
This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
924a2ddbd0 f2fs: fix the potential mismatch between dir's i_size and i_blocks
This is the erroneous scenario.

                             i_size    on-disk i_size    i_blocks
__f2fs_add_link()             4096           4096           2
 get_new_data_page            8192           4096           3
 -ENOSPC = init_inode_metadata
 checkpoint                     -            4096           3
 POR and reboot

__f2fs_add_link()             4096           4096           3
 page = get_new_data_page (page->index = 1 by NEW_ADDR)
 add a dentry to the page successfully

f2fs_rmdir()
 f2fs_empty_dir()             4096           4096           3
 f2fs_unlink() goes, since there is no valid dentry due to i_size = 4096.
 But, still there is one dentry in page->index = 1.

So this patch moves the code to write dir->i_size into on-disk i_size in order
to sync dir's i_size, on-disk i_size, and its i_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
1b1f559fc3 f2fs: remove the ugly pointer conversion
This patch modifies the use of bi_private to remove pointer chasing for sbi.
Previously, we had a bi_private structure, but it needs memory allocation.
So this patch uses bi_private by the sbi pointer and adds a completion pointer
into the sbi.
This can achieve no memory allocation and nice use of the bi_private.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
abb2366c82 f2fs: fix to recover xattr node block
If a new xattr node page was allocated and its inode is fsynced, we should
recover the xattr node page during the roll-forward process after power-cut.
But, previously, f2fs didn't handle that case, resulting in kernel panic as
follows reported by Tom Li.

BUG: unable to handle kernel paging request at ffffc9001c861a98
IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs]
Call Trace:
 [<ffffffff815ece9b>] ? printk+0x48/0x4a
 [<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs]
 [<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs]
 [<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200
 [<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs]
 [<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs]
 [<ffffffff8112d4de>] mount_fs+0x3e/0x1b0
 [<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10
 [<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120
 [<ffffffff811497b9>] do_mount+0x259/0xa90
 [<ffffffff810ead1d>] ? memdup_user+0x3d/0x80
 [<ffffffff810eadb3>] ? strndup_user+0x53/0x70
 [<ffffffff8114a2c9>] SyS_mount+0x89/0xd0
 [<ffffffff815feae2>] system_call_fastpath+0x16/0x1b

This patch adds a recovery function of xattr node pages.

Reported-by: Tom Li <biergaizi@members.fsf.org>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
5e443818fa f2fs: handle dirty segments inside refresh_sit_entry
This patch cleans up the refresh_sit_entry to handle locate_dirty_segments.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
744602cf45 f2fs: update_inode_page should be done all the time
In order to make fs consistency, update_inode_page should not be failed all
the time. Otherwise, it is possible to lose some metadata in the inode like
a link count.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:51 +09:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Christoph Hellwig
a6dda0e63e f2fs: use generic posix ACL infrastructure
f2fs has some weird mode bit handling, so still using the old
chmod code for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:19 -05:00
Christoph Hellwig
37bc15392a fs: make posix_acl_create more useful
Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Christoph Hellwig
5bf3258fd2 fs: make posix_acl_chmod more useful
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Jaegeuk Kim
bf39c00a9a f2fs: drop obsolete node page when it is truncated
If a node page is trucated, we'd better drop the page in the node_inode's page
cache for better memory footprint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-23 08:04:21 +09:00
Jaegeuk Kim
4ef51a8fcc f2fs: introduce NODE_MAPPING for code consistency
This patch adds NODE_MAPPING which is similar as META_MAPPING introduced by
Gu Zheng.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng
63f5384c9a f2fs: remove the orphan block page array
As the orphan_blocks may be max to 504, so it is not security
and rigorous to store such a large array in the kernel stack
as Dan Carpenter said.
In fact, grab_meta_page has locked the page in the page cache,
and we can use find_get_page() to fetch the page safely in the
downstream, so we can remove the page array directly.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng
9df27d982d f2fs: add help function META_MAPPING
Introduce help function META_MAPPING() to get the cache meta blocks'
address space.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim
e8dae60458 f2fs: move a branch for code redability
This patch moves a function in f2fs_delete_entry for code readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim
a18ff06340 f2fs: call mark_inode_dirty to flush dirty pages
If a dentry page is updated, we should call mark_inode_dirty to add the inode
into the dirty list, so that its dentry pages are flushed to the disk.
Otherwise, the inode can be evicted without flush.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:40:34 +09:00
Chris Fries
6c311ec6c2 f2fs: clean checkpatch warnings
Fixed a variety of trivial checkpatch warnings.  The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-20 10:27:12 +09:00
Changman Lee
c434cbc0ed f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw
using WRITE_FLUSH_FUA. At this time, we also should set
REQ_META|REQ_PRIO.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-16 17:28:35 +09:00