275 Commits

Author SHA1 Message Date
Pan Bian
dff93bb9ec f2fs: read page index before freeing
commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.

The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.

Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:12:37 +01:00
Jaegeuk Kim
e33c1e29d4 f2fs: fix missing up_read
commit 89d13c38501df730cbb2e02c4499da1b5187119d upstream.

This patch fixes missing up_read call.

Fixes: c9b60788fc76 ("f2fs: fix to do sanity check with block address in main area")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:14 +01:00
Chao Yu
aafb371575 f2fs: fix to do sanity check with block address in main area
commit c9b60788fc760d136211853f10ce73dc152d1f4a upstream.

This patch add to do sanity check with below field:
- cp_pack_total_block_count
- blkaddr of data/node
- extent info

- Overview
BUG() in verify_block_addr() when writing to a corrupted f2fs image

- Reproduce (4.18 upstream kernel)

- POC (poc.c)

static void activity(char *mpoint) {

  char *foo_bar_baz;
  int err;

  static int buf[8192];
  memset(buf, 0, sizeof(buf));

  err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);

  int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
  if (fd >= 0) {
    write(fd, (char *)buf, sizeof(buf));
    fdatasync(fd);
    close(fd);
  }
}

int main(int argc, char *argv[]) {
  activity(argv[1]);
  return 0;
}

- Kernel message
[  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
[  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
[  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
[  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
[  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
[  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
[  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
[  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
[  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
[  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
[  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
[  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.729171] Call Trace:
[  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
[  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
[  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
[  699.729269]  ? __radix_tree_replace+0xa3/0x120
[  699.729276]  __write_data_page+0x5c7/0xe30
[  699.729291]  ? kasan_check_read+0x11/0x20
[  699.729310]  ? page_mapped+0x8a/0x110
[  699.729321]  ? page_mkclean+0xe9/0x160
[  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
[  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
[  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
[  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
[  699.729358]  ? __write_data_page+0xe30/0xe30
[  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
[  699.729380]  ? kasan_check_write+0x14/0x20
[  699.729391]  ? _raw_spin_lock+0x17/0x40
[  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[  699.729413]  ? iov_iter_advance+0x113/0x640
[  699.729418]  ? f2fs_write_end+0x133/0x2e0
[  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
[  699.729428]  f2fs_write_data_pages+0x329/0x520
[  699.729433]  ? generic_perform_write+0x250/0x320
[  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
[  699.729454]  ? current_time+0x110/0x110
[  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
[  699.729464]  do_writepages+0x37/0xb0
[  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
[  699.729472]  ? do_writepages+0x37/0xb0
[  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
[  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
[  699.729496]  ? __vfs_write+0x2b2/0x410
[  699.729501]  file_write_and_wait_range+0x66/0xb0
[  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
[  699.729511]  ? truncate_partial_data_page+0x290/0x290
[  699.729521]  ? __sb_end_write+0x30/0x50
[  699.729526]  ? vfs_write+0x20f/0x260
[  699.729530]  f2fs_sync_file+0x9a/0xb0
[  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.729548]  vfs_fsync_range+0x68/0x100
[  699.729554]  ? __fget_light+0xc9/0xe0
[  699.729558]  do_fsync+0x3d/0x70
[  699.729562]  __x64_sys_fdatasync+0x24/0x30
[  699.729585]  do_syscall_64+0x78/0x170
[  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  699.729613] RIP: 0033:0x7f9bf930d800
[  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
[  699.729782] ------------[ cut here ]------------
[  699.729785] kernel BUG at fs/f2fs/segment.h:654!
[  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
[  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
[  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.752874] Call Trace:
[  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
[  699.754341]  f2fs_inplace_write_data+0xd2/0x240
[  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
[  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
[  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
[  699.758209]  ? __radix_tree_replace+0xa3/0x120
[  699.759164]  __write_data_page+0x5c7/0xe30
[  699.760002]  ? kasan_check_read+0x11/0x20
[  699.760823]  ? page_mapped+0x8a/0x110
[  699.761573]  ? page_mkclean+0xe9/0x160
[  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
[  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
[  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
[  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
[  699.766276]  ? __write_data_page+0xe30/0xe30
[  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
[  699.768112]  ? kasan_check_write+0x14/0x20
[  699.768951]  ? _raw_spin_lock+0x17/0x40
[  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[  699.770885]  ? iov_iter_advance+0x113/0x640
[  699.771743]  ? f2fs_write_end+0x133/0x2e0
[  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
[  699.773680]  f2fs_write_data_pages+0x329/0x520
[  699.774603]  ? generic_perform_write+0x250/0x320
[  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
[  699.776510]  ? current_time+0x110/0x110
[  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
[  699.778279]  do_writepages+0x37/0xb0
[  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
[  699.779978]  ? do_writepages+0x37/0xb0
[  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
[  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
[  699.782820]  ? __vfs_write+0x2b2/0x410
[  699.783597]  file_write_and_wait_range+0x66/0xb0
[  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
[  699.785381]  ? truncate_partial_data_page+0x290/0x290
[  699.786415]  ? __sb_end_write+0x30/0x50
[  699.787204]  ? vfs_write+0x20f/0x260
[  699.787941]  f2fs_sync_file+0x9a/0xb0
[  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.789572]  vfs_fsync_range+0x68/0x100
[  699.790360]  ? __fget_light+0xc9/0xe0
[  699.791128]  do_fsync+0x3d/0x70
[  699.791779]  __x64_sys_fdatasync+0x24/0x30
[  699.792614]  do_syscall_64+0x78/0x170
[  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  699.794406] RIP: 0033:0x7f9bf930d800
[  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
[  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.835556] ==================================================================
[  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309

[  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
[  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.843475] Call Trace:
[  699.843982]  dump_stack+0x7b/0xb5
[  699.844661]  print_address_description+0x70/0x290
[  699.845607]  kasan_report+0x291/0x390
[  699.846351]  ? update_stack_state+0x38c/0x3e0
[  699.853831]  __asan_load8+0x54/0x90
[  699.854569]  update_stack_state+0x38c/0x3e0
[  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
[  699.856601]  ? __save_stack_trace+0x5e/0x100
[  699.857476]  unwind_next_frame.part.5+0x18e/0x490
[  699.858448]  ? unwind_dump+0x290/0x290
[  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
[  699.860185]  __unwind_start+0x106/0x190
[  699.860974]  __save_stack_trace+0x5e/0x100
[  699.861808]  ? __save_stack_trace+0x5e/0x100
[  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
[  699.863525]  save_stack_trace+0x1f/0x30
[  699.864312]  save_stack+0x46/0xd0
[  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
[  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
[  699.866889]  ? kasan_check_write+0x14/0x20
[  699.867724]  ? __dec_node_state+0x92/0xb0
[  699.868543]  ? lock_page_memcg+0x85/0xf0
[  699.869350]  ? unlock_page_memcg+0x16/0x80
[  699.870185]  ? page_remove_rmap+0x198/0x520
[  699.871048]  ? mark_page_accessed+0x133/0x200
[  699.871930]  ? _cond_resched+0x1a/0x50
[  699.872700]  ? unmap_page_range+0xcd4/0xe50
[  699.873551]  ? rb_next+0x58/0x80
[  699.874217]  ? rb_next+0x58/0x80
[  699.874895]  __kasan_slab_free+0x13c/0x1a0
[  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
[  699.876563]  kasan_slab_free+0xe/0x10
[  699.877315]  kmem_cache_free+0x89/0x1e0
[  699.878095]  unlink_anon_vmas+0xba/0x2c0
[  699.878913]  free_pgtables+0x101/0x1b0
[  699.879677]  exit_mmap+0x146/0x2a0
[  699.880378]  ? __ia32_sys_munmap+0x50/0x50
[  699.881214]  ? kasan_check_read+0x11/0x20
[  699.882052]  ? mm_update_next_owner+0x322/0x380
[  699.882985]  mmput+0x8b/0x1d0
[  699.883602]  do_exit+0x43a/0x1390
[  699.884288]  ? mm_update_next_owner+0x380/0x380
[  699.885212]  ? f2fs_sync_file+0x9a/0xb0
[  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.886877]  ? vfs_fsync_range+0x68/0x100
[  699.887694]  ? __fget_light+0xc9/0xe0
[  699.888442]  ? do_fsync+0x3d/0x70
[  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
[  699.889996]  rewind_stack_do_exit+0x17/0x20
[  699.890860] RIP: 0033:0x7f9bf930d800
[  699.891585] Code: Bad RIP value.
[  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000

[  699.901241] The buggy address belongs to the page:
[  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[  699.903811] flags: 0x2ffff0000000000()
[  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
[  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
[  699.907673] page dumped because: kasan: bad access detected

[  699.909108] Memory state around the buggy address:
[  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
[  699.914392]                                                              ^
[  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
[  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
[  699.918634] ==================================================================

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.9:
 - Error label is different in validate_checkpoint() due to the earlier
   backport of "f2fs: fix invalid memory access"
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:14 +01:00
Chao Yu
d451188135 f2fs: introduce and spread verify_blkaddr
commit e1da7872f6eda977bd812346bf588c35e4495a1e upstream.

This patch introduces verify_blkaddr to check meta/data block address
with valid range to detect bug earlier.

In addition, once we encounter an invalid blkaddr, notice user to run
fsck to fix, and let the kernel panic.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.9:
 - I skipped an earlier renaming of is_valid_meta_blkaddr() to
   f2fs_is_valid_meta_blkaddr()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:13 +01:00
Chao Yu
6012d18d4b f2fs: clean up with is_valid_blkaddr()
commit 7b525dd01365c6764018e374d391c92466be1b7a upstream.

- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
- introduce is_valid_blkaddr() for cleanup.

No logic change in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:13 +01:00
Chao Yu
cb9b1d4ec2 f2fs: fix race condition in between free nid allocator/initializer
commit 30a61ddf8117c26ac5b295e1233eaa9629a94ca3 upstream.

In below concurrent case, allocated nid can be loaded into free nid cache
and be allocated again.

Thread A				Thread B
- f2fs_create
 - f2fs_new_inode
  - alloc_nid
   - __insert_nid_to_list(ALLOC_NID_LIST)
					- f2fs_balance_fs_bg
					 - build_free_nids
					  - __build_free_nids
					   - scan_nat_page
					    - add_free_nid
					     - __lookup_nat_cache
 - f2fs_add_link
  - init_inode_metadata
   - new_inode_page
    - new_node_page
     - set_node_addr
 - alloc_nid_done
  - __remove_nid_from_list(ALLOC_NID_LIST)
					     - __insert_nid_to_list(FREE_NID_LIST)

This patch makes nat cache lookup and free nid list operation being atomical
to avoid this race condition.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.9:
 - add_free_nid() returns 0 in case of any error (except low memory)
 - Tree/list addition has not been moved into __insert_nid_to_list()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 13:05:13 +01:00
Chao Yu
ddeaf5abf5 f2fs: try grabbing node page lock aggressively in sync scenario
[ Upstream commit 4b270a8cc5047682f0a3f3f9af3b498408dbd2bc ]

In synchronous scenario, like in checkpoint(), we are going to flush
dirty node pages to device synchronously, we can easily failed
writebacking node page due to trylock_page() failure, especially in
condition of intensive lock competition, which can cause long latency
of checkpoint(). So let's use lock_page() in synchronous scenario to
avoid this issue.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19 22:47:14 +02:00
Linus Torvalds
abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Chao Yu
3f5f4959b1 f2fs: fix to commit bio cache after flushing node pages
In sync_node_pages, we won't check and commit last merged pages in private
bio cache of f2fs, as these pages were taged as writeback, someone who is
waiting for writebacking of the page will be blocked until the cache was
committed by someone else.

We need to commit node type bio cache to avoid potential deadlock or long
delay of waiting writeback.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:38 -07:00
Chao Yu
1ecc0c5c50 f2fs: support configuring fault injection per superblock
Previously, we only support global fault injection configuration, so that
when we configure type/rate of fault injection through sysfs, mount
option, it will influence all f2fs partition which is being used.

It is not make sence, since it will be not convenient if developer want
to test separated partitions with different fault injection rate/type
simultaneously, also it's not possible to enable fault injection in one
partition and disable fault injection in other one.

>From now on, we move global configuration of fault injection in module
into per-superblock, hence injection testing can be more flexible.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:31 -07:00
Weichao Guo
5b7a487cf3 f2fs: add customized migrate_page callback
This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS uses in Page Cache. Instead of the fallback releasing
page path, it provides better performance for memory compaction, CMA and other
users of memory page migrating. For dirty pages, there is no need to write back
first when migrating. For an atomic written page before committing, we can
migrate the page and update the related 'inmem_pages' list at the same time.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:23 -07:00
Miklos Szeredi
280db3c88c f2fs: use filemap_check_errors()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-16 12:44:21 +02:00
Jaegeuk Kim
e8ea9b3d7e f2fs: avoid ENOMEM during roll-forward recovery
This patch gives another chances during roll-forward recovery regarding to
-ENOMEM.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-13 13:02:29 -07:00
Chao Yu
5f8eaf1f9b f2fs: remove redundant judgement condition in available_free_memory
In available_free_memory, there are two same judgement conditions which
is used for checking NAT excess, remove one of them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:12 -07:00
Jaegeuk Kim
b873b798af Revert "f2fs: use percpu_rw_semaphore"
LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] has also slight regression for DWAL.

[1] https://github.com/sslab-gatech/fxmark

This reverts commit ec795418c41850056feb956534edf059dc1155d4.
2016-08-19 11:15:08 +09:00
Linus Torvalds
4fc29c1aa3 The major change in this version is mitigating cpu overheads on write paths by
replacing redundant inode page updates with mark_inode_dirty calls. And we tried
 to reduce lock contentions as well to improve filesystem scalability.
 Other feature is setting F2FS automatically when detecting host-managed SMR.
 
 = Enhancement =
  - ioctl to move a range of data between files
  - inject orphan inode errors
  - avoid flush commands congestion
  - support lazytime
 
 = Bug fixes =
  - return proper results for some dentry operations
  - fix deadlock in add_link failure
  - disable extent_cache for fcollapse/finsert
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmDJFAAoJEEAUqH6CSFDSJeYP/0ru8+5/ui5VTCdNPQB9KxYD
 DIUaDGpeoLvmn3ZdrMEdyNr6kWbgjCE9JjOGPQ7l1/apErOGVPyaBwflKcCDwloU
 pAlEqVM1Q9j4qH4i9SWTlvPtsHBHB7G7YSe3vDB9fJGSTqumubIlnaBm+Wfjx31U
 p53WcPn9LpOyzfmvZf2tOHmvZ7bWLkE/a07x9kPC6XHUFb9C17jLRFFGeuhZQHv1
 Yo7HgokBnPExa8TnEILYyX/x+eecFS/1Cp/cN0STsebSu8pStTHTcAP7qEpKQB88
 Cc51Lf+d5gFeydxKDFxwdH3VWOGIr9Ppako+lHW83gJcHP0zw8zdxULab+HJMa4n
 MOByRRiafwu1sL0dl7TCfsYNIHdEnXhWbhcRhMVZbb5C2Q6+Htuac8ZrKSOWExNN
 DUqRkzeTib9u+cHxUTFFPgOGdUjDLmg3XHU7mvb+2hViluVjIImC4tqD5XPpv7vt
 WnaDJxLCGD/6DF2yhiVY9NysuxInLTNFFCF06LworZ4L24hlg5TvN0UeUNRO9954
 ux6f+lSORCzV3TmrsHP5vwjSAW26FviPXV1q1HHJeTpWKMlhsZtHmOAJOtZKKmxP
 WFnHT0aiWF+sQf4qfxVQL+lLqtgRKJAI9zqGRyfDJWJp5aXdRuVsZs9pWNQF7lCo
 5gVnCYk3ULjXG3b23j2S
 =tKTR
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "The major change in this version is mitigating cpu overheads on write
  paths by replacing redundant inode page updates with mark_inode_dirty
  calls.  And we tried to reduce lock contentions as well to improve
  filesystem scalability.  Other feature is setting F2FS automatically
  when detecting host-managed SMR.

  Enhancements:
   - ioctl to move a range of data between files
   - inject orphan inode errors
   - avoid flush commands congestion
   - support lazytime

  Bug fixes:
   - return proper results for some dentry operations
   - fix deadlock in add_link failure
   - disable extent_cache for fcollapse/finsert"

* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: clean up coding style and redundancy
  f2fs: get victim segment again after new cp
  f2fs: handle error case with f2fs_bug_on
  f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
  f2fs: support an ioctl to move a range of data blocks
  f2fs: fix to report error number of f2fs_find_entry
  f2fs: avoid memory allocation failure due to a long length
  f2fs: reset default idle interval value
  f2fs: use blk_plug in all the possible paths
  f2fs: fix to avoid data update racing between GC and DIO
  f2fs: add maximum prefree segments
  f2fs: disable extent_cache for fcollapse/finsert inodes
  f2fs: refactor __exchange_data_block for speed up
  f2fs: fix ERR_PTR returned by bio
  f2fs: avoid mark_inode_dirty
  f2fs: move i_size_write in f2fs_write_end
  f2fs: fix to avoid redundant discard during fstrim
  f2fs: avoid mismatching block range for discard
  f2fs: fix incorrect f_bfree calculation in ->statfs
  f2fs: use percpu_rw_semaphore
  ...
2016-07-27 10:36:31 -07:00
Christoph Hellwig
70246286e9 block: get rid of bio_rw and READA
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:01 -06:00
Jaegeuk Kim
9dfa1baff7 f2fs: use blk_plug in all the possible paths
This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.

The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:23 -07:00
Jaegeuk Kim
0a2aa8fbb9 f2fs: refactor __exchange_data_block for speed up
This reduces the elapsed time to do xfstests/generic/017.

Before: 715 s
After:  458 s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:19 -07:00
Jaegeuk Kim
ec795418c4 f2fs: use percpu_rw_semaphore
This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:31 -07:00
Jaegeuk Kim
3bdad3c7ee f2fs: skip to check the block address of node page
If the node page is up-to-date, it should be alive.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:31 -07:00
Jaegeuk Kim
237c0790e5 f2fs: call SetPageUptodate if needed
SetPageUptodate() issues memory barrier, resulting in performance degrdation.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:29 -07:00
Jaegeuk Kim
fe76b796fc f2fs: introduce f2fs_set_page_dirty_nobuffer
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:28 -07:00
Chao Yu
1563ac75e7 f2fs: fix to detect truncation prior rather than EIO during read
In procedure of synchonized read, after sending out the read request, reader
will try to lock the page for waiting device to finish the read jobs and
unlock the page, but meanwhile, truncater will race with reader, so after
reader get lock of the page, it should check page's mapping to detect
whether someone has truncated the page in advance, then reader has the
chance to do the retry if truncation was done, otherwise read can be failed
due to previous condition check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:25 -07:00
Jaegeuk Kim
ad4edb8314 f2fs: produce more nids and reduce readahead nats
The readahead nat pages are more likely to be reclaimed quickly, so it'd better
to gather more free nids in advance.

And, let's keep some free nids as much as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-06 10:44:08 -07:00
Mike Christie
04d328defd f2fs: use bio op accessors
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Jaegeuk Kim
e589c2c477 f2fs: control not to exceed # of cached nat entries
This is to avoid cache entry management overhead including radix tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-07 10:18:08 -07:00
Yunlong Song
0c9df7fb80 f2fs: return the errno to the caller to avoid using a wrong page
Commit aaf9607516ed38825268515ef4d773289a44f429 ("f2fs: check node page
contents all the time") pointed out that "sometimes it was reported that
its contents was missing", so it checks the page's mapping and contents.
When "nid != nid_of_node(page)", ERR_PTR(-EIO) will be returned to the
caller. However, commit e1c51b9f1df2f9efc2ec11488717e40cd12015f9 ("f2fs:
clean up node page updating flow") moves "nid != nid_of_node(page)" test
to "f2fs_bug_on(sbi, nid != nid_of_node(page))", this will return a
wrong page to the caller when F2FS_CHECK_FS is off when "sometimes it
was reported that its contents was missing" happens.

This patch restores to check node page contents all the time, and
returns the errno to make the caller known something is wrong and avoid
to use the page. This patch also moves f2fs_bug_on to its proper location.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:22 -07:00
Jaegeuk Kim
26de9b1171 f2fs: avoid unnecessary updating inode during fsync
If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:13 -07:00
Jaegeuk Kim
ee6d182f2a f2fs: remove syncing inode page in all the cases
This patch reduces to call them across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that, this is doable, since we call mark_inode_dirty_sync() for all
inode's field change which needs to update on-disk inode as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:12 -07:00
Jaegeuk Kim
0f18b462b2 f2fs: flush inode metadata when checkpoint is doing
This patch registers all the inodes which have dirty metadata to sync when
checkpoint is doing.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:11 -07:00
Jaegeuk Kim
205b98221c f2fs: call mark_inode_dirty_sync for i_field changes
This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.

 -> largest
 -> ctime/mtime/atime
 -> i_current_depth
 -> i_xattr_nid
 -> i_pino
 -> i_advise
 -> i_flags
 -> i_mode

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:11 -07:00
Jaegeuk Kim
91942321e4 f2fs: use inode pointer for {set, clear}_inode_flag
This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:07 -07:00
Chao Yu
0f3311a8c2 f2fs: fix to update dirty page count correctly
Once we failed to merge inline data into inode page during flushing inline
inode, we will skip invoking inode_dec_dirty_pages, which makes dirty page
count incorrect, result in panic in ->evict_inode, Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-20 14:55:41 -07:00
Jaegeuk Kim
79344efb93 f2fs: read node blocks ahead when truncating blocks
This patch enables reading node blocks in advance when truncating large
data blocks.

 > time rm $MNT/testfile (500GB) after drop_cachees
Before : 9.422 s
After  : 4.821 s

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:44:56 -07:00
Jaegeuk Kim
fb58ae2206 f2fs: remove an obsolete variable
This patch removes an obsolete variable used in add_free_nid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:32:31 -07:00
Jaegeuk Kim
cb78942b82 f2fs: inject ENOSPC failures
This patch injects ENOSPC failures.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:32:24 -07:00
Jaegeuk Kim
300e129c15 f2fs: use f2fs_grab_cache_page instead of grab_cache_page
This patch converts grab_cache_page to f2fs_grab_cache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:32:21 -07:00
Chao Yu
da011cc0da f2fs: move node pages only in victim section during GC
For foreground GC, we cache node blocks in victim section and set them
dirty, then we call sync_node_pages to flush these node pages, but
meanwhile, those node pages which does not locate in victim section
will be flushed together, so more bandwidth and continuous free space
would be occupied.

So for this condition, it's better to leave those unrelated node page
in cache for further write hit, and let CP or VM to flush them afterward.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-27 14:10:42 -07:00
Jaegeuk Kim
608514deba f2fs: set fsync mark only for the last dnode
In order to give atomic writes, we should consider power failure during
sync_node_pages in fsync.
So, this patch marks fsync flag only in the last dnode block.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:59 -07:00
Jaegeuk Kim
c267ec1526 f2fs: report unwritten status in fsync_node_pages
The fsync_node_pages should return pass or failure so that user could know
fsync is completed or not.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:54 -07:00
Jaegeuk Kim
5268137564 f2fs: split sync_node_pages with fsync_node_pages
This patch splits the existing sync_node_pages into (f)sync_node_pages.
The fsync_node_pages is used for f2fs_sync_file only.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:48 -07:00
Jaegeuk Kim
eca76e783c f2fs: avoid needless lock for node pages when fsyncing a file
When fsync is called, sync_node_pages finds a proper direct node pages to flush.
But, it locks unrelated direct node pages together unnecessarily.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:30 -07:00
Jaegeuk Kim
ff37355886 f2fs: add BUG_ON to avoid unnecessary flow
This patch adds BUG_ON instead of retrying loop.
In the case of node pages, we already got this inode page, but unlocked it.
By the fact that we don't truncate any node pages in operations, the page's
mapping should be unchangeable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Jaegeuk Kim
4a6de50d54 f2fs: use PGP_LOCK to check its truncation
Previously, after trylock_page is succeeded, it doesn't check its mapping.
In order to fix that, we can just give PGP_LOCK to pagecache_get_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Jaegeuk Kim
12bb0a8fd4 f2fs: submit node page write bios when really required
If many threads calls fsync with data writes, we don't need to flush every
bios having node page writes.
The f2fs_wait_on_page_writeback will flush its bios when the page is really
needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:47 -07:00
Jaegeuk Kim
17a0ee552c f2fs: declare static functions
Just to avoid sparse warnings.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:44 -07:00
Fan Li
999270de31 f2fs: modify the readahead method in ra_node_page()
ra_node_page() is used to read ahead one node page. Comparing to regular
read, it's faster because it doesn't wait for IO completion.
But if it is called twice for reading the same block, and the IO request
from the first call hasn't been completed before the second call, the second
call will have to wait until the read is over.

Here use the code in __do_page_cache_readahead() to solve this problem.
It does nothing when someone else already puts the page in mapping. The
status of page should be assured by whoever puts it there.
This implement also prevents alteration of page reference count.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:43 -07:00
Chao Yu
19c7377b56 f2fs: fix to avoid deadlock when merging inline data
When testing with fsstress, kworker and user threads were both blocked:

INFO: task kworker/u16:1:16580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D ffff8803f2595390     0 16580      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-251:0)
 ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440
 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440
 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000
Call Trace:
 [<ffffffff816fe9d9>] schedule+0x29/0x70
 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9
 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs]
 [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs]
 [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs]
 [<ffffffff811473b3>] do_writepages+0x23/0x40
 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250
 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470
 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0
 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0
 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220
 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0
 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0
 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0
 [<ffffffff810750ce>] kthread+0xde/0xf0
 [<ffffffff817093f8>] ret_from_fork+0x58/0x90

fsstress thread stack:
 [<ffffffff81139f0e>] sleep_on_page+0xe/0x20
 [<ffffffff81139ef7>] __lock_page+0x67/0x70
 [<ffffffff8113b100>] find_lock_page+0x50/0x80
 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0
 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs]
 [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs]
 [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs]
 [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs]
 [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40
 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20
 [<ffffffff811d3291>] do_fsync+0x41/0x70
 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20
 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b
 [<ffffffffffffffff>] 0xffffffffffffffff

The reason of this issue is:
CPU0:					CPU1:
 - f2fs_write_data_pages
					 - f2fs_sync_fs
					  - write_checkpoint
					   - block_operations
					    - f2fs_lock_all
					     - down_write(sbi->cp_rwsem)
  - lock_page(page)
  - f2fs_write_data_page
					    - sync_node_pages
					     - flush_inline_data
					      - pagecache_get_page(page, GFP_LOCK)
   - f2fs_lock_op
    - down_read(sbi->cp_rwsem)

This patch alters to use trylock_page in flush_inline_data to fix this ABBA
deadlock issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:03 -08:00