2018-09-04 15:46:30 -07:00
// SPDX-License-Identifier: GPL-2.0+
2009-04-06 19:01:32 -07:00
/*
 * inode.c - NILFS inode operations.
 *
 * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
 *
2016-05-23 16:23:09 -07:00
 * Written by Ryusuke Konishi.
2009-04-06 19:01:32 -07:00
 *
 */
#include <linux/buffer_head.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
#include <linux/gfp.h>
2009-04-06 19:01:32 -07:00
#include <linux/mpage.h>
nilfs2: fix data loss with mmap()
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
error checking removed):
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC);
}
----------------[END mmaptest]--------------------
The output of the script looks something like this:
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear that the changes done using mmap() do not survive a
remount. This can be reproduced 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
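The decision described in this commit can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: `model_blocks_to_charge()` and its parameters are invented for illustration. It assumes that, after the fix, a newly dirtied page without buffers is charged `1 << (PAGE_SHIFT - i_blkbits)` blocks through `nilfs_set_file_dirty()`, which is the shift form the `[akpm: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]` note refers to.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of nilfs_set_page_dirty()'s accounting.
 * Returns how many blocks would be charged to the inode (the argument
 * that nilfs_set_file_dirty() would receive). Not kernel code.
 *
 * has_buffers   - models page_has_buffers(page)
 * nr_new_dirty  - buffers that went from clean to dirty (buffer case)
 * newly_dirtied - models a true return of __set_page_dirty_nobuffers()
 * with_fix      - false models the behaviour before the fix
 */
static unsigned int model_blocks_to_charge(bool has_buffers,
					   unsigned int nr_new_dirty,
					   bool newly_dirtied,
					   unsigned int page_shift,
					   unsigned int blkbits,
					   bool with_fix)
{
	assert(page_shift >= blkbits);

	if (has_buffers)
		return nr_new_dirty;

	/* Buffer-less page, e.g. one read through mpage_readpage(). */
	if (!with_fix)
		return 0;	/* before the fix: never charged, never written */
	if (!newly_dirtied)
		return 0;	/* page was already dirty, already charged */

	/* After the fix: charge every block the page covers. */
	return 1U << (page_shift - blkbits);
}
```

With 4 KiB pages (`page_shift` 12) and 1 KiB blocks (`blkbits` 10), the pre-fix model charges nothing for a freshly dirtied buffer-less page, so the segment constructor never collects it. The fixed model charges 4 blocks.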
2014-09-25 16:05:14 -07:00
#include <linux/pagemap.h>
2009-04-06 19:01:32 -07:00
#include <linux/writeback.h>
2015-02-22 08:58:50 -08:00
#include <linux/uio.h>
2020-05-23 09:30:11 +02:00
#include <linux/fiemap.h>
2009-04-06 19:01:32 -07:00
#include "nilfs.h"
2010-06-07 11:55:00 -04:00
#include "btnode.h"
2009-04-06 19:01:32 -07:00
#include "segment.h"
#include "page.h"
#include "mdt.h"
#include "cpfile.h"
#include "ifile.h"
2012-07-30 14:42:10 -07:00
/**
 * struct nilfs_iget_args - arguments used during comparison between inodes
 * @ino: inode number
 * @cno: checkpoint number
 * @root: pointer to NILFS root object (mounted checkpoint)
 * @for_gc: inode for GC flag
 */
2010-08-20 21:20:29 +09:00
struct nilfs_iget_args {
	u64 ino;
	__u64 cno;
2010-08-25 17:45:44 +09:00
	struct nilfs_root *root;
2010-08-20 21:20:29 +09:00
	int for_gc;
};
2009-04-06 19:01:32 -07:00
2014-12-10 15:54:34 -08:00
static int nilfs_iget_test(struct inode *inode, void *opaque);
2011-03-05 00:19:32 +09:00
void nilfs_inode_add_blocks(struct inode *inode, int n)
{
	struct nilfs_root *root = NILFS_I(inode)->i_root;
2017-02-27 14:28:32 -08:00
	inode_add_bytes(inode, i_blocksize(inode) * n);
2011-03-05 00:19:32 +09:00
	if (root)
2013-07-03 15:08:06 -07:00
		atomic64_add(n, &root->blocks_count);
2011-03-05 00:19:32 +09:00
}
void nilfs_inode_sub_blocks(struct inode *inode, int n)
{
	struct nilfs_root *root = NILFS_I(inode)->i_root;
2017-02-27 14:28:32 -08:00
	inode_sub_bytes(inode, i_blocksize(inode) * n);
2011-03-05 00:19:32 +09:00
	if (root)
2013-07-03 15:08:06 -07:00
		atomic64_sub(n, &root->blocks_count);
2011-03-05 00:19:32 +09:00
}
2009-04-06 19:01:32 -07:00
/**
 * nilfs_get_block() - get a file block on the filesystem (callback function)
 * @inode - inode struct of the target file
 * @blkoff - file block number
 * @bh_result - buffer head to be mapped on
 * @create - indicate whether allocating the block or not when it has not
 *      been allocated yet.
 *
 * This function does not issue the actual read request for the specified
 * data block; that is done by the VFS.
 */
int nilfs_get_block(struct inode *inode, sector_t blkoff,
		    struct buffer_head *bh_result, int create)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
2011-05-05 12:56:51 +09:00
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
2009-05-25 02:47:14 +09:00
	__u64 blknum = 0;
2009-04-06 19:01:32 -07:00
	int err = 0, ret;
2016-05-23 16:23:39 -07:00
	unsigned int maxblocks = bh_result->b_size >> inode->i_blkbits;
2009-04-06 19:01:32 -07:00
2011-05-05 12:56:51 +09:00
	down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
2009-05-25 02:47:14 +09:00
	ret = nilfs_bmap_lookup_contig(ii->i_bmap, blkoff, &blknum, maxblocks);
2011-05-05 12:56:51 +09:00
	up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
2009-05-25 02:47:14 +09:00
	if (ret >= 0) {	/* found */
2009-04-06 19:01:32 -07:00
		map_bh(bh_result, inode->i_sb, blknum);
2009-05-25 02:47:14 +09:00
		if (ret > 0)
			bh_result->b_size = (ret << inode->i_blkbits);
2009-04-06 19:01:32 -07:00
		goto out;
	}
	/* data block was not found */
	if (ret == -ENOENT && create) {
		struct nilfs_transaction_info ti;
		bh_result->b_blocknr = 0;
		err = nilfs_transaction_begin(inode->i_sb, &ti, 1);
		if (unlikely(err))
			goto out;
2015-04-16 12:46:34 -07:00
		err = nilfs_bmap_insert(ii->i_bmap, blkoff,
2009-04-06 19:01:32 -07:00
					(unsigned long)bh_result);
		if (unlikely(err != 0)) {
			if (err == -EEXIST) {
				/*
				 * The get_block() function could be called
				 * from multiple callers for an inode.
				 * However, the page having this block must
				 * be locked in this case.
				 */
2020-08-11 18:35:49 -07:00
				nilfs_warn(inode->i_sb,
					   "%s (ino=%lu): a race condition while inserting a data block at offset=%llu",
					   __func__, inode->i_ino,
					   (unsigned long long)blkoff);
2009-04-06 19:01:55 -07:00
				err = 0;
2009-04-06 19:01:32 -07:00
			}
2009-04-06 19:01:45 -07:00
			nilfs_transaction_abort(inode->i_sb);
2009-04-06 19:01:32 -07:00
			goto out;
		}
2014-10-13 15:53:22 -07:00
		nilfs_mark_inode_dirty_sync(inode);
2009-04-06 19:01:45 -07:00
		nilfs_transaction_commit(inode->i_sb); /* never fails */
2009-04-06 19:01:32 -07:00
		/* Error handling should be detailed */
		set_buffer_new(bh_result);
2010-12-26 16:28:28 +09:00
		set_buffer_delay(bh_result);
2016-05-23 16:23:48 -07:00
		map_bh(bh_result, inode->i_sb, 0);
		/* Disk block number must be changed to proper value */
2009-04-06 19:01:32 -07:00
	} else if (ret == -ENOENT) {
2016-05-23 16:23:48 -07:00
		/*
		 * Not found is not an error (e.g. a hole); must return
		 * without the mapped state flag.
		 */
2009-04-06 19:01:32 -07:00
		;
	} else {
		err = ret;
	}
out:
	return err;
}
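The contract of `nilfs_get_block()` above is to look up as many contiguous blocks as `b_size` allows, shrink `b_size` to the extent actually found, and leave the buffer unmapped for a hole. It can be modelled in userspace. This is a hypothetical sketch: the flat array `demo_map` stands in for the NILFS bmap, and `model_lookup_contig()` and `model_get_block()` are invented names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat block map: entry i is the disk block of file block i, 0 = hole. */
static const uint64_t demo_map[8] = { 100, 101, 102, 0, 200, 201, 0, 0 };

/*
 * Stand-in for nilfs_bmap_lookup_contig(): returns how many blocks starting
 * at blkoff are contiguous on disk (at most maxblocks), or -ENOENT for a hole.
 */
static int model_lookup_contig(const uint64_t *map, size_t nr, uint64_t blkoff,
			       uint64_t *blknum, unsigned int maxblocks)
{
	unsigned int n = 1;

	if (blkoff >= nr || map[blkoff] == 0)
		return -ENOENT;
	*blknum = map[blkoff];
	while (n < maxblocks && blkoff + n < nr &&
	       map[blkoff + n] == map[blkoff] + n)
		n++;
	return (int)n;
}

/*
 * Mirrors the b_size handling in nilfs_get_block(): b_size (in bytes) caps
 * the lookup, and the result is the mapped size in bytes (0 for a hole).
 */
static size_t model_get_block(const uint64_t *map, size_t nr, uint64_t blkoff,
			      size_t b_size, unsigned int blkbits)
{
	unsigned int maxblocks;
	uint64_t blknum = 0;
	int ret;

	assert(blkbits < 32);
	maxblocks = (unsigned int)(b_size >> blkbits);
	ret = model_lookup_contig(map, nr, blkoff, &blknum, maxblocks);
	if (ret < 0)
		return 0;	/* hole: the buffer stays unmapped */
	return (size_t)ret << blkbits;
}
```

With `demo_map` and 1 KiB blocks, a 4096-byte request at file block 0 maps 3072 bytes (disk blocks 100 to 102). Block 3 is a hole and yields 0, corresponding to the `-ENOENT` branch that returns without setting the mapped flag.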
/**
 * nilfs_readpage() - implement readpage() method of nilfs_aops {}
 * address_space_operations.
 * @file - file struct of the file to be read
 * @page - the page to be read
 */
static int nilfs_readpage(struct file *file, struct page *page)
{
	return mpage_readpage(page, nilfs_get_block);
}
fs: convert mpage_readpages to mpage_readahead
Implement the new readahead aop and convert all callers (block_dev,
exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
reiserfs & udf).
The callers are all trivial except for GFS2 & OCFS2.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-01 21:47:02 -07:00
static void nilfs_readahead(struct readahead_control *rac)
2009-04-06 19:01:32 -07:00
{
2020-06-01 21:47:02 -07:00
	mpage_readahead(rac, nilfs_get_block);
2009-04-06 19:01:32 -07:00
}
static int nilfs_writepages(struct address_space *mapping,
			    struct writeback_control *wbc)
{
2009-04-06 19:01:38 -07:00
	struct inode *inode = mapping->host;
	int err = 0;
2017-07-17 08:45:34 +01:00
	if (sb_rdonly(inode->i_sb)) {
2013-04-30 15:27:48 -07:00
		nilfs_clear_dirty_pages(mapping, false);
		return -EROFS;
	}
2009-04-06 19:01:38 -07:00
	if (wbc->sync_mode == WB_SYNC_ALL)
		err = nilfs_construct_dsync_segment(inode->i_sb, inode,
						    wbc->range_start,
						    wbc->range_end);
	return err;
2009-04-06 19:01:32 -07:00
}
static int nilfs_writepage(struct page *page, struct writeback_control *wbc)
{
	struct inode *inode = page->mapping->host;
	int err;
2017-07-17 08:45:34 +01:00
	if (sb_rdonly(inode->i_sb)) {
2013-04-30 15:27:48 -07:00
		/*
		 * It means that filesystem was remounted in read-only
		 * mode because of error or metadata corruption. But we
		 * have dirty pages that try to be flushed in background.
		 * So, here we simply discard this dirty page.
		 */
		nilfs_clear_dirty_page(page, false);
		unlock_page(page);
		return -EROFS;
	}
2009-04-06 19:01:32 -07:00
	redirty_page_for_writepage(wbc, page);
	unlock_page(page);
	if (wbc->sync_mode == WB_SYNC_ALL) {
		err = nilfs_construct_segment(inode->i_sb);
		if (unlikely(err))
			return err;
	} else if (wbc->for_reclaim)
		nilfs_flush_segment(inode->i_sb, inode->i_ino);
	return 0;
}
static int nilfs_set_page_dirty(struct page *page)
{
2014-09-25 16:05:14 -07:00
	struct inode *inode = page->mapping->host;
nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
DESCRIPTION:
There are use cases in which a NILFS2 file system (formatted with a block
size smaller than 4 KB) can be remounted read-only after encountering a
"broken bmap" issue.
The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
SYMPTOMS:
(1) The system log contains error messages like these:
[63102.496756] nilfs_direct_assign: invalid pointer: 0
[63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
[63102.496798]
[63102.524403] Remounting filesystem read-only
(2) The NILFS2 file system is remounted in RO mode.
REPRODUCING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
----------------[BEGIN SCRIPT]--------------------
VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------
REPRODUCIBILITY: 100%
INVESTIGATION:
As it turned out, the issue occurs during segment construction
after the following sequence of user-space operations:
open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)
The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.
The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.
When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().
The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.
As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.
FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
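The fix described in this commit can be illustrated with a small userspace model. Everything below is hypothetical: `struct model_bh`, `model_page_mkwrite()`, `model_set_page_dirty()` and `model_eof_scenario()` are invented names. The model assumes two things, as the analysis states: `__block_page_mkwrite()` maps only the blocks inside EOF, and the fixed `nilfs_set_page_dirty()` skips buffers that are already dirty or unmapped. With the reported 1 KB block size, a 4 KB page and a 60-byte file, only one buffer should end up dirty instead of four.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_BUFS 64

/* Hypothetical stand-in for the state bits of one buffer_head. */
struct model_bh {
	bool mapped;
	bool dirty;
};

/*
 * Models __block_page_mkwrite() on a page at the EOF boundary: only
 * buffers that start below i_size get their holes filled (mapped).
 */
static void model_page_mkwrite(struct model_bh *bhs, size_t nr,
			       unsigned long long page_pos,
			       unsigned int blocksize,
			       unsigned long long i_size)
{
	for (size_t i = 0; i < nr; i++)
		if (page_pos + (unsigned long long)i * blocksize < i_size)
			bhs[i].mapped = true;
}

/*
 * Models the fixed nilfs_set_page_dirty(): skip buffers that are already
 * dirty and unmapped hole blocks; count the buffers newly marked dirty.
 */
static unsigned int model_set_page_dirty(struct model_bh *bhs, size_t nr)
{
	unsigned int nr_dirty = 0;

	for (size_t i = 0; i < nr; i++) {
		if (bhs[i].dirty || !bhs[i].mapped)
			continue;
		bhs[i].dirty = true;
		nr_dirty++;
	}
	return nr_dirty;
}

/* Runs mkwrite followed by set_page_dirty on the first page of a file. */
static unsigned int model_eof_scenario(unsigned long long i_size,
				       unsigned int blocksize,
				       unsigned int page_size)
{
	struct model_bh bhs[MODEL_MAX_BUFS] = { { false, false } };
	size_t nr;

	assert(blocksize > 0);
	nr = page_size / blocksize;
	assert(nr > 0 && nr <= MODEL_MAX_BUFS);
	model_page_mkwrite(bhs, nr, 0, blocksize, i_size);
	return model_set_page_dirty(bhs, nr);
}
```

Under the pre-fix behaviour every buffer on the page would have been marked dirty. That includes the unmapped block at offset 3072, which the segment constructor later failed to look up.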
2013-05-24 15:55:29 -07:00
	int ret = __set_page_dirty_nobuffers(page);
2009-04-06 19:01:32 -07:00
2013-05-24 15:55:29 -07:00
	if (page_has_buffers(page)) {
2016-05-23 16:23:39 -07:00
		unsigned int nr_dirty = 0;
2013-05-24 15:55:29 -07:00
		struct buffer_head *bh, *head;
2009-04-06 19:01:32 -07:00
2013-05-24 15:55:29 -07:00
		/*
		 * This page is locked by callers, and no other thread
		 * concurrently marks its buffers dirty since they are
		 * only dirtied through routines in fs/buffer.c in
		 * which call sites of mark_buffer_dirty are protected
		 * by page lock.
		 */
		bh = head = page_buffers(page);
		do {
			/* Do not mark hole blocks dirty */
			if (buffer_dirty(bh) || !buffer_mapped(bh))
				continue;
			set_buffer_dirty(bh);
			nr_dirty++;
		} while (bh = bh->b_this_page, bh != head);
		if (nr_dirty)
			nilfs_set_file_dirty(inode, nr_dirty);
nilfs2: fix data loss with mmap()
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
error checking removed):
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC);
}
----------------[END mmaptest]--------------------
The output of the script looks something like this:
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear that the changes made using mmap() do not survive a
remount. This can be reproduced 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-25 16:05:14 -07:00
	} else if (ret) {
2016-05-23 16:23:39 -07:00
		unsigned int nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits);
2014-09-25 16:05:14 -07:00
		nilfs_set_file_dirty(inode, nr_dirty);
2009-04-06 19:01:32 -07:00
	}
	return ret;
}
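The two branches of nilfs_set_page_dirty() decide how many blocks are charged to the file through nilfs_set_file_dirty(): when buffers are attached, only buffers that are mapped and not already dirty are counted (the EOF-boundary fix); when no buffers are attached, every block in the page is counted if `ret` is nonzero, i.e. the page itself was newly dirtied (the mmap data-loss fix). The following is a user-space sketch of that accounting, not kernel code: `struct model_buf` is a hypothetical stand-in for a buffer_head, and 4 KiB pages (`MODEL_PAGE_SHIFT` = 12) are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: only the two state bits used here. */
struct model_buf {
	bool dirty;
	bool mapped;
};

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

/*
 * Return the block count that would be passed to nilfs_set_file_dirty().
 * bufs == NULL models a page without buffers (page_has_buffers() false);
 * newly_dirty models a nonzero return from the page-level dirty call.
 */
static unsigned int model_dirty_blocks(struct model_buf *bufs, size_t nbufs,
				       bool newly_dirty, unsigned int blkbits)
{
	unsigned int nr_dirty = 0;
	size_t i;

	if (bufs != NULL) {
		for (i = 0; i < nbufs; i++) {
			/* Skip buffers already dirty and hole blocks past EOF. */
			if (bufs[i].dirty || !bufs[i].mapped)
				continue;
			bufs[i].dirty = true;
			nr_dirty++;
		}
		return nr_dirty;
	}
	if (newly_dirty)
		/* No buffers: charge every block in the page. */
		return 1u << (MODEL_PAGE_SHIFT - blkbits);
	return 0;
}
```

With 1 KiB blocks (blkbits = 10), a page straddling EOF whose only mapped buffer is the first one charges one block instead of four, and a repeated call charges nothing because that buffer is already dirty.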
2012-12-15 11:57:37 +01:00
void nilfs_write_failed(struct address_space *mapping, loff_t to)
{
	struct inode *inode = mapping->host;
	if (to > inode->i_size) {
2013-09-12 15:13:56 -07:00
		truncate_pagecache(inode, inode->i_size);
2012-12-15 11:57:37 +01:00
		nilfs_truncate(inode);
	}
}
2009-04-06 19:01:32 -07:00
static int nilfs_write_begin(struct file *file, struct address_space *mapping,
			     loff_t pos, unsigned len, unsigned flags,
			     struct page **pagep, void **fsdata)
{
	struct inode *inode = mapping->host;
	int err = nilfs_transaction_begin(inode->i_sb, NULL, 1);
	if (unlikely(err))
		return err;
2010-06-04 11:29:58 +02:00
	err = block_write_begin(mapping, pos, len, flags, pagep,
				nilfs_get_block);
	if (unlikely(err)) {
2012-12-15 11:57:37 +01:00
		nilfs_write_failed(mapping, pos + len);
2009-04-06 19:01:45 -07:00
		nilfs_transaction_abort(inode->i_sb);
2010-06-04 11:29:58 +02:00
	}
2009-04-06 19:01:32 -07:00
	return err;
}
static int nilfs_write_end(struct file *file, struct address_space *mapping,
			   loff_t pos, unsigned len, unsigned copied,
			   struct page *page, void *fsdata)
{
	struct inode *inode = mapping->host;
2016-05-23 16:23:39 -07:00
	unsigned int start = pos & (PAGE_SIZE - 1);
	unsigned int nr_dirty;
2009-04-06 19:01:32 -07:00
	int err;
	nr_dirty = nilfs_page_count_clean_buffers(page, start,
						  start + copied);
	copied = generic_write_end(file, mapping, pos, len, copied, page,
				   fsdata);
2010-12-27 00:05:49 +09:00
	nilfs_set_file_dirty(inode, nr_dirty);
2009-04-06 19:01:45 -07:00
	err = nilfs_transaction_commit(inode->i_sb);
2009-04-06 19:01:32 -07:00
	return err ? : copied;
}
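nilfs_write_end() converts the file position into an in-page offset with `pos & (PAGE_SIZE - 1)` and counts the still-clean buffers in the written byte range before calling generic_write_end(), presumably so that nilfs_set_file_dirty() is charged only for newly dirtied blocks. A user-space sketch of that arithmetic, assuming 4 KiB pages and modeling the page's buffers as a hypothetical array of dirty flags; the overlap test is a re-creation of what nilfs_page_count_clean_buffers() is expected to do, not its source:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* In-page byte offset of a file position, as nilfs_write_end() computes it. */
static unsigned int model_page_offset(unsigned long long pos)
{
	return (unsigned int)(pos & (MODEL_PAGE_SIZE - 1));
}

/*
 * Count blocks overlapping the byte range [from, to) that are not yet dirty.
 * dirty[] holds one flag per block; blocksize must divide MODEL_PAGE_SIZE.
 */
static unsigned int model_count_clean_blocks(const bool *dirty,
					     unsigned int blocksize,
					     unsigned int from, unsigned int to)
{
	unsigned int start, nc = 0, i = 0;

	for (start = 0; start < MODEL_PAGE_SIZE; start += blocksize, i++) {
		unsigned int end = start + blocksize;

		if (end > from && start < to && !dirty[i])
			nc++;
	}
	return nc;
}
```

For example, a 2000-byte write at file position 9192 starts at in-page offset 1000 and, with 1 KiB blocks, touches blocks 0 to 2 of that page.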
static ssize_t
2016-04-07 08:51:58 -07:00
nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
2009-04-06 19:01:32 -07:00
{
2015-06-21 01:37:24 -04:00
	struct inode *inode = file_inode(iocb->ki_filp);
2009-04-06 19:01:32 -07:00
2015-03-16 04:33:52 -07:00
	if (iov_iter_rw(iter) == WRITE)
2009-04-06 19:01:32 -07:00
		return 0;
	/* Needs synchronization with the cleaner */
2016-04-07 08:51:58 -07:00
	return blockdev_direct_IO(iocb, inode, iter, nilfs_get_block);
2009-04-06 19:01:32 -07:00
}
2009-09-21 17:01:10 -07:00
const struct address_space_operations nilfs_aops = {
2009-04-06 19:01:32 -07:00
	.writepage = nilfs_writepage,
	.readpage = nilfs_readpage,
	.writepages = nilfs_writepages,
	.set_page_dirty = nilfs_set_page_dirty,
fs: convert mpage_readpages to mpage_readahead
Implement the new readahead aop and convert all callers (block_dev,
exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
reiserfs & udf).
The callers are all trivial except for GFS2 & OCFS2.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-01 21:47:02 -07:00
	.readahead = nilfs_readahead,
2009-04-06 19:01:32 -07:00
	.write_begin = nilfs_write_begin,
	.write_end = nilfs_write_end,
	/* .releasepage = nilfs_releasepage, */
	.invalidatepage = block_invalidatepage,
	.direct_IO = nilfs_direct_IO,
2009-05-13 11:19:40 +09:00
	.is_partially_uptodate = block_is_partially_uptodate,
2009-04-06 19:01:32 -07:00
};
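nilfs_aops is a table of function pointers that the page cache calls through: NILFS supplies its own handlers for most hooks and reuses generic helpers (block_invalidatepage, block_is_partially_uptodate) for others. The sketch below shows the same designated-initializer and dispatch pattern in plain C. `struct model_aops`, its two hooks, and the NULL-hook fallback are hypothetical and use no kernel types; the fallback only illustrates a common way of handling optional hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced analogue of struct address_space_operations. */
struct model_aops {
	int (*readpage)(unsigned long index);
	int (*set_page_dirty)(unsigned long index);	/* optional hook */
};

static int fs_readpage(unsigned long index)
{
	return (int)(index % 7);	/* stand-in result */
}

/* Default used by the dispatcher when a hook is left NULL. */
static int generic_set_page_dirty(unsigned long index)
{
	(void)index;
	return 1;
}

/* Designated initializers: members not named are implicitly NULL. */
static const struct model_aops model_fs_aops = {
	.readpage = fs_readpage,
};

static int model_set_page_dirty(const struct model_aops *ops,
				unsigned long index)
{
	if (ops->set_page_dirty)
		return ops->set_page_dirty(index);
	return generic_set_page_dirty(index);
}
```

Designated initializers keep such tables readable and robust against members being reordered or added to the struct.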
2014-12-10 15:54:34 -08:00
static int nilfs_insert_inode_locked(struct inode *inode,
				     struct nilfs_root *root,
				     unsigned long ino)
{
	struct nilfs_iget_args args = {
		.ino = ino, .root = root, .cno = 0, .for_gc = 0
	};
	return insert_inode_locked4(inode, ino, nilfs_iget_test, &args);
}
2011-07-26 03:07:14 -04:00
struct inode *nilfs_new_inode(struct inode *dir, umode_t mode)
2009-04-06 19:01:32 -07:00
{
	struct super_block *sb = dir->i_sb;
2011-03-09 11:05:08 +09:00
	struct the_nilfs *nilfs = sb->s_fs_info;
2009-04-06 19:01:32 -07:00
	struct inode *inode;
	struct nilfs_inode_info *ii;
2010-08-25 17:45:44 +09:00
	struct nilfs_root *root;
2009-04-06 19:01:32 -07:00
	int err = -ENOMEM;
	ino_t ino;
	inode = new_inode(sb);
	if (unlikely(!inode))
		goto failed;
	mapping_set_gfp_mask(inode->i_mapping,
2015-11-06 16:28:49 -08:00
			   mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS));
2009-04-06 19:01:32 -07:00
2010-08-25 17:45:44 +09:00
	root = NILFS_I(dir)->i_root;
2009-04-06 19:01:32 -07:00
	ii = NILFS_I(inode);
2016-08-02 14:05:28 -07:00
	ii->i_state = BIT(NILFS_I_NEW);
2010-08-25 17:45:44 +09:00
	ii->i_root = root;
2009-04-06 19:01:32 -07:00
2010-08-14 13:07:15 +09:00
	err = nilfs_ifile_create_inode(root->ifile, &ino, &ii->i_bh);
2009-04-06 19:01:32 -07:00
	if (unlikely(err))
		goto failed_ifile_create_inode;
	/* reference count of i_bh inherits from nilfs_mdt_read_block() */
2013-07-03 15:08:06 -07:00
	atomic64_inc(&root->inodes_count);
2021-01-21 14:19:25 +01:00
	inode_init_owner(&init_user_ns, inode, dir, mode);
2009-04-06 19:01:32 -07:00
	inode->i_ino = ino;
2016-09-14 07:48:04 -07:00
	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
2009-04-06 19:01:32 -07:00
	if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
		err = nilfs_bmap_read(ii->i_bmap, NULL);
		if (err < 0)
2014-12-10 15:54:34 -08:00
			goto failed_after_creation;
2009-04-06 19:01:32 -07:00
		set_bit(NILFS_I_BMAP, &ii->i_state);
		/* No lock is needed; iget() ensures it. */
	}
2011-01-20 02:09:53 +09:00
	ii->i_flags = nilfs_mask_flags(
		mode, NILFS_I(dir)->i_flags & NILFS_FL_INHERITED);
2009-04-06 19:01:32 -07:00
	/* ii->i_file_acl = 0; */
	/* ii->i_dir_acl = 0; */
	ii->i_dir_start_lookup = 0;
	nilfs_set_inode_flags(inode);
2011-03-09 11:05:08 +09:00
	spin_lock(&nilfs->ns_next_gen_lock);
	inode->i_generation = nilfs->ns_next_generation++;
	spin_unlock(&nilfs->ns_next_gen_lock);
2014-12-10 15:54:34 -08:00
	if (nilfs_insert_inode_locked(inode, root, ino) < 0) {
		err = -EIO;
		goto failed_after_creation;
	}
2009-04-06 19:01:32 -07:00
	err = nilfs_init_acl(inode, dir);
	if (unlikely(err))
2016-05-23 16:23:48 -07:00
		/*
		 * This should never occur. If nilfs_init_acl() gains real
		 * ACL support, proper cancellation of the jobs above must
		 * be considered.
		 */
		goto failed_after_creation;
2009-04-06 19:01:32 -07:00
	return inode;
2014-12-10 15:54:34 -08:00
failed_after_creation:
2011-10-28 14:13:28 +02:00
	clear_nlink(inode);
2020-08-11 18:35:43 -07:00
	if (inode->i_state & I_NEW)
		unlock_new_inode(inode);
2016-05-23 16:23:48 -07:00
	iput(inode);  /*
		       * raw_inode will be deleted through
		       * nilfs_evict_inode().
		       */
2009-04-06 19:01:32 -07:00
	goto failed;
failed_ifile_create_inode:
	make_bad_inode(inode);
2016-05-23 16:23:48 -07:00
	iput(inode);
2009-04-06 19:01:32 -07:00
failed:
	return ERR_PTR(err);
}
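nilfs_new_inode() uses goto-based unwinding: each label undoes only the work completed before the failing step, failed_after_creation jumps straight to failed, and failed_ifile_create_inode falls through into it. A simplified user-space sketch of that shape follows; the step functions and the `live_objects` counter are hypothetical stand-ins for the in-core inode and its ifile slot, and `fail_at` selects which step fails so each path can be exercised. (The kernel path additionally clears the link count and relies on iput() and nilfs_evict_inode() to release the on-disk inode.)

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource tracker: number of allocations still live. */
static int live_objects;

static void *obj_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_objects++;
	return p;
}

static void obj_free(void *p)
{
	if (p) {
		live_objects--;
		free(p);
	}
}

/* fail_at: 0 = no failure, 1..3 = the step that fails. */
static int model_new_object(int fail_at, void **inode_out, void **slot_out)
{
	void *inode, *slot;
	int err = -ENOMEM;

	inode = (fail_at == 1) ? NULL : obj_alloc();	/* cf. new_inode() */
	if (!inode)
		goto failed;

	slot = (fail_at == 2) ? NULL : obj_alloc();	/* cf. ifile slot */
	if (!slot)
		goto failed_slot;

	if (fail_at == 3) {		/* cf. a failing late step */
		err = -EIO;
		goto failed_after_creation;
	}

	*inode_out = inode;
	*slot_out = slot;
	return 0;

failed_after_creation:
	obj_free(slot);
failed_slot:
	obj_free(inode);
failed:
	return err;
}
```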
void nilfs_set_inode_flags(struct inode *inode)
{
	unsigned int flags = NILFS_I(inode)->i_flags;
2015-04-16 12:46:50 -07:00
	unsigned int new_fl = 0;
2009-04-06 19:01:32 -07:00
2011-01-20 02:09:52 +09:00
	if (flags & FS_SYNC_FL)
2015-04-16 12:46:50 -07:00
		new_fl |= S_SYNC;
2011-01-20 02:09:52 +09:00
	if (flags & FS_APPEND_FL)
2015-04-16 12:46:50 -07:00
		new_fl |= S_APPEND;
2011-01-20 02:09:52 +09:00
	if (flags & FS_IMMUTABLE_FL)
2015-04-16 12:46:50 -07:00
		new_fl |= S_IMMUTABLE;
2011-01-20 02:09:52 +09:00
	if (flags & FS_NOATIME_FL)
2015-04-16 12:46:50 -07:00
		new_fl |= S_NOATIME;
2011-01-20 02:09:52 +09:00
	if (flags & FS_DIRSYNC_FL)
2015-04-16 12:46:50 -07:00
		new_fl |= S_DIRSYNC;
	inode_set_flags(inode, new_fl, S_SYNC | S_APPEND | S_IMMUTABLE |
			S_NOATIME | S_DIRSYNC);
2009-04-06 19:01:32 -07:00
}
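nilfs_set_inode_flags() translates the on-disk attribute bits (FS_*_FL) into in-core inode flags (S_*) and then replaces only the bits named in the mask, leaving every other i_flags bit untouched. A table-driven user-space sketch of that translation: the `M_FS_*` values mirror FS_*_FL from the Linux UAPI header <linux/fs.h>, while the `M_S_*` values are illustrative only, and `set_flags_masked()` shows the masked update without the atomic loop inode_set_flags() uses.

```c
#include <assert.h>
#include <stddef.h>

/* On-disk attribute bits, matching FS_*_FL in <linux/fs.h>. */
#define M_FS_SYNC_FL      0x00000008u
#define M_FS_IMMUTABLE_FL 0x00000010u
#define M_FS_APPEND_FL    0x00000020u
#define M_FS_NOATIME_FL   0x00000080u
#define M_FS_DIRSYNC_FL   0x00010000u

/* In-core flag bits: illustrative values for this sketch. */
#define M_S_SYNC      0x01u
#define M_S_NOATIME   0x02u
#define M_S_APPEND    0x04u
#define M_S_IMMUTABLE 0x08u
#define M_S_DIRSYNC   0x40u

static const struct { unsigned int disk, core; } flag_map[] = {
	{ M_FS_SYNC_FL, M_S_SYNC },
	{ M_FS_APPEND_FL, M_S_APPEND },
	{ M_FS_IMMUTABLE_FL, M_S_IMMUTABLE },
	{ M_FS_NOATIME_FL, M_S_NOATIME },
	{ M_FS_DIRSYNC_FL, M_S_DIRSYNC },
};

/* Replace only the bits covered by mask; other bits of old survive. */
static unsigned int set_flags_masked(unsigned int old, unsigned int flags,
				     unsigned int mask)
{
	return (old & ~mask) | (flags & mask);
}

static unsigned int model_set_inode_flags(unsigned int old_core,
					  unsigned int disk_flags)
{
	unsigned int new_fl = 0, mask = 0;
	size_t i;

	for (i = 0; i < sizeof(flag_map) / sizeof(flag_map[0]); i++) {
		mask |= flag_map[i].core;
		if (disk_flags & flag_map[i].disk)
			new_fl |= flag_map[i].core;
	}
	return set_flags_masked(old_core, new_fl, mask);
}
```

A stale S_SYNC bit is cleared when the on-disk flags no longer request it, while unrelated in-core bits outside the mask are preserved.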
int nilfs_read_inode_common(struct inode *inode,
			    struct nilfs_inode *raw_inode)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	int err;
	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
2012-02-10 12:31:23 -08:00
	i_uid_write(inode, le32_to_cpu(raw_inode->i_uid));
	i_gid_write(inode, le32_to_cpu(raw_inode->i_gid));
2011-10-28 14:13:29 +02:00
	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
2009-04-06 19:01:32 -07:00
	inode->i_size = le64_to_cpu(raw_inode->i_size);
	inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
	inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
	inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
2009-04-06 19:02:00 -07:00
	inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
	inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
	inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
2014-12-10 15:54:34 -08:00
	if (inode->i_nlink == 0)
		return -ESTALE; /* this inode is deleted */
2009-04-06 19:01:32 -07:00
	inode->i_blocks = le64_to_cpu(raw_inode->i_blocks);
	ii->i_flags = le32_to_cpu(raw_inode->i_flags);
#if 0
	ii->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
	ii->i_dir_acl = S_ISREG(inode->i_mode) ?
		0 : le32_to_cpu(raw_inode->i_dir_acl);
#endif
2009-09-28 13:02:46 +09:00
	ii->i_dir_start_lookup = 0;
2009-04-06 19:01:32 -07:00
	inode->i_generation = le32_to_cpu(raw_inode->i_generation);
	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
	    S_ISLNK(inode->i_mode)) {
		err = nilfs_bmap_read(ii->i_bmap, raw_inode);
		if (err < 0)
			return err;
		set_bit(NILFS_I_BMAP, &ii->i_state);
		/* No lock is needed; iget() ensures it. */
	}
	return 0;
}
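nilfs_read_inode_common() converts each on-disk field with le16_to_cpu(), le32_to_cpu() or le64_to_cpu() because the raw inode is stored little-endian whatever the host byte order, and it rejects an inode whose link count is zero with -ESTALE. A portable user-space sketch of the same decoding from a byte buffer; the record layout in `model_read_links()` (link count at offset 0, size at offset 8) is made up and is not the real struct nilfs_inode layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assemble little-endian integers byte by byte: correct on any host. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const unsigned char *p)
{
	return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

/* Hypothetical record: 16-bit link count at 0, 64-bit size at 8. */
static int model_read_links(const unsigned char *raw, uint64_t *size)
{
	if (get_le16(raw) == 0)
		return -ESTALE;		/* deleted inode */
	*size = get_le64(raw + 8);
	return 0;
}
```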
2010-08-14 13:07:15 +09:00
static int __nilfs_read_inode(struct super_block *sb,
			      struct nilfs_root *root, unsigned long ino,
2009-04-06 19:01:32 -07:00
			      struct inode *inode)
{
2011-03-09 11:05:08 +09:00
	struct the_nilfs *nilfs = sb->s_fs_info;
2009-04-06 19:01:32 -07:00
	struct buffer_head *bh;
	struct nilfs_inode *raw_inode;
	int err;
2010-12-27 00:07:30 +09:00
	down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
2010-08-14 13:07:15 +09:00
	err = nilfs_ifile_get_inode_block(root->ifile, ino, &bh);
2009-04-06 19:01:32 -07:00
	if (unlikely(err))
		goto bad_inode;
2010-08-14 13:07:15 +09:00
	raw_inode = nilfs_ifile_map_inode(root->ifile, ino, bh);
2009-04-06 19:01:32 -07:00
2009-08-22 19:10:07 +09:00
	err = nilfs_read_inode_common(inode, raw_inode);
	if (err)
2009-04-06 19:01:32 -07:00
		goto failed_unmap;
	if (S_ISREG(inode->i_mode)) {
		inode->i_op = &nilfs_file_inode_operations;
		inode->i_fop = &nilfs_file_operations;
		inode->i_mapping->a_ops = &nilfs_aops;
	} else if (S_ISDIR(inode->i_mode)) {
		inode->i_op = &nilfs_dir_inode_operations;
		inode->i_fop = &nilfs_dir_operations;
		inode->i_mapping->a_ops = &nilfs_aops;
	} else if (S_ISLNK(inode->i_mode)) {
		inode->i_op = &nilfs_symlink_inode_operations;
2015-11-17 01:07:57 -05:00
		inode_nohighmem(inode);
2009-04-06 19:01:32 -07:00
		inode->i_mapping->a_ops = &nilfs_aops;
	} else {
		inode->i_op = &nilfs_special_inode_operations;
		init_special_inode(
			inode, inode->i_mode,
2010-05-09 15:31:22 +09:00
			huge_decode_dev(le64_to_cpu(raw_inode->i_device_code)));
2009-04-06 19:01:32 -07:00
	}
2010-08-14 13:07:15 +09:00
	nilfs_ifile_unmap_inode(root->ifile, ino, bh);
2009-04-06 19:01:32 -07:00
	brelse(bh);
2010-12-27 00:07:30 +09:00
	up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
2009-04-06 19:01:32 -07:00
	nilfs_set_inode_flags(inode);
nilfs2: put out gfp mask manipulation from nilfs_set_inode_flags()
The nilfs_set_inode_flags() function adjusts the gfp-mask of inode->i_mapping
as well as i_flags; however, this coupling of operations is not appropriate.
For instance, nilfs_ioctl_setflags(), one of three callers of
nilfs_set_inode_flags(), doesn't need to reinitialize the gfp-mask at all.
In addition, nilfs_new_inode(), another caller of
nilfs_set_inode_flags(), doesn't either because it has already initialized
the gfp-mask.
Only __nilfs_read_inode(), the remaining caller, needs it. So, this moves
the gfp mask manipulation to __nilfs_read_inode() from
nilfs_set_inode_flags().
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 12:46:47 -07:00
	mapping_set_gfp_mask(inode->i_mapping,
2015-11-06 16:28:49 -08:00
			   mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS));
2009-04-06 19:01:32 -07:00
	return 0;
failed_unmap:
2010-08-14 13:07:15 +09:00
	nilfs_ifile_unmap_inode(root->ifile, ino, bh);
2009-04-06 19:01:32 -07:00
	brelse(bh);
bad_inode:
2010-12-27 00:07:30 +09:00
	up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
2009-04-06 19:01:32 -07:00
	return err;
}
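__nilfs_read_inode() narrows the mapping's allocation mask with mapping_gfp_constraint(mapping, ~__GFP_FS), which ANDs the mapping's current mask with the constraint and so clears __GFP_FS: page-cache allocations for this inode may then not call back into filesystem code during reclaim. The bit arithmetic in isolation, with made-up flag values (the real __GFP_* constants are kernel-internal and vary between versions) and a hypothetical `struct model_mapping`:

```c
#include <assert.h>

/* Illustrative flag bits only; not the kernel's __GFP_* values. */
#define M_GFP_IO   0x1u
#define M_GFP_FS   0x2u
#define M_GFP_HIGH 0x4u

struct model_mapping {
	unsigned int gfp_mask;
};

/* Like mapping_gfp_constraint(): the mapping's mask ANDed with a constraint. */
static unsigned int model_gfp_constraint(const struct model_mapping *m,
					 unsigned int constraint)
{
	return m->gfp_mask & constraint;
}
```

Because the operation only ever clears bits, applying it twice, as nilfs_new_inode() and __nilfs_read_inode() each do for their own inodes, is harmless.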
2010-08-20 21:20:29 +09:00
static int nilfs_iget_test(struct inode *inode, void *opaque)
{
	struct nilfs_iget_args *args = opaque;
	struct nilfs_inode_info *ii;
2010-08-25 17:45:44 +09:00
	if (args->ino != inode->i_ino || args->root != NILFS_I(inode)->i_root)
2010-08-20 21:20:29 +09:00
		return 0;
	ii = NILFS_I(inode);
	if (!test_bit(NILFS_I_GCINODE, &ii->i_state))
		return !args->for_gc;
	return args->for_gc && args->cno == ii->i_cno;
}
static int nilfs_iget_set(struct inode *inode, void *opaque)
{
	struct nilfs_iget_args *args = opaque;
	inode->i_ino = args->ino;
	if (args->for_gc) {
2016-08-02 14:05:28 -07:00
		NILFS_I(inode)->i_state = BIT(NILFS_I_GCINODE);
2010-08-20 21:20:29 +09:00
		NILFS_I(inode)->i_cno = args->cno;
2010-08-25 17:45:44 +09:00
		NILFS_I(inode)->i_root = NULL;
	} else {
		if (args->root && args->ino == NILFS_ROOT_INO)
			nilfs_get_root(args->root);
		NILFS_I(inode)->i_root = args->root;
2010-08-20 21:20:29 +09:00
	}
	return 0;
}
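nilfs_iget_test() lets several in-core inodes share one inode number: ordinary inodes are told apart by their checkpoint root (i_root), while GC inodes (root NULL, NILFS_I_GCINODE set by nilfs_iget_set()) are further keyed by checkpoint number. A user-space model of the predicate, using hypothetical flattened structs in place of struct nilfs_iget_args and the inode:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened lookup key (cf. struct nilfs_iget_args). */
struct model_key {
	unsigned long ino;
	const void *root;	/* checkpoint root; NULL for GC lookups */
	uint64_t cno;		/* checkpoint number, GC lookups only */
	bool for_gc;
};

/* Hypothetical flattened view of a cached inode. */
struct model_cached {
	unsigned long ino;
	const void *root;
	bool gc_inode;		/* models NILFS_I_GCINODE being set */
	uint64_t cno;
};

/* Same decision order as nilfs_iget_test(). */
static bool model_iget_test(const struct model_cached *c,
			    const struct model_key *k)
{
	if (k->ino != c->ino || k->root != c->root)
		return false;
	if (!c->gc_inode)
		return !k->for_gc;
	return k->for_gc && k->cno == c->cno;
}
```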
2010-09-13 11:16:34 +09:00
struct inode *nilfs_ilookup(struct super_block *sb, struct nilfs_root *root,
			    unsigned long ino)
{
	struct nilfs_iget_args args = {
		.ino = ino, .root = root, .cno = 0, .for_gc = 0
	};
	return ilookup5(sb, ino, nilfs_iget_test, &args);
}
2010-09-05 12:20:59 +09:00
struct inode *nilfs_iget_locked(struct super_block *sb, struct nilfs_root *root,
				unsigned long ino)
2009-04-06 19:01:32 -07:00
{
2010-08-25 17:45:44 +09:00
	struct nilfs_iget_args args = {
		.ino = ino, .root = root, .cno = 0, .for_gc = 0
	};
2010-09-05 12:20:59 +09:00
	return iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args);
}
struct inode *nilfs_iget(struct super_block *sb, struct nilfs_root *root,
			 unsigned long ino)
{
2009-04-06 19:01:32 -07:00
	struct inode *inode;
	int err;
2010-09-05 12:20:59 +09:00
	inode = nilfs_iget_locked(sb, root, ino);
2009-04-06 19:01:32 -07:00
	if (unlikely(!inode))
		return ERR_PTR(-ENOMEM);
	if (!(inode->i_state & I_NEW))
		return inode;
2010-08-14 13:07:15 +09:00
	err = __nilfs_read_inode(sb, root, ino, inode);
2009-04-06 19:01:32 -07:00
	if (unlikely(err)) {
		iget_failed(inode);
		return ERR_PTR(err);
	}
	unlock_new_inode(inode);
	return inode;
}
2010-08-20 19:06:11 +09:00
struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino,
				__u64 cno)
{
2010-08-25 17:45:44 +09:00
	struct nilfs_iget_args args = {
		.ino = ino, .root = NULL, .cno = cno, .for_gc = 1
	};
2010-08-20 19:06:11 +09:00
	struct inode *inode;
	int err;
	inode = iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args);
	if (unlikely(!inode))
		return ERR_PTR(-ENOMEM);
	if (!(inode->i_state & I_NEW))
		return inode;
	err = nilfs_init_gcinode(inode);
	if (unlikely(err)) {
		iget_failed(inode);
		return ERR_PTR(err);
	}
	unlock_new_inode(inode);
	return inode;
}
2009-04-06 19:01:32 -07:00
void nilfs_write_inode_common(struct inode *inode,
			      struct nilfs_inode *raw_inode, int has_bmap)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
2012-02-10 12:31:23 -08:00
	raw_inode->i_uid = cpu_to_le32(i_uid_read(inode));
	raw_inode->i_gid = cpu_to_le32(i_gid_read(inode));
2009-04-06 19:01:32 -07:00
	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
	raw_inode->i_size = cpu_to_le64(inode->i_size);
	raw_inode->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
	raw_inode->i_mtime = cpu_to_le64(inode->i_mtime.tv_sec);
2009-04-06 19:02:00 -07:00
	raw_inode->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
	raw_inode->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
2009-04-06 19:01:32 -07:00
	raw_inode->i_blocks = cpu_to_le64(inode->i_blocks);
	raw_inode->i_flags = cpu_to_le32(ii->i_flags);
	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
2011-04-30 18:56:12 +09:00
	if (NILFS_ROOT_METADATA_FILE(inode->i_ino)) {
		struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
		/* zero-fill unused portion in the case of super root block */
		raw_inode->i_xattr = 0;
		raw_inode->i_pad = 0;
		memset((void *)raw_inode + sizeof(*raw_inode), 0,
		       nilfs->ns_inode_size - sizeof(*raw_inode));
	}
2009-04-06 19:01:32 -07:00
	if (has_bmap)
		nilfs_bmap_write(ii->i_bmap, raw_inode);
	else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
		raw_inode->i_device_code =
2010-05-09 15:31:22 +09:00
			cpu_to_le64(huge_encode_dev(inode->i_rdev));
2016-05-23 16:23:48 -07:00
	/*
	 * When extending inode, nilfs->ns_inode_size should be checked
	 * for substitutions of appended fields.
	 */
2009-04-06 19:01:32 -07:00
}
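For the root metadata files, nilfs_write_inode_common() zero-fills the part of the inode slot that lies beyond sizeof(struct nilfs_inode) up to nilfs->ns_inode_size, so no stale bytes are written out with the super root block. A user-space sketch of that "copy the record, clear the tail" step; `struct model_raw_inode` and its 32-byte size are hypothetical, and the sketch uses an `unsigned char *` where the kernel uses GNU C arithmetic on a `void *`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size record standing in for struct nilfs_inode. */
struct model_raw_inode {
	unsigned char fields[32];
};

/*
 * Copy rec into a slot of slot_size bytes (slot_size >= sizeof(*rec))
 * and zero the unused tail of the slot.
 */
static void model_write_slot(unsigned char *slot, size_t slot_size,
			     const struct model_raw_inode *rec)
{
	memcpy(slot, rec, sizeof(*rec));
	memset(slot + sizeof(*rec), 0, slot_size - sizeof(*rec));
}
```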
2014-10-13 15:53:22 -07:00
void nilfs_update_inode(struct inode *inode, struct buffer_head *ibh, int flags)
2009-04-06 19:01:32 -07:00
{
	ino_t ino = inode->i_ino;
	struct nilfs_inode_info *ii = NILFS_I(inode);
2010-08-14 13:07:15 +09:00
	struct inode *ifile = ii->i_root->ifile;
2009-04-06 19:01:32 -07:00
	struct nilfs_inode *raw_inode;
2010-08-14 13:07:15 +09:00
	raw_inode = nilfs_ifile_map_inode(ifile, ino, ibh);
2009-04-06 19:01:32 -07:00
	if (test_and_clear_bit(NILFS_I_NEW, &ii->i_state))
2010-08-14 13:07:15 +09:00
		memset(raw_inode, 0, NILFS_MDT(ifile)->mi_entry_size);
2014-10-13 15:53:22 -07:00
	if (flags & I_DIRTY_DATASYNC)
		set_bit(NILFS_I_INODE_SYNC, &ii->i_state);
2009-04-06 19:01:32 -07:00
	nilfs_write_inode_common(inode, raw_inode, 0);
2016-05-23 16:23:48 -07:00
	/*
	 * XXX: call with has_bmap = 0 is a workaround to avoid
	 * deadlock of bmap.  This delays update of i_bmap to just
	 * before writing.
	 */
2010-08-14 13:07:15 +09:00
	nilfs_ifile_unmap_inode(ifile, ino, ibh);
2009-04-06 19:01:32 -07:00
}
#define NILFS_MAX_TRUNCATE_BLOCKS	16384  /* 64MB for 4KB block */
static void nilfs_truncate_bmap(struct nilfs_inode_info *ii,
				unsigned long from)
{
2015-04-16 12:46:34 -07:00
	__u64 b;
2009-04-06 19:01:32 -07:00
	int ret;
	if (!test_bit(NILFS_I_BMAP, &ii->i_state))
		return;
2010-11-19 15:26:20 +09:00
repeat:
2009-04-06 19:01:32 -07:00
	ret = nilfs_bmap_last_key(ii->i_bmap, &b);
	if (ret == -ENOENT)
		return;
	else if (ret < 0)
		goto failed;
	if (b < from)
		return;
2015-04-16 12:46:34 -07:00
	b -= min_t(__u64, NILFS_MAX_TRUNCATE_BLOCKS, b - from);
2009-04-06 19:01:32 -07:00
	ret = nilfs_bmap_truncate(ii->i_bmap, b);
	nilfs_relax_pressure_in_lock(ii->vfs_inode.i_sb);
	if (!ret || (ret == -ENOMEM &&
		     nilfs_bmap_truncate(ii->i_bmap, b) == 0))
		goto repeat;
2010-11-19 15:26:20 +09:00
failed:
2020-08-11 18:35:49 -07:00
	nilfs_warn(ii->vfs_inode.i_sb, "error %d truncating bmap (ino=%lu)",
		   ret, ii->vfs_inode.i_ino);
2009-04-06 19:01:32 -07:00
}
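nilfs_truncate_bmap() never lowers the truncation point by more than NILFS_MAX_TRUNCATE_BLOCKS per iteration (16384 blocks, i.e. 16384 × 4096 B = 64 MiB with 4 KiB blocks) and calls nilfs_relax_pressure_in_lock() between steps. The model below replays that loop under two stated assumptions: blocks 0..last are all mapped (real bmaps can be sparse), and each nilfs_bmap_truncate(bmap, b) drops every key >= b. It ignores the -ENOMEM retry and records the truncation points it would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_TRUNCATE_BLOCKS 16384u	/* same value as NILFS_MAX_TRUNCATE_BLOCKS */

/*
 * Replay the truncation loop for a dense mapping with last key `last`,
 * trimming down to `from`. Returns the number of truncate calls and
 * stores up to max_points truncation points in points[].
 */
static unsigned int model_truncate_points(uint64_t last, uint64_t from,
					  uint64_t *points,
					  unsigned int max_points)
{
	unsigned int n = 0;
	bool have_keys = true;
	uint64_t b = last;

	while (have_keys && b >= from) {
		uint64_t span = b - from;

		b -= (span < MODEL_MAX_TRUNCATE_BLOCKS) ?
			span : MODEL_MAX_TRUNCATE_BLOCKS;
		if (n < max_points)
			points[n] = b;
		n++;
		/* Keys >= b are gone; with dense keys the new last key is b - 1. */
		if (b == 0)
			have_keys = false;	/* models -ENOENT */
		else
			b--;
	}
	return n;
}
```

For from = 0 and last = 40000 the truncation points are 23616, 7231 and 0, so the mapping is emptied in three bounded steps.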
void nilfs_truncate(struct inode *inode)
{
	unsigned long blkoff;
	unsigned int blocksize;
	struct nilfs_transaction_info ti;
	struct super_block *sb = inode->i_sb;
	struct nilfs_inode_info *ii = NILFS_I(inode);
	if (!test_bit(NILFS_I_BMAP, &ii->i_state))
		return;
	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
		return;
	blocksize = sb->s_blocksize;
	blkoff = (inode->i_size + blocksize - 1) >> sb->s_blocksize_bits;
2009-04-06 19:01:55 -07:00
	nilfs_transaction_begin(sb, &ti, 0); /* never fails */
2009-04-06 19:01:32 -07:00
	block_truncate_page(inode->i_mapping, inode->i_size, nilfs_get_block);
	nilfs_truncate_bmap(ii, blkoff);
2016-09-14 07:48:04 -07:00
	inode->i_mtime = inode->i_ctime = current_time(inode);
2009-04-06 19:01:32 -07:00
	if (IS_SYNC(inode))
		nilfs_set_transaction_flag(NILFS_TI_SYNC);
2009-11-27 19:41:14 +09:00
	nilfs_mark_inode_dirty(inode);
2010-12-27 00:05:49 +09:00
	nilfs_set_file_dirty(inode, 0);
2009-04-06 19:01:45 -07:00
	nilfs_transaction_commit(sb);
2016-05-23 16:23:48 -07:00
	/*
	 * May construct a logical segment and may fail in sync mode.
	 * But truncate has no return value.
	 */
2009-04-06 19:01:32 -07:00
}
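nilfs_truncate() computes blkoff, the first block lying entirely beyond the new i_size, by rounding i_size up to a block boundary with a shift; nilfs_truncate_bmap() then drops mappings from blkoff onward, while block_truncate_page() zeroes the tail of the partial last block. The rounding as a standalone function, where `blkbits` corresponds to sb->s_blocksize_bits (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* First block index wholly past EOF: ceil(size / blocksize) via a shift. */
static uint64_t model_first_block_past_eof(uint64_t size, unsigned int blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;

	return (size + blocksize - 1) >> blkbits;
}
```

A 60-byte file on a file system with 1 KiB blocks keeps only block 0, and a size that is an exact multiple of the block size keeps exactly that many blocks.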
2010-06-07 11:55:00 -04:00
static void nilfs_clear_inode(struct inode *inode)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	/*
	 * Free resources allocated in nilfs_read_inode(), here.
	 */
	BUG_ON(!list_empty(&ii->i_dirty));
	brelse(ii->i_bh);
	ii->i_bh = NULL;
2016-05-23 16:23:20 -07:00
	if (nilfs_is_metadata_file_inode(inode))
		nilfs_mdt_clear(inode);
2010-08-20 23:40:54 +09:00
2010-06-07 11:55:00 -04:00
	if (test_bit(NILFS_I_BMAP, &ii->i_state))
		nilfs_bmap_clear(ii->i_bmap);
	nilfs_btnode_cache_clear(&ii->i_btnode_cache);
2010-08-25 17:45:44 +09:00
	if (ii->i_root && inode->i_ino == NILFS_ROOT_INO)
		nilfs_put_root(ii->i_root);
2010-06-07 11:55:00 -04:00
}

void nilfs_evict_inode(struct inode *inode)
{
	struct nilfs_transaction_info ti;
	struct super_block *sb = inode->i_sb;
	struct nilfs_inode_info *ii = NILFS_I(inode);
	int ret;

	if (inode->i_nlink || !ii->i_root || unlikely(is_bad_inode(inode))) {
		truncate_inode_pages_final(&inode->i_data);
		clear_inode(inode);
		nilfs_clear_inode(inode);
		return;
	}
	nilfs_transaction_begin(sb, &ti, 0); /* never fails */

	truncate_inode_pages_final(&inode->i_data);

	/* TODO: some of the following operations may fail.  */
	nilfs_truncate_bmap(ii, 0);
	nilfs_mark_inode_dirty(inode);
	clear_inode(inode);

	ret = nilfs_ifile_delete_inode(ii->i_root->ifile, inode->i_ino);
	if (!ret)
		atomic64_dec(&ii->i_root->inodes_count);

	nilfs_clear_inode(inode);

	if (IS_SYNC(inode))
		nilfs_set_transaction_flag(NILFS_TI_SYNC);
	nilfs_transaction_commit(sb);
	/*
	 * May construct a logical segment and may fail in sync mode.
	 * But ->evict_inode() has no return value, so such a failure
	 * cannot be reported here.
	 */
}
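
/*
Illustrative sketch only, not part of NILFS: a user-space model, assuming
C11 atomics, of two decisions made in nilfs_evict_inode() above. The first
is the fast path taken when the inode still has links, has no root, or is
bad; the second is the rule that the root's inode counter drops only when
the ifile deletion succeeded. All sketch_* names are hypothetical, and
atomic_fetch_sub() merely stands in for atomic64_dec().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical summary of the inode state nilfs_evict_inode() inspects.
struct sketch_evict_state {
	unsigned int nlink;	// stands in for inode->i_nlink
	bool has_root;		// stands in for ii->i_root != NULL
	bool bad;		// stands in for is_bad_inode(inode)
};

// True when the on-disk inode must be deleted, i.e. when none of the
// fast-path conditions holds.
static bool sketch_needs_delete(const struct sketch_evict_state *s)
{
	return !(s->nlink || !s->has_root || s->bad);
}

// Apply the result of a (hypothetical) ifile deletion: the counter only
// drops when delete_result is 0, mirroring "if (!ret) atomic64_dec(...)".
static int sketch_finish_delete(atomic_llong *inodes_count, int delete_result)
{
	assert(inodes_count != NULL);
	if (!delete_result)
		atomic_fetch_sub(inodes_count, 1);
	return delete_result;
}
```

Keeping the counter update conditional means a failed deletion leaves the
accounting consistent with what is actually stored in the ifile.
*/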

int nilfs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
		  struct iattr *iattr)
{
	struct nilfs_transaction_info ti;
	struct inode *inode = d_inode(dentry);
	struct super_block *sb = inode->i_sb;
	int err;

	err = setattr_prepare(&init_user_ns, dentry, iattr);
	if (err)
		return err;

	err = nilfs_transaction_begin(sb, &ti, 0);
	if (unlikely(err))
		return err;

	if ((iattr->ia_valid & ATTR_SIZE) &&
	    iattr->ia_size != i_size_read(inode)) {
		inode_dio_wait(inode);
		truncate_setsize(inode, iattr->ia_size);
		nilfs_truncate(inode);
	}

	setattr_copy(&init_user_ns, inode, iattr);
	mark_inode_dirty(inode);

	if (iattr->ia_valid & ATTR_MODE) {
		err = nilfs_acl_chmod(inode);
		if (unlikely(err))
			goto out_err;
	}

	return nilfs_transaction_commit(sb);

out_err:
	nilfs_transaction_abort(sb);
	return err;
}
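
/*
Illustrative sketch only, not part of NILFS: a minimal user-space model of
the begin/abort/commit shape used by nilfs_setattr() above. Once the
transaction has begun, every later failure leaves through a single out_err
label that aborts it, and the success path commits. The sketch_* functions
are hypothetical stand-ins that only count calls; they are not
nilfs_transaction_begin(), nilfs_transaction_commit() or
nilfs_transaction_abort().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

// Hypothetical call counters standing in for the transaction API.
struct sketch_txn_stats {
	int begun;
	int committed;
	int aborted;
};

static int sketch_begin(struct sketch_txn_stats *s) { s->begun++; return 0; }
static int sketch_commit(struct sketch_txn_stats *s) { s->committed++; return 0; }
static void sketch_abort(struct sketch_txn_stats *s) { s->aborted++; }

// Hypothetical mode-change hook that fails on request, like a failing
// nilfs_acl_chmod().
static int sketch_chmod_hook(bool fail)
{
	return fail ? -EIO : 0;
}

static int sketch_setattr(struct sketch_txn_stats *s, bool mode_change,
			  bool chmod_fails)
{
	int err = sketch_begin(s);

	if (err)
		return err;	// nothing to undo yet
	// ... size and attribute changes would be applied here ...
	if (mode_change) {
		err = sketch_chmod_hook(chmod_fails);
		if (err)
			goto out_err;
	}
	return sketch_commit(s);

out_err:
	sketch_abort(s);	// every post-begin failure funnels here
	return err;
}
```

Funnelling all failures through one label keeps the abort in a single
place, so no error path can leave the transaction open.
*/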

int nilfs_permission(struct user_namespace *mnt_userns, struct inode *inode,
		     int mask)
{
	struct nilfs_root *root = NILFS_I(inode)->i_root;

	if ((mask & MAY_WRITE) && root &&
	    root->cno != NILFS_CPTREE_CURRENT_CNO)
		return -EROFS; /* snapshot is not writable */

	return generic_permission(&init_user_ns, inode, mask);
}
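
/*
Illustrative sketch only, not part of NILFS: a user-space model of the
snapshot check in nilfs_permission() above. A write request against a root
whose checkpoint number is not that of the live tree is refused with
-EROFS; everything else is passed on. SK_MAY_WRITE, SK_CURRENT_CNO and
struct sketch_root are hypothetical stand-ins for MAY_WRITE,
NILFS_CPTREE_CURRENT_CNO and struct nilfs_root, and their values here are
assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAY_WRITE	0x2	// hypothetical bit playing the role of MAY_WRITE
#define SK_CURRENT_CNO	0	// hypothetical checkpoint number of the live tree

// Hypothetical stand-in for struct nilfs_root: only the checkpoint number.
struct sketch_root {
	uint64_t cno;
};

// Returns -EROFS for write access to a snapshot, 0 otherwise. Where this
// returns 0, the kernel code continues with generic_permission().
static int sketch_permission(const struct sketch_root *root, int mask)
{
	if ((mask & SK_MAY_WRITE) && root && root->cno != SK_CURRENT_CNO)
		return -EROFS;
	return 0;
}
```

Inodes without a root and read-only requests fall through to the generic
checks unchanged.
*/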

int nilfs_load_inode_block(struct inode *inode, struct buffer_head **pbh)
{
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
	struct nilfs_inode_info *ii = NILFS_I(inode);
	int err;

	spin_lock(&nilfs->ns_inode_lock);
	if (ii->i_bh == NULL) {
		spin_unlock(&nilfs->ns_inode_lock);
		err = nilfs_ifile_get_inode_block(ii->i_root->ifile,
						  inode->i_ino, pbh);
		if (unlikely(err))
			return err;
		spin_lock(&nilfs->ns_inode_lock);
		if (ii->i_bh == NULL)
			ii->i_bh = *pbh;
		else {
			brelse(*pbh);
			*pbh = ii->i_bh;
		}
	} else
		*pbh = ii->i_bh;

	get_bh(*pbh);
	spin_unlock(&nilfs->ns_inode_lock);
	return 0;
}
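
/*
Illustrative sketch only, not part of NILFS: a user-space model, assuming
C11 atomics, of the locking pattern in nilfs_load_inode_block() above. The
cached buffer is checked under the lock, the lock is dropped for the slow
read (which may sleep in the kernel), and the cache is re-checked after
relocking; if another thread installed a buffer in the meantime, the fresh
copy is released. A C11 atomic_flag spinlock stands in for ns_inode_lock,
and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for a buffer_head: just a reference count.
struct sketch_buf {
	atomic_int refcount;
};

// Hypothetical per-inode slot: "cached" plays the role of ii->i_bh.
struct sketch_inode {
	atomic_flag lock;		// plays the role of ns_inode_lock
	struct sketch_buf *cached;
	atomic_int loads;		// how often the slow read path ran
};

static void sketch_inode_init(struct sketch_inode *si)
{
	atomic_flag_clear(&si->lock);
	si->cached = NULL;
	atomic_init(&si->loads, 0);
}

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	// spin until the holder clears the flag
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

// Drop one reference, like brelse(); frees the buffer on the last one.
static void sketch_put(struct sketch_buf *b)
{
	if (atomic_fetch_sub(&b->refcount, 1) == 1)
		free(b);
}

// Slow path, called WITHOUT the lock held. Returns a new buffer holding
// one reference owned by the caller.
static struct sketch_buf *sketch_read_block(struct sketch_inode *si)
{
	struct sketch_buf *b = malloc(sizeof(*b));

	if (b) {
		atomic_init(&b->refcount, 1);
		atomic_fetch_add(&si->loads, 1);
	}
	return b;
}

static int sketch_load_block(struct sketch_inode *si, struct sketch_buf **pbh)
{
	sketch_lock(&si->lock);
	if (si->cached == NULL) {
		sketch_unlock(&si->lock);
		*pbh = sketch_read_block(si);
		if (*pbh == NULL)
			return -ENOMEM;
		sketch_lock(&si->lock);
		if (si->cached == NULL) {
			si->cached = *pbh;	// the cache now owns this reference
		} else {
			sketch_put(*pbh);	// lost the race: drop our copy
			*pbh = si->cached;
		}
	} else {
		*pbh = si->cached;
	}
	atomic_fetch_add(&(*pbh)->refcount, 1);	// caller's reference, like get_bh()
	sketch_unlock(&si->lock);
	return 0;
}
```

The cache keeps one reference of its own and every caller receives an
extra one, so callers can release their copy without invalidating the
cached pointer.
*/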

int nilfs_inode_dirty(struct inode *inode)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
	int ret = 0;

	if (!list_empty(&ii->i_dirty)) {
		spin_lock(&nilfs->ns_inode_lock);
		ret = test_bit(NILFS_I_DIRTY, &ii->i_state) ||
			test_bit(NILFS_I_BUSY, &ii->i_state);
		spin_unlock(&nilfs->ns_inode_lock);
	}
	return ret;
}

int nilfs_set_file_dirty(struct inode *inode, unsigned int nr_dirty)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;

	atomic_add(nr_dirty, &nilfs->ns_ndirtyblks);

	if (test_and_set_bit(NILFS_I_DIRTY, &ii->i_state))
		return 0;

	spin_lock(&nilfs->ns_inode_lock);
	if (!test_bit(NILFS_I_QUEUED, &ii->i_state) &&
	    !test_bit(NILFS_I_BUSY, &ii->i_state)) {
		/*
		 * Because this routine may race with nilfs_dispose_list(),
		 * we have to check NILFS_I_QUEUED here, too.
		 */
		if (list_empty(&ii->i_dirty) && igrab(inode) == NULL) {
			/*
			 * This will happen when somebody is freeing
			 * this inode.
			 */
			nilfs_warn(inode->i_sb,
				   "cannot set file dirty (ino=%lu): the file is being freed",
				   inode->i_ino);
			spin_unlock(&nilfs->ns_inode_lock);
			return -EINVAL; /*
					 * NILFS_I_DIRTY may remain for
					 * freeing inode.
					 */
		}
		list_move_tail(&ii->i_dirty, &nilfs->ns_dirty_files);
		set_bit(NILFS_I_QUEUED, &ii->i_state);
	}
	spin_unlock(&nilfs->ns_inode_lock);
	return 0;
}
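
/*
Illustrative sketch only, not part of NILFS: a user-space model, assuming
C11 atomics, of the "queue once" logic in nilfs_set_file_dirty() above.
Dirty blocks are always accounted; only the caller that flips the DIRTY bit
from clear to set takes the lock and queues the file, and a file already
queued or busy is not queued again. The SK_* bits, the sketch_* types and
the integer counter standing in for the ns_dirty_files list are
hypothetical, and the igrab() failure path is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

// Hypothetical state bits modeled on NILFS_I_DIRTY, NILFS_I_QUEUED, NILFS_I_BUSY.
#define SK_DIRTY	(1u << 0)
#define SK_QUEUED	(1u << 1)
#define SK_BUSY		(1u << 2)

struct sketch_fs {
	atomic_flag lock;	// plays the role of ns_inode_lock
	atomic_int ndirtyblks;	// plays the role of ns_ndirtyblks
	int nqueued;		// stand-in for the ns_dirty_files list; lock-protected
};

struct sketch_file {
	atomic_uint state;
};

static void sketch_fs_init(struct sketch_fs *fs)
{
	atomic_flag_clear(&fs->lock);
	atomic_init(&fs->ndirtyblks, 0);
	fs->nqueued = 0;
}

static int sketch_set_file_dirty(struct sketch_fs *fs, struct sketch_file *f,
				 unsigned int nr_dirty)
{
	unsigned int s;

	assert(nr_dirty <= INT_MAX);
	// Dirty blocks are always accounted, even for already-dirty files.
	atomic_fetch_add(&fs->ndirtyblks, (int)nr_dirty);

	// Like test_and_set_bit(): only the first setter continues.
	if (atomic_fetch_or(&f->state, SK_DIRTY) & SK_DIRTY)
		return 0;

	while (atomic_flag_test_and_set(&fs->lock))
		;	// spin
	s = atomic_load(&f->state);
	if (!(s & SK_QUEUED) && !(s & SK_BUSY)) {
		fs->nqueued++;
		atomic_fetch_or(&f->state, SK_QUEUED);
	}
	atomic_flag_clear(&fs->lock);
	return 0;
}
```

The unlocked atomic bit test keeps the common case (the file is already
dirty) off the lock entirely.
*/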

int __nilfs_mark_inode_dirty(struct inode *inode, int flags)
{
	struct buffer_head *ibh;
	int err;

	err = nilfs_load_inode_block(inode, &ibh);
	if (unlikely(err)) {
		nilfs_warn(inode->i_sb,
			   "cannot mark inode dirty (ino=%lu): error %d loading inode block",
			   inode->i_ino, err);
		return err;
	}
	nilfs_update_inode(inode, ibh, flags);
	mark_buffer_dirty(ibh);
	nilfs_mdt_mark_dirty(NILFS_I(inode)->i_root->ifile);
	brelse(ibh);
	return 0;
}

/**
 * nilfs_dirty_inode - reflect changes on given inode to an inode block.
 * @inode: inode of the file to be registered.
 * @flags: I_DIRTY_* flags passed in from the VFS.
 *
 * nilfs_dirty_inode() loads the inode block containing the specified
 * @inode and copies data from the in-memory inode to the corresponding
 * on-disk nilfs_inode entry in that block. This operation is excluded
 * from the segment construction. This function can be called both as a
 * single operation and as a part of indivisible file operations.
 */
void nilfs_dirty_inode(struct inode *inode, int flags)
{
	struct nilfs_transaction_info ti;
	struct nilfs_mdt_info *mdi = NILFS_MDT(inode);

	if (is_bad_inode(inode)) {
		nilfs_warn(inode->i_sb,
			   "tried to mark bad_inode dirty. ignored.");
		dump_stack();
		return;
	}
	if (mdi) {
		nilfs_mdt_mark_dirty(inode);
		return;
	}
	nilfs_transaction_begin(inode->i_sb, &ti, 0);
	__nilfs_mark_inode_dirty(inode, flags);
	nilfs_transaction_commit(inode->i_sb); /* never fails */
}

int nilfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
		 __u64 start, __u64 len)
{
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
	__u64 logical = 0, phys = 0, size = 0;
	__u32 flags = 0;
	loff_t isize;
	sector_t blkoff, end_blkoff;
	sector_t delalloc_blkoff;
	unsigned long delalloc_blklen;
	unsigned int blkbits = inode->i_blkbits;
	int ret, n;

	ret = fiemap_prep(inode, fieinfo, start, &len, 0);
	if (ret)
		return ret;

	inode_lock(inode);

	isize = i_size_read(inode);

	blkoff = start >> blkbits;
	end_blkoff = (start + len - 1) >> blkbits;

	delalloc_blklen = nilfs_find_uncommitted_extent(inode, blkoff,
							&delalloc_blkoff);

	do {
		__u64 blkphy;
		unsigned int maxblocks;

		if (delalloc_blklen && blkoff == delalloc_blkoff) {
			if (size) {
				/* End of the current extent */
				ret = fiemap_fill_next_extent(
					fieinfo, logical, phys, size, flags);
				if (ret)
					break;
			}
			if (blkoff > end_blkoff)
				break;

			flags = FIEMAP_EXTENT_MERGED | FIEMAP_EXTENT_DELALLOC;
			logical = blkoff << blkbits;
			phys = 0;
			size = delalloc_blklen << blkbits;

			blkoff = delalloc_blkoff + delalloc_blklen;
			delalloc_blklen = nilfs_find_uncommitted_extent(
				inode, blkoff, &delalloc_blkoff);
			continue;
		}

		/*
		 * Limit the number of blocks that we look up so as
		 * not to get into the next delayed allocation extent.
		 */
		maxblocks = INT_MAX;
		if (delalloc_blklen)
			maxblocks = min_t(sector_t, delalloc_blkoff - blkoff,
					  maxblocks);
		blkphy = 0;

		down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
		n = nilfs_bmap_lookup_contig(
			NILFS_I(inode)->i_bmap, blkoff, &blkphy, maxblocks);
		up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);

		if (n < 0) {
			int past_eof;

			if (unlikely(n != -ENOENT))
				break; /* error */

			/* HOLE */
			blkoff++;
			past_eof = ((blkoff << blkbits) >= isize);

			if (size) {
				/* End of the current extent */

				if (past_eof)
					flags |= FIEMAP_EXTENT_LAST;

				ret = fiemap_fill_next_extent(
					fieinfo, logical, phys, size, flags);
				if (ret)
					break;
				size = 0;
			}
			if (blkoff > end_blkoff || past_eof)
				break;
		} else {
			if (size) {
				if (phys && blkphy << blkbits == phys + size) {
					/* The current extent goes on */
					size += n << blkbits;
				} else {
					/* Terminate the current extent */
					ret = fiemap_fill_next_extent(
						fieinfo, logical, phys, size,
						flags);
					if (ret || blkoff > end_blkoff)
						break;

					/* Start another extent */
					flags = FIEMAP_EXTENT_MERGED;
					logical = blkoff << blkbits;
					phys = blkphy << blkbits;
					size = n << blkbits;
				}
			} else {
				/* Start a new extent */
				flags = FIEMAP_EXTENT_MERGED;
				logical = blkoff << blkbits;
				phys = blkphy << blkbits;
				size = n << blkbits;
			}
			blkoff += n;
		}
		cond_resched();
	} while (true);

	/* If ret is 1 then we just hit the end of the extent array */
	if (ret == 1)
		ret = 0;

	inode_unlock(inode);
	return ret;
}
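
/*
Illustrative sketch only, not part of NILFS: a simplified user-space model
of the extent merging done by nilfs_fiemap() above. A plain array stands in
for nilfs_bmap_lookup_contig(), with map[i] holding the physical block of
logical block i or 0 for a hole. Physically contiguous blocks are merged
into one extent, and a hole or a discontiguity terminates it. Delayed
allocation, FIEMAP flags and EOF handling are deliberately left out, and
the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical extent record in bytes, like the values nilfs_fiemap()
// passes to fiemap_fill_next_extent().
struct sketch_extent {
	uint64_t logical;
	uint64_t phys;
	uint64_t size;
};

// Append one extent. Returns 0 on success, or 1 when the array is full
// (loosely echoing fiemap_fill_next_extent() returning 1).
static int sketch_emit(struct sketch_extent *ext, size_t max_ext, size_t *count,
		       uint64_t logical, uint64_t phys, uint64_t size)
{
	if (*count == max_ext)
		return 1;
	ext[*count].logical = logical;
	ext[*count].phys = phys;
	ext[*count].size = size;
	(*count)++;
	return 0;
}

// Returns the number of extents written to ext.
static size_t sketch_build_extents(const uint64_t *map, size_t nblocks,
				   unsigned int blkbits,
				   struct sketch_extent *ext, size_t max_ext)
{
	uint64_t logical = 0, phys = 0, size = 0;
	size_t count = 0;

	assert(blkbits < 32);
	for (size_t blk = 0; blk < nblocks; blk++) {
		uint64_t blkphy = map[blk];

		if (blkphy == 0) {
			// HOLE: end the current extent, if any.
			if (size && sketch_emit(ext, max_ext, &count, logical, phys, size))
				return count;
			size = 0;
		} else if (size && (blkphy << blkbits) == phys + size) {
			// The current extent goes on.
			size += (uint64_t)1 << blkbits;
		} else {
			// Terminate the current extent (if any) and start another.
			if (size && sketch_emit(ext, max_ext, &count, logical, phys, size))
				return count;
			logical = (uint64_t)blk << blkbits;
			phys = blkphy << blkbits;
			size = (uint64_t)1 << blkbits;
		}
	}
	if (size)
		sketch_emit(ext, max_ext, &count, logical, phys, size);
	return count;
}
```

With 4 KiB blocks (blkbits = 12) and the map { 10, 11, 12, 0, 20, 21, 30 },
this yields three extents: blocks 0-2 merged, blocks 4-5 merged, and
block 6 on its own because physical block 30 does not follow block 21.
*/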