// SPDX-License-Identifier: GPL-2.0+
/*
 * NILFS inode operations.
 *
 * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
 *
 * Written by Ryusuke Konishi.
 *
 */

#include <linux/buffer_head.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#include <linux/gfp.h>
#include <linux/mpage.h>
nilfs2: fix data loss with mmap()
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
error checking removed):
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC);
}
----------------[END mmaptest]--------------------
The output of the script looks something like this:
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear that the changes done using mmap() do not survive a
remount. This can be reproduced 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
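The simplified mmaptest fragment in the commit message above can be fleshed out as a small userspace sketch. The helper name `mmap_write_and_verify`, the `SKETCH_PAGE` constant and the 0xab fill pattern are assumptions made for illustration; they are not taken from the original tool. The sketch writes through a shared mapping, calls msync() after each page, and re-reads the data through the file descriptor. It only checks coherence while the file system stays mounted; reproducing the reported data loss would still require the umount/mount cycle from the script above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed page size, matching the 4096-byte stride in the commit message. */
#define SKETCH_PAGE 4096

/*
 * Hypothetical helper: create `path` with `npages` zero-filled pages, dirty
 * the first `write_count` pages through a MAP_SHARED mapping with an
 * msync(MS_SYNC) after each write, then read the pages back with pread().
 * Returns 0 if every written page is visible through the descriptor,
 * -1 on any error or mismatch. The file is removed before returning.
 */
int mmap_write_and_verify(const char *path, size_t npages, size_t write_count)
{
	char buf[SKETCH_PAGE], check[SKETCH_PAGE];
	size_t len = npages * SKETCH_PAGE;
	char *data;
	size_t i;
	int fd, ret = -1;

	if (write_count > npages)
		return -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;

	data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		goto out_close;

	/* Same loop shape as the commit message: write one page, then sync. */
	memset(buf, 0xab, sizeof(buf));
	for (i = 0; i < write_count; ++i) {
		memcpy(data + i * SKETCH_PAGE, buf, sizeof(buf));
		if (msync(data, len, MS_SYNC) != 0)
			goto out_unmap;
	}

	/* Verify through the file descriptor rather than the mapping. */
	for (i = 0; i < write_count; ++i) {
		ssize_t n = pread(fd, check, sizeof(check),
				  (off_t)(i * SKETCH_PAGE));

		if (n != (ssize_t)sizeof(check))
			goto out_unmap;
		if (memcmp(check, buf, sizeof(buf)) != 0)
			goto out_unmap;
	}
	ret = 0;

out_unmap:
	munmap(data, len);
out_close:
	close(fd);
	unlink(path);
	return ret;
}
```

A caller could run `mmap_write_and_verify("/mnt/testfile", 16, 4)` on a mounted test volume and treat a non-zero return as a failure; the path and counts are placeholders.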
#include <linux/pagemap.h>
#include <linux/writeback.h>
#include <linux/uio.h>
#include <linux/fiemap.h>
#include "nilfs.h"
#include "btnode.h"
#include "segment.h"
#include "page.h"
#include "mdt.h"
#include "cpfile.h"
#include "ifile.h"

/**
 * struct nilfs_iget_args - arguments used during comparison between inodes
 * @ino: inode number
 * @cno: checkpoint number
 * @root: pointer on NILFS root object (mounted checkpoint)
 * @for_gc: inode for GC flag
 * @for_btnc: inode for B-tree node cache flag
 * @for_shadow: inode for shadowed page cache flag
 */
struct nilfs_iget_args {
	u64 ino;
	__u64 cno;
	struct nilfs_root *root;
	bool for_gc;
	bool for_btnc;
	bool for_shadow;
};

static int nilfs_iget_test(struct inode *inode, void *opaque);

void nilfs_inode_add_blocks(struct inode *inode, int n)
{
	struct nilfs_root *root = NILFS_I(inode)->i_root;

	inode_add_bytes(inode, i_blocksize(inode) * n);
	if (root)
		atomic64_add(n, &root->blocks_count);
}

void nilfs_inode_sub_blocks(struct inode *inode, int n)
{
	struct nilfs_root *root = NILFS_I(inode)->i_root;

	inode_sub_bytes(inode, i_blocksize(inode) * n);
	if (root)
		atomic64_sub(n, &root->blocks_count);
}

/**
 * nilfs_get_block() - get a file block on the filesystem (callback function)
 * @inode: inode struct of the target file
 * @blkoff: file block number
 * @bh_result: buffer head to be mapped on
 * @create: indicate whether allocating the block or not when it has not
 *      been allocated yet.
 *
 * This function does not issue actual read request of the specified data
 * block. It is done by VFS.
 */
int nilfs_get_block(struct inode *inode, sector_t blkoff,
		    struct buffer_head *bh_result, int create)
{
	struct nilfs_inode_info *ii = NILFS_I(inode);
	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
	__u64 blknum = 0;
	int err = 0, ret;
	unsigned int maxblocks = bh_result->b_size >> inode->i_blkbits;

	down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
	ret = nilfs_bmap_lookup_contig(ii->i_bmap, blkoff, &blknum, maxblocks);
	up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
	if (ret >= 0) {	/* found */
		map_bh(bh_result, inode->i_sb, blknum);
		if (ret > 0)
			bh_result->b_size = (ret << inode->i_blkbits);
		goto out;
	}
	/* data block was not found */
	if (ret == -ENOENT && create) {
		struct nilfs_transaction_info ti;

		bh_result->b_blocknr = 0;
		err = nilfs_transaction_begin(inode->i_sb, &ti, 1);
		if (unlikely(err))
			goto out;
		err = nilfs_bmap_insert(ii->i_bmap, blkoff,
					(unsigned long)bh_result);
		if (unlikely(err != 0)) {
			if (err == -EEXIST) {
				/*
				 * The get_block() function could be called
				 * from multiple callers for an inode.
				 * However, the page having this block must
				 * be locked in this case.
				 */
				nilfs_warn(inode->i_sb,
					   "%s (ino=%lu): a race condition while inserting a data block at offset=%llu",
					   __func__, inode->i_ino,
					   (unsigned long long)blkoff);
				err = 0;
			}
			nilfs_transaction_abort(inode->i_sb);
			goto out;
		}
		nilfs_mark_inode_dirty_sync(inode);
		nilfs_transaction_commit(inode->i_sb); /* never fails */
		/* Error handling should be detailed */
		set_buffer_new(bh_result);
		set_buffer_delay(bh_result);
		map_bh(bh_result, inode->i_sb, 0);
		/* Disk block number must be changed to proper value */
	} else if (ret == -ENOENT) {
		/*
		 * not found is not error (e.g. hole); must return without
		 * the mapped state flag.
		 */
		;
	} else {
		err = ret;
	}

 out:
	return err;
}
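The size arithmetic in nilfs_get_block() above can be tried out in userspace with a minimal sketch. The helper names `sketch_maxblocks` and `sketch_mapped_size` are hypothetical; they only mirror the two shifts in the function (`b_size >> i_blkbits` on entry and `ret << i_blkbits` when the lookup reports a positive number of contiguous blocks) and make no claim about how nilfs_bmap_lookup_contig() behaves internally.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole blocks that fit in the requested mapping size. */
size_t sketch_maxblocks(size_t b_size, unsigned int blkbits)
{
	return b_size >> blkbits;
}

/*
 * Mapping size after a successful lookup: when `ret` (assumed here to be
 * the count of contiguous blocks found) is positive, the size shrinks to
 * cover exactly those blocks; when it is 0 the size is left unchanged.
 */
size_t sketch_mapped_size(size_t b_size, int ret, unsigned int blkbits)
{
	assert(ret >= 0);
	return ret > 0 ? (size_t)ret << blkbits : b_size;
}
```

With 1 KiB blocks (`blkbits` of 10) and a 4096-byte request, `sketch_maxblocks()` yields 4, and a lookup that finds 2 contiguous blocks trims the mapping to 2048 bytes.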

/**
 * nilfs_read_folio() - implement read_folio() method of nilfs_aops {}
 * address_space_operations.
 * @file: file struct of the file to be read
 * @folio: the folio to be read
 */
static int nilfs_read_folio(struct file *file, struct folio *folio)
{
	return mpage_read_folio(folio, nilfs_get_block);
}
fs: convert mpage_readpages to mpage_readahead
Implement the new readahead aop and convert all callers (block_dev,
exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
reiserfs & udf).
The callers are all trivial except for GFS2 & OCFS2.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
static void nilfs_readahead(struct readahead_control *rac)
{
	mpage_readahead(rac, nilfs_get_block);
}

static int nilfs_writepages(struct address_space *mapping,
			    struct writeback_control *wbc)
{
	struct inode *inode = mapping->host;
	int err = 0;

	if (sb_rdonly(inode->i_sb)) {
		nilfs_clear_dirty_pages(mapping, false);
		return -EROFS;
	}

	if (wbc->sync_mode == WB_SYNC_ALL)
		err = nilfs_construct_dsync_segment(inode->i_sb, inode,
						    wbc->range_start,
						    wbc->range_end);
	return err;
}

static int nilfs_writepage(struct page *page, struct writeback_control *wbc)
{
	struct inode *inode = page->mapping->host;
	int err;

	if (sb_rdonly(inode->i_sb)) {
		/*
		 * It means that filesystem was remounted in read-only
		 * mode because of error or metadata corruption. But we
		 * have dirty pages that try to be flushed in background.
		 * So, here we simply discard this dirty page.
		 */
		nilfs_clear_dirty_page(page, false);
		unlock_page(page);
		return -EROFS;
	}

	redirty_page_for_writepage(wbc, page);
	unlock_page(page);

	if (wbc->sync_mode == WB_SYNC_ALL) {
		err = nilfs_construct_segment(inode->i_sb);
		if (unlikely(err))
			return err;
	} else if (wbc->for_reclaim)
		nilfs_flush_segment(inode->i_sb, inode->i_ino);

	return 0;
}

static bool nilfs_dirty_folio(struct address_space *mapping,
		struct folio *folio)
{
	struct inode *inode = mapping->host;
	struct buffer_head *head;
	unsigned int nr_dirty = 0;
	bool ret = filemap_dirty_folio(mapping, folio);

	/*
	 * The page may not be locked, eg if called from try_to_unmap_one()
	 */
	spin_lock(&mapping->private_lock);
	head = folio_buffers(folio);
	if (head) {
		struct buffer_head *bh = head;

nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size
smaller than 4 KB) can be remounted in RO mode after encountering the
"broken bmap" issue.
The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
SYMPTOMS:
(1) System log contains error messages like the following:
[63102.496756] nilfs_direct_assign: invalid pointer: 0
[63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
[63102.496798]
[63102.524403] Remounting filesystem read-only
(2) The NILFS2 file system is remounted in RO mode.
REPRODUCING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
----------------[BEGIN SCRIPT]--------------------
VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------
REPRODUCIBILITY: 100%
INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:
open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)
The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.
The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.
When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().
The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.
As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.
FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
		do {
			/* Do not mark hole blocks dirty */
			if (buffer_dirty(bh) || !buffer_mapped(bh))
				continue;

			set_buffer_dirty(bh);
			nr_dirty++;
		} while (bh = bh->b_this_page, bh != head);
	} else if (ret) {
		nr_dirty = 1 << (folio_shift(folio) - inode->i_blkbits);
	}
	spin_unlock(&mapping->private_lock);

	if (nr_dirty)
		nilfs_set_file_dirty(inode, nr_dirty);
	return ret;
}
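The buffer walk in nilfs_dirty_folio() above can be modelled outside the kernel to see which buffers end up counted. This is a sketch under stated assumptions: `struct sketch_bh` is a hypothetical stand-in for struct buffer_head that keeps only a dirty flag, a mapped flag and the circular `b_this_page` link, and the two helper names are invented for illustration. The first helper follows the rule from the EOF-boundary fix (skip buffers that are already dirty or unmapped); the second reproduces the block count used when a folio has no buffers attached, as in the mmap() data loss fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct sketch_bh {
	bool dirty;
	bool mapped;
	struct sketch_bh *b_this_page; /* circular list of buffers in one page */
};

/*
 * Mark every mapped, still-clean buffer dirty and return how many were
 * newly dirtied. Unmapped buffers (holes, e.g. beyond EOF) are skipped.
 * `continue` in a do-while jumps to the loop condition, which advances bh.
 */
unsigned int sketch_mark_buffers_dirty(struct sketch_bh *head)
{
	struct sketch_bh *bh = head;
	unsigned int nr_dirty = 0;

	do {
		if (bh->dirty || !bh->mapped)
			continue;

		bh->dirty = true;
		nr_dirty++;
	} while (bh = bh->b_this_page, bh != head);

	return nr_dirty;
}

/* Blocks per folio when no buffers are attached: 1 << (shift - blkbits). */
unsigned int sketch_blocks_per_folio(unsigned int folio_shift,
				     unsigned int blkbits)
{
	assert(folio_shift >= blkbits);
	return 1U << (folio_shift - blkbits);
}
```

For a 4 KiB folio with 1 KiB blocks where only the first block is mapped and clean, the walk dirties one buffer and leaves the hole buffers untouched; with no buffers attached the count would be 4.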

void nilfs_write_failed(struct address_space *mapping, loff_t to)
{
	struct inode *inode = mapping->host;

	if (to > inode->i_size) {
		truncate_pagecache(inode, inode->i_size);
		nilfs_truncate(inode);
	}
}

static int nilfs_write_begin(struct file *file, struct address_space *mapping,
			     loff_t pos, unsigned len,
			     struct page **pagep, void **fsdata)
{
	struct inode *inode = mapping->host;
	int err = nilfs_transaction_begin(inode->i_sb, NULL, 1);

	if (unlikely(err))
		return err;

	err = block_write_begin(mapping, pos, len, pagep, nilfs_get_block);
	if (unlikely(err)) {
		nilfs_write_failed(mapping, pos + len);
		nilfs_transaction_abort(inode->i_sb);
	}
	return err;
}

static int nilfs_write_end(struct file *file, struct address_space *mapping,
			   loff_t pos, unsigned len, unsigned copied,
			   struct page *page, void *fsdata)
{
	struct inode *inode = mapping->host;
	unsigned int start = pos & (PAGE_SIZE - 1);
	unsigned int nr_dirty;
	int err;

	nr_dirty = nilfs_page_count_clean_buffers(page, start,
						  start + copied);
	copied = generic_write_end(file, mapping, pos, len, copied, page,
				   fsdata);
	nilfs_set_file_dirty(inode, nr_dirty);
	err = nilfs_transaction_commit(inode->i_sb);
	return err ? : copied;
}

static ssize_t
nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
	struct inode *inode = file_inode(iocb->ki_filp);

	if (iov_iter_rw(iter) == WRITE)
		return 0;

	/* Needs synchronization with the cleaner */
	return blockdev_direct_IO(iocb, inode, iter, nilfs_get_block);
}

const struct address_space_operations nilfs_aops = {
	.writepage		= nilfs_writepage,
	.read_folio		= nilfs_read_folio,
	.writepages		= nilfs_writepages,
	.dirty_folio		= nilfs_dirty_folio,
. readahead = nilfs_readahead ,
2009-04-06 19:01:32 -07:00
. write_begin = nilfs_write_begin ,
. write_end = nilfs_write_end ,
2022-02-09 20:21:34 +00:00
. invalidate_folio = block_invalidate_folio ,
2009-04-06 19:01:32 -07:00
. direct_IO = nilfs_direct_IO ,
2009-05-13 11:19:40 +09:00
. is_partially_uptodate = block_is_partially_uptodate ,
2009-04-06 19:01:32 -07:00
} ;
2014-12-10 15:54:34 -08:00
static int nilfs_insert_inode_locked ( struct inode * inode ,
struct nilfs_root * root ,
unsigned long ino )
{
struct nilfs_iget_args args = {
2022-04-01 11:28:18 -07:00
. ino = ino , . root = root , . cno = 0 , . for_gc = false ,
2022-04-01 11:28:21 -07:00
. for_btnc = false , . for_shadow = false
2014-12-10 15:54:34 -08:00
} ;
return insert_inode_locked4 ( inode , ino , nilfs_iget_test , & args ) ;
}
2011-07-26 03:07:14 -04:00
struct inode * nilfs_new_inode ( struct inode * dir , umode_t mode )
2009-04-06 19:01:32 -07:00
{
struct super_block * sb = dir - > i_sb ;
2011-03-09 11:05:08 +09:00
struct the_nilfs * nilfs = sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
struct inode * inode ;
struct nilfs_inode_info * ii ;
2010-08-25 17:45:44 +09:00
struct nilfs_root * root ;
2022-10-04 00:05:19 +09:00
struct buffer_head * bh ;
2009-04-06 19:01:32 -07:00
int err = - ENOMEM ;
ino_t ino ;
inode = new_inode ( sb ) ;
if ( unlikely ( ! inode ) )
goto failed ;
mapping_set_gfp_mask ( inode - > i_mapping ,
2015-11-06 16:28:49 -08:00
mapping_gfp_constraint ( inode - > i_mapping , ~ __GFP_FS ) ) ;
2009-04-06 19:01:32 -07:00
2010-08-25 17:45:44 +09:00
root = NILFS_I ( dir ) - > i_root ;
2009-04-06 19:01:32 -07:00
ii = NILFS_I ( inode ) ;
2016-08-02 14:05:28 -07:00
ii - > i_state = BIT ( NILFS_I_NEW ) ;
2010-08-25 17:45:44 +09:00
ii - > i_root = root ;
2009-04-06 19:01:32 -07:00
2022-10-04 00:05:19 +09:00
err = nilfs_ifile_create_inode ( root - > ifile , & ino , & bh ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( err ) )
goto failed_ifile_create_inode ;
/* reference count of i_bh inherits from nilfs_mdt_read_block() */
2022-10-04 00:05:19 +09:00
if ( unlikely ( ino < NILFS_USER_INO ) ) {
nilfs_warn ( sb ,
" inode bitmap is inconsistent for reserved inodes " ) ;
do {
brelse ( bh ) ;
err = nilfs_ifile_create_inode ( root - > ifile , & ino , & bh ) ;
if ( unlikely ( err ) )
goto failed_ifile_create_inode ;
} while ( ino < NILFS_USER_INO ) ;
nilfs_info ( sb , " repaired inode bitmap for reserved inodes " ) ;
}
ii - > i_bh = bh ;
2013-07-03 15:08:06 -07:00
atomic64_inc ( & root - > inodes_count ) ;
2023-01-13 12:49:25 +01:00
inode_init_owner ( & nop_mnt_idmap , inode , dir , mode ) ;
2009-04-06 19:01:32 -07:00
inode - > i_ino = ino ;
2023-07-05 15:01:24 -04:00
inode - > i_mtime = inode - > i_atime = inode_set_ctime_current ( inode ) ;
2009-04-06 19:01:32 -07:00
if ( S_ISREG ( mode ) | | S_ISDIR ( mode ) | | S_ISLNK ( mode ) ) {
err = nilfs_bmap_read ( ii - > i_bmap , NULL ) ;
if ( err < 0 )
2014-12-10 15:54:34 -08:00
goto failed_after_creation ;
2009-04-06 19:01:32 -07:00
set_bit ( NILFS_I_BMAP , & ii - > i_state ) ;
/* No lock is needed; iget() ensures it. */
}
2011-01-20 02:09:53 +09:00
ii - > i_flags = nilfs_mask_flags (
mode , NILFS_I ( dir ) - > i_flags & NILFS_FL_INHERITED ) ;
2009-04-06 19:01:32 -07:00
/* ii->i_file_acl = 0; */
/* ii->i_dir_acl = 0; */
ii - > i_dir_start_lookup = 0 ;
nilfs_set_inode_flags ( inode ) ;
2011-03-09 11:05:08 +09:00
spin_lock ( & nilfs - > ns_next_gen_lock ) ;
inode - > i_generation = nilfs - > ns_next_generation + + ;
spin_unlock ( & nilfs - > ns_next_gen_lock ) ;
2014-12-10 15:54:34 -08:00
if ( nilfs_insert_inode_locked ( inode , root , ino ) < 0 ) {
err = - EIO ;
goto failed_after_creation ;
}
2009-04-06 19:01:32 -07:00
err = nilfs_init_acl ( inode , dir ) ;
if ( unlikely ( err ) )
2016-05-23 16:23:48 -07:00
/*
* Never occurs . When supporting nilfs_init_acl ( ) ,
* proper cancellation of above jobs should be considered .
*/
goto failed_after_creation ;
2009-04-06 19:01:32 -07:00
return inode ;
2014-12-10 15:54:34 -08:00
failed_after_creation :
2011-10-28 14:13:28 +02:00
clear_nlink ( inode ) ;
2020-08-11 18:35:43 -07:00
if ( inode - > i_state & I_NEW )
unlock_new_inode ( inode ) ;
2016-05-23 16:23:48 -07:00
iput ( inode ) ; /*
* raw_inode will be deleted through
* nilfs_evict_inode ( ) .
*/
2009-04-06 19:01:32 -07:00
goto failed ;
failed_ifile_create_inode :
make_bad_inode ( inode ) ;
2016-05-23 16:23:48 -07:00
iput ( inode ) ;
2009-04-06 19:01:32 -07:00
failed :
return ERR_PTR ( err ) ;
}
void nilfs_set_inode_flags ( struct inode * inode )
{
unsigned int flags = NILFS_I ( inode ) - > i_flags ;
2015-04-16 12:46:50 -07:00
unsigned int new_fl = 0 ;
2009-04-06 19:01:32 -07:00
2011-01-20 02:09:52 +09:00
if ( flags & FS_SYNC_FL )
2015-04-16 12:46:50 -07:00
new_fl | = S_SYNC ;
2011-01-20 02:09:52 +09:00
if ( flags & FS_APPEND_FL )
2015-04-16 12:46:50 -07:00
new_fl | = S_APPEND ;
2011-01-20 02:09:52 +09:00
if ( flags & FS_IMMUTABLE_FL )
2015-04-16 12:46:50 -07:00
new_fl | = S_IMMUTABLE ;
2011-01-20 02:09:52 +09:00
if ( flags & FS_NOATIME_FL )
2015-04-16 12:46:50 -07:00
new_fl | = S_NOATIME ;
2011-01-20 02:09:52 +09:00
if ( flags & FS_DIRSYNC_FL )
2015-04-16 12:46:50 -07:00
new_fl | = S_DIRSYNC ;
inode_set_flags ( inode , new_fl , S_SYNC | S_APPEND | S_IMMUTABLE |
S_NOATIME | S_DIRSYNC ) ;
2009-04-06 19:01:32 -07:00
}
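The translation done by nilfs_set_inode_flags() can be modeled outside the kernel as two steps: map each persistent attribute bit to its in-memory counterpart, then replace only the managed bits of the current flag word. The sketch below is illustrative only. All `DEMO_` names and bit values are assumptions made for this example, and `demo_apply_flags` is a simplified, non-atomic stand-in for what inode_set_flags() is expected to do.

```c
#include <stdint.h>

/* Illustrative persistent attribute bits (hypothetical names and values). */
#define DEMO_FL_SYNC      0x00000008u
#define DEMO_FL_IMMUTABLE 0x00000010u
#define DEMO_FL_APPEND    0x00000020u
#define DEMO_FL_NOATIME   0x00000080u
#define DEMO_FL_DIRSYNC   0x00010000u

/* Illustrative in-memory inode flag bits (hypothetical names and values). */
#define DEMO_S_SYNC      (1u << 0)
#define DEMO_S_NOATIME   (1u << 1)
#define DEMO_S_APPEND    (1u << 2)
#define DEMO_S_IMMUTABLE (1u << 3)
#define DEMO_S_DIRSYNC   (1u << 6)

/* The set of in-memory bits this translation is allowed to change. */
#define DEMO_S_MANAGED (DEMO_S_SYNC | DEMO_S_NOATIME | DEMO_S_APPEND | \
			DEMO_S_IMMUTABLE | DEMO_S_DIRSYNC)

/* Step 1: map persistent bits to in-memory bits; unknown bits are ignored. */
uint32_t demo_translate_flags(uint32_t disk_flags)
{
	uint32_t new_fl = 0;

	if (disk_flags & DEMO_FL_SYNC)
		new_fl |= DEMO_S_SYNC;
	if (disk_flags & DEMO_FL_APPEND)
		new_fl |= DEMO_S_APPEND;
	if (disk_flags & DEMO_FL_IMMUTABLE)
		new_fl |= DEMO_S_IMMUTABLE;
	if (disk_flags & DEMO_FL_NOATIME)
		new_fl |= DEMO_S_NOATIME;
	if (disk_flags & DEMO_FL_DIRSYNC)
		new_fl |= DEMO_S_DIRSYNC;
	return new_fl;
}

/* Step 2: overwrite only the masked bits, leaving all other bits untouched. */
uint32_t demo_apply_flags(uint32_t cur, uint32_t new_fl, uint32_t mask)
{
	return (cur & ~mask) | (new_fl & mask);
}
```

Passing the mask explicitly is what lets stale managed bits be cleared while unrelated flags set by other code survive the update.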
int nilfs_read_inode_common ( struct inode * inode ,
struct nilfs_inode * raw_inode )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
int err ;
inode - > i_mode = le16_to_cpu ( raw_inode - > i_mode ) ;
2012-02-10 12:31:23 -08:00
i_uid_write ( inode , le32_to_cpu ( raw_inode - > i_uid ) ) ;
i_gid_write ( inode , le32_to_cpu ( raw_inode - > i_gid ) ) ;
2011-10-28 14:13:29 +02:00
set_nlink ( inode , le16_to_cpu ( raw_inode - > i_links_count ) ) ;
2009-04-06 19:01:32 -07:00
inode - > i_size = le64_to_cpu ( raw_inode - > i_size ) ;
inode - > i_atime . tv_sec = le64_to_cpu ( raw_inode - > i_mtime ) ;
2023-07-05 15:01:24 -04:00
inode_set_ctime ( inode , le64_to_cpu ( raw_inode - > i_ctime ) ,
le32_to_cpu ( raw_inode - > i_ctime_nsec ) ) ;
2009-04-06 19:01:32 -07:00
inode - > i_mtime . tv_sec = le64_to_cpu ( raw_inode - > i_mtime ) ;
2009-04-06 19:02:00 -07:00
inode - > i_atime . tv_nsec = le32_to_cpu ( raw_inode - > i_mtime_nsec ) ;
inode - > i_mtime . tv_nsec = le32_to_cpu ( raw_inode - > i_mtime_nsec ) ;
2022-10-02 12:08:04 +09:00
if ( nilfs_is_metadata_file_inode ( inode ) & & ! S_ISREG ( inode - > i_mode ) )
return - EIO ; /* this inode is for metadata and corrupted */
2014-12-10 15:54:34 -08:00
if ( inode - > i_nlink = = 0 )
return - ESTALE ; /* this inode is deleted */
2009-04-06 19:01:32 -07:00
inode - > i_blocks = le64_to_cpu ( raw_inode - > i_blocks ) ;
ii - > i_flags = le32_to_cpu ( raw_inode - > i_flags ) ;
#if 0
ii - > i_file_acl = le32_to_cpu ( raw_inode - > i_file_acl ) ;
ii - > i_dir_acl = S_ISREG ( inode - > i_mode ) ?
0 : le32_to_cpu ( raw_inode - > i_dir_acl ) ;
# endif
2009-09-28 13:02:46 +09:00
ii - > i_dir_start_lookup = 0 ;
2009-04-06 19:01:32 -07:00
inode - > i_generation = le32_to_cpu ( raw_inode - > i_generation ) ;
if ( S_ISREG ( inode - > i_mode ) | | S_ISDIR ( inode - > i_mode ) | |
S_ISLNK ( inode - > i_mode ) ) {
err = nilfs_bmap_read ( ii - > i_bmap , raw_inode ) ;
if ( err < 0 )
return err ;
set_bit ( NILFS_I_BMAP , & ii - > i_state ) ;
/* No lock is needed; iget() ensures it. */
}
return 0 ;
}
2010-08-14 13:07:15 +09:00
static int __nilfs_read_inode ( struct super_block * sb ,
struct nilfs_root * root , unsigned long ino ,
2009-04-06 19:01:32 -07:00
struct inode * inode )
{
2011-03-09 11:05:08 +09:00
struct the_nilfs * nilfs = sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
struct buffer_head * bh ;
struct nilfs_inode * raw_inode ;
int err ;
2010-12-27 00:07:30 +09:00
down_read ( & NILFS_MDT ( nilfs - > ns_dat ) - > mi_sem ) ;
2010-08-14 13:07:15 +09:00
err = nilfs_ifile_get_inode_block ( root - > ifile , ino , & bh ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( err ) )
goto bad_inode ;
2010-08-14 13:07:15 +09:00
raw_inode = nilfs_ifile_map_inode ( root - > ifile , ino , bh ) ;
2009-04-06 19:01:32 -07:00
2009-08-22 19:10:07 +09:00
err = nilfs_read_inode_common ( inode , raw_inode ) ;
if ( err )
2009-04-06 19:01:32 -07:00
goto failed_unmap ;
if ( S_ISREG ( inode - > i_mode ) ) {
inode - > i_op = & nilfs_file_inode_operations ;
inode - > i_fop = & nilfs_file_operations ;
inode - > i_mapping - > a_ops = & nilfs_aops ;
} else if ( S_ISDIR ( inode - > i_mode ) ) {
inode - > i_op = & nilfs_dir_inode_operations ;
inode - > i_fop = & nilfs_dir_operations ;
inode - > i_mapping - > a_ops = & nilfs_aops ;
} else if ( S_ISLNK ( inode - > i_mode ) ) {
inode - > i_op = & nilfs_symlink_inode_operations ;
2015-11-17 01:07:57 -05:00
inode_nohighmem ( inode ) ;
2009-04-06 19:01:32 -07:00
inode - > i_mapping - > a_ops = & nilfs_aops ;
} else {
inode - > i_op = & nilfs_special_inode_operations ;
init_special_inode (
inode , inode - > i_mode ,
2010-05-09 15:31:22 +09:00
huge_decode_dev ( le64_to_cpu ( raw_inode - > i_device_code ) ) ) ;
2009-04-06 19:01:32 -07:00
}
2010-08-14 13:07:15 +09:00
nilfs_ifile_unmap_inode ( root - > ifile , ino , bh ) ;
2009-04-06 19:01:32 -07:00
brelse ( bh ) ;
2010-12-27 00:07:30 +09:00
up_read ( & NILFS_MDT ( nilfs - > ns_dat ) - > mi_sem ) ;
2009-04-06 19:01:32 -07:00
nilfs_set_inode_flags ( inode ) ;
nilfs2: put out gfp mask manipulation from nilfs_set_inode_flags()
nilfs_set_inode_flags() function adjusts gfp-mask of inode->i_mapping as
well as i_flags, however, this coupling of operations is not appropriate.
For instance, nilfs_ioctl_setflags(), one of three callers of
nilfs_set_inode_flags(), doesn't need to reinitialize the gfp-mask at all.
In addition, nilfs_new_inode(), another caller of
nilfs_set_inode_flags(), doesn't either because it has already initialized
the gfp-mask.
Only __nilfs_read_inode(), the remaining caller, needs it. So, this moves
the gfp mask manipulation to __nilfs_read_inode() from
nilfs_set_inode_flags().
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 12:46:47 -07:00
mapping_set_gfp_mask ( inode - > i_mapping ,
2015-11-06 16:28:49 -08:00
mapping_gfp_constraint ( inode - > i_mapping , ~ __GFP_FS ) ) ;
2009-04-06 19:01:32 -07:00
return 0 ;
failed_unmap :
2010-08-14 13:07:15 +09:00
nilfs_ifile_unmap_inode ( root - > ifile , ino , bh ) ;
2009-04-06 19:01:32 -07:00
brelse ( bh ) ;
bad_inode :
2010-12-27 00:07:30 +09:00
up_read ( & NILFS_MDT ( nilfs - > ns_dat ) - > mi_sem ) ;
2009-04-06 19:01:32 -07:00
return err ;
}
2010-08-20 21:20:29 +09:00
static int nilfs_iget_test ( struct inode * inode , void * opaque )
{
struct nilfs_iget_args * args = opaque ;
struct nilfs_inode_info * ii ;
2010-08-25 17:45:44 +09:00
if ( args - > ino ! = inode - > i_ino | | args - > root ! = NILFS_I ( inode ) - > i_root )
2010-08-20 21:20:29 +09:00
return 0 ;
ii = NILFS_I ( inode ) ;
2022-04-01 11:28:18 -07:00
if ( test_bit ( NILFS_I_BTNC , & ii - > i_state ) ) {
if ( ! args - > for_btnc )
return 0 ;
} else if ( args - > for_btnc ) {
return 0 ;
}
2022-04-01 11:28:21 -07:00
if ( test_bit ( NILFS_I_SHADOW , & ii - > i_state ) ) {
if ( ! args - > for_shadow )
return 0 ;
} else if ( args - > for_shadow ) {
return 0 ;
}
2022-04-01 11:28:18 -07:00
2010-08-20 21:20:29 +09:00
if ( ! test_bit ( NILFS_I_GCINODE , & ii - > i_state ) )
return ! args - > for_gc ;
return args - > for_gc & & args - > cno = = ii - > i_cno ;
}
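The matching rule in nilfs_iget_test() can be restated as a pure predicate: the inode number and root must agree, the B-tree node cache and shadow roles must agree exactly, and a GC inode additionally matches only on the same checkpoint number. The following sketch models that rule on simplified structures. `struct demo_inode`, `struct demo_key` and `demo_iget_match` are hypothetical names and are not part of nilfs2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified lookup key, standing in for struct nilfs_iget_args. */
struct demo_key {
	unsigned long ino;
	const void *root;
	uint64_t cno;
	bool for_gc, for_btnc, for_shadow;
};

/* Simplified cached inode, with state bits flattened into booleans. */
struct demo_inode {
	unsigned long ino;
	const void *root;
	uint64_t cno;
	bool is_gc, is_btnc, is_shadow;
};

/* Return true when the cached inode is the one the key asks for. */
bool demo_iget_match(const struct demo_inode *in, const struct demo_key *key)
{
	if (key->ino != in->ino || key->root != in->root)
		return false;
	/* The B-tree node cache role must match exactly. */
	if (in->is_btnc != key->for_btnc)
		return false;
	/* So must the shadow role. */
	if (in->is_shadow != key->for_shadow)
		return false;
	/* A regular inode matches only non-GC lookups. */
	if (!in->is_gc)
		return !key->for_gc;
	/* A GC inode matches only GC lookups for the same checkpoint. */
	return key->for_gc && key->cno == in->cno;
}
```

Because several inodes can share one inode number under this scheme, a lookup by number alone would be ambiguous, which is presumably why the code uses the test-callback variants of the inode cache lookups.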
static int nilfs_iget_set ( struct inode * inode , void * opaque )
{
struct nilfs_iget_args * args = opaque ;
inode - > i_ino = args - > ino ;
2022-04-01 11:28:18 -07:00
NILFS_I ( inode ) - > i_cno = args - > cno ;
NILFS_I ( inode ) - > i_root = args - > root ;
if ( args - > root & & args - > ino = = NILFS_ROOT_INO )
nilfs_get_root ( args - > root ) ;
if ( args - > for_gc )
2016-08-02 14:05:28 -07:00
NILFS_I ( inode ) - > i_state = BIT ( NILFS_I_GCINODE ) ;
2022-04-01 11:28:18 -07:00
if ( args - > for_btnc )
NILFS_I ( inode ) - > i_state | = BIT ( NILFS_I_BTNC ) ;
2022-04-01 11:28:21 -07:00
if ( args - > for_shadow )
NILFS_I ( inode ) - > i_state | = BIT ( NILFS_I_SHADOW ) ;
2010-08-20 21:20:29 +09:00
return 0 ;
}
2010-09-13 11:16:34 +09:00
struct inode * nilfs_ilookup ( struct super_block * sb , struct nilfs_root * root ,
unsigned long ino )
{
struct nilfs_iget_args args = {
2022-04-01 11:28:18 -07:00
. ino = ino , . root = root , . cno = 0 , . for_gc = false ,
2022-04-01 11:28:21 -07:00
. for_btnc = false , . for_shadow = false
2010-09-13 11:16:34 +09:00
} ;
return ilookup5 ( sb , ino , nilfs_iget_test , & args ) ;
}
2010-09-05 12:20:59 +09:00
struct inode * nilfs_iget_locked ( struct super_block * sb , struct nilfs_root * root ,
unsigned long ino )
2009-04-06 19:01:32 -07:00
{
2010-08-25 17:45:44 +09:00
struct nilfs_iget_args args = {
2022-04-01 11:28:18 -07:00
. ino = ino , . root = root , . cno = 0 , . for_gc = false ,
2022-04-01 11:28:21 -07:00
. for_btnc = false , . for_shadow = false
2010-08-25 17:45:44 +09:00
} ;
2010-09-05 12:20:59 +09:00
return iget5_locked ( sb , ino , nilfs_iget_test , nilfs_iget_set , & args ) ;
}
struct inode * nilfs_iget ( struct super_block * sb , struct nilfs_root * root ,
unsigned long ino )
{
2009-04-06 19:01:32 -07:00
struct inode * inode ;
int err ;
2010-09-05 12:20:59 +09:00
inode = nilfs_iget_locked ( sb , root , ino ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( ! inode ) )
return ERR_PTR ( - ENOMEM ) ;
if ( ! ( inode - > i_state & I_NEW ) )
return inode ;
2010-08-14 13:07:15 +09:00
err = __nilfs_read_inode ( sb , root , ino , inode ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( err ) ) {
iget_failed ( inode ) ;
return ERR_PTR ( err ) ;
}
unlock_new_inode ( inode ) ;
return inode ;
}
2010-08-20 19:06:11 +09:00
struct inode * nilfs_iget_for_gc ( struct super_block * sb , unsigned long ino ,
__u64 cno )
{
2010-08-25 17:45:44 +09:00
struct nilfs_iget_args args = {
2022-04-01 11:28:18 -07:00
. ino = ino , . root = NULL , . cno = cno , . for_gc = true ,
2022-04-01 11:28:21 -07:00
. for_btnc = false , . for_shadow = false
2010-08-25 17:45:44 +09:00
} ;
2010-08-20 19:06:11 +09:00
struct inode * inode ;
int err ;
inode = iget5_locked ( sb , ino , nilfs_iget_test , nilfs_iget_set , & args ) ;
if ( unlikely ( ! inode ) )
return ERR_PTR ( - ENOMEM ) ;
if ( ! ( inode - > i_state & I_NEW ) )
return inode ;
err = nilfs_init_gcinode ( inode ) ;
if ( unlikely ( err ) ) {
iget_failed ( inode ) ;
return ERR_PTR ( err ) ;
}
unlock_new_inode ( inode ) ;
return inode ;
}
2022-04-01 11:28:18 -07:00
/**
* nilfs_attach_btree_node_cache - attach a B - tree node cache to the inode
* @ inode : inode object
*
* nilfs_attach_btree_node_cache ( ) attaches a B - tree node cache to @ inode ,
* or does nothing if the inode already has it . This function allocates
* an additional inode to maintain page cache of B - tree nodes one - on - one .
*
* Return Value : On success , 0 is returned . On error , one of the following
* negative error codes is returned .
*
* % - ENOMEM - Insufficient memory available .
*/
int nilfs_attach_btree_node_cache ( struct inode * inode )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
struct inode * btnc_inode ;
struct nilfs_iget_args args ;
if ( ii - > i_assoc_inode )
return 0 ;
args . ino = inode - > i_ino ;
args . root = ii - > i_root ;
args . cno = ii - > i_cno ;
args . for_gc = test_bit ( NILFS_I_GCINODE , & ii - > i_state ) ! = 0 ;
args . for_btnc = true ;
2022-04-01 11:28:21 -07:00
args . for_shadow = test_bit ( NILFS_I_SHADOW , & ii - > i_state ) ! = 0 ;
2022-04-01 11:28:18 -07:00
btnc_inode = iget5_locked ( inode - > i_sb , inode - > i_ino , nilfs_iget_test ,
nilfs_iget_set , & args ) ;
if ( unlikely ( ! btnc_inode ) )
return - ENOMEM ;
if ( btnc_inode - > i_state & I_NEW ) {
nilfs_init_btnc_inode ( btnc_inode ) ;
unlock_new_inode ( btnc_inode ) ;
}
NILFS_I ( btnc_inode ) - > i_assoc_inode = inode ;
NILFS_I ( btnc_inode ) - > i_bmap = ii - > i_bmap ;
ii - > i_assoc_inode = btnc_inode ;
return 0 ;
}
/**
* nilfs_detach_btree_node_cache - detach the B - tree node cache from the inode
* @ inode : inode object
*
* nilfs_detach_btree_node_cache ( ) detaches the B - tree node cache and its
* holder inode bound to @ inode , or does nothing if @ inode doesn ' t have it .
*/
void nilfs_detach_btree_node_cache ( struct inode * inode )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
struct inode * btnc_inode = ii - > i_assoc_inode ;
if ( btnc_inode ) {
NILFS_I ( btnc_inode ) - > i_assoc_inode = NULL ;
ii - > i_assoc_inode = NULL ;
iput ( btnc_inode ) ;
}
}
2022-04-01 11:28:21 -07:00
/**
* nilfs_iget_for_shadow - obtain inode for shadow mapping
* @ inode : inode object that uses shadow mapping
*
* nilfs_iget_for_shadow ( ) allocates a pair of inodes that holds page
* caches for shadow mapping . The page cache for data pages is set up
* in one inode and the one for b - tree node pages is set up in the
* other inode , which is attached to the former inode .
*
* Return Value : On success , a pointer to the inode for data pages is
* returned . On error , one of the following negative error codes is returned
* in a pointer type .
*
* % - ENOMEM - Insufficient memory available .
*/
struct inode * nilfs_iget_for_shadow ( struct inode * inode )
{
struct nilfs_iget_args args = {
. ino = inode - > i_ino , . root = NULL , . cno = 0 , . for_gc = false ,
. for_btnc = false , . for_shadow = true
} ;
struct inode * s_inode ;
int err ;
s_inode = iget5_locked ( inode - > i_sb , inode - > i_ino , nilfs_iget_test ,
nilfs_iget_set , & args ) ;
if ( unlikely ( ! s_inode ) )
return ERR_PTR ( - ENOMEM ) ;
if ( ! ( s_inode - > i_state & I_NEW ) )
return inode ;
NILFS_I ( s_inode ) - > i_flags = 0 ;
memset ( NILFS_I ( s_inode ) - > i_bmap , 0 , sizeof ( struct nilfs_bmap ) ) ;
mapping_set_gfp_mask ( s_inode - > i_mapping , GFP_NOFS ) ;
err = nilfs_attach_btree_node_cache ( s_inode ) ;
if ( unlikely ( err ) ) {
iget_failed ( s_inode ) ;
return ERR_PTR ( err ) ;
}
unlock_new_inode ( s_inode ) ;
return s_inode ;
}
2009-04-06 19:01:32 -07:00
void nilfs_write_inode_common ( struct inode * inode ,
struct nilfs_inode * raw_inode , int has_bmap )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
raw_inode - > i_mode = cpu_to_le16 ( inode - > i_mode ) ;
2012-02-10 12:31:23 -08:00
raw_inode - > i_uid = cpu_to_le32 ( i_uid_read ( inode ) ) ;
raw_inode - > i_gid = cpu_to_le32 ( i_gid_read ( inode ) ) ;
2009-04-06 19:01:32 -07:00
raw_inode - > i_links_count = cpu_to_le16 ( inode - > i_nlink ) ;
raw_inode - > i_size = cpu_to_le64 ( inode - > i_size ) ;
2023-07-05 15:01:24 -04:00
raw_inode - > i_ctime = cpu_to_le64 ( inode_get_ctime ( inode ) . tv_sec ) ;
2009-04-06 19:01:32 -07:00
raw_inode - > i_mtime = cpu_to_le64 ( inode - > i_mtime . tv_sec ) ;
2023-07-05 15:01:24 -04:00
raw_inode - > i_ctime_nsec = cpu_to_le32 ( inode_get_ctime ( inode ) . tv_nsec ) ;
2009-04-06 19:02:00 -07:00
raw_inode - > i_mtime_nsec = cpu_to_le32 ( inode - > i_mtime . tv_nsec ) ;
2009-04-06 19:01:32 -07:00
raw_inode - > i_blocks = cpu_to_le64 ( inode - > i_blocks ) ;
raw_inode - > i_flags = cpu_to_le32 ( ii - > i_flags ) ;
raw_inode - > i_generation = cpu_to_le32 ( inode - > i_generation ) ;
2011-04-30 18:56:12 +09:00
if ( NILFS_ROOT_METADATA_FILE ( inode - > i_ino ) ) {
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
/* zero-fill unused portion in the case of super root block */
raw_inode - > i_xattr = 0 ;
raw_inode - > i_pad = 0 ;
memset ( ( void * ) raw_inode + sizeof ( * raw_inode ) , 0 ,
nilfs - > ns_inode_size - sizeof ( * raw_inode ) ) ;
}
2009-04-06 19:01:32 -07:00
if ( has_bmap )
nilfs_bmap_write ( ii - > i_bmap , raw_inode ) ;
else if ( S_ISCHR ( inode - > i_mode ) | | S_ISBLK ( inode - > i_mode ) )
raw_inode - > i_device_code =
2010-05-09 15:31:22 +09:00
cpu_to_le64 ( huge_encode_dev ( inode - > i_rdev ) ) ;
2016-05-23 16:23:48 -07:00
/*
* When extending inode , nilfs - > ns_inode_size should be checked
* for substitutions of appended fields .
*/
2009-04-06 19:01:32 -07:00
}
2014-10-13 15:53:22 -07:00
void nilfs_update_inode ( struct inode * inode , struct buffer_head * ibh , int flags )
2009-04-06 19:01:32 -07:00
{
ino_t ino = inode - > i_ino ;
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
2010-08-14 13:07:15 +09:00
struct inode * ifile = ii - > i_root - > ifile ;
2009-04-06 19:01:32 -07:00
struct nilfs_inode * raw_inode ;
2010-08-14 13:07:15 +09:00
raw_inode = nilfs_ifile_map_inode ( ifile , ino , ibh ) ;
2009-04-06 19:01:32 -07:00
if ( test_and_clear_bit ( NILFS_I_NEW , & ii - > i_state ) )
2010-08-14 13:07:15 +09:00
memset ( raw_inode , 0 , NILFS_MDT ( ifile ) - > mi_entry_size ) ;
2014-10-13 15:53:22 -07:00
if ( flags & I_DIRTY_DATASYNC )
set_bit ( NILFS_I_INODE_SYNC , & ii - > i_state ) ;
2009-04-06 19:01:32 -07:00
nilfs_write_inode_common ( inode , raw_inode , 0 ) ;
2016-05-23 16:23:48 -07:00
/*
* XXX : call with has_bmap = 0 is a workaround to avoid
* deadlock of bmap . This delays update of i_bmap to just
* before writing .
*/
2010-08-14 13:07:15 +09:00
nilfs_ifile_unmap_inode ( ifile , ino , ibh ) ;
2009-04-06 19:01:32 -07:00
}
# define NILFS_MAX_TRUNCATE_BLOCKS 16384 /* 64MB for 4KB block */
static void nilfs_truncate_bmap ( struct nilfs_inode_info * ii ,
unsigned long from )
{
2015-04-16 12:46:34 -07:00
__u64 b ;
2009-04-06 19:01:32 -07:00
int ret ;
if ( ! test_bit ( NILFS_I_BMAP , & ii - > i_state ) )
return ;
2010-11-19 15:26:20 +09:00
repeat :
2009-04-06 19:01:32 -07:00
ret = nilfs_bmap_last_key ( ii - > i_bmap , & b ) ;
if ( ret = = - ENOENT )
return ;
else if ( ret < 0 )
goto failed ;
if ( b < from )
return ;
2015-04-16 12:46:34 -07:00
b - = min_t ( __u64 , NILFS_MAX_TRUNCATE_BLOCKS , b - from ) ;
2009-04-06 19:01:32 -07:00
ret = nilfs_bmap_truncate ( ii - > i_bmap , b ) ;
nilfs_relax_pressure_in_lock ( ii - > vfs_inode . i_sb ) ;
if ( ! ret | | ( ret = = - ENOMEM & &
nilfs_bmap_truncate ( ii - > i_bmap , b ) = = 0 ) )
goto repeat ;
2010-11-19 15:26:20 +09:00
failed :
2020-08-11 18:35:49 -07:00
nilfs_warn ( ii - > vfs_inode . i_sb , " error %d truncating bmap (ino=%lu) " ,
ret , ii - > vfs_inode . i_ino ) ;
2009-04-06 19:01:32 -07:00
}
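nilfs_truncate_bmap() shrinks the block mapping in bounded steps rather than in one call: each pass lowers the cut point by at most NILFS_MAX_TRUNCATE_BLOCKS from the current last key, then repeats until the last key falls below the target. The sketch below counts those passes for a simplified model in which the mapping is dense, so that after cutting at `b` the last remaining key is `b - 1`. The names `DEMO_MAX_TRUNCATE_BLOCKS`, `demo_min_u64` and `demo_truncate_passes` are hypothetical, and error handling and the retry on -ENOMEM are left out.

```c
#include <stdint.h>

/* Same bound as in the code above, redefined locally for this sketch. */
#define DEMO_MAX_TRUNCATE_BLOCKS 16384ULL

static uint64_t demo_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Count how many truncation passes are needed to remove every key >= from,
 * starting with last_key as the highest key of a dense mapping.
 */
unsigned int demo_truncate_passes(uint64_t last_key, uint64_t from)
{
	unsigned int passes = 0;
	int empty = 0;

	while (!empty && last_key >= from) {
		/* Cut point: step down by a bounded amount, never below from. */
		uint64_t b = last_key -
			demo_min_u64(DEMO_MAX_TRUNCATE_BLOCKS, last_key - from);

		passes++;		/* one truncate call at key b */
		if (b == 0)
			empty = 1;	/* nothing is left in the mapping */
		else
			last_key = b - 1;
	}
	return passes;
}
```

Bounding each pass keeps the amount of work done per call limited, which fits the pressure-relief call made between passes in the original loop.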
void nilfs_truncate ( struct inode * inode )
{
unsigned long blkoff ;
unsigned int blocksize ;
struct nilfs_transaction_info ti ;
struct super_block * sb = inode - > i_sb ;
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
if ( ! test_bit ( NILFS_I_BMAP , & ii - > i_state ) )
return ;
if ( IS_APPEND ( inode ) | | IS_IMMUTABLE ( inode ) )
return ;
blocksize = sb - > s_blocksize ;
blkoff = ( inode - > i_size + blocksize - 1 ) > > sb - > s_blocksize_bits ;
2009-04-06 19:01:55 -07:00
nilfs_transaction_begin ( sb , & ti , 0 ) ; /* never fails */
2009-04-06 19:01:32 -07:00
block_truncate_page ( inode - > i_mapping , inode - > i_size , nilfs_get_block ) ;
nilfs_truncate_bmap ( ii , blkoff ) ;
2023-07-05 15:01:24 -04:00
inode - > i_mtime = inode_set_ctime_current ( inode ) ;
2009-04-06 19:01:32 -07:00
if ( IS_SYNC ( inode ) )
nilfs_set_transaction_flag ( NILFS_TI_SYNC ) ;
2009-11-27 19:41:14 +09:00
nilfs_mark_inode_dirty ( inode ) ;
2010-12-27 00:05:49 +09:00
nilfs_set_file_dirty ( inode , 0 ) ;
2009-04-06 19:01:45 -07:00
nilfs_transaction_commit ( sb ) ;
2016-05-23 16:23:48 -07:00
/*
* May construct a logical segment and may fail in sync mode .
* But truncate has no return value .
*/
2009-04-06 19:01:32 -07:00
}
2010-06-07 11:55:00 -04:00
static void nilfs_clear_inode ( struct inode * inode )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
/*
* Free resources allocated in nilfs_read_inode ( ) , here .
*/
BUG_ON ( ! list_empty ( & ii - > i_dirty ) ) ;
brelse ( ii - > i_bh ) ;
ii - > i_bh = NULL ;
2016-05-23 16:23:20 -07:00
if ( nilfs_is_metadata_file_inode ( inode ) )
nilfs_mdt_clear ( inode ) ;
2010-08-20 23:40:54 +09:00
2010-06-07 11:55:00 -04:00
if ( test_bit ( NILFS_I_BMAP , & ii - > i_state ) )
nilfs_bmap_clear ( ii - > i_bmap ) ;
2022-04-01 11:28:18 -07:00
if ( ! test_bit ( NILFS_I_BTNC , & ii - > i_state ) )
nilfs_detach_btree_node_cache ( inode ) ;
2010-08-25 17:45:44 +09:00
if ( ii - > i_root & & inode - > i_ino = = NILFS_ROOT_INO )
nilfs_put_root ( ii - > i_root ) ;
2010-06-07 11:55:00 -04:00
}
void nilfs_evict_inode ( struct inode * inode )
2009-04-06 19:01:32 -07:00
{
struct nilfs_transaction_info ti ;
struct super_block * sb = inode - > i_sb ;
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
2023-05-10 00:29:56 +09:00
struct the_nilfs * nilfs ;
2011-02-11 15:23:27 +09:00
int ret ;
2009-04-06 19:01:32 -07:00
2010-08-25 17:45:44 +09:00
if ( inode - > i_nlink | | ! ii - > i_root | | unlikely ( is_bad_inode ( inode ) ) ) {
2014-04-03 14:47:49 -07:00
truncate_inode_pages_final ( & inode - > i_data ) ;
2012-05-03 14:48:02 +02:00
clear_inode ( inode ) ;
2010-06-07 11:55:00 -04:00
nilfs_clear_inode ( inode ) ;
2009-04-06 19:01:32 -07:00
return ;
}
2009-04-06 19:01:55 -07:00
nilfs_transaction_begin ( sb , & ti , 0 ) ; /* never fails */
2014-04-03 14:47:49 -07:00
truncate_inode_pages_final ( & inode - > i_data ) ;
2009-04-06 19:01:32 -07:00
2023-05-10 00:29:56 +09:00
nilfs = sb - > s_fs_info ;
if ( unlikely ( sb_rdonly ( sb ) | | ! nilfs - > ns_writer ) ) {
/*
* If this inode is about to be disposed after the file system
* has been degraded to read - only due to file system corruption
* or after the writer has been detached , do not make any
* changes that cause writes , just clear it .
* Do this check after read - locking ns_segctor_sem by
* nilfs_transaction_begin ( ) in order to avoid a race with
* the writer detach operation .
*/
clear_inode ( inode ) ;
nilfs_clear_inode ( inode ) ;
nilfs_transaction_abort ( sb ) ;
return ;
}
2010-08-14 13:07:15 +09:00
/* TODO: some of the following operations may fail. */
2009-04-06 19:01:32 -07:00
nilfs_truncate_bmap ( ii , 0 ) ;
2009-11-27 19:41:14 +09:00
nilfs_mark_inode_dirty ( inode ) ;
2012-05-03 14:48:02 +02:00
clear_inode ( inode ) ;
2010-08-14 13:07:15 +09:00
2011-02-11 15:23:27 +09:00
ret = nilfs_ifile_delete_inode ( ii - > i_root - > ifile , inode - > i_ino ) ;
if ( ! ret )
2013-07-03 15:08:06 -07:00
atomic64_dec ( & ii - > i_root - > inodes_count ) ;
2010-08-14 13:07:15 +09:00
2010-06-07 11:55:00 -04:00
nilfs_clear_inode ( inode ) ;
2010-08-14 13:07:15 +09:00
2009-04-06 19:01:32 -07:00
if ( IS_SYNC ( inode ) )
nilfs_set_transaction_flag ( NILFS_TI_SYNC ) ;
2009-04-06 19:01:45 -07:00
nilfs_transaction_commit ( sb ) ;
2016-05-23 16:23:48 -07:00
/*
* May construct a logical segment and may fail in sync mode .
* But delete_inode has no return value .
*/
2009-04-06 19:01:32 -07:00
}
2023-01-13 12:49:11 +01:00
int nilfs_setattr ( struct mnt_idmap * idmap , struct dentry * dentry ,
2021-01-21 14:19:43 +01:00
struct iattr * iattr )
2009-04-06 19:01:32 -07:00
{
struct nilfs_transaction_info ti ;
2015-03-17 22:25:59 +00:00
struct inode * inode = d_inode ( dentry ) ;
2009-04-06 19:01:32 -07:00
struct super_block * sb = inode - > i_sb ;
2009-04-06 19:01:45 -07:00
int err ;
2009-04-06 19:01:32 -07:00
2023-01-13 12:49:11 +01:00
err = setattr_prepare ( & nop_mnt_idmap , dentry , iattr ) ;
2009-04-06 19:01:32 -07:00
if ( err )
return err ;
err = nilfs_transaction_begin ( sb , & ti , 0 ) ;
if ( unlikely ( err ) )
return err ;
2010-06-04 11:30:02 +02:00
if ( ( iattr - > ia_valid & ATTR_SIZE ) & &
iattr - > ia_size ! = i_size_read ( inode ) ) {
2011-06-24 14:29:45 -04:00
inode_dio_wait ( inode ) ;
2012-12-15 11:57:37 +01:00
truncate_setsize ( inode , iattr - > ia_size ) ;
nilfs_truncate ( inode ) ;
2010-06-04 11:30:02 +02:00
}
2023-01-13 12:49:11 +01:00
setattr_copy ( & nop_mnt_idmap , inode , iattr ) ;
2010-06-04 11:30:02 +02:00
mark_inode_dirty ( inode ) ;
if ( iattr - > ia_valid & ATTR_MODE ) {
2009-04-06 19:01:32 -07:00
err = nilfs_acl_chmod ( inode ) ;
2010-06-04 11:30:02 +02:00
if ( unlikely ( err ) )
goto out_err ;
}
return nilfs_transaction_commit ( sb ) ;
2009-04-06 19:01:45 -07:00
2010-06-04 11:30:02 +02:00
out_err :
nilfs_transaction_abort ( sb ) ;
2009-04-06 19:01:45 -07:00
return err ;
2009-04-06 19:01:32 -07:00
}
2023-01-13 12:49:22 +01:00
int nilfs_permission ( struct mnt_idmap * idmap , struct inode * inode ,
2021-01-21 14:19:43 +01:00
int mask )
2010-08-15 23:33:57 +09:00
{
2011-06-18 20:21:44 -04:00
struct nilfs_root * root = NILFS_I ( inode ) - > i_root ;
2016-05-23 16:23:25 -07:00
2010-08-15 23:33:57 +09:00
if ( ( mask & MAY_WRITE ) & & root & &
root - > cno ! = NILFS_CPTREE_CURRENT_CNO )
return - EROFS ; /* snapshot is not writable */
2023-01-13 12:49:22 +01:00
return generic_permission ( & nop_mnt_idmap , inode , mask ) ;
2010-08-15 23:33:57 +09:00
}
2010-12-27 00:05:49 +09:00
int nilfs_load_inode_block ( struct inode * inode , struct buffer_head * * pbh )
2009-04-06 19:01:32 -07:00
{
2011-03-09 11:05:08 +09:00
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
int err ;
2011-03-09 11:05:07 +09:00
spin_lock ( & nilfs - > ns_inode_lock ) ;
2023-08-18 22:18:04 +09:00
if ( ii - > i_bh = = NULL | | unlikely ( ! buffer_uptodate ( ii - > i_bh ) ) ) {
2011-03-09 11:05:07 +09:00
spin_unlock ( & nilfs - > ns_inode_lock ) ;
2010-08-14 13:07:15 +09:00
err = nilfs_ifile_get_inode_block ( ii - > i_root - > ifile ,
inode - > i_ino , pbh ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( err ) )
return err ;
2011-03-09 11:05:07 +09:00
spin_lock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
if ( ii - > i_bh = = NULL )
ii - > i_bh = * pbh ;
2023-08-18 22:18:04 +09:00
else if ( unlikely ( ! buffer_uptodate ( ii - > i_bh ) ) ) {
__brelse ( ii - > i_bh ) ;
ii - > i_bh = * pbh ;
} else {
2009-04-06 19:01:32 -07:00
brelse ( * pbh ) ;
* pbh = ii - > i_bh ;
}
} else
* pbh = ii - > i_bh ;
get_bh ( * pbh ) ;
2011-03-09 11:05:07 +09:00
spin_unlock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
return 0 ;
}
int nilfs_inode_dirty ( struct inode * inode )
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
2011-03-09 11:05:08 +09:00
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
int ret = 0 ;
if ( ! list_empty ( & ii - > i_dirty ) ) {
2011-03-09 11:05:07 +09:00
spin_lock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
ret = test_bit ( NILFS_I_DIRTY , & ii - > i_state ) | |
test_bit ( NILFS_I_BUSY , & ii - > i_state ) ;
2011-03-09 11:05:07 +09:00
spin_unlock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
}
return ret ;
}
2016-05-23 16:23:39 -07:00
int nilfs_set_file_dirty ( struct inode * inode , unsigned int nr_dirty )
2009-04-06 19:01:32 -07:00
{
struct nilfs_inode_info * ii = NILFS_I ( inode ) ;
2011-03-09 11:05:08 +09:00
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
2011-03-09 11:05:07 +09:00
atomic_add ( nr_dirty , & nilfs - > ns_ndirtyblks ) ;
2009-04-06 19:01:32 -07:00
2009-04-06 19:01:56 -07:00
if ( test_and_set_bit ( NILFS_I_DIRTY , & ii - > i_state ) )
2009-04-06 19:01:32 -07:00
return 0 ;
2011-03-09 11:05:07 +09:00
spin_lock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
if ( ! test_bit ( NILFS_I_QUEUED , & ii - > i_state ) & &
! test_bit ( NILFS_I_BUSY , & ii - > i_state ) ) {
2016-05-23 16:23:48 -07:00
/*
* Because this routine may race with nilfs_dispose_list ( ) ,
* we have to check NILFS_I_QUEUED here , too .
*/
2009-04-06 19:01:32 -07:00
if ( list_empty ( & ii - > i_dirty ) & & igrab ( inode ) = = NULL ) {
2016-05-23 16:23:48 -07:00
/*
* This will happen when somebody is freeing
* this inode .
*/
2020-08-11 18:35:49 -07:00
nilfs_warn ( inode - > i_sb ,
" cannot set file dirty (ino=%lu): the file is being freed " ,
inode - > i_ino ) ;
2011-03-09 11:05:07 +09:00
spin_unlock ( & nilfs - > ns_inode_lock ) ;
2016-05-23 16:23:48 -07:00
return - EINVAL ; /*
* NILFS_I_DIRTY may remain for
* freeing inode .
*/
2009-04-06 19:01:32 -07:00
}
2011-03-19 16:45:30 +01:00
list_move_tail ( & ii - > i_dirty , & nilfs - > ns_dirty_files ) ;
2009-04-06 19:01:32 -07:00
set_bit ( NILFS_I_QUEUED , & ii - > i_state ) ;
}
2011-03-09 11:05:07 +09:00
spin_unlock ( & nilfs - > ns_inode_lock ) ;
2009-04-06 19:01:32 -07:00
return 0 ;
}
2014-10-13 15:53:22 -07:00
int __nilfs_mark_inode_dirty ( struct inode * inode , int flags )
2009-04-06 19:01:32 -07:00
{
nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
During unmount process of nilfs2, nothing holds nilfs_root structure after
nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously,
nilfs_evict_inode() could cause use-after-free read for nilfs_root if
inodes are left in "garbage_list" and released by nilfs_dispose_list at
the end of nilfs_detach_log_writer(), and this bug was fixed by commit
9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in
nilfs_evict_inode()").
However, it turned out that there is another possibility of UAF in the
call path where mark_inode_dirty_sync() is called from iput():
  nilfs_detach_log_writer()
    nilfs_dispose_list()
      iput()
        mark_inode_dirty_sync()
          __mark_inode_dirty()
            nilfs_dirty_inode()
              __nilfs_mark_inode_dirty()
                nilfs_load_inode_block() --> causes UAF of nilfs_root struct
This can happen after commit 0ae45f63d4ef ("vfs: add support for a
lazytime mount option"), which changed iput() to call
mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME
flag and i_nlink is non-zero.
This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty
data after degenerating to read-only") when using the syzbot reproducer,
but the issue has potentially existed before.
Fix this issue by adding a "purging flag" to the nilfs structure, setting
that flag while disposing the "garbage_list" and checking it in
__nilfs_mark_inode_dirty().
Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root
in nilfs_evict_inode()"), this patch does not rely on ns_writer to
determine whether to skip operations, so as not to break recovery on
mount. The nilfs_salvage_orphan_logs routine dirties the buffer of
salvaged data before attaching the log writer, so changing
__nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL
will cause recovery write to fail. The purpose of using the cleanup-only
flag is to allow for narrowing of such conditions.
Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-29 04:13:18 +09:00
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
2009-04-06 19:01:32 -07:00
struct buffer_head * ibh ;
int err ;
2023-07-29 04:13:18 +09:00
/*
* Do not dirty inodes after the log writer has been detached
* and its nilfs_root struct has been freed .
*/
if ( unlikely ( nilfs_purging ( nilfs ) ) )
return 0 ;
2010-12-27 00:05:49 +09:00
err = nilfs_load_inode_block ( inode , & ibh ) ;
2009-04-06 19:01:32 -07:00
if ( unlikely ( err ) ) {
2020-08-11 18:35:49 -07:00
nilfs_warn ( inode - > i_sb ,
" cannot mark inode dirty (ino=%lu): error %d loading inode block " ,
inode - > i_ino , err ) ;
2009-04-06 19:01:32 -07:00
return err ;
}
2014-10-13 15:53:22 -07:00
nilfs_update_inode ( inode , ibh , flags ) ;
2011-05-05 12:56:51 +09:00
mark_buffer_dirty ( ibh ) ;
2010-08-14 13:07:15 +09:00
nilfs_mdt_mark_dirty ( NILFS_I ( inode ) - > i_root - > ifile ) ;
2009-04-06 19:01:32 -07:00
brelse ( ibh ) ;
return 0 ;
}
/**
* nilfs_dirty_inode - reflect changes on given inode to an inode block .
* @ inode : inode of the file to be registered .
2022-05-12 07:05:29 -07:00
* @ flags : flags to determine the dirty state of the inode
2009-04-06 19:01:32 -07:00
*
* nilfs_dirty_inode ( ) loads an inode block containing the specified
* @ inode and copies data from a nilfs_inode to a corresponding inode
* entry in the inode block . This operation is excluded from the segment
* construction . This function can be called both as a single operation
* and as a part of indivisible file operations .
*/
2011-05-27 06:53:02 -04:00
void nilfs_dirty_inode ( struct inode * inode , int flags )
2009-04-06 19:01:32 -07:00
{
struct nilfs_transaction_info ti ;
2010-08-21 00:30:39 +09:00
struct nilfs_mdt_info * mdi = NILFS_MDT ( inode ) ;
2009-04-06 19:01:32 -07:00
if ( is_bad_inode ( inode ) ) {
2020-08-11 18:35:49 -07:00
nilfs_warn ( inode - > i_sb ,
" tried to mark bad_inode dirty. ignored. " ) ;
2009-04-06 19:01:32 -07:00
dump_stack ( ) ;
return ;
}
2010-08-21 00:30:39 +09:00
if ( mdi ) {
nilfs_mdt_mark_dirty ( inode ) ;
return ;
}
2009-04-06 19:01:32 -07:00
nilfs_transaction_begin ( inode - > i_sb , & ti , 0 ) ;
2014-10-13 15:53:22 -07:00
__nilfs_mark_inode_dirty ( inode , flags ) ;
2009-04-06 19:01:45 -07:00
nilfs_transaction_commit ( inode - > i_sb ) ; /* never fails */
2009-04-06 19:01:32 -07:00
}
2010-12-26 16:38:43 +09:00
int nilfs_fiemap ( struct inode * inode , struct fiemap_extent_info * fieinfo ,
__u64 start , __u64 len )
{
2011-05-05 12:56:51 +09:00
struct the_nilfs * nilfs = inode - > i_sb - > s_fs_info ;
2010-12-26 16:38:43 +09:00
__u64 logical = 0 , phys = 0 , size = 0 ;
__u32 flags = 0 ;
loff_t isize ;
sector_t blkoff , end_blkoff ;
sector_t delalloc_blkoff ;
unsigned long delalloc_blklen ;
unsigned int blkbits = inode - > i_blkbits ;
int ret , n ;
2020-05-23 09:30:14 +02:00
ret = fiemap_prep ( inode , fieinfo , start , & len , 0 ) ;
2010-12-26 16:38:43 +09:00
if ( ret )
return ret ;
2016-01-22 15:40:57 -05:00
inode_lock ( inode ) ;
2010-12-26 16:38:43 +09:00
isize = i_size_read ( inode ) ;
blkoff = start > > blkbits ;
end_blkoff = ( start + len - 1 ) > > blkbits ;
delalloc_blklen = nilfs_find_uncommitted_extent ( inode , blkoff ,
& delalloc_blkoff ) ;
do {
__u64 blkphy ;
unsigned int maxblocks ;
if ( delalloc_blklen & & blkoff = = delalloc_blkoff ) {
if ( size ) {
/* End of the current extent */
ret = fiemap_fill_next_extent (
fieinfo , logical , phys , size , flags ) ;
if ( ret )
break ;
}
if ( blkoff > end_blkoff )
break ;
flags = FIEMAP_EXTENT_MERGED | FIEMAP_EXTENT_DELALLOC ;
logical = blkoff < < blkbits ;
phys = 0 ;
size = delalloc_blklen < < blkbits ;
blkoff = delalloc_blkoff + delalloc_blklen ;
delalloc_blklen = nilfs_find_uncommitted_extent (
inode , blkoff , & delalloc_blkoff ) ;
continue ;
}
/*
* Limit the number of blocks that we look up so as
* not to get into the next delayed allocation extent .
*/
maxblocks = INT_MAX ;
if ( delalloc_blklen )
maxblocks = min_t ( sector_t , delalloc_blkoff - blkoff ,
maxblocks ) ;
blkphy = 0 ;
down_read ( & NILFS_MDT ( nilfs - > ns_dat ) - > mi_sem ) ;
n = nilfs_bmap_lookup_contig (
NILFS_I ( inode ) - > i_bmap , blkoff , & blkphy , maxblocks ) ;
up_read ( & NILFS_MDT ( nilfs - > ns_dat ) - > mi_sem ) ;
if ( n < 0 ) {
int past_eof ;
if ( unlikely ( n ! = - ENOENT ) )
break ; /* error */
/* HOLE */
blkoff + + ;
past_eof = ( ( blkoff < < blkbits ) > = isize ) ;
if ( size ) {
/* End of the current extent */
if ( past_eof )
flags | = FIEMAP_EXTENT_LAST ;
ret = fiemap_fill_next_extent (
fieinfo , logical , phys , size , flags ) ;
if ( ret )
break ;
size = 0 ;
}
if ( blkoff > end_blkoff | | past_eof )
break ;
} else {
if ( size ) {
if ( phys & & blkphy < < blkbits = = phys + size ) {
/* The current extent goes on */
size + = n < < blkbits ;
} else {
/* Terminate the current extent */
ret = fiemap_fill_next_extent (
fieinfo , logical , phys , size ,
flags ) ;
if ( ret | | blkoff > end_blkoff )
break ;
/* Start another extent */
flags = FIEMAP_EXTENT_MERGED ;
logical = blkoff < < blkbits ;
phys = blkphy < < blkbits ;
size = n < < blkbits ;
}
} else {
/* Start a new extent */
flags = FIEMAP_EXTENT_MERGED ;
logical = blkoff < < blkbits ;
phys = blkphy < < blkbits ;
size = n < < blkbits ;
}
blkoff + = n ;
}
cond_resched ( ) ;
} while ( true ) ;
/* If ret is 1 then we just hit the end of the extent array */
if ( ret = = 1 )
ret = 0 ;
2016-01-22 15:40:57 -05:00
inode_unlock ( inode ) ;
2010-12-26 16:38:43 +09:00
return ret ;
}