linux

iv/linux

Author	SHA1	Message	Date
Wei Yongjun	66348b723b	f2fs: fix error return code in f2fs_fill_super() Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Introduce by commit c0d39e(f2fs: fix return values from validate superblock) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-22 08:56:03 +09:00
Al Viro	0ecc833bac	mode_t, whack-a-mole at 11... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:13:05 -04:00
Al Viro	bdaec334bb	f2fs: use mnt_want_write_file() in ioctl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:12:56 -04:00
Namjae Jeon	6224da875e	f2fs: fix typo mistakes Fix typo mistakes. 1. I think that it should be 'L' instead of 'V'. 2. and try to fix 'Front' instead of 'Frone' Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 19:01:03 +09:00
Jaegeuk Kim	d64f80473b	f2fs: write checkpoint before starting FG_GC In order to be aware of prefree and free sections during FG_GC, let's start with write_checkpoint(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 18:21:24 +09:00
Zhihui Zhang	3315101f70	f2fs: fix the logic of IS_DNODE() If (ofs % (NIDS_PER_BLOCK + 1) == 0), the node is an indirect node block. Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 18:21:24 +09:00
Jaegeuk Kim	399368372e	f2fs: introduce a new global lock scheme In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations / DENTRY_OPS, / for directory operations / DATA_WRITE, / for data write / DATA_NEW, / for data allocation / DATA_TRUNC, / for data truncate / NODE_NEW, / for node allocation / NODE_TRUNC, / for node truncate / NODE_WRITE, / for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 18:21:18 +09:00
Jason Hrycay	1127a3d448	f2fs: move f2fs_balance_fs from truncate to punch_hole Move the f2fs_balance_fs out of the truncate_hole function and only perform that in punch_hole use case. The commit: ed60b1644e7f7e5dd67d21caf7e4425dff05dad0 intended to do this but moved it into truncate_hole to cover more cases. However, a deadlock scenario is possible when deleting an inode entry under specific conditions: f2fs_delete_entry() mutex_lock_op(sbi, DENTRY_OPS); truncate_hole() f2fs_balance_fs() mutex_lock(&sbi->gc_mutex); f2fs_gc() write_checkpoint() block_operations() mutex_lock_op(sbi, DENTRY_OPS); Lets move it into the punch_hole case to cover the original intent of avoiding it during fallocate's expand_inode_data case. Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 17:22:45 +09:00
Jaegeuk Kim	49952fa182	f2fs: reduce redundant spin_lock operations This patch reduces redundant spin_lock operations in alloc_nid_failed(). The alloc_nid_failed() does not need to delete entry and add one again by triggering spin_lock and spin_unlock redundantly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 22:19:03 +09:00
P J P	cfb185a148	f2fs: add NULL pointer check Commit - fa9150a84c - replaces a call to generic_writepages() in f2fs_write_data_pages() with write_cache_pages(), with a function pointer argument pointing to routine: __f2fs_writepage. -> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22 This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid a possible NULL pointer dereference, in case if - mapping->a_ops->writepage - is NULL. Signed-off-by: P J P <ppandit@redhat.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:52 +09:00
Jaegeuk Kim	b2f2c390c5	f2fs: fix the bitmap consistency of dirty segments Like below, there are 8 segment bitmaps for SSR victim candidates. enum dirty_type { DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs / DIRTY_WARM_DATA, / dirty segments assigned as warm data logs / DIRTY_COLD_DATA, / dirty segments assigned as cold data logs / DIRTY_HOT_NODE, / dirty segments assigned as hot node logs / DIRTY_WARM_NODE, / dirty segments assigned as warm node logs / DIRTY_COLD_NODE, / dirty segments assigned as cold node logs / DIRTY, / to count # of dirty segments / PRE, / to count # of entirely obsolete segments */ NR_DIRTY_TYPE }; The upper 6 bitmaps indicates segments dirtied by active log areas respectively. And, the DIRTY bitmap integrates all the 6 bitmaps. For example, o DIRTY_HOT_DATA : 1010000 o DIRTY_WARM_DATA: 0100000 o DIRTY_COLD_DATA: 0001000 o DIRTY_HOT_NODE : 0000010 o DIRTY_WARM_NODE: 0000001 o DIRTY_COLD_NODE: 0000000 In this case, o DIRTY : 1111011, which means that we should guarantee the consistency between DIRTY and other bitmaps concreately. However, the SSR mode selects victims freely from any log types, which can set multiple bits across the various bitmap types. So, this patch eliminates this inconsistency. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:51 +09:00
Jaegeuk Kim	b74737541c	f2fs: avoid race for summary information In order to do GC more reliably, I'd like to lock the vicitm summary page until its GC is completed, and also prevent any checkpoint process. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:51 +09:00
Jaegeuk Kim	60374688a1	f2fs: allocate remained free segments in the LFS mode This patch adds a new condition that allocates free segments in the current active section even if SSR is needed. Otherwise, f2fs cannot allocate remained free segments in the section since SSR finds dirty segments only. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:50 +09:00
Jaegeuk Kim	4ebefc4443	f2fs: check completion of foreground GC The foreground GCs are triggered under not enough free sections. So, we should not skip moving valid blocks in the victim segments. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:50 +09:00
Jaegeuk Kim	5ec4e49f9b	f2fs: change GC bitmaps to apply the section granularity This patch removes a bitmap for victim segments selected by foreground GC, and modifies the other bitmap for victim segments selected by background GC. 1) foreground GC bitmap : We don't need to manage this, since we just only one previous victim section number instead of the whole victim history. The f2fs uses the victim section number in order not to allocate currently GC'ed section to current active logs. 2) background GC bitmap : This bitmap is used to avoid selecting victims repeatedly by background GCs. In addition, the victims are able to be selected by foreground GCs, since there is no need to read victim blocks during foreground GCs. By the fact that the foreground GC reclaims segments in a section unit, it'd be better to manage this bitmap based on the section granularity. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:49 +09:00
Jaegeuk Kim	33afa7fde0	f2fs: allocate new segment aligned with sections When allocating a new segment under the LFS mode, we should keep the section boundary. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:49 +09:00
Jaegeuk Kim	56ae674cc2	f2fs: remove redundant lock_page calls In get_node_page, we do not need to call lock_page all the time. If the node page is cached as uptodate, 1. grab_cache_page locks the page, 2. read_node_page unlocks the page, and 3. lock_page is called for further process. Let's avoid this. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 17:27:42 +09:00
Jaegeuk Kim	53cf95222f	f2fs: introduce TOTAL_SECS macro Let's use a macro to get the total number of sections. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 16:23:10 +09:00
Jaegeuk Kim	5c773ba33a	f2fs: do not use duplicate names in a macro A macro should not use duplicate parameter names. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-03 16:22:44 +09:00
Alexandru Gheorghiu	79b5793be4	f2fs: use kmemdup Use kmemdup instead of kzalloc and memcpy. Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-31 09:12:18 +09:00
Jaegeuk Kim	953a3e27e1	f2fs: fix to give correct parent inode number for roll forward When we recover fsync'ed data after power-off-recovery, we should guarantee that any parent inode number should be correct for each direct inode blocks. So, let's make the following rules. - The fsync should do checkpoint to all the inodes that were experienced hard links. - So, the only normal files can be recovered by roll-forward. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:25 +09:00
Jaegeuk Kim	fa37241743	f2fs: remain nat cache entries for further free nid allocation In the checkpoint flow, the f2fs investigates the total nat cache entries. Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the obsolete nid to the free nid list. However, this free nid will be reused sooner, resulting in its nat entry miss. In order to avoid this, we don't need to drop the nat cache entry at this moment. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:18 +09:00
Jaegeuk Kim	0ff153a2f1	f2fs: do not skip writing file meta during fsync This patch removes data_version check flow during the fsync call. The original purpose for the use of data_version was to avoid writng inode pages redundantly by the fsync calls repeatedly. However, when user can modify file meta and then call fsync, we should not skip fsync procedure. So, let's remove this condition check and hope that user triggers in right manner. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:16 +09:00
Jaegeuk Kim	6ead114232	f2fs: fix the recovery flow to handle errors correctly We should handle errors during the recovery flow correctly. For example, if we get -ENOMEM, we should report a mount failure instead of conducting the remained mount procedure. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:06 +09:00
Masanari Iida	111d2495a8	f2fs: fix typo in comments Correct spelling typo in comments Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:16 +09:00
Namjae Jeon	064e082328	f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inode In function check_nid_range, there is no need to trigger BUG_ON and make kernel stop. Instead it could just check and indicate the inode number to be EINVAL. Update the return path in do_read_inode to use the return from check_nid_range. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> [Jaegeuk: replace BUG_ON with WARN_ON] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:16 +09:00
Namjae Jeon	c0d39e65ba	f2fs: fix return values from validate superblock validate super block is not returning with proper values. When failure from sb_bread it should reflect there is an EIO otherwise it should return of EINVAL. Returning, '1' is not conveying proper message as the return type. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:15 +09:00
Namjae Jeon	7c909772f1	f2fs: reorganize f2fs_setxattr make use of F2FS_NAME_LEN for name length checking, change return conditions at few places, by assigning storing the errorvalue in 'error' and making a common exit path. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:15 +09:00
Namjae Jeon	d3ee456dfb	f2fs: notify when discard is not supported Change f2fs so that a warning is emitted when an attempt is made to mount a filesystem with the unsupported discard option. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:14 +09:00
Jaegeuk Kim	ae51fb31b8	f2fs: fix to call WRITE_FLUSH at the end of fsync The fsync call should be ended after flushing the in-device caches. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:14 +09:00
Jaegeuk Kim	04431c44e5	f2fs: fix not to allocate max_nid The build_free_nid should not add free nids over nm_i->max_nid. But, there was a hole that invalid free nid was added by the following scenario. Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids. build_free_nids - get_current_nat_page loads the last NAT page - scan_nat_page can add 100 ~ 200 nids -> Bug here! So, when scanning an NAT page, we should check each candidate whether it is over max_nid or not. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:13 +09:00
Jaegeuk Kim	c3850aa1cb	f2fs: fix return value of releasepage for node and data If the return value of releasepage is equal to zero, the page cannot be reclaimed. Instead, we should return 1 in order to reclaim clean pages. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:13 +09:00
Jaegeuk Kim	48cb76c7be	f2fs: scan next nat page to reuse free nids in there When we build new free nids, let's scan the just next NAT page instead of skipping a couple of previously scanned pages in order to reuse free nids in there. Otherwise, we can use too much wide range of nids even though several nids were deallocated, and also their node pages can be cached in the node_inode's address space. This means that we can retain lots of clean pages in the main memory, which induces mm's reclaiming overhead. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:12 +09:00
Jaegeuk Kim	08d8058be6	f2fs: should check the node page was truncated first Currently, f2fs doesn't reclaim any node pages. However, if we found that a node page was truncated by checking its block address with zero during f2fs_write_node_page, we should not skip that node page and return zero to reclaim it. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:12 +09:00
Jaegeuk Kim	393ff91f57	f2fs: reduce unncessary locking pages during read This patch reduces redundant locking and unlocking pages during read operations. In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page. And then, when we need to modify any data finally, let's lock the page so that we can avoid lock contention. [readpage rule] - The f2fs_readpage returns unlocked page, or released page too in error cases. - Its caller should handle read error, -EIO, after locking the page, which indicates read completion. - Its caller should check PageUptodate after grab_cache_page. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:06 +09:00
Masanari Iida	434720fa98	f2fs: Fix typo in comments Correct spelling typo in comments Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-03-19 09:44:55 +01:00
Namjae Jeon	25c0a6e529	f2fs: avoid extra ++ while returning from get_node_path In all the breaking conditions in get_node_path, 'n' is used to track index in offset[] array, but while breaking out also, in all paths n++ is done. So, remove the ++ from breaking paths. Also, avoid reset of 'level=0' in first case. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:36 +09:00
Jaegeuk Kim	5a20d339c7	f2fs: align f2fs maximum name length to linux based filesystem The maximum filename length supported in linux is 255 characters. So let's follow that. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:35 +09:00
Namjae Jeon	3aa770a9c9	f2fs: optimize and change return path in lookup_free_nid_list Optimize and change return path in lookup_free_nid_list Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:35 +09:00
Namjae Jeon	e0f56cb44b	f2fs: optimize get node page readahead part We can remove the call to find_get_page to get a page from the cache and check for up-to-date, instead we can make use of grab_cache_page part itself to fetch the page from the cache. So, removing the call and moving the PageUptodate at proper place, also taken care of moving the lock_page condition in the page_hit part. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:34 +09:00
Changman Lee	52c2db3f95	f2fs: check the level before calling get_nid function The caller of get_nid should be careful not to put lower value than NODE_DIR1_BLOCK in case of level is zero. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:34 +09:00
Jaegeuk Kim	266e97a81c	f2fs: introduce readahead mode of node pages Previously, f2fs reads several node pages ahead when get_dnode_of_data is called with RDONLY_NODE flag. And, this flag is set by the following functions. - get_data_block_ro - get_lock_data_page - do_write_data_page - truncate_blocks - truncate_hole However, this readahead mechanism is initially introduced for the use of get_data_block_ro to enhance the sequential read performance. So, let's clarify all the cases with the additional modes as follows. enum { ALLOC_NODE, /* allocate a new node page if needed / LOOKUP_NODE, / look up a node without readahead / LOOKUP_NODE_RA, / * look up a node with readahead called * by get_datablock_ro. */ } Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>	2013-03-18 21:00:33 +09:00
Jaegeuk Kim	66d36a2944	f2fs: read with READ_SYNC when getting dnode page The get_node_page_ra tries to: 1. grab or read a target node page for the given nid, 2. then, call ra_node_page to read other adjacent node pages in advance. So, when we try to read a target node page by #1, we should submit bio with READ_SYNC instead of READA. And, in #2, READA should be used. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>	2013-03-18 21:00:33 +09:00
Jaegeuk Kim	12faafe454	f2fs: fix to unlock node page when it was truncated If the node page was truncated, its block address became zero. This means that we don't need to write the node page, but have to unlock NODE_WRITE, decrease the number of dirty node pages, and then unlock_page before returning the f2fs_write_node_page with zero. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-18 21:00:09 +09:00
Changman Lee	12fc760fd6	f2fs: fix overflow when calculating utilization on 32-bit Use div_u64 to fix overflow when calculating utilization. long int is 4-bytes on 32-bit so (user blocks * 100) might be overflow if disk size is over e.g. 512GB. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-08 10:02:56 +09:00
Eric W. Biederman	7f78e03513	fs: Limit sys_mount to only request filesystem modules. Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-03 19:36:31 -08:00
Al Viro	6131ffaa1f	more file_inode() open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-27 16:59:05 -05:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Jaegeuk Kim	7dd690c820	f2fs: avoid build warning This patch removes the following build warning: fs/f2fs/node.c: warning: 'nofs' may be used uninitialized in this function [-Wuninitialized]: => 738:8 Note that this is a false alarm. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-02-12 07:28:55 +09:00

... 38 39 40 41 42 ...

2101 Commits