linux

iv/linux

Author	SHA1	Message	Date
Steve French	97b82c07c4	cifs: trivial style fixup missing blank line after declaration Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:37:38 -05:00
Yang Yingliang	aea02fc40a	cifs: fix wrong unlock before return from cifs_tree_connect() It should unlock 'tcon->tc_lock' before return from cifs_tree_connect(). Fixes: fe67bd563ec2 ("cifs: avoid use of global locks for high contention data") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:45 -05:00
Shyam Prasad N	d7d7a66aac	cifs: avoid use of global locks for high contention data During analysis of multichannel perf, it was seen that the global locks cifs_tcp_ses_lock and GlobalMid_Lock, which were shared between various data structures were causing a lot of contention points. With this change, we're breaking down the use of these locks by introducing new locks at more granular levels. i.e. server->srv_lock, ses->ses_lock and tcon->tc_lock to protect the unprotected fields of server, session and tcon structs; and server->mid_lock to protect mid related lists and entries at server level. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:45 -05:00
Steve French	1bfa25ee30	cifs: remove remaining build warnings Removed remaining warnings related to externs. These warnings although harmless could be distracting e.g. fs/cifs/cifsfs.c: note: in included file: fs/cifs/cifsglob.h:1968:24: warning: symbol 'sesInfoAllocCount' was not declared. Should it be static? Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Enzo Matsumiya	9543c8ab30	cifs: list_for_each() -> list_for_each_entry() Replace list_for_each() by list_for_each_entr() where appropriate. Remove no longer used list_head stack variables. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Enzo Matsumiya	da3847894f	smb2: small refactor in smb2_check_message() If the command is SMB2_IOCTL, OutputLength and OutputContext are optional and can be zero, so return early and skip calculated length check. Move the mismatched length message to the end of the check, to avoid unnecessary logs when the check was not a real miscalculation. Also change the pr_warn_once() to a pr_warn() so we're sure to get a log for the real mismatches. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Matthew Wilcox (Oracle)	c6f62f81b4	cifs: Fix memory leak when using fscache If we hit the 'index == next_cached' case, we leak a refcount on the struct page. Fix this by using readahead_folio() which takes care of the refcount for you. Fixes: 0174ee9947bd ("cifs: Implement cache I/O by accessing the cache directly") Cc: David Howells <dhowells@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	89e42f49ef	cifs: remove minor build warning The build warning: warning: symbol 'cifs_tcp_ses_lock' was not declared. Should it be static? can be distracting. Fix two of these. Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	c2c17ddbf3	cifs: remove some camelCase and also some static build warnings Remove warnings for five global variables. For example: fs/cifs/cifsglob.h:1984:24: warning: symbol 'midCount' was not declared. Should it be static? Also change them from camelCase (e.g. "midCount" to "mid_count") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Yu Zhe	0827f71b88	cifs: remove unnecessary (void) conversions. One more. remove unnecessary void type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Yu Zhe	0f46608ae7	cifs: remove unnecessary type castings remove unnecessary void* type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Colin Ian King	4da2cd0517	cifs: remove redundant initialization to variable mnt_sign_enabled Variable mnt_sign_enabled is being initialized with a value that is never read, it is being reassigned later on with a different value. The initialization is redundant and can be removed. Cleans up clang scan-build warning: fs/cifs/cifssmb.c:465:7: warning: Value stored to 'mnt_sign_enabled during its initialization is never read Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	5fa2cffba0	smb3: check xattr value length earlier Coverity complains about assigning a pointer based on value length before checking that value length goes beyond the end of the SMB. Although this is even more unlikely as value length is a single byte, and the pointer is not dereferenced until laterm, it is clearer to check the lengths first. Addresses-Coverity: 1467704 ("Speculative execution data leak") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Hyunchul Lee	824d4f64c2	ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT if Status is not 0 and PathLength is long, smb_strndup_from_utf16 could make out of bound read in smb2_tree_connnect. This bug can lead an oops looking something like: [ 1553.882047] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882064] Read of size 2 at addr ffff88802c4eda04 by task kworker/0:2/42805 ... [ 1553.882095] Call Trace: [ 1553.882098] <TASK> [ 1553.882101] dump_stack_lvl+0x49/0x5f [ 1553.882107] print_report.cold+0x5e/0x5cf [ 1553.882112] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882122] kasan_report+0xaa/0x120 [ 1553.882128] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882139] __asan_report_load_n_noabort+0xf/0x20 [ 1553.882143] smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882155] ? smb_strtoUTF16+0x3b0/0x3b0 [ksmbd] [ 1553.882166] ? __kmalloc_node+0x185/0x430 [ 1553.882171] smb2_tree_connect+0x140/0xab0 [ksmbd] [ 1553.882185] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1553.882197] process_one_work+0x778/0x11c0 [ 1553.882201] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1553.882206] worker_thread+0x544/0x1180 [ 1553.882209] ? __cpuidle_text_end+0x4/0x4 [ 1553.882214] kthread+0x282/0x320 [ 1553.882218] ? process_one_work+0x11c0/0x11c0 [ 1553.882221] ? kthread_complete_and_exit+0x30/0x30 [ 1553.882225] ret_from_fork+0x1f/0x30 [ 1553.882231] </TASK> There is no need to check error request validation in server. This check allow invalid requests not to validate message. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17818 Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Hyunchul Lee	ac60778b87	ksmbd: prevent out of bound read for SMB2_WRITE OOB read memory can be written to a file, if DataOffset is 0 and Length is too large in SMB2_WRITE request of compound request. To prevent this, when checking the length of the data area of SMB2_WRITE in smb2_get_data_area_len(), let the minimum of DataOffset be the size of SMB2 header + the size of SMB2_WRITE header. This bug can lead an oops looking something like: [ 798.008715] BUG: KASAN: slab-out-of-bounds in copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008724] Read of size 252 at addr ffff88800f863e90 by task kworker/0:2/2859 ... [ 798.008754] Call Trace: [ 798.008756] <TASK> [ 798.008759] dump_stack_lvl+0x49/0x5f [ 798.008764] print_report.cold+0x5e/0x5cf [ 798.008768] ? __filemap_get_folio+0x285/0x6d0 [ 798.008774] ? copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008777] kasan_report+0xaa/0x120 [ 798.008781] ? copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008784] kasan_check_range+0x100/0x1e0 [ 798.008788] memcpy+0x24/0x60 [ 798.008792] copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008795] ? pagecache_get_page+0x53/0x160 [ 798.008799] ? iov_iter_get_pages_alloc+0x1590/0x1590 [ 798.008803] ? ext4_write_begin+0xfc0/0xfc0 [ 798.008807] ? current_time+0x72/0x210 [ 798.008811] generic_perform_write+0x2c8/0x530 [ 798.008816] ? filemap_fdatawrite_wbc+0x180/0x180 [ 798.008820] ? down_write+0xb4/0x120 [ 798.008824] ? down_write_killable+0x130/0x130 [ 798.008829] ext4_buffered_write_iter+0x137/0x2c0 [ 798.008833] ext4_file_write_iter+0x40b/0x1490 [ 798.008837] ? __fsnotify_parent+0x275/0xb20 [ 798.008842] ? __fsnotify_update_child_dentry_flags+0x2c0/0x2c0 [ 798.008846] ? ext4_buffered_write_iter+0x2c0/0x2c0 [ 798.008851] __kernel_write+0x3a1/0xa70 [ 798.008855] ? __x64_sys_preadv2+0x160/0x160 [ 798.008860] ? security_file_permission+0x4a/0xa0 [ 798.008865] kernel_write+0xbb/0x360 [ 798.008869] ksmbd_vfs_write+0x27e/0xb90 [ksmbd] [ 798.008881] ? ksmbd_vfs_read+0x830/0x830 [ksmbd] [ 798.008892] ? _raw_read_unlock+0x2a/0x50 [ 798.008896] smb2_write+0xb45/0x14e0 [ksmbd] [ 798.008909] ? __kasan_check_write+0x14/0x20 [ 798.008912] ? _raw_spin_lock_bh+0xd0/0xe0 [ 798.008916] ? smb2_read+0x15e0/0x15e0 [ksmbd] [ 798.008927] ? memcpy+0x4e/0x60 [ 798.008931] ? _raw_spin_unlock+0x19/0x30 [ 798.008934] ? ksmbd_smb2_check_message+0x16af/0x2350 [ksmbd] [ 798.008946] ? _raw_spin_lock_bh+0xe0/0xe0 [ 798.008950] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 798.008962] process_one_work+0x778/0x11c0 [ 798.008966] ? _raw_spin_lock_irq+0x8e/0xe0 [ 798.008970] worker_thread+0x544/0x1180 [ 798.008973] ? __cpuidle_text_end+0x4/0x4 [ 798.008977] kthread+0x282/0x320 [ 798.008982] ? process_one_work+0x11c0/0x11c0 [ 798.008985] ? kthread_complete_and_exit+0x30/0x30 [ 798.008989] ret_from_fork+0x1f/0x30 [ 798.008995] </TASK> Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17817 Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	cf6531d981	ksmbd: fix use-after-free bug in smb2_tree_disconect smb2_tree_disconnect() freed the struct ksmbd_tree_connect, but it left the dangling pointer. It can be accessed again under compound requests. This bug can lead an oops looking something link: [ 1685.468014 ] BUG: KASAN: use-after-free in ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468068 ] Read of size 4 at addr ffff888102172180 by task kworker/1:2/4807 ... [ 1685.468130 ] Call Trace: [ 1685.468132 ] <TASK> [ 1685.468135 ] dump_stack_lvl+0x49/0x5f [ 1685.468141 ] print_report.cold+0x5e/0x5cf [ 1685.468145 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468157 ] kasan_report+0xaa/0x120 [ 1685.468194 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468206 ] __asan_report_load4_noabort+0x14/0x20 [ 1685.468210 ] ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468222 ] smb2_tree_disconnect+0x175/0x250 [ksmbd] [ 1685.468235 ] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1685.468247 ] process_one_work+0x778/0x11c0 [ 1685.468251 ] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1685.468289 ] worker_thread+0x544/0x1180 [ 1685.468293 ] ? __cpuidle_text_end+0x4/0x4 [ 1685.468297 ] kthread+0x282/0x320 [ 1685.468301 ] ? process_one_work+0x11c0/0x11c0 [ 1685.468305 ] ? kthread_complete_and_exit+0x30/0x30 [ 1685.468309 ] ret_from_fork+0x1f/0x30 Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17816 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	aa7253c239	ksmbd: fix memory leak in smb2_handle_negotiate The allocated memory didn't free under an error path in smb2_handle_negotiate(). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17815 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	af7c39d971	ksmbd: fix racy issue while destroying session on multichannel After multi-channel connection with windows, Several channels of session are connected. Among them, if there is a problem in one channel, Windows connects again after disconnecting the channel. In this process, the session is released and a kernel oop can occurs while processing requests to other channels. When the channel is disconnected, if other channels still exist in the session after deleting the channel from the channel list in the session, the session should not be released. Finally, the session will be released after all channels are disconnected. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	a14c573870	ksmbd: use wait_event instead of schedule_timeout() ksmbd threads eating masses of cputime when connection is disconnected. If connection is disconnected, ksmbd thread waits for pending requests to be processed using schedule_timeout. schedule_timeout() incorrectly is used, and it is more efficient to use wait_event/wake_up than to check r_count every time with timeout. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Takashi Iwai	512b74d17a	exfat: Drop superfluous new line for error messages exfat_err() adds the new line at the end of the message by itself, hence the passed string shouldn't contain a new line. Drop the superfluous newline letters in the error messages in a few places that have been put mistakenly. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	64fca6e621	exfat: Downgrade ENAMETOOLONG error message to debug messages The ENAMETOOLONG error message is printed at each time when user tries to operate with a too long name, and this can flood the kernel logs easily, as every user can trigger this. Let's downgrade this error message level to a debug message for suppressing the superfluous logs. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	6425baabda	exfat: Expand exfat_err() and co directly to pr_() macro Currently the error and info messages handled by exfat_err() and co are tossed to exfat_msg() function that does nothing but passes the strings with printk() invocation. Not only that this is more overhead by the indirect calls, but also this makes harder to extend for the debug print usage; because of the direct printk() call, you cannot make it for dynamic debug or without debug like the standard helpers such as pr_debug() or dev_dbg(). For addressing the problem, this patch replaces exfat_() macro to expand to pr_*() directly. Along with it, add the new exfat_debug() macro that is expanded to pr_debug() (which output can be gracefully suppressed via dyndbg). Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	1b1a9195ae	exfat: Define NLS_NAME_* as bit flags explicitly NLS_NAME_* are bit flags although they are currently defined as enum; it's casually working so far (from 0 to 2), but it's error-prone and may bring a problem when we want to add more flag. This patch changes the definitions of NLS_NAME_* explicitly being bit flags. Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Takashi Iwai	86da53e8ff	exfat: Return ENAMETOOLONG consistently for oversized paths LTP has a test for oversized file path renames and it expects the return value to be ENAMETOOLONG. However, exfat returns EINVAL unexpectedly in some cases, hence LTP test fails. The further investigation indicated that the problem happens only when iocharset isn't set to utf8. The difference comes from that, in the case of utf8, exfat_utf8_to_utf16() returns the error -ENAMETOOLONG directly and it's treated as the final error code. Meanwhile, on other iocharsets, exfat_nls_to_ucs2() returns the max path size but it sets NLS_NAME_OVERLEN to lossy flag instead; the caller side checks only whether lossy flag is set or not, resulting in always -EINVAL unconditionally. This patch aligns the return code for both cases by checking the lossy flag bit and returning ENAMETOOLONG when NLS_NAME_OVERLEN bit is set. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	be17b1ccd4	exfat: remove duplicate write inode for extending dir/file Since the timestamps need to be updated, the directory entries will be updated by mark_inode_dirty() whether or not a new cluster is allocated for the file or directory, so there is no need to use __exfat_write_inode() to update the directory entries when allocating a new cluster for a file or directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	4493895b2b	exfat: remove duplicate write inode for truncating file This commit moves updating file attributes and timestamps before calling __exfat_write_inode(), so that all updates of the inode had been written by __exfat_write_inode(), mark_inode_dirty() is unneeded. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	23e6e1c9b3	exfat: reuse __exfat_write_inode() to update directory entry __exfat_write_inode() is used to update file and stream directory entries, except for file->start_clu and stream->flags. This commit moves update file->start_clu and stream->flags to __exfat_write_inode() and reuse __exfat_write_inode() to update directory entries. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:05 +09:00
Xie Shaowen	5e9466a5d0	xfs: delete extra space and tab in blank line delete extra space and tab in blank line, there is no functional change. Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-07-31 09:21:27 -07:00
ChenXiaoSong	001c179c4e	xfs: fix NULL pointer dereference in xfs_getbmap() Reproducer: 1. fallocate -l 100M image 2. mkfs.xfs -f image 3. mount image /mnt 4. setxattr("/mnt", "trusted.overlay.upper", NULL, 0, XATTR_CREATE) 5. char arg[32] = "\x01\xff\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00" "\x00\x00\x00\x00\x00\x08\x00\x00\x00\xc6\x2a\xf7"; fd = open("/mnt", O_RDONLY\|O_DIRECTORY); ioctl(fd, _IOC(_IOC_READ\|_IOC_WRITE, 0x58, 0x2c, 0x20), arg); NULL pointer dereference will occur when race happens between xfs_getbmap() and xfs_bmap_set_attrforkoff(): ioctl \| setxattr ----------------------------\|--------------------------- xfs_getbmap \| xfs_ifork_ptr \| xfs_inode_has_attr_fork \| ip->i_forkoff == 0 \| return NULL \| ifp == NULL \| \| xfs_bmap_set_attrforkoff \| ip->i_forkoff > 0 xfs_inode_has_attr_fork \| ip->i_forkoff > 0 \| ifp == NULL \| ifp->if_format \| Fix this by locking i_lock before xfs_ifork_ptr(). Fixes: abbf9e8a4507 ("xfs: rewrite getbmap using the xfs_iext_* helpers") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: added fixes tag] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-07-31 09:21:27 -07:00
Hongnan Li	ecce9212d0	erofs: update ctx->pos for every emitted dirent erofs_readdir update ctx->pos after filling a batch of dentries and it may cause dir/files duplication for NFS readdirplus which depends on ctx->pos to fill dir correctly. So update ctx->pos for every emitted dirent in erofs_fill_dentries to fix it. Also fix the update of ctx->pos when the initial file position has exceeded nameoff. Fixes: 3e917cc305c6 ("erofs: make filesystem exportable") Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-07-31 22:26:29 +08:00
Chao Yu	09beadf289	f2fs: fix to do sanity check on segment type in build_sit_entries() As Wenqing Liu <wenqingliu0120@gmail.com> reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216285 RIP: 0010:memcpy_erms+0x6/0x10 f2fs_update_meta_page+0x84/0x570 [f2fs] change_curseg.constprop.0+0x159/0xbd0 [f2fs] f2fs_do_replace_block+0x5c7/0x18a0 [f2fs] f2fs_replace_block+0xeb/0x180 [f2fs] recover_data+0x1abd/0x6f50 [f2fs] f2fs_recover_fsync_data+0x12ce/0x3250 [f2fs] f2fs_fill_super+0x4459/0x6190 [f2fs] mount_bdev+0x2cf/0x3b0 legacy_get_tree+0xed/0x1d0 vfs_get_tree+0x81/0x2b0 path_mount+0x47e/0x19d0 do_mount+0xce/0xf0 __x64_sys_mount+0x12c/0x1a0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is segment type is invalid, so in f2fs_do_replace_block(), f2fs accesses f2fs_sm_info::curseg_array with out-of-range segment type, result in accessing invalid curseg->sum_blk during memcpy in f2fs_update_meta_page(). Fix this by adding sanity check on segment type in build_sit_entries(). Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:19:00 -07:00
Chao Yu	7b01ad7f33	f2fs: obsolete unused MAX_DISCARD_BLOCKS After commit a7eeb823854c ("f2fs: use bitmap in discard_entry"), MAX_DISCARD_BLOCKS became obsolete, remove it. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:18:09 -07:00
Chao Yu	141170b759	f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() As Dipanjan Das <mail.dipanjan.das@gmail.com> reported, syzkaller found a f2fs bug as below: RIP: 0010:f2fs_new_node_page+0x19ac/0x1fc0 fs/f2fs/node.c:1295 Call Trace: write_all_xattrs fs/f2fs/xattr.c:487 [inline] __f2fs_setxattr+0xe76/0x2e10 fs/f2fs/xattr.c:743 f2fs_setxattr+0x233/0xab0 fs/f2fs/xattr.c:790 f2fs_xattr_generic_set+0x133/0x170 fs/f2fs/xattr.c:86 __vfs_setxattr+0x115/0x180 fs/xattr.c:182 __vfs_setxattr_noperm+0x125/0x5f0 fs/xattr.c:216 __vfs_setxattr_locked+0x1cf/0x260 fs/xattr.c:277 vfs_setxattr+0x13f/0x330 fs/xattr.c:303 setxattr+0x146/0x160 fs/xattr.c:611 path_setxattr+0x1a7/0x1d0 fs/xattr.c:630 __do_sys_lsetxattr fs/xattr.c:653 [inline] __se_sys_lsetxattr fs/xattr.c:649 [inline] __x64_sys_lsetxattr+0xbd/0x150 fs/xattr.c:649 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 NAT entry and nat bitmap can be inconsistent, e.g. one nid is free in nat bitmap, and blkaddr in its NAT entry is not NULL_ADDR, it may trigger BUG_ON() in f2fs_new_node_page(), fix it. Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:57 -07:00
Chao Liu	8ee236dcaa	f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time If the inode has the compress flag, it will fail to use 'chattr -c +m' to remove its compress flag and tag no compress flag. However, the same command will be successful when executed again, as shown below: $ touch foo.txt $ chattr +c foo.txt $ chattr -c +m foo.txt chattr: Invalid argument while setting flags on foo.txt $ chattr -c +m foo.txt $ f2fs_io getflags foo.txt get a flag on foo.txt ret=0, flags=nocompression,inline_data Fix this by removing some checks in f2fs_setflags_common() that do not affect the original logic. I go through all the possible scenarios, and the results are as follows. Bold is the only thing that has changed. +---------------+-----------+-----------+----------+ \| \| file flags \| + command +-----------+-----------+----------+ \| \| no flag \| compr \| nocompr \| +---------------+-----------+-----------+----------+ \| chattr +c \| compr \| compr \| -EINVAL \| \| chattr -c \| no flag \| no flag \| nocompr \| \| chattr +m \| nocompr \| -EINVAL \| nocompr \| \| chattr -m \| no flag \| compr \| no flag \| \| chattr +c +m \| -EINVAL \| -EINVAL \| -EINVAL \| \| chattr +c -m \| compr \| compr \| compr \| \| chattr -c +m \| nocompr \| nocompr \| nocompr \| \| chattr -c -m \| no flag \| no flag \| no flag \| +---------------+-----------+-----------+----------+ Link: https://lore.kernel.org/linux-f2fs-devel/20220621064833.1079383-1-chaoliu719@gmail.com/ Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chao Liu <liuchao@coolpad.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00
Daeho Jeong	f8e2f32bcd	f2fs: introduce sysfs atomic write statistics introduce the below 4 new sysfs node for atomic write statistics. - current_atomic_write: the total current atomic write block count, which is not committed yet. - peak_atomic_write: the peak value of total current atomic write block count after boot. - committed_atomic_block: the accumulated total committed atomic write block count after boot. - revoked_atomic_block: the accumulated total revoked atomic write block count after boot. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00
qixiaoyu1	1adaa71ea9	f2fs: don't bother wait_ms by foreground gc f2fs_gc returns -EINVAL via f2fs_balance_fs when there is enough free secs after write checkpoint, but with gc_merge enabled, it will cause the sleep time of gc thread to be set to no_gc_sleep_time even if there are many dirty segments can be selected. Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00
Chao Yu	0d5b9d8156	f2fs: invalidate meta pages only for post_read required inode After commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write"), invalidate_mapping_pages() will be called to avoid race condition in between IPU/DIO and readahead for GC. However, readahead flow is only used for post_read required inode, so this patch adds check condition to avoids unnecessary page cache invalidating for non-post_read inode. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:06 -07:00
Chao Liu	a8634ccf5d	f2fs: allow compression of files without blocks Files created by truncate(1) have a size but no blocks, so they can be allowed to enable compression. Signed-off-by: Chao Liu <liuchao@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:06 -07:00
Chao Yu	7165841d57	f2fs: fix to check inline_data during compressed inode conversion When converting inode to compressed one via ioctl, it needs to check inline_data, since inline_data flag and compressed flag are incompatible. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:06 -07:00
Fabio M. De Francesco	1dd55358ef	f2fs: Delete f2fs_copy_page() and replace with memcpy_page() f2fs_copy_page() is a wrapper around two kmap() + one memcpy() from/to the mapped pages. It unnecessarily duplicates a kernel API and it makes use of kmap(), which is being deprecated in favor of kmap_local_page(). Two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Therefore, its use in __clone_blkaddrs() is safe and should be preferred. Delete f2fs_copy_page() and use a plain memcpy_page() in the only one site calling the removed function. memcpy_page() avoids open coding two kmap_local_page() + one memcpy() between the two kernel virtual addresses. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:57 -07:00
Chao Yu	67ca06872e	f2fs: fix to invalidate META_MAPPING before DIO write Quoted from commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") " Encrypted pages during GC are read and cached in META_MAPPING. However, due to cached pages in META_MAPPING, there is an issue where newly written pages are lost by IPU or DIO writes. Thread A - f2fs_gc() Thread B /* phase 3 / down_write(i_gc_rwsem) ra_data_block() ---- (a) up_write(i_gc_rwsem) f2fs_direct_IO() : - down_read(i_gc_rwsem) - __blockdev_direct_io() - get_data_block_dio_write() - f2fs_dio_submit_bio() ---- (b) - up_read(i_gc_rwsem) / phase 4 */ down_write(i_gc_rwsem) move_data_block() ---- (c) up_write(i_gc_rwsem) (a) In phase 3 of f2fs_gc(), up-to-date page is read from storage and cached in META_MAPPING. (b) In thread B, writing new data by IPU or DIO write on same blkaddr as read in (a). cached page in META_MAPPING become out-dated. (c) In phase 4 of f2fs_gc(), out-dated page in META_MAPPING is copied to new blkaddr. In conclusion, the newly written data in (b) is lost. To address this issue, invalidating pages in META_MAPPING before IPU or DIO write. " In previous commit, we missed to cover extent cache hit case, and passed wrong value for parameter @end of invalidate_mapping_pages(), fix both issues. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") Cc: Hyeong-Jun Kim <hj514.kim@samsung.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:20 -07:00
Jaegeuk Kim	8e0f54a70e	f2fs: add a sysfs entry to show zone capacity This patch adds a sysfs entry showing the unusable space in a section made by zone capacity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:20 -07:00
Jaegeuk Kim	074b5ea290	f2fs: adjust zone capacity when considering valid block count This patch fixes counting unusable blocks set by zone capacity when checking the valid block count in a section. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:20 -07:00
Jaegeuk Kim	b771aadc6e	f2fs: enforce single zone capacity In order to simplify the complicated per-zone capacity, let's support only one capacity for entire zoned device. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:20 -07:00
duguowei	14de5fc3dd	f2fs: remove redundant code for gc condition Remove the redundant code and use local variant as the argument directly. Make it more human-readable. Signed-off-by: duguowei <duguowei@xiaomi.com> [Jaegeuk Kim: make code neat] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:20 -07:00
Daeho Jeong	7a8fc58618	f2fs: introduce memory mode Introduce memory mode to supports "normal" and "low" memory modes. "low" mode is to support low memory devices. Because of the nature of low memory devices, in this mode, f2fs will try to save memory sometimes by sacrificing performance. "normal" mode is the default mode and same as before. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:16:12 -07:00
Sebastian Andrzej Siewior	50417d22d0	fs/dcache: Move wakeup out of i_seq_dir write held region. __d_add() and __d_move() wake up waiters on dentry::d_wait from within the i_seq_dir write held region. This violates the PREEMPT_RT constraints as the wake up acquires wait_queue_head::lock which is a "sleeping" spinlock on RT. There is no requirement to do so. __d_lookup_unhash() has cleared DCACHE_PAR_LOOKUP and dentry::d_wait and returned the now unreachable wait queue head pointer to the caller, so the actual wake up can be postponed until the i_dir_seq write side critical section is left. The only requirement is that dentry::lock is held across the whole sequence including the wake up. The previous commit includes an analysis why this is considered safe. Move the wake up past end_dir_add() which leaves the i_dir_seq write side critical section and enables preemption. For non RT kernels there is no difference because preemption is still disabled due to dentry::lock being held, but it shortens the time between wake up and unlocking dentry::lock, which reduces the contention for the woken up waiter. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-30 00:38:16 -04:00
Sebastian Andrzej Siewior	45f78b0a27	fs/dcache: Move the wakeup from __d_lookup_done() to the caller. __d_lookup_done() wakes waiters on dentry->d_wait. On PREEMPT_RT we are not allowed to do that with preemption disabled, since the wakeup acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT. Calling it under dentry->d_lock is not a problem, since that is also a "sleeping" spinlock on the same configs. Unfortunately, two of its callers (__d_add() and __d_move()) are holding more than just ->d_lock and that needs to be dealt with. The key observation is that wakeup can be moved to any point before dropping ->d_lock. As a first step to solve this, move the wake up outside of the hlist_bl_lock() held section. This is safe because: Waiters get inserted into ->d_wait only after they'd taken ->d_lock and observed DCACHE_PAR_LOOKUP in flags. As long as they are woken up (and evicted from the queue) between the moment __d_lookup_done() has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe, since the waitqueue ->d_wait points to won't get destroyed without having __d_lookup_done(dentry) called (under ->d_lock). ->d_wait is set only by d_alloc_parallel() and only in case when it returns a freshly allocated in-lookup dentry. Whenever that happens, we are guaranteed that __d_lookup_done() will be called for resulting dentry (under ->d_lock) before the wq in question gets destroyed. With two exceptions wq lives in call frame of the caller of d_alloc_parallel() and we have an explicit d_lookup_done() on the resulting in-lookup dentry before we leave that frame. One of those exceptions is nfs_call_unlink(), where wq is embedded into (dynamically allocated) struct nfs_unlinkdata. It is destroyed in nfs_async_unlink_release() after an explicit d_lookup_done() on the dentry wq went into. Remaining exception is d_add_ci(). There wq is what we'd found in ->d_wait of d_add_ci() argument. Callers of d_add_ci() are two instances of ->d_lookup() and they must have been given an in-lookup dentry. Which means that they'd been called by __lookup_slow() or lookup_open(), with wq in the call frame of one of those. Result of d_alloc_parallel() in d_add_ci() is fed to d_splice_alias(), which either returns non-NULL (and d_add_ci() does d_lookup_done()) or feeds dentry to __d_add() that will do __d_lookup_done() under ->d_lock. That concludes the analysis. Let __d_lookup_unhash(): 1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP 2) Unhash the dentry 3) Retrieve and clear dentry::d_wait 4) Unlock the hash and return the retrieved waitqueue head pointer 5) Let the caller handle the wake up. 6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce build failures for OOT code that used __d_lookup_done() and is not aware of the new return value. This does not yet solve the PREEMPT_RT problem completely because preemption is still disabled due to i_dir_seq being held for write. This will be addressed in subsequent steps. An alternative solution would be to switch the waitqueue to a simple waitqueue, but aside of Linus not being a fan of them, moving the wake up closer to the place where dentry::lock is unlocked reduces lock contention time for the woken up waiter. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-30 00:36:10 -04:00
Sebastian Andrzej Siewior	cf634d540a	fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT i_dir_seq is a sequence counter with a lock which is represented by the lowest bit. The writer atomically updates the counter which ensures that it can be modified by only one writer at a time. This requires preemption to be disabled across the write side critical section. On !PREEMPT_RT kernels this is implicit by the caller acquiring dentry::lock. On PREEMPT_RT kernels spin_lock() does not disable preemption which means that a preempting writer or reader would live lock. It's therefore required to disable preemption explicitly. An alternative solution would be to replace i_dir_seq with a seqlock_t for PREEMPT_RT, but that comes with its own set of problems due to arbitrary lock nesting. A pure sequence count with an associated spinlock is not possible because the locks held by the caller are not necessarily related. As the critical section is small, disabling preemption is a sensible solution. Reported-by: Oleg.Karfich@wago.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20220613140712.77932-2-bigeasy@linutronix.de Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-30 00:35:51 -04:00
Al Viro	40a3cb0d23	d_add_ci(): make sure we don't miss d_lookup_done() All callers of d_alloc_parallel() must make sure that resulting in-lookup dentry (if any) will encounter __d_lookup_done() before the final dput(). d_add_ci() might end up creating in-lookup dentries; they are fed to d_splice_alias(), which will normally make sure they meet __d_lookup_done(). However, it is possible to end up with d_splice_alias() failing with ERR_PTR(-ELOOP) without having done so. It takes a corrupted ntfs or case-insensitive xfs image, but neither should end up with memory corruption... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-07-30 00:29:05 -04:00

... 3 4 5 6 7 ...

77804 Commits