License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner, Kate Stewart, and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If the file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
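The scanner-comparison heuristics above can be sketched roughly as follows. This is only an illustrative model written for this note, not the spreadsheet process or any tool used by the authors; the names `conclude_license`, `default_identifier`, and the `conclusion` enum are hypothetical, and a `NULL` string stands for "scanner found no license traces".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of comparing two scanner results for one file. */
enum conclusion {
	CONCLUDE_DEFAULT,	/* neither scanner found a license: COPYING applies */
	CONCLUDE_AGREED,	/* both scanners reported the same license */
	CONCLUDE_MANUAL,	/* scanners disagree: manual inspection needed */
};

/*
 * scan_a / scan_b: license string detected by each scanner, or NULL when
 * that scanner found no license traces in the file.
 */
static enum conclusion conclude_license(const char *scan_a, const char *scan_b)
{
	if (scan_a == NULL && scan_b == NULL)
		return CONCLUDE_DEFAULT;
	if (scan_a != NULL && scan_b != NULL && strcmp(scan_a, scan_b) == 0)
		return CONCLUDE_AGREED;
	return CONCLUDE_MANUAL;
}

/* Identifier applied to a file that carries no license information. */
static const char *default_identifier(const char *path)
{
	assert(path != NULL);
	return strstr(path, "/uapi/") != NULL
		? "GPL-2.0 WITH Linux-syscall-note"
		: "GPL-2.0";
}
```

The sketch deliberately stops at "manual inspection": the later steps (lawyer confirmation, flagging for further research) are human decisions and are not modelled.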
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
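As a rough illustration of why header and source files need different handling, the sketch below formats the SPDX line for a given path. It is not the script Thomas and Greg wrote (which is not shown here); `spdx_line` and `has_suffix` are hypothetical helpers. It assumes the kernel convention of a `//` comment in .c files (as seen at the top of this file) and a `/* */` comment in headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return non-zero when string s ends with suffix. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Format the SPDX line for `path` into `buf`. Returns what snprintf()
 * returns, or -1 for a file type this sketch does not handle.
 */
static int spdx_line(char *buf, size_t len, const char *path, const char *id)
{
	assert(buf != NULL && path != NULL && id != NULL);
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return -1;
}
```

Other file types (Makefiles, assembly, scripts) use other comment syntaxes; they are left out here rather than guessed at.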
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 17:07:57 +03:00
// SPDX-License-Identifier: GPL-2.0
2010-10-28 05:30:10 +04:00
/*
 * linux/fs/ext4/page-io.c
 *
 * This contains the new page_io functions for ext4
 *
 * Written by Theodore Ts'o, 2010.
 */
#include <linux/fs.h>
#include <linux/time.h>
#include <linux/highuid.h>
#include <linux/pagemap.h>
#include <linux/quotaops.h>
#include <linux/string.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <linux/pagevec.h>
#include <linux/mpage.h>
#include <linux/namei.h>
#include <linux/uio.h>
#include <linux/bio.h>
#include <linux/workqueue.h>
#include <linux/kernel.h>
#include <linux/slab.h>
2013-01-28 18:32:54 +04:00
#include <linux/mm.h>
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() which takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however long is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops use.
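The decision described in the paragraph above can be modelled in plain C as below. This is a user-space sketch of the rule as the commit message states it, not the kernel implementation; `struct alloc_flags`, `enum retry_wait`, and `choose_retry_wait` are hypothetical names, and the two booleans stand in for what gfpflags_allow_blocking() and the __GFP_NORETRY bit would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two properties the message names. */
struct alloc_flags {
	bool allow_blocking;	/* what gfpflags_allow_blocking() would say */
	bool noretry;		/* whether __GFP_NORETRY was set */
};

enum retry_wait {
	WAIT_REST_OF_JIFFY,	/* allocator already reclaimed or waited */
	WAIT_ABOUT_200MS,	/* allocator probably did not wait at all */
};

/* Pick the delay class for a failed allocation made with flags f. */
static enum retry_wait choose_retry_wait(struct alloc_flags f)
{
	if (f.allow_blocking && !f.noretry)
		return WAIT_REST_OF_JIFFY;
	return WAIT_ABOUT_200MS;
}
```

The point of centralising this choice is that callers only pass the flags they allocated with; they no longer pick a timeout themselves.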
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 01:07:14 +03:00
#include <linux/sched/mm.h>
2010-10-28 05:30:10 +04:00
#include "ext4_jbd2.h"
#include "xattr.h"
#include "acl.h"
2013-04-12 07:48:32 +04:00
static struct kmem_cache *io_end_cachep;
2019-10-16 10:37:10 +03:00
static struct kmem_cache *io_end_vec_cachep;
2010-10-28 05:30:10 +04:00
2010-10-28 05:30:14 +04:00
int __init ext4_init_pageio(void)
2010-10-28 05:30:10 +04:00
{
	io_end_cachep = KMEM_CACHE(ext4_io_end, SLAB_RECLAIM_ACCOUNT);
2013-04-12 07:48:32 +04:00
	if (io_end_cachep == NULL)
2010-10-28 05:30:10 +04:00
		return -ENOMEM;
2019-10-16 10:37:10 +03:00
	io_end_vec_cachep = KMEM_CACHE(ext4_io_end_vec, 0);
	if (io_end_vec_cachep == NULL) {
		kmem_cache_destroy(io_end_cachep);
		return -ENOMEM;
	}
2010-10-28 05:30:10 +04:00
	return 0;
}
2010-10-28 05:30:14 +04:00
void ext4_exit_pageio(void)
2010-10-28 05:30:10 +04:00
{
	kmem_cache_destroy(io_end_cachep);
2019-10-16 10:37:10 +03:00
	kmem_cache_destroy(io_end_vec_cachep);
}

struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end)
{
	struct ext4_io_end_vec *io_end_vec;

	io_end_vec = kmem_cache_zalloc(io_end_vec_cachep, GFP_NOFS);
	if (!io_end_vec)
		return ERR_PTR(-ENOMEM);
	INIT_LIST_HEAD(&io_end_vec->list);
	list_add_tail(&io_end_vec->list, &io_end->list_vec);
	return io_end_vec;
}

static void ext4_free_io_end_vec(ext4_io_end_t *io_end)
{
	struct ext4_io_end_vec *io_end_vec, *tmp;

	if (list_empty(&io_end->list_vec))
		return;
	list_for_each_entry_safe(io_end_vec, tmp, &io_end->list_vec, list) {
		list_del(&io_end_vec->list);
		kmem_cache_free(io_end_vec_cachep, io_end_vec);
	}
}

struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end)
{
	BUG_ON(list_empty(&io_end->list_vec));
	return list_last_entry(&io_end->list_vec, struct ext4_io_end_vec, list);
2010-10-28 05:30:10 +04:00
}
2013-06-04 22:23:41 +04:00
/*
 * Print a buffer I/O error compatible with fs/buffer.c.  This
 * provides compatibility with dmesg scrapers that look for a specific
 * buffer I/O error message.  We really need a unified error reporting
 * structure to userspace ala Digital Unix's uerf system, but it's
 * probably not going to happen in my lifetime, due to LKML politics...
 */
static void buffer_io_error(struct buffer_head *bh)
{
2015-04-13 15:31:37 +03:00
	printk_ratelimited(KERN_ERR "Buffer I/O error on device %pg, logical block %llu\n",
		       bh->b_bdev,
2013-06-04 22:23:41 +04:00
			(unsigned long long)bh->b_blocknr);
}
static void ext4_finish_bio(struct bio *bio)
{
2023-03-24 21:01:04 +03:00
	struct folio_iter fi;
2013-06-04 22:23:41 +04:00
2023-03-24 21:01:04 +03:00
	bio_for_each_folio_all(fi, bio) {
		struct folio *folio = fi.folio;
		struct folio *io_folio = NULL;
2013-06-04 22:23:41 +04:00
		struct buffer_head *bh, *head;
2023-03-24 21:01:04 +03:00
		size_t bio_start = fi.offset;
		size_t bio_end = bio_start + fi.length;
2013-06-04 22:23:41 +04:00
		unsigned under_io = 0;
		unsigned long flags;
2023-03-24 21:01:04 +03:00
		if (fscrypt_is_bounce_folio(folio)) {
			io_folio = folio;
			folio = fscrypt_pagecache_folio(folio);
2015-04-12 07:55:10 +03:00
		}
2017-06-03 10:38:06 +03:00
		if (bio->bi_status) {
2023-03-24 21:01:04 +03:00
			int err = blk_status_to_errno(bio->bi_status);
			folio_set_error(folio);
			mapping_set_error(folio->mapping, err);
2013-06-04 22:23:41 +04:00
		}
2023-03-24 21:01:04 +03:00
		bh = head = folio_buffers(folio);
2013-06-04 22:23:41 +04:00
		/*
2023-03-24 21:01:04 +03:00
		 * We check all buffers in the folio under b_uptodate_lock
2013-06-04 22:23:41 +04:00
		 * to avoid races with other end io clearing async_write flags
		 */
2019-11-18 16:28:24 +03:00
		spin_lock_irqsave(&head->b_uptodate_lock, flags);
2013-06-04 22:23:41 +04:00
		do {
			if (bh_offset(bh) < bio_start ||
			    bh_offset(bh) + bh->b_size > bio_end) {
				if (buffer_async_write(bh))
					under_io++;
				continue;
			}
			clear_buffer_async_write(bh);
2022-03-21 17:44:38 +03:00
			if (bio->bi_status) {
				set_buffer_write_io_error(bh);
2013-06-04 22:23:41 +04:00
				buffer_io_error(bh);
2022-03-21 17:44:38 +03:00
			}
2013-06-04 22:23:41 +04:00
		} while ((bh = bh->b_this_page) != head);
2019-11-18 16:28:24 +03:00
		spin_unlock_irqrestore(&head->b_uptodate_lock, flags);
2015-04-12 07:55:10 +03:00
		if (!under_io) {
2023-03-24 21:01:04 +03:00
			fscrypt_free_bounce_page(&io_folio->page);
			folio_end_writeback(folio);
2015-04-12 07:55:10 +03:00
		}
2013-06-04 22:23:41 +04:00
	}
}
2013-06-04 19:58:58 +04:00
static void ext4_release_io_end(ext4_io_end_t *io_end)
2010-10-28 05:30:10 +04:00
{
2013-06-04 22:23:41 +04:00
	struct bio *bio, *next_bio;
2013-06-04 19:58:58 +04:00
	BUG_ON(!list_empty(&io_end->list));
	BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
2013-06-04 21:21:11 +04:00
	WARN_ON(io_end->handle);
2013-06-04 19:58:58 +04:00
2013-06-04 22:23:41 +04:00
	for (bio = io_end->bio; bio; bio = next_bio) {
		next_bio = bio->bi_private;
		ext4_finish_bio(bio);
		bio_put(bio);
	}
2019-10-16 10:37:10 +03:00
	ext4_free_io_end_vec(io_end);
2013-06-04 19:58:58 +04:00
	kmem_cache_free(io_end_cachep, io_end);
}
2013-06-04 22:30:00 +04:00
/*
 * Check a range of space and convert unwritten extents to written. Note that
 * we are protected from truncate touching same part of extent tree by the
 * fact that truncate code waits for all DIO to finish (thus exclusion from
 * direct IO is achieved) and also waits for PageWriteback bits. Thus we
 * cannot get to ext4_ext_truncate() before all IOs overlapping that range are
 * completed (happens from ext4_free_ioend()).
 */
2019-10-16 10:37:07 +03:00
static int ext4_end_io_end(ext4_io_end_t *io_end)
2010-10-28 05:30:10 +04:00
{
2019-10-16 10:37:07 +03:00
	struct inode *inode = io_end->inode;
	handle_t *handle = io_end->handle;
2010-10-28 05:30:10 +04:00
	int ret = 0;
2019-10-16 10:37:07 +03:00
	ext4_debug("ext4_end_io_nolock: io_end 0x%p from inode %lu,list->next 0x%p,"
2010-10-28 05:30:10 +04:00
		   "list->prev 0x%p\n",
2019-10-16 10:37:07 +03:00
		   io_end, inode->i_ino, io_end->list.next, io_end->list.prev);
2010-10-28 05:30:10 +04:00
2019-10-16 10:37:07 +03:00
	io_end->handle = NULL;	/* Following call will use up the handle */
2019-10-16 10:37:08 +03:00
	ret = ext4_convert_unwritten_io_end_vec(handle, io_end);
2023-06-16 19:50:49 +03:00
	if (ret < 0 && !ext4_forced_shutdown(inode->i_sb)) {
2011-10-31 18:56:32 +04:00
		ext4_msg(inode->i_sb, KERN_EMERG,
			 "failed to convert unwritten extents to written "
			 "extents -- potential data loss! "
2019-10-16 10:37:10 +03:00
			 "(inode %lu, error %d)", inode->i_ino, ret);
2010-10-28 05:30:10 +04:00
	}
2019-10-16 10:37:07 +03:00
	ext4_clear_io_unwritten_flag(io_end);
	ext4_release_io_end(io_end);
2010-10-28 05:30:10 +04:00
	return ret;
}
2013-06-04 22:21:02 +04:00
static void dump_completed_IO(struct inode *inode, struct list_head *head)
2012-09-29 08:14:55 +04:00
{
#ifdef EXT4FS_DEBUG
	struct list_head *cur, *before, *after;
2019-10-16 10:37:07 +03:00
	ext4_io_end_t *io_end, *io_end0, *io_end1;
2012-09-29 08:14:55 +04:00
2013-06-04 22:21:02 +04:00
	if (list_empty(head))
2012-09-29 08:14:55 +04:00
		return;
2013-06-04 22:21:02 +04:00
	ext4_debug("Dump inode %lu completed io list\n", inode->i_ino);
2019-10-16 10:37:07 +03:00
	list_for_each_entry(io_end, head, list) {
		cur = &io_end->list;
2012-09-29 08:14:55 +04:00
		before = cur->prev;
2019-10-16 10:37:07 +03:00
		io_end0 = container_of(before, ext4_io_end_t, list);
2012-09-29 08:14:55 +04:00
		after = cur->next;
2019-10-16 10:37:07 +03:00
		io_end1 = container_of(after, ext4_io_end_t, list);
2012-09-29 08:14:55 +04:00
		ext4_debug("io 0x%p from inode %lu,prev 0x%p,next 0x%p\n",
2019-10-16 10:37:07 +03:00
			   io_end, inode->i_ino, io_end0, io_end1);
2012-09-29 08:14:55 +04:00
	}
#endif
}
/* Add the io_end to per-inode completed end_io list. */
2013-06-04 19:58:58 +04:00
static void ext4_add_complete_io(ext4_io_end_t *io_end)
2010-10-28 05:30:10 +04:00
{
2012-09-29 08:14:55 +04:00
	struct ext4_inode_info *ei = EXT4_I(io_end->inode);
2013-10-16 16:25:11 +04:00
	struct ext4_sb_info *sbi = EXT4_SB(io_end->inode->i_sb);
2012-09-29 08:14:55 +04:00
	struct workqueue_struct *wq;
	unsigned long flags;
2013-09-04 17:04:39 +04:00
	/* Only reserved conversions from writeback should enter here */
	WARN_ON(!(io_end->flag & EXT4_IO_END_UNWRITTEN));
2013-10-16 16:25:11 +04:00
	WARN_ON(!io_end->handle && sbi->s_journal);
2011-10-31 02:26:08 +04:00
	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
2013-10-16 16:25:11 +04:00
	wq = sbi->rsv_conversion_wq;
2013-09-04 17:04:39 +04:00
	if (list_empty(&ei->i_rsv_conversion_list))
		queue_work(wq, &ei->i_rsv_conversion_work);
	list_add_tail(&io_end->list, &ei->i_rsv_conversion_list);
2012-09-29 08:14:55 +04:00
	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
}
2011-10-31 02:26:08 +04:00
2013-06-04 22:21:02 +04:00
static int ext4_do_flush_completed_IO(struct inode *inode,
				      struct list_head *head)
2012-09-29 08:14:55 +04:00
{
2019-10-16 10:37:07 +03:00
	ext4_io_end_t *io_end;
2013-01-28 18:49:15 +04:00
	struct list_head unwritten;
2012-09-29 08:14:55 +04:00
	unsigned long flags;
	struct ext4_inode_info *ei = EXT4_I(inode);
	int err, ret = 0;

	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
2013-06-04 22:21:02 +04:00
	dump_completed_IO(inode, head);
	list_replace_init(head, &unwritten);
2012-09-29 08:14:55 +04:00
	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);

	while (!list_empty(&unwritten)) {
2019-10-16 10:37:07 +03:00
		io_end = list_entry(unwritten.next, ext4_io_end_t, list);
		BUG_ON(!(io_end->flag & EXT4_IO_END_UNWRITTEN));
		list_del_init(&io_end->list);
2012-09-29 08:14:55 +04:00
2019-10-16 10:37:07 +03:00
		err = ext4_end_io_end(io_end);
2012-09-29 08:14:55 +04:00
		if (unlikely(!ret && err))
			ret = err;
	}
	return ret;
}

/*
2013-06-04 22:21:02 +04:00
 * work on completed IO, to convert unwritten extents to extents
2012-09-29 08:14:55 +04:00
 */
2013-06-04 22:21:02 +04:00
void ext4_end_io_rsv_work(struct work_struct *work)
{
	struct ext4_inode_info *ei = container_of(work, struct ext4_inode_info,
						  i_rsv_conversion_work);
	ext4_do_flush_completed_IO(&ei->vfs_inode, &ei->i_rsv_conversion_list);
}
2010-10-28 05:30:10 +04:00
ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
{
2019-10-16 10:37:07 +03:00
	ext4_io_end_t *io_end = kmem_cache_zalloc(io_end_cachep, flags);

	if (io_end) {
		io_end->inode = inode;
		INIT_LIST_HEAD(&io_end->list);
2019-10-16 10:37:10 +03:00
		INIT_LIST_HEAD(&io_end->list_vec);
2021-07-19 08:59:14 +03:00
		refcount_set(&io_end->count, 1);
2010-10-28 05:30:10 +04:00
	}
2019-10-16 10:37:07 +03:00
	return io_end;
2010-10-28 05:30:10 +04:00
}
2013-06-04 19:58:58 +04:00
void ext4_put_io_end_defer(ext4_io_end_t *io_end)
{
2021-07-19 08:59:14 +03:00
	if (refcount_dec_and_test(&io_end->count)) {
2019-10-16 10:37:10 +03:00
		if (!(io_end->flag & EXT4_IO_END_UNWRITTEN) ||
				list_empty(&io_end->list_vec)) {
2013-06-04 19:58:58 +04:00
			ext4_release_io_end(io_end);
			return;
		}
		ext4_add_complete_io(io_end);
	}
}

int ext4_put_io_end(ext4_io_end_t *io_end)
{
	int err = 0;

2021-07-19 08:59:14 +03:00
	if (refcount_dec_and_test(&io_end->count)) {
2013-06-04 19:58:58 +04:00
		if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
2019-10-16 10:37:08 +03:00
			err = ext4_convert_unwritten_io_end_vec(io_end->handle,
								io_end);
2013-06-04 21:21:11 +04:00
			io_end->handle = NULL;
2013-06-04 19:58:58 +04:00
			ext4_clear_io_unwritten_flag(io_end);
		}
		ext4_release_io_end(io_end);
	}
	return err;
}

ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end)
{
2021-07-19 08:59:14 +03:00
	refcount_inc(&io_end->count);
2013-06-04 19:58:58 +04:00
	return io_end;
}
2013-07-11 05:31:04 +04:00
/* BIO completion function for page writeback */
2015-07-20 16:29:37 +03:00
static void ext4_end_bio(struct bio *bio)
2010-10-28 05:30:10 +04:00
{
	ext4_io_end_t *io_end = bio->bi_private;
2013-10-12 02:44:27 +04:00
	sector_t bi_sector = bio->bi_iter.bi_sector;
2010-10-28 05:30:10 +04:00
2022-03-04 21:01:04 +03:00
	if (WARN_ONCE(!io_end, "io_end is NULL: %pg: sector %Lu len %u err %d\n",
		      bio->bi_bdev,
2017-05-01 03:08:05 +03:00
		      (long long) bio->bi_iter.bi_sector,
		      (unsigned) bio_sectors(bio),
2017-06-03 10:38:06 +03:00
		      bio->bi_status)) {
2017-05-01 03:08:05 +03:00
		ext4_finish_bio(bio);
		bio_put(bio);
		return;
	}
2010-10-28 05:30:10 +04:00
	bio->bi_end_io = NULL;
2013-04-12 07:48:32 +04:00
2017-06-03 10:38:06 +03:00
	if (bio->bi_status) {
2013-06-04 22:23:41 +04:00
		struct inode *inode = io_end->inode;
2014-04-07 18:54:20 +04:00
		ext4_warning(inode->i_sb, "I/O error %d writing to inode %lu "
2019-10-16 10:37:10 +03:00
			     "starting block %llu)",
2017-06-03 10:38:06 +03:00
			     bio->bi_status, inode->i_ino,
2010-11-08 21:43:33 +03:00
			     (unsigned long long)
ext4: Fix data corruption with multi-block writepages support
This fixes a corruption problem with the multi-block
writepages submittal change for ext4, from commit
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio
layer instead of buffer layer in mpage_da_submit_io").
(Note that this corruption is not present in 2.6.37 on
ext4, because the corruption was detected after the
feature was merged in 2.6.37-rc1, and so it was turned
off by adding a non-default mount option,
mblk_io_submit. With this commit, which hopefully
fixes the last of the bugs with this feature, we'll be
able to turn on this performance feature by default in
2.6.38, and remove the mblk_io_submit option.)
The ext4 code path to bundle multiple pages for
writeback in ext4_bio_write_page() had a bug: we should
be clearing buffer head dirty flags *before* we submit
the bio, not in the completion routine.
The patch below was tested on 2.6.37 under KVM with the
postgresql script which was submitted by Jon Nelson as
documented in commit 1449032be1.
Without the patch, I'd hit the corruption problem about
50-70% of the time. With the patch, I executed the
script > 100 times with no corruption seen.
I also fixed a bug to make sure ext4_end_bio() doesn't
dereference the bio after the bio_put() call.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Reported-by: Matthias Bayer <jackdachef@gmail.com>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-02-07 20:46:14 +03:00
			     bi_sector >> (inode->i_blkbits - 9));
2017-06-03 10:38:06 +03:00
		mapping_set_error(inode->i_mapping,
				  blk_status_to_errno(bio->bi_status));
2010-11-08 21:43:33 +03:00
	}
2013-07-11 05:31:04 +04:00
	if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
		/*
		 * Link bio into list hanging from io_end. We have to do it
		 * atomically as bio completions can be racing against each
		 * other.
		 */
		bio->bi_private = xchg(&io_end->bio, bio);
		ext4_put_io_end_defer(io_end);
	} else {
		/*
		 * Drop io_end reference early. Inode can get freed once
		 * we finish the bio.
		 */
		ext4_put_io_end_defer(io_end);
		ext4_finish_bio(bio);
		bio_put(bio);
	}
2010-10-28 05:30:10 +04:00
}
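The "link bio into list hanging from io_end" step in ext4_end_bio() relies on a single atomic exchange. A user-space sketch of the same idea using C11 atomics is shown here; it is an analogy, not kernel code. `struct node`, `struct owner`, `push`, and `count` are hypothetical names, with `next` playing the role of bio->bi_private and `head` the role of io_end->bio.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;	/* plays the role of bio->bi_private */
	int id;
};

struct owner {
	_Atomic(struct node *) head;	/* plays the role of io_end->bio */
};

/* Swap n in as the new head; the previous head becomes n's successor. */
static void push(struct owner *o, struct node *n)
{
	n->next = atomic_exchange(&o->head, n);
}

/* Walk the list. Only safe once no push() can still be in flight. */
static int count(struct owner *o)
{
	int c = 0;

	for (struct node *n = atomic_load(&o->head); n != NULL; n = n->next)
		c++;
	return c;
}
```

Between the exchange and the store to `n->next` the list is briefly inconsistent, which is why the walk must wait until all pushers are done. In the code above that appears to be guaranteed by walking the list only from ext4_release_io_end(), after the last reference is dropped.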
void ext4_io_submit(struct ext4_io_submit *io)
{
	struct bio *bio = io->io_bio;

	if (bio) {
2022-02-22 18:46:33 +03:00
		if (io->io_wbc->sync_mode == WB_SYNC_ALL)
			io->io_bio->bi_opf |= REQ_SYNC;
2016-06-05 22:31:41 +03:00
		submit_bio(io->io_bio);
2010-10-28 05:30:10 +04:00
	}
2011-02-22 05:01:42 +03:00
	io->io_bio = NULL;
2013-06-04 19:58:58 +04:00
}

void ext4_io_submit_init(struct ext4_io_submit *io,
			 struct writeback_control *wbc)
{
2015-07-22 06:50:24 +03:00
	io->io_wbc = wbc;
2013-06-04 19:58:58 +04:00
	io->io_bio = NULL;
2011-02-22 05:01:42 +03:00
	io->io_end = NULL;
2010-10-28 05:30:10 +04:00
}
2019-10-31 12:23:15 +03:00
static void io_submit_init_bio(struct ext4_io_submit *io,
			       struct buffer_head *bh)
2010-10-28 05:30:10 +04:00
{
	struct bio *bio;

2019-10-31 12:23:15 +03:00
	/*
	 * bio_alloc will _always_ be able to allocate a bio if
	 * __GFP_DIRECT_RECLAIM is set, see comments for bio_alloc_bioset().
	 */
2022-02-22 18:46:33 +03:00
	bio = bio_alloc(bh->b_bdev, BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOIO);
2020-07-02 04:56:07 +03:00
	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
2013-10-12 02:44:27 +04:00
	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
2010-10-28 05:30:10 +04:00
	bio->bi_end_io = ext4_end_bio;
2013-06-04 19:58:58 +04:00
	bio->bi_private = ext4_get_io_end(io->io_end);
2010-10-28 05:30:10 +04:00
	io->io_bio = bio;
	io->io_next_block = bh->b_blocknr;
2018-12-05 20:10:34 +03:00
	wbc_init_bio(io->io_wbc, bio);
2010-10-28 05:30:10 +04:00
}
2019-10-31 12:23:15 +03:00
static void io_submit_add_bh(struct ext4_io_submit *io,
			     struct inode *inode,
2023-03-24 21:01:03 +03:00
			     struct folio *folio,
			     struct folio *io_folio,
2019-10-31 12:23:15 +03:00
			     struct buffer_head *bh)
2010-10-28 05:30:10 +04:00
{
2020-07-02 04:56:07 +03:00
	if (io->io_bio && (bh->b_blocknr != io->io_next_block ||
			   !fscrypt_mergeable_bio_bh(io->io_bio, bh))) {
2010-10-28 05:30:10 +04:00
submit_and_retry:
		ext4_io_submit(io);
	}
2022-03-04 20:55:56 +03:00
	if (io->io_bio == NULL)
2019-10-31 12:23:15 +03:00
		io_submit_init_bio(io, bh);
2023-03-24 21:01:03 +03:00
	if (!bio_add_folio(io->io_bio, io_folio, bh->b_size, bh_offset(bh)))
2013-06-04 19:58:58 +04:00
		goto submit_and_retry;
2023-03-24 21:01:03 +03:00
	wbc_account_cgroup_owner(io->io_wbc, &folio->page, bh->b_size);
2010-10-28 05:30:10 +04:00
	io->io_next_block++;
}
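The batching rule in io_submit_add_bh() (extend the open bio while block numbers stay contiguous, otherwise submit and start a new one) can be modelled with a small user-space sketch. This is a simplified analogy: `struct batch`, `batch_add`, `batch_submit`, and the `BATCH_MAX` capacity are hypothetical, and the fscrypt mergeability check is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_MAX 4	/* hypothetical capacity, standing in for a full bio */

struct batch {
	bool open;			/* is a batch currently being built? */
	unsigned long next_block;	/* block number that would extend it */
	int len;			/* blocks in the open batch */
	int submitted;			/* batches flushed so far */
};

/* Flush the open batch, if any, and reset the builder. */
static void batch_submit(struct batch *b)
{
	if (b->open)
		b->submitted++;
	b->open = false;
	b->len = 0;
}

/* Add one block, flushing first on a gap or when the batch is full. */
static void batch_add(struct batch *b, unsigned long block)
{
	if (b->open && (block != b->next_block || b->len == BATCH_MAX))
		batch_submit(b);
	if (!b->open)
		b->open = true;
	b->len++;
	b->next_block = block + 1;
}
```

As with ext4_io_submit(), the caller is expected to call `batch_submit()` once more at the end so the last partially filled batch is not left behind.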
2023-03-24 21:01:08 +03:00
int ext4_bio_write_folio(struct ext4_io_submit *io, struct folio *folio,
		size_t len)
2010-10-28 05:30:10 +04:00
{
2023-03-24 21:01:03 +03:00
	struct folio *io_folio = folio;
	struct inode *inode = folio->mapping->host;
2016-09-30 09:14:56 +03:00
	unsigned block_start;
2010-10-28 05:30:10 +04:00
	struct buffer_head *bh, *head;
	int ret = 0;
2015-10-03 06:54:58 +03:00
	int nr_to_submit = 0;
2020-12-11 09:54:24 +03:00
	struct writeback_control *wbc = io->io_wbc;
2022-12-07 14:27:05 +03:00
	bool keep_towrite = false;
2010-10-28 05:30:10 +04:00
2023-03-24 21:01:03 +03:00
	BUG_ON(!folio_test_locked(folio));
	BUG_ON(folio_test_writeback(folio));
2010-10-28 05:30:10 +04:00
2023-03-24 21:01:03 +03:00
	folio_clear_error(folio);
2010-10-28 05:30:10 +04:00
2014-05-27 20:48:55 +04:00
	/*
2014-06-09 00:03:35 +04:00
	 * Comments copied from block_write_full_page:
2014-05-27 20:48:55 +04:00
	 *
2023-03-24 21:01:03 +03:00
	 * The folio straddles i_size.  It must be zeroed out on each and every
2014-05-27 20:48:55 +04:00
	 * writepage invocation because it may be mmapped.  "A file is mapped
	 * in multiples of the page size.  For a file that is not a multiple of
	 * the page size, the remaining memory is zeroed when mapped, and
	 * writes to that region are not written out to the file."
	 */
2023-03-24 21:01:03 +03:00
	if (len < folio_size(folio))
		folio_zero_segment(folio, len, folio_size(folio));
2013-04-12 07:48:32 +04:00
	/*
	 * In the first loop we prepare and mark buffers to submit. We have to
2023-03-24 21:01:03 +03:00
	 * mark all buffers in the folio before submitting so that
	 * folio_end_writeback() cannot be called from ext4_end_bio() when IO
2013-04-12 07:48:32 +04:00
	 * on the first buffer finishes and we are still working on submitting
	 * the second buffer.
	 */
2023-03-24 21:01:03 +03:00
	bh = head = folio_buffers(folio);
2013-04-12 07:48:32 +04:00
	do {
		block_start = bh_offset(bh);
2010-10-28 05:30:10 +04:00
		if (block_start >= len) {
			clear_buffer_dirty(bh);
			set_buffer_uptodate(bh);
			continue;
		}
2013-01-29 05:53:28 +04:00
		if (!buffer_dirty(bh) || buffer_delay(bh) ||
		    !buffer_mapped(bh) || buffer_unwritten(bh)) {
			/* A hole? We can safely clear the dirty bit */
			if (!buffer_mapped(bh))
				clear_buffer_dirty(bh);
2022-12-07 14:27:04 +03:00
			/*
2022-12-07 14:27:05 +03:00
			 * Keeping dirty some buffer we cannot write? Make sure
2023-03-24 21:01:03 +03:00
			 * to redirty the folio and keep TOWRITE tag so that
			 * racing WB_SYNC_ALL writeback does not skip the folio.
2022-12-07 14:27:05 +03:00
			 * This happens e.g. when doing writeout for
2023-03-29 18:49:34 +03:00
			 * transaction commit or when journalled data is not
			 * yet committed.
2022-12-07 14:27:04 +03:00
			 */
2023-03-29 18:49:34 +03:00
			if (buffer_dirty(bh) ||
			    (buffer_jbd(bh) && buffer_jbddirty(bh))) {
2023-03-24 21:01:03 +03:00
				if (!folio_test_dirty(folio))
					folio_redirty_for_writepage(wbc, folio);
2022-12-07 14:27:05 +03:00
				keep_towrite = true;
			}
2013-01-29 05:53:28 +04:00
			continue;
		}
2019-02-11 07:32:07 +03:00
		if (buffer_new(bh))
2013-04-12 07:48:32 +04:00
			clear_buffer_new(bh);
		set_buffer_async_write(bh);
2022-12-07 14:27:04 +03:00
		clear_buffer_dirty(bh);
2015-10-03 06:54:58 +03:00
		nr_to_submit++;
2013-04-12 07:48:32 +04:00
	} while ((bh = bh->b_this_page) != head);
2023-03-24 21:01:03 +03:00
/* Nothing to submit? Just unlock the folio... */
2022-12-07 14:27:05 +03:00
if ( ! nr_to_submit )
2023-02-28 08:13:16 +03:00
return 0 ;
2022-12-07 14:27:05 +03:00
2023-03-24 21:01:03 +03:00
bh = head = folio_buffers ( folio ) ;
2015-04-12 07:55:10 +03:00
2019-05-20 19:29:52 +03:00
/*
* If any blocks are being written to an encrypted file, encrypt them
* into a bounce page. For simplicity, just encrypt until the last
* block which might be needed. This may cause some unneeded blocks
* (e.g. holes) to be unnecessarily encrypted, but this is rare and
* can't happen in the common case of blocksize == PAGE_SIZE.
*/
2023-03-16 23:48:31 +03:00
if ( fscrypt_inode_uses_fs_layer_crypto ( inode ) ) {
2016-03-26 23:14:34 +03:00
gfp_t gfp_flags = GFP_NOFS ;
2019-05-20 19:29:52 +03:00
unsigned int enc_bytes = round_up ( len , i_blocksize ( inode ) ) ;
2023-03-24 21:01:03 +03:00
struct page * bounce_page ;
2016-03-26 23:14:34 +03:00
2019-12-31 21:11:49 +03:00
/*
* Since bounce page allocation uses a mempool, we can only use
* a waiting mask (i.e. request guaranteed allocation) on the
* first page of the bio. Otherwise it can deadlock.
*/
if ( io->io_bio )
gfp_flags = GFP_NOWAIT | __GFP_NOWARN ;
2016-03-26 23:14:34 +03:00
retry_encrypt :
2023-03-24 21:01:08 +03:00
bounce_page = fscrypt_encrypt_pagecache_blocks ( &folio->page ,
enc_bytes , 0 , gfp_flags ) ;
2019-05-20 19:29:39 +03:00
if ( IS_ERR ( bounce_page ) ) {
ret = PTR_ERR ( bounce_page ) ;
2019-12-31 21:11:49 +03:00
if ( ret == -ENOMEM &&
( io->io_bio || wbc->sync_mode == WB_SYNC_ALL ) ) {
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() which takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however long is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffy ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops use.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 01:07:14 +03:00
gfp_t new_gfp_flags = GFP_NOFS ;
2019-12-31 21:11:49 +03:00
if ( io->io_bio )
2016-03-26 23:14:34 +03:00
ext4_io_submit ( io ) ;
2019-12-31 21:11:49 +03:00
else
2022-01-15 01:07:14 +03:00
new_gfp_flags |= __GFP_NOFAIL ;
memalloc_retry_wait ( gfp_flags ) ;
gfp_flags = new_gfp_flags ;
2016-03-26 23:14:34 +03:00
goto retry_encrypt ;
}
2019-10-31 12:23:15 +03:00
printk_ratelimited ( KERN_ERR "%s: ret = %d\n" , __func__ , ret ) ;
2023-03-24 21:01:03 +03:00
folio_redirty_for_writepage ( wbc , folio ) ;
2019-10-31 12:23:15 +03:00
do {
2022-12-07 14:27:04 +03:00
if ( buffer_async_write ( bh ) ) {
clear_buffer_async_write ( bh ) ;
set_buffer_dirty ( bh ) ;
}
2019-10-31 12:23:15 +03:00
bh = bh->b_this_page ;
} while ( bh != head ) ;
2023-02-28 08:13:16 +03:00
return ret ;
2015-04-12 07:55:10 +03:00
}
2023-03-24 21:01:03 +03:00
io_folio = page_folio ( bounce_page ) ;
2015-04-12 07:55:10 +03:00
}
2023-03-24 21:01:03 +03:00
__folio_start_writeback ( folio , keep_towrite ) ;
2022-12-07 14:27:05 +03:00
2015-04-12 07:55:10 +03:00
/* Now submit buffers to write */
2013-04-12 07:48:32 +04:00
do {
if ( ! buffer_async_write ( bh ) )
continue ;
2023-03-24 21:01:03 +03:00
io_submit_add_bh ( io , inode , folio , io_folio , bh ) ;
2013-04-12 07:48:32 +04:00
} while ( ( bh = bh->b_this_page ) != head ) ;
2023-02-28 08:13:16 +03:00
return 0 ;
2010-10-28 05:30:10 +04:00
}
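
The bounce-page retry logic in the function above can be hard to follow with the blame annotations in the way. What follows is a minimal userspace sketch of just that decision, not the kernel code. Every `sk_`/`SK_` name is hypothetical, and the bit values stand in for the real GFP masks purely for illustration. It models two things taken from the code above: the initial mask (non-blocking when a bio is already being built, `GFP_NOFS` otherwise), and what happens on an encryption error (retry only for `-ENOMEM` when a bio is pending or writeback is `WB_SYNC_ALL`).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel GFP masks; the bit values are made up. */
#define SK_GFP_NOWAIT  (1u << 0)  /* models GFP_NOWAIT | __GFP_NOWARN */
#define SK_GFP_NOFS    (1u << 1)  /* models GFP_NOFS */
#define SK_GFP_NOFAIL  (1u << 2)  /* models __GFP_NOFAIL */

enum sk_action {
	SK_FAIL,             /* redirty the folio and return the error */
	SK_SUBMIT_AND_RETRY, /* submit the pending bio, then retry */
	SK_RETRY_NOFAIL,     /* no bio pending: retry with a must-succeed mask */
};

struct sk_decision {
	enum sk_action action;
	unsigned int next_gfp; /* mask for the retry; 0 when action is SK_FAIL */
};

/* First attempt: only block when no bio is pending (mempool deadlock rule). */
static unsigned int sk_initial_gfp(bool bio_pending)
{
	return bio_pending ? SK_GFP_NOWAIT : SK_GFP_NOFS;
}

/*
 * Decide what to do after the encryption helper returned an error.
 * In the real code, memalloc_retry_wait() is called with the OLD mask
 * before the new mask is installed; this sketch only returns the new mask.
 */
static struct sk_decision sk_on_encrypt_error(int err, bool bio_pending,
					      bool sync_all)
{
	struct sk_decision d = { SK_FAIL, 0 };

	if (err != -ENOMEM || !(bio_pending || sync_all))
		return d;

	d.next_gfp = SK_GFP_NOFS;
	if (bio_pending) {
		d.action = SK_SUBMIT_AND_RETRY;
	} else {
		d.action = SK_RETRY_NOFAIL;
		d.next_gfp |= SK_GFP_NOFAIL;
	}
	return d;
}

/* Small self-check covering the three outcomes. */
static void sk_self_check(void)
{
	struct sk_decision d;

	d = sk_on_encrypt_error(-ENOMEM, true, false);
	assert(d.action == SK_SUBMIT_AND_RETRY && d.next_gfp == SK_GFP_NOFS);

	d = sk_on_encrypt_error(-ENOMEM, false, true);
	assert(d.action == SK_RETRY_NOFAIL);
	assert(d.next_gfp == (SK_GFP_NOFS | SK_GFP_NOFAIL));

	d = sk_on_encrypt_error(-ENOMEM, false, false);
	assert(d.action == SK_FAIL);
}
```

Calling `sk_self_check()` from a test harness exercises all three branches. The point of the split is that a blocking, must-succeed allocation is only requested once nothing else is holding a bounce page from the same pool.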