Commit Graph

4721 Commits

Author SHA1 Message Date
Jan Kara
ab382539ad ext4: Update comment in mpage_prepare_extent_to_map()
Since filemap_write_and_wait() is now enough to get journalled data to its
final location, update the comment in mpage_prepare_extent_to_map().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-12-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:58:33 -04:00
Jan Kara
951cafa6b8 ext4: Simplify handling of journalled data in ext4_bmap()
Now that ext4_writepages() gets journalled data into its final location,
we can just use filemap_write_and_wait() instead of special handling of
journalled data in ext4_bmap(). We can also drop the EXT4_STATE_JDATA flag
as it is no longer used.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-11-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:58:33 -04:00
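
For illustration, a minimal sketch of the simplified shape ext4_bmap() can
take once journalled data is already pushed to its final location by
ext4_writepages(); this is a hand-written approximation, not the actual
kernel function:

  static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
  {
      struct inode *inode = mapping->host;
      int err;

      inode_lock_shared(inode);
      /* No EXT4_STATE_JDATA special case any more: flushing the mapping
       * is enough to get journalled data to its final location. */
      err = filemap_write_and_wait(mapping);
      if (err) {
          inode_unlock_shared(inode);
          return 0;
      }
      block = iomap_bmap(mapping, block, &ext4_iomap_ops);
      inode_unlock_shared(inode);
      return block;
  }
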
Jan Kara
7c375870fd ext4: Drop special handling of journalled data from ext4_quota_on()
Now that ext4_writepages() makes sure all journalled data is committed
and checkpointed, the sync_filesystem() call done by dquot_quota_on() is
enough for quota IO to see up-to-date data. So drop the special handling of
journalled data from ext4_quota_on().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
56c2a0e3d9 ext4: Drop special handling of journalled data from ext4_evict_inode()
Now that ext4_writepages() makes sure journalled data is on stable
storage, write_inode_now() call in iput_final() is enough to make
pagecache pages with journalled data really clean (data committed and
checkpointed). So we can drop special handling of journalled data in
ext4_evict_inode().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
783ae448b7 ext4: Fix special handling of journalled data from extent zeroing
The handling of journalled data in ext4_zero_range() is incomplete. We
do not need to commit the running transaction; rather, we need to
checkpoint pages with journalled data. If we don't, the journal tail can be
advanced beyond the transaction containing the journalled data, and if we
then crash before committing the transaction doing the zeroing, we will
have inconsistent (too old) data in the file. Make sure file pages with
journalled data are properly checkpointed before removing them from the
page cache.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
c000dfec7e ext4: Drop special handling of journalled data from extent shifting operations
Now that filemap_write_and_wait() makes sure pages with journalled data
are safely on disk, ext4_collapse_range() and ext4_insert_range() do
not need special handling of journalled data.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-7-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
e360c6ed72 ext4: Drop special handling of journalled data from ext4_sync_file()
Now that ext4_writepages() makes sure all pages with journalled data are
stable on disk, we don't need special handling of journalled data in
ext4_sync_file().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-6-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
1f1a55f0bf ext4: Commit transaction before writing back pages in data=journal mode
When journalling data we currently just walk over pages, journal those
that are marked for delayed dirtying (only pinned pages dirtied behind
our back these days) and checkpoint other dirty pages. Because some
pages may be part of the running transaction, the result is that after
filemap_write_and_wait() we are not guaranteed pages are stable on disk.
Thus places that want to flush current pagecache content need to jump
through hoops to make sure journalled data is not lost. This is
manageable in cases completely controlled by ext4 (such as extent
shifting operations or inode eviction) but it gets ugly for stuff like
fsverity. Furthermore it is rather error prone as people often do not
realize journalled data needs special handling.

So change ext4_writepages() to commit transaction with inode's data
before going through the writeback loop in WB_SYNC_ALL mode. As a result
filemap_write_and_wait() is now really getting pages to stable storage
and makes pagecache pages safe to reclaim. Consequently we can remove
the special handling of journalled data from several places in follow up
patches.

Note that this will make fsync(2) for journalled data more expensive as
we will end up not only committing the transaction we need but also
checkpointing the data (which we may have previously skipped if the data
was part of the running transaction). If we really cared, we would need
to introduce special VFS function for writing out & invalidating page
cache for a range, use ->launder_page callback to perform checkpointing,
and use it from all the places that need this functionality. But at this
point I'm not convinced the complexity is worth it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
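
A rough sketch of the ordering this introduces in the data=journal
writeback path; the helper used to commit below is illustrative only and
the real code (in ext4_do_writepages()) has more error handling:

  /* data=journal, WB_SYNC_ALL: make journalled data stable first. */
  if (ext4_should_journal_data(inode) && wbc->sync_mode == WB_SYNC_ALL) {
      /* Commit the transaction holding this inode's data so that the
       * writeback loop below can checkpoint it to its final location. */
      ret = ext4_force_commit(inode->i_sb);
      if (ret)
          return ret;
  }
  /* ... then run the usual mpage writeback loop ... */
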
Jan Kara
5e1bdea639 ext4: Clear dirty bit from pages without data to write
With journalled data it can happen that checkpointing code will write
out page contents without clearing the page dirty bit. The logic in
ext4_page_nomap_can_writeout() then results in us never calling
mpage_submit_page() and thus never clearing the dirty bit. Drop the
optimization with ext4_page_nomap_can_writeout() and just always call
mpage_submit_page(). ext4_bio_write_page() knows when to redirty the
page, and the additional clearing & setting of the page dirty bit for
ordered mode writeout is not expensive enough to be worth jumping
through hoops for.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:55:44 -04:00
Jan Kara
265e72efa9 ext4: Keep pages with journalled data dirty
Currently we clear page dirty bit when we checkpoint some buffers from a
page with journalled data or when we perform delayed dirtying of a page
in ext4_writepages(). In a quest to simplify handling of journalled data
we want to keep page dirty as long as it has either buffers to
checkpoint or journalled dirty data. So make sure to keep page dirty in
ext4_writepages() if it still has journalled data attached to it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:44:00 -04:00
Jan Kara
d84c9ebdac ext4: Mark pages with journalled data dirty
Currently pages with journalled data written by write(2) or modified by
block zeroing during truncate(2) are not marked as dirty. They are
dirtied only once the transaction commits. This however makes writeback
code think inode has no pages to write and so ext4_writepages() is not
called to make pages with journalled data persistent. Mark pages with
journalled data dirty (similarly as it happens for writes through mmap)
so that writeback code knows about them and ext4_writepages() can do
what it needs to do to the inode.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:38:50 -04:00
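
A simplified illustration of the kind of change described above in the
data=journal ->write_end path (the call site and surrounding code are
approximate, not a verbatim diff):

  ret = ext4_walk_page_buffers(handle, inode, page_buffers(page),
                               from, to, NULL, write_end_fn);
  if (!ret)
      /* New: mark the folio dirty so the writeback code knows there is
       * work for ext4_writepages(), just as mmap writes already do. */
      folio_mark_dirty(page_folio(page));
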
Matthew Wilcox
e9ebecf266 ext4: Use a folio in ext4_read_merkle_tree_page
This is an implementation of fsverity_operations read_merkle_tree_page,
so it must still return the precise page asked for, but we can use the
folio API to reduce the number of conversions between folios & pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-30-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
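
As a concrete illustration of this kind of conversion, a minimal folio-based
sketch of such a read_merkle_tree_page implementation (readahead and the
verity-offset handling are omitted; not the exact kernel code):

  static struct page *ext4_read_merkle_tree_page(struct inode *inode,
                                                 pgoff_t index,
                                                 unsigned long num_ra_pages)
  {
      struct folio *folio;

      folio = read_mapping_folio(inode->i_mapping, index, NULL);
      if (IS_ERR(folio))
          return ERR_CAST(folio);
      /* fsverity still wants the precise page back. */
      return folio_file_page(folio, index);
  }
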
Matthew Wilcox
b23fb76278 ext4: Convert pagecache_read() to use a folio
Use the folio API and support folios of arbitrary sizes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-29-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
3060b6ef05 ext4: Convert mext_page_mkuptodate() to take a folio
Use a folio throughout.  Does not support large folios due to
an array sized for MAX_BUF_PER_PAGE, but it does remove a few
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-28-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
f2b229a8c6 ext4: Use a folio iterator in __read_end_io()
Iterate once per folio, not once per page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-27-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
9ea0e45bd2 ext4: Use a folio in ext4_page_mkwrite()
Convert to the folio API, saving a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-26-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
86b38c273c ext4: Convert ext4_block_write_begin() to take a folio
All the callers now have a folio, so pass that in and operate on folios.
Removes four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-25-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
c0be8e6f08 ext4: Convert ext4_mpage_readpages() to work on folios
This definitely doesn't include support for large folios; there
are all kinds of assumptions about the number of buffers attached
to a folio.  But it does remove several calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-24-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
0b5a254395 ext4: Use a folio in ext4_da_write_begin()
Remove a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-23-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
02e4b04c56 ext4: Convert ext4_page_nomap_can_writeout to ext4_folio_nomap_can_writeout
Its one caller already uses a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-22-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
9d3973de9a ext4: Convert __ext4_block_zero_page_range() to use a folio
Use folio APIs throughout.  Saves many calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-21-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
86324a2162 ext4: Convert ext4_journalled_zero_new_buffers() to use a folio
Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-20-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
feb22b77b8 ext4: Use a folio in ext4_journalled_write_end()
Convert the incoming page to a folio to remove a few calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-19-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
64fb313675 ext4: Convert ext4_write_end() to use a folio
Convert the incoming struct page to a folio.  Replaces two implicit
calls to compound_head() with one explicit call.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-18-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
4d934a5e6c ext4: Convert ext4_write_begin() to use a folio
Remove a lot of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-17-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
6b90d4130a ext4: Convert ext4_write_inline_data_end() to use a folio
Convert the incoming page to a folio so that we call compound_head()
only once instead of seven times.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-16-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
6b87fbe415 ext4: Convert ext4_read_inline_page() to ext4_read_inline_folio()
All callers now have a folio, so pass it and use it.  The folio may
be large, although I doubt we'll want to use a large folio for an
inline file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-15-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
9a9d01f081 ext4: Convert ext4_da_write_inline_data_begin() to use a folio
Saves a number of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-14-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
4ed9b598ac ext4: Convert ext4_da_convert_inline_data_to_extent() to use a folio
Saves a number of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-13-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
f8f8c89f59 ext4: Convert ext4_try_to_write_inline_data() to use a folio
Saves a number of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-12-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
83eba701cf ext4: Convert ext4_convert_inline_data_to_extent() to use a folio
Saves a number of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-11-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
3edde93e07 ext4: Convert ext4_readpage_inline() to take a folio
Use the folio API in this function, saves a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-10-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
e8d6062c50 ext4: Convert ext4_bio_write_page() to ext4_bio_write_folio()
The only caller now has a folio so pass it in directly and avoid the call
to page_folio() at the beginning.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-9-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
33483b3b6e ext4: Convert mpage_page_done() to mpage_folio_done()
All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-8-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
81a0d3e126 ext4: Convert mpage_submit_page() to mpage_submit_folio()
All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-7-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Matthew Wilcox
4da2f6e3c4 ext4: Turn mpage_process_page() into mpage_process_folio()
The page/folio is only used to extract the buffers, so this is a
simple change.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-6-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Matthew Wilcox
bb64c08bff ext4: Convert ext4_finish_bio() to use folios
Prepare ext4 to support large folios in the page writeback path.
Also set the actual error in the mapping, not just -EIO.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-5-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Matthew Wilcox
cd57b77197 ext4: Convert ext4_bio_write_page() to use a folio
Remove several calls to compound_head() and the last caller of
set_page_writeback_keepwrite(), so remove the wrapper too.

Also export bio_add_folio() as this is the first caller from a module.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-4-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Matthew Wilcox
e999a5c5a1 fs: Add FGP_WRITEBEGIN
This particular combination of flags is used by most filesystems
in their ->write_begin method, although it does find use in a
few other places.  Before folios, it warranted its own function
(grab_cache_page_write_begin()), but I think that just having specialised
flags is enough.  It certainly helps the few places that have been
converted from grab_cache_page_write_begin() to __filemap_get_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-2-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
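
For reference, the flag combination in question and a typical caller,
roughly as most ->write_begin implementations end up using it (a sketch;
see include/linux/pagemap.h for the authoritative definition):

  #define FGP_WRITEBEGIN  (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)

  /* Replacing grab_cache_page_write_begin(): */
  folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
                              mapping_gfp_mask(mapping));
  if (IS_ERR(folio))
      return PTR_ERR(folio);
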
Ojaswin Mujoo
361eb69fc9 ext4: Remove the logic to trim inode PAs
Earlier, inode PAs were stored in a linked list. This caused a need to
periodically trim the list down in order to avoid growing it to a very
large size, as this would severely affect performance during list
iteration.

Recent patches changed this list to an rbtree, and since the tree scales
up much better, we no longer need to have the trim functionality, hence
remove it.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/c409addceaa3ade4b40328e28e3b54b2f259689e.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:13 -04:00
Ojaswin Mujoo
3872778664 ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list
Currently, the kernel uses i_prealloc_list to hold all the inode
preallocations. This is known to cause degradation in performance in
workloads which perform large number of sparse writes on a single file.
This is mainly because functions like ext4_mb_normalize_request() and
ext4_mb_use_preallocated() iterate over this complete list, resulting in
slowdowns when large number of PAs are present.

Patch 27bc446e2 partially fixed this by enforcing a limit of 512 for
the inode preallocation list and adding logic to continually trim the
list if it grows above the threshold, however our testing revealed that
a hardcoded value is not suitable for all kinds of workloads.

To optimize this, add an rbtree to the inode and hold the inode
preallocations in this rbtree. This will make iterating over inode PAs
faster and scale much better than a linked list. Additionally, we also
had to remove the LRU logic that was added during trimming of the list
(in ext4_mb_release_context()) as it will add extra overhead in rbtree.
The discards now happen in the lowest-logical-offset-first order.

** Locking notes **

With the introduction of rbtree to maintain inode PAs, we can't use RCU
to walk the tree for searching since it can result in partial traversals
which might miss some nodes (or entire subtrees) while discards happen
in parallel (which happens under a lock).  Hence this patch converts the
ei->i_prealloc_lock spin_lock to rw_lock.

Almost all the codepaths that read/modify the PA rbtrees are protected
by the higher level inode->i_data_sem (except
ext4_mb_discard_group_preallocations() and ext4_clear_inode()). IIUC, the
only place we need lock protection is when one thread is reading
"searching" the PA rbtree (earlier protected under rcu_read_lock()) and
another is "deleting" the PAs in ext4_mb_discard_group_preallocations()
function (which iterates all the PAs using the grp->bb_prealloc_list and
deletes PAs from the tree without taking any inode lock (i_data_sem)).

So, this patch converts all rcu_read_lock/unlock() paths for inode list
PA to use read_lock() and all places where we were using
ei->i_prealloc_lock spinlock will now be using write_lock().

Note that this makes the fast path (searching of the right PA e.g.
ext4_mb_use_preallocated() or ext4_mb_normalize_request()), now use
read_lock() instead of rcu_read_lock/unlock().  This will now also block
due to the slow discard path (ext4_mb_discard_group_preallocations()) which
uses write_lock().

But this is not as bad as it looks. This is because -

1. The slow path only occurs when the normal allocation failed and we
   can say that we are low on disk space.  One can argue this scenario
   won't be much frequent.

2. ext4_mb_discard_group_preallocations(), locks and unlocks the rwlock
   for deleting every individual PA.  This gives enough opportunity for
   the fast path to acquire the read_lock for searching the PA inode
   list.

Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/4137bce8f6948fedd8bae134dabae24acfe699c6.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:13 -04:00
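
A condensed sketch of the locking pattern described above (field names
follow this series but are shown for illustration only):

  /* Fast path, e.g. ext4_mb_use_preallocated(): search under read_lock. */
  read_lock(&ei->i_prealloc_lock);
  for (iter = rb_first(&ei->i_prealloc_node); iter; iter = rb_next(iter)) {
      pa = rb_entry(iter, struct ext4_prealloc_space, pa_node.inode_node);
      /* ... check whether pa satisfies the request ... */
  }
  read_unlock(&ei->i_prealloc_lock);

  /* Slow path, e.g. ext4_mb_discard_group_preallocations(): lock and
   * unlock per PA so that readers can interleave with the discard. */
  write_lock(pa->pa_lock_node.inode_lock);
  rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node);
  write_unlock(pa->pa_lock_node.inode_lock);
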
Ojaswin Mujoo
a8e38fd37c ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union
** Splitting pa->pa_inode_list **

Currently, we use the same pa->pa_inode_list to add a pa to either
the inode preallocation list or the locality group preallocation list.
For better clarity, split this list into a union of 2 list_heads and use
either of them based on the type of pa.

** Splitting pa->pa_obj_lock **

Currently, pa->pa_obj_lock is either assigned &ei->i_prealloc_lock for
inode PAs or lg_prealloc_lock for lg PAs, and is then used to lock the
lists containing these PAs. Make the distinction between the 2 PA types
clear by changing this lock to a union of 2 locks. Explicitly use the
pa_lock_node.inode_lock for inode PAs and pa_lock_node.lg_lock for lg
PAs.

This patch is required so that the locality group preallocation code
remains the same as in upcoming patches we are going to make changes to
inode preallocation code to move from list to rbtree based
implementation. This patch also makes it easier to review the upcoming
patches.

There are no functional changes in this patch.

Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1d7ac0557e998c3fc7eef422b52e4bc67bdef2b0.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:13 -04:00
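
A simplified sketch of the resulting layout (other members omitted; the
name of the list union is illustrative, the lock union name follows the
text above):

  struct ext4_prealloc_space {
      /* ... */
      union {
          struct list_head inode_list;   /* inode PAs */
          struct list_head lg_list;      /* locality group PAs */
      } pa_node;
      union {
          spinlock_t *inode_lock;        /* &ei->i_prealloc_lock */
          spinlock_t *lg_lock;           /* &lg->lg_prealloc_lock */
      } pa_lock_node;
      /* ... */
  };
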
Ojaswin Mujoo
93cdf49f6e ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()
When the length of the best extent found is less than the length of the goal
extent, we need to make sure that the best extent at least covers the start of the
original request. This is done by adjusting the ac_b_ex.fe_logical (logical
start) of the extent.

While doing so, the current logic sometimes results in the best extent's
logical range overflowing the goal extent. Since this best extent is later
added to the inode preallocation list, we have a possibility of introducing
overlapping preallocations. This is discussed in detail here [1].

As per Jan's suggestion, to fix this, replace the existing logic with the
below logic for adjusting best extent as it keeps fragmentation in check
while ensuring logical range of best extent doesn't overflow out of goal
extent:

1. Check if best extent can be kept at end of goal range and still cover
   original start.
2. Else, check if best extent can be kept at start of goal range and still
   cover original start.
3. Else, keep the best extent at start of original request.

Also, add a few extra BUG_ONs that might help catch errors faster.

[1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/f96aca6d415b36d1f90db86c1a8cd7e2e9d7ab0e.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:13 -04:00
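
In pseudo-C form, the three adjustment steps listed above (variable names
are illustrative and cluster/block unit conversions are glossed over):

  /* Goal range is [goal_start, goal_end), the best extent is best_len
   * long, and orig_start is the start of the original request. */
  if (goal_end - best_len <= orig_start) {
      /* 1. Keeping the best extent at the end of the goal range still
       *    covers the original start. */
      best_start = goal_end - best_len;
  } else if (goal_start + best_len > orig_start) {
      /* 2. Keeping the best extent at the start of the goal range still
       *    covers the original start. */
      best_start = goal_start;
  } else {
      /* 3. Fall back to starting the best extent at the original request. */
      best_start = orig_start;
  }
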
Ojaswin Mujoo
0830344c95 ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()
Abstract out the logic of fixing PA overlaps in ext4_mb_normalize_request
to improve readability of code. This also makes it easier to make changes
to the overlap logic in future.

There are no functional changes in this patch

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/9b35f3955a1d7b66bbd713eca1e63026e01f78c1.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Ojaswin Mujoo
7692094ac5 ext4: Move overlap assert logic into a separate function
Abstract out the logic to double check for overlaps in normalize_pa to
a separate function. Since there have been no reports in the past where we
have seen any overlaps that hit this BUG_ON(), in the future we can
consider calling this function under "#ifdef AGGRESSIVE_CHECK" only.

There are no functional changes in this patch

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/35dd5d94fa0b2d1cd2d2947adf8967279c72967d.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Ojaswin Mujoo
bcf4349921 ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()
Change some variable names to be more consistent and
refactor some of the code to make it easier to read.

There are no functional changes in this patch

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8edcab489c06cf861b19d87207d9b0ff7ac7f3c1.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Ojaswin Mujoo
820897258a ext4: Refactor code related to freeing PAs
This patch makes the following changes:

*  Rename ext4_mb_pa_free to ext4_mb_pa_put_free
   to better reflect its purpose

*  Add new ext4_mb_pa_free() which only handles freeing

*  Refactor ext4_mb_pa_callback() to use ext4_mb_pa_free()

There are no functional changes in this patch

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/b273bc9cbf5bd278f641fa5bc6c0cc9e6cb3330c.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Ojaswin Mujoo
e86a718228 ext4: Stop searching if PA doesn't satisfy non-extent file
If we come across a PA that matches the logical offset but is unable to
satisfy a non-extent file due to its physical start being higher than
that supported by non extent files, then simply stop searching for
another PA and break out of loop. This is because, since PAs don't
overlap, we won't be able to find another inode PA which can satisfy the
original request.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/42404ca29bd304ae2c962184c3c32a02e8eefcd0.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Theodore Ts'o
19b8b035a7 ext4: convert some BUG_ON's in mballoc to use WARN_RATELIMITED instead
In cases where we have an obvious way of continuing, let's use
WARN_RATELIMITED() instead of BUG_ON().

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
91a48aaf59 ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request
The result of EXT4_SB(ac->ac_sb) is already stored in sbi at the beginning of
ext4_mb_normalize_request. Use sbi instead of EXT4_SB(ac->ac_sb) to
remove unnecessary pointer dereference.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230311170949.1047958-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
1221b23501 ext4: fix typos in mballoc
pa_plen -> pa_len
pa_start -> pa_pstart

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230311170949.1047958-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
253cacb0de ext4: simplify calculation of blkoff in ext4_mb_new_blocks_simple
We try to allocate a block from the goal in ext4_mb_new_blocks_simple. We
only need to get blkoff in the first group (the goal group) and can set
blkoff to 0 for the remaining groups.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-21-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
46825e9490 ext4: remove comment code ext4_discard_preallocations
Just remove the commented-out code in ext4_discard_preallocations.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-20-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
3a037b1b88 ext4: remove repeat assignment to ac_f_ex
Call trace to assign ac_f_ex:
ext4_mb_use_best_found
	ac->ac_f_ex = ac->ac_b_ex;
	ext4_mb_new_preallocation
		ext4_mb_new_group_pa
			ac->ac_f_ex = ac->ac_b_ex;
		ext4_mb_new_inode_pa
			ac->ac_f_ex = ac->ac_b_ex;

The actually allocated extent is already stored in ac_f_ex in
ext4_mb_use_best_found, so there is no need to assign ac_f_ex again
in ext4_mb_new_group_pa and ext4_mb_new_inode_pa.
Just remove the repeated assignment to ac_f_ex in ext4_mb_new_group_pa
and ext4_mb_new_inode_pa.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-19-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
fb28f9ceec ext4: remove unnecessary goto in ext4_mb_mark_diskspace_used
When ext4_read_block_bitmap fails, we can return PTR_ERR(bitmap_bh) to
remove unnecessary NULL check of bitmap_bh.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-18-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
c7f2bafa3c ext4: remove unnecessary count2 in ext4_free_data_in_buddy
count2 is always 1 in mb_debug, just remove unnecessary count2.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-17-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:12 -04:00
Kemeng Shi
df11909514 ext4: remove unnecessary exit_meta_group_info tag
We goto exit_meta_group_info only to return -ENOMEM. Return -ENOMEM
directly instead of goto to remove this unnecessary tag.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-16-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
78dc9f844f ext4: use best found when complex scan of group finishs
If any bex which meets bex->fe_len >= gex->fe_len is found, then it will
always be used when the complex scan of the group that bex belongs to
finishes.
So there will not be any lock-unlock period.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-15-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
32c0869370 ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits
The only call trace of ext4_mb_check_limits is as follows:
ext4_mb_complex_scan_group
	ext4_mb_measure_extent
		ext4_mb_check_limits(ac, e4b, 0);
	ext4_mb_check_limits(ac, e4b, 1);

If the first ac->ac_found > sbi->s_mb_max_to_scan check in
ext4_mb_check_limits is met, we will set ac_status to
AC_STATUS_BREAK and call ext4_mb_try_best_found to try to use
ac->ac_b_ex.
If ext4_mb_try_best_found succeeds, then block allocation finishes and
the removed ac->ac_found > sbi->s_mb_min_to_scan check is never reached.
If ext4_mb_try_best_found fails, then we set EXT4_MB_HINT_FIRST and
reset ac->ac_b_ex to retry block allocation. We will use any found
free extent in ext4_mb_measure_extent before reaching the removed
ac->ac_found > sbi->s_mb_min_to_scan check.
In summary, the removed ac->ac_found > sbi->s_mb_min_to_scan check is
not reachable and we can remove that dead check.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-14-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
976620bd26 ext4: remove dead check in mb_buddy_mark_free
We always adjust first to even number and adjust last to odd number, so
first == last will never happen. Remove this dead check.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-13-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
aaae558dae ext4: remove unnecessary check in ext4_mb_new_blocks
1. Remove the unnecessary ac check:
We always go to the out tag before ac is successfully allocated, so we can
move the out tag after the freeing of ac and remove the NULL check of ac.

2. Remove the unnecessary *errp check:
We always go to the errout tag if *errp is non-zero, so we can move the
errout tag into the error handling taken when *errp is non-zero.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-12-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
285164b801 ext4: remove unnecessary e4b->bd_buddy_page check in ext4_mb_load_buddy_gfp
e4b->bd_buddy_page is only set if we initialize ext4_buddy successfully. So
e4b->bd_buddy_page is always NULL in the error handling branch. Just remove
the dead check.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-11-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
139f46d3b5 ext4: Remove unnecessary release when memory allocation failed in ext4_mb_init_cache
If allocating the array of buffer_heads fails, there are no resources that
need to be freed and we can simply return the error.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-10-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
85b67ffb7d ext4: remove unused return value of ext4_mb_try_best_found and ext4_mb_free_metadata
The return values of the static functions ext4_mb_try_best_found and
ext4_mb_free_metadata are not used. Just remove the unused return values.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
1b5c9d3494 ext4: add missed brelse in ext4_free_blocks_simple
Release the bitmap buffer_head we got if an error occurs.
Besides, this patch removes an unused assignment to err.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
36cb0f52ae ext4: protect pa->pa_free in ext4_discard_allocated_blocks
If ext4_mb_mark_diskspace_used fails in ext4_mb_new_blocks, we may
discard a pa that is already in the list. Protect pa with pa_lock to avoid
the race.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
1afdc58894 ext4: correct start of used group pa for debug in ext4_mb_use_group_pa
As we don't correct pa_lstart here, there is no need to subtract the
consumed len from pa_lstart.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
abc075d4a5 ext4: correct calculation of s_mb_preallocated
We will add pa_free to s_mb_preallocated when new ext4_prealloc_space is
created. In ext4_mb_new_inode_pa, we will call ext4_mb_use_inode_pa
before adding pa_free to s_mb_preallocated. However, ext4_mb_use_inode_pa
will consume pa_free for the block allocation which triggered the creation
of the ext4_prealloc_space. Add pa_free to s_mb_preallocated before
ext4_mb_use_inode_pa to correct the calculation of s_mb_preallocated.
There is no such problem in ext4_mb_new_group_pa as pa_free of group pa
is consumed in ext4_mb_release_context instead of ext4_mb_use_group_pa.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
22fab98402 ext4: get correct ext4_group_info in ext4_mb_prefetch_fini
We currently get the ext4_group_desc for group + 1 but the ext4_group_info
for group when checking whether we need to initialize the ext4_group_info
for the group. Just get the ext4_group_desc for group for the
ext4_group_info initialization check.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:11 -04:00
Kemeng Shi
01e4ca2945 ext4: allow to find by goal if EXT4_MB_HINT_GOAL_ONLY is set
If EXT4_MB_HINT_GOAL_ONLY is set, ext4_mb_regular_allocator will only
allocate blocks from ext4_mb_find_by_goal. Allow ext4_mb_find_by_goal to
search by the goal when EXT4_MB_HINT_GOAL_ONLY is set; otherwise an
allocation with EXT4_MB_HINT_GOAL_ONLY set will always fail.

EXT4_MB_HINT_GOAL_ONLY is not used at all yet, so the problem has not
been hit so far.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:10 -04:00
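
A sketch of the allowed-flags check this implies at the top of
ext4_mb_find_by_goal() (illustrative of the idea, not a verbatim diff):

  if (!(ac->ac_flags & (EXT4_MB_HINT_TRY_GOAL | EXT4_MB_HINT_GOAL_ONLY)))
      return 0;   /* caller did not ask for a goal-based search */
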
Kemeng Shi
b07ffe6927 ext4: set goal start correctly in ext4_mb_normalize_request
We need to set ac_g_ex to tell ext4_mb_find_by_goal which goal start to
use. Set ac_g_ex instead of ac_f_ex in ext4_mb_normalize_request.
Besides, we should ensure that the goal start is in the range
[first_data_block, blocks_count), as ext4_mb_initialize_context does.

[ Added a check to make sure size is less than ar->pright; otherwise
  we could end up passing an underflowed value of ar->pright - size to
  ext4_get_group_no_and_offset(), which will trigger a BUG_ON later on.
  - TYT ]

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230303172120.3800725-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 01:13:03 -04:00
Christoph Hellwig
66dabbb65d mm: return an ERR_PTR from __filemap_get_folio
Instead of returning NULL for all errors, distinguish between:

 - no entry found and not asked to allocate (-ENOENT)
 - failed to allocate memory (-ENOMEM)
 - would block (-EAGAIN)

so that callers don't have to guess the error based on the passed in
flags.

Also pass the error through to the direct callers: filemap_get_folio,
filemap_lock_folio, filemap_grab_folio and filemap_get_incore_folio.

[hch@lst.de: fix null-pointer deref]
  Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de
  Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004
Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2]
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:42 -07:00
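
A minimal caller sketch after this change:

  folio = __filemap_get_folio(mapping, index, FGP_LOCK | FGP_CREAT,
                              mapping_gfp_mask(mapping));
  if (IS_ERR(folio))
      return PTR_ERR(folio);   /* -ENOENT, -ENOMEM or -EAGAIN */
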
Uladzislau Rezki (Sony)
10e4f310a8 ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
The kfree_rcu() and kvfree_rcu() macros' single-argument forms are
deprecated.  Therefore switch to the new kfree_rcu_mightsleep() and
kvfree_rcu_mightsleep() variants. The goal is to avoid accidental use
of the single-argument forms, which can introduce functionality bugs in
atomic contexts and latency bugs in non-atomic contexts.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Lukas Czerner <lczerner@redhat.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05 13:48:04 +00:00
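
In generic terms (ptr, obj and rcu_head_member are placeholder names), the
change looks like:

  /* Deprecated: single-argument form, may sleep but does not say so. */
  kfree_rcu(ptr);

  /* New spelling when the object has no embedded rcu_head: */
  kfree_rcu_mightsleep(ptr);

  /* Objects embedding an rcu_head keep the two-argument form: */
  kfree_rcu(obj, rcu_head_member);
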
Jens Axboe
d8aeb44a9a fs: add FMODE_DIO_PARALLEL_WRITE flag
Some filesystems support multiple threads writing to the same file with
O_DIRECT without requiring exclusive access to it. io_uring can use this
hint to avoid serializing dio writes to this inode, instead allowing them
to run in parallel.

XFS and ext4 both fall into this category, so set the flag for both of
them.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-03 07:14:20 -06:00
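
For illustration, a filesystem opts in from its ->open method; the
surrounding code is trimmed and the exact ext4/XFS call sites may differ
slightly:

  static int ext4_file_open(struct inode *inode, struct file *filp)
  {
      /* ... existing checks ... */
      filp->f_mode |= FMODE_NOWAIT | FMODE_DIO_PARALLEL_WRITE;
      return dquot_file_open(inode, filp);
  }
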
Kemeng Shi
1df9bde48f ext4: remove unused group parameter in ext4_block_bitmap_csum_set
Remove unused group parameter in ext4_block_bitmap_csum_set. After this,
group parameter in ext4_set_bitmap_checksums is also not used, just
remove it too.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
82483dfe17 ext4: remove unused group parameter in ext4_block_bitmap_csum_verify
Remove unused group parameter in ext4_block_bitmap_csum_verify.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
4fd873c817 ext4: remove unused group parameter in ext4_inode_bitmap_csum_set
Remove unused group parameter in ext4_inode_bitmap_csum_set.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
b83acc7771 ext4: remove unused group parameter in ext4_inode_bitmap_csum_verify
Remove unused group parameter in ext4_inode_bitmap_csum_verify.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230221203027.2359920-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
68e294dccc ext4: improve inode table blocks counting in ext4_num_overhead_clusters
As inode table blocks are contiguous, inode table blocks inside the
block_group can be represented as range [itbl_cluster_start,
itbl_cluster_last]. Then we can simply account inode table clusters and
check for cluster overlap with [itbl_cluster_start, itbl_cluster_last]
instead of traversing each block of the inode table.

By the way, this patch fixes a code style problem in the comment for
ext4_num_overhead_clusters.

[ Merged fix-up patch which fixed potentially access to an
  uninitialzied stack variable. --TYT ]

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-8-shikemeng@huaweicloud.com
Link: https://lore.kernel.org/r/202303171446.eLEhZzAu-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
cefa74d004 ext4: stop trying to verify just initialized bitmap in ext4_read_block_bitmap_nowait
In the case where we initialize a bitmap bh, we set the bitmap bh as
verified. We can return immediately instead of doing goto verify, removing
the unnecessary work of trying to verify the bitmap bh.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
f567ea7843 ext4: remove stale comment in ext4_init_block_bitmap
Commit bdfb6ff4a2 ("ext4: mark group corrupt on group descriptor
checksum") added flag to indicate corruption of group instead of
marking all blocks used. Just remove stale comment.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
ad3f09be6c ext4: remove unnecessary check in ext4_bg_num_gdb_nometa
We only call ext4_bg_num_gdb_nometa if there is no meta_bg feature or the
group does not reach first_meta_bg, so the group cannot reside in a meta
block group. Such a group carries the global gdt whenever it has a
superblock backup. Just remove the confusing branch that checks whether
the meta_bg feature exists.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
a38627f143 ext4: call ext4_bg_num_gdb_[no]meta directly in ext4_num_base_meta_clusters
ext4_num_base_meta_clusters is already aware of the meta_bg feature and
tests whether block_group is inside the real meta block groups before
calling ext4_bg_num_gdb. Then ext4_bg_num_gdb checks again whether the
block group is inside a real meta block group to decide whether
ext4_bg_num_gdb_meta or ext4_bg_num_gdb_nometa is needed.
Call ext4_bg_num_gdb_meta or ext4_bg_num_gdb_nometa directly after we
check whether block_group is inside a meta block group in
ext4_num_base_meta_clusters, removing the redundant meta block group
check in ext4_bg_num_gdb.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
3d61ef10f5 ext4: correct validation check of inode table in ext4_valid_block_bitmap
1. The last valid cluster of the inode table is EXT4_B2C(sbi, offset +
sbi->s_itb_per_group - 1). We should make sure the last valid cluster is <
max_bit, i.e., EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) is <
max_bit rather than EXT4_B2C(sbi, offset + sbi->s_itb_per_group) is
< max_bit.

2. The bit search length should be the last valid cluster plus 1, i.e.,
EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) + 1 rather than
EXT4_B2C(sbi, offset + sbi->s_itb_per_group).

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:08 -04:00
Kemeng Shi
b5aa06bfe9 ext4: properly handle error of ext4_init_block_bitmap in ext4_read_block_bitmap_nowait
We mark the bitmap buffer_head as successfully initialized even if an error
occurs in ext4_init_block_bitmap. Although we will return the error, the
next ext4_read_block_bitmap_nowait will hand back an invalid bitmap
buffer_head which is marked buffer_verified but was not actually
successfully initialized by the previous ext4_read_block_bitmap_nowait.
Fix this by only marking the buffer_head as successfully initialized if
ext4_init_block_bitmap succeeds.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230221115919.1918161-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:07 -04:00
Theodore Ts'o
98ccceee3e ext4: fix comment: "start start" -> "start" in mpage_prepare_extent_to_map()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:07 -04:00
Jan Kara
e6c28a26b7 ext4: Fix warnings when freezing filesystem with journaled data
Test generic/390 in data=journal mode often triggers a warning that
ext4_do_writepages() tries to start a transaction on frozen filesystem.
This happens because although all dirty data is properly written, jbd2
checkpointing code writes data through submit_bh() and as a result only
buffer dirty bits are cleared but page dirty bits stay set. Later when
the filesystem is frozen, writeback code comes, tries to write
supposedly dirty pages and the warning triggers. Fix the problem by
calling sync_filesystem() once more after flushing the whole journal to
clear stray page dirty bits.

[ Applied fixup patches to address crashes when running data=journal
  tests; see links for more details -- TYT ]

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230308142528.12384-1-jack@suse.cz
Reported-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/all/20230319183617.GA896@sol.localdomain
Link: https://lore.kernel.org/r/20230323145404.21381-1-jack@suse.cz
Link: https://lore.kernel.org/r/20230323145404.21381-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 22:53:00 -04:00
Jan Kara
3f079114bf ext4: Convert data=journal writeback to use ext4_writepages()
Add support for writeback of journalled data directly into
ext4_writepages() instead of offloading it to write_cache_pages(). This
actually significantly simplifies the code and reduces code duplication.
For checkpointing of committed data we can use ext4_writepages()
right away, the same way as writeback of ordered data uses it on
transaction commit. For journalling of dirty mapped pages, we need to
add a special case to mpage_prepare_extent_to_map() to add all page
buffers to the journal.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-8-tytso@mit.edu
2023-03-23 10:06:08 -04:00
Jan Kara
d8be7607de ext4: Move mpage_page_done() calls after error handling
In case mpage_submit_page() returns error, it doesn't really matter
whether we call mpage_page_done() and then return error or whether we
return directly because in that case page cleanup will be done by
mpage_release_unused_pages() instead. Logically, it makes more sense to
leave the cleanup to mpage_release_unused_pages() because we didn't
succeed in writing the page. So move mpage_page_done() calls after the
error handling.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-7-tytso@mit.edu
2023-03-23 10:06:08 -04:00
Jan Kara
eaf2ca10ca ext4: Move page unlocking out of mpage_submit_page()
Move page unlocking during page writeback out of mpage_submit_page()
into the callers. This will allow writeback in data=journal mode to keep
the page locked for a bit longer. Since page unlocking is tightly
connected to the increment of mpd->first_page (as that determines cleanup of
locked but unwritten pages), move page unlocking as well as
mpd->first_page handling into a helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-6-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
f1496362e9 ext4: Don't unlock page in ext4_bio_write_page()
Do not unlock the written page in ext4_bio_write_page(). Instead leave
the page locked and unlock it in the callers. We'll need to keep the
page locked for data=journal writeback for a bit longer.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-5-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
3f5d30636d ext4: Mark page for delayed dirtying only if it is pinned
In data=journal mode, page should be dirtied only when it has buffers
for checkpoint or it is writeably mapped. In the first case, we don't
need to do anything special. In the second case, page was already added
to the journal by ext4_page_mkwrite() and since transaction commit
writeprotects mapped pages again, page should be writeable (and thus
dirtied) only while it is part of the running transaction. So nothing
needs to be done either. The only special case is when someone pins the
page and uses this pin for modifying page data. So recognize this
special case and only then mark the page as having data that needs
adding to the journal.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-4-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
c8e8e16dbb ext4: Use nr_to_write directly in mpage_prepare_extent_to_map()
When looking up extent of pages to map in mpage_prepare_extent_to_map()
we count how many pages we still need to find in a copy of
wbc->nr_to_write counter. With more complex page handling for
data=journal mode, it will be easier to use wbc->nr_to_write directly so
that we don't forget to carry over changes back to nr_to_write counter.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-3-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
9462f770ed ext4: Update stale comment about write constraints
The comment above do_journal_get_write_access() is very stale. Most of
it just does not refer to what the function does today or how jbd2
works. The bit about transaction handling during write(2) is still
correct so just update the function names in that part and move the
comment to a more appropriate place.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-2-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Theodore Ts'o
70e42feab2 ext4: fix possible double unlock when moving a directory
Fixes: 0813299c58 ("ext4: Fix possible corruption when moving a directory")
Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-17 21:53:52 -04:00
Linus Torvalds
40d0c0901e Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
 the merge window discovered by a combination of syzbot and lockdep.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQNVwIACgkQ8vlZVpUN
 gaMwmgf/ZAasXZEMV0zaQZa8zP4KvMKZjWe6azkcJg4sb/HG9Q7JzeJDCurhhWUj
 8+QnyUcuKTyWKYWjGf0f5CZaYEM5AZYij41UJzu2qMkz5hVXSqBVuY8KywxuiJv5
 kfuIvQh0Onv0Yrg2qAc52/kZkq1lu2sl/F5ertBWjdpTUXdBUdrCxkUk+1BgQWAj
 vNwi1/+gNuX7RxMboHqYmwXFP39vECd+wteNdsiK1hR8bLqL68duLLq8xQdHt4gS
 sbVmJKR4j2Giw4ZnlYi9RiwKIO0beqocanp+cfOPulyj5mTM8X1lr0uvaLZgx2AF
 lqrS3/5ksp45cRT70qCIz8je70hTSg==
 =nN3T
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a
  potential deadlock during directory renames that was introduced during
  the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-12 08:55:55 -07:00
Zhihao Cheng
f5361da1e6 ext4: zero i_disksize when initializing the bootloader inode
If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT ioctl will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024)
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initializing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
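
A sketch of the fix in the EXT4_IOC_SWAP_BOOT path (inode_bl stands for the
boot loader inode; simplified):

  /* Keep i_size and i_disksize consistent when (re)initializing a
   * never-before-used boot loader inode. */
  i_size_write(inode_bl, 0);
  EXT4_I(inode_bl)->i_disksize = 0;
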
Ye Bin
f57886ca16 ext4: make sure fs error flag setted before clear journal error
Now, the journal error number may be cleared even though ext4_commit_super()
failed. This may lead to the error flag being lost, and then fsck will skip
checking the file system deeply.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Ye Bin
eee00237fa ext4: commit super block if fs record error when journal record without error
Now, 'es->s_state' may be overwritten by journal recovery, and the journal
errno may not be recorded in the journal sb because of an IO error.
ext4_update_super() only updates the error information when
'sbi->s_add_error_count' is larger than zero, so the 'EXT4_ERROR_FS' flag
may be lost.
To solve the above issue, just restore the 'es->s_state' error flag after
journal replay, like the error info.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Theodore Ts'o
62913ae96d ext4, jbd2: add an optimized bmap for the journal inode
The generic bmap() function exported by the VFS takes locks and does
checks that are not necessary for the journal inode.  So allow the
file system to set a journal-optimized bmap function in
journal->j_bmap.

Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
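
A sketch of the kind of journal-optimized bmap callback this enables
(simplified error handling; the callback signature and the real ext4
helper may differ in detail):

  static int ext4_journal_bmap(journal_t *journal, sector_t *block)
  {
      struct ext4_map_blocks map;
      int ret;

      map.m_lblk = *block;
      map.m_len = 1;
      ret = ext4_map_blocks(NULL, journal->j_inode, &map, 0);
      if (ret <= 0)
          return ret < 0 ? ret : -EFSCORRUPTED;
      *block = map.m_pblk;
      return 0;
  }

  /* installed by the filesystem when the journal is set up: */
  journal->j_bmap = ext4_journal_bmap;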