IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Users complained about OOM errors during fork without triggering
compaction. This can be fixed by modifying the flags used in
mas_expected_entries() so that the compaction will be triggered in low
memory situations. Since mas_expected_entries() is only used during fork,
the extra argument does not need to be passed through.
Additionally, the two test_maple_tree test cases and one benchmark test
were altered to use the correct locking type so that allocations would not
trigger sleeping and thus fail. Testing was completed with lockdep atomic
sleep detection.
The additional locking change requires rwsem support additions to the
tools/ directory through the use of pthreads pthread_rwlock_t. With this
change test_maple_tree works in userspace, as a module, and in-kernel.
Users may notice that the system gave up early on attempting to start new
processes instead of attempting to reclaim memory.
Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4
Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <jason.sim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits. This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.
Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last. This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.
Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone. Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).
This solution is to add two states to track what happened with the
previous iterator action. This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).
Tests are also added and updated accordingly.
Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
Fixes: 39193685d5 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Avoid setting the variables until necessary, and actually use the
variables where applicable. Introducing a variable for the slots array
avoids spanning multiple lines.
Add the missing argument to the documentation.
Use the node type when setting the metadata instead of blindly assuming
the type.
Finally, add a trace point to the function for successful store.
Link: https://lkml.kernel.org/r/20230819004356.1454718-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reorder the operations for split and spanning stores so that new data is
placed in the tree prior to marking the old data as dead. This will limit
re-walks on dead data to just once instead of a retry loop.
The order of operations is as follows: Create the new data, put the new
data in place, mark the top node of the old data as dead.
Then repair parent links in the reused nodes through all levels of the
tree, following the new nodes downwards. Finally walk the top dead node
looking for nodes that are no longer used, or subtrees that should be
destroyed (marked dead throughout then freed), follow the partially used
nodes downwards to discover other dead nodes and subtrees.
Link: https://lkml.kernel.org/r/20230804165951.2661157-7-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replacing nodes may cause a live lock-up if CPU resources are saturated by
write operations on the tree by continuously retrying on dead nodes. To
avoid the continuous retry scenario, ensure the new node is inserted into
the tree prior to marking the old data as dead. This will define a window
where old and new data is swapped.
When reusing lower level nodes, ensure the parent pointer is updated after
the parent is marked dead. This ensures that the child is still reachable
from the top of the tree, but walking up to a dead node will result in a
single retry that will start a fresh walk from the top down through the
new node.
Link: https://lkml.kernel.org/r/20230804165951.2661157-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "maple_tree: Change replacement strategy".
The maple tree marks nodes dead as soon as they are going to be replaced.
This could be problematic when used in the RCU context since the writer
may be starved of CPU time by the readers. This patch set addresses the
issue by switching the data replacement strategy to one that will only
mark data as dead once the new data is available.
This series changes the ordering of the node replacement so that the new
data is live before the old data is marked 'dead'. When readers hit
'dead' nodes, they will restart from the top of the tree and end up in the
new data.
In more complex scenarios, the replacement strategy means a subtree is
built and graphed into the tree leaving some nodes to point to the old
parent. The view of tasks into the old data will either remain with the
old data, or see the new data once the old data is marked 'dead'.
Iterators will see the 'dead' node and restart on their own and switch to
the new data. There is no risk of the reader seeing old data in these
cases.
The 'dead' subtree of data is then fully marked dead, but reused nodes
will still point to the dead nodes until the parent pointer is updated.
Walking up to a 'dead' node will cause a re-walk from the top of the tree
and enter the new data area where old data is not reachable.
Once the parent pointers are fully up to date in the active data, the
'dead' subtree is iterated to collect entirely 'dead' subtrees, and dead
nodes (nodes that partially contained reused data).
This patch (of 6):
When dumping the tree, honour formatting request to output hex for the
maple node type arange64.
Link: https://lkml.kernel.org/r/20230804165951.2661157-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230804165951.2661157-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the new range can be completely covered by the original last range
without touching the boundaries on both sides, two new entries can be
appended to the end as a fast path. We update the original last pivot at
the end, and the newly appended two entries will not be accessed before
this, so it is also safe in RCU mode.
This is useful for sequential insertion, which is what we do in
dup_mmap(). Enabling BENCH_FORK in test_maple_tree and just running
bench_forking() gives the following time-consuming numbers:
before: after:
17,874.83 msec 15,738.38 msec
It shows about a 12% performance improvement for duplicating VMAs.
Link: https://lkml.kernel.org/r/20230628073657.75314-4-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The documentation of mt_next() claims that it starts the search at the
provided index. That's incorrect as it starts the search after the
provided index.
The documentation of mt_find() is slightly confusing. "Handles locking"
is not really helpful as it does not explain how the "locking" works.
Also the documentation of index talks about a range, while in reality the
index is updated on a succesful search to the index of the found entry
plus one.
Fix similar issues for mt_find_after() and mt_prev().
Reword the confusing "Note: Will not return the zero entry." comment on
mt_for_each() and document @__index correctly.
Link: https://lkml.kernel.org/r/87ttw2n556.ffs@tglx
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>