/*
 * Copyright (c) 2002, 2007 Red Hat, Inc. All rights reserved.
 *
 * This software may be freely redistributed under the terms of the
 * GNU General Public License.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 * Authors: David Woodhouse <dwmw2@infradead.org>
 *          David Howells <dhowells@redhat.com>
 *
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/circ_buf.h>
Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
#include <linux/sched.h>
#include "internal.h"
afs: Fix mmap coherency vs 3rd-party changes
Fix the coherency management of mmap'd data such that 3rd-party changes
become visible as soon as possible after the callback notification is
delivered by the fileserver. This is done by the following means:
(1) When we break a callback on a vnode specified by the CB.CallBack call
from the server, we queue a work item (vnode->cb_work) to go and
clobber all the PTEs mapping to that inode.
This causes the CPU to trip through the ->map_pages() and
->page_mkwrite() handlers if userspace attempts to access the page(s)
again.
(Ideally, this would be done in the service handler for CB.CallBack,
but the server is waiting for our reply before considering, and we
have a list of vnodes, all of which need breaking - and the process of
getting the mmap_lock and stripping the PTEs on all CPUs could be
quite slow.)
(2) Call afs_validate() from the ->map_pages() handler to check to see if
the file has changed and to get a new callback promise from the
server.
Also handle the fileserver telling us that it's dropping all callbacks,
possibly after it's been restarted by sending us a CB.InitCallBackState*
call by the following means:
(3) Maintain a per-cell list of afs files that are currently mmap'd
(cell->fs_open_mmaps).
(4) Add a work item to each server that is invoked if there are any open
mmaps when CB.InitCallBackState happens. This work item goes through
the aforementioned list and invokes the vnode->cb_work work item for
each one that is currently using this server.
This causes the PTEs to be cleared, causing ->map_pages() or
->page_mkwrite() to be called again, thereby calling afs_validate()
again.
I've chosen to simply strip the PTEs at the point of notification reception
rather than invalidate all the pages as well because (a) it's faster, (b)
we may get a notification for other reasons than the data being altered (in
which case we don't want to clobber the pagecache) and (c) we need to ask
the server to find out - and I don't want to wait for the reply before
holding up userspace.
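The approach described above (strip the mappings on notification, revalidate lazily on the next fault) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (model_vnode, model_server, model_break_callback, model_fault, model_read) is hypothetical and invented for illustration, and a plain boolean stands in for the queued PTE-clearing work item. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_server {
	unsigned int data_version;	/* authoritative version on the server */
};

struct model_vnode {
	bool cb_promised;		/* server has promised to notify us */
	bool ptes_mapped;		/* userspace can read the cache directly */
	unsigned int cb_break;		/* times the promise has been broken */
	unsigned int data_version;	/* version held in the local cache */
};

/* Notification path: cheap, never talks to the server. */
static void model_break_callback(struct model_vnode *v)
{
	assert(v != NULL);
	if (v->cb_promised) {
		v->cb_promised = false;
		v->cb_break++;
		v->ptes_mapped = false;	/* stands in for the queued PTE zap */
	}
}

/* Fault path: revalidate against the server only if the promise is gone. */
static unsigned int model_fault(struct model_vnode *v,
				const struct model_server *s)
{
	assert(v != NULL && s != NULL);
	if (!v->cb_promised) {
		v->data_version = s->data_version;	/* refetch status */
		v->cb_promised = true;			/* obtain a new promise */
	}
	v->ptes_mapped = true;
	return v->data_version;
}

/* A read through the mapping only faults if the PTEs were stripped. */
static unsigned int model_read(struct model_vnode *v,
			       const struct model_server *s)
{
	if (!v->ptes_mapped)
		return model_fault(v, s);
	return v->data_version;
}
```

In this model, a reader that still has the page mapped keeps seeing the old version until model_break_callback() runs. The next model_read() then goes through model_fault() and picks up the server's version, which mirrors why the notification path only needs to clear the mappings and can leave the server round trip to the fault handler.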
This was tested using the attached test program:
	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	int main(int argc, char *argv[])
	{
		size_t size = getpagesize();
		unsigned char *p;
		bool mod = (argc == 3);
		int fd;
		if (argc != 2 && argc != 3) {
			fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]);
			exit(2);
		}
		fd = open(argv[1], mod ? O_RDWR : O_RDONLY);
		if (fd < 0) {
			perror(argv[1]);
			exit(1);
		}
		p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ,
			 MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (;;) {
			if (mod) {
				p[0]++;
				msync(p, size, MS_ASYNC);
				fsync(fd);
			}
			printf("%02x", p[0]);
			fflush(stdout);
			sleep(1);
		}
	}
It runs in two modes: in one mode, it mmaps a file, then sits in a loop
reading the first byte, printing it and sleeping for a second; in the
second mode it mmaps a file, then sits in a loop incrementing the first
byte and flushing, then printing and sleeping.
Two instances of this program can be run on different machines, one doing
the reading and one doing the writing. The reader should see the changes
made by the writer, but without this patch it doesn't, because validity
checking is done lazily, only on entry to the filesystem.
Testing the InitCallBackState change is more complicated. The server has
to be taken offline, the saved callback state file removed and then the
server restarted whilst the reading-mode program continues to run. The
client machine then has to poke the server to trigger the InitCallBackState
call.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/

/*
 * Handle invalidation of an mmap'd file.  We invalidate all the PTEs referring
 * to the pages in this file's pagecache, forcing the kernel to go through
 * ->fault() or ->page_mkwrite() - at which point we can handle invalidation
 * more fully.
 */
void afs_invalidate_mmap_work(struct work_struct *work)
{
	struct afs_vnode *vnode = container_of(work, struct afs_vnode, cb_work);

	unmap_mapping_pages(vnode->vfs_inode.i_mapping, 0, 0, false);
}

void afs_server_init_callback_work(struct work_struct *work)
{
	struct afs_server *server = container_of(work, struct afs_server, initcb_work);
	struct afs_vnode *vnode;
	struct afs_cell *cell = server->cell;

	down_read(&cell->fs_open_mmaps_lock);

	list_for_each_entry(vnode, &cell->fs_open_mmaps, cb_mmap_link) {
		if (vnode->cb_server == server) {
			clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
			queue_work(system_unbound_wq, &vnode->cb_work);
		}
	}

	up_read(&cell->fs_open_mmaps_lock);
}

/*
 * Allow the fileserver to request callback state (re-)initialisation.
 * Unfortunately, UUIDs are not guaranteed unique.
 */
void afs_init_callback_state(struct afs_server *server)
{
	rcu_read_lock();
	do {
		server->cb_s_break++;
		atomic_inc(&server->cell->fs_s_break);
		if (!list_empty(&server->cell->fs_open_mmaps))
			queue_work(system_unbound_wq, &server->initcb_work);

	} while ((server = rcu_dereference(server->uuid_next)));
	rcu_read_unlock();
}

/*
 * actually break a callback
 */
void __afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reason)
{
	_enter("");

	clear_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
	if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) {
		vnode->cb_break++;
		vnode->cb_v_break = vnode->volume->cb_v_break;
		afs_clear_permits(vnode);

		if (vnode->lock_state == AFS_VNODE_LOCK_WAITING_FOR_CB)
			afs_lock_may_be_available(vnode);

		if (reason != afs_cb_break_for_deleted &&
		    vnode->status.type == AFS_FTYPE_FILE &&
		    atomic_read(&vnode->cb_nr_mmap))
			queue_work(system_unbound_wq, &vnode->cb_work);

		trace_afs_cb_break(&vnode->fid, vnode->cb_break, reason, true);
	} else {
		trace_afs_cb_break(&vnode->fid, vnode->cb_break, reason, false);
	}
}

void afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reason)
{
	write_seqlock(&vnode->cb_lock);
	__afs_break_callback(vnode, reason);
	write_sequnlock(&vnode->cb_lock);
}

/*
 * Look up a volume by volume ID under RCU conditions.
 */
static struct afs_volume *afs_lookup_volume_rcu(struct afs_cell *cell,
						afs_volid_t vid)
{
	struct afs_volume *volume = NULL;
	struct rb_node *p;
	int seq = 0;

	do {
		/* Unfortunately, rbtree walking doesn't give reliable results
		 * under just the RCU read lock, so we have to check for
		 * changes.
		 */
		read_seqbegin_or_lock(&cell->volume_lock, &seq);

		p = rcu_dereference_raw(cell->volumes.rb_node);
		while (p) {
			volume = rb_entry(p, struct afs_volume, cell_node);

			if (volume->vid < vid)
				p = rcu_dereference_raw(p->rb_left);
			else if (volume->vid > vid)
				p = rcu_dereference_raw(p->rb_right);
			else
				break;
			volume = NULL;
		}

	} while (need_seqretry(&cell->volume_lock, seq));

	done_seqretry(&cell->volume_lock, seq);
	return volume;
}

/*
 * allow the fileserver to explicitly break one callback
 * - happens when
 *   - the backing file is changed
 *   - a lock is released
 */
static void afs_break_one_callback(struct afs_volume *volume,
				   struct afs_fid *fid)
{
	struct super_block *sb;
	struct afs_vnode *vnode;
	struct inode *inode;

	if (fid->vnode == 0 && fid->unique == 0) {
		/* The callback break applies to an entire volume. */
		write_lock(&volume->cb_v_break_lock);
		volume->cb_v_break++;
		trace_afs_cb_break(fid, volume->cb_v_break,
				   afs_cb_break_for_volume_callback, false);
		write_unlock(&volume->cb_v_break_lock);
		return;
	}

	/* See if we can find a matching inode - even an I_NEW inode needs to
	 * be marked as it can have its callback broken before we finish
	 * setting up the local inode.
	 */
	sb = rcu_dereference(volume->sb);
	if (!sb)
		return;

	inode = find_inode_rcu(sb, fid->vnode, afs_ilookup5_test_by_fid, fid);
	if (inode) {
		vnode = AFS_FS_I(inode);
		afs_break_callback(vnode, afs_cb_break_for_callback);
	} else {
		trace_afs_cb_miss(fid, afs_cb_break_for_callback);
	}
}

static void afs_break_some_callbacks(struct afs_server *server,
				     struct afs_callback_break *cbb,
				     size_t *_count)
{
	struct afs_callback_break *residue = cbb;
	struct afs_volume *volume;
	afs_volid_t vid = cbb->fid.vid;
	size_t i;

	volume = afs_lookup_volume_rcu(server->cell, vid);

	/* TODO: Find all matching volumes if we couldn't match the server and
	 * break them anyway.
	 */

	for (i = *_count; i > 0; cbb++, i--) {
		if (cbb->fid.vid == vid) {
			_debug("- Fid { vl=%08llx n=%llu u=%u }",
			       cbb->fid.vid,
			       cbb->fid.vnode,
			       cbb->fid.unique);
			--*_count;
			if (volume)
				afs_break_one_callback(volume, &cbb->fid);
		} else {
			*residue++ = *cbb;
		}
	}
}

/*
 * allow the fileserver to break callback promises
 */
void afs_break_callbacks(struct afs_server *server, size_t count,
			 struct afs_callback_break *callbacks)
{
	_enter("%p,%zu,", server, count);

	ASSERT(server != NULL);

	rcu_read_lock();

	while (count > 0)
		afs_break_some_callbacks(server, callbacks, &count);

	rcu_read_unlock();
	return;
}