s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
/*
Samba Unix / Linux SMB client library
net serverid commands
Copyright ( C ) Volker Lendecke 2010
This program is free software ; you can redistribute it and / or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation ; either version 3 of the License , or
( at your option ) any later version .
This program is distributed in the hope that it will be useful ,
but WITHOUT ANY WARRANTY ; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE . See the
GNU General Public License for more details .
You should have received a copy of the GNU General Public License
along with this program . If not , see < http : //www.gnu.org/licenses/>.
*/
# include "includes.h"
# include "utils/net.h"
2010-08-18 20:59:23 +04:00
# include "dbwrap.h"
2011-02-25 01:05:57 +03:00
# include "serverid.h"
2011-02-25 01:14:15 +03:00
# include "session.h"
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
static int net_serverid_list_fn ( const struct server_id * id ,
uint32_t msg_flags , void * priv )
{
2011-06-08 08:05:55 +04:00
char * str = server_id_str ( talloc_tos ( ) , id ) ;
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
d_printf ( " %s %llu 0x%x \n " , str , ( unsigned long long ) id - > unique_id ,
( unsigned int ) msg_flags ) ;
TALLOC_FREE ( str ) ;
return 0 ;
}
static int net_serverid_list ( struct net_context * c , int argc ,
const char * * argv )
{
d_printf ( " pid unique_id msg_flags \n " ) ;
return serverid_traverse_read ( net_serverid_list_fn , NULL ) ? 0 : - 1 ;
}
static int net_serverid_wipe_fn ( struct db_record * rec ,
const struct server_id * id ,
uint32_t msg_flags , void * private_data )
{
NTSTATUS status ;
if ( id - > vnn ! = get_my_vnn ( ) ) {
return 0 ;
}
status = rec - > delete_rec ( rec ) ;
if ( ! NT_STATUS_IS_OK ( status ) ) {
2011-06-08 08:05:55 +04:00
char * str = server_id_str ( talloc_tos ( ) , id ) ;
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
DEBUG ( 1 , ( " Could not delete serverid.tdb record %s: %s \n " ,
str , nt_errstr ( status ) ) ) ;
TALLOC_FREE ( str ) ;
}
return 0 ;
}
static int net_serverid_wipe ( struct net_context * c , int argc ,
const char * * argv )
{
return serverid_traverse ( net_serverid_wipe_fn , NULL ) ? 0 : - 1 ;
}
static int net_serverid_wipedbs_conn (
struct db_record * rec ,
const struct connections_key * key ,
const struct connections_data * data ,
void * private_data )
{
if ( ! serverid_exists ( & key - > pid ) ) {
NTSTATUS status ;
DEBUG ( 10 , ( " Deleting connections.tdb record for pid %s \n " ,
2011-06-08 08:05:55 +04:00
server_id_str ( talloc_tos ( ) , & key - > pid ) ) ) ;
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
status = rec - > delete_rec ( rec ) ;
if ( ! NT_STATUS_IS_OK ( status ) ) {
DEBUG ( 1 , ( " Could not delete connections.tdb record "
" for pid %s: %s \n " ,
2011-06-08 08:05:55 +04:00
server_id_str ( talloc_tos ( ) , & key - > pid ) ,
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
nt_errstr ( status ) ) ) ;
}
}
return 0 ;
}
static int net_serverid_wipedbs_sessionid ( struct db_record * rec ,
const char * key ,
struct sessionid * session ,
void * private_data )
{
if ( ! serverid_exists ( & session - > pid ) ) {
NTSTATUS status ;
DEBUG ( 10 , ( " Deleting sessionid.tdb record for pid %s \n " ,
2011-06-08 08:05:55 +04:00
server_id_str ( talloc_tos ( ) , & session - > pid ) ) ) ;
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
status = rec - > delete_rec ( rec ) ;
if ( ! NT_STATUS_IS_OK ( status ) ) {
DEBUG ( 1 , ( " Could not delete session.tdb record "
" for pid %s: %s \n " ,
2011-06-08 08:05:55 +04:00
server_id_str ( talloc_tos ( ) , & session - > pid ) ,
s3: Fix a long-standing problem with recycled PIDs
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
2010-03-02 19:02:01 +03:00
nt_errstr ( status ) ) ) ;
}
}
return 0 ;
}
static int net_serverid_wipedbs ( struct net_context * c , int argc ,
const char * * argv )
{
connections_forall ( net_serverid_wipedbs_conn , NULL ) ;
sessionid_traverse ( net_serverid_wipedbs_sessionid , NULL ) ;
return 0 ;
}
int net_serverid ( struct net_context * c , int argc , const char * * argv )
{
struct functable func [ ] = {
{
" list " ,
net_serverid_list ,
NET_TRANSPORT_LOCAL ,
N_ ( " List all entries from serverid.tdb " ) ,
N_ ( " net serverid list \n "
" List all entries from serverid.tdb " )
} ,
{
" wipe " ,
net_serverid_wipe ,
NET_TRANSPORT_LOCAL ,
N_ ( " Wipe the serverid.tdb for the current node " ) ,
N_ ( " net serverid wipe \n "
" Wipe the serverid.tdb for the current node " )
} ,
{
" wipedbs " ,
net_serverid_wipedbs ,
NET_TRANSPORT_LOCAL ,
N_ ( " Clean dead entries from connections.tdb and "
" sessionid.tdb " ) ,
N_ ( " net serverid wipedbs \n "
" Clean dead entries from connections.tdb and "
" sessionid.tdb " )
} ,
{ NULL , NULL , 0 , NULL , NULL }
} ;
return net_run_function ( c , argc , argv , " net serverid " , func ) ;
}