/*
 * net/tipc/socket.c: TIPC socket API
 *
 * Copyright (c) 2001-2007, 2012-2014, Ericsson AB
tipc: introduce new TIPC server infrastructure
TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.
As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policy is required
in order to protect TIPC internal resources.
To eliminate the need for this complex lock policy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the topology and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.
The new service also solves another problem:
As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.
Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.
As both sending and receiving of messages are now completed in process
context, subscribed events cannot be dropped any more.
As of this commit, this new server infrastructure is built but
not yet called by the existing TIPC code. Since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:39 -04:00
 * Copyright (c) 2004-2008, 2010-2013, Wind River Systems
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */
#include "core.h"
#include "port.h"
#include "name_table.h"
#include "node.h"
#include "link.h"
#include <linux/export.h>

#define SS_LISTENING	-1	/* socket is listening */
#define SS_READY	-2	/* socket is connectionless */

#define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
#define TIPC_FWD_MSG		1
tipc: compensate for double accounting in socket rcv buffer
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 05:39:09 -04:00
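The compensation described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `toy_*` names, the byte sizes, and the cap value are hypothetical and are not taken from the TIPC sources; only the idea (widen the backlog limit by the number of bytes known to be counted twice) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not a kernel symbol. */
#define TOY_RCVBUF_LIMIT 900u   /* base limit on queued bytes */
#define TOY_DUPL_CAP     1800u  /* assumed cap on the compensation counter */

struct toy_acct {
	unsigned int rcv_len;      /* bytes sitting in the receive queue */
	unsigned int backlog_len;  /* bytes charged to the backlog; cleared only
	                              once the whole backlog has been drained */
	unsigned int dupl_rcvcnt;  /* bytes currently counted in both queues */
};

/* Admission check in the spirit of sk_add_backlog(): reject when the
 * combined charge of both queues already exceeds the given limit. */
static bool toy_add_backlog(struct toy_acct *s, unsigned int size,
			    unsigned int limit)
{
	if (s->backlog_len + s->rcv_len > limit)
		return false;
	s->backlog_len += size;
	return true;
}

/* One step of the drain loop: the buffer moves to the receive queue, but
 * backlog_len is left untouched, so the bytes are now counted twice. */
static void toy_backlog_rcv(struct toy_acct *s, unsigned int size)
{
	s->rcv_len += size;
	if (s->dupl_rcvcnt < TOY_DUPL_CAP)
		s->dupl_rcvcnt += size;
}

/* End of the drain loop: the backlog charge and the counter are cleared. */
static void toy_drain_done(struct toy_acct *s)
{
	s->backlog_len = 0;
	s->dupl_rcvcnt = 0;
}

/* A buffer arriving while the socket is owned: widen the limit by the
 * amount known to be double-counted before trying the backlog. */
static bool toy_rcv_while_owned(struct toy_acct *s, unsigned int size)
{
	assert(s->dupl_rcvcnt <= s->rcv_len);
	return toy_add_backlog(s, size, TOY_RCVBUF_LIMIT + s->dupl_rcvcnt);
}
```

In this model, start with 600 bytes charged to the backlog and move two 200-byte buffers with `toy_backlog_rcv()`. The queues then really hold 600 bytes, but the plain check sees 600 + 400 = 1000 and rejects a new 200-byte buffer against the 900-byte limit. `toy_rcv_while_owned()` raises the limit by the 400 double-counted bytes and admits it.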
static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb);
static void tipc_data_ready(struct sock *sk);
static void tipc_write_space(struct sock *sk);
static int tipc_release(struct socket *sock);
static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);

static const struct proto_ops packet_ops;
static const struct proto_ops stream_ops;
static const struct proto_ops msg_ops;

static struct proto tipc_proto;
static struct proto tipc_proto_kern;

/*
 * Revised TIPC socket locking policy:
 *
 * Most socket operations take the standard socket lock when they start
 * and hold it until they finish (or until they need to sleep). Acquiring
 * this lock grants the owner exclusive access to the fields of the socket
 * data structures, with the exception of the backlog queue. A few socket
 * operations can be done without taking the socket lock because they only
 * read socket information that never changes during the life of the socket.
 *
 * Socket operations may acquire the lock for the associated TIPC port if they
 * need to perform an operation on the port. If any routine needs to acquire
 * both the socket lock and the port lock it must take the socket lock first
 * to avoid the risk of deadlock.
 *
 * The dispatcher handling incoming messages cannot grab the socket lock in
 * the standard fashion, since when invoked it runs at the BH level and cannot
 * block. Instead, it checks to see if the socket lock is currently owned by
 * someone, and either handles the message itself or adds it to the socket's
 * backlog queue; in the latter case the queued message is processed once the
 * process owning the socket lock releases it.
 *
 * NOTE: Releasing the socket lock while an operation is sleeping overcomes
 * the problem of a blocked socket operation preventing any other operations
 * from occurring. However, applications must be careful if they have
 * multiple threads trying to send (or receive) on the same socket, as these
 * operations might interfere with each other. For example, doing a connect
 * and a receive at the same time might allow the receive to consume the
 * ACK message meant for the connect. While additional work could be done
 * to try and overcome this, it doesn't seem to be worthwhile at the present.
 *
 * NOTE: Releasing the socket lock while an operation is sleeping also ensures
 * that another operation that must be performed in a non-blocking manner is
 * not delayed for very long because the lock has already been taken.
 *
 * NOTE: This code assumes that certain fields of a port/socket pair are
 * constant over its lifetime; such fields can be examined without taking
 * the socket lock and/or port lock, and do not need to be re-read even
 * after resuming processing after waiting. These fields include:
 *   - socket type
 *   - pointer to socket sk structure (aka tipc_sock structure)
 *   - pointer to port structure
 *   - port reference
 */
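The dispatcher rule in the locking policy above (handle the message directly when nobody owns the socket lock, otherwise park it on the backlog until the owner releases the lock) can be sketched as a single-threaded user-space model. All `toy_*` names and the fixed queue size are hypothetical; the kernel's own primitives for this pattern, such as `sock_owned_by_user()`, `sk_add_backlog()` and `release_sock()`, involve spinlocks and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model: all names here are hypothetical. */
#define TOY_QUEUE_MAX 8

struct toy_sock {
	bool owned;                   /* a process-context user holds the lock */
	int backlog[TOY_QUEUE_MAX];   /* messages deferred by the dispatcher */
	size_t backlog_len;
	int handled[TOY_QUEUE_MAX];   /* messages processed, in order */
	size_t handled_len;
};

/* Record a message as processed. */
static void toy_handle(struct toy_sock *sk, int msg)
{
	assert(sk->handled_len < TOY_QUEUE_MAX);
	sk->handled[sk->handled_len++] = msg;
}

/* Process-context entry: take ownership of the socket. */
static void toy_lock(struct toy_sock *sk)
{
	assert(!sk->owned);
	sk->owned = true;
}

/* Dispatcher path, which must never block: handle the message right away
 * if the socket is free, otherwise defer it to the backlog. Returns false
 * when the backlog is full and the message has to be rejected. */
static bool toy_dispatch(struct toy_sock *sk, int msg)
{
	if (!sk->owned) {
		toy_handle(sk, msg);
		return true;
	}
	if (sk->backlog_len == TOY_QUEUE_MAX)
		return false;
	sk->backlog[sk->backlog_len++] = msg;
	return true;
}

/* Process-context exit: drain the backlog in arrival order, then give up
 * ownership so the dispatcher can handle messages directly again. */
static void toy_release(struct toy_sock *sk)
{
	size_t i;

	assert(sk->owned);
	for (i = 0; i < sk->backlog_len; i++)
		toy_handle(sk, sk->backlog[i]);
	sk->backlog_len = 0;
	sk->owned = false;
}
```

A message dispatched between `toy_lock()` and `toy_release()` is not lost and does not block the dispatcher; it is processed, in order, when `toy_release()` runs.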
#include "socket.h"

/**
 * advance_rx_queue - discard first buffer in socket receive queue
 *
 * Caller must hold socket lock
 */
static void advance_rx_queue(struct sock *sk)
{
	kfree_skb(__skb_dequeue(&sk->sk_receive_queue));
}

/**
 * reject_rx_queue - reject all buffers in socket receive queue
 *
 * Caller must hold socket lock
 */
static void reject_rx_queue(struct sock *sk)
{
	struct sk_buff *buf;
	u32 dnode;

	while ((buf = __skb_dequeue(&sk->sk_receive_queue))) {
		if (tipc_msg_reverse(buf, &dnode, TIPC_ERR_NO_PORT))
			tipc_link_xmit(buf, dnode, 0);
	}
}
/**
 * tipc_sk_create - create a TIPC socket
 * @net: network namespace (must be default network)
 * @sock: pre-allocated socket structure
 * @protocol: protocol indicator (must be 0)
 * @kern: caused by kernel or by userspace?
 *
 * This routine creates additional data structures used by the TIPC socket,
 * initializes them, and links them together.
 *
 * Returns 0 on success, errno otherwise
 */
static int tipc_sk_create(struct net *net, struct socket *sock,
			  int protocol, int kern)
{
	const struct proto_ops *ops;
	socket_state state;
	struct sock *sk;
	struct tipc_sock *tsk;
	struct tipc_port *port;
	u32 ref;

	/* Validate arguments */
	if (unlikely(protocol != 0))
		return -EPROTONOSUPPORT;

	switch (sock->type) {
	case SOCK_STREAM:
		ops = &stream_ops;
		state = SS_UNCONNECTED;
		break;
	case SOCK_SEQPACKET:
		ops = &packet_ops;
		state = SS_UNCONNECTED;
		break;
	case SOCK_DGRAM:
	case SOCK_RDM:
		ops = &msg_ops;
		state = SS_READY;
		break;
	default:
		return -EPROTOTYPE;
	}

	/* Allocate socket's protocol area */
	if (!kern)
		sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto);
	else
		sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto_kern);

	if (sk == NULL)
		return -ENOMEM;

	tsk = tipc_sk(sk);
	port = &tsk->port;

	ref = tipc_port_init(port, TIPC_LOW_IMPORTANCE);
	if (!ref) {
		pr_warn("Socket registration failed, ref. table exhausted\n");
		sk_free(sk);
		return -ENOMEM;
	}

	/* Finish initializing socket data structures */
	sock->ops = ops;
	sock->state = state;

	sock_init_data(sock, sk);
	sk->sk_backlog_rcv = tipc_backlog_rcv;
	sk->sk_rcvbuf = sysctl_tipc_rmem[1];
	sk->sk_data_ready = tipc_data_ready;
	sk->sk_write_space = tipc_write_space;
	tsk->conn_timeout = CONN_TIMEOUT_DEFAULT;
	tsk->sent_unacked = 0;
	atomic_set(&tsk->dupl_rcvcnt, 0);
	tipc_port_unlock(port);

	if (sock->state == SS_READY) {
		tipc_port_set_unreturnable(port, true);
		if (sock->type == SOCK_DGRAM)
			tipc_port_set_unreliable(port, true);
	}
	return 0;
}
/**
 * tipc_sock_create_local - create TIPC socket from inside TIPC module
 * @type: socket type - SOCK_RDM or SOCK_SEQPACKET
 *
 * We cannot use sock_create_kern here because it bumps module user count.
 * Since socket owner and creator is the same module we must make sure
 * that module count remains zero for module local sockets, otherwise
 * we cannot do rmmod.
 *
 * Returns 0 on success, errno otherwise
 */
int tipc_sock_create_local(int type, struct socket **res)
{
	int rc;

	rc = sock_create_lite(AF_TIPC, type, 0, res);
	if (rc < 0) {
		pr_err("Failed to create kernel socket\n");
		return rc;
	}
	tipc_sk_create(&init_net, *res, 0, 1);

	return 0;
}

/**
 * tipc_sock_release_local - release socket created by tipc_sock_create_local
 * @sock: the socket to be released.
 *
 * Module reference count is not incremented when such sockets are created,
 * so we must keep it from being decremented when they are released.
 */
void tipc_sock_release_local(struct socket *sock)
{
	tipc_release(sock);
2013-06-17 10:54:39 -04:00
sock - > ops = NULL ;
sock_release ( sock ) ;
}
/**
* tipc_sock_accept_local - accept a connection on a socket created
* with tipc_sock_create_local . Use this function so that the
* module reference count is not inadvertently incremented .
*
* @ sock : the accepting socket
* @ newsock : reference to the new socket to be created
* @ flags : socket flags
*/
int tipc_sock_accept_local ( struct socket * sock , struct socket * * newsock ,
2013-06-17 10:54:47 -04:00
int flags )
2013-06-17 10:54:39 -04:00
{
struct sock * sk = sock - > sk ;
int ret ;
ret = sock_create_lite ( sk - > sk_family , sk - > sk_type ,
sk - > sk_protocol , newsock ) ;
if ( ret < 0 )
return ret ;
2014-02-18 16:06:46 +08:00
ret = tipc_accept ( sock , * newsock , flags ) ;
2013-06-17 10:54:39 -04:00
if ( ret < 0 ) {
sock_release ( * newsock ) ;
return ret ;
}
( * newsock ) - > ops = sock - > ops ;
return ret ;
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_release - destroy a TIPC socket
2006-01-02 19:04:38 +01:00
* @ sock : socket to destroy
*
* This routine cleans up any messages that are still queued on the socket .
* For DGRAM and RDM socket types , all queued messages are rejected .
* For SEQPACKET and STREAM socket types , the first message is rejected
* and any others are discarded . ( If the first message on a STREAM socket
* is partially - read , it is discarded and the next one is rejected instead . )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* NOTE : Rejected messages are not necessarily returned to the sender ! They
* are returned or discarded according to the " destination droppable " setting
* specified for the message by the sender .
*
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_release ( struct socket * sock )
2006-01-02 19:04:38 +01:00
{
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk ;
struct tipc_port * port ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
2014-06-25 20:41:35 -05:00
u32 dnode ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/*
* Exit if socket isn ' t fully initialized ( occurs when a failed accept ( )
* releases a pre - allocated child socket that was never used )
*/
if ( sk = = NULL )
2006-01-02 19:04:38 +01:00
return 0 ;
2007-02-09 23:25:21 +09:00
2014-03-12 11:31:12 -04:00
tsk = tipc_sk ( sk ) ;
port = & tsk - > port ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
/*
* Reject all unreceived messages , except on an active connection
* ( which disconnects locally & sends a ' FIN + ' to peer )
*/
2006-01-02 19:04:38 +01:00
while ( sock - > state ! = SS_DISCONNECTING ) {
2008-04-15 00:22:02 -07:00
buf = __skb_dequeue ( & sk - > sk_receive_queue ) ;
if ( buf = = NULL )
2006-01-02 19:04:38 +01:00
break ;
2013-10-18 07:23:16 +02:00
if ( TIPC_SKB_CB ( buf ) - > handle ! = NULL )
2011-11-04 13:24:29 -04:00
kfree_skb ( buf ) ;
2008-04-15 00:22:02 -07:00
else {
if ( ( sock - > state = = SS_CONNECTING ) | |
( sock - > state = = SS_CONNECTED ) ) {
sock - > state = SS_DISCONNECTING ;
2014-03-12 11:31:12 -04:00
tipc_port_disconnect ( port - > ref ) ;
2008-04-15 00:22:02 -07:00
}
2014-06-25 20:41:35 -05:00
if ( tipc_msg_reverse ( buf , & dnode , TIPC_ERR_NO_PORT ) )
2014-07-16 20:41:03 -04:00
tipc_link_xmit ( buf , dnode , 0 ) ;
2008-04-15 00:22:02 -07:00
}
2006-01-02 19:04:38 +01:00
}
2014-03-12 11:31:12 -04:00
/* Destroy TIPC port; also disconnects an active connection and
* sends a ' FIN - ' to peer .
2008-04-15 00:22:02 -07:00
*/
2014-03-12 11:31:12 -04:00
tipc_port_destroy ( port ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Discard any remaining (connection-based) messages in receive queue */
2013-01-20 23:30:08 +01:00
__skb_queue_purge ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Reject any messages that accumulated in backlog queue */
sock - > state = SS_DISCONNECTING ;
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
sock_put ( sk ) ;
2008-04-15 00:22:02 -07:00
sock - > sk = NULL ;
2006-01-02 19:04:38 +01:00
2014-04-06 15:56:14 +02:00
return 0 ;
2006-01-02 19:04:38 +01:00
}
/**
2014-02-18 16:06:46 +08:00
* tipc_bind - associate or disassociate TIPC name ( s ) with a socket
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ uaddr : socket address describing name ( s ) and desired operation
* @ uaddr_len : size of socket address data structure
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Name and name sequence binding is indicated using a positive scope value ;
* a negative scope value unbinds the specified name . Specifying no name
* ( i . e . a socket address length of 0 ) unbinds all names from the socket .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
2008-04-15 00:22:02 -07:00
*
* NOTE : This routine doesn ' t need to take the socket lock since it doesn ' t
* access any non - constant socket information .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static int tipc_bind ( struct socket * sock , struct sockaddr * uaddr ,
int uaddr_len )
2006-01-02 19:04:38 +01:00
{
2013-12-27 10:18:28 +08:00
struct sock * sk = sock - > sk ;
2006-01-02 19:04:38 +01:00
struct sockaddr_tipc * addr = ( struct sockaddr_tipc * ) uaddr ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2013-12-27 10:18:28 +08:00
int res = - EINVAL ;
2006-01-02 19:04:38 +01:00
2013-12-27 10:18:28 +08:00
lock_sock ( sk ) ;
if ( unlikely ( ! uaddr_len ) ) {
2014-03-12 11:31:12 -04:00
res = tipc_withdraw ( & tsk - > port , 0 , NULL ) ;
2013-12-27 10:18:28 +08:00
goto exit ;
}
2007-02-09 23:25:21 +09:00
2013-12-27 10:18:28 +08:00
if ( uaddr_len < sizeof ( struct sockaddr_tipc ) ) {
res = - EINVAL ;
goto exit ;
}
if ( addr - > family ! = AF_TIPC ) {
res = - EAFNOSUPPORT ;
goto exit ;
}
2006-01-02 19:04:38 +01:00
if ( addr - > addrtype = = TIPC_ADDR_NAME )
addr - > addr . nameseq . upper = addr - > addr . nameseq . lower ;
2013-12-27 10:18:28 +08:00
else if ( addr - > addrtype ! = TIPC_ADDR_NAMESEQ ) {
res = - EAFNOSUPPORT ;
goto exit ;
}
2007-02-09 23:25:21 +09:00
tipc: convert topology server to use new server facility
As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it. We get two benefits
from doing this:
1) It simplifies the topology server locking policy. In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of tipc_subscriber
instance. That is, we only used one lock to ensure both
tipc_port and tipc_subscriber members were safely accessed.
Now we introduce a separate spin lock that protects only the
tipc_subscriber structure itself, giving a finer-grained locking
policy. Moreover, the change will allow us to make the topology
server code more readable and maintainable.
2) It fixes a bug where sent subscription events may be lost when
the topology port is congested. Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context. The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.
Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter. And overall, we see a simplification of the
TIPC subscriber code in making this changeover.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:40 -04:00
if ( ( addr - > addr . nameseq . type < TIPC_RESERVED_TYPES ) & &
2013-06-17 10:54:41 -04:00
( addr - > addr . nameseq . type ! = TIPC_TOP_SRV ) & &
2013-12-27 10:18:28 +08:00
( addr - > addr . nameseq . type ! = TIPC_CFG_SRV ) ) {
res = - EACCES ;
goto exit ;
}
2011-11-02 15:49:40 -04:00
2013-12-27 10:18:28 +08:00
res = ( addr - > scope > 0 ) ?
2014-03-12 11:31:12 -04:00
tipc_publish ( & tsk - > port , addr - > scope , & addr - > addr . nameseq ) :
tipc_withdraw ( & tsk - > port , - addr - > scope , & addr - > addr . nameseq ) ;
2013-12-27 10:18:28 +08:00
exit :
release_sock ( sk ) ;
return res ;
2006-01-02 19:04:38 +01:00
}
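The sign convention that tipc_bind applies to the scope field can be sketched in isolation. The following is a minimal userspace model, not kernel code: `sketch_bind_decide`, `struct sketch_bind_req`, and the `SKETCH_*` names are hypothetical and stand in for the choice between tipc_publish and tipc_withdraw shown above. The address validation steps are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: which operation a bind request maps to. */
enum sketch_bind_op {
	SKETCH_PUBLISH,      /* corresponds to tipc_publish ( ) */
	SKETCH_WITHDRAW,     /* corresponds to tipc_withdraw ( ) with a name */
	SKETCH_WITHDRAW_ALL  /* corresponds to tipc_withdraw ( ) with NULL */
};

struct sketch_bind_req {
	enum sketch_bind_op op;
	unsigned int scope;  /* scope value handed on to the operation */
};

/* Model of the decision only; address validation is omitted. */
struct sketch_bind_req sketch_bind_decide(size_t addr_len, int scope)
{
	struct sketch_bind_req req;

	if (addr_len == 0) {
		/* An empty address unbinds every name from the socket. */
		req.op = SKETCH_WITHDRAW_ALL;
		req.scope = 0;
	} else if (scope > 0) {
		/* A positive scope publishes the name with that scope. */
		req.op = SKETCH_PUBLISH;
		req.scope = (unsigned int)scope;
	} else {
		/* Zero or negative scope withdraws, using the negated value. */
		req.op = SKETCH_WITHDRAW;
		req.scope = (unsigned int)(-scope);
	}
	return req;
}
```

Note that a scope of exactly 0 takes the withdraw branch, because the test in the listing is `scope > 0`.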
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_getname - get port ID of socket or peer socket
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ uaddr : area for returned socket address
* @ uaddr_len : area for returned length of socket address
2008-07-14 22:43:32 -07:00
* @ peer : 0 = own ID , 1 = current peer ID , 2 = current / former peer ID
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
2008-04-15 00:22:02 -07:00
*
2008-07-14 22:43:32 -07:00
* NOTE : This routine doesn ' t need to take the socket lock since it only
* accesses socket information that is unchanging ( or which changes in
2010-12-31 18:59:32 +00:00
* a completely predictable manner ) .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static int tipc_getname ( struct socket * sock , struct sockaddr * uaddr ,
int * uaddr_len , int peer )
2006-01-02 19:04:38 +01:00
{
struct sockaddr_tipc * addr = ( struct sockaddr_tipc * ) uaddr ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sock - > sk ) ;
2006-01-02 19:04:38 +01:00
2010-10-31 07:10:32 +00:00
memset ( addr , 0 , sizeof ( * addr ) ) ;
2008-04-15 00:22:02 -07:00
if ( peer ) {
2008-07-14 22:43:32 -07:00
if ( ( sock - > state ! = SS_CONNECTED ) & &
( ( peer ! = 2 ) | | ( sock - > state ! = SS_DISCONNECTING ) ) )
return - ENOTCONN ;
2014-03-12 11:31:12 -04:00
addr - > addr . id . ref = tipc_port_peerport ( & tsk - > port ) ;
addr - > addr . id . node = tipc_port_peernode ( & tsk - > port ) ;
2008-04-15 00:22:02 -07:00
} else {
2014-03-12 11:31:12 -04:00
addr - > addr . id . ref = tsk - > port . ref ;
2010-11-30 12:01:03 +00:00
addr - > addr . id . node = tipc_own_addr ;
2008-04-15 00:22:02 -07:00
}
2006-01-02 19:04:38 +01:00
* uaddr_len = sizeof ( * addr ) ;
addr - > addrtype = TIPC_ADDR_ID ;
addr - > family = AF_TIPC ;
addr - > scope = 0 ;
addr - > addr . name . domain = 0 ;
2008-04-15 00:22:02 -07:00
return 0 ;
2006-01-02 19:04:38 +01:00
}
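The meaning of the peer argument in tipc_getname (0 = own ID, 1 = current peer, 2 = current or former peer) reduces to one state check. Below is a minimal userspace sketch of that check alone. `sketch_getname_check` and the `SKETCH_*` states are hypothetical names, and the sketch does not fill in an address.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the socket states the check looks at. */
enum sketch_conn_state {
	SKETCH_UNCONNECTED,
	SKETCH_CONNECTED,
	SKETCH_DISCONNECTING
};

/*
 * Returns 0 if an ID may be reported for this state and peer selector,
 * -ENOTCONN otherwise. Mirrors the condition used in tipc_getname.
 */
int sketch_getname_check(enum sketch_conn_state state, int peer)
{
	if (!peer)
		return 0;	/* own ID is always available */
	if (state != SKETCH_CONNECTED &&
	    (peer != 2 || state != SKETCH_DISCONNECTING))
		return -ENOTCONN;
	return 0;
}
```

Only peer == 2 lets a caller read the former peer of a socket that is already disconnecting.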
/**
2014-02-18 16:06:46 +08:00
* tipc_poll - read and possibly block on pollmask
2006-01-02 19:04:38 +01:00
* @ file : file structure associated with the socket
* @ sock : socket for which to calculate the poll bits
* @ wait : poll table passed on to sock_poll_wait ( )
*
2008-03-26 16:48:21 -07:00
* Returns pollmask value
*
* COMMENTARY :
* It appears that the usual socket locking mechanisms are not useful here
* since the pollmask info is potentially out - of - date the moment this routine
* exits . TCP and other protocols seem to rely on higher level poll routines
* to handle any preventable race conditions , so TIPC will do the same . . .
*
* TIPC sets the returned events as follows :
2010-08-17 11:00:06 +00:00
*
* socket state flags set
* - - - - - - - - - - - - - - - - - - - - -
* unconnected no read flags
2012-10-16 16:47:06 +02:00
* POLLOUT if port is not congested
2010-08-17 11:00:06 +00:00
*
* connecting POLLIN / POLLRDNORM if ACK / NACK in rx queue
* no write flags
*
* connected POLLIN / POLLRDNORM if data in rx queue
* POLLOUT if port is not congested
*
* disconnecting POLLIN / POLLRDNORM / POLLHUP
* no write flags
*
* listening POLLIN if SYN in rx queue
* no write flags
*
* ready POLLIN / POLLRDNORM if data in rx queue
* [ connectionless ] POLLOUT ( since port cannot be congested )
*
* IMPORTANT : The fact that a read or write operation is indicated does NOT
* imply that the operation will succeed , merely that it should be performed
* and will not block .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static unsigned int tipc_poll ( struct file * file , struct socket * sock ,
poll_table * wait )
2006-01-02 19:04:38 +01:00
{
2008-03-26 16:48:21 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2010-08-17 11:00:06 +00:00
u32 mask = 0 ;
2008-03-26 16:48:21 -07:00
2012-08-21 11:16:57 +08:00
sock_poll_wait ( file , sk_sleep ( sk ) , wait ) ;
2008-03-26 16:48:21 -07:00
2010-08-17 11:00:06 +00:00
switch ( ( int ) sock - > state ) {
2012-10-16 16:47:06 +02:00
case SS_UNCONNECTED :
2014-06-25 20:41:42 -05:00
if ( ! tsk - > link_cong )
2012-10-16 16:47:06 +02:00
mask | = POLLOUT ;
break ;
2010-08-17 11:00:06 +00:00
case SS_READY :
case SS_CONNECTED :
2014-06-25 20:41:42 -05:00
if ( ! tsk - > link_cong & & ! tipc_sk_conn_cong ( tsk ) )
2010-08-17 11:00:06 +00:00
mask | = POLLOUT ;
/* fall thru' */
case SS_CONNECTING :
case SS_LISTENING :
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
mask | = ( POLLIN | POLLRDNORM ) ;
break ;
case SS_DISCONNECTING :
mask = ( POLLIN | POLLRDNORM | POLLHUP ) ;
break ;
}
2008-03-26 16:48:21 -07:00
return mask ;
2006-01-02 19:04:38 +01:00
}
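The state switch in tipc_poll can be exercised outside the kernel by replacing the socket fields with plain arguments. The sketch below is a userspace model under that assumption. `sketch_poll_mask` and the `SK_*` states are hypothetical names, and the three booleans stand in for `tsk->link_cong`, `tipc_sk_conn_cong ( tsk )`, and `skb_queue_empty ( )`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel socket states. */
enum sketch_state {
	SK_UNCONNECTED,
	SK_READY,
	SK_CONNECTED,
	SK_CONNECTING,
	SK_LISTENING,
	SK_DISCONNECTING
};

/* Same case order and fall-through as the switch in tipc_poll. */
unsigned int sketch_poll_mask(enum sketch_state state, bool link_cong,
			      bool conn_cong, bool rx_queue_empty)
{
	unsigned int mask = 0;

	switch (state) {
	case SK_UNCONNECTED:
		if (!link_cong)
			mask |= POLLOUT;
		break;
	case SK_READY:
	case SK_CONNECTED:
		if (!link_cong && !conn_cong)
			mask |= POLLOUT;
		/* fall through */
	case SK_CONNECTING:
	case SK_LISTENING:
		if (!rx_queue_empty)
			mask |= POLLIN | POLLRDNORM;
		break;
	case SK_DISCONNECTING:
		mask = POLLIN | POLLRDNORM | POLLHUP;
		break;
	}
	return mask;
}
```

In this model an unconnected socket never reports read flags, and a disconnecting socket always reports readable plus hang-up regardless of queue contents.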
2014-07-16 20:41:01 -04:00
/**
* tipc_sendmcast - send multicast message
* @ sock : socket structure
* @ seq : destination address
* @ iov : message data to send
* @ dsz : total length of message data
* @ timeo : timeout to wait for wakeup
*
* Called from function tipc_sendmsg ( ) , which has done all sanity checks .
* Returns the number of bytes sent on success , or errno otherwise
*/
static int tipc_sendmcast ( struct socket * sock , struct tipc_name_seq * seq ,
struct iovec * iov , size_t dsz , long timeo )
{
struct sock * sk = sock - > sk ;
struct tipc_msg * mhdr = & tipc_sk ( sk ) - > port . phdr ;
struct sk_buff * buf ;
uint mtu ;
int rc ;
msg_set_type ( mhdr , TIPC_MCAST_MSG ) ;
msg_set_lookup_scope ( mhdr , TIPC_CLUSTER_SCOPE ) ;
msg_set_destport ( mhdr , 0 ) ;
msg_set_destnode ( mhdr , 0 ) ;
msg_set_nametype ( mhdr , seq - > type ) ;
msg_set_namelower ( mhdr , seq - > lower ) ;
msg_set_nameupper ( mhdr , seq - > upper ) ;
msg_set_hdr_sz ( mhdr , MCAST_H_SIZE ) ;
new_mtu :
mtu = tipc_bclink_get_mtu ( ) ;
2014-07-16 20:41:03 -04:00
rc = tipc_msg_build ( mhdr , iov , 0 , dsz , mtu , & buf ) ;
2014-07-16 20:41:01 -04:00
if ( unlikely ( rc < 0 ) )
return rc ;
do {
rc = tipc_bclink_xmit ( buf ) ;
if ( likely ( rc > = 0 ) ) {
rc = dsz ;
break ;
}
if ( rc = = - EMSGSIZE )
goto new_mtu ;
if ( rc ! = - ELINKCONG )
break ;
rc = tipc_wait_for_sndmsg ( sock , & timeo ) ;
if ( rc )
kfree_skb_list ( buf ) ;
} while ( ! rc ) ;
return rc ;
}
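The control flow of the transmit loop in tipc_sendmcast (rebuild on -EMSGSIZE, wait and retry on link congestion, give up on anything else) can be traced with a scripted link. This is a userspace sketch only. `struct sketch_link`, `sketch_xmit`, `sketch_send`, and `SKETCH_ELINKCONG` are hypothetical, and the numeric value chosen for `SKETCH_ELINKCONG` is an arbitrary placeholder for the kernel-internal ELINKCONG code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder for the kernel-internal ELINKCONG code (value is arbitrary). */
#define SKETCH_ELINKCONG 1000

struct sketch_link {
	const int *xmit_results; /* scripted results of successive transmits */
	size_t n_results;
	size_t next;
	int wait_result;         /* result of waiting for congestion to clear */
	unsigned int builds;     /* how often the message was (re)built */
};

/* Returns the next scripted transmit result, then 0 once the script ends. */
int sketch_xmit(struct sketch_link *l)
{
	if (l->next >= l->n_results)
		return 0;
	return l->xmit_results[l->next++];
}

/* Same loop shape as tipc_sendmcast: returns dsz on success, else an errno. */
int sketch_send(struct sketch_link *l, int dsz)
{
	int rc;

new_mtu:
	l->builds++;			/* stands in for tipc_msg_build ( ) */
	do {
		rc = sketch_xmit(l);
		if (rc >= 0) {
			rc = dsz;
			break;
		}
		if (rc == -EMSGSIZE)
			goto new_mtu;	/* MTU changed: rebuild and retry */
		if (rc != -SKETCH_ELINKCONG)
			break;		/* any other error is final */
		rc = l->wait_result;	/* stands in for tipc_wait_for_sndmsg ( ) */
	} while (!rc);
	return rc;
}
```

A script of -EMSGSIZE, congestion, then success therefore builds the message twice and returns the full data size. A failed wait ends the loop with the error from the wait, not from the transmit.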
2014-07-16 20:41:00 -04:00
/* tipc_sk_mcast_rcv - Deliver multicast message to all destination sockets
*/
void tipc_sk_mcast_rcv ( struct sk_buff * buf )
{
struct tipc_msg * msg = buf_msg ( buf ) ;
struct tipc_port_list dports = { 0 , NULL , } ;
struct tipc_port_list * item ;
struct sk_buff * b ;
uint i , last , dst = 0 ;
u32 scope = TIPC_CLUSTER_SCOPE ;
if ( in_own_node ( msg_orignode ( msg ) ) )
scope = TIPC_NODE_SCOPE ;
/* Create destination port list: */
tipc_nametbl_mc_translate ( msg_nametype ( msg ) ,
msg_namelower ( msg ) ,
msg_nameupper ( msg ) ,
scope ,
& dports ) ;
last = dports . count ;
if ( ! last ) {
kfree_skb ( buf ) ;
return ;
}
for ( item = & dports ; item ; item = item - > next ) {
for ( i = 0 ; i < PLSIZE & & + + dst < = last ; i + + ) {
b = ( dst ! = last ) ? skb_clone ( buf , GFP_ATOMIC ) : buf ;
if ( ! b ) {
pr_warn ( " Failed to clone mcast rcv buffer \n " ) ;
continue ;
}
msg_set_destport ( msg , item - > ports [ i ] ) ;
tipc_sk_rcv ( b ) ;
}
}
tipc_port_list_free ( & dports ) ;
}
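The delivery loop in tipc_sk_mcast_rcv hands a clone to every destination except the last, which receives the original buffer, so no copy is wasted. The userspace sketch below only counts what happens per destination. `struct sketch_fanout` and `sketch_fanout_deliver` are hypothetical names, and the `clone_ok` array stands in for whether skb_clone would have succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fanout {
	size_t clones;    /* deliveries made with a cloned buffer */
	size_t originals; /* deliveries made with the original buffer */
	size_t skipped;   /* destinations skipped because cloning failed */
};

/*
 * clone_ok[i] says whether cloning for destination i would succeed.
 * The entry for the last destination is never consulted, because the
 * last destination receives the original buffer.
 */
struct sketch_fanout sketch_fanout_deliver(size_t ndest, const bool *clone_ok)
{
	struct sketch_fanout r = { 0, 0, 0 };
	size_t dst;

	for (dst = 1; dst <= ndest; dst++) {
		if (dst == ndest)
			r.originals++;
		else if (clone_ok[dst - 1])
			r.clones++;
		else
			r.skipped++;	/* kernel logs a warning and continues */
	}
	return r;
}
```

With zero destinations the function delivers nothing, which corresponds to the early kfree_skb path in the listing.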
2014-06-25 20:41:41 -05:00
/**
* tipc_sk_proto_rcv - receive a connection mng protocol message
* @ tsk : receiving socket
* @ dnode : node to send response message to , if any
* @ buf : buffer containing protocol message
* Returns 0 ( TIPC_OK ) if message was consumed , 1 ( TIPC_FWD_MSG ) if
* ( CONN_PROBE_REPLY ) message should be forwarded .
*/
int tipc_sk_proto_rcv ( struct tipc_sock * tsk , u32 * dnode , struct sk_buff * buf )
{
struct tipc_msg * msg = buf_msg ( buf ) ;
struct tipc_port * port = & tsk - > port ;
2014-06-25 20:41:42 -05:00
int conn_cong ;
2014-06-25 20:41:41 -05:00
/* Ignore if connection cannot be validated: */
if ( ! port - > connected | | ! tipc_port_peer_msg ( port , msg ) )
goto exit ;
port - > probing_state = TIPC_CONN_OK ;
if ( msg_type ( msg ) = = CONN_ACK ) {
2014-06-25 20:41:42 -05:00
conn_cong = tipc_sk_conn_cong ( tsk ) ;
tsk - > sent_unacked - = msg_msgcnt ( msg ) ;
if ( conn_cong )
tipc_sock_wakeup ( tsk ) ;
2014-06-25 20:41:41 -05:00
} else if ( msg_type ( msg ) = = CONN_PROBE ) {
if ( ! tipc_msg_reverse ( buf , dnode , TIPC_OK ) )
return TIPC_OK ;
msg_set_type ( msg , CONN_PROBE_REPLY ) ;
return TIPC_FWD_MSG ;
}
/* Do nothing if msg_type() == CONN_PROBE_REPLY */
exit :
kfree_skb ( buf ) ;
return TIPC_OK ;
}
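The CONN_ACK branch of tipc_sk_proto_rcv samples the congestion state before subtracting the acknowledged count, then wakes the sender if the connection was congested. The sketch below models that ordering in userspace. The definition of tipc_sk_conn_cong is not part of this listing, so the threshold test in `sketch_conn_cong` and the value of `SKETCH_WINDOW` are assumptions, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flow-control window; the real value is not shown in this file. */
#define SKETCH_WINDOW 4u

struct sketch_conn {
	unsigned int sent_unacked; /* messages sent but not yet acknowledged */
	unsigned int wakeups;      /* how often the sender was woken */
};

/* Assumption: congestion means the unacked count reached the window. */
bool sketch_conn_cong(const struct sketch_conn *c)
{
	return c->sent_unacked >= SKETCH_WINDOW;
}

/* Same ordering as the CONN_ACK branch: sample, subtract, then wake. */
void sketch_conn_ack(struct sketch_conn *c, unsigned int acked)
{
	bool was_cong = sketch_conn_cong(c);

	c->sent_unacked -= acked;
	if (was_cong)
		c->wakeups++;
}
```

Because the state is sampled first, the wakeup fires whenever the connection was congested on entry, even if the acknowledgement did not bring the count below the window.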
2007-02-09 23:25:21 +09:00
/**
2006-01-02 19:04:38 +01:00
* dest_name_check - verify user is permitted to send to specified port name
* @ dest : destination address
* @ m : descriptor for message to be sent
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Prevents restricted configuration commands from being issued by
* unauthorized users .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 if permission is granted , otherwise errno
*/
2006-03-20 22:37:04 -08:00
static int dest_name_check ( struct sockaddr_tipc * dest , struct msghdr * m )
2006-01-02 19:04:38 +01:00
{
struct tipc_cfg_msg_hdr hdr ;
2014-06-25 20:41:37 -05:00
if ( unlikely ( dest - > addrtype = = TIPC_ADDR_ID ) )
return 0 ;
2007-02-09 23:25:21 +09:00
if ( likely ( dest - > addr . name . name . type > = TIPC_RESERVED_TYPES ) )
return 0 ;
if ( likely ( dest - > addr . name . name . type = = TIPC_TOP_SRV ) )
return 0 ;
if ( likely ( dest - > addr . name . name . type ! = TIPC_CFG_SRV ) )
return - EACCES ;
2006-01-02 19:04:38 +01:00
2011-01-18 13:09:29 -05:00
if ( ! m - > msg_iovlen | | ( m - > msg_iov [ 0 ] . iov_len < sizeof ( hdr ) ) )
return - EMSGSIZE ;
2007-02-09 23:25:21 +09:00
if ( copy_from_user ( & hdr , m - > msg_iov [ 0 ] . iov_base , sizeof ( hdr ) ) )
2006-01-02 19:04:38 +01:00
return - EFAULT ;
2006-06-25 23:41:47 -07:00
if ( ( ntohs ( hdr . tcm_type ) & 0xC000 ) & & ( ! capable ( CAP_NET_ADMIN ) ) )
2006-01-02 19:04:38 +01:00
return - EACCES ;
2007-02-09 23:25:21 +09:00
2006-01-02 19:04:38 +01:00
return 0 ;
}
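The decision order in dest_name_check can be followed with plain values in place of the socket address and message header. The sketch below is a userspace model. `sketch_dest_name_check` and the `SKETCH_*` constants are hypothetical, the constant values are assumptions chosen for illustration, and the -EMSGSIZE and -EFAULT paths for reading the header are left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; check the TIPC headers for the real ones. */
#define SKETCH_CFG_SRV        0u
#define SKETCH_TOP_SRV        1u
#define SKETCH_RESERVED_TYPES 64u

/*
 * is_port_id:  destination is addressed by port ID rather than by name
 * name_type:   name type of the destination
 * tcm_type_be: command type from the config header, network byte order
 * net_admin:   whether the caller holds CAP_NET_ADMIN
 */
int sketch_dest_name_check(bool is_port_id, uint32_t name_type,
			   uint16_t tcm_type_be, bool net_admin)
{
	if (is_port_id)
		return 0;
	if (name_type >= SKETCH_RESERVED_TYPES)
		return 0;		/* ordinary user name types */
	if (name_type == SKETCH_TOP_SRV)
		return 0;		/* topology server is open to all */
	if (name_type != SKETCH_CFG_SRV)
		return -EACCES;		/* other reserved types are refused */
	/* Config server: commands with either top bit set need privilege. */
	if ((ntohs(tcm_type_be) & 0xC000) && !net_admin)
		return -EACCES;
	return 0;
}
```

The header field is converted with ntohs before masking, so a test value has to be built with htons, for example `htons ( 0x4001 )` for a restricted command.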
2014-01-17 09:50:05 +08:00
static int tipc_wait_for_sndmsg ( struct socket * sock , long * timeo_p )
{
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-01-17 09:50:05 +08:00
DEFINE_WAIT ( wait ) ;
int done ;
do {
int err = sock_error ( sk ) ;
if ( err )
return err ;
if ( sock - > state = = SS_DISCONNECTING )
return - EPIPE ;
if ( ! * timeo_p )
return - EAGAIN ;
if ( signal_pending ( current ) )
return sock_intr_errno ( * timeo_p ) ;
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
2014-06-25 20:41:42 -05:00
done = sk_wait_event ( sk , timeo_p , ! tsk - > link_cong ) ;
2014-01-17 09:50:05 +08:00
finish_wait ( sk_sleep ( sk ) , & wait ) ;
} while ( ! done ) ;
return 0 ;
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_sendmsg - send message in connectionless manner
2008-04-15 00:22:02 -07:00
* @ iocb : if NULL , indicates that socket lock is already held
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ m : message to send
2014-06-25 20:41:37 -05:00
* @ dsz : amount of user data to be sent
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Message must have a destination specified explicitly .
2007-02-09 23:25:21 +09:00
* Used for SOCK_RDM and SOCK_DGRAM messages ,
2006-01-02 19:04:38 +01:00
* and for ' SYN ' messages on SOCK_SEQPACKET and SOCK_STREAM connections .
* ( Note : ' SYN + ' is prohibited on SOCK_STREAM . )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns the number of bytes sent on success , or errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_sendmsg ( struct kiocb * iocb , struct socket * sock ,
2014-06-25 20:41:37 -05:00
struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2014-06-25 20:41:37 -05:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , dest , m - > msg_name ) ;
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-03-12 11:31:13 -04:00
struct tipc_port * port = & tsk - > port ;
2014-06-25 20:41:37 -05:00
struct tipc_msg * mhdr = & port - > phdr ;
struct iovec * iov = m - > msg_iov ;
u32 dnode , dport ;
struct sk_buff * buf ;
struct tipc_name_seq * seq = & dest - > addr . nameseq ;
u32 mtu ;
2014-01-17 09:50:05 +08:00
long timeo ;
2014-06-25 20:41:37 -05:00
int rc = - EINVAL ;
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! dest ) )
return - EDESTADDRREQ ;
2014-06-25 20:41:37 -05:00
2006-06-25 23:49:06 -07:00
if ( unlikely ( ( m - > msg_namelen < sizeof ( * dest ) ) | |
( dest - > family ! = AF_TIPC ) ) )
2006-01-02 19:04:38 +01:00
return - EINVAL ;
2014-06-25 20:41:37 -05:00
if ( dsz > TIPC_MAX_USER_MSG_SIZE )
2010-04-20 17:58:24 -04:00
return - EMSGSIZE ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
if ( iocb )
lock_sock ( sk ) ;
2014-06-25 20:41:37 -05:00
if ( unlikely ( sock - > state ! = SS_READY ) ) {
2008-04-15 00:22:02 -07:00
if ( sock - > state = = SS_LISTENING ) {
2014-06-25 20:41:37 -05:00
rc = - EPIPE ;
2008-04-15 00:22:02 -07:00
goto exit ;
}
if ( sock - > state ! = SS_UNCONNECTED ) {
2014-06-25 20:41:37 -05:00
rc = - EISCONN ;
2008-04-15 00:22:02 -07:00
goto exit ;
}
2014-03-12 11:31:12 -04:00
if ( tsk - > port . published ) {
2014-06-25 20:41:37 -05:00
rc = - EOPNOTSUPP ;
2008-04-15 00:22:02 -07:00
goto exit ;
}
2006-06-25 23:44:57 -07:00
if ( dest - > addrtype = = TIPC_ADDR_NAME ) {
2014-03-12 11:31:12 -04:00
tsk - > port . conn_type = dest - > addr . name . name . type ;
tsk - > port . conn_instance = dest - > addr . name . name . instance ;
2006-06-25 23:44:57 -07:00
}
2006-01-02 19:04:38 +01:00
}
2014-06-25 20:41:37 -05:00
rc = dest_name_check ( dest , m ) ;
if ( rc )
goto exit ;
2006-01-02 19:04:38 +01:00
2014-01-17 09:50:05 +08:00
timeo = sock_sndtimeo ( sk , m - > msg_flags & MSG_DONTWAIT ) ;
2014-06-25 20:41:37 -05:00
if ( dest - > addrtype = = TIPC_ADDR_MCAST ) {
rc = tipc_sendmcast ( sock , seq , iov , dsz , timeo ) ;
goto exit ;
} else if ( dest - > addrtype = = TIPC_ADDR_NAME ) {
u32 type = dest - > addr . name . name . type ;
u32 inst = dest - > addr . name . name . instance ;
u32 domain = dest - > addr . name . domain ;
dnode = domain ;
msg_set_type ( mhdr , TIPC_NAMED_MSG ) ;
msg_set_hdr_sz ( mhdr , NAMED_H_SIZE ) ;
msg_set_nametype ( mhdr , type ) ;
msg_set_nameinst ( mhdr , inst ) ;
msg_set_lookup_scope ( mhdr , tipc_addr_scope ( domain ) ) ;
dport = tipc_nametbl_translate ( type , inst , & dnode ) ;
msg_set_destnode ( mhdr , dnode ) ;
msg_set_destport ( mhdr , dport ) ;
if ( unlikely ( ! dport & & ! dnode ) ) {
rc = - EHOSTUNREACH ;
goto exit ;
2007-02-09 23:25:21 +09:00
}
2014-06-25 20:41:37 -05:00
} else if ( dest - > addrtype = = TIPC_ADDR_ID ) {
dnode = dest - > addr . id . node ;
msg_set_type ( mhdr , TIPC_DIRECT_MSG ) ;
msg_set_lookup_scope ( mhdr , 0 ) ;
msg_set_destnode ( mhdr , dnode ) ;
msg_set_destport ( mhdr , dest - > addr . id . ref ) ;
msg_set_hdr_sz ( mhdr , BASIC_H_SIZE ) ;
}
new_mtu :
mtu = tipc_node_get_mtu ( dnode , tsk - > port . ref ) ;
2014-07-16 20:41:03 -04:00
rc = tipc_msg_build ( mhdr , iov , 0 , dsz , mtu , & buf ) ;
2014-06-25 20:41:37 -05:00
if ( rc < 0 )
goto exit ;
do {
2014-07-16 20:41:03 -04:00
rc = tipc_link_xmit ( buf , dnode , tsk - > port . ref ) ;
2014-06-25 20:41:37 -05:00
if ( likely ( rc > = 0 ) ) {
if ( sock - > state ! = SS_READY )
2008-04-15 00:22:02 -07:00
sock - > state = SS_CONNECTING ;
2014-06-25 20:41:37 -05:00
rc = dsz ;
2008-04-15 00:22:02 -07:00
break ;
2007-02-09 23:25:21 +09:00
}
2014-06-25 20:41:37 -05:00
if ( rc = = - EMSGSIZE )
goto new_mtu ;
if ( rc ! = - ELINKCONG )
2008-04-15 00:22:02 -07:00
break ;
2014-06-25 20:41:37 -05:00
rc = tipc_wait_for_sndmsg ( sock , & timeo ) ;
2014-07-06 20:38:50 -04:00
if ( rc )
kfree_skb_list ( buf ) ;
2014-06-25 20:41:37 -05:00
} while ( ! rc ) ;
2008-04-15 00:22:02 -07:00
exit :
if ( iocb )
release_sock ( sk ) ;
2014-06-25 20:41:37 -05:00
return rc ;
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:06 +08:00
static int tipc_wait_for_sndpkt ( struct socket * sock , long * timeo_p )
{
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-01-17 09:50:06 +08:00
DEFINE_WAIT ( wait ) ;
int done ;
do {
int err = sock_error ( sk ) ;
if ( err )
return err ;
if ( sock - > state = = SS_DISCONNECTING )
return - EPIPE ;
else if ( sock - > state ! = SS_CONNECTED )
return - ENOTCONN ;
if ( ! * timeo_p )
return - EAGAIN ;
if ( signal_pending ( current ) )
return sock_intr_errno ( * timeo_p ) ;
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
done = sk_wait_event ( sk , timeo_p ,
2014-06-25 20:41:42 -05:00
( ! tsk - > link_cong & &
! tipc_sk_conn_cong ( tsk ) ) | |
! tsk - > port . connected ) ;
2014-01-17 09:50:06 +08:00
finish_wait ( sk_sleep ( sk ) , & wait ) ;
} while ( ! done ) ;
return 0 ;
}
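The order of the checks in the wait loop above decides which error a blocked sender sees. A minimal sketch of that precedence, assuming a simplified state enum and reducing `sock_intr_errno()` to `-EINTR`; `enum sketch_state` and `sketch_sndpkt_precheck()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the socket states the wait loop inspects. */
enum sketch_state { SK_CONNECTED, SK_DISCONNECTING, SK_OTHER };

/*
 * Returns 0 if the caller may go to sleep, otherwise the negative errno
 * the loop would return. Checks run in the same order as in the loop.
 * pending_err is a positive errno value, as stored in sk_err.
 */
static int sketch_sndpkt_precheck(int pending_err, enum sketch_state state,
				  long timeo, int signal_pending)
{
	assert(pending_err >= 0);
	if (pending_err)
		return -pending_err;	/* what sock_error() reports */
	if (state == SK_DISCONNECTING)
		return -EPIPE;
	if (state != SK_CONNECTED)
		return -ENOTCONN;
	if (!timeo)
		return -EAGAIN;		/* non-blocking send */
	if (signal_pending)
		return -EINTR;		/* simplification of sock_intr_errno() */
	return 0;
}
```

A pending socket error therefore wins over the state checks, and the state checks win over an expired timeout or a pending signal.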
2007-02-09 23:25:21 +09:00
/**
2014-06-25 20:41:38 -05:00
* tipc_send_stream - send stream - oriented data
* @ iocb : if NULL , indicates that socket lock is already held
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2014-06-25 20:41:38 -05:00
* @ m : data to send
* @ dsz : total length of data to be transmitted
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Used for SOCK_STREAM data .
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Returns the number of bytes sent on success ( or partial success ) ,
* or errno if no data sent
2006-01-02 19:04:38 +01:00
*/
2014-06-25 20:41:38 -05:00
static int tipc_send_stream ( struct kiocb * iocb , struct socket * sock ,
struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-06-25 20:41:38 -05:00
struct tipc_port * port = & tsk - > port ;
struct tipc_msg * mhdr = & port - > phdr ;
struct sk_buff * buf ;
2014-01-17 22:53:15 +01:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , dest , m - > msg_name ) ;
2014-06-25 20:41:38 -05:00
u32 ref = port - > ref ;
int rc = - EINVAL ;
2014-01-17 09:50:06 +08:00
long timeo ;
2014-06-25 20:41:38 -05:00
u32 dnode ;
uint mtu , send , sent = 0 ;
2006-01-02 19:04:38 +01:00
/* Handle implied connection establishment */
2014-06-25 20:41:38 -05:00
if ( unlikely ( dest ) ) {
rc = tipc_sendmsg ( iocb , sock , m , dsz ) ;
if ( dsz & & ( dsz = = rc ) )
2014-06-25 20:41:42 -05:00
tsk - > sent_unacked = 1 ;
2014-06-25 20:41:38 -05:00
return rc ;
}
if ( dsz > ( uint ) INT_MAX )
2010-04-20 17:58:24 -04:00
return - EMSGSIZE ;
2008-04-15 00:22:02 -07:00
if ( iocb )
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2014-01-17 09:50:06 +08:00
if ( unlikely ( sock - > state ! = SS_CONNECTED ) ) {
if ( sock - > state = = SS_DISCONNECTING )
2014-06-25 20:41:38 -05:00
rc = - EPIPE ;
2014-01-17 09:50:06 +08:00
else
2014-06-25 20:41:38 -05:00
rc = - ENOTCONN ;
2014-01-17 09:50:06 +08:00
goto exit ;
}
2011-07-06 05:53:15 -04:00
2014-01-17 09:50:06 +08:00
timeo = sock_sndtimeo ( sk , m - > msg_flags & MSG_DONTWAIT ) ;
2014-06-25 20:41:38 -05:00
dnode = tipc_port_peernode ( port ) ;
next :
mtu = port - > max_pkt ;
send = min_t ( uint , dsz - sent , TIPC_MAX_USER_MSG_SIZE ) ;
2014-07-16 20:41:03 -04:00
rc = tipc_msg_build ( mhdr , m - > msg_iov , sent , send , mtu , & buf ) ;
2014-06-25 20:41:38 -05:00
if ( unlikely ( rc < 0 ) )
goto exit ;
2007-02-09 23:25:21 +09:00
do {
2014-06-25 20:41:42 -05:00
if ( likely ( ! tipc_sk_conn_cong ( tsk ) ) ) {
2014-07-16 20:41:03 -04:00
rc = tipc_link_xmit ( buf , dnode , ref ) ;
2014-06-25 20:41:38 -05:00
if ( likely ( ! rc ) ) {
2014-06-25 20:41:42 -05:00
tsk - > sent_unacked + + ;
2014-06-25 20:41:38 -05:00
sent + = send ;
if ( sent = = dsz )
break ;
goto next ;
}
if ( rc = = - EMSGSIZE ) {
port - > max_pkt = tipc_node_get_mtu ( dnode , ref ) ;
goto next ;
}
if ( rc ! = - ELINKCONG )
break ;
}
rc = tipc_wait_for_sndpkt ( sock , & timeo ) ;
2014-07-06 20:38:50 -04:00
if ( rc )
kfree_skb_list ( buf ) ;
2014-06-25 20:41:38 -05:00
} while ( ! rc ) ;
2014-01-17 09:50:06 +08:00
exit :
2008-04-15 00:22:02 -07:00
if ( iocb )
release_sock ( sk ) ;
2014-06-25 20:41:38 -05:00
return sent ? sent : rc ;
2006-01-02 19:04:38 +01:00
}
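The chunking and the "partial success" return value of the stream send path can be modelled as follows. This is a sketch under stated assumptions: `struct link_model`, `model_xmit()` and `sketch_send_stream()` are hypothetical, the chunk limit of 66000 bytes is assumed to match `TIPC_MAX_USER_MSG_SIZE`, and the congestion wait is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Assumed chunk limit; the kernel uses TIPC_MAX_USER_MSG_SIZE here. */
#define SKETCH_MAX_MSG 66000u

/* Hypothetical link that accepts `budget` chunks, then fails with fail_rc. */
struct link_model {
	unsigned int budget;
	int fail_rc;
	unsigned int chunks;	/* chunks accepted so far */
};

static int model_xmit(struct link_model *l)
{
	if (l->chunks >= l->budget)
		return l->fail_rc;
	l->chunks++;
	return 0;
}

/*
 * Split dsz bytes into chunks of at most SKETCH_MAX_MSG bytes and send
 * them one by one. If anything was sent, report the byte count even when
 * a later chunk failed; report the error only if nothing was sent.
 */
static int sketch_send_stream(struct link_model *l, unsigned int dsz)
{
	unsigned int sent = 0, send;
	int rc;

	assert(dsz <= (unsigned int)INT_MAX);	/* larger requests are rejected earlier */
	do {
		send = dsz - sent;
		if (send > SKETCH_MAX_MSG)
			send = SKETCH_MAX_MSG;
		rc = model_xmit(l);
		if (rc)
			break;
		sent += send;
	} while (sent < dsz);
	return sent ? (int)sent : rc;
}
```

With these assumptions a 150000-byte request is sent as three chunks (66000, 66000 and 18000 bytes). If the link fails after the first chunk, the caller sees 66000 rather than the error, which matches the `sent ? sent : rc` return above.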
2007-02-09 23:25:21 +09:00
/**
2014-06-25 20:41:38 -05:00
* tipc_send_packet - send a connection - oriented message
* @ iocb : if NULL , indicates that socket lock is already held
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2014-06-25 20:41:38 -05:00
* @ m : message to send
* @ dsz : length of data to be transmitted
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Used for SOCK_SEQPACKET messages .
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Returns the number of bytes sent on success , or errno otherwise
2006-01-02 19:04:38 +01:00
*/
2014-06-25 20:41:38 -05:00
static int tipc_send_packet ( struct kiocb * iocb , struct socket * sock ,
struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2014-06-25 20:41:38 -05:00
if ( dsz > TIPC_MAX_USER_MSG_SIZE )
return - EMSGSIZE ;
2006-01-02 19:04:38 +01:00
2014-06-25 20:41:38 -05:00
return tipc_send_stream ( iocb , sock , m , dsz ) ;
2006-01-02 19:04:38 +01:00
}
/**
* auto_connect - complete connection setup to a remote port
2014-03-12 11:31:12 -04:00
* @ tsk : tipc socket structure
2006-01-02 19:04:38 +01:00
* @ msg : peer ' s response message
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-03-12 11:31:12 -04:00
static int auto_connect ( struct tipc_sock * tsk , struct tipc_msg * msg )
2006-01-02 19:04:38 +01:00
{
2014-03-12 11:31:12 -04:00
struct tipc_port * port = & tsk - > port ;
struct socket * sock = tsk - > sk . sk_socket ;
2014-03-12 11:31:08 -04:00
struct tipc_portid peer ;
peer . ref = msg_origport ( msg ) ;
peer . node = msg_orignode ( msg ) ;
2006-01-02 19:04:38 +01:00
2014-03-12 11:31:12 -04:00
__tipc_port_connect ( port - > ref , port , & peer ) ;
tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.
With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.
The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.
It is also now possible to call select() or poll() to wait for the
completion of a connection.
An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-29 18:51:19 -05:00
if ( msg_importance ( msg ) > TIPC_CRITICAL_IMPORTANCE )
return - EINVAL ;
2014-03-12 11:31:12 -04:00
msg_set_importance ( & port - > phdr , ( u32 ) msg_importance ( msg ) ) ;
2006-01-02 19:04:38 +01:00
sock - > state = SS_CONNECTED ;
return 0 ;
}
/**
* set_orig_addr - capture sender ' s address for received message
* @ m : descriptor for message info
* @ msg : received message header
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Note : Address is not captured if not requested by receiver .
*/
2006-03-20 22:37:04 -08:00
static void set_orig_addr ( struct msghdr * m , struct tipc_msg * msg )
2006-01-02 19:04:38 +01:00
{
2014-01-17 22:53:15 +01:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , addr , m - > msg_name ) ;
2006-01-02 19:04:38 +01:00
2007-02-09 23:25:21 +09:00
if ( addr ) {
2006-01-02 19:04:38 +01:00
addr - > family = AF_TIPC ;
addr - > addrtype = TIPC_ADDR_ID ;
2013-04-07 01:52:00 +00:00
memset ( & addr - > addr , 0 , sizeof ( addr - > addr ) ) ;
2006-01-02 19:04:38 +01:00
addr - > addr . id . ref = msg_origport ( msg ) ;
addr - > addr . id . node = msg_orignode ( msg ) ;
2010-12-31 18:59:32 +00:00
addr - > addr . name . domain = 0 ; /* could leave uninitialized */
addr - > scope = 0 ; /* could leave uninitialized */
2006-01-02 19:04:38 +01:00
m - > msg_namelen = sizeof ( struct sockaddr_tipc ) ;
}
}
/**
2007-02-09 23:25:21 +09:00
* anc_data_recv - optionally capture ancillary data for received message
2006-01-02 19:04:38 +01:00
* @ m : descriptor for message info
* @ msg : received message header
* @ tport : TIPC port associated with message
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Note : Ancillary data is not captured if not requested by receiver .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 if successful , otherwise errno
*/
2006-03-20 22:37:04 -08:00
static int anc_data_recv ( struct msghdr * m , struct tipc_msg * msg ,
2013-06-17 10:54:47 -04:00
struct tipc_port * tport )
2006-01-02 19:04:38 +01:00
{
u32 anc_data [ 3 ] ;
u32 err ;
u32 dest_type ;
2006-06-25 23:45:24 -07:00
int has_name ;
2006-01-02 19:04:38 +01:00
int res ;
if ( likely ( m - > msg_controllen = = 0 ) )
return 0 ;
/* Optionally capture errored message object(s) */
err = msg ? msg_errcode ( msg ) : 0 ;
if ( unlikely ( err ) ) {
anc_data [ 0 ] = err ;
anc_data [ 1 ] = msg_data_sz ( msg ) ;
2010-12-31 18:59:33 +00:00
res = put_cmsg ( m , SOL_TIPC , TIPC_ERRINFO , 8 , anc_data ) ;
if ( res )
2006-01-02 19:04:38 +01:00
return res ;
2010-12-31 18:59:33 +00:00
if ( anc_data [ 1 ] ) {
res = put_cmsg ( m , SOL_TIPC , TIPC_RETDATA , anc_data [ 1 ] ,
msg_data ( msg ) ) ;
if ( res )
return res ;
}
2006-01-02 19:04:38 +01:00
}
/* Optionally capture message destination object */
dest_type = msg ? msg_type ( msg ) : TIPC_DIRECT_MSG ;
switch ( dest_type ) {
case TIPC_NAMED_MSG :
2006-06-25 23:45:24 -07:00
has_name = 1 ;
2006-01-02 19:04:38 +01:00
anc_data [ 0 ] = msg_nametype ( msg ) ;
anc_data [ 1 ] = msg_namelower ( msg ) ;
anc_data [ 2 ] = msg_namelower ( msg ) ;
break ;
case TIPC_MCAST_MSG :
2006-06-25 23:45:24 -07:00
has_name = 1 ;
2006-01-02 19:04:38 +01:00
anc_data [ 0 ] = msg_nametype ( msg ) ;
anc_data [ 1 ] = msg_namelower ( msg ) ;
anc_data [ 2 ] = msg_nameupper ( msg ) ;
break ;
case TIPC_CONN_MSG :
2006-06-25 23:45:24 -07:00
has_name = ( tport - > conn_type ! = 0 ) ;
2006-01-02 19:04:38 +01:00
anc_data [ 0 ] = tport - > conn_type ;
anc_data [ 1 ] = tport - > conn_instance ;
anc_data [ 2 ] = tport - > conn_instance ;
break ;
default :
2006-06-25 23:45:24 -07:00
has_name = 0 ;
2006-01-02 19:04:38 +01:00
}
2010-12-31 18:59:33 +00:00
if ( has_name ) {
res = put_cmsg ( m , SOL_TIPC , TIPC_DESTNAME , 12 , anc_data ) ;
if ( res )
return res ;
}
2006-01-02 19:04:38 +01:00
return 0 ;
}
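The ancillary payloads written above have fixed layouts: 8 bytes (two 32-bit words) for `TIPC_ERRINFO` and 12 bytes (three 32-bit words) for `TIPC_DESTNAME`. A receiver could decode them roughly as below. The struct and function names are hypothetical, and the sketch assumes the payload has already been located in the control buffer (for example via the `CMSG_*` macros).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the 12-byte TIPC_DESTNAME payload. */
struct sketch_destname {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
};

/* Hypothetical decoded form of the 8-byte TIPC_ERRINFO payload. */
struct sketch_errinfo {
	uint32_t err;		/* TIPC error code */
	uint32_t ret_len;	/* length of the returned data, if any */
};

/* Returns 0 on success, -1 if the payload length is not exactly 12. */
static int sketch_decode_destname(const void *data, size_t len,
				  struct sketch_destname *out)
{
	uint32_t w[3];

	assert(out != NULL);
	if (len != sizeof(w))
		return -1;
	memcpy(w, data, sizeof(w));	/* avoids unaligned access */
	out->type = w[0];
	out->lower = w[1];
	out->upper = w[2];
	return 0;
}

/* Returns 0 on success, -1 if the payload length is not exactly 8. */
static int sketch_decode_errinfo(const void *data, size_t len,
				 struct sketch_errinfo *out)
{
	uint32_t w[2];

	assert(out != NULL);
	if (len != sizeof(w))
		return -1;
	memcpy(w, data, sizeof(w));
	out->err = w[0];
	out->ret_len = w[1];
	return 0;
}
```

For a message sent to a single name instance, `lower` and `upper` carry the same value, as in the `TIPC_NAMED_MSG` case above. Only multicast messages report a real range.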
2014-05-23 15:55:12 -04:00
static int tipc_wait_for_rcvmsg ( struct socket * sock , long * timeop )
2014-01-17 09:50:07 +08:00
{
struct sock * sk = sock - > sk ;
DEFINE_WAIT ( wait ) ;
2014-05-23 15:55:12 -04:00
long timeo = * timeop ;
2014-01-17 09:50:07 +08:00
int err ;
for ( ; ; ) {
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
2014-03-06 14:40:18 +01:00
if ( timeo & & skb_queue_empty ( & sk - > sk_receive_queue ) ) {
2014-01-17 09:50:07 +08:00
if ( sock - > state = = SS_DISCONNECTING ) {
err = - ENOTCONN ;
break ;
}
release_sock ( sk ) ;
timeo = schedule_timeout ( timeo ) ;
lock_sock ( sk ) ;
}
err = 0 ;
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
break ;
err = sock_intr_errno ( timeo ) ;
if ( signal_pending ( current ) )
break ;
err = - EAGAIN ;
if ( ! timeo )
break ;
}
finish_wait ( sk_sleep ( sk ) , & wait ) ;
2014-05-23 15:55:12 -04:00
* timeop = timeo ;
2014-01-17 09:50:07 +08:00
return err ;
}
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_recvmsg - receive packet - oriented message
2006-01-02 19:04:38 +01:00
* @ iocb : ( unused )
* @ sock : socket structure
* @ m : descriptor for message info
* @ buf_len : total size of user buffer area
* @ flags : receive flags
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Used for SOCK_DGRAM , SOCK_RDM , and SOCK_SEQPACKET messages .
* If the complete message doesn ' t fit in user area , truncate it .
*
* Returns size of returned message data , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_recvmsg ( struct kiocb * iocb , struct socket * sock ,
struct msghdr * m , size_t buf_len , int flags )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
struct tipc_port * port = & tsk - > port ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
struct tipc_msg * msg ;
2014-01-17 09:50:07 +08:00
long timeo ;
2006-01-02 19:04:38 +01:00
unsigned int sz ;
u32 err ;
int res ;
2008-04-15 00:22:02 -07:00
/* Catch invalid receive requests */
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! buf_len ) )
return - EINVAL ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
if ( unlikely ( sock - > state = = SS_UNCONNECTED ) ) {
res = - ENOTCONN ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2014-01-17 09:50:07 +08:00
timeo = sock_rcvtimeo ( sk , flags & MSG_DONTWAIT ) ;
2008-04-15 00:22:02 -07:00
restart :
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look for a message in receive queue; wait if necessary */
2014-05-23 15:55:12 -04:00
res = tipc_wait_for_rcvmsg ( sock , & timeo ) ;
2014-01-17 09:50:07 +08:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look at first message in receive queue */
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
msg = buf_msg ( buf ) ;
sz = msg_data_sz ( msg ) ;
err = msg_errcode ( msg ) ;
/* Discard an empty non-errored message & try again */
if ( ( ! sz ) & & ( ! err ) ) {
2008-04-15 00:22:02 -07:00
advance_rx_queue ( sk ) ;
2006-01-02 19:04:38 +01:00
goto restart ;
}
/* Capture sender's address (optional) */
set_orig_addr ( m , msg ) ;
/* Capture ancillary data (optional) */
2014-03-12 11:31:12 -04:00
res = anc_data_recv ( m , msg , port ) ;
2008-04-15 00:22:02 -07:00
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
/* Capture message data (if valid) & compute return value (always) */
if ( ! err ) {
if ( unlikely ( buf_len < sz ) ) {
sz = buf_len ;
m - > msg_flags | = MSG_TRUNC ;
}
2011-02-21 09:45:40 -05:00
res = skb_copy_datagram_iovec ( buf , msg_hdr_sz ( msg ) ,
m - > msg_iov , sz ) ;
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
res = sz ;
} else {
if ( ( sock - > state = = SS_READY ) | |
( ( err = = TIPC_CONN_SHUTDOWN ) | | m - > msg_control ) )
res = 0 ;
else
res = - ECONNRESET ;
}
/* Consume received message (optional) */
if ( likely ( ! ( flags & MSG_PEEK ) ) ) {
2008-04-15 00:06:12 -07:00
if ( ( sock - > state ! = SS_READY ) & &
2014-06-25 20:41:42 -05:00
( + + tsk - > rcv_unacked > = TIPC_CONNACK_INTV ) ) {
tipc_acknowledge ( port - > ref , tsk - > rcv_unacked ) ;
tsk - > rcv_unacked = 0 ;
}
2008-04-15 00:22:02 -07:00
advance_rx_queue ( sk ) ;
2007-02-09 23:25:21 +09:00
}
2006-01-02 19:04:38 +01:00
exit :
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
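How the return value of the packet receive path is derived can be shown without any socket at all. The following is a sketch only: `struct sketch_rx` and `sketch_recvmsg_result()` are hypothetical, and the flags collapse the checks on `sock->state`, `TIPC_CONN_SHUTDOWN` and `m->msg_control` into plain booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical summary of the message at the head of the receive queue. */
struct sketch_rx {
	unsigned int sz;	/* payload size of the queued message */
	unsigned int err;	/* non-zero if the message carries an error code */
	int is_shutdown;	/* err is the orderly-shutdown code */
	int connectionless;	/* socket is in the connectionless state */
	int wants_control;	/* caller supplied an ancillary data buffer */
};

/*
 * Returns what the receive call would return for this message and sets
 * *truncated when the payload does not fit into buf_len bytes.
 */
static int sketch_recvmsg_result(const struct sketch_rx *rx, size_t buf_len,
				 int *truncated)
{
	assert(buf_len > 0);	/* a zero-length buffer is rejected earlier */
	*truncated = 0;
	if (!rx->err) {
		if (buf_len < rx->sz) {
			*truncated = 1;	/* MSG_TRUNC would be set */
			return (int)buf_len;
		}
		return (int)rx->sz;
	}
	if (rx->connectionless || rx->is_shutdown || rx->wants_control)
		return 0;
	return -ECONNRESET;
}
```

An errored message thus reads as end-of-file (0) when the peer shut down cleanly or when the caller can learn the details from ancillary data, and as `-ECONNRESET` otherwise.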
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_recv_stream - receive stream - oriented data
2006-01-02 19:04:38 +01:00
* @ iocb : ( unused )
* @ sock : socket structure
* @ m : descriptor for message info
* @ buf_len : total size of user buffer area
* @ flags : receive flags
2007-02-09 23:25:21 +09:00
*
* Used for SOCK_STREAM messages only . If not enough data is available
2006-01-02 19:04:38 +01:00
* it will optionally wait for more ; never truncates data .
*
* Returns size of returned message data , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_recv_stream ( struct kiocb * iocb , struct socket * sock ,
struct msghdr * m , size_t buf_len , int flags )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
struct tipc_port * port = & tsk - > port ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
struct tipc_msg * msg ;
2014-01-17 09:50:07 +08:00
long timeo ;
2006-01-02 19:04:38 +01:00
unsigned int sz ;
2010-08-17 11:00:04 +00:00
int sz_to_copy , target , needed ;
2006-01-02 19:04:38 +01:00
int sz_copied = 0 ;
u32 err ;
2008-04-15 00:22:02 -07:00
int res = 0 ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Catch invalid receive attempts */
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! buf_len ) )
return - EINVAL ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2014-01-17 09:50:07 +08:00
if ( unlikely ( sock - > state = = SS_UNCONNECTED ) ) {
2008-04-15 00:22:02 -07:00
res = - ENOTCONN ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2010-08-17 11:00:04 +00:00
target = sock_rcvlowat ( sk , flags & MSG_WAITALL , buf_len ) ;
2014-01-17 09:50:07 +08:00
timeo = sock_rcvtimeo ( sk , flags & MSG_DONTWAIT ) ;
2006-01-02 19:04:38 +01:00
2012-04-30 15:29:02 -04:00
restart :
2008-04-15 00:22:02 -07:00
/* Look for a message in receive queue; wait if necessary */
2014-05-23 15:55:12 -04:00
res = tipc_wait_for_rcvmsg ( sock , & timeo ) ;
2014-01-17 09:50:07 +08:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look at first message in receive queue */
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
msg = buf_msg ( buf ) ;
sz = msg_data_sz ( msg ) ;
err = msg_errcode ( msg ) ;
/* Discard an empty non-errored message & try again */
if ( ( ! sz ) & & ( ! err ) ) {
2008-04-15 00:22:02 -07:00
advance_rx_queue ( sk ) ;
2006-01-02 19:04:38 +01:00
goto restart ;
}
/* Optionally capture sender's address & ancillary data of first msg */
if ( sz_copied = = 0 ) {
set_orig_addr ( m , msg ) ;
2014-03-12 11:31:12 -04:00
res = anc_data_recv ( m , msg , port ) ;
2008-04-15 00:22:02 -07:00
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
}
/* Capture message data (if valid) & compute return value (always) */
if ( ! err ) {
2011-02-21 09:45:40 -05:00
u32 offset = ( u32 ) ( unsigned long ) ( TIPC_SKB_CB ( buf ) - > handle ) ;
2006-01-02 19:04:38 +01:00
2011-02-21 09:45:40 -05:00
sz - = offset ;
2006-01-02 19:04:38 +01:00
needed = ( buf_len - sz_copied ) ;
sz_to_copy = ( sz < = needed ) ? sz : needed ;
2011-02-21 09:45:40 -05:00
res = skb_copy_datagram_iovec ( buf , msg_hdr_sz ( msg ) + offset ,
m - > msg_iov , sz_to_copy ) ;
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
2011-02-21 09:45:40 -05:00
2006-01-02 19:04:38 +01:00
sz_copied + = sz_to_copy ;
if ( sz_to_copy < sz ) {
if ( ! ( flags & MSG_PEEK ) )
2011-02-21 09:45:40 -05:00
TIPC_SKB_CB ( buf ) - > handle =
( void * ) ( unsigned long ) ( offset + sz_to_copy ) ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
} else {
if ( sz_copied ! = 0 )
goto exit ; /* can't add error msg to valid data */
if ( ( err = = TIPC_CONN_SHUTDOWN ) | | m - > msg_control )
res = 0 ;
else
res = - ECONNRESET ;
}
/* Consume received message (optional) */
if ( likely ( ! ( flags & MSG_PEEK ) ) ) {
2014-06-25 20:41:42 -05:00
if ( unlikely ( + + tsk - > rcv_unacked > = TIPC_CONNACK_INTV ) ) {
tipc_acknowledge ( port - > ref , tsk - > rcv_unacked ) ;
tsk - > rcv_unacked = 0 ;
}
2008-04-15 00:22:02 -07:00
advance_rx_queue ( sk ) ;
2007-02-09 23:25:21 +09:00
}
2006-01-02 19:04:38 +01:00
/* Loop around if more data is required */
2009-11-29 16:55:45 -08:00
if ( ( sz_copied < buf_len ) & & /* didn't get all requested data */
( ! skb_queue_empty ( & sk - > sk_receive_queue ) | |
2010-08-17 11:00:04 +00:00
( sz_copied < target ) ) & & /* and more is ready or required */
2009-11-29 16:55:45 -08:00
( ! ( flags & MSG_PEEK ) ) & & /* and aren't just peeking at data */
( ! err ) ) /* and haven't reached a FIN */
2006-01-02 19:04:38 +01:00
goto restart ;
exit :
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-06-25 23:48:22 -07:00
return sz_copied ? sz_copied : res ;
2006-01-02 19:04:38 +01:00
}
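The stream receive path never truncates: when the user buffer fills up in the middle of a message, the read position is stored in the buffer itself and the next call resumes from there. A minimal user-space model of that bookkeeping, with hypothetical types (`struct sketch_msg`, `struct sketch_queue`) and without waiting, error messages or `MSG_PEEK`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued message; `offset` plays the role of the saved handle. */
struct sketch_msg {
	const char *data;
	size_t len;
	size_t offset;		/* bytes already handed to the reader */
};

struct sketch_queue {
	struct sketch_msg *msgs;
	size_t count;
	size_t head;		/* index of the first unconsumed message */
};

/* Copy up to buf_len bytes across message boundaries; returns bytes copied. */
static size_t sketch_read_stream(struct sketch_queue *q, char *out,
				 size_t buf_len)
{
	size_t copied = 0;

	assert(out != NULL);
	while (copied < buf_len && q->head < q->count) {
		struct sketch_msg *m = &q->msgs[q->head];
		size_t remaining = m->len - m->offset;
		size_t needed = buf_len - copied;
		size_t n = remaining <= needed ? remaining : needed;

		memcpy(out + copied, m->data + m->offset, n);
		copied += n;
		if (n < remaining) {
			m->offset += n;	/* partial read: remember position */
			break;
		}
		q->head++;		/* message fully consumed */
	}
	return copied;
}
```

Reading 7 bytes from a queue holding "hello" and "world" yields "hellowo" and leaves an offset of 2 in the second message. The next read returns the remaining "rld".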
2012-08-21 11:16:57 +08:00
/**
* tipc_write_space - wake up thread if port congestion is released
* @ sk : socket
*/
static void tipc_write_space ( struct sock * sk )
{
struct socket_wq * wq ;
rcu_read_lock ( ) ;
wq = rcu_dereference ( sk - > sk_wq ) ;
if ( wq_has_sleeper ( wq ) )
wake_up_interruptible_sync_poll ( & wq - > wait , POLLOUT |
POLLWRNORM | POLLWRBAND ) ;
rcu_read_unlock ( ) ;
}
/**
* tipc_data_ready - wake up threads to indicate messages have been received
* @ sk : socket
*/
2014-04-11 16:15:36 -04:00
static void tipc_data_ready ( struct sock * sk )
2012-08-21 11:16:57 +08:00
{
struct socket_wq * wq ;
rcu_read_lock ( ) ;
wq = rcu_dereference ( sk - > sk_wq ) ;
if ( wq_has_sleeper ( wq ) )
wake_up_interruptible_sync_poll ( & wq - > wait , POLLIN |
POLLRDNORM | POLLRDBAND ) ;
rcu_read_unlock ( ) ;
}
2012-11-29 18:39:14 -05:00
/**
* filter_connect - Handle all incoming messages for a connection - based socket
2014-03-12 11:31:12 -04:00
* @ tsk : TIPC socket
2012-11-29 18:39:14 -05:00
* @ buf : pointer to the received message buffer
*
2014-06-25 20:41:31 -05:00
* Returns 0 ( TIPC_OK ) if everything ok , - TIPC_ERR_NO_PORT otherwise
2012-11-29 18:39:14 -05:00
*/
2014-06-25 20:41:31 -05:00
static int filter_connect ( struct tipc_sock * tsk , struct sk_buff * * buf )
2012-11-29 18:39:14 -05:00
{
2014-03-12 11:31:12 -04:00
struct sock * sk = & tsk - > sk ;
struct tipc_port * port = & tsk - > port ;
2014-03-12 11:31:09 -04:00
struct socket * sock = sk - > sk_socket ;
2012-11-29 18:39:14 -05:00
struct tipc_msg * msg = buf_msg ( * buf ) ;
2014-03-12 11:31:09 -04:00
2014-06-25 20:41:31 -05:00
int retval = - TIPC_ERR_NO_PORT ;
2012-11-29 18:51:19 -05:00
int res ;
2012-11-29 18:39:14 -05:00
if ( msg_mcast ( msg ) )
return retval ;
switch ( ( int ) sock - > state ) {
case SS_CONNECTED :
/* Accept only connection-based messages sent by peer */
2014-03-12 11:31:09 -04:00
if ( msg_connected ( msg ) & & tipc_port_peer_msg ( port , msg ) ) {
2012-11-29 18:39:14 -05:00
if ( unlikely ( msg_errcode ( msg ) ) ) {
sock - > state = SS_DISCONNECTING ;
2014-03-12 11:31:09 -04:00
__tipc_port_disconnect ( port ) ;
2012-11-29 18:39:14 -05:00
}
retval = TIPC_OK ;
}
break ;
case SS_CONNECTING :
/* Accept only ACK or NACK message */
2012-11-29 18:51:19 -05:00
if ( unlikely ( msg_errcode ( msg ) ) ) {
sock - > state = SS_DISCONNECTING ;
tipc: set sk_err correctly when connection fails
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28 09:29:58 +02:00
sk - > sk_err = ECONNREFUSED ;
2012-11-29 18:51:19 -05:00
retval = TIPC_OK ;
break ;
}
if ( unlikely ( ! msg_connected ( msg ) ) )
break ;
2014-03-12 11:31:12 -04:00
res = auto_connect ( tsk , msg ) ;
2012-11-29 18:51:19 -05:00
if ( res ) {
sock - > state = SS_DISCONNECTING ;
2013-08-28 09:29:58 +02:00
sk - > sk_err = - res ;
2012-11-29 18:39:14 -05:00
retval = TIPC_OK ;
2012-11-29 18:51:19 -05:00
			break;
		}
		/* If an incoming message is an 'ACK-', it should be
		 * discarded here because it doesn't contain useful
		 * data. In addition, we should try to wake up
		 * connect() routine if sleeping.
		 */
		if (msg_data_sz(msg) == 0) {
			kfree_skb(*buf);
			*buf = NULL;
			if (waitqueue_active(sk_sleep(sk)))
				wake_up_interruptible(sk_sleep(sk));
		}
		retval = TIPC_OK;
2012-11-29 18:39:14 -05:00
		break;
	case SS_LISTENING:
	case SS_UNCONNECTED:
		/* Accept only SYN message */
		if (!msg_connected(msg) && !(msg_errcode(msg)))
			retval = TIPC_OK;
		break;
	case SS_DISCONNECTING:
		break;
	default:
		pr_err("Unknown socket state %u\n", sock->state);
	}
	return retval;
}
2013-01-20 23:30:09 +01:00
/**
 * rcvbuf_limit - get proper overload limit of socket receive queue
 * @sk: socket
 * @buf: message
 *
 * For all connection oriented messages, irrespective of importance,
 * the default overload value (i.e. 67 MB) is set as limit.
 *
 * For all connectionless messages, by default new queue limits are
 * as follows:
 *
2013-06-17 10:54:37 -04:00
 * TIPC_LOW_IMPORTANCE       (4 MB)
 * TIPC_MEDIUM_IMPORTANCE    (8 MB)
 * TIPC_HIGH_IMPORTANCE      (16 MB)
 * TIPC_CRITICAL_IMPORTANCE  (32 MB)
2013-01-20 23:30:09 +01:00
 *
 * Returns overload limit according to corresponding message importance
 */
static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
{
	struct tipc_msg *msg = buf_msg(buf);
	if (msg_connected(msg))
2013-12-12 09:36:39 +08:00
		return sysctl_tipc_rmem[2];
	return sk->sk_rcvbuf >> TIPC_CRITICAL_IMPORTANCE <<
		msg_importance(msg);
2013-01-20 23:30:09 +01:00
}
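As a worked example of the shift expression in the connectionless branch above, the following user-space sketch reproduces the table from the comment. It assumes a receive buffer size of 32 MB (inferred from the table, since the value of `sk_rcvbuf` is not shown in this excerpt) and the importance values 0 to 3 as defined in `<linux/tipc.h>`. The helper name `connless_limit` is hypothetical and is not part of the kernel source.

```c
#include <assert.h>

/* Importance levels, with the values used in <linux/tipc.h>. */
enum {
	TIPC_LOW_IMPORTANCE = 0,
	TIPC_MEDIUM_IMPORTANCE = 1,
	TIPC_HIGH_IMPORTANCE = 2,
	TIPC_CRITICAL_IMPORTANCE = 3
};

/* Hypothetical mirror of the connectionless branch of rcvbuf_limit():
 * divide the buffer size by 2^CRITICAL, then multiply by 2^importance.
 * The shifts are evaluated left to right, as in the kernel expression.
 */
static unsigned int connless_limit(unsigned int rcvbuf,
				   unsigned int importance)
{
	assert(importance <= TIPC_CRITICAL_IMPORTANCE);
	return rcvbuf >> TIPC_CRITICAL_IMPORTANCE << importance;
}
```

With `rcvbuf` set to 32 MB, each step in importance doubles the limit, from 4 MB for low importance up to the full 32 MB for critical importance.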
2007-02-09 23:25:21 +09:00
/**
2008-04-15 00:22:02 -07:00
 * filter_rcv - validate incoming message
 * @sk: socket
2006-01-02 19:04:38 +01:00
 * @buf: message
2007-02-09 23:25:21 +09:00
 *
2008-04-15 00:22:02 -07:00
 * Enqueues message on receive queue if acceptable; optionally handles
 * disconnect indication for a connected socket.
 *
 * Called with socket lock already taken; port lock may also be taken.
2007-02-09 23:25:21 +09:00
 *
2014-06-25 20:41:31 -05:00
 * Returns 0 (TIPC_OK) if message was consumed, -TIPC error code if message
2014-06-25 20:41:41 -05:00
 * to be rejected, 1 (TIPC_FWD_MSG) if (CONN_MANAGER) message to be forwarded
2006-01-02 19:04:38 +01:00
 */
2014-06-25 20:41:31 -05:00
static int filter_rcv(struct sock *sk, struct sk_buff *buf)
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
	struct socket *sock = sk->sk_socket;
2014-03-12 11:31:12 -04:00
	struct tipc_sock *tsk = tipc_sk(sk);
2006-01-02 19:04:38 +01:00
	struct tipc_msg *msg = buf_msg(buf);
2013-01-20 23:30:09 +01:00
	unsigned int limit = rcvbuf_limit(sk, buf);
2014-06-25 20:41:41 -05:00
	u32 onode;
2014-06-25 20:41:31 -05:00
	int rc = TIPC_OK;
2006-01-02 19:04:38 +01:00
2014-06-25 20:41:41 -05:00
	if (unlikely(msg_user(msg) == CONN_MANAGER))
		return tipc_sk_proto_rcv(tsk, &onode, buf);
2014-06-25 20:41:40 -05:00
2006-01-02 19:04:38 +01:00
	/* Reject message if it is wrong sort of message for socket */
2012-04-26 18:13:08 -04:00
	if (msg_type(msg) > TIPC_DIRECT_MSG)
2014-06-25 20:41:31 -05:00
		return -TIPC_ERR_NO_PORT;
2008-04-15 00:22:02 -07:00
2006-01-02 19:04:38 +01:00
	if (sock->state == SS_READY) {
2010-12-31 18:59:25 +00:00
		if (msg_connected(msg))
2014-06-25 20:41:31 -05:00
			return -TIPC_ERR_NO_PORT;
2006-01-02 19:04:38 +01:00
	} else {
2014-06-25 20:41:31 -05:00
		rc = filter_connect(tsk, &buf);
		if (rc != TIPC_OK || buf == NULL)
			return rc;
2006-01-02 19:04:38 +01:00
	}
	/* Reject message if there isn't room to queue it */
2013-01-20 23:30:09 +01:00
	if (sk_rmem_alloc_get(sk) + buf->truesize >= limit)
2014-06-25 20:41:31 -05:00
		return -TIPC_ERR_OVERLOAD;
2006-01-02 19:04:38 +01:00
2013-01-20 23:30:09 +01:00
	/* Enqueue message */
2013-10-18 07:23:16 +02:00
	TIPC_SKB_CB(buf)->handle = NULL;
2008-04-15 00:22:02 -07:00
	__skb_queue_tail(&sk->sk_receive_queue, buf);
2013-01-20 23:30:09 +01:00
	skb_set_owner_r(buf, sk);
2008-04-15 00:22:02 -07:00
2014-04-11 16:15:36 -04:00
	sk->sk_data_ready(sk);
2008-04-15 00:22:02 -07:00
	return TIPC_OK;
}
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/**
tipc: compensate for double accounting in socket rcv buffer
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 05:39:09 -04:00
 * tipc_backlog_rcv - handle incoming message from backlog queue
2008-04-15 00:22:02 -07:00
 * @sk: socket
 * @buf: message
 *
 * Caller must hold socket lock, but not port lock.
 *
 * Returns 0
 */
2014-05-14 05:39:09 -04:00
static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *buf)
2008-04-15 00:22:02 -07:00
{
2014-06-25 20:41:31 -05:00
	int rc;
2014-06-25 20:41:35 -05:00
	u32 onode;
2014-05-14 05:39:09 -04:00
	struct tipc_sock *tsk = tipc_sk(sk);
2014-06-09 11:08:18 -05:00
	uint truesize = buf->truesize;
2008-04-15 00:22:02 -07:00
2014-06-25 20:41:31 -05:00
	rc = filter_rcv(sk, buf);
2014-05-14 05:39:09 -04:00
2014-06-25 20:41:41 -05:00
	if (likely(!rc)) {
		if (atomic_read(&tsk->dupl_rcvcnt) < TIPC_CONN_OVERLOAD_LIMIT)
			atomic_add(truesize, &tsk->dupl_rcvcnt);
		return 0;
	}
	if ((rc < 0) && !tipc_msg_reverse(buf, &onode, -rc))
		return 0;
2014-07-16 20:41:03 -04:00
	tipc_link_xmit(buf, onode, 0);
2014-05-14 05:39:09 -04:00
2008-04-15 00:22:02 -07:00
	return 0;
}
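The double-accounting compensation described in the commit message can be illustrated with a small user-space model. This is a simplified sketch, not kernel code. The struct `model_sock` and the helpers `model_move_one` and `model_backlog_accepts` are hypothetical names, and the admission test (backlog length plus receive-queue memory compared against a limit) is an assumed simplification of what `sk_add_backlog()` checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three counters involved. */
struct model_sock {
	unsigned int backlog_len;  /* stays stale until the whole backlog is drained */
	unsigned int rmem_alloc;   /* bytes already moved to the receive queue */
	unsigned int dupl_rcvcnt;  /* bytes currently counted in both places */
};

/* Model of one successful tipc_backlog_rcv() call: the buffer is charged
 * to the receive queue while backlog_len is left untouched, so the same
 * bytes are counted twice. dupl_rcvcnt records that, up to a cap.
 */
static void model_move_one(struct model_sock *sk, unsigned int truesize,
			   unsigned int cap)
{
	assert(truesize <= sk->backlog_len);
	sk->rmem_alloc += truesize;
	if (sk->dupl_rcvcnt < cap)
		sk->dupl_rcvcnt += truesize;
}

/* Assumed admission rule: accept while the combined queue size does not
 * exceed the limit. With compensation, the limit is raised by dupl_rcvcnt.
 */
static bool model_backlog_accepts(const struct model_sock *sk,
				  unsigned int limit, bool compensate)
{
	unsigned int effective = limit + (compensate ? sk->dupl_rcvcnt : 0);

	return sk->backlog_len + sk->rmem_alloc <= effective;
}
```

For instance, with a limit of 1000 and a backlog of 600 bytes, moving five 100-byte buffers leaves the combined count at 1100 although only 600 bytes are actually queued. A new arrival is rejected under the plain limit and accepted once the limit is raised by the 500 double-counted bytes.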
/**
2014-03-12 11:31:10 -04:00
 * tipc_sk_rcv - handle incoming message
2014-05-14 05:39:15 -04:00
 * @buf: buffer containing arriving message
 * Consumes buffer
 * Returns 0 if success, or errno: -EHOSTUNREACH
2008-04-15 00:22:02 -07:00
 */
2014-05-14 05:39:15 -04:00
int tipc_sk_rcv(struct sk_buff *buf)
2008-04-15 00:22:02 -07:00
{
2014-05-14 05:39:15 -04:00
	struct tipc_sock *tsk;
	struct tipc_port *port;
	struct sock *sk;
	u32 dport = msg_destport(buf_msg(buf));
2014-06-25 20:41:31 -05:00
	int rc = TIPC_OK;
2014-05-14 05:39:09 -04:00
	uint limit;
2014-06-25 20:41:35 -05:00
	u32 dnode;
2014-05-14 05:39:15 -04:00
tipc: introduce message evaluation function
When a message arrives in a node and finds no destination
socket, we may need to drop it, reject it, or forward it after
a secondary destination lookup. The latter two cases currently
result in a code path that is perceived as complex, because it
follows a deep call chain via obscure functions such as
net_route_named_msg() and net_route_msg().
We now introduce a function, tipc_msg_eval(), that takes the
decision about whether such a message should be rejected or
forwarded, but leaves it to the caller to actually perform
the indicated action.
If the decision is 'reject', it is still the task of the recently
introduced function tipc_msg_reverse() to take the final decision
about whether the message is rejectable or not. In the latter case
it drops the message.
As a result of this change, we can finally eliminate the function
net_route_named_msg(), and hence become independent of net_route_msg().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 20:41:36 -05:00
	/* Validate destination and message */
2014-05-14 05:39:15 -04:00
	port = tipc_port_lock(dport);
	if (unlikely(!port)) {
2014-06-25 20:41:36 -05:00
		rc = tipc_msg_eval(buf, &dnode);
2014-05-14 05:39:15 -04:00
		goto exit;
	}
	tsk = tipc_port_to_sock(port);
	sk = &tsk->sk;
	/* Queue message */
2008-04-15 00:22:02 -07:00
	bh_lock_sock(sk);
2014-05-14 05:39:15 -04:00
2008-04-15 00:22:02 -07:00
	if (!sock_owned_by_user(sk)) {
2014-06-25 20:41:31 -05:00
		rc = filter_rcv(sk, buf);
2008-04-15 00:22:02 -07:00
	} else {
2014-05-14 05:39:09 -04:00
		if (sk->sk_backlog.len == 0)
			atomic_set(&tsk->dupl_rcvcnt, 0);
		limit = rcvbuf_limit(sk, buf) + atomic_read(&tsk->dupl_rcvcnt);
		if (sk_add_backlog(sk, buf, limit))
2014-06-25 20:41:31 -05:00
			rc = -TIPC_ERR_OVERLOAD;
2008-04-15 00:22:02 -07:00
	}
	bh_unlock_sock(sk);
2014-05-14 05:39:15 -04:00
	tipc_port_unlock(port);
2008-04-15 00:22:02 -07:00
2014-06-25 20:41:31 -05:00
	if (likely(!rc))
2014-05-14 05:39:15 -04:00
		return 0;
exit:
2014-06-25 20:41:36 -05:00
	if ((rc < 0) && !tipc_msg_reverse(buf, &dnode, -rc))
2014-06-25 20:41:35 -05:00
		return -EHOSTUNREACH;
2014-06-25 20:41:36 -05:00
2014-07-16 20:41:03 -04:00
	tipc_link_xmit(buf, dnode, 0);
2014-06-25 20:41:36 -05:00
	return (rc < 0) ? -EHOSTUNREACH : 0;
2006-01-02 19:04:38 +01:00
}
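The tail of tipc_sk_rcv() maps the internal result code onto one of four outcomes. The sketch below restates that mapping as a pure function so the cases can be read side by side. The enum `rcv_action` and the function `classify_rcv_result` are hypothetical names introduced for illustration. The `reversible` flag stands in for the return value of tipc_msg_reverse(), which, according to the commit message, drops the message when it cannot be rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for what happens to the buffer. */
enum rcv_action {
	RCV_CONSUMED,   /* queued or handled; nothing more to do */
	RCV_DROPPED,    /* rejection was not possible; buffer discarded */
	RCV_SENT_BACK,  /* reversed and returned to the sender */
	RCV_FORWARDED   /* positive code: passed on via the link layer */
};

/* Mirrors the control flow after the 'exit' label:
 *   rc == 0 -> success
 *   rc <  0 -> try to reverse; drop if that fails, else send back
 *   rc >  0 -> forward, report success
 */
static int classify_rcv_result(int rc, bool reversible,
			       enum rcv_action *action)
{
	assert(action != NULL);
	if (rc == 0) {
		*action = RCV_CONSUMED;
		return 0;
	}
	if (rc < 0 && !reversible) {
		*action = RCV_DROPPED;
		return -EHOSTUNREACH;
	}
	*action = (rc < 0) ? RCV_SENT_BACK : RCV_FORWARDED;
	return (rc < 0) ? -EHOSTUNREACH : 0;
}
```

Note that the caller sees -EHOSTUNREACH for every negative code, whether the message was sent back or dropped. Only the fate of the buffer differs.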
2014-01-17 09:50:03 +08:00
static int tipc_wait_for_connect(struct socket *sock, long *timeo_p)
{
	struct sock *sk = sock->sk;
	DEFINE_WAIT(wait);
	int done;
	do {
		int err = sock_error(sk);
		if (err)
			return err;
		if (!*timeo_p)
			return -ETIMEDOUT;
		if (signal_pending(current))
			return sock_intr_errno(*timeo_p);
		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
		done = sk_wait_event(sk, timeo_p, sock->state != SS_CONNECTING);
		finish_wait(sk_sleep(sk), &wait);
	} while (!done);
	return 0;
}
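The kernel-side wait above serves blocking connect(). For the non-blocking case, where connect() returns EINPROGRESS and completion is awaited with poll(), a user-space caller might use a helper along the following lines. This is a minimal sketch following the convention commonly used with TCP sockets. It assumes that a TIPC socket reports writability on success and exposes the pending error through `SO_ERROR`, which this excerpt does not confirm. The function name `finish_connect` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Wait for a connect() that returned EINPROGRESS to complete.
 * Returns 0 on success, -ETIMEDOUT if nothing happened within
 * timeout_ms, or another negative errno value on failure.
 */
static int finish_connect(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);
	int n;

	assert(fd >= 0);

	/* Retry if interrupted by a signal. For simplicity the full
	 * timeout is reused on each retry.
	 */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	if (n == 0)
		return -ETIMEDOUT;

	/* The socket became ready, or reported an error or hangup.
	 * Fetch and clear the pending socket error to tell which.
	 */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -errno;
	return -err;
}
```

A caller would invoke this only after connect() on a non-blocking socket has failed with EINPROGRESS. Any other connect() result is already final.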
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
 * tipc_connect - establish a connection to another TIPC port
2006-01-02 19:04:38 +01:00
 * @sock: socket structure
 * @dest: socket address for destination port
 * @destlen: size of socket address data structure
2008-04-15 00:22:02 -07:00
 * @flags: file-related flags associated with socket
2006-01-02 19:04:38 +01:00
 *
 * Returns 0 on success, errno otherwise
 */
2014-02-18 16:06:46 +08:00
static int tipc_connect(struct socket *sock, struct sockaddr *dest,
			int destlen, int flags)
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
	struct sock *sk = sock->sk;
2008-04-15 00:20:37 -07:00
	struct sockaddr_tipc *dst = (struct sockaddr_tipc *)dest;
	struct msghdr m = {NULL,};
2014-01-17 09:50:03 +08:00
	long timeout = (flags & O_NONBLOCK) ? 0 : tipc_sk(sk)->conn_timeout;
	socket_state previous;
2008-04-15 00:20:37 -07:00
	int res;
2008-04-15 00:22:02 -07:00
	lock_sock(sk);
2008-04-15 00:20:37 -07:00
	/* For now, TIPC does not allow use of connect() with DGRAM/RDM types */
2008-04-15 00:22:02 -07:00
	if (sock->state == SS_READY) {
		res = -EOPNOTSUPP;
		goto exit;
	}
2008-04-15 00:20:37 -07:00
	/*
	 * Reject connection attempt using multicast address
	 *
	 * Note: send_msg() validates the rest of the address fields,
	 * so there's no need to do it here
	 */
2008-04-15 00:22:02 -07:00
	if (dst->addrtype == TIPC_ADDR_MCAST) {
		res = -EINVAL;
		goto exit;
	}
2014-01-17 09:50:03 +08:00
	previous = sock->state;
2012-11-29 18:51:19 -05:00
	switch (sock->state) {
	case SS_UNCONNECTED:
		/* Send a 'SYN-' to destination */
		m.msg_name = dest;
		m.msg_namelen = destlen;
		/* If connect is in non-blocking case, set MSG_DONTWAIT to
		 * indicate send_msg() is never blocked.
		 */
		if (!timeout)
			m.msg_flags = MSG_DONTWAIT;
2014-02-18 16:06:46 +08:00
		res = tipc_sendmsg(NULL, sock, &m, 0);
2012-11-29 18:51:19 -05:00
if ( ( res < 0 ) & & ( res ! = - EWOULDBLOCK ) )
goto exit ;
/* Just entered SS_CONNECTING state; the only
* difference is that return value in non - blocking
* case is EINPROGRESS , rather than EALREADY .
*/
res = - EINPROGRESS ;
/* fall through */
case SS_CONNECTING :
2014-01-17 09:50:03 +08:00
if ( previous = = SS_CONNECTING )
res = - EALREADY ;
if ( ! timeout )
goto exit ;
timeout = msecs_to_jiffies ( timeout ) ;
/* Wait until an 'ACK' or 'RST' arrives, or a timeout occurs */
res = tipc_wait_for_connect ( sock , & timeout ) ;
2012-11-29 18:51:19 -05:00
break ;
case SS_CONNECTED :
res = - EISCONN ;
break ;
default :
res = - EINVAL ;
2014-01-17 09:50:03 +08:00
break ;
2008-04-15 00:20:37 -07:00
}
2008-04-15 00:22:02 -07:00
exit :
release_sock ( sk ) ;
2008-04-15 00:20:37 -07:00
return res ;
2006-01-02 19:04:38 +01:00
}
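The return codes of the non-blocking path above can be summarized in a small user-space model. This is only a sketch: the `conn_state` enum, `model_connect_nonblock()` and its `send_res` parameter are hypothetical names, and it assumes that `previous` in the kernel code holds the socket state on entry, which is not visible in this excerpt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's SS_* socket states. */
enum conn_state {
	CONN_UNCONNECTED,
	CONN_CONNECTING,
	CONN_CONNECTED,
	CONN_LISTENING
};

/*
 * Value a non-blocking connect() (timeout == 0) would return, given the
 * socket state on entry. send_res models the result of sending the 'SYN-'.
 * Assumption: "previous" in the kernel code is the state on entry.
 */
int model_connect_nonblock(enum conn_state entry, int send_res)
{
	switch (entry) {
	case CONN_UNCONNECTED:
		if (send_res < 0 && send_res != -EWOULDBLOCK)
			return send_res;	/* hard send failure */
		return -EINPROGRESS;		/* handshake just started */
	case CONN_CONNECTING:
		return -EALREADY;		/* handshake still pending */
	case CONN_CONNECTED:
		return -EISCONN;
	default:
		return -EINVAL;
	}
}

/* Usage: the three results a caller may see while polling with connect(). */
void model_connect_demo(void)
{
	assert(model_connect_nonblock(CONN_UNCONNECTED, 0) == -EINPROGRESS);
	assert(model_connect_nonblock(CONN_CONNECTING, 0) == -EALREADY);
	assert(model_connect_nonblock(CONN_CONNECTED, 0) == -EISCONN);
}
```

The blocking path is left out on purpose, since its result depends on what tipc_wait_for_connect() returns.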
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_listen - allow socket to listen for incoming connections
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ len : ( unused )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_listen ( struct socket * sock , int len )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
int res ;
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2011-07-06 06:01:13 -04:00
if ( sock - > state ! = SS_UNCONNECTED )
2008-04-15 00:22:02 -07:00
res = - EINVAL ;
else {
sock - > state = SS_LISTENING ;
res = 0 ;
}
release_sock ( sk ) ;
return res ;
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:04 +08:00
static int tipc_wait_for_accept ( struct socket * sock , long timeo )
{
struct sock * sk = sock - > sk ;
DEFINE_WAIT ( wait ) ;
int err ;
/* True wake-one mechanism for incoming connections: only
* one process gets woken up , not the ' whole herd ' .
* Since we do not ' race & poll ' for established sockets
* anymore , the common case will execute the loop only once .
*/
for ( ; ; ) {
prepare_to_wait_exclusive ( sk_sleep ( sk ) , & wait ,
TASK_INTERRUPTIBLE ) ;
2014-03-06 14:40:18 +01:00
if ( timeo & & skb_queue_empty ( & sk - > sk_receive_queue ) ) {
2014-01-17 09:50:04 +08:00
release_sock ( sk ) ;
timeo = schedule_timeout ( timeo ) ;
lock_sock ( sk ) ;
}
err = 0 ;
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
break ;
err = - EINVAL ;
if ( sock - > state ! = SS_LISTENING )
break ;
err = sock_intr_errno ( timeo ) ;
if ( signal_pending ( current ) )
break ;
err = - EAGAIN ;
if ( ! timeo )
break ;
}
finish_wait ( sk_sleep ( sk ) , & wait ) ;
return err ;
}
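The order of the exit checks in the wait loop above decides which error wins when several conditions hold at once. A minimal user-space sketch of one pass through those checks follows; the function name, the `MODEL_*` constants and the boolean inputs are hypothetical, and the signal branch mirrors what sock_intr_errno() is understood to return.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_ERESTARTSYS  512		/* kernel-internal code, not in <errno.h> */
#define MODEL_NO_TIMEOUT   LONG_MAX	/* stands in for MAX_SCHEDULE_TIMEOUT */
#define MODEL_KEEP_WAITING 1		/* hypothetical: the loop goes around again */

/* One pass through the checks at the bottom of the wait loop. */
int model_accept_wait_check(bool queue_has_msg, bool listening,
			    bool signal_pending, long timeo)
{
	if (queue_has_msg)
		return 0;		/* a connection request is queued */
	if (!listening)
		return -EINVAL;		/* socket left the listening state */
	if (signal_pending)		/* mirrors sock_intr_errno(timeo) */
		return timeo == MODEL_NO_TIMEOUT ? -MODEL_ERESTARTSYS : -EINTR;
	if (!timeo)
		return -EAGAIN;		/* non-blocking, or timeout used up */
	return MODEL_KEEP_WAITING;
}

/* Usage: a queued request takes priority over every error condition. */
void model_accept_wait_demo(void)
{
	assert(model_accept_wait_check(true, false, true, 0) == 0);
	assert(model_accept_wait_check(false, true, false, 0) == -EAGAIN);
	assert(model_accept_wait_check(false, true, false, 100) == MODEL_KEEP_WAITING);
}
```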
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_accept - wait for connection request
2006-01-02 19:04:38 +01:00
* @ sock : listening socket
* @ new_sock : new socket that is to be connected
* @ flags : file - related flags associated with socket
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_accept ( struct socket * sock , struct socket * new_sock , int flags )
2006-01-02 19:04:38 +01:00
{
2012-12-04 11:01:55 -05:00
struct sock * new_sk , * sk = sock - > sk ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
2014-03-12 11:31:09 -04:00
struct tipc_port * new_port ;
2012-12-04 11:01:55 -05:00
struct tipc_msg * msg ;
2014-03-12 11:31:08 -04:00
struct tipc_portid peer ;
2012-12-04 11:01:55 -05:00
u32 new_ref ;
2014-01-17 09:50:04 +08:00
long timeo ;
2008-04-15 00:22:02 -07:00
int res ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
if ( sock - > state ! = SS_LISTENING ) {
res = - EINVAL ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2014-01-17 09:50:04 +08:00
timeo = sock_rcvtimeo ( sk , flags & O_NONBLOCK ) ;
res = tipc_wait_for_accept ( sock , timeo ) ;
if ( res )
goto exit ;
2008-04-15 00:22:02 -07:00
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2013-06-17 10:54:39 -04:00
res = tipc_sk_create ( sock_net ( sock - > sk ) , new_sock , 0 , 1 ) ;
2012-12-04 11:01:55 -05:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2012-12-04 11:01:55 -05:00
new_sk = new_sock - > sk ;
2014-03-12 11:31:12 -04:00
new_port = & tipc_sk ( new_sk ) - > port ;
2014-03-12 11:31:09 -04:00
new_ref = new_port - > ref ;
2012-12-04 11:01:55 -05:00
msg = buf_msg ( buf ) ;
2006-01-02 19:04:38 +01:00
2012-12-04 11:01:55 -05:00
/* we lock on new_sk; but lockdep sees the lock on sk */
lock_sock_nested ( new_sk , SINGLE_DEPTH_NESTING ) ;
/*
* Reject any stray messages received by new socket
* before the socket lock was taken ( very , very unlikely )
*/
reject_rx_queue ( new_sk ) ;
/* Connect new socket to its peer */
2014-03-12 11:31:08 -04:00
peer . ref = msg_origport ( msg ) ;
peer . node = msg_orignode ( msg ) ;
tipc_port_connect ( new_ref , & peer ) ;
2012-12-04 11:01:55 -05:00
new_sock - > state = SS_CONNECTED ;
2014-03-12 11:31:11 -04:00
tipc_port_set_importance ( new_port , msg_importance ( msg ) ) ;
2012-12-04 11:01:55 -05:00
if ( msg_named ( msg ) ) {
2014-03-12 11:31:09 -04:00
new_port - > conn_type = msg_nametype ( msg ) ;
new_port - > conn_instance = msg_nameinst ( msg ) ;
2006-01-02 19:04:38 +01:00
}
2012-12-04 11:01:55 -05:00
/*
* Respond to ' SYN - ' by discarding it & returning ' ACK - ' .
* Respond to ' SYN + ' by queuing it on new socket .
*/
if ( ! msg_data_sz ( msg ) ) {
struct msghdr m = { NULL , } ;
advance_rx_queue ( sk ) ;
2014-02-18 16:06:46 +08:00
tipc_send_packet ( NULL , new_sock , & m , 0 ) ;
2012-12-04 11:01:55 -05:00
} else {
__skb_dequeue ( & sk - > sk_receive_queue ) ;
__skb_queue_head ( & new_sk - > sk_receive_queue , buf ) ;
2013-01-20 23:30:09 +01:00
skb_set_owner_r ( buf , new_sk ) ;
2012-12-04 11:01:55 -05:00
}
release_sock ( new_sk ) ;
2006-01-02 19:04:38 +01:00
exit :
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
/**
2014-02-18 16:06:46 +08:00
* tipc_shutdown - shutdown socket connection
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2008-03-06 15:05:38 -08:00
* @ how : direction to close ( must be SHUT_RDWR )
2006-01-02 19:04:38 +01:00
*
* Terminates connection ( if necessary ) , then purges socket ' s receive queue .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_shutdown ( struct socket * sock , int how )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
struct tipc_port * port = & tsk - > port ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
2014-06-25 20:41:35 -05:00
u32 peer ;
2006-01-02 19:04:38 +01:00
int res ;
2008-03-06 15:05:38 -08:00
if ( how ! = SHUT_RDWR )
return - EINVAL ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
switch ( sock - > state ) {
2008-04-15 00:22:02 -07:00
case SS_CONNECTING :
2006-01-02 19:04:38 +01:00
case SS_CONNECTED :
restart :
2012-04-30 15:29:02 -04:00
/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
2008-04-15 00:22:02 -07:00
buf = __skb_dequeue ( & sk - > sk_receive_queue ) ;
if ( buf ) {
2013-10-18 07:23:16 +02:00
if ( TIPC_SKB_CB ( buf ) - > handle ! = NULL ) {
2011-11-04 13:24:29 -04:00
kfree_skb ( buf ) ;
2006-01-02 19:04:38 +01:00
goto restart ;
}
2014-03-12 11:31:12 -04:00
tipc_port_disconnect ( port - > ref ) ;
2014-06-25 20:41:35 -05:00
if ( tipc_msg_reverse ( buf , & peer , TIPC_CONN_SHUTDOWN ) )
2014-07-16 20:41:03 -04:00
tipc_link_xmit ( buf , peer , 0 ) ;
2008-04-15 00:22:02 -07:00
} else {
2014-03-12 11:31:12 -04:00
tipc_port_shutdown ( port - > ref ) ;
2006-01-02 19:04:38 +01:00
}
2008-04-15 00:22:02 -07:00
sock - > state = SS_DISCONNECTING ;
2006-01-02 19:04:38 +01:00
/* fall through */
case SS_DISCONNECTING :
2012-10-29 09:38:15 -04:00
/* Discard any unreceived messages */
2013-01-20 23:30:08 +01:00
__skb_queue_purge ( & sk - > sk_receive_queue ) ;
2012-10-29 09:38:15 -04:00
/* Wake up anyone sleeping in poll */
sk - > sk_state_change ( sk ) ;
2006-01-02 19:04:38 +01:00
res = 0 ;
break ;
default :
res = - ENOTCONN ;
}
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
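The state handling in tipc_shutdown() can be reduced to a small transition function. The sketch below is a user-space model only: `sd_state` and `model_shutdown()` are hypothetical names, and the message handling (sending the FIN, purging the receive queue, waking sleepers) is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>		/* SHUT_RD, SHUT_RDWR */

/* Hypothetical stand-ins for the kernel's SS_* socket states. */
enum sd_state {
	SD_UNCONNECTED,
	SD_LISTENING,
	SD_CONNECTING,
	SD_CONNECTED,
	SD_DISCONNECTING
};

/* Returns what shutdown() would return and updates *state accordingly. */
int model_shutdown(enum sd_state *state, int how)
{
	if (how != SHUT_RDWR)
		return -EINVAL;		/* only a full shutdown is accepted */

	switch (*state) {
	case SD_CONNECTING:
	case SD_CONNECTED:
		*state = SD_DISCONNECTING;
		/* fall through */
	case SD_DISCONNECTING:
		return 0;
	default:
		return -ENOTCONN;
	}
}

/* Usage: shutting down twice succeeds both times. */
void model_shutdown_demo(void)
{
	enum sd_state st = SD_CONNECTED;

	assert(model_shutdown(&st, SHUT_RDWR) == 0);
	assert(st == SD_DISCONNECTING);
	assert(model_shutdown(&st, SHUT_RDWR) == 0);
}
```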
/**
2014-02-18 16:06:46 +08:00
* tipc_setsockopt - set socket option
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ lvl : option level
* @ opt : option identifier
* @ ov : pointer to new option value
* @ ol : length of option value
2007-02-09 23:25:21 +09:00
*
* For stream sockets only , accepts and ignores all IPPROTO_TCP options
2006-01-02 19:04:38 +01:00
* ( to ease compatibility ) .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_setsockopt ( struct socket * sock , int lvl , int opt ,
char __user * ov , unsigned int ol )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
struct tipc_port * port = & tsk - > port ;
2006-01-02 19:04:38 +01:00
u32 value ;
int res ;
2007-02-09 23:25:21 +09:00
if ( ( lvl = = IPPROTO_TCP ) & & ( sock - > type = = SOCK_STREAM ) )
return 0 ;
2006-01-02 19:04:38 +01:00
if ( lvl ! = SOL_TIPC )
return - ENOPROTOOPT ;
if ( ol < sizeof ( value ) )
return - EINVAL ;
2010-12-31 18:59:33 +00:00
res = get_user ( value , ( u32 __user * ) ov ) ;
if ( res )
2006-01-02 19:04:38 +01:00
return res ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2007-02-09 23:25:21 +09:00
2006-01-02 19:04:38 +01:00
switch ( opt ) {
case TIPC_IMPORTANCE :
2014-03-12 11:31:12 -04:00
tipc_port_set_importance ( port , value ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_SRC_DROPPABLE :
if ( sock - > type ! = SOCK_STREAM )
2014-03-12 11:31:12 -04:00
tipc_port_set_unreliable ( port , value ) ;
2007-02-09 23:25:21 +09:00
else
2006-01-02 19:04:38 +01:00
res = - ENOPROTOOPT ;
break ;
case TIPC_DEST_DROPPABLE :
2014-03-12 11:31:12 -04:00
tipc_port_set_unreturnable ( port , value ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_CONN_TIMEOUT :
2011-05-26 13:44:34 -04:00
tipc_sk ( sk ) - > conn_timeout = value ;
2008-04-15 00:22:02 -07:00
/* no need to set "res", since already 0 at this point */
2006-01-02 19:04:38 +01:00
break ;
default :
res = - EINVAL ;
}
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
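The checks in tipc_setsockopt() run in a fixed order, so the same bad call can yield different errors depending on which check it trips first. The following user-space sketch models only that ordering; all enum and function names are hypothetical stand-ins rather than the real header constants, and the -EFAULT path of get_user() is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values come from kernel headers. */
enum opt_level { LVL_TCP, LVL_TIPC, LVL_OTHER };
enum opt_sock_type { TYPE_STREAM, TYPE_SEQPACKET, TYPE_RDM };
enum opt_name {
	OPT_IMPORTANCE,
	OPT_SRC_DROPPABLE,
	OPT_DEST_DROPPABLE,
	OPT_CONN_TIMEOUT,
	OPT_UNKNOWN
};

/* Result of the validation steps, in the order the kernel code applies them. */
int model_setsockopt_check(enum opt_level lvl, enum opt_sock_type type,
			   enum opt_name opt, size_t optlen)
{
	if (lvl == LVL_TCP && type == TYPE_STREAM)
		return 0;		/* accepted and ignored */
	if (lvl != LVL_TIPC)
		return -ENOPROTOOPT;
	if (optlen < sizeof(uint32_t))
		return -EINVAL;

	switch (opt) {
	case OPT_IMPORTANCE:
	case OPT_DEST_DROPPABLE:
	case OPT_CONN_TIMEOUT:
		return 0;
	case OPT_SRC_DROPPABLE:
		/* not available on stream sockets */
		return type != TYPE_STREAM ? 0 : -ENOPROTOOPT;
	default:
		return -EINVAL;
	}
}

/* Usage: a TCP-level option on a stream socket is silently accepted. */
void model_setsockopt_demo(void)
{
	assert(model_setsockopt_check(LVL_TCP, TYPE_STREAM, OPT_UNKNOWN, 0) == 0);
	assert(model_setsockopt_check(LVL_TCP, TYPE_RDM, OPT_UNKNOWN, 4) == -ENOPROTOOPT);
}
```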
/**
2014-02-18 16:06:46 +08:00
* tipc_getsockopt - get socket option
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ lvl : option level
* @ opt : option identifier
* @ ov : receptacle for option value
* @ ol : receptacle for length of option value
2007-02-09 23:25:21 +09:00
*
* For stream sockets only , returns 0 length result for all IPPROTO_TCP options
2006-01-02 19:04:38 +01:00
* ( to ease compatibility ) .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_getsockopt ( struct socket * sock , int lvl , int opt ,
char __user * ov , int __user * ol )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
struct tipc_port * port = & tsk - > port ;
2007-02-09 23:25:21 +09:00
int len ;
2006-01-02 19:04:38 +01:00
u32 value ;
2007-02-09 23:25:21 +09:00
int res ;
2006-01-02 19:04:38 +01:00
2007-02-09 23:25:21 +09:00
if ( ( lvl = = IPPROTO_TCP ) & & ( sock - > type = = SOCK_STREAM ) )
return put_user ( 0 , ol ) ;
2006-01-02 19:04:38 +01:00
if ( lvl ! = SOL_TIPC )
return - ENOPROTOOPT ;
2010-12-31 18:59:33 +00:00
res = get_user ( len , ol ) ;
if ( res )
2007-02-09 23:25:21 +09:00
return res ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
switch ( opt ) {
case TIPC_IMPORTANCE :
2014-03-12 11:31:12 -04:00
value = tipc_port_importance ( port ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_SRC_DROPPABLE :
2014-03-12 11:31:12 -04:00
value = tipc_port_unreliable ( port ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_DEST_DROPPABLE :
2014-03-12 11:31:12 -04:00
value = tipc_port_unreturnable ( port ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_CONN_TIMEOUT :
2011-05-26 13:44:34 -04:00
value = tipc_sk ( sk ) - > conn_timeout ;
2008-04-15 00:22:02 -07:00
/* no need to set "res", since already 0 at this point */
2006-01-02 19:04:38 +01:00
break ;
2010-12-31 18:59:32 +00:00
case TIPC_NODE_RECVQ_DEPTH :
tipc: eliminate aggregate sk_receive_queue limit
As a complement to the per-socket sk_receive_queue limit, TIPC keeps a
global atomic counter for the sum of sk_receive_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.
This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.
Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.
As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-27 06:15:27 -05:00
value = 0 ; /* was tipc_queue_size, now obsolete */
2009-06-30 03:25:39 +00:00
break ;
2010-12-31 18:59:32 +00:00
case TIPC_SOCK_RECVQ_DEPTH :
2009-06-30 03:25:39 +00:00
value = skb_queue_len ( & sk - > sk_receive_queue ) ;
break ;
2006-01-02 19:04:38 +01:00
default :
res = - EINVAL ;
}
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2010-12-31 18:59:31 +00:00
if ( res )
return res ; /* "get" failed */
2006-01-02 19:04:38 +01:00
2010-12-31 18:59:31 +00:00
if ( len < sizeof ( value ) )
return - EINVAL ;
if ( copy_to_user ( ov , & value , sizeof ( value ) ) )
return - EFAULT ;
return put_user ( sizeof ( value ) , ol ) ;
2006-01-02 19:04:38 +01:00
}
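The tail of tipc_getsockopt() follows the usual value/length convention: the caller passes the buffer size in, and gets the number of bytes written back. A minimal user-space sketch of that convention follows; `model_copy_option_out()` is a hypothetical name, plain memcpy() stands in for copy_to_user() and put_user(), and the length is compared as a signed int, which is a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a 32-bit option value into the caller's buffer.
 * On entry *ol is the buffer size; on success it becomes the value size.
 */
int model_copy_option_out(uint32_t value, void *ov, int *ol)
{
	if (*ol < (int)sizeof(value))
		return -EINVAL;		/* buffer too small, nothing written */

	memcpy(ov, &value, sizeof(value));
	*ol = (int)sizeof(value);
	return 0;
}

/* Usage: an oversized buffer is fine, the length is trimmed on return. */
void model_copy_option_demo(void)
{
	uint32_t out = 0;
	int len = 8;

	assert(model_copy_option_out(7, &out, &len) == 0);
	assert(out == 7);
	assert(len == (int)sizeof(uint32_t));
}
```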
2014-04-24 16:26:47 +02:00
int tipc_ioctl ( struct socket * sk , unsigned int cmd , unsigned long arg )
{
struct tipc_sioc_ln_req lnr ;
void __user * argp = ( void __user * ) arg ;
switch ( cmd ) {
case SIOCGETLINKNAME :
if ( copy_from_user ( & lnr , argp , sizeof ( lnr ) ) )
return - EFAULT ;
if ( ! tipc_node_get_linkname ( lnr . bearer_id , lnr . peer ,
lnr . linkname , TIPC_MAX_LINK_NAME ) ) {
if ( copy_to_user ( argp , & lnr , sizeof ( lnr ) ) )
return - EFAULT ;
return 0 ;
}
return - EADDRNOTAVAIL ;
default :
return - ENOIOCTLCMD ;
}
}
2012-07-10 10:55:35 +00:00
/* Protocol switches for the various types of TIPC sockets */
2008-02-07 18:18:01 -08:00
static const struct proto_ops msg_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2011-07-06 06:01:13 -04:00
. accept = sock_no_accept ,
2014-02-18 16:06:46 +08:00
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2011-07-06 06:01:13 -04:00
. listen = sock_no_listen ,
2014-02-18 16:06:46 +08:00
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_sendmsg ,
. recvmsg = tipc_recvmsg ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct proto_ops packet_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2014-02-18 16:06:46 +08:00
. accept = tipc_accept ,
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2014-02-18 16:06:46 +08:00
. listen = tipc_listen ,
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_send_packet ,
. recvmsg = tipc_recvmsg ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct proto_ops stream_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2014-02-18 16:06:46 +08:00
. accept = tipc_accept ,
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2014-02-18 16:06:46 +08:00
. listen = tipc_listen ,
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_send_stream ,
. recvmsg = tipc_recv_stream ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct net_proto_family tipc_family_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2013-06-17 10:54:39 -04:00
. create = tipc_sk_create
2006-01-02 19:04:38 +01:00
} ;
static struct proto tipc_proto = {
. name = " TIPC " ,
. owner = THIS_MODULE ,
2013-06-17 10:54:37 -04:00
. obj_size = sizeof ( struct tipc_sock ) ,
. sysctl_rmem = sysctl_tipc_rmem
2006-01-02 19:04:38 +01:00
} ;
2013-06-17 10:54:39 -04:00
static struct proto tipc_proto_kern = {
. name = " TIPC " ,
. obj_size = sizeof ( struct tipc_sock ) ,
. sysctl_rmem = sysctl_tipc_rmem
} ;
2006-01-02 19:04:38 +01:00
/**
2006-01-18 00:38:21 +01:00
* tipc_socket_init - initialize TIPC socket interface
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2006-01-18 00:38:21 +01:00
int tipc_socket_init ( void )
2006-01-02 19:04:38 +01:00
{
int res ;
2007-02-09 23:25:21 +09:00
res = proto_register ( & tipc_proto , 1 ) ;
2006-01-02 19:04:38 +01:00
if ( res ) {
2012-06-29 00:16:37 -04:00
pr_err ( " Failed to register TIPC protocol type \n " ) ;
2006-01-02 19:04:38 +01:00
goto out ;
}
res = sock_register ( & tipc_family_ops ) ;
if ( res ) {
2012-06-29 00:16:37 -04:00
pr_err ( " Failed to register TIPC socket type \n " ) ;
2006-01-02 19:04:38 +01:00
proto_unregister ( & tipc_proto ) ;
goto out ;
}
out :
return res ;
}
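tipc_socket_init() registers two things and undoes the first if the second fails. The sketch below models that unwind in user space; `struct init_model` and both functions are hypothetical, and the results of the two registration steps are injected as parameters instead of calling proto_register() and sock_register().

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks what is currently registered in this model. */
struct init_model {
	bool proto_registered;
	bool family_registered;
};

/* proto_res and family_res are injected step results; 0 means success. */
int model_socket_init(struct init_model *m, int proto_res, int family_res)
{
	if (proto_res)
		return proto_res;	/* nothing to undo yet */
	m->proto_registered = true;

	if (family_res) {
		m->proto_registered = false;	/* mirrors proto_unregister() */
		return family_res;
	}
	m->family_registered = true;
	return 0;
}

/* Teardown in reverse order of registration. */
void model_socket_stop(struct init_model *m)
{
	m->family_registered = false;
	m->proto_registered = false;
}

/* Usage: a failure in the second step leaves nothing registered. */
void model_socket_init_demo(void)
{
	struct init_model m = { false, false };

	assert(model_socket_init(&m, 0, -1) == -1);
	assert(!m.proto_registered && !m.family_registered);
}
```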
/**
2006-01-18 00:38:21 +01:00
* tipc_socket_stop - stop TIPC socket interface
2006-01-02 19:04:38 +01:00
*/
2006-01-18 00:38:21 +01:00
void tipc_socket_stop ( void )
2006-01-02 19:04:38 +01:00
{
sock_unregister ( tipc_family_ops . family ) ;
proto_unregister ( & tipc_proto ) ;
}