2006-01-02 19:04:38 +01:00
/*
2014-06-09 11:08:18 -05:00
 * net/tipc/socket.c: TIPC socket API
2007-02-09 23:25:21 +09:00
 *
2015-02-05 08:36:43 -05:00
 * Copyright (c) 2001-2007, 2012-2015, Ericsson AB
tipc: introduce new TIPC server infrastructure
TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.
As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policy is required
in order to protect TIPC internal resources.
To eliminate the need for this complex lock policy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the topology and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code size and complexity, and opens the way for a complete
rework of the locking policy in TIPC.
The new service also solves another problem:
As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we immediately schedule a work item on a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.
Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work item on the receive workqueue to do the job of
receiving the message in process context.
As both sending and receiving of messages now complete in process
context, subscribed events can no longer be dropped.
As of this commit, the new server infrastructure is built but
not yet called by the existing TIPC code. Since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:39 -04:00
 * Copyright (c) 2004-2008, 2010-2013, Wind River Systems
2006-01-02 19:04:38 +01:00
 * All rights reserved.
 *
2006-01-11 13:30:43 +01:00
 * Redistribution and use in source and binary forms, with or without
2006-01-02 19:04:38 +01:00
 * modification, are permitted provided that the following conditions are met:
 *
2006-01-11 13:30:43 +01:00
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
2006-01-02 19:04:38 +01:00
 *
2006-01-11 13:30:43 +01:00
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2006-01-02 19:04:38 +01:00
 * POSSIBILITY OF SUCH DAMAGE.
 */
tipc: convert tipc reference table to use generic rhashtable
As the tipc reference table is statically allocated, the memory it
requests at stack initialization is quite large even though the
maximum port number is currently restricted to 8191, a limit that
has already become insufficient in practice. But if the maximum
number of ports were raised to its theoretical value of 2^32, the
memory consumed would be unacceptably large. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
Converting the tipc reference table to the generic rhashtable
resolves both disadvantages: the new resizable hash table avoids
locking on lookup, and less memory is required initially. For
example, 256 hash bucket slots are requested at the start instead of
allocating the entire 8191 slots as before. The hash table will
grow if entries exceed 75% of the table size, up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but never below the minimum table size of 256.
This commit also converts ref_table_lock to a separate mutex that
protects hash table mutations on the write side. Lastly, it defers the
release of the socket reference using call_rcu() so that lookups can
use an RCU read-side protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
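The grow and shrink thresholds quoted in the commit message above can be sketched in plain userspace C. This is only an illustration of the stated policy (grow above 75% load up to 1M buckets, shrink below 30% load down to 256 buckets), not the rhashtable implementation. All `sketch_` names are hypothetical, and the doubling and halving step is an assumption that the message does not state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants taken from the numbers in the commit message. */
#define SKETCH_MIN_SIZE 256u		/* minimum number of buckets */
#define SKETCH_MAX_SIZE (1u << 20)	/* 1M buckets */

/* Grow when entries exceed 75% of the table size, unless already at max. */
static bool sketch_should_grow(size_t nelems, size_t size)
{
	return size < SKETCH_MAX_SIZE && nelems * 4 > size * 3;
}

/* Shrink when usage falls below 30%, but never below the minimum size. */
static bool sketch_should_shrink(size_t nelems, size_t size)
{
	return size > SKETCH_MIN_SIZE && nelems * 10 < size * 3;
}

/*
 * Return the table size to use after a change in the entry count.
 * Assumption: the table doubles on growth and halves on shrink.
 */
static size_t sketch_next_size(size_t nelems, size_t size)
{
	if (sketch_should_grow(nelems, size))
		return size * 2;
	if (sketch_should_shrink(nelems, size))
		return size / 2;
	return size;
}
```

With these thresholds a 256-bucket table stays at 256 buckets until it holds more than 192 entries, so a lightly used stack never pays for the old fixed 8191-slot table.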
2015-01-07 13:41:58 +08:00
#include <linux/rhashtable.h>
#include <linux/jhash.h>
2006-01-02 19:04:38 +01:00
#include "core.h"
2014-06-25 20:41:37 -05:00
#include "name_table.h"
2014-04-24 16:26:47 +02:00
#include "node.h"
2014-06-25 20:41:37 -05:00
#include "link.h"
tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.
We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.
This solves the sequentiality problem, and seems to cause no measurable
performance degradation.
A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 08:36:41 -05:00
#include "name_distr.h"
2014-08-22 18:09:18 -04:00
#include "socket.h"
2012-06-29 00:16:37 -04:00
2015-01-07 13:41:58 +08:00
#define SS_LISTENING		-1	/* socket is listening */
#define SS_READY		-2	/* socket is connectionless */
2006-01-02 19:04:38 +01:00
2015-01-07 13:41:58 +08:00
#define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
2015-01-09 15:27:00 +08:00
#define CONN_PROBING_INTERVAL	msecs_to_jiffies(3600000)  /* [ms] => 1 h */
2015-01-07 13:41:58 +08:00
#define TIPC_FWD_MSG		1
#define TIPC_CONN_OK		0
#define TIPC_CONN_PROBING	1
#define TIPC_MAX_PORT		0xffffffff
#define TIPC_MIN_PORT		1
2014-08-22 18:09:20 -04:00
/**
 * struct tipc_sock - TIPC socket structure
 * @sk: socket - interacts with 'port' and with user via the socket API
 * @connected: non-zero if port is currently connected to a peer port
 * @conn_type: TIPC type used when connection was established
 * @conn_instance: TIPC instance used when connection was established
 * @published: non-zero if port has one or more associated names
 * @max_pkt: maximum packet size "hint" used when building messages sent by port
2015-01-07 13:41:58 +08:00
 * @portid: unique port identity in TIPC socket hash table
2014-08-22 18:09:20 -04:00
 * @phdr: preformatted message header used when sending messages
 * @sock_list: adjacent ports in TIPC's global list of ports
 * @publications: list of publications for port
 * @pub_count: total # of publications port has made during its lifetime
 * @probing_state:
2015-01-09 15:27:00 +08:00
 * @probing_intv:
2014-08-22 18:09:20 -04:00
 * @conn_timeout: the time we can wait for an unanswered setup request
 * @dupl_rcvcnt: number of bytes counted twice, in both backlog and rcv queue
 * @link_cong: non-zero if owner must sleep because of link congestion
 * @sent_unacked: # messages sent by socket, and not yet acked by peer
 * @rcv_unacked: # messages read by user, but not yet acked back to peer
2015-01-07 13:41:58 +08:00
 * @node: hash table node
 * @rcu: rcu struct for tipc_sock
2014-08-22 18:09:20 -04:00
 */
struct tipc_sock {
	struct sock sk;
	int connected;
	u32 conn_type;
	u32 conn_instance;
	int published;
	u32 max_pkt;
2015-01-07 13:41:58 +08:00
	u32 portid;
2014-08-22 18:09:20 -04:00
	struct tipc_msg phdr;
	struct list_head sock_list;
	struct list_head publications;
	u32 pub_count;
	u32 probing_state;
2015-01-09 15:27:00 +08:00
	unsigned long probing_intv;
2014-08-22 18:09:20 -04:00
	uint conn_timeout;
	atomic_t dupl_rcvcnt;
	bool link_cong;
	uint sent_unacked;
	uint rcv_unacked;
2015-01-07 13:41:58 +08:00
	struct rhash_head node;
	struct rcu_head rcu;
2014-08-22 18:09:20 -04:00
};
2006-01-02 19:04:38 +01:00
tipc: compensate for double accounting in socket rcv buffer
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
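The compensation described in the commit message above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the kernel code: `struct sketch_rsock`, the `sketch_` functions and `SKETCH_DUPL_CAP` are hypothetical stand-ins, the admission check is reduced to a single comparison, and resetting the duplicate counter together with the backlog length is an assumption made for this model.

```c
#include <stdbool.h>

/* Hypothetical cap on the compensation counter (stand-in for the real limit). */
#define SKETCH_DUPL_CAP 1000u

/* Simplified model of the receive-side accounting of one socket. */
struct sketch_rsock {
	unsigned int backlog_len;	/* bytes charged to the backlog queue */
	unsigned int rmem_alloc;	/* bytes charged to the receive queue */
	unsigned int dupl_rcvcnt;	/* bytes currently counted in both */
};

/*
 * Move one buffer from backlog to receive queue during the release loop.
 * backlog_len is deliberately left untouched: it is only zeroed once the
 * whole backlog has been transferred, which causes the double counting.
 */
static void sketch_backlog_rcv(struct sketch_rsock *s, unsigned int truesize)
{
	s->rmem_alloc += truesize;
	if (s->dupl_rcvcnt < SKETCH_DUPL_CAP)
		s->dupl_rcvcnt += truesize;
}

/* Admit a newly arrived buffer to the backlog, compensating the limit. */
static bool sketch_add_backlog(struct sketch_rsock *s, unsigned int truesize,
			       unsigned int base_limit)
{
	unsigned int limit = base_limit + s->dupl_rcvcnt;

	if (s->backlog_len + s->rmem_alloc > limit)
		return false;	/* rejected: queue considered full */
	s->backlog_len += truesize;
	return true;
}

/* End of the release loop (assumption: both counters reset together). */
static void sketch_release_done(struct sketch_rsock *s)
{
	s->backlog_len = 0;
	s->dupl_rcvcnt = 0;
}
```

For example, with a base limit of 1000 bytes and 800 bytes in the backlog, moving 400 bytes to the receive queue makes the naive sum 1200 and a new buffer would be rejected. With the counter at 400 the effective limit becomes 1400 and the buffer is accepted.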
2014-05-14 05:39:09 -04:00
static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb);
2014-04-11 16:15:36 -04:00
static void tipc_data_ready(struct sock *sk);
2012-08-21 11:16:57 +08:00
static void tipc_write_space(struct sock *sk);
2014-02-18 16:06:46 +08:00
static int tipc_release(struct socket *sock);
static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
2014-07-16 20:41:01 -04:00
static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
2015-01-09 15:27:02 +08:00
static void tipc_sk_timeout(unsigned long data);
2014-08-22 18:09:20 -04:00
static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
2014-08-22 18:09:17 -04:00
			   struct tipc_name_seq const *seq);
2014-08-22 18:09:20 -04:00
static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
2014-08-22 18:09:17 -04:00
			    struct tipc_name_seq const *seq);
2015-01-09 15:27:08 +08:00
static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid);
2015-01-07 13:41:58 +08:00
static int tipc_sk_insert(struct tipc_sock *tsk);
static void tipc_sk_remove(struct tipc_sock *tsk);
2015-03-02 15:37:47 +08:00
static int __tipc_send_stream(struct socket *sock, struct msghdr *m,
			      size_t dsz);
static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz);
2006-01-02 19:04:38 +01:00
2008-02-07 18:18:01 -08:00
static const struct proto_ops packet_ops;
static const struct proto_ops stream_ops;
static const struct proto_ops msg_ops;
2006-01-02 19:04:38 +01:00
static struct proto tipc_proto;
2013-06-17 10:54:39 -04:00
static struct proto tipc_proto_kern;
2006-01-02 19:04:38 +01:00
2014-11-20 10:29:11 +01:00
static const struct nla_policy tipc_nl_sock_policy[TIPC_NLA_SOCK_MAX + 1] = {
	[TIPC_NLA_SOCK_UNSPEC]		= { .type = NLA_UNSPEC },
	[TIPC_NLA_SOCK_ADDR]		= { .type = NLA_U32 },
	[TIPC_NLA_SOCK_REF]		= { .type = NLA_U32 },
	[TIPC_NLA_SOCK_CON]		= { .type = NLA_NESTED },
	[TIPC_NLA_SOCK_HAS_PUBL]	= { .type = NLA_FLAG }
};
2007-02-09 23:25:21 +09:00
/*
2008-04-15 00:22:02 -07:00
 * Revised TIPC socket locking policy:
 *
 * Most socket operations take the standard socket lock when they start
 * and hold it until they finish (or until they need to sleep). Acquiring
 * this lock grants the owner exclusive access to the fields of the socket
 * data structures, with the exception of the backlog queue. A few socket
 * operations can be done without taking the socket lock because they only
 * read socket information that never changes during the life of the socket.
 *
 * Socket operations may acquire the lock for the associated TIPC port if they
 * need to perform an operation on the port. If any routine needs to acquire
 * both the socket lock and the port lock it must take the socket lock first
 * to avoid the risk of deadlock.
 *
 * The dispatcher handling incoming messages cannot grab the socket lock in
 * the standard fashion, since when invoked it runs at the BH level and
 * cannot block. Instead, it checks to see if the socket lock is currently
 * owned by someone, and either handles the message itself or adds it to the
 * socket's backlog queue; in the latter case the queued message is processed
 * once the process owning the socket lock releases it.
 *
 * NOTE: Releasing the socket lock while an operation is sleeping overcomes
 * the problem of a blocked socket operation preventing any other operations
 * from occurring. However, applications must be careful if they have
 * multiple threads trying to send (or receive) on the same socket, as these
 * operations might interfere with each other. For example, doing a connect
 * and a receive at the same time might allow the receive to consume the
 * ACK message meant for the connect. While additional work could be done
 * to try and overcome this, it doesn't seem to be worthwhile at the present.
 *
 * NOTE: Releasing the socket lock while an operation is sleeping also ensures
 * that another operation that must be performed in a non-blocking manner is
 * not delayed for very long because the lock has already been taken.
 *
 * NOTE: This code assumes that certain fields of a port/socket pair are
 * constant over its lifetime; such fields can be examined without taking
 * the socket lock and/or port lock, and do not need to be re-read even
 * after resuming processing after waiting. These fields include:
 *   - socket type
 *   - pointer to socket sk structure (aka tipc_sock structure)
 *   - pointer to port structure
 *   - port reference
 */
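The dispatcher rule in the comment above (handle the message directly when nobody owns the socket lock, otherwise defer it to the backlog until the owner releases the lock) can be sketched in userspace as follows. This is a single-threaded illustration only: `struct demo_sock`, `demo_dispatch()` and `demo_release()` are hypothetical names, not kernel APIs, and the spinlock the kernel holds around the ownership check is deliberately left out.

```c
#include <assert.h>   /* for the self-checks in the usage note */
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BACKLOG_MAX 8

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct demo_sock {
	bool owned;                     /* true while a process holds the lock */
	int backlog[DEMO_BACKLOG_MAX];  /* messages deferred while owned */
	size_t backlog_len;
	int handled;                    /* number of messages processed so far */
};

/* Dispatcher path: never blocks; defers the message if the lock is owned. */
bool demo_dispatch(struct demo_sock *sk, int msg)
{
	if (!sk->owned) {
		sk->handled++;          /* handle the message immediately */
		return true;
	}
	if (sk->backlog_len == DEMO_BACKLOG_MAX)
		return false;           /* backlog full: caller must reject */
	sk->backlog[sk->backlog_len++] = msg;
	return true;
}

/* Lock release: the owner processes the backlog before giving up the lock. */
void demo_release(struct demo_sock *sk)
{
	for (size_t i = 0; i < sk->backlog_len; i++)
		sk->handled++;          /* process each deferred message */
	sk->backlog_len = 0;
	sk->owned = false;
}
```

A message dispatched while `owned` is set is only counted as handled after `demo_release()` runs, which mirrors the ordering the comment describes.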
2015-02-05 08:36:36 -05:00
static u32 tsk_own_node(struct tipc_sock *tsk)
{
	return msg_prevnode(&tsk->phdr);
}
2014-08-22 18:09:20 -04:00
static u32 tsk_peer_node(struct tipc_sock *tsk)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	return msg_destnode(&tsk->phdr);
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static u32 tsk_peer_port(struct tipc_sock *tsk)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	return msg_destport(&tsk->phdr);
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static bool tsk_unreliable(struct tipc_sock *tsk)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	return msg_src_droppable(&tsk->phdr) != 0;
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static void tsk_set_unreliable(struct tipc_sock *tsk, bool unreliable)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	msg_set_src_droppable(&tsk->phdr, unreliable ? 1 : 0);
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static bool tsk_unreturnable(struct tipc_sock *tsk)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	return msg_dest_droppable(&tsk->phdr) != 0;
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static void tsk_set_unreturnable(struct tipc_sock *tsk, bool unreturnable)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	msg_set_dest_droppable(&tsk->phdr, unreturnable ? 1 : 0);
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static int tsk_importance(struct tipc_sock *tsk)
2014-08-22 18:09:18 -04:00
{
2014-08-22 18:09:20 -04:00
	return msg_importance(&tsk->phdr);
2014-08-22 18:09:18 -04:00
}
2014-08-22 18:09:20 -04:00
static int tsk_set_importance(struct tipc_sock *tsk, int imp)
2014-08-22 18:09:18 -04:00
{
	if (imp > TIPC_CRITICAL_IMPORTANCE)
		return -EINVAL;
2014-08-22 18:09:20 -04:00
	msg_set_importance(&tsk->phdr, (u32)imp);
2014-08-22 18:09:18 -04:00
	return 0;
}
2014-03-12 11:31:09 -04:00
2014-08-22 18:09:20 -04:00
static struct tipc_sock *tipc_sk(const struct sock *sk)
{
	return container_of(sk, struct tipc_sock, sk);
}
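The `tipc_sk()` helper above relies on the generic socket being embedded as a member of the TIPC socket, so that `container_of()` can recover the outer structure from a pointer to the inner one. A minimal userspace sketch of that pattern is shown below, assuming a simplified macro without the kernel's type checking; `demo_container_of`, `struct demo_gen_sock` and `struct demo_tipc_sock` are hypothetical names used only for illustration.

```c
#include <assert.h>   /* for the self-checks in the usage note */
#include <stddef.h>

/* Simplified userspace equivalent of container_of(), no type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic socket part. */
struct demo_gen_sock {
	int sk_state;
};

/* Hypothetical protocol socket embedding the generic part. */
struct demo_tipc_sock {
	struct demo_gen_sock sk;
	unsigned int portid;
};

/* Recover the enclosing protocol socket from its embedded generic part. */
struct demo_tipc_sock *demo_tipc_sk(const struct demo_gen_sock *sk)
{
	return demo_container_of(sk, struct demo_tipc_sock, sk);
}
```

Given a `struct demo_tipc_sock tsk`, calling `demo_tipc_sk(&tsk.sk)` yields `&tsk` again, wherever the member sits inside the outer structure.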
static int tsk_conn_cong(struct tipc_sock *tsk)
{
	return tsk->sent_unacked >= TIPC_FLOWCTRL_WIN;
}
2008-04-15 00:22:02 -07:00
/**
2014-08-22 18:09:18 -04:00
 * tsk_advance_rx_queue - discard first buffer in socket receive queue
2008-04-15 00:22:02 -07:00
 *
 * Caller must hold socket lock
2006-01-02 19:04:38 +01:00
 */
2014-08-22 18:09:18 -04:00
static void tsk_advance_rx_queue(struct sock *sk)
2006-01-02 19:04:38 +01:00
{
2011-11-04 13:24:29 -04:00
	kfree_skb(__skb_dequeue(&sk->sk_receive_queue));
2006-01-02 19:04:38 +01:00
}
/**
2014-08-22 18:09:18 -04:00
 * tsk_rej_rx_queue - reject all buffers in socket receive queue
2008-04-15 00:22:02 -07:00
 *
 * Caller must hold socket lock
2006-01-02 19:04:38 +01:00
 */
2014-08-22 18:09:18 -04:00
static void tsk_rej_rx_queue(struct sock *sk)
2006-01-02 19:04:38 +01:00
{
2014-11-26 11:41:55 +08:00
	struct sk_buff *skb;
2014-06-25 20:41:35 -05:00
	u32 dnode;
2015-02-05 08:36:36 -05:00
	u32 own_node = tsk_own_node(tipc_sk(sk));
2008-04-15 00:22:02 -07:00
2014-11-26 11:41:55 +08:00
	while ((skb = __skb_dequeue(&sk->sk_receive_queue))) {
2015-02-05 08:36:36 -05:00
		if (tipc_msg_reverse(own_node, skb, &dnode, TIPC_ERR_NO_PORT))
			tipc_link_xmit_skb(sock_net(sk), skb, dnode, 0);
2014-06-25 20:41:35 -05:00
	}
2006-01-02 19:04:38 +01:00
}
2014-08-22 18:09:18 -04:00
/* tsk_peer_msg - verify if message was sent by connected port's peer
2014-08-22 18:09:17 -04:00
 *
 * Handles cases where the node's network address has changed from
 * the default of <0.0.0> to its configured setting.
 */
2014-08-22 18:09:18 -04:00
static bool tsk_peer_msg(struct tipc_sock *tsk, struct tipc_msg *msg)
2014-08-22 18:09:17 -04:00
{
2015-01-09 15:27:10 +08:00
	struct tipc_net *tn = net_generic(sock_net(&tsk->sk), tipc_net_id);
2014-08-22 18:09:20 -04:00
	u32 peer_port = tsk_peer_port(tsk);
2014-08-22 18:09:17 -04:00
	u32 orig_node;
	u32 peer_node;
2014-08-22 18:09:20 -04:00
	if (unlikely(!tsk->connected))
2014-08-22 18:09:17 -04:00
		return false;
	if (unlikely(msg_origport(msg) != peer_port))
		return false;
	orig_node = msg_orignode(msg);
2014-08-22 18:09:20 -04:00
	peer_node = tsk_peer_node(tsk);
2014-08-22 18:09:17 -04:00
	if (likely(orig_node == peer_node))
		return true;
2015-01-09 15:27:10 +08:00
	if (!orig_node && (peer_node == tn->own_addr))
2014-08-22 18:09:17 -04:00
		return true;
2015-01-09 15:27:10 +08:00
	if (!peer_node && (orig_node == tn->own_addr))
2014-08-22 18:09:17 -04:00
		return true;
	return false;
}
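The matching rules in `tsk_peer_msg()` can be exercised in isolation with a small userspace sketch. It assumes, as the comment above states, that a node address of 0 stands for the default `<0.0.0>` used before the node got its configured address; `struct demo_conn` and `demo_peer_msg()` are hypothetical names, and the message is reduced to its originating node and port.

```c
#include <assert.h>   /* for the self-checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the connection state consulted above. */
struct demo_conn {
	bool connected;
	uint32_t peer_node;
	uint32_t peer_port;
	uint32_t own_addr;   /* this node's configured address */
};

/* Returns true if a message from (orig_node, orig_port) is from the peer. */
bool demo_peer_msg(const struct demo_conn *c, uint32_t orig_node,
		   uint32_t orig_port)
{
	if (!c->connected || orig_port != c->peer_port)
		return false;
	if (orig_node == c->peer_node)
		return true;
	/* Sender still used the default address; peer is this node. */
	if (!orig_node && c->peer_node == c->own_addr)
		return true;
	/* Connection was set up with the default address; sender is this node. */
	if (!c->peer_node && orig_node == c->own_addr)
		return true;
	return false;
}
```

The last two branches only ever accept node-local traffic, since both compare against `own_addr`; a message from a different node with the right port number is still rejected.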
2006-01-02 19:04:38 +01:00
/**
2013-06-17 10:54:39 -04:00
 * tipc_sk_create - create a TIPC socket
2008-04-15 00:22:02 -07:00
 * @net: network namespace (must be default network)
2006-01-02 19:04:38 +01:00
 * @sock: pre-allocated socket structure
 * @protocol: protocol indicator (must be 0)
2009-11-05 22:18:14 -08:00
 * @kern: caused by kernel or by userspace?
2007-02-09 23:25:21 +09:00
 *
2008-04-15 00:22:02 -07:00
 * This routine creates additional data structures used by the TIPC socket,
 * initializes them, and links them together.
2006-01-02 19:04:38 +01:00
 *
 * Returns 0 on success, errno otherwise
 */
2014-03-12 11:31:12 -04:00
static int tipc_sk_create(struct net *net, struct socket *sock,
			  int protocol, int kern)
2006-01-02 19:04:38 +01:00
{
2015-02-05 08:36:36 -05:00
	struct tipc_net *tn;
2008-04-15 00:22:02 -07:00
	const struct proto_ops *ops;
	socket_state state;
2006-01-02 19:04:38 +01:00
	struct sock *sk;
2014-03-12 11:31:12 -04:00
	struct tipc_sock *tsk;
2014-08-22 18:09:13 -04:00
	struct tipc_msg *msg;
2008-04-15 00:22:02 -07:00
	/* Validate arguments */
2006-01-02 19:04:38 +01:00
	if (unlikely(protocol != 0))
		return -EPROTONOSUPPORT;
	switch (sock->type) {
	case SOCK_STREAM:
2008-04-15 00:22:02 -07:00
		ops = &stream_ops;
		state = SS_UNCONNECTED;
2006-01-02 19:04:38 +01:00
		break;
	case SOCK_SEQPACKET:
2008-04-15 00:22:02 -07:00
		ops = &packet_ops;
		state = SS_UNCONNECTED;
2006-01-02 19:04:38 +01:00
		break;
	case SOCK_DGRAM:
	case SOCK_RDM:
2008-04-15 00:22:02 -07:00
		ops = &msg_ops;
		state = SS_READY;
2006-01-02 19:04:38 +01:00
		break;
2006-06-25 23:47:18 -07:00
	default:
		return -EPROTOTYPE;
2006-01-02 19:04:38 +01:00
	}
2008-04-15 00:22:02 -07:00
	/* Allocate socket's protocol area */
2013-06-17 10:54:39 -04:00
	if (!kern)
		sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto);
	else
		sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto_kern);
2008-04-15 00:22:02 -07:00
	if (sk == NULL)
2006-01-02 19:04:38 +01:00
		return -ENOMEM;
2014-03-12 11:31:12 -04:00
	tsk = tipc_sk(sk);
2014-08-22 18:09:20 -04:00
	tsk->max_pkt = MAX_PKT_DEFAULT;
	INIT_LIST_HEAD(&tsk->publications);
	msg = &tsk->phdr;
2015-02-05 08:36:36 -05:00
	tn = net_generic(sock_net(sk), tipc_net_id);
	tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
2014-08-22 18:09:13 -04:00
		      NAMED_H_SIZE, 0);
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
	/* Finish initializing socket data structures */
	sock->ops = ops;
	sock->state = state;
	sock_init_data(sock, sk);
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-07 13:41:58 +08:00
	if (tipc_sk_insert(tsk)) {
		pr_warn("Socket create failed; port number exhausted\n");
		return -EINVAL;
	}
	msg_set_origport(msg, tsk->portid);
2015-01-13 17:07:48 +08:00
	setup_timer(&sk->sk_timer, tipc_sk_timeout, (unsigned long)tsk);
tipc: compensate for double accounting in socket rcv buffer
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 05:39:09 -04:00
	sk->sk_backlog_rcv = tipc_backlog_rcv;
2013-06-17 10:54:37 -04:00
	sk->sk_rcvbuf = sysctl_tipc_rmem[1];
2012-08-21 11:16:57 +08:00
	sk->sk_data_ready = tipc_data_ready;
	sk->sk_write_space = tipc_write_space;
2014-05-14 05:39:09 -04:00
	tsk->conn_timeout = CONN_TIMEOUT_DEFAULT;
2014-06-25 20:41:42 -05:00
	tsk->sent_unacked = 0;
2014-05-14 05:39:09 -04:00
	atomic_set(&tsk->dupl_rcvcnt, 0);
2008-05-12 15:42:28 -07:00
2008-04-15 00:22:02 -07:00
	if (sock->state == SS_READY) {
2014-08-22 18:09:20 -04:00
		tsk_set_unreturnable(tsk, true);
2008-04-15 00:22:02 -07:00
		if (sock->type == SOCK_DGRAM)
2014-08-22 18:09:20 -04:00
			tsk_set_unreliable(tsk, true);
2008-04-15 00:22:02 -07:00
	}
2006-01-02 19:04:38 +01:00
	return 0;
}
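The way `tipc_sk_create()` derives the initial socket state and the two "droppable" header flags from the socket type can be summarised in a small userspace sketch. The enums and `demo_sk_config()` below are hypothetical stand-ins for the `SOCK_*` and `SS_*` constants and for the relevant part of the function; allocation, hashing and timer setup are omitted.

```c
#include <assert.h>   /* for the self-checks in the usage note */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SOCK_* and SS_* constants. */
enum demo_type { DEMO_STREAM, DEMO_SEQPACKET, DEMO_DGRAM, DEMO_RDM };
enum demo_state { DEMO_UNCONNECTED, DEMO_READY };

struct demo_cfg {
	enum demo_state state;
	bool unreturnable;   /* undeliverable messages are not returned */
	bool unreliable;     /* messages may be dropped on the way */
};

/* Fill in *cfg for the given type; returns 0 or a negative errno. */
int demo_sk_config(int type, struct demo_cfg *cfg)
{
	switch (type) {
	case DEMO_STREAM:
	case DEMO_SEQPACKET:
		cfg->state = DEMO_UNCONNECTED;   /* connection-oriented */
		break;
	case DEMO_DGRAM:
	case DEMO_RDM:
		cfg->state = DEMO_READY;         /* connectionless */
		break;
	default:
		return -EPROTOTYPE;
	}
	/* Connectionless sockets never get rejected messages back ... */
	cfg->unreturnable = (cfg->state == DEMO_READY);
	/* ... and only datagram sockets are marked unreliable as well. */
	cfg->unreliable = (type == DEMO_DGRAM);
	return 0;
}
```

In this sketch a `DEMO_RDM` socket ends up unreturnable but still reliable, while a `DEMO_DGRAM` socket gets both flags, matching the nested `if` at the end of the function above.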
2013-06-17 10:54:39 -04:00
/**
 * tipc_sock_create_local - create TIPC socket from inside TIPC module
 * @net: network namespace passed on to tipc_sk_create
 * @type: socket type - SOCK_RDM or SOCK_SEQPACKET
 * @res: receives the newly created socket
 *
 * We cannot use sock_create_kern here because it bumps module user count.
 * Since socket owner and creator is the same module we must make sure
 * that module count remains zero for module local sockets, otherwise
 * we cannot do rmmod.
 *
 * Returns 0 on success, errno otherwise
 */
2015-01-09 15:27:11 +08:00
int tipc_sock_create_local(struct net *net, int type, struct socket **res)
2013-06-17 10:54:39 -04:00
{
	int rc;
	rc = sock_create_lite(AF_TIPC, type, 0, res);
	if (rc < 0) {
		pr_err("Failed to create kernel socket\n");
		return rc;
	}
2015-01-09 15:27:11 +08:00
	/* NOTE: the return value of tipc_sk_create() is not checked here */
	tipc_sk_create(net, *res, 0, 1);
2013-06-17 10:54:39 -04:00
	return 0;
}
/**
 * tipc_sock_release_local - release socket created by tipc_sock_create_local
 * @sock: the socket to be released.
 *
 * Module reference count is not incremented when such sockets are created,
 * so we must keep it from being decremented when they are released.
 */
void tipc_sock_release_local(struct socket *sock)
{
2014-02-18 16:06:46 +08:00
	tipc_release(sock);
2013-06-17 10:54:39 -04:00
sock - > ops = NULL ;
sock_release ( sock ) ;
}
/**
* tipc_sock_accept_local - accept a connection on a socket created
* with tipc_sock_create_local . Use this function so that the
* module reference count is not inadvertently incremented .
*
* @ sock : the accepting socket
* @ newsock : reference to the new socket to be created
* @ flags : socket flags
*/
int tipc_sock_accept_local ( struct socket * sock , struct socket * * newsock ,
2013-06-17 10:54:47 -04:00
int flags )
2013-06-17 10:54:39 -04:00
{
struct sock * sk = sock - > sk ;
int ret ;
ret = sock_create_lite ( sk - > sk_family , sk - > sk_type ,
sk - > sk_protocol , newsock ) ;
if ( ret < 0 )
return ret ;
2014-02-18 16:06:46 +08:00
ret = tipc_accept ( sock , * newsock , flags ) ;
2013-06-17 10:54:39 -04:00
if ( ret < 0 ) {
sock_release ( * newsock ) ;
return ret ;
}
( * newsock ) - > ops = sock - > ops ;
return ret ;
}
tipc: convert tipc reference table to use generic rhashtable
As the tipc reference table is statically allocated, the memory it
requests at stack initialization is quite big, even though the
maximum port number is currently restricted to 8191; moreover,
that number is already insufficient in practice. But if the
maximum number of ports were raised to its theoretical value of 2^32,
the memory consumed would reach an unacceptable size. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If the tipc reference table is converted to the generic rhashtable,
both disadvantages are resolved: the new resizable hash table avoids
locking on the lookup, and less memory is required initially. For
example, 256 hash bucket slots are requested at the beginning instead
of the entire 8191 slots allocated in the old mode. The hash table will
grow if the number of entries exceeds 75% of the table size, up to a
total table size of 1M, and it will automatically shrink if usage falls
below 30%, down to a minimum table size of 256.
This commit also converts ref_table_lock to a separate mutex that
protects hash table mutations on the write side. Lastly, it defers the
release of the socket reference using call_rcu(), which allows an RCU
read-side protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
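The grow and shrink thresholds quoted in the commit message above can be illustrated with a small userspace model. This is only a sketch: `next_size`, `MODEL_MIN_SIZE` and `MODEL_MAX_SIZE` are hypothetical names, and resizing by doubling or halving is an assumption of the sketch rather than something the commit message states. It does not use the kernel rhashtable API.

```c
#include <stddef.h>

#define MODEL_MIN_SIZE 256u        /* minimum table size named in the text */
#define MODEL_MAX_SIZE (1u << 20)  /* the "1M" upper bound named in the text */

/*
 * Hypothetical helper: return the table size after applying the
 * threshold rule once. Doubling and halving are assumptions.
 */
static size_t next_size(size_t size, size_t entries)
{
	/* Grow when entries exceed 75% of the table size. */
	if (entries * 4 > size * 3 && size < MODEL_MAX_SIZE)
		return size * 2;
	/* Shrink when usage falls below 30%, but never below the minimum. */
	if (entries * 10 < size * 3 && size > MODEL_MIN_SIZE)
		return size / 2;
	return size;
}
```

With these rules a 256-slot table stays at 256 slots for 192 entries (exactly 75%) and grows only once the 193rd entry arrives; a sparsely used table never shrinks below 256 slots.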
2015-01-07 13:41:58 +08:00
static void tipc_sk_callback ( struct rcu_head * head )
{
struct tipc_sock * tsk = container_of ( head , struct tipc_sock , rcu ) ;
sock_put ( & tsk - > sk ) ;
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_release - destroy a TIPC socket
2006-01-02 19:04:38 +01:00
* @ sock : socket to destroy
*
* This routine cleans up any messages that are still queued on the socket .
* For DGRAM and RDM socket types , all queued messages are rejected .
* For SEQPACKET and STREAM socket types , the first message is rejected
* and any others are discarded . ( If the first message on a STREAM socket
* is partially - read , it is discarded and the next one is rejected instead . )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* NOTE : Rejected messages are not necessarily returned to the sender ! They
* are returned or discarded according to the " destination droppable " setting
* specified for the message by the sender .
*
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_release ( struct socket * sock )
2006-01-02 19:04:38 +01:00
{
struct sock * sk = sock - > sk ;
2015-01-13 12:46:41 -05:00
struct net * net ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk ;
2014-11-26 11:41:55 +08:00
struct sk_buff * skb ;
2015-01-09 15:27:02 +08:00
u32 dnode , probing_state ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/*
* Exit if socket isn ' t fully initialized ( occurs when a failed accept ( )
* releases a pre - allocated child socket that was never used )
*/
if ( sk = = NULL )
2006-01-02 19:04:38 +01:00
return 0 ;
2007-02-09 23:25:21 +09:00
2015-01-13 12:46:41 -05:00
net = sock_net ( sk ) ;
2014-03-12 11:31:12 -04:00
tsk = tipc_sk ( sk ) ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
/*
* Reject all unreceived messages , except on an active connection
* ( which disconnects locally & sends a ' FIN + ' to peer )
*/
2014-08-22 18:09:20 -04:00
dnode = tsk_peer_node ( tsk ) ;
2006-01-02 19:04:38 +01:00
while ( sock - > state ! = SS_DISCONNECTING ) {
2014-11-26 11:41:55 +08:00
skb = __skb_dequeue ( & sk - > sk_receive_queue ) ;
if ( skb = = NULL )
2006-01-02 19:04:38 +01:00
break ;
2014-11-26 11:41:55 +08:00
if ( TIPC_SKB_CB ( skb ) - > handle ! = NULL )
kfree_skb ( skb ) ;
2008-04-15 00:22:02 -07:00
else {
if ( ( sock - > state = = SS_CONNECTING ) | |
( sock - > state = = SS_CONNECTED ) ) {
sock - > state = SS_DISCONNECTING ;
2014-08-22 18:09:20 -04:00
tsk - > connected = 0 ;
2015-01-09 15:27:05 +08:00
tipc_node_remove_conn ( net , dnode , tsk - > portid ) ;
2008-04-15 00:22:02 -07:00
}
2015-02-05 08:36:36 -05:00
if ( tipc_msg_reverse ( tsk_own_node ( tsk ) , skb , & dnode ,
2015-01-09 15:27:10 +08:00
TIPC_ERR_NO_PORT ) )
2015-01-09 15:27:05 +08:00
tipc_link_xmit_skb ( net , skb , dnode , 0 ) ;
2008-04-15 00:22:02 -07:00
}
2006-01-02 19:04:38 +01:00
}
2014-08-22 18:09:20 -04:00
tipc_sk_withdraw ( tsk , 0 , NULL ) ;
2015-01-09 15:27:02 +08:00
probing_state = tsk - > probing_state ;
2015-01-13 17:07:48 +08:00
if ( del_timer_sync ( & sk - > sk_timer ) & &
probing_state ! = TIPC_CONN_PROBING )
2015-01-09 15:27:02 +08:00
sock_put ( sk ) ;
2015-01-07 13:41:58 +08:00
tipc_sk_remove ( tsk ) ;
2014-08-22 18:09:20 -04:00
if ( tsk - > connected ) {
2015-02-05 08:36:36 -05:00
skb = tipc_msg_create ( TIPC_CRITICAL_IMPORTANCE ,
2015-01-09 15:27:10 +08:00
TIPC_CONN_MSG , SHORT_H_SIZE , 0 , dnode ,
2015-02-05 08:36:36 -05:00
tsk_own_node ( tsk ) , tsk_peer_port ( tsk ) ,
2015-01-07 13:41:58 +08:00
tsk - > portid , TIPC_ERR_NO_PORT ) ;
2014-11-26 11:41:55 +08:00
if ( skb )
2015-01-09 15:27:05 +08:00
tipc_link_xmit_skb ( net , skb , dnode , tsk - > portid ) ;
tipc_node_remove_conn ( net , dnode , tsk - > portid ) ;
2014-08-22 18:09:13 -04:00
}
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Discard any remaining (connection-based) messages in receive queue */
2013-01-20 23:30:08 +01:00
__skb_queue_purge ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Reject any messages that accumulated in backlog queue */
sock - > state = SS_DISCONNECTING ;
release_sock ( sk ) ;
2015-01-07 13:41:58 +08:00
call_rcu ( & tsk - > rcu , tipc_sk_callback ) ;
2008-04-15 00:22:02 -07:00
sock - > sk = NULL ;
2006-01-02 19:04:38 +01:00
2014-04-06 15:56:14 +02:00
return 0 ;
2006-01-02 19:04:38 +01:00
}
/**
2014-02-18 16:06:46 +08:00
* tipc_bind - associate or disassociate TIPC name ( s ) with a socket
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ uaddr : socket address describing name ( s ) and desired operation
* @ uaddr_len : size of socket address data structure
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Name and name sequence binding is indicated using a positive scope value ;
* a negative scope value unbinds the specified name . Specifying no name
* ( i . e . a socket address length of 0 ) unbinds all names from the socket .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
2008-04-15 00:22:02 -07:00
*
* NOTE : This routine takes the socket lock , since publishing and withdrawing
* names modifies the socket ' s set of bound names .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static int tipc_bind ( struct socket * sock , struct sockaddr * uaddr ,
int uaddr_len )
2006-01-02 19:04:38 +01:00
{
2013-12-27 10:18:28 +08:00
struct sock * sk = sock - > sk ;
2006-01-02 19:04:38 +01:00
struct sockaddr_tipc * addr = ( struct sockaddr_tipc * ) uaddr ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2013-12-27 10:18:28 +08:00
int res = - EINVAL ;
2006-01-02 19:04:38 +01:00
2013-12-27 10:18:28 +08:00
lock_sock ( sk ) ;
if ( unlikely ( ! uaddr_len ) ) {
2014-08-22 18:09:20 -04:00
res = tipc_sk_withdraw ( tsk , 0 , NULL ) ;
2013-12-27 10:18:28 +08:00
goto exit ;
}
2007-02-09 23:25:21 +09:00
2013-12-27 10:18:28 +08:00
if ( uaddr_len < sizeof ( struct sockaddr_tipc ) ) {
res = - EINVAL ;
goto exit ;
}
if ( addr - > family ! = AF_TIPC ) {
res = - EAFNOSUPPORT ;
goto exit ;
}
2006-01-02 19:04:38 +01:00
if ( addr - > addrtype = = TIPC_ADDR_NAME )
addr - > addr . nameseq . upper = addr - > addr . nameseq . lower ;
2013-12-27 10:18:28 +08:00
else if ( addr - > addrtype ! = TIPC_ADDR_NAMESEQ ) {
res = - EAFNOSUPPORT ;
goto exit ;
}
2007-02-09 23:25:21 +09:00
tipc: convert topology server to use new server facility
As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it. We get two benefits
from doing this:
1) It simplifies the topology server locking policy. In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of tipc_subscriber
instance. That is, we only used one lock to ensure both
tipc_port and tipc_subscriber members were safely accessed.
Now we introduce another spin lock for the tipc_subscriber structure,
protecting only its own members, to get a finer granularity locking
policy. Moreover, the change will allow us to make the topology
server code more readable and maintainable.
2) It fixes a bug where sent subscription events may be lost when
the topology port is congested. Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context. The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.
Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter. And overall, we see a simplification of the
TIPC subscriber code in making this changeover.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:40 -04:00
if ( ( addr - > addr . nameseq . type < TIPC_RESERVED_TYPES ) & &
2013-06-17 10:54:41 -04:00
( addr - > addr . nameseq . type ! = TIPC_TOP_SRV ) & &
2013-12-27 10:18:28 +08:00
( addr - > addr . nameseq . type ! = TIPC_CFG_SRV ) ) {
res = - EACCES ;
goto exit ;
}
2011-11-02 15:49:40 -04:00
2013-12-27 10:18:28 +08:00
res = ( addr - > scope > 0 ) ?
2014-08-22 18:09:20 -04:00
tipc_sk_publish ( tsk , addr - > scope , & addr - > addr . nameseq ) :
tipc_sk_withdraw ( tsk , - addr - > scope , & addr - > addr . nameseq ) ;
2013-12-27 10:18:28 +08:00
exit :
release_sock ( sk ) ;
return res ;
2006-01-02 19:04:38 +01:00
}
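The scope convention documented for tipc_bind above (positive scope binds, negative scope unbinds, zero address length unbinds everything) can be modelled in isolation. The sketch below is a hypothetical userspace model, not kernel code: `bind_action` and `bind_action_for` are names invented for illustration, and the address validation steps (length, family, address type, reserved types) are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical outcome labels for the model. */
enum bind_action { BIND_PUBLISH, BIND_WITHDRAW, BIND_WITHDRAW_ALL };

/*
 * Decide what a bind request means from the address length and the
 * signed scope. *abs_scope receives the scope magnitude that would be
 * handed to the publish or withdraw step (0 for "withdraw all").
 */
static enum bind_action bind_action_for(size_t addr_len, int scope,
					int *abs_scope)
{
	*abs_scope = 0;
	if (addr_len == 0)
		return BIND_WITHDRAW_ALL;   /* no name given: unbind all names */
	if (scope > 0) {
		*abs_scope = scope;
		return BIND_PUBLISH;        /* positive scope binds the name */
	}
	*abs_scope = -scope;
	return BIND_WITHDRAW;               /* otherwise unbind the name */
}
```

As in the routine above, a scope of exactly 0 with a non-empty address falls on the withdraw side of the test.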
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_getname - get port ID of socket or peer socket
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ uaddr : area for returned socket address
* @ uaddr_len : area for returned length of socket address
2008-07-14 22:43:32 -07:00
* @ peer : 0 = own ID , 1 = current peer ID , 2 = current / former peer ID
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
2008-04-15 00:22:02 -07:00
*
2008-07-14 22:43:32 -07:00
* NOTE : This routine doesn ' t need to take the socket lock since it only
* accesses socket information that is unchanging ( or which changes in
2010-12-31 18:59:32 +00:00
* a completely predictable manner ) .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static int tipc_getname ( struct socket * sock , struct sockaddr * uaddr ,
int * uaddr_len , int peer )
2006-01-02 19:04:38 +01:00
{
struct sockaddr_tipc * addr = ( struct sockaddr_tipc * ) uaddr ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sock - > sk ) ;
2015-01-09 15:27:10 +08:00
struct tipc_net * tn = net_generic ( sock_net ( sock - > sk ) , tipc_net_id ) ;
2006-01-02 19:04:38 +01:00
2010-10-31 07:10:32 +00:00
memset ( addr , 0 , sizeof ( * addr ) ) ;
2008-04-15 00:22:02 -07:00
if ( peer ) {
2008-07-14 22:43:32 -07:00
if ( ( sock - > state ! = SS_CONNECTED ) & &
( ( peer ! = 2 ) | | ( sock - > state ! = SS_DISCONNECTING ) ) )
return - ENOTCONN ;
2014-08-22 18:09:20 -04:00
addr - > addr . id . ref = tsk_peer_port ( tsk ) ;
addr - > addr . id . node = tsk_peer_node ( tsk ) ;
2008-04-15 00:22:02 -07:00
} else {
2015-01-07 13:41:58 +08:00
addr - > addr . id . ref = tsk - > portid ;
2015-01-09 15:27:10 +08:00
addr - > addr . id . node = tn - > own_addr ;
2008-04-15 00:22:02 -07:00
}
2006-01-02 19:04:38 +01:00
* uaddr_len = sizeof ( * addr ) ;
addr - > addrtype = TIPC_ADDR_ID ;
addr - > family = AF_TIPC ;
addr - > scope = 0 ;
addr - > addr . name . domain = 0 ;
2008-04-15 00:22:02 -07:00
return 0 ;
2006-01-02 19:04:38 +01:00
}
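The meaning of the peer argument in tipc_getname (0 = own ID, 1 = current peer ID, 2 = current or former peer ID) reduces to a small predicate. The following is a minimal sketch with hypothetical names (`model_state`, `peer_id_available`); it models only the connection-state check, not the address filling.

```c
/* Hypothetical stand-ins for the socket states relevant to the check. */
enum model_state { MS_UNCONNECTED, MS_CONNECTED, MS_DISCONNECTING };

/*
 * Return 1 if an ID can be reported for the given peer mode, 0 if the
 * request would fail with a "not connected" error.
 */
static int peer_id_available(enum model_state state, int peer)
{
	if (peer == 0)
		return 1;                       /* own ID is always available */
	if (state == MS_CONNECTED)
		return 1;                       /* current peer */
	return peer == 2 && state == MS_DISCONNECTING;  /* former peer */
}
```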
/**
2014-02-18 16:06:46 +08:00
* tipc_poll - read and possibly block on pollmask
2006-01-02 19:04:38 +01:00
* @ file : file structure associated with the socket
* @ sock : socket for which to calculate the poll bits
* @ wait : poll table used to register the caller for wakeups
*
2008-03-26 16:48:21 -07:00
* Returns pollmask value
*
* COMMENTARY :
* It appears that the usual socket locking mechanisms are not useful here
* since the pollmask info is potentially out - of - date the moment this routine
* exits . TCP and other protocols seem to rely on higher level poll routines
* to handle any preventable race conditions , so TIPC will do the same . . .
*
* TIPC sets the returned events as follows :
2010-08-17 11:00:06 +00:00
*
* socket state flags set
* - - - - - - - - - - - - - - - - - - - - -
* unconnected no read flags
2012-10-16 16:47:06 +02:00
* POLLOUT if port is not congested
2010-08-17 11:00:06 +00:00
*
* connecting POLLIN / POLLRDNORM if ACK / NACK in rx queue
* no write flags
*
* connected POLLIN / POLLRDNORM if data in rx queue
* POLLOUT if port is not congested
*
* disconnecting POLLIN / POLLRDNORM / POLLHUP
* no write flags
*
* listening POLLIN / POLLRDNORM if SYN in rx queue
* no write flags
*
* ready POLLIN / POLLRDNORM if data in rx queue
* [ connectionless ] POLLOUT ( since port cannot be congested )
*
* IMPORTANT : The fact that a read or write operation is indicated does NOT
* imply that the operation will succeed , merely that it should be performed
* and will not block .
2006-01-02 19:04:38 +01:00
*/
2014-02-18 16:06:46 +08:00
static unsigned int tipc_poll ( struct file * file , struct socket * sock ,
poll_table * wait )
2006-01-02 19:04:38 +01:00
{
2008-03-26 16:48:21 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2010-08-17 11:00:06 +00:00
u32 mask = 0 ;
2008-03-26 16:48:21 -07:00
2012-08-21 11:16:57 +08:00
sock_poll_wait ( file , sk_sleep ( sk ) , wait ) ;
2008-03-26 16:48:21 -07:00
2010-08-17 11:00:06 +00:00
switch ( ( int ) sock - > state ) {
2012-10-16 16:47:06 +02:00
case SS_UNCONNECTED :
2014-06-25 20:41:42 -05:00
if ( ! tsk - > link_cong )
2012-10-16 16:47:06 +02:00
mask | = POLLOUT ;
break ;
2010-08-17 11:00:06 +00:00
case SS_READY :
case SS_CONNECTED :
2014-08-22 18:09:20 -04:00
if ( ! tsk - > link_cong & & ! tsk_conn_cong ( tsk ) )
2010-08-17 11:00:06 +00:00
mask | = POLLOUT ;
/* fall thru' */
case SS_CONNECTING :
case SS_LISTENING :
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
mask | = ( POLLIN | POLLRDNORM ) ;
break ;
case SS_DISCONNECTING :
mask = ( POLLIN | POLLRDNORM | POLLHUP ) ;
break ;
}
2008-03-26 16:48:21 -07:00
return mask ;
2006-01-02 19:04:38 +01:00
}
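The state-to-flags mapping implemented by tipc_poll above can be exercised outside the kernel with a small model. This is a sketch under stated assumptions: the `MP_*` state names, the `MF_*` flag values and `model_poll_mask` are hypothetical stand-ins for the socket states and POLL* constants, and the congestion and queue conditions are passed in as plain booleans.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the socket states. */
enum model_poll_state {
	MP_UNCONNECTED, MP_READY, MP_CONNECTED,
	MP_CONNECTING, MP_LISTENING, MP_DISCONNECTING
};

/* Hypothetical stand-ins for POLLIN, POLLRDNORM, POLLOUT, POLLHUP. */
enum { MF_IN = 1, MF_RDNORM = 2, MF_OUT = 4, MF_HUP = 8 };

static unsigned int model_poll_mask(enum model_poll_state st, bool link_cong,
				    bool conn_cong, bool rx_nonempty)
{
	unsigned int mask = 0;

	switch (st) {
	case MP_UNCONNECTED:
		if (!link_cong)
			mask |= MF_OUT;
		break;
	case MP_READY:
	case MP_CONNECTED:
		if (!link_cong && !conn_cong)
			mask |= MF_OUT;
		/* fall through: these states also report readable data */
	case MP_CONNECTING:
	case MP_LISTENING:
		if (rx_nonempty)
			mask |= MF_IN | MF_RDNORM;
		break;
	case MP_DISCONNECTING:
		mask = MF_IN | MF_RDNORM | MF_HUP;
		break;
	}
	return mask;
}
```

The fall-through keeps the writable check specific to the ready and connected states while sharing the readable check with the connecting and listening states.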
2014-07-16 20:41:01 -04:00
/**
* tipc_sendmcast - send multicast message
* @ sock : socket structure
* @ seq : destination address
2014-11-15 01:13:43 -05:00
* @ msg : message to send
2014-07-16 20:41:01 -04:00
* @ dsz : total length of message data
* @ timeo : timeout to wait for wakeup
*
* Called from function tipc_sendmsg ( ) , which has done all sanity checks .
* Returns the number of bytes sent on success , or errno
*/
static int tipc_sendmcast ( struct socket * sock , struct tipc_name_seq * seq ,
2014-11-15 01:13:43 -05:00
struct msghdr * msg , size_t dsz , long timeo )
2014-07-16 20:41:01 -04:00
{
struct sock * sk = sock - > sk ;
2015-02-05 08:36:36 -05:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2015-01-09 15:27:05 +08:00
struct net * net = sock_net ( sk ) ;
2015-02-05 08:36:36 -05:00
struct tipc_msg * mhdr = & tsk - > phdr ;
2015-02-05 08:36:40 -05:00
struct sk_buff_head * pktchain = & sk - > sk_write_queue ;
2014-11-28 15:52:29 -05:00
struct iov_iter save = msg - > msg_iter ;
2014-07-16 20:41:01 -04:00
uint mtu ;
int rc ;
msg_set_type ( mhdr , TIPC_MCAST_MSG ) ;
msg_set_lookup_scope ( mhdr , TIPC_CLUSTER_SCOPE ) ;
msg_set_destport ( mhdr , 0 ) ;
msg_set_destnode ( mhdr , 0 ) ;
msg_set_nametype ( mhdr , seq - > type ) ;
msg_set_namelower ( mhdr , seq - > lower ) ;
msg_set_nameupper ( mhdr , seq - > upper ) ;
msg_set_hdr_sz ( mhdr , MCAST_H_SIZE ) ;
new_mtu :
mtu = tipc_bclink_get_mtu ( ) ;
2015-02-05 08:36:40 -05:00
rc = tipc_msg_build ( mhdr , msg , 0 , dsz , mtu , pktchain ) ;
2014-07-16 20:41:01 -04:00
if ( unlikely ( rc < 0 ) )
return rc ;
do {
2015-02-05 08:36:40 -05:00
rc = tipc_bclink_xmit ( net , pktchain ) ;
2014-07-16 20:41:01 -04:00
if ( likely ( rc > = 0 ) ) {
rc = dsz ;
break ;
}
2014-11-28 15:52:29 -05:00
if ( rc = = - EMSGSIZE ) {
msg - > msg_iter = save ;
2014-07-16 20:41:01 -04:00
goto new_mtu ;
2014-11-28 15:52:29 -05:00
}
2014-07-16 20:41:01 -04:00
if ( rc ! = - ELINKCONG )
break ;
2014-08-22 18:09:07 -04:00
tipc_sk ( sk ) - > link_cong = 1 ;
2014-07-16 20:41:01 -04:00
rc = tipc_wait_for_sndmsg ( sock , & timeo ) ;
if ( rc )
2015-02-05 08:36:40 -05:00
__skb_queue_purge ( pktchain ) ;
2014-07-16 20:41:01 -04:00
} while ( ! rc ) ;
return rc ;
}
2015-02-05 08:36:44 -05:00
/**
* tipc_sk_mcast_rcv - Deliver multicast messages to all destination sockets
* @ arrvq : queue with arriving messages , to be cloned after destination lookup
* @ inputq : queue with cloned messages , delivered to socket after dest lookup
*
* Multi - threaded : parallel calls with reference to same queues may occur
2014-07-16 20:41:00 -04:00
*/
2015-02-05 08:36:44 -05:00
void tipc_sk_mcast_rcv ( struct net * net , struct sk_buff_head * arrvq ,
struct sk_buff_head * inputq )
2014-07-16 20:41:00 -04:00
{
2015-02-05 08:36:44 -05:00
struct tipc_msg * msg ;
2015-02-05 08:36:43 -05:00
struct tipc_plist dports ;
u32 portid ;
2014-07-16 20:41:00 -04:00
u32 scope = TIPC_CLUSTER_SCOPE ;
2015-02-05 08:36:44 -05:00
struct sk_buff_head tmpq ;
uint hsz ;
struct sk_buff * skb , * _skb ;
2015-02-05 08:36:43 -05:00
2015-02-05 08:36:44 -05:00
__skb_queue_head_init ( & tmpq ) ;
2015-02-05 08:36:43 -05:00
tipc_plist_init ( & dports ) ;
2014-07-16 20:41:00 -04:00
2015-02-05 08:36:44 -05:00
skb = tipc_skb_peek ( arrvq , & inputq - > lock ) ;
for ( ; skb ; skb = tipc_skb_peek ( arrvq , & inputq - > lock ) ) {
msg = buf_msg ( skb ) ;
hsz = skb_headroom ( skb ) + msg_hdr_sz ( msg ) ;
if ( in_own_node ( net , msg_orignode ( msg ) ) )
scope = TIPC_NODE_SCOPE ;
/* Create destination port list and message clones: */
tipc_nametbl_mc_translate ( net ,
msg_nametype ( msg ) , msg_namelower ( msg ) ,
msg_nameupper ( msg ) , scope , & dports ) ;
portid = tipc_plist_pop ( & dports ) ;
for ( ; portid ; portid = tipc_plist_pop ( & dports ) ) {
_skb = __pskb_copy ( skb , hsz , GFP_ATOMIC ) ;
if ( _skb ) {
msg_set_destport ( buf_msg ( _skb ) , portid ) ;
__skb_queue_tail ( & tmpq , _skb ) ;
continue ;
}
pr_warn ( " Failed to clone mcast rcv buffer \n " ) ;
2014-07-16 20:41:00 -04:00
}
2015-02-05 08:36:44 -05:00
/* Append to inputq if not already done by other thread */
spin_lock_bh ( & inputq - > lock ) ;
if ( skb_peek ( arrvq ) = = skb ) {
skb_queue_splice_tail_init ( & tmpq , inputq ) ;
kfree_skb ( __skb_dequeue ( arrvq ) ) ;
}
spin_unlock_bh ( & inputq - > lock ) ;
__skb_queue_purge ( & tmpq ) ;
kfree_skb ( skb ) ;
2014-07-16 20:41:00 -04:00
}
2015-02-05 08:36:44 -05:00
tipc_sk_rcv ( net , inputq ) ;
2014-07-16 20:41:00 -04:00
}
2014-06-25 20:41:41 -05:00
/**
* tipc_sk_proto_rcv - receive a connection management protocol message
* @ tsk : receiving socket
2015-02-05 08:36:37 -05:00
* @ skb : pointer to message buffer . Set to NULL if buffer is consumed .
2014-06-25 20:41:41 -05:00
*/
2015-02-05 08:36:37 -05:00
static void tipc_sk_proto_rcv ( struct tipc_sock * tsk , struct sk_buff * * skb )
2014-06-25 20:41:41 -05:00
{
2015-02-05 08:36:37 -05:00
struct tipc_msg * msg = buf_msg ( * skb ) ;
2014-06-25 20:41:42 -05:00
int conn_cong ;
2015-02-05 08:36:37 -05:00
u32 dnode ;
u32 own_node = tsk_own_node ( tsk ) ;
2014-06-25 20:41:41 -05:00
/* Ignore if connection cannot be validated: */
2014-08-22 18:09:18 -04:00
if ( ! tsk_peer_msg ( tsk , msg ) )
2014-06-25 20:41:41 -05:00
goto exit ;
2014-08-22 18:09:20 -04:00
tsk - > probing_state = TIPC_CONN_OK ;
2014-06-25 20:41:41 -05:00
if ( msg_type ( msg ) = = CONN_ACK ) {
2014-08-22 18:09:20 -04:00
conn_cong = tsk_conn_cong ( tsk ) ;
2014-06-25 20:41:42 -05:00
tsk - > sent_unacked - = msg_msgcnt ( msg ) ;
if ( conn_cong )
2014-08-22 18:09:07 -04:00
tsk - > sk . sk_write_space ( & tsk - > sk ) ;
2014-06-25 20:41:41 -05:00
} else if ( msg_type ( msg ) = = CONN_PROBE ) {
2015-02-05 08:36:37 -05:00
if ( tipc_msg_reverse ( own_node , * skb , & dnode , TIPC_OK ) ) {
msg_set_type ( msg , CONN_PROBE_REPLY ) ;
return ;
}
2014-06-25 20:41:41 -05:00
}
/* Do nothing if msg_type() == CONN_PROBE_REPLY */
exit :
2015-02-05 08:36:37 -05:00
kfree_skb ( * skb ) ;
* skb = NULL ;
2014-06-25 20:41:41 -05:00
}
2014-01-17 09:50:05 +08:00
/* tipc_wait_for_sndmsg - sleep until link congestion has abated
 *
 * Returns 0 once the link is no longer congested , otherwise a pending
 * socket error , - EPIPE if the socket is disconnecting , - EAGAIN if the
 * send timeout has expired , or the signal errno if interrupted
 */
static int tipc_wait_for_sndmsg ( struct socket * sock , long * timeo_p )
{
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-01-17 09:50:05 +08:00
DEFINE_WAIT ( wait ) ;
int done ;
do {
int err = sock_error ( sk ) ;
if ( err )
return err ;
if ( sock - > state = = SS_DISCONNECTING )
return - EPIPE ;
if ( ! * timeo_p )
return - EAGAIN ;
if ( signal_pending ( current ) )
return sock_intr_errno ( * timeo_p ) ;
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
2014-06-25 20:41:42 -05:00
done = sk_wait_event ( sk , timeo_p , ! tsk - > link_cong ) ;
2014-01-17 09:50:05 +08:00
finish_wait ( sk_sleep ( sk ) , & wait ) ;
} while ( ! done ) ;
return 0 ;
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_sendmsg - send message in connectionless manner
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ m : message to send
2014-06-25 20:41:37 -05:00
* @ dsz : amount of user data to be sent
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Message must have a destination specified explicitly .
2007-02-09 23:25:21 +09:00
* Used for SOCK_RDM and SOCK_DGRAM messages ,
2006-01-02 19:04:38 +01:00
* and for ' SYN ' messages on SOCK_SEQPACKET and SOCK_STREAM connections .
* ( Note : ' SYN + ' is prohibited on SOCK_STREAM . )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns the number of bytes sent on success , or errno otherwise
*/
2015-03-02 15:37:48 +08:00
static int tipc_sendmsg ( struct socket * sock ,
2014-06-25 20:41:37 -05:00
struct msghdr * m , size_t dsz )
2015-03-02 15:37:47 +08:00
{
struct sock * sk = sock - > sk ;
int ret ;
lock_sock ( sk ) ;
ret = __tipc_sendmsg ( sock , m , dsz ) ;
release_sock ( sk ) ;
return ret ;
}
static int __tipc_sendmsg ( struct socket * sock , struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2014-06-25 20:41:37 -05:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , dest , m - > msg_name ) ;
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2015-01-09 15:27:05 +08:00
struct net * net = sock_net ( sk ) ;
2014-08-22 18:09:20 -04:00
struct tipc_msg * mhdr = & tsk - > phdr ;
2014-06-25 20:41:37 -05:00
u32 dnode , dport ;
2015-02-05 08:36:40 -05:00
struct sk_buff_head * pktchain = & sk - > sk_write_queue ;
2014-11-26 11:41:55 +08:00
struct sk_buff * skb ;
2014-06-25 20:41:37 -05:00
struct tipc_name_seq * seq = & dest - > addr . nameseq ;
2014-11-28 15:52:29 -05:00
struct iov_iter save ;
2014-06-25 20:41:37 -05:00
u32 mtu ;
2014-01-17 09:50:05 +08:00
long timeo ;
2014-12-03 14:44:44 +01:00
int rc ;
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! dest ) )
return - EDESTADDRREQ ;
2014-06-25 20:41:37 -05:00
2006-06-25 23:49:06 -07:00
if ( unlikely ( ( m - > msg_namelen < sizeof ( * dest ) ) | |
( dest - > family ! = AF_TIPC ) ) )
2006-01-02 19:04:38 +01:00
return - EINVAL ;
2014-06-25 20:41:37 -05:00
if ( dsz > TIPC_MAX_USER_MSG_SIZE )
2010-04-20 17:58:24 -04:00
return - EMSGSIZE ;
2006-01-02 19:04:38 +01:00
2014-06-25 20:41:37 -05:00
if ( unlikely ( sock - > state ! = SS_READY ) ) {
2015-03-02 15:37:47 +08:00
if ( sock - > state = = SS_LISTENING )
return - EPIPE ;
if ( sock - > state ! = SS_UNCONNECTED )
return - EISCONN ;
if ( tsk - > published )
return - EOPNOTSUPP ;
2006-06-25 23:44:57 -07:00
if ( dest - > addrtype = = TIPC_ADDR_NAME ) {
2014-08-22 18:09:20 -04:00
tsk - > conn_type = dest - > addr . name . name . type ;
tsk - > conn_instance = dest - > addr . name . name . instance ;
2006-06-25 23:44:57 -07:00
}
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:05 +08:00
timeo = sock_sndtimeo ( sk , m - > msg_flags & MSG_DONTWAIT ) ;
2014-06-25 20:41:37 -05:00
if ( dest - > addrtype = = TIPC_ADDR_MCAST ) {
2015-03-02 15:37:47 +08:00
return tipc_sendmcast ( sock , seq , m , dsz , timeo ) ;
2014-06-25 20:41:37 -05:00
} else if ( dest - > addrtype = = TIPC_ADDR_NAME ) {
u32 type = dest - > addr . name . name . type ;
u32 inst = dest - > addr . name . name . instance ;
u32 domain = dest - > addr . name . domain ;
dnode = domain ;
msg_set_type ( mhdr , TIPC_NAMED_MSG ) ;
msg_set_hdr_sz ( mhdr , NAMED_H_SIZE ) ;
msg_set_nametype ( mhdr , type ) ;
msg_set_nameinst ( mhdr , inst ) ;
msg_set_lookup_scope ( mhdr , tipc_addr_scope ( domain ) ) ;
2015-01-09 15:27:09 +08:00
dport = tipc_nametbl_translate ( net , type , inst , & dnode ) ;
2014-06-25 20:41:37 -05:00
msg_set_destnode ( mhdr , dnode ) ;
msg_set_destport ( mhdr , dport ) ;
2015-03-02 15:37:47 +08:00
if ( unlikely ( ! dport & & ! dnode ) )
return - EHOSTUNREACH ;
2014-06-25 20:41:37 -05:00
} else if ( dest - > addrtype = = TIPC_ADDR_ID ) {
dnode = dest - > addr . id . node ;
msg_set_type ( mhdr , TIPC_DIRECT_MSG ) ;
msg_set_lookup_scope ( mhdr , 0 ) ;
msg_set_destnode ( mhdr , dnode ) ;
msg_set_destport ( mhdr , dest - > addr . id . ref ) ;
msg_set_hdr_sz ( mhdr , BASIC_H_SIZE ) ;
}
2014-11-28 15:52:29 -05:00
save = m - > msg_iter ;
2014-06-25 20:41:37 -05:00
new_mtu :
2015-01-09 15:27:05 +08:00
mtu = tipc_node_get_mtu ( net , dnode , tsk - > portid ) ;
2015-02-05 08:36:40 -05:00
rc = tipc_msg_build ( mhdr , m , 0 , dsz , mtu , pktchain ) ;
2014-06-25 20:41:37 -05:00
if ( rc < 0 )
2015-03-02 15:37:47 +08:00
return rc ;
2014-06-25 20:41:37 -05:00
do {
2015-02-05 08:36:40 -05:00
skb = skb_peek ( pktchain ) ;
2014-11-26 11:41:55 +08:00
TIPC_SKB_CB ( skb ) - > wakeup_pending = tsk - > link_cong ;
2015-02-05 08:36:40 -05:00
rc = tipc_link_xmit ( net , pktchain , dnode , tsk - > portid ) ;
2014-06-25 20:41:37 -05:00
if ( likely ( rc > = 0 ) ) {
if ( sock - > state ! = SS_READY )
2008-04-15 00:22:02 -07:00
sock - > state = SS_CONNECTING ;
2014-06-25 20:41:37 -05:00
rc = dsz ;
2008-04-15 00:22:02 -07:00
break ;
2007-02-09 23:25:21 +09:00
}
2014-11-28 15:52:29 -05:00
if ( rc = = - EMSGSIZE ) {
m - > msg_iter = save ;
2014-06-25 20:41:37 -05:00
goto new_mtu ;
2014-11-28 15:52:29 -05:00
}
2014-06-25 20:41:37 -05:00
if ( rc ! = - ELINKCONG )
2008-04-15 00:22:02 -07:00
break ;
2014-08-22 18:09:07 -04:00
tsk - > link_cong = 1 ;
2014-06-25 20:41:37 -05:00
rc = tipc_wait_for_sndmsg ( sock , & timeo ) ;
2014-07-06 20:38:50 -04:00
if ( rc )
2015-02-05 08:36:40 -05:00
__skb_queue_purge ( pktchain ) ;
2014-06-25 20:41:37 -05:00
} while ( ! rc ) ;
return rc ;
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:06 +08:00
/* tipc_wait_for_sndpkt - sleep until neither link nor connection is congested ,
 * or until the connection is lost
 *
 * Returns 0 when sending may be retried , otherwise a pending socket error ,
 * - EPIPE if the socket is disconnecting , - ENOTCONN if it is not connected ,
 * - EAGAIN if the send timeout has expired , or the signal errno if interrupted
 */
static int tipc_wait_for_sndpkt ( struct socket * sock , long * timeo_p )
{
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-01-17 09:50:06 +08:00
DEFINE_WAIT ( wait ) ;
int done ;
do {
int err = sock_error ( sk ) ;
if ( err )
return err ;
if ( sock - > state = = SS_DISCONNECTING )
return - EPIPE ;
else if ( sock - > state ! = SS_CONNECTED )
return - ENOTCONN ;
if ( ! * timeo_p )
return - EAGAIN ;
if ( signal_pending ( current ) )
return sock_intr_errno ( * timeo_p ) ;
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
done = sk_wait_event ( sk , timeo_p ,
2014-06-25 20:41:42 -05:00
( ! tsk - > link_cong & &
2014-08-22 18:09:20 -04:00
! tsk_conn_cong ( tsk ) ) | |
! tsk - > connected ) ;
2014-01-17 09:50:06 +08:00
finish_wait ( sk_sleep ( sk ) , & wait ) ;
} while ( ! done ) ;
return 0 ;
}
2007-02-09 23:25:21 +09:00
/**
2014-06-25 20:41:38 -05:00
* tipc_send_stream - send stream - oriented data
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2014-06-25 20:41:38 -05:00
* @ m : data to send
* @ dsz : total length of data to be transmitted
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Used for SOCK_STREAM data .
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Returns the number of bytes sent on success ( or partial success ) ,
* or errno if no data sent
2006-01-02 19:04:38 +01:00
*/
2015-03-02 15:37:48 +08:00
static int tipc_send_stream ( struct socket * sock , struct msghdr * m , size_t dsz )
2015-03-02 15:37:47 +08:00
{
struct sock * sk = sock - > sk ;
int ret ;
lock_sock ( sk ) ;
ret = __tipc_send_stream ( sock , m , dsz ) ;
release_sock ( sk ) ;
return ret ;
}
static int __tipc_send_stream ( struct socket * sock , struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2015-01-09 15:27:05 +08:00
struct net * net = sock_net ( sk ) ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2014-08-22 18:09:20 -04:00
struct tipc_msg * mhdr = & tsk - > phdr ;
2015-02-05 08:36:40 -05:00
struct sk_buff_head * pktchain = & sk - > sk_write_queue ;
2014-01-17 22:53:15 +01:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , dest , m - > msg_name ) ;
tipc: convert tipc reference table to use generic rhashtable
As the tipc reference table is statically allocated, the memory it
requests at stack initialization is quite large, even though the
maximum port number is currently restricted to 8191; in practice,
that number has already become insufficient. If the maximum number
of ports were raised to its theoretical value of 2^32, the memory
consumed would reach an unacceptable size. Apart from this, heavy
tipc users spend a considerable amount of time in tipc_sk_get() due
to the read-lock on ref_table_lock.
If the tipc reference table is converted to the generic rhashtable,
both disadvantages are resolved: the new resizable hash table avoids
locking on lookup, and less memory is required initially. For
example, 256 hash bucket slots are requested at the start instead of
allocating all 8191 slots as in the old mode. The hash table will
grow if entries exceed 75% of the table size, up to a total table
size of 1M, and it will automatically shrink if usage falls below
30%, down to a minimum table size of 256.
This commit also converts ref_table_lock to a separate mutex to
protect hash table mutations on the write side. Lastly, it defers
the release of the socket reference using call_rcu() to allow an
RCU read-side protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-07 13:41:58 +08:00
u32 portid = tsk - > portid ;
2014-06-25 20:41:38 -05:00
int rc = - EINVAL ;
2014-01-17 09:50:06 +08:00
long timeo ;
2014-06-25 20:41:38 -05:00
u32 dnode ;
uint mtu , send , sent = 0 ;
2014-11-28 15:52:29 -05:00
struct iov_iter save ;
2006-01-02 19:04:38 +01:00
/* Handle implied connection establishment */
2014-06-25 20:41:38 -05:00
if ( unlikely ( dest ) ) {
2015-03-02 15:37:47 +08:00
rc = __tipc_sendmsg ( sock , m , dsz ) ;
2014-06-25 20:41:38 -05:00
if ( dsz & & ( dsz = = rc ) )
2014-06-25 20:41:42 -05:00
tsk - > sent_unacked = 1 ;
2014-06-25 20:41:38 -05:00
return rc ;
}
if ( dsz > ( uint ) INT_MAX )
2010-04-20 17:58:24 -04:00
return - EMSGSIZE ;
2014-01-17 09:50:06 +08:00
if ( unlikely ( sock - > state ! = SS_CONNECTED ) ) {
if ( sock - > state = = SS_DISCONNECTING )
2015-03-02 15:37:47 +08:00
return - EPIPE ;
2014-01-17 09:50:06 +08:00
else
2015-03-02 15:37:47 +08:00
return - ENOTCONN ;
2014-01-17 09:50:06 +08:00
}
2011-07-06 05:53:15 -04:00
2014-01-17 09:50:06 +08:00
timeo = sock_sndtimeo ( sk , m - > msg_flags & MSG_DONTWAIT ) ;
2014-08-22 18:09:20 -04:00
dnode = tsk_peer_node ( tsk ) ;
2014-06-25 20:41:38 -05:00
next :
2014-11-28 15:52:29 -05:00
save = m - > msg_iter ;
2014-08-22 18:09:20 -04:00
mtu = tsk - > max_pkt ;
2014-06-25 20:41:38 -05:00
send = min_t ( uint , dsz - sent , TIPC_MAX_USER_MSG_SIZE ) ;
2015-02-05 08:36:40 -05:00
rc = tipc_msg_build ( mhdr , m , sent , send , mtu , pktchain ) ;
2014-06-25 20:41:38 -05:00
if ( unlikely ( rc < 0 ) )
2015-03-02 15:37:47 +08:00
return rc ;
2007-02-09 23:25:21 +09:00
do {
2014-08-22 18:09:20 -04:00
if ( likely ( ! tsk_conn_cong ( tsk ) ) ) {
2015-02-05 08:36:40 -05:00
rc = tipc_link_xmit ( net , pktchain , dnode , portid ) ;
2014-06-25 20:41:38 -05:00
if ( likely ( ! rc ) ) {
2014-06-25 20:41:42 -05:00
tsk - > sent_unacked + + ;
2014-06-25 20:41:38 -05:00
sent + = send ;
if ( sent = = dsz )
break ;
goto next ;
}
if ( rc = = - EMSGSIZE ) {
2015-01-09 15:27:05 +08:00
tsk - > max_pkt = tipc_node_get_mtu ( net , dnode ,
portid ) ;
2014-11-28 15:52:29 -05:00
m - > msg_iter = save ;
2014-06-25 20:41:38 -05:00
goto next ;
}
if ( rc ! = - ELINKCONG )
break ;
2014-08-22 18:09:07 -04:00
tsk - > link_cong = 1 ;
2014-06-25 20:41:38 -05:00
}
rc = tipc_wait_for_sndpkt ( sock , & timeo ) ;
2014-07-06 20:38:50 -04:00
if ( rc )
2015-02-05 08:36:40 -05:00
__skb_queue_purge ( pktchain ) ;
2014-06-25 20:41:38 -05:00
} while ( ! rc ) ;
2015-03-02 15:37:47 +08:00
2014-06-25 20:41:38 -05:00
return sent ? sent : rc ;
2006-01-02 19:04:38 +01:00
}
2007-02-09 23:25:21 +09:00
/**
2014-06-25 20:41:38 -05:00
* tipc_send_packet - send a connection - oriented message
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2014-06-25 20:41:38 -05:00
* @ m : message to send
* @ dsz : length of data to be transmitted
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Used for SOCK_SEQPACKET messages .
2007-02-09 23:25:21 +09:00
*
2014-06-25 20:41:38 -05:00
* Returns the number of bytes sent on success , or errno otherwise
2006-01-02 19:04:38 +01:00
*/
2015-03-02 15:37:48 +08:00
static int tipc_send_packet ( struct socket * sock , struct msghdr * m , size_t dsz )
2006-01-02 19:04:38 +01:00
{
2014-06-25 20:41:38 -05:00
if ( dsz > TIPC_MAX_USER_MSG_SIZE )
return - EMSGSIZE ;
2006-01-02 19:04:38 +01:00
2015-03-02 15:37:48 +08:00
return tipc_send_stream ( sock , m , dsz ) ;
2006-01-02 19:04:38 +01:00
}
2014-08-22 18:09:11 -04:00
/* tipc_sk_finish_conn - complete the setup of a connection
2006-01-02 19:04:38 +01:00
*/
2014-08-22 18:09:20 -04:00
static void tipc_sk_finish_conn ( struct tipc_sock * tsk , u32 peer_port ,
2014-08-22 18:09:11 -04:00
u32 peer_node )
2006-01-02 19:04:38 +01:00
{
2015-01-13 17:07:48 +08:00
struct sock * sk = & tsk - > sk ;
struct net * net = sock_net ( sk ) ;
2014-08-22 18:09:20 -04:00
struct tipc_msg * msg = & tsk - > phdr ;
2006-01-02 19:04:38 +01:00
2014-08-22 18:09:11 -04:00
msg_set_destnode ( msg , peer_node ) ;
msg_set_destport ( msg , peer_port ) ;
msg_set_type ( msg , TIPC_CONN_MSG ) ;
msg_set_lookup_scope ( msg , 0 ) ;
msg_set_hdr_sz ( msg , SHORT_H_SIZE ) ;
tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.
With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, the return code will be
either "-EALREADY" or "-EISCONN", depending on whether the connection
is still being set up or has already been established.
The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.
It is also now possible to call select() or poll() to wait for the
completion of a connection.
An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-29 18:51:19 -05:00
2015-01-09 15:27:00 +08:00
tsk - > probing_intv = CONN_PROBING_INTERVAL ;
2014-08-22 18:09:20 -04:00
tsk - > probing_state = TIPC_CONN_OK ;
tsk - > connected = 1 ;
2015-01-13 17:07:48 +08:00
sk_reset_timer ( sk , & sk - > sk_timer , jiffies + tsk - > probing_intv ) ;
2015-01-09 15:27:05 +08:00
tipc_node_add_conn ( net , peer_node , tsk - > portid , peer_port ) ;
tsk - > max_pkt = tipc_node_get_mtu ( net , peer_node , tsk - > portid ) ;
2006-01-02 19:04:38 +01:00
}
/**
* set_orig_addr - capture sender ' s address for received message
* @ m : descriptor for message info
* @ msg : received message header
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Note : Address is not captured if not requested by receiver .
*/
2006-03-20 22:37:04 -08:00
static void set_orig_addr ( struct msghdr * m , struct tipc_msg * msg )
2006-01-02 19:04:38 +01:00
{
2014-01-17 22:53:15 +01:00
DECLARE_SOCKADDR ( struct sockaddr_tipc * , addr , m - > msg_name ) ;
2006-01-02 19:04:38 +01:00
2007-02-09 23:25:21 +09:00
if ( addr ) {
2006-01-02 19:04:38 +01:00
addr - > family = AF_TIPC ;
addr - > addrtype = TIPC_ADDR_ID ;
2013-04-07 01:52:00 +00:00
memset ( & addr - > addr , 0 , sizeof ( addr - > addr ) ) ;
2006-01-02 19:04:38 +01:00
addr - > addr . id . ref = msg_origport ( msg ) ;
addr - > addr . id . node = msg_orignode ( msg ) ;
2010-12-31 18:59:32 +00:00
addr - > addr . name . domain = 0 ; /* could leave uninitialized */
addr - > scope = 0 ; /* could leave uninitialized */
2006-01-02 19:04:38 +01:00
m - > msg_namelen = sizeof ( struct sockaddr_tipc ) ;
}
}
/**
2014-08-22 18:09:20 -04:00
* tipc_sk_anc_data_recv - optionally capture ancillary data for received message
2006-01-02 19:04:38 +01:00
* @ m : descriptor for message info
* @ msg : received message header
2014-08-22 18:09:20 -04:00
* @ tsk : TIPC port associated with message
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Note : Ancillary data is not captured if not requested by receiver .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 if successful , otherwise errno
*/
2014-08-22 18:09:20 -04:00
static int tipc_sk_anc_data_recv ( struct msghdr * m , struct tipc_msg * msg ,
struct tipc_sock * tsk )
2006-01-02 19:04:38 +01:00
{
u32 anc_data [ 3 ] ;
u32 err ;
u32 dest_type ;
2006-06-25 23:45:24 -07:00
int has_name ;
2006-01-02 19:04:38 +01:00
int res ;
if ( likely ( m - > msg_controllen = = 0 ) )
return 0 ;
/* Optionally capture errored message object(s) */
err = msg ? msg_errcode ( msg ) : 0 ;
if ( unlikely ( err ) ) {
anc_data [ 0 ] = err ;
anc_data [ 1 ] = msg_data_sz ( msg ) ;
2010-12-31 18:59:33 +00:00
res = put_cmsg ( m , SOL_TIPC , TIPC_ERRINFO , 8 , anc_data ) ;
if ( res )
2006-01-02 19:04:38 +01:00
return res ;
2010-12-31 18:59:33 +00:00
if ( anc_data [ 1 ] ) {
res = put_cmsg ( m , SOL_TIPC , TIPC_RETDATA , anc_data [ 1 ] ,
msg_data ( msg ) ) ;
if ( res )
return res ;
}
2006-01-02 19:04:38 +01:00
}
/* Optionally capture message destination object */
dest_type = msg ? msg_type ( msg ) : TIPC_DIRECT_MSG ;
switch ( dest_type ) {
case TIPC_NAMED_MSG :
2006-06-25 23:45:24 -07:00
has_name = 1 ;
2006-01-02 19:04:38 +01:00
anc_data [ 0 ] = msg_nametype ( msg ) ;
anc_data [ 1 ] = msg_namelower ( msg ) ;
anc_data [ 2 ] = msg_namelower ( msg ) ;
break ;
case TIPC_MCAST_MSG :
2006-06-25 23:45:24 -07:00
has_name = 1 ;
2006-01-02 19:04:38 +01:00
anc_data [ 0 ] = msg_nametype ( msg ) ;
anc_data [ 1 ] = msg_namelower ( msg ) ;
anc_data [ 2 ] = msg_nameupper ( msg ) ;
break ;
case TIPC_CONN_MSG :
2014-08-22 18:09:20 -04:00
has_name = ( tsk - > conn_type ! = 0 ) ;
anc_data [ 0 ] = tsk - > conn_type ;
anc_data [ 1 ] = tsk - > conn_instance ;
anc_data [ 2 ] = tsk - > conn_instance ;
2006-01-02 19:04:38 +01:00
break ;
default :
2006-06-25 23:45:24 -07:00
has_name = 0 ;
2006-01-02 19:04:38 +01:00
}
2010-12-31 18:59:33 +00:00
if ( has_name ) {
res = put_cmsg ( m , SOL_TIPC , TIPC_DESTNAME , 12 , anc_data ) ;
if ( res )
return res ;
}
2006-01-02 19:04:38 +01:00
return 0 ;
}
2014-08-22 18:09:20 -04:00
/* tipc_sk_send_ack - send a CONN_ACK to the peer , acknowledging ' ack ' messages
 *
 * Does nothing if the socket is not connected or no buffer can be allocated
 */
static void tipc_sk_send_ack ( struct tipc_sock * tsk , uint ack )
2014-08-22 18:09:12 -04:00
{
2015-01-09 15:27:05 +08:00
struct net * net = sock_net ( & tsk - > sk ) ;
2014-11-26 11:41:55 +08:00
struct sk_buff * skb = NULL ;
2014-08-22 18:09:12 -04:00
struct tipc_msg * msg ;
2014-08-22 18:09:20 -04:00
u32 peer_port = tsk_peer_port ( tsk ) ;
u32 dnode = tsk_peer_node ( tsk ) ;
2014-08-22 18:09:12 -04:00
2014-08-22 18:09:20 -04:00
if ( ! tsk - > connected )
2014-08-22 18:09:12 -04:00
return ;
2015-02-05 08:36:36 -05:00
skb = tipc_msg_create ( CONN_MANAGER , CONN_ACK , INT_H_SIZE , 0 ,
dnode , tsk_own_node ( tsk ) , peer_port ,
tsk - > portid , TIPC_OK ) ;
2014-11-26 11:41:55 +08:00
if ( ! skb )
2014-08-22 18:09:12 -04:00
return ;
2014-11-26 11:41:55 +08:00
msg = buf_msg ( skb ) ;
2014-08-22 18:09:12 -04:00
msg_set_msgcnt ( msg , ack ) ;
2015-01-09 15:27:05 +08:00
tipc_link_xmit_skb ( net , skb , dnode , msg_link_selector ( msg ) ) ;
2014-08-22 18:09:12 -04:00
}
2014-05-23 15:55:12 -04:00
/* tipc_wait_for_rcvmsg - wait for a message to arrive in the receive queue
 *
 * The socket lock is released while sleeping and re - taken before returning .
 * The remaining timeout is written back through @ timeop .
 *
 * Returns 0 if a message is available , - ENOTCONN if the socket is
 * disconnecting and the queue is empty , - EAGAIN if the timeout has expired ,
 * or the signal errno if interrupted
 */
static int tipc_wait_for_rcvmsg ( struct socket * sock , long * timeop )
2014-01-17 09:50:07 +08:00
{
struct sock * sk = sock - > sk ;
DEFINE_WAIT ( wait ) ;
2014-05-23 15:55:12 -04:00
long timeo = * timeop ;
2014-01-17 09:50:07 +08:00
int err ;
for ( ; ; ) {
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
2014-03-06 14:40:18 +01:00
if ( timeo & & skb_queue_empty ( & sk - > sk_receive_queue ) ) {
2014-01-17 09:50:07 +08:00
if ( sock - > state = = SS_DISCONNECTING ) {
err = - ENOTCONN ;
break ;
}
release_sock ( sk ) ;
timeo = schedule_timeout ( timeo ) ;
lock_sock ( sk ) ;
}
err = 0 ;
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
break ;
err = sock_intr_errno ( timeo ) ;
if ( signal_pending ( current ) )
break ;
err = - EAGAIN ;
if ( ! timeo )
break ;
}
finish_wait ( sk_sleep ( sk ) , & wait ) ;
2014-05-23 15:55:12 -04:00
* timeop = timeo ;
2014-01-17 09:50:07 +08:00
return err ;
}
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_recvmsg - receive packet - oriented message
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ m : descriptor for message info
* @ buf_len : total size of user buffer area
* @ flags : receive flags
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Used for SOCK_DGRAM , SOCK_RDM , and SOCK_SEQPACKET messages .
* If the complete message doesn ' t fit in user area , truncate it .
*
* Returns size of returned message data , errno otherwise
*/
2015-03-02 15:37:48 +08:00
static int tipc_recvmsg ( struct socket * sock , struct msghdr * m , size_t buf_len ,
int flags )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
struct tipc_msg * msg ;
2014-01-17 09:50:07 +08:00
long timeo ;
2006-01-02 19:04:38 +01:00
unsigned int sz ;
u32 err ;
int res ;
2008-04-15 00:22:02 -07:00
/* Catch invalid receive requests */
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! buf_len ) )
return - EINVAL ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
if ( unlikely ( sock - > state = = SS_UNCONNECTED ) ) {
res = - ENOTCONN ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2014-01-17 09:50:07 +08:00
timeo = sock_rcvtimeo ( sk , flags & MSG_DONTWAIT ) ;
2008-04-15 00:22:02 -07:00
restart :
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look for a message in receive queue; wait if necessary */
2014-05-23 15:55:12 -04:00
res = tipc_wait_for_rcvmsg ( sock , & timeo ) ;
2014-01-17 09:50:07 +08:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look at first message in receive queue */
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
msg = buf_msg ( buf ) ;
sz = msg_data_sz ( msg ) ;
err = msg_errcode ( msg ) ;
/* Discard an empty non-errored message & try again */
if ( ( ! sz ) & & ( ! err ) ) {
2014-08-22 18:09:18 -04:00
tsk_advance_rx_queue ( sk ) ;
2006-01-02 19:04:38 +01:00
goto restart ;
}
/* Capture sender's address (optional) */
set_orig_addr ( m , msg ) ;
/* Capture ancillary data (optional) */
2014-08-22 18:09:20 -04:00
res = tipc_sk_anc_data_recv ( m , msg , tsk ) ;
2008-04-15 00:22:02 -07:00
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
/* Capture message data (if valid) & compute return value (always) */
if ( ! err ) {
if ( unlikely ( buf_len < sz ) ) {
sz = buf_len ;
m - > msg_flags | = MSG_TRUNC ;
}
2014-11-05 16:46:40 -05:00
res = skb_copy_datagram_msg ( buf , msg_hdr_sz ( msg ) , m , sz ) ;
2011-02-21 09:45:40 -05:00
if ( res )
2006-01-02 19:04:38 +01:00
goto exit ;
res = sz ;
} else {
if ( ( sock - > state = = SS_READY ) | |
( ( err = = TIPC_CONN_SHUTDOWN ) | | m - > msg_control ) )
res = 0 ;
else
res = - ECONNRESET ;
}
/* Consume received message (optional) */
if ( likely ( ! ( flags & MSG_PEEK ) ) ) {
2008-04-15 00:06:12 -07:00
if ( ( sock - > state ! = SS_READY ) & &
2014-06-25 20:41:42 -05:00
( + + tsk - > rcv_unacked > = TIPC_CONNACK_INTV ) ) {
2014-08-22 18:09:20 -04:00
tipc_sk_send_ack ( tsk , tsk - > rcv_unacked ) ;
2014-06-25 20:41:42 -05:00
tsk - > rcv_unacked = 0 ;
}
2014-08-22 18:09:18 -04:00
tsk_advance_rx_queue ( sk ) ;
2007-02-09 23:25:21 +09:00
}
2006-01-02 19:04:38 +01:00
exit :
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_recv_stream - receive stream - oriented data
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ m : descriptor for message info
* @ buf_len : total size of user buffer area
* @ flags : receive flags
2007-02-09 23:25:21 +09:00
*
* Used for SOCK_STREAM messages only . If not enough data is available
2006-01-02 19:04:38 +01:00
* it will optionally wait for more ; never truncates data .
*
* Returns size of returned message data , errno otherwise
*/
2015-03-02 15:37:48 +08:00
static int tipc_recv_stream ( struct socket * sock , struct msghdr * m ,
size_t buf_len , int flags )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
struct tipc_msg * msg ;
2014-01-17 09:50:07 +08:00
long timeo ;
2006-01-02 19:04:38 +01:00
unsigned int sz ;
2010-08-17 11:00:04 +00:00
int sz_to_copy , target , needed ;
2006-01-02 19:04:38 +01:00
int sz_copied = 0 ;
u32 err ;
2008-04-15 00:22:02 -07:00
int res = 0 ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Catch invalid receive attempts */
2006-01-02 19:04:38 +01:00
if ( unlikely ( ! buf_len ) )
return - EINVAL ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2014-01-17 09:50:07 +08:00
if ( unlikely ( sock - > state = = SS_UNCONNECTED ) ) {
2008-04-15 00:22:02 -07:00
res = - ENOTCONN ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2010-08-17 11:00:04 +00:00
target = sock_rcvlowat ( sk , flags & MSG_WAITALL , buf_len ) ;
2014-01-17 09:50:07 +08:00
timeo = sock_rcvtimeo ( sk , flags & MSG_DONTWAIT ) ;
2006-01-02 19:04:38 +01:00
2012-04-30 15:29:02 -04:00
restart :
2008-04-15 00:22:02 -07:00
/* Look for a message in receive queue; wait if necessary */
2014-05-23 15:55:12 -04:00
res = tipc_wait_for_rcvmsg ( sock , & timeo ) ;
2014-01-17 09:50:07 +08:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
/* Look at first message in receive queue */
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2006-01-02 19:04:38 +01:00
msg = buf_msg ( buf ) ;
	sz = msg_data_sz(msg);
	err = msg_errcode(msg);

	/* Discard an empty non-errored message & try again */
	if ((!sz) && (!err)) {
		tsk_advance_rx_queue(sk);
		goto restart;
	}

	/* Optionally capture sender's address & ancillary data of first msg */
	if (sz_copied == 0) {
		set_orig_addr(m, msg);
		res = tipc_sk_anc_data_recv(m, msg, tsk);
		if (res)
			goto exit;
	}

	/* Capture message data (if valid) & compute return value (always) */
	if (!err) {
		u32 offset = (u32)(unsigned long)(TIPC_SKB_CB(buf)->handle);

		sz -= offset;
		needed = (buf_len - sz_copied);
		sz_to_copy = (sz <= needed) ? sz : needed;

		res = skb_copy_datagram_msg(buf, msg_hdr_sz(msg) + offset,
					    m, sz_to_copy);
		if (res)
			goto exit;

		sz_copied += sz_to_copy;

		if (sz_to_copy < sz) {
			if (!(flags & MSG_PEEK))
				TIPC_SKB_CB(buf)->handle =
				(void *)(unsigned long)(offset + sz_to_copy);
			goto exit;
		}
	} else {
		if (sz_copied != 0)
			goto exit; /* can't add error msg to valid data */

		if ((err == TIPC_CONN_SHUTDOWN) || m->msg_control)
			res = 0;
		else
			res = -ECONNRESET;
	}

	/* Consume received message (optional) */
	if (likely(!(flags & MSG_PEEK))) {
		if (unlikely(++tsk->rcv_unacked >= TIPC_CONNACK_INTV)) {
			tipc_sk_send_ack(tsk, tsk->rcv_unacked);
			tsk->rcv_unacked = 0;
		}
		tsk_advance_rx_queue(sk);
	}

	/* Loop around if more data is required */
	if ((sz_copied < buf_len) &&	/* didn't get all requested data */
	    (!skb_queue_empty(&sk->sk_receive_queue) ||
	    (sz_copied < target)) &&	/* and more is ready or required */
	    (!(flags & MSG_PEEK)) &&	/* and aren't just peeking at data */
	    (!err))			/* and haven't reached a FIN */
		goto restart;

exit:
	release_sock(sk);
	return sz_copied ? sz_copied : res;
}
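
The partial-read bookkeeping in the stream receive path above (a per-buffer offset that is stored only when the caller is not peeking) can be sketched in isolation. This is a minimal user-space model, not kernel code: `model_msg` and `stream_read` are hypothetical names, and the `offset` field merely stands in for the value kept in `TIPC_SKB_CB(buf)->handle`. Unlike the kernel path, the model always advances the offset and leaves "message fully consumed" to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one queued message and its read offset. */
struct model_msg {
	const char *data;	/* payload */
	size_t len;		/* payload length */
	size_t offset;		/* bytes already handed to the reader */
};

/*
 * Copy up to 'want' bytes of the unread payload into 'dst'.
 * When 'peek' is non-zero the stored offset is left untouched, so the
 * next call returns the same bytes again (the MSG_PEEK case).
 * Returns the number of bytes copied.
 */
static size_t stream_read(struct model_msg *m, char *dst, size_t want,
			  int peek)
{
	size_t left = m->len - m->offset;
	size_t n = (left <= want) ? left : want;

	memcpy(dst, m->data + m->offset, n);
	if (!peek)
		m->offset += n;
	return n;
}
```

A caller would treat the message as consumed once `offset == len`, which corresponds to the point where the kernel code advances the receive queue.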
/**
 * tipc_write_space - wake up thread if port congestion is released
 * @sk: socket
 */
static void tipc_write_space(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();
	wq = rcu_dereference(sk->sk_wq);
	if (wq_has_sleeper(wq))
		wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
						POLLWRNORM | POLLWRBAND);
	rcu_read_unlock();
}

/**
 * tipc_data_ready - wake up threads to indicate messages have been received
 * @sk: socket
 */
static void tipc_data_ready(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();
	wq = rcu_dereference(sk->sk_wq);
	if (wq_has_sleeper(wq))
		wake_up_interruptible_sync_poll(&wq->wait, POLLIN |
						POLLRDNORM | POLLRDBAND);
	rcu_read_unlock();
}

/**
 * filter_connect - Handle all incoming messages for a connection-based socket
 * @tsk: TIPC socket
 * @skb: pointer to message buffer. Set to NULL if buffer is consumed
 *
 * Returns 0 (TIPC_OK) if everything ok, -TIPC_ERR_NO_PORT otherwise
 */
static int filter_connect(struct tipc_sock *tsk, struct sk_buff **skb)
{
	struct sock *sk = &tsk->sk;
	struct net *net = sock_net(sk);
	struct socket *sock = sk->sk_socket;
	struct tipc_msg *msg = buf_msg(*skb);
	int retval = -TIPC_ERR_NO_PORT;

	if (msg_mcast(msg))
		return retval;

	switch ((int)sock->state) {
	case SS_CONNECTED:
		/* Accept only connection-based messages sent by peer */
		if (tsk_peer_msg(tsk, msg)) {
			if (unlikely(msg_errcode(msg))) {
				sock->state = SS_DISCONNECTING;
				tsk->connected = 0;
				/* let timer expire on its own */
				tipc_node_remove_conn(net, tsk_peer_node(tsk),
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-07 13:41:58 +08:00
						      tsk->portid);
			}
			retval = TIPC_OK;
		}
		break;
	case SS_CONNECTING:
		/* Accept only ACK or NACK message */
		if (unlikely(!msg_connected(msg)))
			break;
tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.
With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.
The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.
It is also now possible to call select() or poll() to wait for the
completion of a connection.
An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-29 18:51:19 -05:00
		if (unlikely(msg_errcode(msg))) {
			sock->state = SS_DISCONNECTING;
tipc: set sk_err correctly when connection fails
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28 09:29:58 +02:00
			sk->sk_err = ECONNREFUSED;
			retval = TIPC_OK;
			break;
		}

		if (unlikely(msg_importance(msg) > TIPC_CRITICAL_IMPORTANCE)) {
			sock->state = SS_DISCONNECTING;
			sk->sk_err = EINVAL;
			retval = TIPC_OK;
			break;
		}

		tipc_sk_finish_conn(tsk, msg_origport(msg), msg_orignode(msg));
		msg_set_importance(&tsk->phdr, msg_importance(msg));
		sock->state = SS_CONNECTED;

		/* If an incoming message is an 'ACK-', it should be
		 * discarded here because it doesn't contain useful
		 * data. In addition, we should try to wake up
		 * connect() routine if sleeping.
		 */
		if (msg_data_sz(msg) == 0) {
			kfree_skb(*skb);
			*skb = NULL;
			if (waitqueue_active(sk_sleep(sk)))
				wake_up_interruptible(sk_sleep(sk));
		}
		retval = TIPC_OK;
		break;
	case SS_LISTENING:
	case SS_UNCONNECTED:
		/* Accept only SYN message */
		if (!msg_connected(msg) && !(msg_errcode(msg)))
			retval = TIPC_OK;
		break;
	case SS_DISCONNECTING:
		break;
	default:
		pr_err("Unknown socket state %u\n", sock->state);
	}
	return retval;
}
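
The state handling in filter_connect() can be read as a small decision table. The sketch below restates it as a pure user-space function so the transitions are easy to follow. It is an illustrative model only: `conn_state`, `msg_model`, `verdict` and `filter_connect_model` are hypothetical names, the boolean fields stand in for the `msg_*()` accessors and `tsk_peer_msg()`, and the literal `3` assumes that `TIPC_CRITICAL_IMPORTANCE` is the highest importance level.

```c
#include <errno.h>
#include <stdbool.h>

enum conn_state { ST_UNCONNECTED, ST_LISTENING, ST_CONNECTING,
		  ST_CONNECTED, ST_DISCONNECTING };

/* Hypothetical flattened view of the message fields the filter inspects. */
struct msg_model {
	bool mcast;		/* multicast message */
	bool from_peer;		/* sent by the connected peer */
	bool connected;		/* connection-oriented message */
	int errcode;		/* non-zero for a rejected/returned message */
	int importance;		/* 0..3 assumed valid */
	unsigned int data_sz;	/* payload size */
};

struct verdict {
	bool accept;		/* models retval == TIPC_OK */
	bool consumed;		/* models *skb set to NULL */
	int sk_err;		/* models the value stored in sk->sk_err */
	enum conn_state next;	/* resulting socket state */
};

static struct verdict filter_connect_model(enum conn_state st,
					   const struct msg_model *m)
{
	struct verdict v = { false, false, 0, st };

	if (m->mcast)
		return v;

	switch (st) {
	case ST_CONNECTED:
		/* Only messages from the peer are accepted */
		if (m->from_peer) {
			if (m->errcode)
				v.next = ST_DISCONNECTING;
			v.accept = true;
		}
		break;
	case ST_CONNECTING:
		/* Only an ACK or NACK is accepted */
		if (!m->connected)
			break;
		v.accept = true;
		if (m->errcode) {
			v.next = ST_DISCONNECTING;
			v.sk_err = ECONNREFUSED;
		} else if (m->importance > 3) {
			v.next = ST_DISCONNECTING;
			v.sk_err = EINVAL;
		} else {
			v.next = ST_CONNECTED;
			/* An empty ACK carries no data and is dropped */
			v.consumed = (m->data_sz == 0);
		}
		break;
	case ST_LISTENING:
	case ST_UNCONNECTED:
		/* Only a SYN is accepted */
		v.accept = !m->connected && !m->errcode;
		break;
	case ST_DISCONNECTING:
		break;
	}
	return v;
}
```

Returning a value struct rather than mutating a socket keeps the model free of side effects, which is a choice made for this sketch rather than something the kernel code does.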

/**
 * rcvbuf_limit - get proper overload limit of socket receive queue
 * @sk: socket
 * @buf: message
 *
 * For all connection oriented messages, irrespective of importance,
 * the default overload value (i.e. 67MB) is set as limit.
 *
 * For all connectionless messages, by default new queue limits are
 * as follows:
 *
 * TIPC_LOW_IMPORTANCE       (4 MB)
 * TIPC_MEDIUM_IMPORTANCE    (8 MB)
 * TIPC_HIGH_IMPORTANCE      (16 MB)
 * TIPC_CRITICAL_IMPORTANCE  (32 MB)
 *
 * Returns overload limit according to corresponding message importance
 */
static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
{
	struct tipc_msg *msg = buf_msg(buf);

	if (msg_connected(msg))
		return sysctl_tipc_rmem[2];

	return sk->sk_rcvbuf >> TIPC_CRITICAL_IMPORTANCE <<
		msg_importance(msg);
}
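
The shift expression in rcvbuf_limit() reproduces the table in its comment when the importance levels are numbered 0 to 3. The stand-alone sketch below works through that arithmetic. The enum values mirror what the TIPC user-space header is assumed to define, `connless_limit` is a hypothetical helper name, and the 32 MB base used in the usage note is inferred from the table rather than taken from the code.

```c
#include <stdint.h>

/* Importance levels, assumed to match the TIPC definitions (0..3). */
enum {
	LOW_IMPORTANCE = 0,
	MEDIUM_IMPORTANCE = 1,
	HIGH_IMPORTANCE = 2,
	CRITICAL_IMPORTANCE = 3
};

/*
 * Hypothetical helper: scale the socket receive buffer size down to the
 * lowest level, then double it once per importance step.
 */
static uint32_t connless_limit(uint32_t rcvbuf, unsigned int importance)
{
	return (rcvbuf >> CRITICAL_IMPORTANCE) << importance;
}
```

With an assumed `rcvbuf` of 32 MB, `connless_limit()` yields 4 MB, 8 MB, 16 MB and 32 MB for the four levels, so each step up in importance doubles the queue space a connectionless message may use.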

/**
 * filter_rcv - validate incoming message
 * @sk: socket
 * @skb: pointer to message. Set to NULL if buffer is consumed.
 *
 * Enqueues message on receive queue if acceptable; optionally handles
 * disconnect indication for a connected socket.
 *
 * Called with socket lock already taken
 *
 * Returns 0 (TIPC_OK) if message was ok, -TIPC error code if rejected
 */
static int filter_rcv(struct sock *sk, struct sk_buff **skb)
{
	struct socket *sock = sk->sk_socket;
	struct tipc_sock *tsk = tipc_sk(sk);
	struct tipc_msg *msg = buf_msg(*skb);
	unsigned int limit = rcvbuf_limit(sk, *skb);
	int rc = TIPC_OK;

	if (unlikely(msg_user(msg) == CONN_MANAGER)) {
		tipc_sk_proto_rcv(tsk, skb);
		return TIPC_OK;
	}

	if (unlikely(msg_user(msg) == SOCK_WAKEUP)) {
		kfree_skb(*skb);
		tsk->link_cong = 0;
		sk->sk_write_space(sk);
		*skb = NULL;
		return TIPC_OK;
	}

	/* Reject message if it is wrong sort of message for socket */
	if (msg_type(msg) > TIPC_DIRECT_MSG)
		return -TIPC_ERR_NO_PORT;

	if (sock->state == SS_READY) {
		if (msg_connected(msg))
			return -TIPC_ERR_NO_PORT;
	} else {
		rc = filter_connect(tsk, skb);
		if (rc != TIPC_OK || !*skb)
			return rc;
	}

	/* Reject message if there isn't room to queue it */
	if (sk_rmem_alloc_get(sk) + (*skb)->truesize >= limit)
		return -TIPC_ERR_OVERLOAD;

	/* Enqueue message */
	TIPC_SKB_CB(*skb)->handle = NULL;
	__skb_queue_tail(&sk->sk_receive_queue, *skb);
	skb_set_owner_r(*skb, sk);

	sk->sk_data_ready(sk);
	*skb = NULL;
	return TIPC_OK;
}

/**
tipc: compensate for double accounting in socket rcv buffer
The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.
As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.
This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.
In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.
It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 05:39:09 -04:00
 * tipc_backlog_rcv - handle incoming message from backlog queue
 * @sk: socket
 * @skb: message
 *
tipc: split up function tipc_msg_eval()
The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.
This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function
prepares the message for rejection, if applicable.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 08:36:39 -05:00
 * Caller must hold socket lock
 *
 * Returns 0
 */
static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb)
{
	int err;
	atomic_t *dcnt;
	u32 dnode;
	struct tipc_sock *tsk = tipc_sk(sk);
	struct net *net = sock_net(sk);
	uint truesize = skb->truesize;

	err = filter_rcv(sk, &skb);
	if (likely(!skb)) {
		dcnt = &tsk->dupl_rcvcnt;
		if (atomic_read(dcnt) < TIPC_CONN_OVERLOAD_LIMIT)
			atomic_add(truesize, dcnt);
		return 0;
	}

	if (!err || tipc_msg_reverse(tsk_own_node(tsk), skb, &dnode, -err))
		tipc_link_xmit_skb(net, skb, dnode, tsk->portid);
	return 0;
}
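
The double-accounting compensation described in the commit message for this function can be illustrated with a toy model. Everything here is a simplification made for illustration: `model_sock`, `move_to_rcvq` and `backlog_rejects` are hypothetical names, plain integers replace the atomic counter, and the rejection test (queued bytes strictly greater than the limit) is an assumption about how the backlog check behaves.

```c
#include <stdbool.h>

/* Hypothetical model of the three byte counters involved. */
struct model_sock {
	unsigned int backlog_len;	/* not reduced while the backlog drains */
	unsigned int rmem_alloc;	/* bytes held in the receive queue */
	unsigned int dupl_rcvcnt;	/* bytes currently counted twice */
};

/*
 * Move one buffer from the backlog to the receive queue. The backlog
 * length stays as it is, so the buffer is now counted twice; the
 * compensation counter records that, up to 'cap'.
 */
static void move_to_rcvq(struct model_sock *s, unsigned int truesize,
			 unsigned int cap)
{
	s->rmem_alloc += truesize;
	if (s->dupl_rcvcnt < cap)
		s->dupl_rcvcnt += truesize;
}

/* Would a newly arriving buffer be refused by the backlog check? */
static bool backlog_rejects(const struct model_sock *s, unsigned int limit,
			    bool compensate)
{
	unsigned int lim = limit + (compensate ? s->dupl_rcvcnt : 0);

	return s->backlog_len + s->rmem_alloc > lim;
}
```

Starting from a 600-byte backlog and a limit of 1000, moving 500 bytes to the receive queue makes the uncompensated check see 1100 bytes and refuse new buffers, while the compensated check compares against 1500 and still admits them.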
2015-02-05 08:36:38 -05:00
/**
tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.
We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.
This solves the sequentiality problem, and seems to cause no measurable
performance degradation.
A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 08:36:41 -05:00
 * tipc_sk_enqueue - extract all buffers with destination 'dport' from
 *                   inputq and try adding them to socket or backlog queue
 * @inputq: list of incoming buffers with potentially different destinations
 * @sk: socket where the buffers should be enqueued
 * @dport: port number for the socket
 * @_skb: returned buffer to be forwarded or rejected, if applicable
 *
 * Caller must hold socket lock
 *
 * Returns TIPC_OK if all buffers enqueued, otherwise -TIPC_ERR_OVERLOAD
 * or -TIPC_ERR_NO_PORT
 */
static int tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk,
			   u32 dport, struct sk_buff **_skb)
{
	unsigned int lim;
	atomic_t *dcnt;
	int err;
	struct sk_buff *skb;
	unsigned long time_limit = jiffies + 2;

	while (skb_queue_len(inputq)) {
		if (unlikely(time_after_eq(jiffies, time_limit)))
			return TIPC_OK;
2015-02-05 08:36:41 -05:00
skb = tipc_skb_dequeue ( inputq , dport ) ;
if ( unlikely ( ! skb ) )
return TIPC_OK ;
if ( ! sock_owned_by_user ( sk ) ) {
err = filter_rcv ( sk , & skb ) ;
if ( likely ( ! skb ) )
continue ;
* _skb = skb ;
return err ;
}
dcnt = & tipc_sk ( sk ) - > dupl_rcvcnt ;
if ( sk - > sk_backlog . len )
atomic_set ( dcnt , 0 ) ;
lim = rcvbuf_limit ( sk , skb ) + atomic_read ( dcnt ) ;
if ( likely ( ! sk_add_backlog ( sk , skb , lim ) ) )
continue ;
* _skb = skb ;
2015-02-05 08:36:38 -05:00
return - TIPC_ERR_OVERLOAD ;
2015-02-05 08:36:41 -05:00
}
2015-02-05 08:36:38 -05:00
return TIPC_OK ;
}
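The delivery decision made per buffer in tipc_sk_enqueue() can be modelled as follows. This is a rough user-space sketch, assuming a single byte counter per queue: struct model_sock, model_enqueue and the verdict names are hypothetical, and the real limit check inside sk_add_backlog() accounts for memory differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for one buffer. */
enum verdict { DELIVERED, BACKLOGGED, OVERLOADED };

/* Hypothetical stand-in for the socket fields the decision depends on. */
struct model_sock {
    bool owned_by_user;   /* a process currently holds the socket */
    size_t backlog_len;   /* bytes parked on the backlog queue */
    size_t rcvq_len;      /* bytes on the receive queue */
};

/* Decide what happens to one buffer of 'msg_size' bytes:
 * - socket not owned by a user: deliver straight to the receive queue;
 * - socket owned: park on the backlog unless that would exceed 'limit';
 * - otherwise report overload so the caller can reject the buffer.
 */
static enum verdict model_enqueue(struct model_sock *sk, size_t msg_size,
                                  size_t limit)
{
    if (!sk->owned_by_user) {
        sk->rcvq_len += msg_size;
        return DELIVERED;
    }
    if (sk->backlog_len + msg_size > limit)
        return OVERLOADED;
    sk->backlog_len += msg_size;
    return BACKLOGGED;
}
```

The overload case corresponds to the path above that hands the buffer back through _skb with - TIPC_ERR_OVERLOAD, leaving rejection to the caller.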
2008-04-15 00:22:02 -07:00
/**
2015-02-05 08:36:41 -05:00
* tipc_sk_rcv - handle a chain of incoming buffers
* @ net : network namespace the buffers were received in
* @ inputq : buffer list containing the buffers
* Consumes all buffers in list until inputq is empty
* Note : may be called in multiple threads referring to the same queue
* Returns 0 if last buffer was accepted , otherwise - EHOSTUNREACH
* Only node local calls check the return value , sending single - buffer queues
2008-04-15 00:22:02 -07:00
*/
2015-02-05 08:36:41 -05:00
int tipc_sk_rcv ( struct net * net , struct sk_buff_head * inputq )
2008-04-15 00:22:02 -07:00
{
2015-02-05 08:36:41 -05:00
u32 dnode , dport = 0 ;
int err = - TIPC_ERR_NO_PORT ;
struct sk_buff * skb ;
2014-05-14 05:39:15 -04:00
struct tipc_sock * tsk ;
2015-02-05 08:36:36 -05:00
struct tipc_net * tn ;
2014-05-14 05:39:15 -04:00
struct sock * sk ;
2015-02-05 08:36:41 -05:00
while ( skb_queue_len ( inputq ) ) {
skb = NULL ;
dport = tipc_skb_peek_port ( inputq , dport ) ;
tsk = tipc_sk_lookup ( net , dport ) ;
if ( likely ( tsk ) ) {
sk = & tsk - > sk ;
if ( likely ( spin_trylock_bh ( & sk - > sk_lock . slock ) ) ) {
err = tipc_sk_enqueue ( inputq , sk , dport , & skb ) ;
spin_unlock_bh ( & sk - > sk_lock . slock ) ;
dport = 0 ;
}
sock_put ( sk ) ;
} else {
skb = tipc_skb_dequeue ( inputq , dport ) ;
}
if ( likely ( ! skb ) )
continue ;
if ( tipc_msg_lookup_dest ( net , skb , & dnode , & err ) )
goto xmit ;
if ( ! err ) {
dnode = msg_destnode ( buf_msg ( skb ) ) ;
goto xmit ;
}
tn = net_generic ( net , tipc_net_id ) ;
if ( ! tipc_msg_reverse ( tn - > own_addr , skb , & dnode , - err ) )
continue ;
tipc: split up function tipc_msg_eval()
The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.
This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function
prepares the message for rejection, if applicable.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 08:36:39 -05:00
xmit :
2015-02-05 08:36:41 -05:00
tipc_link_xmit_skb ( net , skb , dnode , dport ) ;
}
2015-02-05 08:36:37 -05:00
return err ? - EHOSTUNREACH : 0 ;
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:03 +08:00
static int tipc_wait_for_connect ( struct socket * sock , long * timeo_p )
{
struct sock * sk = sock - > sk ;
DEFINE_WAIT ( wait ) ;
int done ;
do {
int err = sock_error ( sk ) ;
if ( err )
return err ;
if ( ! * timeo_p )
return - ETIMEDOUT ;
if ( signal_pending ( current ) )
return sock_intr_errno ( * timeo_p ) ;
prepare_to_wait ( sk_sleep ( sk ) , & wait , TASK_INTERRUPTIBLE ) ;
done = sk_wait_event ( sk , timeo_p , sock - > state ! = SS_CONNECTING ) ;
finish_wait ( sk_sleep ( sk ) , & wait ) ;
} while ( ! done ) ;
return 0 ;
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_connect - establish a connection to another TIPC port
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ dest : socket address for destination port
* @ destlen : size of socket address data structure
2008-04-15 00:22:02 -07:00
* @ flags : file - related flags associated with socket
2006-01-02 19:04:38 +01:00
*
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_connect ( struct socket * sock , struct sockaddr * dest ,
int destlen , int flags )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2008-04-15 00:20:37 -07:00
struct sockaddr_tipc * dst = ( struct sockaddr_tipc * ) dest ;
struct msghdr m = { NULL , } ;
2014-01-17 09:50:03 +08:00
long timeout = ( flags & O_NONBLOCK ) ? 0 : tipc_sk ( sk ) - > conn_timeout ;
socket_state previous ;
2008-04-15 00:20:37 -07:00
int res ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2008-04-15 00:20:37 -07:00
/* For now, TIPC does not allow use of connect() with DGRAM/RDM types */
2008-04-15 00:22:02 -07:00
if ( sock - > state = = SS_READY ) {
res = - EOPNOTSUPP ;
goto exit ;
}
2008-04-15 00:20:37 -07:00
/*
* Reject connection attempt using multicast address
*
* Note : send_msg ( ) validates the rest of the address fields ,
* so there ' s no need to do it here
*/
2008-04-15 00:22:02 -07:00
if ( dst - > addrtype = = TIPC_ADDR_MCAST ) {
res = - EINVAL ;
goto exit ;
}
2014-01-17 09:50:03 +08:00
previous = sock - > state ;
tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.
With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, the call returns either
"-EALREADY" or "-EISCONN", depending on whether the connection is
still in progress or has been established.
The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.
It is also now possible to call select() or poll() to wait for the
completion of a connection.
An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-29 18:51:19 -05:00
switch ( sock - > state ) {
case SS_UNCONNECTED :
/* Send a 'SYN-' to destination */
m . msg_name = dest ;
m . msg_namelen = destlen ;
/* If connect is non-blocking, set MSG_DONTWAIT so that
* send_msg ( ) never blocks .
*/
if ( ! timeout )
m . msg_flags = MSG_DONTWAIT ;
2015-03-02 15:37:47 +08:00
res = __tipc_sendmsg ( sock , & m , 0 ) ;
2012-11-29 18:51:19 -05:00
if ( ( res < 0 ) & & ( res ! = - EWOULDBLOCK ) )
goto exit ;
/* Just entered SS_CONNECTING state; the only
* difference is that return value in non - blocking
* case is EINPROGRESS , rather than EALREADY .
*/
res = - EINPROGRESS ;
/* fall through */
case SS_CONNECTING :
2014-01-17 09:50:03 +08:00
if ( previous = = SS_CONNECTING )
res = - EALREADY ;
if ( ! timeout )
goto exit ;
timeout = msecs_to_jiffies ( timeout ) ;
/* Wait until an 'ACK' or 'RST' arrives, or a timeout occurs */
res = tipc_wait_for_connect ( sock , & timeout ) ;
2012-11-29 18:51:19 -05:00
break ;
case SS_CONNECTED :
res = - EISCONN ;
break ;
default :
res = - EINVAL ;
2014-01-17 09:50:03 +08:00
break ;
2008-04-15 00:20:37 -07:00
}
2008-04-15 00:22:02 -07:00
exit :
release_sock ( sk ) ;
2008-04-15 00:20:37 -07:00
return res ;
2006-01-02 19:04:38 +01:00
}
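The return codes a caller sees from repeated non-blocking connect() calls can be summarised as a small state machine. The sketch below is a user-space model only: enum model_state and model_connect_nonblock are hypothetical names, the SS_READY and multicast checks are left out, and sending the 'SYN' is assumed to succeed and move the socket to the connecting state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the socket states used in the switch above. */
enum model_state { M_UNCONNECTED, M_CONNECTING, M_CONNECTED, M_LISTENING };

/* Model of one non-blocking connect() call (timeout == 0).
 * Assumption: the SYN is sent successfully on the first call.
 */
static int model_connect_nonblock(enum model_state *state)
{
    switch (*state) {
    case M_UNCONNECTED:
        *state = M_CONNECTING;  /* SYN sent, handshake now pending */
        return -EINPROGRESS;
    case M_CONNECTING:
        return -EALREADY;       /* an earlier attempt is still pending */
    case M_CONNECTED:
        return -EISCONN;        /* handshake already completed */
    default:
        return -EINVAL;         /* e.g. a listening socket */
    }
}
```

A caller that receives -EINPROGRESS would normally wait with select() or poll(), as the commit message notes, rather than call connect() in a loop.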
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_listen - allow socket to listen for incoming connections
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ len : ( unused )
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_listen ( struct socket * sock , int len )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
int res ;
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2011-07-06 06:01:13 -04:00
if ( sock - > state ! = SS_UNCONNECTED )
2008-04-15 00:22:02 -07:00
res = - EINVAL ;
else {
sock - > state = SS_LISTENING ;
res = 0 ;
}
release_sock ( sk ) ;
return res ;
2006-01-02 19:04:38 +01:00
}
2014-01-17 09:50:04 +08:00
static int tipc_wait_for_accept ( struct socket * sock , long timeo )
{
struct sock * sk = sock - > sk ;
DEFINE_WAIT ( wait ) ;
int err ;
/* True wake-one mechanism for incoming connections: only
* one process gets woken up , not the ' whole herd ' .
* Since we do not ' race & poll ' for established sockets
* anymore , the common case will execute the loop only once .
*/
for ( ; ; ) {
prepare_to_wait_exclusive ( sk_sleep ( sk ) , & wait ,
TASK_INTERRUPTIBLE ) ;
2014-03-06 14:40:18 +01:00
if ( timeo & & skb_queue_empty ( & sk - > sk_receive_queue ) ) {
2014-01-17 09:50:04 +08:00
release_sock ( sk ) ;
timeo = schedule_timeout ( timeo ) ;
lock_sock ( sk ) ;
}
err = 0 ;
if ( ! skb_queue_empty ( & sk - > sk_receive_queue ) )
break ;
err = - EINVAL ;
if ( sock - > state ! = SS_LISTENING )
break ;
err = sock_intr_errno ( timeo ) ;
if ( signal_pending ( current ) )
break ;
err = - EAGAIN ;
if ( ! timeo )
break ;
}
finish_wait ( sk_sleep ( sk ) , & wait ) ;
return err ;
}
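The order in which tipc_wait_for_accept() tests its exit conditions decides which error wins when several apply at once. The following is a minimal user-space model of one pass through the loop, under stated assumptions: model_accept_wait_step and MODEL_KEEP_WAITING are hypothetical, sleeping is not modelled, and -EINTR stands in for whatever sock_intr_errno() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical marker meaning "no exit condition met, loop again". */
#define MODEL_KEEP_WAITING 1

/* One evaluation of the exit checks, in the same order as the loop above:
 * queued connection request, socket state, pending signal, expired timeout.
 */
static int model_accept_wait_step(bool queue_empty, bool listening,
                                  bool signal_pending, long timeo)
{
    if (!queue_empty)
        return 0;               /* a connection request is waiting */
    if (!listening)
        return -EINVAL;         /* socket left the listening state */
    if (signal_pending)
        return -EINTR;          /* kernel uses sock_intr_errno(timeo) */
    if (timeo == 0)
        return -EAGAIN;         /* non-blocking, or timeout used up */
    return MODEL_KEEP_WAITING;
}
```

In this model a queued request takes priority over everything else, and a pending signal is reported in preference to an expired timeout.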
2007-02-09 23:25:21 +09:00
/**
2014-02-18 16:06:46 +08:00
* tipc_accept - wait for connection request
2006-01-02 19:04:38 +01:00
* @ sock : listening socket
* @ new_sock : new socket that is to be connected
* @ flags : file - related flags associated with socket
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_accept ( struct socket * sock , struct socket * new_sock , int flags )
2006-01-02 19:04:38 +01:00
{
2012-12-04 11:01:55 -05:00
struct sock * new_sk , * sk = sock - > sk ;
2006-01-02 19:04:38 +01:00
struct sk_buff * buf ;
2014-08-22 18:09:20 -04:00
struct tipc_sock * new_tsock ;
2012-12-04 11:01:55 -05:00
struct tipc_msg * msg ;
2014-01-17 09:50:04 +08:00
long timeo ;
2008-04-15 00:22:02 -07:00
int res ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
if ( sock - > state ! = SS_LISTENING ) {
res = - EINVAL ;
2006-01-02 19:04:38 +01:00
goto exit ;
}
2014-01-17 09:50:04 +08:00
timeo = sock_rcvtimeo ( sk , flags & O_NONBLOCK ) ;
res = tipc_wait_for_accept ( sock , timeo ) ;
if ( res )
goto exit ;
2008-04-15 00:22:02 -07:00
buf = skb_peek ( & sk - > sk_receive_queue ) ;
2013-06-17 10:54:39 -04:00
res = tipc_sk_create ( sock_net ( sock - > sk ) , new_sock , 0 , 1 ) ;
2012-12-04 11:01:55 -05:00
if ( res )
goto exit ;
2006-01-02 19:04:38 +01:00
2012-12-04 11:01:55 -05:00
new_sk = new_sock - > sk ;
2014-08-22 18:09:20 -04:00
new_tsock = tipc_sk ( new_sk ) ;
2012-12-04 11:01:55 -05:00
msg = buf_msg ( buf ) ;
2006-01-02 19:04:38 +01:00
2012-12-04 11:01:55 -05:00
/* we lock on new_sk; but lockdep sees the lock on sk */
lock_sock_nested ( new_sk , SINGLE_DEPTH_NESTING ) ;
/*
* Reject any stray messages received by new socket
* before the socket lock was taken ( very , very unlikely )
*/
2014-08-22 18:09:18 -04:00
tsk_rej_rx_queue ( new_sk ) ;
2012-12-04 11:01:55 -05:00
/* Connect new socket to its peer */
2014-08-22 18:09:20 -04:00
tipc_sk_finish_conn ( new_tsock , msg_origport ( msg ) , msg_orignode ( msg ) ) ;
2012-12-04 11:01:55 -05:00
new_sock - > state = SS_CONNECTED ;
2014-08-22 18:09:20 -04:00
tsk_set_importance ( new_tsock , msg_importance ( msg ) ) ;
2012-12-04 11:01:55 -05:00
if ( msg_named ( msg ) ) {
2014-08-22 18:09:20 -04:00
new_tsock - > conn_type = msg_nametype ( msg ) ;
new_tsock - > conn_instance = msg_nameinst ( msg ) ;
2006-01-02 19:04:38 +01:00
}
2012-12-04 11:01:55 -05:00
/*
* Respond to ' SYN - ' by discarding it & returning ' ACK - ' .
* Respond to ' SYN + ' by queuing it on new socket .
*/
if ( ! msg_data_sz ( msg ) ) {
struct msghdr m = { NULL , } ;
2014-08-22 18:09:18 -04:00
tsk_advance_rx_queue ( sk ) ;
2015-03-02 15:37:47 +08:00
__tipc_send_stream ( new_sock , & m , 0 ) ;
2012-12-04 11:01:55 -05:00
} else {
__skb_dequeue ( & sk - > sk_receive_queue ) ;
__skb_queue_head ( & new_sk - > sk_receive_queue , buf ) ;
2013-01-20 23:30:09 +01:00
skb_set_owner_r ( buf , new_sk ) ;
2012-12-04 11:01:55 -05:00
}
release_sock ( new_sk ) ;
2006-01-02 19:04:38 +01:00
exit :
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
/**
2014-02-18 16:06:46 +08:00
* tipc_shutdown - shutdown socket connection
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
2008-03-06 15:05:38 -08:00
* @ how : direction to close ( must be SHUT_RDWR )
2006-01-02 19:04:38 +01:00
*
* Terminates connection ( if necessary ) , then purges socket ' s receive queue .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_shutdown(struct socket *sock, int how)
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
	struct sock *sk = sock->sk;
2015-01-09 15:27:05 +08:00
	struct net *net = sock_net(sk);
2014-03-12 11:31:12 -04:00
	struct tipc_sock *tsk = tipc_sk(sk);
2014-11-26 11:41:55 +08:00
	struct sk_buff *skb;
2014-08-22 18:09:10 -04:00
	u32 dnode;
2006-01-02 19:04:38 +01:00
	int res;
2008-03-06 15:05:38 -08:00
	if (how != SHUT_RDWR)
		return -EINVAL;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
	lock_sock(sk);
2006-01-02 19:04:38 +01:00
	switch (sock->state) {
2008-04-15 00:22:02 -07:00
	case SS_CONNECTING:
2006-01-02 19:04:38 +01:00
	case SS_CONNECTED:
restart:
2012-04-30 15:29:02 -04:00
		/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
2014-11-26 11:41:55 +08:00
		skb = __skb_dequeue(&sk->sk_receive_queue);
		if (skb) {
			if (TIPC_SKB_CB(skb)->handle != NULL) {
				kfree_skb(skb);
2006-01-02 19:04:38 +01:00
				goto restart;
			}
2015-02-05 08:36:36 -05:00
			if (tipc_msg_reverse(tsk_own_node(tsk), skb, &dnode,
2015-01-09 15:27:10 +08:00
					     TIPC_CONN_SHUTDOWN))
2015-01-09 15:27:05 +08:00
				tipc_link_xmit_skb(net, skb, dnode,
						   tsk->portid);
			tipc_node_remove_conn(net, dnode, tsk->portid);
2008-04-15 00:22:02 -07:00
		} else {
2014-08-22 18:09:20 -04:00
			dnode = tsk_peer_node(tsk);
2015-02-05 08:36:36 -05:00
			skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
2014-08-22 18:09:10 -04:00
					      TIPC_CONN_MSG, SHORT_H_SIZE,
2015-02-05 08:36:36 -05:00
					      0, dnode, tsk_own_node(tsk),
2014-08-22 18:09:20 -04:00
					      tsk_peer_port(tsk),
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-07 13:41:58 +08:00
					      tsk->portid, TIPC_CONN_SHUTDOWN);
2015-01-09 15:27:05 +08:00
			tipc_link_xmit_skb(net, skb, dnode, tsk->portid);
2006-01-02 19:04:38 +01:00
		}
2014-08-22 18:09:20 -04:00
		tsk->connected = 0;
2008-04-15 00:22:02 -07:00
		sock->state = SS_DISCONNECTING;
2015-01-09 15:27:05 +08:00
		tipc_node_remove_conn(net, dnode, tsk->portid);
2006-01-02 19:04:38 +01:00
		/* fall through */
	case SS_DISCONNECTING:
2012-10-29 09:38:15 -04:00
		/* Discard any unreceived messages */
2013-01-20 23:30:08 +01:00
		__skb_queue_purge(&sk->sk_receive_queue);
2012-10-29 09:38:15 -04:00
		/* Wake up anyone sleeping in poll */
		sk->sk_state_change(sk);
2006-01-02 19:04:38 +01:00
		res = 0;
		break;
	default:
		res = -ENOTCONN;
	}
2008-04-15 00:22:02 -07:00
	release_sock(sk);
2006-01-02 19:04:38 +01:00
	return res;
}
2015-01-09 15:27:02 +08:00
static void tipc_sk_timeout(unsigned long data)
2014-08-22 18:09:09 -04:00
{
2015-01-09 15:27:02 +08:00
	struct tipc_sock *tsk = (struct tipc_sock *)data;
	struct sock *sk = &tsk->sk;
2014-11-26 11:41:55 +08:00
	struct sk_buff *skb = NULL;
2014-08-22 18:09:09 -04:00
	u32 peer_port, peer_node;
2015-02-05 08:36:36 -05:00
	u32 own_node = tsk_own_node(tsk);
2014-08-22 18:09:09 -04:00
2014-08-22 18:09:16 -04:00
	bh_lock_sock(sk);
2014-08-22 18:09:20 -04:00
	if (!tsk->connected) {
2014-08-22 18:09:16 -04:00
		bh_unlock_sock(sk);
		goto exit;
2014-08-22 18:09:09 -04:00
	}
2014-08-22 18:09:20 -04:00
	peer_port = tsk_peer_port(tsk);
	peer_node = tsk_peer_node(tsk);
2014-08-22 18:09:09 -04:00
2014-08-22 18:09:20 -04:00
	if (tsk->probing_state == TIPC_CONN_PROBING) {
2014-08-22 18:09:09 -04:00
		/* Previous probe not answered -> self abort */
2015-02-05 08:36:36 -05:00
		skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
2015-01-09 15:27:10 +08:00
				      TIPC_CONN_MSG, SHORT_H_SIZE, 0,
2015-02-05 08:36:36 -05:00
				      own_node, peer_node, tsk->portid,
2015-01-09 15:27:10 +08:00
				      peer_port, TIPC_ERR_NO_PORT);
2014-08-22 18:09:09 -04:00
	} else {
2015-02-05 08:36:36 -05:00
		skb = tipc_msg_create(CONN_MANAGER, CONN_PROBE,
				      INT_H_SIZE, 0, peer_node, own_node,
2015-01-09 15:27:02 +08:00
				      peer_port, tsk->portid, TIPC_OK);
2014-08-22 18:09:20 -04:00
		tsk->probing_state = TIPC_CONN_PROBING;
2015-01-13 17:07:48 +08:00
		sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);
2014-08-22 18:09:09 -04:00
	}
	bh_unlock_sock(sk);
2014-11-26 11:41:55 +08:00
	if (skb)
2015-01-09 15:27:05 +08:00
		tipc_link_xmit_skb(sock_net(sk), skb, peer_node, tsk->portid);
2014-08-22 18:09:16 -04:00
exit:
2015-01-07 13:41:58 +08:00
	sock_put(sk);
2014-08-22 18:09:09 -04:00
}
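The timer handler above behaves like a two-step probe state machine: an expiry on a connected socket that is not yet probing sends a probe and re-arms the timer, while an expiry that finds the previous probe still unanswered builds an abort message. A minimal user-space sketch of that decision follows. All type and function names here are hypothetical, and the reply handler is an assumption, since the code that clears the probing state is not part of this section.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the probing state and handler outcome. */
enum probe_state { CONN_IDLE, CONN_PROBING };
enum timeout_action { ACT_NONE, ACT_SEND_PROBE, ACT_ABORT };

struct conn {
	bool connected;
	enum probe_state probing_state;
};

/* Decide what one timer expiry does, following tipc_sk_timeout(). */
enum timeout_action on_timeout(struct conn *c)
{
	if (!c->connected)
		return ACT_NONE;		/* nothing to supervise */
	if (c->probing_state == CONN_PROBING)
		return ACT_ABORT;		/* previous probe unanswered */
	c->probing_state = CONN_PROBING;	/* kernel also re-arms the timer */
	return ACT_SEND_PROBE;
}

/* Assumption: a probe reply from the peer resets the probing state. */
void on_probe_reply(struct conn *c)
{
	c->probing_state = CONN_IDLE;
}
```

In this sketch a peer must answer within one probing interval; two consecutive expiries without an intervening `on_probe_reply()` yield `ACT_ABORT`.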
2014-08-22 18:09:20 -04:00
static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
2014-08-22 18:09:17 -04:00
			   struct tipc_name_seq const *seq)
{
2015-01-09 15:27:05 +08:00
	struct net *net = sock_net(&tsk->sk);
2014-08-22 18:09:17 -04:00
	struct publication *publ;
	u32 key;
2014-08-22 18:09:20 -04:00
	if (tsk->connected)
2014-08-22 18:09:17 -04:00
		return -EINVAL;
2015-01-07 13:41:58 +08:00
	key = tsk->portid + tsk->pub_count + 1;
	if (key == tsk->portid)
2014-08-22 18:09:17 -04:00
		return -EADDRINUSE;
2015-01-09 15:27:05 +08:00
	publ = tipc_nametbl_publish(net, seq->type, seq->lower, seq->upper,
2015-01-07 13:41:58 +08:00
				    scope, tsk->portid, key);
2014-08-22 18:09:17 -04:00
	if (unlikely(!publ))
		return -EINVAL;
2014-08-22 18:09:20 -04:00
	list_add(&publ->pport_list, &tsk->publications);
	tsk->pub_count++;
	tsk->published = 1;
2014-08-22 18:09:17 -04:00
	return 0;
}
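The publication key in tipc_sk_publish() is computed in 32-bit unsigned arithmetic, so `key == portid` can only happen when `pub_count + 1` wraps to zero. The user-space sketch below isolates that check. The struct and function names are hypothetical, and `uint32_t` stands in for the kernel's `u32`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two tipc_sock fields the check reads. */
struct pub_state {
	uint32_t portid;	/* port identity of the socket */
	uint32_t pub_count;	/* publications made so far */
};

/*
 * Derive the next publication key as portid + pub_count + 1, modulo 2^32.
 * Returns 0 and stores the key, or -EADDRINUSE when the sum has wrapped
 * around to portid itself (pub_count == UINT32_MAX).
 */
int next_pub_key(const struct pub_state *st, uint32_t *key)
{
	uint32_t k = st->portid + st->pub_count + 1u;

	if (k == st->portid)
		return -EADDRINUSE;
	*key = k;
	return 0;
}
```

With `portid = 100` and `pub_count = 0` the key is 101; only a counter value of `UINT32_MAX` triggers the error path.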
2014-08-22 18:09:20 -04:00
static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
2014-08-22 18:09:17 -04:00
			    struct tipc_name_seq const *seq)
{
2015-01-09 15:27:05 +08:00
	struct net *net = sock_net(&tsk->sk);
2014-08-22 18:09:17 -04:00
	struct publication *publ;
	struct publication *safe;
	int rc = -EINVAL;
2014-08-22 18:09:20 -04:00
	list_for_each_entry_safe(publ, safe, &tsk->publications, pport_list) {
2014-08-22 18:09:17 -04:00
		if (seq) {
			if (publ->scope != scope)
				continue;
			if (publ->type != seq->type)
				continue;
			if (publ->lower != seq->lower)
				continue;
			if (publ->upper != seq->upper)
				break;
2015-01-09 15:27:05 +08:00
			tipc_nametbl_withdraw(net, publ->type, publ->lower,
2014-08-22 18:09:17 -04:00
					      publ->ref, publ->key);
			rc = 0;
			break;
		}
2015-01-09 15:27:05 +08:00
		tipc_nametbl_withdraw(net, publ->type, publ->lower,
2014-08-22 18:09:17 -04:00
				      publ->ref, publ->key);
		rc = 0;
	}
2014-08-22 18:09:20 -04:00
	if (list_empty(&tsk->publications))
		tsk->published = 0;
2014-08-22 18:09:17 -04:00
	return rc;
}
2014-08-22 18:09:14 -04:00
/* tipc_sk_reinit: set non-zero address in all existing sockets
 * when we go from standalone to network mode.
 */
2015-01-09 15:27:08 +08:00
void tipc_sk_reinit(struct net *net)
2014-08-22 18:09:14 -04:00
{
2015-01-09 15:27:08 +08:00
	struct tipc_net *tn = net_generic(net, tipc_net_id);
2015-01-07 13:41:58 +08:00
	const struct bucket_table *tbl;
	struct rhash_head *pos;
	struct tipc_sock *tsk;
2014-08-22 18:09:14 -04:00
	struct tipc_msg *msg;
2015-01-07 13:41:58 +08:00
	int i;
2014-08-22 18:09:14 -04:00
2015-01-07 13:41:58 +08:00
	rcu_read_lock();
2015-01-09 15:27:08 +08:00
	tbl = rht_dereference_rcu((&tn->sk_rht)->tbl, &tn->sk_rht);
2015-01-07 13:41:58 +08:00
	for (i = 0; i < tbl->size; i++) {
		rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
			spin_lock_bh(&tsk->sk.sk_lock.slock);
			msg = &tsk->phdr;
2015-01-09 15:27:10 +08:00
			msg_set_prevnode(msg, tn->own_addr);
			msg_set_orignode(msg, tn->own_addr);
2015-01-07 13:41:58 +08:00
			spin_unlock_bh(&tsk->sk.sk_lock.slock);
		}
2014-08-22 18:09:14 -04:00
	}
2015-01-07 13:41:58 +08:00
	rcu_read_unlock();
2014-08-22 18:09:14 -04:00
}
2015-01-09 15:27:08 +08:00
static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid)
2014-08-22 18:09:19 -04:00
{
2015-01-09 15:27:08 +08:00
	struct tipc_net *tn = net_generic(net, tipc_net_id);
2015-01-07 13:41:58 +08:00
	struct tipc_sock *tsk;
2014-08-22 18:09:19 -04:00
2015-01-07 13:41:58 +08:00
	rcu_read_lock();
2015-01-09 15:27:08 +08:00
	tsk = rhashtable_lookup(&tn->sk_rht, &portid);
2015-01-07 13:41:58 +08:00
	if (tsk)
		sock_hold(&tsk->sk);
	rcu_read_unlock();
2014-08-22 18:09:19 -04:00
2015-01-07 13:41:58 +08:00
	return tsk;
2014-08-22 18:09:19 -04:00
}
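The rhashtable commit message quoted in this listing states the sizing policy for the socket table: start at 256 buckets, grow when entries exceed 75% of the table size up to 1M, and shrink when usage falls below 30%, never below 256. The sketch below expresses that policy as a pure function. It is an illustration under stated assumptions, not the kernel's rhashtable code: resizing by a factor of two and reading "1M" as 2^20 are both assumptions, and every name is hypothetical.

```c
#include <stddef.h>

#define TBL_MIN_SIZE 256u		/* initial and minimum bucket count */
#define TBL_MAX_SIZE (1u << 20)		/* "1M", assumed to mean 2^20 */

/*
 * Return the bucket count the table should have next, given its current
 * size and number of entries. Growth and shrinkage are assumed to be by
 * a factor of two.
 */
size_t next_table_size(size_t size, size_t entries)
{
	/* entries > 75% of size, written without division */
	if (entries * 4 > size * 3 && size < TBL_MAX_SIZE)
		return size * 2;
	/* entries < 30% of size */
	if (entries * 10 < size * 3 && size > TBL_MIN_SIZE)
		return size / 2;
	return size;
}
```

For a 256-bucket table the growth threshold is crossed at the 193rd entry, and an empty table stays at 256 buckets because of the lower bound.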
2015-01-07 13:41:58 +08:00
static int tipc_sk_insert ( struct tipc_sock * tsk )
2014-08-22 18:09:19 -04:00
{
2015-01-09 15:27:08 +08:00
struct sock * sk = & tsk - > sk ;
struct net * net = sock_net ( sk ) ;
struct tipc_net * tn = net_generic ( net , tipc_net_id ) ;
2015-01-07 13:41:58 +08:00
u32 remaining = ( TIPC_MAX_PORT - TIPC_MIN_PORT ) + 1 ;
u32 portid = prandom_u32 ( ) % remaining + TIPC_MIN_PORT ;
2014-08-22 18:09:19 -04:00
2015-01-07 13:41:58 +08:00
while ( remaining - - ) {
portid + + ;
if ( ( portid < TIPC_MIN_PORT ) | | ( portid > TIPC_MAX_PORT ) )
portid = TIPC_MIN_PORT ;
tsk - > portid = portid ;
sock_hold ( & tsk - > sk ) ;
2015-01-09 15:27:08 +08:00
if ( rhashtable_lookup_insert ( & tn - > sk_rht , & tsk - > node ) )
2015-01-07 13:41:58 +08:00
return 0 ;
sock_put ( & tsk - > sk ) ;
2014-08-22 18:09:19 -04:00
}
2015-01-07 13:41:58 +08:00
return - 1 ;
2014-08-22 18:09:19 -04:00
}
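The port allocation loop in tipc_sk_insert() can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: the names demo_alloc_port and demo_lookup_insert, the tiny port range, and the flag array standing in for the hash table are all assumptions made for illustration. It shows the same idea: start from an arbitrary offset, probe each port at most once, and wrap from the top of the range back to the bottom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny port range so the sketch stays readable. */
#define DEMO_MIN_PORT 100u
#define DEMO_MAX_PORT 107u
#define DEMO_NPORTS   (DEMO_MAX_PORT - DEMO_MIN_PORT + 1u)

/* Stand-in for the hash table: one "in use" flag per port. */
static bool demo_used[DEMO_NPORTS];

/* Mimics an insert that fails when the key is already present. */
bool demo_lookup_insert(uint32_t portid)
{
	assert(portid >= DEMO_MIN_PORT && portid <= DEMO_MAX_PORT);
	if (demo_used[portid - DEMO_MIN_PORT])
		return false;
	demo_used[portid - DEMO_MIN_PORT] = true;
	return true;
}

/*
 * Start from a caller-supplied value (random in the original code),
 * then probe every port at most once, wrapping from MAX back to MIN.
 * Returns the allocated port, or 0 when the whole range is in use.
 */
uint32_t demo_alloc_port(uint32_t start)
{
	uint32_t remaining = DEMO_NPORTS;
	uint32_t portid = start % remaining + DEMO_MIN_PORT;

	while (remaining--) {
		portid++;
		if (portid < DEMO_MIN_PORT || portid > DEMO_MAX_PORT)
			portid = DEMO_MIN_PORT;
		if (demo_lookup_insert(portid))
			return portid;
	}
	return 0;
}
```

Because the counter bounds the loop to one pass over the range, the search terminates even when every port is taken. The sketch omits the reference counting (sock_hold/sock_put) that the kernel code performs around each attempt.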
2015-01-07 13:41:58 +08:00
static void tipc_sk_remove ( struct tipc_sock * tsk )
2014-08-22 18:09:19 -04:00
{
2015-01-07 13:41:58 +08:00
struct sock * sk = & tsk - > sk ;
2015-01-09 15:27:08 +08:00
struct tipc_net * tn = net_generic ( sock_net ( sk ) , tipc_net_id ) ;
2014-08-22 18:09:19 -04:00
2015-01-09 15:27:08 +08:00
if ( rhashtable_remove ( & tn - > sk_rht , & tsk - > node ) ) {
2015-01-07 13:41:58 +08:00
WARN_ON ( atomic_read ( & sk - > sk_refcnt ) = = 1 ) ;
__sock_put ( sk ) ;
2014-08-22 18:09:19 -04:00
}
}
2015-01-09 15:27:08 +08:00
int tipc_sk_rht_init ( struct net * net )
2014-08-22 18:09:19 -04:00
{
2015-01-09 15:27:08 +08:00
struct tipc_net * tn = net_generic ( net , tipc_net_id ) ;
2015-01-07 13:41:58 +08:00
struct rhashtable_params rht_params = {
. nelem_hint = 192 ,
. head_offset = offsetof ( struct tipc_sock , node ) ,
. key_offset = offsetof ( struct tipc_sock , portid ) ,
. key_len = sizeof ( u32 ) , /* portid */
. hashfn = jhash ,
. max_shift = 20 , /* 1M */
. min_shift = 8 , /* 256 */
} ;
2014-08-22 18:09:19 -04:00
2015-01-09 15:27:08 +08:00
return rhashtable_init ( & tn - > sk_rht , & rht_params ) ;
2014-08-22 18:09:19 -04:00
}
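The sizing parameters above are expressed as shifts: a min_shift of 8 gives 256 buckets and a max_shift of 20 gives 1M buckets, and the nelem_hint of 192 is exactly 75% of 256. The sketch below models the grow and shrink thresholds as the commit message describes them (grow above 75% load, shrink below 30%). It is an assumption-laden illustration, not rhashtable's actual code; the demo_ names and the integer arithmetic are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MIN_SHIFT 8   /* 1 << 8  = 256 buckets */
#define DEMO_MAX_SHIFT 20  /* 1 << 20 = 1048576 buckets (1M) */

/* Number of buckets for a given shift. */
size_t demo_table_size(unsigned int shift)
{
	assert(shift >= DEMO_MIN_SHIFT && shift <= DEMO_MAX_SHIFT);
	return (size_t)1 << shift;
}

/* Grow when the load exceeds 75%, unless already at the maximum size. */
bool demo_should_grow(size_t nelems, unsigned int shift)
{
	return shift < DEMO_MAX_SHIFT &&
	       nelems * 4 > demo_table_size(shift) * 3;
}

/* Shrink when the load falls below 30%, unless already at the minimum. */
bool demo_should_shrink(size_t nelems, unsigned int shift)
{
	return shift > DEMO_MIN_SHIFT &&
	       nelems * 10 < demo_table_size(shift) * 3;
}
```

Under this model a 256-bucket table holding 192 entries sits exactly at the 75% boundary and does not grow until one more entry is added.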
2015-01-09 15:27:08 +08:00
void tipc_sk_rht_destroy ( struct net * net )
2014-08-22 18:09:19 -04:00
{
2015-01-09 15:27:08 +08:00
struct tipc_net * tn = net_generic ( net , tipc_net_id ) ;
2015-01-07 13:41:58 +08:00
/* Wait for socket readers to complete */
synchronize_net ( ) ;
2014-08-22 18:09:19 -04:00
2015-01-09 15:27:08 +08:00
rhashtable_destroy ( & tn - > sk_rht ) ;
2014-08-22 18:09:19 -04:00
}
2006-01-02 19:04:38 +01:00
/**
2014-02-18 16:06:46 +08:00
* tipc_setsockopt - set socket option
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ lvl : option level
* @ opt : option identifier
* @ ov : pointer to new option value
* @ ol : length of option value
2007-02-09 23:25:21 +09:00
*
* For stream sockets only , accepts and ignores all IPPROTO_TCP options
2006-01-02 19:04:38 +01:00
* ( to ease compatibility ) .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_setsockopt ( struct socket * sock , int lvl , int opt ,
char __user * ov , unsigned int ol )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2006-01-02 19:04:38 +01:00
u32 value ;
int res ;
2007-02-09 23:25:21 +09:00
if ( ( lvl = = IPPROTO_TCP ) & & ( sock - > type = = SOCK_STREAM ) )
return 0 ;
2006-01-02 19:04:38 +01:00
if ( lvl ! = SOL_TIPC )
return - ENOPROTOOPT ;
if ( ol < sizeof ( value ) )
return - EINVAL ;
2010-12-31 18:59:33 +00:00
res = get_user ( value , ( u32 __user * ) ov ) ;
if ( res )
2006-01-02 19:04:38 +01:00
return res ;
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2007-02-09 23:25:21 +09:00
2006-01-02 19:04:38 +01:00
switch ( opt ) {
case TIPC_IMPORTANCE :
2014-08-22 18:09:20 -04:00
res = tsk_set_importance ( tsk , value ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_SRC_DROPPABLE :
if ( sock - > type ! = SOCK_STREAM )
2014-08-22 18:09:20 -04:00
tsk_set_unreliable ( tsk , value ) ;
2007-02-09 23:25:21 +09:00
else
2006-01-02 19:04:38 +01:00
res = - ENOPROTOOPT ;
break ;
case TIPC_DEST_DROPPABLE :
2014-08-22 18:09:20 -04:00
tsk_set_unreturnable ( tsk , value ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_CONN_TIMEOUT :
2011-05-26 13:44:34 -04:00
tipc_sk ( sk ) - > conn_timeout = value ;
2008-04-15 00:22:02 -07:00
/* no need to set "res", since already 0 at this point */
2006-01-02 19:04:38 +01:00
break ;
default :
res = - EINVAL ;
}
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
return res ;
}
/**
2014-02-18 16:06:46 +08:00
* tipc_getsockopt - get socket option
2006-01-02 19:04:38 +01:00
* @ sock : socket structure
* @ lvl : option level
* @ opt : option identifier
* @ ov : receptacle for option value
* @ ol : receptacle for length of option value
2007-02-09 23:25:21 +09:00
*
* For stream sockets only , returns 0 length result for all IPPROTO_TCP options
2006-01-02 19:04:38 +01:00
* ( to ease compatibility ) .
2007-02-09 23:25:21 +09:00
*
2006-01-02 19:04:38 +01:00
* Returns 0 on success , errno otherwise
*/
2014-02-18 16:06:46 +08:00
static int tipc_getsockopt ( struct socket * sock , int lvl , int opt ,
char __user * ov , int __user * ol )
2006-01-02 19:04:38 +01:00
{
2008-04-15 00:22:02 -07:00
struct sock * sk = sock - > sk ;
2014-03-12 11:31:12 -04:00
struct tipc_sock * tsk = tipc_sk ( sk ) ;
2007-02-09 23:25:21 +09:00
int len ;
2006-01-02 19:04:38 +01:00
u32 value ;
2007-02-09 23:25:21 +09:00
int res ;
2006-01-02 19:04:38 +01:00
2007-02-09 23:25:21 +09:00
if ( ( lvl = = IPPROTO_TCP ) & & ( sock - > type = = SOCK_STREAM ) )
return put_user ( 0 , ol ) ;
2006-01-02 19:04:38 +01:00
if ( lvl ! = SOL_TIPC )
return - ENOPROTOOPT ;
2010-12-31 18:59:33 +00:00
res = get_user ( len , ol ) ;
if ( res )
2007-02-09 23:25:21 +09:00
return res ;
2006-01-02 19:04:38 +01:00
2008-04-15 00:22:02 -07:00
lock_sock ( sk ) ;
2006-01-02 19:04:38 +01:00
switch ( opt ) {
case TIPC_IMPORTANCE :
2014-08-22 18:09:20 -04:00
value = tsk_importance ( tsk ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_SRC_DROPPABLE :
2014-08-22 18:09:20 -04:00
value = tsk_unreliable ( tsk ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_DEST_DROPPABLE :
2014-08-22 18:09:20 -04:00
value = tsk_unreturnable ( tsk ) ;
2006-01-02 19:04:38 +01:00
break ;
case TIPC_CONN_TIMEOUT :
2014-08-22 18:09:20 -04:00
value = tsk - > conn_timeout ;
2008-04-15 00:22:02 -07:00
/* no need to set "res", since already 0 at this point */
2006-01-02 19:04:38 +01:00
break ;
2010-12-31 18:59:32 +00:00
case TIPC_NODE_RECVQ_DEPTH :
tipc: eliminate aggregate sk_receive_queue limit
As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.
This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.
Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.
As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-27 06:15:27 -05:00
value = 0 ; /* was tipc_queue_size, now obsolete */
2009-06-30 03:25:39 +00:00
break ;
2010-12-31 18:59:32 +00:00
case TIPC_SOCK_RECVQ_DEPTH :
2009-06-30 03:25:39 +00:00
value = skb_queue_len ( & sk - > sk_receive_queue ) ;
break ;
2006-01-02 19:04:38 +01:00
default :
res = - EINVAL ;
}
2008-04-15 00:22:02 -07:00
release_sock ( sk ) ;
2010-12-31 18:59:31 +00:00
if ( res )
return res ; /* "get" failed */
2006-01-02 19:04:38 +01:00
2010-12-31 18:59:31 +00:00
if ( len < sizeof ( value ) )
return - EINVAL ;
if ( copy_to_user ( ov , & value , sizeof ( value ) ) )
return - EFAULT ;
return put_user ( sizeof ( value ) , ol ) ;
2006-01-02 19:04:38 +01:00
}
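The tail of tipc_getsockopt() follows a common pattern: check that the caller's buffer is large enough, copy out a fixed-size value, then report the number of bytes written. A minimal user-space model of that pattern is sketched below. The function demo_copy_option is hypothetical, plain memcpy stands in for copy_to_user()/put_user(), and, unlike the kernel code, the sketch rejects negative lengths explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a 32-bit option value into the caller's buffer.
 * *ol holds the buffer length on entry and the number of bytes
 * written on success. Nothing is modified on failure.
 */
int demo_copy_option(uint32_t value, void *ov, int *ol)
{
	/* Assumption of this sketch: negative lengths are invalid. */
	if (*ol < 0 || (size_t)*ol < sizeof(value))
		return -EINVAL;

	memcpy(ov, &value, sizeof(value));
	*ol = (int)sizeof(value);
	return 0;
}
```

A caller that passes an 8-byte buffer gets the value back with the length trimmed to 4; a caller that passes a 2-byte buffer gets -EINVAL and an untouched buffer.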
2015-01-09 15:27:05 +08:00
static int tipc_ioctl ( struct socket * sock , unsigned int cmd , unsigned long arg )
2014-04-24 16:26:47 +02:00
{
2015-01-09 15:27:05 +08:00
struct sock * sk = sock - > sk ;
2014-04-24 16:26:47 +02:00
struct tipc_sioc_ln_req lnr ;
void __user * argp = ( void __user * ) arg ;
switch ( cmd ) {
case SIOCGETLINKNAME :
if ( copy_from_user ( & lnr , argp , sizeof ( lnr ) ) )
return - EFAULT ;
2015-01-09 15:27:05 +08:00
if ( ! tipc_node_get_linkname ( sock_net ( sk ) ,
lnr . bearer_id & 0xffff , lnr . peer ,
2014-04-24 16:26:47 +02:00
lnr . linkname , TIPC_MAX_LINK_NAME ) ) {
if ( copy_to_user ( argp , & lnr , sizeof ( lnr ) ) )
return - EFAULT ;
return 0 ;
}
return - EADDRNOTAVAIL ;
default :
return - ENOIOCTLCMD ;
}
}
2012-07-10 10:55:35 +00:00
/* Protocol switches for the various types of TIPC sockets */
2008-02-07 18:18:01 -08:00
static const struct proto_ops msg_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2011-07-06 06:01:13 -04:00
. accept = sock_no_accept ,
2014-02-18 16:06:46 +08:00
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2011-07-06 06:01:13 -04:00
. listen = sock_no_listen ,
2014-02-18 16:06:46 +08:00
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_sendmsg ,
. recvmsg = tipc_recvmsg ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct proto_ops packet_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2014-02-18 16:06:46 +08:00
. accept = tipc_accept ,
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2014-02-18 16:06:46 +08:00
. listen = tipc_listen ,
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_send_packet ,
. recvmsg = tipc_recvmsg ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct proto_ops stream_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
2014-02-18 16:06:46 +08:00
. release = tipc_release ,
. bind = tipc_bind ,
. connect = tipc_connect ,
2007-06-10 17:24:55 -07:00
. socketpair = sock_no_socketpair ,
2014-02-18 16:06:46 +08:00
. accept = tipc_accept ,
. getname = tipc_getname ,
. poll = tipc_poll ,
2014-04-24 16:26:47 +02:00
. ioctl = tipc_ioctl ,
2014-02-18 16:06:46 +08:00
. listen = tipc_listen ,
. shutdown = tipc_shutdown ,
. setsockopt = tipc_setsockopt ,
. getsockopt = tipc_getsockopt ,
. sendmsg = tipc_send_stream ,
. recvmsg = tipc_recv_stream ,
2007-07-19 10:44:56 +09:00
. mmap = sock_no_mmap ,
. sendpage = sock_no_sendpage
2006-01-02 19:04:38 +01:00
} ;
2008-02-07 18:18:01 -08:00
static const struct net_proto_family tipc_family_ops = {
2010-12-31 18:59:32 +00:00
. owner = THIS_MODULE ,
2006-01-02 19:04:38 +01:00
. family = AF_TIPC ,
tipc: introduce new TIPC server infrastructure
TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.
As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policy is required
in order to protect TIPC internal resources.
To eliminate the need for this complex lock policy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the topology and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.
The new service also solves another problem:
As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.
Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.
As both sending and receiving of messages now complete in process
context, subscribed events cannot be dropped any more.
As of this commit, the new server infrastructure is built but not
yet called by the existing TIPC code. Since the conversion changes
required to use it are significant, the addition is kept here as a
separate commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:39 -04:00
. create = tipc_sk_create
2006-01-02 19:04:38 +01:00
} ;
static struct proto tipc_proto = {
. name = " TIPC " ,
. owner = THIS_MODULE ,
2013-06-17 10:54:37 -04:00
. obj_size = sizeof ( struct tipc_sock ) ,
. sysctl_rmem = sysctl_tipc_rmem
2006-01-02 19:04:38 +01:00
} ;
2013-06-17 10:54:39 -04:00
static struct proto tipc_proto_kern = {
	.name		= "TIPC",
	.obj_size	= sizeof(struct tipc_sock),
	.sysctl_rmem	= sysctl_tipc_rmem
};

/**
 * tipc_socket_init - initialize TIPC socket interface
 *
 * Returns 0 on success, errno otherwise
 */
int tipc_socket_init(void)
{
	int res;

	res = proto_register(&tipc_proto, 1);
	if (res) {
		pr_err("Failed to register TIPC protocol type\n");
		goto out;
	}

	res = sock_register(&tipc_family_ops);
	if (res) {
		pr_err("Failed to register TIPC socket type\n");
		proto_unregister(&tipc_proto);
		goto out;
	}
out:
	return res;
}

/**
 * tipc_socket_stop - stop TIPC socket interface
 */
void tipc_socket_stop(void)
{
	sock_unregister(tipc_family_ops.family);
	proto_unregister(&tipc_proto);
}

/* Caller should hold socket lock for the passed tipc socket. */
static int __tipc_nl_add_sk_con(struct sk_buff *skb, struct tipc_sock *tsk)
{
	u32 peer_node;
	u32 peer_port;
	struct nlattr *nest;

	peer_node = tsk_peer_node(tsk);
	peer_port = tsk_peer_port(tsk);

	nest = nla_nest_start(skb, TIPC_NLA_SOCK_CON);
	/* nla_nest_start() returns NULL when the skb has no room left */
	if (!nest)
		return -EMSGSIZE;

	if (nla_put_u32(skb, TIPC_NLA_CON_NODE, peer_node))
		goto msg_full;
	if (nla_put_u32(skb, TIPC_NLA_CON_SOCK, peer_port))
		goto msg_full;

	if (tsk->conn_type != 0) {
		if (nla_put_flag(skb, TIPC_NLA_CON_FLAG))
			goto msg_full;
		if (nla_put_u32(skb, TIPC_NLA_CON_TYPE, tsk->conn_type))
			goto msg_full;
		if (nla_put_u32(skb, TIPC_NLA_CON_INST, tsk->conn_instance))
			goto msg_full;
	}
	nla_nest_end(skb, nest);

	return 0;

msg_full:
	nla_nest_cancel(skb, nest);

	return -EMSGSIZE;
}

/* Caller should hold socket lock for the passed tipc socket. */
static int __tipc_nl_add_sk(struct sk_buff *skb, struct netlink_callback *cb,
			    struct tipc_sock *tsk)
{
	int err;
	void *hdr;
	struct nlattr *attrs;
	struct net *net = sock_net(skb->sk);
	struct tipc_net *tn = net_generic(net, tipc_net_id);

	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
			  &tipc_genl_family, NLM_F_MULTI, TIPC_NL_SOCK_GET);
	if (!hdr)
		goto msg_cancel;

	attrs = nla_nest_start(skb, TIPC_NLA_SOCK);
	if (!attrs)
		goto genlmsg_cancel;
tipc: convert tipc reference table to use generic rhashtable
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-07 13:41:58 +08:00
	if (nla_put_u32(skb, TIPC_NLA_SOCK_REF, tsk->portid))
		goto attr_msg_cancel;
	if (nla_put_u32(skb, TIPC_NLA_SOCK_ADDR, tn->own_addr))
		goto attr_msg_cancel;

	if (tsk->connected) {
		err = __tipc_nl_add_sk_con(skb, tsk);
		if (err)
			goto attr_msg_cancel;
	} else if (!list_empty(&tsk->publications)) {
		if (nla_put_flag(skb, TIPC_NLA_SOCK_HAS_PUBL))
			goto attr_msg_cancel;
	}
	nla_nest_end(skb, attrs);
	genlmsg_end(skb, hdr);

	return 0;

attr_msg_cancel:
	nla_nest_cancel(skb, attrs);
genlmsg_cancel:
	genlmsg_cancel(skb, hdr);
msg_cancel:
	return -EMSGSIZE;
}

int tipc_nl_sk_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int err;
	struct tipc_sock *tsk;
	const struct bucket_table *tbl;
	struct rhash_head *pos;
	struct net *net = sock_net(skb->sk);
	struct tipc_net *tn = net_generic(net, tipc_net_id);
	u32 tbl_id = cb->args[0];
	u32 prev_portid = cb->args[1];

	rcu_read_lock();
	tbl = rht_dereference_rcu((&tn->sk_rht)->tbl, &tn->sk_rht);
	for (; tbl_id < tbl->size; tbl_id++) {
		rht_for_each_entry_rcu(tsk, pos, tbl, tbl_id, node) {
			spin_lock_bh(&tsk->sk.sk_lock.slock);
			if (prev_portid && prev_portid != tsk->portid) {
				spin_unlock_bh(&tsk->sk.sk_lock.slock);
				continue;
			}

			err = __tipc_nl_add_sk(skb, cb, tsk);
			if (err) {
				prev_portid = tsk->portid;
				spin_unlock_bh(&tsk->sk.sk_lock.slock);
				goto out;
			}
			prev_portid = 0;
			spin_unlock_bh(&tsk->sk.sk_lock.slock);
		}
	}
out:
	rcu_read_unlock();
	cb->args[0] = tbl_id;
	cb->args[1] = prev_portid;

	return skb->len;
}

/* Caller should hold socket lock for the passed tipc socket. */
static int __tipc_nl_add_sk_publ(struct sk_buff *skb,
				 struct netlink_callback *cb,
				 struct publication *publ)
{
	void *hdr;
	struct nlattr *attrs;

	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
			  &tipc_genl_family, NLM_F_MULTI, TIPC_NL_PUBL_GET);
	if (!hdr)
		goto msg_cancel;

	attrs = nla_nest_start(skb, TIPC_NLA_PUBL);
	if (!attrs)
		goto genlmsg_cancel;

	if (nla_put_u32(skb, TIPC_NLA_PUBL_KEY, publ->key))
		goto attr_msg_cancel;
	if (nla_put_u32(skb, TIPC_NLA_PUBL_TYPE, publ->type))
		goto attr_msg_cancel;
	if (nla_put_u32(skb, TIPC_NLA_PUBL_LOWER, publ->lower))
		goto attr_msg_cancel;
	if (nla_put_u32(skb, TIPC_NLA_PUBL_UPPER, publ->upper))
		goto attr_msg_cancel;

	nla_nest_end(skb, attrs);
	genlmsg_end(skb, hdr);

	return 0;

attr_msg_cancel:
	nla_nest_cancel(skb, attrs);
genlmsg_cancel:
	genlmsg_cancel(skb, hdr);
msg_cancel:
	return -EMSGSIZE;
}

/* Caller should hold socket lock for the passed tipc socket. */
static int __tipc_nl_list_sk_publ(struct sk_buff *skb,
				  struct netlink_callback *cb,
				  struct tipc_sock *tsk, u32 *last_publ)
{
	int err;
	struct publication *p;

	if (*last_publ) {
		list_for_each_entry(p, &tsk->publications, pport_list) {
			if (p->key == *last_publ)
				break;
		}
		if (p->key != *last_publ) {
			/* We never set seq or call nl_dump_check_consistent(),
			 * so setting prev_seq here makes the consistency check
			 * in the netlink callback handler fail. As a result,
			 * the final NLMSG_DONE message has the
			 * NLM_F_DUMP_INTR flag set.
			 */
			cb->prev_seq = 1;
			*last_publ = 0;
			return -EPIPE;
		}
	} else {
		p = list_first_entry(&tsk->publications, struct publication,
				     pport_list);
	}

	list_for_each_entry_from(p, &tsk->publications, pport_list) {
		err = __tipc_nl_add_sk_publ(skb, cb, p);
		if (err) {
			*last_publ = p->key;
			return err;
		}
	}
	*last_publ = 0;

	return 0;
}

int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int err;
	u32 tsk_portid = cb->args[0];
	u32 last_publ = cb->args[1];
	u32 done = cb->args[2];
	struct net *net = sock_net(skb->sk);
	struct tipc_sock *tsk;

	if (!tsk_portid) {
		struct nlattr **attrs;
		struct nlattr *sock[TIPC_NLA_SOCK_MAX + 1];

		err = tipc_nlmsg_parse(cb->nlh, &attrs);
		if (err)
			return err;

		/* Reject requests that carry no TIPC_NLA_SOCK attribute */
		if (!attrs[TIPC_NLA_SOCK])
			return -EINVAL;

		err = nla_parse_nested(sock, TIPC_NLA_SOCK_MAX,
				       attrs[TIPC_NLA_SOCK],
				       tipc_nl_sock_policy);
		if (err)
			return err;

		if (!sock[TIPC_NLA_SOCK_REF])
			return -EINVAL;

		tsk_portid = nla_get_u32(sock[TIPC_NLA_SOCK_REF]);
	}

	if (done)
		return 0;

	tsk = tipc_sk_lookup(net, tsk_portid);
	if (!tsk)
		return -EINVAL;

	lock_sock(&tsk->sk);
	err = __tipc_nl_list_sk_publ(skb, cb, tsk, &last_publ);
	if (!err)
		done = 1;
	release_sock(&tsk->sk);
	sock_put(&tsk->sk);

	cb->args[0] = tsk_portid;
	cb->args[1] = last_publ;
	cb->args[2] = done;

	return skb->len;
}