33993 Commits

Author SHA1 Message Date
David S. Miller
eae3f88ee4 net: Separate out SKB validation logic from transmit path.
dev_hard_start_xmit() does two things, it first validates and
canonicalizes the SKB, then it actually sends it.

Make a set of helper functions for doing the first part.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
95f6b3dda2 net: Have xmit_list() signal more==true when appropriate.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
fa2dbdc253 net: Pass a "more" indication down into netdev_start_xmit() code paths.
For now it will always be false.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
7f2e870f2a net: Move main gso loop out of dev_hard_start_xmit() into helper.
There is a slight policy change happening here as well.

The previous code would drop the entire rest of the GSO skb if any of
them got, for example, a congestion notification.

That makes no sense, anything NET_XMIT_MASK and below is something
like congestion or policing.  And in the congestion case it doesn't
even mean the packet was actually dropped.

Just continue until dev_xmit_complete() evaluates to false.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
2ea2551375 net: Create xmit_one() helper for dev_hard_start_xmit()
Hopefully making the code a bit easier to read and digest.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
10b3ad8c21 net: Do txq_trans_update() in netdev_start_xmit()
That way we don't have to audit every call site to make sure it is
doing this properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
Tom Herbert
202863fe4c sctp: Change sctp to implement csum_levels
CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to
appropriate account for this by decrementing csum_level. This is
done by calling __skb_dec_checksum_unnecessary.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:41:11 -07:00
Tom Herbert
662880f442 net: Allow GRO to use and set levels of checksum unnecessary
Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
and to report new checksums verfied for use in fallback to normal
path.

Change GRO checksum path to track csum_level using a csum_cnt field
in NAPI_GRO_CB. On GRO initialization, if ip_summed is
CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
skb->csum_level + 1. For each checksum verified, decrement
NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum
is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
deeper checksum than originally indicated in skbuf so increment
csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
CHECKSUM_NONE or CHECKSUM_COMPLETE).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:41:11 -07:00
Tom Herbert
77cffe23c1 net: Clarification of CHECKSUM_UNNECESSARY
This patch:
 - Clarifies the specific requirements of devices returning
   CHECKSUM_UNNECESSARY (comments in skbuff.h).
 - Adds csum_level field to skbuff. This is used to express how
   many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
   This replaces the overloading of skb->encapsulation, that field is
   is now only used to indicate inner headers are valid.
 - Change __skb_checksum_validate_needed to "consume" each checksum
   as indicated by csum_level as layers of the the packet are parsed.
 - Remove skb_pop_rcv_encapsulation, no longer needed in the new
   csum_level model.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:41:11 -07:00
Ying Xue
cc086fcf92 tipc: fix a potential oops
Commit 6c9808ce09f7 ("tipc: remove port_lock") accidentally involves
a potential bug: when tipc socket instance(tsk) is not got with given
reference number in tipc_sk_get(), tsk is set to NULL. Subsequently
we jump to exit label where to decrease socket reference counter
pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL,
oops may happen because of touching a NULL pointer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:22:43 -07:00
Daniel Borkmann
10c51b5623 net: add skb_get_tx_queue() helper
Replace occurences of skb_get_queue_mapping() and follow-up
netdev_get_tx_queue() with an actual helper function.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:02:07 -07:00
Florian Fainelli
246d7f773c net: dsa: add Broadcom SF2 switch driver
Add support for the Broadcom Starfigther 2 switch chip using a DSA
driver. This switch driver supports the following features:

- configuration of the external switch port interface: MII, RevMII,
  RGMII and RGMII_NO_ID are supported
- support for the per-port MIB counters
- support for link interrupts for special ports (e.g: MoCA)
- powering up/down of switch memories to conserve power when ports are
  unused

Finally, update the compatible property for the DSA core code to match
our switch top-level compatible node.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
5037d532b8 net: dsa: add Broadcom tag RX/TX handler
Add support for the 4-bytes Broadcom tag that built-in switches such as
the Starfighter 2 might insert when receiving packets, or that we need
to insert while targetting specific switch ports. We use a fake local
EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure
we can assign DSA-specific network operations within the DSA drivers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
ce31b31c68 net: dsa: allow updating fixed PHY link information
Allow switch drivers to hook a PHY link update callback to perform
port-specific link work.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
ec9436baed net: dsa: allow drivers to do link adjustment
Whenever libphy determines that the link status of a given PHY/port has
changed, allow to call into the switch driver link adjustment callback
so proper actions can be taken care of by the switch driver upon link
notification.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
5aed85cec2 net: dsa: allow switches to work without tagging
In case switch port tagging is disabled (voluntarily, or the switch just
does not support it), allow us to continue using the defined set of
dsa_device_ops in net/dsa/slave.c.

We introduce dsa_protocol_is_tagged() to check whether we need to
override skb->protocol and go through the DSA-specifif packet_type
function, or if we just go on and receive the SKB through the normal
path.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
0d8bcdd383 net: dsa: allow for more complex PHY setups
Modify the DSA slave interface to be bound to an arbitray PHY, not just
the ones that are available as child PHY devices of the switch MDIO bus.

This allows us for instance to have external PHYs connected to a
separate MDIO bus, but yet also connected to a given switch port.

Under certain configurations, the physical port mask might not be a 1:1
mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the
switch is used and connects to a PHY at a MDIO address different than 1.

Introduce a phys_mii_mask variable which allows driver to implement and
divert their own MDIO read/writes operations for a subset of the MDIO
PHY addresses.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
bd47497a01 net: dsa: retain a per-port device_node pointer
We will later use the per-port device_node pointer to fetch a bunch of
port-specific properties.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:40 -07:00
Florian Fainelli
fa981d9af8 net: dsa: provide a switch device device tree node pointer
We might need to fetch additional resources from the device tree node
pointer, such as register ranges or other properties. Keep a device_node
pointer around for this.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:39 -07:00
Florian Fainelli
3e8a72d1da net: dsa: reduce number of protocol hooks
DSA is currently registering one packet_type function per EtherType it
needs to intercept in the receive path of a DSA-enabled Ethernet device.
Right now we have three of them: trailer, DSA and eDSA, and there might
be more in the future, this will not scale to the addition of new
protocols.

This patch proceeds with adding a new layer of abstraction and two new
functions:

dsa_switch_rcv() which will dispatch into the tag-protocol specific
receive function implemented by net/dsa/tag_*.c

dsa_slave_xmit() which will dispatch into the tag-protocol specific
transmit function implemented by net/dsa/tag_*.c

When we do create the per-port slave network devices, we iterate over
the switch protocol to assign the DSA-specific receive and transmit
operations.

A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
like it used to be.

This allows us to greatly simplify the check in eth_type_trans() and
always override the skb->protocol with ETH_P_XDSA for Ethernet switches
tagged protocol, while also reducing the number repetitive slave
netdevice_ops assignments.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 22:59:39 -07:00
Florian Westphal
253ff51635 tcp: syncookies: mark cookie_secret read_mostly
only written once.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-27 16:30:49 -07:00
WANG Cong
453a940ea7 net: make skb an optional parameter for__skb_flow_dissect()
Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 17:21:26 -07:00
WANG Cong
6451b3f59a net: fix comments for __skb_flow_get_ports()
Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 17:21:26 -07:00
David S. Miller
4798248e4e net: Add ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 23:02:45 -07:00
Ian Morris
4c83acbc56 ipv6: White-space cleansing : gaps between function and symbol export
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Ian Morris
cc24becae3 ipv6: White-space cleansing : Structure layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch addresses structure definitions, specifically it cleanses the brace
placement and replaces spaces with tabs in a few places.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Ian Morris
67ba4152e8 ipv6: White-space cleansing : Line Layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Tom Herbert
48a5fc7731 gre: When GRE csum is present count as encap layer wrt csum
In GRE demux if the GRE checksum pop rcv encapsulation so that any
encapsulated checksums are treated as tunnel checksums.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert
57c67ff4bd udp: additional GRO support
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert
149d0774a7 tcp: Call skb_gro_checksum_validate
In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP
checksum in the gro context.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:24 -07:00
Tom Herbert
758f75d1ff gre: call skb_gro_checksum_simple_validate
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:23 -07:00
Tom Herbert
573e8fca25 net: skb_gro_checksum_* functions
Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 18:09:23 -07:00
Daniel Borkmann
8fc54f6891 net: use reciprocal_scale() helper
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 12:21:21 -07:00
David S. Miller
690e36e726 net: Allow raw buffers to be passed into the flow dissector.
Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.

For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s).  The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).

However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently.  The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.

Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.

So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 12:13:41 -07:00
Jon Paul Maloy
301bae56f2 tipc: merge struct tipc_port into struct tipc_sock
We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.

We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.

Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy
808d90f9c5 tipc: remove files ref.h and ref.c
The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy
2e84c60b77 tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy
0fc87aaebd tipc: remove source file port.c
In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.

There are only cosmetic changes to the moved functions.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:35 -07:00
Jon Paul Maloy
6c9808ce09 tipc: remove port_lock
In previous commits we have reduced usage of port_lock to a minimum,
and complemented it with usage of bh_lock_sock() at the remaining
locations. The purpose has been to remove this lock altogether, since
it largely duplicates the role of bh_lock_sock. We are now ready to do
this.

However, we still need to protect the BH callers from inadvertent
release of the socket while they hold a reference to it. We do this by
replacing port_lock by a combination of a rw-lock protecting the
reference table as such, and updating the socket reference counter while
the socket is referenced from BH. This technique is more standard and
comprehensible than the previous approach, and turns out to have a
positive effect on overall performance.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
9b50fd087a tipc: replace port pointer with socket pointer in registry
In order to make tipc_sock the only entity referencable from other
parts of the stack, we add a tipc_sock pointer instead of a tipc_port
pointer to the registry. As a consequence, we also let the function
tipc_port_lock() return a pointer to a tipc_sock instead  of a tipc_port.
We keep the function's name for now, since the lock still is owned by
the port.

This is another step in the direction of eliminating port_lock, replacing
its usage with lock_sock() and bh_lock_sock().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
5a9ee0be33 tipc: use registry when scanning sockets
The functions tipc_port_get_ports() and tipc_port_reinit() scan over
all sockets/ports to access each of them. This is done by using a
dedicated linked list, 'tipc_socks' where all sockets are members. The
list is in turn protected by a spinlock, 'port_list_lock', while each
socket is locked by using port_lock at the moment of access.

In order to reduce complexity and risk of deadlock, we want to get
rid of the linked list and the accompanying spinlock.

This is what we do in this commit. Instead of the linked list, we use
the port registry to scan across the sockets. We also add usage of
bh_lock_sock() inside the scope of port_lock in both functions, as a
preparation for the complete removal of port_lock.

Finally, we move the functions from port.c to socket.c, and rename them
to tipc_sk_sock_show() and tipc_sk_reinit() repectively.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
5b8fa7ce82 tipc: eliminate functions tipc_port_init and tipc_port_destroy
After the latest changes to the socket/port layer the existence of
the functions tipc_port_init() and tipc_port_destroy() cannot be
justified. They are both called only once, from tipc_sk_create() and
tipc_sk_delete() respectively, and their functionality can better be
merged into the latter two functions.

This also entails that all remaining references to port_lock now are
made from inside socket.c, something that will make it easier to remove
this lock.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
739f5e4efc tipc: redefine message acknowledge function
The function tipc_acknowledge() is a remnant from the obsolete native
API. Currently, it grabs port_lock, before building an acknowledge
message and sending it to the peer.

Since all access to socket members now is protected by the socket lock,
it has become unnecessary to grab port_lock here.

In this commit, we remove the usage of port_lock, simplify the
function, and move it to socket.c, renaming it to tipc_sk_send_ack().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
dadebc0029 tipc: eliminate port_connect()/port_disconnect() functions
tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
native API. Their only task is to grab port_lock and call the functions
__tipc_port_connect()/__tipc_port_disconnect() respectively, which will
perform the actual state change.

Since socket/port exection now is single-threaded the use of port_lock
is not needed any more, so we can safely replace the two functions with
their lock-free counterparts.

In this commit, we remove the two functions. Furthermore, the contents
of __tipc_port_disconnect() is so trivial that we choose to eliminate
that function too, expanding its functionality into tipc_shutdown().
__tipc_port_connect() is simplified, moved to socket.c, and given the
more correct name tipc_sk_finish_conn(). Finally, we eliminate the
function auto_connect(), and expand its contents into filter_connect().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
80e44c2225 tipc: eliminate function tipc_port_shutdown()
tipc_port_shutdown() is a remnant from the now obsolete native
interface. As such it grabs port_lock in order to protect itself
from concurrent BH processing.

However, after the recent changes to the port/socket upcalls, sockets
are now basically single-threaded, and all execution, except the read-only
tipc_sk_timer(), is executing within the protection of lock_sock(). So
the use of port_lock is not needed here.

In this commit we eliminate the whole function, and merge it into its
only caller, tipc_shutdown().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:34 -07:00
Jon Paul Maloy
5728901581 tipc: clean up socket timer function
The last remaining BH upcall to the socket, apart for the message
reception function tipc_sk_rcv(), is the timer function.

We prefer to let this function continue executing in BH, since it only
does read-acces to semi-permanent data, but we make three changes to it:

1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
   of port_lock.  This is a preparation for replacing port_lock with
   bh_lock_sock() at the locations where it is still used.

2) We move the function from port.c to socket.c, as a further step
   of eliminating the port code level altogether.

3) We let it make use of the newly introduced tipc_msg_create()
   function. This enables us to get rid of three context specific
   functions (port_create_self_abort_msg() etc.) in port.c

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy
02be61a981 tipc: use message to abort connections when losing contact to node
In the current implementation, each 'struct tipc_node' instance keeps
a linked list of those ports/sockets that are connected to the node
represented by that struct. The purpose of this is to let the node
object know which sockets to alert when it loses contact with its peer
node, i.e., which sockets need to have their connections aborted.

This entails an unwanted direct reference from the node structure
back to the port/socket structure, and a need to grab port_lock
when we have to make an upcall to the port. We want to get rid of
this unecessary BH entry point into the socket, and also eliminate
its use of port_lock.

In this commit, we instead let the node struct keep list of "connected
socket" structs, which each represents a connected socket, but is
allocated independently by the node at the moment of connection. If
the node loses contact with its peer node, the list is traversed, and
a "connection abort" message is created for each entry in the list. The
message is sent to it respective connected socket using the ordinary
data path, and the receiving socket aborts its connections upon reception
of the message.

This enables us to get rid of the direct reference from 'struct node' to
´struct port', and another unwanted BH access point to the latter.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy
50100a5e39 tipc: use pseudo message to wake up sockets after link congestion
The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
Jon Paul Maloy
1dd0bd2b14 tipc: introduce new function tipc_msg_create()
The function tipc_msg_init() has turned out to be of limited value
in many cases. It take too few parameters to be usable for creating
a complete message, it makes too many assumptions about what the
message should be used for, and it does not allocate any buffer to
be returned to the caller.

Therefore, we now introduce the new function tipc_msg_create(), which
takes all the parameters needed to create a full message, and returns
a buffer of the requested size. The new function will be very useful
for the changes we will be doing in later commits in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:18:33 -07:00
David S. Miller
f9474ddfaa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pulling to get some TIPC fixes that a net-next series depends
upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23 11:12:08 -07:00