Commit Graph

495147 Commits

Author SHA1 Message Date
Linus Lüssing
2c667a339c batman-adv: fix delayed foreign originator recognition
Currently it can happen that the reception of an OGM from a new
originator is not being accepted. More precisely it can happen that
an originator struct gets allocated and initialized
(batadv_orig_node_new()), even the TQ gets calculated and set correctly
(batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging
thread will decide to delete it if it has a chance to jump between
these two function calls.

This is because batadv_orig_node_new() initializes the last_seen value
to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to
other threads by adding it to the hash table already.
batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct,
current time a few lines later but if the purging thread jumps in between
that it will think that the orig_node timed out and will wrongly
schedule it for deletion already.

If the purging interval is the same as the originator interval (which is
the default: 1 second), then this game can continue for several rounds
until the random OGM jitter added enough difference between these
two (in tests, two to about four rounds seemed common).

Fixing this by initializing the last_seen variable of an orig_node
to the current time before adding it to the hash table.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:09 +01:00
Simon Wunderlich
329887ad13 batman-adv: fix and simplify condition when bonding should be used
The current condition actually does NOT consider bonding when the
interface the packet came in from is the soft interface, which is the
opposite of what it should do (and the comment describes). Fix that and
slightly simplify the condition.

Reported-by: Ray Gibson <booray@gmail.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06 11:05:07 +01:00
David S. Miller
a918eb9fac Merge branch 'rt_cong_ctrl'
Daniel Borkmann says:

====================
net: allow setting congctl via routing table

This is the second part of our work and allows for setting the congestion
control algorithm via routing table. For details, please see individual
patches.

Since patch 1 is a bug fix, we suggest applying patch 1 to net, and then
merging net into net-next, for example, and following up with the remaining
feature patches wrt dependencies.

Joint work with Florian Westphal, suggested by Hannes Frederic Sowa.

Patch for iproute2 is available under [1], but will be reposted with along
with the man-page update when this set hits net-next.

  [1] http://patchwork.ozlabs.org/patch/418149/

Thanks!

v2 -> v3:
 - Added module auto-loading as suggested by David Miller, thanks!
  - Added patch 2 for handling possible sleeps in fib6
  - While working on this, we discovered a bug, hence fix in patch 1
  - Added auto-loading to patch 4
 - Rebased, retested, rest the same.
v1 -> v2:
 - Very sorry, I noticed I had decnet disabled during testing.
   Added missing header include in decnet, rest as is.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:28 -05:00
Daniel Borkmann
81164413ad net: tcp: add per route congestion control
This work adds the possibility to define a per route/destination
congestion control algorithm. Generally, this opens up the possibility
for a machine with different links to enforce specific congestion
control algorithms with optimal strategies for each of them based
on their network characteristics, even transparently for a single
application listening on all links.

For our specific use case, this additionally facilitates deployment
of DCTCP, for example, applications can easily serve internal
traffic/dsts in DCTCP and external one with CUBIC. Other scenarios
would also allow for utilizing e.g. long living, low priority
background flows for certain destinations/routes while still being
able for normal traffic to utilize the default congestion control
algorithm. We also thought about a per netns setting (where different
defaults are possible), but given its actually a link specific
property, we argue that a per route/destination setting is the most
natural and flexible.

The administrator can utilize this through ip-route(8) by appending
"congctl [lock] <name>", where <name> denotes the name of a
congestion control algorithm and the optional lock parameter allows
to enforce the given algorithm so that applications in user space
would not be allowed to overwrite that algorithm for that destination.

The dst metric lookups are being done when a dst entry is already
available in order to avoid a costly lookup and still before the
algorithms are being initialized, thus overhead is very low when the
feature is not being used. While the client side would need to drop
the current reference on the module, on server side this can actually
even be avoided as we just got a flat-copied socket clone.

Joint work with Florian Westphal.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
ea69763999 net: tcp: add RTAX_CC_ALGO fib handling
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided to expose this implementation detail to user space, thus
instead, we chose the netlink attribute that is being exchanged between
user space to be the actual congestion control algorithm name, similarly
as in the setsockopt(2) API in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
c5c6a8ab45 net: tcp: add key management to congestion control
This patch adds necessary infrastructure to the congestion control
framework for later per route congestion control support.

For a per route congestion control possibility, our aim is to store
a unique u32 key identifier into dst metrics, which can then be
mapped into a tcp_congestion_ops struct. We argue that having a
RTAX key entry is the most simple, generic and easy way to manage,
and also keeps the memory footprint of dst entries lower on 64 bit
than with storing a pointer directly, for example. Having a unique
key id also allows for decoupling actual TCP congestion control
module management from the FIB layer, i.e. we don't have to care
about expensive module refcounting inside the FIB at this point.

We first thought of using an IDR store for the realization, which
takes over dynamic assignment of unused key space and also performs
the key to pointer mapping in RCU. While doing so, we stumbled upon
the issue that due to the nature of dynamic key distribution, it
just so happens, arguably in very rare occasions, that excessive
module loads and unloads can lead to a possible reuse of previously
used key space. Thus, previously stale keys in the dst metric are
now being reassigned to a different congestion control algorithm,
which might lead to unexpected behaviour. One way to resolve this
would have been to walk FIBs on the actually rare occasion of a
module unload and reset the metric keys for each FIB in each netns,
but that's just very costly.

Therefore, we argue a better solution is to reuse the unique
congestion control algorithm name member and map that into u32 key
space through jhash. For that, we split the flags attribute (as it
currently uses 2 bits only anyway) into two u32 attributes, flags
and key, so that we can keep the cacheline boundary of 2 cachelines
on x86_64 and cache the precalculated key at registration time for
the fast path. On average we might expect 2 - 4 modules being loaded
worst case perhaps 15, so a key collision possibility is extremely
low, and guaranteed collision-free on LE/BE for all in-tree modules.
Overall this results in much simpler code, and all without the
overhead of an IDR. Due to the deterministic nature, modules can
now be unloaded, the congestion control algorithm for a specific
but unloaded key will fall back to the default one, and on module
reload time it will switch back to the expected algorithm
transparently.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
29ba4fffd3 net: tcp: refactor reinitialization of congestion control
We can just move this to an extra function and make the code
a bit more readable, no functional change.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Florian Westphal
e715b6d3a5 net: fib6: convert cfg metric to u32 outside of table write lock
Do the nla validation earlier, outside the write lock.

This is needed by followup patch which needs to be able to call
request_module (which can sleep) if needed.

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Daniel Borkmann
0409c9a5a7 net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference
When IPv6 host routes with metrics attached are being added, we fetch
the metrics store from the dst via COW through dst_metrics_write_ptr(),
added through commit e5fd387ad5.

One remaining problem here is that we actually call into inet_getpeer()
and may end up allocating/creating a new peer from the kmemcache, which
may fail.

Example trace from perf probe (inet_getpeer:41) where create is 1:

ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
  85e294 inet_getpeer.part.7 (<- kmem_cache_alloc())
  85e578 inet_getpeer
  8eb333 ipv6_cow_metrics
  8f10ff fib6_commit_metrics

Therefore, a check for NULL on the return of dst_metrics_write_ptr()
is necessary here.

Joint work with Florian Westphal.

Fixes: e5fd387ad5 ("ipv6: do not overwrite inetpeer metrics prematurely")
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
Hubert Sokolowski
6cb69742da net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined
Add checking whether the call to ndo_dflt_fdb_dump is needed.
It is not expected to call ndo_dflt_fdb_dump unconditionally
by some drivers (i.e. qlcnic or macvlan) that defines
own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
and don't want ndo_dflt_fdb_dump to be called at all.
At the same time it is desirable to call the default dump
function on a bridge device.
Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
Add extra checking in br_fdb_dump to avoid duplicate entries
as now filter_dev can be NULL.

Following tests for filtering have been performed before
the change and after the patch was applied to make sure
they are the same and it doesn't break the filtering algorithm.

[root@localhost ~]# cd /root/iproute2-3.18.0/bridge
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
[root@localhost bridge]# brctl addbr br0
[root@localhost bridge]# brctl addif  br0 dummy0
[root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
[root@localhost bridge]# # show all
[root@localhost bridge]# ./bridge fdb show
33:33:00:00:00:01 dev p2p1 self permanent
01:00:5e:00:00:01 dev p2p1 self permanent
33:33:ff:ac:ce:32 dev p2p1 self permanent
33:33:00:00:02:02 dev p2p1 self permanent
01:00:5e:00:00:fb dev p2p1 self permanent
33:33:00:00:00:01 dev p7p1 self permanent
01:00:5e:00:00:01 dev p7p1 self permanent
33:33:ff:79:50:53 dev p7p1 self permanent
33:33:00:00:02:02 dev p7p1 self permanent
01:00:5e:00:00:fb dev p7p1 self permanent
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by bridge
[root@localhost bridge]# ./bridge fdb show br br0
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by port
[root@localhost bridge]# ./bridge fdb show brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]# # filter by port + bridge
[root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]#

Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:52:06 -05:00
David S. Miller
d4253c6258 Merge branch 'ip_cmsg_csum'
Tom Herbert says:

====================
ip: Support checksum returned in csmg

This patch set allows the packet checksum for a datagram socket
to be returned in csum data in recvmsg. This allows userspace
to implement its own checksum over the data, for instance if an
IP tunnel was be implemented in user space, the inner checksum
could be validated.

Changes in this patch set:
  - Move checksum conversion to inet_sock from udp_sock. This
    generalizes checksum conversion for use with other protocols.
  - Move IP cmsg constants to a header file and make processing
    of the flags more efficient in ip_cmsg_recv
  - Return checksum value in cmsg. This is specifically the unfolded
    32 bit checksum of the full packet starting from the first byte
    returned in recvmsg

Tested: Wrote a little server to get checksums in cmsg for UDP and
        verfied correct checksum is returned.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:56 -05:00
Tom Herbert
ad6f939ab1 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
5961de9f19 ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
c44d13d6f3 ip: IP cmsg cleanup
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
they can be referenced in other source files.

Restructure ip_cmsg_recv to not go through flags using shift, check
for flags by 'and'. This eliminates both the shift and a conditional
per flag check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Tom Herbert
224d019c4f ip: Move checksum convert defines to inet
Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:44:46 -05:00
Thomas Graf
149118d893 netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel()
Calling nla_nest_cancel() in a different order as the nesting was
built up can lead to negative offsets being calculated which
results in skb_trim() being called with an underflowed unsigned
int. Warn if mark < skb->data as it's definitely a bug.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:38:22 -05:00
Linus Torvalds
b1940cd21c Linux 3.19-rc3 2015-01-05 17:05:20 -08:00
Linus Torvalds
79b8cb9737 powerpc fixes for 3.19
Wire up sys_execveat(). Tested on 32 & 64 bit.
 
 Fix for kdump on LE systems with cpus hot unplugged.
 
 Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this broke other
 platforms, we'll do a proper fix for 3.20.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUqillAAoJEFHr6jzI4aWA+aYP/0zMNyRbQehtpF9OLKw2TGJW
 LeSdQMN+g6JWKzG2hsNIiKga5kquuCi0zgtu3dW/rxlzbXY0gxwAY7HFkjzgE7Em
 qlt/iairlg8ifE436DRCkH2SXTCDN2AnlRLO35QZ2zKoo8osJuA6YjDiQRN3Y//p
 ELvWraYoVdhbfGPg+gYJn94U/OAmFu0E0VrwEmpWie1YrfTNbbfTkS/BieqhqJTu
 UEX53tW6FXbmit3ReQGGsPzcoVWbXfIg09gRy14Bn8dtA1EipHKr1m5o1JCjGu9J
 moJVB8cMkj5o4OcQrPtJZI76FkzaVueNUVTUdFtIt1q6NCyvX8tZe16OKPYY0TEN
 j8snpJfsk+oZij5caYSKqTeMAPrh6JbIjbkl8cIwc7ClRkRf/l5sW0zaFPS+GDWl
 c9OFu1CdKO59g+apQ16BmMRw8b/pV22WkxSpqm9GD3I7Y7ABoUN+1l07d5H+aGRq
 khD7M2rUWh4CwSRvg+ealpyAP++4Sj6mn/z4pe/gem4Wn/ynxGi6CSsYYijB6PWP
 bm47t+QqyomRiH65Yo41vKGNYu/JfenvAWyJp8AnIajWqxwCRLVNSgySjtylUESs
 f02qdvLsBTd3/CorZxigWgThM1wWeKyCDO6NBbjVUC/L9O8Ck9U0WbJUVQFDgG0b
 +52Wv5FRb/xBF9XdpWsw
 =1yze
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

 - Wire up sys_execveat(). Tested on 32 & 64 bit.

 - Fix for kdump on LE systems with cpus hot unplugged.

 - Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
   broke other platforms, we'll do a proper fix for 3.20.

* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
  powerpc/kdump: Ignore failure in enabling big endian exception during crash
  powerpc: Wire up sys_execveat() syscall
2015-01-05 14:49:02 -08:00
Linus Torvalds
f40bde8588 Add execveat syscall
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUqwAbAAoJEKurIx+X31iBQCkP/3W3+bo2uabm8sjz6ivBi3N8
 FW+xiBeyKoCsJIbsdr+txpWoSeIJSQ2Wgd3QwB5L4ztrsykZMSXDg5NzMnnsWh/A
 30EfjraillriN9PHt+jcBBEr43VIMoxHA4E8HOuG08tiEaEQMXyGE0PirbNIB2lu
 Dh5dq5u+fPA9KiD6lHLLfYSAq2PAOl425tsRRRWqaivWjyu0N5pvZJ7zsNN1bFv1
 zVlA++IK7XyiM8QuGIMsQ3hLbVs3tdWVeqlS3CcqHRNi2rHQaRI5oX4Rdfj573Tn
 M7LfYINdhEzzggOpAK9o3olzF9+Z7E9Dj+hSO70GercgD0m3uaRPIgTig9SGP8nq
 IctyFjhC0MfTPMql83fhmcqEyXJelzKXpELT1FG3MpPUlSMXKyOHWxwfqyfUIj1z
 5JV8hurPME0p5FlfLWVaA0o3tncx26SYZ3y7Tdd5fHoOkaZj+Wqcdgs48h5o+1UL
 VjtCh9JUTx5jwn1+hviPxi4l0gfLwcBLVMqV1F3UCBLMeFD9lRuUs9HiX6tQKXAv
 E6RR5CdGZPgzqgcmoNQU73JjleZWlMqNDgn/8QpZ1NWEOq8wkPJqKkbPL1wfO7yC
 Fso4tsRwJn7ykpz4xbEoXYaoTycuIPgqoddxv/OqWw1bTAtlk/mo1sxalrg0XLrv
 mdp5VTjy4jf+/m6HcOid
 =0lKw
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull ia64 fixlet from Tony Luck:
 "Add execveat syscall"

* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Enable execveat syscall for ia64
2015-01-05 14:31:20 -08:00
David S. Miller
a515abd777 Merge branch 'cxgb4-next'
Hariprasad Shenai says:

====================
RDMA/cxgb4/cxgb4vf/csiostor: Cleanup register defines

This series continues to cleanup all the macros/register defines related to
SGE, PCIE, MC, MA, TCAM, MAC, etc that are defined in t4_regs.h and the
affected files.

Will post another 1 or 2 series so that we can cover all the macros so that
they all follow the same style to be consistent.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4, cxgb4vf, iw_cxgb4 and csiostor driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:53 -05:00
Hariprasad Shenai
0d8043389b cxgb4/cxgb4vf/csiostor: Cleanup PL, XGMAC, SF and MC related register defines
This patch cleanups all PL, XGMAC and SF related macros/register defines
that are defined in t4_regs.h and the affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:48 -05:00
Hariprasad Shenai
837e4a42bb cxgb4/csiostor: Cleanup TP, MPS and TCAM related register defines
This patch cleanups all TP, MPS and TCAM related macros/register defines
that are defined in t4_regs.h and the affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:48 -05:00
Hariprasad Shenai
89c3a86cc7 cxgb4/cxg4vf/csiostor: Cleanup MC, MA and CIM related register defines
This patch cleanups all MC, MA and CIM related macros/register defines that are
defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:47 -05:00
Hariprasad Shenai
f061de42e6 cxgb4/cxgb4vf/csiostor: Cleanup SGE and PCI related register defines
This patch cleansup remaining SGE related macros/register defines and all PCI
related ones that are defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:47 -05:00
Hariprasad Shenai
f612b815d7 RDMA/cxgb4/cxgb4vf/csiostor: Cleanup SGE register defines
This patch cleanups all SGE related macros/register defines that are
defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:47 -05:00
Sathya Perla
5f07b3c51a be2net: support TX batching using skb->xmit_more flag
This patch uses skb->xmit_more flag to batch TX requests.
TX is flushed either when xmit_more is false or there is
no more space in the TXQ.

Skyhawk-R and BEx chips require an even number of wrbs to be posted.
So, when a batch of TX requests is accumulated, the last header wrb
may need to be fixed with an extra dummy wrb.

This patch refactors be_xmit() routine as a sequence of be_xmit_enqueue()
and be_xmit_flush() calls. The Tx completion code is also
updated to be able to unmap/free a batch of skbs rather than a single
skb.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:32:53 -05:00
Krzysztof Kozlowski
889ee2c7d7 at86rf230: Constify struct regmap_config
The regmap_config struct may be const because it is not modified by the
driver and regmap_init() accepts pointer to const.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:30:50 -05:00
Tony Luck
b739896dd2 [IA64] Enable execveat syscall for ia64
See commit 51f39a1f0c
    syscalls: implement execveat() system call

Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-01-05 11:25:19 -08:00
Johannes Berg
1e359a5de8 Revert "mac80211: Fix accounting of the tailroom-needed counter"
This reverts commit ca34e3b5c8.

It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

Cc: stable@vger.kernel.org
Fixes: ca34e3b5c8 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez <chrischavez@gmx.us>
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Debugged-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-05 10:33:46 +01:00
Ying Xue
a3449ded12 list_nulls: fix missing header
Fixup below build error:

include/linux/list_nulls.h: In function ‘hlist_nulls_del’:
include/linux/list_nulls.h:84:13: error: ‘LIST_POISON2’ undeclared (first use in this function)

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 23:11:43 -05:00
Ying Xue
86b35b64ed rhashtable: fix missing header
Fixup below build error:

include/linux/rhashtable.h: At top level:
include/linux/rhashtable.h:118:34: error: field ‘mutex’ has incomplete type

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 23:11:43 -05:00
Joe Perches
3620af0efa qlcnic: Fix dump_skb output
Use normal facilities to avoid printing each byte
on a separate line.

Now emits at KERN_DEBUG instead of KERN_INFO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:24:30 -05:00
Govindarajulu Varadarajan
3f255dcc97 enic: reconfigure resources for kdump crash kernel
When running in kdump kernel, reduce number of resources used by the driver.
This will enable NIC to operate in low memory kdump kernel environment.

Also change the driver version to .83

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:23:43 -05:00
David S. Miller
9ace422320 Merge branch 'geneve-next'
Jesse Gross says:

====================
Geneve Cleanups

Much of the basis for the Geneve code comes from VXLAN. However,
Geneve is quite a bit simpler than VXLAN and so this cleans up a
lot of the infrastruction - particularly around locking - where the
extra complexity is not necessary.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:39 -05:00
Jesse Gross
46b1e4f911 geneve: Check family when reusing sockets.
When searching for an existing socket to reuse, the address family
is not taken into account - only port number. This means that an
IPv4 socket could be used for IPv6 traffic and vice versa, which
is sure to cause problems when passing packets.

It is not possible to trigger this problem currently because the
only user of Geneve creates just IPv4 sockets. However, that is
likely to change in the near future.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross
df5dba8e52 geneve: Remove socket hash table.
The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross
829a3ada9c geneve: Simplify locking.
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.

In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Jesse Gross
61f3cade76 geneve: Remove workqueue.
The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the the deletion happens asynchronously.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:21:33 -05:00
Felipe Balbi
7ce67a38f7 net: ethernet: cpsw: fix hangs with interrupts
The CPSW IP implements pulse-signaled interrupts. Due to
that we must write a correct, pre-defined value to the
CPDMA_MACEOIVECTOR register so the controller generates
a pulse on the correct IRQ line to signal the End Of
Interrupt.

The way the driver is written today, all four IRQ lines
are requested using the same IRQ handler and, because of
that, we could fall into situations where a TX IRQ fires
but we tell the controller that we ended an RX IRQ (or
vice-versa). This situation triggers an IRQ storm on the
reserved IRQ 127 of INTC which will in turn call ack_bad_irq()
which will, then, print a ton of:

	unexpected IRQ trap at vector 00

In order to fix the problem, we are moving all calls to
cpdma_ctlr_eoi() inside the IRQ handler and making sure
we *always* write the correct value to the CPDMA_MACEOIVECTOR
register. Note that the algorithm assumes that IRQ numbers and
value-to-be-written-to-EOI are proportional, meaning that a
write of value 0 would trigger an EOI pulse for the RX_THRESHOLD
Interrupt and that's the IRQ number sitting in the 0-th index
of our irqs_table array.

This, however, is safe at least for current implementations of
CPSW so we will refrain from making the check smarter (and, as
a side-effect, slower) until we actually have a platform where
IRQ lines are swapped.

This patch has been tested for several days with AM335x- and
AM437x-based platforms. AM57x was left out because there are
still pending patches to enable ethernet in mainline for that
platform. A read of the TRM confirms the statement on previous
paragraph.

Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Fixes: 510a1e7 (drivers: net: davinci_cpdma: acknowledge interrupt properly)
Cc: <stable@vger.kernel.org> # v3.9+
Signed-off-by: Felipe Balbi <balbi@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04 22:18:34 -05:00
Linus Torvalds
693a30b8f1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
 "Two fixes for UML regressions. Nothing exciting"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
2015-01-04 11:46:43 -08:00
Pavel Machek
4bf9636c39 Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo"
Commit 9fc2105aea ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message

  FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example

  https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
  https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading original changelog, I have to say "Stop breaking working setups.
You know who you are!".

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-04 11:39:59 -08:00
Daniel Borkmann
b485342bd7 x86, um: actually mark system call tables readonly
Commit a074335a37 ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size

Fixes: a074335a37 ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-01-04 14:21:25 +01:00
Richard Weinberger
f911d73105 um: Skip futex_atomic_cmpxchg_inatomic() test
futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>
2015-01-04 14:20:26 +01:00
David S. Miller
7beceebf5b Merge branch 'rhashtable-next'
Thomas Graf says:

====================
rhashtable: Per bucket locks & deferred table resizing

Prepares for and introduces per bucket spinlocks and deferred table
resizing. This allows for parallel table mutations in different hash
buckets from atomic context. The resizing occurs in the background
in a separate worker thread while lookups, inserts, and removals can
continue.

Also modified the chain linked list to be terminated with a special
nulls marker to allow entries to move between multiple lists.

Last but not least, reintroduces lockless netlink_lookup() with
deferred Netlink socket destruction to avoid the side effect of
increased netlink_release() runtime.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:33:03 -05:00
Thomas Graf
21e4902aea netlink: Lockless lookup with RCU grace period in socket release
Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup()

This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf
f89bd6f87a rhashtable: Supports for nulls marker
In order to allow for wider usage of rhashtable, use a special nulls
marker to terminate each chain. The reason for not using the existing
nulls_list is that the prev pointer usage would not be valid as entries
can be linked in two different buckets at the same time.

The 4 nulls base bits can be set through the rhashtable_params structure
like this:

struct rhashtable_params params = {
        [...]
        .nulls_base = (1U << RHT_BASE_SHIFT),
};

This reduces the hash length from 32 bits to 27 bits.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf
97defe1ecf rhashtable: Per bucket locks & deferred expansion/shrinking
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts.  Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf
113948d841 spinlock: Add spin_lock_bh_nested()
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf
897362e446 nft_hash: Remove rhashtable_remove_pprev()
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00
Thomas Graf
b8e1943e9f rhashtable: Factor out bucket_tail() function
Subsequent patches will require access to the bucket tail. Access
to the tail is relatively cheap as the automatic resizing of the
table should keep the number of entries per bucket to no more
than 0.75 on average.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03 14:32:57 -05:00