IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
net: Fix routing tables with id > 255 for legacy software
sky2: Hold RTNL while calling dev_close()
s2io iomem annotations
atl1: fix suspend regression
qeth: start dev queue after tx drop error
qeth: Prepare-function to call s390dbf was wrong
qeth: reduce number of kernel messages
qeth: Use ccw_device_get_id().
qeth: layer 3 Oops in ip event handler
virtio: use callback on empty in virtio_net
virtio: virtio_net free transmit skbs in a timer
virtio: Fix typo in virtio_net_hdr comments
virtio_net: Fix skb->csum_start computation
ehea: set mac address fix
sfc: Recover from RX queue flush failure
add missing lance_* exports
ixgbe: fix typo
forcedeth: msi interrupts
ipsec: pfkey should ignore events when no listeners
pppoe: Unshare skb before anything else
...
Step 8.5 in RFC 4340 says for the newly cloned socket
Initialize S.GAR := S.ISS,
but what in fact the code (minisocks.c) does is
Initialize S.GAR := S.ISR,
which is wrong (typo?) -- fixed by the patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in computing the inter-packet-interval t_ipi = s/X:
scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
than 2^26 Bps (~537 Mbps).
Using full 64-bit division now.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).
The smallest tabulated value of
10^6 * f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)
for p=0.0001 is 8172.
Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.
This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.
Also corrected the printk format (%u for u32).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes an oversight from an earlier patch, ensuring that Ack Vectors
are not processed on request sockets.
The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:
"Using CCID-specific options and feature options during a negotiation
for the corresponding CCID feature is NOT RECOMMENDED [...]"
And it is not even possible: when the server receives the Request from the
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way hanshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch fixes the following sparse warnings:
* nested min(max()) expression:
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
* Declaration of function prototypes in .c instead of .h file, resulting in
"should it be static?" warnings.
* Declared "struct dccpw" static (local to dccp_probe).
* Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
("Receivers SHOULD implement delayed acknowledgement timers ...").
* Used a different local variable name to avoid
net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
net/dccp/ackvec.c:238:33: originally declared here
* Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
In commit $(825de27d9e) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.
Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.
All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.
The fix is to use the fallback-RTT from RFC 4340, 3.4.
This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Most legacy software do not like tables > 255 as rtm_table is u8
so tb_id is sent &0xff and it is possible to mismatch for example
table 510 with table 254 (main).
This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. It makes such old applications happy, new
ones are still able to use RTA_TABLE to get a proper table id.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When pfkey has no km listeners, it still does a lot of work
before finding out there aint nobody out there.
If a tree falls in a forest and no one is around to hear it, does it make
a sound? In this case it makes a lot of noise:
With this short-circuit adding 10s of thousands of SAs using
netlink improves performance by ~10%.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bindv6only is tuned via sysctl. It is already on a struct net
and per-net sysctls allow for its modification (ipv6_sysctl_net_init).
Despite this the value configured in the init net is used for the
rest of them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a check to the set_channel flow. When attempting to change
the channel while in IBSS mode, and the new channel does not support IBSS
mode, the flow return with an error value with no consequences on the
mac80211 and driver state.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sufficient scans (at least 2 or 3) should have been done within 7
seconds to find an existing IBSS to join. This should improve IBSS
creation latency; and since IBSS merging is still in effect, shouldn't
have detrimental effects on eventual IBSS convergence.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes the issue of slow reconnection to an IBSS cell after
disconnection from it. Now the interface's bssid is reset upon ifdown.
ieee80211_sta_find_ibss:
if (found && memcmp(ifsta->bssid, bssid, ETH_ALEN) != 0 &&
(bss = ieee80211_rx_bss_get(dev, bssid,
local->hw.conf.channel->center_freq,
ifsta->ssid, ifsta->ssid_len)))
Note:
In general disconnection is still not handled properly in mac80211
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise userspace has no idea the IBSS creation succeeded.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- Don't trust a length which is greater than the working buffer.
An invalid length could cause overflow when calculating buffer size
for decoding oid.
- An oid length of zero is invalid and allows for an off-by-one error when
decoding oid because the first subid actually encodes first 2 subids.
- A primitive encoding may not have an indefinite length.
Thanks to Wei Wang from McAfee for report.
Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
tcp: Increment OUTRSTS in tcp_send_active_reset()
raw: Raw socket leak.
lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
ipw2200: expire and use oldest BSS on adhoc create
airo warning fix
b43legacy: Fix controller restart crash
sctp: Fix ECN markings for IPv6
sctp: Flush the queue only once during fast retransmit.
sctp: Start T3-RTX timer when fast retransmitting lowest TSN
sctp: Correctly implement Fast Recovery cwnd manipulations.
sctp: Move sctp_v4_dst_saddr out of loop
sctp: retran_path update bug fix
tcp: fix skb vs fack_count out-of-sync condition
sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
xfrm: xfrm_algo: correct usage of RIPEMD-160
...
skb_splice_bits temporary drops the socket lock while iterating over
the socket queue in order to break a reverse locking condition which
happens with sendfile. This, however, opens a window of opportunity
for tcp_collapse() to aggregate skbs and thus potentially free the
current skb used in skb_splice_bits and tcp_read_sock.
This patch fixes the problem by (re-)getting the same "logical skb"
after the lock has been temporary dropped.
Based on idea and initial patch from Evgeniy Polyakov.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP "resets sent" counter is not incremented when a TCP Reset is
sent via tcp_send_active_reset().
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program below just leaks the raw kernel socket
int main() {
int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
inet_aton("127.0.0.1", &addr.sin_addr);
addr.sin_family = AF_INET;
addr.sin_port = htons(2048);
sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr));
return 0;
}
Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
1) DSACK in the middle of previous SACK block must be generated.
2) In order to take that particular branch, part or all of the
DSACKed segment must already be SACKed so that we have that
in cache in the first place.
3) The new info must be top enough so that fackets_out will be
updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.
It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.
Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).
This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.
This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).
Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is not allowed to change underlying protocol for
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified. If the ipi6_ifindex is the not-exist interface,
it should be fail.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);
This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
helper functions: addrconf_timeout_fixup() and
addrconf_finite_timeout().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).
Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.
[Use unsigned int instead to generate better code - yoshfuji]
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.
Bug report and initial patch by Denis V. Lunev <den@openvz.org>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.
I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).
This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ 63.531438] =================================
[ 63.531520] [ INFO: inconsistent lock state ]
[ 63.531520] 2.6.26-rc4 #7
[ 63.531520] ---------------------------------
[ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[ 63.531520] {softirq-on-W} state was registered at:
[ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080
[ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0
[ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40
[ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...
According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xt_connlimit match module, the counter of an IP is decreased when
the TCP packet is go through the chain with ip_conntrack state TW.
Well, it's very natural that the server and client close the socket
with FIN packet. But when the client/server close the socket with RST
packet(using so_linger), the counter for this connection still exsit.
The following patch can fix it which is based on linux-2.6.25.4
Signed-off-by: Dong Wei <dwei.zh@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field was supposed to allow the creation of an anycast route by
assigning an anycast address to an address prefix. It was never
implemented so this field is unused and serves no purpose. Remove it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.
Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an unused policy entry for an attribute which is
only used in kernel->user direction.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also removes an obsolete check for the unused flag RTCF_MASQ.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to compute copy twice in the frags loop in
dma_skb_copy_datagram_iovec().
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbor table time of last use information is returned in the
incorrect unit. Kernel to user space ABI's need to use USER_HZ (or
milliseconds), otherwise the application has to try and discover the
real system HZ value which is problematic. Linux has standardized on
keeping USER_HZ consistent (100hz) even when kernel is running
internally at some other value.
This change is small, but it breaks the ABI for older version of
iproute2 utilities. But these utilities are already broken since they
are looking at the psched_hz values which are completely different. So
let's just go ahead and fix both kernel and user space. Older
utilities will just print wrong values.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bad type/protocol specified result in sk leak.
Fix is simple - release the sk if bad values are given,
but to make it possible just to call sk_free(), I move
some sk initialization a bit lower.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Jarek Poplawski <jarkao2@gmail.com>
There is only one function in AX25 calling skb_append(), and it really
looks suspicious: appends skb after previously enqueued one, but in
the meantime this previous skb could be removed from the queue.
This patch Fixes it the simple way, so this is not fully compatible with
the current method, but testing hasn't shown any problems.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's logic in __rfcomm_dlc_close:
rfcomm_dlc_lock(d);
d->state = BT_CLOSED;
d->state_changed(d, err);
rfcomm_dlc_unlock(d);
In rfcomm_dev_state_change, it's possible that rfcomm_dev_put try to
take the dlc lock, then we will deadlock.
Here fixed it by unlock dlc before rfcomm_dev_get in
rfcomm_dev_state_change.
why not unlock just before rfcomm_dev_put? it's because there's
another problem. rfcomm_dev_get/rfcomm_dev_del will take
rfcomm_dev_lock, but in rfcomm_dev_add the lock order is :
rfcomm_dev_lock --> dlc lock
so I unlock dlc before the taken of rfcomm_dev_lock.
Actually it's a regression caused by commit
1905f6c736 ("bluetooth :
__rfcomm_dlc_close lock fix"), the dlc state_change could be two
callbacks : rfcomm_sk_state_change and rfcomm_dev_state_change. I
missed the rfcomm_sk_state_change that time.
Thanks Arjan van de Ven <arjan@linux.intel.com> for the effort in
commit 4c8411f8c1 ("bluetooth: fix
locking bug in the rfcomm socket cleanup handling") but he missed the
rfcomm_dev_state_change lock issue.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
llc_sap_rcv was being preceded by skb_set_owner_r, then calling
llc_state_process that calls sock_queue_rcv_skb, that in turn calls
skb_set_owner_r again making the space allowed to be used by the socket to be
leaked, making the socket to get stuck.
Fix it by setting skb->sk at llc_sap_rcv and leave the accounting to be done
only at sock_queue_rcv_skb.
Reported-by: Dmitry Petukhov <dmgenp@gmail.com>
Tested-by: Dmitry Petukhov <dmgenp@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
in net/bluetooth/rfcomm/sock.c, rfcomm_sk_state_change() does the
following operation:
if (parent && sock_flag(sk, SOCK_ZAPPED)) {
/* We have to drop DLC lock here, otherwise
* rfcomm_sock_destruct() will dead lock. */
rfcomm_dlc_unlock(d);
rfcomm_sock_kill(sk);
rfcomm_dlc_lock(d);
}
}
which is fine, since rfcomm_sock_kill() will call sk_free() which will call
rfcomm_sock_destruct() which takes the rfcomm_dlc_lock()... so far so good.
HOWEVER, this assumes that the rfcomm_sk_state_change() function always gets
called with the rfcomm_dlc_lock() taken. This is the case for all but one
case, and in that case where we don't have the lock, we do a double unlock
followed by an attempt to take the lock, which due to underflow isn't
going anywhere fast.
This patch fixes this by moving the stragling case inside the lock, like
the other usages of the same call are doing in this code.
This was found with the help of the www.kerneloops.org project, where this
deadlock was observed 51 times at this point in time:
http://www.kerneloops.org/search.php?search=rfcomm_sock_destruct
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addresses an alignment issue with compare_ether_addr().
The addresses passed to compare_ether_addr should be two bytes aligned.
It may function properly in x86 platform. However may not work properly
on IA-64 or ARM processor.
This also fixes a typo in mlme.c where the sk_buff struct name is incorect.
Though sizeof() works for any incorrect structure pointer name as its just
a pointer length that we want, lets just fix it.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This addresses a NULL pointer dereference in sta_info_get().
TID and sta_info are extracted in ADDBA Timer expiry function
through the timer handler's argument.
The problem is extracging the TID (which was stored in
timer_to_tid[] array of type "u8") through "int *" typecast which
may also yield unwanted bytes for the MSB of TID that results
in incorrect sta_info and ieee80211_local pointers.
ieee80211_local pointer is NULL as illustrated below, it crashes in
sta_info_get(). The problem started when extracting ieee80211_local
pointer out of sta_info iteself and eventually crashed in
stat_info_get().
The proper way to fix is to change the data type of TID to u8
instead of u16. However changing all the occurences requires
some prototype changes as well. We should fix this in upcoming
patches.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: Luis Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
fix a typo in ieee80211_handle_filtered_frame comment
Signed-off-by: Yi Zhu <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
iwconfig was showing incorrect status messages when disassociated.
Patch fixes this by always checking for association status in
ioctl calls for getting ap address.
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch switch order of channel and freq (SIOCGIWFREQ) reports
in scan results in order to overcome wpa_supplicant inability
to handle channel numbers in 5.2Ghz band.
Wext reporting channel number is ambiguous as channels 7-12 (802.11j)
exist on both bands.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes iee80211_rx_bss_put/get imbalance
introduced by 'mac80211: enable IBSS merging' patch.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The identification of this bug is thanks to Cheng Wei and Tomasz
Grobelny.
To avoid divide-by-zero, the implementation previously ignored RTTs
smaller than 4 microseconds when performing integer division RTT/4.
When the RTT reached a value less than 4 microseconds (as observed on
loopback), this prevented the Window Counter CCVal value from
advancing. As a result, the receiver stopped sending feedback. This in
turn caused non-ending expiries of the nofeedback timer at the sender,
so that the sending rate was progressively reduced until reaching the
minimum of one packet per 64 seconds.
The patch fixes this bug by handling integer division more
intelligently. Due to consistent use of dccp_sample_rtt(),
divide-by-zero-RTT is avoided.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4340 said:
8.5. Pseudocode
...
If P.type is not Data, Ack, or DataAck and P.X == 0 (the packet
has short sequence numbers), drop packet and return
But DCCP has some mistake to handle short sequence numbers packet, now
it drop packet only if P.type is Data, Ack, or DataAck and P.X == 0.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (52 commits)
vlan: Use bitmask of feature flags instead of seperate feature bits
fmvj18x_cs: add NextCom NC5310 rev B support
xirc2ps_cs: re-initialize the multicast address in do_reset
3C509: rx_bytes should not be increased when alloc_skb failed
NETFRONT: Use __skb_queue_purge()
VIRTIO: Use __skb_queue_purge()
phylib: do EXPORT_SYMBOL on get_phy_id
netlink: Fix nla_parse_nested_compat() to call nla_parse() directly
WAN: protect HDLC proto list while insmod/rmmod
drivers/net/fs_enet: remove null pointer dereference
S2io: Version update for napi and MSI-X patches
S2io: Added napi support when MSIX is enabled.
S2io: Move all the transmit completions to a single msi-x (alarm) vector
drivers/net/ehea - remove unnecessary memset after kzalloc
au1000_eth: remove useless check
Blackfin EMAC Driver: Removed duplicated include <linux/ethtool.h>
cpmac bugfixes and enhancements
e1000e: use resource_size_t, not unsigned long, for phys addrs
net/usb: add support for Apple USB Ethernet Adapter
uli526x: add support for netpoll
...
As git-grep shows, open_softirq() is always called with the last argument
being NULL
block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
This observation has already been made by Matthew Wilcox in June 2002
(http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
"I notice that none of the current softirq routines use the data element
passed to them."
and the situation hasn't changed since them. So it appears we can safely
remove that extra argument to save 128 (54) bytes of kernel data (text).
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Herbert Xu points out that the use of seperate feature bits for features
to be propagated to VLAN devices is going to get messy real soon.
Replace the VLAN feature bits by a bitmask of feature flags to be
propagated and restore the old GSO_SHIFT/MASK values.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: The world is not perfect patch.
tcp: Make prior_ssthresh a u32
xfrm_user: Remove zero length key checks.
net/ipv4/arp.c: Use common hex_asc helpers
cassini: Only use chip checksum for ipv4 packets.
tcp: TCP connection times out if ICMP frag needed is delayed
netfilter: Move linux/types.h inclusions outside of #ifdef __KERNEL__
af_key: Fix selector family initialization.
libertas: Fix ethtool statistics
mac80211: fix NULL pointer dereference in ieee80211_compatible_rates
mac80211: don't claim iwspy support
orinoco_cs: add ID for SpeedStream wireless adapters
hostap_cs: add ID for Conceptronic CON11CPro
rtl8187: resource leak in error case
ath5k: Fix loop variable initializations
Unless there will be any objection here, I suggest consider the
following patch which simply removes the code for the
-DI_WISH_WORLD_WERE_PERFECT in the three methods which use it.
The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT
show that this code was not built and not used for really a long time.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The crypto layer will determine whether that is valid
or not.
Suggested by Herbert Xu, based upon a report and patch
by Martin Willi.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Here the local hexbuf is a duplicate of global const char hex_asc from
lib/hexdump.c, except the hex letters' cases:
const char hexbuf[] = "0123456789ABCDEF";
const char hex_asc[] = "0123456789abcdef";
and here to print HW addresses, the hex cases are not significant.
Thanks to Harvey Harrison to introduce the hex_asc_hi/hex_asc_lo helpers.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are seeing an issue with TCP in handling an ICMP frag needed
message that is received after net.ipv4.tcp_retries1 retransmits.
The default value of retries1 is 3. So if the path mtu changes
and ICMP frag needed is lost for the first 3 retransmits or if
it gets delayed until 3 retransmits are done, TCP doesn't update
MSS correctly and continues to retransmit the orginal message
until it timesout after tcp_retries2 retransmits.
I am seeing this issue even with the latest 2.6.25.4 kernel.
In tcp_retransmit_timer(), when retransmits counter exceeds
tcp_retries1 value, the dst cache entry of the socket is reset.
At this time, if we receive an ICMP frag needed message, the
dst entry gets updated with the new MTU, but the TCP sockets
dst_cache entry remains NULL.
So the next time when we try to retransmit after the ICMP frag
needed is received, tcp_retransmit_skb() gets called. Here the
cur_mss value is calculated at the start of the routine with
a NULL sk_dst_cache. Instead we should call tcp_current_mss after
the rebuild_header that caches the dst entry with the updated mtu.
Also the rebuild_header should be called before tcp_fragment
so that skb is fragmented if the mss goes down.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This propagates the xfrm_user fix made in commit
bcf0dda8d2 ("[XFRM]: xfrm_user: fix
selector family initialization")
Based upon a bug report from, and tested by, Alan Swanson.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a possible NULL pointer dereference in ieee80211_compatible_rates
introduced in the patch "mac80211: fix association with some APs". If no bss
is available just use all supported rates in the association request.
Signed-off-by: Helmut Schaa <hschaa@suse.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-2.6.26' of git://linux-nfs.org/~bfields/linux: (25 commits)
svcrdma: Verify read-list fits within RPCSVC_MAXPAGES
svcrdma: Change svc_rdma_send_error return type to void
svcrdma: Copy transport address and arm CQ before calling rdma_accept
svcrdma: Set rqstp transport address in rdma_read_complete function
svcrdma: Use ib verbs version of dma_unmap
svcrdma: Cleanup queued, but unprocessed I/O in svc_rdma_free
svcrdma: Move the QP and cm_id destruction to svc_rdma_free
svcrdma: Add reference for each SQ/RQ WR
svcrdma: Move destroy to kernel thread
svcrdma: Shrink scope of spinlock on RQ CQ
svcrdma: Use standard Linux lists for context cache
svcrdma: Simplify RDMA_READ deferral buffer management
svcrdma: Remove unused READ_DONE context flags bit
svcrdma: Return error from rdma_read_xdr so caller knows to free context
svcrdma: Fix error handling during listening endpoint creation
svcrdma: Free context on post_recv error in send_reply
svcrdma: Free context on ib_post_recv error
svcrdma: Add put of connection ESTABLISHED reference in rdma_cma_handler
svcrdma: Fix return value in svc_rdma_send
svcrdma: Fix race with dto_tasklet in svc_rdma_send
...
The following courruption can happen during pktgen stop:
list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
kernel BUG at lib/list_debug.c:67!
:pktgen:pktgen_thread_worker+0x374/0x10b0
? autoremove_wake_function+0x0/0x40
? _spin_unlock_irqrestore+0x42/0x80
? :pktgen:pktgen_thread_worker+0x0/0x10b0
kthread+0x4d/0x80
child_rip+0xa/0x12
? restore_args+0x0/0x30
? kthread+0x0/0x80
? child_rip+0x0/0x12
RIP list_del+0x48/0x70
The problem is that pktgen_thread_worker can not be executed if kthread_stop
has been called too early. Insert a completion on the normal initialization
path to make sure that pktgen_thread_worker will gain the control for sure.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We removed iwspy support a very long time ago because it is useless, but
forgot to stop claiming to support it. Apparently, nobody cares, but
remove it nonetheless.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Propagate feature bits from the NETDEV_FEAT_CHANGE notifier. For now
only TSO is propagated for devices that announce their ability to
support TSO in combination with VLAN accel by setting the NETIF_F_VLAN_TSO
flag.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 30688a9 ([VLAN]: Handle vlan devices net namespace changing)
changed the device notifier to special-case notifications for VLAN
devices, effectively disabling state propagation to underlying VLAN
devices. This is needed for layered VLANs though, so restore the
original behaviour.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Am I just being particularly dim today, or can the call to
dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never
happen?
We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the
condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be
true.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_api should return ENOENT when the requested classifier doesn't
exist.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because the IPsec output function xfrm_output_resume does its
own dst_output call it should always call __ip_local_output
instead of ip_local_output as the latter may invoke dst_output
directly. Otherwise the return values from nf_hook and dst_output
may clash as they both use the value 1 but for different purposes.
When that clash occurs this can cause a packet to be used after
it has been freed which usually leads to a crash. Because the
offending value is only returned from dst_output with qdiscs
such as HTB, this bug is normally not visible.
Thanks to Marco Berizzi for his perseverance in tracking this
down.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to handle infinite prefix lifetime specially.
With help from original reporter "Bonitch, Joseph"
<Joseph.Bonitch@xerox.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We could not see appropriate lifetime if the route had been scheduled
to expired at 0 (in jiffies). We should check rt6i_flags instead of
rt6i_expires to determine whether lifetime is valid or not.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of arithmetic overflow avoidance, the actual lifetime setting
(vs the value given by RA) did not increase monotonically around
0x7fffffff/HZ.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed from Al Viro <viro@ftp.linux.org.uk> via David Miller
<davem@davemloft.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A RDMA read-list cannot contain more elements than RPCSVC_MAXPAGES or
it will overflow the DTO context. Verify this when processing the
protocol header.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The svc_rdma_send_error function is called when an RPCRDMA protocol
error is detected. This function attempts to post an error reply message.
Since an error posting to a transport in error is ignored, change
the return type to void.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
This race was found by inspection. Messages can be received from the peer
immediately following the rdma_accept call, however, the CQ have not yet
been armed and the transport address has not yet been set.
Set the transport address in the connect request handler and arm the CQ
prior to calling rdma_accept.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The rdma_read_complete function needs to copy the rqstp transport address
from the transport. Failure to do so can result in using the wrong
authentication method for the RPC or bug checking if the rqstp address
is not valid.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Use the ib_verbs version of the dma_unmap service in the
svc_rdma_put_context function. This should support providers
using software rdma.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
When the transport is closing, the DTO tasklet may queue data
that never gets processed. Clean up resources associated with
this I/O.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Move the destruction of the QP and CM_ID to the free path so that the
QP cleanup code doesn't race with the dto_tasklet handling flushed WR.
The QP reference is not needed because we now have a reference for
every WR.
Also add a guard in the SQ and RQ completion handlers to ignore
calls generated by some providers when the QP is destroyed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Some providers may wait while destroying adapter resources.
Since it is possible that the last reference is put on the
dto_tasklet, the actual destroy must be scheduled as a work item.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The rq_cq_reap function is only called from the dto_tasklet. The
only resource shared with other threads is the sc_rq_dto_q. Move the
spin lock to protect only this list.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Replace the one-off linked list implementation used to implement the
context cache with the standard Linux list_head lists. Add a context
counter to catch resource leaks. A WARN_ON will be added later to
ensure that we've freed all contexts.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write
data from the client. There are two principal pieces of data that
need to be tracked: the list of pages that comprise the completed RPC
and the SGE of dma mapped pages to refer to this list of pages. Previously
this whole bit was managed as a linked list of contexts with the
context containing the page list buried in this list. This patch
simplifies this processing by not keeping a linked list, but rather only
a pionter from the last submitted RDMA_READ's context to the context
that maps the set of pages that describe the RPC. This significantly
simplifies this code path. SGE contexts are cleaned up inline in the DTO
path instead of at read completion time.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The rdma_read_xdr function did not discriminate between no read-list and
an error posting the read-list. This results in a leak of a page if there
is an error posting the read-list.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
A listening endpoint isn't known to the generic transport switch until
the svc_create_xprt function returns without error. Calling
svc_xprt_put within the xpo_create function causes the module reference
count to be erroneously decremented.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
If an error is encountered trying to post a recv buffer in send_reply,
free the passed in context. Return an error to the caller so it is
aware that the request was not posted.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
If there is an error posting the recv WR to the RQ, free the
context associated with the WR. This would leak a context when
asynchronous errors occurred on the transport while conccurent threads
were processing their RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The svcrdma transport takes a reference when it gets the ESTABLISHED
event from the provider. This reference is supposed to be removed when
the DISCONNECT event is received, however, the call to svc_xprt_put
was missing in the switch statement. This results in the memory
associated with the transport never being freed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Fix the return value on close to -ENOTCONN so caller knows to free context.
Also if a thread is waiting for free SQ space, check for close when waking
to avoid posting WR to a closing transport.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The svc_rdma_send function will attempt to reap SQ WR to make room for
a new request if it finds the SQ full. This function races with the
dto_tasklet that also reaps SQ WR. To avoid polling and arming the CQ
unnecessarily move the test_and_clear_bit of the RDMAXPRT_SQ_PENDING
flag and arming of the CQ to the sq_cq_reap function.
Refactor the rq_cq_reap function to match sq_cq_reap so that the
code is easier to follow.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
The svcrdma transport provider currently allocates receive buffers
to the RQ through the xpo_release_rqst method. This approach is overly
complicated since it means that the rqstp rq_xprt_ctxt has to be
selectively set based on whether the RPC is going to be processed
immediately or deferred. Instead, just post the receive buffer when
we are certain that we are replying in the send_reply function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Remove a redundant check for the XPT_DEAD bit in the svc_xprt_enqueue
function. This same bit is checked below while holding the pool lock
and prints a debug message if found to be dead.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.
For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.
Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit f15364bd4c ("IPv6 support for NFS
server export caches") dropped a couple spaces, rendering the output
here difficult to read.
(However note that we expect the output to be parsed only by humans, not
machines, so this shouldn't have broken any userland software.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Apparently this causes Solaris 10 servers to refuse our NFSv4 SETCLIENTID
calls. Fall back to root creds for now, since most servers that care are
very likely to have root squashing enabled.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since commit e38bad4766
mac80211: make ieee80211_iterate_active_interfaces not need rtnl
rt2500usb and rt73usb broke down due to attempting register access
in atomic context (which is not possible for USB hardware).
This patch restores ieee80211_iterate_active_interfaces() to use RTNL lock,
and provides the non-RTNL version under a new name:
ieee80211_iterate_active_interfaces_atomic()
So far only rt2x00 uses ieee80211_iterate_active_interfaces(), and those
drivers require the RTNL version of ieee80211_iterate_active_interfaces().
Since they already call that function directly, this patch will automatically
fix the USB rt2x00 drivers.
v2: Rename ieee80211_iterate_active_interfaces_rtnl
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes the association problem with 11n hidden ssid ap.
Patch fixes the problem of associating with hidden ssid when
all three parameters ap,essid and channel are given to iwconfig.
This patch removes the condition of checking three parameters
and always checks for bss in bss list while associating.
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
device_rename can fail with -EEXIST or -ENOMEM, so handle any
problems.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix error path during early mount
9p: make cryptic unknown error from server less scary
9p: fix flags length in net
9p: Correct fidpool creation failure in p9_client_create
9p: use struct mutex instead of struct semaphore
9p: propagate parse_option changes to client and transports
fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
9p: Documentation updates
add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
net/irda/irnet/irnet_irda.c: In function 'irnet_discovery_indication':
net/irda/irnet/irnet_irda.c:1676: error: implicit declaration of function 'get_unaligned'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was some cleanup issues during early mount which would trigger
a kernel bug for certain types of failure. This patch reorganizes the
cleanup to get rid of the bad behavior.
This also merges the 9pnet and 9pnet_fd modules for the purpose of
configuration and initialization. Keeping the fd transport separate
from the core 9pnet code seemed like a good idea at the time, but in
practice has caused more harm and confusion than good.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Right now when we get an error string from the server that we can't
map we report a cryptic error that actually makes it look like we are
reporting a problem with the client. This changes the text of the log
message to clarify where the error is coming from.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Some files in the net/9p directory uses "int" for flags. This can
cause hard to find bugs on some architectures. This patch converts the
flags to use "long" instead.
This bug was discovered by doing an allyesconfig make on the -rt kernel
where checks are done to ensure all flags are of size sizeof(long).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
On error, p9_idpool_create returns an ERR_PTR-encoded errno.
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Replace semaphores protecting use flags with a mutex.
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Propagate changes that were made to the parse_options code to the
other parse options pieces present in the other modules. Looks like
the client parse options was probably corrupting the parse string
and causing problems for others.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The kernel-doc comments of much of the 9p system have been in disarray since
reorganization. This patch fixes those problems, adds additional documentation
and a template book which collects the 9p information.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (73 commits)
net: Fix typo in net/core/sock.c.
ppp: Do not free not yet unregistered net device.
netfilter: xt_iprange: module aliases for xt_iprange
netfilter: ctnetlink: dump conntrack ID in event messages
irda: Fix a misalign access issue. (v2)
sctp: Fix use of uninitialized pointer
cipso: Relax too much careful cipso hash function.
tcp FRTO: work-around inorder receivers
tcp FRTO: Fix fallback to conventional recovery
New maintainer for Intel ethernet adapters
DM9000: Use delayed work to update MII PHY state
DM9000: Update and fix driver debugging messages
DM9000: Add __devinit and __devexit attributes to probe and remove
sky2: fix simple define thinko
[netdrvr] sfc: sfc: Add self-test support
[netdrvr] sfc: Increment rx_reset when reported as driver event
[netdrvr] sfc: Remove unused macro EFX_XAUI_RETRAIN_MAX
[netdrvr] sfc: Fix code formatting
[netdrvr] sfc: Remove kernel-doc comments for removed members of struct efx_nic
[netdrvr] sfc: Remove garbage from comment
...
In sock_queue_rcv_skb() (net/core/sock.c) it should be:
"Cast sk->rcvbuf ..." instead of: "Cast skb->rcvbuf ..."
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using iptables 1.3.8 with kernel 2.6.25, rules which include '-m
iprange' don't automatically pull in xt_iprange module. Below patch
adds module aliases to fix that. Patch against latest -git, but seems
like a good candidate for -stable also.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conntrack ID is not put (anymore ?) in event messages. This causes
current ulogd2 code to fail because it uses the ID to build a hash in
userspace. This hash is used to be able to output the starting time of
a connection.
Conntrack ID can be used in userspace application to maintain an easy
match between kernel connections list and userspace one. It may worth
to add it if there is no performance related issue.
[ Patrick: it was never included in events, but really should be ]
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace u16ho with put/get_unaligned functions
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced by c4492586 (sctp: Add address type check while process
paramaters of ASCONF chunk):
net/sctp/sm_make_chunk.c: In function 'sctp_process_asconf':
net/sctp/sm_make_chunk.c:2828: warning: 'addr_param' may be used uninitialized in this function
net/sctp/sm_make_chunk.c:2828: note: 'addr_param' was declared here
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cipso_v4_cache is allocated to contain CIPSO_V4_CACHE_BUCKETS
buckets. The CIPSO_V4_CACHE_BUCKETS = 1 << CIPSO_V4_CACHE_BUCKETBITS,
where CIPSO_V4_CACHE_BUCKETBITS = 7.
The bucket-selection function for this hash is calculated like this:
bkt = hash & (CIPSO_V4_CACHE_BUCKETBITS - 1);
^^^
i.e. picking only 4 buckets of possible 128 :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If receiver consumes segments successfully only in-order, FRTO
fallback to conventional recovery produces RTO loop because
FRTO's forward transmissions will always get dropped and need to
be resent, yet by default they're not marked as lost (which are
the only segments we will retransmit in CA_Loss).
Price to pay about this is occassionally unnecessarily
retransmitting the forward transmission(s). SACK blocks help
a bit to avoid this, so it's mainly a concern for NewReno case
though SACK is not fully immune either.
This change has a side-effect of fixing SACKFRTO problem where
it didn't have snd_nxt of the RTO time available anymore when
fallback become necessary (this problem would have only occured
when RTO would occur for two or more segments and ECE arrives
in step 3; no need to figure out how to fix that unless the
TODO item of selective behavior is considered in future).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that commit 009a2e3e4e ("[TCP] FRTO: Improve
interoperability with other undo_marker users") run into
another land-mine which caused fallback to conventional
recovery to break:
1. Cumulative ACK arrives after FRTO retransmission
2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp
which should be kept like in CA_Loss state it would be
3. undo_marker change allowed tcp_packet_delayed to return
true because of the cleared retrans_stamp once FRTO is
terminated causing LossUndo to occur, which means all loss
markings FRTO made are reverted.
This means that the conventional recovery basically recovered
one loss per RTT, which is not that efficient. It was quite
unobvious that the undo_marker change broken something like
this, I had a quite long session to track it down because of
the non-intuitiviness of the bug (luckily I had a trivial
reproducer at hand and I was also able to learn to use kprobes
in the process as well :-)).
This together with the NewReno+FRTO fix and FRTO in-order
workaround this fixes Damon's problems, this and the first
mentioned are enough to fix Bugzilla #10063.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Sebastian Hyrwall <zibbe@cisko.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This assigns the netdev's needed_headroom/tailroom members to take
advantage of pre-allocated space for 802.11 headers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some APs refuse association if the supported rates contained in the
association request do not match its own supported rates. This patch
introduces a new function which builds the intersection between the AP's
supported rates and the client's supported rates to work around such
problems. The same approach is already used in ipw2200 for example.
Signed-off-by: Helmut Schaa <hschaa@suse.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>