IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Conflicts:
drivers/net/Makefile
net/ipv6/sysctl_net_ipv6.c
Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.
In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling a Virtual Interface can result in an interrupt during the processing
of the VI Enable command and, in some paths, result in an attempt to issue
another command in the interrupt context, eventually crashing the system. Thus,
we disable interrupts during the course of the VI Enable command and ensure
enable doesn't sleep.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
USB network drivers are already handled in drivers/net/usb/Kconfig.
Let's save the maintenance burden of dependencies in drivers/net/Makefile.
The newly introduced USB_NET_DRIVERS umbrella config option defaults
to 'y' so as to minimize the changes of behavior.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3_tso_bug() was originally designed to handle only HW TX ring 0, Commit
d3f6f3a1d8 ("tg3: Prevent page allocation failure
during TSO workaround") changed the driver logic to use tg3_tso_bug() for all
HW TX rings that are enabled. This patch fixes the regression by modifying
tg3_tso_bug() to handle multiple HW TX rings.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver update 2014-08-05
The following series of patches includes fixes/updates to the driver.
- Use dma_set_mask_and_coherent to set the DMA mask
- Move the phy connect/disconnect logic to allow for module unloading
Changes in V2:
- Check the return value of the dma_set_mask_and_coherent call
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A change added to the mdiobus/phy api added a module_get/module_put
during phy connect/disconnect processing. Currently, the driver
performs a phy connect during module probe and a phy disconnect during
module remove. With the addition of the module_get during phy connect
the amd-xgbe module use count is incremented and can no longer be
unloaded.
Move the phy connect/disconnect from the driver probe/remove functions
to the net_device_ops ndo_open/ndo_stop functions. This allows the
module use count to be decremented when the device(s) are brought down
and allows the module to be unloaded.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the dma_set_mask_and_coherent function to set the DMA mask rather
than setting the DMA mask fields directly. This was originally done
to work around a bug in the arm64 DMA support when RAM started above
the 4GB boundary which has since been fixed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon reception of a new frame, the emac driver checks for a number
of error conditions, and flag the packet as "bad" if any of these
are present. It then allocates a skb unconditionally, but only uses
it if the packet is "good". On the error path, the skb is just forgotten,
and the system leaks memory.
The piece of junk I have on my desk seems to encounter such error
frequently enough so that the box goes OOM after a couple of days,
which makes me grumpy.
Fix this by moving the allocation on the "good_packet" path (and
convert it to netdev_alloc_skb while we're at it).
Tested on a random Allwinner A20 board.
Cc: Stefan Roese <sr@denx.de>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: <stable@vger.kernel.org> # 3.11+
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a71e3c3796 ("net: phy: Set the driver when registering an MDIO bus
device") caused the following regression on the fec driver:
root@imx6qsabresd:~# echo mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.003 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
Unable to handle kernel NULL pointer dereference at virtual address 0000002c
pgd = bcd14000
[0000002c] *pgd=4d9e0831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 617 Comm: sh Not tainted 3.16.0 #17
task: bc0c4e00 ti: bceb6000 task.ti: bceb6000
PC is at fec_suspend+0x10/0x70
LR is at dpm_run_callback.isra.7+0x34/0x6c
pc : [<803f8a98>] lr : [<80361f44>] psr: 600f0013
sp : bceb7d70 ip : bceb7d88 fp : bceb7d84
r10: 8091523c r9 : 00000000 r8 : bd88f478
r7 : 803f8a88 r6 : 81165988 r5 : 00000000 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : bd88f478 r0 : bd88f478
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: 4cd1404a DAC: 00000015
Process sh (pid: 617, stack limit = 0xbceb6240)
Stack: (0xbceb7d70 to 0xbceb8000)
....
The problem with the original commit is explained by Russell King:
"It has the effect (as can be seen from the oops) of attaching the MDIO bus
device (itself is a bus-less device) to the platform driver, which means
that if the platform driver supports power management, it will be called
to power manage the MDIO bus device.
Moreover, drivers do not expect to be called for power management
operations for devices which they haven't probed, and certainly not for
devices which aren't part of the same bus that the driver is registered
against."
This reverts commit a71e3c3796.
Cc: <stable@vger.kernel.org> #3.16
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to turn off SGE RX/TX Callback Timers & interrupt in cxgb4vf PCI Shutdown
routine in order to prevent crashes during reboot/poweroff when traffic is
running.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- kmalloc_array instead of kmalloc when possible
- avoid log spam due to useless net_ratelimit() invocations
- increase default metric hop penalty from 15 to 30
- update internal version number
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJT4Ir0AAoJEJgn97Bh2u9eYG8P/2Bx9uCbajNQSCBvQ20OXW4o
YzubqMETLel6GLoVjL2XFsmJATaQGGWYWGSytDF2sRa1C7/dXXfUrwgfFNzw2L9Z
nZSd8WbhiBW0LK/ECzTQ12M3VnTdjgj3IP9A58N9zqQf5uM1onQR6gR7IQGARn9S
D5onFdoRz5bWcVffyf4YN9irMkvAFjOB7OzNtBaHRbH5oObITS10LcQ7rVLAic/0
lyMCX1ioBnFpbH1YfII0dcSFjY22m9QTjJDj4dx6LbjDXaxv9BMiJ7bfjCt5wn1o
0ju5X898fhTX0L4Z67pGGHzawByyXtrf1r9INot8K8oqftq5vkHpIJdn9n0GrYGa
9FVoQ0hzDVSp7mxnBKnbXyaf47HX8RYRBZegHEDW8zrpO0zj7cyruRKifDL3q9R9
cqGIRYSFPaXEGF6HfmxWXCjst1VpddcTBgD7OWlKuTi/Juk2ZoWI4O+h3WlotOv8
niTiG0DiNoGlrLXjFlya6TdfL43zg3sE8O4oukcVO4uejo+wdL+Rc7ZGuO7CbO9k
V+ORQ/8Efg3XD8xRSmzi0pgyBfU/C6HTTWCIv8DIMfl0B5zoeIx75quDP9/CKIfz
UTF0R1+mA0o6UmSiIngqRTFEEB9cWrMvCjM1drMb1V99NzI+/fg1skO7pcnJRddC
rbPgeTHDAYgMsPj74rhs
=EIGl
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
pull request: batman-adv 2014-08-05
this is a pull request intended for net-next/linux-3.17 (yeah..it's really
late).
Patches 1, 2 and 4 are really minor changes:
- kmalloc_array is substituted to kmalloc when possible (as suggested by
checkpatch);
- net_ratelimited() is now used properly and the "suppressed" message is not
printed anymore if not needed;
- the internal version number has been increased to reflect our current version.
Patch 3 instead is introducing a change in the metric computation function
by changing the penalty applied at each mesh hop from 15/255 (~6%) to
30/255 (~11%). This change is introduced by Simon Wunderlich after having
observed a performance improvement in several networks when using the new value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable "err" is not necessary.
Return register_netdevice() directly.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an
unnecessary operation.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
net-timestamp: new tx tstamps and tcp
Extend socket tx timestamping:
- allow multiple types of software timestamps aside from send (1)
- add software timestamp on enter packet scheduling (4)
- add software timestamp for TCP (5)
- add software timestamp for TCP on ACK (6)
The sk_flags option space is nearly exhausted. Also move the
many timestamp options to a new sk->sk_tstamps (2).
To disambiguate data when tstamps may arrive out of order,
optionally return a sequential ID assigned at send (3).
Extend Linux tx timestamping to monitoring of latency
incurred within the kernel stack and to protocols embedded in TCP.
Complex kernel setups may have multiple layers of queueing, including
multiple instances of packet scheduling, and many classes per layer.
Many applications embed discrete payloads into TCP bytestreams for
reliability, flow control, etcetera. Detecting application tail
latency in such scenarios relies on identifying the exact queue
responsible if on the host, or the network latency if otherwise.
Changelog:
v4->v5
- define SCM_TSTAMP_SND == 0, for legacy behavior
- add TCP tstamps without changing the generated byte stream
- modify GSO and ACK to find offset: slightly more complex
than previous invariant that it is the last byte
- consistent naming of packet scheduling
- rename SCM_TSTAMP_ENQ to SCM_TSTAMP_SCHED
- add unique key in ee_data
- add id field in ee_info to disambiguate tstamps
- optional, only on new flag SOF_TIMESTAMPING_OPT_ID
- for bytestream, in bytes
v3->v4
- (v3 review comment) removed skb->mark packet identification (*A)
- (v3 review comment) fixed indentation
- tcp: fixed poll() to return POLLERR on non-zero queue
- rebased to work without syststamp
- comments: removed all traces of MSG_TSTAMP_.. (*B)
v2->v3
- extend the SO_TIMESTAMPING API, instead of defining a new one.
- add protocol independent support to correlate tstamps with data,
based on returning skb->mark.
- removed no-payload optimization and documentation (for now):
I have a follow-on patch that reintroduces MSG_TSTAMP along with a
new socket option SOF_TIMESTAMPING_OPT_ONFLAG. This is equivalent
to sequence setsockopt(<enable>); send(..); setsockopt(<disable>),
but avoids the need to define a MSG_TSTAMP_<TYPE> for each type.
I will leave these three patches as follow-on, as this patchset is
large enough as is.
v1->v2
- expand timestamping (existing and new) to SOCK_RAW and ping sockets
- rename sock_errqueue_timestamping to scm_timestamping
- change timestamp data format: do not add fields to scm_timestamping.
Doing so could break legacy applications. Instead, communicate
through an existing, but unused, field in the error message.
- rename SOF_.._OPT_TX_NO_PAYLOAD to shorter SOF_.._OPT_TSONLY
- move msg_tstamp test app out of patchset and to github
git://github.com/wdebruij/kerneltools.git
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
in the send() call is acknowledged. It implements the feature for TCP.
The timestamp is generated when the TCP socket cumulative ACK is moved
beyond the tracked seqno for the first time. The feature ignores SACK
and FACK, because those acknowledge the specific byte, but not
necessarily the entire contents of the buffer up to that byte.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP timestamping extends SO_TIMESTAMPING to bytestreams.
Bytestreams do not have a 1:1 relationship between send() buffers and
network packets. The feature interprets a send call on a bytestream as
a request for a timestamp for the last byte in that send() buffer.
The choice corresponds to a request for a timestamp when all bytes in
the buffer have been sent. That assumption depends on in-order kernel
transmission. This is the common case. That said, it is possible to
construct a traffic shaping tree that would result in reordering.
The guarantee is strong, then, but not ironclad.
This implementation supports send and sendpages (splice). GSO replaces
one large packet with multiple smaller packets. This patch also copies
the option into the correct smaller packet.
This patch does not yet support timestamping on data in an initial TCP
Fast Open SYN, because that takes a very different data path.
If ID generation in ee_data is enabled, bytestream timestamps return a
byte offset, instead of the packet counter for datagrams.
The implementation supports a single timestamp per packet. It silenty
replaces requests for previous timestamps. To avoid missing tstamps,
flush the tcp queue by disabling Nagle, cork and autocork. Missing
tstamps can be detected by offset when the ee_data ID is enabled.
Implementation details:
- On GSO, the timestamping code can be included in the main loop. I
moved it into its own loop to reduce the impact on the common case
to a single branch.
- To avoid leaking the absolute seqno to userspace, the offset
returned in ee_data must always be relative. It is an offset between
an skb and sk field. The first is always set (also for GSO & ACK).
The second must also never be uninitialized. Only allow the ID
option on sockets in the ESTABLISHED state, for which the seqno
is available. Never reset it to zero (instead, move it to the
current seqno when reenabling the option).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel transmit latency is often incurred in the packet scheduler.
Introduce a new timestamp on transmission just before entering the
scheduler. When data travels through multiple devices (bonding,
tunneling, ...) each device will export an individual timestamp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to unique
correlate looped data with original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.
Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.
The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.
The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.
Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.
SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].
1) define struct scm_timestamping explicitly and
2) add support for new tstamp types. On tx, scm_timestamping always
accompanies a sock_extended_err. Define previously unused field
ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
define the existing behavior.
The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These belong to the t4 msg header, will ensure there is no accidental code
duplication in the future
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.
When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.
Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.
To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.
A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.
We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:
(1) 90% of inter-ACK delays were less than 10ms
(2) 99% of inter-ACK delays were less than RTT/2
In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia says:
====================
qlcnic: Bug fixes
The patch series contains following bug fixes.
* Aggregating tx stats in adapter variable was resulting
in increase of stats when user runs ifconfig command
and no traffic is running. Instead aggregate tx stats
in local variable and then assign it to adapter struct
variable.
* Set_driver_version was called after registering netdev
which was resulting in a race between FLR in open
handler and set_driver_version command as open handler
can be called simulatneously on another cpu even if probe
is not complete. So call this command before registering
netdev.
* dcbnl_ops should be initialized before registering netdev
as they are referenced in open handler.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
o Initialization of dcbnl_ops after register netdev may result in
dcbnl_ops not getting set before it is being accessed from open.
So, moving it before register_netdev.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Earlier, set_drv_version was getting called after register_netdev.
This was resulting in a race between set_drv_version and FLR called
from open(). Moving set_drv_version before register_netdev avoids
the race.
o Log response code in error message on CDRP failure.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Aggregating tx stats in adapter variable was resulting in
an increase in stats even after no traffic was run and
user runs ifconfig/ethtool command.
o qlcnic_update_stats used to accumulate stats in adapter
struct at each function call, instead accumulate tx stats
in local variable and then assign it to adapter structure.
Reported-by: Holger Kiehl <holger.kiehl@dwd.de>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJT4IPwAAoJEJgn97Bh2u9e1VYP/3oZvI8BPNJaU6rv+KsACBPG
A1JD6bx9x3msFUZmuZtjcAp6EPDQGsBfTnV8wq7NhtKds7xbK8X+G3rXAHoaeNiC
awyWQ3shD0Ksi8Y0dIlXvRJK8xcaVF0X+G9yjtT5fokDSVraaPNntJEsX2r/5+Gw
+wMnOe3AorZy2koBucmf8Q5xXqiAbMY4l2Fwx8HIzM70z04h9JwgQyf1Ji/uM/q4
s30wB5Omd+HY4TAbsbHe/Or7inT6qhqwWbitS13VK6V7765Tb3WCg19hDtYYX09p
ZlLvqolBgwb5Ey4ZV/VaZ5erLdZ12wEfBAxYjem7JnHUyLKqr9KdLiv8i4/cNaLS
lvTCIPD/RTtLmzKni6NpYxQ7WeB7533+4I+dg1ZfqtTuRMrZdaPQ7ER/QeR4/kV4
Ua88zJ9Zj9ffxsCKjyOf2Dw1dQtzfLJdvHBU5pJ9sC47RcLv0a+68dE2+kmbPdNY
eLVxUAw1fMQhH4PyBRCJPncY79YLeM6eemMZhukc728/NfXVkfGXgNJlMpnD+dJE
+yYSpYRd6/RjyXQUBhzDPyrsNNKiPP8flW/R11jdcSq8YSdPzvMWUHvXPTA/1d3C
U9Qsf3aA+0wH4vtJ8G7klk8GmYxFZuYFNKwYVxVX9PQNzLCoVCNYENkPvjEz827C
MGwkqrh44S/+5++12b9+
=CJDS
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
pull request net: batman-adv 2014-08-04
this is a pull request intended for net.
It just contains a patch by Sven Eckelmann that fixes the
reception of out-of-order fragments. As explained in the
commit message, the issue was due to a wrong assumption
about hlist_for_each_entry() in batadv_frag_insert_packet().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss says:
====================
xen-netback: Changes around carrier handling
This series starts using carrier off as a way to purge packets when the guest is
not able (or willing) to receive them. It is a much faster way to get rid of
packets waiting for an overwhelmed guest.
The first patch changes current netback code where it relies currently on
netif_carrier_ok.
The second turns off the carrier if the guest times out on a queue, and only
turn it on again if that queue (or queues) resurrects.
====================
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when the guest is not able to receive more packets, qdisc layer starts
a timer, and when it goes off, qdisc is started again to deliver a packet again.
This is a very slow way to drain the queues, consumes unnecessary resources and
slows down other guests shutdown.
This patch change the behaviour by turning the carrier off when that timer
fires, so all the packets are freed up which were stucked waiting for that vif.
Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit to
signal the thread that either the timeout happened or an RX interrupt arrived,
so the thread can check what it should do. It also disables NAPI, so the guest
can't transmit, but leaves the interrupts on, so it can resurrect.
Only the queues which brought down the interface can enable it again, the bit
QUEUE_STATUS_RX_STALLED makes sure of that.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new state bit VIF_STATUS_CONNECTED to track whether the
vif is in a connected state. Using carrier will not work with the next patch
in this series, which aims to turn the carrier temporarily off if the guest
doesn't seem to be able to receive packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
v2:
- rename the bitshift type to "enum state_bit_shift" here, not in the next patch
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/6lowpan/iphc.c
Minor conflicts in iphc.c were changes overlapping with some
style cleanups.
John W. Linville says:
====================
Please pull this last(?) batch of wireless change intended for the
3.17 stream...
For the NFC bits, Samuel says:
"This is a rather quiet one, we have:
- A new driver from ST Microelectronics for their NCI ST21NFCB,
including device tree support.
- p2p support for the ST21NFCA driver
- A few fixes an enhancements for the NFC digital laye"
For the Atheros bits, Kalle says:
"Michal and Janusz did some important RX aggregation fixes, basically we
were missing RX reordering altogether. The 10.1 firmware doesn't support
Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc
support with that firmware. Also he implemented a workaround for a KVM
issue."
For the Bluetooth bits, Gustavo and Johan say:
"To quote Gustavo from his previous request:
'Some last minute fixes for -next. We have a fix for a use after free in
RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with
auto-connection handling. Last, we added support for reading the codec and
MWS setting for controllers that support these features.'
Additionally there are fixes to LE scanning, an update to conform to the 4.1
core specification as well as fixes for tracking the page scan state. All
of these fixes are important for 3.17."
And,
"We've got:
- 6lowpan fixes/cleanups
- A couple crash fixes, one for the Marvell HCI driver and another in LE SMP.
- Fix for an incorrect connected state check
- Fix for the bondable requirement during pairing (an issue which had
crept in because of using "pairable" when in fact the actual meaning
was "bondable" (these have different meanings in Bluetooth)"
Along with those are some late-breaking hardware support patches in
brcmfmac and b43 as well as a stray ath9k patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The default hop penalty is currently set to 15, which is applied like
that for multi interface devices (e.g. dual band APs). Single band
devices will still use an effective penalty of 30 (hop penalty + wifi
penalty).
After receiving reports of too long paths in mesh networks with dual
band APs which were fixed by increasing the hop penalty, we'd like to
suggest to increase that default value in the default setting as well.
We've evaluated that increase in a handful of medium sized mesh
networks (5-20 nodes) with single and dual band devices, with changes
for the better (shorter routes, higher throughput) or no change at all.
This patch changes the hop penalty to 30, which will give an effective
penalty of 60 on single band devices (hop penalty + wifi penalty).
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
This patch removes unnecessary logspam which resulted from superfluous
calls to net_ratelimit(). With the supplied patch, net_ratelimit() is
called after the loglevel has been checked.
Signed-off-by: André Gaul <gaul@web-yard.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
batadv_frag_insert_packet was unable to handle out-of-order packets because it
dropped them directly. This is caused by the way the fragmentation lists is
checked for the correct place to insert a fragmentation entry.
The fragmentation code keeps the fragments in lists. The fragmentation entries
are kept in descending order of sequence number. The list is traversed and each
entry is compared with the new fragment. If the current entry has a smaller
sequence number than the new fragment then the new one has to be inserted
before the current entry. This ensures that the list is still in descending
order.
An out-of-order packet with a smaller sequence number than all entries in the
list still has to be added to the end of the list. The used hlist has no
information about the last entry in the list inside hlist_head and thus the
last entry has to be calculated differently. Currently the code assumes that
the iterator variable of hlist_for_each_entry can be used for this purpose
after the hlist_for_each_entry finished. This is obviously wrong because the
iterator variable is always NULL when the list was completely traversed.
Instead the information about the last entry has to be stored in a different
variable.
This problem was introduced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge").
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
With netlink_lookup() conversion to RCU, we need to use appropriate
rcu dereference in netlink_seq_socket_idx() & netlink_seq_next()
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>
tcpm_key is an array inside struct tcp_md5sig, there is no need to check it
against NULL.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intel did some benchmarking on our network throughput when Linux on Hyper-V
is as used as a gateway. This fix gave us almost a 1 Gbps additional throughput
on about 5Gbps base throughput we hadi, prior to increasing the sendbuf size.
The sendbuf mechanism is a copy based transport that we have which is clearly
more optimal than the copy-free page flipping mechanism (for small packets).
In the forwarding scenario, we deal only with MTU sized packets,
and increasing the size of the senbuf area gave us the additional performance.
For what it is worth, Windows guests on Hyper-V, I am told use similar sendbuf
size as well.
The exact value of sendbuf I think is less important than the fact that it needs
to be larger than what Linux can allocate as physically contiguous memory.
Thus the change over to allocating via vmalloc().
We currently allocate 16MB receive buffer and we use vmalloc there for allocation.
Also the low level channel code has already been modified to deal with physically
dis-contiguous memory in the ringbuffer setup.
Based on experimentation Intel did, they say there was some improvement in throughput
as the sendbuf size was increased up to 16MB and there was no effect on throughput
beyond 16MB. Thus I have chosen 16MB here.
Increasing the sendbuf value makes a material difference in small packet handling
In this version of the patch, based on David's feedback, I have added
additional details in the commit log.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leftover from 5c601d0c94 ("wireless: move
zd1201 where it belongs").
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the use of devm_kzalloc and does away with the
kfrees in the probe and remove functions. Also, a label is removed.
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6cbdceeb1c
bridge: Dump vlan information from a bridge port
introduced a comment in an attempt to explain the
code logic. The comment is unfinished so it confuses more
than it explains, remove it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported by checkpatch with the following warning:
WARNING: Prefer kmalloc_array over kmalloc with multiply
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Add missing Makefile rule for r8152 driver
In the current kernel the r8152 driver is *never* built because of a missing rule in drivers/net/Makefile, despite being selected as built-in or module in the .config file. There is no error message or warning to indicate that the driver isn't built. This change adds the rule and lets the driver build.
Tested as built-in and module for 3.15.8.
Signed-off by: JF Le Fillatre <jflf-kernel@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by the intel guys, there is no need to hold rcu read lock in
cxgbi_inet6addr_handler(), this patch removes it.
Fixes: 759a0cc5a3 ("cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf says:
====================
Lockless netlink_lookup() with new concurrent hash table
Netlink sockets are maintained in a hash table to allow efficient lookup
via the port ID for unicast messages. However, lookups currently require
a read lock to be taken. This series adds a new generic, resizable,
scalable, concurrent hash table based on the paper referenced in the first
patch. It then makes use of the new data type to implement lockless
netlink_lookup().
Patch 3/3 to convert nft_hash is included for reference but should be
merged via the netfilter tree. Inclusion in this series is to provide
context for the suggested API.
Against net-next since the initial user of the new hash table is in net/
Changes:
v4-v5:
- use GFP_KERNEL to alloc Netlink buckets as suggested by Nikolay
Aleksandrov
- free nft hash element on removal as spotted by Nikolay Aleksandrov
and Patrick McHardy
v3-v4:
- fixed wrong shift assignment placement as spotted by Nikolay Aleksandrov
- reverted default size of nft_hash to 4 as requested by Patrick McHardy,
default size for other hash tables remains at 64 if no hint is given
- fixed copyright as requested by Patrick McHardy
v2-v3:
- fixed typo in nft_hash_destroy() when passing rhashtable handle
v1-v2:
- fixed traversal off-by-one as spotted by Tobias Klauser
- removed unlikely() from BUG_ON() as spotted by Josh Triplett
- new 3rd patch to convert nft_hash to rhashtable
- make rhashtable_insert() return void
- nl_sk_hash_lock must be a mutex
- fixed wrong name of rht_shrink_below_30()
- exported symbols rht_grow_above_75() and rht_shrink_below_30()
- allow table freeing with RCU callback
====================
Signed-off-by: David S. Miller <davem@davemloft.net>