967847 Commits

Author SHA1 Message Date
Ioana Ciornei
66d7439e83 net: phy: adin: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:05 -08:00
Ioana Ciornei
e1bc534df8 net: phy: ste10Xp: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
80ca9ee741 net: phy: ste10Xp: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
824ef51f08 net: phy: smsc: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Andre Edich <andre.edich@microchip.com>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
36b25c26e2 net: phy: smsc: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Andre Edich <andre.edich@microchip.com>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
347917c7e0 net: phy: amd: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
d995a36b7e net: phy: amd: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
45f52f1238 net: phy: nxp-tja11xx: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Marek Vasut <marex@denx.de>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:37:00 -08:00
Ioana Ciornei
52b1984a88 net: phy: nxp-tja11xx: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Marek Vasut <marex@denx.de>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
9a12dd6f18 net: phy: lxt: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
01c4a00bf3 net: phy: lxt: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
1f6d0f267a net: phy: marvell: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
a0723b375f net: phy: marvell: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
cf49939198 net: phy: microchip: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
e01a3feb8f net: phy: microchip: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
e96a0d9774 net: phy: vitesse: remove the use of .ack_interrupt()
In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Ioana Ciornei
b606ad8fa2 net: phy: vitesse: implement generic .handle_interrupt() callback
In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:36:59 -08:00
Randy Dunlap
97f53a08cb net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling
The previous Kconfig patch led to some other build errors as
reported by the 0day bot and my own overnight build testing.

These are all in <linux/skbuff.h> when KCOV is enabled but
SKB_EXTENSIONS is not enabled, so fix those by combining those conditions
in the header file.

Fixes: 6370cc3bbd8a ("net: add kcov handle to skb extensions")
Fixes: 85ce50d337d1 ("net: kcov: don't select SKB_EXTENSIONS when there is no NET")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116212108.32465-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 11:04:55 -08:00
Sven Van Asbroeck
7c3e2b771d lan743x: replace devicetree phy parse code with library function
The code in this driver which parses the devicetree to determine
the phy/fixed link setup, can be replaced by a single library
function: of_phy_get_and_connect().

Behaviour is identical, except that the library function will
complain when 'phy-connection-type' is omitted, instead of
blindly using PHY_INTERFACE_MODE_NA, which would result in an
invalid phy configuration.

The library function no longer brings out the exact phy_mode,
but the driver doesn't need this, because phy_interface_is_rgmii()
queries the phydev directly. Remove 'phy_mode' from the private
adapter struct.

While we're here, log info about the attached phy on connect,
this is useful because the phy type and connection method is now
fully configurable via the devicetree.

Tested on a lan7430 chip with built-in phy. Verified that adding
fixed-link/phy-connection-type in the devicetree results in a
fixed-link setup. Used ethtool to verify that the devicetree
settings are used.

Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201116170155.26967-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 10:46:20 -08:00
Heiner Kallweit
a98cabdb8c net: phy: don't duplicate driver name in phy_attached_print
Currently we print the driver name twice in phy_attached_print():
- phy_dev_info() prints it as part of the device info
- and we print it as part of the info string

This is a little bit ugly, it makes the info harder to read,
especially if the driver name is a little bit longer.
Therefore omit the driver name (if set) in the info string.

Example from r8169 that uses phylib:

old: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver \
   [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
new: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver \
   (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/8ab72586-f079-41d8-84ee-9f6a5bd97b2a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 10:28:37 -08:00
Heiner Kallweit
83c317d7b3 r8169: remove nr_frags argument from rtl_tx_slots_avail
The only time when nr_frags isn't SKB_MAX_FRAGS is when entering
rtl8169_start_xmit(). However we can use SKB_MAX_FRAGS also here
because when queue isn't stopped there should always be room for
MAX_SKB_FRAGS + 1 descriptors.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/3d1f2ad7-31d5-2cac-4f4a-394f8a3cab63@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:29:14 -08:00
kernel test robot
b618c32702 net: phy: mscc: fix excluded_middle.cocci warnings
Condition !A || A && B is equivalent to !A || B.

Generated by: scripts/coccinelle/misc/excluded_middle.cocci

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2011161633240.2682@hadrien
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:27:39 -08:00
Jakub Kicinski
f85cd064cd Merge branch 'net-dsa-tag_dsa-unify-regular-and-ethertype-dsa-taggers'
Tobias Waldekranz says:

====================
net: dsa: tag_dsa: Unify regular and ethertype DSA taggers

The first patch ports tag_edsa.c's handling of IGMP/MLD traps to
tag_dsa.c. That way, we start from two logically equivalent taggers
that are then merged. The second commit does the heavy lifting of
actually fusing tag_dsa.c and tag_edsa.c. The final one just follows
up with some clean up of existing comments.

v2 -> v3:
  - Add the first patch described above as suggested by Andrew.
  - Better documentation of TO_SNIFFER and FORWARD tags.
  - Spelling.

v1 -> v2:
  - Fixed some grammar and whitespace errors.
  - Removed unnecessary default value in Kconfig.
  - Removed unnecessary #ifdef.
  - Split out comment fixes from functional changes.
  - Fully document enum dsa_code.
====================

Link: https://lore.kernel.org/r/20201114234558.31203-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:14 -08:00
Tobias Waldekranz
13f49b6f26 net: dsa: tag_dsa: Use a consistent comment style
Use a consistent style of one-line/multi-line comments throughout the
file.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:12 -08:00
Tobias Waldekranz
469ee5fe73 net: dsa: tag_dsa: Unify regular and ethertype DSA taggers
Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.

This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.

While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:12 -08:00
Tobias Waldekranz
e468d141b9 net: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.

This is a port of:

1ed9ec9b08ad ("dsa: Allow forwarding of redirected IGMP traffic")

Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 09:16:11 -08:00
Jakub Kicinski
72308ecbf3 Merge branch 'mptcp-improve-multiple-xmit-streams-support'
Paolo Abeni says:

====================
mptcp: improve multiple xmit streams support

This series improves MPTCP handling of multiple concurrent
xmit streams.

The to-be-transmitted data is enqueued to a subflow only when
the send window is open, keeping the subflows xmit queue shorter
and allowing for faster switch-over.

The above requires a more accurate msk socket state tracking
and some additional infrastructure to allow pushing the data
pending in the msk xmit queue as soon as the MPTCP's send window
opens (patches 6-10).

As a side effect, the MPTCP socket could enqueue data to subflows
after close() time - to completely spooling the data sitting in the
msk xmit queue. Dealing with the requires some infrastructure and
core TCP changes (patches 1-5)

Finally, patches 11-12 introduce a more accurate tracking of the other
end's receive window.

Overall this refactor the MPTCP xmit path, without introducing
new features - the new code is covered by the existing self-tests.

v2 -> v3:
 - rebased,
 - fixed checkpatch issue in patch 1/13
 - fixed some state tracking issues in patch 8/13

v1 -> v2:
 - this is just a repost, to cope with patchwork issues, no changes
   at all
====================

Link: https://lore.kernel.org/r/cover.1605458224.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:11 -08:00
Paolo Abeni
7ed90803a2 mptcp: send explicit ack on delayed ack_seq incr
When the worker moves some bytes from the OoO queue into
the receive queue, the msk->ask_seq is updated, the MPTCP-level
ack carrying that value needs to wait the next ingress packet,
possibly slowing down or hanging the peer

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Florian Westphal
6f8a612a33 mptcp: keep track of advertised windows right edge
Before sending 'x' new bytes also check that the new snd_una would
be within the permitted receive window.

For every ACK that also contains a DSS ack, check whether its tcp-level
receive window would advance the current mptcp window right edge and
update it if so.

Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Florian Westphal
8edf08649e mptcp: rework poll+nospace handling
MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
least one subflow and the mptcp socket itself are writeable.

mptcp_poll returns EPOLLOUT if the bit is set.

mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write
has used up all subflows or the mptcp socket wmem.

This reworks nospace handling as follows:

MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning.
This bit is set when the mptcp socket is not writeable.
The mptcp-level ack path schedule will then schedule the mptcp worker
to allow it to free already-acked data (and reduce wmem usage).

This will then wake userspace processes that wait for a POLLOUT event.

sendmsg will set MPTCP_NOSPACE only when it has to wait for more
wmem (blocking I/O case).

poll path will set MPTCP_NOSPACE in case the mptcp socket is
not writeable.

Normal tcp-level notification (SOCK_NOSPACE) is only enabled
in case the subflow socket has no available wmem.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Paolo Abeni
813e0a683d mptcp: try to push pending data on snd una updates
After the previous patch we may end-up with unsent data
in the write buffer. If such buffer is full, the writer
will block for unlimited time.

We need to trigger the MPTCP xmit path even for the
subflow rx path, on MPTCP snd_una updates.

Keep things simple and just schedule the work queue if
needed.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Paolo Abeni
d9ca1de8c0 mptcp: move page frag allocation in mptcp_sendmsg()
mptcp_sendmsg() is refactored so that first it copies
the data provided from user space into the send queue,
and then tries to spool the send queue via sendmsg_frag.

There a subtle change in the mptcp level collapsing on
consecutive data fragment: we now allow that only on unsent
data.

The latter don't need to deal with msghdr data anymore
and can be simplified in a relevant way.

snd_nxt and write_seq are now tracked independently.

Overall this allows some relevant cleanup and will
allow sending pending mptcp data on msk una update in
later patch.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Paolo Abeni
e16163b6e2 mptcp: refactor shutdown and close
We must not close the subflows before all the MPTCP level
data, comprising the DATA_FIN has been acked at the MPTCP
level, otherwise we could be unable to retransmit as needed.

__mptcp_wr_shutdown() shutdown is responsible to check for the
correct status and close all subflows. Is called by the output
path after spooling any data and at shutdown/close time.

In a similar way, __mptcp_destroy_sock() is responsible to clean-up
the MPTCP level status, and is called when the msk transition
to TCP_CLOSE.

The protocol level close() does not force anymore the TCP_CLOSE
status, but orphan the msk socket and all the subflows.
Orphaned msk sockets are forciby closed after a timeout or
when all MPTCP-level data is acked.

There is a caveat about keeping the orphaned subflows around:
the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
tcp_close(). To prevent accessing freed memory on later MPTCP
level operations, the msk acquires a reference to each subflow
socket and prevent subflow_ulp_release() from releasing the
subflow context before __mptcp_destroy_sock().

The additional subflow references are released by __mptcp_done()
and the async ULP release is detected checking ULP ops. If such
field has been already cleared by the ULP release path, the
dangling context is freed directly by __mptcp_done().

Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Paolo Abeni
eaa2ffabfc mptcp: introduce MPTCP snd_nxt
Track the next MPTCP sequence number used on xmit,
currently always equal to write_next.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:07 -08:00
Paolo Abeni
f0e6a4cf11 mptcp: add accounting for pending data
Preparation patch to track the data pending in the msk
write queue. No functional change introduced here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
caf971df01 mptcp: reduce the arguments of mptcp_sendmsg_frag
The current argument list is pretty long and quite unreadable,
move many of them into a specific struct. Later patches
will add more stuff to such struct.

Additionally drop the 'timeo' argument, now unused.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
ba8f48f7a4 mptcp: introduce mptcp_schedule_work
remove some of code duplications an allow preventing
rescheduling on close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
77c3c95637 tcp: factor out __tcp_close() helper
unlocked version of protocol level close, will be used by
MPTCP to allow decouple orphaning and subflow level close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
e2223995a2 mptcp: use tcp_build_frag()
mptcp_push_pending() is called even on orphaned
msk (and orphaned subflows), if there is outstanding
data at close() time.

To cope with the above MPTCP needs to handle explicitly
the allocation failure on xmit. The newly introduced
do_tcp_sendfrag() allows that, just plug it.

We can additionally drop a couple of sanity checks,
duplicate in the TCP code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
b796d04bd0 tcp: factor out tcp_build_frag()
Will be needed by the next patch, as MPTCP needs to handle
directly the error/memory-allocation-needed path.

No functional changes intended.

Additionally let MPTCP code access the tcp_remove_empty_skb()
helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Jakub Kicinski
c0a645a7f9 Merge branch 'fix-inefficiences-and-rename-nla_strlcpy'
Francis Laniel says:

====================
Fix inefficiences and rename nla_strlcpy

This patch set answers to first three issues listed in:
https://github.com/KSPP/linux/issues/110

To sum up, the patch contributions are the following:
1. the first patch fixes an inefficiency where some bytes in dst were written
twice, one with 0 the other with src content.
2. The second one modifies nla_strlcpy to return the same value as strscpy,
i.e. number of bytes written or -E2BIG if src was truncated.
It also modifies code that calls nla_strlcpy and checks for its return value.
3. The third renames nla_strlcpy to nla_strscpy.

Unfortunately, I did not find how to create struct nlattr objects so I tested
my modifications on simple char* and with GDB using tc to get to
tcf_proto_check_kind.
====================

Link: https://lore.kernel.org/r/20201115170806.3578-1-laniel_francis@privacyrequired.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:11:05 -08:00
Francis Laniel
872f690341 treewide: rename nla_strlcpy to nla_strscpy.
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Francis Laniel
9ca718743a Modify return value of nla_strlcpy to match that of strscpy.
nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.

For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.

The callers of nla_strlcpy were modified to take into account this modification.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Francis Laniel
8eeb99bc81 Fix unefficient call to memset before memcpu in nla_strlcpy.
Before this commit, nla_strlcpy first memseted dst to 0 then wrote src into it.
This is inefficient because bytes whom number is less than src length are written
twice.

This patch solves this issue by first writing src into dst then fill dst with
0's.
Note that, in the case where src length is higher than dst, only 0 is written.
Otherwise there are as many 0's written to fill dst.

For example, if src is "foo\0" and dst is 5 bytes long, the result will be:
1. "fooGG" after memcpy (G means garbage).
2. "foo\0\0" after memset.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Heiner Kallweit
41294e6a43 r8169: improve rtl8169_start_xmit
Improve the following in rtl8169_start_xmit:
- tp->cur_tx can be accessed in parallel by rtl_tx(), therefore
  annotate the race by using WRITE_ONCE
- avoid checking stop_queue a second time by moving the doorbell check
- netif_stop_queue() uses atomic operation set_bit() that includes a
  full memory barrier on some platforms, therefore use
  smp_mb__after_atomic to avoid overhead

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/80085451-3eaf-507a-c7c0-08d607c46fbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 07:54:07 -08:00
Lev Stipakov
0064c5c1b3 net: xfrm: use core API for updating/providing stats
Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.

Use this function instead of own code.

While on it, remove xfrmi_get_stats64() and replace it with
dev_get_tstats64().

Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215939.147007-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 17:01:08 -08:00
Lev Stipakov
865e6ae02d net: openvswitch: use core API to update/provide stats
Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.

Use this function instead of own code.

While on it, remove internal_get_stats() and replace it
with dev_get_tstats64().

Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:59:32 -08:00
Jakub Kicinski
cf70b5cfab Merge branch 'mlxsw-preparations-for-nexthop-objects-support-part-1-2'
Ido Schimmel says:

====================
mlxsw: Preparations for nexthop objects support - part 1/2

This patch set contains small and non-functional changes aimed at making
it easier to support nexthop objects in mlxsw. Follow up patches can be
found here [1].

Patches #1-#4 add a type field to the nexthop group struct instead of
the existing protocol field. This will be used later on to add a nexthop
object type, which can contain both IPv4 and IPv6 nexthops.

Patches #5-#7 move the IPv4 FIB info pointer (i.e., 'struct fib_info')
from the nexthop group struct to the route. The pointer will not be
available when the nexthop group is a nexthop object, but it needs to be
accessible to routes regardless.

Patch #8 is the biggest change, but it is an entirely cosmetic change
and should therefore be easy to review. The motivation and the change
itself are explained in detail in the commit message.

Patches #9-#12 perform small changes so that two functions that are
currently split between IPv4 and IPv6 could be consolidated in patches

Patch #15 removes an outdated comment.

[1] https://github.com/idosch/linux/tree/submit/nexthop_objects
====================

Link: https://lore.kernel.org/r/20201113160559.22148-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:55:08 -08:00
Ido Schimmel
245f4e44d2 mlxsw: spectrum_router: Remove outdated comment
Since commit 21151f64a458 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:55:04 -08:00
Ido Schimmel
9ed2b4d287 mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_fini()
The two functions are identical, so consolidate them to
mlxsw_sp_nexthop_type_fini().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:55:04 -08:00