707982 Commits

Author SHA1 Message Date
Nicolas Pitre
9eaa903c06 mtd: implement mtd_get_unmapped_area() using the point method
The mtd->_point method is a superset of mtd->_get_unmapped_area.
Especially in the NOR flash case, the point method ensures the flash
memory is in array (data) mode and that it will stay that way which
is precisely what callers of mtd_get_unmapped_area() would expect.

Implement mtd_get_unmapped_area() in terms of mtd->_point now that all
drivers that provided a _get_unmapped_area method also have the _point
method implemented.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Richard Weinberger <richard@nod.at>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:18 +01:00
Nicolas Pitre
55100cfa33 mtd: chips/map_rom.c: implement point and unpoint methods
This will allow for the removal of the get_unmapped_area method later.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
[rw: fixed build]
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:18 +01:00
Nicolas Pitre
2caaf2d83a mtd: chips/map_ram.c: implement point and unpoint methods
This will allow for the removal of the get_unmapped_area method later.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Richard Weinberger <richard@nod.at>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:17 +01:00
Nicolas Pitre
877b58ebc0 mtd: mtdram: properly handle the phys argument in the point method
When the phys pointer is non null, the point method is expected to return
the physical address for the pointed area. In the case of the mtdram
driver we have to retrieve the physical address for the corresponding
vmalloc area. However, there is no guarantee that the vmalloc area is
made of physically contiguous pages. In that case we simply limit retlen
to the actually contiguous pages.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Richard Weinberger <richard@nod.at>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:17 +01:00
Arvind Yadav
a5929b64fa mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD'
Trivial fix to spelling mistakes.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:16 +01:00
Roy Franz
cb9e20633d mtd: slram: use memremap() instead of ioremap()
Convert slram to use memremap() to map the memory it uses to back an MTD
device, as this is the proper interface for mapping memory. This change
enables normal memory to be used to back an MTD device on arm64, as arm64
prevents ioremap() being used on normal memory.

Signed-off-by: Roy Franz <roy.franz@cavium.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:16 +01:00
Rob Herring
9de8da4774 kconfig: kill off GENERIC_IO option
The GENERIC_IO option is set for every architecture except tile and score
as those define NO_IOMEM. The option only controls visibility of
CONFIG_MTD which doesn't appear to be necessary for any reason, so let's
just remove GENERIC_IO.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:15 +01:00
Pavel Machek
971e4aeeaa mtd: Fix C++ comment in include/linux/mtd/mtd.h
C++ comments look wrong in kernel tree. Fix one.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-06 23:26:02 +01:00
Arvind Yadav
d4906688d4 mtd: constify mtd_partition
mtd_partition are not supposed to change at runtime.
Functions 'mtd_device_parse_register' working with const mtd_partition
provided by <linux/mtd/mtd.h>. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-06 23:26:01 +01:00
Anton Vasilyev
2e442aebed mtd: plat-ram: Replace manual resource management by devm
Driver contains unsuitable request_mem_region() and
release_resource() calls.

The patch switches manual resource management by devm interface for
readability and error-free simplification.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Suggested-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-06 23:26:01 +01:00
Richard Weinberger
16271224bc Core changes:
* Add a flag to mark NANDs that require 3 address cycles to encode a
   page address
 * Set a default ECC/free layout when NAND_ECC_NONE is requested
 * Fix a bug in panic_nand_write()
 
 Driver changes:
 * Another batch of cleanups for the denali driver
 * Fix PM support in the atmel driver
 * Remove support for platform data in the omap driver
 * Fix subpage write in the omap driver
 * Fix irq handling in the mtk driver
 * Change link order of mtk_ecc and mtk_nand drivers to speed up boot
   time
 * Change log level of ECC error messages in the mxc driver
 * Patch the pxa3xx driver to support Armada 8k platforms
 * Add BAM DMA support to the qcom driver
 * Convert gpio-nand to the GPIO desc API
 * Fix ECC handling in the mt29f driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJABAABCAAqBQJZ+iDfIxxib3Jpcy5icmV6aWxsb25AZnJlZS1lbGVjdHJvbnMu
 Y29tAAoJEGXtNgF+CLcAPxoP/iuRGzfzs7DTbS6rLtcbIFKbulj/kjB8BfPtYGC8
 1n7C2ZZkQOeargPyf1wtcvNgbVRjUv4/lZ22+HD7l/wDGDjOWeTs0v+it4yGVYzo
 iafyx+8m7J4kZWmZnguc6MQnFJ4g0yorUF3tmMYtd+OihgtlB/NWoxEAG40kPuhQ
 JpARsV/yWxV+l+30TBVtKCOmcS4tBh7Kjhlmr624BJv6sWilv63PnkG90a1qZUCw
 He2PLSNAXXaU7nWta+FKUSzIiRnsWhp2hqf9HIndx4zs1WHK86C15oBXvPuFs3q7
 FD5TB/sutTIhmkrqpZZJID/h1QDUkCYd9p2ZO6a0if/S1gZgiBKFFeJXcAlhj0Ze
 xqFvE/gni/w2mY8xlqX4/Ras5ndfMuNIIQgyCR/iDwQM4Sv6G5t59nMaCb7r0XYy
 Y1pZqVQ/jE8Kh5IkANEmQPVWv95OeQQwY0igtSb5Ih2J9cIzbX/8daE3CP1SOUaX
 REOmUJkb1Ad6gA9e3/nS0ZhLttmFtLEgxQqMQ16XWDtKkf+6uQcBPF/1JD6CuFjn
 0q6S5p1Mci/IZy2/ds9zIm42/dkG3LSLSG0cd2j60lTgTZsTloIsLcX120bDH/DM
 3LejsHgHuaA1Qd7ku9Bn/rfTZdQbSoqQtvkSw3t0touMG/5ErKuleTv9JDaoEb2e
 vRGr
 =iUhH
 -----END PGP SIGNATURE-----

Merge tag 'nand/for-4.15' of git://git.infradead.org/l2-mtd

From Boris:
"
Core changes:
* Add a flag to mark NANDs that require 3 address cycles to encode a
  page address
* Set a default ECC/free layout when NAND_ECC_NONE is requested
* Fix a bug in panic_nand_write()

Driver changes:
* Another batch of cleanups for the denali driver
* Fix PM support in the atmel driver
* Remove support for platform data in the omap driver
* Fix subpage write in the omap driver
* Fix irq handling in the mtk driver
* Change link order of mtk_ecc and mtk_nand drivers to speed up boot
  time
* Change log level of ECC error messages in the mxc driver
* Patch the pxa3xx driver to support Armada 8k platforms
* Add BAM DMA support to the qcom driver
* Convert gpio-nand to the GPIO desc API
* Fix ECC handling in the mt29f driver
"
2017-11-02 22:30:37 +01:00
Richard Weinberger
20b2fc79a2 This pull-request contains the following notable changes:
Core changes:
 * Introduce system power management support.
 * New mechanism to select the proper .quad_enable() hook by JEDEC ID,
   when needed, instead of only by manufacturer ID.
 * Add support to new memory parts from Gigadevice, Winbond, Macronix and
   Everspin.
 
 Driver changes:
 * Maintainance for Cadence, Intel, Mediatek and STM32 drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQI4BAABCAAiBQJZ+eGDGxxjeXJpbGxlLnBpdGNoZW5Ad2VkZXY0dS5mcgAKCRDn
 4OgLHRpJcozND/wK57QO0ZQXU42j6fMk5BM6Aj/YoKJC1jDsX7rRIjUifiydKQ4+
 g5yQ4GzUXGMNA/oGJuy4vANmRIMWlvgBhkA+yCE8GnxM5s+RXhtfHKYsRk6pPdXA
 6obVCo1eY9lZd/clBBnjAreD4bM94fWjZqupwvJDKMnWQhAvA6FC8kRQsyU0ZpvV
 RUH8AMY9Pf9F6rZ+3YxFvqov4xDHdH5BhQ9ZjjmPs1kV56rPS+xoqN3S8gIQR/Zk
 zVMtZ+XBl/n47gEL827GP1ZA42i9+fhWqBJ4PaIWJFPYtvGYdoId0NXTpPyzcMrn
 Ox93PXR+FSZHFill7FIdJ2qam6eBZrhuhXotKIFthe8uWauE9UweTFn0bKlgwXRI
 5bLi9B4VXCYSS3PDEiDg+ohabaJBYfmIingBcr8PK+0vMdRN+A0ovKPkm3Et4dy3
 wcPXuSsVOziwqgYHLvAWjo0M3hN9L9Nm+RTs6izhWunW6vTCzfOABBAT9+a6a7Fp
 v3XNJwV1NxxU9Mcj/LKgIXgQl8r+YAUHHjn6yBbdwh0iWcNRVBqpkaUygXogAJMi
 6qeN0znLc3k4k05Vsv/oAm3h/f+dY6yVslXJTVhjWDzC4z3pywRyAfSzMai+Ai1f
 YFzz19abzJ/FvHMVG8jeDjpn0nk3s14IClK9wFcQfMPS4GUyCyWF6MAGTw==
 =sCfc
 -----END PGP SIGNATURE-----

Merge tag 'spi-nor/for-4.15' of git://git.infradead.org/l2-mtd

This pull-request contains the following notable changes:

From Cyrille:
"
Core changes:
* Introduce system power management support.
* New mechanism to select the proper .quad_enable() hook by JEDEC ID,
  when needed, instead of only by manufacturer ID.
* Add support to new memory parts from Gigadevice, Winbond, Macronix and
  Everspin.

Driver changes:
* Maintainance for Cadence, Intel, Mediatek and STM32 drivers.
"
2017-11-02 22:29:24 +01:00
Brent Taylor
30863e38eb mtd: nand: Fix writing mtdoops to nand flash.
When mtdoops calls mtd_panic_write(), it eventually calls
panic_nand_write() in nand_base.c. In order to properly wait for the
nand chip to be ready in panic_nand_wait(), the chip must first be
selected.

When using the atmel nand flash controller, a panic would occur due to
a NULL pointer exception.

Fixes: 2af7c6539931 ("mtd: Add panic_write for NAND flashes")
Cc: <stable@vger.kernel.org>
Signed-off-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-10-31 16:25:43 +01:00
Kuppuswamy Sathyanarayanan
ec0a9f62b3 mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID
This patch adds Intel Lewisburg PCH SPI serial flash controller super
SKU PCI ID.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-30 11:51:18 +01:00
Xiaolei Li
1d2fcdcf33 mtd: nand: mtk: fix infinite ECC decode IRQ issue
For MT2701 NAND Controller, there may generate infinite ECC decode IRQ
during long time burn test on some platforms. Once this issue occurred,
the ECC decode IRQ status cannot be cleared in the IRQ handler function,
and threads cannot be scheduled.

ECC HW generates decode IRQ each sector, so there will have more than one
decode IRQ if read one page of large page NAND.

Currently, ECC IRQ handle flow is that we will check whether it is decode
IRQ at first by reading the register ECC_DECIRQ_STA. This is a read-clear
type register. If this IRQ is decode IRQ, then the ECC IRQ signal will be
cleared at the same time.
Secondly, we will check whether all sectors are decoded by reading the
register ECC_DECDONE. This is because the current IRQ may be not dealed
in time, and the next sectors have been decoded before reading the
register ECC_DECIRQ_STA. Then, the next sectors's decode IRQs will not
be generated.
Thirdly, if all sectors are decoded by comparing with ecc->sectors, then we
will complete ecc->done, set ecc->sectors as 0, and disable ECC IRQ by
programming the register ECC_IRQ_REG(op) as 0. Otherwise, wait for the
next ECC IRQ.

But, there is a timing issue between step one and two. When we read the
reigster ECC_DECIRQ_STA, all sectors are decoded except the last sector,
and the ECC IRQ signal is cleared. But the last sector is decoded before
reading ECC_DECDONE, so the ECC IRQ signal is enabled again by ECC HW, and
it means we will receive one extra ECC IRQ later. In step three, we will
find that all sectors were decoded, then disable ECC IRQ and return.
When deal with the extra ECC IRQ, the ECC IRQ status cannot be cleared
anymore. That is because the register ECC_DECIRQ_STA can only be cleared
when the register ECC_IRQ_REG(op) is enabled. But actually we have
disabled ECC IRQ in the previous ECC IRQ handle. So, there will
keep receiving ECC decode IRQ.

Now, we read the register ECC_DECIRQ_STA once again before completing the
ecc done event. This ensures that there will be no extra ECC decode IRQ.

Also, remove writel(0, ecc->regs + ECC_IRQ_REG(op)) from irq handler,
because ECC IRQ is disabled in mtk_ecc_disable(). And clear ECC_DECIRQ_STA
in mtk_ecc_disable() in case there is a timeout to wait decode IRQ.

Fixes: 1d6b1e464950 ("mtd: mediatek: driver for MTK Smart Device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-10-30 09:35:04 +01:00
Linus Torvalds
0b07194bb5 Linux 4.14-rc7 2017-10-29 13:58:38 -07:00
Philipp Puschmann
282e45dc64 mtd: spi-nor: Add support for mr25h128
Add Everspin mr25h128 16KB MRAM to the list of supported chips.

Signed-off-by: Philipp Puschmann <pp@emlix.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-29 20:57:19 +01:00
Xiaolei Li
1c782b9a85 mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o
There will get mtk ecc handler during mtk nand probe now.
If mtk ecc module is not initialized, then mtk nand probe will return
-EPROBE_DEFER, and retry later.

Change the compile sequence of mtk_nand.o and mtk_ecc.o, initialize mtk
ecc module before mtk nand module. This makes mtk nand module initialized
as soon as possible.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-10-29 20:40:40 +01:00
Roman Yeryomin
d342b6a973 mtd: spi-nor: enable 4B opcodes for mx66l51235l
Signed-off-by: Roman Yeryomin <roman@advem.lv>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-29 19:02:20 +01:00
Ludovic Barre
10cd4b7b74 mtd: spi-nor: stm32-quadspi: fix prefetching outside fsize
When memory-mapped mode is used, a prefetching mechanism fully
managed by the hardware allows to optimize the read from external
the QSPI memory. A 32-bytes FIFO is used for prefetching.
When the limit of flash size - fifo size is reached the prefetching
mechanism tries to read outside the fsize.
The stm32 quadspi hardware become busy and should be aborted.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Reported-by: Bruno Herrera <bruherrera@gmail.com>
Tested-by: Bruno Herrera <bruherrera@gmail.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-29 18:54:22 +01:00
Ludovic Barre
e812963b91 mtd: spi-nor: stm32-quadspi: change license text
-Change the license text with long template.
-Change Copyright to STMicroelectronics.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-29 18:43:22 +01:00
Geert Uytterhoeven
05521bd3d1 mtd: spi-nor: stm32-quadspi: Fix uninitialized error return code
With gcc 4.1.2:

    drivers/mtd/spi-nor/stm32-quadspi.c: In function ‘stm32_qspi_tx_poll’:
    drivers/mtd/spi-nor/stm32-quadspi.c:230: warning: ‘ret’ may be used uninitialized in this function

Indeed, if stm32_qspi_cmd.len is zero, ret will be uninitialized.
This length is passed from outside the driver using the
spi_nor.{read,write}{,_reg}() callbacks.

Several functions in drivers/mtd/spi-nor/spi-nor.c (e.g. write_enable(),
write_disable(), and erase_chip()) call spi_nor.write_reg() with a zero
length.

Fix this by returning an explicit zero on success.

Fixes: 0d43d7ab277a048c ("mtd: spi-nor: add driver for STM32 quad spi flash controller")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-10-29 18:30:13 +01:00
Linus Torvalds
19e12196da Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix route leak in xfrm_bundle_create().

 2) In mac80211, validate user rate mask before configuring it. From
    Johannes Berg.

 3) Properly enforce memory limits in fair queueing code, from Toke
    Hoiland-Jorgensen.

 4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.

 5) Fix TSO header allocation and management in mvpp2 driver, from Yan
    Markman.

 6) Don't take socket lock in BH handler in strparser code, from Tom
    Herbert.

 7) Don't show sockets from other namespaces in AF_UNIX code, from
    Andrei Vagin.

 8) Fix double free in error path of tap_open(), from Girish Moodalbail.

 9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
    and Alexander Duyck.

10) Fix DCB mode programming in stmmac driver, from Jose Abreu.

11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
    Long.

12) Properly align SKB head before building SKB in tuntap, from Jason
    Wang.

13) Avoid matching qdiscs with a zero handle during lookups, from Cong
    Wang.

14) Fix various endianness bugs in sctp, from Xin Long.

15) Fix tc filter callback races and add selftests which trigger the
    problem, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  selftests: Introduce a new test case to tc testsuite
  selftests: Introduce a new script to generate tc batch file
  net_sched: fix call_rcu() race on act_sample module removal
  net_sched: add rtnl assertion to tcf_exts_destroy()
  net_sched: use tcf_queue_work() in tcindex filter
  net_sched: use tcf_queue_work() in rsvp filter
  net_sched: use tcf_queue_work() in route filter
  net_sched: use tcf_queue_work() in u32 filter
  net_sched: use tcf_queue_work() in matchall filter
  net_sched: use tcf_queue_work() in fw filter
  net_sched: use tcf_queue_work() in flower filter
  net_sched: use tcf_queue_work() in flow filter
  net_sched: use tcf_queue_work() in cgroup filter
  net_sched: use tcf_queue_work() in bpf filter
  net_sched: use tcf_queue_work() in basic filter
  net_sched: introduce a workqueue for RCU callbacks of tc filter
  sctp: fix some type cast warnings introduced since very beginning
  sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
  sctp: fix some type cast warnings introduced by transport rhashtable
  sctp: fix some type cast warnings introduced by stream reconf
  ...
2017-10-29 08:11:49 -07:00
David S. Miller
6c325f4eca Merge branch 'net_sched-fix-races-with-RCU-callbacks'
Cong Wang says:

====================
net_sched: fix races with RCU callbacks

Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:

1. A simple one fixed by Daniel:

commit c78e1746d3ad7d548bdf3fe491898cc453911a49
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

2. A very nasty one fixed by me:

commit 1697c4bb5245649a23f06a144cc38c06715e1b65
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/

Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.

Alternatively, we could also:

a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb5245 ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).

b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.

As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.

Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:32 +09:00
Chris Mi
31c2611b66 selftests: Introduce a new test case to tc testsuite
In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
  # sudo ./tdc.py -d enp4s0f0

This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.

In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Chris Mi
7f07199847 selftests: Introduce a new script to generate tc batch file
# ./tdc_batch.py -h
  usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

  TC batch file generator

  positional arguments:
    device                device name
    file                  batch file name

  optional arguments:
    -h, --help            show this help message and exit
    -n NUMBER, --number NUMBER
                          how many lines in batch file
    -o, --skip_sw         skip_sw (offload), by default skip_hw
    -s, --share_action    all filters share the same action
    -p, --prio            all filters have different prio

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
46e235c15c net_sched: fix call_rcu() race on act_sample module removal
Similar to commit c78e1746d3ad
("net: sched: fix call_rcu() race on classifier module unloads"),
we need to wait for flying RCU callback tcf_sample_cleanup_rcu().

Cc: Yotam Gigi <yotamg@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
2d132eba1d net_sched: add rtnl assertion to tcf_exts_destroy()
After previous patches, it is now safe to claim that
tcf_exts_destroy() is always called with RTNL lock.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
27ce4f05e2 net_sched: use tcf_queue_work() in tcindex filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
d4f84a41dc net_sched: use tcf_queue_work() in rsvp filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
c2f3f31d40 net_sched: use tcf_queue_work() in route filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
c0d378ef12 net_sched: use tcf_queue_work() in u32 filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
df2735ee8e net_sched: use tcf_queue_work() in matchall filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
e071dff2a6 net_sched: use tcf_queue_work() in fw filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
0552c8afa0 net_sched: use tcf_queue_work() in flower filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
94cdb47566 net_sched: use tcf_queue_work() in flow filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
b1b5b04fdb net_sched: use tcf_queue_work() in cgroup filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
e910af676b net_sched: use tcf_queue_work() in bpf filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
c96a48385d net_sched: use tcf_queue_work() in basic filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
7aa0045dad net_sched: introduce a workqueue for RCU callbacks of tc filter
This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.

Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.

On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.

Please also see the code comments below.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
David S. Miller
8c83c88584 Merge branch 'sctp-endianness-fixes'
Xin Long says:

====================
sctp: a bunch of fixes for some sparse warnings

As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
warnings or errors checked by sparse appear. They are all problems
about Endian and type cast.

Most of them are just warnings by which no issues could be caused
while some might be bugs.

This patchset fixes them with four patches basically according to
how they are introduced.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:25 +09:00
Xin Long
978aa04741 sctp: fix some type cast warnings introduced since very beginning
These warnings were found by running 'make C=2 M=net/sctp/'.
They are there since very beginning.

Note after this patch, there still one warning left in
sctp_outq_flush():
  sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)

Since it has been moved to sctp_stream_outq_migrate on net-next,
to avoid the extra job when merging net-next to net, I will post
the fix for it after the merging is done.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00
Xin Long
f6fc6bc0b8 sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
These warnings were found by running 'make C=2 M=net/sctp/'.

Commit d4d6fb5787a6 ("sctp: Try not to change a_rwnd when faking a
SACK from SHUTDOWN.") expected to use the peers old rwnd and add
our flight size to the a_rwnd. But with the wrong Endian, it may
not work as well as expected.

So fix it by converting to the right value.

Fixes: d4d6fb5787a6 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00
Xin Long
8d32503efd sctp: fix some type cast warnings introduced by transport rhashtable
These warnings were found by running 'make C=2 M=net/sctp/'.

They are introduced by not aware of Endian for the port when
coding transport rhashtable patches.

Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00
Xin Long
1da4fc97cb sctp: fix some type cast warnings introduced by stream reconf
These warnings were found by running 'make C=2 M=net/sctp/'.

They are introduced by not aware of Endian when coding stream
reconf patches.

Since commit c0d8bab6ae51 ("sctp: add get and set sockopt for
reconf_enable") enabled stream reconf feature for users, the
Fixes tag below would use it.

Fixes: c0d8bab6ae51 ("sctp: add get and set sockopt for reconf_enable")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00
Cong Wang
50317fce2c net_sched: avoid matching qdisc with zero handle
Davide found the following script triggers a NULL pointer
dereference:

ip l a name eth0 type dummy
tc q a dev eth0 parent :1 handle 1: htb

This is because for a freshly created netdevice noop_qdisc
is attached and when passing 'parent :1', kernel actually
tries to match the major handle which is 0 and noop_qdisc
has handle 0 so is matched by mistake. Commit 69012ae425d7
tries to fix a similar bug but still misses this case.

Handle 0 is not a valid one, should be just skipped. In
fact, kernel uses it as TC_H_UNSPEC.

Fixes: 69012ae425d7 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
Fixes: 59cc1f61f09c ("net: sched:convert qdisc linked list to hashtable")
Reported-by: Davide Caratti <dcaratti@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 17:55:03 +09:00
Xin Long
d04adf1b35 sctp: reset owner sk for data chunks on out queues when migrating a sock
Now when migrating sock to another one in sctp_sock_migrate(), it only
resets owner sk for the data in receive queues, not the chunks on out
queues.

It would cause that data chunks length on the sock is not consistent
with sk sk_wmem_alloc. When closing the sock or freeing these chunks,
the old sk would never be freed, and the new sock may crash due to
the overflow sk_wmem_alloc.

syzbot found this issue with this series:

  r0 = socket$inet_sctp()
  sendto$inet(r0)
  listen(r0)
  accept4(r0)
  close(r0)

Although listen() should have returned error when one TCP-style socket
is in connecting (I may fix this one in another patch), it could also
be reproduced by peeling off an assoc.

This issue is there since very beginning.

This patch is to reset owner sk for the chunks on out queues so that
sk sk_wmem_alloc has correct value after accept one sock or peeloff
an assoc to one sock.

Note that when resetting owner sk for chunks on outqueue, it has to
sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk
first and then sctp_set_owner_w them after changing assoc->base.sk,
due to that sctp_wfree and it's callees are using assoc->base.sk.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 12:06:57 +09:00
David S. Miller
151516fab4 Merge branch 'sockmap-fixes'
John Fastabend says:

====================
net: sockmap fixes

Last two fixes (as far as I know) for sockmap code this round.

First, we are using the qdisc cb structure when making the data end
calculation. This is really just wrong so, store it with the other
metadata in the correct tcp_skb_cb sturct to avoid breaking things.

Next, with recent work to attach multiple programs to a cgroup a
specific enumeration of return codes was agreed upon. However,
I wrote the sk_skb program types before seeing this work and used
a different convention. Patch 2 in the series aligns the return
codes to avoid breaking with this infrastructure and also aligns
with other programming conventions to avoid being the odd duck out
forcing programs to remember SK_SKB programs are different. Pusing
to net because its a user visible change. With this SK_SKB program
return codes are the same as other cgroup program types.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 11:18:49 +09:00
John Fastabend
bfa640757e bpf: rename sk_actions to align with bpf infrastructure
Recent additions to support multiple programs in cgroups impose
a strict requirement, "all yes is yes, any no is no". To enforce
this the infrastructure requires the 'no' return code, SK_DROP in
this case, to be 0.

To apply these rules to SK_SKB program types the sk_actions return
codes need to be adjusted.

This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
SK_ABORTED to remove any chance that the API may allow aborted
program flows to be passed up the stack. This would be incorrect
behavior and allow programs to break existing policies.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 11:18:48 +09:00
John Fastabend
8108a77515 bpf: bpf_compute_data uses incorrect cb structure
SK_SKB program types use bpf_compute_data to store the end of the
packet data. However, bpf_compute_data assumes the cb is stored in the
qdisc layer format. But, for SK_SKB this is the wrong layer of the
stack for this type.

It happens to work (sort of!) because in most cases nothing happens
to be overwritten today. This is very fragile and error prone.
Fortunately, we have another hole in tcp_skb_cb we can use so lets
put the data_end value there.

Note, SK_SKB program types do not use data_meta, they are failed by
sk_skb_is_valid_access().

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 11:18:48 +09:00