964721 Commits

Author SHA1 Message Date
Linus Torvalds
b32649b863 ARC changes for 5.10
- Drop support for EZChip NPS platform
 
  - miscll other fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOXpuCuR6hedrdLCJadfx3eKKwl4FAl+OGm8ACgkQadfx3eKK
 wl6O9BAAh8sA8rHHu0e7me8eguxweOztSTYgnA0RO5SM0l1D+Qt0IKJO+0hZCaDs
 pdjoORRhm7KJiREcG/ADtLDy0XwxdoI/nBPX1Ckd3QW3u1hNwHqCKK51QnDg+e3c
 w9mkDQxYVwihDb1zaVt+cJzxZ80JtNd4p3ad+0JcZwtk8cwLWIXAHNrogwJdL81Q
 TataWXF+F7mrLjEZHheSRUd8LVU39y/IV06jEKBMkEc8Kf+igV82j7LlHxTRrggD
 h2q8bG8JnvEOOEh9D46dQ74Vm3i/D63MBn2J+wDRWYni3vrTlXK45877mdZSI/JL
 0KqEW3jcVT9Ockja0qbeL8RmbeuO6mRgCfaZQplt4Yyib5C6A/sKhyNIyZIlk7w1
 Q9FJ0JdBh5YUF15GOoMNiMHU+P04jcbp0il4atxlp95Ib/J4m+/VU9Z/zeXVEMDH
 eYmEFt/FfdhnsDCDpwgW0/b5PSU+Wc8m8SLeSn5fOgMNO4sxp0sjSFouvtYCkx0w
 kHFc4oMyTGyig+6GTy/vRAZ7sHsQem+SVgXYp9e8ybm6Z2/MlXGljjZkEFF+xoED
 rMYnlkzWUNW0PCdPOKfceBEk2GBfoWCOild/Uej3GfI6VbLze8uKDJXaMbx6Qs77
 pcbl74BrAER6ypj3/V1PhfZhXghur4M/uRQXpDSKDRQTAbrWHRE=
 =6IKf
 -----END PGP SIGNATURE-----

Merge tag 'arc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:
 "The bulk of ARC pull request is removal of EZChip NPS platform which
  was suffering from constant bitrot. In recent years EZChip has gone
  though multiple successive acquisitions and I guess things and people
  move on. I would like to take this opportunity to recognize and thank
  all those good folks (Gilad, Noam, Ofer...) for contributing major
  bits to ARC port (SMP, Big Endian).

  Summary:

   - drop support for EZChip NPS platform

   - misc other fixes"

* tag 'arc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  arc: include/asm: fix typos of "themselves"
  ARC: SMP: fix typo and use "come up" instead of "comeup"
  ARC: [dts] fix the errors detected by dtbs_check
  arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER
  ARC: [plat-eznps]: Drop support for EZChip NPS platform
2020-10-20 09:09:44 -07:00
Tom Rix
58e0cd3e23 PCI: v3-semi: Remove unneeded break
A break is not needed if it is preceded by a return

Link: https://lore.kernel.org/r/20201019190249.7825-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2020-10-20 10:59:55 -05:00
Chris Down
1e66d50ad3 kbuild: Use uname for LINUX_COMPILE_HOST detection
`hostname` may not be present on some systems as it's not mandated by
POSIX/SUSv4. This isn't just a theoretical problem: on Arch Linux,
`hostname` is provided by `inetutils`, which isn't part of the base
distribution.

    ./scripts/mkcompile_h: line 38: hostname: command not found

Use `uname -n` instead, which is more likely to be available (and
mandated by standards).

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-21 00:46:04 +09:00
Mark Wielaard
121c5d08d5 kbuild: Only add -fno-var-tracking-assignments for old GCC versions
Some old GCC versions between 4.5.0 and 4.9.1 might miscompile code
with -fvar-tracking-assingments (which is enabled by default with -g -O2).
Commit 2062afb4f804 ("Fix gcc-4.9.0 miscompilation of load_balance()
in scheduler") added -fno-var-tracking-assignments unconditionally to
work around this. But newer versions of GCC no longer have this bug, so
only add it for versions of GCC before 5.0. This allows various tools
such as a perf probe or gdb debuggers or systemtap to resolve variable
locations using dwarf locations in more code.

Signed-off-by: Mark Wielaard <mark@klomp.org>
Acked-by: Ian Rogers <irogers@google.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-21 00:28:53 +09:00
Rasmus Villemoes
8402ee182c kbuild: remove leftover comment for filechk utility
After commit 43fee2b23895 ("kbuild: do not redirect the first
prerequisite for filechk"), the rule is no longer automatically passed
$< as stdin, so remove the stale comment.

Fixes: 43fee2b23895 ("kbuild: do not redirect the first prerequisite for filechk")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-21 00:28:53 +09:00
Sami Tolvanen
0f6372e522 treewide: remove DISABLE_LTO
This change removes all instances of DISABLE_LTO from
Makefiles, as they are currently unused, and the preferred
method of disabling LTO is to filter out the flags instead.

Note added by Masahiro Yamada:
DISABLE_LTO was added as preparation for GCC LTO, but GCC LTO was
not pulled into the mainline. (https://lkml.org/lkml/2014/4/8/272)

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-21 00:28:53 +09:00
Will Deacon
ea8f8c99a2 arm64: spectre-v2: Favour CPU-specific mitigation at EL2
Spectre-v2 can be mitigated on Falkor CPUs either by calling into
firmware or by issuing a magic, CPU-specific sequence of branches.
Although the latter is faster, the size of the code sequence means that
it cannot be used in the EL2 vectors, and so there is a need for both
mitigations to co-exist in order to achieve optimal performance.

Change the mitigation selection logic for Spectre-v2 so that the
CPU-specific mitigation is used only when the firmware mitigation is
also available, rather than when a firmware mitigation is unavailable.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-20 16:01:57 +01:00
Paolo Bonzini
1b21c8db0e KVM/arm64 updates for Linux 5.10
- New page table code for both hypervisor and guest stage-2
 - Introduction of a new EL2-private host context
 - Allow EL2 to have its own private per-CPU variables
 - Support of PMU event filtering
 - Complete rework of the Spectre mitigation
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl+B2vwPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDI20P/3zj+DT2INf6VrQr9nKg4uxwRk3AQYevRUhs
 id6s6OA9CCXNrN4OuboH3XXcPbI9bYJrp9xuxALAIPzVtLpQs4CLU0tgNclM3zw2
 Fe6u8wEIEFUxuNKlezd1J0gUqskTOloCsscMZ4s72ayDp56VR9fs6Ll/48rvtKXO
 x0FDkouIOAK8Cp/LVhFc0R4ZqaLvFiRr8OlJIqybM5JATl7CEq60jVlZqtoJcjOp
 M/Q/uPtoPZ+yaNizvzhNX9SSX5q7bW2hYqfjveCvmJXLoRZnUzmSiCGHk0DWSw86
 YO0X9cmK/E4feu7EwDppCbVi+FD/+Wcvhs87wAAa6jG2QgHab+OjGOg88gyNAQdL
 APMGALROiTPyC5s9SmrS8eMYZHJku9NRvBjXz5kdrXYhGhMSEmv9RfoBfwJKC4gz
 iqaQ9ejDm+uo6n9p9O8Fng34SqpIzCD1NwRW7AowM2WZH7wkpZ7d0RnbtToNzXsH
 TRKWrSZDydsy/rfL/e6k9CHdoIYCZcrRo/tfc7WGPoO6bwpUZP8O2K/dNz39fUTY
 eQls1fpEFJa8xRJFrxLHl3LuLcL4MuiTfN7TzbJpiwojXCFfUYcBJ3MGniONXIdQ
 2rtl6cx/FhUFekaC8S/k3UDK4i7AqjWsBm9nxRNBwSdwD/ecnqXMljq/DmOfjEmt
 3kOh0WS3
 =mjUF
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.10

- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
2020-10-20 08:14:25 -04:00
Saeed Mirzamohammadi
31cc578ae2 netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:54 +02:00
Jeremy Sowden
64747d5ed1 docs: nf_flowtable: fix typo.
"mailined" should be "mainlined."

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:53 +02:00
Timothée COCAULT
63137bc588 netfilter: ebtables: Fixes dropping of small packets in bridge nat
Fixes an error causing small packets to get dropped. skb_ensure_writable
expects the second parameter to be a length in the ethernet payload.=20
If we want to write the ethernet header (src, dst), we should pass 0.
Otherwise, packets with small payloads (< ETH_ALEN) will get dropped.

Fixes: c1a831167901 ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable")
Signed-off-by: Timothée COCAULT <timothee.cocault@orange.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:53 +02:00
Georg Kohmann
68f9f9c2c3 netfilter: Drop fragmented ndisc packets assembled in netfilter
Fragmented ndisc packets assembled in netfilter not dropped as specified
in RFC 6980, section 5. This behaviour breaks TAHI IPv6 Core Conformance
Tests v6LC.2.1.22/23, V6LC.2.2.26/27 and V6LC.2.3.18.

Setting IP6SKB_FRAGMENTED flag during reassembly.

References: commit b800c3b966bc ("ipv6: drop fragmented ndisc packets by default (RFC 6980)")
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:53 +02:00
Francesco Ruggeri
4f25434bcc netfilter: conntrack: connection timeout after re-register
If the first packet conntrack sees after a re-register is an outgoing
keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
SND.NXT-1.
When the peer correctly acknowledges SND.NXT, tcp_in_window fails
check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
returns false, which cascades into nf_conntrack_in setting
skb->_nfct = 0 and in later conntrack iptables rules not matching.
In cases where iptables are dropping packets that do not match
conntrack rules this can result in idle tcp connections to time out.

v2: adjust td_end when getting the reply rather than when sending out
    the keepalive packet.

Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:53 +02:00
longguang.yue
79dce09ab0 ipvs: adjust the debug info in function set_tcp_state
Outputting client,virtual,dst addresses info when tcp state changes,
which makes the connection debug more clear

Signed-off-by: longguang.yue <bigclouds@163.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:46 +02:00
Nick Desaulniers
3b92fa7485 arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n,
CONFIG_RELOCATABLE=n, we observe the following failure when trying to
link the kernel image with LD=ld.lld:

error: section: .exit.data is not contiguous with other relro sections

ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This
was previously fixed, but only for CONFIG_RELOCATABLE=y.

Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201016175339.2429280-1-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-20 12:35:37 +01:00
Palmer Dabbelt
7bdf468a5b arm64: Fix a broken copyright header in gen_vdso_offsets.sh
I was going to copy this but I didn't want to chase around the build
system stuff so I did it a different way.

Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Link: https://lore.kernel.org/r/20201017002637.503579-1-palmer@dabbelt.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-20 12:33:01 +01:00
Hou Zhiqiang
15b2390634 PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()
NXP Layerscape (ls1028a, ls2088a), dra7xxx and imx6 platforms are either
programmed or statically configured to forward the error triggered by a
link-down state (eg no connected endpoint device) on the system bus for
PCI configuration transactions; these errors are reported as an SError
at system level, which is fatal.

Enumerating a PCI tree when the PCIe link is down is not sensible
either, so even if the link-up check is racy (link can go down after
map_bus() is called) add a link-up check in map_bus() to prevent issuing
configuration transactions when the link is down.

SError report:

 SError Interrupt on CPU2, code 0xbf000002 -- SError
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc5-next-20200914-00001-gf965d3ec86fa #67
 Hardware name: LS1046A RDB Board (DT)
 pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
 pc : pci_generic_config_read+0x3c/0xe0
 lr : pci_generic_config_read+0x24/0xe0
 sp : ffff80001003b7b0
 x29: ffff80001003b7b0 x28: ffff80001003ba74
 x27: ffff000971d96800 x26: ffff00096e77e0a8
 x25: ffff80001003b874 x24: ffff80001003b924
 x23: 0000000000000004 x22: 0000000000000000
 x21: 0000000000000000 x20: ffff80001003b874
 x19: 0000000000000004 x18: ffffffffffffffff
 x17: 00000000000000c0 x16: fffffe0025981840
 x15: ffffb94c75b69948 x14: 62203a383634203a
 x13: 666e6f635f726568 x12: 202c31203d207265
 x11: 626d756e3e2d7375 x10: 656877202c307830
 x9 : 203d206e66766564 x8 : 0000000000000908
 x7 : 0000000000000908 x6 : ffff800010900000
 x5 : ffff00096e77e080 x4 : 0000000000000000
 x3 : 0000000000000003 x2 : 84fa3440ff7e7000
 x1 : 0000000000000000 x0 : ffff800010034000
 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc5-next-20200914-00001-gf965d3ec86fa #67
 Hardware name: LS1046A RDB Board (DT)
 Call trace:
  dump_backtrace+0x0/0x1c0
  show_stack+0x18/0x28
  dump_stack+0xd8/0x134
  panic+0x180/0x398
  add_taint+0x0/0xb0
  arm64_serror_panic+0x78/0x88
  do_serror+0x68/0x180
  el1_error+0x84/0x100
  pci_generic_config_read+0x3c/0xe0
  dw_pcie_rd_other_conf+0x78/0x110
  pci_bus_read_config_dword+0x88/0xe8
  pci_bus_generic_read_dev_vendor_id+0x30/0x1b0
  pci_bus_read_dev_vendor_id+0x4c/0x78
  pci_scan_single_device+0x80/0x100

Link: https://lore.kernel.org/r/20200916054130.8685-1-Zhiqiang.Hou@nxp.com
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
[lorenzo.pieralisi@arm.com: rewrote the commit log, remove Fixes tag]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2020-10-20 11:14:24 +01:00
Juergen Gross
5f7f77400a xen/events: block rogue events for some time
In order to avoid high dom0 load due to rogue guests sending events at
high frequency, block those events in case there was no action needed
in dom0 to handle the events.

This is done by adding a per-event counter, which set to zero in case
an EOI without the XEN_EOI_FLAG_SPURIOUS is received from a backend
driver, and incremented when this flag has been set. In case the
counter is 2 or higher delay the EOI by 1 << (cnt - 2) jiffies, but
not more than 1 second.

In order not to waste memory shorten the per-event refcnt to two bytes
(it should normally never exceed a value of 2). Add an overflow check
to evtchn_get() to make sure the 2 bytes really won't overflow.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:19 +02:00
Juergen Gross
e99502f762 xen/events: defer eoi in case of excessive number of events
In case rogue guests are sending events at high frequency it might
happen that xen_evtchn_do_upcall() won't stop processing events in
dom0. As this is done in irq handling a crash might be the result.

In order to avoid that, delay further inter-domain events after some
time in xen_evtchn_do_upcall() by forcing eoi processing into a
worker on the same cpu, thus inhibiting new events coming in.

The time after which eoi processing is to be delayed is configurable
via a new module parameter "event_loop_timeout" which specifies the
maximum event loop time in jiffies (default: 2, the value was chosen
after some tests showing that a value of 2 was the lowest with an
only slight drop of dom0 network throughput while multiple guests
performed an event storm).

How long eoi processing will be delayed can be specified via another
parameter "event_eoi_delay" (again in jiffies, default 10, again the
value was chosen after testing with different delay values).

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:16 +02:00
Juergen Gross
7beb290caa xen/events: use a common cpu hotplug hook for event channels
Today only fifo event channels have a cpu hotplug callback. In order
to prepare for more percpu (de)init work move that callback into
events_base.c and add percpu_init() and percpu_deinit() hooks to
struct evtchn_ops.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:14 +02:00
Juergen Gross
c44b849cee xen/events: switch user event channels to lateeoi model
Instead of disabling the irq when an event is received and enabling
it again when handled by the user process use the lateeoi model.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:11 +02:00
Juergen Gross
c2711441bc xen/pciback: use lateeoi irq binding
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving pcifront use the lateeoi irq
binding for pciback and unmask the event channel only just before
leaving the event handling function.

Restructure the handling to support that scheme. Basically an event can
come in for two reasons: either a normal request for a pciback action,
which is handled in a worker, or in case the guest has finished an AER
request which was requested by pciback.

When an AER request is issued to the guest and a normal pciback action
is currently active issue an EOI early in order to be able to receive
another event when the AER request has been finished by the guest.

Let the worker processing the normal requests run until no further
request is pending, instead of starting a new worker ion that case.
Issue the EOI only just before leaving the worker.

This scheme allows to drop calling the generic function
xen_pcibk_test_and_schedule_op() after processing of any request as
the handling of both request types is now separated more cleanly.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:08 +02:00
Juergen Gross
c8d647a326 xen/pvcallsback: use lateeoi irq binding
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving pvcallsfront use the lateeoi
irq binding for pvcallsback and unmask the event channel only after
handling all write requests, which are the ones coming in via an irq.

This requires modifying the logic a little bit to not require an event
for each write request, but to keep the ioworker running until no
further data is found on the ring page to be processed.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:07 +02:00
Juergen Gross
86991b6e7e xen/scsiback: use lateeoi irq binding
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving scsifront use the lateeoi
irq binding for scsiback and unmask the event channel only just before
leaving the event handling function.

In case of a ring protocol error don't issue an EOI in order to avoid
the possibility to use that for producing an event storm. This at once
will result in no further call of scsiback_irq_fn(), so the ring_error
struct member can be dropped and scsiback_do_cmd_fn() can signal the
protocol error via a negative return value.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:04 +02:00
Juergen Gross
23025393db xen/netback: use lateeoi irq binding
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving netfront use the lateeoi
irq binding for netback and unmask the event channel only just before
going to sleep waiting for new events.

Make sure not to issue an EOI when none is pending by introducing an
eoi_pending element to struct xenvif_queue.

When no request has been consumed set the spurious flag when sending
the EOI for an interrupt.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:03 +02:00
Juergen Gross
01263a1fab xen/blkback: use lateeoi irq binding
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving blkfront use the lateeoi
irq binding for blkback and unmask the event channel only after
processing all pending requests.

As the thread processing requests is used to do purging work in regular
intervals an EOI may be sent only after having received an event. If
there was no pending I/O request flag the EOI as spurious.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:22:01 +02:00
Juergen Gross
54c9de8989 xen/events: add a new "late EOI" evtchn framework
In order to avoid tight event channel related IRQ loops add a new
framework of "late EOI" handling: the IRQ the event channel is bound
to will be masked until the event has been handled and the related
driver is capable to handle another event. The driver is responsible
for unmasking the event channel via the new function xen_irq_lateeoi().

This is similar to binding an event channel to a threaded IRQ, but
without having to structure the driver accordingly.

In order to support a future special handling in case a rogue guest
is sending lots of unsolicited events, add a flag to xen_irq_lateeoi()
which can be set by the caller to indicate the event was a spurious
one.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:21:59 +02:00
Juergen Gross
f013371974 xen/events: fix race in evtchn_fifo_unmask()
Unmasking a fifo event channel can result in unmasking it twice, once
directly in the kernel and once via a hypercall in case the event was
pending.

Fix that by doing the local unmask only if the event is not pending.

This is part of XSA-332.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2020-10-20 10:21:58 +02:00
Juergen Gross
4d3fe31bd9 xen/events: add a proper barrier to 2-level uevent unmasking
A follow-up patch will require certain write to happen before an event
channel is unmasked.

While the memory barrier is not strictly necessary for all the callers,
the main one will need it. In order to avoid an extra memory barrier
when using fifo event channels, mandate evtchn_unmask() to provide
write ordering.

The 2-level event handling unmask operation is missing an appropriate
barrier, so add it. Fifo event channels are fine in this regard due to
using sync_cmpxchg().

This is part of XSA-332.

Cc: stable@vger.kernel.org
Suggested-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:21:56 +02:00
Juergen Gross
073d0552ea xen/events: avoid removing an event channel while handling it
Today it can happen that an event channel is being removed from the
system while the event handling loop is active. This can lead to a
race resulting in crashes or WARN() splats when trying to access the
irq_info structure related to the event channel.

Fix this problem by using a rwlock taken as reader in the event
handling loop and as writer when deallocating the irq_info structure.

As the observed problem was a NULL dereference in evtchn_from_irq()
make this function more robust against races by testing the irq_info
pointer to be not NULL before dereferencing it.

And finally make all accesses to evtchn_to_irq[row][col] atomic ones
in order to avoid seeing partial updates of an array element in irq
handling. Note that irq handling can be entered only for event channels
which have been valid before, so any not populated row isn't a problem
in this regard, as rows are only ever added and never removed.

This is XSA-331.

Cc: stable@vger.kernel.org
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reported-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:21:51 +02:00
Steve French
acf96fef46 smb3.1.1: do not fail if no encryption required but server doesn't support it
There are cases where the server can return a cipher type of 0 and
it not be an error. For example server supported no encryption types
(e.g. server completely disabled encryption), or the server and
client didn't support any encryption types in common (e.g. if a
server only supported AES256_CCM). In those cases encryption would
not be supported, but that can be ok if the client did not require
encryption on mount and it should not return an error.

In the case in which mount requested encryption ("seal" on mount)
then checks later on during tree connection will return the proper
rc, but if seal was not requested by client, since server is allowed
to return 0 to indicate no supported cipher, we should not fail mount.

Reported-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-20 02:15:56 -05:00
Ido Schimmel
df6afe2f7c nexthop: Fix performance regression in nexthop deletion
While insertion of 16k nexthops all using the same netdev ('dummy10')
takes less than a second, deletion takes about 130 seconds:

# time -p ip -b nexthop.batch
real 0.29
user 0.01
sys 0.15

# time -p ip link set dev dummy10 down
real 131.03
user 0.06
sys 0.52

This is because of repeated calls to synchronize_rcu() whenever a
nexthop is removed from a nexthop group:

# /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
...
    b'finish_task_switch'
    b'schedule'
    b'schedule_timeout'
    b'wait_for_completion'
    b'__wait_rcu_gp'
    b'synchronize_rcu.part.0'
    b'synchronize_rcu'
    b'__remove_nexthop'
    b'remove_nexthop'
    b'nexthop_flush_dev'
    b'nh_netdev_event'
    b'raw_notifier_call_chain'
    b'call_netdevice_notifiers_info'
    b'__dev_notify_flags'
    b'dev_change_flags'
    b'do_setlink'
    b'__rtnl_newlink'
    b'rtnl_newlink'
    b'rtnetlink_rcv_msg'
    b'netlink_rcv_skb'
    b'rtnetlink_rcv'
    b'netlink_unicast'
    b'netlink_sendmsg'
    b'____sys_sendmsg'
    b'___sys_sendmsg'
    b'__sys_sendmsg'
    b'__x64_sys_sendmsg'
    b'do_syscall_64'
    b'entry_SYSCALL_64_after_hwframe'
    -                ip (277)
        126554955

Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().

With this patch deletion of 16k nexthops takes less than a second:

# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04

Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:

# ./fib_nexthops.sh
...
Tests passed: 134
Tests failed:   0

Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 20:07:15 -07:00
Linus Torvalds
270315b823 RISC-V Patches for the 5.10 Merge Window, Part 1
This contains a handful of cleanups and new features, including:
 
 * A handful of cleanups for our page fault handling.
 * Improvements to how we fill out cacheinfo.
 * Support for EFI-based systems.
 
 ---
 
 This contains a merge from the EFI tree that was necessary as some of the EFI
 support landed over there.  It's my first time doing something like this,
 
 I haven't included the set_fs stuff because the base branch it depends on
 hasn't been merged yet.  I'll probably have another merge window PR, as
 there's more in flight (most notably the fix for new binutils I just sent out),
 but I figured there was no reason to delay this any longer.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl+KQ6gTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYibmwD/4qWfOW7R/kUWi08ethcaAhNEWLvqIh
 2/KjGLORw+NTZ1F4pEFyQG5LRd3yWDT/UXh/k8gXINqmdclNV01Z3T+O7WuRlISs
 07i26W1qRpNeJ7lDVhr9foKpeOU/AXvidgoF330nGlyO4HZkYKhK2yB3t8uGWywr
 Zt/EpMJeBIRKzWiLhOgLAdYJthhZ9AlnouNnr9myHnO5Ksel+AZ/BKYvn7ZbHMns
 6vFUxp6392/LERRRIfDqPsTuxPIYMHjuEsGSESLsjAIyq/shgN1knG/C+zwU5DcK
 zUDBt1DEP7Tb45w7VBASSjn1M+cUolz9/c2dBhlVcdBlk1GKF+KILSTmWUBpQ8oP
 ETVAuQK5HTcjy9bVcJMj0Oa3mFshVAAByOH+Wyrdo+qSLkb7y3spPvsL4dyjrKjL
 +pe6C7WvavaEFoQXVWO2sTUBGYt7qDLRdrDgOGBIHylTXhTxf2wYzAF4ZmDROECT
 Qfc7Ac3aIWYvWDmxE+x8OniuclfZ0DndKLKQj6FJWUTIxFZzTxsHK75d47D1ID0S
 ZwAmUd0eYjjwMTO/6AM/Aqu3o8IP4GOXjJf4ijxH9+LjpUhm/ibmHDAUY69sU1WX
 kdX51gQzoEuW7XMVz1HoTSvaGGKtyFDuRxs8RG/tSFaRtznRz0Sro6BpLCeG968n
 k/d5WL/vZZ/NDA==
 =FYs/
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "A handful of cleanups and new features:

   - A handful of cleanups for our page fault handling

   - Improvements to how we fill out cacheinfo

   - Support for EFI-based systems"

* tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits)
  RISC-V: Add page table dump support for uefi
  RISC-V: Add EFI runtime services
  RISC-V: Add EFI stub support.
  RISC-V: Add PE/COFF header for EFI stub
  RISC-V: Implement late mapping page table allocation functions
  RISC-V: Add early ioremap support
  RISC-V: Move DT mapping outof fixmap
  RISC-V: Fix duplicate included thread_info.h
  riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault()
  riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration
  riscv: Add cache information in AUX vector
  riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
  riscv: Set more data to cacheinfo
  riscv/mm/fault: Move access error check to function
  riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault()
  riscv/mm/fault: Simplify mm_fault_error()
  riscv/mm/fault: Move fault error handling to mm_fault_error()
  riscv/mm/fault: Simplify fault error handling
  riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault()
  riscv/mm/fault: Move bad area handling to bad_area()
  ...
2020-10-19 18:18:30 -07:00
Linus Torvalds
d3876ff744 m68knommu: collection of fixes for 5.10
Fixes include:
 . switch to using asm-generic uaccess code
 . fix sparse warnings in signal code
 . fix compilation of ColdFire MMC support
 . support sysrq in ColdFire serial driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAl+M4OUACgkQTiQVqaG9
 L4Bb0A//fsrbpUL6p0NHiuOoP+327shE+Tw+l78yCQRmnxtSnI2cK/Mq7O4/mLI/
 6w/Fisw/EnBAVeG1v6HZ4BoQopTVzFj7/iE4KgOkum0FDhy55VHKhkAWRAX6ZslW
 RozzahotW4LbmhD1IUSmlNU5gIyABm3hSlZKjKJEo90FuGTeMpwnClefjhTSzOuY
 rfKk73PtDHCCRXfF52vweabhR+17Akh2D3gSwje5MSRvIC+7BC3wyE1s4B/pGFX2
 Xw9ziGw8dz9J7wqREKT/AUuylqO++HrGfAeXkNJ6xggI8iOJsxVb943BDgu3P1gF
 MaowvV/sHwwocNBOzqDExobx9OS3/UC124+sSwcPJxBCXDJkMiqvxVGymX+6j/aN
 WXXv+8DVUDL3hKbRsVVssiBo2oQ/pBtZigyB2HHt9z9zIPtyQ03nm0Nud8o+fYm6
 6CT2xdntnBKAjAyP2eu7L3dEBCoK3E6juYMiwdiwys8C2i8SF8hCI+QgXRiSQcml
 dQeyCxZJLxwH1fG3zJByWDUP9Pm6hC3MV1dKGhzbVJnieX0qKn7sHW1CBcvW/X84
 qZp+C7O+DjJs0yfR2QzfJ7gIk1GhJH13HwiXWx4BMEGU4oAUCgajEeEip+XOKol4
 as5xSqcyU8eGNDmzjibvi2Lqnn/CPzjx/CATaduYrpAmao55kg4=
 =QMK9
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:
 "A collection of fixes for 5.10:

   - switch to using asm-generic uaccess code

   - fix sparse warnings in signal code

   - fix compilation of ColdFire MMC support

   - support sysrq in ColdFire serial driver"

* tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  serial: mcf: add sysrq capability
  m68knommu: include SDHC support only when hardware has it
  m68knommu: fix sparse warnings in signal code
  m68knommu: switch to using asm-generic/uaccess.h
2020-10-19 18:12:44 -07:00
Maxim Kochetkov
a15a6afb3b net: dsa: seville: the packet buffer is 2 megabits, not megabytes
The VSC9953 Seville switch has 2 megabits of buffer split into 4360
words of 60 bytes each. 2048 * 1024 is 2 megabytes instead of 2 megabits.
2 megabits is (2048 / 8) * 1024 = 256 * 1024.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: a63ed92d217f ("net: dsa: seville: fix buffer size of the queue system")
Link: https://lore.kernel.org/r/20201019050625.21533-1-fido_max@inbox.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 18:03:42 -07:00
Po-Hsu Lin
26ebd6fed9 selftests: rtnetlink: load fou module for kci_test_encap_fou() test
The kci_test_encap_fou() test from kci_test_encap() in rtnetlink.sh
needs the fou module to work. Otherwise it will fail with:

  $ ip netns exec "$testns" ip fou add port 7777 ipproto 47
  RTNETLINK answers: No such file or directory
  Error talking to the kernel

Add the CONFIG_NET_FOU into the config file as well. Which needs at
least to be set as a loadable module.

Fixes: 6227efc1a20b ("selftests: rtnetlink.sh: add vxlan and fou test cases")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20201019030928.9859-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:55:29 -07:00
Christian Eggers
bc7e343dbd net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tags
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.

Fixes: 7a6ffe764be3 ("net: dsa: point out the tail taggers")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:32:50 -07:00
Valentin Vidic
3bd57b9055 net: korina: cast KSEG0 address to pointer in kfree
Fixes gcc warning:

passing argument 1 of 'kfree' makes pointer from integer without a cast

Fixes: 3af5f0f5c74e ("net: korina: fix kfree of rx/tx descriptor array")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20201018184255.28989-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:00:00 -07:00
Heiner Kallweit
424a646e07 r8169: fix operation under forced interrupt threading
For several network drivers it was reported that using
__napi_schedule_irqoff() is unsafe with forced threading. One way to
fix this is switching back to __napi_schedule, but then we lose the
benefit of the irqoff version in general. As stated by Eric it doesn't
make sense to make the minimal hard irq handlers in drivers using NAPI
a thread. Therefore ensure that the hard irq handler is never
thread-ified.

Fixes: 9a899a35b0d6 ("r8169: switch to napi_schedule_irqoff")
Link: https://lkml.org/lkml/2020/10/18/19
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/4d3ef84a-c812-5072-918a-22a6f6468310@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 16:55:54 -07:00
Martin KaFai Lau
8568c3cefd bpf: selftest: Ensure the return value of the bpf_per_cpu_ptr() must be checked
This patch tests all pointers returned by bpf_per_cpu_ptr() must be
tested for NULL first before it can be accessed.

This patch adds a subtest "null_check", so it moves the ".data..percpu"
existence check to the very beginning and before doing any subtest.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194225.1051596-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Martin KaFai Lau
e710bcc6d9 bpf: selftest: Ensure the return value of bpf_skc_to helpers must be checked
This patch tests:

int bpf_cls(struct __sk_buff *skb)
{
	/* REG_6: sk
	 * REG_7: tp
	 * REG_8: req_sk
	 */

	sk = skb->sk;
	if (!sk)
		return 0;

	tp = bpf_skc_to_tcp_sock(sk);
	req_sk = bpf_skc_to_tcp_request_sock(sk);
	if (!req_sk)
		return 0;

	/* !tp has not been tested, so verifier should reject. */
	return *(__u8 *)tp;
}

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194219.1051314-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Martin KaFai Lau
93c230e3f5 bpf: Enforce id generation for all may-be-null register type
The commit af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
introduces RET_PTR_TO_BTF_ID_OR_NULL and
the commit eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
introduces RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL.
Note that for RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL, the reg0->type
could become PTR_TO_MEM_OR_NULL which is not covered by
BPF_PROBE_MEM.

The BPF_REG_0 will then hold a _OR_NULL pointer type. This _OR_NULL
pointer type requires the bpf program to explicitly do a NULL check first.
After NULL check, the verifier will mark all registers having
the same reg->id as safe to use.  However, the reg->id
is not set for those new _OR_NULL return types.  One of the ways
that may be wrong is, checking NULL for one btf_id typed pointer will
end up validating all other btf_id typed pointers because
all of them have id == 0.  The later tests will exercise
this path.

To fix it and also avoid similar issue in the future, this patch
moves the id generation logic out of each individual RET type
test in check_helper_call().  Instead, it does one
reg_type_may_be_null() test and then do the id generation
if needed.

This patch also adds a WARN_ON_ONCE in mark_ptr_or_null_reg()
to catch future breakage.

The _OR_NULL pointer usage in the bpf_iter_reg.ctx_arg_info is
fine because it just happens that the existing id generation after
check_ctx_access() has covered it.  It is also using the
reg_type_may_be_null() to decide if id generation is needed or not.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194212.1050855-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Linus Torvalds
bbe85027ce Recalling the first round of new code for 5.10, in which we added:
- New feature: Widen inode timestamps and quota grace expiration
   timestamps to support dates through the year 2486.
 - New feature: storing inode btree counts in the AGI to speed up certain
   mount time per-AG block reservation operatoins and add a little more
   metadata redundancy.
 
 For the second round of new code for 5.10:
 - Deprecate the V4 filesystem format, some disused mount options, and some
   legacy sysctl knobs now that we can support dates into the 25th century.
   Note that removal of V4 support will not happen until the early 2030s.
 - Fix some probles with inode realtime flag propagation.
 - Fix some buffer handling issues when growing a rt filesystem.
 - Fix a problem where a BMAP_REMAP unmap call would free rt extents even
   though the purpose of BMAP_REMAP is to avoid freeing the blocks.
 - Strengthen the dabtree online scrubber to check hash values on child
   dabtree blocks.
 - Actually log new intent items created as part of recovering log intent
   items.
 - Fix a bug where quotas weren't attached to an inode undergoing bmap
   intent item recovery.
 - Fix a buffer overrun problem with specially crafted log buffer
   headers.
 - Various cleanups to type usage and slightly inaccurate comments.
 - More cleanups to the xattr, log, and quota code.
 - Don't run the (slower) shared-rmap operations on attr fork mappings.
 - Fix a bug where we failed to check the LSN of finobt blocks during
   replay and could therefore overwrite newer data with older data.
 - Clean up the ugly nested transaction mess that log recovery uses to
   stage intent item recovery in the correct order by creating a proper
   data structure to capture recovered chains.
 - Use the capture structure to resume intent item chains with the
   same log space and block reservations as when they were captured.
 - Fix a UAF bug in bmap intent item recovery where we failed to maintain
   our reference to the incore inode if the bmap operation needed to
   relog itself to continue.
 - Rearrange the defer ops mechanism to finish newly created subtasks
   of a parent task before moving on to the next parent task.
 - Automatically relog intent items in deferred ops chains if doing so
   would help us avoid pinning the log tail.  This will help fix some
   log scaling problems now and will facilitate atomic file updates later.
 - Fix a deadlock in the GETFSMAP implementation by using an internal
   memory buffer to reduce indirect calls and copies to userspace,
   thereby improving its performance by ~20%.
 - Fix various problems when calling growfs on a realtime volume would
   not fully update the filesystem metadata.
 - Fix broken Kconfig asking about deprecated XFS when XFS is disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl+KRuUACgkQ+H93GTRK
 tOtqsBAAm+AZ92DRjOD7/TbU3vJALRKBBUCc6weEYUJKaZUkdYpx7Fn8Si3K8Nu0
 Pxo8qLO8WtP3ECyd+CZgkQgZAhHrjRG+FnCOuNyj1yMguX9CDu4cK0dOh/M64+pM
 BvWPqLfd99mzr7HkQ0SuLIyDMeio3leU4lySAIVpADO3V7WF5ZgHCfEETpOh5Di1
 oIzhYlxHyfK+32u4sXSWsPnogQZwjyn4CyQ+6humK0d089pVB1wbjHaTym7exjSa
 cFhMqS1XDbpMuoF4BXMcx31UTOb+8/S6TKCVsRl61j3XKGzbYKSrLmrSb/r6gFWn
 wyXJGmLok0I2UDnX1ZArIWstJHcgPlTelWrssG8wAnopLSJoU10f8o88m43d0krF
 fCUCac1rKPcisg7CS5njgUkOBknSLeBCeztl59N/8acnkaETPQr0tReDpB4wGGaW
 aGEWBrCbz1QZyfDBttNPQLcreROGukZ8R8MMRl4GiAQwZz5UrTUFeoK6thplHVvp
 ANhpYGdJy4jJ79wt4MNVYUF8U8IRWdn0ddsRx08pLWchC1PH8HH944qrUXAVPYZ+
 MohSQqKtjvaKwZLP86SvCJFs20wEEUxzCSQbz4LTO5aBz3uDfo0LQSOYzPBU+OKp
 E33SNds13nsjeH8HBLtXH3lr3absywLcV2ZMaIGsQSLpE2p8AHA=
 =LuWg
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "The second large pile of new stuff for 5.10, with changes even more
  monumental than last week!

  We are formally announcing the deprecation of the V4 filesystem format
  in 2030. All users must upgrade to the V5 format, which contains
  design improvements that greatly strengthen metadata validation,
  supports reflink and online fsck, and is the intended vehicle for
  handling timestamps past 2038. We're also deprecating the old Irix
  behavioral tweaks in September 2025.

  Coming along for the ride are two design changes to the deferred
  metadata ops subsystem. One of the improvements is to retain correct
  logical ordering of tasks and subtasks, which is a more logical design
  for upper layers of XFS and will become necessary when we add atomic
  file range swaps and commits. The second improvement to deferred ops
  improves the scalability of the log by helping the log tail to move
  forward during long-running operations. This reduces log contention
  when there are a large number of threads trying to run transactions.

  In addition to that, this fixes numerous small bugs in log recovery;
  refactors logical intent log item recovery to remove the last
  remaining place in XFS where we could have nested transactions; fixes
  a couple of ways that intent log item recovery could fail in ways that
  wouldn't have happened in the regular commit paths; fixes a deadlock
  vector in the GETFSMAP implementation (which improves its performance
  by 20%); and fixes serious bugs in the realtime growfs, fallocate, and
  bitmap handling code.

  Summary:

   - Deprecate the V4 filesystem format, some disused mount options, and
     some legacy sysctl knobs now that we can support dates into the
     25th century. Note that removal of V4 support will not happen until
     the early 2030s.

   - Fix some probles with inode realtime flag propagation.

   - Fix some buffer handling issues when growing a rt filesystem.

   - Fix a problem where a BMAP_REMAP unmap call would free rt extents
     even though the purpose of BMAP_REMAP is to avoid freeing the
     blocks.

   - Strengthen the dabtree online scrubber to check hash values on
     child dabtree blocks.

   - Actually log new intent items created as part of recovering log
     intent items.

   - Fix a bug where quotas weren't attached to an inode undergoing bmap
     intent item recovery.

   - Fix a buffer overrun problem with specially crafted log buffer
     headers.

   - Various cleanups to type usage and slightly inaccurate comments.

   - More cleanups to the xattr, log, and quota code.

   - Don't run the (slower) shared-rmap operations on attr fork
     mappings.

   - Fix a bug where we failed to check the LSN of finobt blocks during
     replay and could therefore overwrite newer data with older data.

   - Clean up the ugly nested transaction mess that log recovery uses to
     stage intent item recovery in the correct order by creating a
     proper data structure to capture recovered chains.

   - Use the capture structure to resume intent item chains with the
     same log space and block reservations as when they were captured.

   - Fix a UAF bug in bmap intent item recovery where we failed to
     maintain our reference to the incore inode if the bmap operation
     needed to relog itself to continue.

   - Rearrange the defer ops mechanism to finish newly created subtasks
     of a parent task before moving on to the next parent task.

   - Automatically relog intent items in deferred ops chains if doing so
     would help us avoid pinning the log tail. This will help fix some
     log scaling problems now and will facilitate atomic file updates
     later.

   - Fix a deadlock in the GETFSMAP implementation by using an internal
     memory buffer to reduce indirect calls and copies to userspace,
     thereby improving its performance by ~20%.

   - Fix various problems when calling growfs on a realtime volume would
     not fully update the filesystem metadata.

   - Fix broken Kconfig asking about deprecated XFS when XFS is
     disabled"

* tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
  xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n
  xfs: fix high key handling in the rt allocator's query_range function
  xfs: annotate grabbing the realtime bitmap/summary locks in growfs
  xfs: make xfs_growfs_rt update secondary superblocks
  xfs: fix realtime bitmap/summary file truncation when growing rt volume
  xfs: fix the indent in xfs_trans_mod_dquot
  xfs: do the ASSERT for the arguments O_{u,g,p}dqpp
  xfs: fix deadlock and streamline xfs_getfsmap performance
  xfs: limit entries returned when counting fsmap records
  xfs: only relog deferred intent items if free space in the log gets low
  xfs: expose the log push threshold
  xfs: periodically relog deferred intent items
  xfs: change the order in which child and parent defer ops are finished
  xfs: fix an incore inode UAF in xfs_bui_recover
  xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  xfs: clean up bmap intent item recovery checking
  xfs: xfs_defer_capture should absorb remaining transaction reservation
  xfs: xfs_defer_capture should absorb remaining block reservations
  xfs: proper replay of deferred ops queued during log recovery
  xfs: remove XFS_LI_RECOVERED
  ...
2020-10-19 14:38:46 -07:00
Linus Torvalds
694565356c fuse update for 5.10
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCX4n0/gAKCRDh3BK/laaZ
 PM3jAP4xhaix0j/y3VyaxsUqWg6ZSrjq6X0o9clGMJv27IAtjgD/fJ7ZwzTldojD
 qb7N3utjLiPVRjwFmvsZ8JZ7O7PbwQ0=
 =oUbZ
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Support directly accessing host page cache from virtiofs. This can
   improve I/O performance for various workloads, as well as reducing
   the memory requirement by eliminating double caching. Thanks to Vivek
   Goyal for doing most of the work on this.

 - Allow automatic submounting inside virtiofs. This allows unique
   st_dev/ st_ino values to be assigned inside the guest to files
   residing on different filesystems on the host. Thanks to Max Reitz
   for the patches.

 - Fix an old use after free bug found by Pradeep P V K.

* tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
  virtiofs: calculate number of scatter-gather elements accurately
  fuse: connection remove fix
  fuse: implement crossmounts
  fuse: Allow fuse_fill_super_common() for submounts
  fuse: split fuse_mount off of fuse_conn
  fuse: drop fuse_conn parameter where possible
  fuse: store fuse_conn in fuse_req
  fuse: add submount support to <uapi/linux/fuse.h>
  fuse: fix page dereference after free
  virtiofs: add logic to free up a memory range
  virtiofs: maintain a list of busy elements
  virtiofs: serialize truncate/punch_hole and dax fault path
  virtiofs: define dax address space operations
  virtiofs: add DAX mmap support
  virtiofs: implement dax read/write operations
  virtiofs: introduce setupmapping/removemapping commands
  virtiofs: implement FUSE_INIT map_alignment field
  virtiofs: keep a list of free dax memory ranges
  virtiofs: add a mount option to enable dax
  virtiofs: set up virtio_fs dax_device
  ...
2020-10-19 14:28:30 -07:00
Linus Torvalds
922a763ae1 zonefs changes for 5.10
This pull request introduces the following changes to zonefs:
 
 * Add the "explicit-open" mount option to automatically issue a
   REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file
   is open for writing for the first time. This avoids "insufficient zone
   resources" errors for write operations on some drives with limited
   zone resources or on ZNS drives with a limited number of active zones.
   From Johannes.
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCX4zOWAAKCRDdoc3SxdoY
 dh7PAP9IdcYnR9x6ttd2Aqsm17IBfY6b/TroE70Lm2YTlY0nTgD+IJTwYQG8KQAE
 QHAe6TD6VQfSftOeAOAnjEG64Iv2hQE=
 =vwu+
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs updates from Damien Le Moal:
 "Add an 'explicit-open' mount option to automatically issue a
  REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file
  is open for writing for the first time.

  This avoids 'insufficient zone resources' errors for write operations
  on some drives with limited zone resources or on ZNS drives with a
  limited number of active zones. From Johannes"

* tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: document the explicit-open mount option
  zonefs: open/close zone on file open/close
  zonefs: provide no-lock zonefs_io_error variant
  zonefs: introduce helper for zone management
2020-10-19 13:52:01 -07:00
Alexandre Belloni
35331b506f rtc: r9701: set range
Set range and remove the set_time check. This is a classic BCD RTC.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-6-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni
dfe13cf2ae rtc: r9701: convert to devm_rtc_allocate_device
This allows further improvement of the driver.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-5-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni
8b34134907 rtc: r9701: stop setting RWKCNT
tm_wday is never checked for validity and it is not read back in
r9701_get_datetime. Avoid setting it to stop tripping static checkers:

        drivers/rtc/rtc-r9701.c:109 r9701_set_datetime()
        error: undefined (user controlled) shift '1 << dt->tm_wday'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-4-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni
2a8f3380c9 rtc: r9701: remove useless memset
The RTC core already sets to zero the struct rtc_tie it passes to the
driver, avoid doing it a second time.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-3-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni
7390bec4ed rtc: r9701: stop setting a default time
It doesn't make sense to set the RTC to a default value at probe time. Let
the core handle invalid date and time.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-2-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00