1042874 Commits

Author SHA1 Message Date
Lucas Stach
725cbc7884 drm/etnaviv: exec and MMU state is lost when resetting the GPU
When the GPU is reset both the current exec state, as well as all MMU
state is lost. Move the driver side state tracking into the reset function
to keep hardware and software state from diverging.

Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2021-09-16 10:35:24 +02:00
Lucas Stach
8f3eea9d01 drm/etnaviv: keep MMU context across runtime suspend/resume
The MMU state may be kept across a runtime suspend/resume cycle, as we
avoid a full hardware reset to keep the latency of the runtime PM small.

Don't pretend that the MMU state is lost in driver state. The MMU
context is pushed out when new HW jobs with a different context are
coming in. The only exception to this is when the GPU is unbound, in
which case we need to make sure to also free the last active context.

Cc: stable@vger.kernel.org # 5.4
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2021-09-16 10:35:20 +02:00
Lucas Stach
23e0f5a57d drm/etnaviv: stop abusing mmu_context as FE running marker
While the DMA frontend can only be active when the MMU context is set, the
reverse isn't necessarily true, as the frontend can be stopped while the
MMU state is kept. Stop treating mmu_context being set as a indication that
the frontend is running and instead add a explicit property.

Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2021-09-16 10:35:15 +02:00
Lucas Stach
cda7532916 drm/etnaviv: put submit prev MMU context when it exists
The prev context is the MMU context at the time of the job
queueing in hardware. As a job might be queued multiple times
due to recovery after a GPU hang, we need to make sure to put
the stale prev MMU context from a prior queuing, to avoid the
reference and thus the MMU context leaking.

Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2021-09-16 10:35:11 +02:00
Lucas Stach
78edefc05e drm/etnaviv: return context from etnaviv_iommu_context_get
Being able to have the refcount manipulation in an assignment makes
it much easier to parse the code.

Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2021-09-16 10:34:59 +02:00
Helge Deller
90cc7bed1e parisc: Use absolute_pointer() to define PAGE0
Use absolute_pointer() wrapper for PAGE0 to avoid this compiler warning:

  arch/parisc/kernel/setup.c: In function 'start_parisc':
  error: '__builtin_memcmp_eq' specified bound 8 exceeds source size 0

Signed-off-by: Helge Deller <deller@gmx.de>
Co-Developed-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-16 08:35:42 +02:00
Linus Torvalds
ff1ffd71d5 hyperv-fixes for 5.15-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmFB6pwTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXoo5CAChbzKMMbqBHArnNCO+pKkUWmc7eYqJ
 U368ux75wWEy6ywCUxCHqhwnTrp5KJhyjTPi89V8Vwh+aNG6q86g2dT3I6qsoIby
 Dav9yw1NiExxNzAEiJVH/WgE+WGZUvWqzbKixdZWjDk9DWhVv7h96chik9dvh9SW
 /nm27o4sNmnFETQ+kh/hmX+8T6V8HeqZuL9WrGw4EW9At/WE16vjk47Wm5gJRl+j
 Z1KylALvOiarzzMH3Qx1IxvZ1789JtCIr2b5rHJH8tCPvPF0P2dihm/Wjf6xguyT
 tDMvquBdQnfugbZXQDy58Agp34Dw+fHCFaOmoruJePa78qqBYzujHvW9
 =gBaz
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix kernel crash caused by uio driver (Vitaly Kuznetsov)

 - Remove on-stack cpumask from HV APIC code (Wei Liu)

* tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself
  asm-generic/hyperv: provide cpumask_to_vpset_noself
  Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
2021-09-15 17:18:56 -07:00
Linus Torvalds
453fa43cdb RTC fixes for 5.15
Drivers:
  - cmos: fix a locking issue
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmFBHF4ACgkQY6TcMGxw
 OjJz5w/+NCnCVV1LU+N+SRke7/9rnYBQThtED00z+vIZloT+ELz0joO+Zk+NFIp0
 H0CQN8Hj03nqaeurYzPunhuqYGEpih8twI3UdrCvcJ0v4Nl9/Xz4x/OLq/uiBGdT
 zH15hsle7TMccrcgjAVonzPkVdhaKOgbfvybcAkvUX7rJaHTn8D0T9Ne/3+hEn4f
 KfTMghAa5WeZaD4hQV9IdxG7wXy3Q3i4hEOfhBiAXz4x80WVhs7htLdH8VFlD1kc
 aZ7jlnT6rna7i5SJUX+DhbDx2q+mE443UPmIu8ow8KS7zvhqoDCnMB8lnOe4diOX
 djFxrD9Ql1hvOH5G8VGLbQJb8R7HsKaG7fRW/cG5eSYqspBgd5d6TooiVhI90EjX
 sBIZ4Lq7PopAEH1NDh3ONOSND+D6Q2pN9xrN/yicheQVNUZL9nrXKSWXVZePETHn
 4pofNnsjCWzDcmQFxrjWury9vs3WR5FEyLxHrbt+EcN6z2hZlmjxpWJAsvFRA2jC
 mu9GtvgTRLM1Vhz3dRmS5nwl+2yZ4oBLcrRZ8Q+6JZa9qhmfATawUZJXroihpvZM
 SZavK8OwXiztWuW+lunsRGdxSDW/QK9E5Bydc/gRcfTDRPzWqqsnDIMOXptWSJ1L
 VS58F4/YrPnbdoH8wYWhZT0ZnB7P3ca9oEqxQp0afFcRjHSqkLk=
 =sw2n
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC fix from Alexandre Belloni:
 "Fix a locking issue in the cmos rtc driver"

* tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
2021-09-15 17:06:01 -07:00
Vladimir Oltean
a57d8c217a net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
messages appear:

mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2

(and similarly for other ports)

What happens is that DSA has a policy "even if there are bugs, let's at
least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
dp->mdbs lists, which are supposed to be empty.

But deleting that cleanup code, the warnings go away.

=> the FDB and MDB lists (used for refcounting on shared ports, aka CPU
and DSA ports) will eventually be empty, but are not empty by the time
we tear down those ports. Aka we are deleting them too soon.

The addresses that DSA complains about are host-trapped addresses: the
local addresses of the ports, and the MAC address of the bridge device.

The problem is that offloading those entries happens from a deferred
work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
races with the teardown of the CPU and DSA ports where the refcounting
is kept.

In fact, not only it races, but fundamentally speaking, if we iterate
through the port list linearly, we might end up tearing down the shared
ports even before we delete a DSA user port which has a bridge upper.

So as it turns out, we need to first tear down the user ports (and the
unused ones, for no better place of doing that), then the shared ports
(the CPU and DSA ports). In between, we need to ensure that all work
items scheduled by our switchdev handlers (which only run for user
ports, hence the reason why we tear them down first) have finished.

Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-15 15:09:46 -07:00
Vladimir Oltean
301de697d8 Revert "net: phy: Uniform PHY driver access"
This reverts commit 3ac8eed62596387214869319379c1fcba264d8c6, which did
more than it said on the box, and not only it replaced to_phy_driver
with phydev->drv, but it also removed the "!drv" check, without actually
explaining why that is fine.

That patch in fact breaks suspend/resume on any system which has PHY
devices with no drivers bound.

The stack trace is:

Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
pc : mdio_bus_phy_suspend+0xd8/0xec
lr : dpm_run_callback+0x38/0x90
Call trace:
 mdio_bus_phy_suspend+0xd8/0xec
 dpm_run_callback+0x38/0x90
 __device_suspend+0x108/0x3cc
 dpm_suspend+0x140/0x210
 dpm_suspend_start+0x7c/0xa0
 suspend_devices_and_enter+0x13c/0x540
 pm_suspend+0x2a4/0x330

Examples why that assumption is not fine:

- There is an MDIO bus with a PHY device that doesn't have a specific
  PHY driver loaded, because mdiobus_register() automatically creates a
  PHY device for it but there is no specific PHY driver in the system.
  Normally under those circumstances, the generic PHY driver will be
  bound lazily to it (at phy_attach_direct time). But some Ethernet
  drivers attach to their PHY at .ndo_open time. Until then it, the
  to-be-driven-by-genphy PHY device will not have a driver. The blamed
  patch amounts to saying "you need to open all net devices before the
  system can suspend, to avoid the NULL pointer dereference".

- There is any raw MDIO device which has 'plausible' values in the PHY
  ID registers 2 and 3, which is located on an MDIO bus whose driver
  does not set bus->phy_mask = ~0 (which prevents auto-scanning of PHY
  devices). An example could be a MAC's internal MDIO bus with PCS
  devices on it, for serial links such as SGMII. PHY devices will get
  created for those PCSes too, due to that MDIO bus auto-scanning, and
  although those PHY devices are not used, they do not bother anybody
  either. PCS devices are usually managed in Linux as raw MDIO devices.
  Nonetheless, they do not have a PHY driver, nor does anybody attempt
  to connect to them (because they are not a PHY), and therefore this
  patch breaks that.

The goal itself of the patch is questionable, so I am going for a
straight revert. to_phy_driver does not seem to have a need to be
replaced by phydev->drv, in fact that might even trigger code paths
which were not given too deep of a thought.

For instance:

phy_probe populates phydev->drv at the beginning, but does not clean it
up on any error (including EPROBE_DEFER). So if the phydev driver
requests probe deferral, phydev->drv will remain populated despite there
being no driver bound.

If a system suspend starts in between the initial probe deferral request
and the subsequent probe retry, we will be calling the phydev->drv->suspend
method, but _before_ any phydev->drv->probe call has succeeded.

That is to say, if the phydev->drv is allocating any driver-private data
structure in ->probe, it pretty much expects that data structure to be
available in ->suspend. But it may not. That is a pretty insane
environment to present to PHY drivers.

In the code structure before the blamed patch, mdio_bus_phy_may_suspend
would just say "no, don't suspend" to any PHY device which does not have
a driver pointer _in_the_device_structure_ (not the phydev->drv). That
would essentially ensure that ->suspend will never get called for a
device that has not yet successfully completed probe. This is the code
structure the patch is returning to, via the revert.

Fixes: 3ac8eed62596 ("net: phy: Uniform PHY driver access")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210914140515.2311548-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-15 15:06:46 -07:00
Vladimir Oltean
6a52e73368 net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
DSA supports connecting to a phy-handle, and has a fallback to a non-OF
based method of connecting to an internal PHY on the switch's own MDIO
bus, if no phy-handle and no fixed-link nodes were present.

The -ENODEV error code from the first attempt (phylink_of_phy_connect)
is what triggers the second attempt (phylink_connect_phy).

However, when the first attempt returns a different error code than
-ENODEV, this results in an unbalance of calls to phylink_create and
phylink_destroy by the time we exit the function. The phylink instance
has leaked.

There are many other error codes that can be returned by
phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
So this is a practical issue too.

Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-15 15:03:36 -07:00
Linus Torvalds
b7213ffa0e qnx4: avoid stringop-overread errors
The qnx4 directory entries are 64-byte blocks that have different
contents depending on the a status byte that is in the last byte of the
block.

In particular, a directory entry can be either a "link info" entry with
a 48-byte name and pointers to the real inode information, or an "inode
entry" with a smaller 16-byte name and the full inode information.

But the code was written to always just treat the directory name as if
it was part of that "inode entry", and just extend the name to the
longer case if the status byte said it was a link entry.

That work just fine and gives the right results, but now that gcc is
tracking data structure accesses much more, the code can trigger a
compiler error about using up to 48 bytes (the long name) in a structure
that only has that shorter name in it:

   fs/qnx4/dir.c: In function ‘qnx4_readdir’:
   fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
      51 |                         size = strnlen(de->di_fname, size);
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from fs/qnx4/qnx4.h:3,
                    from fs/qnx4/dir.c:16:
   include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
      45 |         char            di_fname[QNX4_SHORT_NAME_MAX];
         |                         ^~~~~~~~

which is because the source code doesn't really make this whole "one of
two different types" explicit.

Fix this by introducing a very explicit union of the two types, and
basically explaining to the compiler what is really going on.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 13:56:37 -07:00
Linus Torvalds
fc7c028dcd sparc: avoid stringop-overread errors
The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
didn't describe to the compiler how that header is then followed by the
data that the header describes.

As a result, gcc is now unhappy since it does stricter pointer range
tracking, and doesn't understand about how these things work.  This
results in various errors like:

    arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
    arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
      647 |                 if (!strcmp(names + ep[ret].name_offset, name))
          |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which are easily avoided by just describing 'struct mdesc_hdr' better,
and making the node_block() helper function look into that unsized
data[] that follows the header.

This makes the sparc64 build happy again at least for my cross-compiler
version (gcc version 11.2.1).

Link: https://lore.kernel.org/lkml/CAHk-=wi4NW3NC0xWykkw=6LnjQD6D_rtRtxY9g8gQAJXtQMi8A@mail.gmail.com/
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 13:42:33 -07:00
Linus Torvalds
d6efd3f187 Merge branch 'absolute-pointer' (patches from Guenter)
Merge absolute_pointer macro series from Guenter Roeck:
 "Kernel test builds currently fail for several architectures with error
  messages such as the following.

  drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
  arch/m68k/include/asm/string.h:72:25: error:
        '__builtin_memcpy' reading 6 bytes from a region of size 0
                [-Werror=stringop-overread]

  Such warnings may be reported by gcc 11.x for string and memory
  operations on fixed addresses if gcc's builtin functions are used for
  those operations.

  This series introduces absolute_pointer() to fix the problem.
  absolute_pointer() disassociates a pointer from its originating symbol
  type and context, and thus prevents gcc from making assumptions about
  pointers passed to memory operations"

* emailed patches from Guenter Roeck <linux@roeck-us.net>:
  alpha: Use absolute_pointer to define COMMAND_LINE
  alpha: Move setup.h out of uapi
  net: i825xx: Use absolute_pointer for memcpy from fixed memory location
  compiler.h: Introduce absolute_pointer macro
2021-09-15 12:11:48 -07:00
Guenter Roeck
ebdc20d7bc alpha: Use absolute_pointer to define COMMAND_LINE
alpha:allmodconfig fails to build with the following error
when using gcc 11.x.

  arch/alpha/kernel/setup.c: In function 'setup_arch':
  arch/alpha/kernel/setup.c:493:13: error:
	'strcmp' reading 1 or more bytes from a region of size 0

Avoid the problem by declaring COMMAND_LINE as absolute_pointer().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 12:04:28 -07:00
Guenter Roeck
3cb8b1537f alpha: Move setup.h out of uapi
Most of the contents of setup.h have no value for userspace
applications.  The file was probably moved to uapi accidentally.

Keep the file in uapi to define the alpha-specific COMMAND_LINE_SIZE.
Move all other defines to arch/alpha/include/asm/setup.h.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 12:04:28 -07:00
Guenter Roeck
dff2d13114 net: i825xx: Use absolute_pointer for memcpy from fixed memory location
gcc 11.x reports the following compiler warning/error.

  drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
  arch/m68k/include/asm/string.h:72:25: error:
	'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]

Use absolute_pointer() to work around the problem.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 12:04:28 -07:00
Guenter Roeck
f6b5f1a569 compiler.h: Introduce absolute_pointer macro
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as

  drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
  arch/m68k/include/asm/string.h:72:25: error:
	'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]

Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 12:04:28 -07:00
Li Jinlin
858560b276 blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
KASAN reports a use-after-free report when doing fuzz test:

[693354.104835] ==================================================================
[693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
[693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338

[693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147
[693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018
[693354.105612] Call Trace:
[693354.105621]  dump_stack+0xf1/0x19b
[693354.105626]  ? show_regs_print_info+0x5/0x5
[693354.105634]  ? printk+0x9c/0xc3
[693354.105638]  ? cpumask_weight+0x1f/0x1f
[693354.105648]  print_address_description+0x70/0x360
[693354.105654]  kasan_report+0x1b2/0x330
[693354.105659]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105665]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105670]  bfq_io_set_weight_legacy+0xd3/0x160
[693354.105675]  ? bfq_cpd_init+0x20/0x20
[693354.105683]  cgroup_file_write+0x3aa/0x510
[693354.105693]  ? ___slab_alloc+0x507/0x540
[693354.105698]  ? cgroup_file_poll+0x60/0x60
[693354.105702]  ? 0xffffffff89600000
[693354.105708]  ? usercopy_abort+0x90/0x90
[693354.105716]  ? mutex_lock+0xef/0x180
[693354.105726]  kernfs_fop_write+0x1ab/0x280
[693354.105732]  ? cgroup_file_poll+0x60/0x60
[693354.105738]  vfs_write+0xe7/0x230
[693354.105744]  ksys_write+0xb0/0x140
[693354.105749]  ? __ia32_sys_read+0x50/0x50
[693354.105760]  do_syscall_64+0x112/0x370
[693354.105766]  ? syscall_return_slowpath+0x260/0x260
[693354.105772]  ? do_page_fault+0x9b/0x270
[693354.105779]  ? prepare_exit_to_usermode+0xf9/0x1a0
[693354.105784]  ? enter_from_user_mode+0x30/0x30
[693354.105793]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.105875] Allocated by task 1453337:
[693354.106001]  kasan_kmalloc+0xa0/0xd0
[693354.106006]  kmem_cache_alloc_node_trace+0x108/0x220
[693354.106010]  bfq_pd_alloc+0x96/0x120
[693354.106015]  blkcg_activate_policy+0x1b7/0x2b0
[693354.106020]  bfq_create_group_hierarchy+0x1e/0x80
[693354.106026]  bfq_init_queue+0x678/0x8c0
[693354.106031]  blk_mq_init_sched+0x1f8/0x460
[693354.106037]  elevator_switch_mq+0xe1/0x240
[693354.106041]  elevator_switch+0x25/0x40
[693354.106045]  elv_iosched_store+0x1a1/0x230
[693354.106049]  queue_attr_store+0x78/0xb0
[693354.106053]  kernfs_fop_write+0x1ab/0x280
[693354.106056]  vfs_write+0xe7/0x230
[693354.106060]  ksys_write+0xb0/0x140
[693354.106064]  do_syscall_64+0x112/0x370
[693354.106069]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106114] Freed by task 1453336:
[693354.106225]  __kasan_slab_free+0x130/0x180
[693354.106229]  kfree+0x90/0x1b0
[693354.106233]  blkcg_deactivate_policy+0x12c/0x220
[693354.106238]  bfq_exit_queue+0xf5/0x110
[693354.106241]  blk_mq_exit_sched+0x104/0x130
[693354.106245]  __elevator_exit+0x45/0x60
[693354.106249]  elevator_switch_mq+0xd6/0x240
[693354.106253]  elevator_switch+0x25/0x40
[693354.106257]  elv_iosched_store+0x1a1/0x230
[693354.106261]  queue_attr_store+0x78/0xb0
[693354.106264]  kernfs_fop_write+0x1ab/0x280
[693354.106268]  vfs_write+0xe7/0x230
[693354.106271]  ksys_write+0xb0/0x140
[693354.106275]  do_syscall_64+0x112/0x370
[693354.106280]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106329] The buggy address belongs to the object at ffff888be0a35580
                 which belongs to the cache kmalloc-1k of size 1024
[693354.106736] The buggy address is located 228 bytes inside of
                 1024-byte region [ffff888be0a35580, ffff888be0a35980)
[693354.107114] The buggy address belongs to the page:
[693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0
[693354.107606] flags: 0x17ffffc0008100(slab|head)
[693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080
[693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[693354.108278] page dumped because: kasan: bad access detected

[693354.108511] Memory state around the buggy address:
[693354.108671]  ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[693354.116396]  ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.132421]                                                        ^
[693354.140284]  ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.147912]  ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.155281] ==================================================================

blkgs are protected by both queue and blkcg locks and holding
either should stabilize them. However, the path of destroying
blkg policy data is only protected by queue lock in
blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks
can get the blkg policy data before the blkg policy data is
destroyed, and use it after destroyed, which will result in a
use-after-free.

CPU0                             CPU1
blkcg_deactivate_policy
  spin_lock_irq(&q->queue_lock)
                                 bfq_io_set_weight_legacy
                                   spin_lock_irq(&blkcg->lock)
                                   blkg_to_bfqg(blkg)
                                     pd_to_bfqg(blkg->pd[pol->plid])
                                     ^^^^^^blkg->pd[pol->plid] != NULL
                                           bfqg != NULL
  pol->pd_free_fn(blkg->pd[pol->plid])
    pd_to_bfqg(blkg->pd[pol->plid])
    bfqg_put(bfqg)
      kfree(bfqg)
  blkg->pd[pol->plid] = NULL
  spin_unlock_irq(q->queue_lock);
                                   bfq_group_set_weight(bfqg, val, 0)
                                     bfqg->entity.new_weight
                                     ^^^^^^trigger uaf here
                                   spin_unlock_irq(&blkcg->lock);

Fix by grabbing the matching blkcg lock before trying to
destroy blkg policy data.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210914042605.3260596-1-lijinlin3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 12:03:18 -06:00
Yanfei Xu
6f5ddde410 blkcg: fix memory leak in blk_iolatency_init
BUG: memory leak
unreferenced object 0xffff888129acdb80 (size 96):
  comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s)
  hex dump (first 32 bytes):
    20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff   G...... ..)....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline]
    [<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline]
    [<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724
    [<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185
    [<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566
    [<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline]
    [<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124
    [<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344
    [<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline]
    [<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516
    [<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline]
    [<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline]
    [<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline]
    [<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
    [<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Once blk_throtl_init() queue init failed, blkcg_iolatency_exit() will
not be invoked for cleanup. That leads a memory leak. Swap the
blk_throtl_init() and blk_iolatency_init() calls can solve this.

Reported-by: syzbot+01321b15cc98e6bf96d6@syzkaller.appspotmail.com
Fixes: 19688d7f9592 (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls)
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210915072426.4022924-1-yanfei.xu@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 11:57:44 -06:00
Masami Hiramatsu
80be5998ad tools/bootconfig: Define memblock_free_ptr() to fix build error
The lib/bootconfig.c file is shared with the 'bootconfig' tooling, and
as a result, the changes incommit 77e02cf57b6c ("memblock: introduce
saner 'memblock_free_ptr()' interface") need to also be reflected in the
tooling header file.

So define the new memblock_free_ptr() wrapper, and remove unused __pa()
and memblock_free().

Fixes: 77e02cf57b6c ("memblock: introduce saner 'memblock_free_ptr()' interface")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-15 09:49:48 -07:00
Pavel Begunkov
b66ceaf324 io_uring: move iopoll reissue into regular IO path
230d50d448acb ("io_uring: move reissue into regular IO path")
made non-IOPOLL I/O to not retry from ki_complete handler. Follow it
steps and do the same for IOPOLL. Same problems, same implementation,
same -EAGAIN assumptions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 09:22:35 -06:00
Jens Axboe
7dedd3e180 Revert "iov_iter: track truncated size"
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa.

We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 09:22:35 -06:00
Jens Axboe
cd65869512 io_uring: use iov_iter state save/restore helpers
Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.

We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 09:22:32 -06:00
Jens Axboe
65ed1e692f nvme fixes for Linux 5.15
- fix ANA state updates when a namespace is not present (Anton Eidelman)
  - nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
    (Dan Carpenter)
  - avoid race in shutdown namespace removal (Daniel Wagner)
  - fix io_work priority inversion in nvme-tcp (Keith Busch)
  - destroy cm id before destroy qp to avoid use after free (Ruozhu Li)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmFB+aQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOfPg//fREqsvY1eMkexP/fOyrpVpLH1lLLM2SHUYwyQOtg
 4CVxl2KU8YGyGLGpAMO6ToctrNV83oUQUQlgy7PyQQpUzr6m4sJm7Lbxzvqhyh96
 MKeHRcWo173twiCLICx4GADSmlwofmwUSzBXdhEtt1gNmg26T+nnDewHpSqLY5Ax
 BNAtFkAGTZodupVKUNsI+tqAyXY3Bgk069YqDiZfBCqU62JzLj6QX0aVXcpS2kcv
 bVpJ0kbnuoUkHG/eIrLZeiyvikxQWx1KaLaL3Y24wrdkVWmZJpCYbg/gXq5v9GUQ
 U8yq/iuOSTtfE0+7hNCyNz8tNLfzxejd0jDCfQcPwXOuzIfNkdzdqhkhste2HYB5
 uriDj1eTQq1siH6kKu8oYQBMSxMSw9wM5q71d5QX2XJzw3hl9bqw5GqiT8Gjk1pX
 K759Tj+IMFNa1SqUGI5IBfkHz4vvuwOr+pO60fwd3CzajOI8mNnNAuJ+1I2UJaEP
 cgq+coD950BsWCyhL+XO/NhYPUSugI331JQqlIkVPPYxqBCJK22c7rZusM0Sm4Gq
 aweWKIxtI9cbsrlf6WfJdyHf5clZ9AjMmWEzHFbDeGqUhb3pTFoT5H/UqSw9/6xe
 pi7iKS6lBoZAoWKfyt9bOKfmaDlcrylAxPQ/LfyJGSTy5Buj5bcc4FO5rRJ+BIay
 t88=
 =wad8
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme into block-5.15

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.15

 - fix ANA state updates when a namespace is not present (Anton Eidelman)
 - nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
   (Dan Carpenter)
 - avoid race in shutdown namespace removal (Daniel Wagner)
 - fix io_work priority inversion in nvme-tcp (Keith Busch)
 - destroy cm id before destroy qp to avoid use after free (Ruozhu Li)"

* tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme:
  nvme-tcp: fix io_work priority inversion
  nvme-rdma: destroy cm id before destroy qp to avoid use after free
  nvme-multipath: fix ANA state updates when a namespace is not present
  nvme: avoid race in shutdown namespace removal
  nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()
2021-09-15 07:53:32 -06:00
Jan Beulich
d859ed25b2 swiotlb-xen: drop DEFAULT_NSLABS
It was introduced by 4035b43da6da ("xen-swiotlb: remove xen_set_nslabs")
and then not removed by 2d29960af0be ("swiotlb: dynamically allocate
io_tlb_default_mem").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/15259326-209a-1d11-338c-5018dc38abe8@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
7fd880a38c swiotlb-xen: arrange to have buffer info logged
I consider it unhelpful that address and size of the buffer aren't put
in the log file; it makes diagnosing issues needlessly harder. The
majority of callers of swiotlb_init() also passes 1 for the "verbose"
parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/2e3c8e68-36b2-4ae9-b829-bf7f75d39d47@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
68573c1b5c swiotlb-xen: drop leftover __ref
Commit a98f565462f0 ("xen-swiotlb: split xen_swiotlb_init") should not
only have added __init to the split off function, but also should have
dropped __ref from the one left.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/7cd163e1-fe13-270b-384c-2708e8273d34@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
cabb7f89b2 swiotlb-xen: limit init retries
Due to the use of max(1024, ...) there's no point retrying (and issuing
bogus log messages) when the number of slabs is already no larger than
this minimum value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/984fa426-2b7b-4b77-5ce8-766619575b7f@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
79ca5f778a swiotlb-xen: suppress certain init retries
Only on the 2nd of the paths leading to xen_swiotlb_init()'s "error"
label it is useful to retry the allocation; the first one did already
iterate through all possible order values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/56477481-87da-4962-9661-5e1b277efde0@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
d9a688add3 swiotlb-xen: maintain slab count properly
Generic swiotlb code makes sure to keep the slab count a multiple of the
number of slabs per segment. Yet even without checking whether any such
assumption is made elsewhere, it is easy to see that xen_swiotlb_fixup()
might alter unrelated memory when calling xen_create_contiguous_region()
for the last segment, when that's not a full one - the function acts on
full order-N regions, not individual pages.

Align the slab count suitably when halving it for a retry. Add a build
time check and a runtime one. Replace the no longer useful local
variable "slabs" by an "order" one calculated just once, outside of the
loop. Re-use "order" for calculating "dma_bits", and change the type of
the latter as well as the one of "i" while touching this anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/dc054cb0-bec4-4db0-fc06-c9fc957b6e66@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
4c092c5901 swiotlb-xen: fix late init retry
The commit referenced below removed the assignment of "bytes" from
xen_swiotlb_init() without - like done for xen_swiotlb_init_early() -
adding an assignment on the retry path, thus leading to excessively
sized allocations upon retries.

Fixes: 2d29960af0be ("swiotlb: dynamically allocate io_tlb_default_mem")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/778299d6-9cfd-1c13-026e-25ee5d14ecb3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
ce6a80d1b2 swiotlb-xen: avoid double free
Of the two paths leading to the "error" label in xen_swiotlb_init() one
didn't allocate anything, while the other did already free what was
allocated.

Fixes: b82776005369 ("xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>

Link: https://lore.kernel.org/r/ce9c2adb-8a52-6293-982a-0d6ece943ac6@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Jan Beulich
45da234467 xen/pvcalls: backend can be a module
It's not clear to me why only the frontend has been tristate. Switch the
backend to be, too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

Link: https://lore.kernel.org/r/54a6070c-92bb-36a3-2fc0-de9ccca438c5@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Juergen Gross
36c9b5929b xen: fix usage of pmd_populate in mremap for pv guests
Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page
table entries") introduced a regression when running as Xen PV guest.

Today pmd_populate() for Xen PV assumes that the PFN inserted is
referencing a not yet used page table. In case of move_normal_pmd()
this is not true, resulting in WARN splats like:

[34321.304270] ------------[ cut here ]------------
[34321.304277] WARNING: CPU: 0 PID: 23628 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x176/0x1a0
[34321.304288] Modules linked in:
[34321.304291] CPU: 0 PID: 23628 Comm: apt-get Not tainted 5.14.1-20210906-doflr-mac80211debug+ #1
[34321.304294] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[34321.304296] RIP: e030:xen_mc_flush+0x176/0x1a0
[34321.304300] Code: 89 45 18 48 c1 e9 3f 48 89 ce e9 20 ff ff ff e8 60 03 00 00 66 90 5b 5d 41 5c 41 5d c3 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 10 97 aa 82 31 db 49 c7 c5 38 97 aa 82 65
[34321.304303] RSP: e02b:ffffc90000a97c90 EFLAGS: 00010002
[34321.304305] RAX: ffff88807d416398 RBX: ffff88807d416350 RCX: ffff88807d416398
[34321.304306] RDX: 0000000000000001 RSI: 0000000000000001 RDI: deadbeefdeadf00d
[34321.304308] RBP: ffff88807d416300 R08: aaaaaaaaaaaaaaaa R09: ffff888006160cc0
[34321.304309] R10: deadbeefdeadf00d R11: ffffea000026a600 R12: 0000000000000000
[34321.304310] R13: ffff888012f6b000 R14: 0000000012f6b000 R15: 0000000000000001
[34321.304320] FS:  00007f5071177800(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[34321.304322] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[34321.304323] CR2: 00007f506f542000 CR3: 00000000160cc000 CR4: 0000000000000660
[34321.304326] Call Trace:
[34321.304331]  xen_alloc_pte+0x294/0x320
[34321.304334]  move_pgt_entry+0x165/0x4b0
[34321.304339]  move_page_tables+0x6fa/0x8d0
[34321.304342]  move_vma.isra.44+0x138/0x500
[34321.304345]  __x64_sys_mremap+0x296/0x410
[34321.304348]  do_syscall_64+0x3a/0x80
[34321.304352]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[34321.304355] RIP: 0033:0x7f507196301a
[34321.304358] Code: 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 0e 0c 00 f7 d8 64 89 01 48
[34321.304360] RSP: 002b:00007ffda1eecd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[34321.304362] RAX: ffffffffffffffda RBX: 000056205f950f30 RCX: 00007f507196301a
[34321.304363] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f506dc56000
[34321.304364] RBP: 0000000001a00000 R08: 0000000000000010 R09: 0000000000000004
[34321.304365] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f506dc56060
[34321.304367] R13: 00007f506dc56000 R14: 00007f506dc56060 R15: 000056205f950f30
[34321.304368] ---[ end trace a19885b78fe8f33e ]---
[34321.304370] 1 of 2 multicall(s) failed: cpu 0
[34321.304371]   call  2: op=12297829382473034410 arg=[aaaaaaaaaaaaaaaa] result=-22

Fix that by modifying xen_alloc_ptpage() to only pin the page table in
case it wasn't pinned already.

Fixes: 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries")
Cc: <stable@vger.kernel.org>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210908073640.11299-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:04 +02:00
Juergen Gross
f68aa100d8 xen: reset legacy rtc flag for PV domU
A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
RTC flag. Otherwise the following WARN splat will occur at boot:

[    1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
[    1.333404] Modules linked in:
[    1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-default+ #282
[    1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
[    1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
[    1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
[    1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
[    1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
[    1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
[    1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
[    1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
[    1.333404] FS:  0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
[    1.333404] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
[    1.333404] Call Trace:
[    1.333404]  ? wakeup_sources_sysfs_init+0x30/0x30
[    1.333404]  ? rdinit_setup+0x2b/0x2b
[    1.333404]  early_resume_init+0x23/0xa4
[    1.333404]  ? cn_proc_init+0x36/0x36
[    1.333404]  do_one_initcall+0x3e/0x200
[    1.333404]  kernel_init_freeable+0x232/0x28e
[    1.333404]  ? rest_init+0xd0/0xd0
[    1.333404]  kernel_init+0x16/0x120
[    1.333404]  ret_from_fork+0x1f/0x30

Cc: <stable@vger.kernel.org>
Fixes: 8d152e7a5c7537 ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210903084937.19392-3-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-09-15 08:42:03 +02:00
Randy Dunlap
7366c23ff4 ptp: dp83640: don't define PAGE0
Building dp83640.c on arch/parisc/ produces a build warning for
PAGE0 being redefined. Since the macro is not used in the dp83640
driver, just make it a comment for documentation purposes.

In file included from ../drivers/net/phy/dp83640.c:23:
../drivers/net/phy/dp83640_reg.h:8: warning: "PAGE0" redefined
    8 | #define PAGE0                     0x0000
                 from ../drivers/net/phy/dp83640.c:11:
../arch/parisc/include/asm/page.h:187: note: this is the location of the previous definition
  187 | #define PAGE0   ((struct zeropage *)__PAGE_OFFSET)

Fixes: cb646e2b02b2 ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210913220605.19682-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-14 20:03:24 -07:00
Adrian Bunk
52ce14c134 bnx2x: Fix enabling network interfaces without VFs
This function is called to enable SR-IOV when available,
not enabling interfaces without VFs was a regression.

Fixes: 65161c35554f ("bnx2x: Fix missing error code in bnx2x_iov_init_one()")
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reported-by: YunQiang Su <wzssyqa@gmail.com>
Tested-by: YunQiang Su <wzssyqa@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20210912190523.27991-1-bunk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-14 19:26:48 -07:00
Christoph Hellwig
9da4c7276e nvme: remove the call to nvme_update_disk_info in nvme_ns_remove
There is no need to explicitly unregister the integrity profile when
deleting the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14 20:03:30 -06:00
Lihong Kou
3df49967f6 block: flush the integrity workqueue in blk_integrity_unregister
When the integrity profile is unregistered there can still be integrity
reads queued up which could see a NULL verify_fn as shown by the race
window below:

CPU0                                    CPU1
  process_one_work                      nvme_validate_ns
    bio_integrity_verify_fn                nvme_update_ns_info
	                                     nvme_update_disk_info
	                                       blk_integrity_unregister
                                               ---set queue->integrity as 0
	bio_integrity_process
	--access bi->profile->verify_fn(bi is a pointer of queue->integity)

Before calling blk_integrity_unregister in nvme_update_disk_info, we must
make sure that there is no work item in the kintegrityd_wq. Just call
blk_flush_integrity to flush the work queue so the bug can be resolved.

Signed-off-by: Lihong Kou <koulihong@huawei.com>
[hch: split up and shortened the changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14 20:03:30 -06:00
Christoph Hellwig
783a40a1b3 block: check if a profile is actually registered in blk_integrity_unregister
While clearing the profile itself is harmless, we really should not clear
the stable writes flag if it wasn't set due to a registered integrity
profile.

Reported-by: Lihong Kou <koulihong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14 20:03:30 -06:00
Huang Rui
3ca706c189 drm/ttm: fix type mismatch error on sparc64
On sparc64, __fls() returns an "int", but the drm TTM code expected it
to be "unsigned long" as on x86.  As a result, on sparc (and arc, and
m68k) you get build errors because 'min()' checks that the types match.

As suggested by Linus, it can use min_t instead of min to force the type
to be "unsigned int".

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 13:36:02 -07:00
Linus Torvalds
77e02cf57b memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.

Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:

   https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/

or just random memory corruption if the debug checks don't catch it:

   https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/

I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.

I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.

So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer.  And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 13:23:22 -07:00
Nirmoy Das
b04ce53eac drm/amdgpu: use IS_ERR for debugfs APIs
debugfs APIs returns encoded error so use
IS_ERR for checking return value.

v2: return PTR_ERR(ent)

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-14 16:21:15 -04:00
Simon Ser
7bbee36d71 amd/display: downgrade validation failure log level
In amdgpu_dm_atomic_check, dc_validate_global_state is called. On
failure this logs a warning to the kernel journal. However warnings
shouldn't be used for atomic test-only commit failures: user-space
might be perfoming a lot of atomic test-only commits to find the
best hardware configuration.

Downgrade the log to a regular DRM atomic message. While at it, use
the new device-aware logging infrastructure.

This fixes error messages in the kernel when running gamescope [1].

[1]: https://github.com/Plagman/gamescope/issues/245

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-14 16:18:33 -04:00
Christian König
c92db8d64f drm/amdgpu: fix use after free during BO move
The memory backing old_mem is already freed at that point, move the
check a bit more up.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1699
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-14 16:17:39 -04:00
Kenneth Feng
5598d7c21a drm/amd/pm: fix the issue of uploading powerplay table
fix the issue of uploading powerplay table due to the dependancy of rlc.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-14 16:15:21 -04:00
Ernst Sjöstrand
67a44e6598 drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10
Seems like newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-14 16:14:41 -04:00
Vasily Averin
6a4746ba06 ipc: remove memcg accounting for sops objects in do_semtimedop()
Linus proposes to revert an accounting for sops objects in
do_semtimedop() because it's really just a temporary buffer
for a single semtimedop() system call.

This object can consume up to 2 pages, syscall is sleeping
one, size and duration can be controlled by user, and this
allocation can be repeated by many thread at the same time.

However Shakeel Butt pointed that there are much more popular
objects with the same life time and similar memory
consumption, the accounting of which was decided to be
rejected for performance reasons.

Considering at least 2 pages for task_struct and 2 pages for
the kernel stack, a back of the envelope calculation gives a
footprint amplification of <1.5 so this temporal buffer can be
safely ignored.

The factor would IMO be interesting if it was >> 2 (from the
PoV of excessive (ab)use, fine-grained accounting seems to be
currently unfeasible due to performance impact).

Link: https://lore.kernel.org/lkml/90e254df-0dfe-f080-011e-b7c53ee7fd20@virtuozzo.com/
Fixes: 18319498fdd4 ("memcg: enable accounting of ipc resources")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 10:22:11 -07:00
Jens Axboe
5d329e1286 io_uring: allow retry for O_NONBLOCK if async is supported
A common complaint is that using O_NONBLOCK files with io_uring can be a
bit of a pain. Be a bit nicer and allow normal retry IFF the file does
support async behavior. This makes it possible to use io_uring more
reliably with O_NONBLOCK files, for use cases where it either isn't
possible or feasible to modify the file flags.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14 11:09:42 -06:00