IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 0948a9c5386095baae4012190a6b65aba684a907 ]
VMID introduced an extra increment of cmd_pending, causing double-counting
of the I/O. The normal increment ios performed in lpfc_get_scsi_buf.
Link: https://lore.kernel.org/r/20220701211425.2708-5-jsmart2021@gmail.com
Fixes: 33c79741deaf ("scsi: lpfc: vmid: Introduce VMID in I/O path")
Cc: <stable@vger.kernel.org> # v5.14+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3512ac0942938d6977e7999ee69765d948d2faf1 ]
This patch refactors the SCSI paths to use SLI-4 as the primary interface.
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
Link: https://lore.kernel.org/r/20220225022308.16486-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b64aa9eae28ac598a03ed3d62a63ac5e5b295fc ]
Convert the SLI4 fast and slow paths to use native SLI4 wqe constructs
instead of iocb SLI3-isms.
Includes the following:
- Create simple get_xxx and set_xxx routines to wrapper access to common
elements in both SLI3 and SLI4 commands - allowing calling routines to
avoid sli-rev-specific structures to access the elements.
- using the wqe in the job structure as the primary element
- use defines from SLI-4, not SLI-3
- Removal of iocb to wqe conversion from fast and slow path
- Add below routines to handle fast path
lpfc_prep_embed_io - prepares the wqe for fast path
lpfc_wqe_bpl2sgl - manages bpl to sgl conversion
lpfc_sli_wqe2iocb - converts a WQE to IOCB for SLI-3 path
- Add lpfc_sli3_iocb2wcqecmpl in completion path to convert an SLI-3
iocb completion to wcqe completion
- Refactor some of the code that works on both revs for clarity
Link: https://lore.kernel.org/r/20220225022308.16486-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a680a9298e7b4ff344aca3456177356b276e5038 ]
Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure.
This is a "common" structure but many of the components refer to sli-rev
specific entities which can lead the developer astray as to what they
actually mean, should be set to, or when they should be used.
This first patch prepares the lpfc_iocbq structure so that elements common
to both SLI3 and SLI4 data paths are more appropriately named, making it
clear they apply generically.
Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually
generic to the paths are renamed to 'cmd':
- iocb_flag is renamed to cmd_flag
- lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag
- fabric_iocb_cmpl is renamed to fabric_cmd_cmpl
- wait_iocb_cmpl is renamed to wait_cmd_cmpl
- iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl
- rsvd2 member is renamed to num_bdes due to pre-existing usage
The structure name itself will retain the iocb reference as changing to a
more relevant "job" or "cmd" title induces many hundreds of line changes
for only a name change.
lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use
in the SLI3 path only.
Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25ac2c970be32993f1dff607f8354f3c053d42bc ]
Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.
There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.
Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.
The following modifications are made:
- A new flag is set in PCI error entrypoints so the driver can track being
called by that path.
- An interlock is added in the SLI hw error path and the PCI error path
such that only one of the paths proceeds with the teardown logic.
- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
cmds in cases of hw error.
- If entering the SLI port re-init calls, a case where SLI error teardown
was quick and beat the PCI calls now reporting error, check whether the
SLI port is still live on the PCI bus.
- In the PCI reset code to bring the adapter back, recheck the IRQ
settings. Different checks for SLI3 vs SLI4.
- In I/O completions, that may be called as part of the cleanup or
underway just before the hw error, check the state of the adapter. If
in error, shortcut handling that would expect further adapter
completions as the hw error won't be sending them.
- In routines waiting on I/O completions, which may have been in progress
prior to the hw error, detect the device is being torn down and abort
from their waits and just give up. This points to a larger issue in the
driver on ref-counting for data structures, as it doesn't have
ref-counting on q and port structures. We'll do this fix for now as it
would be a major rework to be done differently.
- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
failed back due to hw error.
- In I/O buf allocation, done at the start of new I/Os, check hw state and
fail if hw error.
Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 689640efc0a2c4e07e6f88affe6d42cd40cc3f85 ]
When scpi probe fails, at any point, we need to ensure that the scpi_info
is not set and will remain NULL until the probe succeeds. If it is not
taken care, then it could result use-after-free as the value is exported
via get_scpi_ops() and could refer to a memory allocated via devm_kzalloc()
but freed when the probe fails.
Link: https://lore.kernel.org/r/20220701160310.148344-1-sudeep.holla@arm.com
Cc: stable@vger.kernel.org # 4.19+
Reported-by: huhai <huhai@kylinos.cn>
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b960c967f2aa01ab8f45c5a0bd78e754cffdeee ]
Commit 05b35e7eb9a1 ("smsc95xx: add phylib support") amended
smsc95xx_resume() to call phy_init_hw(). That function waits for the
device to runtime resume even though it is placed in the runtime resume
path, causing a deadlock.
The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(),
which never uses the _nopm variant of usbnet_read_cmd().
Commit b4df480f68ae ("usbnet: smsc95xx: add reset_resume function with
reset operation") causes a similar deadlock on resume if the device was
already runtime suspended when entering system sleep:
That's because the commit introduced smsc95xx_reset_resume(), which
calls down to smsc95xx_reset(), which neglects to use _nopm accessors.
Fix by auto-detecting whether a device access is performed by the
suspend/resume task_struct and use the _nopm variant if so. This works
because the PM core guarantees that suspend/resume callbacks are run in
task context.
Stacktrace for posterity:
INFO: task kworker/2:1:49 blocked for more than 122 seconds.
Workqueue: usb_hub_wq hub_event
schedule
rpm_resume
__pm_runtime_resume
usb_autopm_get_interface
usbnet_read_cmd
__smsc95xx_read_reg
__smsc95xx_phy_wait_not_busy
__smsc95xx_mdio_read
smsc95xx_mdiobus_read
__mdiobus_read
mdiobus_read
smsc_phy_reset
phy_init_hw
smsc95xx_resume
usb_resume_interface
usb_resume_both
usb_runtime_resume
__rpm_callback
rpm_callback
rpm_resume
__pm_runtime_resume
usb_autoresume_device
hub_event
process_one_work
Fixes: b4df480f68ae ("usbnet: smsc95xx: add reset_resume function with reset operation")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.16+
Cc: Andre Edich <andre.edich@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ce8b37241ed291af56f7a49bbdbf20c08728e88 ]
Link status of SMSC LAN95xx chips is polled once per second, even though
they're capable of signaling PHY interrupts through the MAC layer.
Forward those interrupts to the PHY driver to avoid polling. Benefits
are reduced bus traffic, reduced CPU overhead and quicker interface
bringup.
Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
smsc95xx: fix link detection for disabled autonegotiation").
Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
hence couldn't detect link-up events when auto-negotiation was disabled.
The proper solution would have been to enable the ENERGYON interrupt
instead of polling.
Since then, PHY handling was moved from the LAN95xx driver to the SMSC
PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
That PHY driver is capable of link detection with auto-negotiation
disabled because it enables the ENERGYON interrupt.
Note that signaling interrupts through the MAC layer not only works with
the integrated PHY, but also with an external PHY, provided its
interrupt pin is attached to LAN95xx's nPHY_INT pin.
In the unlikely event that the interrupt pin of an external PHY is
attached to a GPIO of the SoC (or not connected at all), the driver can
be amended to retrieve the irq from the PHY's of_node.
To forward PHY interrupts to phylib, it is not sufficient to call
phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run
so that PHY interrupts are cleared. That's because according to page
119 of the LAN950x datasheet, "The source of this interrupt is a level.
The interrupt persists until it is cleared in the PHY."
https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
Therefore, create an IRQ domain with a single IRQ for the PHY. In the
future, the IRQ domain may be extended to support the 11 GPIOs on the
LAN95xx.
Normally the PHY interrupt should be masked until the PHY driver has
cleared it. However masking requires a (sleeping) USB transaction and
interrupts are received in (non-sleepable) softirq context. I decided
not to mask the interrupt at all (by using the dummy_irq_chip's noop
->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec
intervals and normally that's sufficient to wake the PHY driver's IRQ
thread and have it clear the interrupt. If it does take longer, worst
thing that can happen is the IRQ thread is woken again. No big deal.
Because PHY interrupts are now perpetually enabled, there's no need to
selectively enable them on suspend. So remove all invocations of
smsc95xx_enable_phy_wakeup_interrupts().
In smsc95xx_resume(), move the call of phy_init_hw() before
usbnet_resume() (which restarts the status URB) to ensure that the PHY
is fully initialized when an interrupt is handled.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
Cc: Andre Edich <andre.edich@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8960f878e39fadc03d74292a6731f1e914cf2019 ]
When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the
MAC full duplex mode and PHY flow control registers based on cached data
in struct phy_device:
smsc95xx_status() # raises EVENT_LINK_RESET
usbnet_deferred_kevent()
smsc95xx_link_reset() # uses cached data in phydev
Simultaneously, phylib polls link status once per second and updates
that cached data:
phy_state_machine()
phy_check_link_status()
phy_read_status()
lan87xx_read_status()
genphy_read_status() # updates cached data in phydev
If smsc95xx_link_reset() wins the race against genphy_read_status(),
the registers may be updated based on stale data.
E.g. if the link was previously down, phydev->duplex is set to
DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even
though genphy_read_status() may update it to DUPLEX_FULL afterwards.
PHY interrupts are currently only enabled on suspend to trigger wakeup,
so the impact of the race is limited, but we're about to enable them
perpetually.
Avoid the race by delaying execution of smsc95xx_link_reset() until
phy_state_machine() has done its job and calls back via
smsc95xx_handle_link_change().
Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib
picks up link status changes through polling. So drop the declaration
of a ->link_reset() callback.
Note that the semicolon on a line by itself added in smsc95xx_status()
is a placeholder for a function call which will be added in a subsequent
commit. That function call will actually handle the INT_ENP_PHY_INT_
interrupt.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3108871f19221372b251f7da1ac38736928b5b3a ]
Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver
attempts to clear the signaled interrupts by writing "all ones" to the
Interrupt Status Register.
However the driver only ever enables a single type of interrupt, namely
the PHY Interrupt. And according to page 119 of the LAN950x datasheet,
its bit in the Interrupt Status Register is read-only. There's no other
way to clear it than in a separate PHY register:
https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
Consequently, writing "all ones" to the Interrupt Status Register is
pointless and can be dropped.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da8e393e23efb60eba8959856c7df88f9859f6eb ]
vc4_drv isn't necessarily under the /soc node in DT as it is a
virtual device, but it is the one that does the allocations.
The DMA addresses are consumed by primarily the HVS or V3D, and
those require VideoCore cache alias address mapping, and so will be
under /soc.
During probe find the a suitable device node for HVS or V3D,
and adopt the DMA configuration of that node.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://lore.kernel.org/r/20220613144800.326124-2-maxime@cerno.tech
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a710a8bc909313951eb9252d8419924c771d7c2 ]
The WD22TB4 Thunderbolt dock at least will revert its DP_MAX_LINK_RATE
from HBR3 to HBR2 after system suspend/resume if the DP_DP13_DPCD_REV
registers are not read subsequently also as required.
Fix this by reading DP_DP13_DPCD_REV registers as well, matching what is
done during connector detection. While at it also fix up the same call
in drm_dp_mst_dump_topology().
Cc: Lyude Paul <lyude@redhat.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5292
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Cc: <stable@vger.kernel.org> # v5.14+
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220614094537.885472-1-imre.deak@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d16803c562ecc644803d42ba98a8e0aef9c014e ]
BLAKE2s has no currently known use as an shash. Just remove all of this
unnecessary plumbing. Removing this shash was something we talked about
back when we were making BLAKE2s a built-in, but I simply never got
around to doing it. So this completes that project.
Importantly, this fixs a bug in which the lib code depends on
crypto_simd_disabled_for_test, causing linker errors.
Also add more alignment tests to the selftests and compare SIMD and
non-SIMD compression functions, to make up for what we lose from
testmgr.c.
Reported-by: gaochao <gaochao49@huawei.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39e8d062b03c3dc257d880d82bd55cdd9e185a3b ]
To comply with the panel sequence, hold the mipi signal to LP00 before
the dcs cmds transmission, and pull the mipi signal high from LP00 to
LP11 until the start of the dcs cmds transmission.
The normal panel timing is :
(1) pp1800 DC pull up
(2) avdd & avee AC pull high
(3) lcm_reset pull high -> pull low -> pull high
(4) Pull MIPI signal high (LP11) -> initial code -> send video data
(HS mode)
The power-off sequence is reversed.
If dsi is not in cmd mode, then dsi will pull the mipi signal high in
the mtk_output_dsi_enable function. The delay in lane_ready func is
the reaction time of dsi_rx after pulling up the mipi signal.
Fixes: 2dd8075d2185 ("drm/mediatek: mtk_dsi: Use the drm_panel_bridge API")
Link: https://patchwork.kernel.org/project/linux-mediatek/patch/1653012007-11854-4-git-send-email-xinlei.lee@mediatek.com/
Cc: <stable@vger.kernel.org> # 5.10.x: 7f6335c6a258: drm/mediatek: Modify dsi funcs to atomic operations
Cc: <stable@vger.kernel.org> # 5.10.x: cde7e2e35c28: drm/mediatek: Separate poweron/poweroff from enable/disable and define new funcs
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Jitao Shi <jitao.shi@mediatek.com>
Signed-off-by: Xinlei Lee <xinlei.lee@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81cc7e51c4f1686b71e30046437056ece6b2cb4d ]
Mipi dsi panel drivers can use mipi_dsi_dcs_{set,get}_display_brightness()
to request backlight changes.
This can be done during panel initialization (dsi is in command mode)
or afterwards (dsi is in Video Mode).
When the DSI is in Video Mode, all commands are rejected.
Detect current DSI mode in mtk_dsi_host_transfer() and switch modes
temporarily to allow commands to be sent.
Signed-off-by: Julien STEPHAN <jstephan@baylibre.com>
Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2af28b241eea816e6f7668d1954f15894b45d7e3 upstream.
trace_spmi_write_begin() and trace_spmi_read_end() both call
memcpy() with a length of "len + 1". This leads to one extra
byte being read beyond the end of the specified buffer. Fix
this out-of-bound memory access by using a length of "len"
instead.
Here is a KASAN log showing the issue:
BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_spmi_read_end+0x1d0/0x234
Read of size 2 at addr ffffffc0265b7540 by task thermal@2.0-ser/1314
...
Call trace:
dump_backtrace+0x0/0x3e8
show_stack+0x2c/0x3c
dump_stack_lvl+0xdc/0x11c
print_address_description+0x74/0x384
kasan_report+0x188/0x268
kasan_check_range+0x270/0x2b0
memcpy+0x90/0xe8
trace_event_raw_event_spmi_read_end+0x1d0/0x234
spmi_read_cmd+0x294/0x3ac
spmi_ext_register_readl+0x84/0x9c
regmap_spmi_ext_read+0x144/0x1b0 [regmap_spmi]
_regmap_raw_read+0x40c/0x754
regmap_raw_read+0x3a0/0x514
regmap_bulk_read+0x418/0x494
adc5_gen3_poll_wait_hs+0xe8/0x1e0 [qcom_spmi_adc5_gen3]
...
__arm64_sys_read+0x4c/0x60
invoke_syscall+0x80/0x218
el0_svc_common+0xec/0x1c8
...
addr ffffffc0265b7540 is located in stack of task thermal@2.0-ser/1314 at offset 32 in frame:
adc5_gen3_poll_wait_hs+0x0/0x1e0 [qcom_spmi_adc5_gen3]
this frame has 1 object:
[32, 33) 'status'
Memory state around the buggy address:
ffffffc0265b7400: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
ffffffc0265b7480: 04 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc0265b7500: 00 00 00 00 f1 f1 f1 f1 01 f3 f3 f3 00 00 00 00
^
ffffffc0265b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0265b7600: f1 f1 f1 f1 01 f2 07 f2 f2 f2 01 f3 00 00 00 00
==================================================================
Fixes: a9fce374815d ("spmi: add command tracepoints for SPMI")
Cc: stable@vger.kernel.org
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David Collins <quic_collinsd@quicinc.com>
Link: https://lore.kernel.org/r/20220627235512.2272783-1-quic_collinsd@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20aac6c60981f5bfacd66661d090d907bf1482f0 upstream.
Validate mount_lock seqcount as soon as we cross into mount in RCU
mode. Sure, ->mnt_root is pinned and will remain so until we
do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
the mount_lock had been touched, but we might run into a hard error
(e.g. -ENOENT) before trying to unlazy. And it's possible to end
up with RCU pathwalk racing with rename() and umount() in a way
that would fail with -ENOENT while non-RCU pathwalk would've
succeeded with any timings.
Once upon a time we hadn't needed that, but analysis had been subtle,
brittle and went out of window as soon as RENAME_EXCHANGE had been
added.
It's narrow, hard to hit and won't get you anything other than
stray -ENOENT that could be arranged in much easier way with the
same priveleges, but it's a bug all the same.
Cc: stable@kernel.org
X-sky-is-falling: unlikely
Fixes: da1ce0670c14 "vfs: add cross-rename"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e61b3125a4f036b3c6b87ffd656fc1ab00440ae9 upstream.
The function ioremap() in gscps2_probe() can fail, so
its return value should be checked.
Fixes: 4bdc0d676a643 ("remove ioremap_nocache and devm_ioremap_nocache")
Cc: <stable@vger.kernel.org> # v5.6+
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: Xie Shaowen <studentxswpy@163.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e362359ace6f87c201531872486ff295df306d13 upstream.
Commit 55e8c8eb2c7b ("posix-cpu-timers: Store a reference to a pid not a
task") started looking up tasks by PID when deleting a CPU timer.
When a non-leader thread calls execve, it will switch PIDs with the leader
process. Then, as it calls exit_itimers, posix_cpu_timer_del cannot find
the task because the timer still points out to the old PID.
That means that armed timers won't be disarmed, that is, they won't be
removed from the timerqueue_list. exit_itimers will still release their
memory, and when that list is later processed, it leads to a
use-after-free.
Clean up the timers from the de-threaded task before freeing them. This
prevents a reported use-after-free.
Fixes: 55e8c8eb2c7b ("posix-cpu-timers: Store a reference to a pid not a task")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220809170751.164716-1-cascardo@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e31678fb403eae0f4fe37c6374be098835c73cd upstream.
Solution is to send lease break ack immediately even in case of
deferred close handles to avoid lease break request timing out
and let deferred closed handle gets closed as scheduled.
Later patches could optimize cases where we then close some
of these handles sooner for the cases where lease break is to 'none'
Cc: stable@kernel.org
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a2ba42cbd0b669ce3837ba400905f93dd06c79f upstream.
The bitops compile-time optimization series revealed one more
problem in olpc-xo1-sci.c:send_ebook_state(), resulted in GCC
warnings:
arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state':
arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
| ^~
arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning
Despite this code working as intended, this redundant double
negation of boolean value, together with comparing to `char`
with no explicit conversion to bool, makes compilers think
the author made some unintentional logical mistakes here.
Make it the other way around and negate the char instead
to silence the warnings.
Fixes: d2aa37411b8e ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches")
Cc: stable@vger.kernel.org # 3.5+
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dec8784c9088b131a1523f582c2194cfc8107dc0 upstream.
Fix kprobes to update kcb (kprobes control block) status flag to
KPROBE_HIT_SSDONE even if the kp->post_handler is not set.
This bug may cause a kernel panic if another INT3 user runs right
after kprobes because kprobe_int3_handler() misunderstands the
INT3 is kprobe's single stepping INT3.
Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Daniel Müller <deso@posteo.net>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20220727210136.jjgc3lpqeq42yr3m@muellerd-fedora-PC2BDTX9
Link: https://lore.kernel.org/r/165942025658.342061.12452378391879093249.stgit@devnote2
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac6c1b2ca77e722a1e5d651f12f437f2f237e658 upstream.
When a ftrace_bug happens (where ftrace fails to modify a location) it is
helpful to have what was at that location as well as what was expected to
be there.
But with the conversion to text_poke() the variable that assigns the
expected for debugging was dropped. Unfortunately, I noticed this when I
needed it. Add it back.
Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream.
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:
Similar to the Jmp2Ret mitigation, if the code on the sibling thread
cannot be trusted, software should set STIBP to 1 or disable SMT to
ensure SMT safety when using this mitigation.
So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.
[ bp: Remove the "we" and remove "[AMD]" applicability parameter which
doesn't work here. ]
Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58d1c124cd79ea686b512043c5bd515590b2ed95 upstream.
When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.
Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 118b0c863c8f5629cc5271fc24d72d926e0715d9 upstream.
FC target disappeared during port perturbation tests due to a race that
tramples target state. Fix the issue by adding state checks before
proceeding.
Link: https://lore.kernel.org/r/20220616053508.27186-8-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2416ccd3815ba1613e10a6da0a24ef21acfe5633 upstream.
FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up. Fix this by accepting RSCN when
the device is not online.
Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3117c83ba316b3200d9f2fe900f2b9a5525a25c upstream.
Put adapter into a wind down state if OS does not make any attempt to
recover the adapter after PCIe error.
Link: https://lore.kernel.org/r/20220616053508.27186-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f260694e6463b63ae550aad25ddefe94cb1904da upstream.
Clear wait for mailbox interrupt flag to prevent stale mailbox:
Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...
To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.
Link: https://lore.kernel.org/r/20220616053508.27186-11-njavali@marvell.com
Fixes: b6faaaf796d7 ("scsi: qla2xxx: Serialize mailbox request")
Cc: Naresh Bannoth <nbannoth@in.ibm.com>
Cc: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Cc: stable@vger.kernel.org
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Naresh Bannoth <nbannoth@in.ibm.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47ccb113cead905bdc236571bf8ac6fed90321b3 upstream.
A direct attach tape device, when gets swapped with another, was not
discovered. Fix this by looking at loop map and reinitialize link if there
are devices present.
Link: https://lore.kernel.org/linux-scsi/baef87c3-5dad-3b47-44c1-6914bfc90108@cybernetics.com/
Link: https://lore.kernel.org/r/20220713052045.10683-8-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63fa7f2644b4b48e1913af33092c044bf48e9321 upstream.
vref_count took an extra decrement in the task management path. Add an
extra ref count to compensate the imbalance.
Link: https://lore.kernel.org/r/20220713052045.10683-7-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4da8c5f76825269f28d6a89fa752934a4bcb6dfa upstream.
Case (1):
The only waiter on wka_port->completion_wq is zfcp_fc_wka_port_get()
trying to open a WKA port. As such it should only be woken up by WKA port
*open* responses, not by WKA port close responses.
Case (2):
A close WKA port response coming in just after having sent a new open WKA
port request and before blocking for the open response with wait_event()
in zfcp_fc_wka_port_get() erroneously renders the wait_event a NOP
because the close handler overwrites wka_port->status. Hence the
wait_event condition is erroneously true and it does not enter blocking
state.
With non-negligible probability, the following time space sequence happens
depending on timing without this fix:
user process ERP thread zfcp work queue tasklet system work queue
============ ========== =============== ======= =================
$ echo 1 > online
zfcp_ccw_set_online
zfcp_ccw_activate
zfcp_erp_adapter_reopen
msleep scan backoff zfcp_erp_strategy
| ...
| zfcp_erp_action_cleanup
| ...
| queue delayed scan_work
| queue ns_up_work
| ns_up_work:
| zfcp_fc_wka_port_get
| open wka request
| open response
| GSPN FC-GS
| RSPN FC-GS [NPIV-only]
| zfcp_fc_wka_port_put
| (--wka->refcount==0)
| sched delayed wka->work
|
~~~Case (1)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zfcp_erp_wait
flush scan_work
| wka->work:
| wka->status=CLOSING
| close wka request
| scan_work:
| zfcp_fc_wka_port_get
| (wka->status==CLOSING)
| wka->status=OPENING
| open wka request
| wait_event
| | close response
| | wka->status=OFFLINE
| | wake_up /*WRONG*/
~~~Case (2)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| wka->work:
| wka->status=CLOSING
| close wka request
zfcp_erp_wait
flush scan_work
| scan_work:
| zfcp_fc_wka_port_get
| (wka->status==CLOSING)
| wka->status=OPENING
| open wka request
| close response
| wka->status=OFFLINE
| wake_up /*WRONG&NOP*/
| wait_event /*NOP*/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| (wka->status!=ONLINE)
| return -EIO
| return early
open response
wka->status=ONLINE
wake_up /*NOP*/
So we erroneously end up with no automatic port scan. This is a big problem
when it happens during boot. The timing is influenced by v3.19 commit
18f87a67e6d6 ("zfcp: auto port scan resiliency").
Fix it by fully mutually excluding zfcp_fc_wka_port_get() and
zfcp_fc_wka_port_offline(). For that to work, we make the latter block
until we got the response for a close WKA port. In order not to penalize
the system workqueue, we move wka_port->work to our own adapter workqueue.
Note that before v2.6.30 commit 828bc1212a68 ("[SCSI] zfcp: Set WKA-port to
offline on adapter deactivation"), zfcp did block in
zfcp_fc_wka_port_offline() as well, but with a different condition.
While at it, make non-functional cleanups to improve code reading in
zfcp_fc_wka_port_get(). If we cannot send the WKA port open request, don't
rely on the subsequent wait_event condition to immediately let this case
pass without blocking. Also don't want to rely on the additional condition
handling the refcount to be skipped just to finally return with -EIO.
Link: https://lore.kernel.org/r/20220729162529.1620730-1-maier@linux.ibm.com
Fixes: 5ab944f97e09 ("[SCSI] zfcp: attach and release SAN nameserver port on demand")
Cc: <stable@vger.kernel.org> #v2.6.28+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00511d2abf5708ad05dd5d1c36adb2468d274698 upstream.
After ufshcd_wl_shutdown() set device power off and link off,
ufshcd_shutdown() could turn off clock/power. Also remove
pm_runtime_get_sync.
The reason why it is safe to remove pm_runtime_get_sync() is because:
- ufshcd_wl_shutdown() -> pm_runtime_get_sync() will resume hba->dev too.
- device resume(turn on clk/power) is not required, even if device is in
RPM_SUSPENDED.
Link: https://lore.kernel.org/r/20220727030526.31022-1-peter.wang@mediatek.com
Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
Cc: <stable@vger.kernel.org> # 5.15.x
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ba592fa014f21f35a8ee8da4ca7b95a018f13e8 ]
In the function s3fb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
[ 54.083733] BUG: unable to handle page fault for address: ffffc90003000000
[ 54.083742] #PF: supervisor write access in kernel mode
[ 54.083744] #PF: error_code(0x0002) - not-present page
[ 54.083760] RIP: 0010:memset_orig+0x33/0xb0
[ 54.083782] Call Trace:
[ 54.083788] s3fb_set_par+0x1ec6/0x4040
[ 54.083806] fb_set_var+0x604/0xeb0
[ 54.083836] do_fb_ioctl+0x234/0x670
Fix the this by checking the value of 'screen_size' before memset_io().
Fixes: a268422de8bf ("fbdev driver for S3 Trio/Virge")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96b550971c65d54d64728d8ba973487878a06454 ]
In the function arkfb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
[ 659.399066] BUG: unable to handle page fault for address: ffffc90003000000
[ 659.399077] #PF: supervisor write access in kernel mode
[ 659.399079] #PF: error_code(0x0002) - not-present page
[ 659.399094] RIP: 0010:memset_orig+0x33/0xb0
[ 659.399116] Call Trace:
[ 659.399122] arkfb_set_par+0x143f/0x24c0
[ 659.399130] fb_set_var+0x604/0xeb0
[ 659.399161] do_fb_ioctl+0x234/0x670
[ 659.399189] fb_ioctl+0xdd/0x130
Fix the this by checking the value of 'screen_size' before memset_io().
Fixes: 681e14730c73 ("arkfb: new framebuffer driver for ARK Logic cards")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec0754c60217248fa77cc9005d66b2b55200ac06 ]
In the function vt8623fb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
[ 583.339036] BUG: unable to handle page fault for address: ffffc90005000000
[ 583.339049] #PF: supervisor write access in kernel mode
[ 583.339052] #PF: error_code(0x0002) - not-present page
[ 583.339074] RIP: 0010:memset_orig+0x33/0xb0
[ 583.339110] Call Trace:
[ 583.339118] vt8623fb_set_par+0x11cd/0x21e0
[ 583.339146] fb_set_var+0x604/0xeb0
[ 583.339181] do_fb_ioctl+0x234/0x670
[ 583.339209] fb_ioctl+0xdd/0x130
Fix the this by checking the value of 'screen_size' before memset_io().
Fixes: 558b7bd86c32 ("vt8623fb: new framebuffer driver for VIA VT8623")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90be48bd9d29ece3965e5e8b21499b6db166e57b ]
If a file has FI_COMPRESS_RELEASED, all writes for it should not be
allowed. However, as of now, in case of compress_mode=user, writes
triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly,
which could crash that file.
To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already
has FI_COMPRESS_RELEASED flag.
This is the reproduction process:
1. $ touch ./file
2. $ chattr +c ./file
3. $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc
4. $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc
5. $ sync
6. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE
7. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS
8. $ release ./file ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS
9. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE again
10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again
This reproduction process is tested in 128kb cluster size.
You can find compr_blocks has a negative value.
Fixes: 5fdb322ff2c2b ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Jaewook Kim <jw5454.kim@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66d34fcbbe63ebd8584b792e0d741f6648100894 ]
Since commit e3c548323d32 ("f2fs: let's allow compression for mmap files"),
it has been allowed to compress mmap files. However, in compress_mode=user,
it is not allowed yet. To keep the same concept in both compress_modes,
f2fs_ioc_(de)compress_file() should also allow it.
Let's remove checking mmap files in f2fs_ioc_(de)compress_file() so that
the compression for mmap files is also allowed in compress_mode=user.
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de979c83574abf6e78f3fa65b716515c91b2613d ]
With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
an empty object file.
With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
GNU assembler doesn't generate a symbol table for empty object files and
objtool fails with the following error when a valid symbol table cannot
be found:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table
To prevent this from happening, build thunk_$(BITS).o only if
CONFIG_PREEMPTION is enabled.
BugLink: https://bugs.launchpad.net/bugs/1911359
Fixes: 320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Ys/Ke7EWjcX+ZlXO@arighi-desktop
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 751d4cbc43879229dbc124afefe240b70fd29a85 ]
The following warning was triggered on a large machine early in boot on
a distribution kernel but the same problem should also affect mainline.
WARNING: CPU: 439 PID: 10 at ../kernel/workqueue.c:2231 process_one_work+0x4d/0x440
Call Trace:
<TASK>
rescuer_thread+0x1f6/0x360
kthread+0x156/0x180
ret_from_fork+0x22/0x30
</TASK>
Commit c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
optimises ttwu by queueing a task that is descheduling on the wakelist,
but does not check if the task descheduling is still allowed to run on that CPU.
In this warning, the problematic task is a workqueue rescue thread which
checks if the rescue is for a per-cpu workqueue and running on the wrong CPU.
While this is early in boot and it should be possible to create workers,
the rescue thread may still used if the MAYDAY_INITIAL_TIMEOUT is reached
or MAYDAY_INTERVAL and on a sufficiently large machine, the rescue
thread is being used frequently.
Tracing confirmed that the task should have migrated properly using the
stopper thread to handle the migration. However, a parallel wakeup from udev
running on another CPU that does not share CPU cache observes p->on_cpu and
uses task_cpu(p), queues the task on the old CPU and triggers the warning.
Check that the wakee task that is descheduling is still allowed to run
on its current CPU and if not, wait for the descheduling to complete
and select an allowed CPU.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220804092119.20137-1-mgorman@techsingularity.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3dd3f674555bd9455c5ae7fafce0696bd9931b3 ]
Wakelist can help avoid cache bouncing and offload the overhead of waker
cpu. So far, using wakelist within the same llc only happens on
WF_ON_CPU, and this limitation could be removed to further improve
wakeup performance.
The commit 518cd6234178 ("sched: Only queue remote wakeups when
crossing cache boundaries") disabled queuing tasks on wakelist when
the cpus share llc. This is because, at that time, the scheduler must
send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also
supports TIF_POLLING, so this is not a problem now when the wakee cpu is
in idle polling.
Benefits:
Queuing the task on idle cpu can help improving performance on waker cpu
and utilization on wakee cpu, and further improve locality because
the wakee cpu can handle its own rq. This patch helps improving rt on
our real java workloads where wakeup happens frequently.
Consider the normal condition (CPU0 and CPU1 share same llc)
Before this patch:
CPU0 CPU1
select_task_rq() idle
rq_lock(CPU1->rq)
enqueue_task(CPU1->rq)
notify CPU1 (by sending IPI or CPU1 polling)
resched()
After this patch:
CPU0 CPU1
select_task_rq() idle
add to wakelist of CPU1
notify CPU1 (by sending IPI or CPU1 polling)
rq_lock(CPU1->rq)
enqueue_task(CPU1->rq)
resched()
We see CPU0 can finish its work earlier. It only needs to put task to
wakelist and return.
While CPU1 is idle, so let itself handle its own runqueue data.
This patch brings no difference about IPI.
This patch only takes effect when the wakee cpu is:
1) idle polling
2) idle not polling
For 1), there will be no IPI with or without this patch.
For 2), there will always be an IPI before or after this patch.
Before this patch: waker cpu will enqueue task and check preempt. Since
"idle" will be sure to be preempted, waker cpu must send a resched IPI.
After this patch: waker cpu will put the task to the wakelist of wakee
cpu, and send an IPI.
Benchmark:
We've tested schbench, unixbench, and hachbench on both x86 and arm64.
On x86 (Intel Xeon Platinum 8269CY):
schbench -m 2 -t 8
Latency percentiles (usec) before after
50.0000th: 8 6
75.0000th: 10 7
90.0000th: 11 8
95.0000th: 12 8
*99.0000th: 13 10
99.5000th: 15 11
99.9000th: 18 14
Unixbench with full threads (104)
before after
Dhrystone 2 using register variables 3011862938 3009935994 -0.06%
Double-Precision Whetstone 617119.3 617298.5 0.03%
Execl Throughput 27667.3 27627.3 -0.14%
File Copy 1024 bufsize 2000 maxblocks 785871.4 784906.2 -0.12%
File Copy 256 bufsize 500 maxblocks 210113.6 212635.4 1.20%
File Copy 4096 bufsize 8000 maxblocks 2328862.2 2320529.1 -0.36%
Pipe Throughput 145535622.8 145323033.2 -0.15%
Pipe-based Context Switching 3221686.4 3583975.4 11.25%
Process Creation 101347.1 103345.4 1.97%
Shell Scripts (1 concurrent) 120193.5 123977.8 3.15%
Shell Scripts (8 concurrent) 17233.4 17138.4 -0.55%
System Call Overhead 5300604.8 5312213.6 0.22%
hackbench -g 1 -l 100000
before after
Time 3.246 2.251
On arm64 (Ampere Altra):
schbench -m 2 -t 8
Latency percentiles (usec) before after
50.0000th: 14 10
75.0000th: 19 14
90.0000th: 22 16
95.0000th: 23 16
*99.0000th: 24 17
99.5000th: 24 17
99.9000th: 28 25
Unixbench with full threads (80)
before after
Dhrystone 2 using register variables 3536194249 3537019613 0.02%
Double-Precision Whetstone 629383.6 629431.6 0.01%
Execl Throughput 65920.5 65846.2 -0.11%
File Copy 1024 bufsize 2000 maxblocks 1063722.8 1064026.8 0.03%
File Copy 256 bufsize 500 maxblocks 322684.5 318724.5 -1.23%
File Copy 4096 bufsize 8000 maxblocks 2348285.3 2328804.8 -0.83%
Pipe Throughput 133542875.3 131619389.8 -1.44%
Pipe-based Context Switching 3215356.1 3576945.1 11.25%
Process Creation 108520.5 120184.6 10.75%
Shell Scripts (1 concurrent) 122636.3 121888 -0.61%
Shell Scripts (8 concurrent) 17462.1 17381.4 -0.46%
System Call Overhead 4429998.9 4435006.7 0.11%
hackbench -g 1 -l 100000
before after
Time 4.217 2.916
Our patch has improvement on schbench, hackbench
and Pipe-based Context Switching of unixbench
when there exists idle cpus,
and no obvious regression on other tests of unixbench.
This can help improve rt in scenes where wakeup happens frequently.
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20220608233412.327341-3-dtcccc@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28156108fecb1f808b21d216e8ea8f0d205a530c ]
The commit 2ebb17717550 ("sched/core: Offload wakee task activation if it
the wakee is descheduling") checked rq->nr_running <= 1 to avoid task
stacking when WF_ON_CPU.
Per the ordering of writes to p->on_rq and p->on_cpu, observing p->on_cpu
(WF_ON_CPU) in ttwu_queue_cond() implies !p->on_rq, IOW p has gone through
the deactivate_task() in __schedule(), thus p has been accounted out of
rq->nr_running. As such, the task being the only runnable task on the rq
implies reading rq->nr_running == 0 at that point.
The benchmark result is in [1].
[1] https://lore.kernel.org/all/e34de686-4e85-bde1-9f3c-9bbc86b38627@linux.alibaba.com/
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20220608233412.327341-2-dtcccc@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c58cf40e3a1d2f47c09d3489857e9476316788a ]
A build with -D_FORTIFY_SOURCE=2 enabled will produce the following warnings:
sysfs.c:63:30: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 255 [-Wformat-truncation=]
snprintf(filepath, 256, "%s/%s", path, filename);
^~
Bump up the buffer to PATH_MAX which is the limit and account for all of
the possible NUL and separators that could lead to exceeding the
allocated buffer sizes.
Fixes: 94f69966faf8 ("tools/thermal: Introduce tmon, a tool for thermal subsystem")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f1c4523f7a3aaabe7e53d3ebd378292947e95c8 ]
Since the user can control the arguments of the ioctl() from the user
space, under special arguments that may result in a divide-by-zero bug
in:
drivers/video/fbdev/arkfb.c:784: ark_set_pixclock(info, (hdiv * info->var.pixclock) / hmul);
with hdiv=1, pixclock=1 and hmul=2 you end up with (1*1)/2 = (int) 0.
and then in:
drivers/video/fbdev/arkfb.c:504: rv = dac_set_freq(par->dac, 0, 1000000000 / pixclock);
we'll get a division-by-zero.
The following log can reveal it:
divide error: 0000 [#1] PREEMPT SMP KASAN PTI
RIP: 0010:ark_set_pixclock drivers/video/fbdev/arkfb.c:504 [inline]
RIP: 0010:arkfb_set_par+0x10fc/0x24c0 drivers/video/fbdev/arkfb.c:784
Call Trace:
fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1034
do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110
fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189
Fix this by checking the argument of ark_set_pixclock() first.
Fixes: 681e14730c73 ("arkfb: new framebuffer driver for ARK Logic cards")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 625395c4a0f4775e0fe00f616888d2e6c1ba49db ]
GCC-12 started triggering a new warning:
arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
916 | if (node_to_cpumask_map[node] == NULL) {
| ^~
node_to_cpumask_map is of type cpumask_var_t[].
When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
pointer for dynamic allocation, else to an array of one element. The
"wicked game" can be checked on line 700 of include/linux/cpumask.h.
The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
(i.e. dynamic allocation) in mind, checking if the cpumask was available
via a direct NULL check.
When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
while compiling the kernel.
Fix that by using cpumask_available(), which does the NULL check when
CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
such checks are made.
Conditional definitions of cpumask_available() can be found along with
the definition of cpumask_var_t. Check the cpumask.h reference mentioned
above.
Fixes: c032ef60d1aa ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t")
Fixes: de2d9445f162 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit")
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6e8d40d43ae4dec00c8fea2593eeea3114b8f44 ]
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:
[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
:
[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
:
[80468.207946] Call Trace:
[80468.208947] cpuset_can_attach+0xa0/0x140
[80468.209953] cgroup_migrate_execute+0x8c/0x490
[80468.210931] cgroup_update_dfl_csses+0x254/0x270
[80468.211898] cgroup_subtree_control_write+0x322/0x400
[80468.212854] kernfs_fop_write_iter+0x11c/0x1b0
[80468.213777] new_sync_write+0x11f/0x1b0
[80468.214689] vfs_write+0x1eb/0x280
[80468.215592] ksys_write+0x5f/0xe0
[80468.216463] do_syscall_64+0x5c/0x80
[80468.224287] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.
Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value >= nr_cpu_ids.
Fixes: 7f51412a415d ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 772b6539fdda31462cc08368e78df60b31a58bab ]
Both functions are doing almost the same, that is checking if admission
control is still respected.
With exclusive cpusets, dl_task_can_attach() checks if the destination
cpuset (i.e. its root domain) has enough CPU capacity to accommodate the
task.
dl_cpu_busy() checks if there is enough CPU capacity in the cpuset in
case the CPU is hot-plugged out.
dl_task_can_attach() is used to check if a task can be admitted while
dl_cpu_busy() is used to check if a CPU can be hotplugged out.
Make dl_cpu_busy() able to deal with a task and use it instead of
dl_task_can_attach() in task_can_attach().
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220302183433.333029-4-dietmar.eggemann@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>