IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 9eab0421fa94a3dde0d1f7e36ab3294fc306c99d upstream.
The bug happens when highest bit of holebegin is 1, suppose holebegin is
0x8000000111111000, after shift, hba would be 0xfff8000000111111, then
vma_interval_tree_foreach would look it up fail or leads to the wrong
result.
error call seq e.g.:
- mmap(..., offset=0x8000000111111000)
|- syscall(mmap, ... unsigned long, off):
|- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
here pgoff is correctly shifted to 0x8000000111111,
but pass 0x8000000111111000 as holebegin to unmap
would then cause terrible result, as shown below:
- unmap_mapping_range(..., loff_t const holebegin)
|- pgoff_t hba = holebegin >> PAGE_SHIFT;
/* hba = 0xfff8000000111111 unexpectedly */
The issue happens in Heterogeneous computing, where the device(e.g.
gpu) and host share the same virtual address space.
A simple workflow pattern which hit the issue is:
/* host */
1. userspace first mmap a file backed VA range with specified offset.
e.g. (offset=0x800..., mmap return: va_a)
2. write some data to the corresponding sys page
e.g. (va_a = 0xAABB)
/* device */
3. gpu workload touches VA, triggers gpu fault and notify the host.
/* host */
4. reviced gpu fault notification, then it will:
4.1 unmap host pages and also takes care of cpu tlb
(use unmap_mapping_range with offset=0x800...)
4.2 migrate sys page to device
4.3 setup device page table and resolve device fault.
/* device */
5. gpu workload continued, it accessed va_a and got 0xAABB.
6. gpu workload continued, it wrote 0xBBCC to va_a.
/* host */
7. userspace access va_a, as expected, it will:
7.1 trigger cpu vm fault.
7.2 driver handling fault to migrate gpu local page to host.
8. userspace then could correctly get 0xBBCC from va_a
9. done
But in step 4.1, if we hit the bug this patch mentioned, then userspace
would never trigger cpu fault, and still get the old value: 0xAABB.
Making holebegin unsigned first fixes the bug.
Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com
Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3368e1186e3ce8e38f78cbca019622095b1f331 upstream.
Since commit aa49c90894d0 ("i2c: core: Run atomic i2c xfer when
!preemptible"), the whole reboot/power off sequence on non-preempt kernels
is using atomic i2c xfer, as !preemptible() always results to 1.
During device_shutdown(), the i2c might be used a lot and not all busses
have implemented an atomic xfer handler. This results in a lot of
avoidable noise, like:
[ 12.687169] No atomic I2C transfer handler for 'i2c-0'
[ 12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
...
Fix this by allowing non-atomic xfer when the interrupts are enabled, as
it was before.
Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea
Link: https://lore.kernel.org/r/20240102150350.3180741-1-mwalle@kernel.org
Link: https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
Fixes: aa49c90894d0 ("i2c: core: Run atomic i2c xfer when !preemptible")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
Tested-by: Michael Walle <mwalle@kernel.org>
Tested-by: Tor Vic <torvic9@mailbox.org>
[wsa: removed a comment which needs more work, code is ok]
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac9184fbb8478dab4a0724b279f94956b69be827 upstream.
VIA VT6306/6307/6308 provides PCI interface compliant to 1394 OHCI. When
the hardware is combined with Asmedia ASM1083/1085 PCIe-to-PCI bus bridge,
it appears that accesses to its 'Isochronous Cycle Timer' register (offset
0xf0 on PCI memory space) often causes unexpected system reboot in any
type of AMD Ryzen machine (both 0x17 and 0x19 families). It does not
appears in the other type of machine (AMD pre-Ryzen machine, Intel
machine, at least), or in the other OHCI 1394 hardware (e.g. Texas
Instruments).
The issue explicitly appears at a commit dcadfd7f7c74 ("firewire: core:
use union for callback of transaction completion") added to v6.5 kernel.
It changed 1394 OHCI driver to access to the register every time to
dispatch local asynchronous transaction. However, the issue exists in
older version of kernel as long as it runs in AMD Ryzen machine, since
the access to the register is required to maintain bus time. It is not
hard to imagine that users experience the unexpected system reboot when
generating bus reset by plugging any devices in, or reading the register
by time-aware application programs; e.g. audio sample processing.
This commit suppresses the unexpected system reboot in the combination of
hardware. It avoids the access itself. As a result, the software stack can
not provide the hardware time anymore to unit drivers, userspace
applications, and nodes in the same IEEE 1394 bus. It brings apparent
disadvantage since time-aware application programs require it, while
time-unaware applications are available again; e.g. sbp2.
Cc: stable@vger.kernel.org
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1215436
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217994
Reported-by: Tobias Gruetzmacher <tobias-lists@23.gs>
Closes: https://sourceforge.net/p/linux1394/mailman/message/58711901/
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2240973
Closes: https://bugs.launchpad.net/linux/+bug/2043905
Link: https://lore.kernel.org/r/20240102110150.244475-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c28ac3c7eb945fee6e20f47d576af68fdff1392a upstream.
Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't
much profit in doing lookaround on them. Besides, they can trigger the
pte_special() warning in get_pte_pfn().
Skip them in lru_gen_look_around().
Link: https://lkml.kernel.org/r/20231223045647.1566043-1-yuzhao@google.com
Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: syzbot+03fd9b3f71641f0ebf2d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/000000000000f9ff00060d14c256@google.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5033f58d5feed1040eebeadb0c5efc95b8bf5720 ]
Both helpers only read fields from their socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 738b54b9b6236f573eed2453c4cbfa77326793e2 ]
ifconfig ethx up, will set page->refcount larger than 1,
and then ifconfig ethx down, calling __page_frag_cache_drain()
to free pages, it is not compatible with page pool.
So deleting codes which changing page->refcount.
Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI")
Signed-off-by: duanqiangwen <duanqiangwen@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cf72f7f14d12cb065c3d01954cf42fc5638aa69 ]
The hypervisor returns migration failure if all VAS windows are not
closed. During pre-migration stage, vas_migration_handler() sets
migration_in_progress flag and closes all windows from the list.
The allocate VAS window routine checks the migration flag, setup
the window and then add it to the list. So there is possibility of
the migration handler missing the window that is still in the
process of setup.
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
unlock vas_pseries_mutex
Modify window HCALL lock vas_pseries_mutex
setup window migration_in_progress=true
Closes all windows from the list
// May miss windows that are
// not in the list
unlock vas_pseries_mutex
lock vas_pseries_mutex return
if nr_closed_windows == 0
// No DLPAR CPU or migration
add window to the list
// Window will be added to the
// list after the setup is completed
unlock vas_pseries_mutex
return
unlock vas_pseries_mutex
Close VAS window
// due to DLPAR CPU or migration
return -EBUSY
This patch resolves the issue with the following steps:
- Set the migration_in_progress flag without holding mutex.
- Introduce nr_open_wins_progress counter in VAS capabilities
struct
- This counter tracks the number of open windows are still in
progress
- The allocate setup window thread closes windows if the migration
is set and decrements nr_open_window_progress counter
- The migration handler waits for no in-progress open windows.
The code flow with the fix is as follows:
t1: Allocate and open VAS t2: Migration event
window
lock vas_pseries_mutex
If migration_in_progress set
unlock vas_pseries_mutex
return
open window HCALL
nr_open_wins_progress++
// Window opened, but not
// added to the list yet
unlock vas_pseries_mutex
Modify window HCALL migration_in_progress=true
setup window lock vas_pseries_mutex
Closes all windows from the list
While nr_open_wins_progress {
unlock vas_pseries_mutex
lock vas_pseries_mutex sleep
if nr_closed_windows == 0 // Wait if any open window in
or migration is not started // progress. The open window
// No DLPAR CPU or migration // thread closes the window without
add window to the list // adding to the list and return if
nr_open_wins_progress-- // the migration is in progress.
unlock vas_pseries_mutex
return
Close VAS window
nr_open_wins_progress--
unlock vas_pseries_mutex
return -EBUSY lock vas_pseries_mutex
}
unlock vas_pseries_mutex
return
Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231125235104.3405008-1-haren@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ad9843e1ea088bd2529290234c6c4c6374836a7 ]
The emulated IMSIC update the external interrupt pending depending on
the value of eidelivery and topei. It might lose an interrupt when it
is interrupted before setting the new value to the pending status.
For example, when VCPU0 sends an IPI to VCPU1 via IMSIC:
VCPU0 VCPU1
CSRSWAP topei = 0
The VCPU1 has claimed all the external
interrupt in its interrupt handler.
topei of VCPU1's IMSIC = 0
set pending in VCPU1's IMSIC
topei of VCPU1' IMSIC = 1
set the external interrupt
pending of VCPU1
clear the external interrupt pending
of VCPU1
When the VCPU1 switches back to VS mode, it exits the interrupt handler
because the result of CSRSWAP topei is 0. If there are no other external
interrupts injected into the VCPU1's IMSIC, VCPU1 will never know this
pending interrupt unless it initiative read the topei.
If the interruption occurs between updating interrupt pending in IMSIC
and updating external interrupt pending of VCPU, it will not cause a
problem. Suppose that the VCPU1 clears the IPI pending in IMSIC right
after VCPU0 sets the pending, the external interrupt pending of VCPU1
will not be set because the topei is 0. But when the VCPU1 goes back to
VS mode, the pending IPI will be reported by the CSRSWAP topei, it will
not lose this interrupt.
So we only need to make the external interrupt updating procedure as a
critical section to avoid the problem.
Fixes: db8b7e97d613 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Tested-by: Roy Lin <roy.lin@sifive.com>
Tested-by: Wayling Chen <wayling.chen@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b750b22530fe53bf7fd6a30baacd53ada26911b ]
Does the same thing as:
commit 6740ec97bcdb ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311302107.hUDXVyWT-lkp@intel.com/
Fixes: 67e38874b85b ("drm/amd/display: Increase num voltage states to 40")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Alvin Lee <alvin.lee2@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Samson Tam <samson.tam@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 659aa050a53817157b7459529538598a6449c1d3 ]
Currently get_free_mem_region() searches for available capacity
in increments equal to the region size being requested. This can
cause the search to take giant steps through the resource leaving
needless gaps and missing available space.
Specifically 'cxl create-region' fails with ERANGE even though capacity
of the given size and CXL's expected 256M x InterleaveWays alignment can
be satisfied.
Replace the total-request-size increment with a next alignment increment
so that the next possible address is always examined for availability.
Fixes: 14b80582c43e ("resource: Introduce alloc_free_mem_region()")
Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231113221324.1118092-1-alison.schofield@intel.com
Cc: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5558b92e8d39e18aa19619be2ee37274e9592528 ]
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one of more
regions map the device, or by b) the memdev driver when no regions
map the device.
In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event trace reports valid region info. (Region name, HPA, and UUID).
In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.
Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 458ba8189cb4380aa6a6cc4d52ab067f80a64829 ]
Add a helper to retrieve the number of decoders committed for the port.
Replace all the open coding of the calculation with the helper.
Link: https://lore.kernel.org/linux-cxl/651c98472dfed_ae7e729495@dwillia2-xfh.jf.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/169747906849.272156.1729290904857372335.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Stable-dep-of: 5558b92e8d39 ("cxl/core: Always hold region_rwsem while reading poison lists")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67e38874b85b8df7b23d29f78ac3d7ecccd9519d ]
[Description]
If during driver init stage there are greater than 20
intermediary voltage states while constructing the SOC
BB we could hit issues because we will index outside of the
clock_limits array and start overwriting data. Increase the
total number of states to 40 to avoid this issue.
Cc: stable@vger.kernel.org # 6.1+
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d21a3962d3042e6f56ad324cf18bdd64a1e6ecfa ]
We used to call intel_pre_plane_updates() for any pipe going through
a modeset whether the pipe was previously enabled or not. This in
fact needed to apply all the necessary clock gating workarounds/etc.
Restore the correct behaviour.
Fixes: 39919997322f ("drm/i915: Disable all planes before modesetting any pipes")
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231121054324.9988-3-ville.syrjala@linux.intel.com
(cherry picked from commit e0d5ce11ed0a21bb2bf328ad82fd261783c7ad88)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99fe9ee56bd2f7358f1bc72551c2f3a6bbddf80a ]
SCLK_SDMMC is the parent for SCLK_SDMMC_DRV and SCLK_SDMMC_SAMPLE, but
used with the (more) correct name sclk_sdmmc. SD card tuning does currently
fail as the parent can't be found under that name.
There is no need to suffix the name with '0' since RK312x SoCs do have a
single sdmmc controller - so rename it to the name which is already used
by it's children.
Fixes: f6022e88faca ("clk: rockchip: add clock controller for rk3128")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20231127181415.11735-6-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98dcc6be3859fb15257750b8e1d4e0eefd2c5e1e ]
According to the TRM there are no specific gpll_peri, cpll_peri,
gpll_div2_peri or gpll_div3_peri gates, but a single clk_peri_src gate.
Instead mux_clk_peri_src directly connects to the plls respectively the pll
divider clocks.
Fix this by creating a single gated composite.
Also rename all occurrences of aclk_peri_src to clk_peri_src, since it
is the parent for peri aclks, pclks and hclks. That name also matches
the one used in the TRM.
Fixes: f6022e88faca ("clk: rockchip: add clock controller for rk3128")
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
[renamed aclk_peri_src -> clk_peri_src and added commit message]
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20231127181415.11735-4-knaerzche@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06f76e464ac81c6915430b7155769ea4ef16efe4 ]
The lowest supported clock frequency of the PHY is 125MHz (see also
mtk_mipi_tx_pll_enable()), but the clamping in .round_rate() has the
wrong minimal value, which will make the .enable() op return -EINVAL on
low frequencies. Fix the minimal clamping value.
Fixes: efda51a58b4a ("drm/mediatek: add mipi_tx driver for mt8183")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20231123110202.2025585-1-mwalle@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e645c20e8e9cde549bc233435d3c1338e1cd27fe ]
The enforce_cache_coherency callback ensures DMA cache coherency for
devices attached to the domain.
Intel IOMMU supports enforced DMA cache coherency when the Snoop
Control bit in the IOMMU's extended capability register is set.
Supporting it differs between legacy and scalable modes.
In legacy mode, it's supported page-level by setting the SNP field
in second-stage page-table entries. In scalable mode, it's supported
in PASID-table granularity by setting the PGSNP field in PASID-table
entries.
In legacy mode, mappings before attaching to a device have SNP
fields cleared, while mappings after the callback have them set.
This means partial DMAs are cache coherent while others are not.
One possible fix is replaying mappings and flipping SNP bits when
attaching a domain to a device. But this seems to be over-engineered,
given that all real use cases just attach an empty domain to a device.
To meet practical needs while reducing mode differences, only support
enforce_cache_coherency on a domain without mappings if SNP field is
used.
Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cd2fe4fd63e54b799a68c0856bda18f2e40caa8 ]
assign_bit() expects a bit number and not a mask like BIT(x). Hence,
just remove the BIT() macro from the #defines.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202311060647.i9XyO4ej-lkp@intel.com/
Fixes: fff7352bf7a3ce ("iio: imu: Add support for adis16475")
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20231106150730.945-1-nuno.sa@analog.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be37542afbfcd27b3bb99a135abf9b4736b96f75 ]
AM62Ax has 3 SPI channels where each channel has 4x TX and 4x RX
threads. Also fix the thread numbers to match what the firmware expects
according to the PSI-L device description.
Link: http://downloads.ti.com/tisci/esd/latest/5_soc_doc/am62ax/psil_cfg.html [1]
Fixes: aac6db7e243a ("dmaengine: ti: k3-psil-am62a: Add AM62Ax PSIL and PDMA data")
Signed-off-by: Jai Luthra <j-luthra@ti.com>
Link: https://lore.kernel.org/r/20231123-psil_fix-v1-1-6604d80819be@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 744f5e7b69710701dc225020769138f8ca2894df ]
AM62x has 3 SPI channels where each channel has 4 TX and 4 RX threads.
This also fixes the thread numbers.
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Fixes: 5ac6bfb58777 ("dmaengine: ti: k3-psil: Add AM62x PSIL and PDMA data")
Reviewed-by: Jai Luthra <j-luthra@ti.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20231030190113.16782-1-rwahl@gmx.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f40d5099cd6d828fd7de6227d3eabe86016724c ]
When the node for this phy selector is a child node of a syscon node then the
property 'reg' is used as an offset into the parent regmap. When the node
is standalone and gets its own regmap this offset is pre-applied. So we need
to track which method was used to get the regmap and not apply the offset
in the standalone case.
Fixes: 1fdfa7cccd35 ("phy: ti: gmii-sel: Allow parent to not be syscon node")
Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20231025143302.1265633-1-afd@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80aea01c48971a1fffc0252d036995572d84950d ]
When the host invalidates a guest page, it will also check if the page
was used to map the prefix of any guest CPUs, in which case they are
stopped and marked as needing a prefix refresh. Upon starting the
affected CPUs again, their prefix pages are explicitly faulted in and
revalidated if they had been invalidated. A bit in the PGSTEs indicates
whether or not a page might contain a prefix. The bit is allowed to
overindicate. Pages above 2G are skipped, because they cannot be
prefixes, since KVM runs all guests with MSO = 0.
The same applies for nested guests (VSIE). When the host invalidates a
guest page that maps the prefix of the nested guest, it has to stop the
affected nested guest CPUs and mark them as needing a prefix refresh.
The same PGSTE bit used for the guest prefix is also used for the
nested guest. Pages above 2G are skipped like for normal guests, which
is the source of the bug.
The nested guest runs is the guest primary address space. The guest
could be running the nested guest using MSO != 0. If the MSO + prefix
for the nested guest is above 2G, the check for nested prefix will skip
it. This will cause the invalidation notifier to not stop the CPUs of
the nested guest and not mark them as needing refresh. When the nested
guest is run again, its prefix will not be refreshed, since it has not
been marked for refresh. This will cause a fatal validity intercept
with VIR code 37.
Fix this by removing the check for 2G for nested guests. Now all
invalidations of pages with the notify bit set will always scan the
existing VSIE shadow state descriptors.
This allows to catch invalidations of nested guest prefix mappings even
when the prefix is above 2G in the guest virtual address space.
Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
Tested-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20231102153549.53984-1-imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c20d36cc2a2073d4cdcda92bd7a1bb9b3b3b7c79 ]
If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912154040.3306-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a80712b9cc7e57830260ec5e1feb9cdb59e1da2f ]
The commit:
cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
has changed the semantics of what is to be considered an idle task in
such a way that the idle task of an offline CPU may not carry the
PF_IDLE flag anymore.
However RCU-tasks-trace tests the opposite assertion, still assuming
that idle tasks carry the PF_IDLE flag during their whole lifecycle.
Remove this assumption to avoid spurious warnings but keep the initial
test verifying that the idle task is the current task on any offline
CPU.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9715ed501b585d47444865071674c961c0cc0020 ]
The commit:
cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
has changed the semantics of what is to be considered an idle task in
such a way that CPU boot code preceding the actual idle loop is excluded
from it.
This has however introduced new potential RCU-tasks stalls when either:
1) Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.
2) Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until the
CPU ever boots up.
3) Similar to 1) but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It stays
then stuck in the holdout list until that CPU ever schedules. The
effect is mitigated here by the hotplug AP thread that must run to
bring the CPU up.
Fix this with handling the new semantics of PF_IDLE, keeping in mind
that it may or may not be set on an idle task. Take advantage of that to
strengthen the coverage of an RCU-tasks quiescent state within an idle
task, excluding the CPU boot code from it. Only the code running within
the idle loop is now a quiescent state, along with offline CPUs.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2be4686d866ad5896f2bb94d82fe892197aea9c7 ]
Export the RCU point of view as to when a CPU is considered offline
(ie: when does RCU consider that a CPU is sufficiently down in the
hotplug process to not feature any possible read side).
This will be used by RCU-tasks whose vision of an offline CPU should
reasonably match the one of RCU core.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85d68222ddc5f4522e456d97d201166acb50f716 ]
Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user
provided affinity mask, if present. It was changed later to use
kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
in do_set_cpus_allowed()") to avoid a circular locking dependency
problem.
It turns out that even kfree_rcu() isn't safe for avoiding
circular locking problem. As reported by kernel test robot,
the following circular locking dependency now exists:
&rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
the resched_cpu() out from under rcu_node lock.
[peterz: heavily borrowed from Waiman's Changelog]
[paulmck: applied Z qiang feedback]
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b27d5c420335dad7aea1aa6e799fe1d05c63b7e ]
The acpi_thermal_unregister_thermal_zone() is paired with
acpi_thermal_register_thermal_zone() so it should mirror it. It should
clean up all the resources that the register function allocated and
leave the stuff that was allocated elsewhere.
Unfortunately, it doesn't call thermal_zone_device_disable(). Also it
calls kfree(tz->trip_table) when it shouldn't. That was allocated in
acpi_thermal_add(). Putting the kfree() here leads to a double free
in the acpi_thermal_add() clean up function.
Likewise, the acpi_thermal_remove() should mirror acpi_thermal_add() so
it should have an explicit kfree(tz->trip_table) as well.
Fixes: ec23c1c462de ("ACPI: thermal: Use trip point table to register thermal zones")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a53e215f90079f617360439b1b6284820731e34c ]
The cited patch tries to ensure no pending works on the mkey cache
workqueue by disabling adding new works and call flush_workqueue().
But this workqueue also has delayed works which might still be pending
the delay time to be queued.
Add cancel_delayed_work() for the delayed works which waits to be queued
and then the flush_workqueue() will flush all works which are already
queued and running.
Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/b8722f14e7ed81452f791764a26d2ed4cfa11478.1698256179.git.leon@kernel.org
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e52b1164a474dc7b90f68fbb40e35ccd7f7e2e2 ]
Increase the size of temporary print buffer on stack to fix the
following warnings reported by LKP.
Since all the input parameters of snprintf() are under control
of this driver, it is not possible to trigger and overflow here,
but since the print buffer is on stack and discarded once driver
probe() finishes, it is not an issue to increase it by 10 bytes
and fix the warning in the process. Make it so.
"
drivers/clk/clk-si521xx.c: In function 'si521xx_probe':
>> drivers/clk/clk-si521xx.c:318:26: warning: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size 2 [-Wformat-truncation=]
snprintf(name, 6, "DIFF%d", i);
^~
drivers/clk/clk-si521xx.c:318:21: note: directive argument in the range [0, 2147483647]
snprintf(name, 6, "DIFF%d", i);
^~~~~~~~
drivers/clk/clk-si521xx.c:318:3: note: 'snprintf' output between 6 and 15 bytes into a destination of size 6
snprintf(name, 6, "DIFF%d", i);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"
Fixes: edc12763a3a2 ("clk: si521xx: Clock driver for Skyworks Si521xx I2C PCIe clock generators")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310260412.AGASjFN4-lkp@intel.com/
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20231027085840.30098-1-marex@denx.de
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 293fbc28818135743f54d46c418ede3e4a20a742 ]
The mtty driver does not currently conform to the vfio SET_IRQS uAPI.
For example, it claims to support mask and unmask of INTx, but actually
does nothing. It claims to support AUTOMASK for INTx, but doesn't. It
fails to teardown eventfds under the full semantics specified by the
SET_IRQS ioctl. It also fails to teardown eventfds when the device is
closed, leading to memory leaks. It claims to support the request IRQ,
but doesn't.
Fix all these.
A side effect of this is that QEMU will now report a warning:
vfio <uuid>: Failed to set up UNMASK eventfd signaling for interrupt \
INTX-0: VFIO_DEVICE_SET_IRQS failure: Inappropriate ioctl for device
The fact is that the unmask eventfd was never supported but quietly
failed. mtty never honored the AUTOMASK behavior, therefore there
was nothing to unmask. QEMU is verbose about the failure, but
properly falls back to userspace unmasking.
Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20231016224736.2575718-2-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5acab6eb592387191c1bb745ba9b815e1e076db5 ]
During hisilicon accelerator live migration operation. In order to
prevent the problem of EQ/AEQ interrupt loss. Migration driver will
trigger an EQ/AEQ doorbell at the end of the migration.
This operation may cause double interruption of EQ/AEQ events.
To ensure that the EQ/AEQ interrupt processing function is normal.
The interrupt handling functionality of EQ/AEQ needs to be updated.
Used to handle repeated interrupts event.
Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01aed663e6c421aeafc9c330bda630976b50a764 ]
There is no need to free the reset_data structure if the recovery is
unsuccessful and the reset is synchronous. The function
adf_dev_aer_schedule_reset() handles the cleanup properly. Only
asynchronous resets require such structure to be freed inside the reset
worker.
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Signed-off-by: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb40d32689d73c46de39a0529d551f523f21dc9b ]
Since commit adad556efcdd ("crypto: api - Fix built-in testing
dependency failures"), the following warning appears when booting an
x86_64 kernel that is configured with
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y and CONFIG_CRYPTO_AES_NI_INTEL=y,
even when CONFIG_CRYPTO_XTS=y and CONFIG_CRYPTO_AES=y:
alg: skcipher: skipping comparison tests for xts-aes-aesni because xts(ecb(aes-generic)) is unavailable
This is caused by an issue in the xts template where it allocates an
"aes" single-block cipher without declaring a dependency on it via the
crypto_spawn mechanism. This issue was exposed by the above commit
because it reversed the order that the algorithms are tested in.
Specifically, when "xts(ecb(aes-generic))" is instantiated and tested
during the comparison tests for "xts-aes-aesni", the "xts" template
allocates an "aes" crypto_cipher for encrypting tweaks. This resolves
to "aes-aesni". (Getting "aes-aesni" instead of "aes-generic" here is a
bit weird, but it's apparently intended.) Due to the above-mentioned
commit, the testing of "aes-aesni", and the finalization of its
registration, now happens at this point instead of before. At the end
of that, crypto_remove_spawns() unregisters all algorithm instances that
depend on a lower-priority "aes" implementation such as "aes-generic"
but that do not depend on "aes-aesni". However, because "xts" does not
use the crypto_spawn mechanism for its "aes", its dependency on
"aes-aesni" is not recognized by crypto_remove_spawns(). Thus,
crypto_remove_spawns() unexpectedly unregisters "xts(ecb(aes-generic))".
Fix this issue by making the "xts" template use the crypto_spawn
mechanism for its "aes" dependency, like what other templates do.
Note, this fix could be applied as far back as commit f1c131b45410
("crypto: xts - Convert to skcipher"). However, the issue only got
exposed by the much more recent changes to how the crypto API runs the
self-tests, so there should be no need to backport this to very old
kernels. Also, an alternative fix would be to flip the list iteration
order in crypto_start_tests() to restore the original testing order.
I'm thinking we should do that too, since the original order seems more
natural, but it shouldn't be relied on for correctness.
Fixes: adad556efcdd ("crypto: api - Fix built-in testing dependency failures")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23671f4dfd10b48b4a2fee4768886f0d8ec55b7e ]
libbpf accesses the ELF data requiring at least 8 byte alignment,
however, the data is generated into a C string that doesn't guarantee
alignment. Fix this by assigning to an aligned char array. Use sizeof
on the array, less one for the \0 terminator, rather than generating a
constant.
Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231007044439.25171-1-irogers@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebc8484d0e6da9e6c9e8cfa1f40bf94e9c6fc512 ]
This cast was made by purpose for older libbpf where the
bpf_object_skeleton field is void * instead of const void *
to eliminate a warning (as i understand
-Wincompatible-pointer-types-discards-qualifiers) but this
cast introduces another warning (-Wcast-qual) for libbpf
where data field is const void *
It makes sense for bpftool to be in sync with libbpf from
kernel sources
Signed-off-by: Denys Zagorui <dzagorui@cisco.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230907090210.968612-1-dzagorui@cisco.com
Stable-dep-of: 23671f4dfd10 ("bpftool: Align output skeleton ELF code")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbf80d713fe75cfbecda26e7c03a9a8d22af2f4f ]
While BPF allows to set icsk->->icsk_delack_max
and/or icsk->icsk_rto_min, we have an ip route
attribute (RTAX_RTO_MIN) to be able to tune rto_min,
but nothing to consequently adjust max delayed ack,
which vary from 40ms to 200 ms (TCP_DELACK_{MIN|MAX}).
This makes RTAX_RTO_MIN of almost no practical use,
unless customers are in big trouble.
Modern days datacenter communications want to set
rto_min to ~5 ms, and the max delayed ack one jiffie
smaller to avoid spurious retransmits.
After this patch, an "rto_min 5" route attribute will
effectively lower max delayed ack timers to 4 ms.
Note in the following ss output, "rto:6 ... ato:4"
$ ss -temoi dst XXXXXX
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 [2002:a05:6608:295::]:52950 [2002:a05:6608:297::]:41597
ino:255134 sk:1001 <->
skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack
cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500
rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121
bytes_received:54823120 segs_out:1370582 segs_in:1370580
data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps
pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579
busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920
rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536
While we could argue this patch fixes a bug with RTAX_RTO_MIN,
I do not add a Fixes: tag, so that we can soak it a bit before
asking backports to stable branches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f69791c39745e64621216fe8919cb73c0065002b ]
Right now we never release the power-domains properly on the error path.
Add a routine to be reused for this purpose and appropriate jumps in
probe() to run that routine where necessary.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable@vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b278080a89f452063915beda0ade6b3ed5ee4271 ]
Previously the jump label err_cleanup was used higher in the probe()
function to release the async notifier however the async notifier
registration was moved later in the code rendering the previous four jumps
redundant.
Rename the label from err_cleanup to err_v4l2_device_unregister to capture
what the jump does.
Fixes: 51397a4ec75d ("media: qcom: Initialise V4L2 async notifier later")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil: fix old name in commit log: err_v4l2_device_register -> err_v4l2_device_unregister]
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d609f3d228a8efe991f44f11f24146e2a5209755 ]
Userspace applications indicate their multi-buffer capability to xsk
using XSK_USE_SG socket bind flag. For sockets using shared umem the
bind flag may contain XSK_USE_SG only for the first socket. For any
subsequent socket the only option supported is XDP_SHARED_UMEM.
Add option XDP_UMEM_SG_FLAG in umem config flags to store the
multi-buffer handling capability when indicated by XSK_USE_SG option in
bing flag by the first socket. Use this to derive multi-buffer capability
for subsequent sockets in xsk core.
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 376907f3a0b34a17e80417825f8cc1c40fcba81b ]
Patch series "Three memory-failure fixes".
I've been looking at the memory-failure code and I believe I have found
three bugs that need fixing -- one going all the way back to 2010! I'll
have more patches later to use folios more extensively but didn't want
these bugfixes to get caught up in that.
This patch (of 3):
Both collect_procs_anon() and collect_procs_file() iterate over the VMA
interval trees looking for a single pgoff, so it is wrong to look for the
pgoff of the head page as is currently done. However, it is also wrong to
look at page->mapping of the precise page as this is invalid for tail
pages. Clear up the confusion by passing both the folio and the precise
page to collect_procs().
Link: https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231218135837.3310403-2-willy@infradead.org
Fixes: 415c64c1453a ("mm/memory-failure: split thp earlier in memory error handling")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>