linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	2f478b6011	RISC-V updates for v5.3-rc5 These updates include: - Two patches to fix significant bugs in floating point register context handling - A minor fix in RISC-V flush_tlb_page(), to supply a valid end address to flush_tlb_range() - Two minor defconfig additions: to build the virtio hwrng driver by default (for QEMU targets), and to partially synchronize the 32-bit defconfig with the 64-bit defconfig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl1XVWUACgkQx4+xDQu9 KkvF8w//b5zWgQVjTpPtaBDz18AKeSRivBAYDexJOangtbz7j7B0uuR3Rjd7nWpP Ky3R9UKKTms9/wg/jnY0bSd7FzNW8om22gnbTDPTIq4EEuTrsHBIIyFd/P+/5w5p zDl5qrWCDNQoYQ9OlBRTAGFZuw1fv5Nd90FDZtI0GlTVdZDu8TPuL2o8HQ3DcSVM /yaqm/ceB6mNmIjY6YF3YviwWn3KcfeHKqw3P0SDAPFwBF4kP8q/47QqAh8a1uVr M5upTf5ccnGe4lJhLBQLOir7Ua80rH11ZBGlVWfN2Vxf8aFaYPfjYBnm2ByJaHBd Ooux1u1fghLqWBRfiNaIwalOZBVLyWF9Q3FpCYauJNsJ0ldHrkB/Pdfo+lGEUqKq xx6SPwA9tnIiUe0X8Rh2duOz0e7HjXzU6J6VmdEGydLNlwTz/vwFeEqWDfpEooXE AEBr2vU44CQ8s0TWDD2W3oGV5qwyCmI86Ib9jcIpZmrbVGt6lukpD7DDOCD3hjhz moh7H4thytS7su+O4VZwsshYfQ/JX5Y9YhGbi7xYh8LUkHciREja1Qi14owkzkcd A8u6rvWkWqUxuqGJTkMSlU61O02+z2LoTV4Iwvm6Jt55OGmx58gDHDATobyRBn2b HyKyQKW4ur9pZQGlAk9tfglSeTTP6FcX6ZMapubM32VWJ/ZNzcM= =dwBU -----END PGP SIGNATURE----- Merge tag 'riscv/for-v5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - Two patches to fix significant bugs in floating point register context handling - A minor fix in RISC-V flush_tlb_page(), to supply a valid end address to flush_tlb_range() - Two minor defconfig additions: to build the virtio hwrng driver by default (for QEMU targets), and to partially synchronize the 32-bit defconfig with the 64-bit defconfig * tag 'riscv/for-v5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Make __fstate_clean() work correctly. riscv: Correct the initialized flow of FP register riscv: defconfig: Update the defconfig riscv: rv32_defconfig: Update the defconfig riscv: fix flush_tlb_range() end address for flush_tlb_page()	2019-08-17 10:36:47 -07:00
Linus Torvalds	6e625a1a3f	Xtensa fixes for v5.3: - add missing isync into cpu_reset to make sure ITLB changes are effective. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAl1W6QYTHGpjbXZia2Jj QGdtYWlsLmNvbQAKCRBR+cyR+D+gRHy6EACXPega7A/w9noqMosAWlqVGRS08tlH DCWSyqrwYeDBcG14lHUCzPUyV1xHTnOnplYtj33pvJX0PSh8ERAcARI9MBriwVY8 hfEcLIc0dOz9wP/bZuYXM6OgCFd34reozYU43c9/r28A+bUMARF2gxuzhxiY8CNj 7gw5rJDSiTGFK919/gdElu5HVY0LtW9DUBIgAZcdqA1rnu8vOV0oitB8/xPUNUex WrGWKgv32xy3UWMZD9b5G81lHs+zCvXxUVbE7UEAT2AqzaB6FpiEW8V6OekckJVD mZGtWJIQYMlh5bJR9x4+NQOlysYnZxhEzzzJxEFlG9E3Hm3CkeSt7mYhxFi41cDH 1G+34qm7JEJWwOBeSPzb/RqAGPzbjp0EkVad0M0ocuPjm8+AsojonH749Ox58gWL RBYvaKH98nYPkD2mu6vMHZGwOFUr8iJSNBel+8IejiOifUKJiZjRCpZcNK6Toc5h /RVSQuKeTKEe/uZhMZTGkgFqGrTvWOipbEW3+//QSxJVTiezjZC5LiMwGZ10xHBN aGilB6Ty7bPi/JBhSZNNxKMYAvS3jnOVdVpggWyVHxXSRgnJ+WKGOdcgBGjWFnL8 WPP0Uwcs7ieo6Y92RfVWNhAYhazGSrCVJ6/M12O22xVPWzyJTgQSK4H4bj6H7n0s cXTvRnJBYsVqDg== =Awr/ -----END PGP SIGNATURE----- Merge tag 'xtensa-20190816' of git://github.com/jcmvbkbc/linux-xtensa Pull Xtensa fix from Max Filippov: "Add missing isync into cpu_reset to make sure ITLB changes are effective" * tag 'xtensa-20190816' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: add missing isync to the cpu_reset TLB code	2019-08-16 17:27:55 -07:00
Linus Torvalds	b7e7c85dc7	arm64 fixes: - Don't taint the kernel if CPUs have different sets of page sizes supported (other than the one in use). - Issue I-cache maintenance for module ftrace trampoline. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl1W5VMACgkQa9axLQDI XvFAWQ//RrxMHeN7/Ynv1MDqucZlMpQJMUV2V0K5wYO6ZwGMKYXw62SBzb9AMv0y iFhYNZbtBoE2JhLNEfwhdNkk9NfUsKbUcfyt0cono2DU1tVihmmVqbNbairNhKVo j7vxnCx8SMQ5ZT+QaTpKUddzlzb4jdycXGYaje62bbA19BCOVVIuy61wGNtHP4IU 817RHqIj6aqIbmplt79+Q3vOopB8BbxrnpC5cp8rw1CKbfu7kQN4zePIA4Z4bhag G5hm+aTV1qzrmEud2WkuA7044vsw2Wkd/8gksRyQKkypfqfQ3NrASDn3xGqOi/5t 2DsyJPaVUC1tsJdrVqpfWZfLJJ8FGyox7aeXL7OdPrfWP6HI9jJnodRVH/av97g6 psaSoJNxXbVqrg/wva6i2f25KBYdp/vQW0Nmjljvt4dmEDrE/jpifAYAH/LUmi5H fMNXyOdeDcgUoVfcEk/S1leiDf0gUy+B8ylknk12knbMuk/9ATq/2H3RKqrRJYWL qUQBvB04d7NHZl8wl+IVLlK8g4x5Jeetjm7GHvLbOp9agb63kwVFq50c4zD052LA eKy6LbUO2xHyA3PrtvBem/hKZ/GCTh0o0xcwUkyNcsLUHfKnMcHLEH/WWd5rmYIQ xvbQVhVORR8ru7eCQCRMBmjGaHFz2ZQa4Q3V3qVBnX+q/F6dku8= =VneQ -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Don't taint the kernel if CPUs have different sets of page sizes supported (other than the one in use). - Issue I-cache maintenance for module ftrace trampoline. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side arm64: cpufeature: Don't treat granule sizes as strict	2019-08-16 10:51:47 -07:00
Will Deacon	b6143d10d2	arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side The initial support for dynamic ftrace trampolines in modules made use of an indirect branch which loaded its target from the beginning of a special section (e71a4e1bebaf7 ("arm64: ftrace: add support for far branches to dynamic ftrace")). Since no instructions were being patched, no cache maintenance was needed. However, later in be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code") this code was reworked to output the trampoline instructions directly into the PLT entry but, unfortunately, the necessary cache maintenance was overlooked. Add a call to __flush_icache_range() after writing the new trampoline instructions but before patching in the branch to the trampoline. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: James Morse <james.morse@arm.com> Cc: <stable@vger.kernel.org> Fixes: be0f272bfc83 ("arm64: ftrace: emit ftrace-mod.o contents through code") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2019-08-16 17:40:03 +01:00
Linus Torvalds	2d63ba3e41	Power management fixes for 5.3-rc5 - Disable NVMe power optimization related to suspend-to-idle added recently on systems where PCIe ASPM is not able to put PCIe links into low-power states to prevent excess power from being drawn by the system while suspended (Rafael Wysocki). - Make the schedutil governor handle frequency limits changes properly in all cases (Viresh Kumar). - Prevent the cpufreq core from treating positive values returned by dev_pm_qos_update_request() as errors (Viresh Kumar). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl1WqLMSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxCkAP/146AuGXj8tOdHxkpl6DVgm0WVRNxCtL Z9Y+1xBRBSYVZkeDsjzox995z8Ha/0tnMp6EPcnxebkFpRx3fyldXKUKxqJARPPi n2jGhqCPNcAHK2UPdGH8EvHOI2uWBMBa2jW2Qw9m0V/+9Zy58ZvKqso/+myFkz2S YRekJPADsI3GZW1SZ3dY4/12jcKsQt32TWaGOLqKx3R1J1BnpyxduXfqJ6FUrH9b P/F9cVb2UEbawh5QpNmfMsfBb/DsE08NQhPWe91m0VgcLd6IZsoNux0Rd8HJOvRM +5vh6qPTABnNN1+7blFw64/hCu1N2hq8KLl6DzPeKohysKiDkmLh3QGB+ISRpj+H 5GKF8gnQFvN0fPJF8NU+eIZ0IaOryrooSu4TeCcAWAozJ0ln2mjNoC2h6U1B8Y29 UH+e2z+6kVTHwjiTjPacjQ0wnkUctoiT71kMxQ8Q+GFG3fQcz3GFFM17eITnAI/Q ws1bPHn1ovxl1GmdQwQK3KnT1cK5/fApaVKQLJiRkUvZ1gCZ3ZcruPlh+qA5zpGf +RGPXn/Rm1LA1uCkS4j6REBp6vhcVJoVEVnEGzhovdtJcuJ9erlh5I2zz4UxURnn cHH48exFmwC+uBhIyQVuYOYgLU3naztBLFg1/l68sMQFonWjIQ/Hp1B9cIgigwbf 5+BlT1llvIH3 =eCcy -----END PGP SIGNATURE----- Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These add a check to avoid recent suspend-to-idle power regression on systems with NVMe drives where the PCIe ASPM policy is "performance" (or when the kernel is built without ASPM support), fix an issue related to frequency limits in the schedutil cpufreq governor and fix a mistake related to the PM QoS usage in the cpufreq core introduced recently. Specifics: - Disable NVMe power optimization related to suspend-to-idle added recently on systems where PCIe ASPM is not able to put PCIe links into low-power states to prevent excess power from being drawn by the system while suspended (Rafael Wysocki). - Make the schedutil governor handle frequency limits changes properly in all cases (Viresh Kumar). - Prevent the cpufreq core from treating positive values returned by dev_pm_qos_update_request() as errors (Viresh Kumar)" * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled PCI/ASPM: Add pcie_aspm_enabled() cpufreq: schedutil: Don't skip freq update when limits change cpufreq: dev_pm_qos_update_request() can return 1 on success	2019-08-16 09:13:16 -07:00
Linus Torvalds	9da5bb24bb	dmaengine fixes for v5.3-rc5 Fixes in dmaengine drivers for: - dw-edma endianess, _iomem type and stack usages - ste_dma40 unneeded variable and null-pointer dereference - tegra210-adma unused function - omap-dma off-by-one fix -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJdVnkYAAoJEHwUBw8lI4NHk2sQAM8dOHAsYexFc4ERaNOXC9pL 9ZhZccBgYeVaKE/oCJ4iKevMpBNFNzRZMOPNOvEtO1tbMTJjkgBGrKrQQju6X0Nr SBwVpSrQu1Oex3ANwBXei3CJGzbdOoLxDdyveGbsfL1ZY0c9Ilsh0Os8EwrNSKFr NDns84snjk69ipqMTzRAA3oM1Zk7ZUrmwNpgw0o8UWjoGSK0/Yt1MJ4RwlX2IgoA 8vESw9Mu/9a3gzFKe/mmVrG+LF22KYd8fMhtu3DbIPzDqb3Ns1gUWQQvBStGXPR9 lkaa5ZNL9Eo5vhpX4/QqZ3s7VkN49sJ/t4etsXv8m+s9g3zepRSSiAgL84JBbjEj t8o9PRb4ePPZmBElQ165He/GJqBimbPDYasSgO1xuT49ENbH9HzTM2moHLDNPz7I bveVfU5Mtzxxy5WaHBhBTnSwNhzNmgZ8G+17fBeItpcvZA2nbmoNfVN9Ma0rKuDQ zMLzU25UPoD9/aONg26VJXJnyZ0n01a6L1Ltj1X5SmW6BLnMziRpOqSXOkuTXbF1 h5P7iJH0cz1XfA7PJDh/Dzf3WjJYmsnBzxsj1P8xt/LTCk9iuJ5m/TFEMmTtW6iX IkkjTITLoy/APpLHxmU/3mx3j+gH400C1li5fk7oNVJU9iV+IQV4EKGi/2An0Kd1 +DBrOcw8TRdIIBkKAuXv =x4VH -----END PGP SIGNATURE----- Merge tag 'dmaengine-fix-5.3-rc5' of git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "Fixes in dmaengine drivers for: - dw-edma: endianess, _iomem type and stack usages - ste_dma40: unneeded variable and null-pointer dereference - tegra210-adma: unused function - omap-dma: off-by-one fix" * tag 'dmaengine-fix-5.3-rc5' of git://git.infradead.org/users/vkoul/slave-dma: omap-dma/omap_vout_vrfb: fix off-by-one fi value dmaengine: stm32-mdma: Fix a possible null-pointer dereference in stm32_mdma_irq_handler() dmaengine: tegra210-adma: Fix unused function warnings dmaengine: ste_dma40: fix unneeded variable warning dmaengine: dw-edma: fix endianess confusion dmaengine: dw-edma: fix __iomem type confusion dmaengine: dw-edma: fix unnecessary stack usage	2019-08-16 08:59:33 -07:00
Linus Torvalds	cfa0bb2aef	sound fixes for 5.3-rc5 All small fixes targeted for stable: - Two fixes for USB-audio with malformed descriptor, spotted by fuzzers - Two fixes Conexant HD-audio codec wrt power management - Quirks for HD-audio AMD platform and HP laptop - HD-audio memory leak fix -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl1WXYkOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE8PdQ/7BBffAHVuC43sLno+AwRHUXyFBmkXKrheGaqu in4tjm2Usk79vxDeAgdHc/J/Ge26D+ZMHwKTlU9+7RVWeKkPcsrA5EigGmjPAl59 RbHktmVd47vRWQr2GdmMxJ8fGRkR+68qImujfHw0+iWWUPZJrrwOsrerzwNaFlNf siIIfD5yFzmEgjD8mQDT8PAp47x8tU46t6x85GkQ2BQZHulpkkemfA6H9nRfLQQz qsrdnBGTZJ+Pz2plFK0bhotWnb2F6amFxjJ6PI4/pesgVx9pMLVABvjQOujJqpi+ tr+K7wC3WADKDdSv5roA9iNKV09sMFxvCJX+49bCMsDWvF9mDrzHMeL/1O2rd9gg AAiSn1UT0RvXT1y7xdzJRc6xxD1paVANWT3qXQItapgFCI6Mhi1k4qUu4Vy1R/dr mdlt4NhGYlRRoNrZFoWpFjgzdDXnuo4UJA81sTTTUCYBO7PB62eKJJuzb8Q5qCew ay1WOOgM9QjYptA71nshD3UWF+V1I4imd7EtWMeZ/rBR/1CKxdH+QugFmw90Z2ew mVgQd4NjRtDhognJ8OTnVHr5PP/Wj/LwHdd11QYHZLV8YCCzJCIGMtxvNS0R3T1E 7zmBzzIOtUzegtFR6Ov5cRhuKh8zmxD1Rz63j/u9M3yFBmDftzbSFDd6uAOPpO6z AGonmQI= =9hVN -----END PGP SIGNATURE----- Merge tag 'sound-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All small fixes targeted for stable: - Two fixes for USB-audio with malformed descriptor, spotted by fuzzers - Two fixes Conexant HD-audio codec wrt power management - Quirks for HD-audio AMD platform and HP laptop - HD-audio memory leak fix" * tag 'sound-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit ALSA: hda - Add a generic reboot_notify ALSA: hda - Let all conexant codec enter D3 when rebooting ALSA: hda/realtek - Add quirk for HP Envy x360 ALSA: hda - Fix a memory leak bug ALSA: hda - Apply workaround for another AMD chip 1022:1487	2019-08-16 08:49:45 -07:00
Linus Torvalds	ec037ac244	drm fixes for 5.3-rc5 i915: - single GVT use after free fix scheduler: - entity destruction race fix amdgpu: - struct allocation fix - gfx9 soft recovery fix nouveau: - followup MST fix ast: - vga register race fix. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJdVi4dAAoJEAx081l5xIa+MWYP/2e4fqA03cee+ExwG3oILILQ 7aFeYk3pA/popEJsdYqUU4SWqAJhQ6DDpIex5K4ASIxEmz75Mt2HhohuYU6jQXmP auK0b162TeGJDhFkiRpK3rD0EidUYg4q7fq3eooLqqrU66Bli8OvYKHChoXcU4za jGbqywFHRW4VwxfLxJErLIz+wq6k8CMFie3vxGcEN3uQaX81KrVfPwhzauh+9GJd aN4rDsvyp0jzl/HRK9GtKBlkfNCUeSSZ/By7PMbUfTCgP2dkTFUuKpoieJ0JU3tR VnUEmZTtMweL/3f/yPfTkAFURYMjdeXkgnTR7DZv8y46zFbbMOsTFoBI7df9ef9k +QBmW0dMKcS16i+aJBckb5bDOYwt39hiscGuBporaNhaVrQLGtmxEQet5N4iEOBY 3P+yEXOvviz0WXh4WrbHkcz7tGTjITeiXZSNzcNueZj3n+2AN8EwDMJNuVPQbIh1 87Kb53Gbd3kHqHe5fJ82sHlRifjlPcQmKyd6+uk72ZewHQq2VPswtdnjcfmtuqpO cM3pwdZGtvfuQs6+EurzmXpNCzn32GdJdJY8A/q+zvekin5yn+RmW2eb9w/T1hV2 hP399EoMDNQQ750YKP4Paf3QdDSGtrY7UxSfUxVcYOKuGi1RzTBKndIGTv9IGLg9 PNLm3KaPKtLNx6yq4+OW =TusM -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2019-08-16' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Nothing too crazy this week, one amdgpu fix to use vmalloc for a struct that grew in size, and another MST fix for nouveau, and some other misc fixes: i915: - single GVT use after free fix scheduler: - entity destruction race fix amdgpu: - struct allocation fix - gfx9 soft recovery fix nouveau: - followup MST fix ast: - vga register race fix" * tag 'drm-fixes-2019-08-16' of git://anongit.freedesktop.org/drm/drm: drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes drm/ast: Fixed reboot test may cause system hanged drm/scheduler: use job count instead of peek drm/amd/display: use kvmalloc for dc_state (v2) drm/amdgpu: fix gfx9 soft recovery drm/i915: Use after free in error path in intel_vgpu_create_workload()	2019-08-16 08:41:15 -07:00
Rafael J. Wysocki	a3ee2477c4	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: schedutil: Don't skip freq update when limits change cpufreq: dev_pm_qos_update_request() can return 1 on success	2019-08-16 14:24:51 +02:00
Dave Airlie	a85abd5d45	Merge tag 'drm-intel-fixes-2019-08-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.4-rc5: - GVT use-after-free fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87zhkag9ic.fsf@intel.com	2019-08-16 12:41:52 +10:00
Hui Peng	19bce474c4	ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term `check_input_term` recursively calls itself with input from device side (e.g., uac_input_terminal_descriptor.bCSourceID) as argument (id). In `check_input_term`, if `check_input_term` is called with the same `id` argument as the caller, it triggers endless recursive call, resulting kernel space stack overflow. This patch fixes the bug by adding a bitmap to `struct mixer_build` to keep track of the checked ids and stop the execution if some id has been checked (similar to how parse_audio_unit handles unitid argument). Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Hui Peng <benquike@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2019-08-15 21:48:52 +02:00
Linus Torvalds	a69e90512d	Changes since last update: - Fix crashes when the attr fork isn't present due to errors but inode inactivation tries to zap the attr data anyway. - Convert more directory corruption debugging asserts to actual EFSCORRUPTED returns instead of blowing up later on. - Don't fail writeback just because we ran out of memory allocating metadata log data. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1RlXoACgkQ+H93GTRK tOtc7A/+JIidhI/MHQLs7Ab9GW+PsHHBMSbVTV4Ge+SlfZtPNI38zrC1MC5LWvvV bndOpRjLm4nOJcB7fsoEWufTs1dKOIUjk2yQi8x47ZvE+B/RcA4b6IDhwpbAI8GW kt1RLNec9kpzhxCFFPzsXT9MwjEvvOvTeXfxaXTmiuB2kbJkR5dTlCUS2nUDnqsG FGdmOUDjy1uVfFcSrp75KT/iYaqW08cG+uY/eUHRm+YMUKI8hF1t+n8cDnSg96VX IN2DT1d3dTWiiF+JUZnMhVwJvPgV95DOf+yYy/F7qOcJUEmQ9tD6+0Ml/cI/AeLG zERxHXM9A9Jy8S+2xkvf0J/+HStwfviWNToK3pbMIM1ZsoMTi9q8VgbB3AaFiijf C4Q4T3W0jC44om8X/Ta/c+G/64Tj8yenzLDeTHvtQkoq77QPBam/aYjBc79oYvHi r+R61kHNto+YjJsRbkwgF/S+bzru1qY9Ccr0LJZrUkSzh4d6p94fbQc+NX4L2sv7 WzAc+kOR/7qgVgy4gVr3ju0d89kP/Xn/0e0Ma0V8CSZlX5yg1dMLew5TJq693UYX xjLGD2ltOoFEN8e7/WXI0/ktvvSCAQalmz+sPgJvTlosUhpGXky85ced1PSrKiEV l0tREpmawDo9WVvC/06yBj97Op6PDdb4CovDcyLT6Yt3v1aBZT0= =ivN3 -----END PGP SIGNATURE----- Merge tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - Fix crashes when the attr fork isn't present due to errors but inode inactivation tries to zap the attr data anyway. - Convert more directory corruption debugging asserts to actual EFSCORRUPTED returns instead of blowing up later on. - Don't fail writeback just because we ran out of memory allocating metadata log data. * tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't crash on null attr fork xfs_bmapi_read xfs: remove more ondisk directory corruption asserts fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().	2019-08-15 12:29:36 -07:00
Linus Torvalds	4ec1fa692d	Changes since last update: - Update MAINTAINERS now that we've removed fs/iomap.c. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1UPLMACgkQ+H93GTRK tOvPVQ/+O9rOWD1udgyx3Qvn0mZhYgyKw8k3rHdZjRVV8fo5SIyS7cZRqi/B5WGf pRLHVWMGVeCiiRlJJzJLuD6tSJ04AOASO/VklRVbDpWkqRWqk3Hk8NiYt4KbgAcm P31I836I84L2qhfUN+EGemR7ByeH3XjSgNqRHRC60WiEs+gK5Em5loJfmohuBE29 +XIlBusc84rN6SfP4xBpj/SL3U2XTDkXQX4ESXbU80mOypVqTJMsDv4YLBbWZcCT ZZ9HV35duUQFUC5t98Z+KQ4buwVeTvuXpzbxvZRnT5uMcQGPboH8NseyenNTJvI2 YLMRoP9yQImxbhKTD+VTCzfVw6qGX7JrVMT0hYszK3uhSUNVUbD6ChhdeOHI5I+U BdD6t/pKsM/3k030jxrqrI/PMTPJRe2Pu1MKcK1qR7uptagUFBfVUVLz8uoXp4/U O/WOub72pLcbIJ2kCffknfvZkD7YnlB1qwNBHaMqdSZJp89ZL5zluXh58RvXv15e aU3UPzotC3qoCwz/CW5/R0oiUQeEs78jR91urKCGPaOQqRtgQatIccaxJqG9NsMz pHAY8Ouw8bx4q6Q1CNwrNVqVHJsh87pxvpBgw/VVGUhprwkNKaBG2UvVnu4CqkLN MW2KaI66dmCL+VbEK5ikPNF0YXXbQLDdtW4K/S3OIpW7H2IRgjo= =mOPT -----END PGP SIGNATURE----- Merge tag 'iomap-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull iomap fixlet from Darrick Wong: "A single update to the MAINTAINERS entry for iomap now that we've removed fs/iomap.c" * tag 'iomap-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: MAINTAINERS: iomap: Remove fs/iomap.c record	2019-08-15 12:15:45 -07:00
Linus Torvalds	3291204239	A few minor auxdisplay improvements: - A couple of small header cleanups for charlcd From Masahiro Yamada - A trivial typo fix for the sampels of cfag12864b From Masahiro Yamada - An Kconfig help text improvement for charlcd From Mans Rullgard - An error path fix for panel From zhengbin -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl1VeiAACgkQGXyLc2ht IW3GGhAAzvDyOYj6Qql0z8z0VtciPWXgf12gpuQQENfBxdTaqMDVkpirAN/D2h+R VT1lBenlOHbBYdNs73B97xbM7RRU6K0EsTCEzny9ScnHA8dWLXCVpUuEY1sadrAt s+7FKPLxHbLA8PNhMSViXhgFyXdMWnQToRFfnOEamMzKf66AlsUWrl0ESTvBKgD+ Ph6yVENuFqBMr6818mFG/NdnChNY6nKhPbmrMEUY9igzGxhkED1Y918qJuxILAi6 uyOM7SWZuJJHG+tGXvNYyp73qy6SFBP2iUJb3UIfkhzIwSvupHRFpEmTEtYS1PAa ADTzeMtY/y42NfYGzy17S3jEE0ew2W32yD/wh25Tjm34HXHnLAR1sbgohNdSskkm gBocO8N2QJ1IS3rULXHYqLuBfIvATiiOQCtBH2NynA23DJrEAG59RvkZYevz6Dfk cZeKp8eiJjiF7ji7LQLfNjOWcQcvXQHMhXLiJg34lakDhSHpz+abCpI7wFaUMISK 0cFLaHoST0yQc4KCqY5PuBPfU/7/HvhO+MQd2QBkws2htbKuHwt7KGbooA9Rsayx XNNZz6V0YgA/AMLaEASfiBDzJdXrTUBEaDu9bWqlhZ7cZ3fIRP2KvsM8OiGNobUE 946wffX7GzAgmoVFLKKdKGHeDVuDeG5kR5IcpUmKOWqsNXUs4f4= =xjfn -----END PGP SIGNATURE----- Merge tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux Pull auxdisplay fixes from Miguel Ojeda: "A few minor auxdisplay improvements: - A couple of small header cleanups for charlcd (Masahiro Yamada) - A trivial typo fix for the examples of cfag12864b (Masahiro Yamada) - An Kconfig help text improvement for charlcd (Mans Rullgard) - An error path fix for panel (zhengbin)" * tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux: auxdisplay: Fix a typo in cfag12864b-example.c auxdisplay: charlcd: add include guard to charlcd.h auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay auxdisplay: charlcd: add help text for backlight initial state auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach	2019-08-15 09:20:17 -07:00
Linus Torvalds	2b245b8b03	Devicetree fixes for 5.3: - Fix building DT binding examples for in tree builds - Correct some refcounting in adjust_local_phandle_references() - Update FSL FEC binding with deprecated properties - Schema fix in stm32 pinctrl - Fix typo in of_irq_parse_one docbook comment -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl1UwjkQHHJvYmhAa2Vy bmVsLm9yZwAKCRD6+121jbxhw6NsEACk9VYUmQhN4haogNkYGapJg3mvfuguzuLU ps8Wd61WDYqmtcHLC9bJaiYzaSrSzpgJcAAsU7ryJuesZD8XRe5jGFs494lBCGq9 wZeXSECFr898ThmHWtIbQmXvBd1sQjh+EhHn5u1G1Pzxyan5NeiPgJ4LM/sLRl0m kdunXbXzg1LxwEYuPiEV/DmPAHG6MPV98E9haQ3z82j167CuRpU0Hi0QQim7Dzu1 wc+f8hUD4X6hFJsHxho25mMvcSKypadzgv2wGEQve+GdDOSOrWOfXZUgHLopkNrm tDy2FZD39McX4Tuv90DDLVNulVbfZdXRyepgH666QOelelyIAWuE2mcUrV5sgms9 EPTmcQC73o/Vnx/ALpvhs5kYvxmwPH3xPfDiF4QEcC8uHZkM5Ldms0SVnAcuTlYY 8Z+rDmhR9GQwJ4oDk0ZwsovNWB7HICJRC7uxotqL6A5kY3WfbzlrfWiA8zV/j9h6 wHXlWH6hf8hUTnNSRV9mztL7m52qfDbALDQTShNitbKsUQgVPaKMwA1hcXsMGyPS 5wwpU1gtj6UbTeLLs+3IFQQUTx1cI3yVhXrpTUOf6ke24510NBWfyJrUPGQPH24E Cgd67Octcbxe+r6gReo4r0XNrDQwE+f7CC0YP9VZg3zmFlHhjlsTDoAddnLPDewn KcIYS6GfGA== =kE9S -----END PGP SIGNATURE----- Merge tag 'devicetree-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix building DT binding examples for in tree builds - Correct some refcounting in adjust_local_phandle_references() - Update FSL FEC binding with deprecated properties - Schema fix in stm32 pinctrl - Fix typo in of_irq_parse_one docbook comment * tag 'devicetree-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: irq: fix a trivial typo in a doc comment dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema dt-bindings: fec: explicitly mark deprecated properties of: resolver: Add of_node_put() before return and break dt-bindings: Fix generated example files getting added to schemas	2019-08-15 09:18:56 -07:00
Dave Airlie	2f62c5d6ed	Merge tag 'drm-fixes-5.3-2019-08-14' of git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.3-2019-08-14: amdgpu: - Use kvalloc for dc_state to avoid allocation failures in some cases. - Fix gfx9 soft recovery scheduler: - Fix a race condition when destroying entities Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815024919.3434-1-alexander.deucher@amd.com	2019-08-15 13:29:18 +10:00
Lyude Paul	db1231ddc0	drm/nouveau: Only recalculate PBN/VCPI on mode/connector changes I -thought- I had fixed this entirely, but it looks like that I didn't test this thoroughly enough as we apparently still make one big mistake with nv50_msto_atomic_check() - we don't handle the following scenario: * CRTC #1 has n VCPI allocated to it, is attached to connector DP-4 which is attached to encoder #1. enabled=y active=n * CRTC #1 is changed from DP-4 to DP-5, causing: * DP-4 crtc=#1→NULL (VCPI n→0) * DP-5 crtc=NULL→#1 * CRTC #1 steals encoder #1 back from DP-4 and gives it to DP-5 * CRTC #1 maintains the same mode as before, just with a different connector * mode_changed=n connectors_changed=y (we _SHOULD_ do VCPI 0→n here, but don't) Once the above scenario is repeated once, we'll attempt freeing VCPI from the connector that we didn't allocate due to the connectors changing, but the mode staying the same. Sigh. Since nv50_msto_atomic_check() has broken a few times now, let's rethink things a bit to be more careful: limit both VCPI/PBN allocations to mode_changed \|\| connectors_changed, since neither VCPI or PBN should ever need to change outside of routing and mode changes. Changes since v1: * Fix accidental reversal of clock and bpp arguments in drm_dp_calc_pbn_mode() - William Lewis Signed-off-by: Lyude Paul <lyude@redhat.com> Reported-by: Bohdan Milar <bmilar@redhat.com> Tested-by: Bohdan Milar <bmilar@redhat.com> Fixes: 232c9eec417a ("drm/nouveau: Use atomic VCPI helpers for MST") References: 412e85b60531 ("drm/nouveau: Only release VCPI slots on mode changes") Cc: Lyude Paul <lyude@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@redhat.com> Cc: Jerry Zuo <Jerry.Zuo@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Juston Li <juston.li@intel.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <stable@vger.kernel.org> # v5.1+ Acked-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809005307.18391-1-lyude@redhat.com	2019-08-15 13:27:59 +10:00
Y.C. Chen	05b439711f	drm/ast: Fixed reboot test may cause system hanged There is another thread still access standard VGA I/O while loading drm driver. Disable standard VGA I/O decode to avoid this issue. Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/1523410059-18415-1-git-send-email-yc_chen@aspeedtech.com	2019-08-15 13:04:57 +10:00
Lubomir Rintel	83f82d7a42	of: irq: fix a trivial typo in a doc comment Diverged from what the code does with commit 530210c7814e ("of/irq: Replace of_irq with of_phandle_args"). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Rob Herring <robh@kernel.org>	2019-08-14 20:12:16 -06:00
Rob Herring	626633425c	dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema The proper way to add additional contraints to an existing json-schema is using 'allOf' to reference the base schema. Using just '$ref' doesn't work. Fix this for the 'st,syscfg' property. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Cc: linux-arm-kernel@lists.infradead.org Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>	2019-08-14 20:07:43 -06:00
Linus Torvalds	41de596340	Wimplicit-fallthrough patches for 5.3-rc5 Hi Linus, Please, pull the following patches that fix sh mainline builds: - Fix fall-through warning in sh. - Fix missing break bug in sh (this is a 10-year-old bug). Currently, mainline builds for sh are broken. These patches fix that. Thanks Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl1UVM0ACgkQRwW0y0cG 2zFwhhAAw4Of7EnmcEJwLAiIaE7O9JHewg3CoVyZr2jim+EVoXJErdBV3Kg7yTcf JH6eYicdA50zr38bQ+eVBxSAJvqsYry7ZZhgyWoucNZmFSRxuhyQh9ucIGiTVfSO k24Bb7n4new24rf2zo/JXpqD5mWEtKlQqJYp84rNovIbFwENnOimRU1f0H64pdRg 4UH578lsJTuw8DL1x+sc8ZxYvTa8YbaEcrwMnexiYR8ZAtBMfJzIRcKNwScxUe86 +WKfipUTmidtLMZU9J9I9jQYFyX/o/Dkenq+6uxeephl0M7mUbsA0xc9oHLEnc3Q ieOduorxTgFb13SyRXhdW5frTGcMWJufoyD7+pTsCE09ws1PdXcxQ1CWPN09lG3/ R+3ceARfakuvBnCchawpses64zjlq6As68WfW1BBnwhl5sAr9uE/eDBLxXb/tYc3 qnJ5q32Vj9c7exLL95BtlZwm2Xaz687ZYhdZB7nsoYEA2CvExmriMFt9pl6SgRlQ ixkyJs0uDh6Fy+RawTFMS7/1NFRI8I/3r2vzqTN7ykFd5QsR9HCLfi2xcz/BCy84 bsd6WfdtrGolpnodnyUx3C99wO5yjRGJc0H17W2Lz28JXYFFlbQuambNV18krN1v CHrVpqmZybWf6hfNtedwdGcvv7AR46IMdULfPUS26O0nt1T67aU= =hnDA -----END PGP SIGNATURE----- Merge tag 'Wimplicit-fallthrough-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo A. R. Silva: "Fix sh mainline builds: - Fix fall-through warning in sh. - Fix missing break bug in sh (this is a 10-year-old bug) Currently, mainline builds for sh are broken. These patches fix that" * tag 'Wimplicit-fallthrough-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: sh: kernel: hw_breakpoint: Fix missing break in switch statement sh: kernel: disassemble: Mark expected switch fall-throughs	2019-08-14 15:29:53 -07:00
Linus Torvalds	e22a97a2a8	AFS Fixes Reviewed-by: Marc Dionne <marc.dionne@auristor.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1UELcACgkQ+7dXa6fL C2viWw//eDLvElBjaQsabfpMOVGkf02t+zCzNfKSC0KdM1+GUZ1FyXQ0UAtgwICY Sp01h/RZa0V07sfYP7R5kL4/KIMdODmhrP0iiHDpoMjKCL7qR9tFbJDAcHtH8xz2 52UV2dmdDBI/wdw/i5dn6M02SoYAQMl1XT49SkzhFSELVchkpraGsf1vf4yITeVe eI1TaOxI+TUaeH5f6+KWp6c8K8q70p3KfrR2VmCWkBrD7PNg9lp19pVnz8tdofYu xURHQbJulSqM+mY7pcNBOi2iWy3dCLjBTkVJIwIhZcZqLThACY38SSaPtmdhgif4 wcyyZUtd8EGPzPPqbfCx7ycTIIDtL/r98XtGyiTJBKrCK+flZONdu0g/oIzvJ/Wu hV4+ButxCuMakbLOe+Hew3lhHFOy7m9XZtOURzxzZSm9uazHDMxnw4ocxIOs24F1 qus1sG0+rlVDcMYjo2tKEAzOl/ZejJ/NUTd60ANIWKTHply2/2/5dH94B0yLwDnp tfifBrBkyqFB4XUKGvqvvJczl0d7+zsEScs4VQLVO/WhATjj6jNnrYKgwvBS5pCM 890qUzj3TRW7ciZLi0THMEHBlEfbEWhNCaggAqieIvbKv7t4Kh2cUBaIsxo4IYqU PBZZhFXRul5ocTJrV9pScl4RbzxE5V0j9cwSiiWnzZL1sQucIgQ= =zivP -----END PGP SIGNATURE----- Merge tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs fixes from David Howells: - Fix the CB.ProbeUuid handler to generate its reply correctly. - Fix a mix up in indices when parsing a Volume Location entry record. - Fix a potential NULL-pointer deref when cleaning up a read request. - Fix the expected data version of the destination directory in afs_rename(). - Fix afs_d_revalidate() to only update d_fsdata if it's not the same as the directory data version to reduce the likelihood of overwriting the result of a competing operation. (d_fsdata carries the directory DV or the least-significant word thereof). - Fix the tracking of the data-version on a directory and make sure that dentry objects get properly initialised, updated and revalidated. Also fix rename to update d_fsdata to match the new directory's DV if the dentry gets moved over and unhash the dentry to stop afs_d_revalidate() from interfering. * tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix missing dentry data version updating afs: Only update d_fsdata if different in afs_d_revalidate() afs: Fix off-by-one in afs_rename() expected data version calculation fs: afs: Fix a possible null-pointer dereference in afs_put_read() afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u() afs: Fix the CB.ProbeUuid service handler to reply correctly	2019-08-14 14:21:14 -07:00
Christian König	e1b4ce25db	drm/scheduler: use job count instead of peek The spsc_queue_peek function is accessing queue->head which belongs to the consumer thread and shouldn't be accessed by the producer This is fixing a rare race condition when destroying entities. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Monk.liu@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-14 15:45:53 -05:00
Vincent Chen	69703eb9a8	riscv: Make __fstate_clean() work correctly. Make the __fstate_clean() function correctly set the state of sstatus.FS in pt_regs to SR_FS_CLEAN. Fixes: 7db91e57a0acd ("RISC-V: Task implementation") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> [paul.walmsley@sifive.com: expanded "Fixes" commit ID] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2019-08-14 13:20:46 -07:00
Vincent Chen	8ac71d7e46	riscv: Correct the initialized flow of FP register The following two reasons cause FP registers are sometimes not initialized before starting the user program. 1. Currently, the FP context is initialized in flush_thread() function and we expect these initial values to be restored to FP register when doing FP context switch. However, the FP context switch only occurs in switch_to function. Hence, if this process does not be scheduled out and scheduled in before entering the user space, the FP registers have no chance to initialize. 2. In flush_thread(), the state of reg->sstatus.FS inherits from the parent. Hence, the state of reg->sstatus.FS may be dirty. If this process is scheduled out during flush_thread() and initializing the FP register, the fstate_save() in switch_to will corrupt the FP context which has been initialized until flush_thread(). To solve the 1st case, the initialization of the FP register will be completed in start_thread(). It makes sure all FP registers are initialized before starting the user program. For the 2nd case, the state of reg->sstatus.FS in start_thread will be set to SR_FS_OFF to prevent this process from corrupting FP context in doing context save. The FP state is set to SR_FS_INITIAL in start_trhead(). Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 7db91e57a0acd ("RISC-V: Task implementation") Cc: stable@vger.kernel.org [paul.walmsley@sifive.com: fixed brace alignment issue reported by checkpatch] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2019-08-14 13:11:11 -07:00
Linus Torvalds	a8dba0531b	Pull request for 5.3-rc3 - Fix a memory registration release flow issue that was causing a WARN_ON (mlx5) - If the counters for a port aren't allocated, then we can't do operations on the non-existent counters (core) - Check the right variable for error code result (mlx5) - Fix a use after free issue (mlx5) - Fix an off by one memory leak (siw) - Actually return an error code on error (core) - Allow siw to be built on 32bit arches (siw, ABI change, but OK since siw was just merged this merge window and there is no prior released kernel to maintain compatibility with and we also updated the rdma-core user space package to match) Signed-off-by: Doug Ledford <dledford@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEErmsb2hIrI7QmWxJ0uCajMw5XL90FAl1UH9kACgkQuCajMw5X L922QQ/+ON5Vhb3CZkv1K6mVk/+sXSBIkeceoBJCw46XjVkYQaiE46DyonLDOwco 4z6caV5HmS0CDY7VuoCuA3OmvsYEYWpLi0ktyRJIaRJtWnJmYmVLju8ORrD6s709 FBe7Ay9pE6VIXXbDz2np3aAZW1EL1dPr6fBccHZWvGjb6bwu+a2HbZlIdtKKBRgf r+bp9G3M5FKL0RTGSy+S+w/xO0Ntc0Nbo0RRj+/4sRdxjTdx+B1sLxPya5AgycF9 kQ/a+/mppmfmXe0/PzL30rvbmf29ocodYHokb+OTc1Mwll6yc9Yo3BOlvZmK+EYG yyYXK23MkJDoJ7qaSI7cbiEd5pY2EgSABBKPv5b5wqt03AM0qdRpEUdPSbBZF0tv Lt/i2pke13R+TW3u2e8sY8iHWHC8+GDOyWFiVmrpEcoP80hfRKDkiULv5vrvFzVP 3XOG1z5hHDmZ4jJtHCjCNJLi1+/AxhYIaPSRyJnL5R5cJGX/hXOSex+OsjbcAx7o djVTRbR1JOx603NX4sYgpLcn1TEPvaxKXcrqP8Nhj++xgZWNNfDw0RBk8jICYkOq k+tt70hq1ME0DvsJZiV2vyyVR/o5Amj7o7cdUtT3T2IDJAK1jbrNVD79VrXqJecq laOmge4M40pHPvFs/gtVuQsqsM7YHa1urX+vrFsG3i7QpMDekIo= =misR -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Doug Ledford: "Fairly small pull request for -rc3. I'm out of town the rest of this week, so I made sure to clean out as much as possible from patchworks in enough time for 0-day to chew through it (Yay! for 0-day being back online! :-)). Jason might send through any emergency stuff that could pop up, otherwise I'm back next week. The only real thing of note is the siw ABI change. Since we just merged siw this release, there are no prior kernel releases to maintain kernel ABI with. I told Bernard that if there is anything else about the siw ABI he thinks he might want to change before it goes set in stone, he should get it in ASAP. The siw module was around for several years outside the kernel tree, and it had to be revamped considerably for inclusion upstream, so we are making no attempts to be backward compatible with the out of tree version. Once 5.3 is actually released, we will have our baseline ABI to maintain. Summary: - Fix a memory registration release flow issue that was causing a WARN_ON (mlx5) - If the counters for a port aren't allocated, then we can't do operations on the non-existent counters (core) - Check the right variable for error code result (mlx5) - Fix a use after free issue (mlx5) - Fix an off by one memory leak (siw) - Actually return an error code on error (core) - Allow siw to be built on 32bit arches (siw, ABI change, but OK since siw was just merged this merge window and there is no prior released kernel to maintain compatibility with and we also updated the rdma-core user space package to match)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Change CQ flags from 64->32 bits RDMA/core: Fix error code in stat_get_doit_qp() RDMA/siw: Fix a memory leak in siw_init_cpulist() IB/mlx5: Fix use-after-free error while accessing ev_file pointer IB/mlx5: Check the correct variable in error handling code RDMA/counter: Prevent QP counter binding if counters unsupported IB/mlx5: Fix implicit MR release flow	2019-08-14 11:10:38 -07:00
Hui Peng	daac07156b	ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit The `uac_mixer_unit_descriptor` shown as below is read from the device side. In `parse_audio_mixer_unit`, `baSourceID` field is accessed from index 0 to `bNrInPins` - 1, the current implementation assumes that descriptor is always valid (the length of descriptor is no shorter than 5 + `bNrInPins`). If a descriptor read from the device side is invalid, it may trigger out-of-bound memory access. ``` struct uac_mixer_unit_descriptor { __u8 bLength; __u8 bDescriptorType; __u8 bDescriptorSubtype; __u8 bUnitID; __u8 bNrInPins; __u8 baSourceID[]; } ``` This patch fixes the bug by add a sanity check on the length of the descriptor. Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Cc: <stable@vger.kernel.org> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2019-08-14 20:05:56 +02:00
Linus Torvalds	e83b009c5c	dma-mapping fixes for 5.3-rc - fix the handling of the bus_dma_mask in dma_get_required_mask, which caused a regression in this merge window (Lucas Stach) - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me) - fix dma_mmap_coherent to not cause page attribute mismatches on coherent architectures like x86 (me) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl1UFhILHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOjexAAjPKLo4WGBGO1nd0btwXcI9A7jQTQlXrokmorDVzx 5++GmTUBeEgvUJath5D3qpQTRZXo9Wb9oGMdS5U6bWJB+SbWtErM304t905TJoDM Cs7xcB1ZQeG/5OrQ+qGPgQCo6WO1dOl9FpaIptjNm4dn+OYhyO/YA+dgrJDwgkiA 140RYUWa+Zhq3df4YqP4M4EnezLN1c4uE80wUxVQKDcq59sxCJek0QT0pUAMbdmQ /cUd2XSU113o1llmIRUh0Oj6VSEhWKHb+bdb8JfGndLzxvDcXZKl60tikWe6xpy2 Ue0kkHRk6OPVRIxWkRjt8D+mlrCyNqN6HWx6eBmVnRKHxZ4ia2hYOFuYN9FFLLK+ kCUlu5P/HUabBedKIxk4rbWITUqcRSviPD2WdnH2RWblvXNSDoSAufYuJ/9IGSoL P6a43DVKFesVF/MxeH9Ko8bnxMUO9Zn97GHcQIUplRwaqrnrCEPlvLVf/teswSQG C13rTnouZ0FA4z/uV96G6HfGIj87MLe/RovmLCMTeiSKrDpbcO7szP037Km73M+V UBmatoYCioVLxBjw3NkxCRc9UpDPdRUu31uVHrAarh4tutUASEWLrb6s9vFlGyED zis9IHWtIAYP3VfFtkXdZ7oDlqC/3KdEErHZuT+z4PK3Wj/QtQVfQ8SB79xFMneD V2E= =Jzmo -----END PGP SIGNATURE----- Merge tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix the handling of the bus_dma_mask in dma_get_required_mask, which caused a regression in this merge window (Lucas Stach) - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me) - fix dma_mmap_coherent to not cause page attribute mismatches on coherent architectures like x86 (me) * tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix page attributes for dma_mmap_* dma-direct: don't truncate dma_required_mask to bus addressing capabilities dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING	2019-08-14 10:31:11 -07:00
Linus Torvalds	b5e33e44d9	IOMMU Fixes for Linux v5.3-rc4 Including: - A couple more fixes for the Intel VT-d driver for bugs introduced during the recent conversion of this driver to use IOMMU core default domains. - Fix for common dma-iommu code to make sure MSI mappings happen in the correct domain for a device. - Fix a corner case in the handling of sg-lists in dma-iommu code that might cause dma_length to be truncated. - Mark a switch as fall-through in arm-smmu code. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl1UFQIACgkQK/BELZcB GuNW1A//Y86ZFRSVCA/+ZiHgADwqsof1/Cdc1Ou1tXMbINbyWvWyT5t8JtplYsEJ 17xlS7l2M9x1VCljzr3fTfBMGu8+CQY2KT6YJliLQZzrQ6LKoxmscCmg6DmH4Gjy CfoRLBXCKTm1F8aNt7f/XupuI+OGpq8h/VPDxYqZZIGKxsMfOH8ZIzF7DjDO2MxS NROjwAyVMZdzR5X/dM1dYK0zwxQvgRGEx8gdGssoyUCJvGdAyQXym30j8esNWJ6J okXVpuQoX/CJQLZP/xF8psWcL+0IJSyd3G90ToBRsoLDc50a4qTdelGvGkVHmU8L WVm+x7GjJrWZieqUtFnW/X7p4qSZdNMIK9c/+/cKg+BxyAKE9FqUJzg6UaSpzTbk XVh0jSiSq7/txU8pyGhEDQxgg4xbIUA5x1gqnqFm8k9Noz1/+AhfdyEUFzIHeE0s XwBfVVGzP2NW5zi97NebEuYsbHgDDSnR9sEKxhhq6G30vrwHEfg/MzdvNp6EupNp J1DnWD0DgMlYMxjZ8YskrSI7/MFB5PCxj/InwAXRZmlPPmlWIRTJfUtwYmhlkoLS zCxfS/sIof9C1pU7noe1WwOz8ylVPeQO3KvBIVhy3WJcVnCDlYX7/Uf/z/sU/d0Z Hd3/PQ6F6xTUEBzXKOFG/3y9EUQuoYP/fckFM4vmH9OEvYWmWqc= =+b4H -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - A couple more fixes for the Intel VT-d driver for bugs introduced during the recent conversion of this driver to use IOMMU core default domains. - Fix for common dma-iommu code to make sure MSI mappings happen in the correct domain for a device. - Fix a corner case in the handling of sg-lists in dma-iommu code that might cause dma_length to be truncated. - Mark a switch as fall-through in arm-smmu code. * tag 'iommu-fixes-v5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix possible use-after-free of private domain iommu/vt-d: Detach domain before using a private one iommu/dma: Handle SG length overflow better iommu/vt-d: Correctly check format of page table in debugfs iommu/vt-d: Detach domain when move device out of group iommu/arm-smmu: Mark expected switch fall-through iommu/dma: Handle MSI mappings separately	2019-08-14 10:16:59 -07:00
Linus Torvalds	cab6d5b66b	Merge branch 'akpm' (patches from Andrew) Merge misc VM fixes from Andrew Morton: "A bunch of hotfixes, all affecting mm/. The two-patch series from Andrea may be controversial. This restores patches which were reverted in Dec 2018 due to a regression report []. After extensive discussion it is evident that the problems which these patches solved were significantly more serious than the problems they introduced. I am told that major distros are already carrying these two patches for this reason" [] See https://lore.kernel.org/lkml/alpine.DEB.2.21.1812061343240.144733@chino.kir.corp.google.com/ https://lore.kernel.org/lkml/alpine.DEB.2.21.1812031545560.161134@chino.kir.corp.google.com/ for the google-specific issues brought up by David Rijentes. And as Andrew says: "I'm unaware of anyone else who will be adversely affected by this, and google already carries over a thousand kernel patches - another won't kill them. There has been sporadic discussion about fixing these things for real but it's clear that nobody apart from David is particularly motivated" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS mm, vmscan: do not special-case slab reclaim when watermarks are boosted Revert "mm, thp: restore node-local hugepage allocations" Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used seq_file: fix problem when seeking mid-record mm: workingset: fix vmstat counters for shadow nodes mm/usercopy: use memory range to be accessed for wraparound check mm: kmemleak: disable early logging in case of error mm/vmalloc.c: fix percpu free VM area search criteria mm/memcontrol.c: fix use after free in mem_cgroup_iter() mm/z3fold.c: fix z3fold_destroy_pool() race condition mm/z3fold.c: fix z3fold_destroy_pool() ordering mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified mm/hmm: fix bad subpage pointer in try_to_unmap_one mm/hmm: fix ZONE_DEVICE anon page mapping reuse mm: document zone device struct page field usage	2019-08-14 09:53:46 -07:00
Hui Wang	871b906602	ALSA: hda - Add a generic reboot_notify Make codec enter D3 before rebooting or poweroff can fix the noise issue on some laptops. And in theory it is harmless for all codecs to enter D3 before rebooting or poweroff, let us add a generic reboot_notify, then realtek and conexant drivers can call this function. Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2019-08-14 08:38:23 +02:00
Hui Wang	401714d953	ALSA: hda - Let all conexant codec enter D3 when rebooting We have 3 new lenovo laptops which have conexant codec 0x14f11f86, these 3 laptops also have the noise issue when rebooting, after letting the codec enter D3 before rebooting or poweroff, the noise disappers. Instead of adding a new ID again in the reboot_notify(), let us make this function apply to all conexant codec. In theory make codec enter D3 before rebooting or poweroff is harmless, and I tested this change on a couple of other Lenovo laptops which have different conexant codecs, there is no side effect so far. Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2019-08-14 08:38:16 +02:00
Alistair Francis	d568cb3f93	riscv: defconfig: Update the defconfig Update the defconfig: - Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable VirtIORNG when running on QEMU Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2019-08-13 19:26:42 -07:00
Alistair Francis	500bc2c1f4	riscv: rv32_defconfig: Update the defconfig Update the rv32_defconfig: - Add 'CONFIG_DEVTMPFS_MOUNT=y' to match the RISC-V defconfig - Add CONFIG_HW_RANDOM=y and CONFIG_HW_RANDOM_VIRTIO=y to enable VirtIORNG when running on QEMU Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2019-08-13 19:26:38 -07:00
Mike Kravetz	4643d67e8c	hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in the kernel-v5.2.3 testing. This is caused by a race between hugetlb page migration and page fault. If a hugetlb page can not be allocated to satisfy a page fault, the task is sent SIGBUS. This is normal hugetlbfs behavior. A hugetlb fault mutex exists to prevent two tasks from trying to instantiate the same page. This protects against the situation where there is only one hugetlb page, and both tasks would try to allocate. Without the mutex, one would fail and SIGBUS even though the other fault would be successful. There is a similar race between hugetlb page migration and fault. Migration code will allocate a page for the target of the migration. It will then unmap the original page from all page tables. It does this unmap by first clearing the pte and then writing a migration entry. The page table lock is held for the duration of this clear and write operation. However, the beginnings of the hugetlb page fault code optimistically checks the pte without taking the page table lock. If clear (as it can be during the migration unmap operation), a hugetlb page allocation is attempted to satisfy the fault. Note that the page which will eventually satisfy this fault was already allocated by the migration code. However, the allocation within the fault path could fail which would result in the task incorrectly being sent SIGBUS. Ideally, we could take the hugetlb fault mutex in the migration code when modifying the page tables. However, locks must be taken in the order of hugetlb fault mutex, page lock, page table lock. This would require significant rework of the migration code. Instead, the issue is addressed in the hugetlb fault code. After failing to allocate a huge page, take the page table lock and check for huge_pte_none before returning an error. This is the same check that must be made further in the code even if page allocation is successful. Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Cyril Hrubis <chrubis@suse.cz> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:53 -07:00
Mel Gorman	28360f3987	mm, vmscan: do not special-case slab reclaim when watermarks are boosted Dave Chinner reported a problem pointing a finger at commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs"). The report is extensive: https://lore.kernel.org/linux-mm/20190807091858.2857-1-david@fromorbit.com/ and it's worth recording the most relevant parts (colorful language and typos included). When running a simple, steady state 4kB file creation test to simulate extracting tarballs larger than memory full of small files into the filesystem, I noticed that once memory fills up the cache balance goes to hell. The workload is creating one dirty cached inode for every dirty page, both of which should require a single IO each to clean and reclaim, and creation of inodes is throttled by the rate at which dirty writeback runs at (via balance dirty pages). Hence the ingest rate of new cached inodes and page cache pages is identical and steady. As a result, memory reclaim should quickly find a steady balance between page cache and inode caches. The moment memory fills, the page cache is reclaimed at a much faster rate than the inode cache, and evidence suggests that the inode cache shrinker is not being called when large batches of pages are being reclaimed. In roughly the same time period that it takes to fill memory with 50% pages and 50% slab caches, memory reclaim reduces the page cache down to just dirty pages and slab caches fill the entirety of memory. The LRU is largely full of dirty pages, and we're getting spikes of random writeback from memory reclaim so it's all going to shit. Behaviour never recovers, the page cache remains pinned at just dirty pages, and nothing I could tune would make any difference. vfs_cache_pressure makes no difference - I would set it so high it should trim the entire inode caches in a single pass, yet it didn't do anything. It was clear from tracing and live telemetry that the shrinkers were pretty much not running except when there was absolutely no memory free at all, and then they did the minimum necessary to free memory to make progress. So I went looking at the code, trying to find places where pages got reclaimed and the shrinkers weren't called. There's only one - kswapd doing boosted reclaim as per commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs"). The watermark boosting introduced by the commit is triggered in response to an allocation "fragmentation event". The boosting was not intended to target THP specifically and triggers even if THP is disabled. However, with Dave's perfectly reasonable workload, fragmentation events can be very common given the ratio of slab to page cache allocations so boosting remains active for long periods of time. As high-order allocations might use compaction and compaction cannot move slab pages the decision was made in the commit to special-case kswapd when watermarks are boosted -- kswapd avoids reclaiming slab as reclaiming slab does not directly help compaction. As Dave notes, this decision means that slab can be artificially protected for long periods of time and messes up the balance with slab and page caches. Removing the special casing can still indirectly help avoid fragmentation by avoiding fragmentation-causing events due to slab allocation as pages from a slab pageblock will have some slab objects freed. Furthermore, with the special casing, reclaim behaviour is unpredictable as kswapd sometimes examines slab and sometimes does not in a manner that is tricky to tune or analyse. This patch removes the special casing. The downside is that this is not a universal performance win. Some benchmarks that depend on the residency of data when rereading metadata may see a regression when slab reclaim is restored to its original behaviour. Similarly, some benchmarks that only read-once or write-once may perform better when page reclaim is too aggressive. The primary upside is that slab shrinker is less surprising (arguably more sane but that's a matter of opinion), behaves consistently regardless of the fragmentation state of the system and properly obeys VM sysctls. A fsmark benchmark configuration was constructed similar to what Dave reported and is codified by the mmtest configuration config-io-fsmark-small-file-stream. It was evaluated on a 1-socket machine to avoid dealing with NUMA-related issues and the timing of reclaim. The storage was an SSD Samsung Evo and a fresh trimmed XFS filesystem was used for the test data. This is not an exact replication of Dave's setup. The configuration scales its parameters depending on the memory size of the SUT to behave similarly across machines. The parameters mean the first sample reported by fs_mark is using 50% of RAM which will barely be throttled and look like a big outlier. Dave used fake NUMA to have multiple kswapd instances which I didn't replicate. Finally, the number of iterations differ from Dave's test as the target disk was not large enough. While not identical, it should be representative. fsmark 5.3.0-rc3 5.3.0-rc3 vanilla shrinker-v1r1 Min 1-files/sec 4444.80 ( 0.00%) 4765.60 ( 7.22%) 1st-qrtle 1-files/sec 5005.10 ( 0.00%) 5091.70 ( 1.73%) 2nd-qrtle 1-files/sec 4917.80 ( 0.00%) 4855.60 ( -1.26%) 3rd-qrtle 1-files/sec 4667.40 ( 0.00%) 4831.20 ( 3.51%) Max-1 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-5 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-10 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Max-90 1-files/sec 4649.60 ( 0.00%) 4780.70 ( 2.82%) Max-95 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%) Max-99 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%) Max 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%) Hmean 1-files/sec 5004.75 ( 0.00%) 5075.96 ( 1.42%) Stddev 1-files/sec 1778.70 ( 0.00%) 1369.66 ( 23.00%) CoeffVar 1-files/sec 33.70 ( 0.00%) 26.05 ( 22.71%) BHmean-99 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%) BHmean-95 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%) BHmean-90 1-files/sec 5107.05 ( 0.00%) 5131.41 ( 0.48%) BHmean-75 1-files/sec 5208.45 ( 0.00%) 5206.68 ( -0.03%) BHmean-50 1-files/sec 5405.53 ( 0.00%) 5381.62 ( -0.44%) BHmean-25 1-files/sec 6179.75 ( 0.00%) 6095.14 ( -1.37%) 5.3.0-rc3 5.3.0-rc3 vanillashrinker-v1r1 Duration User 501.82 497.29 Duration System 4401.44 4424.08 Duration Elapsed 8124.76 8358.05 This is showing a slight skew for the max result representing a large outlier for the 1st, 2nd and 3rd quartile are similar indicating that the bulk of the results show little difference. Note that an earlier version of the fsmark configuration showed a regression but that included more samples taken while memory was still filling. Note that the elapsed time is higher. Part of this is that the configuration included time to delete all the test files when the test completes -- the test automation handles the possibility of testing fsmark with multiple thread counts. Without the patch, many of these objects would be memory resident which is part of what the patch is addressing. There are other important observations that justify the patch. 1. With the vanilla kernel, the number of dirty pages in the system is very low for much of the test. With this patch, dirty pages is generally kept at 10% which matches vm.dirty_background_ratio which is normal expected historical behaviour. 2. With the vanilla kernel, the ratio of Slab/Pagecache is close to 0.95 for much of the test i.e. Slab is being left alone and dominating memory consumption. With the patch applied, the ratio varies between 0.35 and 0.45 with the bulk of the measured ratios roughly half way between those values. This is a different balance to what Dave reported but it was at least consistent. 3. Slabs are scanned throughout the entire test with the patch applied. The vanille kernel has periods with no scan activity and then relatively massive spikes. 4. Without the patch, kswapd scan rates are very variable. With the patch, the scan rates remain quite steady. 4. Overall vmstats are closer to normal expectations 5.3.0-rc3 5.3.0-rc3 vanilla shrinker-v1r1 Ops Direct pages scanned 99388.00 328410.00 Ops Kswapd pages scanned 45382917.00 33451026.00 Ops Kswapd pages reclaimed 30869570.00 25239655.00 Ops Direct pages reclaimed 74131.00 5830.00 Ops Kswapd efficiency % 68.02 75.45 Ops Kswapd velocity 5585.75 4002.25 Ops Page reclaim immediate 1179721.00 430927.00 Ops Slabs scanned 62367361.00 73581394.00 Ops Direct inode steals 2103.00 1002.00 Ops Kswapd inode steals 570180.00 5183206.00 o Vanilla kernel is hitting direct reclaim more frequently, not very much in absolute terms but the fact the patch reduces it is interesting o "Page reclaim immediate" in the vanilla kernel indicates dirty pages are being encountered at the tail of the LRU. This is generally bad and means in this case that the LRU is not long enough for dirty pages to be cleaned by the background flush in time. This is much reduced by the patch. o With the patch, kswapd is reclaiming 10 times more slab pages than with the vanilla kernel. This is indicative of the watermark boosting over-protecting slab A more complete set of tests were run that were part of the basis for introducing boosting and while there are some differences, they are well within tolerances. Bottom line, the special casing kswapd to avoid slab behaviour is unpredictable and can lead to abnormal results for normal workloads. This patch restores the expected behaviour that slab and page cache is balanced consistently for a workload with a steady allocation ratio of slab/pagecache pages. It also means that if there are workloads that favour the preservation of slab over pagecache that it can be tuned via vm.vfs_cache_pressure where as the vanilla kernel effectively ignores the parameter when boosting is active. Link: http://lkml.kernel.org/r/20190808182946.GM2739@techsingularity.net Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Dave Chinner <dchinner@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:53 -07:00
Andrea Arcangeli	a8282608c8	Revert "mm, thp: restore node-local hugepage allocations" This reverts commit 2f0799a0ffc033b ("mm, thp: restore node-local hugepage allocations"). commit 2f0799a0ffc033b was rightfully applied to avoid the risk of a severe regression that was reported by the kernel test robot at the end of the merge window. Now we understood the regression was a false positive and was caused by a significant increase in fairness during a swap trashing benchmark. So it's safe to re-apply the fix and continue improving the code from there. The benchmark that reported the regression is very useful, but it provides a meaningful result only when there is no significant alteration in fairness during the workload. The removal of __GFP_THISNODE increased fairness. __GFP_THISNODE cannot be used in the generic page faults path for new memory allocations under the MPOL_DEFAULT mempolicy, or the allocation behavior significantly deviates from what the MPOL_DEFAULT semantics are supposed to be for THP and 4k allocations alike. Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag set to "madvise") has never meant to provide an implicit MPOL_BIND on the "current" node the task is running on, causing swap storms and providing a much more aggressive behavior than even zone_reclaim_node = 3. Any workload who could have benefited from __GFP_THISNODE has now to enable zone_reclaim_mode=1\|\|2\|\|3. __GFP_THISNODE implicitly provided the zone_reclaim_mode behavior, but it only did so if THP was enabled: if THP was disabled, there would have been no chance to get any 4k page from the current node if the current node was full of pagecache, which further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE. MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode semantics, in fact the two are orthogonal, zone_reclaim_mode = 1\|2\|3 must work exactly the same with MADV_HUGEPAGE set or not. The performance characteristic of memory depends on the hardware details. The numbers below are obtained on Naples/EPYC architecture and the N/A projection extends them to show what we should aim for in the future as a good THP NUMA locality default. The benchmark used exercises random memory seeks (note: the cost of the page faults is not part of the measurement). D0 THP \| D0 4k \| D1 THP \| D1 4k \| D2 THP \| D2 4k \| D3 THP \| D3 4k \| ... 0% \| +43% \| +45% \| +106% \| +131% \| +224% \| N/A \| N/A D0 means distance zero (i.e. local memory), D1 means distance one (i.e. intra socket memory), D2 means distance two (i.e. inter socket memory), etc... For the guest physical memory allocated by qemu and for guest mode kernel the performance characteristic of RAM is more complex and an ideal default could be: D0 THP \| D1 THP \| D0 4k \| D2 THP \| D1 4k \| D3 THP \| D2 4k \| D3 4k \| ... 0% \| +58% \| +101% \| N/A \| +222% \| N/A \| N/A \| N/A NOTE: the N/A are projections and haven't been measured yet, the measurement in this case is done on a 1950x with only two NUMA nodes. The THP case here means THP was used both in the host and in the guest. After applying this commit the THP NUMA locality order that we'll get out of MADV_HUGEPAGE is this: D0 THP \| D1 THP \| D2 THP \| D3 THP \| ... \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Before this commit it was: D0 THP \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Even if we ignore the breakage of large workloads that can't fit in a single node that the __GFP_THISNODE implicit "current node" mbind caused, the THP NUMA locality order provided by __GFP_THISNODE was still not the one we shall aim for in the long term (i.e. the first one at the top). After this commit is applied, we can introduce a new allocator multi order API and to replace those two alloc_pages_vmas calls in the page fault path, with a single multi order call: unsigned int order = (1 << HPAGE_PMD_ORDER) \| (1 << 0); page = alloc_pages_multi_order(..., &order); if (!page) goto out; if (!(order & (1 << 0))) { VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER); /* THP fault / } else { VM_WARN_ON(order != 1 << 0); / 4k fallback */ } The page allocator logic has to be altered so that when it fails on any zone with order 9, it has to try again with a order 0 before falling back to the next zone in the zonelist. After that we need to do more measurements and evaluate if adding an opt-in feature for guest mode is worth it, to swap "DN 4k \| DN+1 THP" with "DN+1 THP \| DN 4k" at every NUMA distance crossing. Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Andrea Arcangeli	92717d429b	Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings". The fixes for what was originally reported as "pathological THP behavior" we rightfully reverted to be sure not to introduced regressions at end of a merge window after a severe regression report from the kernel bot. We can safely re-apply them now that we had time to analyze the problem. The mm process worked fine, because the good fixes were eventually committed upstream without excessive delay. The regression reported by the kernel bot however forced us to revert the good fixes to be sure not to introduce regressions and to give us the time to analyze the issue further. The silver lining is that this extra time allowed to think more at this issue and also plan for a future direction to improve things further in terms of THP NUMA locality. This patch (of 2): This reverts commit 356ff8a9a78fb35d ("Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). So it reapplies 89c83fb539f954 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). Consolidation of the THP allocation flags at the same place was meant to be a clean up to easier handle otherwise scattered code which is imposing a maintenance burden. There were no real problems observed with the gfp mask consolidation but the reversion was rushed through without a larger consensus regardless. This patch brings the consolidation back because this should make the long term maintainability easier as well as it should allow future changes to be less error prone. [mhocko@kernel.org: changelog additions] Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Qian Cai	0cfaee2af3	include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used A compiler throws a warning on an arm64 system since commit 9849a5697d3d ("arch, mm: convert all architectures to use 5level-fixup.h"), mm/kasan/init.c: In function 'kasan_free_p4d': mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable] p4d_t *p4d; ^~~ because p4d_none() in "5level-fixup.h" is compiled away while it is a static inline function in "pgtable-nopud.h". However, if converted p4d_none() to a static inline there, powerpc would be unhappy as it reads those in assembler language in "arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip assembly include for the static inline C function. While at it, converted a few similar functions to be consistent with the ones in "pgtable-nopud.h". Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
NeilBrown	6a2aeab59e	seq_file: fix problem when seeking mid-record If you use lseek or similar (e.g. pread) to access a location in a seq_file file that is within a record, rather than at a record boundary, then the first read will return the remainder of the record, and the second read will return the whole of that same record (instead of the next record). When seeking to a record boundary, the next record is correctly returned. This bug was introduced by a recent patch (identified below). Before that patch, seq_read() would increment m->index when the last of the buffer was returned (m->count == 0). After that patch, we rely on ->next to increment m->index after filling the buffer - but there was one place where that didn't happen. Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/ Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Sergei Turchanov <turchanov@farpost.com> Tested-by: Sergei Turchanov <turchanov@farpost.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Roman Gushchin	ec9f02384f	mm: workingset: fix vmstat counters for shadow nodes Memcg counters for shadow nodes are broken because the memcg pointer is obtained in a wrong way. The following approach is used: virt_to_page(xa_node)->mem_cgroup Since commit 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't set for slab pages, so memcg_from_slab_page() should be used instead. Also I doubt that it ever worked correctly: virt_to_head_page() should be used instead of virt_to_page(). Otherwise objects residing on tail pages are not accounted, because only the head page contains a valid mem_cgroup pointer. That was a case since the introduction of these counters by the commit 68d48e6a2df5 ("mm: workingset: add vmstat counter for shadow nodes"). Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Isaac J. Manjarres	951531691c	mm/usercopy: use memory range to be accessed for wraparound check Currently, when checking to see if accessing n bytes starting at address "ptr" will cause a wraparound in the memory addresses, the check in check_bogus_address() adds an extra byte, which is incorrect, as the range of addresses that will be accessed is [ptr, ptr + (n - 1)]. This can lead to incorrectly detecting a wraparound in the memory address, when trying to read 4 KB from memory that is mapped to the the last possible page in the virtual address space, when in fact, accessing that range of memory would not cause a wraparound to occur. Use the memory range that will actually be accessed when considering if accessing a certain amount of bytes will cause the memory address to wrap around. Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org Fixes: f5509cc18daa ("mm: Hardened usercopy") Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Trilok Soni <tsoni@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Catalin Marinas	fcf3a5b62f	mm: kmemleak: disable early logging in case of error If an error occurs during kmemleak_init() (e.g. kmem cache cannot be created), kmemleak is disabled but kmemleak_early_log remains enabled. Subsequently, when the .init.text section is freed, the log_early() function no longer exists. To avoid a page fault in such scenario, ensure that kmemleak_disable() also disables early logging. Link: http://lkml.kernel.org/r/20190731152302.42073-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Kuppuswamy Sathyanarayanan	5336e52c9e	mm/vmalloc.c: fix percpu free VM area search criteria Recent changes to the vmalloc code by commit 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") can cause spurious percpu allocation failures. These, in turn, can result in panic()s in the slub code. One such possible panic was reported by Dave Hansen in following link https://lkml.org/lkml/2019/6/19/939. Another related panic observed is, RIP: 0033:0x7f46f7441b9b Call Trace: dump_stack+0x61/0x80 pcpu_alloc.cold.30+0x22/0x4f mem_cgroup_css_alloc+0x110/0x650 cgroup_apply_control_enable+0x133/0x330 cgroup_mkdir+0x41b/0x500 kernfs_iop_mkdir+0x5a/0x90 vfs_mkdir+0x102/0x1b0 do_mkdirat+0x7d/0xf0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START to VMALLOC_END) into multiple VM areas (struct vm_areas), and it mainly uses two lists (vmap_area_list & free_vmap_area_list) to track the used and free VM areas in VMALLOC space. And pcpu_get_vm_areas(offsets[], sizes[], nr_vms, align) function is used for allocating congruent VM areas for percpu memory allocator. In order to not conflict with VMALLOC users, pcpu_get_vm_areas allocates VM areas near the end of the VMALLOC space. So the search for free vm_area for the given requirement starts near VMALLOC_END and moves upwards towards VMALLOC_START. Prior to commit 68ad4a330433, the search for free vm_area in pcpu_get_vm_areas() involves following two main steps. Step 1: Find a aligned "base" adress near VMALLOC_END. va = free vm area near VMALLOC_END Step 2: Loop through number of requested vm_areas and check, Step 2.1: if (base < VMALLOC_START) 1. fail with error Step 2.2: // end is offsets[area] + sizes[area] if (base + end > va->vm_end) 1. Move the base downwards and repeat Step 2 Step 2.3: if (base + start < va->vm_start) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 But Commit 68ad4a330433 removed Step 2.2 and modified Step 2.3 as below: Step 2.3: if (base + start < va->vm_start \|\| base + end > va->vm_end) 1. Move to previous free vm_area node, find aligned base address and repeat Step 2 Above change is the root cause of spurious percpu memory allocation failures. For example, consider a case where a relatively large vm_area (~ 30 TB) was ignored in free vm_area search because it did not pass the base + end < vm->vm_end boundary check. Ignoring such large free vm_area's would lead to not finding free vm_area within boundary of VMALLOC_start to VMALLOC_END which in turn leads to allocation failures. So modify the search algorithm to include Step 2.2. Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com Fixes: 68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap allocation") Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Miles Chen	54a83d6bcb	mm/memcontrol.c: fix use after free in mem_cgroup_iter() This patch is sent to report an use after free in mem_cgroup_iter() after merging commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()"). I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged to the trees. However, I can still observe use after free issues addressed in the commit be2657752e9e. (on low-end devices, a few times this month) backtrace: css_tryget <- crash here mem_cgroup_iter shrink_node shrink_zones do_try_to_free_pages try_to_free_pages __perform_reclaim __alloc_pages_direct_reclaim __alloc_pages_slowpath __alloc_pages_nodemask To debug, I poisoned mem_cgroup before freeing it: static void __mem_cgroup_free(struct mem_cgroup memcg) for_each_node(node) free_mem_cgroup_per_node_info(memcg, node); free_percpu(memcg->stat); + / poison memcg before freeing it / + memset(memcg, 0x78, sizeof(struct mem_cgroup)); kfree(memcg); } The coredump shows the position=0xdbbc2a00 is freed. (gdb) p/x ((struct mem_cgroup_per_node )0xe5009e00)->iter[8] $13 = {position = 0xdbbc2a00, generation = 0x2efd} 0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100 0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000 0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000 0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878 0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0 0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000 0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000 0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000 0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000 0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000 0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878 0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878 0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878 0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878 In the reclaim path, try_to_free_pages() does not setup sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ..., shrink_node(). In mem_cgroup_iter(), root is set to root_mem_cgroup because sc->target_mem_cgroup is NULL. It is possible to assign a memcg to root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter(). try_to_free_pages struct scan_control sc = {...}, target_mem_cgroup is 0x0; do_try_to_free_pages shrink_zones shrink_node mem_cgroup root = sc->target_mem_cgroup; memcg = mem_cgroup_iter(root, NULL, &reclaim); mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); My device uses memcg non-hierarchical mode. When we release a memcg: invalidate_reclaim_iterators() reaches only dead_memcg and its parents. If non-hierarchical mode is used, invalidate_reclaim_iterators() never reaches root_mem_cgroup. static void invalidate_reclaim_iterators(struct mem_cgroup dead_memcg) { struct mem_cgroup *memcg = dead_memcg; for (; memcg; memcg = parent_mem_cgroup(memcg) ... } So the use after free scenario looks like: CPU1 CPU2 try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... css = css_next_descendant_pre(css, &root->css); memcg = mem_cgroup_from_css(css); cmpxchg(&iter->position, pos, memcg); invalidate_reclaim_iterators(memcg); ... __mem_cgroup_free() kfree(memcg); try_to_free_pages do_try_to_free_pages shrink_zones shrink_node mem_cgroup_iter() if (!root) root = root_mem_cgroup; ... mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id); iter = &mz->iter[reclaim->priority]; pos = READ_ONCE(iter->position); css_tryget(&pos->css) <- use after free To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in invalidate_reclaim_iterators(). [cai@lca.pw: fix -Wparentheses compilation warning] Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting") Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Henry Burns	b997052bc3	mm/z3fold.c: fix z3fold_destroy_pool() race condition The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. Calling z3fold_deregister_migration() before the workqueues are drained means that there can be allocated pages referencing a freed inode, causing any thread in compaction to be able to trip over the bad pointer in PageMovable(). Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Henry Burns	6051d3bd3b	mm/z3fold.c: fix z3fold_destroy_pool() ordering The constraint from the zpool use of z3fold_destroy_pool() is there are no outstanding handles to memory (so no active allocations), but it is possible for there to be outstanding work on either of the two wqs in the pool. If there is work queued on pool->compact_workqueue when it is called, z3fold_destroy_pool() will do: z3fold_destroy_pool() destroy_workqueue(pool->release_wq) destroy_workqueue(pool->compact_wq) drain_workqueue(pool->compact_wq) do_compact_page(zhdr) kref_put(&zhdr->refcount) __release_z3fold_page(zhdr, ...) queue_work_on(pool->release_wq, &pool->work) BOOM So compact_wq needs to be destroyed before release_wq. Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jonathan Adams <jwadams@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Yang Shi	a53190a4aa	mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind When running syzkaller internally, we ran into the below bug on 4.9.x kernel: kernel BUG at mm/huge_memory.c:2124! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 task: ffff880067b34900 task.stack: ffff880068998000 RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124 Call Trace: split_huge_page include/linux/huge_mm.h:100 [inline] queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538 walk_pmd_range mm/pagewalk.c:50 [inline] walk_pud_range mm/pagewalk.c:90 [inline] walk_pgd_range mm/pagewalk.c:116 [inline] __walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208 walk_page_range+0x154/0x370 mm/pagewalk.c:285 queue_pages_range+0x115/0x150 mm/mempolicy.c:694 do_mbind mm/mempolicy.c:1241 [inline] SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370 SyS_mbind+0x46/0x60 mm/mempolicy.c:1352 do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282 entry_SYSCALL_64_after_swapgs+0x5d/0xdb Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124 RSP <ffff88006899f980> with the below test: uint64_t r[1] = {0xffffffffffffffff}; int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); intptr_t res = 0; res = syscall(__NR_socket, 0x11, 3, 0x300); if (res != -1) r[0] = res; (uint32_t)0x20000040 = 0x10000; (uint32_t)0x20000044 = 1; (uint32_t)0x20000048 = 0xc520; (uint32_t)0x2000004c = 1; syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10); syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0); (uint64_t)0x20000340 = 2; syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3); return 0; } Actually the test does: mmap(0x20000000, 16777216, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_FIXED\|MAP_ANONYMOUS, -1, 0) = 0x20000000 socket(AF_PACKET, SOCK_RAW, 768) = 3 setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0 mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED\|MAP_FIXED\|MAP_POPULATE\|MAP_DENYWRITE, 3, 0) = 0x20fed000 mbind(..., MPOL_MF_STRICT\|MPOL_MF_MOVE) = 0 The setsockopt() would allocate compound pages (16 pages in this test) for packet tx ring, then the mmap() would call packet_mmap() to map the pages into the user address space specified by the mmap() call. When calling mbind(), it would scan the vma to queue the pages for migration to the new node. It would split any huge page since 4.9 doesn't support THP migration, however, the packet tx ring compound pages are not THP and even not movable. So, the above bug is triggered. However, the later kernel is not hit by this issue due to commit d44d363f6578 ("mm: don't assume anonymous pages have SwapBacked flag"), which just removes the PageSwapBacked check for a different reason. But, there is a deeper issue. According to the semantic of mbind(), it should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and MPOL_MF_STRICT was also specified, but the kernel was unable to move all existing pages in the range. The tx ring of the packet socket is definitely not movable, however, mbind() returns success for this case. Although the most socket file associates with non-movable pages, but XDP may have movable pages from gup. So, it sounds not fine to just check the underlying file type of vma in vma_migratable(). Change migrate_page_add() to check if the page is movable or not, if it is unmovable, just return -EIO. But do not abort pte walk immediately, since there may be pages off LRU temporarily. We should migrate other pages if MPOL_MF_MOVE* is specified. Set has_unmovable flag if some paged could not be not moved, then return -EIO for mbind() eventually. With this change the above test would return -EIO as expected. [yang.shi@linux.alibaba.com: fix review comments from Vlastimil] Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Yang Shi	d883544515	mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should try best to migrate misplaced pages, if some of the pages could not be migrated, then return -EIO. There are three different sub-cases: 1. vma is not migratable 2. vma is migratable, but there are unmovable pages 3. vma is migratable, pages are movable, but migrate_pages() fails If #1 happens, kernel would just abort immediately, then return -EIO, after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified"). If #3 happens, kernel would set policy and migrate pages with best-effort, but won't rollback the migrated pages and reset the policy back. Before that commit, they behaves in the same way. It'd better to keep their behavior consistent. But, rolling back the migrated pages and resetting the policy back sounds not feasible, so just make #1 behave as same as #3. Userspace will know that not everything was successfully migrated (via -EIO), and can take whatever steps it deems necessary - attempt rollback, determine which exact page(s) are violating the policy, etc. Make queue_pages_range() return 1 to indicate there are unmovable pages or vma is not migratable. The #2 is not handled correctly in the current kernel, the following patch will fix it. [yang.shi@linux.alibaba.com: fix review comments from Vlastimil] Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Ralph Campbell	1de13ee592	mm/hmm: fix bad subpage pointer in try_to_unmap_one When migrating an anonymous private page to a ZONE_DEVICE private page, the source page->mapping and page->index fields are copied to the destination ZONE_DEVICE struct page and the page_mapcount() is increased. This is so rmap_walk() can be used to unmap and migrate the page back to system memory. However, try_to_unmap_one() computes the subpage pointer from a swap pte which computes an invalid page pointer and a kernel panic results such as: BUG: unable to handle page fault for address: ffffea1fffffffc8 Currently, only single pages can be migrated to device private memory so no subpage computation is needed and it can be set to "page". [rcampbell@nvidia.com: add comment] Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00

1 2 3 4 5 ...

856680 Commits