linux

iv/linux

Author	SHA1	Message	Date
Sean Christopherson	e528acd31a	KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream. Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl Reported-by: Adam Borowski <kilobyte@angband.pl> Analyzed-by: David Hildenbrand <david@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [sean: backport to 4.x; resolve conflict in mmu.c] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>	2019-11-28 18:29:02 +01:00
Tomas Bortoli	368bc43ff0	Bluetooth: Fix invalid-free in bcsp_close() commit cf94da6f502d8caecabd56b194541c873c8a7a3c upstream. Syzbot reported an invalid-free that I introduced fixing a memleak. bcsp_recv() also frees bcsp->rx_skb but never nullifies its value. Nullify bcsp->rx_skb every time it is freed. Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+a0d209a4676664613e76@syzkaller.appspotmail.com Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-28 18:29:02 +01:00
zhong jiang	4a34dcf258	mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock [ Upstream commit d2ab99403ee00d8014e651728a4702ea1ae5e52c ] When adding the memory by probing memory block in sysfs interface, there is an obvious issue that we will unlock the device_hotplug_lock when fails to takes it. That issue was introduced in Commit 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") We should drop out in time when fails to take the device_hotplug_lock. Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") Reported-by: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:02 +01:00
Vignesh R	cb92a5c2cd	spi: omap2-mcspi: Fix DMA and FIFO event trigger size mismatch [ Upstream commit baf8b9f8d260c55a86405f70a384c29cda888476 ] Commit b682cffa3ac6 ("spi: omap2-mcspi: Set FIFO DMA trigger level to word length") broke SPI transfers where bits_per_word != 8. This is because of mimsatch between McSPI FIFO level event trigger size (SPI word length) and DMA request size(word length * maxburst). This leads to data corruption, lockup and errors like: spi1.0: EOW timed out Fix this by setting DMA maxburst size to 1 so that McSPI FIFO level event trigger size matches DMA request size. Fixes: b682cffa3ac6 ("spi: omap2-mcspi: Set FIFO DMA trigger level to word length") Cc: stable@vger.kernel.org Reported-by: David Lechner <david@lechnology.com> Tested-by: David Lechner <david@lechnology.com> Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:02 +01:00
Kishon Vijay Abraham I	61022359b3	PCI: keystone: Use quirk to limit MRRS for K2G [ Upstream commit 148e340c0696369fadbbddc8f4bef801ed247d71 ] PCI controller in K2G also has a limitation that memory read request size (MRRS) must not exceed 256 bytes. Use the quirk to limit MRRS (added for K2HK, K2L and K2E) for K2G as well. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
Nathan Chancellor	4b6fa999b0	pinctrl: zynq: Use define directive for PIN_CONFIG_IO_STANDARD [ Upstream commit cd8a145a066a1a3beb0ae615c7cb2ee4217418d7 ] Clang warns when one enumerated type is implicitly converted to another: drivers/pinctrl/pinctrl-zynq.c:985:18: warning: implicit conversion from enumeration type 'enum zynq_pin_config_param' to different enumeration type 'enum pin_config_param' [-Wenum-conversion] {"io-standard", PIN_CONFIG_IOSTANDARD, zynq_iostd_lvcmos18}, ~ ^~~~~~~~~~~~~~~~~~~~~ drivers/pinctrl/pinctrl-zynq.c:990:16: warning: implicit conversion from enumeration type 'enum zynq_pin_config_param' to different enumeration type 'enum pin_config_param' [-Wenum-conversion] = { PCONFDUMP(PIN_CONFIG_IOSTANDARD, "IO-standard", NULL, true), ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/pinctrl/pinconf-generic.h:163:11: note: expanded from macro 'PCONFDUMP' .param = a, .display = b, .format = c, .has_arg = d \ ^ 2 warnings generated. It is expected that pinctrl drivers can extend pin_config_param because of the gap between PIN_CONFIG_END and PIN_CONFIG_MAX so this conversion isn't an issue. Most drivers that take advantage of this define the PIN_CONFIG variables as constants, rather than enumerated values. Do the same thing here so that Clang no longer warns. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
Nathan Chancellor	01bd9422af	pinctrl: lpc18xx: Use define directive for PIN_CONFIG_GPIO_PIN_INT [ Upstream commit f24bfb39975c241374cadebbd037c17960cf1412 ] Clang warns when one enumerated type is implicitly converted to another: drivers/pinctrl/pinctrl-lpc18xx.c:643:29: warning: implicit conversion from enumeration type 'enum lpc18xx_pin_config_param' to different enumeration type 'enum pin_config_param' [-Wenum-conversion] {"nxp,gpio-pin-interrupt", PIN_CONFIG_GPIO_PIN_INT, 0}, ~ ^~~~~~~~~~~~~~~~~~~~~~~ drivers/pinctrl/pinctrl-lpc18xx.c:648:12: warning: implicit conversion from enumeration type 'enum lpc18xx_pin_config_param' to different enumeration type 'enum pin_config_param' [-Wenum-conversion] PCONFDUMP(PIN_CONFIG_GPIO_PIN_INT, "gpio pin int", NULL, true), ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/pinctrl/pinconf-generic.h:163:11: note: expanded from macro 'PCONFDUMP' .param = a, .display = b, .format = c, .has_arg = d \ ^ 2 warnings generated. It is expected that pinctrl drivers can extend pin_config_param because of the gap between PIN_CONFIG_END and PIN_CONFIG_MAX so this conversion isn't an issue. Most drivers that take advantage of this define the PIN_CONFIG variables as constants, rather than enumerated values. Do the same thing here so that Clang no longer warns. Link: https://github.com/ClangBuiltLinux/linux/issues/140 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
Brian Masney	265c470602	pinctrl: qcom: spmi-gpio: fix gpio-hog related boot issues [ Upstream commit 149a96047237574b756d872007c006acd0cc6687 ] When attempting to setup up a gpio hog, device probing would repeatedly fail with -EPROBE_DEFERED errors. It was caused by a circular dependency between the gpio and pinctrl frameworks. If the gpio-ranges property is present in device tree, then the gpio framework will handle the gpio pin registration and eliminate the circular dependency. See Christian Lamparter's commit a86caa9ba5d7 ("pinctrl: msm: fix gpio-hog related boot issues") for a detailed commit message that explains the issue in much more detail. The code comment in this commit came from Christian's commit. Signed-off-by: Brian Masney <masneyb@onstation.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
David Barmann	597b389bd8	sock: Reset dst when changing sk_mark via setsockopt [ Upstream commit 50254256f382c56bde87d970f3d0d02fdb76ec70 ] When setting the SO_MARK socket option, if the mark changes, the dst needs to be reset so that a new route lookup is performed. This fixes the case where an application wants to change routing by setting a new sk_mark. If this is done after some packets have already been sent, the dst is cached and has no effect. Signed-off-by: David Barmann <david.barmann@stackpath.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
YueHaibing	23fd22e304	net: bcmgenet: return correct value 'ret' from bcmgenet_power_down [ Upstream commit 0db55093b56618088b9a1d445eb6e43b311bea33 ] Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 'bcmgenet_power_down': drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable] bcmgenet_power_down should return 'ret' instead of 0. Fixes: ca8cf341903f ("net: bcmgenet: propagate errors from bcmgenet_power_down") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
Colin Ian King	f1f9cb4f6b	ACPICA: Use %d for signed int print formatting instead of %u [ Upstream commit f8ddf49b420112e28bdd23d7ad52d7991a0ccbe3 ] Fix warnings found using static analysis with cppcheck, use %d printf format specifier for signed ints rather than %u Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:01 +01:00
Tycho Andersen	f4b754975c	dlm: don't leak kernel pointer to userspace [ Upstream commit 9de30f3f7f4d31037cfbb7c787e1089c1944b3a7 ] In copy_result_to_user(), we first create a struct dlm_lock_result, which contains a struct dlm_lksb, the last member of which is a pointer to the lvb. Unfortunately, we copy the entire struct dlm_lksb to the result struct, which is then copied to userspace at the end of the function, leaking the contents of sb_lvbptr, which is a valid kernel pointer in some cases (indeed, later in the same function the data it points to is copied to userspace). It is an error to leak kernel pointers to userspace, as it undermines KASLR protections (see e.g. 65eea8edc31 ("floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl") for another example of this). Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:00 +01:00
Tycho Andersen	81a30f89c7	dlm: fix invalid free [ Upstream commit d968b4e240cfe39d39d80483bac8bca8716fd93c ] dlm_config_nodes() does not allocate nodes on failure, so we should not free() nodes when it fails. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:29:00 +01:00
James Smart	4bbb2151c9	scsi: lpfc: fcoe: Fix link down issue after 1000+ link bounces [ Upstream commit 036cad1f1ac9ce03e2db94b8460f98eaf1e1ee4c ] On FCoE adapters, when running link bounce test in a loop, initiator failed to login with switch switch and required driver reload to recover. Switch reached a point where all subsequent FLOGIs would be LS_RJT'd. Further testing showed the condition to be related to not performing FCF discovery between FLOGI's. Fix by monitoring FLOGI failures and once a repeated error is seen repeat FCF discovery. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:59 +01:00
Shivasharan S	d13ad591bd	scsi: megaraid_sas: Fix msleep granularity [ Upstream commit 9155cf30a3c4ef97e225d6daddf9bd4b173267e8 ] In megasas_transition_to_ready() driver waits 180seconds for controller to change FW state. Here we are calling msleep(1) in a loop for this. As explained in timers-howto.txt, msleep(1) will actually sleep longer than 1ms. If a faulty controller is connected, we will end up waiting for much more than 180 seconds causing unnecessary delays during load. Change the granularity of msleep() call from 1ms to 1000ms. Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:59 +01:00
Suganath Prabu	c05a29f73e	scsi: mpt3sas: Fix driver modifying persistent data in Manufacturing page11 [ Upstream commit 97f35194093362a63b33caba2485521ddabe2c95 ] Currently driver is modifying both current & NVRAM/persistent data in Manufacturing page11. Driver should change only current copy of Manufacturing page11. It should not modify the persistent data. So removed the section of code where driver is modifying the persistent data of Manufacturing page11. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:58 +01:00
Suganath Prabu	f8c80f6279	scsi: mpt3sas: Fix Sync cache command failure during driver unload [ Upstream commit 9029a72500b95578a35877a43473b82cb0386c53 ] This is to fix SYNC CACHE and START STOP command failures with DID_NO_CONNECT during driver unload. In driver's IO submission patch (i.e. in driver's .queuecommand()) driver won't allow any SCSI commands to the IOC when ioc->remove_host flag is set and hence SYNC CACHE commands which are issued to the target drives (where write cache is enabled) during driver unload time is failed with DID_NO_CONNECT status. Now modified the driver to allow SYNC CACHE and START STOP commands to IOC, even when remove_host flag is set. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:58 +01:00
Shaokun Zhang	d307d4b183	rtlwifi: rtl8192de: Fix misleading REG_MCUFWDL information [ Upstream commit 7d129adff3afbd3a449bc3593f2064ac546d58d3 ] RT_TRACE shows REG_MCUFWDL value as a decimal value with a '0x' prefix, which is somewhat misleading. Fix it to print hexadecimal, as was intended. Cc: Ping-Ke Shih <pkshih@realtek.com> Cc: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:57 +01:00
Dan Carpenter	d7253cb7c5	wireless: airo: potential buffer overflow in sprintf() [ Upstream commit 3d39e1bb1c88f32820c5f9271f2c8c2fb9a52bac ] It looks like we wanted to print a maximum of BSSList_rid.ssidLen bytes of the ssid, but we accidentally use "%s" (width) instead of "%.s" (precision) so if the ssid doesn't have a NUL terminator this could lead to an overflow. Static analysis. Not tested. Fixes: e174961ca1a0 ("net: convert print_mac to %pM") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:57 +01:00
Ali MJ Al-Nasrawy	4b3a1f75e2	brcmsmac: never log "tid x is not agg'able" by default [ Upstream commit 96fca788e5788b7ea3b0050eb35a343637e0a465 ] This message greatly spams the log under heavy Tx of frames with BK access class which is especially true when operating as AP. It is also not informative as the "agg'ablity" of TIDs are set once and never change. Fix this by logging only in debug mode. Signed-off-by: Ali MJ Al-Nasrawy <alimjalnasrawy@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:56 +01:00
Gustavo A. R. Silva	6d0d7b7501	rtl8xxxu: Fix missing break in switch [ Upstream commit 307b00c5e695857ca92fc6a4b8ab6c48f988a1b1 ] Add missing break statement in order to prevent the code from falling through to the default case. Fixes: 26f1fad29ad9 ("New driver: rtl8xxxu (mac80211)") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:56 +01:00
Christophe JAILLET	ae79ea8c87	wlcore: Fix the return value in case of error in 'wlcore_vendor_cmd_smart_config_start()' [ Upstream commit 3419348a97bcc256238101129d69b600ceb5cc70 ] We return 0 unconditionally at the end of 'wlcore_vendor_cmd_smart_config_start()'. However, 'ret' is set to some error codes in several error handling paths and we already return some error codes at the beginning of the function. Return 'ret' instead to propagate the error code. Fixes: 80ff8063e87c ("wlcore: handle smart config vendor commands") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:55 +01:00
Richard Guy Briggs	c2b5d224ab	audit: print empty EXECVE args [ Upstream commit ea956d8be91edc702a98b7fe1f9463e7ca8c42ab ] Empty executable arguments were being skipped when printing out the list of arguments in an EXECVE record, making it appear they were somehow lost. Include empty arguments as an itemized empty string. Reproducer: autrace /bin/ls "" "/etc" ausearch --start recent -m execve -i \| grep EXECVE type=EXECVE msg=audit(10/03/2018 13:04:03.208:1391) : argc=3 a0=/bin/ls a2=/etc With fix: type=EXECVE msg=audit(10/03/2018 21:51:38.290:194) : argc=3 a0=/bin/ls a1= a2=/etc type=EXECVE msg=audit(1538617898.290:194): argc=3 a0="/bin/ls" a1="" a2="/etc" Passes audit-testsuite. GH issue tracker at https://github.com/linux-audit/audit-kernel/issues/99 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: cleaned up the commit metadata] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:55 +01:00
Valentin Schneider	e1f78c15ae	sched/fair: Don't increase sd->balance_interval on newidle balance [ Upstream commit 3f130a37c442d5c4d66531b240ebe9abfef426b5 ] When load_balance() fails to move some load because of task affinity, we end up increasing sd->balance_interval to delay the next periodic balance in the hopes that next time we look, that annoying pinned task(s) will be gone. However, idle_balance() pays no attention to sd->balance_interval, yet it will still lead to an increase in balance_interval in case of pinned tasks. If we're going through several newidle balances (e.g. we have a periodic task), this can lead to a huge increase of the balance_interval in a very small amount of time. To prevent that, don't increase the balance interval when going through a newidle balance. This is a similar approach to what is done in commit 58b26c4c0257 ("sched: Increment cache_nice_tries only on periodic lb"), where we disregard newidle balance and rely on periodic balance for more stable results. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: patrick.bellasi@arm.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1537974727-30788-2-git-send-email-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:55 +01:00
Eric Dumazet	f4e7ca091d	net: do not abort bulk send on BQL status [ Upstream commit fe60faa5063822f2d555f4f326c7dd72a60929bf ] Before calling dev_hard_start_xmit(), upper layers tried to cook optimal skb list based on BQL budget. Problem is that GSO packets can end up comsuming more than the BQL budget. Breaking the loop is not useful, since requeued packets are ahead of any packets still in the qdisc. It is also more expensive, since next TX completion will push these packets later, while skbs are not in cpu caches. It is also a behavior difference with TSO packets, that can break the BQL limit by a large amount. Note that drivers should use __netdev_tx_sent_queue() in order to have optimal xmit_more support, and avoid useless atomic operations as shown in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:54 +01:00
Larry Chen	bcb7267a91	ocfs2: fix clusters leak in ocfs2_defrag_extent() [ Upstream commit 6194ae4242dec0c9d604bc05df83aa9260a899e4 ] ocfs2_defrag_extent() might leak allocated clusters. When the file system has insufficient space, the number of claimed clusters might be less than the caller wants. If that happens, the original code might directly commit the transaction without returning clusters. This patch is based on code in ocfs2_add_clusters_in_btree(). [akpm@linux-foundation.org: include localalloc.h, reduce scope of data_ac] Link: http://lkml.kernel.org/r/20180904041621.16874-3-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:53 +01:00
Changwei Ge	65cbd1279f	ocfs2: don't put and assigning null to bh allocated outside [ Upstream commit cf76c78595ca87548ca5e45c862ac9e0949c4687 ] ocfs2_read_blocks() and ocfs2_read_blocks_sync() are both used to read several blocks from disk. Currently, the input argument bhs can be NULL or NOT. It depends on the caller's behavior. If the function fails in reading blocks from disk, the corresponding bh will be assigned to NULL and put. Obviously, above process for non-NULL input bh is not appropriate. Because the caller doesn't even know its bhs are put and re-assigned. If buffer head is managed by caller, ocfs2_read_blocks and ocfs2_read_blocks_sync() should not evaluate it to NULL. It will cause caller accessing illegal memory, thus crash. Link: http://lkml.kernel.org/r/HK2PR06MB045285E0F4FBB561F9F2F9B3D5680@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:53 +01:00
Victor Kamensky	b2c50f6fcb	arm64: makefile fix build of .i file in external module case [ Upstream commit 98356eb0ae499c63e78073ccedd9a5fc5c563288 ] After 'a66649dab350 arm64: fix vdso-offsets.h dependency' if one will try to build .i file in case of external kernel module, build fails complaining that prepare0 target is missing. This issue came up with SystemTap when it tries to build variety of .i files for its own generated kernel modules trying to figure given kernel features/capabilities. The issue is that prepare0 is defined in top level Makefile only if KBUILD_EXTMOD is not defined. .i file rule depends on prepare and in case KBUILD_EXTMOD defined top level Makefile contains empty rule for prepare. But after mentioned commit arch/arm64/Makefile would introduce dependency on prepare0 through its own prepare target. Fix it to put proper ifdef KBUILD_EXTMOD around code introduced by mentioned commit. It matches what top level Makefile does. Acked-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:53 +01:00
Dave Jiang	23c7064d5c	ntb: intel: fix return value for ndev_vec_mask() [ Upstream commit 7756e2b5d68c36e170a111dceea22f7365f83256 ] ndev_vec_mask() should be returning u64 mask value instead of int. Otherwise the mask value returned can be incorrect for larger vectors. Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Lucas Van <lucas.van@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:52 +01:00
Jon Mason	bb51638760	ntb_netdev: fix sleep time mismatch [ Upstream commit a861594b1b7ffd630f335b351c4e9f938feadb8e ] The tx_time should be in usecs (according to the comment above the variable), but the setting of the timer during the rearming is done in msecs. Change it to match the expected units. Fixes: e74bfeedad08 ("NTB: Add flow control to the ntb_netdev") Suggested-by: Gerd W. Haeussler <gerd.haeussler@cesys-it.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:52 +01:00
Miroslav Lichvar	8f74ac645a	igb: shorten maximum PHC timecounter update interval [ Upstream commit 094bf4d0e9657f6ea1ee3d7e07ce3970796949ce ] The timecounter needs to be updated at least once per ~550 seconds in order to avoid a 40-bit SYSTIM timestamp to be misinterpreted as an old timestamp. Since commit 500462a9d ("timers: Switch to a non-cascading wheel"), scheduling of delayed work seems to be less accurate and a requested delay of 540 seconds may actually be longer than 550 seconds. Shorten the delay to 480 seconds to be sure the timecounter is updated in time. This fixes an issue with HW timestamps on 82580/I350/I354 being off by ~1100 seconds for few seconds every ~9 minutes. Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:51 +01:00
David Hildenbrand	e2e7b55178	mm/memory_hotplug: make add_memory() take the device_hotplug_lock [ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ] add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mathieu Malaterre <malat@debian.org> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:51 +01:00
Colin Ian King	218e302c82	fs/hfs/extent.c: fix array out of bounds read of array extent [ Upstream commit 6c9a3f843a29d6894dfc40df338b91dbd78f0ae3 ] Currently extent and index i are both being incremented causing an array out of bounds read on extent[i]. Fix this by removing the extraneous increment of extent. Ernesto said: : This is only triggered when deleting a file with a resource fork. I : may be wrong because the documentation isn't clear, but I don't think : you can create those under linux. So I guess nobody was testing them. : : > A disk space leak, perhaps? : : That's what it looks like in general. hfs_free_extents() won't do : anything if the block count doesn't add up, and the error will be : ignored. Now, if the block count randomly does add up, we could see : some corruption. Detected by CoverityScan, CID#711541 ("Out of bounds read") Link: http://lkml.kernel.org/r/20180831140538.31566-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ernesto A. Fernndez <ernesto.mnd.fernandez@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:50 +01:00
Ernesto A. Fernández	79fb2b86db	hfs: update timestamp on truncate() [ Upstream commit 8cd3cb5061730af085a3f9890a3352f162b4e20c ] The vfs takes care of updating mtime on ftruncate(), but on truncate() it must be done by the module. Link: http://lkml.kernel.org/r/e1611eda2985b672ed2d8677350b4ad8c2d07e8a.1539316825.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:50 +01:00
Ernesto A. Fernández	304d969f63	hfsplus: update timestamps on truncate() [ Upstream commit dc8844aada735890a6de109bef327f5df36a982e ] The vfs takes care of updating ctime and mtime on ftruncate(), but on truncate() it must be done by the module. This patch can be tested with xfstests generic/313. Link: http://lkml.kernel.org/r/9beb0913eea37288599e8e1b7cec8768fb52d1b8.1539316825.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:49 +01:00
Ernesto A. Fernández	d78a7b5150	hfs: fix return value of hfs_get_block() [ Upstream commit 1267a07be5ebbff2d2739290f3d043ae137c15b4 ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfs is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfs_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/4538ab8c35ea37338490525f0f24cbc37227528c.1539195310.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:49 +01:00
Ernesto A. Fernández	face88eea3	hfsplus: fix return value of hfsplus_get_block() [ Upstream commit 839c3a6a5e1fbc8542d581911b35b2cb5cd29304 ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfsplus is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfsplus_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/2cd1301404ec7cf1e39c8f11a01a4302f1460ad6.1539195310.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:49 +01:00
Ernesto A. Fernández	bae19f949b	hfs: prevent btree data loss on ENOSPC [ Upstream commit 54640c7502e5ed41fbf4eedd499e85f9acc9698f ] Inserting a new record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes or extents. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. There is no need to reserve space before deleting a catalog record, as we do for hfsplus. This difference is because hfs index nodes have fixed length keys. Link: http://lkml.kernel.org/r/ab5fc8a7d5ffccfd5f27b1cf2cb4ceb6c110da74.1536269131.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:48 +01:00
Ernesto A. Fernández	376270e9ba	hfsplus: prevent btree data loss on ENOSPC [ Upstream commit d92915c35bfaf763d78bf1d5ac7f183420e3bd99 ] Inserting or deleting a record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes, extents or xattrs. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. The patch can be tested with xfstests generic/027. Link: http://lkml.kernel.org/r/4596eef22fbda137b4ffa0272d92f0da15364421.1536269129.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:47 +01:00
Ernesto A. Fernández	09bb000888	hfs: fix BUG on bnode parent update [ Upstream commit ef75bcc5763d130451a99825f247d301088b790b ] hfs_brec_update_parent() may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. It is not possible for this to happen on a valid hfs filesystem because the index nodes have fixed length keys. For reasons I ignore, the hfs module does have support for a number of hfsplus features. A corrupt btree header may report variable length keys and trigger this BUG, so it's better to fix it. Link: http://lkml.kernel.org/r/cf9b02d57f806217a2b1bf5db8c3e39730d8f603.1535682463.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:46 +01:00
Ernesto A. Fernández	5c7ff86f26	hfsplus: fix BUG on bnode parent update [ Upstream commit 19a9d0f1acf75e8be8cfba19c1a34e941846fa2b ] Creating, renaming or deleting a file may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. This bug is triggered by xfstests generic/027, somewhat rarely; here is a more reliable reproducer: truncate -s 50M fs.iso mkfs.hfsplus fs.iso mount fs.iso /mnt i=1000 while [ $i -le 2400 ]; do touch /mnt/$i &>/dev/null ((++i)) done i=2400 while [ $i -ge 1000 ]; do mv /mnt/$i /mnt/$(perl -e "print $i x61") &>/dev/null ((--i)) done The issue is that a newly created bnode is being put twice. Reset new_node to NULL in hfs_brec_update_parent() before reaching goto again. Link: http://lkml.kernel.org/r/5ee1db09b60373a15890f6a7c835d00e76bf601d.1535682461.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:46 +01:00
Rasmus Villemoes	b74d722d60	linux/bitmap.h: fix type of nbits in bitmap_shift_right() [ Upstream commit d9873969fa8725dc6a5a21ab788c057fd8719751 ] Most other bitmap API, including the OOL version __bitmap_shift_right, take unsigned nbits. This was accidentally left out from 2fbad29917c98. Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk Fixes: 2fbad29917c98 ("lib: bitmap: change bitmap_shift_right to take unsigned parameters") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:45 +01:00
Rasmus Villemoes	835c18375a	linux/bitmap.h: handle constant zero-size bitmaps correctly [ Upstream commit 7275b097851a5e2e0dd4da039c7e96b59ac5314e ] The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:44 +01:00
Anton Ivanov	b7b08f9aa8	um: Make line/tty semantics use true write IRQ [ Upstream commit 917e2fd2c53eb3c4162f5397555cbd394390d4bc ] This fixes a long standing bug where large amounts of output could freeze the tty (most commonly seen on stdio console). While the bug has always been there it became more pronounced after moving to the new interrupt controller. The line semantics are now changed to have true IRQ write semantics which should further improve the tty/line subsystem stability and performance Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:44 +01:00
Sabrina Dubroca	ea2b783ced	macsec: let the administrator set UP state even if lowerdev is down [ Upstream commit 07bddef9839378bd6f95b393cf24c420529b4ef1 ] Currently, the kernel doesn't let the administrator set a macsec device up unless its lower device is currently up. This is inconsistent, as a macsec device that is up won't automatically go down when its lower device goes down. Now that linkstate propagation works, there's really no reason for this limitation, so let's remove it. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:43 +01:00
Sabrina Dubroca	8f70a4d8ad	macsec: update operstate when lower device changes [ Upstream commit e6ac075882b2afcdf2d5ab328ce4ab42a1eb9593 ] Like all other virtual devices (macvlan, vlan), the operstate of a macsec device should match the state of its lower device. This is done by calling netif_stacked_transfer_operstate from its netdevice notifier. We also need to call netif_stacked_transfer_operstate when a new macsec device is created, so that its operstate is set properly. This is only relevant when we try to bring the device up directly when we create it. Radu Rendec proposed a similar patch, inspired from the 802.1q driver, that included changing the administrative state of the macsec device, instead of just the operstate. This version is similar to what the macvlan driver does, and updates only the operstate. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: Radu Rendec <radu.rendec@gmail.com> Reported-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:42 +01:00
Dave Chinner	8a7b0a5833	mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock [ Upstream commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7 ] We've recently seen a workload on XFS filesystems with a repeatable deadlock between background writeback and a multi-process application doing concurrent writes and fsyncs to a small range of a file. range_cyclic writeback Process 1 Process 2 xfs_vm_writepages write_cache_pages writeback_index = 2 cycled = 0 .... find page 2 dirty lock Page 2 ->writepage page 2 writeback page 2 clean page 2 added to bio no more pages write() locks page 1 dirties page 1 locks page 2 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 find page 1 towrite lock Page 1 ->writepage page 1 writeback page 1 clean page 1 added to bio find page 2 towrite lock Page 2 page 2 is writeback <blocks> write() locks page 1 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 !done && !cycled sets index to 0, restarts lookup find page 1 dirty find page 1 towrite lock Page 1 page 1 is writeback <blocks> lock Page 1 <blocks> DEADLOCK because: - process 1 needs page 2 writeback to complete to make enough progress to issue IO pending for page 1 - writeback needs page 1 writeback to complete so process 2 can progress and unlock the page it is blocked on, then it can issue the IO pending for page 2 - process 2 can't make progress until process 1 issues IO for page 1 The underlying cause of the problem here is that range_cyclic writeback is processing pages in descending index order as we hold higher index pages in a structure controlled from above write_cache_pages(). The write_cache_pages() caller needs to be able to submit these pages for IO before write_cache_pages restarts writeback at mapping index 0 to avoid wcp inverting the page lock/writeback wait order. generic_writepages() is not susceptible to this bug as it has no private context held across write_cache_pages() - filesystems using this infrastructure always submit pages in ->writepage immediately and so there is no problem with range_cyclic going back to mapping index 0. However: mpage_writepages() has a private bio context, exofs_writepages() has page_collect fuse_writepages() has fuse_fill_wb_data nfs_writepages() has nfs_pageio_descriptor xfs_vm_writepages() has xfs_writepage_ctx All of these ->writepages implementations can hold pages under writeback in their private structures until write_cache_pages() returns, and hence they are all susceptible to this deadlock. Also worth noting is that ext4 has it's own bastardised version of write_cache_pages() and so it /may/ have an equivalent deadlock. I looked at the code long enough to understand that it has a similar retry loop for range_cyclic writeback reaching the end of the file and then promptly ran away before my eyes bled too much. I'll leave it for the ext4 developers to determine if their code is actually has this deadlock and how to fix it if it has. There's a few ways I can see avoid this deadlock. There's probably more, but these are the first I've though of: 1. get rid of range_cyclic altogether 2. range_cyclic always stops at EOF, and we start again from writeback index 0 on the next call into write_cache_pages() 2a. wcp also returns EAGAIN to ->writepages implementations to indicate range cyclic has hit EOF. writepages implementations can then flush the current context and call wpc again to continue. i.e. lift the retry into the ->writepages implementation 3. range_cyclic uses trylock_page() rather than lock_page(), and it skips pages it can't lock without blocking. It will already do this for pages under writeback, so this seems like a no-brainer 3a. all non-WB_SYNC_ALL writeback uses trylock_page() to avoid blocking as per pages under writeback. I don't think #1 is an option - range_cyclic prevents frequently dirtied lower file offset from starving background writeback of rarely touched higher file offsets. #2 is simple, and I don't think it will have any impact on performance as going back to the start of the file implies an immediate seek. We'll have exactly the same number of seeks if we switch writeback to another inode, and then come back to this one later and restart from index 0. #2a is pretty much "status quo without the deadlock". Moving the retry loop up into the wcp caller means we can issue IO on the pending pages before calling wcp again, and so avoid locking or waiting on pages in the wrong order. I'm not convinced we need to do this given that we get the same thing from #2 on the next writeback call from the writeback infrastructure. #3 is really just a band-aid - it doesn't fix the access/wait inversion problem, just prevents it from becoming a deadlock situation. I'd prefer we fix the inversion, not sweep it under the carpet like this. #3a is really an optimisation that just so happens to include the band-aid fix of #3. So it seems that the simplest way to fix this issue is to implement solution #2 Link: http://lkml.kernel.org/r/20181005054526.21507-1-david@fromorbit.com Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:42 +01:00
Jia-Ju Bai	345e0e2702	fs/ocfs2/dlm/dlmdebug.c: fix a sleep-in-atomic-context bug in dlm_print_one_mle() [ Upstream commit 999865764f5f128896402572b439269acb471022 ] The kernel module may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 255: __dlm_put_mle in dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 254: spin_lock in dlm_put_ml [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 222: __dlm_put_mle in dlm_put_mle_inuse fs/ocfs2/dlm/dlmmaster.c, 219: spin_lock in dlm_put_mle_inuse To fix this bug, GFP_NOFS is replaced with GFP_ATOMIC. This bug is found by my static analysis tool DSAC. Link: http://lkml.kernel.org/r/20180901112528.27025-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:41 +01:00
David S. Miller	2c7552dee5	sparc64: Rework xchg() definition to avoid warnings. [ Upstream commit 6c2fc9cddc1ffdef8ada1dc8404e5affae849953 ] Such as: fs/ocfs2/file.c: In function ‘ocfs2_file_write_iter’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__((ptr)))__xchg((unsigned long)(x),(ptr),sizeof((ptr)))) and drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function ‘ixgbevf_xdp_setup’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__((ptr)))__xchg((unsigned long)(x),(ptr),sizeof((ptr)))) Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:40 +01:00
Felipe Rechia	6cb971b83d	powerpc/process: Fix flush_all_to_thread for SPE [ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ] Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e764e, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes 579e633e764e ("powerpc: create flush_all_to_thread()") Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-28 18:28:40 +01:00

1 2 3 4 5 ...

650090 Commits