1059640 Commits

Author SHA1 Message Date
Sumeet Pawnikar
f872f73601 thermal: int340x: Fix VCoRefLow MMIO bit offset for TGL
The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect.

Current implementation reads it from MMIO offset 0x5A18 and bit
offset [12:14], but the actual correct register definition is from
bit offset [11:13].

Update to fix the bit offset.

Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-08 15:29:22 +01:00
Rafael J. Wysocki
444dd878e8 PM: runtime: Fix pm_runtime_active() kerneldoc comment
The kerneldoc comment of pm_runtime_active() does not reflect the
behavior of the function, so update it accordingly.

Fixes: 403d2d116ec0 ("PM: runtime: Add kerneldoc comments to multiple helpers")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-12-08 15:21:55 +01:00
Chen Yu
11f8cb8903 ACPI: tools: Fix compilation when output directory is not present
Compiling the ACPI tools when output directory parameter is specified,
but the output directory is not present, triggers the following error:

make O=/data/test/tmp/ -C tools/power/acpi/

make: Entering directory '/data/src/kernel/linux/tools/power/acpi'
  DESCEND tools/acpidbg
make[1]: Entering directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg'
  MKDIR    include
  CP       include
  CC       tools/acpidbg/acpidbg.o
Assembler messages:
Fatal error: can't create /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o: No such file or directory
make[1]: *** [../../Makefile.rules:24: /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o] Error 1
make[1]: Leaving directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg'
make: *** [Makefile:18: acpidbg] Error 2
make: Leaving directory '/data/src/kernel/linux/tools/power/acpi'

which occurs because the output directory has not been created yet.

Fix this issue by creating the output directory before compiling.

Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-08 15:10:27 +01:00
Steven Rostedt (VMware)
48b27b6b51 tracefs: Set all files to the same group ownership as the mount option
As people have been asking to allow non-root processes to have access to
the tracefs directory, it was considered best to only allow groups to have
access to the directory, where it is easier to just set the tracefs file
system to a specific group (as other would be too dangerous), and that way
the admins could pick which processes would have access to tracefs.

Unfortunately, this broke tooling on Android that expected the other bit
to be set. For some special cases, for non-root tools to trace the system,
tracefs would be mounted and change the permissions of the top level
directory which gave access to all running tasks permission to the
tracing directory. Even though this would be dangerous to do in a
production environment, for testing environments this can be useful.

Now with the new changes to not allow other (which is still the proper
thing to do), it breaks the testing tooling. Now more code needs to be
loaded on the system to change ownership of the tracing directory.

The real solution is to have tracefs honor the gid=xxx option when
mounting. That is,

(tracing group tracing has value 1003)

 mount -t tracefs -o gid=1003 tracefs /sys/kernel/tracing

should have it that all files in the tracing directory should be of the
given group.

Copy the logic from d_walk() from dcache.c and simplify it for the mount
case of tracefs if gid is set. All the files in tracefs will be walked and
their group will be set to the value passed in.

Link: https://lkml.kernel.org/r/20211207171729.2a54e1b3@gandalf.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Yabin Cui <yabinc@google.com>
Fixes: 49d67e445742 ("tracefs: Have tracefs directories not set OTH permission bits by default")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-08 08:06:40 -05:00
Steven Rostedt (VMware)
ee7f366699 tracefs: Have new files inherit the ownership of their parent
If directories in tracefs have their ownership changed, then any new files
and directories that are created under those directories should inherit
the ownership of the director they are created in.

Link: https://lkml.kernel.org/r/20211208075720.4855d180@gandalf.local.home

Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yabin Cui <yabinc@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: stable@vger.kernel.org
Fixes: 4282d60689d4f ("tracefs: Add new tracefs file system")
Reported-by: Kalesh Singh <kaleshsingh@google.com>
Reported: https://lore.kernel.org/all/CAC_TJve8MMAv+H_NdLSJXZUSoxOEq2zB_pVaJ9p=7H6Bu3X76g@mail.gmail.com/
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-08 08:06:29 -05:00
Greg Kroah-Hartman
7c602f5d04 2nd set of IIO fixes for 5.16
Note 1st set were before the merge window.
 
 Biggest set in here fix what happens when things go wrong in
 the interrupt handlers for an IIO trigger.
 
 Otherwise normal mix of recent and ancient bugs.
 
 trigger core
  - Fix reference counting bug that was preventing the iio_trig
    structures from being released.
 adxrs290
  - Correctly sign extend the rate and temperature data.
 at91-sama5d2
  - Fix sign extension from the wrong bit and use the scan_type
    values to avoid it being open coded in two places (which were
    out of sync)
 axp20x_adc
  - Fix current reporting bit depth.
 dln2-adc
  - Fix a lock ordering issue and lockdep complaint that results.
  - Add error handling for failure to register the trigger.
 imx8qxp
  - Wrong config dependency
 kxcjk-1013
  - Potential leak due to wrong guard on cleanup.
 ltr501, kxsd9, stk3310, itg3200, ad7768
  - Don't return error codes from interrupt handler and call
    iio_trigger_notify_done() on all paths to avoid leaving
    trigger disabled on an intermittent fault.
 mma8452
  - Fix missing iio_trigger_get() that could lead to use after free.
 stm32
  - Fix a current leak.
  - Avoid null pointer derefence on defer_probe error due to wrong
    struct device being passed.
 stm32-timer
  - Drop space in MODULE_ALIAS.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmGwgs0RHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0Foiovw//bGSpzC3Jyr+rAw47/i+2lem2kjfiaJAg
 OqET3KCjk5auKv6rAA8BtyQiaq4uqhpykC6Niz7klUhwA1gMUqswqFHQtOocnz2T
 YzyRN3ew4IKOiTUykOUhr4HqyHVSKJhEgXB/1E86EXA5hjqXypaKE3ObWNE46SkO
 1vw8Kyx4cH/HaOnF7/piZxMmTAIUQSKDjRkcpUJtCx6IVlqnGGsj6bF2npY0MoWJ
 jAcB/u4XTYMH8oKEJpKvRmJ6l8BJWtbd/3e4fb0lP0pZGMyRWhfbuLWUkUOr6+8/
 0p5ZSuKqOOjogNsrj/ENvv2UJskWkrLtLLaS6C2mo+4nD2mn3ZVo2vSpnSOifR/J
 TPSOAHH4+Vfb+OjlN791PB9vJE9D9/j340EtEGRDkbqA3uYTbYsMYr3+1ySG8g4E
 VJTkkeCz6cNQjLvPL+rBd4v7PepCEtx8risBoClcWuRmAJ4ffVjeApl2e817ZJNe
 bDTrQun6wTY1Jscwhzb8y5tKnlQvZx//tUdINLTiICLSQRikkmuiMTu1/M4UD209
 k1+2jNX3pxlgBdqGzn57G3JQgT69X92l4PzHtX7rn28P5IY+j3j9Vp0tpc17VSTV
 Z32HQPkB+cxHcKPTmddWOm9C7rA7s8OfqGQ/kD40Cj7vVO5epWAaMc4lglALPxeW
 wqAzgugeyjQ=
 =V2My
 -----END PGP SIGNATURE-----

Merge tag 'iio-fixes-for-5.16b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next

Jonathan writes:

2nd set of IIO fixes for 5.16

Note 1st set were before the merge window.

Biggest set in here fix what happens when things go wrong in
the interrupt handlers for an IIO trigger.

Otherwise normal mix of recent and ancient bugs.

trigger core
 - Fix reference counting bug that was preventing the iio_trig
   structures from being released.
adxrs290
 - Correctly sign extend the rate and temperature data.
at91-sama5d2
 - Fix sign extension from the wrong bit and use the scan_type
   values to avoid it being open coded in two places (which were
   out of sync)
axp20x_adc
 - Fix current reporting bit depth.
dln2-adc
 - Fix a lock ordering issue and lockdep complaint that results.
 - Add error handling for failure to register the trigger.
imx8qxp
 - Wrong config dependency
kxcjk-1013
 - Potential leak due to wrong guard on cleanup.
ltr501, kxsd9, stk3310, itg3200, ad7768
 - Don't return error codes from interrupt handler and call
   iio_trigger_notify_done() on all paths to avoid leaving
   trigger disabled on an intermittent fault.
mma8452
 - Fix missing iio_trigger_get() that could lead to use after free.
stm32
 - Fix a current leak.
 - Avoid null pointer derefence on defer_probe error due to wrong
   struct device being passed.
stm32-timer
 - Drop space in MODULE_ALIAS.

* tag 'iio-fixes-for-5.16b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
  iio: trigger: stm32-timer: fix MODULE_ALIAS
  iio: adc: stm32: fix null pointer on defer_probe error
  iio: at91-sama5d2: Fix incorrect sign extension
  iio: adc: axp20x_adc: fix charging current reporting on AXP22x
  iio: gyro: adxrs290: fix data signedness
  iio: ad7768-1: Call iio_trigger_notify_done() on error
  iio: itg3200: Call iio_trigger_notify_done() on error
  iio: imx8qxp-adc: fix dependency to the intended ARCH_MXC config
  iio: dln2: Check return value of devm_iio_trigger_register()
  iio: trigger: Fix reference counting
  iio: dln2-adc: Fix lockdep complaint
  iio: adc: stm32: fix a current leak by resetting pcsel before disabling vdda
  iio: mma8452: Fix trigger reference couting
  iio: stk3310: Don't return error code in interrupt handler
  iio: kxsd9: Don't return error code in trigger handler
  iio: ltr501: Don't return error code in trigger handler
  iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove
2021-12-08 13:53:22 +01:00
Wudi Wang
b383a42ca5 irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL
INVALL CMD specifies that the ITS must ensure any caching associated with
the interrupt collection defined by ICID is consistent with the LPI
configuration tables held in memory for all Redistributors. SYNC is
required to ensure that INVALL is executed.

Currently, LPI configuration data may be inconsistent with that in the
memory within a short period of time after the INVALL command is executed.

Signed-off-by: Wudi Wang <wangwudi@hisilicon.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: cc2d3216f53c ("irqchip: GICv3: ITS command queue")
Link: https://lore.kernel.org/r/20211208015429.5007-1-zhangshaokun@hisilicon.com
2021-12-08 11:13:18 +00:00
Vitaly Kuznetsov
250552b925 KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened
VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which
are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is
updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from
clean fields to make Hyper-V aware of the change. For KVM's L1s, this is
done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr().
MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending
MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't
check if the resulting bitmap is different and never cleans
HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and
may result in Hyper-V missing the update.

The issue could've been solved by calling evmcs_touch_msr_bitmap() for
eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so
would not give any performance benefits (compared to not using Enlightened
MSR Bitmap at all). 3-level nesting is also not a very common setup
nowadays.

Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for
now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211129094704.326635-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:53:13 -05:00
Kelly Devilliv
a0793fdad9 csky: fix typo of fpu config macro
Fix typo which will cause fpe and privilege exception error.

Signed-off-by: Kelly Devilliv <kelly.devilliv@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2021-12-08 14:15:54 +08:00
Joakim Zhang
b5bd95d171 net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
Background:
We have a customer is running a Profinet stack on the 8MM which receives and
responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from
time to time the received PNIO-CM package is "stock" and is only handled when
receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and
the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from
the expected 40 ms and not the 2s delay of the DCERPC-Ping).

After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would
be handled by different RX queues.

The root cause should be driver ack all queues' interrupt when handle a
specific queue in fec_enet_rx_queue(). The blamed patch is introduced to
receive as much packets as possible once to avoid interrupt flooding.
But it's unreasonable to clear other queues'interrupt when handling one
queue, this patch tries to fix it.

Fixes: ed63f1dcd578 (net: fec: clear receive interrupts before processing a packet)
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Reported-by: Nicolas Diaz <nicolas.diaz@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:39:39 -08:00
Jakub Kicinski
65af674a59 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-12-06

This series contains updates to iavf and i40e drivers.

Mitch adds restoration of MSI state during reset for iavf.

Michal fixes checking and reporting of descriptor count changes to
communicate changes and/or issues for iavf.

Karen resolves an issue with failed handling of VF requests while a VF
reset is occurring for i40e.

Mateusz removes clearing of VF requested queue count when configuring
VF ADQ for i40e.

Norbert fixes a NULL pointer dereference that can occur when getting VSI
descriptors for i40e.
====================

Link: https://lore.kernel.org/r/20211206183519.2733180-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:33:12 -08:00
Jakub Kicinski
9e8926888c Merge branch 'net-phy-fix-doc-build-warning'
Yanteng Si says:

====================
net: phy: Fix doc build warnings
====================

Link: https://lore.kernel.org/r/cover.1638776933.git.siyanteng@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si
c35e8de704 net: phy: Add the missing blank line in the phylink_suspend comment
Fix warning as:

Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1657: WARNING: Unexpected indentation.
Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1658: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si
a97770cc40 net: phy: Remove unnecessary indentation in the comments of phy_device
Fix warning as:

linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.

Suggested-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:06 -08:00
Ameer Hamza
e6f60c51f0 gve: fix for null pointer dereference.
Avoid passing NULL skb to __skb_put() function call if
napi_alloc_skb() returns NULL.

Fixes: 37149e9374bf ("gve: Implement packet continuation for RX.")
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Link: https://lore.kernel.org/r/20211205183810.8299-1-amhamza.mgc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:57:17 -08:00
Vincent Whitchurch
51a08bdeca cifs: Fix crash on unload of cifs_arc4.ko
The exit function is wrongly placed in the __init section and this leads
to a crash when the module is unloaded.  Just remove both the init and
exit functions since this module does not need them.

Fixes: 71c02863246167b3d ("cifs: fork arc4 and create a separate module...")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-07 22:38:03 -06:00
Petr Machata
6ebe4b3508 MAINTAINERS: net: mlxsw: Remove Jiri as a maintainer, add myself
Jiri has moved on and will not carry out the mlxsw maintainership duty any
longer. Add myself as a co-maintainer instead.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:35:43 -08:00
Jakub Kicinski
56a271be06 Merge branch 'net-tls-cover-all-ciphers-with-tests'
Vadim Fedorenko says:

====================
net: tls: cover all ciphers with tests

Recent patches to Kernel TLS showed that some ciphers are not covered
with tests. Let's cover missed.
====================

Link: https://lore.kernel.org/r/20211206213932.7508-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:10 -08:00
Vadim Fedorenko
13bf99ab21 selftests: tls: add missing AES256-GCM cipher
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Vadim Fedorenko
d76c51f976 selftests: tls: add missing AES-CCM cipher tests
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Eric Dumazet
802a7dc5cf netfilter: conntrack: annotate data-races around ct->timeout
(struct nf_conn)->timeout can be read/written locklessly,
add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.

BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get

write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
 __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push_pending_frames include/net/tcp.h:1897 [inline]
 tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 __sys_sendto+0x21e/0x2c0 net/socket.c:2036
 __do_sys_sendto net/socket.c:2048 [inline]
 __se_sys_sendto net/socket.c:2044 [inline]
 __x64_sys_sendto+0x74/0x90 net/socket.c:2044
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
 nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
 ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
 __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
 tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
 rds_send_worker+0x43/0x200 net/rds/threads.c:200
 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
 worker_thread+0x616/0xa70 kernel/workqueue.c:2445
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x00027cc2 -> 0x00000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: krdsd rds_send_worker

Note: I chose an arbitrary commit for the Fixes: tag,
because I do not think we need to backport this fix to very old kernels.

Fixes: e37542ba111f ("netfilter: conntrack: avoid possible false sharing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:29:15 +01:00
Florian Westphal
d46cea0e69 selftests: netfilter: switch zone stress to socat
centos9 has nmap-ncat which doesn't like the '-q' option, use socat.
While at it, mark test skipped if needed tools are missing.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:29:15 +01:00
Pablo Neira Ayuso
962e5a4035 netfilter: nft_exthdr: break evaluation if setting TCP option fails
Break rule evaluation on malformed TCP options.

Fixes: 99d1712bc41c ("netfilter: exthdr: tcp option set support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Stefano Brivio
0de53b0ffb selftests: netfilter: Add correctness test for mac,net set type
The existing net,mac test didn't cover the issue recently reported
by Nikita Yushchenko, where MAC addresses wouldn't match if given
as first field of a concatenated set with AVX2 and 8-bit groups,
because there's a different code path covering the lookup of six
8-bit groups (MAC addresses) if that's the first field.

Add a similar mac,net test, with MAC address and IPv4 address
swapped in the set specification.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Stefano Brivio
b7e945e228 nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
The sixth byte of packet data has to be looked up in the sixth group,
not in the seventh one, even if we load the bucket data into ymm6
(and not ymm5, for convenience of tracking stalls).

Without this fix, matching on a MAC address as first field of a set,
if 8-bit groups are selected (due to a small set size) would fail,
that is, the given MAC address would never match.

Reported-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Cc: <stable@vger.kernel.org> # 5.6.x
Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-By: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Nicolas Dichtel
d43b75fbc2 vrf: don't run conntrack on vrf with !dflt qdisc
After the below patch, the conntrack attached to skb is set to "notrack" in
the context of vrf device, for locally generated packets.
But this is true only when the default qdisc is set to the vrf device. When
changing the qdisc, notrack is not set anymore.
In fact, there is a shortcut in the vrf driver, when the default qdisc is
set, see commit dcdd43c41e60 ("net: vrf: performance improvements for
IPv4") for more details.

This patch ensures that the behavior is always the same, whatever the qdisc
is.

To demonstrate the difference, a new test is added in conntrack_vrf.sh.

Fixes: 8c9c296adfae ("vrf: run conntrack only in context of lower/physdev for locally generated packets")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Linus Torvalds
2a987e6502 perf tools fixes for v5.16: 2nd batch
- Fix SMT detection fast read path on sysfs.
 
 - Fix memory leaks when processing feature headers in perf.data files.
 
 - Fix 'Simple expression parser' 'perf test' on arch without CPU die topology
   info, such as s/390.
 
 - Fix building perf with BUILD_BPF_SKEL=1.
 
 - Fix 'perf bench' by reverting "perf bench: Fix two memory leaks detected with ASan".
 
 - Fix itrace space allowed for new attributes in 'perf script'.
 
 - Fix the build feature detection fast path, that was always failing on systems
   with python3 development packages, speeding up the build.
 
 - Reset shadow counts before loading, fixing metrics using duration_time.
 
 - Sync more kernel headers changed by the new futex_waitv syscall: s390 and powerpc.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYa9kXwAKCRCyPKLppCJ+
 J1pSAQDAk1vQa82x5bqq8yHZTeym4Y6QtKivveTmqCagFT7FOwEAmzwOsdv/v3cQ
 GA0u97zPUsTiN1ma/nh+YZTA9IK1nw0=
 =YPeN
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix SMT detection fast read path on sysfs.

 - Fix memory leaks when processing feature headers in perf.data files.

 - Fix 'Simple expression parser' 'perf test' on arch without CPU die
   topology info, such as s/390.

 - Fix building perf with BUILD_BPF_SKEL=1.

 - Fix 'perf bench' by reverting "perf bench: Fix two memory leaks
   detected with ASan".

 - Fix itrace space allowed for new attributes in 'perf script'.

 - Fix the build feature detection fast path, that was always failing on
   systems with python3 development packages, speeding up the build.

 - Reset shadow counts before loading, fixing metrics using
   duration_time.

 - Sync more kernel headers changed by the new futex_waitv syscall: s390
   and powerpc.

* tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf_skel: Do not use typedef to avoid error on old clang
  perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
  perf header: Fix memory leaks when processing feature headers
  perf test: Reset shadow counts before loading
  perf test: Fix 'Simple expression parser' test on arch without CPU die topology info
  tools build: Remove needless libpython-version feature check that breaks test-all fast path
  perf tools: Fix SMT detection fast read path
  tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall
  perf inject: Fix itrace space allowed for new attributes
  tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall
  Revert "perf bench: Fix two memory leaks detected with ASan"
2021-12-07 15:36:45 -08:00
Pavel Begunkov
75feae73a2 block: fix single bio async DIO error handling
BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882
CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
  [...]
  refcount_dec_and_test include/linux/refcount.h:333 [inline]
  iocb_put fs/aio.c:1161 [inline]
  io_submit_one+0x496/0x2fe0 fs/aio.c:1882
  __do_sys_io_submit fs/aio.c:1938 [inline]
  __se_sys_io_submit fs/aio.c:1908 [inline]
  __x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

__blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages()
directly, in which case upper layers won't be expecting ->ki_complete
to be called by the block layer and will terminate the request. However,
there is also bio_endio() leading to a second ->ki_complete and a double
free.

Fixes: 54a88eb838d37 ("block: add single bio async direct IO helper")
Reported-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-07 15:07:40 -07:00
Michal Swiatkowski
de6acd1cdd ice: fix adding different tunnels
Adding filters with the same values inside for VXLAN and Geneve causes HW
error, because it looks exactly the same. To choose between different
type of tunnels new recipe is needed. Add storing tunnel types in
creating recipes function and start checking it in finding function.

Change getting open tunnels function to return port on correct tunnel
type. This is needed to copy correct port to dummy packet.

Block user from adding enc_dst_port via tc flower, because VXLAN and
Geneve filters can be created only with destination port which was
previously opened.

Fixes: 8b032a55c1bd5 ("ice: low level support for tunnels")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Michal Swiatkowski
0e32ff0240 ice: fix choosing UDP header type
In tunnels packet there can be two UDP headers:
- outer which for hw should be mark as ICE_UDP_OF
- inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if
  inner header is of TCP type

In none tunnels packet header can be:
- UDP, which for hw should be mark as ICE_UDP_ILOS
- TCP, which for hw should be mark as ICE_TCP_IL

Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS.
ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to
error from hw while adding this kind of recipe.

In summary, for tunnel outer port type should always be set to
ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be
set to ICE_UDP_ILOS.

Fixes: 9e300987d4a8 ("ice: VXLAN and Geneve TC support")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Jesse Brandeburg
28dc1b86f8 ice: ignore dropped packets during init
If the hardware is constantly receiving unicast or broadcast packets
during driver load, the device previously counted many GLV_RDPC (VSI
dropped packets) events during init. This causes confusing dropped
packet statistics during driver load. The dropped packets counter
incrementing does stop once the driver finishes loading.

Avoid this problem by baselining our statistics at the end of driver
open instead of the end of probe.

Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Dave Ertman
6d39ea19b0 ice: Fix problems with DSCP QoS implementation
The patch that implemented DSCP QoS implementation removed a
bandwidth check that was used to check for a specific condition
caused by some corner cases.  This check should not of been
removed.

The same patch also added a check for when the DCBx state could
be changed in relation to DSCP, but the check was erroneously
added nested in a check for CEE mode, which made the check useless.

Fix these problems by re-adding the bandwidth check and relocating
the DSCP mode check earlier in the function that changes DCBx state
in the driver.

Fixes: 2a87bd73e50d ("ice: Add DSCP support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Paul Greenwalt
2657e16d8c ice: rearm other interrupt cause register after enabling VFs
The other interrupt cause register (OICR), global interrupt 0, is
disabled when enabling VFs to prevent handling VFLR. If the OICR is
not rearmed then the VF cannot communicate with the PF.

Rearm the OICR after enabling VFs.

Fixes: 916c7fdf5e93 ("ice: Separate VF VSI initialization/creation from reset flow")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Yahui Cao
f23ab04dd6 ice: fix FDIR init missing when reset VF
When VF is being reset, ice_reset_vf() will be called and FDIR
resource should be released and initialized again.

Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:00 -08:00
Marc Zyngier
8762051268 PCI: apple: Fix PERST# polarity
Now that PERST# is properly defined as active-low in the device tree, fix
the driver to correctly drive the line independently of the implied
polarity.

Suggested-by: Pali Rohár <pali@kernel.org>
Fixes: 1e33888fbe44 ("PCI: apple: Add initial hardware bring-up")
Link: https://lore.kernel.org/r/20211123180636.80558-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
2021-12-07 14:27:29 -06:00
Marc Zyngier
5b970dfcfe arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT
As the name indicates, PERST# is active low. Fix the DT description to
match the HW behaviour.

Fixes: ff2a8d91d80c ("arm64: apple: Add PCIe node")
Link: https://lore.kernel.org/r/20211123180636.80558-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Reviewed-by: Mark Kettenis <kettenis@openbsd.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2021-12-07 14:27:07 -06:00
Dan Carpenter
2d4fcc5ab3 clk: versatile: clk-icst: use after free on error path
This frees "name" and then tries to display in as part of the error
message on the next line.  Swap the order.

Fixes: 1b2189f3aa50 ("clk: versatile: clk-icst: Ensure clock names are unique")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211117072604.GC5237@kili
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-12-07 12:25:29 -08:00
Dan Carpenter
d17b9737c2 net/qla3xxx: fix an error code in ql_adapter_up()
The ql_wait_for_drvr_lock() fails and returns false, then this
function should return an error code instead of returning success.

The other problem is that the success path prints an error message
netdev_err(ndev, "Releasing driver lock\n");  Delete that and
re-order the code a little to make it more clear.

Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211207082416.GA16110@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 10:37:10 -08:00
Jakub Kicinski
2a62df3692 linux-can-fixes-for-5.16-20211207
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmGvM24THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqZL8CACmYXCIKdLxumCmfa7z61r6Y839grFl
 weBo1dgrg/FIIyxro0EEmGi3ZRGrVcQNyQOQKl9xtt8FFmkrl5yCioQ5N+ib/Qt7
 6BhMN0kdaWQDzgyIT5BT8Ba/13S4Hpjb7baDBf+Rqw7WemeX2hni8Dx4WxflfMbo
 lpxuRyDtUvndUHVzATkbB8TLsmB50wdTinzZkY3IV8bAhLcznQ2vYvV4HblbeNlA
 BtEEtAjsR7zFfyqDmxOIOdqMD4m4vjUnaOoT6KQznNIy1EYxFgX7VFNAp3DeGaX3
 bM4CTEbVm980hPgm8tjFL0p6BExWWR9q6lR/x41O/P2cQ2PKNDSuBnuo
 =4e0K
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
can 2021-12-07

The 1st patch is by Vincent Mailhol and fixes a use after free in the
pch_can driver.

Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver.

The remaining 7 patches target the m_can driver. Brian Silverman
contributes a patch to disable and ignore the ELO interrupt, which is
currently not handled in the driver and may lead to an interrupt
storm. Vincent Mailhol's patch fixes a memory leak in the error path
of the m_can_read_fifo() function. The remaining patches are
contributed by Matthias Schiffer, first a iomap_read_fifo() and
iomap_write_fifo() functions are fixed in the PCI glue driver, then
the clock rate for the Intel Ekhart Lake platform is fixed, the last 3
patches add support for the custom bit timings on the Elkhart Lake
platform.

* tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: m_can: pci: use custom bit timings for Elkhart Lake
  can: m_can: make custom bittiming fields const
  Revert "can: m_can: remove support for custom bit timing"
  can: m_can: pci: fix incorrect reference clock rate
  can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
  can: m_can: m_can_read_fifo: fix memory leak in error branch
  can: m_can: Disable and ignore ELO interrupt
  can: sja1000: fix use after free in ems_pcmcia_add_card()
  can: pch_can: pch_can_rx_normal: fix use after free
====================

Link: https://lore.kernel.org/r/20211207102420.120131-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 10:32:05 -08:00
Darrick J. Wong
089558bc7b xfs: remove all COW fork extents when remounting readonly
As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose.  Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:

mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done

When (2) happens, notice that (1) is still running.  xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).

When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata.  If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.

The results are catastrophic -- file (B) and the refcount btree are now
corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
synchronously, which causes xfs_icwalk to return to inodes that were
skipped because the blockgc code couldn't take the IOLOCK.  This is safe
to do here because the VFS has already prohibited new writer threads.

Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-12-07 10:17:29 -08:00
Linus Torvalds
957232439c platform-drivers-x86 for v5.16-3
Various bug-fixes and hardware-id additions.
 
 The following is an automated git shortlog grouped by driver:
 
 amd-pmc:
  -  Fix s2idle failures on certain AMD laptops
 
 lg-laptop:
  -  Recognize more models
 
 platform/x86/intel:
  -  hid: add quirk to support Surface Go 3
 
 thinkpad_acpi:
  -  Add lid_logo_dot to the list of safe LEDs
  -  Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
 
 touchscreen_dmi:
  -  Add TrekStor SurfTab duo W1 touchscreen info
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmGvjMIUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wGywf/XJebHMeSVLe8udb4RMatwHpEsyVK
 CvrV4G0FXI5O3GNdaDOU8nlt71zeEaDUkOj3w/cbD9Q/NhrDEgmg3XfZpFZFwNMX
 JELy4VkcVypFWbiAfzh8bCdBJF22gRDc4e8nTsvj31MHC+qOuzHliH15KEMHlz2S
 IbOkc5qHaL4m5oEPnUFj7O/DsSxlSEhO8PVpfN7fOYf0aBm5cuC9yZD/0bw93wHk
 DtnVKzIVXrbItxnw/NJGD0eoBtHemY8u37JDR0TMyR9ekWqQCmdWUf1DibuVCDRl
 XjitIfC5oYMrJwbJKd0kKn8oVvXV07zTFRLPXw9kAXHAqMYXKU6wnt1j4g==
 =T9dH
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Various bug-fixes and hardware-id additions"

* tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel: hid: add quirk to support Surface Go 3
  platform/x86: amd-pmc: Fix s2idle failures on certain AMD laptops
  platform/x86: touchscreen_dmi: Add TrekStor SurfTab duo W1 touchscreen info
  platform/x86: lg-laptop: Recognize more models
  platform/x86: thinkpad_acpi: Add lid_logo_dot to the list of safe LEDs
  platform/x86: thinkpad_acpi: Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
2021-12-07 10:10:20 -08:00
Tatyana Nikolova
10467ce09f RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
Completion events (CEs) are lost if the application is allowed to arm the
CQ more than two times when no new CE for this CQ has been generated by
the HW.

Check if arming has been done for the CQ and if not, arm the CQ for any
event otherwise promote to arm the CQ for any event only when the last arm
event was solicited.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Shiraz Saleem
25b5d6fd6d RDMA/irdma: Report correct WC errors
Return IBV_WC_REM_OP_ERR for responder QP errors instead of
IBV_WC_REM_ACCESS_ERR.

Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes

Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Christophe JAILLET
117697cc93 RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in
'pchunk->sizeofbitmap'.

When it is allocated, the size (in bytes) is computed by:
   size_in_bits >> 3

There are 2 issues (numbers bellow assume that longs are 64 bits):
   - there is no guarantee here that 'pchunk->bitmapmem.size' is modulo
     BITS_PER_LONG but bitmaps are stored as longs
     (sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long))

   - the number of bytes is computed with a shift, not a round up, so we
     may allocate less memory than needed
     (sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2
     longs are needed = 16 bytes)

Fix both issues by using 'bitmap_zalloc()' and remove the useless
'bitmapmem' from 'struct irdma_chunk'.

While at it, remove some useless NULL test before calling
kfree/bitmap_free.

Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://lore.kernel.org/r/5e670b640508e14b1869c3e8e4fb970d78cbe997.1638692171.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:45:48 -04:00
Shiraz Saleem
1e11a39a82 RDMA/irdma: Fix a user-after-free in add_pble_prm
When irdma_hmc_sd_one fails, 'chunk' is freed while its still on the PBLE
info list.

Add the chunk entry to the PBLE info list only after successful setting of
the SD in irdma_hmc_sd_one.

Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Link: https://lore.kernel.org/r/20211207152135.2192-1-shiraz.saleem@intel.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:43:52 -04:00
Mike Marciniszyn
60a8b5a161 IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
This buffer is currently allocated in hfi1_init():

	if (reinit)
		ret = init_after_reset(dd);
	else
		ret = loadtime_init(dd);
	if (ret)
		goto done;

	/* allocate dummy tail memory for all receive contexts */
	dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
							 sizeof(u64),
							 &dd->rcvhdrtail_dummy_dma,
							 GFP_KERNEL);

	if (!dd->rcvhdrtail_dummy_kvaddr) {
		dd_dev_err(dd, "cannot allocate dummy tail memory\n");
		ret = -ENOMEM;
		goto done;
	}

The reinit triggered path will overwrite the old allocation and leak it.

Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation
to hfi1_free_devdata().

Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 46b010d3eeb8 ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
f6a3cfec3c IB/hfi1: Fix early init panic
The following trace can be observed with an init failure such as firmware
load failures:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0010 [#1] SMP PTI
  CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G           OE    --------- -  - 4.18.0-240.el8.x86_64 #1
  Workqueue: events work_for_cpu_fn
  RIP: 0010:0x0
  Code: Bad RIP value.
  RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00
  RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100
  R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180
  R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000
  FS:  0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0
  Call Trace:
   receive_context_interrupt+0x1f/0x40 [hfi1]
   __free_irq+0x201/0x300
   free_irq+0x2e/0x60
   pci_free_irq+0x18/0x30
   msix_free_irq.part.2+0x46/0x80 [hfi1]
   msix_clean_up_interrupts+0x2b/0x70 [hfi1]
   hfi1_init_dd+0x640/0x1a90 [hfi1]
   do_init_one.isra.19+0x34d/0x680 [hfi1]
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x35/0x40

The free_irq() results in a callback to the registered interrupt handler,
and rcd->do_interrupt is NULL because the receive context data structures
are not fully initialized.

Fix by ensuring that the do_interrupt is always assigned and adding a
guards in the slow path handler to detect and handle a partially
initialized receive context and noop the receive.

Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const array")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
b6d57e24ce IB/hfi1: Insure use of smp_processor_id() is preempt disabled
The following BUG has just surfaced with our 5.16 testing:

  BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081
  caller is sdma_select_user_engine+0x72/0x210 [hfi1]
  CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S                5.16.0-rc1+ #1
  Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  Call Trace:
   <TASK>
   dump_stack_lvl+0x33/0x42
   check_preemption_disabled+0xbf/0xe0
   sdma_select_user_engine+0x72/0x210 [hfi1]
   ? _raw_spin_unlock_irqrestore+0x1f/0x31
   ? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1]
   hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1]
   ? hfi1_write_iter+0xb8/0x200 [hfi1]
   hfi1_write_iter+0xb8/0x200 [hfi1]
   do_iter_readv_writev+0x163/0x1c0
   do_iter_write+0x80/0x1c0
   vfs_writev+0x88/0x1a0
   ? recalibrate_cpu_khz+0x10/0x10
   ? ktime_get+0x3e/0xa0
   ? __fget_files+0x66/0xa0
   do_writev+0x65/0x100
   do_syscall_64+0x3a/0x80

Fix this long standing bug by moving the smp_processor_id() to after the
rcu_read_lock().

The rcu_read_lock() implicitly disables preemption.

Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
9292f8f9a2 IB/hfi1: Correct guard on eager buffer deallocation
The code tests the dma address which legitimately can be 0.

The code should test the kernel logical address to avoid leaking eager
buffer allocations that happen to map to a dma address of 0.

Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled")
Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:53 -04:00
Ruozhu Li
8b77fa6fdc nvme: fix use after free when disconnecting a reconnecting ctrl
A crash happens when trying to disconnect a reconnecting ctrl:

 1) The network was cut off when the connection was just established,
    scan work hang there waiting for some IOs complete.  Those I/Os were
    retried because we return BLK_STS_RESOURCE to blk in reconnecting.
 2) After a while, I tried to disconnect this connection.  This
    procedure also hangs because it tried to obtain ctrl->scan_lock.
    It should be noted that now we have switched the controller state
    to NVME_CTRL_DELETING.
 3) In nvme_check_ready(), we always return true when ctrl->state is
    NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
    device which was already freed.

To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
device only when queue state is live.  If not, return host path error to
the block layer

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-07 18:21:16 +01:00