1058279 Commits

Author SHA1 Message Date
Maxim Levitsky
af957eebfc KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
When loading nested state, don't use check vcpu->arch.efer to get the
L1 host's 64-bit vs. 32-bit state and don't check it for consistency
with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
may be stale when KVM_SET_NESTED_STATE is called---and architecturally
does not exist.  When restoring L2 state in KVM, the CPU is placed in
non-root where nested VMX code has no snapshot of L1 host state: VMX
(conditionally) loads host state fields loaded on VM-exit, but they need
not correspond to the state before entry.  A simple case occurs in KVM
itself, where the host RIP field points to vmx_vmexit rather than the
instruction following vmlaunch/vmresume.

However, for the particular case of L1 being in 32- or 64-bit mode
on entry, the exit controls can be treated instead as the source of
truth regarding the state of L1 on entry, and can be used to check
that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
vmcs12.VM_EXIT_LOAD_IA32_EFER is set.  The consistency check on CPU
EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
on VM-Enter.  That's because, again, there's conceptually no "current"
L1 EFER to check on KVM_SET_NESTED_STATE.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:42 -05:00
David Woodhouse
964b7aa0b0 KVM: Fix steal time asm constraints
In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
the [abcd] registers. For which, GCC has the "q" constraint instead of
the less restrictive "r".

Also fix st->preempted, which is an input/output operand rather than an
input.

Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:41 -05:00
Paul Durrant
dc23a5110b cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'
The lack a static declaration currently results in:

arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features'

when compiling with "W=1".

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 760849b1476c ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211115144131.5943-1-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18 02:03:14 -05:00
Dan Carpenter
a280ef90af octeontx2-af: debugfs: don't corrupt user memory
The user supplies the "count" value to say how big its read buffer is.
The rvu_dbg_lmtst_map_table_display() function does not take the "count"
into account but instead just copies the whole table, potentially
corrupting the user's data.

Introduce the "ret" variable to store how many bytes we can copy.  Also
I changed the type of "off" to size_t to make using min() simpler.

Fixes: 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211117073454.GD5237@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17 20:25:40 -08:00
Lin Ma
48b71a9e66 NFC: add NCI_UNREG flag to eliminate the race
There are two sites that calls queue_work() after the
destroy_workqueue() and lead to possible UAF.

The first site is nci_send_cmd(), which can happen after the
nci_close_device as below

nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
    flush_workqueue          |
    del_timer_sync           |
  nci_unregister_device      |    nfc_get_device
    destroy_workqueue        |    nfc_dev_up
    nfc_unregister_device    |      nci_dev_up
      device_del             |        nci_open_device
                             |          __nci_request
                             |            nci_send_cmd
                             |              queue_work !!!

Another site is nci_cmd_timer, awaked by the nci_cmd_work from the
nci_send_cmd.

  ...                        |  ...
  nci_unregister_device      |  queue_work
    destroy_workqueue        |
    nfc_unregister_device    |  ...
      device_del             |  nci_cmd_work
                             |  mod_timer
                             |  ...
                             |  nci_cmd_timer
                             |    queue_work !!!

For the above two UAF, the root cause is that the nfc_dev_up can race
between the nci_unregister_device routine. Therefore, this patch
introduce NCI_UNREG flag to easily eliminate the possible race. In
addition, the mutex_lock in nci_close_device can act as a barrier.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17 20:17:05 -08:00
Lin Ma
3e3b5dfcd1 NFC: reorder the logic in nfc_{un,}register_device
There is a potential UAF between the unregistration routine and the NFC
netlink operations.

The race that cause that UAF can be shown as below:

 (FREE)                      |  (USE)
nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
  nci_unregister_device      |    nfc_get_device
    nfc_unregister_device    |    nfc_dev_up
      rfkill_destory         |
      device_del             |      rfkill_blocked
  ...                        |    ...

The root cause for this race is concluded below:
1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after
the device_is_registered check.
2. Since the netlink operations are possible just after the device_add
in nfc_register_device, the nfc_dev_up() can happen anywhere during the
rfkill creation process, which leads to data race.

This patch reorder these actions to permit
1. Once device_del is finished, the nfc_dev_up cannot dereference the
rfkill object.
2. The rfkill_register need to be placed after the device_add of nfc_dev
because the parent device need to be created first. So this patch keeps
the order but inject device_lock to prevent the data race.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: be055b2f89b5 ("NFC: RFKILL support")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17 20:17:05 -08:00
Lin Ma
86cdf8e387 NFC: reorganize the functions in nci_request
There is a possible data race as shown below:

thread-A in nci_request()       | thread-B in nci_close_device()
                                | mutex_lock(&ndev->req_lock);
test_bit(NCI_UP, &ndev->flags); |
...                             | test_and_clear_bit(NCI_UP, &ndev->flags)
mutex_lock(&ndev->req_lock);    |
                                |

This race will allow __nci_request() to be awaked while the device is
getting removed.

Similar to commit e2cb6b891ad2 ("bluetooth: eliminate the potential race
condition when removing the HCI controller"). this patch alters the
function sequence in nci_request() to prevent the data races between the
nci_close_device().

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Link: https://lore.kernel.org/r/20211115145600.8320-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17 20:16:53 -08:00
Tadeusz Struk
3e6db07975 tipc: check for null after calling kmemdup
kmemdup can return a null pointer so need to check for it, otherwise
the null key will be dereferenced later in tipc_crypto_key_xmit as
can be seen in the trace [1].

Cc: tipc-discussion@lists.sourceforge.net
Cc: stable@vger.kernel.org # 5.15, 5.14, 5.10

[1] https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a58

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20211115160143.5099-1-tadeusz.struk@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-17 20:04:52 -08:00
Grzegorz Szczurek
5aff430d4e i40e: Fix display error code in dmesg
Fix misleading display error in dmesg if tc filter return fail.
Only i40e status error code should be converted to string, not linux
error code. Otherwise, we return false information about the error.

Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 16:20:57 -08:00
Jedrzej Jagielski
2e6d218c1e i40e: Fix creation of first queue by omitting it if is not power of two
Reject TCs creation with proper message if the first queue
assignment is not equal to the power of two.
The first queue number was checked too late in the second queue
iteration, if second queue was configured at all. Now if first queue value
is not a power of two, then trying to create qdisc will be rejected.

Fixes: 8f88b3034db3 ("i40e: Add infrastructure for queue channel support")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 16:20:57 -08:00
Karen Sornek
3a3b311e38 i40e: Fix warning message and call stack during rmmod i40e driver
Restore part of reset functionality used when reset is called
from the VF to reset itself. Without this fix warning message
is displayed when VF is being removed via sysfs.

Fix the crash of the VF during reset by ensuring
that the PF receives the reset message successfully.
Refactor code to use one function instead of two.

Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 16:20:50 -08:00
Linus Torvalds
42eb8fdac2 Fixes in gfs2:
* The current iomap_file_buffered_write behavior of failing the entire
   write when part of the user buffer cannot be faulted in leads to an
   endless loop in gfs2.  Work around that in gfs2 for now.
 * Various other bugs all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmGVjogUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTojHA/7BvjlQKdDG84pbgFwWa2JBHgkfB5u
 ayd4gBR9mq3WwOzMO7Wp9dUmUgDYjjj2ZJZkExHYgla+52ScRJAbcTk0VkNpSgy6
 pOAZmPpTjmj4KGUghAZuUUqm55S5lpB6A2VciscW0GFGWE7kEOOcs/Gw5heSTpHW
 zSLM2cQZ303kqd34KE747DchG6ruwlzsRms8qxwaV6Qe5ZyYx6xSBoL7bJ+VVqdF
 Ujn3KEpFd5+aRkffwyypltkIsMh1yQceOCoTOp5G3U8zwl+bT8hhpY6qMvtMtT90
 Pl5A7abpM7kGzdbdi3z+RtJK3DwmDgtfFrP6z4LTa4QJtUE8Dn7akIPH5v2E1ajI
 vpQVepIS+ryn614nqCyXgGed975gFqNNgZheqiZq2+mQMsAO/fxxI4jMYXpB2Hyi
 k4PoFm7K+Gcn8ARkuhlbZiZoo8CldmorZz7lVthyb3hNNzg2EzMK/K14KAz7F7W9
 ndcXLnMZdxS3R4wlGVKtfLi/E9+VsM0MPTsfvsJtubq/lfu5d2G47FoQvsYFRRDW
 icCzMFC+MshFS+GurSYHGni3x11C6M6wsL7m+YvkPLkDC4Jn4Ok9mdf77DiaHy2L
 znc0w32FHSQHQZuYN+Fakr8K+W5k0XGkHeJpVzRglkQf2qH0hClqXnzmgbt60ECu
 2rxg2KuSsEiYtrU=
 =/Vqx
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - The current iomap_file_buffered_write behavior of failing the entire
   write when part of the user buffer cannot be faulted in leads to an
   endless loop in gfs2. Work around that in gfs2 for now.

 - Various other bugs all over the place.

* tag 'gfs2-v5.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Prevent endless loops in gfs2_file_buffered_write
  gfs2: Fix "Introduce flag for glock holder auto-demotion"
  gfs2: Fix length of holes reported at end-of-file
  gfs2: release iopen glock early in evict
  gfs2: Fix atomic bug in gfs2_instantiate
  gfs2: Only dereference i->iov when iter_is_iovec(i)
2021-11-17 15:55:07 -08:00
Linus Torvalds
3fa595481b - wire futex_waitv syscall
- build fixes for lantiq and bcm63xx configs
 - yamon-dt bugfix
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmGVfWsaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAOkg//e3wmqwno/WX5ekkLH9TU
 LEWe6mzqNMtjYjrebQJdCPPpfuN4AvNFSigL1nOg+YStTslEkiFSvJPMJ4f9BsKj
 6oJmOn/gkbeHv4/v3I2jV0kJwW1sUYDMvqSY6KIkS7aFCIuAlkt8WHze9RB7545N
 4Qmcqqcyd1OEI4rhGbR/JrIYosp3fqUg30WUQk3Z6bfYqrPdD3w38oWdx8Fe42p5
 wI76fmX68pS8l5iH1A/z0AofdtDAxYe7vPjKrlZR+U2ypJzbG2FOj1S74+pFKlss
 ONGgXY7DVjEsgZKacisDJ43u2XOLw77CGfN3g+150KVEpwBuCqXUGV9Fz4mKjsjl
 LwtcJ1VJRcHWHN3aPF07ZP+13pyX7R7WFikpMW3r+UteyvsmEXJtWElLhlev48EG
 WzUTJk+A8kVntUwcYiGNbLGs6psfxdFx1mEQQmSRYzhukACmESyCoPVf9dj16trl
 cwEzQdKv0M1Q/Dyaz17hrOi8TC4g6ScvSv0Cu3vHczSUSXgr/+71KBFdXP3HUXEd
 AG7oZIJe8hFaOCx7wP/3DmNV6NZlE/xcWQl9/aIqNBpAC3Bw5sjEcXAW/PfzV4iZ
 fr9d9Gn+Xze4McHD4uDIqRRMf40dXuPzFsxgXirzBNv0wzX2Lq4FHeirR1pHYQh4
 AlzD1ddJxIvNPC4Zu4M5IUY=
 =aV5f
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - wire futex_waitv syscall

 - build fixes for lantiq and bcm63xx configs

 - yamon-dt bugfix

* tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: lantiq: add support for clk_get_parent()
  mips: bcm63xx: add support for clk_get_parent()
  MIPS: generic/yamon-dt: fix uninitialized variable error
  MIPS: syscalls: Wire up futex_waitv syscall
2021-11-17 15:12:50 -08:00
Linus Torvalds
ee1703cda8 hyperv-fixes for 5.16-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmGU6QcTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXvOnCACRIRRinMFwCmPn69nzVTAhffY8mzGu
 9PQrYPXPumvX/U+YAoXol1M+ArjasV/9BJKWY2NNS91pE9OdwYe+2m92tTsotE95
 qr2hWXP1X6/OCSnuRVmnp3x+VasL83kconerhICUcsOqJLnYBkGO4JDPvDPAxxFt
 CDuTgrf8zMS1X4jIHdZpDlJ4UT9Rc1lDSyHpbx2BC8XDzeZlOttST/S92yvSKNCx
 I70z4I89g2HR1cmsV/DXZiOnauDh7kS2alBpVkPpKSmwpYZ4lWyMrXLQVISRJOJb
 k7zG655kSp82grN33RaINk+ajLqHP+OhRA7UOZNYTkngIJ9QC8nCDcgq
 =Q8K1
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix ring size calculation for balloon driver (Boqun Feng)

 - Fix issues in Hyper-V setup code (Sean Christopherson)

* tag 'hyperv-fixes-signed-20211117' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Move required MSRs check to initial platform probing
  x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
  Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
2021-11-17 08:46:15 -08:00
Linus Torvalds
ef1d8dda23 This is just one bugfix for a bufferflow in knfsd's xdr decoding.
-----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGS3msVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+wCMQAIm7hCuZ7bNtJwdgabv/z3u9Cgre
 2lBuFN5edrymgahKERBnA5bZCQGEnG/yVG6w69nB8LnqWphN2caG5ln17kSjfFCO
 W/J7FBH9u7662GQhmxqZyrNVm/Td0vgyKH0uh2RTiaitN0JrPg+4gjAWOPUPq53I
 lYVgm20Aj3LkH83MEwwp6K2u3pqJ5y+pqfDv6ROX6/HkPV+7yczleWLafB/EYQrs
 zX4vSyrR7aLjJ5ZEFz4rokcsemq1iI4eqBr6fiwSZwIDbRBwPFdIlQTwuww4PGSW
 ingM4y/RU3okUXV5exchex7ffzmPi8IvkTBOdn0RicHRcbm9f0Rky6wXiASJLTqu
 QURh+rsvupfrHnLQ/b1bJtOrSJCdJXdidw8bA7vrpsmpatImnS+u+iWO9RkesL+g
 sVdQJV+0ZmOtyLTvw6xpRtXXcpMaJvUksmtiHvySBZot9waX03X/h7TFEmtx+P3E
 k0znywn9Ebu5d7X8vBwwDqq9f7Xe7pzo7zALkMeC1qXULIzCzYmTuvSTm870Cz9S
 JPh0ojuYJvlvoKNkjYfRKRyE28VLZe/hEwtVWL+kgQD8zR8gPlQNpAcBRJ++Ett+
 fwjffdAQ/rfb43W2T5AhoSK1173pu0AWSVEH9h1h604HDvf3iCCw0Wew3JsnJYCS
 +tcj/6NetJG64aHe
 =beJ8
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from Bruce Fields:
 "This is just one bugfix for a buffer overflow in knfsd's xdr decoding"

* tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux:
  NFSD: Fix exposure in nfsd4_decode_bitmap()
2021-11-17 08:38:00 -08:00
Rafael J. Wysocki
3b2b49e6df Revert "ACPI: scan: Release PM resources blocked by unused objects"
Revert commit c10383e8ddf4 ("ACPI: scan: Release PM resources blocked
by unused objects"), because it causes boot issues to appear on some
platforms.

Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
Reported-by: Saranya Gopal <saranya.gopal@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-17 17:05:41 +01:00
Eryk Rybak
9e0a603cb7 i40e: Fix ping is lost after configuring ADq on VF
Properly reconfigure VF VSIs after VF request ADQ.
Created new function to update queue mapping and queue pairs per TC
with AQ update VSI. This sets proper RSS size on NIC.
VFs num_queue_pairs should not be changed during setup of queue maps.
Previously, VF main VSI in ADQ had configured too many queues and had
wrong RSS size, which lead to packets not being consumed and drops in
connectivity.

Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use")
Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 08:05:06 -08:00
Eryk Rybak
d2a69fefd7 i40e: Fix changing previously set num_queue_pairs for PFs
Currently, the i40e_vsi_setup_queue_map is basing the count of queues in
TCs on a VSI's alloc_queue_pairs member which is not changed throughout
any user's action (for example via ethtool's set_channels callback).

This implies that vsi->tc_config.tc_info[n].qcount value that is given
to the kernel via netdev_set_tc_queue() that notifies about the count of
queues per particular traffic class is constant even if user has changed
the total count of queues.

This in turn caused the kernel warning after setting the queue count to
the lower value than the initial one:

$ ethtool -l ens801f0
Channel parameters for ens801f0:
Pre-set maximums:
RX:             0
TX:             0
Other:          1
Combined:       64
Current hardware settings:
RX:             0
TX:             0
Other:          1
Combined:       64

$ ethtool -L ens801f0 combined 40

[dmesg]
Number of in use tx queues changed invalidating tc mappings. Priority
traffic classification disabled!

Reason was that vsi->alloc_queue_pairs stayed at 64 value which was used
to set the qcount on TC0 (by default only TC0 exists so all of the
existing queues are assigned to TC0). we update the offset/qcount via
netdev_set_tc_queue() back to the old value but then the
netif_set_real_num_tx_queues() is using the vsi->num_queue_pairs as a
value which got set to 40.

Fix it by using vsi->req_queue_pairs as a queue count that will be
distributed across TCs. Do it only for non-zero values, which implies
that user actually requested the new count of queues.

For VSIs other than main, stay with the vsi->alloc_queue_pairs as we
only allow manipulating the queue count on main VSI.

Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use")
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Co-developed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 08:05:06 -08:00
Michal Maloszewski
37d9e304ac i40e: Fix NULL ptr dereference on VSI filter sync
Remove the reason of null pointer dereference in sync VSI filters.
Added new I40E_VSI_RELEASING flag to signalize deleting and releasing
of VSI resources to sync this thread with sync filters subtask.
Without this patch it is possible to start update the VSI filter list
after VSI is removed, that's causing a kernel oops.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Reviewed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Reviewed-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com>
Reviewed-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 08:05:05 -08:00
Eryk Rybak
6afbd7b3c5 i40e: Fix correct max_pkt_size on VF RX queue
Setting VLAN port increasing RX queue max_pkt_size
by 4 bytes to take VLAN tag into account.
Trigger the VF reset when setting port VLAN for
VF to renegotiate its capabilities and reinitialize.

Fixes: ba4e003d29c1 ("i40e: don't hold spinlock while resetting VF")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-17 08:05:05 -08:00
Łukasz Stelmach
c366ce2875 net: ax88796c: use bit numbers insetad of bit masks
Change the values of EVENT_* constants from bit masks to bit numbers as
accepted by {clear,set,test}_bit() functions.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17 14:51:40 +00:00
Jonathan Davies
cf9acc90c8 net: virtio_net_hdr_to_skb: count transport header in UFO
virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type
correctly for UFO packets received via virtio-net that are a little over
the GSO size. This can lead to problems elsewhere in the networking
stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is
not set.

This is due to the comparison

  if (skb->len - p_off > gso_size)

not properly accounting for the transport layer header.

p_off includes the size of the transport layer header (thlen), so
skb->len - p_off is the size of the TCP/UDP payload.

gso_size is read from the virtio-net header. For UFO, fragmentation
happens at the IP level so does not need to include the UDP header.

Hence the calculation could be comparing a TCP/UDP payload length with
an IP payload length, causing legitimate virtio-net packets to have
lack gso_type/gso_size information.

Example: a UDP packet with payload size 1473 has IP payload size 1481.
If the guest used UFO, it is not fragmented and the virtio-net header's
flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with
gso_size = 1480 for an MTU of 1500.  skb->len will be 1515 and p_off
will be 42, so skb->len - p_off = 1473.  Hence the comparison fails, and
shinfo->gso_size and gso_type are not set as they should be.

Instead, add the UDP header length before comparing to gso_size when
using UFO. In this way, it is the size of the IP payload that is
compared to gso_size.

Fixes: 6dd912f82680 ("net: check untrusted gso_size at kernel entry")
Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17 14:50:45 +00:00
Pavel Skripkin
9b5a333272 net: dpaa2-eth: fix use-after-free in dpaa2_eth_remove
Access to netdev after free_netdev() will cause use-after-free bug.
Move debug log before free_netdev() call to avoid it.

Fixes: 7472dd9f6499 ("staging: fsl-dpaa2/eth: Move print message")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17 14:47:07 +00:00
Aaron Ma
f77b83b5bb net: usb: r8152: Add MAC passthrough support for more Lenovo Docks
Like ThinkaPad Thunderbolt 4 Dock, more Lenovo docks start to use the original
Realtek USB ethernet chip ID 0bda:8153.

Lenovo Docks always use their own IDs for usb hub, even for older Docks.
If parent hub is from Lenovo, then r8152 should try MAC passthrough.
Verified on Lenovo TBT3 dock too.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17 14:46:19 +00:00
Mauro Carvalho Chehab
b96ff02ab2 Documentation/process: fix a cross reference
The cross-reference for the handbooks section works. However, it is
meant to describe the path inside the Kernel's doc where the section
is, but there's an space instead of a dash, plus it lacks the .rst at
the end, which makes:

	./scripts/documentation-file-ref-check

to complain.

Fixes: 604370e106cc ("Documentation/process: Add maintainer handbooks section")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-17 06:12:14 -07:00
Mauro Carvalho Chehab
636e36b19d Documentation: update vcpu-requests.rst reference
Changeset 2f5947dfcaec ("Documentation: move Documentation/virtual to Documentation/virt")
renamed: Documentation/virtual/kvm/vcpu-requests.rst
to: Documentation/virt/kvm/vcpu-requests.rst.

Update its cross-reference accordingly.

Fixes: 2f5947dfcaec ("Documentation: move Documentation/virtual to Documentation/virt")
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-17 06:12:14 -07:00
Mauro Carvalho Chehab
0f60a29c52 docs: accounting: update delay-accounting.rst reference
The file name: accounting/delay-accounting.rst
should be, instead: Documentation/accounting/delay-accounting.rst.

Also, there's no need to use doc:`foo`, as automarkup.py will
automatically handle plain text mentions to Documentation/
files.

So, update its cross-reference accordingly.

Fixes: fcb501704554 ("delayacct: Document task_delayacct sysctl")
Fixes: c3123552aad3 ("docs: accounting: convert to ReST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-17 06:12:14 -07:00
Mauro Carvalho Chehab
1c1c3c7d08 libbpf: update index.rst reference
Changeset d20b41115ad5 ("libbpf: Rename libbpf documentation index file")
renamed: Documentation/bpf/libbpf/libbpf.rst
to: Documentation/bpf/libbpf/index.rst.

Update its cross-reference accordingly.

Fixes: d20b41115ad5 ("libbpf: Rename libbpf documentation index file")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-17 06:12:14 -07:00
David S. Miller
9311ccef27 mlx5-fixes-2021-11-16
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmGUEocACgkQSD+KveBX
 +j6ynwf/UKQ4I+ZexNfgDmINn4qNgGjnOOq8EfaPPBUzUODObmgPHV7fPT8ADX3b
 sLx7CgVGRHnk36XqDhptYrIDEtbfa26epRvr3PGU4FcUmz+PoJprQtWBK2B5PDiT
 nwSmxerEauiji7LxlXx9YmtfpnbeHDFanjJyYbVnuZkqREMP7S/B73KLFnW7WEyU
 EyEKu+oahGx3C4CiF7Fg20w9mP2fujC7pw+og+g6T9dDiF/ynTCf1KZMXq1EwA1M
 +/GS37Lm4VEoCrumiF6HGYmr9gRIFhMXBTaTRYzuz/d7FPxFrtF/88IF0xfmFmLr
 7SNBWjmNnjnCaXNIlY1IbyHh9/P83Q==
 =a7ES
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2021-11-16

Please pull this mlx5 fixes series, or let me know in case of any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17 10:50:53 +00:00
Sven Schnelle
bec05f33eb parisc/sticon: fix reverse colors
sticon_build_attr() checked the reverse argument and flipped
background and foreground color, but returned the non-reverse
value afterwards. Fix this and also add two local variables
for foreground and background color to make the code easier
to read.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-17 11:04:02 +01:00
Christian Brauner
9682197081
fs: handle circular mappings correctly
When calling setattr_prepare() to determine the validity of the attributes the
ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id.
When the {g,u}id attribute of the file isn't altered and the caller's fs{g,u}id
matches the current {g,u}id attribute the attribute change is allowed.

The value in ia_{g,u}id does already account for idmapped mounts and will have
taken the relevant idmapping into account. So in order to verify that the
{g,u}id attribute isn't changed we simple need to compare the ia_{g,u}id value
against the inode's i_{g,u}id value.

This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has a meaning
when circular idmappings are used, i.e. mappings where e.g. id 1000 is mapped
to id 1001 and id 1001 is mapped to id 1000. Such ciruclar mappings can e.g. be
useful when sharing the same home directory between multiple users at the same
time.

As an example consider a directory with two files: /source/file1 owned by
{g,u}id 1000 and /source/file2 owned by {g,u}id 1001. Assume we create an
idmapped mount at /target with an idmapping that maps files owned by {g,u}id
1000 to being owned by {g,u}id 1001 and files owned by {g,u}id 1001 to being
owned by {g,u}id 1000. In effect, the idmapped mount at /target switches the
ownership of /source/file1 and source/file2, i.e. /target/file1 will be owned
by {g,u}id 1001 and /target/file2 will be owned by {g,u}id 1000.

This means that a user with fs{g,u}id 1000 must be allowed to setattr
/target/file2 from {g,u}id 1000 to {g,u}id 1000. Similar, a user with fs{g,u}id
1001 must be allowed to setattr /target/file1 from {g,u}id 1001 to {g,u}id
1001. Conversely, a user with fs{g,u}id 1000 must fail to setattr /target/file1
from {g,u}id 1001 to {g,u}id 1000. And a user with fs{g,u}id 1001 must fail to
setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Both cases must fail
with EPERM for non-capable callers.

Before this patch we could end up denying legitimate attribute changes and
allowing invalid attribute changes when circular mappings are used. To even get
into this situation the caller must've been privileged both to create that
mapping and to create that idmapped mount.

This hasn't been seen in the wild anywhere but came up when expanding the
testsuite during work on a series of hardening patches. All idmapped fstests
pass without any regressions and we add new tests to verify the behavior of
circular mappings.

Link: https://lore.kernel.org/r/20211109145713.1868404-1-brauner@kernel.org
Fixes: 2f221d6f7b88 ("attr: handle idmapped mounts")
Cc: Seth Forshee <seth.forshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-11-17 09:26:09 +01:00
Thomas Gleixner
3751c3d34c net: stmmac: Fix signed/unsigned wreckage
The recent addition of timestamp correction to compensate the CDC error
introduced a subtle signed/unsigned bug in stmmac_get_tx_hwtstamp() while
it managed for some obscure reason to avoid that in stmmac_get_rx_hwtstamp().

The issue is:

    s64 adjust = 0;
    u64 ns;

    adjust += -(2 * (NSEC_PER_SEC / priv->plat->clk_ptp_rate));
    ns += adjust;

works by chance on 64bit, but falls apart on 32bit because the compiler
knows that adjust fits into 32bit and then treats the addition as a u64 +
u32 resulting in an off by ~2 seconds failure.

The RX variant uses an u64 for adjust and does the adjustment via

    ns -= adjust;

because consistency is obviously overrated.

Get rid of the pointless zero initialized adjust variable and do:

	ns -= (2 * NSEC_PER_SEC) / priv->plat->clk_ptp_rate;

which is obviously correct and spares the adjust obfuscation. Aside of that
it yields a more accurate result because the multiplication takes place
before the integer divide truncation and not afterwards.

Stick the calculation into an inline so it can't be accidentally
disimproved. Return an u32 from that inline as the result is guaranteed
to fit which lets the compiler optimize the substraction.

Cc: stable@vger.kernel.org
Fixes: 3600be5f58c1 ("net: stmmac: add timestamp correction to rid CDC sync error")
Reported-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Benedikt Spranger <b.spranger@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel EHL
Link: https://lore.kernel.org/r/87mtm578cs.ffs@tglx
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:49:55 -08:00
Jakub Kicinski
e4ca7823da Merge branch 'net-fix-the-mirred-packet-drop-due-to-the-incorrect-dst'
Xin Long says:

====================
net: fix the mirred packet drop due to the incorrect dst

This issue was found when using OVS HWOL on OVN-k8s. These packets
dropped on rx path were seen with output dst, which should've been
dropped from the skbs when redirecting them.

The 1st patch is to the fix and the 2nd is a selftest to reproduce
and verify it.
====================

Link: https://lore.kernel.org/r/cover.1636734751.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:17:42 -08:00
Davide Caratti
1d127effdc selftests: add a test case for mirred egress to ingress
add a selftest that verifies the correct behavior of TC act_mirred egress
to ingress: in particular, it checks if the dst_entry is removed from skb
before redirect egress -> ingress. The correct behavior is: an ICMP 'echo
request' generated by ping will be received and generate a reply the same
way as the one generated by mausezahn.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:17:38 -08:00
Xin Long
f799ada6bf net: sched: act_mirred: drop dst for the direction from egress to ingress
Without dropping dst, the packets sent from local mirred/redirected
to ingress will may still use the old dst. ip_rcv() will drop it as
the old dst is for output and its .input is dst_discard.

This patch is to fix by also dropping dst for those packets that are
mirred or redirected from egress to ingress in act_mirred.

Note that we don't drop it for the direction change from ingress to
egress, as on which there might be a user case attaching a metadata
dst by act_tunnel_key that would be used later.

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:17:38 -08:00
Taehee Yoo
b0024a04e4 amt: cancel delayed_work synchronously in amt_fini()
When the amt module is being removed, it calls cancel_delayed_work()
to cancel pending delayed_work. But this function doesn't wait for
canceling delayed_work.
So, workers can be still doing after module delete.

In order to avoid this, cancel_delayed_work_sync() should be used instead.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Fixes: bc54e49c140b ("amt: add multicast(IGMP) report message handler")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20211116160923.25258-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:14:12 -08:00
Pavel Skripkin
0a83f96f87 MAINTAINERS: remove GR-everest-linux-l2@marvell.com
I've sent a patch to GR-everest-linux-l2@marvell.com few days ago and
got a reply from postmaster@marvell.com:

	Delivery has failed to these recipients or groups:

	gr-everest-linux-l2@marvell.com<mailto:gr-everest-linux-l2@marvell.com>
	The email address you entered couldn't be found. Please check the
	recipient's email address and try to resend the message. If the problem
	continues, please contact your helpdesk.

As requested by Alok Prasad, replacing GR-everest-linux-l2@marvell.com
with Manish Chopra's email address. [0]

Link: https://lore.kernel.org/all/20211116081601.11208-1-palok@marvell.com/ [0]
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20211116141303.32180-1-paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 19:07:01 -08:00
Michael Chan
9f5363916a bnxt_en: Fix compile error regression when CONFIG_BNXT_SRIOV is not set
bp->sriov_cfg is not defined when CONFIG_BNXT_SRIOV is not set.  Fix
it by adding a helper function bnxt_sriov_cfg() to handle the logic
with or without the config option.

Fixes: 46d08f55d24e ("bnxt_en: extend RTNL to VF check in devlink driver_reinit")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1637090770-22835-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 18:58:14 -08:00
Marcin Wojtas
2460386bef net: mvmdio: fix compilation warning
The kernel test robot reported a following issue:

>> drivers/net/ethernet/marvell/mvmdio.c:426:36: warning:
unused variable 'orion_mdio_acpi_match' [-Wunused-const-variable]
   static const struct acpi_device_id orion_mdio_acpi_match[] = {
                                      ^
   1 warning generated.

Fix that by surrounding the variable by appropriate ifdef.

Fixes: c54da4c1acb1 ("net: mvmdio: add ACPI support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20211115153024.209083-1-mw@semihalf.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 18:55:17 -08:00
Jakub Kicinski
f5c741608b Couple of fixes:
* bad dont-reorder check
  * throughput LED trigger for various new(ish) paths
  * radiotap header generation
  * locking assertions in mac80211 with monitor mode
  * radio statistics
  * don't try to access IV when not present
  * call stop_ap for P2P_GO as well as we should
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmGT1iIACgkQB8qZga/f
 l8QDhA/+JeKZeXQ+Ct8kh34xMN4VxCUeGlJNY1aH2mTlJFSwxqJ4XH7eCiSTezXB
 ZEZ46wDLOpUF9epGl4NLzFkR5+Sanpn3LJOMLBDYQwkl3xcvNiQEZ0y4T8PayJRR
 kSs9ina3zxvZvBjCYX1KKdFatZiE+X5qwYyaK8T4ZjNQ4vjQpeX/V3eFpFeMDjuX
 fzIUN27yiESZG4heB2BkyeTR/jb4hG/gCbdGfNxa9MbF0OGJMJkuwH9MNHRQR2cx
 NvEPoq2IcooZfbqWnVk9lxWzRpm8Buio+gth9Knd13AGDl4Mez4+n0KxAEhafjhE
 ljUYekjh+55iSkt7eczW7wjvxmdGC+lfGN7GUop9mw4LU6NvpFCKXW5MqaH8XBy4
 UIkcblXsMKkD8sFbMGzNIl7pT4JGdXkxlorxg7xI5s02nMFppnWJ7mrXQmcBkBG7
 5JLkaXahiQzQYq/VGepCPMYYwKQqbHXts9TPMq/W3aIeR8QxA3uINZ5P9eGF9qn8
 xTcvEQPYvIid1BIorO5/0V/VgE9VNDP1w8aKvn8bJjOmbWfuE68Wx+K6LlHy+E+O
 /MhjiRwnCvvTfdi6vPUjTxis8Xh5e9xfYmcqyRVIQujLmhta31FEJ/q+R+A1A4Rs
 RsesqpLgEOo3rkSciAAfItmRf3SVgJlIuZcfe16XFw0s45oYEck=
 =aD/1
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-net-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Couple of fixes:
 * bad dont-reorder check
 * throughput LED trigger for various new(ish) paths
 * radiotap header generation
 * locking assertions in mac80211 with monitor mode
 * radio statistics
 * don't try to access IV when not present
 * call stop_ap for P2P_GO as well as we should

* tag 'mac80211-for-net-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  mac80211: fix throughput LED trigger
  mac80211: fix monitor_sdata RCU/locking assertions
  mac80211: drop check for DONT_REORDER in __ieee80211_select_queue
  mac80211: fix radiotap header generation
  mac80211: do not access the IV when it was stripped
  nl80211: fix radio statistics in survey dump
  cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
====================

Link: https://lore.kernel.org/r/20211116160845.157214-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 16:53:59 -08:00
Jakub Kicinski
f083ec3160 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-11-16

We've added 12 non-merge commits during the last 5 day(s) which contain
a total of 23 files changed, 573 insertions(+), 73 deletions(-).

The main changes are:

1) Fix pruning regression where verifier went overly conservative rejecting
   previsouly accepted programs, from Alexei Starovoitov and Lorenz Bauer.

2) Fix verifier TOCTOU bug when using read-only map's values as constant
   scalars during verification, from Daniel Borkmann.

3) Fix a crash due to a double free in XSK's buffer pool, from Magnus Karlsson.

4) Fix libbpf regression when cross-building runqslower, from Jean-Philippe Brucker.

5) Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_*() helpers in tracing
   programs due to deadlock possibilities, from Dmitrii Banshchikov.

6) Fix checksum validation in sockmap's udp_read_sock() callback, from Cong Wang.

7) Various BPF sample fixes such as XDP stats in xdp_sample_user, from Alexander Lobakin.

8) Fix libbpf gen_loader error handling wrt fd cleanup, from Kumar Kartikeya Dwivedi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  udp: Validate checksum in udp_read_sock()
  bpf: Fix toctou on read-only map's constant scalar tracking
  samples/bpf: Fix build error due to -isystem removal
  selftests/bpf: Add tests for restricted helpers
  bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
  libbpf: Perform map fd cleanup for gen_loader in case of error
  samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu
  tools/runqslower: Fix cross-build
  samples/bpf: Fix summary per-sec stats in xdp_sample_user
  selftests/bpf: Check map in map pruning
  bpf: Fix inner map state pruning regression.
  xsk: Fix crash on double free in buffer pool
====================

Link: https://lore.kernel.org/r/20211116141134.6490-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16 16:53:48 -08:00
Raed Salem
c4c3176739 net/mlx5: E-Switch, return error if encap isn't supported
On regular ConnectX HCAs getting encap mode isn't supported when the
E-Switch is in NONE mode. Current code would return no error code when
trying to get encap mode in such case which is wrong.

Fix by returning error value to indicate failure to caller in such case.

Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:23 -08:00
Maher Sanalla
ae396d85c0 net/mlx5: Lag, update tracker when state change event received
Currently, In NETDEV_CHANGELOWERSTATE/NETDEV_CHANGEUPPERSTATE events
handling, tracking is not fully completed if the LAG device is not ready
at the time the events occur. But, we must keep track of the upper and
lower states after receiving the events because RoCE needs this info in
mlx5_lag_get_roce_netdev() - in order to return the corresponding port
that its running on. Returning the wrong (not most recent) port will lead
to gids table being incorrect.

For example: If during the attachment of a slave to the bond, the other
non-attached port performs pci_reload, then the LAG device is not ready,
but that should not result in dismissing attached slave tracker update
automatically (which is performed in mlx5_handle_changelowerstate()), Since
these events might not come later, which can lead to both bond ports
having tx_enabled=0 - which is not a valid state of LAG bond.

Fixes: 9b412cc35f00 ("net/mlx5e: Add LAG warning if bond slave is not lag master")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:23 -08:00
Roi Dayan
806401c20a net/mlx5e: CT, Fix multiple allocations and memleak of mod acts
CT clear action offload adds additional mod hdr actions to the
flow's original mod actions in order to clear the registers which
hold ct_state.
When such flow also includes encap action, a neigh update event
can cause the driver to unoffload the flow and then reoffload it.

Each time this happens, the ct clear handling adds that same set
of mod hdr actions to reset ct_state until the max of mod hdr
actions is reached.

Also the driver never releases the allocated mod hdr actions and
causing a memleak.

Fix above two issues by moving CT clear mod acts allocation
into the parsing actions phase and only use it when offloading the rule.
The release of mod acts will be done in the normal flow_put().

 backtrace:
    [<000000007316e2f3>] krealloc+0x83/0xd0
    [<00000000ef157de1>] mlx5e_mod_hdr_alloc+0x147/0x300 [mlx5_core]
    [<00000000970ce4ae>] mlx5e_tc_match_to_reg_set_and_get_id+0xd7/0x240 [mlx5_core]
    [<0000000067c5fa17>] mlx5e_tc_match_to_reg_set+0xa/0x20 [mlx5_core]
    [<00000000d032eb98>] mlx5_tc_ct_entry_set_registers.isra.0+0x36/0xc0 [mlx5_core]
    [<00000000fd23b869>] mlx5_tc_ct_flow_offload+0x272/0x1f10 [mlx5_core]
    [<000000004fc24acc>] mlx5e_tc_offload_fdb_rules.part.0+0x150/0x620 [mlx5_core]
    [<00000000dc741c17>] mlx5e_tc_encap_flows_add+0x489/0x690 [mlx5_core]
    [<00000000e92e49d7>] mlx5e_rep_update_flows+0x6e4/0x9b0 [mlx5_core]
    [<00000000f60f5602>] mlx5e_rep_neigh_update+0x39a/0x5d0 [mlx5_core]

Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:22 -08:00
Avihai Horon
38a54cae6f net/mlx5: Fix flow counters SF bulk query len
When doing a flow counters bulk query, the number of counters to query
must be aligned to 4. Current SF bulk query len is not aligned to 4,
which leads to an error when trying to query more than 4 counters.

Fix it by aligning SF bulk query len to 4.

Fixes: 2fdeb4f4c2ae ("net/mlx5: Reduce flow counters bulk query buffer size for SFs")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:22 -08:00
Mark Bloch
2eb0cb31bc net/mlx5: E-Switch, rebuild lag only when needed
A user can enable VFs without changing E-Switch mode, this can happen
when a user moves straight to switchdev mode and only once in switchdev
VFs are enabled via the sysfs interface.

The cited commit assumed this isn't possible and exposed a single
API function where the E-switch calls into the lag code, breaks the lag
and prevents any other lag operations to take place until the
E-switch update has ended.

Breaking the hardware lag when it isn't needed can make it such that
hardware lag can't be enabled again.

In the sysfs call path check if the current E-Switch mode is NONE,
in the context of the function it can only mean the E-Switch is moving
out of NONE mode and the hardware lag should be disabled and enabled
once the mode change has ended. If the mode isn't NONE it means
VFs are about to be enabled and such operation doesn't require
toggling the hardware lag.

Fixes: cac1eb2cf2e3 ("net/mlx5: Lag, properly lock eswitch if needed")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:21 -08:00
Neta Ostrovsky
ba50cd9451 net/mlx5: Update error handler for UCTX and UMEM
In the fast unload flow, the device state is set to internal error,
which indicates that the driver started the destroy process.
In this case, when a destroy command is being executed, it should return
MLX5_CMD_STAT_OK.
Fix MLX5_CMD_OP_DESTROY_UCTX and MLX5_CMD_OP_DESTROY_UMEM to return OK
instead of EIO.

This fixes a call trace in the umem release process -
[ 2633.536695] Call Trace:
[ 2633.537518]  ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs]
[ 2633.538596]  remove_client_context+0x8b/0xd0 [ib_core]
[ 2633.539641]  disable_device+0x8c/0x130 [ib_core]
[ 2633.540615]  __ib_unregister_device+0x35/0xa0 [ib_core]
[ 2633.541640]  ib_unregister_device+0x21/0x30 [ib_core]
[ 2633.542663]  __mlx5_ib_remove+0x38/0x90 [mlx5_ib]
[ 2633.543640]  auxiliary_bus_remove+0x1e/0x30 [auxiliary]
[ 2633.544661]  device_release_driver_internal+0x103/0x1f0
[ 2633.545679]  bus_remove_device+0xf7/0x170
[ 2633.546640]  device_del+0x181/0x410
[ 2633.547606]  mlx5_rescan_drivers_locked.part.10+0x63/0x160 [mlx5_core]
[ 2633.548777]  mlx5_unregister_device+0x27/0x40 [mlx5_core]
[ 2633.549841]  mlx5_uninit_one+0x21/0xc0 [mlx5_core]
[ 2633.550864]  remove_one+0x69/0xe0 [mlx5_core]
[ 2633.551819]  pci_device_remove+0x3b/0xc0
[ 2633.552731]  device_release_driver_internal+0x103/0x1f0
[ 2633.553746]  unbind_store+0xf6/0x130
[ 2633.554657]  kernfs_fop_write+0x116/0x190
[ 2633.555567]  vfs_write+0xa5/0x1a0
[ 2633.556407]  ksys_write+0x4f/0xb0
[ 2633.557233]  do_syscall_64+0x5b/0x1a0
[ 2633.558071]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 2633.559018] RIP: 0033:0x7f9977132648
[ 2633.559821] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 55 6f 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 2633.562332] RSP: 002b:00007fffb1a83888 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2633.563472] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f9977132648
[ 2633.564541] RDX: 000000000000000c RSI: 000055b90546e230 RDI: 0000000000000001
[ 2633.565596] RBP: 000055b90546e230 R08: 00007f9977406860 R09: 00007f9977a54740
[ 2633.566653] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f99774056e0
[ 2633.567692] R13: 000000000000000c R14: 00007f9977400880 R15: 000000000000000c
[ 2633.568725] ---[ end trace 10b4fe52945e544d ]---

Fixes: 6a6fabbfa3e8 ("net/mlx5: Update pci error handler entries and command translation")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:21 -08:00
Yevgeny Kliteynik
455832d496 net/mlx5: DR, Fix check for unsupported fields in match param
The existing loop doesn't cast the buffer while scanning it, which
results in out-of-bounds read and failure to create the matcher.

Fixes: 941f19798a11 ("net/mlx5: DR, Add check for unsupported fields in match param")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:20 -08:00
Yevgeny Kliteynik
9091b821aa net/mlx5: DR, Handle eswitch manager and uplink vports separately
When querying eswitch manager vport capabilities as "other = 1",
we encounter a FW compatibility issue with older FW versions.
To maintain backward compatibility, eswitch manager vport should
be queried as "other = 0" vport both for ECPF and non-ECPF cases.

This patch fixes these queries and improves the code readability
by handling eswitch manager and uplink vports separately, avoiding
the excessive 'if' conditions. Also, uplink caps are stored similar
to esw manager and not as part of xarray.

Fixes: dd4acb2a0954 ("net/mlx5: DR, Add missing query for vport 0")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:20 -08:00
Valentine Fatiev
76ded29d3f net/mlx5e: nullify cq->dbg pointer in mlx5_debug_cq_remove()
Prior to this patch in case mlx5_core_destroy_cq() failed it proceeds
to rest of destroy operations. mlx5_core_destroy_cq() could be called again
by user and cause additional call of mlx5_debug_cq_remove().
cq->dbg was not nullify in previous call and cause the crash.

Fix it by nullify cq->dbg pointer after removal.

Also proceed to destroy operations only if FW return 0
for MLX5_CMD_OP_DESTROY_CQ command.

general protection fault, probably for non-canonical address 0x2000300004058: 0000 [#1] SMP PTI
CPU: 5 PID: 1228 Comm: python Not tainted 5.15.0-rc5_for_upstream_min_debug_2021_10_14_11_06 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:lockref_get+0x1/0x60
Code: 5d e9 53 ff ff ff 48 8d 7f 70 e8 0a 2e 48 00 c7 85 d0 00 00 00 02
00 00 00 c6 45 70 00 fb 5d c3 c3 cc cc cc cc cc cc cc cc 53 <48> 8b 17
48 89 fb 85 d2 75 3d 48 89 d0 bf 64 00 00 00 48 89 c1 48
RSP: 0018:ffff888137dd7a38 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff888107d5f458 RCX: 00000000fffffffe
RDX: 000000000002c2b0 RSI: ffffffff8155e2e0 RDI: 0002000300004058
RBP: ffff888137dd7a88 R08: 0002000300004058 R09: ffff8881144a9f88
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881141d4000
R13: ffff888137dd7c68 R14: ffff888137dd7d58 R15: ffff888137dd7cc0
FS:  00007f4644f2a4c0(0000) GS:ffff8887a2d40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b4500f4380 CR3: 0000000114f7a003 CR4: 0000000000170ea0
Call Trace:
  simple_recursive_removal+0x33/0x2e0
  ? debugfs_remove+0x60/0x60
  debugfs_remove+0x40/0x60
  mlx5_debug_cq_remove+0x32/0x70 [mlx5_core]
  mlx5_core_destroy_cq+0x41/0x1d0 [mlx5_core]
  devx_obj_cleanup+0x151/0x330 [mlx5_ib]
  ? __pollwait+0xd0/0xd0
  ? xas_load+0x5/0x70
  ? xa_load+0x62/0xa0
  destroy_hw_idr_uobject+0x20/0x80 [ib_uverbs]
  uverbs_destroy_uobject+0x3b/0x360 [ib_uverbs]
  uobj_destroy+0x54/0xa0 [ib_uverbs]
  ib_uverbs_cmd_verbs+0xaf2/0x1160 [ib_uverbs]
  ? uverbs_finalize_object+0xd0/0xd0 [ib_uverbs]
  ib_uverbs_ioctl+0xc4/0x1b0 [ib_uverbs]
  __x64_sys_ioctl+0x3e4/0x8e0

Fixes: 94b960b9deff ("net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path")
Signed-off-by: Valentine Fatiev <valentinef@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16 12:20:19 -08:00