1031383 Commits

Author SHA1 Message Date
Jacob Keller
4dd0d5c33c ice: add lock around Tx timestamp tracker flush
The driver didn't take the lock while flushing the Tx tracker, which
could cause a race where one thread is trying to read timestamps out
while another thread is trying to read the tracker to check the
timestamps.

Avoid this by ensuring that flushing is locked against read accesses.
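
The locking rule above can be sketched in userspace C. This is an illustration only, not the driver's code: struct tx_tracker, the tracker_* helpers and the atomic_flag spinlock are all hypothetical stand-ins. The point is that the flush path takes the same lock as the read path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TRACKER_SLOTS 8

/* Hypothetical stand-in for the Tx timestamp tracker. */
struct tx_tracker {
	atomic_flag lock;		/* simple spinlock */
	bool in_use[TRACKER_SLOTS];	/* slots with an outstanding timestamp */
};

static void tracker_lock(struct tx_tracker *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void tracker_unlock(struct tx_tracker *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

void tracker_init(struct tx_tracker *t)
{
	assert(t != NULL);
	atomic_flag_clear(&t->lock);
	for (size_t i = 0; i < TRACKER_SLOTS; i++)
		t->in_use[i] = false;
}

/* Reader: counts outstanding timestamps under the lock. */
size_t tracker_pending(struct tx_tracker *t)
{
	size_t n = 0;

	tracker_lock(t);
	for (size_t i = 0; i < TRACKER_SLOTS; i++)
		n += t->in_use[i] ? 1 : 0;
	tracker_unlock(t);
	return n;
}

/* Flush: takes the same lock, so it cannot interleave with a reader. */
void tracker_flush(struct tx_tracker *t)
{
	tracker_lock(t);
	for (size_t i = 0; i < TRACKER_SLOTS; i++)
		t->in_use[i] = false;
	tracker_unlock(t);
}
```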

Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27 09:14:49 -07:00
Jacob Keller
1f0cbb3e89 ice: remove dead code for allocating pin_config
We have code in the ice driver which allocates the pin_config structure
if n_pins is > 0, but we never set n_pins to be greater than zero.
There's no reason to keep this code until we actually have pin_config
support. Remove this. We can re-add it properly when we implement
support for pin_config for E810-T devices.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27 09:11:39 -07:00
Jacob Keller
84c5fb8c42 ice: fix Tx queue iteration for Tx timestamp enablement
The driver accidentally copied the ice_for_each_rxq iterator when
implementing enablement of the ptp_tx bit for the Tx rings. We still
load the Tx rings and set the ptp_tx field, but we iterate over the
count of the num_rxq.

If the number of Tx and Rx queues differ, this could either cause
a buffer overrun when accessing the tx_rings list if num_rxq is greater
than num_txq, or it could cause us to fail to enable Tx timestamps for
some rings if num_txq is greater than num_rxq.

This was not noticed originally as we generally have the same number of
Tx and Rx queues.
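
A minimal sketch of the corrected loop bound, with simplified types. struct vsi, struct tx_ring and set_tx_tstamp() here are hypothetical stand-ins rather than the driver's definitions; only the choice of num_txq as the bound reflects the fix described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's ring and VSI types. */
struct tx_ring {
	bool ptp_tx;
};

struct vsi {
	size_t num_txq;
	size_t num_rxq;
	struct tx_ring *tx_rings;	/* array of num_txq entries */
};

/*
 * Set the ptp_tx flag on every Tx ring. The bound is num_txq: bounding
 * the loop by num_rxq would read past tx_rings or skip rings whenever
 * the two counts differ.
 */
void set_tx_tstamp(struct vsi *vsi, bool on)
{
	assert(vsi != NULL && vsi->tx_rings != NULL);
	for (size_t i = 0; i < vsi->num_txq; i++)
		vsi->tx_rings[i].ptp_tx = on;
}
```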

Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-27 08:38:55 -07:00
David S. Miller
5fe2a6b434 mlx5-fixes-2021-08-26

Merge tag 'mlx5-fixes-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-08-26

This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:42:21 +01:00
Peter Collingbourne
d0efb16294 net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
A common implementation of isatty(3) involves calling an ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).

Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.

Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.
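
The early check can be illustrated in plain C. This is a sketch under assumptions, not the kernel patch: the helper names are hypothetical, and the constants assume the generic Linux ioctl encoding, where bits 8..15 of a command hold its type and socket ioctls use type 0x89.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed generic Linux ioctl encoding: bits 8..15 carry the "type". */
#define IOC_TYPE_SHIFT		8
#define IOC_TYPE_MASK		0xffu
#define SOCK_IOC_TYPE_ASSUMED	0x89u	/* type used by socket ioctls */

/* Hypothetical helper: does this command number belong to the socket family? */
bool looks_like_socket_ioctl(unsigned int cmd)
{
	return ((cmd >> IOC_TYPE_SHIFT) & IOC_TYPE_MASK) == SOCK_IOC_TYPE_ASSUMED;
}

/*
 * Hypothetical entry check: reject foreign ioctls before touching the
 * user buffer, so a short argument struct can never fault the copy.
 */
int sock_ioctl_precheck(unsigned int cmd)
{
	assert(ENOTTY > 0);
	if (!looks_like_socket_ioctl(cmd))
		return -ENOTTY;
	return 0;	/* caller may now copy sizeof(struct ifreq) bytes */
}
```

With the generic-architecture values TCGETS = 0x5401 and TIOCGWINSZ = 0x5413, both isatty() probes are rejected with -ENOTTY before any copy, while SIOCGIFNAME = 0x8910 passes the check.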

Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:40:25 +01:00
Wentao_Liang
6cc64770fb net/mlx5: DR, fix a potential use-after-free bug
In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to
cur_htbl and may cause cur_htbl to be freed.

However, cur_htbl is subsequently used in the next line, which may result
in a use-after-free bug.

Fix this by calling mlx5dr_err() before the cur_htbl is put.
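
The ordering can be shown with a toy refcounted object. struct htbl, htbl_new(), htbl_put() and report_and_put() are hypothetical names for illustration, not the mlx5 API: the error message is formatted while the reference is still held, and the put comes last.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a hash table. */
struct htbl {
	int refcount;
	int id;
};

struct htbl *htbl_new(int id)
{
	struct htbl *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->refcount = 1;
	h->id = id;
	return h;
}

/* Drop one reference; the object is freed when the count reaches zero. */
void htbl_put(struct htbl *h)
{
	if (--h->refcount == 0)
		free(h);
}

/* Error path: report first, drop the reference last. */
int report_and_put(struct htbl *h, char *buf, size_t len)
{
	int n;

	assert(h != NULL && buf != NULL);
	n = snprintf(buf, len, "failed on htbl %d", h->id);
	htbl_put(h);	/* h must not be dereferenced after this point */
	return n;
}
```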

Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Dmytro Linkin
f9d196bd63 net/mlx5e: Use correct eswitch for stack devices with lag
If link aggregation is used within stack devices, the driver rejects encap
rules if the PF of the VF tunnel device is down. This happens because the
route is resolved for the other PF, and its eswitch instance is used to
determine the correct vport.
To fix that, use the devcom feature to retrieve the other eswitch instance
if the vport was not found for the first eswitch and LAG is active.

Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Maor Dickman
ca6891f9b2 net/mlx5: E-Switch, Set vhca id valid flag when creating indir fwd group
When an indirect forward group is created, a flow is added with a vhca id but
without setting the vhca id valid flag, which violates the PRM.

Fix by setting the missing flag, vhca id valid.

Fixes: 34ca65352ddf ("net/mlx5: E-Switch, Indirect table infrastructure")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Roi Dayan
9a5f9cc794 net/mlx5e: Fix possible use-after-free deleting fdb rule
After a neigh-update-add failure we are still left with a slow path rule, but
the driver always assumes the rule is an fdb rule.
Fix neigh-update-del by checking the slow path tc flag on the flow.
Also fix neigh-update-add in the same way for when neigh-update-del fails.

Fixes: 5dbe906ff1d5 ("net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Leon Romanovsky
8e7e2e8ed0 net/mlx5: Remove all auxiliary devices at the unregister event
The call to mlx5_unregister_device() means that the mlx5_core driver is
removed. In such a scenario, we need to disregard all other flags like
attach/detach and forcibly remove all auxiliary devices.

Fixes: a5ae8fc9058e ("net/mlx5e: Don't create devices during unload flow")
Tested-and-Reported-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:41 -07:00
Dima Chumak
2f8b6161cc net/mlx5: Lag, fix multipath lag activation
When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route,
lag activation can be missed if a stale (struct lag_mp)->mfi pointer
exists, which was associated with an older multipath route that had been
removed.

Normally, when a route is removed, it triggers mlx5_lag_fib_event(),
which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if
mlx5_lag_check_prereq() condition isn't met, for example when eswitch is
in legacy mode, the fib event is skipped and mfi pointer becomes stale.

Fix by resetting mfi pointer to NULL in mlx5_deactivate_lag().

Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:41 -07:00
Linus Torvalds
73367f05b2 This is a one-liner fix for a serious bug that can cause the server to
become unresponsive to a client, so I think it's worth the last-minute
 inclusion for 5.14.

Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
 "This is a one-liner fix for a serious bug that can cause the server to
  become unresponsive to a client, so I think it's worth the last-minute
  inclusion for 5.14"

* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
2021-08-26 13:26:40 -07:00
Linus Torvalds
8a2cb8bd06 Networking fixes for 5.14(-rc8?), including fixes from can and bpf.
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from can and bpf.

  Closing three hw-dependent regressions. Any fixes of note are in the
  'old code' category. Nothing blocking release from our perspective.

  Current release - regressions:

   - stmmac: revert "stmmac: align RX buffers"

   - usb: asix: ax88772: move embedded PHY detection as early as
     possible

   - usb: asix: do not call phy_disconnect() for ax88178

   - Revert "net: really fix the build...", from Kalle to fix QCA6390

  Current release - new code bugs:

   - phy: mediatek: add the missing suspend/resume callbacks

  Previous releases - regressions:

   - qrtr: fix another OOB Read in qrtr_endpoint_post

   - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings

  Previous releases - always broken:

   - inet: use siphash in exception handling

   - ip_gre: add validation for csum_start

   - bpf: fix ringbuf helper function compatibility

   - rtnetlink: return correct error on changing device netns

   - e1000e: do not try to recover the NVM checksum on Tiger Lake"

* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
  Revert "net: really fix the build..."
  net: hns3: fix get wrong pfc_en when query PFC configuration
  net: hns3: fix GRO configuration error after reset
  net: hns3: change the method of getting cmd index in debugfs
  net: hns3: fix duplicate node in VLAN list
  net: hns3: fix speed unknown issue in bond 4
  net: hns3: add waiting time before cmdq memory is released
  net: hns3: clear hardware resource when loading driver
  net: fix NULL pointer reference in cipso_v4_doi_free
  rtnetlink: Return correct error on changing device netns
  net: dsa: hellcreek: Adjust schedule look ahead window
  net: dsa: hellcreek: Fix incorrect setting of GCL
  cxgb4: dont touch blocked freelist bitmap after free
  ipv4: use siphash instead of Jenkins in fnhe_hashfun()
  ipv6: use siphash in rt6_exception_hash()
  can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
  net: usb: asix: ax88772: fix boolconv.cocci warnings
  net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
  qede: Fix memset corruption
  net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
  ...
2021-08-26 13:20:22 -07:00
Linus Torvalds
1a6d80ff24 arm64 fix for 5.14
- Fix dma_map_resource() by reverting back to old pfn_valid() code

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:
 "We received a report this week that the generic version of
  pfn_valid(), which we switched to this merge window in 16c9afc77660
  ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
  dma_map_resource() due to the following check:

        /* Don't allow RAM to be mapped */
        if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                return DMA_MAPPING_ERROR;

  Since the ongoing saga to determine the semantics of pfn_valid() is
  unlikely to be resolved this week (does it indicate valid memory, or
  just the presence of a struct page, or whether that struct page has
  been initialised?), just revert back to our old version of pfn_valid()
  for 5.14.

  Summary:

   - Fix dma_map_resource() by reverting back to old pfn_valid() code"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
2021-08-26 11:26:00 -07:00
Linus Torvalds
97d8cc2008 Two memory management fixes for the filesystem.

Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two memory management fixes for the filesystem"

* tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
  ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
  ceph: correctly handle releasing an embedded cap flush
2021-08-26 11:18:30 -07:00
Kalle Valo
9ebc2758d0 Revert "net: really fix the build..."
This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88.

Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:

qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22

Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.

Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 11:08:32 -07:00
Linus Torvalds
9b49ceb854 for-5.14-rc7-tag

Merge tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix that I think qualifies for a late merge. It's a revert of
  a one-liner fix that meanwhile got backported to stable kernels and we
  got reports from users.

  The broken fix prevents creating compressed inline extents, which
  could be noticeable on space consumption.

  Technically it's a regression as the patch was merged in 5.14-rc1 but
  got propagated to several stable kernels and has higher exposure than
  a 'typical' development cycle bug"

* tag 'for-5.14-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: compression: don't try to compress if we don't have enough pages"
2021-08-26 11:05:11 -07:00
Jakub Kicinski
75da63b7a1 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
bpf 2021-08-26

We've added 1 non-merge commit during the last 1 day(s):

1) Fix ringbuf helper function compatibility, from Daniel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix ringbuf helper function compatibility
====================

Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 08:44:38 -07:00
Jakub Kicinski
57f8178292 Merge branch 'net-hns3-add-some-fixes-for-net'
Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Link: https://lore.kernel.org/r/1629976921-43438-1-git-send-email-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:21 -07:00
Guangbin Huang
8c1671e0d1 net: hns3: fix get wrong pfc_en when query PFC configuration
Currently, when the PFC configuration is queried with dcbtool, the driver
returns the PFC enable status based on TC. As all priorities are mapped to
TC0 by default, if TC0 is enabled, then all priorities mapped to TC0 will be
shown as enabled when querying the PFC setting, even though some
priorities have never been set.

For example:
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off
$ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on
$ dcb pfc show dev eth0
pfc-cap 4 macsec-bypass off delay 0
prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on

To fix this problem, just return the user's PFC config parameter saved in
the driver.
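
The difference between the two behaviours can be sketched with bitmaps. The function names and the 8-bit layout (bit p set means priority p is enabled) are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIO 8

/*
 * Old behaviour as described above: a priority is reported as enabled
 * whenever the TC it maps to has PFC enabled.
 */
uint8_t pfc_en_from_tc(const uint8_t prio2tc[MAX_PRIO], uint8_t tc_pfc_en)
{
	uint8_t out = 0;

	for (int p = 0; p < MAX_PRIO; p++) {
		assert(prio2tc[p] < MAX_PRIO);
		if (tc_pfc_en & (1u << prio2tc[p]))
			out |= (uint8_t)(1u << p);
	}
	return out;
}

/* Fixed behaviour: report the per-priority bitmap the user configured. */
uint8_t pfc_en_from_user(uint8_t saved_user_pfc_en)
{
	return saved_user_pfc_en;
}
```

With every priority mapped to TC0 and TC0 enabled, the TC-derived result is 0xff (all eight priorities on), matching the misleading output above, whereas the saved user value for "0:on 1:on 2:on 3:on" stays 0x0f.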

Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:17 -07:00
Yufeng Mo
3462207d2d net: hns3: fix GRO configuration error after reset
The GRO configuration is enabled by default after reset. This
is incorrect and should be restored to the user-configured value.
So this restoration is added during reset initialization.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:17 -07:00
Yufeng Mo
55649d5654 net: hns3: change the method of getting cmd index in debugfs
Currently, the cmd index is obtained in debugfs by comparing file names.
However, this method may cause errors when processing more complex file
names. So, change this method by saving cmd in private data and comparing
it when getting cmd index in debugfs for optimization.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:17 -07:00
Guojia Liao
94391fae82 net: hns3: fix duplicate node in VLAN list
A duplicate VLAN node should not be added to the VLAN list; otherwise
restoring VLANs from the list fails with "add failed". So this patch adds
a VLAN ID check before adding a node to the VLAN list.
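
A minimal sketch of a check-before-add list, using a plain singly linked list. struct vlan_node and the vlan_list_* helpers are hypothetical and far simpler than the driver's VLAN table handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node holding one VLAN ID. */
struct vlan_node {
	uint16_t vlan_id;
	struct vlan_node *next;
};

/* Returns true if the ID was added, false if it was already present
 * or the allocation failed.
 */
bool vlan_list_add(struct vlan_node **head, uint16_t vlan_id)
{
	struct vlan_node *node;

	assert(head != NULL);
	for (struct vlan_node *n = *head; n; n = n->next)
		if (n->vlan_id == vlan_id)
			return false;	/* duplicate: leave the list unchanged */

	node = malloc(sizeof(*node));
	if (!node)
		return false;
	node->vlan_id = vlan_id;
	node->next = *head;
	*head = node;
	return true;
}

/* Number of nodes currently in the list. */
size_t vlan_list_len(const struct vlan_node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```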

Fixes: c6075b193462 ("net: hns3: Record VF vlan tables")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:17 -07:00
Yonglong Liu
b15c072a9f net: hns3: fix speed unknown issue in bond 4
In bond 4, when the link goes down and up repeatedly, the bond may get an
unknown speed, and then this port cannot work.

The driver calls netif_carrier_on() before updating the link state. When the
bond receives carrier on, it queries the speed of the port; if the query
happens before the link state is updated, it gets an unknown speed. So
netif_carrier_on() needs to be called after the link state is updated.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:16 -07:00
Yufeng Mo
a96d9330b0 net: hns3: add waiting time before cmdq memory is released
After the cmdq registers are cleared, the firmware may take time to
clear out possible left over commands in the cmdq. Driver must release
cmdq memory only after firmware has completed processing of left over
commands.

Fixes: 232d0d55fca6 ("net: hns3: uninitialize command queue while unloading PF driver")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:16 -07:00
Yufeng Mo
1a6d281946 net: hns3: clear hardware resource when loading driver
If a PF is bonded to a virtual machine and the virtual machine exits
unexpectedly, some hardware resource cannot be cleared. In this case,
loading driver may cause exceptions. Therefore, the hardware resource
needs to be cleared when the driver is loaded.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 07:24:16 -07:00
王贇
733c99ee8b net: fix NULL pointer reference in cipso_v4_doi_free
In netlbl_cipsov4_add_std(), when the 'doi_def->map.std' allocation
fails, we sometimes observe a panic:

  BUG: kernel NULL pointer dereference, address:
  ...
  RIP: 0010:cipso_v4_doi_free+0x3a/0x80
  ...
  Call Trace:
   netlbl_cipsov4_add_std+0xf4/0x8c0
   netlbl_cipsov4_add+0x13f/0x1b0
   genl_family_rcv_msg_doit.isra.15+0x132/0x170
   genl_rcv_msg+0x125/0x240

This is because cipso_v4_doi_free() does not check
'doi_def->map.std' when 'doi_def->type' equals 1, which
is possible, since netlbl_cipsov4_add_std() has not initialized
it before allocating 'doi_def->map.std'.

This patch adds the check to prevent a panic in similar
cases.
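
The guard can be sketched as follows. All types and names here (struct doi_def, struct std_map, doi_release()) are simplified, hypothetical stand-ins; only the extra NULL test before touching the map reflects the fix described above.

```c
#include <assert.h>
#include <stdlib.h>

enum doi_type { DOI_MAP_UNKNOWN = 0, DOI_MAP_STD = 1 };

/* Hypothetical mapping tables owned by a definition. */
struct std_map {
	int *lvl;
	int *cat;
};

struct doi_def {
	enum doi_type type;
	struct std_map *std;	/* may still be NULL if allocation failed */
};

/*
 * Release the mapping owned by a definition. Returns the number of maps
 * released. Without the "d->std" test, a definition whose type field
 * says STD but whose map was never allocated would be dereferenced.
 */
int doi_release(struct doi_def *d)
{
	assert(d != NULL);
	if (d->type == DOI_MAP_STD && d->std) {
		free(d->std->lvl);
		free(d->std->cat);
		free(d->std);
		d->std = NULL;
		return 1;
	}
	return 0;
}
```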

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 12:20:47 +01:00
Andrey Ignatov
96a6b93b69 rtnetlink: Return correct error on changing device netns
Currently, when a device is moved between network namespaces using the
RTM_NEWLINK message type and one of the netns attributes (IFLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but without specifying IFLA_IFNAME, and
the target namespace already has a device with the same name, userspace gets
EINVAL, which is confusing and makes debugging harder.

Fix it so that userspace gets the more appropriate EEXIST instead, which makes
debugging much easier.

Before:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: Invalid argument

After:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: File exists

The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.

For example, __rtnl_newlink() does this:

net/core/rtnetlink.c
    3270	char ifname[IFNAMSIZ];
     ...
    3286	if (tb[IFLA_IFNAME])
    3287		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
    3288	else
    3289		ifname[0] = '\0';
     ...
    3364	if (dev) {
     ...
    3394		return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
    3395	}

, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.

But the pat (pattern) in __dev_change_net_namespace() is used as:

net/core/dev.c
   11198	err = -EEXIST;
   11199	if (__dev_get_by_name(net, dev->name)) {
   11200		/* We get here if we can't use the current device name */
   11201		if (!pat)
   11202			goto out;
   11203		err = dev_get_valid_name(net, dev, pat);
   11204		if (err < 0)
   11205			goto out;
   11206	}

As a result, the `goto out` path on line 11202 is never taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.
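
The effect of passing an empty string instead of NULL can be reproduced in a few lines. This is a sketch with hypothetical helpers, not kernel code. In particular, ifname_to_pat() shows one possible caller-side fix and is an assumption, not necessarily what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for name validation: an empty pattern is invalid. */
static int get_valid_name(const char *pat)
{
	return pat[0] == '\0' ? -EINVAL : 0;
}

/*
 * Mirrors the shape of the quoted check: a NULL pattern means no rename
 * is allowed, so a name clash reports -EEXIST. A non-NULL pattern is
 * validated instead, and its error is what the caller sees.
 */
int change_netns(bool name_taken, const char *pat)
{
	if (name_taken) {
		if (!pat)
			return -EEXIST;
		return get_valid_name(pat);
	}
	return 0;
}

/* Assumed fix on the caller side: map "no name given" to a NULL pattern. */
const char *ifname_to_pat(const char *ifname)
{
	assert(ifname != NULL);
	return ifname[0] ? ifname : NULL;
}
```

Calling change_netns(true, "") yields -EINVAL, the confusing error from the "Before" run, while change_netns(true, ifname_to_pat("")) yields -EEXIST.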

Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 12:08:08 +01:00
David S. Miller
a423cbe0f2 Merge branch 'dsa-hellcreek-fixes'
Kurt Kanzenbach says:

====================
net: dsa: hellcreek: 802.1Qbv Fixes

While using TAPRIO offloading on the Hirschmann hellcreek switch, I've noticed
two issues in the current implementation:

1. The gate control list is incorrectly programmed
2. The admin base time is not set properly

Fix it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:26:06 +01:00
Kurt Kanzenbach
b7658ed35a net: dsa: hellcreek: Adjust schedule look ahead window
Traffic schedules can only be started up to eight seconds in the
future. Therefore, the driver periodically checks every two seconds whether the
admin base time provided by the user is inside that window. If so the schedule
is started. Otherwise the check is deferred.

However, according to the programming manual the look ahead window size should
be four - not eight - seconds. By using the proposed value of four seconds
starting a schedule at a specified admin base time actually works as expected.
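
The window test itself is simple arithmetic on nanosecond timestamps. The sketch below assumes signed 64-bit nanosecond values and a hypothetical helper name; it is not the driver's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define LOOK_AHEAD_NS	(4 * NSEC_PER_SEC)	/* four seconds, per the text above */

/*
 * True if the admin base time is close enough to "now" for the schedule
 * to be started. A base time already in the past also passes this test;
 * handling that case is outside the scope of this sketch.
 */
bool base_time_in_window(int64_t now_ns, int64_t base_time_ns)
{
	assert(now_ns >= 0 && base_time_ns >= 0);
	return base_time_ns - now_ns < LOOK_AHEAD_NS;
}
```

A periodic check (every two seconds in the description above) would call this helper and defer again while it returns false.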

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:26:06 +01:00
Kurt Kanzenbach
a7db5ed863 net: dsa: hellcreek: Fix incorrect setting of GCL
Currently the gate control list which is programmed into the hardware is
incorrect resulting in wrong traffic schedules. The problem is the loop
variables are incremented before they are referenced. Therefore, move the
increment to the end of the loop.
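
A reduced illustration of the ordering issue: each entry is consumed first and the cursor advances afterwards. struct gcl_entry, program_gcl() and the byte-array "hardware table" are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gate control list entry. */
struct gcl_entry {
	uint8_t gate_states;
	uint32_t interval_ns;
};

/*
 * Copy the gate states into a stand-in hardware table. Advancing "cur"
 * before reading it would skip entry 0 and read one element past the
 * end of the list.
 */
void program_gcl(const struct gcl_entry *entries, size_t n, uint8_t *hw_table)
{
	const struct gcl_entry *cur = entries;

	assert(entries != NULL && hw_table != NULL);
	for (size_t i = 0; i < n; i++) {
		hw_table[i] = cur->gate_states;	/* use the entry first ... */
		cur++;				/* ... then advance */
	}
}
```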

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:26:05 +01:00
Rahul Lakkireddy
43fed4d48d cxgb4: dont touch blocked freelist bitmap after free
When adapter init fails, the blocked freelist bitmap is already freed
up and should not be touched. So, move the bitmap zeroing closer to
where it was successfully allocated. Also handle adapter init failure
unwind path immediately and avoid setting up RDMA memory windows.

Fixes: 5b377d114f2b ("cxgb4: Add debugfs facility to inject FL starvation")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:23:24 +01:00
David S. Miller
38d57551dd Merge branch 'inet-siphash'
Eric Dumazet says:

====================
inet: use siphash in exception handling

A group of security researchers brought to our attention
the weakness of hash functions used in rt6_exception_hash()
and fnhe_hashfun().

I made two distinct patches to help backports, since the IPv6
part was added in 4.15.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:20:34 +01:00
Eric Dumazet
6457378fe7 ipv4: use siphash instead of Jenkins in fnhe_hashfun()
A group of security researchers brought to our attention
the weakness of the hash function used in fnhe_hashfun().

Let's use siphash instead of Jenkins Hash, to considerably
reduce security risks.
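
For readers unfamiliar with the idea, here is a self-contained userspace
sketch of SipHash-2-4 used as a keyed bucket hash. It is not the kernel's
siphash implementation. The helper exception_bucket() and its parameters
are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 with a 128-bit key given as two 64-bit halves. */
static uint64_t siphash24(const uint8_t *in, size_t len,
			  uint64_t k0, uint64_t k1)
{
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;	/* length goes in the top byte */
	size_t full = len - (len % 8);
	size_t i;

	assert(in || len == 0);
	for (i = 0; i < full; i += 8) {
		uint64_t m = load_le64(in + i);

		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	for (i = full; i < len; i++)	/* fold in the trailing bytes */
		b |= (uint64_t)in[i] << (8 * (i - full));
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	v2 ^= 0xff;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical helper: map an IPv4 address to one of 2^bits buckets. */
static unsigned int exception_bucket(uint32_t daddr, uint64_t k0,
				     uint64_t k1, unsigned int bits)
{
	uint8_t buf[4] = {
		(uint8_t)daddr, (uint8_t)(daddr >> 8),
		(uint8_t)(daddr >> 16), (uint8_t)(daddr >> 24),
	};

	assert(bits >= 1 && bits <= 32);
	return (unsigned int)(siphash24(buf, sizeof(buf), k0, k1) >> (64 - bits));
}
```

Because the bucket depends on a secret 128-bit key, an attacker who
controls the addresses cannot easily predict which ones collide.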

Also remove the inline keyword; it really is distracting.

Fixes: d546c621542d ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:20:34 +01:00
Eric Dumazet
4785305c05 ipv6: use siphash in rt6_exception_hash()
A group of security researchers brought to our attention
the weakness of the hash function used in rt6_exception_hash().

Let's use siphash instead of Jenkins Hash, to considerably
reduce security risks.

The following patch deals with IPv4.

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:20:34 +01:00
David S. Miller
92ea47fe09 linux-can-fixes-for-5.14-20210826
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmEnNr8THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqaPqB/9sZk3/yAIEEwDakHI3SB979Fpj+BES
 9Ijt4KEN8QUzD9c5domd4mRMTcvL2nEseWPgYp66ufdiNQ3faPmc1kkeeX7Advtm
 yyrNo/fqODHrj/7s3MWlzGpzUkB74T7U4qlrpNuXC+E9+6kmDTqfahdxEuCahKOP
 EP+aQ6nawrtS1TYqjLZip4H1bTccyNRHIL0XBZckWuHI0yM5/SD1hm6ZDi5PSH6g
 FRhkp807m7qkL5k+gt/OtREVZ1+W7Y4s9d8Jjs20nBDSuF7PNineSpj+c9GYVU4m
 6VyA5Fu6dYsVqlLzQP89moy93cQGs4KCIRFeV4BcJrDUIDl0WnRz5go1
 =ojvR
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.14-20210826' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine says:

====================
pull-request: can 2021-08-26

this is a pull request of a single patch for net/master.

Stefan Mätje's patch fixes the interchange of RX and TX error counters
in the esd_usb2 CAN driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 09:37:40 +01:00
Stefan Mätje
044012b520 can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
This patch fixes the interchanged fetch of the CAN RX and TX error
counters from the ESD_EV_CAN_ERROR_EXT message. The RX error counter
is really in struct rx_msg::data[2] and the TX error counter is in
struct rx_msg::data[3].
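
A minimal sketch of the corrected fetch, using hypothetical cut-down
types rather than the driver's real message structures:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the error event payload. */
struct err_event {
	uint8_t data[8];
};

struct err_counters {
	uint8_t rxerr;
	uint8_t txerr;
};

/* The RX error counter is in data[2], the TX error counter in data[3]. */
static struct err_counters read_err_counters(const struct err_event *ev)
{
	struct err_counters c;

	assert(ev);
	c.rxerr = ev->data[2];
	c.txerr = ev->data[3];
	return c;
}
```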

Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device")
Link: https://lore.kernel.org/r/20210825215227.4947-2-stefan.maetje@esd.eu
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-26 08:37:13 +02:00
kernel test robot
ec92e524ee net: usb: asix: ax88772: fix boolconv.cocci warnings
drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here

 Remove unneeded conversion to bool

Semantic patch information:
 Relational and logical operators evaluate to bool,
 explicit conversion is overly verbose and unneeded.

Generated by: scripts/coccinelle/misc/boolconv.cocci

Fixes: 7a141e64cf14 ("net: usb: asix: ax88772: move embedded PHY detection as early as possible")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-25 16:35:51 -07:00
Trond Myklebust
062b829c52 SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
flag on the socket. Among other things, this makes it impossible to close
the socket.

Fixes: 82011c80b3ec ("SUNRPC: Move svc_xprt_received() call sites")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-08-25 16:58:09 -04:00
Linus Torvalds
73f3af7b46 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "2 patches.

  Subsystems affected by this patch series: mm/memory-hotplug and
  MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MAINTAINERS: exfat: update my email address
  mm/memory_hotplug: fix potential permanent lru cache disable
2021-08-25 12:45:31 -07:00
Namjae Jeon
a34cc13add MAINTAINERS: exfat: update my email address
My email address in the exfat entry will no longer be available in a few
days. Update it to my own kernel.org address.

Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25 12:25:12 -07:00
Miaohe Lin
946746d1ad mm/memory_hotplug: fix potential permanent lru cache disable
If offline_pages() fails after lru_cache_disable(), the error path
forgets to call lru_cache_enable(), so the LRU cache stays disabled
permanently in this case.
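
The shape of the fix can be sketched as a single exit path that always
undoes the disable. The counter and function names below are
hypothetical stand-ins, not the mm code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: a counter models the "cache disabled" state. */
static int lru_disable_depth;

static void cache_disable(void)
{
	lru_disable_depth++;
}

static void cache_enable(void)
{
	assert(lru_disable_depth > 0);
	lru_disable_depth--;
}

/* Returns 0 on success, -1 on failure; re-enables the cache either way. */
static int offline_range(bool migration_fails)
{
	int ret = 0;

	cache_disable();

	if (migration_fails) {
		ret = -1;
		goto out_enable;	/* the error path must undo the disable */
	}

	/* ... the actual offlining work would happen here ... */

out_enable:
	cache_enable();
	return ret;
}
```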

Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com
Fixes: d479960e44f2 ("mm: disable LRU pagevec during the migration temporarily")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25 12:25:12 -07:00
Linus Torvalds
fe67f4dd8d pipe: do FASYNC notifications for every pipe IO, not just state changes
It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.

Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable).  User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.

But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).

The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.

This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong.  What matters is that it used to work.

So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes.  It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.

Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed.  Probably because the stress-ng problem case ends up
being timing-dependent too.

It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
writes always wake up readers") only to be broken again by commit
3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").

And at that point the kernel test robot noticed the performance
regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.

Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case.  FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
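
A toy model of the separation, with hypothetical names and counters
standing in for the real wait queue and signal delivery:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pipe: counters stand in for real side effects. */
struct pipe_sketch {
	unsigned int buffered;		/* bytes currently in the pipe */
	unsigned int wakeups;		/* reader wakeups issued */
	unsigned int sigio_sent;	/* async notifications issued */
	bool fasync;			/* reader asked for SIGIO */
};

static void pipe_write_sketch(struct pipe_sketch *p, unsigned int n)
{
	bool was_empty;

	assert(p);
	was_empty = (p->buffered == 0);
	p->buffered += n;

	/* Wakeups stay tied to the empty -> readable state transition. */
	if (was_empty)
		p->wakeups++;

	/* The async notification is sent for every write, regardless. */
	if (p->fasync)
		p->sigio_sent++;
}
```

Two back-to-back writes into an initially empty pipe produce one wakeup
but two notifications in this model, which is the behavior user space
had been relying on.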

Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25 10:27:16 -07:00
Linus Torvalds
62add98208 Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucount fixes from Eric Biederman:
 "This branch fixes a regression that made it impossible to increase
  rlimits that had been converted to the ucount infrastructure, and also
  fixes a reference counting bug where the reference was not incremented
  soon enough.

  The fixes are trivial and the bugs have been encountered in the wild,
  and the fixes have been tested"

* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Increase ucounts reference counter before the security hook
  ucounts: Fix regression preventing increasing of rlimits in init_user_ns
2021-08-25 09:56:10 -07:00
Tuo Li
a9e6ffbc5b ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
kcalloc() is called to allocate memory for m->m_info, and if it fails,
ceph_mdsmap_destroy() behind the label out_err will be called:
  ceph_mdsmap_destroy(m);

In ceph_mdsmap_destroy(), m->m_info is dereferenced through:
  kfree(m->m_info[i].export_targets);

To fix this possible null-pointer dereference, check m->m_info before the
for loop to free m->m_info[i].export_targets.

[ jlayton: fix up whitespace damage
	   only kfree(m->m_info) if it's non-NULL ]
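
In userspace terms the guarded teardown looks roughly like this. The
structures are hypothetical, cut-down versions that keep only the
fields named above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down versions of the structures named above. */
struct mds_info {
	int *export_targets;
};

struct mdsmap {
	int num_mds;
	struct mds_info *m_info;	/* NULL if the allocation failed */
};

static void mdsmap_destroy(struct mdsmap *m)
{
	int i;

	assert(m);
	if (m->m_info) {	/* only walk and free the array if it exists */
		for (i = 0; i < m->num_mds; i++)
			free(m->m_info[i].export_targets);
		free(m->m_info);
	}
	free(m);
}
```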

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25 16:34:11 +02:00
Xiubo Li
b2f9fa1f3b ceph: correctly handle releasing an embedded cap flush
The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.

When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.

Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.

At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads.  It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.
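
Both ideas can be sketched in userspace with a minimal list modelled on
the kernel's list_head. The struct layout and the detach_cap_flush()
helper are assumptions for illustration, not the ceph code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink and re-initialise, so a second call is a harmless no-op. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical cap flush record; the field names are assumptions. */
struct cap_flush {
	struct list_head i_list;
	bool is_capsnap;	/* true when embedded in a larger object */
};

/* Detach a record; free it only if it was allocated on its own. */
static bool detach_cap_flush(struct cap_flush *cf)
{
	assert(cf);
	list_del_init(&cf->i_list);
	if (cf->is_capsnap)
		return false;	/* storage belongs to the containing capsnap */
	free(cf);
	return true;
}
```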

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25 16:34:11 +02:00
Qu Wenruo
4e9655763b Revert "btrfs: compression: don't try to compress if we don't have enough pages"
This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763.

[BUG]
It's no longer possible to create a compressed inline extent after commit
f2165627319f ("btrfs: compression: don't try to compress if we don't
have enough pages").

[CAUSE]
For compression code, there are several possible reasons we have a range
that needs to be compressed while it's no more than one page.

- Compressed inline write
  The data is always smaller than one sector and the test lacks the
  condition to properly recognize a non-inline extent.

- Compressed subpage write
  For the incoming subpage compressed write support, we require page
  alignment of the delalloc range.
  And for 64K page size, we can compress just one page into smaller
  sectors.

For those reasons, the requirement for the data to be more than one page
is not correct, and is already causing a regression for compressed inline
data writeback.  The idea of skipping one page to avoid wasting CPU time
could be revisited in the future.

[FIX]
Fix it by reverting the offending commit.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net
Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:08:19 +02:00
Will Deacon
3eb9cdffb3 Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
This partially reverts commit 16c9afc776608324ca71c0bc354987bab532f51d.

Alex Bee reports a regression in 5.14 on their RK3328 SoC when
configuring the PL330 DMA controller:

 | ------------[ cut here ]------------
 | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
 | Modules linked in: spi_rockchip(+) fuse
 | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
 | Hardware name: Pine64 Rock64 (DT)
 | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
 | pc : dma_map_resource+0x68/0xc0
 | lr : pl330_prep_slave_fifo+0x78/0xd0

This appears to be because dma_map_resource() is being called for a
physical address which does not correspond to a memory address yet does
have a valid 'struct page' due to the way in which the vmemmap is
constructed.

Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
implementation of pfn_valid() called memblock_is_memory() to return
'false' for such regions and the DMA mapping request would proceed.
However, now that we are using the generic implementation where only the
presence of the memory map entry is considered, we return 'true' and
erroneously fail with DMA_MAPPING_ERROR because we identify the region
as DRAM.

Although fixing this in the DMA mapping code is arguably the right fix,
it is a risky, cross-architecture change at this stage in the cycle. So
just revert arm64 back to its old pfn_valid() implementation for v5.14.
The change to the generic pfn_valid() code is preserved from the original
patch, so as to avoid impacting other architectures.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-25 11:33:24 +01:00
Davide Caratti
cd9b50adc6 net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
While running kselftests, Hangbin observed that sch_ets.sh often crashes,
and splats like the following one are seen in the output of 'dmesg':

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458
 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
 RIP: 0010:__list_del_entry_valid+0x2d/0x50
 Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94
 RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217
 RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122
 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8
 RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001
 R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100
 R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002
 FS:  00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0
 Call Trace:
  ets_qdisc_reset+0x6e/0x100 [sch_ets]
  qdisc_reset+0x49/0x1d0
  tbf_reset+0x15/0x60 [sch_tbf]
  qdisc_reset+0x49/0x1d0
  dev_reset_queue.constprop.42+0x2f/0x90
  dev_deactivate_many+0x1d3/0x3d0
  dev_deactivate+0x56/0x90
  qdisc_graft+0x47e/0x5a0
  tc_get_qdisc+0x1db/0x3e0
  rtnetlink_rcv_msg+0x164/0x4c0
  netlink_rcv_skb+0x50/0x100
  netlink_unicast+0x1a5/0x280
  netlink_sendmsg+0x242/0x480
  sock_sendmsg+0x5b/0x60
  ____sys_sendmsg+0x1f2/0x260
  ___sys_sendmsg+0x7c/0xc0
  __sys_sendmsg+0x57/0xa0
  do_syscall_64+0x3a/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f93e44b8338
 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
 RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338
 RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
 R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000
 Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

When the change() function decreases the value of 'nstrict', we must take
into account that packets might be already enqueued on a class that flips
from 'strict' to 'quantum': otherwise that class will not be added to the
bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to
do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer
dereference.
For classes flipping from 'strict' to 'quantum', initialize an empty list
and eventually add it to the bandwidth-sharing list, if there are packets
already enqueued. In this way, the kernel will:
 a) prevent crashing as described above.
 b) avoid retaining the backlog packets (for an arbitrarily long time) in
    case no packet is enqueued after a change from 'strict' to 'quantum'.
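
A userspace sketch of the flip handling described above. The class
struct and the function name are hypothetical and keep only what the
explanation needs.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Hypothetical, simplified class: only the fields the sketch needs. */
struct ets_class_sketch {
	struct list_head alist;	/* link in the bandwidth-sharing list */
	unsigned int qlen;	/* packets already enqueued */
};

/*
 * Reduce the number of strict classes from old_nstrict to new_nstrict.
 * Every class in [new_nstrict, old_nstrict) flips to 'quantum': give it
 * a valid empty list node, and queue it on 'active' if it has backlog.
 */
static void flip_to_quantum(struct ets_class_sketch *cl, size_t new_nstrict,
			    size_t old_nstrict, struct list_head *active)
{
	size_t i;

	assert(cl && active && new_nstrict <= old_nstrict);
	for (i = new_nstrict; i < old_nstrict; i++) {
		init_list_head(&cl[i].alist);
		if (cl[i].qlen)
			list_add_tail(&cl[i].alist, active);
	}
}
```

After the flip, every affected class has a self-pointing or properly
linked alist, so a later deletion never follows a zero-filled pointer.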

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:15:30 +01:00
Shai Malin
e543468869 qede: Fix memset corruption
Thanks to Kees Cook, who detected a memset that starts from a member
other than the first but is sized for the whole struct.
The better change is to remove the redundant memset and to clear
only the msix_cnt member.
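
The pattern can be sketched as follows. The struct layout is
hypothetical; only the msix_cnt name comes from the description above.

```c
#include <assert.h>

/* Hypothetical layout; only msix_cnt is a name taken from the text. */
struct int_info_sketch {
	unsigned int flags;
	unsigned int msix_cnt;
	unsigned int used_cnt;
};

/*
 * Buggy pattern (do not use): start at a member that is not the first
 * one but pass the size of the whole struct, which writes past the end
 * of *info:
 *
 *	memset(&info->msix_cnt, 0, sizeof(*info));
 */

/* Fixed pattern: clear exactly the one member that needs resetting. */
static void reset_msix_cnt(struct int_info_sketch *info)
{
	assert(info);
	info->msix_cnt = 0;
}
```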

Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:07:55 +01:00