1073155 Commits

Author SHA1 Message Date
Florian Westphal
5f31edc067 netfilter: conntrack: move extension sizes into core
No need to specify this in the registration modules, we already
collect all sizes for build-time checks on the maximum combined size.

After this change, all extensions except nat have no meaningful content
in their nf_ct_ext_type struct definition.

Next patch handles nat, this will then allow to remove the dynamic
register api completely.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 06:30:28 +01:00
Florian Westphal
bb62a765b1 netfilter: conntrack: make all extensions 8-byte alignned
All extensions except one need 8 byte alignment, so just make that the
default.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 06:30:28 +01:00
Nicolas Dichtel
8b54136472 netfilter: nfqueue: enable to get skb->priority
This info could be useful to improve traffic analysis.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 06:30:27 +01:00
Kevin Mitchell
5bed9f3f63 netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY
The udp_error function verifies the checksum of incoming UDP packets if
one is set. This has the desirable side effect of setting skb->ip_summed
to CHECKSUM_COMPLETE, signalling that this verification need not be
repeated further up the stack.

Conversely, when the UDP checksum is empty, which is perfectly legal (at least
inside IPv4), udp_error previously left no trace that the checksum had been
deemed acceptable.

This was a problem in particular for nf_reject_ipv4, which verifies the
checksum in nf_send_unreach() before sending ICMP_DEST_UNREACH. It makes
no accommodation for zero UDP checksums unless they are already marked
as CHECKSUM_UNNECESSARY.

This commit ensures packets with empty UDP checksum are marked as
CHECKSUM_UNNECESSARY, which is explicitly recommended in skbuff.h.

Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 06:30:27 +01:00
Horatiu Vultur
41414c9bdb net: lan966x: use .mac_select_pcs() interface
Convert lan966x to use the mac_select_interface instead of
phylink_set_pcs.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220202114949.833075-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 19:11:21 -08:00
Guillaume Nault
95eb6ef82b selftests: rtnetlink: Use more sensible tos values
Using tos 0x1 with 'ip route get <IPv4 address> ...' doesn't test much
of the tos option handling: 0x1 just sets an ECN bit, which is cleared
by inet_rtm_getroute() before doing the fib lookup. Let's use 0x10
instead, which is actually taken into account in the route lookup (and
is less surprising for the reader).

For consistency, use 0x10 for the IPv6 route lookup too (IPv6 currently
doesn't clear ECN bits, but might do so in the future).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/d61119e68d01ba7ef3ba50c1345a5123a11de123.1643815297.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 19:11:21 -08:00
Guillaume Nault
bafe517af2 selftests: fib offload: use sensible tos values
Although both iproute2 and the kernel accept 1 and 2 as tos values for
new routes, those are invalid. These values only set ECN bits, which
are ignored during IPv4 fib lookups. Therefore, no packet can actually
match such routes. This selftest therefore only succeeds because it
doesn't verify that the new routes do actually work in practice (it
just checks if the routes are offloaded or not).

It makes more sense to use tos values that don't conflict with ECN.
This way, the selftest won't be affected if we later decide to warn or
even reject invalid tos configurations for new routes.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/5e43b343720360a1c0e4f5947d9e917b26f30fbf.1643826556.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 19:11:21 -08:00
Eric Dumazet
25ee1660a5 net: minor __dev_alloc_name() optimization
__dev_alloc_name() allocates a private zeroed page,
then sets bits in it while iterating through net devices.

It can use __set_bit() to avoid unnecessary locked operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220203064609.3242863-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 19:11:14 -08:00
Jakub Kicinski
c59400a68c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 17:36:16 -08:00
Kees Cook
dcb85f85fa gcc-plugins/stackleak: Use noinstr in favor of notrace
While the stackleak plugin was already using notrace, objtool is now a
bit more picky.  Update the notrace uses to noinstr.  Silences the
following objtool warnings when building with:

CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y

  vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section

Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.

Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 17:02:21 -08:00
Linus Torvalds
eb2eb5161c Networking fixes for 5.17-rc3, including fixes from bpf, netfilter,
and ieee802154.
 
 Current release - regressions:
 
  - Partially revert "net/smc: Add netlink net namespace support",
    fix uABI breakage
 
  - netfilter:
      - nft_ct: fix use after free when attaching zone template
      - nft_byteorder: track register operations
 
 Previous releases - regressions:
 
  - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
 
  - phy: qca8081: fix speeds lower than 2.5Gb/s
 
  - sched: fix use-after-free in tc_new_tfilter()
 
 Previous releases - always broken:
 
  - tcp: fix mem under-charging with zerocopy sendmsg()
 
  - tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
 
  - neigh: do not trigger immediate probes on NUD_FAILED from
    neigh_managed_work, avoid a deadlock
 
  - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
    false-positives
 
  - netfilter: nft_reject_bridge: fix for missing reply from prerouting
 
  - smc: forward wakeup to smc socket waitqueue after fallback
 
  - ieee802154:
      - return meaningful error codes from the netlink helpers
      - mcr20a: fix lifs/sifs periods
      - at86rf230, ca8210: stop leaking skbs on error paths
 
  - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent
 
  - ax25: add refcount in ax25_dev to avoid UAF bugs
 
  - eth: mlx5e:
      - fix SFP module EEPROM query
      - fix broken SKB allocation in HW-GRO
      - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows
 
  - eth: amd-xgbe:
      - fix skb data length underflow
      - ensure reset of the tx_timer_active flag, avoid Tx timeouts
 
  - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()
 
  - eth: e1000e: handshake with CSME starts from Alder Lake platforms
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmH8X9UACgkQMUZtbf5S
 IrsxuhAAlAvFHGL6y5Y2gAmhKvVUvCYjiIJBcvk7R66CwYVRxofvlhmxi6GM/Czs
 9SrVSaN4RXu3p3d7UtAl1gAQwHqzLIHH3m2g5dSKVvHZWQgkm/+n74x0aZQ9Fll7
 mWs9uu5fWsQr/qZBnnjoQTvUxRUNVd4trBy7nXGzkNqJL5j0+2TT4BhH4qalhE28
 iPc9YFCyKPdjoWFksteZqD3hAQbXxK/xRRr6xuvFHENlZdEHM6ARftHnJthTG/fY
 32rdn9YUkQ9lNtOBJNMN9yP2z1B7TcxASBqjjk55I7XtT1QAI9/PskszavHC0hOk
 BCSMX779bLNW4+G0wiSKVB4tq4tvswtawq8Hxa6zdU4TKIzfQ84ZL/Nf66GtH+4W
 C0mbZohmyJV9hQFkNT0ZLeihljd7i4BkDttlbK3uz2IL9tHeX3uSo5V7AgS/Xaf6
 frXgbGgjQTaR6IL9AUhfN3GTCx60mzpH/aRpFho8A5xAl3EtHWCJcRhbY/CEhQBR
 zyCndcLcG5mUzbhx/TxlKrrpRCLxqCUG/Tsb2wCh5jMxO1zonW9Hhv4P1ie6EFuI
 h+XiJT2WWObS/KTze9S86WOR0zcqrtRqaOGJlNB+/+K8ClZU8UsDTFXLQ0dqpVZF
 Mvp7VchBzyFFJrrvO8WkkJgLTKdaPJmM9wuWUZb4J6d2MWlmDkE=
 =qKvf
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter, and ieee802154.

  Current release - regressions:

   - Partially revert "net/smc: Add netlink net namespace support", fix
     uABI breakage

   - netfilter:
      - nft_ct: fix use after free when attaching zone template
      - nft_byteorder: track register operations

  Previous releases - regressions:

   - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback

   - phy: qca8081: fix speeds lower than 2.5Gb/s

   - sched: fix use-after-free in tc_new_tfilter()

  Previous releases - always broken:

   - tcp: fix mem under-charging with zerocopy sendmsg()

   - tcp: add missing tcp_skb_can_collapse() test in
     tcp_shift_skb_data()

   - neigh: do not trigger immediate probes on NUD_FAILED from
     neigh_managed_work, avoid a deadlock

   - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
     false-positives

   - netfilter: nft_reject_bridge: fix for missing reply from prerouting

   - smc: forward wakeup to smc socket waitqueue after fallback

   - ieee802154:
      - return meaningful error codes from the netlink helpers
      - mcr20a: fix lifs/sifs periods
      - at86rf230, ca8210: stop leaking skbs on error paths

   - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent

   - ax25: add refcount in ax25_dev to avoid UAF bugs

   - eth: mlx5e:
      - fix SFP module EEPROM query
      - fix broken SKB allocation in HW-GRO
      - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows

   - eth: amd-xgbe:
      - fix skb data length underflow
      - ensure reset of the tx_timer_active flag, avoid Tx timeouts

   - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()

   - eth: e1000e: handshake with CSME starts from Alder Lake platforms"

* tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  ax25: fix reference count leaks of ax25_dev
  net: stmmac: ensure PTP time register reads are consistent
  net: ipa: request IPA register values be retained
  dt-bindings: net: qcom,ipa: add optional qcom,qmp property
  tools/resolve_btfids: Do not print any commands when building silently
  bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
  net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
  tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
  net: sparx5: do not refer to skb after passing it on
  Partially revert "net/smc: Add netlink net namespace support"
  net/mlx5e: Avoid field-overflowing memcpy()
  net/mlx5e: Use struct_group() for memcpy() region
  net/mlx5e: Avoid implicit modify hdr for decap drop rule
  net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
  net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
  net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
  net/mlx5: E-Switch, Fix uninitialized variable modact
  net/mlx5e: Fix handling of wrong devices during bond netevent
  net/mlx5e: Fix broken SKB allocation in HW-GRO
  net/mlx5e: Fix wrong calculation of header index in HW_GRO
  ...
2022-02-03 16:54:18 -08:00
Linus Torvalds
551007a8f1 selinux/stable-5.17 PR 20220203
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmH8VkAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNyLQ/9GAsvvB7PYeUpj0CGLMaAT9Hys5l+
 WPjP+NU+HF+r+AUsSCJwKkK4yKnpEDK9nidOdkwiYjAO/83yl7kkBPGRgisQep1A
 tEbuJ5vqZnR59jLxNKCmQE0gY+gByjk3jZIVFLSWwG/ho7s1LQyNoYpm7rbFIgAz
 6qe7IR1nsATzxRhDoJI3RIPlQjzhM1qEX9PBEtwW+LLieShtvMc+ijdiUw7bqNl9
 RTM6hRf4fTX4jLHtxfZYZ99bHEjIseksFbSAnjKxxkt0W5EFha73VX8hjwnG24J/
 XZQAhsyvpQmcZKJGZPWUSa+UFcytoauMnNdgJOQw7TcMT4Y2mMuvcoZ/KkFtDjdr
 30qhp46/gml2yqnByXRfzshGQm9E4ZoqSCn+lFWAfjlrhcqdgZFKpILpwMixbdin
 NgTA/pbwXovrlho8UflB0sbDMrbyV3qNGZXD/4hRg66Vm3F7ipgqPBbM89qoDniG
 CXiQnmRQ+rwcftyeE7me7+kD6djYTWOfEY5HRNiCf9NhnQG8GP7YzZ4KACxJ2PwQ
 R9+Egc9nAl4UG6PrEjZeud81rLzc+ws2SJLokxOcIGnid8lZidf83HfWekAmRloA
 J5+tmpx5q26ug/j2uXV/rp36xaQWhjJrrnhEKamIYYAVioXa9srRhtz3qRI8r/13
 mrZ5hu4le8aC/5s=
 =AyIM
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "One small SELinux patch to ensure that a policy structure field is
  properly reset after freeing so that we don't inadvertently do a
  double-free on certain error conditions"

* tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix double free of cond_list on error paths
2022-02-03 16:44:12 -08:00
Linus Torvalds
25b20ae815 linux-kselftest-fixes-5.17-rc3
This Kselftest fixes update for Linux 5.17-rc3 consists of important
 fixes to several tests and documentation clarification on running
 mainline kselftest on stable releases. A few notable fixes:
 
 - fix kselftest run hang due to child processes that haven't been
   terminated. Fix signals all child processes
 - fix false pass/fail results from vdso_test_abi, openat2, mincore
 - build failures when using -j (multiple jobs) option
 - exec test build failure due to incorrect build rule for a run-time
   created "pipe"
 - zram test fixes related to interaction with zram-generator to
   make sure zram test to coordinate deleted with zram-generator
 - zram test compression ratio calculation fix and skipping
   max_comp_streams.
 - increasing rtc test timeout
 - cpufreq test to write test results to stdout which will necessary on
   automated test systems
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmH8OcAACgkQCwJExA0N
 QxzTkg//YF+iSc1aao8nmUOvsK6oc1RBwIj3hkLUHjP3H1qFkm9OYxzTcLYGcnyo
 JahxNjeVoDuVESYx/AyLZ568aCCxRXJEDzNm5eIVNfBrGtVTfFPM19HwC/R3I1Ew
 KJUruxRx++8AvI1RYMEzsDumKpLVe3bor7sj3CcO1E9/qkOoUAukxt7FVmSNMZlW
 qYCDgc3yBa/XrImHCbJdZc4CUhbmh+l05sZgG3V3fxQSlgfIClY0Qg8W7Ucu+r4S
 6W5nwoEJIG32Zl2avaZ2VTF4T+CTQB70g/n4OBEX8TAxuIIi9W12N6zMZ76q8qbp
 iRs7UqgUSqPWdz/3ZHiQ5gy0WsCJ/W1379TtiG0doEeU2vwZ6fR8NMn+2FrEH18W
 xHBPOWeN+PkVAFjUeoyt1c5OGNprK6EEE2kQ6CLoBTwlKWLDQ87ZNPf84uAsez1x
 G0m7AX/T5adeTLoZfEXNXVY4OROs0nxbAkGC5ghVtKQu1giMcUKj+KHUowgj5OIJ
 Zaj+uSPiN3hnwj5L2fk+orOEC+3bZxVoQzqSB2Bs6stQOQFZLP18xHIjQIDZoQuC
 O512ZY+dwMzSyTi2KoQmb/M0Ft3gSfhRVXc7gfEFfOvC3ZbqRGFfeES1ZILup3rZ
 izMTJBDOe+BGSm/GCFPHxu36YfdPqiAyHBTVSYy5EhLGpxafZvI=
 =rH2m
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "Important fixes to several tests and documentation clarification on
  running mainline kselftest on stable releases. A few notable fixes:

   - fix kselftest run hang due to child processes that haven't been
     terminated. Fix signals all child processes

   - fix false pass/fail results from vdso_test_abi, openat2, mincore

   - build failures when using -j (multiple jobs) option

   - exec test build failure due to incorrect build rule for a run-time
     created "pipe"

   - zram test fixes related to interaction with zram-generator to make
     sure zram test to coordinate deleted with zram-generator

   - zram test compression ratio calculation fix and skipping
     max_comp_streams.

   - increasing rtc test timeout

   - cpufreq test to write test results to stdout which will necessary
     on automated test systems"

* tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest: Fix vdso_test_abi return status
  selftests: skip mincore.check_file_mmap when fs lacks needed support
  selftests: openat2: Skip testcases that fail with EOPNOTSUPP
  selftests: openat2: Add missing dependency in Makefile
  selftests: openat2: Print also errno in failure messages
  selftests: futex: Use variable MAKE instead of make
  selftests/exec: Remove pipe from TEST_GEN_FILES
  selftests/zram: Adapt the situation that /dev/zram0 is being used
  selftests/zram01.sh: Fix compression ratio calculation
  selftests/zram: Skip max_comp_streams interface on newer kernel
  docs/kselftest: clarify running mainline tests on stables
  kselftest: signal all child processes
  selftests: cpufreq: Write test output to stdout as well
  selftests: rtc: Increase test timeout so that all tests run
2022-02-03 16:36:26 -08:00
Duoming Zhou
87563a043c ax25: fix reference count leaks of ax25_dev
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev
to avoid UAF bugs") introduces refcount into ax25_dev, but there
are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().

This patch uses ax25_dev_put() and adjusts the position of
ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.

Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 14:20:36 -08:00
Yannick Vignon
80d4609008 net: stmmac: ensure PTP time register reads are consistent
Even if protected from preemption and interrupts, a small time window
remains when the 2 register reads could return inconsistent values,
each time the "seconds" register changes. This could lead to an about
1-second error in the reported time.

Add logic to ensure the "seconds" and "nanoseconds" values are consistent.

Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 13:54:19 -08:00
Jakub Kicinski
77b1b8b43e Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-02-03

We've added 6 non-merge commits during the last 10 day(s) which contain
a total of 7 files changed, 11 insertions(+), 236 deletions(-).

The main changes are:

1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC
   flag which otherwise trips over KASAN, from Hou Tao.

2) Fix unresolved symbol warning in resolve_btfids due to LSM callback
   rename, from Alexei Starovoitov.

3) Fix a possible race in inc_misses_counter() when IRQ would trigger
   during counter update, from He Fengqing.

4) Fix tooling infra for cross-building with clang upon probing whether
   gcc provides the standard libraries, from Jean-Philippe Brucker.

5) Fix silent mode build for resolve_btfids, from Nathan Chancellor.

6) Drop unneeded and outdated lirc.h header copy from tooling infra as
   BPF does not require it anymore, from Sean Young.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  tools/resolve_btfids: Do not print any commands when building silently
  bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
  tools: Ignore errors from `which' when searching a GCC toolchain
  tools headers UAPI: remove stale lirc.h
  bpf: Fix possible race in inc_misses_counter
  bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
====================

Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 13:42:38 -08:00
Mickaël Salaün
1f2cfdd349 printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()
The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to
kernel/printk/sysctl.c introduced an incorrect __user attribute to the
buffer argument.  I spotted this change in [1] as well as the kernel
test robot.  Revert this change to please sparse:

  kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces)
  kernel/printk/sysctl.c:20:51:    expected void *
  kernel/printk/sysctl.c:20:51:    got void [noderef] __user *buffer

Fixes: faaa357a55e0 ("printk: move printk sysctl to printk/sysctl.c")
Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1]
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 11:27:38 -08:00
Igor Pylypiv
67d6212afd Revert "module, async: async_synchronize_full() on module init iff async is used"
This reverts commit 774a1221e862b343388347bac9b318767336b20b.

We need to finish all async code before the module init sequence is
done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule().  Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked.  This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread.  As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();

Commit 21c3c5d28007 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e862
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.

Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 11:20:34 -08:00
Linus Torvalds
305e6c42e8 Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Eric's fix for a long standing cgroup1 permission issue where it only
   checks for uid 0 instead of CAP which inadvertently allows
   unprivileged userns roots to modify release_agent userhelper

 - Fixes for the fallout from Waiman's recent cpuset work

* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
  cgroup-v1: Require capabilities to set release_agent
  cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
  cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
2022-02-03 08:15:13 -08:00
Jakub Kicinski
0166556a12 Merge branch 'net-ipa-enable-register-retention'
Alex Elder says:

====================
net: ipa: enable register retention

With runtime power management in place, we sometimes need to issue
a command to enable retention of IPA register values before power
collapse.  This requires a new Device Tree property, whose presence
will also be used to signal that the command is required.
====================

Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 08:04:16 -08:00
Alex Elder
34a081761e net: ipa: request IPA register values be retained
In some cases, the IPA hardware needs to request the always-on
subsystem (AOSS) to coordinate with the IPA microcontroller to
retain IPA register values at power collapse.  This is done by
issuing a QMP request to the AOSS microcontroller.  A similar
request ondoes that request.

We must get and hold the "QMP" handle early, because we might get
back EPROBE_DEFER for that.  But the actual request should be sent
while we know the IPA clock is active, and when we know the
microcontroller is operational.

Fixes: 1aac309d3207 ("net: ipa: use autosuspend")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 08:03:43 -08:00
Alex Elder
ac62a0174d dt-bindings: net: qcom,ipa: add optional qcom,qmp property
For some systems, the IPA driver must make a request to ensure that
its registers are retained across power collapse of the IPA hardware.
On such systems, we'll use the existence of the "qcom,qmp" property
as a signal that this request is required.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 08:03:20 -08:00
Waiman Long
2bdfd2825c cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks().  It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.

Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.

Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-03 05:59:01 -10:00
Nathan Chancellor
7f3bdbc3f1 tools/resolve_btfids: Do not print any commands when building silently
When building with 'make -s', there is some output from resolve_btfids:

$ make -sj"$(nproc)" oldconfig prepare
  MKDIR     .../tools/bpf/resolve_btfids/libbpf/
  MKDIR     .../tools/bpf/resolve_btfids//libsubcmd
  LINK     resolve_btfids

Silent mode means that no information should be emitted about what is
currently being done. Use the $(silent) variable from Makefile.include
to avoid defining the msg macro so that there is no information printed.

Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
2022-02-03 16:28:49 +01:00
John Hubbard
c36c04c2e1 Revert "mm/gup: small refactoring: simplify try_grab_page()"
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a

That commit did a refactoring that effectively combined fast and slow
gup paths (again).  And that was again incorrect, for two reasons:

 a) Fast gup and slow gup get reference counts on pages in different
    ways and with different goals: see Linus' writeup in commit
    cd1adf1b63a1 ("Revert "mm/gup: remove try_get_page(), call
    try_get_compound_head() directly""), and

 b) try_grab_compound_head() also has a specific check for
    "FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller
    can fall back to slow gup. This resulted in new failures, as
    recently report by Will McVicker [1].

But (a) has problems too, even though they may not have been reported
yet.  So just revert this.

Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1]
Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
Reported-and-tested-by: Will McVicker <willmcvicker@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Minchan Kim <minchan@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 06:51:42 -08:00
Linus Torvalds
d394bb77dd - fix missed change for PTR->PTR_WD conversion
- kernel-doc fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmH7l90aHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHANHQ//Z2utUjKa81s21yzs0SIy
 352GKbwGyyTQ/HfgtYwOR7j1tR2P3J00bUT5/2cH/l00PTZxGbAYxWtRkYCHYCXB
 GWd3JXIatJlmteu5xPX2bWBsFj5zgvhEC1VNo0vGGRzTGWVVa3kBxWIGUH0Hk7Kz
 rmiSVdmv579Ib39dPRE8CMYadtoX7QAUYicWeTuhdScWgALqccJY/GUSDOKU1fpy
 FghFjSxPDJQP4oogIB41fEDFYeR0GdIR5C5zvKBQ39K3vAfzpB4hauVVnCYLDTU/
 OoNYqDrdroQk7g5YgqwztyLDfPtivF0v04EZZs3lM5+O4S5kKoSmgrlxnYXpzkFe
 cBdlwW4auLxMv6cYSK/MYClZRtLPKQkKTkMvVcC0FQuCcAzq2+dE3R2YFkTECXjh
 Lbf5JNB2445eS6skzmMBcxutkRJQaFIZ4B9eMGraNnXaHNFGxERSYzjoiIF8ndNZ
 yGlm8U/opPyBhpbdIyTWvikR/z2jA2VKIN8jNwC4tvk+PmgURZ3jDTahxrS4kuns
 qF1H8yTXhU05v93qMoeOxnTUKHfzB/Wo3FXQWAg5ovoj/+SEMnzAt6yJCJfYOQ7x
 A8ujSSMssuBFgxOFYt2h6yBts8MSvC6edhEaJoezFOrA9uyaDkAkK3Hu6SQn8IyR
 H9/9R12X+39fLbRoPOXgyCM=
 =0NAK
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fix missed change for PTR->PTR_WD conversion

 - kernel-doc fixes

* tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: KVM: fix vz.c kernel-doc notation
  MIPS: octeon: Fix missed PTR->PTR_WD conversion
2022-02-03 06:45:34 -08:00
David S. Miller
9c30918925 Merge branch 'dsa-mv88e6xxx-phylink_generic_validate'
Russell King says:

====================
net: dsa: mv88e6xxx: convert to phylink_generic_validate()

The overall objective of this series is to convert the mv88e6xxx DSA
driver to use phylink_generic_validate().

Patch 1 adds a new helper mv88e6352_g2_scratch_port_has_serdes() which
indicates whether an 88e6352 port has a serdes associated with it. This
is necessary as ports 4 and 5 will normally be in automedia mode, where
the CMODE field in the port status register will change e.g. between 15
(internal PHY) and 9 (1000base-X) depending on whether the serdes has
link.

The existing code caches the cmode field, and depending whether the
serdes has link at probe time, determines whether we allow things such
as the serdes statistics to be accessed. This means if the link isn't
up at probe time, the serdes is essentially unavailable.

Patch 1 addresses this by reading the pin configuration to find out
whether the serdes is attached to port 4 or port 5.

Patch 2 is a joint effort between myself and Marek Behún, adding the
supported interfaces and MAC capabilities to all mv88e6xxx supported
switch devices. This is slightly more restrictive than the original
code as we didn't used to care too much about the interface mode, but
with this we do - which is why we must know if there's a serdes
associated now.

Patch 3 switches mv88e6xxx to use the generic validation by removing
the initialisation of the phylink_validate pointer in the dsa_ops
struct.

Patch 4 updates the statistics code to use the new helper in patch 1,
so the serdes statistics are available even if the link was down at
driver probe time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:10:35 +00:00
Russell King (Oracle)
7f7d32bc26 net: dsa: mv88e6xxx: improve 88e6352 serdes statistics detection
The decision whether to report serdes statistics currently depends on
the cached C_Mode value for the port, read at probe time or updated by
configuration. However, port 4 can be in "automedia" mode when it is
used as a serdes port, meaning it switches between the internal PHY and
the serdes, changing the read-only C_Mode value depending on which
first gains link. Consequently, the C_Mode value read at probe does not
accurately reflect whether the port has the serdes associated with it.

In "net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()",
we added a way to read the hardware configuration to determine which
port has the serdes associated with it. Use this to determine which
port reports the serdes statistics.

Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:10:35 +00:00
Russell King (Oracle)
2ee84cfefb net: dsa: mv88e6xxx: convert to phylink_generic_validate()
Now that the mv88e6xxx chip drivers are supplying the supported
interfaces and MAC capabilities, switch the driver to use the generic
phylink validation implementation by removing our own validation
implementations. This causes DSA to call phylink_generic_validate()
on our behalf.

Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:10:35 +00:00
Russell King (Oracle)
d4ebf12bce net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilities
Populate the supported interfaces and MAC capabilities for the
Marvell MV88E6xxx DSA switches in preparation to using these for the
validation functionality.

Patch co-authored by Marek.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org> [ fixed 6341 and 6393x ]
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:10:35 +00:00
Russell King (Oracle)
62001548a6 net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()
Read the hardware configuration to determine which port is attached
to the serdes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:10:35 +00:00
David S. Miller
0947644332 Merge branch 'dsa-mv88e6xxx-port-isolation'
Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: Improve standalone port isolation

The ideal isolation between standalone ports satisfies two properties:
1. Packets from one standalone port must not be forwarded to any other
   port.
2. Packets from a standalone port must be sent to the CPU port.

mv88e6xxx solves (1) by isolating standalone ports using the PVT. Up
to this point though, (2) has not guaranteed; as the ATU is still
consulted, there is a chance that incoming packets never reach the CPU
if its DA has previously been used as the SA of an earlier packet (see
1/5 for more details). This is typically not a problem, except for one
very useful setup in which switch ports are looped in order to run the
bridge kselftests in tools/testing/selftests/net/forwarding. This
series attempts to solve (2).

Ideally, we could simply use the "ForceMap" bit of more modern chips
(Agate and newer) to classify all incoming packets as MGMT. This is
not available on older silicon that is still widely used (Opal Plus
chips like the 6097 for example).

Instead, this series takes a two pronged approach:

1/5: Always clear MapDA on standalone ports to make sure that no ATU
     entry can lead packets astray. This solves (2) for single-chip
     systems.

2/5: Trivial prep work for 4/5.
3/5: Trivial prep work for 4/5.

4/5: On multi-chip systems though, this is not enough. On the incoming
     chip, the packet will be forced out towards the CPU thanks to
     1/5, but on any intermediate chips the ATU is still consulted. We
     override this behavior by marking the reserved standalone VID (0)
     as a policy VID, the DSA ports' VID policy is set to TRAP. This
     will cause the packet to be reclassified as MGMT on the first
     intermediate chip, after which it's a straight shot towards the
     CPU.

Finally, we allow more tests to be run on mv88e6xxx:

5/5: The bridge_vlan{,un}aware suites sets an ageing_time of 10s on
     the bridge it creates, but mv88e6xxx has a minimum supported time
     of 15s. Allow this time to be overridden in forwarding.config.

With this series in place, mv88e6xxx passes the following kselftest
suites:

- bridge_port_isolation.sh
- bridge_sticky_fdb.sh
- bridge_vlan_aware.sh
- bridge_vlan_unaware.sh

v1 -> v2:
  - Wording/spelling (Vladimir)
  - Use standard iterator in dsa_switch_upstream_port (Vladimir)
  - Limit enabling of VTU port policy to downstream DSA ports (Vladimir)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:57 +00:00
Tobias Waldekranz
0811975917 selftests: net: bridge: Parameterize ageing timeout
Allow the ageing timeout that is set on bridges to be customized from
forwarding.config. This allows the tests to be run on hardware which
does not support a 10s timeout (e.g. mv88e6xxx).

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:56 +00:00
Tobias Waldekranz
d352b20f41 net: dsa: mv88e6xxx: Improve multichip isolation of standalone ports
Given that standalone ports are now configured to bypass the ATU and
forward all frames towards the upstream port, extend the ATU bypass to
multichip systems.

Load VID 0 (standalone) into the VTU with the policy bit set. Since
VID 4095 (bridged) is already loaded, we now know that all VIDs in use
are always available in all VTUs. Therefore, we can safely enable
802.1Q on DSA ports.

Setting the DSA ports' VTU policy to TRAP means that all incoming
frames on VID 0 will be classified as MGMT - as a result, the ATU is
bypassed on all subsequent switches.

With this isolation in place, we are able to support configurations
that are simultaneously very quirky and very useful. Quirky because it
involves looping cables between local switchports like in this
example:

   CPU
    |     .------.
.---0---. | .----0----.
|  sw0  | | |   sw1   |
'-1-2-3-' | '-1-2-3-4-'
  $ @ '---'   $ @ % %

We have three physically looped pairs ($, @, and %).

This is very useful because it allows us to run the kernel's
kselftests for the bridge on mv88e6xxx hardware.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:56 +00:00
Tobias Waldekranz
585d42bb57 net: dsa: mv88e6xxx: Enable port policy support on 6097
This chip has support for the same per-port policy actions found in
later versions of LinkStreet devices.

Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:56 +00:00
Tobias Waldekranz
bb03b280e0 net: dsa: mv88e6xxx: Support policy entries in the VTU
A VTU entry with policy enabled is used in combination with a port's
VTU policy setting to override normal switching behavior for frames
assigned to the entry's VID.

A typical example is to Treat all frames in a particular VLAN as
control traffic, and trap them to the CPU. In which case the relevant
user port's VTU policy would be set to TRAP.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:56 +00:00
Tobias Waldekranz
7af4a361a6 net: dsa: mv88e6xxx: Improve isolation of standalone ports
Clear MapDA on standalone ports to bypass any ATU lookup that might
point the packet in the wrong direction. This means that all packets
are flooded using the PVT config. So make sure that standalone ports
are only allowed to communicate with the local upstream port.

Here is a scenario in which this is needed:

   CPU
    |     .----.
.---0---. | .--0--.
|  sw0  | | | sw1 |
'-1-2-3-' | '-1-2-'
      '---'

- sw0p1 and sw1p1 are bridged
- sw0p2 and sw1p2 are in standalone mode
- Learning must be enabled on sw0p3 in order for hardware forwarding
  to work properly between bridged ports

1. A packet with SA :aa comes in on sw1p2
   1a. Egresses sw1p0
   1b. Ingresses sw0p3, ATU adds an entry for :aa towards port 3
   1c. Egresses sw0p0

2. A packet with DA :aa comes in on sw0p2
   2a. If an ATU lookup is done at this point, the packet will be
       incorrectly forwarded towards sw0p3. With this change in place,
       the ATU is bypassed and the packet is forwarded in accordance
       with the PVT, which only contains the CPU port.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:05:56 +00:00
David S. Miller
b566967c3c Merge branch 'ptp-virtual-clock-improvements'
Miroslav Lichvar says:

====================
Virtual PTP clock improvements and fix

v2:
- dropped patch changing initial time of virtual clocks

The first patch fixes an oops when unloading a driver with PTP clock and
enabled virtual clocks.

The other patches add missing features to make synchronization with
virtual clocks work as well as with the physical clock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:00:58 +00:00
Miroslav Lichvar
21fad63084 ptp: add getcrosststamp() to virtual clocks.
If the physical clock supports cross timestamping (it has the
getcrosststamp() function), provide a wrapper in the virtual clock to
enable cross timestamping.

This adds support for the PTP_SYS_OFFSET_PRECISE ioctl.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:00:58 +00:00
Miroslav Lichvar
f0067ebfc4 ptp: add gettimex64() to virtual clocks.
If the physical clock has the gettimex64() function, provide a
gettimex64() wrapper in the virtual clock to enable more accurate
and stable synchronization.

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:00:57 +00:00
Miroslav Lichvar
f77222d693 ptp: increase maximum adjustment of virtual clocks.
Increase the maximum frequency offset of virtual clocks to 50% to enable
faster slewing corrections.

This value cannot be represented as scaled ppm when long has 32 bits,
but that is already the case for other drivers, even those that provide
the adjfine() function, i.e. 32-bit applications are expected to check
for the limit.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:00:57 +00:00
Miroslav Lichvar
bfcbb76b0f ptp: unregister virtual clocks when unregistering physical clock.
When unregistering a physical clock which has some virtual clocks,
unregister the virtual clocks with it.

This fixes the following oops, which can be triggered by unloading
a driver providing a PTP clock when it has enabled virtual clocks:

BUG: unable to handle page fault for address: ffffffffc04fc4d8
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:ptp_vclock_read+0x31/0xb0
Call Trace:
 timecounter_read+0xf/0x50
 ptp_vclock_refresh+0x2c/0x50
 ? ptp_clock_release+0x40/0x40
 ptp_aux_kworker+0x17/0x30
 kthread_worker_fn+0x9b/0x240
 ? kthread_should_park+0x30/0x30
 kthread+0xe2/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30

Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 14:00:57 +00:00
Alexander Duyck
52cc6ffc0a page_pool: Refactor page_pool to enable fragmenting after allocation
This change is meant to permit a driver to perform "fragmenting" of the
page from within the driver instead of the current model which requires
pre-partitioning the page. The main motivation behind this is to support
use cases where the page will be split up by the driver after DMA instead
of before.

With this change it becomes possible to start using page pool to replace
some of the existing use cases where multiple references were being used
for a single page, but the number needed was unknown as the size could be
dynamic.

For example, with this code it would be possible to do something like
the following to handle allocation:
  page = page_pool_alloc_pages();
  if (!page)
    return NULL;
  page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
  rx_buf->page = page;
  rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;

Then we would process a received buffer by handling it with:
  rx_buf->pagecnt_bias--;

Once the page has been fully consumed we could then flush the remaining
instances with:
  if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
    continue;
  page_pool_put_defragged_page(pool, page -1, !!budget);

The general idea is that we want to have the ability to allocate a page
with excess fragment count and then trim off the unneeded fragments.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:49:03 +00:00
David S. Miller
33f7a32dd4 Merge branch 'dsa-phylink_generic_validate'
Russell King says:

====================
Trivial DSA conversions to phylink_generic_validate()

This series converts five DSA drivers to use phylink_generic_validate().
No feedback or testing reports were received from the CFT posting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:07 +00:00
Russell King (Oracle)
1f8d99de1d net: dsa: xrs700x: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the xrs700x
family of DSA switches and remove the old validate implementation to
allow DSA to use phylink_generic_validate() for this switch driver.

According to commit ee00b24f32eb ("net: dsa: add Arrow SpeedChips
XRS700x driver") the switch supports one RMII port and up to three
RGMII ports. This commit assumes that port 0 is the RMII port and the
remainder are RGMII.

This commit also results in the Autoneg bit being set in the ethtool
link modes, which wasn't in the original; if this switch supports
RGMII to a 10/100/1G PHY, then surely we want to allow Autoneg on the
PHY.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:06 +00:00
Russell King (Oracle)
9865b881a5 net: dsa: qca8k: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the QCA8K
DSA switch and remove the old validate implementation to allow DSA to
use phylink_generic_validate() for this switch driver.

In making this change, we bring consistency to the ethtool linkmodes
that phylink's validate step produces, thereby following the expected
behaviour as the phylink documentation has explained. Specifically, the
ethtool 1000baseX_Full capability is now permitted for all interface
modes, as it is a property of the PHY driver whether 1000baseX fiber
connections can be supported.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:06 +00:00
Russell King (Oracle)
82fdbb9174 net: dsa: ksz8795: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the
Microchip KSZ8795 DSA switch and remove the old validate implementation
to allow DSA to use phylink_generic_validate() for this switch driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:06 +00:00
Russell King (Oracle)
927c9daea9 net: dsa: bcm_sf2: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the bcm_sf2
DSA switch and remove the old validate implementation to allow DSA to
use phylink_generic_validate() for this switch driver.

The exclusion of Gigabit linkmodes for MII and Reverse MII links is
handled within phylink_generic_validate() in phylink, so there is no
need to make them conditional on the interface mode in the driver.

Thanks to Florian Fainelli for suggesting how to populate the supported
interfaces.

Link: https://lore.kernel.org/r/3b3fed98-0c82-99e9-dc72-09fe01c2bcf3@gmail.com
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:06 +00:00
Russell King (Oracle)
2a229ef44e net: dsa: ar9331: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the AR9331
DSA switch and remove the old validate implementation to allow DSA to
use phylink_generic_validate() for this switch driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:47:06 +00:00
David S. Miller
83a18b8e69 Merge branch 'mptcp-next'
Mat Martineau says:

====================
mptcp: Miscellaneous changes for 5.18

Patch 1 has some minor cleanup in mptcp_write_options().

Patch 2 moves a rarely-needed branch to optimize mptcp_write_options().

Patch 3 adds a comment explaining which combinations of MPTCP option
headers are expected.

Patch 4 adds a pr_debug() for the MPTCP_RST option.

Patches 5-7 allow setting MPTCP_PM_ADDR_FLAG_FULLMESH with the "set
flags" netlink command. This allows changing the behavior of existing
path manager endpoints. The flag was previously only set at endpoint
creation time. Associated selftests also updated.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03 11:44:08 +00:00