Commit Graph

1234464 Commits

Author SHA1 Message Date
4aa1d8f89b octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
It is possible to add a ntuple rule which would like to direct packet to
a VF whose number of queues are greater/less than its PF's queue numbers.
For example a PF can have 2 Rx queues but a VF created on that PF can have
8 Rx queues. As of today, ntuple rule will reject rule because it is
checking the requested queue number against PF's number of Rx queues.
As a part of this fix if the action of a ntuple rule is to move a packet
to a VF's queue then the check is removed. Also, a debug information is
printed to aware user that it is user's responsibility to cross check if
the requested queue number on that VF is a valid one.

Fixes: f0a1913f8a ("octeontx2-pf: Add support for ethtool ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231121165624.3664182-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:55:32 +01:00
7490a42020 Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-add-v4h-support'
Niklas Söderlund says:

====================
net: ethernet: renesas: rcar_gen4_ptp: Add V4H support

This small series prepares the rcar_gen4_ptp to be useable both on both
R-Car S4 and V4H. The only in-tree driver that make use of this is
rswtich on S4. A new Ethernet (R-Car Ethernet TSN) driver for V4H is on
it's way that also will make use of rcar_gen4_ptp functionality.

Patch 1-2 are small improvements to the existing driver. While patch 3-4
adds V4H support. Finally patch 5 turns rcar_gen4_ptp into a separate
module to allow the gPTP functionality to be shared between the two
users without having to duplicate the code in each.

See each patch for changelog.
====================

Link: https://lore.kernel.org/r/20231121155306.515446-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:05:44 +01:00
8c1c66235e net: ethernet: renesas: rcar_gen4_ptp: Break out to module
The Gen4 gPTP support will be shared between the existing Renesas
Ethernet Switch driver and the upcoming Renesas Ethernet-TSN driver. In
preparation for this break out the gPTP support to its own module.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:02:49 +01:00
be5f81d37f net: ethernet: renesas: rcar_gen4_ptp: Get clock increment from clock rate
Instead of using hard coded clock increment values for each SoC derive
the clock increment from the module clock. This is done in preparation
to support a second platform, R-Car V4H that uses a 200Mhz clock
compared with the 320Mhz clock used on R-Car S4.

Tested on both SoCs,

S4 reports a clock of 320000000Hz which gives a value of 0x19000000.
Documentation says a 320Mhz clock is used and the correct increment for
that clock is 0x19000000.

V4H reports a clock of 199999992Hz which gives a value of 0x2800001a.
Documentation says a 200Mhz clock is used and the correct increment for
that clock is 0x28000000.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:02:49 +01:00
46c361a046 net: ethernet: renesas: rcar_gen4_ptp: Prepare for shared register layout
All known R-Car Gen4 SoC share the same register layout, rename the
R-Car S4 specific identifiers so they can be shared with the upcoming
R-Car V4H support.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:02:49 +01:00
9f3995707e net: ethernet: renesas: rcar_gen4_ptp: Fail on unknown register layout
Instead of printing a warning and proceeding with an unknown register
layout return an error. The only call site is already prepared to
propagate the error.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:02:49 +01:00
d73dcff9eb net: ethernet: renesas: rcar_gen4_ptp: Remove incorrect comment
The comments intent was to indicates which function uses the enum. While
upstreaming rcar_gen4_ptp the function was renamed but this comment was
left with the old function name.

Instead of correcting the comment remove it, it adds little value.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 12:02:49 +01:00
99360d9620 net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Interface 4 is used by for QMI interface in stock firmware of MF28D, the
router which uses MF290 modem. Rebind it to qmi_wwan after freeing it up
from option driver.
The proper configuration is:

Interface mapping is:
0: QCDM, 1: (unknown), 2: AT (PCUI), 2: AT (Modem), 4: QMI

T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  4 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=19d2 ProdID=0189 Rev= 0.00
S:  Manufacturer=ZTE, Incorporated
S:  Product=ZTE LTE Technologies MSM
C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=86(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms

Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Link: https://lore.kernel.org/r/20231117231918.100278-3-lech.perczak@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23 09:30:27 +01:00
9b6de136b5 Merge tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Fix several build errors, a potential kernel panic, a cpu hotplug
  issue and update links in documentations"

* tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst
  Docs/LoongArch: Update links in LoongArch introduction.rst
  LoongArch: Implement constant timer shutdown interface
  LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL
  LoongArch: Silence the boot warning about 'nokaslr'
  LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
  LoongArch: Record pc instead of offset in la_abs relocation
  LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
  LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi
2023-11-22 10:20:17 -08:00
05c8c94ed4 Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:

 - One fix for the KVP daemon (Ani Sinha)

 - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)

 - Micro-optimization for hv_nmi_unknown() (Uros Bizjak)

* tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
  x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
  hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22 09:56:26 -08:00
125b0bb95d asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
We really don't want to do atomic_read() or anything like that, since we
already have the value, not the lock.  The whole point of this is that
we've loaded the lock from memory, and we want to check whether the
value we loaded was a locked one or not.

The main use of this is the lockref code, which loads both the lock and
the reference count in one atomic operation, and then works on that
combined value.  With the atomic_read(), the compiler would pointlessly
spill the value to the stack, in order to then be able to read it back
"atomically".

This is the qspinlock version of commit c6f4a90022 ("asm-generic:
ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same
bug for ticket locks.

Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zbRkQ@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-11-22 09:32:49 -08:00
52471877a2 wifi: rtw89: 8922a: read efuse content from physical map
The calibration values of thermal and bias are programmed in invariable
physical map. Read them into driver and will set them to registers later.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-7-pkshih@realtek.com
2023-11-22 17:51:17 +02:00
c7ccb2402e wifi: rtw89: 8922a: read efuse content via efuse map struct from logic map
Define efuse map struct of RTW89_EFUSE_BLOCK_RF block, and read needed
data from efuse logic map into driver. Also, with efuse power-on state,
read MAC address via register interface according to HCI interface.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-6-pkshih@realtek.com
2023-11-22 17:51:16 +02:00
e102ff4b35 wifi: rtw89: 8852c: read RX gain offset from efuse for 6GHz channels
Read calibration values of RX gain offset from efuse, and set them to
registers to normalize RX gain for all hardware modules. Then, PHY dynamic
mechanism can get expected values to adjust hardware parameters to yield
expected performance.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-5-pkshih@realtek.com
2023-11-22 17:51:16 +02:00
f28eab6ae4 wifi: rtw89: mac: add to access efuse for WiFi 7 chips
MAC address, hardware type, calibration values and etc are stored in efuse,
so we read them at probe stage and use them as capabilities to register
hardware.

There are two physical efuse -- one is the main efuse for digital hardware
part, and the other is for analog part. Because they are very similar, we
only describe the main efuse below.

The main efuse is split into two regions -- one is for logic map, and the
other is for physical map. For both regions, we use the same method to read
data, but need additional parser to get logic map. To allow reading
operation, we need to convert power state to active, and turn to idle state
after reading.

For WiFi 7 chips, we introduce efuse blocks to define feature group easier,
and these blocks are discontinue. For example, RF block is from 0x1_0000 ~
0x1_0240, and the next block PCIE_SDIO is starting from 0x2_0000.
Comparing to old one used by WiFi 6 chips, there is only single one logic
map, it would be a little hard to add an new field to a group if we don't
reserve a room in advance.

The relationship between efuse, region and block is shown as below:

                                           (logical map)
 +------------+    +---------------+    +-----------------+
 | main efuse |    |   region 1    |    | block 0x1_0000~ |
 | (digital)  |    |(to logcal map)|    +-----------------+
 |            |    |               | => +-----------------+
 |            | => |               |    | block 0x2_0000~ |
 |            |    |               |    +-----------------+
 |            |    |---------------|             :
 |            |    |    region 2   |
 +------------+    +---------------+

 +------------+                         +-----------------+
 | 2nd efuse  | ======================> | block 0x7_0000~ |
 | (analog)   |                         +-----------------+
 +------------+

The parser converting from raw data to logic map is to decode block page,
block page offset, and word_en bits. Each word_en bit indicates two
following bytes as data of logic map, so total four word_en bits can
represent eight bytes. Thus, block page offset is 8-byte alignment.
The layout of a tuple is shown as below

  +--------+--------+--------+--------+--------+--------+
  | fixed 3 byte header      |        |        |        |
  |                          |        |        |        |
  | [19:17] block_page       |        |        |  ...   |
  | [16:4]  block_page_offset|        |        |        |
  | [3:0]   word_en          |   ^    |    ^   |        |
  +----|---+--------+--------+---|----+----|---+--------+
       |                         |         |
       +-------------------------+---------+
         a word_en bit indicates two bytes as data

For example,
  block_page = 0x3
  block_page_offset = 0x80 (must 8-byte alignment)
  word_en = 0x6 (b'0110; 0 means data is presented)
  following 4 bytes = 34 56 78 90
  Then,
    0x3_0080 = 34 56
    0x3_0086 = 78 90

A special block page is RTW89_EFUSE_BLOCK_ADIE (7) that uses different
but similar format, because its real efuse size is smaller than main efuse.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-4-pkshih@realtek.com
2023-11-22 17:51:16 +02:00
88e6a923bb wifi: rtw89: mac: use mac_gen pointer to access about efuse
Use function pointers to abstract efuse access, and introduce an new
function to convert efuse power state that is needed by WiFi 7 chips.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-3-pkshih@realtek.com
2023-11-22 17:51:16 +02:00
c0a04552e3 wifi: rtw89: 8922a: add 8922A basic chip info
8922A is a 802.11be chip that can support 2/5/6 GHz bands 160MHz bandwidth.
Introduce the basic info such as firmware file name, some hardware address
and size, supported spatial stream, TX descriptor and so on, and then
we can add more attributes by later patches.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231117024029.113845-2-pkshih@realtek.com
2023-11-22 17:51:16 +02:00
f60df12aaa wifi: rtlwifi: drop unused const_amdpci_aspm
Remove the unused "const_amdpci_aspm" member of struct rtl_pci and
struct rtl_ps_ctl.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231116180529.52752-1-helgaas@kernel.org
2023-11-22 17:50:36 +02:00
a85198c9f0 wifi: mwifiex: mwifiex_process_sleep_confirm_resp(): remove unused priv variable
Clang static analyzer complains that value stored to 'priv' is never read.
'priv' is useless, so remove it to save space.

Signed-off-by: Su Hui <suhui@nfschina.com>
Acked-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231115092328.1048103-1-suhui@nfschina.com
2023-11-22 17:50:09 +02:00
c212abfbd1 wifi: rtw89: regd: update regulatory map to R65-R44
Sync Realtek Regulatory R44 and Realtek Channel Plan R65.
Configure 6 GHz field of Realtek regd on SG, TW, GD.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231114091359.50664-4-pkshih@realtek.com
2023-11-22 17:48:00 +02:00
b2774a916a wifi: rtw89: regd: handle policy of 6 GHz according to BIOS
According to BIOS configuration of Realtek ACPI DSM function 4,
RTW89_ACPI_DSM_FUNC_6G_BP, we handle the regd policy of 6 GHz.

Policy defines two modes as below.
1. `BLOCK` mode:
    The countries in configured list are blocked.
2. `ALLOW` mode:
    _Only_ the countries in configured list are allowed.
    (i.e. others are all blocked.)

Then, when receiving regulatory notification at runtime, if 6 GHz
is blocked on the country, 6 GHz channels will be disabled.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231114091359.50664-3-pkshih@realtek.com
2023-11-22 17:48:00 +02:00
665ecff7dd wifi: rtw89: acpi: process 6 GHz band policy from DSM
Realtek ACPI DSM func 4, RTW89_ACPI_DSM_FUNC_6G_BP,
accepts a configuration via ACPI buffer as below.

| index | description   |
-------------------------
| [0-2] | signature     |
| [3]   | reserved      |
| [4]   | policy mode   |
| [5]   | country count |
| [6-]  | country list  |

Through this function, BIOS can indicate to allow/block
6 GHz on some specific countries. Still, driver should
follow regd first before taking this configuration into
account.

Besides, add a bit in debug mask for ACPI.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231114091359.50664-2-pkshih@realtek.com
2023-11-22 17:48:00 +02:00
2c4e9acbe3 wifi: rtlwifi: simplify rtl_action_proc() and rtl_tx_agg_start()
Since 'drv_priv' is an in-place member allocated at the end of
'struct ieee80211_sta', it can't be NULL and so relevant checks
in 'rtl_action_proc()' and 'rtl_tx_agg_start()' may be dropped.
Compile tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231113144734.197359-2-dmantipov@yandex.ru
2023-11-22 16:57:39 +02:00
6a26310273 Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
This reverts commit efa5f1311c.

I couldn't reproduce the reported issue. What I did, based on a pcap
packet log provided by the reporter:
- Used same chip version (RTL8168h)
- Set MAC address to the one used on the reporters system
- Replayed the EAPOL unicast packet that, according to the reporter,
  was filtered out by the mc filter.
The packet was properly received.

Therefore the root cause of the reported issue seems to be somewhere
else. Disabling mc filtering completely for the most common chip
version is a quite big hammer. Therefore revert the change and wait
for further analysis results from the reporter.

Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-22 12:12:02 +00:00
e6d71b437a net/smc: avoid data corruption caused by decline
We found a data corruption issue during testing of SMC-R on Redis
applications.

The benchmark has a low probability of reporting a strange error as
shown below.

"Error: Protocol error, got "\xe2" as reply type byte"

Finally, we found that the retrieved error data was as follows:

0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2

It is quite obvious that this is a SMC DECLINE message, which means that
the applications received SMC protocol message.
We found that this was caused by the following situations:

client                  server
        ¦  clc proposal
        ------------->
        ¦  clc accept
        <-------------
        ¦  clc confirm
        ------------->
wait llc confirm
			send llc confirm
        ¦failed llc confirm
        ¦   x------
(after 2s)timeout
                        wait llc confirm rsp

wait decline

(after 1s) timeout
                        (after 2s) timeout
        ¦   decline
        -------------->
        ¦   decline
        <--------------

As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.

This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).

This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.

Fixes: 0fb0b02bd6 ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-22 12:10:19 +00:00
84d2db91f1 nfc: virtual_ncidev: Add variable to check if ndev is running
syzbot reported an memory leak that happens when an skb is add to
send_buff after virtual nci closed.
This patch adds a variable to track if the ndev is running before
handling new skb in send function.

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
Reviewed-by: Bongsu Jeon
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-22 10:55:48 +00:00
750011e239 net: stmmac: Add support for HW-accelerated VLAN stripping
Current implementation supports driver level VLAN tag stripping only.
The features is always on if CONFIG_VLAN_8021Q is enabled in kernel
config and is not user configurable.

This patch add support to MAC level VLAN tag stripping and can be
configured through ethtool. If the rx-vlan-offload is off, the VLAN tag
will be stripped by driver. If the rx-vlan-offload is on, the VLAN tag
will be stripped by MAC.

Command: ethtool -K <interface> rx-vlan-offload off | on

Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-22 10:54:14 +00:00
36b20fcdd9 net: hsr: Add support for MC filtering at the slave device
When MC (multicast) list is updated by the networking layer due to a
user command and as well as when allmulti flag is set, it needs to be
passed to the enslaved Ethernet devices. This patch allows this
to happen by implementing ndo_change_rx_flags() and ndo_set_rx_mode()
API calls that in turns pass it to the slave devices using
existing API calls.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-22 10:51:32 +00:00
18286883e7 x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	83 f8 ff             	cmp    $0xffffffff,%eax
  55:	0f 95 c0             	setne  %al

to:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	0f 95 c0             	setne  %al

No functional change intended.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
2023-11-22 03:47:44 +00:00
53475287da Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-21

We've added 85 non-merge commits during the last 12 day(s) which contain
a total of 63 files changed, 4464 insertions(+), 1484 deletions(-).

The main changes are:

1) Huge batch of verifier changes to improve BPF register bounds logic
   and range support along with a large test suite, and verifier log
   improvements, all from Andrii Nakryiko.

2) Add a new kfunc which acquires the associated cgroup of a task within
   a specific cgroup v1 hierarchy where the latter is identified by its id,
   from Yafang Shao.

3) Extend verifier to allow bpf_refcount_acquire() of a map value field
   obtained via direct load which is a use-case needed in sched_ext,
   from Dave Marchevsky.

4) Fix bpf_get_task_stack() helper to add the correct crosstask check
   for the get_perf_callchain(), from Jordan Rome.

5) Fix BPF task_iter internals where lockless usage of next_thread()
   was wrong. The rework also simplifies the code, from Oleg Nesterov.

6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and another
   fix for certain BPF UAPI structs to fix verifier failures seen
   in bpf_dynptr usage, from Yonghong Song.

7) Add BPF selftest fixes for map_percpu_stats flakes due to per-CPU BPF
   memory allocator not being able to allocate per-CPU pointer successfully,
   from Hou Tao.

8) Add prep work around dynptr and string handling for kfuncs which
   is later going to be used by file verification via BPF LSM and fsverity,
   from Song Liu.

9) Improve BPF selftests to update multiple prog_tests to use ASSERT_*
   macros, from Yuran Pereira.

10) Optimize LPM trie lookup to check prefixlen before walking the trie,
    from Florian Lehner.

11) Consolidate virtio/9p configs from BPF selftests in config.vm file
    given they are needed consistently across archs, from Manu Bretelle.

12) Small BPF verifier refactor to remove register_is_const(),
    from Shung-Hsi Yu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
  selftests/bpf: reduce verboseness of reg_bounds selftest logs
  bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)
  bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()
  bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
  bpf: emit frameno for PTR_TO_STACK regs if it differs from current one
  bpf: smarter verifier log number printing logic
  bpf: omit default off=0 and imm=0 in register state log
  bpf: emit map name in register state if applicable and available
  bpf: print spilled register state in stack slot
  bpf: extract register state printing
  bpf: move verifier state printing code to kernel/bpf/log.c
  bpf: move verbose_linfo() into kernel/bpf/log.c
  bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
  bpf: Remove test for MOVSX32 with offset=32
  selftests/bpf: add iter test requiring range x range logic
  veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag
  ...
====================

Link: https://lore.kernel.org/r/20231122000500.28126-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:53:20 -08:00
b6fe6f0371 dpll: Fix potential msg memleak when genlmsg_put_reply failed
We should clean the skb resource if genlmsg_put_reply failed.

Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20231121013709.73323-1-gehao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:41:20 -08:00
340bf2dbb1 Merge branch 'bnxt_en-prepare-to-support-new-p7-chips'
Michael Chan says:

====================
bnxt_en: Prepare to support new P7 chips

This patchset is to prepare the driver to support the new P7 chips by
refactoring and modifying the code.  The P7 chip is built on the P5
chip and many code paths can be modified to support both chips.  The
whole patchset to have basic support for P7 chips is about 20 patches so
a follow-on patchset will complete the support and add the new PCI IDs.

The first 8 patches are changes to the backing store logic to support
both chips with mostly common code paths and datastructures.  Both
chips require host backing store memory but the relevant firmware APIs
have been modified to make it easier to support new backing store
memory types.

The next 4 patches are changes to TX and RX ring indexing logic and NAPI
logic.  The main changes are to increment the TX and RX producers
unbounded and to do any masking only when needed.  These changes are
needed to support the P7 chips which require additional higher bits in
these producer indices.  The NAPI logic is also slightly modifed.

The last patch is a rename of BNXT_FLAG_CHIP_P5 to BNXT_FLAG_P5_PLUS and
other related macro changes to make it clear that the P5_PLUS macro
applies to P5 and newer chips.
====================

Link: https://lore.kernel.org/r/20231120234405.194542-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:51 -08:00
1c7fd6ee2f bnxt_en: Rename some macros for the P5 chips
In preparation to support a new P7 chip which has a lot of similarities
with the P5 chip, rename the BNXT_FLAG_CHIP_P5 flag to
BNXT_FLAG_CHIP_P5_PLUS.  This will make it clear that the flag is for
P5 and newer chips.

Also, since there are no additional P5 variants in production, rename
BNXT_FLAG_CHIP_P5_THOR() to BNXT_FLAG_CHIP_P5() to keep the naming
more simple.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-14-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:49 -08:00
f94471f3ce bnxt_en: Modify the NAPI logic for the new P7 chips
Modify the NAPI logic for the new doorbell mechanism on P7 chips.
These changes are compatible with the current P5 chips.

In the current logic, bnxt_poll_p5() services 1 or more CQs for each
MSIX.  Each MSIX has an associated NQ and each NQ has 1 or more
associated CQs.  If any CQ reaches NAPI budget, we'll stay in polling
mode and will unconditionally check and service all CQs until we exit
polling.  We always re-arm all CQs when we exit polling.

To be compatible with the new Toggle bit mechanism in P7 chips, we need
to modify the logic so that we service and re-arm the CQ only if we
receive an NQE notification for work for that CQ.  We add a new
had_nqe_notify bit to the cp_ring_info structure and it gets set when we
see the NQE notification for that CQ anytime during polling.  We'll
service and re-arm only the CQs with the had_nqe_notify bits set.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-13-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:49 -08:00
c09d22674b bnxt_en: Modify RX ring indexing logic.
Modify the RX indexing logic for both RX ring and RX aggregation ring just
like the TX logic.  Change it so that the index increments unbounded and
mask it only when needed.

Modify the existing RX macros so that the index is not masked.  Add new
macros RING_RX()/RING_RX_AGG() to mask it only when needed to get the
index of rxr->rx_buf_ring[] and rxr->rx_agg_ring[].

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:49 -08:00
6d1add9553 bnxt_en: Modify TX ring indexing logic.
Change the TX ring logic so that the index increments unbounded and
mask it only when needed.

Modify the existing macros so that the index is not masked.  Add a
new macro RING_TX() to mask it only when needed to get the index of
txr->tx_buf_ring[].

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:49 -08:00
b9e0c47ee2 bnxt_en: Add db_ring_mask and related macro to bnxt_db_info struct.
This allows the doorbell related logic to mask the doorbell index
to the proper range before writing the doorbell.

The current code masks the doorbell index immediately to keep it in the
legal ranges for the most part.  Subsequent patches will change the
logic so that the index increments unbounded and it only gets masked
before use.  This is preparation work for the new chip that requires an
additional Epoch bit in the doorbell that needs to toggle when the index
has wrapped around.

This patch just adds the basic infrastructure and the logic is largely
unchanged.  We now replace RING_CMP() with the new DB_RING_IDX() at
appropriate places where we mask the completion ring index before
writing the doorbell.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
236e237f8f bnxt_en: Add support for HWRM_FUNC_BACKING_STORE_CFG_V2 firmware calls
Newer chips starting with 57600 will use this new firmware HWRM call to
configure backing store memory.  Add this new call if it is supported
by the firmware.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
6a4d0774f0 bnxt_en: Add support for new backing store query firmware API
Use the new v2 firmware API if supported by the firmware.  We now have the
infrastructure to support the v2 API.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
b098dc5a33 bnxt_en: Add bnxt_setup_ctxm_pg_tbls() helper function
In bnxt_alloc_ctx_mem(), the logic to set up the context memory entries
and to allocate the context memory tables is done repetitively.  Add
a helper function to simplify the code.

The setup of the Fast Path TQM entries relies on some information from
the Slow Path TQM entries.  Copy the SP_TQM entries to the FP_TQM
entries to simplify the logic.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
2ad67aea11 bnxt_en: Use the pg_info field in bnxt_ctx_mem_type struct
Use the newly added pg_info field in bnxt_ctx_mem_type struct and
remove the standalone page info structures in bnxt_ctx_mem_info.
This now completes the reorganization of the context memory
structures to work better with the new and more flexible firmware
interface for newer chips.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
035c576159 bnxt_en: Add page info to struct bnxt_ctx_mem_type
This will further improve the organization of the bnxt_ctx_mem_info
structure by moving the standalone page info structures into the
bnxt_ctx_mem_type array.  Add the allocation and free logic first and
the next patch will migrate to use the new infrastructure.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
76087d997a bnxt_en: Restructure context memory data structures
The current code uses a flat bnxt_ctx_mem_info structure to store 8
types of context memory for the NIC.  All the context memory types
are very similar and have similar parameters.  They can all share a
common structure to improve the organization.  Also, new firmware
interface will provide a new API to retrieve each type of context
memory by calling the API repeatedly.

This patch reorganizes the bnxt_ctx_mem_info structure to fit better
with the new firmware interface.  It will also work with the legacy
firmware interface.  The flat fields in bnxt_ctx_mem_info are replaced
by the bnxt_ctx_mem_type array.  The bnxt_mem_init array info will no
longer be needed.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
e50dc4c220 bnxt_en: Free bp->ctx inside bnxt_free_ctx_mem()
We always free bp->ctx right after calling bnxt_free_ctx_mem(), so just
free it at the end of that function to make things simpler.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:47 -08:00
aa8460bacf bnxt_en: The caller of bnxt_alloc_ctx_mem() should always free bp->ctx
bnxt_alloc_ctx_mem() calls bnxt_hwrm_func_backing_store_qcaps() to
allocate the memory for bp->ctx.  Initialize bp->ctx with the allocated
memory and let the caller free it during unwind.  The unwind logic is
already there, we just need to always set bp->ctx to the allocated
memory so the caller will always free it.  This simplifies the logic
and makes it easier to expand on the backing store logic.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:47 -08:00
46e208e70a Merge branch 'net-page_pool-add-netlink-based-introspection-part1'
Jakub Kicinski says:

====================
net: page_pool: plit the page_pool_params into fast and slow

Small refactoring in prep for adding more page pool params
which won't be needed on the fast path.

v1:  https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:22:38 -08:00
2da0cac1e9 net: page_pool: avoid touching slow on the fastpath
To fully benefit from previous commit add one byte of state
in the first cache line recording if we need to look at
the slow part.

The packing isn't all that impressive right now, we create
a 7B hole. I'm expecting Olek's rework will reshuffle this,
anyway.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:22:30 -08:00
5027ec19f1 net: page_pool: split the page_pool_params into fast and slow
struct page_pool is rather performance critical and we use
16B of the first cache line to store 2 pointers used only
by test code. Future patches will add more informational
(non-fast path) attributes.

It's convenient for the user of the API to not have to worry
which fields are fast and which are slow path. Use struct
groups to split the params into the two categories internally.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:22:29 -08:00
b2d66643dc Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2023-11-21

We've added 19 non-merge commits during the last 4 day(s) which contain
a total of 18 files changed, 1043 insertions(+), 416 deletions(-).

The main changes are:

1) Fix BPF verifier to validate callbacks as if they are called an unknown
   number of times in order to fix not detecting some unsafe programs,
   from Eduard Zingerman.

2) Fix bpf_redirect_peer() handling which missed proper stats accounting
   for veth and netkit and also generally fix missing stats for the latter,
   from Peilin Ye, Daniel Borkmann et al.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: check if max number of bpf_loop iterations is tracked
  bpf: keep track of max number of bpf_loop callback iterations
  selftests/bpf: test widening for iterating callbacks
  bpf: widening for callback iterators
  selftests/bpf: tests for iterating callbacks
  bpf: verify callbacks as if they are called unknown number of times
  bpf: extract setup_func_entry() utility function
  bpf: extract __check_reg_arg() utility function
  selftests/bpf: fix bpf_loop_bench for new callback verification scheme
  selftests/bpf: track string payload offset as scalar in strobemeta
  selftests/bpf: track tcp payload offset as scalar in xdp_synproxy
  selftests/bpf: Add netkit to tc_redirect selftest
  selftests/bpf: De-veth-ize the tc_redirect test case
  bpf, netkit: Add indirect call wrapper for fetching peer dev
  bpf: Fix dev's rx stats for bpf_redirect_peer traffic
  veth: Use tstats per-CPU traffic counters
  netkit: Add tstats per-CPU traffic counters
  net: Move {l,t,d}stats allocation to core and convert veth & vrf
  net, vrf: Move dstats structure to core
====================

Link: https://lore.kernel.org/r/20231121193113.11796-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 15:49:31 -08:00
3a17ea77da Merge branch 'mlxsw-preparations-for-support-of-cff-flood-mode'
Petr Machata says:

====================
mlxsw: Preparations for support of CFF flood mode

PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.

Among other allocations, the PGT currently contains two large blocks for
bridge flooding: one for 802.1q and one for 802.1d. Within each of these
blocks are three tables, for unknown-unicast, multicast and broadcast
flooding:

      . . . |    802.1q    |    802.1d    | . . .
            | UC | MC | BC | UC | MC | BC |
             \______ _____/ \_____ ______/
                    v             v
                   FID flood vectors

Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an
802.1q bridge) uses three flood vectors spread across a fairly large region
of PGT.

This way of organizing the flood table (called "controlled") is not very
flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one
would need to completely rewrite the bridge PGT blocks, or resort to hacks
such as storing individual MC flood vectors into unused part of the bridge
table.

In order to address these shortcomings, Spectrum-2 and above support what
is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode,
each FID has a little table of its own, with three entries adjacent to each
other, one for unknown-UC, one for MC, one for BC. This allows for a much
more fine-grained approach to PGT management, where bits of it are
allocated on demand.

      . . . | FID | FID | FID | FID | FID | . . .
            |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
             \_____________ _____________/
                           v
                   FID flood vectors

Besides the FID table organization, the CFF flood mode also impacts Router
Subport (RSP) table. This table contains flood vectors for rFIDs, which are
FIDs that reference front panel ports or LAGs. The RSP table contains two
entries per front panel port and LAG, one for unknown-UC traffic, and one
for everything else. Currently, the FW allocates and manages the table in
its own part of PGT. rFIDs are marked with flood_rsp bit and managed
specially. In CFF mode, rFIDs are managed as all other FIDs. The driver
therefore has to allocate and maintain the flood vectors. Like with bridge
FIDs, this is more work, but increases flexibility of the system.

The FW currently supports both the controlled and CFF flood modes. To shed
complexity, in the future it should only support CFF flood mode. Hence this
patchset, which is the first in series of two to add CFF flood mode support
to mlxsw.

There are FW versions out there that do not support CFF flood mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both controlled flood mode as well as CFF.

Another aspect is that at least on Spectrum-1, there are FW versions out
there that claim to support CFF flood mode, but then reject or ignore
configurations enabling the same. The driver thus has to have a say in
whether an attempt to configure CFF flood mode should even be made.

Much like with the LAG mode, the feature is therefore expressed in terms of
"does the driver prefer CFF flood mode?", and "what flood mode the PCI
module managed to configure the FW with". This gives to the driver a chance
to determine whether CFF flood mode configuration should be attempted.

In this patchset, we lay the ground with new definitions, registers and
their fields, and some minor code shaping. The next patchset will be more
focused on introducing necessary abstractions and implementation.

- Patches #1 and #2 add CFF-related items to the command interface.

- Patch #3 adds a new resource, for maximum number of flood profiles
  supported. (A flood profile is a mapping between traffic type and offset
  in the per-FID flood vector table.)

- Patches #4 to #8 adjust reg.h. The SFFP register is added, which is used
  for configuring the abovementioned traffic-type-to-offset mapping. The
  SFMR, register, which serves for FID configuration, is extended with
  fields specific to CFF mode. And other minor adjustments.

- Patches #9 and #10 add the plumbing for CFF mode: a way to request that
  CFF flood mode be configured, and a way to query the flood mode that was
  actually configured.

- Patch #11 removes dead code.

- Patches #12 and #13 add helpers that the next patchset will make use of.
  Patch #14 moves RIF setup ahead so that FID code can make use of it.
====================

Link: https://lore.kernel.org/r/cover.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 14:53:12 -08:00