IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
max_cap_nve_flood_prf describes maximum number of NVE flooding profiles.
The same value then applies for flooding profiles for flooding in CFF mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/064a2e013d879e5f5494167a6c120c4bb85a2204.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PGT, a port-group table is an in-HW block of specialized memory that holds
sets of ports. Allocated within the PGT are series of flood tables that
describe to which ports traffic of various types (unknown UC, BC, MC)
should be flooded from which FID. The hitherto-used layout of these flood
tables is being replaced with a more flexible scheme, called compressed FID
flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode.
In this patch, add MLXSW_CMD_MBOX_CONFIG_PROFILE_FLOOD_MODE_CFF, the value
to use to enable the CFF mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/fc2e063742856492f8f22b0b87abf431ea6d53d0.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PGT, a port-group table is an in-HW block of specialized memory that holds
sets of ports. Allocated within the PGT are series of flood tables that
describe to which ports traffic of various types (unknown UC, BC, MC)
should be flooded from which FID. The hitherto-used layout of these flood
tables is being replaced with a more flexible scheme, called compressed FID
flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode.
cff_support determines whether CONFIG_PROFILE.flood_mode can be set to CFF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/af727d0e1095e30fa45c7e60404637cdc491aeec.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Networking supports changing netdevice's netns and name
at the same time. This allows avoiding name conflicts
and having to rename the interface in multiple steps.
E.g. netns1={eth0, eth1}, netns2={eth1} - we want
to move netns1:eth1 to netns2 and call it eth0 there.
If we can't rename "in flight" we'd need to (1) rename
eth1 -> $tmp, (2) change netns, (3) rename $tmp -> eth0.
To rename the underlying struct device we have to call
device_rename(). The rename()'s MOVE event, however, doesn't
"belong" to either the old or the new namespace.
If there are conflicts on both sides it's actually impossible
to issue a real MOVE (old name -> new name) without confusing
user space. And Daniel reports that such confusions do in fact
happen for systemd, in real life.
Since we already issue explicit REMOVE and ADD events
manually - suppress the MOVE event completely. Move
the ADD after the rename, so that the REMOVE uses
the old name, and the ADD the new one.
If there is no rename this changes the picture as follows:
Before:
old ns | KERNEL[213.399289] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401302] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401397] move /devices/virtual/net/eth0 (net)
After:
old ns | KERNEL[266.774257] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[266.774509] add /devices/virtual/net/eth0 (net)
If there is a rename and a conflict (using the exact eth0/eth1
example explained above) we get this:
Before:
old ns | KERNEL[224.316833] remove /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.318551] add /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.319662] move /devices/virtual/net/eth0 (net)
After:
old ns | KERNEL[333.033166] remove /devices/virtual/net/eth1 (net)
new ns | KERNEL[333.035098] add /devices/virtual/net/eth0 (net)
Note that "in flight" rename is only performed when needed.
If there is no conflict for old name in the target netns -
the rename will be performed separately by dev_change_name(),
as if the rename was a different command, and there will still
be a MOVE event for the rename:
Before:
old ns | KERNEL[194.416429] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418809] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418869] move /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.420866] move /devices/virtual/net/eth1 (net)
After:
old ns | KERNEL[71.917520] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.919155] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.920729] move /devices/virtual/net/eth1 (net)
If deleting the MOVE event breaks some user space we should insert
an explicit kobject_uevent(MOVE) after the ADD, like this:
@@ -11192,6 +11192,12 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
netdev_adjacent_add_links(dev);
+ /* User space wants an explicit MOVE event, issue one unless
+ * dev_change_name() will get called later and issue one.
+ */
+ if (!pat || new_name[0])
+ kobject_uevent(&dev->dev.kobj, KOBJ_MOVE);
+
/* Adapt owner in case owning user namespace of target network
* namespace is different from the original one.
*/
Reported-by: Daniel Gröber <dxld@darkboxed.org>
Link: https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20231120184140.578375-1-kuba@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The device is always reset two consecutive times (ax88179_reset is called
twice), one from usbnet_probe during the device binding and the other from
usbnet_open.
Remove the non-necessary reset during the device binding and let the reset
operation from open to keep the normal behavior (tested with generic ASIX
Electronics Corp. AX88179 Gigabit Ethernet device).
Reported-by: Herb Wei <weihao.bj@ieisystem.com>
Tested-by: Herb Wei <weihao.bj@ieisystem.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Link: https://lore.kernel.org/r/20231120121239.54504-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use for_each_set_bit() rather than open coding the for() test_bit()
loop.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://lore.kernel.org/r/E1r4p15-00Cpxe-C7@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If failed to allocate "tags" or could not find the final upper device from
start_dev's upper list in bond_verify_device_path(), only the loopback
detection of the current upper device should be affected, and the system is
no need to be panic.
So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn
information when failed to allocate memory for vlan tags in
bond_verify_device_path.
I also think that the following function calls
netdev_walk_all_upper_dev_rcu
---->>>alb_upper_dev_walk
---------->>>bond_verify_device_path
From this way, "end device" can eventually be obtained from "start device"
in bond_verify_device_path, IS_ERR(tags) could be instead of
IS_ERR_OR_NULL(tags) in alb_upper_dev_walk.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20231118081653.1481260-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add PCI Endpoint NIC support for Octeon CN10K devices.
CN10K devices are part of Octeon 10 family products with
similar PCI NIC characteristics. These include:
- CN10KA
- CNF10KA
- CNF10KB
- CN10KB
Update supported device list in Documentation
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231117103817.2468176-1-srasheed@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela says:
====================
selftests: tc-testing: more updates to tdc
Address the issues making tdc timeout on downstream CIs like lkp and
tuxsuite.
====================
Link: https://lore.kernel.org/r/20231117171208.2066136-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Report the number of workers in use to process the test batches.
Since the number is now subject to a limit, avoid users getting
confused.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-7-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the spirit of failing early, timeout on unbounded loops that take
longer than 20 ticks to complete. Such loops are to ensure that objects
created are already visible so tests can proceed without any issues.
If a test setup takes more than 20 ticks to see an object, there's
definetely something wrong.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of listing lingering ns pinned files and delete them one by one, leverage '-all'
from iproute2 to do it in a single process fork.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When pyroute2 is available, use the native netns delete routine instead
of calling iproute2 to do it. As forks are expensive with some kernel
configs, minimize its usage to avoid kselftests timeouts.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Surprisingly in kernel configs with most of the debug knobs turned on,
pre-allocating the test resources makes tdc run much slower overall than
when allocating resources on a per test basis.
As these knobs are used in kselftests in downstream CIs, let's go back
to the old way of doing things to avoid kselftests timeouts.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202311161129.3b45ed53-oliver.sang@intel.com
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We have observed a lot of lock contention and test instability when running with >8 cores.
Enough to actually make the tests run slower than with fewer cores.
Cap the maximum cores of parallel tdc to 4 which showed in testing to
be a reasonable number for efficiency and stability in different kernel
config scenarios.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231117171208.2066136-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Louis Peens says:
====================
nfp: add flow-steering support
This short series adds flow steering support for the nfp driver.
The first patch adds the part to communicate with ethtool but
stubs out the HW offload parts. The second patch implements the
HW communication and offloads flow steering.
After this series the user can now use 'ethtool -N/-n' to configure
and display rx classification rules.
====================
Link: https://lore.kernel.org/r/20231117071114.10667-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the second part to implement flow steering. Mailbox is used
for the communication between driver and HW.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231117071114.10667-3-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is the first part to implement flow steering. The communication
between ethtool and driver is done. User can use following commands
to display and set flows:
ethtool -n <netdev>
ethtool -N <netdev> flow-type ...
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231117071114.10667-2-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add dmaengine framework to communicate with the xilinx DMAengine
driver(AXIDMA).
Axi ethernet driver uses separate channels for transmit and receive.
Add support for these channels to handle TX and RX with skb and
appropriate callbacks. Also add axi ethernet core interrupt for
dmaengine framework support.
The dmaengine framework was extended for metadata API support.
However it still needs further enhancements to make it well suited for
ethernet usecases. The ethernet features i.e ethtool set/get of DMA IP
properties, ndo_poll_controller,(mentioned in TODO) are not supported
and it requires follow-up discussions.
dmaengine support has a dependency on xilinx_dma as it uses
xilinx_vdma_channel_set_config() API to reset the DMA IP
which internally reset MAC prior to accessing MDIO.
Benchmark with netperf:
xilinx-zcu102-20232:~$ netperf -H 192.168.10.20 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.10.20 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 10.02 886.69
xilinx-zcu102-20232:~$ netperf -H 192.168.10.20 -t UDP_STREAM
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.10.20 () port 0 AF_INET
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 15851 0 830.66
212992 10.00 15851 830.66
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1700074613-1977070-4-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The axiethernet driver has inbuilt dma programming. In order to add
dmaengine support and make it's integration seamless the current axidma
inbuilt programming code is put under use_dmaengine check.
It also performs minor code reordering to minimize conditional
use_dmaengine checks and there is no functional change. It uses
"dmas" property to identify whether it should use a dmaengine
framework or inbuilt axidma programming.
Signed-off-by: Sarath Babu Naidu Gaddam <sarath.babu.naidu.gaddam@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1700074613-1977070-3-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xilinx 1G/2.5G Ethernet Subsystem provides 32-bit AXI4-Stream buses to
move transmit and receive Ethernet data to and from the subsystem.
These buses are designed to be used with an AXI Direct Memory Access(DMA)
IP or AXI Multichannel Direct Memory Access (MCDMA) IP core, AXI4-Stream
Data FIFO, or any other custom logic in any supported device.
Primary high-speed DMA data movement between system memory and stream
target is through the AXI4 Read Master to AXI4 memory-mapped to stream
(MM2S) Master, and AXI stream to memory-mapped (S2MM) Slave to AXI4
Write Master. AXI DMA/MCDMA enables channel of data movement on both
MM2S and S2MM paths in scatter/gather mode.
AXI DMA has two channels where as MCDMA has 16 Tx and 16 Rx channels.
To uniquely identify each channel use 'chan' suffix. Depending on the
usecase AXI ethernet driver can request any combination of multichannel
DMA channels using generic dmas, dma-names properties.
Example:
dma-names = tx_chan0, rx_chan0, tx_chan1, rx_chan1;
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/1700074613-1977070-2-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensures that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231116203449.2627525-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the code and use __dev_alloc_page() instead of __dev_alloc_pages()
with order 0 in mtk_wed_tx_buffer_alloc routine
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros says:
===================
net: eth: am65-cpsw: add ethtool MAC stats
Gets 'ethtool -S eth0 --groups eth-mac' command to work.
Also set default TX channels to maximum available and does
cleanup in am65_cpsw_nuss_common_open() error path.
Changelog:
v2:
- add __iomem to *stats, to prevent sparse warning
- clean up RX descriptors and free up SKB in error handling of
am65_cpsw_nuss_common_open()
- Re-arrange some funcitons to avoid forward declaration
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
k3_udma_glue_enable_rx/tx_chn returns error code on failure.
Bail out on error while enabling TX/RX channel.
In the error path, clean up the RX descriptors and SKBs.
Get rid of kmemleak_not_leak() as it seems unnecessary now.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
am65-cpsw supports 8 TX hardware queues. Set this as default.
The rationale is that some am65-cpsw devices can have up to 4 ethernet
ports. If the number of TX channels have to be changed then all
interfaces have to be brought down and up as the old default of 1
TX channel is too restrictive for any mqprio/taprio usage.
Another reason for this change is to allow testing using
kselftest:net/forwarding:ethtool_mm.sh out of the box.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-arrange am65_cpsw_nuss_rx_cleanup(), am65_cpsw_nuss_xmit_free() and
am65_cpsw_nuss_tx_cleanup() to avoid forward declaration.
No functional change.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gets 'ethtool -S eth0 --groups eth-mac' command to work.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
if a PF has 256 or more VFs, ip link command will allocate an order 3
memory or more, and maybe trigger OOM due to memory fragment,
the VFs needed memory size is computed in rtnl_vfinfo_size.
so introduce nlmsg_new_large which calls netlink_alloc_large_skb in
which vmalloc is used for large memory, to avoid the failure of
allocating memory
ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE
Call Trace:
dump_stack+0x57/0x6a
dump_header+0x4a/0x210
oom_kill_process+0xe4/0x140
out_of_memory+0x3e8/0x790
__alloc_pages_slowpath.constprop.116+0x953/0xc50
__alloc_pages_nodemask+0x2af/0x310
kmalloc_large_node+0x38/0xf0
__kmalloc_node_track_caller+0x417/0x4d0
__kmalloc_reserve.isra.61+0x2e/0x80
__alloc_skb+0x82/0x1c0
rtnl_getlink+0x24f/0x370
rtnetlink_rcv_msg+0x12c/0x350
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1b2/0x280
netlink_sendmsg+0x355/0x4a0
sock_sendmsg+0x5b/0x60
____sys_sendmsg+0x1ea/0x250
___sys_sendmsg+0x88/0xd0
__sys_sendmsg+0x5e/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f95a65a5b70
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ice: one by one port representors creation
Michal Swiatkowski says:
Currently ice supports creating port representors only for VFs. For that
use case they can be created and removed in one step.
This patchset is refactoring current flow to support port representor
creation also for subfunctions and SIOV. In this case port representors
need to be created and removed one by one. Also, they can be added and
removed while other port representors are running.
To achieve that we need to change the switchdev configuration flow.
Three first patches are only cosmetic (renaming, removing not used code).
Next few ones are preparation for new flow. The most important one
is "add VF representor one by one". It fully implements new flow.
New type of port representor (for subfunction) will be introduced in
follow up patchset.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: reserve number of CP queues
ice: adjust switchdev rebuild path
ice: add VF representors one by one
ice: realloc VSI stats arrays
ice: set Tx topology every time new repr is added
ice: allow changing SWITCHDEV_CTRL VSI queues
ice: return pointer to representor
ice: make representor code generic
ice: remove VF pointer reference in eswitch code
ice: track port representors in xarray
ice: use repr instead of vf->repr
ice: track q_id in representor
ice: remove unused control VSI parameter
ice: remove redundant max_vsi_num variable
ice: rename switchdev to eswitch
====================
Link: https://lore.kernel.org/r/20231114181449.1290117-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
igc: Add support for physical + free-running timers
Vinicius Costa Gomes says:
The objective is to allow having functionality that depends on the
physical timer (taprio and ETF offloads, for example) and vclocks
operating together.
The "big" missing piece is the implementation of the .getcyclesx64()
function in igc, as i225/i226 have multiple timers, we use one of
those timers (timer 1) as a free-running (non adjustable) timer.
The complication is that only implementing .getcyclesx64() and nothing
else will break synchronization when using vclocks, as reading the clock
will retrieve the free-running value but timnestamps will come from the
adjustable timer. The solution is to modify "in one go" the timestamping
code to be able to retrieve the timestamp from the correct timer (if a
socket is "phc_bound" to a vclock the timestamp will come from the
free-running timer).
I was debating whether or not to do the adjustments for the internal latencies
for the free-running timestamps, decided to do the adjustments so the path
delay when using vclocks is similar to the one when using the physical clock.
One future improvement is to implement the .getcrosscycles() function.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Add support for PTP .getcyclesx64()
igc: Simplify setting flags in the TX data descriptor
====================
Link: https://lore.kernel.org/r/20231114183640.1303163-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pedro Tammela says:
====================
net/sched: cls_u32: use proper refcounts
In u32 we are open coding refcounts of hashtables with integers which is
far from ideal. Update those with proper refcount and add a couple of
tests to tdc that exercise the refcounts explicitly.
====================
Link: https://lore.kernel.org/r/20231114141856.974326-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests to specifically check for the refcount interactions of
hashtables created by u32. These tables should not be deleted when
referenced and the flush order should respect a tree like composition.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231114141856.974326-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Proper refcounts will always warn splat when something goes wrong,
be it underflow, saturation or object resurrection. As these are always
a source of bugs, use it in cls_u32 as a safeguard to prevent/catch issues.
Another benefit is that the refcount API self documents the code, making
clear when transitions to dead are expected.
For such an update we had to make minor adaptations on u32 to fit the refcount
API. First we set explicitly to '1' when objects are created, then the
objects are alive until a 1 -> 0 happens, which is then released appropriately.
The above made clear some redundant operations in the u32 code
around the root_ht handling that were removed. The root_ht is created
with a refcnt set to 1. Then when it's associated with tcf_proto it increments the refcnt to 2.
Throughout the entire code the root_ht is an exceptional case and can never be referenced,
therefore the refcnt never incremented/decremented.
Its lifetime is always bound to tcf_proto, meaning if you delete tcf_proto
the root_ht is deleted as well. The code made up for the fact that root_ht refcnt is 2 and did
a double decrement to free it, which is not a fit for the refcount API.
Even though refcount_t is implemented using atomics, we should observe
a negligible control plane impact.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231114141856.974326-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revert following commits:
commit acec05fb78ab ("net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask")
commit 11d55be06df0 ("net: ethtool: Add a command to expose current time stamping layer")
commit bb8645b00ced ("netlink: specs: Introduce new netlink command to get current timestamp")
commit d905f9c75329 ("net: ethtool: Add a command to list available time stamping layers")
commit aed5004ee7a0 ("netlink: specs: Introduce new netlink command to list available time stamping layers")
commit 51bdf3165f01 ("net: Replace hwtstamp_source by timestamping layer")
commit 0f7f463d4821 ("net: Change the API of PHY default timestamp to MAC")
commit 091fab122869 ("net: ethtool: ts: Update GET_TS to reply the current selected timestamp")
commit 152c75e1d002 ("net: ethtool: ts: Let the active time stamping layer be selectable")
commit ee60ea6be0d3 ("netlink: specs: Introduce time stamping set command")
They need more time for reviews.
Link: https://lore.kernel.org/all/20231118183529.6e67100c@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mirsad proposed a patch to reduce the number of spinlock lock/unlock
operations and the function code size. This can be further improved
because the function sets a consecutive register block.
Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- Implement new multicast packet type, including its transmission,
forwarding and parsing, by Linus Lüssing (3 patches)
- Switch to new headers for sprintf and array size,
by Sven Eckelmann (2 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmVUwucWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoUABD/9IS2mmn9aRANgy7v1mMr3eUPte
4gFCef3xly1oRF8P7JK77pZwX3uX7HwSM7F56AKxzMjDMjPMWYZlBFB8sxVF6Q0/
JpIIcW0XSU4DHR7yzRaGOG7VSCfhIpshuHyJApFbJTs0jUxv6fnxn2pQREnfgcRd
36CMNwslJf/piTv70980UnLKJlzUwV738GkgYQ3yctKUFRSGStCzDNT/RwkjDZPs
HZ+Y/kedsIYViZrDTokpT45XUbcp+hGaQ5DNbMVSn8CpnfhmoYdm3WWV4H3CI9Jl
byZwncG+KN0dXzbc8TDH7AbdcuKZ3QcgZ09tS5kOB+Hxb3SOyHmXKLFob07+Op+d
/C5aLn7QvHdwCOsPazSHYZbUEeOh4G4sfTog9f5j+nrZ1QYkINEocvFvKSxpjKia
kOjNLooyKOD11l2acedGbEEh3k9+vBzJ6UOpPSe0bwTPwWRkUKpuWZw+aQ9rYAXw
7M8EwVpPELo1fmI5dxrV0ZKvMmO44mSzCv2Sx/iBK8/D8V0Wgde/ImV4hQ6aYtmv
ffXx+myfG1C4xz49d23zMcB9Sj71qgwVCMujDYJfOTbyokKe40fmnaejIUapEWKq
LWpKenq72GxpuI0pWgO5uYJzbDGdK2k9Xbo/92JN+ejAWbKEQbZWM0J5MXRuP+L2
uWDEvCHGqqyzov1dtg==
=w0RC
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20231115' of git://git.open-mesh.org/linux-merge
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Implement new multicast packet type, including its transmission,
forwarding and parsing, by Linus Lüssing (3 patches)
- Switch to new headers for sprintf and array size,
by Sven Eckelmann (2 patches)
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
mlxsw: Add support for new reset flow
Ido Schimmel writes:
This patchset changes mlxsw to issue a PCI reset during probe and
devlink reload so that the PCI firmware could be upgraded without a
reboot.
Unlike the old version of this patchset [1], in this version the driver
no longer tries to issue a PCI reset by triggering a PCI link toggle on
its own, but instead calls the PCI core to issue the reset.
The PCI APIs require the device lock to be held which is why patches
Patches #7 adds reset method quirk for NVIDIA Spectrum devices.
Patch #8 adds a debug level print in PCI core so that device ready delay
will be printed even if it is shorter than one second.
Patches #9-#11 are straightforward preparations in mlxsw.
Patch #12 finally implements the new reset flow in mlxsw.
Patch #13 adds PCI reset handlers in mlxsw to avoid user space from
resetting the device from underneath an unaware driver. Instead, the
driver is gracefully de-initialized before the PCI reset and then
initialized again after it.
Patch #14 adds a PCI reset selftest to make sure this code path does not
regress.
[1] https://lore.kernel.org/netdev/cover.1679502371.git.petrm@nvidia.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that PCI reset works correctly by verifying that only the expected
reset methods are supported and that after issuing the reset the ifindex
of the port changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement reset_prepare() and reset_done() handlers that are invoked by
the PCI core before and after issuing a PCI reset, respectively.
Specifically, implement reset_prepare() by calling
mlxsw_core_bus_device_unregister() and reset_done() by calling
mlxsw_core_bus_device_register(). This is the same implementation as the
reload_{down,up}() devlink operations with the following differences:
1. The devlink instance is unregistered and then registered again after
the reset.
2. A reset via the device's command interface (using MRSR register) is
not issued during reset_done() as PCI core already issued a PCI
reset.
Tested:
# for i in $(seq 1 10); do echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset; done
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver resets the device during probe and during a devlink reload.
The current reset method reloads the current firmware version or a
pending one, if one was previously flashed using devlink. However, the
current reset method does not result in a PCI hot reset, preventing the
PCI firmware from being upgraded, unless the system is rebooted.
To solve this problem, a new reset command (6) was implemented in the
firmware. Unlike the current command (1), after issuing the new command
the device will not start the reset immediately, but only after a PCI
hot reset.
Implement the new reset method by first verifying that it is supported
by the current firmware version by querying the Management Capabilities
Mask (MCAM) register. If supported, issue the new reset command (6) via
MRSR register followed by a PCI reset by calling
__pci_reset_function_locked().
Once the PCI firmware is operational, go back to the regular reset flow
and wait for the entire device to become ready. That is, repeatedly read
the "system_status" register from the BAR until a value of "FW_READY"
(0x5E) appears.
Tested:
# for i in $(seq 1 10); do devlink dev reload pci/0000:01:00.0; done
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In general, the existing flow of software reset in the driver is:
1. Wait for system ready status.
2. Send MRSR command, to start the reset.
3. Wait for system ready status.
This flow will be extended once a new reset command is supported. As a
preparation, move step #2 to a separate function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patches, mlxsw_pci_sw_reset() will be extended to support
more reset types and will not necessarily issue a software reset. Rename
the function to reflect that.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mlxsw_reg_mrsr_pack() always sets 'command=1'. As preparation for
support of new reset flow, pass the command as an argument to the
function and add an enum for this field.
For now, always pass 'command=1' to the pack() function.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>