Commit Graph

1156829 Commits

Author SHA1 Message Date
Jiri Pirko
565b4824c3 devlink: change port event netdev notifier from per-net to global
Currently only the network namespace of devlink instance is monitored
for port events. If netdev is moved to a different namespace and then
unregistered, NETDEV_PRE_UNINIT is missed which leads to trigger
following WARN_ON in devl_port_unregister().
WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);

Fix this by changing the netdev notifier from per-net to global so no
event is missed.

Fixes: 02a68a47ea ("net: devlink: track netdev with devlink_port assigned")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 14:13:55 +01:00
Vladimir Oltean
bbb253b206 selftests: ocelot: tc_flower_chains: make test_vlan_ingress_modify() more comprehensive
We have two IS1 filters of the OCELOT_VCAP_KEY_ANY key type (the one with
"action vlan pop" and the one with "action vlan modify") and one of the
OCELOT_VCAP_KEY_IPV4 key type (the one with "action skbedit priority").
But we have no IS1 filter with the OCELOT_VCAP_KEY_ETYPE key type, and
there was an uncaught breakage there.

To increase test coverage, convert one of the OCELOT_VCAP_KEY_ANY
filters to OCELOT_VCAP_KEY_ETYPE, by making the filter also match on the
MAC SA of the traffic sent by mausezahn, $h1_mac.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230205192409.1796428-2-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 12:20:21 +01:00
Vladimir Oltean
f964f8399d net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q"
Alternative short title: don't instruct the hardware to match on
EtherType with "protocol 802.1Q" flower filters. It doesn't work for the
reasons detailed below.

With a command such as the following:

tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \
	protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \
	action vlan modify id 300 \
	action goto chain $(IS2 0 0)

the created filter is set by ocelot_flower_parse_key() to be of type
OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}.
This gets propagated all the way to is1_entry_set() which commits it to
hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the
case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY,
and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE.

The problem is that for VLAN-tagged frames, the hardware interprets the
ETYPE field as holding the encapsulated VLAN protocol. So the above
filter will only match those packets which have an encapsulated protocol
of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac.

The reason why this is allowed to occur is because, although we have a
block of code in ocelot_flower_parse_key() which sets "match_protocol"
to false when VLAN keys are present, that code executes too late.
There is another block of code, which executes for Ethernet addresses,
and has a "goto finished_key_parsing" and skips the VLAN header parsing.
By skipping it, "match_protocol" remains with the value it was
initialized with, i.e. "true", and "proto" is set to f->common.protocol,
or 0x8100.

The concept of ignoring some keys rather than erroring out when they are
present but can't be offloaded is dubious in itself, but is present
since the initial commit fe3490e610 ("net: mscc: ocelot: Hardware
ofload for tc flower filter"), and it's outside of the scope of this
patch to change that.

The problem was introduced when the driver started to interpret the
flower filter's protocol, and populate the VCAP filter's ETYPE field
based on it.

To fix this, it is sufficient to move the code that parses the VLAN keys
earlier than the "goto finished_key_parsing" instruction. This will
ensure that if we have a flower filter with both VLAN and Ethernet
address keys, it won't match on ETYPE 0x8100, because the VLAN key
parsing sets "match_protocol = false".

Fixes: 86b956de11 ("net: mscc: ocelot: support matching on EtherType")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 12:20:21 +01:00
Vladimir Oltean
0b6d642510 net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware
Frank reports that in a mt7530 setup where some ports are standalone and
some are in a VLAN-aware bridge, 8021q uppers of the standalone ports
lose their VLAN tag on xmit, as seen by the link partner.

This seems to occur because once the other ports join the VLAN-aware
bridge, mt7530_port_vlan_filtering() also calls
mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way
that the switch processes the traffic of the standalone port.

Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it:

EG_TAG: Incoming Port Egress Tag VLAN Attribution
0: disabled (system default)
1: consistent (keep the original ingress tag attribute)

My interpretation is that this setting applies on the ingress port, and
"disabled" is basically the normal behavior, where the egress tag format
of the packet (tagged or untagged) is decided by the VLAN table
(MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG).

But there is also an option of overriding the system default behavior,
and for the egress tagging format of packets to be decided not by the
VLAN table, but simply by copying the ingress tag format (if ingress was
tagged, egress is tagged; if ingress was untagged, egress is untagged;
aka "consistent). This is useful in 2 scenarios:

- VLAN-unaware bridge ports will always encounter a miss in the VLAN
  table. They should forward a packet as-is, though. So we use
  "consistent" there. See commit e045124e93 ("net: dsa: mt7530: fix
  tagged frames pass-through in VLAN-unaware mode").

- Traffic injected from the CPU port. The operating system is in god
  mode; if it wants a packet to exit as VLAN-tagged, it sends it as
  VLAN-tagged. Otherwise it sends it as VLAN-untagged*.

*This is true only if we don't consider the bridge TX forwarding offload
feature, which mt7530 doesn't support.

So for now, make the CPU port always stay in "consistent" mode to allow
software VLANs to be forwarded to their egress ports with the VLAN tag
intact, and not stripped.

Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/
Fixes: e045124e93 ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07 11:42:04 +01:00
Eddy Tao
15ea59a0e9 net: openvswitch: reduce cpu_used_mask memory
Use actual CPU number instead of hardcoded value to decide the size
of 'cpu_used_mask' in 'struct sw_flow'. Below is the reason.

'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
Its size is hardcoded to CONFIG_NR_CPUS bits, which can be
8192 by default, it costs memory and slows down ovs_flow_alloc.

To address this:
 Redefine cpu_used_mask to pointer.
 Append cpumask_size() bytes after 'stat' to hold cpumask.
 Initialization cpu_used_mask right after stats_last_writer.

APIs like cpumask_next and cpumask_set_cpu never access bits
beyond cpu count, cpumask_size() bytes of memory is enough.

Signed-off-by: Eddy Tao <taoyuan_eddy@hotmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/OS3P286MB229570CCED618B20355D227AF5D59@OS3P286MB2295.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06 22:36:29 -08:00
Arnd Bergmann
bbe6418663 amd-xgbe: fix mismatched prototype
The forward declaration was introduced with a prototype that does
not match the function definition:

drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c:2166:13: error: conflicting types for 'xgbe_phy_perform_ratechange' due to enum/integer mismatch; have 'void(struct xgbe_prv_data *, enum xgbe_mb_cmd,  enum xgbe_mb_subcmd)' [-Werror=enum-int-mismatch]
 2166 | static void xgbe_phy_perform_ratechange(struct xgbe_prv_data *pdata,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c:391:13: note: previous declaration of 'xgbe_phy_perform_ratechange' with type 'void(struct xgbe_prv_data *, unsigned int,  unsigned int)'
  391 | static void xgbe_phy_perform_ratechange(struct xgbe_prv_data *pdata,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Ideally there should not be any forward declarations here, which
would make it easier to show that there is no unbounded recursion.
I tried fixing this but could not figure out how to avoid the
recursive call.

As a hotfix, address only the broken prototype to fix the build
problem instead.

Fixes: 4f3b20bfbb ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20230203121553.2871598-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06 22:33:18 -08:00
Colin Foster
b1ca2f1b04 net: mscc: ocelot: un-export unused regmap symbols
There are no external users of the vsc7514_*_regmap[] symbols or
vsc7514_vcap_* functions. They were exported in commit 32ecd22ba6 ("net:
mscc: ocelot: split register definitions to a separate file") with the
intention of being used, but the actual structure used in commit
2efaca411c ("net: mscc: ocelot: expose vsc7514_regmap definition") ended
up being all that was needed.

Bury these unnecessary symbols.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230204182056.25502-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06 22:33:15 -08:00
Jakub Kicinski
9ac543c06f Merge branch 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux
Ajit Khaparde says:

====================
bnxt: Add Auxiliary driver support

Add auxiliary device driver for Broadcom devices.
The bnxt_en driver will register and initialize an aux device
if RDMA is enabled in the underlying device.
The bnxt_re driver will then probe and initialize the
RoCE interfaces with the infiniband stack.

We got rid of the bnxt_en_ops which the bnxt_re driver used to
communicate with bnxt_en.
Similarly  We have tried to clean up most of the bnxt_ulp_ops.
In most of the cases we used the functions and entry points provided
by the auxiliary bus driver framework.
And now these are the minimal functions needed to support the functionality.

We will try to work on getting rid of the remaining if we find any
other viable option in future.

* 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux:
  bnxt_en: Remove runtime interrupt vector allocation
  RDMA/bnxt_re: Remove the sriov config callback
  bnxt_en: Remove struct bnxt access from RoCE driver
  bnxt_en: Use auxiliary bus calls over proprietary calls
  bnxt_en: Use direct API instead of indirection
  bnxt_en: Remove usage of ulp_id
  RDMA/bnxt_re: Use auxiliary driver interface
  bnxt_en: Add auxiliary driver support
====================

Link: https://lore.kernel.org/r/20230202033809.3989-1-ajit.khaparde@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06 22:25:48 -08:00
ZhaoLong Wang
aa5465aeca cifs: Fix use-after-free in rdata->read_into_pages()
When the network status is unstable, use-after-free may occur when
read data from the server.

  BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0

  Call Trace:
   <TASK>
   dump_stack_lvl+0x38/0x4c
   print_report+0x16f/0x4a6
   kasan_report+0xb7/0x130
   readpages_fill_pages+0x14c/0x7e0
   cifs_readv_receive+0x46d/0xa40
   cifs_demultiplex_thread+0x121c/0x1490
   kthread+0x16b/0x1a0
   ret_from_fork+0x2c/0x50
   </TASK>

  Allocated by task 2535:
   kasan_save_stack+0x22/0x50
   kasan_set_track+0x25/0x30
   __kasan_kmalloc+0x82/0x90
   cifs_readdata_direct_alloc+0x2c/0x110
   cifs_readdata_alloc+0x2d/0x60
   cifs_readahead+0x393/0xfe0
   read_pages+0x12f/0x470
   page_cache_ra_unbounded+0x1b1/0x240
   filemap_get_pages+0x1c8/0x9a0
   filemap_read+0x1c0/0x540
   cifs_strict_readv+0x21b/0x240
   vfs_read+0x395/0x4b0
   ksys_read+0xb8/0x150
   do_syscall_64+0x3f/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

  Freed by task 79:
   kasan_save_stack+0x22/0x50
   kasan_set_track+0x25/0x30
   kasan_save_free_info+0x2e/0x50
   __kasan_slab_free+0x10e/0x1a0
   __kmem_cache_free+0x7a/0x1a0
   cifs_readdata_release+0x49/0x60
   process_one_work+0x46c/0x760
   worker_thread+0x2a4/0x6f0
   kthread+0x16b/0x1a0
   ret_from_fork+0x2c/0x50

  Last potentially related work creation:
   kasan_save_stack+0x22/0x50
   __kasan_record_aux_stack+0x95/0xb0
   insert_work+0x2b/0x130
   __queue_work+0x1fe/0x660
   queue_work_on+0x4b/0x60
   smb2_readv_callback+0x396/0x800
   cifs_abort_connection+0x474/0x6a0
   cifs_reconnect+0x5cb/0xa50
   cifs_readv_from_socket.cold+0x22/0x6c
   cifs_read_page_from_socket+0xc1/0x100
   readpages_fill_pages.cold+0x2f/0x46
   cifs_readv_receive+0x46d/0xa40
   cifs_demultiplex_thread+0x121c/0x1490
   kthread+0x16b/0x1a0
   ret_from_fork+0x2c/0x50

The following function calls will cause UAF of the rdata pointer.

readpages_fill_pages
 cifs_read_page_from_socket
  cifs_readv_from_socket
   cifs_reconnect
    __cifs_reconnect
     cifs_abort_connection
      mid->callback() --> smb2_readv_callback
       queue_work(&rdata->work)  # if the worker completes first,
                                 # the rdata is freed
          cifs_readv_complete
            kref_put
              cifs_readdata_release
                kfree(rdata)
 return rdata->...               # UAF in readpages_fill_pages()

Similarly, this problem also occurs in the uncache_fill_pages().

Fix this by adjusts the order of condition judgment in the return
statement.

Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-06 22:50:25 -06:00
Zhang Changzhong
4a606ce684 ice: switch: fix potential memleak in ice_add_adv_recipe()
When ice_add_special_words() fails, the 'rm' is not released, which will
lead to a memory leak. Fix this up by going to 'err_unroll' label.

Compile tested only.

Fixes: 8b032a55c1 ("ice: low level support for tunnels")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06 15:13:02 -08:00
Dan Carpenter
3f4870df1b ice: Fix off by one in ice_tc_forward_to_queue()
The > comparison should be >= to prevent reading one element beyond
the end of the array.

The "vsi->num_rxq" is not strictly speaking the number of elements in
the vsi->rxq_map[] array.  The array has "vsi->alloc_rxq" elements and
"vsi->num_rxq" is less than or equal to the number of elements in the
array.  The array is allocated in ice_vsi_alloc_arrays().  It's still
an off by one but it might not access outside the end of the array.

Fixes: 143b86f346 ("ice: Enable RX queue selection using skbedit action")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06 15:13:02 -08:00
Brett Creeley
c793f8ea15 ice: Fix disabling Rx VLAN filtering with port VLAN enabled
If the user turns on the vf-true-promiscuous-support flag, then Rx VLAN
filtering will be disabled if the VF requests to enable promiscuous
mode. When the VF is in a port VLAN, this is the incorrect behavior
because it will allow the VF to receive traffic outside of its port VLAN
domain. Fortunately this only resulted in the VF(s) receiving broadcast
traffic outside of the VLAN domain because all of the VLAN promiscuous
rules are based on the port VLAN ID. Fix this by setting the
.disable_rx_filtering VLAN op to a no-op when a port VLAN is enabled on
the VF.

Also, make sure to make this fix for both Single VLAN Mode and Double
VLAN Mode enabled devices.

Fixes: c31af68a1b ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 15:13:02 -08:00
Michal Swiatkowski
b2dbde3ad4 ice: fix out-of-bounds KASAN warning in virtchnl
KASAN reported:
[ 9793.708867] BUG: KASAN: global-out-of-bounds in ice_get_link_speed+0x16/0x30 [ice]
[ 9793.709205] Read of size 4 at addr ffffffffc1271b1c by task kworker/6:1/402

[ 9793.709222] CPU: 6 PID: 402 Comm: kworker/6:1 Kdump: loaded Tainted: G    B      OE      6.1.0+ #3
[ 9793.709235] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[ 9793.709245] Workqueue: ice ice_service_task [ice]
[ 9793.709575] Call Trace:
[ 9793.709582]  <TASK>
[ 9793.709588]  dump_stack_lvl+0x44/0x5c
[ 9793.709613]  print_report+0x17f/0x47b
[ 9793.709632]  ? __cpuidle_text_end+0x5/0x5
[ 9793.709653]  ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.709986]  ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.710317]  kasan_report+0xb7/0x140
[ 9793.710335]  ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.710673]  ice_get_link_speed+0x16/0x30 [ice]
[ 9793.711006]  ice_vc_notify_vf_link_state+0x14c/0x160 [ice]
[ 9793.711351]  ? ice_vc_repr_cfg_promiscuous_mode+0x120/0x120 [ice]
[ 9793.711698]  ice_vc_process_vf_msg+0x7a7/0xc00 [ice]
[ 9793.712074]  __ice_clean_ctrlq+0x98f/0xd20 [ice]
[ 9793.712534]  ? ice_bridge_setlink+0x410/0x410 [ice]
[ 9793.712979]  ? __request_module+0x320/0x520
[ 9793.713014]  ? ice_process_vflr_event+0x27/0x130 [ice]
[ 9793.713489]  ice_service_task+0x11cf/0x1950 [ice]
[ 9793.713948]  ? io_schedule_timeout+0xb0/0xb0
[ 9793.713972]  process_one_work+0x3d0/0x6a0
[ 9793.714003]  worker_thread+0x8a/0x610
[ 9793.714031]  ? process_one_work+0x6a0/0x6a0
[ 9793.714049]  kthread+0x164/0x1a0
[ 9793.714071]  ? kthread_complete_and_exit+0x20/0x20
[ 9793.714100]  ret_from_fork+0x1f/0x30
[ 9793.714137]  </TASK>

[ 9793.714151] The buggy address belongs to the variable:
[ 9793.714158]  ice_aq_to_link_speed+0x3c/0xffffffffffff3520 [ice]

[ 9793.714632] Memory state around the buggy address:
[ 9793.714642]  ffffffffc1271a00: f9 f9 f9 f9 00 00 05 f9 f9 f9 f9 f9 00 00 02 f9
[ 9793.714656]  ffffffffc1271a80: f9 f9 f9 f9 00 00 04 f9 f9 f9 f9 f9 00 00 00 00
[ 9793.714670] >ffffffffc1271b00: 00 00 00 04 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
[ 9793.714680]                             ^
[ 9793.714690]  ffffffffc1271b80: 00 00 00 00 00 04 f9 f9 f9 f9 f9 f9 00 00 00 00
[ 9793.714704]  ffffffffc1271c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The ICE_AQ_LINK_SPEED_UNKNOWN define is BIT(15). The value is bigger
than both legacy and normal link speed tables. Add one element (0 -
unknown) to both tables. There is no need to explicitly set table size,
leave it empty.

Fixes: 1d0e28a9be ("ice: Remove and replace ice speed defines with ethtool.h versions")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06 15:13:02 -08:00
Anirudh Venkataramanan
4d159f7884 ice: Do not use WQ_MEM_RECLAIM flag for workqueue
When both ice and the irdma driver are loaded, a warning in
check_flush_dependency is being triggered. This is due to ice driver
workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one
is not.

According to kernel documentation, this flag should be set if the
workqueue will be involved in the kernel's memory reclamation flow.
Since it is not, there is no need for the ice driver's WQ to have this
flag set so remove it.

Example trace:

[  +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
[  +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
[  +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[  +0.000161]  [last unloaded: bonding]
[  +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S                 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
[  +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  +0.000003] Workqueue: ice ice_service_task [ice]
[  +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
[  +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
[  +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
[  +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
[  +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
[  +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
[  +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
[  +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
[  +0.000004] FS:  0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
[  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
[  +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.000002] PKRU: 55555554
[  +0.000003] Call Trace:
[  +0.000002]  <TASK>
[  +0.000003]  __flush_workqueue+0x203/0x840
[  +0.000006]  ? mutex_unlock+0x84/0xd0
[  +0.000008]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? __pfx___flush_workqueue+0x10/0x10
[  +0.000006]  ? mutex_lock+0xa3/0xf0
[  +0.000005]  ib_cache_cleanup_one+0x39/0x190 [ib_core]
[  +0.000174]  __ib_unregister_device+0x84/0xf0 [ib_core]
[  +0.000094]  ib_unregister_device+0x25/0x30 [ib_core]
[  +0.000093]  irdma_ib_unregister_device+0x97/0xc0 [irdma]
[  +0.000064]  ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
[  +0.000059]  ? up_write+0x5c/0x90
[  +0.000005]  irdma_remove+0x36/0x90 [irdma]
[  +0.000062]  auxiliary_bus_remove+0x32/0x50
[  +0.000007]  device_release_driver_internal+0xfa/0x1c0
[  +0.000005]  bus_remove_device+0x18a/0x260
[  +0.000007]  device_del+0x2e5/0x650
[  +0.000005]  ? __pfx_device_del+0x10/0x10
[  +0.000003]  ? mutex_unlock+0x84/0xd0
[  +0.000004]  ? __pfx_mutex_unlock+0x10/0x10
[  +0.000004]  ? _raw_spin_unlock+0x18/0x40
[  +0.000005]  ice_unplug_aux_dev+0x52/0x70 [ice]
[  +0.000160]  ice_service_task+0x1309/0x14f0 [ice]
[  +0.000134]  ? __pfx___schedule+0x10/0x10
[  +0.000006]  process_one_work+0x3b1/0x6c0
[  +0.000008]  worker_thread+0x69/0x670
[  +0.000005]  ? __kthread_parkme+0xec/0x110
[  +0.000007]  ? __pfx_worker_thread+0x10/0x10
[  +0.000005]  kthread+0x17f/0x1b0
[  +0.000005]  ? __pfx_kthread+0x10/0x10
[  +0.000004]  ret_from_fork+0x29/0x50
[  +0.000009]  </TASK>

Fixes: 940b61af02 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06 15:13:02 -08:00
Colin Ian King
8306829bf8 selftests/bpf: Fix spelling mistake "detecion" -> "detection"
There is a spelling mistake in a literal string. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230206092229.46416-1-colin.i.king@gmail.com
2023-02-06 14:51:30 -08:00
Hao Xiang
d1d7730ff8 libbpf: Correctly set the kernel code version in Debian kernel.
In a previous commit, Ubuntu kernel code version is correctly set
by retrieving the information from /proc/version_signature.

commit<5b3d72987701d51bf31823b39db49d10970f5c2d>
(libbpf: Improve LINUX_VERSION_CODE detection)

The /proc/version_signature file doesn't present in at least the
older versions of Debian distributions (eg, Debian 9, 10). The Debian
kernel has a similar issue where the release information from uname()
syscall doesn't give the kernel code version that matches what the
kernel actually expects. Below is an example content from Debian 10.

release: 4.19.0-23-amd64
version: #1 SMP Debian 4.19.269-1 (2022-12-20) x86_64

Debian reports incorrect kernel version in utsname::release returned
by uname() syscall, which in older kernels (Debian 9, 10) leads to
kprobe BPF programs failing to load due to the version check mismatch.

Fortunately, the correct kernel code version presents in the
utsname::version returned by uname() syscall in Debian kernels. This
change adds another get kernel version function to handle Debian in
addition to the previously added get kernel version function to handle
Ubuntu. Some minor refactoring work is also done to make the code more
readable.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230203234842.2933903-1-hao.xiang@bytedance.com
2023-02-06 14:30:50 -08:00
Linus Torvalds
05ecb68070 cgroup fixes for v6.2-rc7
During v6.2 cycle, there were a series of changes to task cpu affinity
 handling which fixed cpuset inadvertently clobbering user-configured
 affinity masks. Unfortunately, they broke the affinity handling on hybrid
 heterogeneous CPUs which have cores that can execute both 64 and 32bit along
 with cores that can only execute 32bit code.
 
 This late pull request contains two fix patches for the above issue. While
 reverting the changes that caused the regression is definitely an option,
 the origial patches do improve how cpuset behave signficantly in some cases
 and the fixes seem fairly safe, so I think it'd be better to try to fix them
 first.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY+F1Fg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGXK8AP0SuFGMgUQL1gYVGfFG3gMCcxDKHVhdU+UP/N1l
 oJXEpgEA8DW1otuQZz0+MHUyYHkEIUQ5eVj1f2BJfHRkc5r5Cgo=
 =MXxv
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.2-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "During the v6.2 cycle, there were a series of changes to task cpu
  affinity handling which fixed cpuset inadvertently clobbering
  user-configured affinity masks. Unfortunately, they broke the affinity
  handling on hybrid heterogeneous CPUs which have cores that can
  execute both 64 and 32bit along with cores that can only execute 32bit
  code.

  This contains two fix patches for the above issue. While reverting the
  changes that caused the regression is definitely an option, the
  origial patches do improve how cpuset behave signficantly in some
  cases and the fixes seem fairly safe, so I think it'd be better to try
  to fix them first"

* tag 'cgroup-for-6.2-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
  cgroup/cpuset: Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks
2023-02-06 14:07:38 -08:00
Linus Torvalds
66fcf74e5c for-6.2-rc7-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmPhSm8ACgkQxWXV+ddt
 WDtucA/+MYsOjRZtG76NFUzDVaWpgPJ0/M7lJlzQkhMpRZwjVheDBDCGDSlu/Xzq
 wLdvc4VR/o0xZD90KtnQNDPwq1jknBHynVUiWAUzt0FKWu81Jd5TvfRMmGKGQ5B2
 CxSdfB2iatL/1L+DZ3q4uUXg8L+MDKTtjk2xOb648pXrT2MIy3u3j9ZhlDiYhvWx
 6YlPyUehq7a9gLXq6TGmZjC4FUboqlI6hdf3iu3rHlCeFFXTPT4QKR9G8FpVRikc
 C7lH8X3qV2Sg6rGaFT3BIsamS/rQZHh3zOuj4EbI/n6ZXiSsr0Bo/2JAxgyGYoH0
 u5LkIRIpry7E4Pn2vc9mj9T7C+tpN7BP+rQ9wL6r9KIbDB/c1hOsfOp+uZikukpY
 Lg9EvHksHyp0Fcrro3FxswRlK1Q5Q7Vx/+VUoYB93WCl8iQtEiVOH2LSoR+ZtSiD
 /Iitx8i1qcNO5DiFPcZgVC0WbrEfDoVqnwPrvY77BsBMA7i4l6Pe/n5Kw/vzRGmY
 ywo08fri7Daqv3HulBk3QrVGw4lHFPOuUpN9DkI3WfUoXTNeclzTPFS+27XnaXZn
 bP3OLf7hU7zTRC8FukWk9X4nPSTLT0xJ8LllGdMp1Wi9ntavqIDiJAviGsyqvneC
 FTgTKHFuvXvzgnji66Lo61wMEPRbac49diAKcmSiQwua/I7aPRY=
 =5fdr
 -----END PGP SIGNATURE-----

Merge tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - explicitly initialize zlib work memory to fix a KCSAN warning

 - limit number of send clones by maximum memory allocated

 - limit device size extent in case it device shrink races with chunk
   allocation

 - raid56 fixes:
     - fix copy&paste error in RAID6 stripe recovery
     - make error bitmap update atomic

* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: raid56: make error_bitmap update atomic
  btrfs: send: limit number of clones and allocated memory size
  btrfs: zlib: zero-initialize zlib workspace
  btrfs: limit device extents to the device size
  btrfs: raid56: fix stripes if vertical errors are found
2023-02-06 14:05:16 -08:00
Will Deacon
7a2127e66a cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.

For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().

Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.

Fixes: 431c69fac0 ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-02-06 10:18:36 -10:00
Waiman Long
3fb906e7fa cgroup/cpuset: Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks
Since commit 8f9ea86fdf ("sched: Always preserve the user
requested cpumask"), relax_compatible_cpus_allowed_ptr() is calling
__sched_setaffinity() unconditionally. This helps to expose a bug in
the current cpuset hotplug code where the cpumasks of the tasks in
the top cpuset are not updated at all when some CPUs become online or
offline. It is likely caused by the fact that some of the tasks in the
top cpuset, like percpu kthreads, cannot have their cpu affinity changed.

One way to reproduce this as suggested by Peter is:
 - boot machine
 - offline all CPUs except one
 - taskset -p ffffffff $$
 - online all CPUs

Fix this by allowing cpuset_cpus_allowed() to return a wider mask that
includes offline CPUs for those tasks that are in the top cpuset. For
tasks not in the top cpuset, the old rule applies and only online CPUs
will be returned in the mask since hotplug events will update their
cpumasks accordingly.

Fixes: 8f9ea86fdf ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon <will@kernel.org>
Originally-from: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-02-06 10:15:08 -10:00
Jacob Keller
e0645311e1 ice: remove unnecessary virtchnl_ether_addr struct use
The dev_lan_addr and hw_lan_addr members of ice_vf are used only to store
the MAC address for the VF. They are defined using virtchnl_ether_addr, but
only the .addr sub-member is actually used. Drop the use of
virtchnl_ether_addr and just use a u8 array of length [ETH_ALEN].

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:47:48 -08:00
Jacob Keller
537dfe06ac ice: introduce .irq_close VF operation
The Scalable IOV implementation will require notifying the VDCM driver when
an IRQ must be closed. This allows the VDCM to handle releasing stale IRQ
context values and properly reconfigure.

To handle this, introduce a new optional .irq_close callback to the VF
operations structure. This will be implemented by Scalable IOV to handle
the shutdown of the IRQ context.

Since the SR-IOV implementation does not need this, we must check that its
non-NULL before calling it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:47:02 -08:00
Jacob Keller
fa4a15c85c ice: introduce clear_reset_state operation
When hardware is reset, the VF relies on the VFGEN_RSTAT register to detect
when the VF is finished resetting. This is a tri-state register where 0
indicates a reset is in progress, 1 indicates the hardware is done
resetting, and 2 indicates that the software is done resetting.

Currently the PF driver relies on the device hardware resetting VFGEN_RSTAT
when a global reset occurs. This works ok, but it does mean that the VF
might not immediately notice a reset when the driver first detects that the
global reset is occurring.

This is also problematic for Scalable IOV, because there is no read/write
equivalent VFGEN_RSTAT register for the Scalable VSI type. Instead, the
Scalable IOV VFs will need to emulate this register.

To support this, introduce a new VF operation, clear_reset_state, which is
called when the PF driver first detects a global reset. The Single Root IOV
implementation can just write to VFGEN_RSTAT to ensure it's cleared
immediately, without waiting for the actual hardware reset to begin. The
Scalable IOV implementation will use this as part of its tracking of the
reset status to allow properly reporting the emulated VFGEN_RSTAT to the VF
driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:45:03 -08:00
Jacob Keller
5531bb851f ice: convert vf_ops .vsi_rebuild to .create_vsi
The .vsi_rebuild function exists for ice_reset_vf. It is used to release
and re-create the VSI during a single-VF reset.

This function is only called when we need to re-create the VSI, and not
when rebuilding an existing VSI. This makes the single-VF reset process
different from the process used to restore functionality after a
hardware reset such as the PF reset or EMP reset.

When we add support for Scalable IOV VFs, the implementation will be very
similar. The primary difference will be in the fact that each VF type uses
a different underlying VSI type in hardware.

Move the common functionality into a new ice_vf_recreate VSI function. This
will allow the two IOV paths to share this functionality. Rework the
.vsi_rebuild vf_op into .create_vsi, only performing the task of creating a
new VSI.

This creates a nice dichotomy between the ice_vf_rebuild_vsi and
ice_vf_recreate_vsi, and should make it more clear why the two flows atre
distinct.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:42:55 -08:00
Jacob Keller
b1b5694235 ice: introduce ice_vf_init_host_cfg function
Introduce a new generic helper ice_vf_init_host_cfg which performs common
host configuration initialization tasks that will need to be done for both
Single Root IOV and the new Scalable IOV implementation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:41:56 -08:00
Jacob Keller
b5dcff1fd5 ice: add a function to initialize vf entry
Some of the initialization code for Single Root IOV VFs will need to be
reused when we introduce Scalable IOV. Pull this code out into a new
ice_initialize_vf_entry helper function.

Co-developed-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:41:01 -08:00
Jacob Keller
aeead3d04f ice: Pull common tasks into ice_vf_post_vsi_rebuild
The Single Root IOV implementation of .post_vsi_rebuild performs some tasks
that will ultimately need to be shared with the Scalable IOV implementation
such as rebuilding the host configuration.

Refactor by introducing a new wrapper function, ice_vf_post_vsi_rebuild
which performs the tasks that will be shared between SR-IOV and Scalable
IOV. Move the ice_vf_rebuild_host_cfg and ice_vf_set_initialized calls into
this wrapper. Then call the implementation specific post_vsi_rebuild
handler afterwards.

This ensures that we will properly re-initialize filters and expected
settings for both SR-IOV and Scalable IOV.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:40:03 -08:00
Jacob Keller
1efee0734b ice: move ice_vf_vsi_release into ice_vf_lib.c
The ice_vf_vsi_release function will be used in a future change to
refactor the .vsi_rebuild function. Move this over to ice_vf_lib.c so
that it can be used there.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:39:13 -08:00
Jacob Keller
e158819706 ice: move vsi_type assignment from ice_vsi_alloc to ice_vsi_cfg
The ice_vsi_alloc and ice_vsi_cfg functions are used together to allocate
and configure a new VSI, called as part of the ice_vsi_setup function.

In the future with the addition of the subfunction code the ice driver
will want to be able to allocate a VSI while delaying the configuration to
a later point of the port activation.

Currently this requires that the port code know what type of VSI should
be allocated. This is required because ice_vsi_alloc assigns the VSI type.

Refactor the ice_vsi_alloc and ice_vsi_cfg functions so that VSI type
assignment isn't done until the configuration stage. This will allow the
devlink port addition logic to reserve a VSI as early as possible before
the type of the port is known. In this way, the port add can fail in the
event that all hardware VSI resources are exhausted.

Since the ice_vsi_cfg function already takes the ice_vsi_cfg_params
structure, this is relatively straight forward.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:38:25 -08:00
Jacob Keller
5e509ab237 ice: refactor VSI setup to use parameter structure
The ice_vsi_setup function, ice_vsi_alloc, and ice_vsi_cfg functions have
grown a large number of parameters. These parameters are used to initialize
a new VSI, as well as re-configure an existing VSI

Any time we want to add a new parameter to this function chain, even if it
will usually be unset, we have to change many call sites due to changing
the function signature.

A future change is going to refactor ice_vsi_alloc and ice_vsi_cfg to move
the VSI configuration and initialization all into ice_vsi_cfg.

Before this, refactor the VSI setup flow to use a new ice_vsi_cfg_params
structure. This will contain the configuration (mainly pointers) used to
initialize a VSI.

Pass this from ice_vsi_setup into the related functions such as
ice_vsi_alloc, ice_vsi_cfg, and ice_vsi_cfg_def.

Introduce a helper, ice_vsi_to_params to convert an existing VSI to the
parameters used to initialize it. This will aid in the flows where we
rebuild an existing VSI.

Since we also pass the ICE_VSI_FLAG_INIT to more functions which do not
need (or cannot yet have) the VSI parameters, lets make this clear by
renaming the function parameter to vsi_flags and using a u32 instead of a
signed integer. The name vsi_flags also makes it clear that we may extend
the flags in the future.

This change will make it easier to refactor the setup flow in the future,
and will reduce the complexity required to add a new parameter for
configuration in the future.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:37:31 -08:00
Jacob Keller
157acda5b1 ice: drop unnecessary VF parameter from several VSI functions
The vsi->vf pointer gets assigned early on during ice_vsi_alloc. Several
functions currently take a VF pointer, but they can just use the existing
vsi->vf pointer as needed. Modify these functions to drop the unnecessary
VF parameter.

Note that ice_vsi_cfg is not changed as a following change will refactor so
that the VF pointer is assigned during ice_vsi_cfg rather than
ice_vsi_alloc.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:36:09 -08:00
Jacob Keller
a2ca73eae3 ice: fix function comment referring to ice_vsi_alloc
Since commit 1d2e32275de7 ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_alloc has not been responsible for all of the behavior
implied by the comment for ice_vsi_setup_vector_base.

Fix the comment to refer to the new function ice_vsi_alloc_def().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:34:25 -08:00
Brett Creeley
772dec6459 ice: Add more usage of existing function ice_get_vf_vsi(vf)
Extend the usage of function ice_get_vf_vsi(vf) in multiple places
instead of VF's VSI by using a long string of dereferences
(i.e. vf->pf->vsi[vf->lan_vsi_idx]).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Kalyan Kodamagula <kalyan.kodamagula@intel.com>
Tested-by: Piotr Tyda <piotr.tyda@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-06 09:33:12 -08:00
Luka Guzenko
ebebf05a4b HID: Ignore battery for ELAN touchscreen 29DF on HP
The touchscreen reports a battery status of 0% and jumps to 1% when a
stylus is used. The device ID was added and the battery ignore quirk was
enabled for it.

Signed-off-by: Luka Guzenko <l.guzenko@web.de>
Link: https://lore.kernel.org/r/20230120223741.3007-1-l.guzenko@web.de
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2023-02-06 16:38:55 +01:00
Marc Kleine-Budde
3dafbe5cc1 Merge patch series "can: bittiming: cleanups and rework SJW handling"
Marc Kleine-Budde <mkl@pengutronix.de> says:

several people noticed that on modern CAN controllers with wide bit
timing registers the default SJW of 1 can result in unstable or no
synchronization to the CAN network. See Patch 14/17 for details.

During review of v1 Vincent pointed out that the original code and the
series doesn't always check user provided bit timing parameters,
sometimes silently limits them and the return error values are not
consistent.

This series first cleans up some code in bittiming.c, replacing
open-coded variants by macros or functions (Patches 1, 2).

Patch 3 adds the missing assignment of the effective TQ if the
interface is configured with low level timing parameters.

Patch 4 is another code cleanup.

Patches 5, 6 check the bit timing parameter during interface
registration.

Patch 7 adds a validation of the sample point.

The patches 8-13 convert the error messages from netdev_err() to
NL_SET_ERR_MSG_FMT, factor out the SJW handling from
can_fixup_bittiming(), add checking and error messages for the
individual limits and harmonize the error return values.

Patch 14 changes the default SJW value from 1 to min(Phase Seg1, Phase
Seg2 / 2).

Patch 15 switches can_calc_bittiming() to use the new SJW handling.

Patch 16 converts can_calc_bittiming() to NL_SET_ERR_MSG_FMT().

And patch 16 adds a NL_SET_ERR_MSG_FMT() error message to
can_validate_bitrate().

v1: https://lore.kernel.org/all/20220907103845.3929288-1-mkl@pengutronix.de

Link: https://lore.kernel.org/all/20230202110854.2318594-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 14:02:37 +01:00
Marc Kleine-Budde
6d7934719f can: bittiming: can_validate_bitrate(): report error via netlink
Report an error to user space via netlink if the requested bit rate is
not supported by the device.

Link: https://lore.kernel.org/all/20230202110854.2318594-18-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:27 +01:00
Marc Kleine-Budde
06742086a3 can: bittiming: can_calc_bittiming(): convert from netdev_err() to NL_SET_ERR_MSG_FMT()
Replace the netdev_err() by NL_SET_ERR_MSG_FMT() to better inform the
user about the problem. While there, use %u to print unsigned values
and improve error message a bit.

In case of an error, return -EINVAL instead of -EDOM, this corresponds
better to the actual meaning of the error value.

Link: https://lore.kernel.org/all/20230202110854.2318594-17-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:27 +01:00
Marc Kleine-Budde
c7650728a7 can: bittiming: can_calc_bittiming(): clean up SJW handling
In the current code, if the user configures a bitrate, a default SJW
value of 1 is used. If the user configures both a bitrate and a SJW
value, can_calc_bittiming() silently limits the SJW value to SJW max
and TSEG2.

We came to the conclusion that if the user provided an invalid SJW
value, it's best to bail out and inform the user [1].

[1] https://lore.kernel.org/all/CAMZ6RqKqhmTgUZiwe5uqUjBDnhhC2iOjZ791+Y845btJYwVDKg@mail.gmail.com

Further the ISO 11898-1:2015 standard mandates that "SJW shall be less
than or equal to the minimum of these two items: Phase_Seg1 and
Phase_Seg2." [2] The current code is missing that check.

[2] https://lore.kernel.org/all/BL3PR11MB64844E3FC13C55433CDD0B3DFB449@BL3PR11MB6484.namprd11.prod.outlook.com

The previous patches introduced
1) can_sjw_set_default() - sets a default value for SJW if unset
2) can_sjw_check() - implements a SJW check against SJW max, Phase
   Seg1 and Phase Seg2. In the error case this function reports the error
   to user space via netlink.

Replace both the open-coded SJW default setting and the open-coded and
insufficient checks of SJW with the helper functions
can_sjw_set_default() and can_sjw_check().

Link: https://lore.kernel.org/all/20230202110854.2318594-16-mkl@pengutronix.de
Link: https://lore.kernel.org/all/CAMZ6RqKqhmTgUZiwe5uqUjBDnhhC2iOjZ791+Y845btJYwVDKg@mail.gmail.com
Link: https://lore.kernel.org/all/BL3PR11MB64844E3FC13C55433CDD0B3DFB449@BL3PR11MB6484.namprd11.prod.outlook.com
Suggested-by: Thomas Kopp <Thomas.Kopp@microchip.com>
Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:27 +01:00
Marc Kleine-Budde
80bcf5ec99 can: bittiming: can_sjw_set_default(): use Phase Seg2 / 2 as default for SJW
"The (Re-)Synchronization Jump Width (SJW) defines how far a
 resynchronization may move the Sample Point inside the limits defined
 by the Phase Buffer Segments to compensate for edge phase errors." [1]

In other words, this means that the SJW parameter controls the
tolerance of the CAN controller to frequency errors compared to other
CAN controllers.

If the user space does not provide an SJW parameter, the kernel
chooses a default value of 1. This has proven to be a good default
value for classic CAN controllers, but no longer for modern CAN-FD
controllers.

In the past there were CAN controllers like the sja1000 with a rather
limited range of bit timing parameters. For the standard bit rates
this results in the following bit timing parameters:

| Bit timing parameters for sja1000 with 8.000000 MHz ref clock
|                     _----+--------------=> tseg1: 1 …   16
|                    /    /     _---------=> tseg2: 1 …    8
|                   |    |     /    _-----=> sjw:   1 …    4
|                   |    |    |    /    _-=> brp:   1 …   64 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error  BTR0 BTR1
|  1000000    125   2    3    2   1   1  1000000   0.0%  75.0%  75.0%   0.0%   0x00 0x14
|   800000    125   3    4    2   1   1   800000   0.0%  80.0%  80.0%   0.0%   0x00 0x16
|   666666    125   4    4    3   1   1   666666   0.0%  80.0%  75.0%   6.2%   0x00 0x27
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00 0x1c
|   250000    250   6    7    2   1   2   250000   0.0%  87.5%  87.5%   0.0%   0x01 0x1c
|   125000    500   6    7    2   1   4   125000   0.0%  87.5%  87.5%   0.0%   0x03 0x1c
|   100000    625   6    7    2   1   5   100000   0.0%  87.5%  87.5%   0.0%   0x04 0x1c
|    83333    750   6    7    2   1   6    83333   0.0%  87.5%  87.5%   0.0%   0x05 0x1c
|    50000   1250   6    7    2   1  10    50000   0.0%  87.5%  87.5%   0.0%   0x09 0x1c
|    33333   1875   6    7    2   1  15    33333   0.0%  87.5%  87.5%   0.0%   0x0e 0x1c
|    20000   3125   6    7    2   1  25    20000   0.0%  87.5%  87.5%   0.0%   0x18 0x1c
|    10000   6250   6    7    2   1  50    10000   0.0%  87.5%  87.5%   0.0%   0x31 0x1c

The attentive reader will notice that the SJW is 1 in most cases,
while the Seg2 phase is 2. Both values are given in TQ units, which in
turn is a duration in nanoseconds.

For example the 500 kbit/s configuration:

|  nominal                                  real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error  BTR0 BTR1
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00 0x1c

the TQ is 125ns, the Phase Seg2 is "2" (== 250ns), the SJW is "1" (==
125 ns).

Looking at a more modern CAN controller like a mcp2518fd, it has wider
bit timing registers.

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock
|                     _----+--------------=> tseg1: 2 …  256
|                    /    /     _---------=> tseg2: 1 …  128
|                   |    |     /    _-----=> sjw:   1 …  128
|                   |    |    |    /    _-=> brp:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error      NBTCFG
|   500000     25  34   35   10   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00440900

The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "1" (==
25ns).

Since the kernel chooses a default SJW of 1 regardless of the TQ, this
leads to a much smaller SJW and thus much smaller tolerances to
frequency errors.

To maintain the same oscillator tolerances on controllers with wide
bit timing registers, select a default SJW value of Phase Seg2 / 2
unless Phase Seg 1 is less. This results in the following bit timing
parameters:

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock
|                     _----+--------------=> tseg1: 2 …  256
|                    /    /     _---------=> tseg2: 1 …  128
|                   |    |     /    _-----=> sjw:   1 …  128
|                   |    |    |    /    _-=> brp:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error      NBTCFG
|   500000     25  34   35   10   5   1   500000   0.0%  87.5%  87.5%   0.0%   0x00440904

The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "5" (==
125ns). Which is the same as on the sja1000 controller.

[1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf

Link: https://lore.kernel.org/all/20230202110854.2318594-15-mkl@pengutronix.de
Cc: Mark Bath <mark@baggywrinkle.co.uk>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:27 +01:00
Marc Kleine-Budde
b5a3d0864e can: bittiming: can_sjw_check(): check that SJW is not longer than either Phase Buffer Segment
According to "The Configuration of the CAN Bit Timing" [1] the SJW
"may not be longer than either Phase Buffer Segment".

Check SJW against length of both Phase buffers. In case the SJW is
greater, report an error via netlink to user space and bail out.

[1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf

Link: https://lore.kernel.org/all/20230202110854.2318594-14-mkl@pengutronix.de
Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:27 +01:00
Marc Kleine-Budde
0c017f0910 can: bittiming: can_sjw_check(): report error via netlink and harmonize error value
If the user space has supplied an invalid SJW value (greater than the
maximum SJW value), report -EINVAL instead of -ERANGE, this better
matches the actual meaning of the error value.

Additionally report an error message via netlink to the user space.

Link: https://lore.kernel.org/all/20230202110854.2318594-13-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
de82d6185b can: bittiming: can_fixup_bittiming(): report error via netlink and harmonize error value
Check each bit timing parameter first individually against their
limits and report a meaningful error message via netlink to the user
space.

In case of an error, return -EINVAL instead of -ERANGE, this
corresponds better to the actual meaning of the error value.

Link: https://lore.kernel.org/all/20230202110854.2318594-12-mkl@pengutronix.de
Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
5988bf737d can: bittiming: factor out can_sjw_set_default() and can_sjw_check()
Factor out the functionality of assigning a SJW default value into
can_sjw_set_default() and the checking the SJW limits into
can_sjw_check().

This functions will be improved and called from a different function
in the following patches.

Link: https://lore.kernel.org/all/20230202110854.2318594-11-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
286c0e09e8 can: bittiming: can_changelink() pass extack down callstack
This is a preparation patch.

In order to pass warning/error messages during netlink calls back to
user space, pass the extack struct down the callstack of
can_changelink(), the actual error messages will be added in the
following ptaches.

Link: https://lore.kernel.org/all/20230202110854.2318594-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
1494d27f64 can: netlink: can_changelink(): convert from netdev_err() to NL_SET_ERR_MSG_FMT()
Since commit 51c352bdbc ("netlink: add support for formatted extack
messages") formatted extack messages are supported to inform the user
space or warnings/errors during netlink calls.

Replace the netdev_err() by NL_SET_ERR_MSG_FMT() to better inform the
user about the problem. While there, use %u to print unsigned values
and improve error message a bit.

Link: https://lore.kernel.org/all/20230202110854.2318594-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
73335cfab7 can: netlink: can_validate(): validate sample point for CAN and CAN-FD
The sample point is a value in tenths of a percent. Meaningful values
are between 0 and 1000. Invalid values are rejected and an error
message is returned to user space via netlink.

Link: https://lore.kernel.org/all/20230202110854.2318594-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
a3db542410 can: dev: register_candev(): bail out if both fixed bit rates and bit timing constants are provided
The CAN driver framework supports either fixed bit rates or bit timing
constants. Bail out during driver registration if both are given.

Link: https://lore.kernel.org/all/20230202110854.2318594-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
d58ac89d0d can: dev: register_candev(): ensure that bittiming const are valid
Implement the function can_bittiming_const_valid() to check the
validity of the specified bit timing constant. Call this function from
register_candev() to check the bit timing constants during the
registration of the CAN interface.

Link: https://lore.kernel.org/all/20230202110854.2318594-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:26 +01:00
Marc Kleine-Budde
8e0a0b32c4 can: bittiming: can_get_bittiming(): use direct return and remove unneeded else
Clean up the code flow a bit, don't assign err variable but directly
return. Remove the unneeded else, too.

Link: https://lore.kernel.org/all/20230202110854.2318594-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:25 +01:00
Marc Kleine-Budde
52375446f2 can: bittiming: can_fixup_bittiming(): set effective tq
The can_fixup_bittiming() function is used to validate the
user-supplied low-level bit timing parameters and calculate the
bitrate prescaler (brp) from the requested time quanta (tq) and the
CAN clock of the controller.

can_fixup_bittiming() selects the best matching integer bit rate
prescaler, which may result in a different time quantum than the value
specified by the user.

Calculate the resulting time quantum and assign it so that the user
sees the effective time quantum.

Link: https://lore.kernel.org/all/20230202110854.2318594-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-06 13:57:25 +01:00