IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
RDS/IB tries to refill the recv buffer in softirq context using
GFP_NOWAIT flag. However alloc failure is handled by queueing a work to
refill the recv buffer with GFP_KERNEL flag. This means failure to
allocate with GFP_NOWAIT isn't fatal. Do not print the PAF warnings if
softirq context fails to refill the recv buffer. We will see the PAF
warnings when worker also fails to allocate.
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Moshe Shemesh says:
====================
Add devlink reload action and limit options
Introduce new options on devlink reload API to enable the user to select
the reload action required and constrains limits on these actions that he
may want to ensure. Complete support for reload actions in mlx5.
The following reload actions are supported:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.
The uAPI is backward compatible, if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions.
However, if reload limit is selected, the driver should perform only if
it can do it while keeping the limit constraints.
Reload limit added:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
Each driver which supports devlink reload command should expose the
reload actions and limits supported.
Add reload stats to hold the history per reload action per limit.
For example, the number of times fw_activate has been done on this
device since the driver module was added or if the firmware activation
was done with or without reset.
Patch 1 changes devlink_reload_supported() param type to enable using
it before allocating devlink.
Patch 2-3 add the new API reload action and reload limit options to
devlink reload.
Patch 4-5 add reload stats and remote reload stats. These stats are
exposed through devlink dev get.
Patches 6-11 add support on mlx5 for devlink reload action fw_activate
and handle the firmware reset events.
Patches 12-13 add devlink enable remote dev reset parameter and use it
in mlx5.
Patches 14-15 mlx5 add devlink reload limit no_reset support for
fw_activate reload action.
Patch 16 adds documentation file devlink-reload.rst
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink reload rst documentation file.
Update index file to include it.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate with reload limit
no_reset which does firmware live patching, updating the firmware image
without reset, no downtime and no configuration lose. The driver checks
if the firmware is capable of handling the pending firmware changes as a
live patch. If it is then it triggers firmware live patching flow.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enable_remote_dev_reset devlink param flags that the host admin
allows resets by other hosts. In case it is cleared mlx5 host PF driver
will send NACK on pci sync for firmware update reset request and the
command will fail.
By default enable_remote_dev_reset parameter is true, so pci sync for
firmware update reset is enabled.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enable_remote_dev_reset devlink param flags that the host admin
allows device resets that can be initiated by other hosts. This
parameter is useful for setups where a device is shared by different
hosts, such as multi-host setup. Once the user set this parameter to
false, the driver should NACK any attempt to reset the device while the
driver is loaded.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate. To activate firmware
image the mlx5 driver resets the firmware and reloads it from flash. If
a new image was stored on flash it will be loaded. Once this reload
command is executed the driver initiates fw sync reset flow, where the
firmware synchronizes all PFs on coming reset and driver reload.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If firmware sends sync_reset_abort to driver the driver should clear the
reset requested mode as reset is not expected any more.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On sync_reset_now event the driver does reload and PCI link toggle to
activate firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs will do PCI link toggle at once.
To do PCI link toggle, the driver ensures that no other device ID under
the same bridge by checking that all the PF functions under the same PCI
bridge have same device ID. If no other device it uses PCI bridge link
control to turn link down and up.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set capability to notify the firmware that this host driver is capable
of handling pci sync for firmware update events.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add functions to query and set the MFRL reset options supported by
firmware.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add remote reload stats to hold the history of actions performed due
devlink reload commands initiated by remote host. For example, in case
firmware activation with reset finished successfully but was initiated
by remote host.
The function devlink_remote_reload_actions_performed() is exported to
enable drivers update on remote reload actions performed as it was not
initiated by their own devlink instance.
Expose devlink remote reload stats to the user through devlink dev get
command.
Examples:
$ devlink dev show
pci/0000:82:00.0:
stats:
reload:
driver_reinit 2 fw_activate 1 fw_activate_no_reset 0
remote_reload:
driver_reinit 0 fw_activate 0 fw_activate_no_reset 0
pci/0000:82:00.1:
stats:
reload:
driver_reinit 1 fw_activate 0 fw_activate_no_reset 0
remote_reload:
driver_reinit 1 fw_activate 1 fw_activate_no_reset 0
$ devlink dev show -jp
{
"dev": {
"pci/0000:82:00.0": {
"stats": {
"reload": {
"driver_reinit": 2,
"fw_activate": 1,
"fw_activate_no_reset": 0
},
"remote_reload": {
"driver_reinit": 0,
"fw_activate": 0,
"fw_activate_no_reset": 0
}
}
},
"pci/0000:82:00.1": {
"stats": {
"reload": {
"driver_reinit": 1,
"fw_activate": 0,
"fw_activate_no_reset": 0
},
"remote_reload": {
"driver_reinit": 1,
"fw_activate": 1,
"fw_activate_no_reset": 0
}
}
}
}
}
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add reload stats to hold the history per reload action type and limit.
For example, the number of times fw_activate has been performed on this
device since the driver module was added or if the firmware activation
was performed with or without reset.
Add devlink notification on stats update.
Expose devlink reload stats to the user through devlink dev get command.
Examples:
$ devlink dev show
pci/0000:82:00.0:
stats:
reload:
driver_reinit 2 fw_activate 1 fw_activate_no_reset 0
pci/0000:82:00.1:
stats:
reload:
driver_reinit 1 fw_activate 0 fw_activate_no_reset 0
$ devlink dev show -jp
{
"dev": {
"pci/0000:82:00.0": {
"stats": {
"reload": {
"driver_reinit": 2,
"fw_activate": 1,
"fw_activate_no_reset": 0
}
}
},
"pci/0000:82:00.1": {
"stats": {
"reload": {
"driver_reinit": 1,
"fw_activate": 0,
"fw_activate_no_reset": 0
}
}
}
}
}
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
By default reload limit is unspecified and so no constraints on reload
actions are required.
Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.
The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.
Have the uapi parameter of reload limit ready for future support of
multiselection.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.
command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change devlink_reload_supported() function to get devlink_ops pointer
param instead of devlink pointer param.
This change will be used in the next patch to check if devlink reload is
supported before devlink instance is allocated.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kmalloc() of sufficiently big portion of memory is cache-aligned
in regular conditions. If some debugging options are used,
there is no reason qdisc structures would need 64-byte alignment
if most other kernel structures are not aligned.
This get rid of QDISC_ALIGN and QDISC_ALIGNTO.
Addition of privdata field will help implementing
the reverse of qdisc_priv() and documents where
the private data is.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Allen Pais <allen.lkml@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of errors, this message was printed:
(...)
balanced bwidth with unbalanced delay 5233 max 5005 [ fail ]
client exit code 0, server 0
\nnetns ns3-0-EwnkPH socket stat for 10003:
(...)
Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".
The commit 8b974778f9 ("selftests: mptcp: interpret \n as a new line")
is very similar to this one. But the modification in simult_flows.sh was
missed because this commit above was done in parallel to one here below.
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without these definitions, the driver will crash in:
mscc_ocelot_probe
-> ocelot_init
-> ocelot_vcap_init
-> __ocelot_target_read_ix
I missed this because I did not have the VSC7514 hardware to test, only
the VSC9959 and VSC9953, and the probing part is different.
Fixes: e3aea296d8 ("net: mscc: ocelot: add definitions for VCAP ES0 keys, actions and target")
Fixes: a61e365d7c ("net: mscc: ocelot: add definitions for VCAP IS1 keys, actions and target")
Reported-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If recvmsg() and the workqueue race to dequeue the data
pending on some subflow, the current mapping for such
subflow covers several skbs and some of them have not
reached yet the received, either the worker or recvmsg()
can find a subflow with the data_avail flag set - since
the current mapping is valid and in sequence - but no
skbs in the receive queue - since the other entity just
processed them.
The above will lead to an unbounded loop in __mptcp_move_skbs()
and a subsequent hang of any task trying to acquiring the msk
socket lock.
This change addresses the issue stopping the __mptcp_move_skbs()
loop as soon as we detect the above race (empty receive queue
with data_avail set).
Reported-and-tested-by: syzbot+fcf8ca5817d6e92c6567@syzkaller.appspotmail.com
Fixes: ab174ad8ef ("mptcp: move ooo skbs into msk out of order queue.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This accidentally got wired up to the *get* policy instead
of the *set* policy, causing operations to be rejected. Fix
it by wiring up the correct policy instead.
Fixes: 5028588b62 ("ethtool: wire up set policies to ops")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ETHTOOL_A_STRSET_COUNTS_ONLY flag attribute was previously
not allowed to be used, but now due to the policy size reduction
we would access the tb[] array out of bounds since we tried to
check for the attribute despite it not being accepted.
Fix both issues by adding it correctly to the appropriate policy.
Fixes: ff419afa43 ("ethtool: trim policy tables")
Fixes: 71921690f9 ("ethtool: provide string sets with STRSET_GET request")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some last minute fixes. The last two of them haven't been in next but
they do seem kind of obvious, very small and safe, fix bugs reported in
the field, and they are both in a new mlx5 vdpa driver, so it's not like
we can introduce regressions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl9/cWEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpaPoH/2b+Hc0UvQPvAas1uWC022bV/VpfiW0+OZaT
IsP88s9IInjWUoBb7Rqkhi3jnZYs9p7W59AV9cNJ5g6vPxcxrJAfgeo9R4bq6rD9
LqIAxHRRwPEnYddtFAv/XnX4YuTS+cJFwFJLWGXXdySA5/pgFqc9qDXjNLFzxq7X
8pI7qbW04e9eS3i8vlZwXNHpHQ7DMcpgewR7XO1Lhqh4sHWfusKSEVDrOM7v/0ru
yMtwKA5X8vRZQTIoaLamRWIm/qLWIi/Wcor6APhRG0Hn9yzS21JRnGDs8iKRSjjx
ecBNatgGPrmbO7yCfyh4el0GgrYhxAk/w2H3p/aXQJu+sGXL8nk=
=y/Tb
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost fixes from Michael Tsirkin:
"Some last minute vhost,vdpa fixes.
The last two of them haven't been in next but they do seem kind of
obvious, very small and safe, fix bugs reported in the field, and they
are both in a new mlx5 vdpa driver, so it's not like we can introduce
regressions"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix dependency on MLX5_CORE
vdpa/mlx5: should keep avail_index despite device status
vhost-vdpa: fix page pinning leakage in error path
vhost-vdpa: fix vhost_vdpa_map() on error condition
vhost: Don't call log_access_ok() when using IOTLB
vhost: Use vhost_get_used_size() in vhost_vring_set_addr()
vhost: Don't call access_ok() when using IOTLB
vhost vdpa: fix vhost_vdpa_open error handling
Pull networking fixes from Jakub Kicinski:
"One more set of fixes from the networking tree:
- add missing input validation in nl80211_del_key(), preventing
out-of-bounds access
- last minute fix / improvement of a MRP netlink (uAPI) interface
introduced in 5.9 (current) release
- fix "unresolved symbol" build error under CONFIG_NET w/o
CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock
BTF.
- fix 32 bit sub-register bounds tracking in the bpf verifier for OR
case
- tcp: fix receive window update in tcp_add_backlog()
- openvswitch: handle DNAT tuple collision in conntrack-related code
- r8169: wait for potential PHY reset to finish after applying a FW
file, avoiding unexpected PHY behaviour and failures later on
- mscc: fix tail dropping watermarks for Ocelot switches
- avoid use-after-free in macsec code after a call to the GRO layer
- avoid use-after-free in sctp error paths
- add a device id for Cellient MPL200 WWAN card
- rxrpc fixes:
- fix the xdr encoding of the contents read from an rxrpc key
- fix a BUG() for a unsupported encoding type.
- fix missing _bh lock annotations.
- fix acceptance handling for an incoming call where the incoming
call is encrypted.
- the server token keyring isn't network namespaced - it belongs
to the server, so there's no need. Namespacing it means that
request_key() fails to find it.
- fix a leak of the server keyring"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (21 commits)
net: usb: qmi_wwan: add Cellient MPL200 card
macsec: avoid use-after-free in macsec_handle_frame()
r8169: consider that PHY reset may still be in progress after applying firmware
openvswitch: handle DNAT tuple collision
sctp: fix sctp_auth_init_hmacs() error path
bridge: Netlink interface fix.
net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
bpf: Fix scalar32_min_max_or bounds tracking
tcp: fix receive window update in tcp_add_backlog()
net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
mptcp: more DATA FIN fixes
net: mscc: ocelot: warn when encoding an out-of-bounds watermark value
net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
rxrpc: Fix server keyring leak
rxrpc: The server keyring isn't network-namespaced
rxrpc: Fix accept on a connection that need securing
rxrpc: Fix some missing _bh annotations on locking conn->state_lock
rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
rxrpc: Fix rxkad token xdr encoding
...
Remove propmt for selecting MLX5_VDPA by the user and modify
MLX5_VDPA_NET to select MLX5_VDPA. Also modify MLX5_VDPA_NET to depend
on mlx5_core.
This fixes an issue where configuration sets 'y' for MLX5_VDPA_NET while
MLX5_CORE is compiled as a module causing link errors.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 device")s
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20201007064011.GA50074@mtl-vdi-166.wap.labs.mlnx
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A VM with mlx5 vDPA has below warnings while being reset:
vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
We should allow userspace emulating the virtio device be
able to get to vq's avail_index, regardless of vDPA device
status. Save the index that was last seen when virtq was
stopped, so that userspace doesn't complain.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Add usb ids of the Cellient MPL200 card.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@mailbox.org>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
De-referencing skb after call to gro_cells_receive() is not allowed.
We need to fetch skb->len earlier.
Fixes: 5491e7c6b1 ("macsec: enable GRO and RPS on macsec devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some firmware files trigger a PHY soft reset and don't wait for it to
be finished. PHY register writes directly after applying the firmware
may fail or provide unexpected results therefore. Fix this by waiting
for bit BMCR_RESET to be cleared after applying firmware.
There's nothing wrong with the referenced change, it's just that the
fix will apply cleanly only after this change.
Fixes: 89fbd26cca ("r8169: fix firmware not resetting tp->ocp_base")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With multiple DNAT rules it's possible that after destination
translation the resulting tuples collide.
For example, two openvswitch flows:
nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
Assuming two TCP clients initiating the following connections:
10.0.0.10:5000->10.0.0.10:10
10.0.0.10:5000->10.0.0.20:10
Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
nf_conntrack_confirm() to fail because of tuple collision.
Netfilter handles this case by allocating a null binding for SNAT at
egress by default. Perform the same operation in openvswitch for DNAT
if no explicit SNAT is requested by the user and allocate a null binding
for SNAT for packets in the "original" direction.
Reported-at: https://bugzilla.redhat.com/1877128
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-10-08
The main changes are:
1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due
to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song.
2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is correcting NETLINK br_fill_ifinfo() to be able to
handle 'filter_mask' with multiple flags asserted.
Fixes: 36a8e8e265 ("bridge: Extend br_fill_ifinfo to return MPR status")
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nouveau:
- fix crash in TTM alloc fail path
- return error earlier for unknown chipsets
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJffof8AAoJEAx081l5xIa+J1cP/Rw0awD0mLwtjfI+9btpxBkk
/p308idpUNGvF92HJ/f+V8gwznR85mwes5ls6/qtfI78c+ShuTDlWwIDF7xHfIyJ
f8Ai/NqGciRcRceeM0kjC7+EUGd6xpyzEg2YADMuRYoeTqC4VTDdFM+Bf+YWYfyo
vgCMvidap8Sdc2K+mEhSr1PwbeB+13ViflgyWTne8o5mZxmq66d2/ufoa7qBZecJ
FMpansUaa5PXFFjVI6bYt+AmUNi50JDa63GO4UNuBCOzLqfRLnFj9yCCrgaNrTTx
rKcOAYvHphSRfkKU2OQ8dEYnzwAlCfthOc6Ks1TGd9ve4Z5swb6X8mMQiTxKvTDR
+EFKXQCtO/6c7y7bWQw7pGzoBMA1Bpi0ky1VtG+llME+F0W5ePaUqbVBj6AC4iIR
sPlT6wtrqW99/AfgvcfZs5wq25onoPSMZplGbfqx8AErFWp/KmEE/+R5bR27SA3N
TlKPzyYCQ3EL1nQmrfPnDwF+H8GetaVngJZe/awnr31xwWcHLl3h+FfIArzd7gRl
H2umkUIO/Uk8lIcIr0Vk90V84BLy+de4ijng2b5bnXKbBx7+o+e/faisqVXx8ZR6
2hmGupAiuOmHOOf2PCLPnUyZTN/J+pzURN6UK4yjk6nlTfN01wXn4+2w3ifB8m6b
Vl54q++yIQBaZAADE/NL
=IkbJ
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm
Pull drm nouveau fixes from Dave Airlie:
"Karol found two last minute nouveau fixes, they both fix crashes, the
TTM one follows what other drivers do already, and the other is for
bailing on load on unrecognised chipsets.
- fix crash in TTM alloc fail path
- return error earlier for unknown chipsets"
* tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/mem: guard against NULL pointer access in mem_del
drm/nouveau/device: return error for unknown chipsets
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCX31hAAAKCRCAXGG7T9hj
vjneAQDTJofrC76bt5QcPcrz1BWBC41tOOb5jSVLEVxwsnTfDAD/STWrrT6ZLH2z
759txSf/ZCnpRCub7IXgaUek5oNlSAI=
=QWgj
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"One fix for a regression when booting as a Xen guest on ARM64
introduced probably during the 5.9 cycle. It is very low risk as it is
modifying Xen specific code only.
The exact commit introducing the bug hasn't been identified yet, but
everything was fine in 5.8 and only in 5.9 some configurations started
to fail"
* tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/arm64: xen: Fix to convert percpu address to gfn correctly
The afs filesystem has a lock[*] that it uses to serialise I/O operations
going to the server (vnode->io_lock), as the server will only perform one
modification operation at a time on any given file or directory. This
prevents the the filesystem from filling up all the call slots to a server
with calls that aren't going to be executed in parallel anyway, thereby
allowing operations on other files to obtain slots.
[*] Note that is probably redundant for directories at least since
i_rwsem is used to serialise directory modifications and
lookup/reading vs modification. The server does allow parallel
non-modification ops, however.
When a file truncation op completes, we truncate the in-memory copy of the
file to match - but we do it whilst still holding the io_lock, the idea
being to prevent races with other operations.
However, if writeback starts in a worker thread simultaneously with
truncation (whilst notify_change() is called with i_rwsem locked, writeback
pays it no heed), it may manage to set PG_writeback bits on the pages that
will get truncated before afs_setattr_success() manages to call
truncate_pagecache(). Truncate will then wait for those pages - whilst
still inside io_lock:
# cat /proc/8837/stack
[<0>] wait_on_page_bit_common+0x184/0x1e7
[<0>] truncate_inode_pages_range+0x37f/0x3eb
[<0>] truncate_pagecache+0x3c/0x53
[<0>] afs_setattr_success+0x4d/0x6e
[<0>] afs_wait_for_operation+0xd8/0x169
[<0>] afs_do_sync_operation+0x16/0x1f
[<0>] afs_setattr+0x1fb/0x25d
[<0>] notify_change+0x2cf/0x3c4
[<0>] do_truncate+0x7f/0xb2
[<0>] do_sys_ftruncate+0xd1/0x104
[<0>] do_syscall_64+0x2d/0x3a
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
The writeback operation, however, stalls indefinitely because it needs to
get the io_lock to proceed:
# cat /proc/5940/stack
[<0>] afs_get_io_locks+0x58/0x1ae
[<0>] afs_begin_vnode_operation+0xc7/0xd1
[<0>] afs_store_data+0x1b2/0x2a3
[<0>] afs_write_back_from_locked_page+0x418/0x57c
[<0>] afs_writepages_region+0x196/0x224
[<0>] afs_writepages+0x74/0x156
[<0>] do_writepages+0x2d/0x56
[<0>] __writeback_single_inode+0x84/0x207
[<0>] writeback_sb_inodes+0x238/0x3cf
[<0>] __writeback_inodes_wb+0x68/0x9f
[<0>] wb_writeback+0x145/0x26c
[<0>] wb_do_writeback+0x16a/0x194
[<0>] wb_workfn+0x74/0x177
[<0>] process_one_work+0x174/0x264
[<0>] worker_thread+0x117/0x1b9
[<0>] kthread+0xec/0xf1
[<0>] ret_from_fork+0x1f/0x30
and thus deadlock has occurred.
Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
that the caller is holding i_rwsem doesn't preclude more pages being
dirtied through an mmap'd region.
Fix this by:
(1) Use the vnode validate_lock to mediate access between afs_setattr()
and afs_writepages():
(a) Exclusively lock validate_lock in afs_setattr() around the whole
RPC operation.
(b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
shared-lock validate_lock and returning immediately if we couldn't
get it.
(c) If WB_SYNC_ALL is set, wait for the lock.
The validate_lock is also used to validate a file and to zap its cache
if the file was altered by a third party, so it's probably a good fit
for this.
(2) Move the truncation outside of the io_lock in setattr, using the same
hook as is used for local directory editing.
This requires the old i_size to be retained in the operation record as
we commit the revised status to the inode members inside the io_lock
still, but we still need to know if we reduced the file size.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 70e806e4e6 ("mm: Do early cow for pinned pages during fork()
for ptes") we write-protected the PTE before doing the page pinning
check, in order to avoid a race with concurrent fast-GUP pinning (which
doesn't take the mm semaphore or the page table lock).
That trick doesn't actually work - it doesn't handle memory ordering
properly, and doing so would be prohibitively expensive.
It also isn't really needed. While we're moving in the direction of
allowing and supporting page pinning without marking the pinned area
with MADV_DONTFORK, the fact is that we've never really supported this
kind of odd "concurrent fork() and page pinning", and doing the
serialization on a pte level is just wrong.
We can add serialization with a per-mm sequence counter, so we know how
to solve that race properly, but we'll do that at a more appropriate
time. Right now this just removes the write protect games.
It also turns out that the write protect games actually break on Power,
as reported by Aneesh Kumar:
"Architecture like ppc64 expects set_pte_at to be not used for updating
a valid pte. This is further explained in commit 56eecdb912 ("mm:
Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"
and the code triggered a warning there:
WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
Call Trace:
copy_present_page mm/memory.c:857 [inline]
copy_present_pte mm/memory.c:899 [inline]
copy_pte_range mm/memory.c:1014 [inline]
copy_pmd_range mm/memory.c:1092 [inline]
copy_pud_range mm/memory.c:1127 [inline]
copy_p4d_range mm/memory.c:1150 [inline]
copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
dup_mmap kernel/fork.c:592 [inline]
dup_mm+0x77c/0xab0 kernel/fork.c:1355
copy_mm kernel/fork.c:1411 [inline]
copy_process+0x1f00/0x2740 kernel/fork.c:2070
_do_fork+0xc4/0x10b0 kernel/fork.c:2429
Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon reported an issue with the current scalar32_min_max_or() implementation.
That is, compared to the other 32 bit subreg tracking functions, the code in
scalar32_min_max_or() stands out that it's using the 64 bit registers instead
of 32 bit ones. This leads to bounds tracking issues, for example:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
The bound tests on the map value force the upper unsigned bound to be 25769803777
in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
tracker. This is incorrect given the only thing we know is that the value must be
positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
we know that these are positive. Previously, in the else branch the 64 bit values
of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
be 1 as upper bound there. After the fix the subreg range is now correct:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Simon Scannell <scannell.smn@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Previously the code relied on device->pri to be NULL and to fail probing
later. We really should just return an error inside nvkm_device_ctor for
unsupported GPUs.
Fixes: 24d5ff40a7 ("drm/nouveau/device: rework mmio mapping code to get rid of second map")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Cc: dann frazier <dann.frazier@canonical.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-1-kherbst@redhat.com
Fix missing result check of exfat_build_inode().
And use PTR_ERR_OR_ZERO instead of PTR_ERR.
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>