Merge tag 'phy-for-5.14_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into char-misc-next
Vinod writes:
phy-for-5.14 version 2
- Updates:
- YAML conversion of the renesas,rcar-gen3 PCIe PHY and
rockchip-usb-phy bindings
- Support for devm_phy_get() taking a NULL PHY name
- New support:
- PCIe PHY for Qualcomm IPQ60xx
- PCIe PHY for Qualcomm SDX55
- USB PHY for RK3308
- PHY for TI TCAN104x CAN transceivers
- Innosilicon-based CSI D-PHY for Rockchip
* tag 'phy-for-5.14_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (36 commits)
phy: Revert "phy: ralink: Kconfig: convert mt7621-pci-phy into 'bool'"
phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
phy: uniphier-pcie: Fix updating phy parameters
phy/rockchip: add Innosilicon-based CSI dphy
dt-bindings: phy: add yaml binding for rockchip-inno-csi-dphy
phy: rockchip: remove redundant initialization of pointer cfg
phy: phy-can-transceiver: Add support for generic CAN transceiver driver
dt-bindings: phy: Add binding for TI TCAN104x CAN transceivers
phy: core: Reword the comment specifying the units of max_link_rate to be Mbps
phy: phy-mtk-hdmi: Remove redundant dev_err call in mtk_hdmi_phy_probe()
phy: phy-mtk-mipi-dsi: Remove redundant dev_err call in mtk_mipi_tx_probe()
phy: phy-mmp3-hsic: Remove redundant dev_err call in mmp3_hsic_phy_probe()
phy: bcm-ns-usb3: Remove redundant dev_err call in bcm_ns_usb3_mdio_probe()
MAINTAINERS: update marvell,armada-3700-utmi-phy.yaml reference
phy: phy-twl4030-usb: use DEVICE_ATTR_RO macro
dt-bindings: phy: convert rockchip-usb-phy.txt to YAML
phy: phy-rockchip-inno-usb2: add support for RK3308 USB phy
dt-bindings: phy: rockchip-inno-usb2: add compatible for rk3308 USB phy
phy: stm32: manage optional vbus regulator on phy_power_on/off
dt-bindings: phy: add vbus-supply optional property to phy-stm32-usbphyc
...
This reverts commit 6eded551ce ("phy: ralink: Kconfig: convert
mt7621-pci-phy into 'bool'") because we don't want this driver to be
built in; it should be a module instead.
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Merge tag 'icc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next
Georgi writes:
interconnect changes for 5.14
Here are changes for the 5.14-rc1 merge window consisting of interconnect
driver updates.
Driver changes:
- New driver for SC7280 platforms.
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: Add SC7280 interconnect provider driver
dt-bindings: interconnect: Add Qualcomm SC7280 DT bindings
The mei extension header was built as an array of flexible structures,
which will not work once more headers are actually added
(currently only the vtag header is used).
Sparse reports:
drivers/misc/mei/hw.h:253:32: warning: array of flexible structures
Use the basic type u8 for the variable-sized extension.
Define the mei_ext_hdr_vtag structure explicitly.
Also fix the mei_ext_next() function so that it correctly points to the
end of the header.
Note: the headers are part of the firmware interface and need to be __packed.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210621193756.134027-2-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
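The revised layout can be sketched in user-space C. This is a minimal sketch, not the driver's code: the struct and function names are hypothetical, `__attribute__((packed))` stands in for the kernel's `__packed`, and it assumes the length field counts 32-bit words, including the two header bytes. It shows the two points of the fix: the extension area is plain bytes, and "next" means "just past the end of the current header".

```c
#include <stddef.h>
#include <stdint.h>

/* Common part of every extension: type and total length.
 * Assumption: length is counted in 32-bit words and includes these two bytes. */
struct ext_hdr {
	uint8_t type;
	uint8_t length;
} __attribute__((packed));

/* Meta header: the variable-sized extension area is plain bytes (u8),
 * not an array of flexible structures. */
struct ext_meta_hdr {
	uint8_t count;       /* number of extensions that follow */
	uint8_t size;        /* total size of the extensions, in 32-bit words */
	uint8_t reserved[2];
	uint8_t hdrs[];      /* extensions, back to back */
} __attribute__((packed));

/* Explicitly defined vtag extension (4 bytes = 1 word). */
struct ext_hdr_vtag {
	struct ext_hdr hdr;
	uint8_t vtag;
	uint8_t reserved;
} __attribute__((packed));

/* Return the first extension after the meta header. */
static struct ext_hdr *ext_first(struct ext_meta_hdr *meta)
{
	return (struct ext_hdr *)meta->hdrs;
}

/* Return the extension that starts right after the end of @ext. */
static struct ext_hdr *ext_next(struct ext_hdr *ext)
{
	return (struct ext_hdr *)((uint8_t *)ext +
				  (size_t)ext->length * sizeof(uint32_t));
}
```

Because `length` is honored when stepping, a new extension type with a different size can be appended without changing how the list is walked.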
Over time the functions were renamed,
but this was not always reflected in the kdoc; fix that.
Signed-off-by: Tamar Mashiah <tamar.mashiah@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210621193756.134027-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next
Oded writes:
This tag contains habanalabs driver changes for v5.14:
- Change the communication protocol with the f/w. The new protocol allows better
backward compatibility between different f/w versions and is more
stable.
- Send hard-reset cause to f/w after a hard-reset has happened.
- Move to indirection when generating interrupts to f/w.
- Better progress and error messages during the f/w load stage.
- Recognize from the device ID whether the f/w has security enabled.
- Add validity check to event queue mechanism.
- Add new event from f/w that will indicate a daemon has been terminated
inside the f/w.
- Move to TLB cache range invalidation in the device's MMU.
- Disable memory scrubbing by default for performance.
- Many fixes for sparse/smatch reported errors.
- Enable by default stop-on-err in the ASIC.
- Move to ASYNC device probing to speed up driver loading on servers
with multiple devices.
- Fix to stop using disabled NIC ports when doing collective operations.
- Use standard error codes instead of positive values.
- Add support for resetting device after user has finished using it.
- Add debugfs option to avoid reset when a CS gets stuck.
- Add print of the last 8 CS pointers in case of error in QMANs.
- Add statistics on opening of the FD of a device.
* tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (72 commits)
habanalabs/gaudi: refactor hard-reset related code
habanalabs/gaudi: add support for NIC DERR
habanalabs: add validity check for signal cs
habanalabs: get lower/upper 32 bits via masking
habanalabs: allow reset upon device release
debugfs: add skip_reset_on_timeout option
habanalabs: fix typo
habanalabs/gaudi: correct driver events numbering
habanalabs: remove a rogue #ifdef
habanalabs/gaudi: print last QM PQEs on error
habanalabs/goya: add '__force' attribute to suppress false alarm
habanalabs: added open_stats info ioctl
habanalabs/gaudi: set the correct rc in case of err
habanalabs/gaudi: update coresight configuration
habanalabs: remove node from list before freeing the node
habanalabs: set rc as 'valid' in case of intentional func exit
habanalabs: zero complex structures using memset
habanalabs: print more info when failing to pin user memory
habanalabs: Fix an error handling path in 'hl_pci_probe()'
habanalabs: print firmware versions
...
Merge tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next
Vinod writes:
soundwire updates for 5.14-rc1
Updates for v5.14-rc1 are:
- Core has assorted updates, including improved clock stop code, the write
API, ENODATA handling, etc.
- Drivers: a big move of the Intel driver to the auxiliary bus, plus minor
updates to the Intel/Cadence drivers
* tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: Fix test for DP prepare complete
soundwire: bus: Make sdw_nwrite() data pointer argument const
soundwire: intel: move to auxiliary bus
soundwire: cadence: remove the repeated declaration
soundwire: dmi-quirks: remove duplicate initialization
soundwire: cadence_master: always set CMD_ACCEPT
soundwire: bus: add missing \n in dynamic debug
soundwire: bus: handle -ENODATA errors in clock stop/start sequences
soundwire: add missing kernel-doc description
soundwire: bus: only use CLOCK_STOP_MODE0 and fix confusions
soundwire: bandwidth allocation: improve error messages
soundwire/ASoC: add leading zeroes in peripheral device name
Some hard-reset code currently lives in Gaudi-specific code. However,
future ASICs can use this code too, so it is better to move it to the
common code section.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add support for NIC DERR ECC error events. When this error is
received, a device reset is performed.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In preparation for a new feature that allows the user to reserve
signals ahead of submissions, we need to change a current assumption
in the code.
Currently, the driver uses 2 SOBs to support signal CS. When the first
SOB reaches max value, the driver switches to the other one and assumes
that when it will need to switch back to the first one, all of the
signals have already been handled.
This assumption won't hold once the new feature is added, because with
signal reservation the driver can reach the max SOB value very
quickly.
The change is to add a validity check when submitting a signal CS, to
make sure the previous SOB is available (i.e., all the signals attached
to it have finished).
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
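The two-SOB scheme and the new check can be modeled in a few lines. A minimal sketch: the structures, the `max_val` field, and the `-EAGAIN` return value are hypothetical, and real SOBs are hardware sync objects rather than counters in memory. The point is the test on `other->pending` before switching back.

```c
#include <errno.h>

/* Hypothetical model of one sync object (SOB). */
struct sob {
	unsigned int value;    /* signals counted on this SOB so far */
	unsigned int pending;  /* attached signals that have not finished yet */
};

/* Two SOBs used alternately for signal CSs. */
struct sob_pair {
	struct sob sob[2];
	int cur;               /* index of the SOB currently in use */
	unsigned int max_val;  /* value at which the driver switches SOBs */
};

/* Attach one signal to the current SOB, switching SOBs when it is full. */
static int submit_signal(struct sob_pair *p)
{
	struct sob *s = &p->sob[p->cur];

	if (s->value == p->max_val) {
		struct sob *other = &p->sob[!p->cur];

		/* Validity check: the previous SOB may only be reused once
		 * every signal attached to it has finished. */
		if (other->pending)
			return -EAGAIN;
		other->value = 0;
		p->cur = !p->cur;
		s = other;
	}
	s->value++;
	s->pending++;
	return 0;
}

/* Called when a signal attached to SOB @idx completes. */
static void signal_done(struct sob_pair *p, int idx)
{
	if (p->sob[idx].pending)
		p->sob[idx].pending--;
}
```

With `max_val = 2`, the fifth submission would switch back to SOB 0 while its two signals are still in flight; without the check it would silently reuse it.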
We introduce a new type of reset: reset upon device release.
This reset is very similar to soft reset, except that it is
performed only upon device release, and not upon a user sysfs request
or TDR.
The purpose of this reset is to make sure the device is returned to
IDLE state after the current user has finished working with the device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The current driver uses a value from register TEST_O as the original
value for register TEST_I; however, the value is overwritten by "param",
so the original value is no longer used, which is a bug.
The value of TEST_O[7:0] should be masked with "mask", replaced with
"param", and placed in the bitfield TESTI_DAT_MASK as new TEST_I value.
Fixes: c6d9b13241 ("phy: socionext: add PCIe PHY driver support")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/1623037842-19363-1-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
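The corrected read-modify-write can be illustrated as follows. This is a sketch under stated assumptions: the field positions of `TESTI_DAT_MASK` and `TESTO_DAT_MASK` below are placeholders (the log does not give the real register layout), and the shift emulates what the kernel's `FIELD_PREP()` does. The "buggy" function only roughly models the behavior the commit describes.

```c
#include <stdint.h>

/* Placeholder field layouts; the real UniPhier register bits may differ. */
#define TESTO_DAT_MASK   0x000000ffu   /* TEST_O[7:0]: current parameter value */
#define TESTI_DAT_MASK   0x0000ff00u   /* data field in TEST_I (assumed bits 15:8) */
#define TESTI_DAT_SHIFT  8

/* Buggy behavior: the value read from TEST_O is overwritten by param. */
static uint32_t testi_value_buggy(uint32_t test_o, uint32_t mask, uint32_t param)
{
	(void)test_o;
	return ((param & mask) << TESTI_DAT_SHIFT) & TESTI_DAT_MASK;
}

/* Fixed behavior: keep the TEST_O bits outside @mask, take @param inside it. */
static uint32_t testi_value_fixed(uint32_t test_o, uint32_t mask, uint32_t param)
{
	uint32_t val = test_o & TESTO_DAT_MASK;

	val &= ~mask;          /* clear the bits being updated */
	val |= param & mask;   /* insert the new parameter bits */
	return (val << TESTI_DAT_SHIFT) & TESTI_DAT_MASK;  /* like FIELD_PREP() */
}
```

For example, with TEST_O[7:0] = `0xa5`, `mask = 0x0f` and `param = 0x03`, the fixed version keeps the upper nibble and yields a data field of `0xa3`, while the buggy one yields `0x03`.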
The CSI dphy found for example on the rk3326/px30 and rk3368 is based
on an IP design from Innosilicon. Add a driver for it.
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Link: https://lore.kernel.org/r/20210610212935.3520341-3-heiko@sntech.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Some Rockchip SoCs like the rk3368, rk3326, px30 use a CSI dphy
based on an Innosilicon IP. Add a binding for them.
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210610212935.3520341-2-heiko@sntech.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The pointer cfg is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210609113901.185230-1-colin.king@canonical.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
In sdw_prep_deprep_slave_ports(), after the wait_for_completion()
the DP prepare status register is read. If this indicates that the
port is now prepared, the code should continue with the port setup.
It is irrelevant whether the wait_for_completion() timed out if the
port is now ready.
The previous implementation would always fail if the
wait_for_completion() timed out, even if the port was reporting
successful prepare.
This patch also fixes a minor bug where the return from sdw_read()
was not checked for error - any error code with LSBits clear could
be misinterpreted as a successful port prepare.
Fixes: 79df15b7d3 ("soundwire: Add helpers for ports operations")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210618144745.30629-1-rf@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
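The fixed decision can be modeled as a small pure function. A minimal sketch, assuming (as the commit's wording implies) that a set bit in the DP prepare status register means the channel is not yet prepared; `dp_prepare_result()` is a hypothetical helper, not the SoundWire core's code. The wait's timeout result is deliberately not an input.

```c
#include <errno.h>

/*
 * @status:  result of the status-register read: a negative errno on failure,
 *           otherwise the register value.
 * @ch_mask: channels being prepared; a set status bit = "not finished" (assumed).
 *
 * Whether the preceding wait timed out is irrelevant: only the port state counts.
 */
static int dp_prepare_result(int status, unsigned int ch_mask)
{
	if (status < 0)                      /* read failed: never treat as prepared */
		return status;
	if ((unsigned int)status & ch_mask)  /* some channel is still not prepared */
		return -ETIMEDOUT;
	return 0;                            /* port prepared */
}
```

The first check matters: with `ch_mask = 0x3`, `-ENOMEM` is `0xfffffff4` in 32-bit two's complement, so both low bits are clear and it would otherwise pass as a successful prepare.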
Idiomatically, write functions should take const pointers to the
data buffer, as they don't change the data. They are also likely
to be called from functions that receive a const data pointer.
Internally the pointer is passed to function/structs shared with
the read functions, requiring a cast, but this is an implementation
detail that should be hidden by the public API.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210616145901.29402-1-rf@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
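The pattern described above, a const public write API over an internal message structure shared with reads, can be sketched as below. Everything here is hypothetical (`dev_nwrite()`, `struct xfer_msg`, and the fake register array are illustrations, not the SoundWire bus code); it shows why the internal cast is safe: the write path only reads from the buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message descriptor shared by reads and writes, hence non-const buf. */
struct xfer_msg {
	unsigned int addr;
	size_t len;
	uint8_t *buf;
	int is_write;
};

static uint8_t fake_regs[16];  /* stand-in for the device register space */

static int xfer(const struct xfer_msg *msg)
{
	if (msg->addr > sizeof(fake_regs) ||
	    msg->len > sizeof(fake_regs) - msg->addr)
		return -EINVAL;
	if (msg->is_write)
		memcpy(&fake_regs[msg->addr], msg->buf, msg->len);  /* only reads buf */
	else
		memcpy(msg->buf, &fake_regs[msg->addr], msg->len);
	return 0;
}

static int dev_nread(unsigned int addr, size_t count, uint8_t *val)
{
	struct xfer_msg msg = { addr, count, val, 0 };

	return xfer(&msg);
}

/* Public write API takes const data; the cast is an internal detail and is
 * safe because the write path never modifies the buffer. */
static int dev_nwrite(unsigned int addr, size_t count, const uint8_t *val)
{
	struct xfer_msg msg = { addr, count, (uint8_t *)val, 1 };

	return xfer(&msg);
}
```

Callers that hold only a `const uint8_t *` can now pass it straight through without a cast of their own.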
To better debug long-running CSs without changing the
userspace code, we are adding a new option to the debugfs interface
to skip the device reset in case of a CS timeout.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently the driver sends the fc interrupt ID to the FW instead of the
CPU interrupt ID. We intend to fix that while keeping backward
compatibility by using the same interrupt values.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
A rogue #ifdef for backwards compatibility crept into the upstream
code; it is, of course, not needed there.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If a QMAN has an error and stop_on_err is true, print specific
information about the "offending" command buffer batch.
If the error occurred in one of the upper CPs, the CQ pointer and size
are printed along with (up to) the last 8 PQEs of the stream.
If the error occurred in the lower CP, the CQ pointer and size are
printed along with (up to) the last 8 PQEs of ALL upper CPs, as we have no
way to know which upper CP sent the job there.
This allows higher SW levels to debug their CS by
extracting the raw data of the offending command buffer batch and
examining it offline to detect the issue.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
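Collecting "up to the last 8" entries of a queue typically means walking backwards from the producer index of a ring. A minimal sketch, assuming a power-of-two ring with a free-running 32-bit producer index; the queue depth, the names, and the layout are hypothetical and not taken from the Gaudi driver.

```c
#include <stddef.h>
#include <stdint.h>

#define PQ_LEN        16u  /* hypothetical queue depth, must be a power of two */
#define PQE_DUMP_MAX   8u  /* print at most the last 8 entries */

/*
 * Copy up to the last PQE_DUMP_MAX entries written before the free-running
 * producer index @pi into @out, oldest first. A queue that has produced
 * fewer than 8 entries yields fewer.
 */
static size_t last_pqes(const uint64_t pq[PQ_LEN], uint32_t pi,
			uint64_t out[PQE_DUMP_MAX])
{
	uint32_t n = pi < PQE_DUMP_MAX ? pi : PQE_DUMP_MAX;

	for (uint32_t i = 0; i < n; i++)
		/* unsigned wrap-around plus masking handles the ring boundary */
		out[i] = pq[(pi - n + i) & (PQ_LEN - 1)];
	return n;
}
```

Masking with `PQ_LEN - 1` instead of using `%` is why the depth must be a power of two; it also keeps the index correct when the 32-bit producer index itself wraps.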
In a system with multiple ASICs, there is a need to provide monitoring
tools with information on how long a device was opened and how many
times a device was opened.
Therefore, we add a new opcode to the INFO ioctl to provide that
information.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
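The two figures mentioned above (how many times and for how long a device was opened) can be tracked with a small amount of state. A minimal sketch: it does not show the actual INFO ioctl ABI, and the struct, field, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device statistics, modeled on the description above. */
struct open_stats {
	uint64_t open_counter;        /* how many times the device was opened */
	uint64_t last_open_ts_ms;     /* timestamp of the most recent open */
	uint64_t last_open_period_ms; /* duration of the most recent open */
	bool is_open;
};

static void stats_on_open(struct open_stats *s, uint64_t now_ms)
{
	s->open_counter++;
	s->last_open_ts_ms = now_ms;
	s->is_open = true;
}

static void stats_on_close(struct open_stats *s, uint64_t now_ms)
{
	if (!s->is_open)
		return;
	s->last_open_period_ms = now_ms - s->last_open_ts_ms;
	s->is_open = false;
}

/* What a monitoring query would report: while open, the period so far. */
static uint64_t stats_open_period(const struct open_stats *s, uint64_t now_ms)
{
	return s->is_open ? now_ms - s->last_open_ts_ms : s->last_open_period_ms;
}
```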
Update the STMTCSR and STMSYNCR values in order to reduce the number of sync
packets.
Signed-off-by: Tal Albo <talbo@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Fix the following smatch warnings:
goya_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
gaudi_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
pin_user_pages_fast() might fail and return a negative number, or pin
fewer pages than requested and return the number of pages that were
pinned.
In the latter case, it is informative to also print the memory size and the
number of requested pages.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
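The three possible outcomes of a pin call can be handled as below. A minimal user-space sketch: `check_pin_result()` is hypothetical, and mapping a partial pin to `-EFAULT` is an illustrative choice. A real driver must also release the pages that were pinned before reporting a partial pin.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * @rc:     what a pin_user_pages_fast()-style call returned: a negative errno,
 *          or the number of pages actually pinned.
 * @npages: number of pages requested.
 */
static int check_pin_result(long rc, long npages, uint64_t addr, uint64_t size)
{
	if (rc == npages)
		return 0;

	/* Print the size and the requested page count, not just the error. */
	fprintf(stderr,
		"failed (%ld) to pin host memory: addr 0x%llx, size 0x%llx, npages %ld\n",
		rc, (unsigned long long)addr, (unsigned long long)size, npages);

	/* Pass real errors through; map a partial pin to -EFAULT. */
	return rc < 0 ? (int)rc : -EFAULT;
}
```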
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: 2e5eda4681 ("habanalabs: PCIe Advanced Error Reporting support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
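The fix follows the usual goto-based unwind pattern in probe functions: every step that succeeded is undone, in reverse order, when a later step fails. A minimal user-space sketch; the enable/disable functions are hypothetical stand-ins for the real PCI calls, used only so the unwind can be observed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins so the unwind order can be observed in user space. */
static bool err_reporting_on;

static int enable_err_reporting(void)   { err_reporting_on = true; return 0; }
static void disable_err_reporting(void) { err_reporting_on = false; }
static int init_rest(bool fail)         { return fail ? -EIO : 0; }

static int probe(bool fail_later)
{
	int rc;

	rc = enable_err_reporting();
	if (rc)
		return rc;

	rc = init_rest(fail_later);
	if (rc)
		goto err_disable_reporting;  /* undo what succeeded so far */

	return 0;

err_disable_reporting:
	disable_err_reporting();         /* mirrors what remove() does */
	return rc;
}
```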
Firmware in habanalabs devices is composed of several components.
During device initialization, we read these versions from the device.
Print them during device initialization to allow better visibility in
automated systems.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Hard reset flow on PLDM might take more than 2 minutes.
Hence add a dedicated hard reset timeout of 6 minutes for PLDM.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In the current code, for the dynamic f/w loading flow, DRAM scrambling is
enabled after the Linux fit image is loaded to the card. This can cause the
device CPU to go into reset state.
The correct sequence should be:
1. Load boot fit image
2. Enable scrambling
3. Load Linux fit image
This commit aligns the point at which DRAM scrambling is enabled with the
static f/w load flow.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
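The required ordering can be enforced with a tiny state machine. A minimal sketch: the struct, the function names, and the error codes are hypothetical; only the order (boot fit image, then scrambling, then Linux fit image) comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the dynamic f/w load flow; error codes are illustrative. */
struct fw_load {
	bool boot_fit_loaded;
	bool scrambling_on;
	bool linux_fit_loaded;
};

static int load_boot_fit(struct fw_load *f)
{
	f->boot_fit_loaded = true;
	return 0;
}

static int enable_dram_scrambling(struct fw_load *f)
{
	if (!f->boot_fit_loaded)
		return -EINVAL;   /* step 1 must come first */
	if (f->linux_fit_loaded)
		return -EBUSY;    /* too late: this was the faulty ordering */
	f->scrambling_on = true;
	return 0;
}

static int load_linux_fit(struct fw_load *f)
{
	if (!f->scrambling_on)
		return -EINVAL;   /* scrambling must be enabled beforehand */
	f->linux_fit_loaded = true;
	return 0;
}
```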
If there is an error in the QMAN/engine, there is no point in trying
to continue running the workload. It is better to stop, to allow the
user to debug the program.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If an EQ fault occurs, we would like to know about it.
For this, a status bitmask was added in which the FW sets the EQ_FAULT
bit in case of an EQ fault.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When there is an ECC error in the HBM, return a standard error code,
-EIO in this case, and not a positive value.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When converting a virtual address to a physical one, we need to add the
correct offset to the physical page.
For this we need to use a mask that includes ALL bits of the page offset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
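The log does not say which mask was wrong, so this is only a minimal sketch of the general rule, with a hypothetical `va_to_pa()` helper: derive the offset mask from the size of the page that actually maps the address, so that it covers all offset bits.

```c
#include <stdint.h>

/*
 * Translate @va to a physical address, given the physical base of the page
 * that maps it. @page_size must be a power of two; the offset mask is derived
 * from it so that it covers ALL offset bits of that page size.
 */
static uint64_t va_to_pa(uint64_t va, uint64_t page_pa, uint64_t page_size)
{
	uint64_t offset_mask = page_size - 1;   /* e.g. 0x1fffff for a 2 MiB page */

	return page_pa + (va & offset_mask);
}
```

For example, for `va = 0x12345678` in a 2 MiB page at physical `0x80000000`, the full mask gives `0x80145678`; a mask sized for 4 KiB pages would drop bits 12 to 20 and give `0x80000678`.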
Get rid of the need to check whether boot_dev_sts is valid on every access
to the value read from these registers.
This is done by storing the register value in the hdev props ONLY if the
register is enabled.
This way, if the register is NOT enabled, none of the capability bits are set.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
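The idea is to validate once, at store time, so that every later capability test is a plain bit check. A minimal sketch: the bit positions and all names below are hypothetical and are NOT the real habanalabs definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, NOT the real habanalabs definitions. */
#define DEV_STS_ENABLED      (1u << 31)  /* f/w marks the register content as valid */
#define DEV_STS_CAP_SECURITY (1u << 0)
#define DEV_STS_CAP_OTHER    (1u << 1)

struct dev_props {
	uint32_t boot_dev_sts;  /* cached copy; 0 when the register is not enabled */
};

/* Store the raw register value only if the f/w enabled it. */
static void store_boot_dev_sts(struct dev_props *p, uint32_t reg)
{
	p->boot_dev_sts = (reg & DEV_STS_ENABLED) ? reg : 0;
}

/* Callers test capabilities directly; no separate validity check needed. */
static bool has_cap(const struct dev_props *p, uint32_t cap)
{
	return (p->boot_dev_sts & cap) != 0;
}
```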
If the device is not idle after the user closes the FD, we must reset the
device, as the next user that tries to open the FD will encounter a
non-functional device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Sometimes it is useful to allow the command to continue running despite
the timeout, to differentiate between commands that are really stuck and
ones that are just very time-consuming. This can be achieved by passing a
new debug flag alongside the CS, HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT.
Either way, if the timeout occurs, a warning is printed; however, this
does not fail the submission.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
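The timeout decision, combining the per-CS flag with the debugfs option described earlier, can be sketched as follows. A minimal sketch: the flag value is a hypothetical placeholder (the real `HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT` is defined in the driver's uapi header), and the enum and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical placeholder bit, not the real uapi value. */
#define EXAMPLE_CS_FLAG_SKIP_RESET_ON_TIMEOUT (1u << 0)

enum timeout_action {
	TIMEOUT_RESET_DEVICE,
	TIMEOUT_WARN_ONLY,
};

static enum timeout_action on_cs_timeout(uint32_t cs_flags, bool debugfs_skip_reset)
{
	if ((cs_flags & EXAMPLE_CS_FLAG_SKIP_RESET_ON_TIMEOUT) || debugfs_skip_reset) {
		/* keep the CS running; only warn, do not fail the submission */
		fprintf(stderr, "CS timed out, skipping device reset\n");
		return TIMEOUT_WARN_ONLY;
	}
	return TIMEOUT_RESET_DEVICE;
}
```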
For the driver to be aware of process or thread crashes inside
GAUDI's CPU, we introduce a new event which contains all relevant
information. Upon receiving the event, the driver dumps the information
and resets the device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In the collective wait, we put jobs on the QMANs of all the NICs. The
code takes into account whether a port is disabled only in the case of a
PCI card. When this info arrives from the f/w, the code doesn't take it
into account and tries to schedule jobs on NICs that aren't enabled,
which is a bug.
To fix this, after the f/w sends us the list of disabled ports, we
update the state of the QMANs according to that list. In addition,
we need to update the HW_CAP bits so the collective wait operation
will not try to use those QMANs. We also need to update the collective
master monitor mask.
Moreover, we add protection for such cases in the future, and for the
case where the user tries to submit work to those QMANs.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
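Applying the f/w's disabled-ports list to a capability mask, and guarding submissions with the same mask, can be sketched as below. A minimal sketch: the port count, the struct, and the function names are hypothetical, and in this model one mask stands in for both the HW_CAP bits and the collective monitor mask.

```c
#include <errno.h>
#include <stdint.h>

#define NIC_NUM_PORTS 10u  /* hypothetical port count */
#define NIC_ALL_PORTS ((1u << NIC_NUM_PORTS) - 1)

/* Hypothetical per-device state: bit i set means NIC i's QMAN may be used. */
struct nic_caps {
	uint32_t hw_cap_nic_mask;
};

/* Apply the disabled-ports list reported by the f/w. */
static void nic_apply_fw_disabled(struct nic_caps *c, uint32_t fw_disabled_mask)
{
	c->hw_cap_nic_mask &= ~fw_disabled_mask;
}

/* Ports the collective wait may schedule jobs on. */
static uint32_t nic_collective_mask(const struct nic_caps *c)
{
	return c->hw_cap_nic_mask & NIC_ALL_PORTS;
}

/* Protection against user submissions to a disabled NIC QMAN. */
static int nic_submit_check(const struct nic_caps *c, unsigned int port)
{
	if (port >= NIC_NUM_PORTS || !(c->hw_cap_nic_mask & (1u << port)))
		return -EINVAL;
	return 0;
}
```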
The current implementation uses a single interrupt interface towards
the FW; this interface causes races between interrupt types.
We split it into one interface per interrupt type.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
There is no dependency when probing multiple devices so indicate to the
kernel that it can probe our devices in ASYNC fashion.
This shortens insmod of the driver from ~2 minutes to 20 seconds on
a system with 8 devices.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Update the QM stop on error masks to also stop on ARB errors.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>