Commit Graph

4310 Commits

Author SHA1 Message Date
Adam Vodopjan
37e14e4f37 ata: ahci: Fix PCS quirk application for suspend
Since kernel 5.3.4 my laptop (ICH8M controller) does not see Kingston
SV300S37A60G SSD disk connected into a SATA connector on wake from
suspend.  The problem was introduced in c312ef1763 ("libata/ahci: Drop
PCS quirk for Denverton and beyond"): the quirk is not applied on wake
from suspend as it originally was.

It is worth to mention the commit contained another bug: the quirk is
not applied at all to controllers which require it. The fix commit
09d6ac8dc5 ("libata/ahci: Fix PCS quirk application") landed in 5.3.8.
So testing my patch anywhere between commits c312ef1763 and
09d6ac8dc5 is pointless.

Not all disks trigger the problem. For example nothing bad happens with
Western Digital WD5000LPCX HDD.

Test hardware:
- Acer 5920G with ICH8M SATA controller
- sda: some SATA HDD connnected into the DVD drive IDE port with a
  SATA-IDE caddy. It is a boot disk
- sdb: Kingston SV300S37A60G SSD connected into the only SATA port

Sample "dmesg --notime | grep -E '^(sd |ata)'" output on wake:

sd 0:0:0:0: [sda] Starting disk
sd 2:0:0:0: [sdb] Starting disk
ata4: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
ata1.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
ata1: FORCE: cable set to 80c
ata5: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3.00: disabled
sd 2:0:0:0: rejecting I/O to offline device
ata3.00: detaching (SCSI 2:0:0:0)
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_NO_CONNECT
	driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Synchronizing SCSI cache
sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result:
	hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Stopping disk
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
	driverbyte=DRIVER_OK

Commit c312ef1763 dropped ahci_pci_reset_controller() which internally
calls ahci_reset_controller() and applies the PCS quirk if needed after
that. It was called each time a reset was required instead of just
ahci_reset_controller(). This patch puts the function back in place.

Fixes: c312ef1763 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
Signed-off-by: Adam Vodopjan <grozzly@protonmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-12-27 11:06:57 +09:00
Linus Torvalds
8ecd28b7a3 ata changes for 6.2
ata changes fro 6.2 include the ususal set of driver fixes and
 improvements as well as several patches improving libata core in
 preparation of the introduction of the support for the command duration
 limits feature. In more details:
 
   - Define the missing COMPLETED sense key in scsi header (from me).
 
   - Several patches to improve libata handling of the status of
     completed commands and the retry and sense data reported to the scsi
     layer for failed commands. In particular, this widen the support for
     NCQ autosense to all drives that support this feature instead of
     restricting this feature use to ZAC drives only (from Niklas).
 
   - Cleanup of the pata_mpc52xx and sata_dwc_460ex drivers to remove the
     use of the deprecated NO_IRQ macro (from Christophe).
 
   - Fix build dedependency on OF vs use of the of_match_ptr() macro to
     avoid build errors with the sata_gemini and pata_ftide010 drivers
     (from me).
 
   - Some libata cleanups using the new helper function
     ata_port_is_frozen() (from Niklas).
 
   - Improve internal command handling by not retrying commands that
     failed with a timeout (from Niklas).
 
   - Remove code for several unused libata helper functions (from
     Niklas).
 
   - Remove the palmchip pata_bk3710 driver. A couple of other driver
     removal should come in through the arm tree pull request (from
     Arnd).
 
   - Remove unused variable and function in the sata_dwc_460ex driver and
     libata-sff code (from Colin and Sergey).
 
   - Minor cleanup of the pata_ep93xx driver platform code (from
     Minghao).
 
   - Remove the unnecessary linux/msi.h include from the ahci driver
     (from Thomas).
 
   - Changes to libata enum constants definitions to avoid warnings with
     gcc-13 (from Arnd).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY5aregAKCRDdoc3SxdoY
 dlfSAQCeTLQP9qBrmSUZnP5G5XOHHcxp5maXKWBrPFVsOhTmLQD/WxEGDzgEnnPe
 m8hKvBcqTQIn2QRGCiXRnYAiG9Om0Qo=
 =bJZM
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata updates from Damien Le Moal:
 "The ususal set of driver fixes and improvements as well as several
  patches improving libata core in preparation of the introduction of
  the support for the command duration limits feature. In more details:

   - Define the missing COMPLETED sense key in scsi header (me)

   - Several patches to improve libata handling of the status of
     completed commands and the retry and sense data reported to the
     scsi layer for failed commands. In particular, this widen the
     support for NCQ autosense to all drives that support this feature
     instead of restricting this feature use to ZAC drives only (Niklas)

   - Cleanup of the pata_mpc52xx and sata_dwc_460ex drivers to remove
     the use of the deprecated NO_IRQ macro (Christophe)

   - Fix build dedependency on OF vs use of the of_match_ptr() macro to
     avoid build errors with the sata_gemini and pata_ftide010 drivers
     (me)

   - Some libata cleanups using the new helper function
     ata_port_is_frozen() (Niklas)

   - Improve internal command handling by not retrying commands that
     failed with a timeout (Niklas)

   - Remove code for several unused libata helper functions (from
     Niklas)

   - Remove the palmchip pata_bk3710 driver. A couple of other driver
     removal should come in through the arm tree pull request (from
     Arnd)

   - Remove unused variable and function in the sata_dwc_460ex driver
     and libata-sff code (Colin and Sergey)

   - Minor cleanup of the pata_ep93xx driver platform code (from
     Minghao)

   - Remove the unnecessary linux/msi.h include from the ahci driver
     (Thomas)

   - Changes to libata enum constants definitions to avoid warnings with
     gcc-13 (Arnd)"

* tag 'ata-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (24 commits)
  ata: ahci: fix enum constants for gcc-13
  ata: libata: fix commands incorrectly not getting retried during NCQ error
  ata: ahci: Remove linux/msi.h include
  ata: sata_dwc_460ex: Check !irq instead of irq == NO_IRQ
  ata: pata_ep93xx: use devm_platform_get_and_ioremap_resource()
  ata: libata-sff: kill unused ata_sff_busy_sleep()
  ata: sata_dwc_460ex: remove variable num_processed
  ata: remove palmchip pata_bk3710 driver
  ata: remove unused helper ata_id_flush_ext_enabled()
  ata: remove unused helper ata_id_flush_enabled()
  ata: remove unused helper ata_id_lba48_enabled()
  ata: libata-core: do not retry reading the log on timeout
  scsi: libsas: make use of ata_port_is_frozen() helper
  ata: make use of ata_port_is_frozen() helper
  ata: add ata_port_is_frozen() helper
  ata: pata_ftide010: Remove build dependency on OF
  ata: sata_gemini: Remove dependency on OF for compile tests
  ata: pata_mpc52xx: Replace NO_IRQ with 0
  ata: libahci: read correct status and error field for NCQ commands
  ata: libata: fetch sense data for ATA devices supporting sense reporting
  ...
2022-12-13 10:54:19 -08:00
Anders Roxell
d95d140e83 ata: libahci_platform: ahci_platform_find_clk: oops, NULL pointer
When booting a arm 32-bit kernel with config CONFIG_AHCI_DWC enabled on
a am57xx-evm board. This happens when the clock references are unnamed
in DT, the strcmp() produces a NULL pointer dereference, see the
following oops, NULL pointer dereference:

[    4.673950] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    4.682098] [00000000] *pgd=00000000
[    4.685699] Internal error: Oops: 5 [#1] SMP ARM
[    4.690338] Modules linked in:
[    4.693420] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc7 #1
[    4.699615] Hardware name: Generic DRA74X (Flattened Device Tree)
[    4.705749] PC is at strcmp+0x0/0x34
[    4.709350] LR is at ahci_platform_find_clk+0x3c/0x5c
[    4.714416] pc : [<c130c494>]    lr : [<c0c230e0>]    psr: 20000013
[    4.720703] sp : f000dda8  ip : 00000001  fp : c29b1840
[    4.725952] r10: 00000020  r9 : c1b23380  r8 : c1b23368
[    4.731201] r7 : c1ab4cc4  r6 : 00000001  r5 : c3c66040  r4 : 00000000
[    4.737762] r3 : 00000080  r2 : 00000080  r1 : c1ab4cc4  r0 : 00000000
[...]
[    4.998870]  strcmp from ahci_platform_find_clk+0x3c/0x5c
[    5.004302]  ahci_platform_find_clk from ahci_dwc_probe+0x1f0/0x54c
[    5.010589]  ahci_dwc_probe from platform_probe+0x64/0xc0
[    5.016021]  platform_probe from really_probe+0xe8/0x41c
[    5.021362]  really_probe from __driver_probe_device+0xa4/0x204
[    5.027313]  __driver_probe_device from driver_probe_device+0x38/0xc8
[    5.033782]  driver_probe_device from __driver_attach+0xb4/0x1ec
[    5.039825]  __driver_attach from bus_for_each_dev+0x78/0xb8
[    5.045532]  bus_for_each_dev from bus_add_driver+0x17c/0x220
[    5.051300]  bus_add_driver from driver_register+0x90/0x124
[    5.056915]  driver_register from do_one_initcall+0x48/0x1e8
[    5.062591]  do_one_initcall from kernel_init_freeable+0x1cc/0x234
[    5.068817]  kernel_init_freeable from kernel_init+0x20/0x13c
[    5.074584]  kernel_init from ret_from_fork+0x14/0x2c
[    5.079681] Exception stack(0xf000dfb0 to 0xf000dff8)
[    5.084747] dfa0:                                     00000000 00000000 00000000 00000000
[    5.092956] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.101165] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.107818] Code: e5e32001 e3520000 1afffffb e12fff1e (e4d03001)
[    5.114013] ---[ end trace 0000000000000000 ]---

Add an extra check in the if-statement if hpriv-clks[i].id.

Fixes: 6ce73f3a6f ("ata: libahci_platform: Add function returning a clock-handle by id")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-12-07 08:36:37 +09:00
Arnd Bergmann
f07788079f ata: ahci: fix enum constants for gcc-13
gcc-13 slightly changes the type of constant expressions that are defined
in an enum, which triggers a compile time sanity check in libata:

linux/drivers/ata/libahci.c: In function 'ahci_led_store':
linux/include/linux/compiler_types.h:357:45: error: call to '__compiletime_assert_302' declared with attribute error: BUILD_BUG_ON failed: sizeof(_s) > sizeof(long)
357 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)

The new behavior is that sizeof() returns the same value for the
constant as it does for the enum type, which is generally more sensible
and consistent.

The problem in libata is that it contains a single enum definition for
lots of unrelated constants, some of which are large positive (unsigned)
integers like 0xffffffff, while others like (1<<31) are interpreted as
negative integers, and this forces the enum type to become 64 bit wide
even though most constants would still fit into a signed 32-bit 'int'.

Fix this by changing the entire enum definition to use BIT(x) in place
of (1<<x), which results in all values being seen as 'unsigned' and
fitting into an unsigned 32-bit type.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107917
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107405
Reported-by: Luis Machado <luis.machado@arm.com>
Cc: linux-ide@vger.kernel.org
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: stable@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Luis Machado <luis.machado@arm.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-12-06 14:33:30 +09:00
Niklas Cassel
3d8a3ae3d9 ata: libata: fix commands incorrectly not getting retried during NCQ error
A NCQ error means that the device has aborted processing of all active
commands.
To get the single NCQ command that caused the NCQ error, host software has
to read the NCQ error log, which also takes the device out of error state.

When the device encounters a NCQ error, we receive an error interrupt from
the HBA, and call ata_do_link_abort() to mark all outstanding commands on
the link as ATA_QCFLAG_FAILED (which means that these commands are owned
by libata EH), and then call ata_qc_complete() on them.

ata_qc_complete() will call fill_result_tf() for all commands marked as
ATA_QCFLAG_FAILED.

The taskfile is simply the latest status/error as seen from the device's
perspective. The taskfile will have ATA_ERR set in the status field and
ATA_ABORTED set in the error field.

When we fill the current taskfile values for all outstanding commands,
that means that qc->result_tf will have ATA_ERR set for all commands
owned by libata EH.

When ata_eh_link_autopsy() later analyzes all commands owned by libata EH,
it will call ata_eh_analyze_tf(), which will check if qc->result_tf has
ATA_ERR set, if it does, it will set qc->err_mask (which marks the command
as an error).

When ata_eh_finish() later calls __ata_qc_complete() on all commands owned
by libata EH, it will call qc->complete_fn() (ata_scsi_qc_complete()),
ata_scsi_qc_complete() will call ata_gen_ata_sense() to generate sense
data if qc->err_mask is set.

This means that we will generate sense data for commands that should not
have any sense data set. Having sense data set for the non-failed commands
will cause SCSI to finish these commands instead of retrying them.

While this incorrect behavior has existed for a long time, this first
became a problem once we started reading the correct taskfile register in
commit 4ba09d2026 ("ata: libahci: read correct status and error field
for NCQ commands").

Before this commit, NCQ commands would read the taskfile values received
from the last non-NCQ command completion, which most likely did not have
ATA_ERR set, since the last non-NCQ command was most likely not an error.

Fix this by changing ata_eh_analyze_ncq_error() to mark all non-failed
commands as ATA_QCFLAG_RETRY, and change the loop in ata_eh_link_autopsy()
to skip commands marked as ATA_QCFLAG_RETRY.

While at it, make sure that we clear ATA_ERR and any error bits for all
commands except the actual command that caused the NCQ error, so that no
other libata code will be able to misinterpret these commands as errors.

Fixes: 4ba09d2026 ("ata: libahci: read correct status and error field for NCQ commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-19 09:41:52 +09:00
Thomas Gleixner
6c57e74e6e ata: ahci: Remove linux/msi.h include
Nothing in this file needs anything from linux/msi.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-14 08:24:27 +09:00
Christophe Leroy
01a965d750 ata: sata_dwc_460ex: Check !irq instead of irq == NO_IRQ
NO_IRQ is a relic from the old days. It is not used anymore in core
functions. By the way, function irq_of_parse_and_map() returns value 0
on error.

In some drivers, NO_IRQ is erroneously used to check the return of
irq_of_parse_and_map().

It is not a real bug today because the only architectures using the
drivers being fixed by this patch define NO_IRQ as 0, but there are
architectures which define NO_IRQ as -1. If one day those
architectures start using the non fixed drivers, there will be a
problem.

Long time ago Linus advocated for not using NO_IRQ, see
https://lkml.org/lkml/2005/11/21/221 . He re-iterated the same view
recently in https://lkml.org/lkml/2022/10/12/622

So test !irq instead of tesing irq == NO_IRQ.

And remove the fallback definition of NO_IRQ at the top of the file.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-12 11:00:04 +09:00
Minghao Chi
aebf1e26a8 ata: pata_ep93xx: use devm_platform_get_and_ioremap_resource()
Convert platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-12 10:58:56 +09:00
Niklas Cassel
e20e81a24a ata: libata-core: do not issue non-internal commands once EH is pending
While the ATA specification states that a device should return command
aborted for all commands queued after the device has entered error state,
since ATA only keeps the sense data for the latest command (in non-NCQ
case), we really don't want to send block layer commands to the device
after it has entered error state. (Only ATA EH commands should be sent,
to read the sense data etc.)

Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
(state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
-> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()

Before commit e494f6a728 ("[SCSI] improved eh timeout handler"),
when receiving a TFES error IRQ, the call chain looked like this:
ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)

Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
would be set.

However, after commit e494f6a728 ("[SCSI] improved eh timeout handler"),
scsi_times_out() will instead call scsi_abort_command() which will queue
delayed work, and the worker function scmd_eh_abort_handler() will call
scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).

So now, after the TFES error IRQ has been serviced, we need to wait for
the SCSI workqueue to run its work before SHOST_RECOVERY gets set.

It is worth noting that, even before commit e494f6a728 ("[SCSI] improved
eh timeout handler"), we could receive an error IRQ from the time when
scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
ata_scsi_queuecmd() is actually called.

In order to handle both the delayed setting of SHOST_RECOVERY and the
window where we can receive an error IRQ, add a check against
ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
inside ata_scsi_queuecmd() itself, while holding the ap->lock.
(Since the ap->lock is held while servicing IRQs.)

Fixes: e494f6a728 ("[SCSI] improved eh timeout handler")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-12 07:51:06 +09:00
Yang Yingliang
1ff3635130 ata: libata-transport: fix error handling in ata_tdev_add()
In ata_tdev_add(), the return value of transport_add_device() is
not checked. As a result, it causes null-ptr-deref while removing
the module, because transport_remove_device() is called to remove
the device that was not added.

Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
CPU: 13 PID: 13603 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #36
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : device_del+0x48/0x3a0
lr : device_del+0x44/0x3a0
Call trace:
 device_del+0x48/0x3a0
 attribute_container_class_device_del+0x28/0x40
 transport_remove_classdev+0x60/0x7c
 attribute_container_device_trigger+0x118/0x120
 transport_remove_device+0x20/0x30
 ata_tdev_delete+0x24/0x50 [libata]
 ata_tlink_delete+0x40/0xa0 [libata]
 ata_tport_delete+0x2c/0x60 [libata]
 ata_port_detach+0x148/0x1b0 [libata]
 ata_pci_remove_one+0x50/0x80 [libata]
 ahci_remove_one+0x4c/0x8c [ahci]

Fix this by checking and handling return value of transport_add_device()
in ata_tdev_add(). In the error path, device_del() is called to delete
the device which was added earlier in this function, and ata_tdev_free()
is called to free ata_dev.

Fixes: d9027470b8 ("[libata] Add ATA transport class")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-11 17:26:05 +09:00
Yang Yingliang
cf0816f632 ata: libata-transport: fix error handling in ata_tlink_add()
In ata_tlink_add(), the return value of transport_add_device() is
not checked. As a result, it causes null-ptr-deref while removing
the module, because transport_remove_device() is called to remove
the device that was not added.

Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
CPU: 33 PID: 13850 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #12
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : device_del+0x48/0x39c
lr : device_del+0x44/0x39c
Call trace:
 device_del+0x48/0x39c
 attribute_container_class_device_del+0x28/0x40
 transport_remove_classdev+0x60/0x7c
 attribute_container_device_trigger+0x118/0x120
 transport_remove_device+0x20/0x30
 ata_tlink_delete+0x88/0xb0 [libata]
 ata_tport_delete+0x2c/0x60 [libata]
 ata_port_detach+0x148/0x1b0 [libata]
 ata_pci_remove_one+0x50/0x80 [libata]
 ahci_remove_one+0x4c/0x8c [ahci]

Fix this by checking and handling return value of transport_add_device()
in ata_tlink_add().

Fixes: d9027470b8 ("[libata] Add ATA transport class")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-11 17:26:03 +09:00
Yang Yingliang
3613dbe390 ata: libata-transport: fix error handling in ata_tport_add()
In ata_tport_add(), the return value of transport_add_device() is
not checked. As a result, it causes null-ptr-deref while removing
the module, because transport_remove_device() is called to remove
the device that was not added.

Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
CPU: 12 PID: 13605 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #8
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : device_del+0x48/0x39c
lr : device_del+0x44/0x39c
Call trace:
 device_del+0x48/0x39c
 attribute_container_class_device_del+0x28/0x40
 transport_remove_classdev+0x60/0x7c
 attribute_container_device_trigger+0x118/0x120
 transport_remove_device+0x20/0x30
 ata_tport_delete+0x34/0x60 [libata]
 ata_port_detach+0x148/0x1b0 [libata]
 ata_pci_remove_one+0x50/0x80 [libata]
 ahci_remove_one+0x4c/0x8c [ahci]

Fix this by checking and handling return value of transport_add_device()
in ata_tport_add().

Fixes: d9027470b8 ("[libata] Add ATA transport class")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-11 17:26:02 +09:00
Yang Yingliang
8c76310740 ata: libata-transport: fix double ata_host_put() in ata_tport_add()
In the error path in ata_tport_add(), when calling put_device(),
ata_tport_release() is called, it will put the refcount of 'ap->host'.

And then ata_host_put() is called again, the refcount is decreased
to 0, ata_host_release() is called, all ports are freed and set to
null.

When unbinding the device after failure, ata_host_stop() is called
to release the resources, it leads a null-ptr-deref(), because all
the ports all freed and null.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
CPU: 7 PID: 18671 Comm: modprobe Kdump: loaded Tainted: G            E      6.1.0-rc3+ #8
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : ata_host_stop+0x3c/0x84 [libata]
lr : release_nodes+0x64/0xd0
Call trace:
 ata_host_stop+0x3c/0x84 [libata]
 release_nodes+0x64/0xd0
 devres_release_all+0xbc/0x1b0
 device_unbind_cleanup+0x20/0x70
 really_probe+0x158/0x320
 __driver_probe_device+0x84/0x120
 driver_probe_device+0x44/0x120
 __driver_attach+0xb4/0x220
 bus_for_each_dev+0x78/0xdc
 driver_attach+0x2c/0x40
 bus_add_driver+0x184/0x240
 driver_register+0x80/0x13c
 __pci_register_driver+0x4c/0x60
 ahci_pci_driver_init+0x30/0x1000 [ahci]

Fix this by removing redundant ata_host_put() in the error path.

Fixes: 2623c7a5f2 ("libata: add refcounting to ata_host")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-11 17:26:00 +09:00
Sergey Shtylyov
d5b560c014 ata: libata-sff: kill unused ata_sff_busy_sleep()
Nobody seems to call ata_sff_busy_sleep(), so we can get rid of it...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-11 17:20:26 +09:00
Shin'ichiro Kawasaki
ea045fd344 ata: libata-scsi: fix SYNCHRONIZE CACHE (16) command failure
SAT SCSI/ATA Translation specification requires SCSI SYNCHRONIZE CACHE
(10) and (16) commands both shall be translated to ATA flush command.
Also, ZBC Zoned Block Commands specification mandates SYNCHRONIZE CACHE
(16) command support. However, libata translates only SYNCHRONIZE CACHE
(10). This results in SYNCHRONIZE CACHE (16) command failures on SATA
drives and then libata translation does not conform to ZBC. To avoid the
failure, add support for SYNCHRONIZE CACHE (16).

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-08 15:08:25 +09:00
Yang Yingliang
015618c3ec ata: palmld: fix return value check in palmld_pata_probe()
If devm_platform_ioremap_resource() fails, it never return
NULL pointer, replace the check with IS_ERR().

Fixes: 57bf0f5a16 ("ARM: pxa: use pdev resource for palmld mmio")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-31 20:28:05 +09:00
Sergey Shtylyov
171a93182e ata: pata_legacy: fix pdc20230_set_piomode()
Clang gives a warning when compiling pata_legacy.c with 'make W=1' about
the 'rt' local variable in pdc20230_set_piomode() being set but unused.
Quite obviously, there is an outb() call missing to write back the updated
variable. Moreover, checking the docs by Petr Soucek revealed that bitwise
AND should have been done with a negated timing mask and the master/slave
timing masks were swapped while updating...

Fixes: 669a5db411 ("[libata] Add a bunch of PATA drivers.")
Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-31 20:27:27 +09:00
Colin Ian King
de58fd3d80 ata: sata_dwc_460ex: remove variable num_processed
Variable num_processed is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-27 09:57:55 +09:00
Arnd Bergmann
43c1061870 ata: remove palmchip pata_bk3710 driver
This device was used only on the davinci dm644x platform that
is now gone, and no references to the device remain in the
kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-21 08:04:39 +09:00
Niklas Cassel
5122e53ee7 ata: libata-core: do not retry reading the log on timeout
ata_read_log_page() first tries to read the log using READ LOG DMA EXT.
If that fails it will instead try to read the log using READ LOG EXT.

ata_exec_internal_sg() is synchronous, so it will wait for the command to
finish. If we actually got an error back from the device, it is correct
to retry. However, if the command timed out, ata_exec_internal_sg() will
freeze the port.

There is no point in retrying if the port is frozen, as
ata_exec_internal_sg() will return AC_ERR_SYSTEM on a frozen port,
without ever sending the command down to the drive.

Therefore, avoid retrying if the first command froze the port, as that
will result in a misleading AC_ERR_SYSTEM error print, instead of printing
the error that actually caused the port to be frozen (AC_ERR_TIMEOUT).

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-19 13:46:08 +09:00
Niklas Cassel
4cb7c6f1ef ata: make use of ata_port_is_frozen() helper
Clean up the code by making use of the newly introduced
ata_port_is_frozen() helper function.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-18 13:53:27 +09:00
Damien Le Moal
dc62c7e6ed ata: pata_ftide010: Remove build dependency on OF
The pata_ftide010 can be built without CONFIG_OF being enabled, as long
as the macro of_match_ptr() is not used when initializing the platform
driver .of_match_table field.

Remove the use of this macro and the build dependency on OF.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
2022-10-18 08:05:10 +09:00
Damien Le Moal
6c4c900b73 ata: sata_gemini: Remove dependency on OF for compile tests
If CONFIG_OF is disabled, then using the macro of_match_ptr() results
in the gemini_sata_of_match variable being unused, which generates a
compilation warning and a compilation error if CONFIG_WERROR is enabled.

Removing the use of this macro by directly assigning the
gemini_sata_of_match match table to the .of_match_table field in the
platform driver definition allows removing the dependency on OF for
compile tests, thus improving compile test coverage.

Fixes: f7220eac75 ("ata: Kconfig: fix sata gemini compile test condition")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2022-10-18 08:04:46 +09:00
Damien Le Moal
2ce3a0bf20 ata: ahci_qoriq: Fix compilation warning
When compiling with clang and W=1, the following warning is generated:

drivers/ata/ahci_qoriq.c:283:22: error: cast to smaller integer type
'enum ahci_qoriq_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
                qoriq_priv->type = (enum ahci_qoriq_type)of_id->data;
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2022-10-18 08:02:14 +09:00
Damien Le Moal
26d9f48d99 ata: ahci_imx: Fix compilation warning
When compiling with clang and W=1, the following warning is generated:

drivers/ata/ahci_imx.c:1070:18: error: cast to smaller integer type
'enum ahci_imx_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
        imxpriv->type = (enum ahci_imx_type)of_id->data;
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2022-10-18 08:02:14 +09:00
Damien Le Moal
e8fbdf1855 ata: ahci_xgene: Fix compilation warning
When compiling with clang and W=1, the following warning is generated:

drivers/ata/ahci_xgene.c:788:14: error: cast to smaller integer type
'enum xgene_ahci_version' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
       version = (enum xgene_ahci_version) of_devid->data;
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using a cast to unsigned long to match the "void *" type
size of of_devid->data.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2022-10-18 08:02:14 +09:00
Damien Le Moal
7d7b0c8512 ata: ahci_brcm: Fix compilation warning
When compiling with clang and W=1, the following warning is generated:

drivers/ata/ahci_brcm.c:451:18: error: cast to smaller integer type
'enum brcm_ahci_version' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
        priv->version = (enum brcm_ahci_version)of_id->data;
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
2022-10-18 08:01:57 +09:00
Damien Le Moal
0ffac4727e ata: sata_rcar: Fix compilation warning
When compiling with clang and W=1, the following warning is generated:

drivers/ata/sata_rcar.c:878:15: error: cast to smaller integer type
'enum sata_rcar_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
        priv->type = (enum sata_rcar_type)of_device_get_match_data(dev);
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by using a cast to unsigned long to match the "void *" type
size returned by of_device_get_match_data().

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
2022-10-18 08:01:45 +09:00
Damien Le Moal
17cc1ee6e8 ata: ahci_st: Fix compilation warning
If CONFIG_OF is disabled and the ahci_st driver is builtin (or
CONFIG_MODULES is disabled), then using the macro of_match_ptr()
results in the st_ahci_match variable being unused, which generates a
compilation warning and a compilation error if CONFIG_WERROR is enabled.

Fix this by directly assigning st_ahci_match to .of_match_table in the
st_ahci_driver platform driver definition.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2022-10-17 22:01:57 +09:00
Kai-Heng Feng
1e41e693f4 ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS
UBSAN complains about array-index-out-of-bounds:
[ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41
[ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]'
[ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu
[ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010
[ 1.980718] kernel: Call Trace:
[ 1.980721] kernel: <TASK>
[ 1.980723] kernel: show_stack+0x52/0x58
[ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f
[ 1.980734] kernel: dump_stack+0x10/0x12
[ 1.980736] kernel: ubsan_epilogue+0x9/0x45
[ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci]
[ 1.980748] kernel: ata_qc_issue+0x135/0x240
[ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580
[ 1.980754] kernel: ? vprintk_default+0x1d/0x20
[ 1.980759] kernel: ata_exec_internal+0x67/0xa0
[ 1.980762] kernel: sata_pmp_read+0x8d/0xc0
[ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90
[ 1.980768] kernel: sata_pmp_attach+0x8b/0x310
[ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0
[ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30
[ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci]
[ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci]
[ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci]
[ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0
[ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560
[ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40
[ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci]
[ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600
[ 1.980810] kernel: ata_scsi_error+0x9c/0xd0
[ 1.980813] kernel: scsi_error_handler+0xa1/0x180
[ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0
[ 1.980820] kernel: kthread+0x12a/0x150
[ 1.980823] kernel: ? set_kthread_struct+0x50/0x50
[ 1.980826] kernel: ret_from_fork+0x22/0x30
[ 1.980831] kernel: </TASK>

This happens because sata_pmp_init_links() initialize link->pmp up to
SATA_PMP_MAX_PORTS while em_priv is declared as 8 elements array.

I can't find the maximum Enclosure Management ports specified in AHCI
spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier
Information" can utilize 4 bits, which implies it can support up to 16
ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the
issue.

BugLink: https://bugs.launchpad.net/bugs/1970074
Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 12:02:59 +09:00
Alexander Stein
979556f152 ata: ahci-imx: Fix MODULE_ALIAS
'ahci:' is an invalid prefix, preventing the module from autoloading.
Fix this by using the 'platform:' prefix and DRV_NAME.

Fixes: 9e54eae23b ("ahci_imx: add ahci sata support on imx platforms")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:58:27 +09:00
Christophe Leroy
1dea5edc90 ata: pata_mpc52xx: Replace NO_IRQ with 0
NO_IRQ is used to check the return of irq_of_parse_and_map().

On some architecture NO_IRQ is 0, on other architectures it is -1.

irq_of_parse_and_map() returns 0 on error, independent of NO_IRQ.

So use 0 instead of using NO_IRQ.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:38:15 +09:00
Niklas Cassel
4ba09d2026 ata: libahci: read correct status and error field for NCQ commands
Currently, for PIO commands, ahci_qc_fill_rtf() reads the status and
error fields from the PIO Setup FIS area of the Received FIS Structure.

For any non-PIO command, ahci_qc_fill_rtf() currently reads the status
and error fields from the D2H Register FIS area of the Received FIS
Structure. This is simply not correct.

According to the SATA 3.5a specification:
11.10 DMA DATA-IN command protocol and 11.11 DMA DATA-OUT command
protocol: READ DMA and WRITE DMA (non-NCQ commands) will end with
the Send_status state, which transmits a Register D2H FIS.

Likewise, in:
11.15 FPDMA QUEUED command protocol: READ FPDMA QUEUED and WRITE FPDMA
QUEUED (NCQ commands) will end with the SendStatus state, which
transmits a Set Device Bits FIS.

So, for NCQ commands, there is never a D2H Register FIS sent.
Reading the status and error fields from the D2H Register FIS area
for a NCQ command, will result in us returning the status and error
values for the last non-NCQ command, which is incorrect.

Update ahci_qc_fill_rtf() to read the status and error fields from
the correct area in the Received FIS Structure for NCQ commands.

Once reason why this has not been detected before, could be because,
in case of an NCQ error, ata_eh_analyze_ncq_error() will overwrite
the (incorrect) status and error values set by ahci_qc_fill_rtf().

However, even successful NCQ commands can have bits set in the status
field (e.g. the sense data available bit), so it is mandatory to read
the status from the correct area also for NCQ commands.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:31:52 +09:00
Niklas Cassel
013115d90e ata: libata: fetch sense data for ATA devices supporting sense reporting
Currently, the sense data reporting feature set is enabled for all ATA
devices which supports the feature set (ata_id_has_sense_reporting()),
see ata_dev_config_sense_reporting().

However, even if sense data reporting is enabled, and the device
indicates that sense data is available, the sense data is only fetched
for ATA ZAC devices. For regular ATA devices, the available sense data
is never fetched, it is simply ignored. Instead, libata will use the
ERROR + STATUS fields and map them to a very generic and reduced set
of sense data, see ata_gen_ata_sense() and ata_to_sense_error().

When sense data reporting was first implemented, regular ATA devices
did fetch the sense data from the device. However, this was restricted
to only ATA ZAC devices in commit ca156e006a ("libata: don't request
sense data on !ZAC ATA devices").

With recent changes related to sense data and NCQ autosense, we want
to, once again, fetch the sense data for all ATA devices supporting
sense reporting.
ata_gen_ata_sense() should only be used for devices that don't support
the sense data reporting feature set.
hopefully the features will be more robust this time around.

It is not just ZAC, many new ATA features, e.g. Command Duration
Limits, relies on working NCQ autosense and sense data. Therefore,
it is not really an option to avoid fetching the sense data forever.

If we encounter a device that is misbehaving because the sense data is
actually fetched, then that device should be quirked such that it
never enables the sense data reporting feature set in the first place,
since such a device is obviously not compliant with the specification.

The order in which we will try to add sense data to a scsi_cmnd:
1) NCQ autosense (if supported) - ata_eh_analyze_ncq_error()
2) REQUEST SENSE DATA EXT (if supported) - ata_eh_request_sense()
3) error + status field translation - ata_gen_ata_sense(), called
   by ata_scsi_qc_complete() if neither 1) or 2) is supported.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:31:52 +09:00
Niklas Cassel
4b89ad8e5e ata: libata: only set sense valid flag if sense data is valid
While this shouldn't be needed if all devices that claim that they
support NCQ autosense (ata_id_has_ncq_autosense()) and/or the sense
data reporting feature (ata_id_has_sense_reporting()), actually
supported those features.

However, there might be some old ATA devices that either have these
bits set, even when they don't support those features, or they simply
return malformed data when using those features.

These devices should be quirked, but in order to try to minimize the
impact for the users of these such devices, it was suggested by Damien
Le Moal that it might be a good idea to sanity check the sense data
received from the device. If the sense data looks bogus, then the
sense data is never added to the scsi_cmnd command.

Introduce a new function, ata_scsi_sense_is_valid(), and use it in all
places where sense data is received from the device.

Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:31:52 +09:00
Niklas Cassel
461ec04067 ata: libata: clarify when ata_eh_request_sense() will be called
ata_eh_request_sense() returns early when flag ATA_QCFLAG_SENSE_VALID is
set. However, since the call to ata_eh_request_sense() is guarded by a
ATA_SENSE bit conditional, the logical conclusion for the reader is that
all checks are performed at the call site.

Highlight the fact that the sense data will not be fetched if flag
ATA_QCFLAG_SENSE_VALID is already set by adding an additional check to
the existing guarding conditional. No functional change.

Additionally, add a comment explaining that ata_eh_analyze_tf() will
only fetch the sense data if:
-It was a non-NCQ command that failed, or
-It was a NCQ command that failed, but the sense data was not included
 in the NCQ command error log (i.e. NCQ autosense is not supported).

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:31:52 +09:00
Niklas Cassel
7390896b34 ata: libata: fix NCQ autosense logic
Currently, the logic if we should call ata_scsi_set_sense()
(and set flag ATA_QCFLAG_SENSE_VALID to indicate that we have
successfully added sense data to the struct ata_queued_cmd)
looks like this:

if (dev->class == ATA_DEV_ZAC &&
    ((qc->result_tf.status & ATA_SENSE) || qc->result_tf.auxiliary))

The problem with this is that a drive can support the NCQ command
error log without supporting NCQ autosense.

On such a drive, if the failing command has sense data, the status
field in the NCQ command error log will have the ATA_SENSE bit set.

It is just that this sense data is not included in the NCQ command
error log when NCQ autosense is not supported. Instead the sense
data has to be fetched using the REQUEST SENSE DATA EXT command.

Therefore, we should only add the sense data if the drive supports
NCQ autosense AND the ATA_SENSE bit is set in the status field.

Fix this, and at the same time, remove the duplicated ATA_DEV_ZAC
check. The struct ata_taskfile supplied to ata_eh_read_log_10h()
is memset:ed before calling the function, so simply checking if
qc->result_tf.auxiliary is set is sufficient to tell us that the
log actually contained sense data.

Fixes: d238ffd59d ("libata: do not attempt to retrieve sense code twice")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-10-17 11:31:52 +09:00
Linus Torvalds
4078aa6850 ata changes for 6.1-rc1
* Print the timeout value for internal command failures due to a
   timeout (from Tomas).
 
 * Improve parameter names in ata_dev_set_feature() to clarify this
   function use (from Niklas).
 
 * Improve the ahci driver low power mode setting initialization to allow
   more flexibility for the user (from Rafael).
 
 * Several patches to remove redundant variables in libata-core,
   libata-eh and the pata_macio driver and to fix typos in comments (from
   Jinpeng, Shaomin, Ye).
 
 * Some code simplifications and macro renaming (for clarity) in various
   functions of libata-core (from me).
 
 * Add a missing check for a potential failure of sata_scr_read() in
   sata_print_link_status() (from Li).
 
 * Cleanup of libata Kconfig PATA_PLATFORM and PATA_OF_PLATFORM options
   (from Lukas).
 
 * Cleanups of ata dt-bindings and improvements of libahci_platform, ahci
   and libahci code (from Serge)
 
 * New driver for Synopsys AHCI SATA controllers, based of the generic
   ahci code (from Serge). One compilation warning fix is added for this
   driver (from me).
 
 * Several fixes to macros used to discover a drive capabilities to be
   consistent with the ACS specifications (from Niklas).
 
 * A couple of simplifcations to some libata functions, removing
   unnecessary arguments (from Niklas).
 
 * An improvements to libata-eh code to avoid unnecessary link reset when
   revalidating a drive after a failed command. In practice, this extra,
   unneeded reset, reset does not cause any arm beyond slightly slowing
   down error recovery (from Niklas).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYz0asgAKCRDdoc3SxdoY
 drHoAQCJhb6MuQHzbN/wR5cTGAfWXQJWBJx2mJr7oKJCrB34PwD/RzphcsuaXDta
 kwbTGlpitegByZTDKt9eMRLWmKgyngw=
 =CnJj
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata updates from Damien Le Moal:

 - Print the timeout value for internal command failures due to a
   timeout (from Tomas)

 - Improve parameter names in ata_dev_set_feature() to clarify this
   function use (from Niklas)

 - Improve the ahci driver low power mode setting initialization to
   allow more flexibility for the user (from Rafael)

 - Several patches to remove redundant variables in libata-core,
   libata-eh and the pata_macio driver and to fix typos in comments
   (from Jinpeng, Shaomin, Ye)

 - Some code simplifications and macro renaming (for clarity) in various
   functions of libata-core (from me)

 - Add a missing check for a potential failure of sata_scr_read() in
   sata_print_link_status() (from Li)

 - Cleanup of libata Kconfig PATA_PLATFORM and PATA_OF_PLATFORM options
   (from Lukas)

 - Cleanups of ata dt-bindings and improvements of libahci_platform,
   ahci and libahci code (from Serge)

 - New driver for Synopsys AHCI SATA controllers, based of the generic
   ahci code (from Serge). One compilation warning fix is added for this
   driver (from me)

 - Several fixes to macros used to discover a drive capabilities to be
   consistent with the ACS specifications (from Niklas)

 - A couple of simplifcations to some libata functions, removing
   unnecessary arguments (from Niklas)

 - An improvements to libata-eh code to avoid unnecessary link reset
   when revalidating a drive after a failed command. In practice, this
   extra, unneeded reset, reset does not cause any arm beyond slightly
   slowing down error recovery (from Niklas)

* tag 'ata-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (45 commits)
  ata: libata-eh: avoid needless hard reset when revalidating link
  ata: libata: drop superfluous ata_eh_analyze_tf() parameter
  ata: libata: drop superfluous ata_eh_request_sense() parameter
  ata: fix ata_id_has_dipm()
  ata: fix ata_id_has_ncq_autosense()
  ata: fix ata_id_has_devslp()
  ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
  ata: libata-eh: Remove the unneeded result variable
  ata: ahci_st: Enable compile test
  ata: ahci_st: Fix compilation warning
  MAINTAINERS: Add maintainers for DWC AHCI SATA driver
  ata: ahci-dwc: Add Baikal-T1 AHCI SATA interface support
  ata: ahci-dwc: Add platform-specific quirks support
  dt-bindings: ata: ahci: Add Baikal-T1 AHCI SATA controller DT schema
  ata: ahci: Add DWC AHCI SATA controller support
  ata: libahci_platform: Add function returning a clock-handle by id
  dt-bindings: ata: ahci: Add DWC AHCI SATA controller DT schema
  ata: ahci: Introduce firmware-specific caps initialization
  ata: ahci: Convert __ahci_port_base to accepting hpriv as arguments
  ata: libahci: Don't read AHCI version twice in the save-config method
  ...
2022-10-07 10:48:49 -07:00
Linus Torvalds
a5088ee725 Thermal control updates for 6.1-rc1
- Rework the device tree initialization, convert the drivers to the
    new API and remove the old OF code (Daniel Lezcano).
 
  - Fix return value to -ENODEV when searching for a specific thermal
    zone which does not exist (Daniel Lezcano).
 
  - Fix the return value inspection in of_thermal_zone_find() (Dan
    Carpenter).
 
  - Fix kernel panic when KASAN is enabled as it detects use after
    free when unregistering a thermal zone (Daniel Lezcano).
 
  - Move the set_trip ops inside the therma sysfs code (Daniel Lezcano).
 
  - Remove unnecessary error message as it is already shown in the
    underlying function (Jiapeng Chong).
 
  - Rework the monitoring path and move the locks upper in the call
    stack to fix some potentials race windows (Daniel Lezcano).
 
  - Fix lockdep_assert() warning introduced by the lock rework (Daniel
    Lezcano).
 
  - Do not lock thermal zone mutex in the user space governor (Rafael
    Wysocki).
 
  - Revert the Mellanox 'hotter thermal zone' feature because it is
    already handled in the thermal framework core code (Daniel Lezcano).
 
  - Increase maximum number of trip points in the thermal core (Sumeet
    Pawnikar).
 
  - Replace strlcpy() with unused retval with strscpy() in the core
    thermal control code (Wolfram Sang).
 
  - Use module_pci_driver() macro in the int340x processor_thermal
    driver (Shang XiaoJing).
 
  - Use get_cpu() instead of smp_processor_id() in the intel_powerclamp
    thermal driver to prevent it from crashing and remove unused
    accounting for IRQ wakes from it (Srinivas Pandruvada).
 
  - Consolidate priv->data_vault checks in int340x_thermal (Rafael
    Wysocki).
 
  - Check the policy first in cpufreq_cooling_register() (Xuewen Yan).
 
  - Drop redundant error message from da9062-thermal (zhaoxiao).
 
  - Drop of_match_ptr() from thermal_mmio (Jean Delvare).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmM7OzUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx8CMP/1kNnDUjtCBIvl/OJEM0yVlEGJNppNu9
 dCNQwuftAJRt7dBB292clKW15ef1fFAHrW/eOy+z62k6k/tfqnMLEK38hpS4pIaL
 PZRjRfpp2CDmBBVTJ0I2nNC9h0zZY4tQK+JCII1M2UmgtLaGbp3NuIu2Ga4i5bNR
 IYE3QvVz/g2/pRNa/BTsJySje/q7wv231Vd5jg4czg58EyntBGO4gmWNuXyZ6OkF
 ijzcu7K5Tkll6DT+Paw8bGcPCyjBtoBhv2A6HtsC3cYfc2tY9BVf3DG2HlG0qTAU
 pIZsLOlUozzJScVbu8ScKj1hlP/iCKcPgVjP4l+U57EtcY/ULBqrQtkDVtGedNOZ
 wa5a1VUsZDrigWRpj1u5SsDWmbAod5uK5X/3naUfH7XtomhqikfaZK7Euek6kfDN
 SEhZaBycAFWlI/L5cG2sd6BBcmWlBqkaHiQ0zEv2YlALY5SkNOUQrrq2A/1LP1Ae
 67xbzbWFXyR8DAq3YyfnLpqgcJcSiyGxmdKZ1u2XHudXeJHKdAB7xlcJz+hfWpYU
 Rd1ZTB3ATA/IMG1rLO2dKgaMmyQCWPG6oXtJjTH0g3sXm0U6K14dwEU1ey9u/Li6
 DmI4cWaXf8WoRvb3rkEcJliayUed4U/ohU+z1bInSE8EuCBuyMLRhoJdi3ZhunOC
 PAyKcHg8fLy9
 =Ufw7
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "The most significant part of this update is the thermal control DT
  initialization rework from Daniel Lezcano and the following conversion
  of drivers to use the new API introduced by it

  Apart from that, the maximum number of trip points in a thermal zone
  is increased and there are some fixes and code cleanups

  Specifics:

   - Rework the device tree initialization, convert the drivers to the
     new API and remove the old OF code (Daniel Lezcano)

   - Fix return value to -ENODEV when searching for a specific thermal
     zone which does not exist (Daniel Lezcano)

   - Fix the return value inspection in of_thermal_zone_find() (Dan
     Carpenter)

   - Fix kernel panic when KASAN is enabled as it detects use after free
     when unregistering a thermal zone (Daniel Lezcano)

   - Move the set_trip ops inside the therma sysfs code (Daniel Lezcano)

   - Remove unnecessary error message as it is already shown in the
     underlying function (Jiapeng Chong)

   - Rework the monitoring path and move the locks upper in the call
     stack to fix some potentials race windows (Daniel Lezcano)

   - Fix lockdep_assert() warning introduced by the lock rework (Daniel
     Lezcano)

   - Do not lock thermal zone mutex in the user space governor (Rafael
     Wysocki)

   - Revert the Mellanox 'hotter thermal zone' feature because it is
     already handled in the thermal framework core code (Daniel Lezcano)

   - Increase maximum number of trip points in the thermal core (Sumeet
     Pawnikar)

   - Replace strlcpy() with unused retval with strscpy() in the core
     thermal control code (Wolfram Sang)

   - Use module_pci_driver() macro in the int340x processor_thermal
     driver (Shang XiaoJing)

   - Use get_cpu() instead of smp_processor_id() in the intel_powerclamp
     thermal driver to prevent it from crashing and remove unused
     accounting for IRQ wakes from it (Srinivas Pandruvada)

   - Consolidate priv->data_vault checks in int340x_thermal (Rafael
     Wysocki)

   - Check the policy first in cpufreq_cooling_register() (Xuewen Yan)

   - Drop redundant error message from da9062-thermal (zhaoxiao)

   - Drop of_match_ptr() from thermal_mmio (Jean Delvare)"

* tag 'thermal-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (55 commits)
  thermal: core: Increase maximum number of trip points
  thermal: int340x: processor_thermal: Use module_pci_driver() macro
  thermal: intel_powerclamp: Remove accounting for IRQ wakes
  thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
  thermal: int340x_thermal: Consolidate priv->data_vault checks
  thermal: cpufreq_cooling: Check the policy first in cpufreq_cooling_register()
  thermal: Drop duplicate words from comments
  thermal: move from strlcpy() with unused retval to strscpy()
  thermal: da9062-thermal: Drop redundant error message
  thermal/drivers/thermal_mmio: Drop of_match_ptr()
  thermal: gov_user_space: Do not lock thermal zone mutex
  Revert "mlxsw: core: Add the hottest thermal zone detection"
  thermal/core: Fix lockdep_assert() warning
  thermal/core: Move the mutex inside the thermal_zone_device_update() function
  thermal/core: Move the thermal zone lock out of the governors
  thermal/governors: Group the thermal zone lock inside the throttle function
  thermal/core: Rework the monitoring a bit
  thermal/core: Rearm the monitoring only one time
  thermal/drivers/qcom/spmi-adc-tm5: Remove unnecessary print function dev_err()
  thermal/of: Remove old OF code
  ...
2022-10-03 15:33:38 -07:00
Damien Le Moal
141f3d6256 ata: libata-sata: Fix device queue depth control
The function __ata_change_queue_depth() uses the helper
ata_scsi_find_dev() to get the ata_device structure of a scsi device and
set that device maximum queue depth. However, when the ata device is
managed by libsas, ata_scsi_find_dev() returns NULL, turning
__ata_change_queue_depth() into a nop, which prevents the user from
setting the maximum queue depth of ATA devices used with libsas based
HBAs.

Fix this by renaming __ata_change_queue_depth() to
ata_change_queue_depth() and adding a pointer to the ata_device
structure of the target device as argument. This pointer is provided by
ata_scsi_change_queue_depth() using ata_scsi_find_dev() in the case of
a libata managed device and by sas_change_queue_depth() using
sas_to_ata_dev() in the case of a libsas managed ata device.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: John Garry <john.garry@huawei.com>
2022-09-28 20:47:31 +09:00
Damien Le Moal
6a8438de52 ata: libata-scsi: Fix initialization of device queue depth
For SATA devices supporting NCQ, drivers using libsas first initialize a
scsi device queue depth based on the controller and device capabilities,
leading to the scsi device queue_depth field being 32 (ATA maximum queue
depth) for most setup. However, if libata was loaded using the
force=[ID]]noncq argument, the default queue depth should be set to 1 to
reflect the fact that queuable commands will never be used. This is
consistent with manually setting a device queue depth to 1 through sysfs
as that disables NCQ use for the device.

Fix ata_scsi_dev_config() to honor the noncq parameter by sertting the
device queue depth to 1 for devices that do not have the ATA_DFLAG_NCQ
flag set.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: John Garry <john.garry@huawei.com>
2022-09-28 20:47:26 +09:00
Niklas Cassel
ea08aec7e7 libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Commit 1527f69204 ("ata: ahci: Add Green Sardine vendor ID as
board_ahci_mobile") added an explicit entry for AMD Green Sardine
AHCI controller using the board_ahci_mobile configuration (this
configuration has later been renamed to board_ahci_low_power).

The board_ahci_low_power configuration enables support for low power
modes.

This explicit entry takes precedence over the generic AHCI controller
entry, which does not enable support for low power modes.

Therefore, when commit 1527f69204 ("ata: ahci: Add Green Sardine
vendor ID as board_ahci_mobile") was backported to stable kernels,
it make some Pioneer optical drives, which was working perfectly fine
before the commit was backported, stop working.

The real problem is that the Pioneer optical drives do not handle low
power modes correctly. If these optical drives would have been tested
on another AHCI controller using the board_ahci_low_power configuration,
this issue would have been detected earlier.

Unfortunately, the board_ahci_low_power configuration is only used in
less than 15% of the total AHCI controller entries, so many devices
have never been tested with an AHCI controller with low power modes.

Fixes: 1527f69204 ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile")
Cc: stable@vger.kernel.org
Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-27 08:20:37 +09:00
Niklas Cassel
71d7b6e51a ata: libata-eh: avoid needless hard reset when revalidating link
Performing a revalidation on a AHCI controller supporting LPM,
while using a lpm mode of e.g. med_power_with_dip (hipm + dipm) or
medium_power (hipm), will currently always lead to a hard reset.

The expected behavior is that a hard reset is only performed when
revalidate fails, because the properties of the drive has changed.

A revalidate performed after e.g. a NCQ error, or such a simple thing
as disabling write-caching (hdparm -W 0 /dev/sda), should succeed on
the first try (and should therefore not cause the link to be reset).

This unwarranted hard reset happens because ata_phys_link_offline()
returns true for a link that is in deep sleep. Thus the call to
ata_phys_link_offline() in ata_eh_revalidate_and_attach() will cause
the revalidation to fail, which causes ata_eh_handle_dev_fail() to be
called, which will set ehc->i.action |= ATA_EH_RESET, such that the
link is reset before retrying revalidation.

When the link is reset, the link is reestablished, so when
ata_eh_revalidate_and_attach() is called the second time, directly
after the link has been reset, ata_phys_link_offline() will return
false, and the revalidation will succeed.

Looking at "8.3.1.3 HBA Initiated" in the AHCI 1.3.1 specification,
it is clear the when host software writes a new command to memory,
by setting a bit in the PxCI/PxSACT HBA port registers, the HBA will
automatically bring back the link before sending out the Command FIS.

However, simply reading a SCR (like ata_phys_link_offline() does),
will not cause the HBA to automatically bring back the link.

As long as hipm is enabled, the HBA will put an idle link into deep
sleep. Avoid this needless hard reset on revalidation by temporarily
disabling hipm, by setting the LPM mode to ATA_LPM_MAX_POWER.

After revalidation is complete, ata_eh_recover() will restore the link
policy by setting the LPM mode to ap->target_lpm_policy.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-24 15:31:00 +09:00
Niklas Cassel
e3b1fff6c0 ata: libata: drop superfluous ata_eh_analyze_tf() parameter
The parameter can easily be derived from struct ata_queued_cmd.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-21 11:18:36 +09:00
Niklas Cassel
b46c760e11 ata: libata: drop superfluous ata_eh_request_sense() parameter
The parameter can easily be derived from struct ata_queued_cmd.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-21 11:18:33 +09:00
ye xingchen
cb6e73aaad ata: libata-eh: Remove the unneeded result variable
Return the value ata_port_abort() directly instead of storing it in
another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-21 11:17:05 +09:00
Damien Le Moal
ecf8322f46 ata: ahci_st: Enable compile test
Enable compiling the ahci_st driver when COMPILE_TEST is enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-20 08:52:02 +09:00
Damien Le Moal
2d29dd108c ata: ahci_st: Fix compilation warning
Remove the unused variable dev in st_ahci_probe() to avoid compilation
warning and build failures where CONFIG_WERROR is enabled.

Fixes: 3f74cd046f ("ata: libahci_platform: Parse ports-implemented property in resources getter")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-20 08:43:07 +09:00
Serge Semin
9628711aa6 ata: ahci-dwc: Add Baikal-T1 AHCI SATA interface support
It's almost fully compatible DWC AHCI SATA IP-core derivative except the
reference clocks source, which need to be very carefully selected. In
particular the DWC AHCI SATA PHY can be clocked either from the pads
ref_pad_clk_{m,p} or from the internal wires ref_alt_clk_{m,n}. In the
later case the clock signal is generated from the Baikal-T1 CCU SATA PLL.
The clocks source is selected by means of the ref_use_pad wire connected
to the CCU SATA reference clock CSR.

In normal situation it would be much more handy to use the internal
reference clock source, but alas we haven't managed to make the AHCI
controller working well with it so far. So it's preferable to have the
controller clocked from the external clock generator and fallback to the
internal clock source only as a last resort. Other than that the
controller is full compatible with the DWC AHCI SATA IP-core.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-17 01:40:31 +09:00
Serge Semin
bc7af9100f ata: ahci-dwc: Add platform-specific quirks support
Some DWC AHCI SATA IP-core derivatives require to perform small platform
or IP-core specific setups. They are too small to be placed in a dedicated
driver. It's just much easier to have a set of quirks for them right in
the DWC AHCI driver code. Since we are about to add such platform support,
as a pre-requisite we introduce a platform-data based DWC AHCI quirks API.
The platform data can be used to define the flags passed to the
ahci_platform_get_resources() method, additional AHCI host-flags and a set
of callbacks to initialize, re-initialize and clear the platform settings.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-17 01:40:27 +09:00