IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
There is a chance of the driver to be stuck in kdump if drives start
acting up in kdump discovery process and the kernel decides to send eh
resets, which would prompt rescan to be scheduled.
Do not perform a rescan in kdump context, since we do not expect a hotplug
event during kdump and all the devices are going to go away anyway.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add back the ability to scan for hotplug changes while eh was in progress.
Schedule a rescan for a later time in the eh recovery code and wait for
eh to complete in the rescan worker.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver fails to retrieve information from the fw (could happen when
the fw is not fully in its senses), the driver does nothing and change is
not processed correctly by the driver
Schedule host rescan in case of failure. This is only for SAFW, since
the information retrieval failure will happen on SAFW devices.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the fw exposed devices. The drivers resorts to queue
command checks to block out commands to _hidden_ devices.
Use the hotplug handler code to add new devices during driver init and
other areas, this is only for safw. For ARC scsi_scan_host will still
apply.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently driver will attempt to process hotplug events concurrently based
on the FW interrupt.
Protect safw update function with a scan mutex.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The device hotplug events are processed only after retrieving the updated
lun information from the fw. Does not make sense to keep them separate.
Merge both the hotplug handling and safw adapter setup code into single
function.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Resolve luns checks the if a sdev is already present in the os to figure
out if it needs to be removed. Internally the driver exposes HBA on bus
2 even though its bus 1 in the fw. Its mildly confusing.
Refactor out the sdev lookup into its function to check if sdev has been
added to the kernel or not. Add helper functions to add, remove and put
devices based on their fw bus and target number.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added macros to loop through the MAX SUPPORTED Buses and Targets. This
will make the code a bit easier to read.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hotplug handler code is duplicated for hba handling and container
handling.
Merged function to handle hba and container hot plug events into the
resolve luns functions. Added a bunch of helper functions to check the
validity of a given target and to check if bus, target is container
device.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge aac_get_containers to setup target function, so that information
about all the present devices can be retrieved in one shot.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add helper function to setup targets devices and create the base for the
upcoming patches
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename variables and functions to make bmic identify, report phy luns
to make them consistent across code internal existing code bases
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Edit function that retrieves phy lun information to use common
bmic function
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ideally driver needs to wait for IO to be submitted or responded to before
shutdown.
Move code to wait for IO completion into shutdown path
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As part of the recovery process, the drivers removes offline devices (
done by the kernel) and then tries to add them back in the rescan code.
Removing the device is like taking a sledgehammer to a nail.
Set the device as running if it is marked offline.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver attempts to perform a device scan and device add after coming out
of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in normal eh recovery path
(Should have been obvious)
Remove the code that performs scanning.I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aacraid passes the current time to the firmware in one of two ways,
either as year/month/day/... or as 32-bit unsigned seconds.
The first one is broken on 32-bit architectures as it cannot go past
year 2038. Using timespec64 here makes it behave properly on both 32-bit
and 64-bit architectures, and avoids relying on signed integer overflow
to pass times into the second interface.
The interface used in aac_send_hosttime() however is still problematic
in year 2106 when 32-bit seconds overflow. Hopefully we don't have to
worry about aacraid by that time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To correctly identify which fib has a scsi command callback this
patch implements a flag FIB_CONTEXT_FLAG_SCSI_CMD.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_hba_send() will return FAILED for any non-SCSI command requests,
failing any TMFs. This patch updates the check to allow TMFs.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove reference to Series-9 HBA and created arc ctrl check function.
Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Made sure that ioctl commands return in case of a controller reset.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The command thread checks the ctrl health periodically before sending
updates to the controller. The function that it uses is aac_check_health
which does more than get the health status.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Check health does not need to reset the ctrl but just return the
controller health status.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver changed the DMA consistent map after consistent memory was
allocated, this invalidated the IOMMU identity mapping. The fix was to
make sure that we set the DMA consistent mask setting once depending on
the controller card.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There were pci_alloc_consistent() failures on ARM64 platform. Use
dma_alloc_coherent() with GFP_KERNEL flag DMA memory allocations.
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
[hch: tweaked indentation, removed memsets]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During a PCI error recovery, if aac_check_health() is not aware that a
PCI error happened and we have an offline PCI channel, it might trigger
some errors (like NULL pointer dereference) and inhibit the error
recovery process to complete.
This patch makes the health check procedure aware of PCI channel issues,
and in case of error recovery process, the function
aac_adapter_check_health() returns -1 and let the recovery process to
complete successfully. This patch was tested on upstream kernel
v4.11-rc5 in PowerPC ppc64le architecture with adapter 9005:028d
(VID:DID) - the error recovery procedure was able to recover fine.
Fixes: 5c63f7f710 ("aacraid: Added EEH support")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, command threads fails to return ioctls commands for older
controller versions, since it returns when all the fibs have been
allocated. Another issue is even all the fibs have not been allocated,
the correct allocated fibs is not updated nor freed.
Fixes: 113156bcea (scsi: aacraid: Reworked aac_command_thread)
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The check for ret being zero is redundant as a few statements earlier we
break out of the while loop if ret is non-zero. Thus we can remove the
zero check and also the dead-code non-zero case too.
Detected by CoverityScan, CID#1411632 ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current driver Hotplug processing code skips over Enclosure channel,
therefore any addition/removal of expander enclosure is not processed.
Additionally device addition code relies on older device type, which
prevents the hotplug of adapter expanders.
Fixed by removing code that skips over Enclosure channels and using the
latest device type for addition or removal or enclosure expanders.
Fixes: 6223a39fe6 (scsi: aacraid: Added support for hotplug)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver does not unlock the reply queue spin lock after handling SMART
adapter events. Instead it might attempt to unlock an already unlocked
spin lock.
Fixed by making sure the driver locks the spin lock before freeing it.
Thank you dan for finding this issue out.
Fixes: 6223a39fe6 (scsi: aacraid: Added support for hotplug)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During the IOP reset stress testing, it was found that the drives can be
marked offline when the adapter controller crashes and IO's are running
in parallel. When the controller does come back from the reset, the drive
that is marked offline is not exposed.
Fixed by removing and adding drives that are marked offline. In addition
invoke a scsi host bus rescan to capture any additional configuration
changes.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_command_thread checks on the health of controller periodically,
using aac_check_health. If the status is an error state KERNEL_PANIC or
anything else. The driver will attempt to restart the adapter, but the
response is not checked in aac_command_thread. This allows the periodic
sync to go thru and lead the driver to a hung state.
Fixed by terminating the periodic loop(intended per original design),
if the controller is not restored to a healthy state.
Cc: stable@vger.kernel.org
Fixes: 3d77d84044 (scsi: aacraid: Added support for periodic wellness sync)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_fib_map_free frees misaligned fib dma memory, additionally it does not
free up the whole memory.
Fixed by changing the code to free up the correct and full memory
allocation.
Cc: stable@vger.kernel.org
Fixes: e8b12f0fb8 ([SCSI] aacraid: Add new code for PMC-Sierra's SRC based controller family)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Arrconf management utility at times sends fibs with AdapterProcessed set
in its fibs. This causes the controller to panic and lockup.
Fixed by failing the commands that have AdapterProcessed set in its flag.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This issue showed up on a kdump debug(single CPU on powerkvm), when EEH
errors rendered the adapter unusable. The driver correctly detected the
issue and attempted to restart the controller, in doing so the driver
attempted to read the status registers of the controller. This triggered
additional eeh errors which continued for a good 6 minutes.
Fixed by returning without waiting when EEH error is reported.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replaced camel case with snake case for init supported options.
Suggested-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added new copyright messages
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added a new IWBR soft reset type, reworked the IOP reset interface for
a bit.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added support to send direct pasthru srb commands from management utilty
to the controller.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added support for drive hotplug add and removal
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds a new functions that periodically sync the time of host
to the adapter. In addition also informs the adapter that the driver is
alive and kicking. Only applicable to the HBA1000 and SMARTIOC2000.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reworked aac_command_thread into aac_process_events
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch enables the driver to actually process the I/O, or srb replies
from adapter. In addition to any HBA1000 or SmartIOC2000 adapter events.
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch lays the groundwork for supporting the new HBA-1000 controller
family.A new INIT structure INIT_STRUCT_8 has been added which allows for a
variable size for MSI-x vectors among other things, and is used for both
Series-8, HBA-1000 and SmartIOC-2000.
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Signed-off-by: Dave Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use pci_alloc_irq_vectors and drop the hand-crafted interrupt affinity
routines.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Cc: Adaptec OEM Raid Solutions <aacraid@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Firmware AIF messages about cache loss and data recovery are being missed
by the driver since currently they are not captured but rather let go.
This patch to capture those messages and log them for the user.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.
In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.
The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.
Fixed by: Adding another kthread_should_stop() before schedule_timeout()
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_fib_send has a special function case for initial commands during
driver initialization using wait < 0(pseudo sync mode). In this case,
the command does not sleep but rather spins checking for timeout.This
loop is calls cpu_relax() in an attempt to allow other processes/threads
to use the CPU, but this function does not relinquish the CPU and so the
command will hog the processor. This was observed in a KDUMP
"crashkernel" and that prevented the "command thread" (which is
responsible for completing the command from being timed out) from
starting because it could not get the CPU.
Fixed by replacing "cpu_relax()" call with "schedule()"
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>