linux

Author	SHA1	Message	Date
Stefan Richter	5e2125677f	firewire: sbp2: fix DMA mapping leak on the failure path Reported-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> who also provided a first version of the fix. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-01-28 20:31:08 +01:00
Stefan Richter	f746072abc	firewire: sbp2: define some magic numbers as macros Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-01-28 20:31:07 +01:00
Stefan Richter	a08e100aec	firewire: sbp2: fix payload limit at S1600 and S3200 1394-2008 clause 16.3.4.1 (1394b-2002 clause 16.3.1.1) defines tighter limits than 1394-2008 clause 6.2.2.3 (1394a-2000 clause 6.2.2.3). Our previously too large limit doesn't matter though if the controller reports its max_receive correctly. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-01-28 20:31:07 +01:00
Stefan Richter	621f6dd715	firewire: fw-sbp2: remove unnecessary locking What was I thinking when I added sbp2_set_generation()? Its locking did nothing (except for implicitly providing the necessary barrier between node IDs update and generation update). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-01-04 23:50:36 +01:00
Stefan Richter	031bb27c4b	firewire: fw-sbp2: another iPod mini quirk entry Add another model ID of a broken firmware to prevent early I/O errors by accesses at the end of the disk. Reported at linux1394-user, http://marc.info/?t=122670842900002 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-11-25 21:38:31 +01:00
Kay Sievers	a1f64819fe	firewire: struct device - replace bus_id with dev_name(), dev_set_name() Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-10-31 08:48:25 +01:00
Jay Fenlason	cd1f70fdb4	firewire: fw-sbp2: fix races 1: There is a small race between queue_delayed_work() and its corresponding kref_get(). Do the kref_get first, and _put it again if the queue_delayed_work() failed, so there is no chance of the kref going to zero while the work is scheduled. 2: An SBP2_LOGOUT_REQUEST could be sent out with a login_id full of garbage. Initialize it to an invalid value so we can tell if we ever got a valid login_id. 3: The node ID and generation may have changed but the new values may not yet have been recorded in lu and tgt when the final logout is attempted. Use the latest values from the device in sbp2_release_target(). Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-10-26 10:27:01 +01:00
Stefan Richter	0dcfeb7e3c	firewire: fw-sbp2: delay first login to avoid retries This optimizes firewire-sbp2's device probe for the case that the local node and the SBP-2 node were discovered at the same time. In this case, fw-core's bus management work and fw-sbp2's login and SCSI probe work are scheduled in parallel (in the globally shared workqueue and in fw-sbp2's workqueue, respectively). The bus reset from fw-core may then disturb and extremely delay the login and SCSI probe because the latter fails with several command timeouts and retries and has to be retried from scratch. We avoid this particular situation of sbp2_login() and fw_card_bm_work() running in parallel by delaying the first sbp2_login() a little bit. This is meant to be a short-term fix for https://bugzilla.redhat.com/show_bug.cgi?id=466679. In the long run, the SCSI probe, i.e. fw-sbp2's call of __scsi_add_device(), should be parallelized with sbp2_reconnect(). Problem reported and fix tested and confirmed by Alex Kanavin. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-10-26 10:27:01 +01:00
Stefan Richter	4bbc1bdd01	firewire: fw-sbp2: fix another small generation access bug queuecommand() looked at the remote and local node IDs before it read the bus generation. The corresponding race with sbp2_reconnect updating these data was probably impossible to happen though because the current code blocks the SCSI layer during reconnection. However, better safe than sorry, especially if someone later improves the code to not block the SCSI layer. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-10-15 22:21:10 +02:00
Stefan Richter	09b12dd4e3	firewire: fw-sbp2: enforce s/g segment size limit 1. We don't need to round the SBP-2 segment size limit down to a multiple of 4 kB (0xffff -> 0xf000). It is only necessary to ensure quadlet alignment (0xffff -> 0xfffc). 2. Use dma_set_max_seg_size() to tell the DMA mapping infrastructure and the block IO layer about the restriction. This way we can remove the size checks and segment splitting in the queuecommand path. This assumes that no other code in the firewire stack uses dma_map_sg() with conflicting requirements. It furthermore assumes that the controller device's platform actually allows us to set the segment size to our liking. Assert the latter with a BUG_ON(). 3. Also use blk_queue_max_segment_size() to tell the block IO layer about it. It cannot know it because our scsi_add_host() does not point to the FireWire controller's device. Thanks to Grant Grundler and FUJITA Tomonori for advice. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-10-15 22:21:10 +02:00
Jay Fenlason	1e119fa995	firewire: fw_send_request_sync() Share code between fw_send_request + wait_for_completion callers. Signed-off-by: Jay Fenlason <fenlason@redhat.com> Addendum: Removes an unnecessary struct and an unused retry loop. Calls it fw_run_transaction() instead of fw_send_request_sync(). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Kristian Høgsberg <krh@redhat.com>	2008-10-15 22:21:09 +02:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. 
[akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Stefan Richter	2635f96f90	firewire: fw-sbp2: spin disks down on suspend and shutdown This instructs sd_mod to send START STOP UNIT on suspend and resume, and on driver unbinding or unloading (including when the system is shut down). We don't do this though if multiple initiators may log in to the target. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Tino Keitel <tino.keitel@gmx.de>	2008-07-14 13:00:18 +02:00
Stefan Richter	ffcaade310	firewire: fw-sbp2: fix spindown for PL-3507 and TSB42AA9 firmwares Reported by Tino Keitel: PL-3507 with firmware from Prolific does not spin down the disk on START STOP UNIT with power condition = 0 and start = 0. It does however work with power condition = 2 or 3. Also found while investigating this: DViCO Momobay CX-1 and FX-3A (TI TSB42AA9/A based) become unresponsive after START STOP UNIT with power condition = 0 and start = 0. They stay responsive if power condition is set when stopping the motor. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Tino Keitel <tino.keitel@gmx.de>	2008-07-14 13:00:17 +02:00
Richard Sharpe	0e3e2eabf4	firewire: fw-sbp2: fix parsing of logical unit directories There is a small off-by-one bug in firewire-sbp2. This causes problems when a device exports multiple LUN Directories. I found it when trying to talk to a SONY DVD Jukebox. Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com> Acked-by: Kristian Høgsberg <krh@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (op. order, changelog)	2008-06-27 20:55:00 +02:00
Linus Torvalds	d626e3bf72	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: [SCSI] aic94xx: fix section mismatch [SCSI] u14-34f: Fix 32bit only problem [SCSI] dpt_i2o: sysfs code [SCSI] dpt_i2o: 64 bit support [SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to dma_alloc_coherent [SCSI] dpt_i2o: use standard __init / __exit code [SCSI] megaraid_sas: fix suspend/resume sections [SCSI] aacraid: Add Power Management support [SCSI] aacraid: Fix jbod operations scan issues [SCSI] aacraid: Fix warning about macro side-effects [SCSI] add support for variable length extended commands [SCSI] Let scsi_cmnd->cmnd use request->cmd buffer [SCSI] bsg: add large command support [SCSI] aacraid: Fix down_interruptible() to check the return value correctly [SCSI] megaraid_sas; Update the Version and Changelog [SCSI] ibmvscsi: Handle non SCSI error status [SCSI] bug fix for free list handling [SCSI] ipr: Rename ipr's state scsi host attribute to prevent collisions [SCSI] megaraid_mbox: fix Dell CERC firmware problem	2008-05-02 13:52:35 -07:00
Boaz Harrosh	64a87b244b	[SCSI] Let scsi_cmnd->cmnd use request->cmd buffer - struct scsi_cmnd had a 16 bytes command buffer of its own. This is an unnecessary duplication and copy of request's cmd. It is probably left overs from the time that scsi_cmnd could function without a request attached. So clean that up. - Once above is done, few places, apart from scsi-ml, needed adjustments due to changing the data type of scsi_cmnd->cmnd. - Lots of drivers still use MAX_COMMAND_SIZE. So I have left that #define but equate it to BLK_MAX_CDB. The way I see it and is reflected in the patch below is. MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB as per the SCSI standard and is not related to the implementation. BLK_MAX_CDB. - The allocated space at the request level - I have audited all ISA drivers and made sure none use ->cmnd in a DMA Operation. Same audit was done by Andi Kleen. (*) fixed-length here means commands that their size can be determined by their opcode and the CDB does not carry a length specifier, (unlike the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly true and the SCSI standard also defines extended commands and vendor specific commands that can be bigger than 16 bytes. The kernel will support these using the same infrastructure used for VARLEN CDB's. So in effect MAX_COMMAND_SIZE means the maximum size command scsi-ml supports without specifying a cmd_len by ULD's Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-05-02 10:18:22 -05:00
Stefan Richter	f32ddaddf9	firewire: fw-sbp2: log scsi_target ID at release Makes the good-by message more informative. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-05-01 19:55:24 +02:00
Stefan Richter	c9755e14a0	firewire: reread config ROM when device reset the bus When a device changes its configuration ROM, it announces this with a bus reset. firewire-core has to check which node initiated a bus reset and whether any unit directories went away or were added on this node. Tested with an IOI FWB-IDE01AB which has its link-on bit set if bus power is available but does not respond to ROM read requests if self power is off. This implements - recognition of the units if self power is switched on after fw-core gave up the initial attempt to read the config ROM, - shutdown of the units when self power is switched off. Also tested with a second PC running Linux/ieee1394. When the eth1394 driver is inserted and removed on that node, fw-core now notices the addition and removal of the IPv4 unit on the ieee1394 node. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:36 +02:00
Jarod Wilson	17cff9ff87	firewire: fw-sbp2: set dual-phase cycle_limit Try to write dual-phase retry protocol limits to BUSY_TIMEOUT register. - The dual-phase retry protocol is optional to implement, and if not supported, writes to the dual-phase portion of the register will be ignored. We try to write the original 1394-1995 default here. - In the case of devices that are also SBP-3-compliant, all writes are ignored, as the register is read-only, but contains single-phase retry of 15, which is what we're trying to set for all SBP-2 device anyway, so this write attempt is safe and yields more consistent behavior for all devices. See section 8.3.2.3.5 of the 1394-1995 spec, section 6.2 of the SBP-2 spec, and section 6.4 of the SBP-3 spec for further details. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:33 +02:00
Stefan Richter	a5fd9ec7a2	firewire: fw-sbp2: reduce log noise The block/unblock logic is now sufficiently tested. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:32 +02:00
Stefan Richter	6f73100cbb	firewire: fw-sbp2: remove unnecessary memset orb came from kzalloc. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:32 +02:00
Stefan Richter	0d7dcbf2a3	firewire: fw-sbp2: simplify some macros How hard can it be to switch on one bit? :-) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:32 +02:00
Stefan Richter	71ee9f01f2	firewire: fw-sbp2: remove usages of fw_memcpy_to_be32 Write directly in big endian instead of byte-swapping after the fact. This saves a few conversions, lets gcc use constant endianness conversions where possible, and enables deeper endianness annotation. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:32 +02:00
Stefan Richter	8ac3a47cab	firewire: fw-sbp2: relax SCSI DMA alignment Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-04-18 17:55:32 +02:00
Stefan Richter	1dc3bea78b	firewire: refactor fw_unit reference counting Add wrappers for getting and putting a unit. Remove some line breaks. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-04-18 17:55:32 +02:00
Stefan Richter	7c1fca3366	firewire: fw-sbp2: fix reference counting The reference count of the unit dropped too low in an error path in sbp2_probe. Fixed by moving the _get further up. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-04-18 17:55:31 +02:00
Stefan Richter	2aa9ff7fc5	firewire: fw-sbp2: fix for SYM13FW500 bridge (Datafab disk) Fix I/O errors due to SYM13FW500's inability to handle larger request sizes. Reported by Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> in https://bugzilla.redhat.com/show_bug.cgi?id=436879 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-03-14 00:56:59 +01:00
Jarod Wilson	51f9dbef5b	firewire: fw-sbp2: set single-phase retry_limit Per the SBP-2 specification, all SBP-2 target devices must have a BUSY_TIMEOUT register. Per the 1394-1995 specification, the retry_limit portion of the register should be set to 0x0 initially, and set on the target by a logged in initiator (i.e., a Linux host w/firewire controller(s)). Well, as it turns out, lots of devices these days have actually moved on to starting to implement SBP-3 compliance, which says that retry_limit should default to 0xf instead (yes, SBP-3 stomps directly on 1394-1995, oops). Prior to this change, the firewire driver stack didn't touch retry_limit, and any SBP-3 compliant device worked fine, while SBP-2 compliant ones were unable to retransmit when the host returned an ack_busy_X, which resulted in stalled out I/O, eventually causing the SCSI layer to give up and offline the device. The simple fix is for us to set retry_limit to 0xf in the register for all devices (which actually matches what the old ieee1394 stack did). Prior to this change, a hard disk behind an SBP-2 Prolific PL-3507 bridge chip would routinely encounter buffer I/O errors and wind up offlined by the SCSI layer. With this change, I've encountered zero I/O failures moving tens of GB of data around. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-03-14 00:56:59 +01:00
Stefan Richter	855c603d61	firewire: fix crash in automatic module unloading "modprobe firewire-ohci; sleep .1; modprobe -r firewire-ohci" used to result in crashes like this: BUG: unable to handle kernel paging request at ffffffff8807b455 IP: [<ffffffff8807b455>] PGD 203067 PUD 207063 PMD 7c170067 PTE 0 Oops: 0010 [1] PREEMPT SMP CPU 0 Modules linked in: i915 drm cpufreq_ondemand acpi_cpufreq freq_table applesmc input_polldev led_class coretemp hwmon eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button thermal processor sg snd_hda_intel snd_pcm snd_timer snd snd_page_alloc sky2 i2c_i801 rtc [last unloaded: crc_itu_t] Pid: 9, comm: events/0 Not tainted 2.6.25-rc2 #3 RIP: 0010:[<ffffffff8807b455>] [<ffffffff8807b455>] RSP: 0018:ffff81007dcdde88 EFLAGS: 00010246 RAX: ffff81007dc95040 RBX: ffff81007dee5390 RCX: 0000000000005e13 RDX: 0000000000008c8b RSI: 0000000000000001 RDI: ffff81007dee5388 RBP: ffff81007dc5eb40 R08: 0000000000000002 R09: ffffffff8022d05c R10: ffffffff8023b34c R11: ffffffff8041a353 R12: ffff81007dee5388 R13: ffffffff8807b455 R14: ffffffff80593bc0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff8055a000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: ffffffff8807b455 CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process events/0 (pid: 9, threadinfo ffff81007dcdc000, task ffff81007dc95040) Stack: ffffffff8023b396 ffffffff88082524 0000000000000000 ffffffff8807d9ae ffff81007dc5eb40 ffff81007dc9dce0 ffff81007dc5eb40 ffff81007dc5eb80 ffff81007dc9dce0 ffffffffffffffff ffffffff8023be87 0000000000000000 Call Trace: [<ffffffff8023b396>] ? run_workqueue+0xdf/0x1df [<ffffffff8023be87>] ? worker_thread+0xd8/0xe3 [<ffffffff8023e917>] ? autoremove_wake_function+0x0/0x2e [<ffffffff8023bdaf>] ? worker_thread+0x0/0xe3 [<ffffffff8023e813>] ? 
kthread+0x47/0x74 [<ffffffff804198e0>] ? trace_hardirqs_on_thunk+0x35/0x3a [<ffffffff8020c008>] ? child_rip+0xa/0x12 [<ffffffff8020b6e3>] ? restore_args+0x0/0x3d [<ffffffff8023e68a>] ? kthreadd+0x14c/0x171 [<ffffffff8023e68a>] ? kthreadd+0x14c/0x171 [<ffffffff8023e7cc>] ? kthread+0x0/0x74 [<ffffffff8020bffe>] ? child_rip+0x0/0x12 Code: Bad RIP value. RIP [<ffffffff8807b455>] RSP <ffff81007dcdde88> CR2: ffffffff8807b455 ---[ end trace c7366c6657fe5bed ]--- Note that this crash happened _after_ firewire-core was unloaded. The shared workqueue tried to run firewire-core's device initialization jobs or similar jobs. The fix makes sure that firewire-ohci and hence firewire-core is not unloaded before all device shutdown jobs have been completed. This is determined by the count of device initializations minus device releases. Also skip useless retries in the node initialization job if the node is to be shut down. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-03-02 12:35:46 +01:00
Stefan Richter	f8436158b1	firewire: fw-sbp2: better fix for NULL pointer dereference in scsi_remove_device Patch "firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device" had the unintended effect that firewire-sbp2 could not be unloaded anymore until all SBP-2 devices were unplugged. We now fix the NULL pointer bug by reacquiring a reference to the sdev instead of holding a reference to the sdev (and to the module) all the time. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Jarod Wilson <jwilson@redhat.com>	2008-03-02 12:35:46 +01:00
Stefan Richter	33f1c6c352	firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device Fix a kernel bug when unplugging an SBP-2 device after having its scsi_device already removed via the "delete" sysfs attribute. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-19 19:57:23 +01:00
Stefan Richter	5513c5f6f9	firewire: fw-sbp2: fix NULL pointer deref. in slave_alloc Fix a kernel bug when running rescan-scsi-bus while a FireWire disk is connected: http://bugzilla.kernel.org/show_bug.cgi?id=10008 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-19 19:57:23 +01:00
Stefan Richter	2e2705bdcb	firewire: fw-sbp2: (try to) avoid I/O errors during reconnect While fw-sbp2 takes the necessary time to reconnect to a logical unit after bus reset, the SCSI core keeps sending new commands. They are all immediately completed with host busy status, and application clients or filesystems will break quickly. The SCSI device might even be taken offline: http://bugzilla.kernel.org/show_bug.cgi?id=9734 The only remedy seems to be to block the SCSI device until reconnect. Alas the SCSI core has no useful API to block only one logical unit i.e. the scsi_device, therefore we block the entire Scsi_Host. This currently corresponds to an SBP-2 target. In case of targets with multiple logical units, we need to satisfy the dependencies between logical units by carefully tracking the blocking state of the target and its units. We block all logical units of a target as soon as one of them needs to be blocked, and keep them blocked until all of them are ready to be unblocked. Furthermore, as the history of the old sbp2 driver has shown, the scsi_block_requests() API is a minefield with high potential of deadlocks. We therefore take extra measures to keep logical units unblocked during __scsi_add_device() and during shutdown. This avoids I/O errors during reconnect in many but alas not in all cases. There may still be errors after a re-login had to be performed. Also, some bridges have been seen to cease fetching management ORBs if I/O went on up until a bus reset. In these cases, all management ORBs time out after mgt_orb_timeout. The old sbp2 driver is less vulnerable or maybe not vulnerable to this, for as yet unknown reasons. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-19 19:57:23 +01:00
Stefan Richter	e80de3704a	firewire: fw-sbp2: enforce a retry of __scsi_add_device if bus generation changed fw-sbp2 is unable to reconnect while performing __scsi_add_device because there is only a single workqueue thread context available for both at the moment. This should be fixed eventually. An actual failure of __scsi_add_device is easy to handle, but an incomplete execution of __scsi_add_device with an sdev returned would remain undetected and leave the SBP-2 target unusable. Therefore we use a workaround: If there was a bus reset during __scsi_add_device (i.e. during the SCSI probe), we remove the new sdev immediately, log out, and attempt login and SCSI probe again. Tested-by: Jarod Wilson <jwilson@redhat.com> (earlier version) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-16 15:40:35 +01:00
Stefan Richter	7bb6bf7c8b	firewire: fw-sbp2: sort includes Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-16 15:40:35 +01:00
Stefan Richter	ce896d95cc	firewire: fw-sbp2: logout and login after failed reconnect If fw-sbp2 was too late with requesting the reconnect, the target would reject this. In this case, log out before attempting the reconnect. Else several firmwares will deny the re-login because they somehow didn't invalidate the old login. Also, don't retry reconnects in this situation. The retries won't succeed either. These changes improve chances for successful re-login and shorten the period during which the logical unit is inaccessible. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-02-16 15:40:35 +01:00
Stefan Richter	0fa6dfdb0a	firewire: fw-sbp2: don't add scsi_device twice When a reconnect failed but re-login succeeded, __scsi_add_device was called again. In those cases, __scsi_add_device succeeded and returned the pointer to the existing scsi_device. fw-sbp2 then continued orderly, except that it missed to call sbp2_cancel_orbs. SCSI core would call fw-sbp2's eh_abort_handler eventually if there had been an outstanding command. This patch avoids the needless lookups and temporary allocations in SCSI core and I/O stall and timeout until eh_abort_handler hits. Also, __scsi_add_device tolerating calls for devices which already exist is undocumented behavior on which we shouldn't rely. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-02-16 15:40:35 +01:00
Stefan Richter	48f18c761c	firewire: fw-sbp2: log bus_id at management request failures for easier readable logs if more than one SBP-2 device is present. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-02-16 15:40:34 +01:00
Stefan Richter	e0e6021555	firewire: fw-sbp2: wait for completion of fetch agent reset Like the old sbp2 driver, wait for the write transaction to the AGENT_RESET to complete before proceeding (after login, after reconnect, or in SCSI error handling). There is one occasion where AGENT_RESET is written to from atomic context when getting DEAD status for a command ORB. There we still continue without waiting for the transaction to complete because this is more difficult to fix... Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-16 15:40:34 +01:00
Stefan Richter	9220f19462	firewire: fw-sbp2: add INQUIRY delay workaround Several different SBP-2 bridges accept a login early while the IDE device is still powering up. They are therefore unable to respond to SCSI INQUIRY immediately, and the SCSI core has to retry the INQUIRY. One of these retries is typically successful, and all is well. But in case of Momobay FX-3A, the INQUIRY retries tend to fail entirely. This can usually be avoided by waiting a little while after login before letting the SCSI core send the INQUIRY. The old sbp2 driver handles this more gracefully for as yet unknown reasons (perhaps because it waits for fetch agent resets to complete, unlike fw-sbp2 which quickly proceeds after requesting the agent reset). Therefore the workaround is not as much necessary for sbp2. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-02-16 15:40:34 +01:00
Stefan Richter	be6f48b017	firewire: fw-sbp2: don't retry login or reconnect after unplug If a device is being unplugged while fw-sbp2 had a login or reconnect on schedule, it would take about half a minute to shut the fw_unit down: Jan 27 18:34:54 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries) <unplug> Jan 27 18:34:59 stein firewire_sbp2: sbp2_scsi_abort Jan 27 18:34:59 stein scsi 25:0:0:0: Device offlined - not ready after error recovery Jan 27 18:35:01 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:06 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:12 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:17 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:22 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:27 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:32 stein firewire_sbp2: orb reply timed out, rcode=0x11 Jan 27 18:35:32 stein firewire_sbp2: failed to login to fw2.0 LUN 0000 Jan 27 18:35:32 stein firewire_sbp2: released fw2.0 After this patch, typically only a few seconds spent in __scsi_add_device remain: Jan 27 19:05:50 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries) <unplug> Jan 27 19:05:56 stein firewire_sbp2: sbp2_scsi_abort Jan 27 19:05:56 stein scsi 33:0:0:0: Device offlined - not ready after error recovery Jan 27 19:05:56 stein firewire_sbp2: released fw2.0 The benefit of this is less noise in the syslog. It furthermore avoids a few wasted CPU cycles and needlessly prolonged lifetime of a few driver objects. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-02-16 15:40:33 +01:00
Stefan Richter	1b9c12ba2f	firewire: fw-sbp2: fix logout before login retry This fixes a "can't recognize device" kind of bug. If the SCSI INQUIRY failed and hence __scsi_add_device failed due to a bus reset, we tried a logout and then waited for the already scheduled login work to happen. So far so good, but the generation used for the logout was outdated, hence the logout never reached the target. The target might therefore deny the subsequent relogin attempt, which would also leave the target inaccessible. Therefore fetch a fresh device->generation for the logout. Use memory barriers to prevent our plan being foiled by compiler or hardware optimizations. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-16 15:40:32 +01:00
Stefan Richter	05cca73814	firewire: fw-sbp2: unsigned int vs. unsigned Standardize on "unsigned int" style. Sort some struct members thematically. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2008-02-16 15:40:32 +01:00
Jarod Wilson	384170da93	firewire: fw-sbp2: Use sbp2 device-provided mgt orb timeout for logins To be more compliant with section 7.4.8 of the SBP-2 specification, use the mgt_ORB_timeout specified in the SBP-2 device's config rom for login ORB attempts (though with some sanity checks). A happy side-effect is that certain device and controller combinations that sometimes take more than 20 seconds to get synced up (like my laptop with just about any SBP-2 device) now function more reliably. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (silenced sparse)	2008-01-30 22:22:29 +01:00
Jarod Wilson	a4c379c197	firewire: fw-sbp2: increase login orb reply timeout, fix "failed to login" Increase (and rename) the login orb reply timeout value to 20s to match that of the old firewire stack. 2s simply didn't give many devices enough time to spin up and reply. Fixes inability to recognize some devices. Failure mode was "orb reply timed out"/"failed to login". Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (style, comments, changelog)	2008-01-30 22:22:28 +01:00
Stefan Richter	b5d2a5e04e	firewire: enforce access order between generation and node ID, fix "giving up on config rom" fw_device.node_id and fw_device.generation are accessed without mutexes. We have to ensure that all readers will get to see node_id updates before generation updates. Fixes an inability to recognize devices after "giving up on config rom", https://bugzilla.redhat.com/show_bug.cgi?id=429950 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>. Verified to fix 'giving up on config rom' issues on multiple system and drive combinations that were previously affected. Signed-off-by: Jarod Wilson <jwilson@redhat.com> Signed-off-by: Kristian Høgsberg <krh@redhat.com>	2008-01-30 22:22:27 +01:00
Stefan Richter	5a8a1bcd15	firewire: fw-sbp2: use device generation, not card generation There was a small window where a login or reconnect job could use an already updated card generation with an outdated node ID. We have to use the fw_device.generation here, not the fw_card.generation, because the generation must never be newer than the node ID when we emit a transaction. This cannot be guaranteed with fw_card.generation. Furthermore, the target's and initiator's node IDs can be obtained from fw_device and fw_card. Dereferencing their underlying topology objects is not necessary. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Verified in concert with subsequent memory barriers patch to fix 'giving up on config rom' issues on multiple system and drive combinations that were previously affected. Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-01-30 22:22:26 +01:00
Stefan Richter	14dc992aa7	firewire: fw-sbp2: try to increase reconnect_hold (speed up reconnection) Ask the target to grant 4 seconds instead of the standard and minimum of 1 second window after bus reset for reconnection. This accelerates reconnection if there is more than one target on the bus: If a login and inquiry to one target blocks the fw-sbp2 workqueue for more than 1s after bus reset, we now still can reconnect to the other target. Before that, fw-sbp2's reconnect attempts would be rejected with "error status: 0:9" (function rejected), and fw-sbp2 would finally re-login. All those futile reconnect attempts cost extra time until the target which needs re-login is ready for I/O again. The reconnect timeout field in the login ORB doesn't have to be honored by the target though. I found that we could get up to - allegedly 32768s from an old OXFW911 firmware - 256s from LSI bridges - 4s from OXUF922 and OXFW912 bridges, - 2s from TI bridges, - only the standard 1s from Initio and Prolific bridges and from Apple OpenFirmware in target mode. We just try to get 4 seconds which already covers the case of a few HDDs on the same bus quite nicely. A minor drawback occurs in the following (rare and impractical) border case: - two initiators are there, initiator 1 holds an exclusive login to a target, - initiator 1 goes off the bus, - target refuses login attempts from initiator 2 until reconnect_hold seconds after bus reset. An alternative approach to the issue at hand would be to parallelize fw-sbp2's reconnect and login work. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Jarod Wilson <jwilson@redhat.com>	2008-01-30 22:22:26 +01:00
Stefan Richter	4dccd020d7	firewire: fw-sbp2: skip unnecessary logout Don't attempt to send a logout ORB if the target was already unplugged or had its link switched off. If two targets are attached, this enhances the chance to quickly reconnect to the remaining target when one target is plugged out. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Jarod Wilson <jwilson@redhat.com>	2008-01-30 22:22:26 +01:00

112 Commits