Fix Fibre fc_host ports status "Linkdown"

I have two older servers running Ubuntu 20.04 LTS. Now I want to connect an HP 1040 SAN storage array to them. iSCSI over the TCP network does not seem to be supported by the storage, so I tried a Fibre Channel connection instead. But the FC ports of the server's HBA appear to be offline:

cmd:

more /sys/class/fc_host/host?/port_state

result:

::::::::::::::
/sys/class/fc_host/host3/port_state
::::::::::::::
Linkdown
::::::::::::::
/sys/class/fc_host/host4/port_state
::::::::::::::
Linkdown

They are connected to the storage and I have switched ports numerous times. The drivers are there and seemingly working (system report below). I have no clue why the ports are down. I have seen that some people use the qla2xxx driver for QLogic adapters. Should I change the driver? And if yes, how?

How can I "activate" the FC-Connection?

Regards Ari

Information about the HBAs:

cmd:

sudo lspci -v

result:

0e:00.0 Fibre Channel: Cavium QLogic 425/825/42B/82B 4Gbps/8Gbps PCIe dual port FC HBA (rev 01)
        Subsystem: Hewlett-Packard Company 82B 8Gbps dual port FC HBA
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 54
        Memory at fbfe0000 (64-bit, non-prefetchable) [size=128K]
        Memory at fbfd0000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbf00000 [virtual] [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable+ Count=24 Masked-
        Capabilities: [60] Express Endpoint, MSI 1e
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Kernel driver in use: bfa
        Kernel modules: bfa

0e:00.1 Fibre Channel: Cavium QLogic 425/825/42B/82B 4Gbps/8Gbps PCIe dual port FC HBA (rev 01)
        Subsystem: Hewlett-Packard Company 82B 8Gbps dual port FC HBA
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 68
        Memory at fbfa0000 (64-bit, non-prefetchable) [size=128K]
        Memory at fbf90000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable+ Count=24 Masked-
        Capabilities: [60] Express Endpoint, MSI 1f
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [180] Power Budgeting <?>
        Kernel driver in use: bfa
        Kernel modules: bfa

cmd:

sudo systool -c fc_host -v

result:

  Class Device = "host3"
  Class Device path = "/sys/devices/pci0000:00/0000:00:07.0/0000:0e:00.0/host3/fc_host/host3"
    active_fc4s         = "0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
    dev_loss_tmo        = "60"
    fabric_name         = "0x0"
    issue_lip           = <store method only>
    max_npiv_vports     = "255"
    maxframe_size       = "0 bytes"
    node_name           = "0x20000024ff887dde"
    npiv_vports_inuse   = "0"
    port_id             = "0x000000"
    port_name           = "0x21000024ff887dde"
    port_state          = "Linkdown"
    port_type           = "Unknown"
    speed               = "unknown"
    supported_classes   = "Class 3"
    supported_fc4s      = "0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
    supported_speeds    = "2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "QLogic-825 | 3.2.25.1 |  |  | "
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              = 
    vport_create        = <store method only>
    vport_delete        = <store method only>

    Device = "host3"
    Device path = "/sys/devices/pci0000:00/0000:00:07.0/0000:0e:00.0/host3"
      uevent              = "DEVTYPE=scsi_host"

  Class Device = "host4"
  Class Device path = "/sys/devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host4/fc_host/host4"
    active_fc4s         = "0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
    dev_loss_tmo        = "60"
    fabric_name         = "0x0"
    issue_lip           = <store method only>
    max_npiv_vports     = "255"
    maxframe_size       = "0 bytes"
    node_name           = "0x20000024ff887ddf"
    npiv_vports_inuse   = "0"
    port_id             = "0x000000"
    port_name           = "0x21000024ff887ddf"
    port_state          = "Linkdown"
    port_type           = "Unknown"
    speed               = "unknown"
    supported_classes   = "Class 3"
    supported_fc4s      = "0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 "
    supported_speeds    = "2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "QLogic-825 | 3.2.25.1 |  |  | "
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              = 
    vport_create        = <store method only>
    vport_delete        = <store method only>

    Device = "host4"
    Device path = "/sys/devices/pci0000:00/0000:00:07.0/0000:0e:00.1/host4"
      uevent              = "DEVTYPE=scsi_host"

I used to be a storage engineer working for a storage vendor; if you were a storage admin with my company's storage array hardware, I'm the guy my company would send to fix any problems that stumped the storage admin.

Link down means no FC protocol. There may be light going back and forth, but the two sides are not talking.

If you have a loopback adapter, you can quickly test the next few items in under a minute. Put the loopback adapter on one end of the cable and the device at the other end should show link up. Test with the HBA and the switch/storage array. Look at the error stats and they should not increment; you will get a burst of errors during link negotiation when you plug in the cable but they should stop. If they don't, you have a bad cable.
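On Linux, the error stats mentioned above are exposed by the FC transport class under /sys/class/fc_host/hostN/statistics/. Here is a minimal sketch for taking a snapshot of the link-related counters. It assumes the bfa driver fills these counters in (a driver may report an unsupported counter as all ones instead), and the optional directory argument is a hypothetical convenience for pointing the function at a different tree:

```shell
# Print selected link error counters for every FC host.
# $1 (optional, hypothetical): directory to scan; defaults to the real sysfs path.
fc_error_counters() {
    base="${1:-/sys/class/fc_host}"
    for host in "$base"/host*; do
        # Skip hosts without a statistics directory (also covers an unmatched glob).
        [ -d "$host/statistics" ] || continue
        for counter in link_failure_count loss_of_sync_count \
                       loss_of_signal_count invalid_crc_count; do
            file="$host/statistics/$counter"
            [ -r "$file" ] || continue
            # Values are printed as reported by the kernel (hex, e.g. 0x0).
            printf '%s %s %s\n' "${host##*/}" "$counter" "$(cat "$file")"
        done
    done
}
```

Run it once, wait a few seconds after plugging in the cable, run it again and compare the two outputs (for example with diff). Counters that keep climbing after the initial burst point at the cable.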

Without a loopback adapter, it's a bit more work. Start troubleshooting by checking the following:

  • WARNING: do not look at the laser transmitter in the SFP. Most of the light is invisible; just because it's not a bright visible light doesn't mean it's harmless.
  • Is the HBA disabled? Use the HBA configuration utility to verify.
  • Is the switch port disabled? Check port status on the switch/storage array.
  • Is the cable the right polarity? TX (transmit) -> RX (receive).

A good reference on LC cable polarity is here.

You want an A to B straight through cable. The naming doesn't sound sensible but a picture helps:

[image: A-to-B straight-through LC cable]

Looking at laser light can seriously damage your eyes, especially since you can't see most of the light. To check polarity safely, shine a flashlight through the cable (only works for relatively short cables), trace the cable, or flip one end. Depending on the SFP type, there's enough power to go 80 km down the fiber cable; you don't want anything close to that power entering your eyes.

  • Is the cable too long for the SFP type (SW or LW) and speed (8/4/2/1 Gbit)?

Check the SFP type at both ends. SFP modules use a black handle/molding to represent a shortwave (SW) laser transmitter and a blue one for longwave (LW). Both sides need to be the same type, otherwise they can't talk to each other.

Google the part numbers of the SFPs. Ethernet SFPs will not work for FC.

Once both sides are using the right kind of laser light, they can talk to each other. When they see each other's light, they negotiate link settings. Typically the server side is set to "auto" speed negotiation and the switch is set to a fixed speed.

Go set the link speed to 2G on the switch/storage array and put your HBA in auto mode.

Check the cable type: 62.5 µm or 50 µm (the core diameter) should be written on the cable. Here's an excellent reference for FC cables. Write this down.

2G SW with 62.5 µm cable goes almost 500 ft. With a LW SFP or 50 µm cable, you'll get a longer distance. You've already set the connection to 2G, so cable length is no longer a concern unless you're using a 500 ft run of fiber cable. Running at 2G also skips fillword issues; at 4G and higher you need compatible fillword settings on both sides.

Now the switch and HBA will show "syncing", "negotiating", or something other than "no link", and will go back to no link after negotiation fails. Some FC devices try once and, if negotiation fails, will not try again until loss of light is detected. Resetting the HBA will cause at least one negotiation attempt, but it's easier to tell an intern to unplug and plug a cable (less typing). The HBA utility should have a way to reset the port, which will force renegotiation.
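If no vendor utility is at hand, the fc_host sysfs class has an issue_lip attribute (it shows up as a store-only method in the systool output in the question). Writing to it asks the driver to reinitialise the link. The sketch below wraps that write; whether the bfa driver actually renegotiates a link that is down in response is an assumption to verify on the machine, and the second argument is a hypothetical override of the sysfs directory:

```shell
# Ask the driver to reinitialise the link on one FC host via issue_lip.
# $1: host name, e.g. host3
# $2 (optional, hypothetical): directory to use instead of /sys/class/fc_host
# Needs root on a real system.
fc_issue_lip() {
    host="$1"
    base="${2:-/sys/class/fc_host}"
    target="$base/$host/issue_lip"
    if [ ! -w "$target" ]; then
        echo "cannot write $target (wrong host name, or not root?)" >&2
        return 1
    fi
    echo 1 > "$target"
}
```

For example, run fc_issue_lip host3 in a root shell while watching the port state in a second terminal.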

    watch -n1 "sudo systool -c fc_host -v | grep port_"

Now watch and see if anything changes when someone else unplugs and plugs the cable on the server's HBA. If it changes, you don't have a cable polarity problem. If it doesn't change, reverse the cable polarity and try again. If the status still doesn't change, the cable is bad; get a new cable.

At this point, the physical link is compatible and correctly plugged in. Something should change. If nothing changes, get a new cable and start over.
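If systool is not installed, the same attributes can be read straight from sysfs. A minimal sketch that prints one line per HBA port; the optional directory argument is a hypothetical addition for pointing it at another tree:

```shell
# Print port_state, speed and port_type for every FC host, one line per host.
# $1 (optional, hypothetical): directory to scan; defaults to the real sysfs path.
fc_port_summary() {
    base="${1:-/sys/class/fc_host}"
    for host in "$base"/host*; do
        # Skip anything without a readable port_state (also covers an unmatched glob).
        [ -r "$host/port_state" ] || continue
        printf '%s port_state=%s speed=%s port_type=%s\n' \
            "${host##*/}" \
            "$(cat "$host/port_state")" \
            "$(cat "$host/speed" 2>/dev/null)" \
            "$(cat "$host/port_type" 2>/dev/null)"
    done
}
```

Calling it in a loop such as "while sleep 1; do fc_port_summary; done" gives a scrolling log, which makes a brief state change during the unplug/plug test easier to spot than a screen that redraws in place.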

Another potential issue is port topology. The HBA will go through link setting negotiation but may fail to link up due to incompatible topologies. The usual choices are fabric, loop, and point-to-point (sometimes abbreviated as P2P or PtP). Fabric is a connection to an FC switch, loop is a connection to a 1G FC hub (do not use, very obsolete), and PtP is a direct connection. Don't pick loop; it was rare in 2007 and should be extinct in 2021. Both sides of the link have to be using the same topology.

These steps should get your HBA into a link up state. Good luck!