fibre_channel: fix crash when attempting to dereference invalid counters

The statistics counters of a disabled FC host cannot be read from sysfs;
any read attempt fails with errno ENOENT. As a result, all statistics
counters returned by procfs are nil in that case, and dereferencing them
crashes node_exporter. This happens on s390x Linux when a zFCP host is
disabled with `chzdev -d <host id>`.
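
The failure mode can be illustrated with a minimal, self-contained sketch. It assumes a simplified `counters` struct with pointer fields; the type and the `derefDumpedFrames` function are hypothetical stand-ins and not the procfs or node_exporter definitions.

```go
package main

import "fmt"

// counters mimics the shape seen in the GDB dump: every field is a pointer
// that stays nil when the matching sysfs file could not be read.
// Illustrative type only, not the procfs definition.
type counters struct {
	DumpedFrames *uint64
	ErrorFrames  *uint64
}

// derefDumpedFrames dereferences the counter without a nil check and
// converts the resulting runtime panic into an error.
func derefDumpedFrames(c *counters) (v uint64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return *c.DumpedFrames, nil
}

func main() {
	// All fields nil, as for a disabled host.
	_, err := derefDumpedFrames(&counters{})
	fmt.Println("disabled host:", err)

	// A populated counter dereferences without error.
	n := uint64(7)
	v, err := derefDumpedFrames(&counters{DumpedFrames: &n})
	fmt.Println("enabled host:", v, err)
}
```

Without the `recover`, the first call would terminate the process, which matches the SIGSEGV observed under GDB.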

Crash stack trace in GDB
------------------------

$ gdb /root/node_exporter/node_exporter
...
Thread 4 "node_exporter" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 35579]
0x00000000007be86c in github.com/prometheus/node_exporter/collector.(*fibrechannelCollector).Update (c=0xc00012fef0, ch=0xc0002fc8c0, ~r0=...)
    at /root/node_exporter/collector/fibrechannel_linux.go:133
133                     c.pushCounter(ch, "dumped_frames_total", *host.Counters.DumpedFrames, *host.Name)
(gdb) bt
    at /root/node_exporter/collector/fibrechannel_linux.go:133
(gdb) p host
$1 = {Name = 0xc00038faf0, Speed = 0xc00038fb00, PortState = 0xc00038fb10, PortType = 0xc00038fb30, SymbolicName = 0xc00038fba0, NodeName = 0xc00038fb40, PortID = 0xc00038fb50,
  PortName = 0xc00038fb60, FabricName = 0xc00038fb80, DevLossTMO = 0xc00038fb90, SupportedClasses = 0xc00038fbb0, SupportedSpeeds = 0xc00038fbd0, Counters = 0xc0001cb2d0}
(gdb) p host.Name
$2 = (string *) 0xc00038faf0
(gdb) p *host.Name
$3 = "host0"
(gdb) p *host.Counters.DumpedFrames
Cannot access memory at address 0x0
(gdb) p *host.Counters
$4 = {DumpedFrames = 0x0, ErrorFrames = 0x0, InvalidCRCCount = 0x0, RXFrames = 0x0, RXWords = 0x0, TXFrames = 0x0, TXWords = 0x0, SecondsSinceLastReset = 0x0, InvalidTXWordCount = 0x0,
  LinkFailureCount = 0x0, LossOfSyncCount = 0x0, LossOfSignalCount = 0x0, NosCount = 0x0, FCPPacketAborts = 0x0}
(gdb) bt
    at /root/node_exporter/collector/fibrechannel_linux.go:133
    name="fibrechannel", c=..., ch=0xc0002fc8c0, logger=0xc0002104a0)
    at /root/node_exporter/collector/collector.go:160
(gdb) p *host.Counters
$5 = {DumpedFrames = 0x0, ErrorFrames = 0x0, InvalidCRCCount = 0x0, RXFrames = 0x0,
  RXWords = 0x0, TXFrames = 0x0, TXWords = 0x0, SecondsSinceLastReset = 0x0,
  InvalidTXWordCount = 0x0, LinkFailureCount = 0x0, LossOfSyncCount = 0x0,
  LossOfSignalCount = 0x0, NosCount = 0x0, FCPPacketAborts = 0x0}
(gdb)

Signed-off-by: Alexander Egorenkov <eaibmz@gmail.com>
pull/3573/head
Alexander Egorenkov 3 weeks ago
parent 9fd21e8122
commit 5bb3fb4b54
  collector/fibrechannel_linux.go | 19

@@ -129,7 +129,24 @@ func (c *fibrechannelCollector) Update(ch chan<- prometheus.Metric) error {
 		)...)
 		// Then the counters
-		// Note: `procfs` guarantees these a safe dereference for these counters.
+		// Note: `procfs` does not guarantee a safe dereference for these counters.
+		// A disabled host returns no statistics counters.
+		if host.PortState == nil || *host.PortState == "Unknown" {
+			host.Counters.DumpedFrames = new(uint64)
+			host.Counters.ErrorFrames = new(uint64)
+			host.Counters.InvalidCRCCount = new(uint64)
+			host.Counters.RXFrames = new(uint64)
+			host.Counters.RXWords = new(uint64)
+			host.Counters.TXFrames = new(uint64)
+			host.Counters.TXWords = new(uint64)
+			host.Counters.SecondsSinceLastReset = new(uint64)
+			host.Counters.InvalidTXWordCount = new(uint64)
+			host.Counters.LinkFailureCount = new(uint64)
+			host.Counters.LossOfSyncCount = new(uint64)
+			host.Counters.LossOfSignalCount = new(uint64)
+			host.Counters.NosCount = new(uint64)
+			host.Counters.FCPPacketAborts = new(uint64)
+		}
 		c.pushCounter(ch, "dumped_frames_total", *host.Counters.DumpedFrames, *host.Name)
 		c.pushCounter(ch, "error_frames_total", *host.Counters.ErrorFrames, *host.Name)
 		c.pushCounter(ch, "invalid_crc_total", *host.Counters.InvalidCRCCount, *host.Name)
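
For comparison, a per-field nil check is another way to guard the same dereferences. The sketch below is not what this commit does; `valueOrZero` is a hypothetical helper that exists in neither node_exporter nor procfs, shown only to illustrate the idea of substituting zero for a missing counter.

```go
package main

import "fmt"

// valueOrZero returns *p, or 0 when p is nil.
// Hypothetical helper for illustration only.
func valueOrZero(p *uint64) uint64 {
	if p == nil {
		return 0
	}
	return *p
}

func main() {
	var missing *uint64 // counter that could not be read
	present := uint64(42)

	fmt.Println(valueOrZero(missing), valueOrZero(&present)) // prints "0 42"
}
```

Both approaches report zero for a disabled host. The commit's version decides once per host based on `PortState`, while a helper like this would check each counter individually.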
