VPLEX: False positive events "The specified FRU is not present" and "The operational state of the specified fru is Faulted" generated daily for VS6 BBUs

           

Article Number: 533159    Article Version: 2    Article Type: Break Fix
   

 


Product:

 

VPLEX VS6,VPLEX GeoSynchrony 6.0,VPLEX GeoSynchrony 6.1,VPLEX Series,VPLEX GeoSynchrony,VPLEX Local,VPLEX Metro

 

Issue:

 

 

On VS6 systems, events "0x8a4861d7" and "0x8a4861d6" generate call-homes daily. The BBU or power supply is first reported with “the state of the specified FRU is faulted”, and a later event for the director states “the specified FRU is not present” (reported by the peer director).
   
    Example:   
   
    <EventData><![CDATA[/engine/bbu-a0: PartNo 078-000-123-05 /SerialNo ACPHxxxxxx0168 /RevNo FFF :The operational state of the specified fru is Faulted. [Versions:MS\{D50.10.0.21.0, D50.10.0.16, D50.10.0.24}, Director\{7.5.125.21.0}, ClusterWitnessServer\{D50.10.0.24}] RCA: The state of the specified FRU is faulted Remedy: Contact EMC Customer Support.     
     
      <EventData><![CDATA[/engine/hyperion@a: :The specified FRU is not present. [Versions:MS\{D50.10.0.21.0, D50.10.0.16, D50.10.0.24}, Director\{7.5.125.21.0}, ClusterWitnessServer\{D50.10.0.24}] RCA: The specified FRU is not present. Remedy: Check if the correct FRU has been inserted in the slot. If the problem persists, contact EMC Customer Support.
   
        
    The zpem.log on each director may contain events like the following, which indicate that the BBUs or power supplies are faulted and require replacement even though the hardware is not actually faulted. In this case, replacing the hardware does not resolve the events or the call-homes.
   
    Mar 28 08:54:08 director-1-1-a zpem[4709]: CRITICAL: /engine/psa1 (AC7Nxxxxxx0633): output-over-voltage-fault is true     
      Mar 28 08:54:08 director-1-1-a zpem[4709]: CRITICAL: /engine/psa1 (AC7Nxxxxxx0633): output-under-voltage-fault is true     
      Mar 28 08:54:08 director-1-1-a zpem[4709]: CRITICAL: /engine/psa1 (AC7Nxxxxxx0633): general-fault is true     
      Mar 28 08:54:08 director-1-1-a zpem[4709]: CRITICAL: /engine/psa1 (AC7Nxxxxxx0633): SMS ERR: The operational state of the specified fru is Faulted.     
      Mar 28 08:54:21 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): internal-fault is true     
      Mar 28 08:54:21 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): SMS ERR: The operational state of the specified fru is Faulted.     
      Mar 28 08:54:56 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): internal-fault is true     
      Mar 28 08:54:56 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): SMS ERR: The operational state of the specified fru is Faulted.     
      Mar 28 08:55:18 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a1 (ACPTxxxxxx0351): battery-ready is false     
      Mar 29 09:06:13 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): battery-enabled is false     
      Mar 29 09:06:13 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): battery-ready is false     
      Mar 29 09:06:14 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a1 (ACPTxxxxxx0351): battery-enabled is false     
      Mar 29 09:06:14 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a1 (ACPTxxxxxx0351): battery-ready is false     
      Mar 29 09:06:14 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a1 (ACPTxxxxxx0351): requires-replacement is True     
      Mar 29 09:06:14 director-1-1-a zpem[4759]: CRITICAL: /engine/bbu-a1 (ACPTxxxxxx0351): SMS ERR: The operational state of the specified fru is Faulted.
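
    When many such lines are present, grouping them by FRU can make the daily pattern easier to see. The following is a minimal sketch, not a supported tool: the regular expression is inferred only from the sample lines above and is not an official zpem log grammar, and the function name summarize_zpem is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample zpem lines in this article (an assumption,
# not a documented log format).
ZPEM_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<director>\S+)\s+zpem\[\d+\]:\s+"
    r"(?P<severity>[A-Z]+):\s+(?P<fru>/\S+)\s+\((?P<serial>[^)]+)\):\s+"
    r"(?P<message>.+?)\s*$"
)


def summarize_zpem(lines):
    """Group CRITICAL messages by (director, FRU path, serial number)."""
    summary = defaultdict(list)
    for line in lines:
        match = ZPEM_LINE.match(line.strip())
        if match and match["severity"] == "CRITICAL":
            key = (match["director"], match["fru"], match["serial"])
            summary[key].append((match["ts"], match["message"]))
    return dict(summary)


# Lines copied from the excerpt above.
SAMPLE = [
    "Mar 28 08:54:08 director-1-1-a zpem[4709]: CRITICAL: /engine/psa1 (AC7Nxxxxxx0633): general-fault is true",
    "Mar 28 08:54:21 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): internal-fault is true",
    "Mar 28 08:54:21 director-1-1-a zpem[4709]: CRITICAL: /engine/bbu-a0 (ACPTxxxxxx0236): SMS ERR: The operational state of the specified fru is Faulted.",
]

if __name__ == "__main__":
    for (director, fru, serial), events in summarize_zpem(SAMPLE).items():
        print(f"{director} {fru} ({serial}): {len(events)} critical event(s)")
```

    The summary only counts what was logged. It cannot tell a false positive from a real fault, so it does not replace the guidance in this article.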
   
   
   
    The following battery-charging events may also appear frequently in the firmware and zpem logs:
   
    128.221.252.68/xmmg/log:5988:W/"0060167328a822200":14548:<4>2018/03/12 18:28:16.81: ZPEM/210 /engine/bbu-b0: PartNo 078-000-123-05 /SerialNo ACPHxxxxxx0206 /RevNo FFF :battery-charging is true       
       
        128.221.253.68/xmmg/log:5988:W/"0060167328a822200":14549:<6>2018/03/12 18:31:34.45: ZPEM/66 /engine/bbu-b0: PartNo 078-000-123-05 /SerialNo ACPHxxxxxx0206 /RevNo FFF :battery-fully-charged is true
   
   
    Note: these events are expected. The BBU cycles between charging and fully charged often, and this is seen on every VS6. Reference KBA 531093: VPLEX: BBU's are constantly cycling from "Charging" to "Fully-Charged"
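
    To see how short these charge cycles are, the two firmware log lines above can be paired up and the elapsed time computed. This is a minimal sketch under stated assumptions: the pattern is inferred from the two sample lines only, the fractional part of the timestamp is assumed to be fractions of a second, and the function name charge_durations is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the two firmware log samples in this article
# (an assumption, not a documented format).
EVENT = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+): ZPEM/\d+ "
    r"(?P<fru>/\S+): .*:(?P<attr>[\w-]+) is (?P<value>\w+)\s*$"
)


def charge_durations(lines):
    """Yield (fru, seconds) for each charging -> fully-charged pair."""
    started = {}
    for line in lines:
        m = EVENT.search(line)
        if not m or m["value"] != "true":
            continue
        # Assumes the digits after the dot are fractions of a second.
        ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S.%f")
        if m["attr"] == "battery-charging":
            started[m["fru"]] = ts
        elif m["attr"] == "battery-fully-charged" and m["fru"] in started:
            yield m["fru"], (ts - started.pop(m["fru"])).total_seconds()


# Lines copied from the excerpt above.
SAMPLE = [
    '128.221.252.68/xmmg/log:5988:W/"0060167328a822200":14548:<4>2018/03/12 18:28:16.81: ZPEM/210 /engine/bbu-b0: PartNo 078-000-123-05 /SerialNo ACPHxxxxxx0206 /RevNo FFF :battery-charging is true',
    '128.221.253.68/xmmg/log:5988:W/"0060167328a822200":14549:<6>2018/03/12 18:31:34.45: ZPEM/66 /engine/bbu-b0: PartNo 078-000-123-05 /SerialNo ACPHxxxxxx0206 /RevNo FFF :battery-fully-charged is true',
]

if __name__ == "__main__":
    for fru, seconds in charge_durations(SAMPLE):
        print(f"{fru}: charged in {seconds:.2f} s")
```

    For the sample lines, the cycle from charging to fully charged takes a little over three minutes.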
                                                           

 

 

Cause:

 

 

An internal component of the GeoSynchrony code monitors the hardware by polling it at regular intervals. Some of this monitoring polled remotely between A/B director pairs over a very slow link, which contributed to some polling cycles being missed.
    When polling cycles were missed, the component concluded that it had lost contact with the hardware and that the hardware was missing, when it was actually present. This triggered the events “the FRU is not present” and “the state of the specified FRU is faulted.”
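
    The effect of missed polling cycles can be illustrated with a small model. This is only a sketch of the general mechanism and not the GeoSynchrony implementation: the timeout, the miss limit, the response times, and the function name classify_polls are all hypothetical.

```python
def classify_polls(response_times, timeout, miss_limit):
    """Return the perceived FRU state after each poll.

    A poll whose response time exceeds `timeout` counts as missed. After
    `miss_limit` consecutive misses the monitor reports "not-present",
    even though the hardware did answer, only too slowly.
    """
    states = []
    consecutive_misses = 0
    for elapsed in response_times:
        if elapsed > timeout:
            consecutive_misses += 1
        else:
            consecutive_misses = 0
        if consecutive_misses >= miss_limit:
            states.append("not-present")
        else:
            states.append("present")
    return states


if __name__ == "__main__":
    # Hypothetical response times (seconds) over a slow remote link.
    remote = [0.4, 1.8, 2.1, 0.5]
    print(classify_polls(remote, timeout=1.0, miss_limit=2))
```

    With these made-up values the third poll is classified as "not-present" although the hardware responded every time, which mirrors the false-positive behavior described above.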
                                                           

 

 

Resolution:

 

 

This issue is fixed in GeoSynchrony 6.1 Patch 1. Upgrading to this release is recommended to resolve the issue.
    The fix removes some of the hardware monitoring that polled remotely between A/B director pairs. The removed information is still gathered locally and was not deemed necessary on the remote director. Removing the remote polling prevents polling cycles from being missed and triggering these false-positive events.