XtremIO: Storage Controller InfiniBand Port call home alerts (XTR0401702-6, XTR0401902-3, XTR0402102-6 and XTR0402302-3) (Dell EMC Correctable)

           

Article Number: 503060     Article Version: 8     Article Type: Break Fix
   

 


Product:

 

XtremIO HW Gen2 400GB, XtremIO HW Gen2 400GB Encrypt Capbl, XtremIO HW Gen2 400GB Encrypt Disable, XtremIO HW Gen2 400GB Exp Encrypt Disable, XtremIO HW Gen2 400GB Expandable, XtremIO HW Gen2 800GB Encrypt Capbl, XtremIO HW Gen2 800GB Encrypt Disable

 

Issue:

 

 

The following alerts are created when a Storage Controller (SC) InfiniBand (IB) port experiences issues:   
                                                                                                                                                                                                                                                                                                                                                                                                                              

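Alert Name                 Symptom Code  Description
node_ib1_level_2_unknown   XTR0401702    InfiniBand port %(ib_port_index): link status cannot be determined
node_ib2_level_2_unknown   XTR0402102    InfiniBand port %(ib_port_index): link status cannot be determined
node_ib1_port_unknown      XTR0401903    Storage Controller InfiniBand port %(ib_port_index) state cannot be determined
node_ib2_port_unknown      XTR0402303    Storage Controller InfiniBand port %(ib_port_index) state cannot be determined
node_ib1_level_3_warning   XTR0401703    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib2_level_3_warning   XTR0402103    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib1_level_4_minor     XTR0401704    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib2_level_4_minor     XTR0402104    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib1_level_5_major     XTR0401705    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib2_level_5_major     XTR0402105    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib1_level_6_critical  XTR0401706    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib2_level_6_critical  XTR0402106    InfiniBand port %(ib_port_index): link status is not healthy. The port state is %(ib2_port_state)s
node_ib1_port_down         XTR0401902    Storage Controller InfiniBand port %(ib_port_index) is down
node_ib2_port_down         XTR0402302    Storage Controller InfiniBand port %(ib_port_index) is down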
   
For any of the listed alerts (with the exception of symptom codes XTR0401702, XTR0401903, XTR0402102 and XTR0402303), the issue should be resolved as soon as possible, since sub-optimal IB port health may impact cluster performance.

For symptom codes XTR0401702, XTR0401903, XTR0402102 and XTR0402303, the issue should also be resolved as soon as possible, since the overall cluster impact remains limited only as long as the other IB port on the affected SC is up and running with a healthy link.

 

 

Cause:

 

 

Each of the listed alerts is caused by the following:   
                                                                                                                                                                                                                                                                                                                                                                                                                              

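Alert Name                 Symptom Code  Cause
node_ib1_level_2_unknown   XTR0401702    Platform manager (PM) may not be able to read the IB port status
node_ib2_level_2_unknown   XTR0402102    Platform manager (PM) may not be able to read the IB port status
node_ib1_port_unknown      XTR0401903    Platform manager (PM) may not be able to read the IB port status
node_ib2_port_unknown      XTR0402303    Platform manager (PM) may not be able to read the IB port status
node_ib1_level_3_warning   XTR0401703    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib2_level_3_warning   XTR0402103    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib1_level_4_minor     XTR0401704    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib2_level_4_minor     XTR0402104    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib1_level_5_major     XTR0401705    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib2_level_5_major     XTR0402105    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib1_level_6_critical  XTR0401706    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib2_level_6_critical  XTR0402106    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib1_port_down         XTR0401902    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert
node_ib2_port_down         XTR0402302    Hardware failure of IB cable or IB port on SC or switch (if applicable), or false-positive alert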
                                                             

 

 

Resolution:

 

 

Check the health status and error counters reported for both IB ports of the affected SC (X1-SC1 in this example) by executing the following XMCLI commands:   

xmcli (admin)> show-storage-controllers-infiniband-ports
Name        Index  Port-Index  Peer-Type          Port-In-Peer-Index  Link-Rate-In-Gbps  Port-State  Storage-Controller-Name  Index  Brick-Name  Index  Cluster-Name  Index  Health-Level   Health-State  Enabled-State
X1-SC1-IB1  1      1           StorageController  1                   40                 up          X1-SC1                   1      X1          1      cluster-1     1      level_1_clear  healthy       enabled
X1-SC1-IB2  2      2           StorageController  2                   40                 up          X1-SC1                   1      X1          1      cluster-1     1      level_1_clear  healthy       enabled
X1-SC2-IB1  3      1           StorageController  1                   40                 up          X1-SC2                   2      X1          1      cluster-1     1      level_1_clear  healthy       enabled
X1-SC2-IB2  4      2           StorageController  2                   40                 up          X1-SC2                   2      X1          1      cluster-1     1      level_1_clear  healthy       enabled

xmcli (admin)> show-storage-controllers-infiniband-error-counters
Name        Port-Index  Cluster-Name  Index  Storage-Controller-Name  Index  Symb-Errs  Recovers  Lnk-Downed  Rcv-Errs  Rmt-Phys-Errs  Integ-Errs  Buffer-Overrun-Errs  Last-Day-Num-Faults  Last-Day-Num-Problems
X1-SC1-IB1  1           cluster-1     1      X1-SC1                   1      0          0         0           0         0              0           0                    0                    0
X1-SC1-IB2  2           cluster-1     1      X1-SC1                   1      0          0         0           0         0              0           0                    0                    0
X1-SC2-IB1  1           cluster-1     1      X1-SC2                   2      0          0         0           0         0              0           0                    0                    0
X1-SC2-IB2  2           cluster-1     1      X1-SC2                   2      0          0         0           0         0              0           0                    0                    0
Ensure that the values of the Port-State, Health-Level and Health-State fields in the generated output are displayed as up, level_1_clear and healthy respectively, and that all error counters are displayed as 0 for both IB ports of the affected SC. If this is not the case, contact XtremIO Global Technical Support for assistance.
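The check described above can also be scripted against saved command output. The following Python is a minimal sketch, not an official Dell EMC tool. It assumes the output of the two XMCLI commands has been saved as plain text, with one whitespace-separated row per line and the same column headers as in the example above. The function names (parse_table, port_problems, counter_problems) are hypothetical helpers introduced here for illustration.

```python
# Sketch only: assumes XMCLI output saved as plain text, one row per line,
# whitespace-separated columns, headers as shown in this article's example.

EXPECTED_PORT_FIELDS = {
    "Port-State": "up",
    "Health-Level": "level_1_clear",
    "Health-State": "healthy",
}
ERROR_COUNTER_FIELDS = [
    "Symb-Errs", "Recovers", "Lnk-Downed", "Rcv-Errs", "Rmt-Phys-Errs",
    "Integ-Errs", "Buffer-Overrun-Errs", "Last-Day-Num-Faults",
    "Last-Day-Num-Problems",
]


def parse_table(text):
    """Split saved output into (header tokens, list of row tokens)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # Drop an echoed prompt line such as "xmcli (admin)> show-...".
    lines = [tokens for tokens in lines if not tokens[0].startswith("xmcli")]
    if not lines:
        raise ValueError("no table found in the supplied text")
    return lines[0], lines[1:]


def port_problems(ports_text, sc_name):
    """List port fields of the given SC that differ from the healthy values."""
    header, rows = parse_table(ports_text)
    sc_col = header.index("Storage-Controller-Name")
    problems = []
    for row in rows:
        if row[sc_col] != sc_name:
            continue
        for field, expected in EXPECTED_PORT_FIELDS.items():
            actual = row[header.index(field)]
            if actual != expected:
                problems.append(f"{row[0]}: {field} is {actual}, expected {expected}")
    return problems


def counter_problems(counters_text, sc_name):
    """List non-zero error counters for both IB ports of the given SC."""
    header, rows = parse_table(counters_text)
    sc_col = header.index("Storage-Controller-Name")
    problems = []
    for row in rows:
        if row[sc_col] != sc_name:
            continue
        for field in ERROR_COUNTER_FIELDS:
            value = int(row[header.index(field)])
            if value != 0:
                problems.append(f"{row[0]}: {field} is {value}, expected 0")
    return problems
```

If both port_problems(ports_text, "X1-SC1") and counter_problems(counters_text, "X1-SC1") return empty lists, the affected SC matches the healthy state described above. Any returned entry names the port and field to mention when contacting support.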
   
    When contacting XtremIO Global Technical Support for assistance, ensure that you have a fresh XtremIO log bundle (dossier) available for attachment to the Service Request (SR) to expedite the review process. To collect an XtremIO log bundle, follow the instructions in KB 334928 - How to collect XtremIO log bundle for analysis by EMC Global Technical Support.   
   
Note: Resolution of the issue may require XtremIO Global Technical Support to dispatch a field engineer to the site (if a field engineer is not already on-site).

 

 

Notes:

 

 

                                             

An automatic Dell EMC Global Services Service Request (SR) will be generated for symptom codes XTR0401706, XTR0401902, XTR0402106 and XTR0402302 of this article.
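For alert triage scripts, the symptom codes in this article can be held in a small lookup. The Python below is an illustrative sketch only. The alert names and codes are taken from the Issue table of this article, and the automatic-SR set from the note above. The describe function and its returned keys are hypothetical and are not part of any XtremIO interface.

```python
# Symptom codes and alert names as listed in the Issue table of this article.
ALERTS = {
    "XTR0401702": "node_ib1_level_2_unknown",
    "XTR0402102": "node_ib2_level_2_unknown",
    "XTR0401903": "node_ib1_port_unknown",
    "XTR0402303": "node_ib2_port_unknown",
    "XTR0401703": "node_ib1_level_3_warning",
    "XTR0402103": "node_ib2_level_3_warning",
    "XTR0401704": "node_ib1_level_4_minor",
    "XTR0402104": "node_ib2_level_4_minor",
    "XTR0401705": "node_ib1_level_5_major",
    "XTR0402105": "node_ib2_level_5_major",
    "XTR0401706": "node_ib1_level_6_critical",
    "XTR0402106": "node_ib2_level_6_critical",
    "XTR0401902": "node_ib1_port_down",
    "XTR0402302": "node_ib2_port_down",
}

# Codes for which this article states an SR is generated automatically.
AUTO_SR_CODES = {"XTR0401706", "XTR0401902", "XTR0402106", "XTR0402302"}


def describe(code):
    """Return alert name, IB port and auto-SR flag for a symptom code."""
    name = ALERTS[code]  # raises KeyError for codes outside this article
    port = "IB1" if name.startswith("node_ib1_") else "IB2"
    return {"alert": name, "port": port, "auto_sr": code in AUTO_SR_CODES}
```

For example, describe("XTR0402302") reports the node_ib2_port_down alert on port IB2 with auto_sr set to True, so no manual SR should be needed for that alert.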