XtremIO: Storage Controller Memory Health Status Call Home Alerts (XTR0403302, XTR0403303, XTR0403304, XTR0403305, and XTR0408502) (Dell EMC Correctable)

           

Article Number: 503012    Article Version: 13    Article Type: Break Fix
   

 


Product:

 

XtremIO HW Gen2 400GB, XtremIO HW Gen2 400GB Encrypt Capbl, XtremIO HW Gen2 400GB Encrypt Disable, XtremIO HW Gen2 400GB Exp Encrypt Disable, XtremIO HW Gen2 400GB Expandable, XtremIO HW Gen2 800GB Encrypt Capbl, XtremIO HW Gen2 800GB Encrypt Disable

 

Issue:

 

 

   
   
The following alerts are raised related to DIMM issues:

| Alert Code | Alert Name | Alert Text | Comment |
|---|---|---|---|
| 0403302 | node_dimm_level_2_unknown | Storage Controller DIMM memory card information is unavailable. | |
| 0403303 | node_dimm_level_3_warning | Memory card (DIMM) health fault. | Supported in XIOS 6.1 and later |
| 0403304 | node_dimm_level_4_minor | Memory card (DIMM) health fault. | By default, disabled in XMS 6.1 and later |
| 0403305 | node_dimm_level_5_major | Memory card (DIMM) health fault. | In XIOS 4.0.10-33 and later, the storage controller will be disabled |
| 0408502 | node_add_sc_failure_not_enough_memory | Storage Controller wasn't added to the Cluster due to insufficient memory, probably because of a Memory Module (DIMM) Uncorrectable error. | Supported in XIOS 6.1 and later |
   
   
The Storage Controller is automatically deactivated when a very high number of DIMM correctable errors is detected, as reported by node_dimm_level_5_major (XTR0403305).

 

 

Cause:

 

 

Internal daily thresholds are set for the DIMM modules. When the relevant threshold is reached, or the system detects another DIMM issue, alerts 0403303, 0403304, 0403305, 0403306, and 0408502 are triggered.
Alert 0403302 is raised if the cluster is unable to read the DIMM status. This may be due to a transient timing issue.
   
In XtremIO version 2.4.0 and earlier, an alert may not indicate a real problem, because the daily memory error count threshold for alerts is very low. Analysis of the dossier is required to confirm this.
   
    However, in XtremIO version 2.4.1 (or later), the memory error count threshold was adjusted.  
                                                           

 

 

Resolution:

 

 

                                             

Alerts 0403302 and 0403304 may be caused by a transient condition. If they appear only once (for example, for a single day) and are then cleared, they can be ignored.

Alert 0403304 indicates that correctable errors have crossed a low threshold. This is a benign condition and can be ignored.
                          
In all other cases, contact EMC Global Technical Support for assistance.

When contacting EMC Global Technical Support, have a fresh XtremIO log bundle (dossier) ready to attach to the Service Request (SR) to expedite the review. To collect an XtremIO log bundle, follow the instructions in KB 334928 - How to collect XtremIO log bundle for analysis by EMC Global Technical Support.

Note: Resolving the issue may require EMC Global Technical Support to dispatch a field engineer to the site (if a field engineer is not already on-site).
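The triage guidance in this section can be summarized in a short sketch. It is illustrative only: the names `normalize`, `triage_alert`, `occurrences`, and `cleared` are hypothetical and do not belong to any XtremIO tool or API. Only the alert codes and the resulting actions are taken from this article.

```python
# Hypothetical helper that encodes only the triage rules stated in the
# Resolution section of this article. Not part of any XtremIO software.

TRANSIENT_CODES = {"0403302", "0403304"}  # may be caused by a transient condition
BENIGN_CODES = {"0403304"}                # correctable errors crossing a low threshold


def normalize(code: str) -> str:
    """Strip an optional 'XTR' prefix so 'XTR0403304' and '0403304' match."""
    code = code.strip().upper()
    return code[3:] if code.startswith("XTR") else code


def triage_alert(code: str, occurrences: int = 1, cleared: bool = False) -> str:
    """Return 'ignore' or 'contact_support' for a DIMM-related alert code."""
    code = normalize(code)
    if code in BENIGN_CODES:
        return "ignore"
    # A transient alert is ignorable only if it appeared once and was cleared.
    if code in TRANSIENT_CODES and occurrences == 1 and cleared:
        return "ignore"
    return "contact_support"
```

For example, `triage_alert("XTR0403302", occurrences=1, cleared=True)` returns `"ignore"`, while the same alert recurring or remaining active returns `"contact_support"`.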
   
                                                                

 

 

Notes:

 

 

                                             

An EMC Global Services Service Request (SR) is opened automatically when the conditions for alert codes XTR0403303, XTR0403305, and XTR0408502 are detected.

For X2 clusters, an SR is also opened for XTR0403304. For X1 clusters, the condition that raises XTR0403304 has been determined to be safe.
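As a quick reference, the automatic SR rules in these notes can be written as a small check. This is a sketch under stated assumptions: the function name `opens_automatic_sr` and the `generation` parameter (with the values "X1" and "X2") are hypothetical, and the logic reflects only what this article states.

```python
# Hypothetical check that mirrors the Notes section of this article.
# It does not query a cluster or any EMC service.

AUTO_SR_CODES = {"XTR0403303", "XTR0403305", "XTR0408502"}


def opens_automatic_sr(code: str, generation: str) -> bool:
    """Return True if this article says an SR is opened automatically."""
    code = code.strip().upper()
    if not code.startswith("XTR"):
        code = "XTR" + code  # accept bare codes such as '0403304'
    if code in AUTO_SR_CODES:
        return True
    # XTR0403304 opens an SR on X2 clusters only.
    return code == "XTR0403304" and generation.strip().upper() == "X2"
```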