Dell EMC VxRack SDDC:  One or more PCPUs didn't perform a heartbeat check for 7 seconds

           

Article Number: 533496    Article Version: 4    Article Type: Break Fix
   

 


Product:

 

VxRack SDDC 14G-1

 

Issue:

 

 

On VxRack SDDC 5.1.2, multiple servers report the warning "One or more PCPUs didn't perform a heartbeat check for 7 seconds" (PCPU = physical CPU). These warnings are usually accompanied by latency in vSAN.
   
   
    Sample entries from vmkernel.log:
    2019-02-27T02:04:33.891Z cpu0:121761)WARNING: Heartbeat: 498: One or more PCPUs didn't perform a heartbeat check for 7 seconds.   
    2019-02-27T02:06:38.049Z cpu0:97381)WARNING: Heartbeat: 498: One or more PCPUs didn't perform a heartbeat check for 7 seconds.   
    2019-02-27T02:08:00.822Z cpu0:66618)WARNING: Heartbeat: 498: One or more PCPUs didn't perform a heartbeat check for 7 seconds.   
    2019-02-27T02:08:42.208Z cpu0:67130)WARNING: Heartbeat: 498: One or more PCPUs didn't perform a heartbeat check for 7 seconds.   
    2019-02-27T02:10:04.980Z cpu0:67959)WARNING: Heartbeat: 498: One or more PCPUs didn't perform a heartbeat check for 7 seconds.
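
To gauge how often a host logs this warning, the entries can be counted with a short script. The following is a minimal sketch in Python, assuming the log lines follow the format of the samples above. The function names are hypothetical and the pattern is derived only from those samples, so it may need adjusting for other builds.

```python
import re
from collections import Counter

# Pattern derived from the sample vmkernel.log lines in this article (an assumption;
# other ESXi builds may format the line differently).
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+"
    r"cpu(?P<cpu>\d+):(?P<world>\d+)\)WARNING: Heartbeat: \d+: "
    r"One or more PCPUs didn't perform a heartbeat check for (?P<secs>\d+) seconds"
)


def find_heartbeat_warnings(lines):
    """Return (timestamp, reporting cpu, seconds) for each heartbeat warning."""
    hits = []
    for line in lines:
        match = HEARTBEAT_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), int(match.group("cpu")), int(match.group("secs")))
            )
    return hits


def warnings_per_hour(hits):
    """Count warnings per hour, keyed by the 'YYYY-MM-DDTHH' timestamp prefix."""
    return Counter(ts[:13] for ts, _cpu, _secs in hits)
```

For example, passing the lines of a vmkernel.log file through find_heartbeat_warnings and then warnings_per_hour gives a quick view of whether the warnings cluster at particular times.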
                                                           

 

 

Cause:

 

 

PowerEdge 14G BIOS issue.   
    https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=MGGKF&osCode=wst14&productCode=poweredge-r640   
   
    The BIOS takes a long time to handle correctable memory errors. The current belief is that the server BIOS waits instead of handling the correctable ECC error promptly. The new BIOS handles correctable memory errors more effectively.
                                                           

 

 

Resolution:

 

 

Dell EMC recommends upgrading the RCM to 5.1.4, which includes BIOS 1.4.8.
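
When auditing several hosts, the installed BIOS version can be compared against the 1.4.8 level named above. The following is a minimal sketch in Python, assuming BIOS versions are dotted integers such as 1.4.8. The function names are hypothetical, and how the installed version string is collected from each host is outside the scope of this sketch.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.4.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# BIOS level included in RCM 5.1.4 according to this article.
MIN_BIOS = parse_version("1.4.8")


def bios_needs_update(installed, minimum=MIN_BIOS):
    """Return True when the installed BIOS version is older than the minimum."""
    return parse_version(installed) < minimum
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating 1.4.10 as older than 1.4.8.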
   
     
                                                           

 

 

Notes:

 

 

Dell HW Engineering believes that two fixes in this BIOS update resolve the issue:
   
    1. Enhanced the Correctable ECC memory error detection and logging feature. (2.1.6 and forward)   
    2. Updated the Intel Xeon Processor Scalable Family Processor Microcode to version 0x43.