VMware ESXi: #PF Exception 14 in world vmm0

           

   Article Number: 536862    Article Version: 3    Article Type: Break Fix
   

 


Product:

 

PowerPath/VE for VMware 6.3, PowerPath/VE for VMware 6.4, PowerPath/VE for VMware 6.5, PowerMaxOS 5978

 

Issue:

 

 

The VMware ESXi server panics unexpectedly (PSOD) with no apparent trigger event.
   
    OS: VMware ESXi 6.5.0 build   
    DellEMC SW: PowerPath/VE 6.3 (build 105)   
    DellEMC SW: PowerPath/VE 6.4 (build 103)   
    DellEMC SW: PowerPath/VE 6.5 (build 110)   
    DellEMC HW: Symmetrix    
   
    Exposure to this issue requires both PowerPath/VE and a Symmetrix array running 5978 code or higher.
   
    2019-06-11T05:56:03.906Z cpu23:47993633)@BlueScreen: #PF Exception 14 in world 47993633:vmm0:FRAJXSA IP 0x418024500c9a addr 0x410006dcffc4       
        PTEs:0x8000853023;0x800082e023;0x80008a0023;0x0;       
        2019-06-11T05:56:03.907Z cpu23:47993633)Code start: 0x418024200000 VMK uptime: 69:00:05:14.200       
        2019-06-11T05:56:03.907Z cpu23:47993633)0x43941909bb50:[0x418024500c9a]Sched_SysServiceDone@vmkernel#nover+0x8a stack: 0x439dcb2afe80       
        2019-06-11T05:56:03.907Z cpu23:47993633)0x43941909bbb0:[0x4180245360ce]SCSICompleteAdapterCommand@vmkernel#nover+0x152 stack: 0x410006dd0040       
        2019-06-11T05:56:03.908Z cpu23:47993633)0x43941909bc30:[0x418024b69a09]SCSILinuxWorldletFn@com.vmware.driverAPI#9.2+0x3f1 stack: 0x4180242d1a38       
        2019-06-11T05:56:03.908Z cpu23:47993633)0x43941909bd90:[0x418024326ea8]WorldletBHHandler@vmkernel#nover+0x478 stack: 0x0       
        2019-06-11T05:56:03.909Z cpu23:47993633)0x43941909bef0:[0x4180242b1cb0]BH_DrainAndDisableInterrupts@vmkernel#nover+0x100 stack: 0x0       
        2019-06-11T05:56:03.909Z cpu23:47993633)0x43941909bf80:[0x418024319e66]VMMVMKCall_Call@vmkernel#nover+0x196 stack: 0x43941909bfec       
        2019-06-11T05:56:03.910Z cpu23:47993633)0x43941909bfe0:[0x41802434b8a2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802434b894       
        2019-06-11T05:56:03.913Z cpu23:47993633)base fs=0x0 gs=0x418045c00000 Kgs=0x0       
       
       
        2019-06-24T08:43:40.022Z cpu17:169970)@BlueScreen: #PF Exception 14 in world 169970:vmm0:FRAWINE IP 0x41802f30155a addr 0x410006d6ffc4       
        PTEs:0x8000053023;0x800002b023;0x800009e023;0x0;       
        2019-06-24T08:43:40.023Z cpu17:169970)Code start: 0x41802f000000 VMK uptime: 6:00:01:30.899       
        2019-06-24T08:43:40.023Z cpu17:169970)0x43923f91bd30:[0x41802f30155a]Sched_SysServiceDone@vmkernel#nover+0x8a stack: 0xfc40a085       
        2019-06-24T08:43:40.023Z cpu17:169970)0x43923f91bd90:[0x41802f126e31]WorldletBHHandler@vmkernel#nover+0xe1 stack: 0x418042800c00       
        2019-06-24T08:43:40.024Z cpu17:169970)0x43923f91bef0:[0x41802f0b1db0]BH_DrainAndDisableInterrupts@vmkernel#nover+0x100 stack: 0x0       
        2019-06-24T08:43:40.024Z cpu17:169970)0x43923f91bf80:[0x41802f11a186]VMMVMKCall_Call@vmkernel#nover+0x196 stack: 0x43923f91bfec       
        2019-06-24T08:43:40.025Z cpu17:169970)0x43923f91bfe0:[0x41802f14b8a2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802f14b894       
        2019-06-24T08:43:40.028Z cpu17:169970)base fs=0x0 gs=0x418044400000 Kgs=0x0
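    The two backtraces above share a recognizable shape: a "#PF Exception 14" banner for a vmm world, followed by a stack whose first frame is Sched_SysServiceDone. The Python sketch below checks a block of log text for that shape. It is only an illustration built from the two samples in this article; the function name and the regular expressions are assumptions, not part of any VMware or Dell EMC tool, and a match does not by itself confirm the root cause.

```python
import re

# Banner line of the panic, modeled on the two samples in this article (assumption).
BANNER = re.compile(
    r"@BlueScreen: #PF Exception 14 in world (\d+):(vmm\d+):(.+?) "
    r"IP (0x[0-9a-f]+) addr (0x[0-9a-f]+)"
)
# One backtrace frame: [address]Symbol@module+offset
FRAME = re.compile(r"\[0x[0-9a-f]+\](\w+)@\S+\+0x[0-9a-f]+")


def matches_signature(log_text):
    """Return True if log_text holds a #PF Exception 14 panic in a vmm world
    whose first backtrace frame is Sched_SysServiceDone."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not BANNER.search(line):
            continue
        # Look for the first frame that follows this banner.
        for later in lines[i + 1:]:
            frame = FRAME.search(later)
            if frame:
                if frame.group(1) == "Sched_SysServiceDone":
                    return True
                break  # first frame is something else; try the next banner
    return False
```

    Passing the text of either backtrace above to matches_signature() should return True, while a panic with a different first frame returns False.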
   
   
     
                                                           

 

 

Cause:

 

 

VMware Engineering has determined that this PSOD is caused by a preemption anomaly that results in a random context panicking in SchedSysServiceContextPut().   
    PowerPath/VE for VMware 6.3, 6.4, and 6.5 have a bug in the app finger printing feature that can cause a preemption anomaly.
                                                           

 

 

Resolution:

 

 

While troubleshooting this issue internally, we discovered a PowerPath/VE bug related to the app finger printing feature. While we cannot be 100% certain it is the cause of the panic seen by the customer, as a precaution we recommend that any customer who has experienced this type of panic disable the app finger printing feature.
   
    Workaround: Disable app finger printing.   
   
    Fix: Upgrade to PowerPath/VE 6.6, which is currently available for download on the support site.
                                                           

 

 

Notes:

 

 

Below are the rpowermt commands to display and disable the "app finger printing" feature.   
    To verify whether the feature is enabled:   
   
    # rpowermt display options host=<ESXi host name/IP>       
       
                Show CLARiiON LUN names:      true       
       
                Path Latency Monitor: Off       
       
                Performance Monitor: disabled       
       
                Autostandby:  IOs per Failure (iopf): enabled       
                              iopf aging period     : 1 d       
                              iopf limit            : 6000       
       
                Storage       
                System Class  Attributes       
                ------------  ----------       
                Symmetrix     periodic autorestore = on       
                              reactive autorestore = on       
                              auto host registration = enabled       
                              app finger printing = enabled       
                              device to array performance report = enabled       
                              device in use to array report = enabled
   
   
    To turn off the feature:   
   
    # rpowermt set app_finger_printing=off host=<ESXi host name/IP>   
   
    To verify that the feature is disabled:   
   
    # rpowermt display options host=<ESXi host name/IP>       
       
                Show CLARiiON LUN names:      true       
       
                Path Latency Monitor: Off       
       
                Performance Monitor: disabled       
       
                Autostandby:  IOs per Failure (iopf): enabled       
                              iopf aging period     : 1 d       
                              iopf limit            : 6000       
       
                Storage       
                System Class  Attributes       
                ------------  ----------       
                Symmetrix     periodic autorestore = on       
                              reactive autorestore = on       
                              auto host registration = enabled       
                              app finger printing = disabled       
                              device to array performance report = enabled       
                              device in use to array report = enabled
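    When the check has to be repeated, the "app finger printing" line can be read out of the captured text of "rpowermt display options" rather than inspected by eye. The sketch below assumes the output has the layout shown above; the function name is hypothetical, and the pattern has only been matched against the sample output in this article.

```python
import re


def app_finger_printing_states(display_output):
    """Return every value reported on an 'app finger printing = ...' line,
    in order of appearance (for example ['enabled'] or ['disabled'])."""
    return re.findall(r"app finger printing\s*=\s*(\w+)", display_output)
```

    An empty list means the attribute line was not found in the text, which should be treated as "unknown" rather than as "disabled".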
   
   
    Please note:   
    1. Enabling or disabling this feature does not require any maintenance activity on the ESXi hosts, and the setting persists across reboots.   
    2. No changes are required on the array side in association with this feature.
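
    Where several ESXi hosts need the workaround, the two rpowermt commands from this article can be generated per host. The sketch below only builds the argument lists and prints them; the host names are placeholders and the function name is hypothetical. Actually running the commands (for example with subprocess.run on the system where rpowermt is installed) is left to the operator.

```python
def workaround_commands(hosts):
    """Build, for each host, the 'set' command from this article followed by
    the 'display options' command used to verify the result."""
    commands = []
    for host in hosts:
        commands.append(["rpowermt", "set", "app_finger_printing=off", f"host={host}"])
        commands.append(["rpowermt", "display", "options", f"host={host}"])
    return commands


if __name__ == "__main__":
    # Placeholder host names; replace with real ESXi host names or IPs.
    for argv in workaround_commands(["esxi01.example.com", "esxi02.example.com"]):
        print(" ".join(argv))
```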