VNX: Intermittent temporary loss of connection and/or performance degradation on ESXi hosts running version 5.5 Update 2 and later

Environment:

VNX5100

VNX5200

VNX5300

VNX5400

VNX5500

VNX5600

VNX5700

VNX5800

VNX7500

VNX7600

VNX8000

Description:

One or more ESXi hosts lose connection to a VMFS datastore for a short period of time. Virtual machines on that datastore may crash or encounter I/O errors while the connection is lost.

When an ATS (Atomic Test & Set) miscompare occurs on a VMFS heartbeat slot, the ESXi host attempts to regain control of the device. To do so, it issues a SCSI device reset to the LUN that holds the VMFS volume.
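
ATS is implemented with the SCSI COMPARE AND WRITE command (opcode 0x89): the device writes the new data only if the current on-disk data matches the data the host expects, and otherwise reports a miscompare. The following is a simplified, hypothetical model of that behavior for a single heartbeat slot. The `HeartbeatSlot` class, its byte contents, and the `AtsMiscompare` exception are illustrative assumptions; they do not represent the VMFS on-disk format or the ESXi implementation.

```python
from dataclasses import dataclass


class AtsMiscompare(Exception):
    """Raised when on-disk data differs from the expected value
    (modeled on SCSI sense key 0xE, MISCOMPARE)."""


@dataclass
class HeartbeatSlot:
    # Hypothetical contents of one VMFS heartbeat slot.
    data: bytes

    def compare_and_write(self, expected: bytes, new: bytes) -> None:
        """Replace `data` with `new` only if it currently equals `expected`.

        Mirrors the semantics of SCSI COMPARE AND WRITE (opcode 0x89),
        which VAAI ATS uses. No real SCSI I/O is performed here.
        """
        if self.data != expected:
            raise AtsMiscompare(f"expected {expected!r}, found {self.data!r}")
        self.data = new


slot = HeartbeatSlot(data=b"hb:0001")

# Normal heartbeat update: the host's expected value matches the slot.
slot.compare_and_write(expected=b"hb:0001", new=b"hb:0002")

# The slot no longer contains what the host expects, so the update
# miscompares. In the scenario described in this article, ESXi responds
# to a heartbeat miscompare by resetting the SCSI device.
try:
    slot.compare_and_write(expected=b"hb:0001", new=b"hb:0003")
except AtsMiscompare as exc:
    print("ATS miscompare:", exc)
```

The model only illustrates why a miscompare is reported; it does not attempt to explain what causes the unexpected slot contents on a given array.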

All active I/O on that LUN is aborted and the SCSI device is reset. The resulting temporary loss of connectivity is recorded in the VMkernel log.

ATS miscompares can occur with both the VMware Native Multipathing Plugin (NMP) and PowerPath.

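Error messages similar to the following, indicating an ATS miscompare, appear in /var/log/vmkernel.log. In this entry, command 0x89 (COMPARE AND WRITE, the SCSI command used for ATS) fails with device status 0x2 (CHECK CONDITION) and sense data 0xe 0x1d 0x0 (sense key MISCOMPARE, additional sense code MISCOMPARE DURING VERIFY OPERATION):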

2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 to dev "naa.50002ac0049412fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
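
A minimal sketch for scanning a vmkernel.log for entries like the one above. It assumes the message layout shown in that example (other ESXi builds may format the line differently) and flags a line when command 0x89 fails with device status 0x2 and sense data beginning 0xe 0x1d. The function name `find_ats_miscompares` and the `FAILED_CMD` pattern are hypothetical, written for this illustration.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the single example ScsiDeviceIO failure line.
FAILED_CMD = re.compile(
    r'^(?P<ts>\S+)\s.*ScsiDeviceIO:.*'
    r'Cmd\(0x[0-9a-f]+\)\s+(?P<opcode>0x[0-9a-f]+),'
    r'.*to dev "(?P<dev>[^"]+)" failed '
    r'H:0x[0-9a-f]+ D:(?P<status>0x[0-9a-f]+) P:0x[0-9a-f]+'
    r'(?: Valid sense data: (?P<key>0x[0-9a-f]+) '
    r'(?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+))?',
    re.IGNORECASE,
)


def find_ats_miscompares(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, device) for each ATS miscompare entry."""
    hits = []
    for line in lines:
        m = FAILED_CMD.search(line)
        if not m:
            continue
        # 0x89 = COMPARE AND WRITE; device status 0x2 = CHECK CONDITION.
        is_ats = int(m["opcode"], 16) == 0x89
        is_check_condition = int(m["status"], 16) == 0x2
        # Sense key 0xE = MISCOMPARE; ASC 0x1D = miscompare during verify.
        is_miscompare = (
            m["key"] is not None
            and int(m["key"], 16) == 0xE
            and int(m["asc"], 16) == 0x1D
        )
        if is_ats and is_check_condition and is_miscompare:
            hits.append((m["ts"], m["dev"]))
    return hits


sample = ('2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: '
          'Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 '
          'to dev "naa.50002ac0049412fa" failed H:0x0 D:0x2 P:0x0 '
          'Valid sense data: 0xe 0x1d 0x0.')
print(find_ats_miscompares([sample]))
```

To scan a copy of the log, pass an open file object, for example `find_ats_miscompares(open("vmkernel.log"))`; grouping the results by device shows whether the miscompares are concentrated on particular LUNs.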

You may also see:

Hosts disconnecting from vSphere vCenter.

Virtual machines hanging on I/O operations.

Resolution:

The recommended workaround is to disable the VAAI ATS heartbeat mechanism (ESXi advanced setting /VMFS3/UseATSForHBOnVMFS5). See VMware KB 2113956 for more information.

For a detailed resolution, refer to EMC Support Solution 463284: https://support.emc.com/kb/463284