One or more ESXi hosts briefly lose connection to the VMFS datastore. Any VMs on the datastore may crash or encounter I/O errors during this period.
Following an ATS (Atomic Test & Set) miscompare on a VMFS heartbeat slot, the ESXi host attempts to regain control of the device. To do this, the host issues a SCSI device reset on the LUN holding the VMFS datastore.
All active I/O on this LUN is aborted and the SCSI device is reset. The temporary loss of connectivity appears in the VMkernel logs.
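The test-and-set behaviour behind a heartbeat miscompare can be pictured with a toy model. This is a minimal sketch, not how ESXi or the array implements it: the names SimulatedLun, compare_and_write, update_heartbeat and MiscompareError are hypothetical, the Python lock stands in for the atomicity the array provides, and the 8-byte counter is an assumed stand-in for the real heartbeat slot layout. The host writes a new heartbeat only if the slot still holds the image it last saw; if the on-disk data differs, the operation fails with a miscompare instead of writing.

```python
import threading


class MiscompareError(Exception):
    """Raised when the on-disk data does not match the expected (test) image."""


class SimulatedLun:
    """Hypothetical toy model of a LUN region supporting atomic test-and-set.

    On a real array, ATS is carried out by the storage device itself; here a
    lock merely makes compare-then-write atomic within this process.
    """

    def __init__(self, initial: bytes) -> None:
        self._data = initial
        self._lock = threading.Lock()

    def read(self) -> bytes:
        with self._lock:
            return self._data

    def compare_and_write(self, expected: bytes, new: bytes) -> None:
        # Atomically: write `new` only if the current data equals `expected`.
        with self._lock:
            if self._data != expected:
                raise MiscompareError("heartbeat slot differs from expected image")
            self._data = new


def update_heartbeat(lun: SimulatedLun, last_seen: bytes, counter: int) -> bytes:
    """Hypothetical heartbeat update: assumes the slot holds an 8-byte counter."""
    new = counter.to_bytes(8, "big")
    lun.compare_and_write(last_seen, new)
    return new


if __name__ == "__main__":
    lun = SimulatedLun(bytes(8))
    last = update_heartbeat(lun, bytes(8), 1)  # succeeds: slot matched
    try:
        update_heartbeat(lun, bytes(8), 2)      # stale expected image
    except MiscompareError as exc:
        print("ATS miscompare:", exc)
```

In the scenario this article describes, it is this kind of failed compare on the heartbeat slot that prompts the host to reset the LUN; the model stops at raising the error and does not simulate the reset.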
ATS miscompares can occur with both the VMware Native Multipathing Plug-in (NMP) and PowerPath.
You see error messages indicating an ATS miscompare, similar to the following, in /var/log/vmkernel.log:
2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 to dev "naa.50002ac0049412fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
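To search a host's log for entries like the one above, a small filter can decode the fields. This is a sketch that assumes the exact ScsiDeviceIO line format of the sample; other ESXi builds may format the message differently. The function names are hypothetical. The constants come from the SCSI standards: opcode 0x89 is COMPARE AND WRITE (the command used for ATS), device status D:0x2 is CHECK CONDITION, sense key 0xE is MISCOMPARE, and ASC 0x1D is "MISCOMPARE DURING VERIFY OPERATION".

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed format, modelled on the sample vmkernel.log line in this article.
_PATTERN = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<opcode>0x[0-9a-f]+), .*?'
    r'to dev "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<dev>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+) '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)',
    re.IGNORECASE,
)

COMPARE_AND_WRITE = 0x89      # SCSI opcode used for ATS
CHECK_CONDITION = 0x2         # "D:" device status
SENSE_MISCOMPARE = 0xE        # sense key
ASC_MISCOMPARE_VERIFY = 0x1D  # MISCOMPARE DURING VERIFY OPERATION


class AtsMiscompare(NamedTuple):
    timestamp: str
    device: str


def find_ats_miscompare(line: str) -> Optional[AtsMiscompare]:
    """Return the timestamp and device if the line reports an ATS miscompare."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    if (int(m["opcode"], 16) == COMPARE_AND_WRITE
            and int(m["dev"], 16) == CHECK_CONDITION
            and int(m["key"], 16) == SENSE_MISCOMPARE
            and int(m["asc"], 16) == ASC_MISCOMPARE_VERIFY):
        # The timestamp is the first whitespace-separated token of the line.
        return AtsMiscompare(line.split(" ", 1)[0], m["device"])
    return None


def scan(lines: Iterable[str]) -> List[AtsMiscompare]:
    """Collect every ATS miscompare found in an iterable of log lines."""
    return [hit for hit in map(find_ats_miscompare, lines) if hit is not None]
```

For example, `scan(open("/var/log/vmkernel.log"))` would list the time and naa device identifier of each miscompare, which makes it easier to tell whether the events cluster on one LUN.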
You may also see:
Hosts disconnecting from vSphere vCenter.
Virtual machines hanging on I/O operations.
The recommended workaround is to disable the VAAI ATS heartbeat mechanism. See VMware KB 2113956 for more information.