VPLEX: Slow performance on VPLEX with a workload consisting of medium to large outstanding queue depth of large block reads

Environment:

VPLEX Geo,

VPLEX Local,

VPLEX Metro,

VPLEX Series,

VPLEX VS1,

VPLEX VS2,

VPLEX GeoSynchrony 5.2,

VPLEX GeoSynchrony 5.2 Patch 1,

VPLEX GeoSynchrony 5.2 Service Pack 1,

VPLEX GeoSynchrony 5.2 Service Pack 1 Patch 1,

VPLEX GeoSynchrony 5.2 Service Pack 1 Patch 2,

VPLEX GeoSynchrony 5.2 Service Pack 1 Patch 3,

VPLEX GeoSynchrony 5.3,

VPLEX GeoSynchrony 5.3 Patch 1,

VPLEX GeoSynchrony 5.3 Patch 2,

VPLEX GeoSynchrony 5.3 Patch 3,

VPLEX GeoSynchrony 5.4,

VPLEX GeoSynchrony 5.4 Service Pack 1,

VPLEX GeoSynchrony 5.3 Patch 4

 

Description:

 

Host applications may experience a performance impact (high I/O response times) caused by high Front End (FE) read and write latency on VPLEX that does not track the Back End (BE) latency (or BE latency plus WAN COM latency in the case of a Metro or Geo). Typically the FE read average latency is more heavily impacted, but the FE write average latency can be impacted as well.

Every observed occurrence of this issue has been in an environment with large block READ I/O patterns, which are typical of SQL servers, Oracle databases and backup applications.

 

1] Users will report poor application performance, high I/O response times, high latency, and potentially even Data Unavailability (DU) due to performance degradation.

 

2] Hosts may issue SCSI Task Management Functions (TMFs) such as Abort Task and Logical Unit Reset when I/O is processed slowly. These are logged as stdf/10 events in /var/log/VPlex/cli/firmware.log on the VPLEX management-server.

 

The first example below shows a host abort (Abort Task) with a very high taskElapsedTime of ~11 seconds for a write I/O (the leading 2a in the status 2a00000000000605). The trailing 5 in 2a00000000000605 indicates that the I/O is waiting on data transfer to the host, which is one of the markers of this issue. The second example shows a LUN reset (Logical Unit Reset) on an ITL (Initiator-Target-LUN), which always has an I/O status of 0:0.

 

Example output:

 

stdf/10 Scsi Tmf [Abort Task] on fcp ITLQ: [hba0_0 (0x20000025b503xxxx) B0-FC00 (0x50001442b04dxxxx) 0x3000000000000 0x54a3] vol vvol-MGMT04 taskElapsedTime(usec)10994420 dormantQCnt 0 enabledQCnt 5 status 2a00000000000605:400a800100000

 

stdf/10 Scsi Tmf [Logical Unit Reset] on fcp ITLQ: [200_b (0x10000000c97bxxxx) A0-FC01 (0x50001442a02cxxxx) 0x0 0xffffffffffffffff] vol <0x0000000000000000> taskElapsedTime(usec) 0 dormantQCnt 0 enabledQCnt 0 status 0:0
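
To help scan firmware.log for these markers, the following is a minimal Python sketch that extracts the TMF type, the elapsed time and the status fields from an stdf/10 line. The regular expression, the function name parse_tmf and the returned field names are assumptions derived only from the two sample lines above; they are not an EMC tool or a documented log format, and other firmware versions may log these events differently.

```python
import re

# Pattern inferred from the two sample lines in this article (an assumption,
# not a documented log format).
TMF_RE = re.compile(
    r"stdf/10 Scsi Tmf \[(?P<tmf>[^\]]+)\].*?"
    r"taskElapsedTime\(usec\)\s*(?P<elapsed>\d+).*?"
    r"status (?P<status>[0-9a-fA-F]+):(?P<detail>[0-9a-fA-F]+)"
)


def parse_tmf(line):
    """Return a dict describing an stdf/10 TMF event, or None if no match."""
    m = TMF_RE.search(line)
    if m is None:
        return None
    status = m.group("status")
    # A status of "0" (as in "0:0") carries no per-I/O information.
    has_io = len(status) >= 2
    elapsed_usec = int(m.group("elapsed"))
    return {
        "tmf": m.group("tmf"),
        "elapsed_usec": elapsed_usec,
        "elapsed_sec": elapsed_usec / 1_000_000,
        # Leading two hex digits, e.g. "2a" for the write in the example.
        "opcode": status[:2].lower() if has_io else None,
        # Trailing hex digit; "5" is the marker described in this article.
        "io_state": status[-1] if has_io else None,
    }


def waiting_on_host_transfer(event):
    """True when the status ends in 5 (waiting on data transfer to the host)."""
    return event is not None and event["io_state"] == "5"
```

A line for which waiting_on_host_transfer returns True and elapsed_sec runs to several seconds matches the pattern described in the first example; a Logical Unit Reset line parses with opcode and io_state set to None.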


Resolution:

The number of I/O operations a host has outstanding to a storage target depends on the host's queue depth setting. The queue depth is the maximum number of concurrent or queued I/O operations a host port (HBA, CNA, etc.) will put onto the wire "in-flight" at once.
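
As a rough illustration of why queue depth matters for large block workloads, the sketch below applies Little's Law (outstanding I/Os = throughput x latency) and multiplies queue depth by block size to estimate the data in flight. The function names and all figures are hypothetical and chosen only for the arithmetic; they are not measurements or EMC recommendations.

```python
def in_flight_bytes(queue_depth, block_size_bytes):
    """Data outstanding at once if every queue slot holds one I/O."""
    return queue_depth * block_size_bytes


def average_latency_sec(queue_depth, iops):
    """Little's Law: average latency when the queue is kept full."""
    return queue_depth / iops


# Hypothetical example: queue depth 32, 256 KiB reads, 4000 IOPS sustained.
qd = 32
block = 256 * 1024
print(in_flight_bytes(qd, block))        # bytes in flight at once
print(average_latency_sec(qd, 4000))     # seconds per I/O on average
```

At a fixed throughput, a deeper queue means each I/O waits proportionally longer, so the same setting can be reasonable for small block I/O and too high for large block reads.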


Finding the best value may be difficult. Sometimes the default values provided in the VPLEX Host Connectivity Guides are reasonable, but at other times they may not be.

 

Host Connectivity Guides are available from EMC online support. Look for the Host Connectivity Guide that applies to the host in question. Under the VPLEX section of each Host Connectivity Guide, EMC lists the recommended queue depth for each supported host type.

For a detailed step-by-step resolution, refer to EMC Support Solution 335182: https://support.emc.com/kb/335182

