I recently completed a POC for a data warehouse on SQL Server using XtremIO.  One of the requirements of the POC was to use PCIe flash for TempDB.  PCIe flash is fast.  It is located in the server and is just inches away from the CPU.  But there are limitations.  Here are some stats I found for PCIe flash:

 

Read bandwidth (GBps): 2.7
Write bandwidth (GBps): 2.1
Random read operations at 4-KB block size (IOPS): 285,000
Random write operations at 4-KB block size (IOPS): 385,000
Read latency (microseconds): 92
Write latency (microseconds): 15

 

Looking at IOPS, those are some really big numbers.  Even the bandwidth numbers are fast.  But one thing left off of the chart was the IO size used to produce those bandwidth numbers.  After some searching, I found that a 1MB IO size was used.  Most workloads don't generate 1MB IOs, so what happens when the IO size is smaller?
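
A quick back-of-the-napkin check shows why that matters.  Bandwidth is just IOPS multiplied by IO size, so at small IO sizes the IOPS ceiling becomes the bandwidth ceiling.  Using the numbers from the table above (a sketch, nothing more):

-- Bandwidth = IOPS * IO size.  At the rated 4KB IOPS ceiling, the card tops out
-- well below the bandwidth numbers it posts with 1MB IOs.
SELECT  285000 * 4 / 1024.0 / 1024.0  AS read_GBps_at_4KB_IOPS_ceiling,     -- ~1.09 GB/s
        385000 * 4 / 1024.0 / 1024.0  AS write_GBps_at_4KB_IOPS_ceiling,    -- ~1.47 GB/s
        2.7 * 1024                    AS IOs_per_sec_for_2_7GBps_at_1MB,    -- only ~2,765 IOPS needed
        2.1 * 1024                    AS IOs_per_sec_for_2_1GBps_at_1MB;    -- only ~2,150 IOPS needed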

 

Two different databases were used for testing during the POC.  The first was about 1.1TB in size and contained three tables.  The second was built from the first but artificially expanded by 10x, making it roughly 11TB with billions of records in each of the three tables.
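
I'm not reproducing the exact load process here, but expanding a table by roughly 10x can be as simple as cross joining it against a small list of numbers.  A hypothetical sketch, where dbo.Sales and its columns are made up and not the actual POC schema:

-- Insert nine extra copies of every existing row for a ~10x row count.
-- Table and column names are illustrative only.
INSERT INTO dbo.Sales (OrderDate, CustomerId, ProductId, Amount)
SELECT s.OrderDate, s.CustomerId, s.ProductId, s.Amount
FROM   dbo.Sales AS s
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n (CopyNumber);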

 

A batch of queries was run against the two databases at various user loads.  These were not typical queries; they were created to stress test the entire system.  Many of them performed multi-table joins, and some of the result sets on the larger database ran into the hundreds of millions of rows.  In addition to the large result sets, many of the queries used GROUP BY, which caused heavy TempDB usage, sometimes moving terabytes of data in and out of TempDB.
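
The POC queries themselves aren't shown here, but the general shape looked something like the following hypothetical example (table and column names are made up).  A join across large tables feeding a GROUP BY is exactly the pattern that pushes sort and hash work tables into TempDB:

-- Hypothetical query shape: multi-table join with a GROUP BY over a very large
-- intermediate result.  Names are illustrative only.
SELECT   s.CustomerId,
         p.Category,
         COUNT_BIG(*)  AS OrderLines,
         SUM(s.Amount) AS TotalAmount
FROM     dbo.Sales     AS s
JOIN     dbo.Customers AS c ON c.CustomerId = s.CustomerId
JOIN     dbo.Products  AS p ON p.ProductId  = s.ProductId
GROUP BY s.CustomerId, p.Category;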

 

As I monitored the performance of the server, I saw response times greater than 200ms on the volume used for TempDB.  That was not something I had expected.  Bandwidth on the volume was ~1.9GB/s read and ~1.6GB/s write, and the average IO size was 64KB.  The smaller IO size may have limited the maximum bandwidth the PCIe flash could deliver.  I have seen XtremIO achieve higher bandwidth at that IO size.  Could moving TempDB to XtremIO improve performance?
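
For anyone wanting to reproduce that measurement from inside SQL Server, per-file latency, throughput, and average IO size are all visible through the sys.dm_io_virtual_file_stats DMV.  A minimal sketch (the counters are cumulative since instance startup, so sample twice and diff for live rates):

-- Per-file IO stalls and volumes for TempDB.
-- Average latency = io_stall_ms / IO count; average IO size = bytes / IO count.
SELECT  DB_NAME(vfs.database_id)                                AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)     AS avg_read_latency_ms,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)    AS avg_write_latency_ms,
        vfs.num_of_bytes_read    / NULLIF(vfs.num_of_reads, 0)  AS avg_read_io_bytes,
        vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_write_io_bytes
FROM    sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs
JOIN    sys.master_files AS mf
          ON mf.database_id = vfs.database_id
         AND mf.file_id     = vfs.file_id;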

 

The XtremIO array in my environment is made up of two 10TB X-Bricks.  Each X-Brick has four 8Gb FC ports, plus four 10Gb iSCSI ports that were not used for this POC.  A single 8Gb FC port can provide 800MB/s of bandwidth, so each X-Brick is capable of 3.2GB/s and the two-X-Brick array can do 6.4GB/s.  That is more bandwidth than the PCIe flash is rated for.  But since the XtremIO array is external, can the server push that much bandwidth out to the array?

 

The server used for this POC had five PCIe Gen 3 slots.  The PCIe flash used a Gen 2 x8 interface, whose theoretical maximum bandwidth is 4GB/s.  The server also had two dual-ported 16Gb FC HBAs on Gen 3 x8 interfaces; each slot offers a theoretical 7.877GB/s, or 15.754GB/s across the two, so there was plenty of bandwidth available on the PCIe side.  A 16Gb FC port is capable of 1600MB/s, and the server had four of them.  Four ports provide 6.4GB/s of bandwidth, which matches up well with the 6.4GB/s that the two X-Bricks can deliver.
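
For reference, here is where those theoretical numbers come from.  PCIe Gen 2 runs at 5 GT/s per lane with 8b/10b encoding, PCIe Gen 3 at 8 GT/s per lane with 128b/130b encoding, and a 16Gb FC port is good for roughly 1,600MB/s:

-- Theoretical bandwidth math for the interfaces in play.
-- PCIe formula: GT/s per lane * encoding efficiency / 8 bits-per-byte * lane count.
SELECT  5.0 * (8.0/10.0)    / 8 * 8  AS pcie_gen2_x8_GBps,        -- = 4.0 GB/s
        8.0 * (128.0/130.0) / 8 * 8  AS pcie_gen3_x8_GBps,        -- ~ 7.877 GB/s
        1600 * 4 / 1000.0            AS four_16Gb_FC_ports_GBps,  -- server side: 6.4 GB/s
        800  * 8 / 1000.0            AS eight_8Gb_FC_ports_GBps;  -- array side, two X-Bricks: 6.4 GB/s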

 

All of the tests were rerun after moving TempDB to XtremIO.  The TempDB volume still experienced response times greater than 200ms, but bandwidth improved, increasing to ~4.5GB/s, which is still lower than what the two X-Bricks could deliver.  There is more to performance than just IO; CPU had a large impact as well.  Most of the time the server CPU was at 100% across 44 physical cores, and even when the server was reconfigured with 96 cores (4 sockets, 24 cores each), CPU was still at 100%.  The queries that were CPU bound initially didn't see any improvement or degradation when TempDB was moved, but the queries that were bandwidth bound on TempDB did see an improvement on XtremIO.
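
For anyone curious about the mechanics, relocating TempDB is just a matter of pointing its files at the new volume and restarting the instance.  A minimal sketch, assuming the default tempdev/templog logical names and a hypothetical X:\ mount point for the XtremIO volume (a real TempDB layout usually has several data files, each needing its own MODIFY FILE):

-- Repoint the TempDB files at the new volume (paths are hypothetical).
-- The files are recreated at the new location the next time SQL Server restarts.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'X:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'X:\TempDB\templog.ldf');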

 

In the end, it's all about understanding where bottlenecks can occur.  Yes, multiple PCIe flash cards could have been used, but the server was limited on PCIe slots.  And why add more PCIe flash when XtremIO has plenty of bandwidth to handle the workload?  In addition to handling more workload, there are all of the benefits of shared storage.  With XtremIO, if we ever needed more performance or capacity, additional X-Bricks could be added while keeping everything online.  The same can't be said for PCIe flash.