HBase Performance Testing on Isilon

Introduction

 

We did a series of performance benchmarking tests on an Isilon X410 cluster using the YCSB benchmarking suite and CDH 5.10.

 

The CAE POC lab environment was configured with 5x Isilon x410 nodes are running OneFS 8.0.0.4 and later 8.0.1.1  NFS large Block streaming benchmarks we should expect 5x ~700 MB/s writes (3.5 GB/s) and 5x ~1 GB/s reads (5 GB/s) for our theoretical aggregate maximums in any of these tests.

 

The (9) Compute nodes are Dell PowerEdge FC630 servers running CentOS 7.3.1611 each configured with 2x18C/36T-Intel® Xeon® CPU E5-2697 v4 @ 2.30GHz with 512GB of RAM.  Local storage is 2xSSD in Raid1 formatted as XFS for both OS and scratch space/spill files.

 

There were also 3 additional edge servers which were used to drive the YCSB load.

 

The backend network between compute nodes and Isilon is 10Gbps with Jumbo Frames set (MTU=9162) for the NICs and the switch ports.

 

arch1.png

 

CDH 5.10 was configured to run in an Access Zone on Isilon, service accounts were created in the Isilon Local provider and locally in the client /etc/passwd files.  All tests were run using a basic test user with no special privileges.

 

Isilon statistics were monitored with both IIQ and Grafana/Data Insights package.  CDH statistics were monitored with Cloudera Manager and also with Grafana.

 

 

Initial Testing

 

The first series of tests we ran were to determine the relevant parameters on the HBASE side that affected the overall output.  We used the YCSB tool to generate the load for HBASE.   This initial test was run using a single client (edge server) using the 'load' phase of YCSB and 40 Million rows.  This table was deleted prior to each run.

 

ycsb load hbase10 -P workloads/workloada1 -p table='ycsb_40Mtable_nr' -p columnfamily=family -threads 256 -p recordcount=40000000

 

hbase.regionserver.maxlogs - Maximum number of Write-Ahead Log (WAL) files. This value multiplied by HDFS Block Size (dfs.blocksize) is the size of the WAL that will need to be replayed when a server crashes. This value is inversely proportional to the frequency of flushes to disk

 

hbase.wal.regiongrouping.numgroups - When using Multiple HDFS WAL as the WALProvider, sets how many write-ahead-logs each RegionServer should run. Will result in this number of HDFS pipelines. Writes for a given Region only go to a single pipeline, spreading total RegionServer load.

 

 

graph-1.png

graph-2.png

 

The philosophy here was to parallel-ize as many writes as we could, so increasing the number of WALs and then the number of threads (pipeline) per WAL accomplishes this.  The previous two charts show that for a given number for 'maxlogs' 128 or 256 we see no real change indicating that we aren't really pushing into this number from the client side.  Oby varying the number of 'pipelines' per file though we see the trend indicating the parameter that is sensitive t oparallelization. The next question is then would be where does Isilon "get in the way" either with Disk IO, network, CPU or OneFS and we can take a look look at what Isilon statistics report.

graph-3.png

 

The network and CPU graphs tell us that the Isilon cluster is underutilized and has room for more work.  CPU would be > 80% and network badwidth would be more than 3 GB/s.

 

graph-4.png

These plots show the HDFS protocol statistics and how they are tranlated by OneFS.  The HDFS ops are multiples of dfs.blocksize which is 256MB here.  The interesting thing here is that the 'Heat' graph shows the OneFS file operations and you can see coorellation of writes and locks.  In this case HBase is doing appends to the WAL's so OneFS locks the WAL file for each write that is appended.  Which is what we would expect for stable writes on a clustered filesystem.  These would appear to be contributing to the limiting factor in this set of tests.

 

HBase Updates

 

This next test was to do some more experimenting in finding what happens at scale so I created a 1 BIllion row table, which took a good hour to generate, and then did a YCSB run that updated 10 million of the rows using the 'workloada' settings (50/50 read/write).  This was run on a single client and I was also looking for the most troughput I could generate so i ran this as a function of the number of YCSB threads.  One other note was that we did some tuning of Isilon and went to OneFS 8.0.1.1 which has performance tweaks for the Datanode service.  You can see the bump up in performance compared to the previous set of runs.  For these runs we st the hbase.regionserver.maxlogs = 256 and the hbase.wal.regiongrouping.numgroups = 20

 

graph-5.png

graph-6.png

graph-7.png

 

Looking at these runs the firts thing that is apparent is the fall off at high thread count.  I was curious if this was an Isilon issue or a client side issue.  We will see some further tests regarding that in the upcoming paragraphs.  But i can say that driving 200K+ Ops at an update latency of < 3ms is quite impressive.  Each of these update runs was pretty fast and i was able to do them one after another and the graph below shows the even balance across the Isilon nodes for these runs.

 

graph-8.png

Again you can see from the Heat graph that the file operations are writes and locks corresponding to the append nature of the WAL processes.

 

Region Server Scaling

 

The next test was to determine how the isilon nodees (5 of them) would fare against different number of region servers.  The same update script ran in the previous test was run here.  A 1 Billion row table and 10 million rows updated using 'workloada' with a single client and YCSB threads at 51, We also kept the same setting on the maxlogs and pipelines (256 and 20 respectively).

 

graph-9.png

 

graph-10.png

 

The results are pretty informative, albeit not surprising.  The scale-out nature of HBase combined with the scale-out nature of Isilon and more==better.  This is a test i would recommend customers run on their environments as part of their own sizing exercise.  It may come to a point of diminishing returns, but here we have 9 pretty hefty servers pushing 5 Isilon nodes and it looks like there is room for more.

 

More Clients

 

The last series of tests come from that deep dark place that makes you want to break the system you're testing.  After all its a perfectly valid scientific method to ratchet a test up until things break and call thereby knowing what the upper limit on the parameters being tested are.  This series of tests I had two additional servers that i could use to run the client from, in addition I ran two YCSB clients on each one allowing me to scale up to 6 clients each driving 512 threads, which would be 4096 threads overall.  I went back and created two different tables one with 4 Billion rows split into 600 regions and one with 400million rows split into 90 regions.  

 

 

graph-11.png

 

graph-12.png

 

graph-13.png

 

As you can see, the size of the table matters little in this test.  Looking at the Isilon Heat charts again you can see there is a few percentage difference in the number of file operations mostly inline with the differences of a 4 Billion row table to 400 Million rows.

 

graph-14.png

 

Conclusion

 

HBase is a good candidate for running on Isilon, mainly because of the scale-out to scale-out architectures.  HBase does a lot of its own caching, and isplitting the table across a good number of regions you will get HBase to scale-out with your data.  In other words it does a good job of taking care of its own needs and the filesystem is there for persistence.  We weren't able to push the load tests to the point of actually breaking things but if your looking at 4 Billion rows in your Hbase design and expect 800,000 operations with less than 3 ms of latency this archtecture will support it.  If you notice I didn't mention much more about any of the myriad other client side tweaks you could apply to HBase itself, I would expect all those tweaks to still be valid, and beyond the scope of this test.