EMC Hadoop Starter Kit (HSK)

Step By Step Guide To Quickly And Easily Deploy Hadoop

Like any new technology, Hadoop can be time-consuming and expensive for our customers to deploy and make operational. When we surveyed a number of our customers, they identified two main challenges to getting started: confusion over which Hadoop distribution to use, and uncertainty about how to deploy it using existing IT assets and knowledge. The Hadoop Starter Kit (HSK) is intended to simplify deployment for all Hadoop distributions and to reduce both the time and the cost of deployment, while leveraging common IT technologies such as EMC Isilon storage and VMware virtualization.

Download the HSK guide now for your Hadoop distribution of choice:


HSK 3.0 is the latest release. It combines the power of VMware vSphere Big Data Extensions (BDE) 2.0 with Isilon scale-out NAS to achieve a comprehensive big data storage and analytics solution that delivers the following benefits:


  • Rapid provisioning – From the creation of virtual Hadoop nodes to starting up the Hadoop services on the cluster, much of the Hadoop cluster deployment can be automated, requiring little expertise on the user’s part. Virtual Hadoop clusters can be rapidly deployed and configured as needed.
  • High availability – Reliability is essential for mission-critical uses of Hadoop. HA protection can be provided through the virtualization platform to protect the single points of failure (SPOF) in the Hadoop system, such as the NameNode for HDFS and the JobTracker for MapReduce.
  • Virtualization extensions – Accelerate cloud provisioning, deployment, and management using VMware vSphere BDE.
  • Elasticity – Hadoop capacity can be scaled up and down on demand in a virtual environment, thus allowing the same physical infrastructure to be shared among Hadoop and other applications. This consolidation of workloads results in more efficient resource utilization and reduced costs.
  • Multi-tenancy – Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation. With virtualization, mixed workloads that include non-Hadoop applications can run alongside Hadoop on the same physical cluster.
  • Portability – Use any Hadoop distribution throughout the Big Data application lifecycle with zero data migration: Apache Open Source, Pivotal HD, Cloudera, or Hortonworks.


As you use the HSK to deploy Hadoop with Isilon, keep in mind Isilon’s validation matrix for the various Hadoop distributions. (Please consult your EMC Account Representative for support on a higher version of a Hadoop distribution.)


Hadoop Distribution | EMC Isilon OneFS
Pivotal PHD 2.1.0 | Up to OneFS 7.2.0
Cloudera CDH 5.1.3 with Cloudera Manager 5.2.0 | Up to OneFS 7.2.0
Hortonworks HDP 2.1 | Up to OneFS 7.2.0
IBM BigInsights Open Platform 4.0 | Up to OneFS 7.2.0
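
For anyone who wants to script a pre-deployment check against the matrix above, here is a minimal Python sketch. It is not part of the HSK or of any EMC tool. The names `VALIDATED_MAX_ONEFS`, `parse_version`, and `is_validated` are hypothetical, and the data is simply transcribed from the table. Since the matrix only states an upper bound ("Up to OneFS 7.2.0"), the sketch assumes any OneFS release at or below that bound is acceptable, which may not hold for very old releases.

```python
# Validation matrix transcribed from the table above.
# Keys: distribution names as listed; values: highest validated OneFS release.
VALIDATED_MAX_ONEFS = {
    "Pivotal PHD 2.1.0": "7.2.0",
    "Cloudera CDH 5.1.3 with Cloudera Manager 5.2.0": "7.2.0",
    "Hortonworks HDP 2.1": "7.2.0",
    "IBM BigInsights Open Platform 4.0": "7.2.0",
}


def parse_version(version):
    """Turn a dotted version string such as '7.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_validated(distribution, onefs_version):
    """Return True if the distribution is listed and the OneFS release does
    not exceed the validated maximum (assumption: no lower bound is checked,
    because the matrix does not state one)."""
    max_version = VALIDATED_MAX_ONEFS.get(distribution)
    if max_version is None:
        return False  # distribution or version not in the matrix
    return parse_version(onefs_version) <= parse_version(max_version)


if __name__ == "__main__":
    for name in VALIDATED_MAX_ONEFS:
        print(name, "on OneFS 7.2.0:", is_validated(name, "7.2.0"))
```

A distribution or version that is not listed returns False rather than raising an error, which matches the note above: anything outside the matrix is a question for your EMC Account Representative.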