ECN Big Data Roundup, January 2015

Follow the ECN Everything Big Data technical community




Step-By-Step Guide:  Hadoop Benchmark With Terasort


Are you in the midst of a Hadoop POC or attempting to optimize Hadoop performance? This document describes in detail how to benchmark Hadoop. In particular, it covers how to use the Terasort suite to benchmark YARN MapReduce. Although the guidance applies to any benchmarking with Terasort, it includes specific recommendations for using an EMC Isilon cluster for HDFS storage.


Like any benchmark, Terasort, or the related Teragen and Teravalidate benchmarks, may have limited or even no relevance to your particular workload. If you have a specific workload you are trying to measure or optimize, then you should use that exact workload. In the absence of a specific workload, a generic benchmark may be useful.


There are many benchmarks related to Hadoop, and Terasort is just one of them, although it appears to be the most widely used. It is a MapReduce benchmark and likely has little relevance to non-MapReduce workloads such as HBase, Impala, Hive on Tez, HAWQ, and Solr.
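As a rough illustration of the Teragen, Terasort, and Teravalidate sequence mentioned above, the sketch below only builds the three command lines for a chosen data size; it does not run anything. It relies on Teragen taking a row count of 100-byte rows as its first argument. The jar name, the `/benchmarks` directory layout, and the helper name `terasort_commands` are assumptions for illustration and are not taken from the step-by-step guide, which may recommend different paths and tuning options.

```python
import shlex

ROW_BYTES = 100  # Teragen writes fixed-size 100-byte rows


def terasort_commands(data_bytes, base_dir, examples_jar):
    """Build the teragen -> terasort -> teravalidate command lines.

    base_dir and examples_jar are hypothetical values supplied by the
    caller; adjust them to match your own cluster layout.
    """
    rows = data_bytes // ROW_BYTES  # Teragen is sized in rows, not bytes
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    steps = [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", examples_jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", examples_jar, "teravalidate", sort_dir, report_dir],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    # Example: command lines for a 1 TB (10**12 bytes) run.
    for cmd in terasort_commands(10**12, "/benchmarks",
                                 "hadoop-mapreduce-examples.jar"):
        print(cmd)
```

Timing each of the three stages separately is usually more informative than a single end-to-end number, since the generate, sort, and validate stages stress the cluster differently.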

 

Access step by step guide

 

 

Strata + Hadoop World Feb 17-20, San Jose:  Meet With EMC Big Data Experts and Executives

EMC is sending its top Federation big data experts and senior executives to engage with you at the big data event of the year. Strata + Hadoop World is where big data's most influential business decision makers, strategists, architects, developers, and analysts gather – you won’t want to miss it.


Do you want to:

• Interact with top EMC execs like Sam Grocott and Aidan O’Brien

• Engage with elite EMC Federation big data architects, data scientists and visionaries

• Meet face-to-face with innovators and thought leaders

 

If the answer is yes, contact your EMC account manager to schedule a meeting with an EMC Federation expert now. You can also meet EMC Federation experts by visiting EMC Booth #631.

 

Access additional Strata + Hadoop World event details

 

 

On-Demand Webcast:  The 3rd Platform, A New Frontier to Modernize Your Infrastructure


The 3rd platform spans mobile devices and cloud, big data, analytics, and social technologies. It is the new frontier, stretching the reach and impact of data centers across geographies. Learn how to transform your traditional, process-driven IT model into a digitized, market-driven environment so your business can thrive.


In this 60-minute webcast with IDC, VCE and EMC you will learn about:

• Market environments and transitions to maximize your IT investments

• Accelerating time from idea to results on integrated, modular solutions and extensions from the 2nd to 3rd platform

• Designing for future needs with scale-out, modular technologies – buy as you go, private, public, hybrid

• Customer case studies and benefits of a modern, connected approach


If you missed the live webcast on Jan 20, it is now available on demand.

 

 

Customer Success:  Adobe Virtualizes Its Large Scale Hadoop Deployment

 

After eight weeks of fine-tuning the virtual HDaaS infrastructure, Adobe succeeded in running a 65-terabyte Hadoop workload - significantly larger than the largest known virtual Hadoop workloads. In addition, this was the largest workload ever tested by EMC in a virtual Hadoop environment on Isilon.

 

Fundamentally, these results proved that Isilon as the HDFS layer worked. In fact, the POC refutes claims by some in the industry that suggest shared storage will cause problems with Hadoop. To the contrary, Isilon had no adverse effects and even contributed superior results in a virtualized HDaaS environment compared to traditional Hadoop clusters. These advantages apply to many aspects of Hadoop, including performance, storage efficiency, data protection, and flexibility.

 

Download the white paper

 

Education:  Want to become a Data Scientist?

Data Science and Big Data Analytics is about harnessing the power of data to gain new insights. Covering the breadth of activities, methods, and tools that Data Scientists use, the book focuses on concepts and principles that can be practically applied to any industry and technology environment. The learning is supported and explained with examples that you can replicate using open-source software.


This book will help you:

• Become a contributor on a data science team

• Deploy a structured lifecycle approach to data analytics problems

• Apply appropriate analytic techniques and tools to analyze big data

• Learn how to tell a compelling story with data to drive business action

• Prepare for EMC Proven Professional Data Scientist certification

 

Learn more



 



Follow @EMCBigData

 

 



Subscribe to EMC Big Data Blog

 



Watch EMC Big Data YouTube Playlist