We just passed the five-year anniversary of the initial release of Apache Hadoop, and I think it is fair to say it is a ubiquitous technology. Modeled on the systems Google built to store and query a repository of every URL in the world, plus all the relationships between them, Hadoop is now synonymous with enterprise big data. What is fascinating to me is that a technology designed for the largest of the large web-scale companies is now providing critical value to so many organizations. We all interact every day with companies that use Hadoop, but we don't think of them in the same way we do Google or Yahoo. The companies we rely on make and sell the everyday goods and services that we want, and yes, they rely on Hadoop, primarily to keep us coming back. Hadoop, on the other hand, is not an everyday product. It is better described as an ecosystem made up of a collection of open source software projects, tools, and people. Since Hadoop is open source with no single point of control, the Hadoop community is critical to the ongoing success of the ecosystem.
The core of what makes Hadoop so powerful is the Hadoop Distributed File System (HDFS) and the MapReduce engine for scheduling and tracking jobs and tasks. An example of how third parties enhance the Hadoop ecosystem is the Dell EMC Isilon enterprise storage platform. Hadoop compute clients can connect to an Isilon cluster via HDFS and run massive-scale analytics on the data stored there. Isilon's multi-protocol support means you can also connect to the cluster with your existing workflows over standard protocols, including SMB, HTTP, FTP, REST, and NFS, as well as HDFS.
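For readers who have not seen the MapReduce model up close, here is a minimal sketch of the idea in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and everything runs in a single process, whereas Hadoop spreads the same three steps across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (word, 1) pair for every word in a line of input."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all values that share a key.

    In a real cluster the framework does this between map and reduce;
    here it is just an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in order over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    print(run_job(lines))
```

The point of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets a scheduler hand the work out to many machines.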
On the software side of the Hadoop ecosystem is an incredibly rich set of tools that sit on top of or alongside Hadoop. You may have heard of the most popular access tools, including Apache Pig, Hive, and HBase, which are also part of the open source Hadoop ecosystem. What may be more surprising is the integration of commercial software with Hadoop.
For example, take Splunk Enterprise, the leading platform for Operational Intelligence, which enables the curious to look closely at what others ignore: machine data. Since machine data management is one of the fastest-growing needs for our customers moving into the world of big data, Dell EMC is developing solutions for every stage of the journey, from the first proof of technology to massive rack-scale implementations. Through our partnership with Splunk, we recently published a joint Splunk and Dell EMC Solution Guide. It contains both a high-level "why Splunk and Dell EMC" section and details about the engineering lab work that we did to show how easy and powerful this solution is.
Splunk Analytics for Hadoop is a full-featured, integrated analytics platform that enables you to interactively explore, analyze, and visualize data in Hadoop without building fixed schemas.
It lets you rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop without having to move or replicate data.
Splunk Analytics for Hadoop can be used to:
- Interactively query raw data by previewing results and refining searches using the same Splunk Enterprise interface.
- Quickly create and share charts, graphs, and dashboards.
- Ensure security with role-based access control and HDFS pass-through authentication.
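To make "without building fixed schemas" a little more concrete, here is a small Python sketch of the general schema-on-read idea: fields are pulled out of raw lines at search time instead of being defined up front. This is only an illustration under stated assumptions. It is not how Splunk Analytics for Hadoop is implemented and it is not Splunk's search language; the sample log lines, the `key=value` format, and the helper names (`extract_fields`, `search`, `preview`) are all hypothetical.

```python
import itertools
import re

# Hypothetical raw machine data; no schema is declared anywhere.
RAW_EVENTS = [
    "2017-01-05T10:00:01 host=web01 status=200 path=/cart",
    "2017-01-05T10:00:02 host=web02 status=500 path=/checkout",
    '2017-01-05T10:00:03 host=web01 status=500 path="/search results"',
]

# key=value pairs, where the value is either quoted or a run of non-spaces.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(line):
    """Pull key=value pairs out of one raw line at read time."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def search(lines, **criteria):
    """Lazily yield the fields of every line that matches all criteria."""
    for line in lines:
        fields = extract_fields(line)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


def preview(results, limit=2):
    """Return only the first few matches, so a search can be refined early."""
    return list(itertools.islice(results, limit))


if __name__ == "__main__":
    # Start broad, look at a preview, then narrow the search.
    print(preview(search(RAW_EVENTS, status="500")))
    print(preview(search(RAW_EVENTS, status="500", host="web01")))
```

Because `search` is a generator, `preview` stops reading as soon as it has enough matches, which is the same reason previewing results is useful when the raw data runs to petabytes.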
New paradigms, such as the Hadoop framework, have become extremely popular because of their capability to ingest and store massive amounts of raw data. The solution guide is intended for infrastructure administrators, storage administrators, virtualization administrators, system administrators, IT managers, and others who evaluate, acquire, manage, maintain, or operate Splunk Enterprise environments. In the guide we describe how to integrate Splunk Analytics for Hadoop with an existing data lake built on Isilon, whose native Hadoop Distributed File System (HDFS) support provides enterprise-ready Hadoop storage. Please take some time to review the solution guide, and use the comment area below to tell us what you think or to ask any questions.
Thanks for reading.
Phil Hummel, EMCDSA
On Twitter @GotDisk