The first thing I noticed as I made my way to the registration desk at the Hadoop Summit was how many Fortune 1000 companies had sent people to the show. Many of them are big data pioneers that worked through the best practices and platform options for an enterprise data warehouse years ago. They are now in various stages of deploying Hadoop and Big Data analytics. Slim Baltagi (@slimbalgati) from Capital One Financial Corp noted in the introduction to his session, Analysis of Trends in Big Data Analytics, that he has been doing Big Data projects for 7 years and has the scars to prove it. Many attendees at the San Jose Summit talked at length about the numerous experiments and several production deployments they have been part of. They were genuinely excited to be at the show to learn more and to share experiences with the Hadoop community. The world of Big Data is moving at light speed, with a dizzying array of new open source and commercial tools announced every few weeks. The Big Data community is very open to exchanging information, and even code through open source projects, while each organization remains laser-focused on solving its own business problems with unique, proprietary implementations.
That passion for knowledge is what made this such a great show for VCE (@VCE), the EMC Converged Platforms Division. For example, at EMC's session on Increasing Hadoop Resiliency & Performance with EMC Isilon, delivered by Boni Bruno (@bonibruno), about 50% of the participants already knew what our Isilon (@EMCIsilon) product is, and the other 50% were getting their first introduction to it in a context they are passionate about: Big Data.
The experience of talking to attendees in the EMC booth was very similar. Approximately half of the visitors knew about EMC's efforts in Big Data, and the other half were hearing about our offerings for the first time. This was an excellent show for spreading the word about our Big Data Solution as well as our Hadoop-compatible platforms from the Elastic Cloud Storage (@EMCECS) and DSSD (@EMCDSSD) business units. I'm going to finish this post with some background on the EMC Big Data Solution, with links to additional online content for those wanting to dig deeper. I'll be writing more about the EMC ECS and DSSD solutions in future posts and covering all new announcements on Twitter using the handle @GotDisk.
Big Data Solution Architecture Block Diagram
The EMC Big Data Solution (BDS) is a complete turnkey hardware and software platform for implementing a Hadoop-based data lake, with all the management and end-user tools that data engineers and data scientists need. The BDS builds on the very popular VCE Vblock converged infrastructure platform. As you can see in the graphic above, the core Vblock is enhanced with additional software from EMC partners to provide a complete data lake and analytics platform in a single integrated solution.
For data engineers, the BDS offers data lake hardware options using EMC Isilon, EMC Elastic Cloud Storage, and/or the EMC XtremIO all-flash array platform. Data ingestion, lineage, and quality are handled by Zaloni software for complete data lake management. Zaloni helps simplify and automate common data management tasks so organizations can focus their resources on building analytics. Data index and search services are provided by Attivio software. Many of the Fortune 100 rely on Attivio to quickly find, correlate, and return the most relevant search results, boosting knowledge worker productivity. The third tool of interest for data engineers is BlueTalon, for data access and authorization. BlueTalon offers the best data control technology for today's data challenges, including authorization, access control, enforcement, and auditing. VCE integrates these critical pieces of an enterprise-grade data lake into a single SKU with a single point of support for your entire investment.
For data scientists, the BDS provides a platform for productivity and collaboration. The data science environment is organized around the creation, use, and sharing of work spaces. A work space is a collection of tools and data focused on a user-defined analysis project. Tools that can easily be deployed into a work space include R, RStudio, Python, Java, and Tableau for visualization, shown in the Extension Packs block of the diagram above. The BDS can also support other technologies, such as MongoDB, so that users can apply the right tools for their needs.
Work spaces can be shared among multiple collaborators who need to work jointly on a project using a common pool of data. Data is organized into one or more data sets that can be saved either in the work space or back to the data lake, where they become available to other data scientists. Organizing your data analytics platform around data sharing and collaboration, with tightly integrated access controls, immediately creates an environment for better data governance and productivity.
The EMC BDS is currently being previewed at selected customer locations and trade shows around the globe. SiliconAngle conducted a number of interviews for theCUBE at the Hadoop Summit. Here are links to a couple of recordings featuring EMC's Carey James (@careyjames33) with various partners mentioned in this article.
For information on upcoming shows that will feature the EMC Big Data Solution, or for any questions about the other Big Data products and solutions mentioned in this article, leave us a message in the comments section below.
Thanks for reading,
Phil Hummel, EMCDSA