23 Replies. Latest reply: Jul 28, 2014 2:44 PM by Roberto Araujo

Ask the Expert: EMC Isilon Scale-out Data Lake

Mark

Welcome to the EMC Ask the Expert discussion on Isilon following the EMC Redefine Possible announcement

 

 

EMC Isilon provides an enterprise-grade scale-out data lake to help protect, manage, and secure all unstructured data. We are reinforcing our data lake with new announcements that include two new platforms, new solutions, new access methods, and SmartFlash (flash as cache). Join us to learn more about our strategy and what’s new at Isilon.

 

Your hosts:

 

Nicholas Kirsch is the Chief Technology Officer and Vice President of the Isilon Storage Division at EMC. His primary focus is extending EMC Isilon's lead in Scale-Out NAS products, technologies and market solutions. Nick is currently responsible for both product and technology strategy for Isilon's integrated storage appliance and the OneFS distributed file system. He also drives advanced development and strategic acquisitions.

 

Nick joined Isilon Engineering in 2002 as a Software Engineer for OneFS before serving as the Director of Software Engineering through 2007. He built and led Isilon's Product Management organization as Senior Director of Product Management through 2012. Nick holds Bachelor of Science degrees in Computer Science and Mathematics from the University of Puget Sound and a Master's degree in Computer Science from the University of Washington.

 

This discussion will take place July 8 - 25.

 

Share this event on Twitter:

>> Join Ask the Expert, EMC Isilon scale-out Data Lake http://bit.ly/1mGPlqD #EMCATE <<

  • 1. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    Jim Cahill

    The term data lake is new to me. Can you briefly describe it in the context of the storage industry, in general, and specifically in the context of Isilon scale-out NAS. Thanks!

  • 2. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    MRWA

    Hello Jim, I just noticed a Data Lake white paper that was posted yesterday:

    Isilon scale-out Data Lake (white paper)

     

    Hope that helps

     

    -Michael

  • 3. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    nick_kirsch

    Jim,

     

    (I copied this from the RSVP thread - hopefully you saw it there earlier!)

     

    My apologies on the delay - we just finished the Seoul, Korea edition of the MegaLaunch and had a wonderful time redefining possible. As I see it in the most general context, a data lake is a shared storage infrastructure which enables a multitude of different applications and workloads to interact seamlessly. This naturally applies primarily to unstructured data (since storage systems can have knowledge of this information) and demands scalable technologies (since scale of performance and capacity are both critical requirements.)

     

    In the context of Isilon’s Scale-Out data lake, we indeed provide a large scalable storage infrastructure for unstructured data - and more specifically, we provide seamless and shared access to applications which communicate via NFS, SMB (Windows), FTP/HTTP, HDFS (Hadoop), and (coming soon) OpenStack Swift. This is an extremely powerful combination of access methods, as it enables applications designed and written for a variety of purposes to co-exist peacefully - or more interestingly, interact (without data movement or additional copies.) Imagine (as an example) logging network access via traditional UNIX applications over NFS, using Hadoop MapReduce to find potential intrusion points, and generating graphical reports that can be viewed via a Windows workstation.

     

    An added benefit of Isilon’s approach to the data lake is that the enterprise capabilities around storage, security, performance, and information management can all be applied uniformly to any or all of the application data. A few fun examples include the ability to provide secure multi-tenancy for Hadoop applications, de-dupe between OpenStack Swift objects and NFS files, and see advanced performance details on simultaneous access (across all these protocols) to the same files!

     

    I could go on, but I hope this explanation provides a strong foundation and sparks more curiosity.

     

    All the best,

     

    Nick

  • 4. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    Jim Cahill

    Thanks, Michael!

  • 5. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    Jim Cahill

    Thanks, Nick. Very helpful.

  • 6. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    lnicholes

    Hi Nick!

     

    It sounds like, with Isilon's advancements in data lake infrastructure, HDFS, and the upcoming OpenStack Swift support, unstructured data is a real growth area. Does this introduce any security complexities? What are some of the features Isilon has to help in this area?

  • 7. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    mhiers

    What new features in OneFS 7.1.1 are customers most excited about and which will help them swim, not sink, in the Data Lake?

  • 8. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    shellies

    Hi Nick

     

    What kind of performance improvements can we expect from the SmartFlash feature? Also, in what situations would GNA be a better choice for performance over SmartFlash and vice versa?

  • 9. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    jeenak

    How can customers best leverage the OneFS API to automate system configuration tasks?

  • 10. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    smurlidhar

    Hi Nick,

     

    What is the preferred mechanism to back up and restore data lakes in case of a hardware failure: NDMP or cloud backup?

     

    Thanks.

  • 11. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    nick_kirsch

    Hello!

     

    One of the many amazing things that only Isilon can do is provide a unified security context for these next-generation protocols (such as HDFS and OpenStack Swift).

     

    This means that authentication services are provided by proven Enterprise frameworks (such as Active Directory, LDAP/Kerberos, or NIS) and permissions models are shared across emerging (HDFS/Swift) and traditional NAS (NFS/SMB) protocols.

     

    In addition, OneFS brings support for multi-tenancy and encryption (via SEDs and local key management) easily to these emerging environments.

     

    This makes it easy not only to expand the set of applications and workloads and to create shared pipelines and workflows, but also to do so in a way that will please even the most stringent security administrator.

     

    I hope that helps!

     

    Nick

    @nkirsch

  • 12. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    nick_kirsch

    Backup!

     

    One of my favorite topics... Have you heard the one about the backup administrator and the bartender? Perhaps I'll leave that for Stephen Manley of EMC's DPAD group...

     

    Back to your question. The vast majority of Isilon customers pursue a multi-pronged approach to protecting their environment. First, it is important to note that OneFS provides industry-leading protection capabilities both through advanced Reed-Solomon encodings as well as end-to-end referential integrity. This will protect customers from a variety of local hardware faults, including up to four simultaneous node failures.
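
To make the protection trade-off concrete, here is a minimal sketch of the generic arithmetic behind an N+M erasure-coded stripe: M parity units tolerate up to M simultaneous failures, at a capacity cost of M / (N + M). The function names and the stripe widths are illustrative assumptions, not OneFS defaults or OneFS code.

```python
def protection_overhead(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity spent on parity in an N+M stripe."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and non-negative parity")
    return parity_units / (data_units + parity_units)


def survives(parity_units: int, failed_units: int) -> bool:
    """An N+M stripe stays readable while failures do not exceed M."""
    return failed_units <= parity_units


# Hypothetical stripe widths, chosen only to show the trend.
for data_units, parity_units in ((4, 1), (8, 2), (16, 4)):
    overhead = protection_overhead(data_units, parity_units)
    print(f"{data_units}+{parity_units}: {overhead:.0%} overhead, "
          f"tolerates {parity_units} simultaneous failures")
```

Wider stripes spread the same parity cost over more data, which is one reason higher protection levels become more affordable as a cluster grows.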

     

    Customers then take advantage of SnapshotIQ to ensure near limitless local snapshots for fast recovery from application or user errors as well as security or virus related incidents.

     

    I often see customers leverage SyncIQ in conjunction with a secondary Isilon cluster - either for disaster recovery or business continuity. This protects customers from a complete site failure while still enabling them to quickly get back to business. SyncIQ's unique design allows that secondary environment to be used for both failover and additional production purposes and doesn't require either the same hardware or the same version of OneFS (at both sites.)

     

    Finally, customers can deploy traditional NDMP backup, either 2-way or 3-way, and OneFS will work with all major backup providers, such as EMC NetWorker, CommVault Simpana, or Symantec NetBackup (to name a few).

     

    That gives you a full spectrum of "backup" choices, which can be combined in clever ways. For example, replicate data with SyncIQ to a secondary site and take SnapshotIQ snapshots or run NDMP backups from there.

     

    Did I mention that all of these capabilities are available regardless of which protocol - NFS, SMB, HDFS, or Swift?

     

    Wow!

     

    Nick

    @nkirsch

  • 13. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    nick_kirsch

    Perhaps the better question around the OneFS API is what can it not do?

     

    So far, my list includes laundry, my favorite espresso, and weeding the garden.

     

    The OneFS API, which is a modern, versioned RESTful interface, provides both access capabilities and management capabilities. It is the underlying control layer for nearly all of the Isilon web management and command-line tools.

     

    The possibilities range from simple tasks, such as provisioning SMB/NFS shares, to more complex tasks such as managing per-directory SmartPool file policies. In addition, the access methods allow for directory creation and permissions management, as well as authenticated file retrieval.
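
As a rough illustration of the "simple task" end of that range, the sketch below builds (but does not send) an HTTP request that would create an SMB share. The endpoint path `/platform/1/protocols/smb/shares`, port 8080, the JSON field names, the use of basic authentication, and the host and credentials are all assumptions made for this example; check them against the reference guide for your OneFS version before relying on them.

```python
import base64
import json
import urllib.request


def build_smb_share_request(cluster, user, password, share_name, path):
    """Build, without sending, a POST request that would create an SMB share."""
    # Assumed endpoint layout: versioned platform API served on port 8080.
    url = f"https://{cluster}:8080/platform/1/protocols/smb/shares"
    # Assumed payload: a share name plus the directory it exposes.
    body = json.dumps({"name": share_name, "path": path}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical cluster name, credentials, and path.
req = build_smb_share_request(
    "cluster.example.com", "admin", "secret", "projects", "/ifs/data/projects"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually submit it; keeping construction separate makes the call easy to inspect and test before pointing it at a real cluster.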

     

    I've had the pleasure of watching Isilon customers build complete self-service portals for their end-users leveraging nothing more than a web scripting language and the OneFS API.

     

    We have a complete reference guide available if you are interested.

     

    What will you build?

     

    Nick

    @nkirsch

  • 14. Re: Ask the Expert: EMC Isilon Scale-out Data Lake
    nick_kirsch

    Performance improvements will always have a big caveat - "it depends" - but I will give my best answer.

     

    In terms of raw performance numbers, a read request that could have taken as long as 7 milliseconds to service from a disk can now be serviced in roughly 200 microseconds. That's more than 30 times faster!

     

    That said, the value of SmartFlash shouldn't be measured in disk versus flash access times but rather in the amount of the application working set that can easily (and automatically) fit into the scalable, cluster-coherent cache.
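
One way to see why the working set matters more than raw access time is a simple weighted-average model of read latency. This is a minimal sketch, not a benchmark: the 7 ms disk figure comes from the post above, while the 200 microsecond flash figure and the hit ratios are assumptions picked for illustration.

```python
DISK_READ_S = 7e-3      # worst-case disk read quoted in the post (7 ms)
FLASH_READ_S = 200e-6   # assumed flash read latency (about 200 microseconds)


def average_read_latency(hit_ratio, flash_s=FLASH_READ_S, disk_s=DISK_READ_S):
    """Expected read latency when a fraction `hit_ratio` of reads hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * flash_s + (1.0 - hit_ratio) * disk_s


# Hypothetical hit ratios: how much of the working set fits in the cache.
for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    latency_ms = average_read_latency(hit_ratio) * 1e3
    print(f"hit ratio {hit_ratio:.2f}: {latency_ms:.2f} ms average read")
```

In this model the average stays dominated by disk latency until most reads are cache hits, which is why fitting more of the working set into the cache is the number to watch.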

     

    SmartFlash can be scaled from a few hundred gigabytes of flash to nearly a petabyte! In addition, due to the Isilon scale-out architecture, you aren't just adding flash, you are adding CPU cores and network ports to service that flash - all while maintaining a single file system and single namespace.

     

    We expect Isilon customers to jump at the opportunity to cost-effectively and simply scale their entire application and user experience using SmartFlash.

     

    You asked another question about GNA (which means "global namespace acceleration".) I will simplify that to compare flash for read-caching with meta-data acceleration. SmartFlash is only for reads and can accelerate both meta-data (directory and attribute information) as well as data (file contents). Meta-data acceleration, on the other hand, can accelerate both reads and writes, but is only for accelerating information about files (rather than files themselves.)

     

    Clearly those are fairly different use cases, and we have some great tools (such as InsightIQ) and white papers to help you make the right choice for your environment.

     

    Vroom vroom!

     

    Nick

    @nkirsch
