
Isilon and ECS for unstructured data storage

 

CommVault data protection software generates massive amounts of unstructured data.  What makes this CommVault backup data unstructured?  The backup data is not stored in a structured CommVault database but as flat files in a filesystem or objects in an object store.  While CommVault does use databases to run its CommServe and MediaAgents (http://bit.ly/2oQX3al), the backup data is only indexed in a database.  The actual backup data chunks are stored outside the CommVault databases, which makes it unstructured data.


Dell EMC is the industry leader when it comes to unstructured data storage.  For the second year in a row (http://bit.ly/2z6J3dP), Dell EMC has been recognized as the leader in Gartner's Magic Quadrant for Distributed File Systems and Object Storage.  Isilon is the distributed filesystem (NAS) platform and ECS (Elastic Cloud Storage) is the object storage platform.  Combined, they offer an industry-leading approach to storing unstructured data.


In previous blog posts I have covered in detail how to use Isilon as a CommVault backup target (http://bit.ly/2vqtlJl).  Isilon scale-out NAS is a great choice as a CommVault backup target due to its ease of management and storage efficiency.  But when would I complement an Isilon backup target with an ECS object store?  When would I use ECS in general for CommVault?  How do I configure ECS object storage with CommVault?  I'll cover all these questions and run through some configuration examples in this blog post.

 

Gartner 2017 Distributed File Systems and Object Storage Magic Quadrant

xlarge.jpg

https://www.gartner.com/doc/reprints?id=1-4FDB713&ct=170926&st=sb

 

What is ECS?

 

Let's talk about ECS and object storage.  ECS is an object storage platform that lets you host cloud storage within your own datacenter.  Cloud storage (to me) is a storage platform that allows read/write access via industry-standard object APIs.  Typically the S3 protocol is used, but ECS also supports the Swift, Atmos, and CAS protocols (http://bit.ly/2jHVcCU).  All are web-based protocols that use HTTP(S) and a REST-based API to read/write data to the object storage platform.  ECS allows you to provide your business with object storage without sending your users and developers to the public cloud.


ECS looks like a storage appliance.  It has a node-based architecture, 10GbE connectivity, JBOD attached storage, erasure coding data protection, and the familiar black front bezel with a blue stripe you recognize from other Dell EMC storage appliances (http://bit.ly/2hNYhjR).  What makes ECS different is how data is read and written to the appliance.  This is not a block storage array with LUNs, nor is it an ethernet NAS appliance (although we have some flexibility here).  ECS requires an application to interact with the data using an API rather than a filesystem or LUN.  To use ECS you need an application (like CommVault) that supports REST-based API access to object storage.

 

Object storage latency

 

NAS or SAN-based storage is designed to work on a local network, and its workloads typically require low latency.  A highly transactional database might run on all-flash SAN storage due to a latency requirement of 1 millisecond (or less).  Or you might have scale-out NAS storage (like Isilon) that provides applications access to unstructured data with response times of less than 10 milliseconds.  Both workloads are very different from an object storage use case because SAN and NAS-based storage is intended to stay within the same datacenter as the application, often within a single SAN fabric or dedicated ethernet LAN.


Object storage has a different use case and performance profile than standard NAS or SAN-based storage.  Object storage is intended for web traffic using HTTP as the transport.  This means your application isn't expecting extremely low response times; web HTTP traffic often sees response times around 100 milliseconds.  You don't expect a workload with low latency requirements to use object storage, but you will see plenty of new applications developed to take advantage of object storage for data that does not require low latency.
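If you want a rough feel for the difference, you can time a simple HTTPS request against your object endpoint.  This is just a minimal sketch using the lab ECS endpoint name that appears later in this post (ecs.keith.com:9021 is my lab name, substitute your own); it measures the total response time of a single request, not sustained throughput:

# curl -k -s -o /dev/null -w "%{http_code} %{time_total}s\n" https://ecs.keith.com:9021/

The -w option prints the HTTP status code and the total time for the request, and -k skips certificate verification for labs with self-signed certificates.  Compare that number to the ping time of your local NAS and the latency gap between LAN storage and HTTP object storage becomes obvious.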


Object storage replication


Traditional SAN/NAS storage was built for hosting a workload in a single datacenter and replicating that data offsite for protection.  This means traditional storage is replicated in an active/passive manner: you have an active site with "live" data and a passive site with a read-only copy.  You can fail traditional storage over between sites and reverse the roles, but you don't typically have the same data active and available in both sites.  Yes, you can clone read-only DR data with snapshots to create a writable copy, but the architecture remains active/passive.


The ECS architecture is much more flexible than a traditional active/passive architecture since the ECS platform was built for more modern applications.  With ECS object storage you can have geo-replication across multiple sites, which means an active/active architecture for two sites, or even an active/active/active architecture for three sites.  With the aid of a load balancer (e.g., http://bit.ly/2zUbe2B), users and applications can connect to any available site and read/write any object data.  Applications that need REST-based API storage using HTTP (and can tolerate the associated latency) can access ECS content with a global URL no matter where they are located.  This makes ECS great as a global object content repository since the architecture is not limited by the location of the user or application.  Applications can take advantage of an active/active object storage architecture that does not necessarily need to be colocated in the same datacenter as the application.


ECS and CommVault

 

CommVault is a data protection software platform I have written about previously, with a focus on how it can interact with Isilon file-based protocols like SMB and NFS (http://bit.ly/2vqtlJl).  So why would I bother with ECS and object storage for CommVault?  Why not just use a NAS platform like Isilon?


Generally, CommVault needs to store some backups short-term while retaining other backups long-term.  This is referred to as the backup retention cycle and can vary depending on the type of data being protected.  Say you had 500 virtual machines that needed daily backups run and retained for a month.  Most restore requests usually happen within a month of the backup, so this short-term retention works well operationally.  However, say you also had a legal requirement to keep one full backup per month of each VM for 7 years.  This means you need another strategy for retaining certain data, which is very infrequently accessed, for much longer periods of time.


Tape backups used to be the standard for long-term storage.  Tapes were cheap and could easily be used for long-term retention of backups and for archival purposes.  But these days, object storage is a much more attractive medium for long-term storage.  Think of a global object content repository accessible from any datacenter in your company.  Instead of having local tape libraries in each of your datacenters with physical tapes to manage, a geo-federated set of ECS appliances could provide this type of long-term backup storage at a very low cost and with less physical management than a tape library.

 

What about ECS for short-term retention?  What about using ECS for all your CommVault backup activity?  Not really a good fit; think about the latency.  Your short-term daily backups need fast storage to complete their backup cycle quickly and meet your SLAs.  A response time of 10 milliseconds or less will typically keep your backup application (like CommVault) happy and performing well.  HTTP(S) and REST-based APIs like S3 are not intended to provide this type of low latency since they are designed to service web-based traffic.  Even if your ECS appliance is in the same datacenter as your CommVault infrastructure, the latency profile will be different and will not perform the same as a NAS appliance using NAS protocols.  Stick with scale-out NAS storage (like Isilon) for short-term daily backups and consider ECS for long-term backup/archival storage given the higher latency values.


What about using the ECS NFS gateway for short-term backups?  Or the CIFS-ECS application for Windows access to ECS?  While these tools provide the convenience of NAS protocols for Linux and Windows hosts, they are not intended for workloads that require NAS (LAN-based) latency.  Think of the ECS NAS gateways as a convenient way to upload/download and share data but not as a substitute for NAS storage for performance-sensitive workloads.  Backup applications will do best with NAS-based storage for the daily backup cycle because they need the ~10ms or less response time when running active backups.  ECS can provide plenty of bandwidth, but if you don't have decent response time (low latency) your legacy applications written for SAN/NAS storage will suffer.  Use ECS for long-term backup storage and use a platform like Isilon for short-term backup storage.


Lastly, why do I need a separate object storage platform like ECS for long-term backup storage?  Why not just retain long-term backups on the same storage where I keep my short-term backups?  Why use both Isilon for short-term and ECS for long-term backup storage? 


You can certainly use a single scale-out storage platform like Isilon for all backup storage.  But what if your retention requirements are forcing you to store petabytes of old data?  What if you want to start taking advantage of object storage and offload some of this content to a geo-distributed object archive rather than continue to grow your Isilon?  What if you want to keep your Isilon lean with high performance (low latency) and move all old data out to an object store?  All of these are great reasons to explore object storage to complement Isilon: one platform for short-term, low-latency storage and a second for long-term object storage.

 

Configuring a CommVault library to use ECS

 

It's very easy to use object storage with CommVault.  CommVault supports various cloud storage "libraries" and lists EMC ECS as a supported S3 vendor (http://bit.ly/2pC1KFt).  This means you can add ECS as a "cloud storage library" and start moving long-term backups to this library just as you would with any other tape or disk library.


We'll make the assumption that you already have ECS up and running in your environment and have also created an ECS namespace and bucket for CommVault to use.  A namespace is simply a means to provide multi-tenancy within the ECS object store.  Multiple tenants typically use multiple namespaces while a single organization with no multi-tenancy requirements can use a single namespace for their ECS object store.  A bucket is an object storage container within a namespace that offers further granularity for user/group access, quotas, and retention.  See the ECS admin guide (http://bit.ly/2znYvWY) for more details.  For the purposes of this blog, we'll assume you have a namespace and bucket already created for use by CommVault backups, something like https://commvault@ecs.keith.com:9021 for a "commvault" bucket on an ECS appliance named ecs.keith.com using port 9021. 
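Before touching the CommVault GUI, it can be worth a quick sanity check that the bucket and keys work from any S3-compatible client.  This is optional and not part of the CommVault procedure; the sketch below assumes the AWS CLI is installed somewhere handy and uses placeholder keys along with my lab endpoint and bucket names:

# export AWS_ACCESS_KEY_ID=<your ECS object user access key>
# export AWS_SECRET_ACCESS_KEY=<your ECS object user secret key>
# aws s3 ls s3://commvault --endpoint-url https://ecs.keith.com:9021 --no-verify-ssl

A clean (possibly empty) listing means the endpoint, bucket name, and keys are good; an access denied or connection error is much easier to troubleshoot here than through the CommVault library wizard.  The --no-verify-ssl flag is only for labs with self-signed certificates.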


Create a cloud storage library for ECS by opening the CommVault GUI and adding a new library of type "Cloud Storage Library" as shown in the screenshots below.  Give your ECS library a friendly name and set the "type" in the drop-down list to "Amazon S3".  Also assign an existing MediaAgent host to control this library.  Next, complete the access information for CommVault to communicate with ECS.  This includes:


     - Authentication set to "Access & Secret Keys"

     - Service host set to your HTTPS URL for ECS using port 9021

     - Access Key ID set during creation of your CommVault bucket

     - Secret Access Key set during creation of your CommVault bucket

     - Bucket name on ECS for CommVault

     - Storage class set to "standard"


See the next few screenshots for example configuration.  Once all the fields are completed we click "OK" and create the library. 


CommVault GUI - Add Cloud Storage Library

Screen Shot 2017-08-11 at 4.03.14 PM.png

 

CommVault GUI - Cloud Storage Properties

Screen Shot 2017-08-11 at 4.04.34 PM.png

 

Let's take a look at a few of the ECS library properties just for fun.  We see below that the ECS library is of cloud vendor type "Amazon S3", which means it will use S3 APIs to read/write to the object storage.  We also see that the library has a single mount path by default, associated with a single MediaAgent and using the maximum allowed writers.  Keep the defaults; these are fine for our use case of long-term backup storage.


CommVault GUI - ECS library general properties

Screen Shot 2017-08-11 at 4.05.03 PM.png


CommVault GUI - ECS library mount paths

Screen Shot 2017-08-11 at 4.05.10 PM.png

 

Configuring a CommVault storage policy to use an ECS library

 

Our intention is to use ECS as a long-term backup archive that will complement whatever storage we are using for daily short-term backup storage.  We use a NAS-based platform like Isilon for the daily backups and retain those backups for a standard time, say 15 days in the screenshots below.  Then we configure CommVault to send backups for long-term retention to ECS, say 120 days in the screenshots below.  All this is very easy to configure using CommVault storage policy copies and the auxiliary copy job to move the data.


For a recap on CommVault storage policies and CommVault's aux copy process see my previous blog post (http://bit.ly/2oQX3al).  A storage policy is CommVault's way of combining a storage library, MediaAgent, and retention settings into a single policy that can be assigned to clients for a shared configuration.  A storage policy will have a primary copy where the backups land during the daily backup cycle; this would typically use faster storage like Isilon for NAS-based access (it could be SAN as well).  A storage policy can also have secondary copies with different retention settings for longer-term backup storage.  CommVault's aux copy process copies backups from the primary copy to the secondary copies when the aux copy job runs.


Simply put, we have a primary storage policy copy on Isilon and we will create a secondary storage policy copy on ECS using the ECS cloud library we configured earlier.  I already have a primary copy on Isilon in the screenshot below retaining backups for 15 days and 2 backup cycles (time between full backups).


CommVault GUI - storage policy primary copy general settings

Screen Shot 2017-08-11 at 4.19.14 PM.png


CommVault GUI - storage policy primary copy retention settings

Screen Shot 2017-08-11 at 4.19.22 PM.png


We want to create a secondary copy for long-term backups since we want to retain some backups longer than the 15 day / 2 cycle retention on the primary copy.  All we need to do is create a new storage policy copy and assign the ECS library along with the MediaAgent we used to control it (the Windows "MA" host below).  We then define a longer retention policy than the primary copy; I've used 120 days and 4 cycles as an example below.  I also keep the copy policy set to "all backups" and enable deduplication (more on this below).

 

CommVault GUI - Create new copy

Screen Shot 2017-08-11 at 4.20.00 PM.png


CommVault GUI - new copy general properties

Screen Shot 2017-08-11 at 4.20.30 PM.png


CommVault GUI - new copy retention settings

Screen Shot 2017-08-11 at 4.21.11 PM.png


CommVault GUI - new copy policy

Screen Shot 2017-08-11 at 4.21.20 PM.png

 

I'm using deduplication, so I need to configure the MediaAgent deduplication database (DDB) for this storage policy copy.  This DDB stores the hashes used for deduplicated blocks and should be placed on fast flash storage local to the MediaAgent or on an all-flash block-based array.  I'll keep the defaults and point the DDB to local storage on my MediaAgent.  See the screenshots below for a walkthrough of the DDB configuration.


CommVault has a great feature called "DASH copy", which is a deduplication-aware secondary copy.  Say you had your backups nicely deduped on your primary copy; wouldn't you also want to preserve those capacity savings on your secondary copies?  If so, use DASH copy, which is configured along with the DDB during the creation of the storage policy copy on ECS.  Make sure "Enable DASH Copy" is selected on your secondary copy and use either a disk read optimized copy or a network read optimized copy.  Ask your CommVault team which is right for you; I kept the default disk read optimized copy, see the screenshots below.


CommVault GUI - new copy deduplication settings

Screen Shot 2017-08-11 at 4.22.58 PM.png


CommVault GUI - DDB access path

Screen Shot 2017-08-11 at 4.23.10 PM.png


CommVault GUI - completed DDB configuration with local paths

Screen Shot 2017-08-11 at 4.23.59 PM.png


CommVault GUI - copy deduplication completed settings

Screen Shot 2017-08-11 at 4.24.08 PM.png


CommVault GUI - enable DASH for storage policy copy

Screen Shot 2017-08-11 at 4.24.31 PM.png


Using CommVault Auxiliary copy to move long-term backups to ECS

 

The configuration work is done; now we just use the CommVault auxiliary copy job to move backups from our primary copy to our secondary ECS copy.  The MediaAgent that "owns" the storage policy will perform the data movement between copies while maintaining the deduplication scheme (if the option was enabled for the secondary copy).


For our example, we have a storage policy named "Isilon.windows.mediaagent" with a "primary" copy on Isilon and an "ecs - long term" secondary copy on ECS.  When we view the backups on the Isilon primary copy we can see a completed client backup.  When we view the backups on the secondary ECS copy we see the same job with a "to be copied" status since we have not yet run an aux copy.

 

CommVault GUI - example storage policy with two copies, Isilon and ECS

Screen Shot 2017-08-11 at 4.29.26 PM.png


CommVault GUI - view jobs on primary Isilon copy

Screen Shot 2017-08-11 at 4.29.45 PM.png


CommVault GUI - completed job #1486 on primary copy

Screen Shot 2017-08-11 at 4.30.05 PM.png


CommVault GUI - view jobs on ECS secondary copy

Screen Shot 2017-08-11 at 4.30.14 PM.png


CommVault GUI - job #1486 to be copied to ECS storage policy copy

Screen Shot 2017-08-11 at 4.30.33 PM.png


So how do we move our backups between primary and secondary copies?  Easy, just run an "auxiliary copy" job or wait for the next scheduled aux copy job to run.  We'll leave the default options and move "all copies".  Below is the process to kick off an aux copy, along with a few screenshots of the running job status and the completed job.


CommVault GUI - run auxiliary copy

Screen Shot 2017-08-11 at 4.30.48 PM.png


CommVault GUI - aux copy job options

Screen Shot 2017-08-11 at 4.31.01 PM.png


CommVault GUI - aux copy job details of running job

Screen Shot 2017-08-11 at 4.31.27 PM.png


CommVault GUI - completed aux copy job

Screen Shot 2017-08-11 at 4.32.20 PM.png

 

Summary

 

I now have my CommVault backups on two different storage libraries with different levels of performance and retention.  My primary copy is on Isilon and intended to absorb the daily backup "churn" and store short-term backups with higher performance.  My secondary copy is using ECS object storage for long-term backup storage.  I run my operational SLA-driven backups on my primary copy and move my archival backups to an object store.  The bulk of my restores will most likely come from my primary copy but I have the flexibility to also restore from any long-term copy on the ECS storage.


Think about using ECS for CommVault long-term backup storage.  You can effectively build a cloud storage archive that is globally accessible across multiple datacenters and share this platform across multiple CommVault environments.  You can scale an object storage platform to multiple petabytes and eliminate the hassles of trying to send this data out to the public cloud.  And you can keep your primary short-term storage performing well while moving your long-term backups to an industry-leading object storage platform.


Thanks for reading, comments welcome! 


Why do CommVault customers store their backups on Isilon?  Because Isilon is scale-out, efficient, simple to set up, and easy to maintain.  Yes, there are other reasons (https://www.emc.com/collateral/handout/h12842-ho-top-reasons-simpana-and-isilon.pdf), but the scale-out architecture and operational simplicity are the most important.  A single scale-out NAS filesystem that dynamically grows with online capacity additions is the most flexible type of on-premises storage media for CommVault.


What exactly does this mean?  Why is scale-out better than traditional scale-up storage?  Storage architectures that are scale-up will always have some type of size limit that prevents significant growth of a single filesystem.  Every legacy scale-up storage array will have its own unique capacity limit that will prevent this type of filesystem growth on NAS or SAN, whether it is a filesystem limitation or a RAID level construct.  Yes there are tricks you can use to get around this for each vendor but they are still compromises.  Scale-up storage will only grow to its limit and will not scale-out a single filesystem like Isilon. 


Why is it so important to have a single scale-out filesystem anyway?  What is wrong with using multiple NAS or SAN filesystems?  Both add unnecessary overhead, in capacity and in operations.  Isilon allows you to set up a single SMB share for CommVault Windows MediaAgents that can dynamically grow as you add Isilon nodes.  A single SMB share for the entire backup environment never runs out of space as long as you manage the Isilon capacity per best practices.  Anything else is a compromise: you will have to manage more paths, add more silos of storage, induce more usable capacity overhead, and create a more complicated architecture.


But what about Linux MediaAgents?  Scale-out is best for all CommVault disk libraries, even with Linux MediaAgents.  Isilon brings the same value to both Windows and Linux with an ever expanding scale-out disk library for backup storage.  Linux MediaAgents are configured differently than Windows and have different architectural concerns but are still simple to setup and require less sysadmin time to run in production than SAN or scale-up NAS. 

 

The goal of this post is to walk those unfamiliar with managing CommVault through the process of setting up Isilon as a disk library for CommVault Windows and Linux MediaAgents.  Walking through the setup process will demonstrate the simplicity of this architecture and the value Isilon scale-out NAS brings to CommVault.

 

Quick Review

 

I posted a previous blog (https://community.emc.com/blogs/keith/2017/05/01/isilon-as-a-commvault-backup-target--planning-and-sizing) covering the basics of the CommVault architecture and how they relate to Isilon. Take a look to get an in-depth overview since I won't repeat all the details in this post. 


To quickly recap, a CommVault deployment will look like the diagram below, with the Isilon acting as a "library".  Specifically, it is a disk library that can be shared among multiple MediaAgents, not just the single MediaAgent shown in the diagram.

 

CommVault architecture overview

commcell_logical.png

Lab Components

 

I've setup CommVault along with Isilon in a lab environment with the components below running virtualized on a single ESXi host. 

 

CommVault Simpana v11

CommServe - Win2K8R2 ("CS")

Windows MediaAgent - Win2K8R2 ("MA")

Linux MediaAgent - CentOS 7 ("Linux MA")

Client - Win2K8R2

Isilon - OneFS 8.0.0.4

DNS server (AD domain controller) - Win2K8R2

 

VM lab components - vSphere client

Screen Shot 2017-08-02 at 8.09.17 AM.png

 

Windows MediaAgent setup

General OneFS setup

 

First, assume a standard Isilon cluster setup.  Nothing special has to be done on OneFS for CommVault, but there are some key standard components that need to be functional for optimal performance, specifically SmartConnect with DNS, which load balances client SMB connections across multiple Isilon nodes.  Get SmartConnect and DNS working correctly before configuring Isilon as a disk library (https://youtu.be/zVwZDpyPedA).  All CommVault infrastructure servers (CommServe and all MediaAgents) should get rotating "round-robin" IP addresses returned from the Isilon with consecutive 'nslookup' commands.  Think of this as the number one Isilon best practice you can implement for CommVault.
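As a quick check, repeated lookups of the SmartConnect zone name from a CommVault host should return a different Isilon IP address each time.  A minimal sketch using my lab SmartConnect zone name (isilon.keith.com) and lab IP range, output trimmed to the answer line; your names and addresses will differ:

# nslookup isilon.keith.com
Address: 192.168.0.51

# nslookup isilon.keith.com
Address: 192.168.0.52

# nslookup isilon.keith.com
Address: 192.168.0.53

If the same IP address comes back every time, fix the DNS delegation to SmartConnect before going any further.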

 

Why is SmartConnect so important?  CommVault Windows MediaAgents will open multiple SMB connections across multiple Isilon nodes when things are working correctly.  This spreads out the backup load and takes advantage of all Isilon cluster resources.  Without SmartConnect, the cluster can become unbalanced with performance hot spots: certain Isilon nodes can get overloaded with too many SMB connections and overall backup performance will suffer.  It's easy to get SmartConnect working, just do it!

 

While we are on the topic of SmartConnect, leave the default "round-robin" connection policy in place.  This policy can always get tuned later when CommVault is up and running.  It's best to leave the default SmartConnect connection policy and confirm a baseline level of performance with CommVault backups.  Get a baseline with the defaults then tune the SmartConnect connection policy later if necessary.

 

Create a CommVault OneFS SMB share

 

This is where Isilon makes a huge difference to a storage sysadmin.  All we need is a single SMB share created on the Isilon for all CommVault Windows MediaAgents.  Simple, right?  All backups from all CommVault Windows MediaAgents will pass through this single share.  Other storage solutions require many steps of creating/masking/mapping LUNs or exporting many storage array filesystems and shares, which take significantly longer to set up, troubleshoot, and maintain.  Not with Isilon: a single SMB share is used for the disk library by all Windows MediaAgents in the CommCell.

 

Just log in to the Isilon webUI and create a new SMB share in the desired access zone (use the System zone by default if you like).  There is nothing special about this share; it's a standard SMB share with a name, path, and share permissions (persona).  You can even use the Isilon "root" account to authenticate if you like.  I created a local "commvault" account (with a local Isilon password) to use as the share persona.  Put an optional SmartQuota directory quota on the share root directory if you like (not shown below).
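If you prefer the OneFS CLI over the webUI, the share can be created there as well.  Treat this as a sketch only: the directory path is made up for the example, and the exact arguments vary a little by OneFS version, so confirm with 'isi smb shares create --help'.  The persona assignment is then done in the webUI as shown in the screenshots below.

# mkdir -p /ifs/data/commvault
# isi smb shares create commvault /ifs/data/commvault --zone=System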

 

Don't create multiple SMB shares for multiple Windows MediaAgents; it's not necessary!  The value in this architecture is using a single SMB share for a single Isilon disk library shared by all Windows MediaAgents.

 

Create new SMB share - OneFS WebUI

Screen+Shot+2017-07-24+at+2.17.35+PM.png

 

Add Persona (account) to new SMB share - OneFS webUI

Screen Shot 2017-07-24 at 2.19.24 PM.png

 

CommVault Setup of an Isilon disk library

 

At this point the Isilon sysadmin work is complete.  The CommVault sysadmins now take over and configure CommVault to use the single SMB share as a disk library.  This is also very easy: just create a new disk library and supply the share persona account used when creating the share (in my example a local 'commvault' account), the share account password, and the Windows UNC path to the share (something like \\isilon.fully.qualified.domain.name.com\commvault).


Don't map Windows Explorer drives to the share and don't use drive letters for the SMB share.  When a connection to the disk library is needed, CommVault will mount the UNC path dynamically with this configuration, and load balancing across Isilon nodes will happen with OneFS SmartConnect.  You also need to select a single MediaAgent ("MA" in the screenshot) to initially use the disk library; it can be shared with other Windows MediaAgents later.

 

See the screenshots below that walk through an example.  My Isilon disk library is given a name, assigned to a Windows MediaAgent ("MA"), authenticated with a local Isilon "commvault" account/password, and mapped to a UNC path (\\isilon.keith.com\commvault).  That's it, the setup of the disk library is done!


Again, this is why Isilon is great for CommVault, all of the sysadmin work (both Isilon and CommVault) on the disk library is done at this point and is future proof.  This single share can grow the library indefinitely without any additional configuration aside from Isilon node additions.  Keep adding Isilon nodes and the SMB share will continue to grow over the life of the disk library.  Nothing will fill up as long as the Isilon cluster capacity is managed per best practices.  No future migrations, no LUNs, no SAN networks, no management of multiple filesystem paths, just a simple SMB share with a single SMB mount from CommVault.

 

Isilon disk library setup

 

Add disk library - CommVault webUI

Screen Shot 2017-07-19 at 10.51.30 AM.png

Add disk library configuration - CommVault webUI

Screen Shot 2017-07-19 at 10.52.34 AM.png

Completed Isilon disk library - CommVault webUI

Screen Shot 2017-07-19 at 10.53.33 AM.png

 

Isilon disk library properties

 

Let's look at the Isilon disk library properties from the CommVault webUI.  We don't want to change anything at this point; keep the defaults until you have a reason to tune options.  This section is not intended as documentation of all the options (for that, see the CommVault documentation: http://documentation.commvault.com/commvault/v11/article?p=features/disk_library/advanced.htm).  I just want to highlight a few options as they relate to Isilon to help understand how the disk library works.

 

General - This is the main screen for the disk library where you can take the library offline/online using the "enable library" checkbox for maintenance.  I can also mark archive files as read-only, a feature integrated between CommVault v11 and OneFS SmartLock.  This is a very cool feature; see the CommVault documentation link above for more details.

 

Mount Paths - Here I find the settings mostly used for storage systems that need to use multiple mount paths.  This is not necessary for Windows MediaAgents using Isilon.

 

Associations - This lists the storage policies associated with the disk library along with the storage policy copy name.  All storage policies have a "primary" copy for backups and can optionally have a "secondary" copy, usually for offsite or long-term retention.

 

Security - CommVault webUI accounts with permissions to this library in the GUI tree (screenshot omitted).

 

Disk Usage - Gives a nice summary of the disk library space from a CommVault perspective.

 

Space Management - This tab allows a CommVault sysadmin to set thresholds for warning events when the disk library is running low on space.  It can also automatically spawn a data aging job to remove expired backups immediately when a threshold is hit, rather than waiting for the normally scheduled daily run.

 

Isilon disk library "general" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.55.35 AM.png

 

Isilon disk library "mount path" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.55.46 AM.png

 

Isilon disk library "associations" properties - CommVault webUI

Screen Shot 2017-07-26 at 8.37.44 AM.png

 

Isilon disk library "disk usage" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.56.12 AM.png

 

Isilon disk library "space management" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.56.20 AM.png

 

Mount Path Properties

 

The mount path properties are different than the library properties, so let's look at these also.  Again, see the CommVault docs for full documentation (link above) since I am only highlighting what is interesting from an Isilon perspective.

 

General - We find some useful space information on the mount path along with another option to enable/disable the disk library mount path for maintenance.

 

Allocation Policy - Here CommVault sets some reserved space on the mount path to avoid running a path/filesystem completely out of space (default of 2GB reserved space).  Isilon mount paths will grow automatically as capacity is added to the Isilon cluster or as quotas are increased on the SMB share, keep the 2GB default. 

 

Deduplication DBs - Screenshot omitted, summarizes the MediaAgent deduplication databases for the mount path, more on this later.

 

Sharing - Allows sharing of mount path/library between multiple MediaAgents.  If you want to add a second or third MediaAgent to this Isilon disk library mount path, simply click "share" and add the other Windows MediaAgents using the same UNC path and credentials.  MediaAgents will each need their own storage policy for load balancing but can all share the same Isilon mount path. 

 

Isilon mount path "general" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.56.47 AM.png

 

Isilon mount path "allocation policy" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.56.53 AM.png

 

Isilon mount path "sharing" properties - CommVault webUI

Screen Shot 2017-07-19 at 10.57.08 AM.png

 

Creation of a Storage Policy

 

We now have an Isilon disk library owned by a Windows MediaAgent and have seen all the properties for our various configuration options.  How do we now start using the library?


We have to create a CommVault "storage policy" to send client backups to the Isilon through the Windows MediaAgent(s).  This is simply a shared policy that clients use to send backups to a library through a specific MediaAgent, with a few attributes like the backup retention time.  See the wizard screenshots below for an example; this is the basic configuration required for any storage library.  The "data aging" (or pruning) process is a scheduled job that runs and enforces each storage policy's retention time by deleting expired backups.

 

Wizard walkthrough:

 

Data protection and archiving - We will create a storage policy for backups ("data protection and archiving"); the other option is a DR backup policy for the CommServe database.

 

Storage policy name - self-explanatory, leave the other options unchecked

 

Select library - select the Isilon disk library we created earlier

 

MediaAgent - select the MediaAgent that will push data from the client to the Isilon for this storage policy.

 

Here is an important point to clear up any confusion.  Multiple MediaAgents can all share the same Isilon disk library.  However, we would typically assign a single MediaAgent to a single storage policy.  Then assign clients to various storage policies to load balance across MediaAgents.  So use multiple storage policies each with a unique MediaAgent (all using the Isilon library) and split your client backups between these storage policies.

 

Streams and retention criteria - each stream will spawn a new SMB connection to the Isilon library; leave the default, it can be tuned later.  The retention policy is how long this storage policy keeps backups, measured in days and cycles.  A cycle is a full backup plus its incrementals; a second cycle starts when the next full is run.  Again, the data aging (pruning) job enforces the retention settings on a storage policy by deleting old backups.

 

Another important point to clear up confusion: performance bottlenecks are a fact of life.  Once you have a significant number of clients running backups (say 100, 200, 500, etc.) there will be a bottleneck somewhere in the architecture that slows things down.  This could be the MediaAgent (CPU, RAM, DDB speed), the network, or the storage library.  Bottlenecks degrade performance and are fixed either by adding more resources (more MediaAgents, more network, more storage resources) or by tuning the system to ease the stress on the bottleneck.  This tuning can be as simple as reducing concurrency (simultaneous client backups) by throttling the number of writers and backup streams.

 

Let's take an example.  Let's say when running nightly backups my MediaAgent CPU is pegged at 100% and the host becomes unresponsive.  I'd much rather run the system at a maximum of 80% CPU, which gives me a few choices for remediation.  I can invest in another MediaAgent, create another storage policy, and balance clients across multiple MediaAgents to reduce the CPU load on the first MediaAgent.  But what if I don't have another MediaAgent available?  In that case, I can artificially limit (reduce) the writers on the library and/or the streams on the storage policy to ease the concurrency load during backups.  Fewer writers and fewer streams mean less simultaneous backup activity, which could help improve my MediaAgent performance.

 

Let's take another example.  When running my nightly backups my MediaAgents are running fine but my storage system is pegged @ 100% CPU.  Or the disk I/O subsystem is 100% busy, the disk wait time is very high, and the disk queues are more than full.  This causes the client SMB latency to spike dramatically and I'd rather run the storage with less load and better response time.  I can invest in more storage resources and increase performance (add an Isilon node).  But what if I have more than enough capacity and don't want to invest in more storage?  Again, I would decrease the concurrency by tuning (reducing) the max writers on the library or tuning (reducing) the streams on each storage policy.

 

Contact CommVault for help with this type of tuning.

 

Advanced (software encryption) - self-explanatory

 

Deduplication DB - each MediaAgent will use a separate deduplication database (DDB) for each storage policy, stored locally on the MediaAgent.  This database is best hosted on block flash storage (either shared or local); avoid putting it on an Isilon NAS share.

 

Review - review selections and create the storage policy

 

New storage policy - CommVault webUI

Screen Shot 2017-07-19 at 10.58.06 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.58.17 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.58.39 AM.png


New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.58.49 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.58.56 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.59.21 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.59.30 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.59.37 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 10.59.48 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 11.00.30 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 11.00.41 AM.png

 

New storage policy wizard - CommVault webUI

Screen Shot 2017-07-19 at 11.00.47 AM.png

 

Newly created storage policy - CommVault webUI

Screen Shot 2017-07-19 at 11.01.06 AM.png

 

Assign Storage Policy to a client and run a backup

 

The disk library is configured and a new storage policy has been created to use our Windows MediaAgent with our Isilon disk library.  All that has to be done now is to assign a backup client to this storage policy and run a backup.  I've omitted screenshots of this process since it is a standard operation for all CommVault sysadmins. 

 

Linux MediaAgent setup


Isilon works just as well as a CommVault disk library for Linux MediaAgents as it does for Windows MediaAgents.  Both benefit from a scale-out architecture and result in a disk library that automatically grows as Isilon capacity is added.  However, there are some differences between how Windows mounts SMB shares and how Linux mounts NFS shares that impact the sysadmin steps required to create an Isilon disk library for Linux MediaAgents.

 

Linux MediaAgent Differences

 

What is the difference between using a CommVault Windows MediaAgent versus a Linux MediaAgent with an Isilon disk library?  Both take advantage of an Isilon scale-out architecture which allows rapid data growth with minimal administration effort.  However, Linux MediaAgents differ in that they do not dynamically mount a UNC path when they need to write to a disk library like a Windows MediaAgent.  Linux hosts need to mount shared storage with NFS when creating an Isilon disk library on Linux MediaAgents.  This raises some questions and concerns.  How will this work with SmartConnect client load balancing across the Isilon cluster?  How can a single NFS mount take advantage of scale-out storage? Can Isilon bring any benefit to Linux MediaAgents?


Multiple mount paths


We don't want an entire disk library mounted with a single NFS mount on a single Linux host since it will become a bottleneck.  Why?  That configuration would only use a single IP address to NFS mount a single Isilon node and wouldn't balance the workload across all nodes in the Isilon cluster.  The Isilon architecture prefers concurrency of many connections across all Isilon nodes in a cluster, not a single NFS connection to a single node.  So we need our Linux MediaAgent to connect to more than a single Isilon node IP address when writing large amounts of data during a CommVault backup.


Is a single NFS connection to a single Isilon node a bad thing?  It will function fine and protect the data written to OneFS.  The Linux host will connect to an NFS mount over a single IP address to a single Isilon node.  Data written to the NFS mount will be erasure coded and spread across all Isilon nodes in the OneFS nodepool through the back-end cluster network.  However, the process of writing data to the cluster is coordinated by the Isilon node that accepts the initial NFS connection.  This node performs the erasure coding and sends the data to the other nodes using the coordinator node's hardware resources (CPU/RAM/network/InfiniBand).  If this coordinator node becomes overwhelmed with write requests it can become a bottleneck, since it will be very busy while the other nodes in the cluster may not be.  More work can be done if a client connects to multiple Isilon nodes for writing rather than a single node, especially for a large amount of backup data.


The best way to load balance across the Isilon cluster is to configure the Linux MediaAgent to use multiple NFS mounts to multiple Isilon nodes.  SmartConnect will balance the NFS mount requests across the Isilon cluster by returning the client a rotating list of IP addresses that reside in the OneFS IP address pool.  Repeated mount commands will cycle through IP addresses on the Isilon via SmartConnect and multiple Isilon nodes will service client mount read/write requests once the mounts are made. The Linux host will use these fixed IP addresses returned from SmartConnect through the life of each NFS mount until the host is rebooted or the mount unmounted and remounted.

 

Mount path usage and balancing


Can a CommVault MediaAgent take advantage of multiple NFS mount points for backups?  Absolutely!  Remember, CommVault MediaAgents also support block SAN storage, which is usually allocated as multiple LUNs, each formatted with a host filesystem.  The software needs a method for the MediaAgent to spread backups across several separate local filesystems, which could be local DAS, SAN, or, in the case of Linux, hard NFS mounts.  CommVault takes care of this local mount path load balancing in the MediaAgent software and can utilize multiple mount paths efficiently.


This load balancing feature is a property of the disk library.  Multiple paths can be balanced by using a "fill and spill mount paths..." or a "spill and fill mount paths..." algorithm to use multiple block or NAS filesystems on a MediaAgent.  "Fill and spill..." simply means fill the first mount path (capacity) and then start using the next mount path until it is full.  "Spill and fill..." is the opposite, where each mount is used in more of a round-robin fashion and all mounts are filled roughly equally.  See the screenshots later in the configuration walkthrough that demonstrate the mount path option.  We are going to select "Spill and fill..." for Isilon because we want to use the multiple rotating mount paths for I/O load balancing.  The goal is to spread the I/O across multiple Isilon nodes, which means creating multiple mount paths that are balanced across the Isilon cluster by SmartConnect.

 

Performance tuning of clients to use multiple mount paths


Disk libraries with multiple mount paths and a round-robin type load balancing configuration may need some performance tuning of the backup clients.  Why?  A single backup stream will only use a single MediaAgent mount point.  More specifically, each backup client will have "readers" (read threads) that write to MediaAgent mount paths via "writers" (write threads).  A large backup with a single reader won't use our multi-mount path round-robin Linux MediaAgent to its full potential.  So we may need to tune clients with large amounts of backup data to take advantage of multiple paths.  We do this by adjusting the number of client readers and possibly also allowing multiple readers to access a single client volume/mount.

 

Before you start tuning, think about the concurrency in your architecture.  Will this Linux MediaAgent funnel backup data to the disk library from a single backup client?  Several backup clients?  Many backup clients?  Client tuning is probably not necessary with many clients; they will naturally balance across the Linux MediaAgent mount paths using the "spill and fill" algorithm.  However, a single client or only a few clients may need tuning since they may not use all MediaAgent mount paths equally.

 

Let's look at an example.  Say you have four (4) Isilon NFS mount paths on a single Linux MediaAgent.  A single (Windows) client backup using this MediaAgent will default to two (2) readers and will not use multiple readers within a single drive letter or volume mount.  So at most we only use two (2) mount paths out of the four (4) on the Linux MediaAgent.  Performance tuning is necessary in this case; we want to use all mount paths for load balancing.

 

Take another example.  Say you have the same four (4) Isilon NFS mount paths on the Linux MediaAgent but instead you have twenty (20) clients to back up.  Clients will default to two (2) readers each, but since we have more clients we will naturally load balance across the mount paths without any performance tuning.  More clients and more concurrency will most likely not require any performance tuning since the multiple mount paths will be more fully utilized.

 

Disk library capacity reporting and OneFS quotas


Isilon NFS exports without quotas can seem confusing.  Say I have 1PB of total capacity in my Isilon cluster.  I create an NFS export with no quota and mount it from a Linux host.  The Linux host will see the entire 1PB of capacity from the NFS client mount when running a 'df' command.  Now create several more NFS exports on the same Isilon and mount them from the same Linux host.  Linux will see 1PB of total capacity for each of the NFS mounts made from the Isilon.  The used/available/total capacity will adjust with usage but each NFS mount will report the entire capacity of the Isilon cluster.  This is why we use the OneFS SmartQuota feature, to control the export size as reported by the end client. 

 

A CommVault disk library will view the total capacity of the library as the sum of its mount paths. In our example above, we have a 1PB Isilon without quotas.  Each client NFS mount will report the entire 1PB filesystem capacity.  What if the Linux host mounted four (4) NFS mounts for the CommVault disk library?  Without quotas, Linux would report 4PB across the four (4) paths and CommVault would report that the disk library was 4PB in total capacity.  Each separate NFS mount will magnify the total library capacity without quotas.  This is not necessarily a problem, backups will work fine.  But this can get confusing if the storage team is managing Isilon capacity using OneFS and the CommVault team is managing capacity from the CommVault webUI.
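To make the magnification concrete, here is roughly what the four NFS mounts used later in this post might look like on a 1PB cluster with no quotas.  The output below is purely illustrative (the capacity numbers are invented for the example), but notice that every mount reports the full cluster size:

# df -h | grep commvault
isilon.keith.com:/ifs/linuxma/mountone    1.0P  210T  814T  21% /mnt/commvaultM1
isilon.keith.com:/ifs/linuxma/mounttwo    1.0P  210T  814T  21% /mnt/commvaultM2
isilon.keith.com:/ifs/linuxma/mountthree  1.0P  210T  814T  21% /mnt/commvaultM3
isilon.keith.com:/ifs/linuxma/mountfour   1.0P  210T  814T  21% /mnt/commvaultM4

CommVault sums the four mount paths and reports a 4PB disk library even though the cluster only holds 1PB.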

 

What if you don't want each NFS mount to assume it owns the entire Isilon cluster capacity?  What if you don't want the mount path magnification inflating the total library size by the number of mount paths?  Then use quotas.  Quotas on Isilon are very granular and can be set on any directory in the OneFS filesystem.  A quota can be placed on the NFS export's OneFS directory or on any subdirectory within that export.  Want a disk library to start with 400TB of total capacity?  Put a 100TB hard quota on the NFS export parent directory and each mount path will be 100TB in size, so four (4) mount paths will create a 400TB CommVault disk library.  Or put a 100TB quota on each mount path nested under the NFS export parent directory.


Hard directory quotas are required to force the Linux NFS mount to report the quota size.  When utilization of the NFS exports starts to get too high, simply increase the quotas to provide more capacity.  The CommVault Linux MediaAgent queries the mount path every 30 minutes and updates the capacity utilization, so increase the quota, wait 30 minutes, and your disk library will grow in size.  There is very minimal effort required to grow a library: no additional NFS exports, no additional LUNs, no Linux or CommVault sysadmin work necessary.
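A sketch of what that looks like from the OneFS CLI, using the mount path directories from the lab example later in this post and a quota per mount path (the second option above).  The flags are approximate and may vary by OneFS version, so verify with 'isi quota quotas create --help'; the container option, where available, is what makes the NFS mount report the quota size rather than the cluster size:

# isi quota quotas create /ifs/linuxma/mountone directory --hard-threshold=100T --container=yes

Repeat for each mount path directory.  Growing the library later is just a quota change, for example:

# isi quota quotas modify /ifs/linuxma/mountone directory --hard-threshold=150T

Within about 30 minutes of the quota increase the CommVault disk library reflects the new size.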

 

Does Isilon still benefit Linux MediaAgents?


Does Isilon still benefit Linux MediaAgents?  Does scale-out work better than scale-up for this use case?  The initial configuration of a Linux MediaAgent with SAN or scale-up NAS will be similar to Isilon: allocate a few LUNs or NFS exports to a Linux host and mount them for use with CommVault.  The difference comes when we need to grow the library.  Isilon requires very little effort.  The NFS exports to the Linux hosts will always have capacity as long as the Isilon capacity is managed per best practices.  Isilon NFS exports will dynamically grow with the Isilon cluster and require little to no sysadmin work.  Add Isilon nodes to the cluster and the shares get larger.  If quotas are used, increase a quota to dynamically grow an NFS export and the disk library automatically.  Capacity allocation is performed only once during the initial configuration, which is a unique benefit; SAN or scale-up NAS will always require more sysadmin work to grow the environment when capacity is running low.


Scale-out also adds performance with capacity.  SAN or scale-up NAS will scale capacity up to a certain point when the array controller becomes a bottleneck and can no longer service additional workloads.  As a CommVault environment grows, it needs more performance not less.  A disk library that grows from 1PB to 2PB will need more performance with the additional capacity.  Since Isilon is scale-out we automatically get this performance bump with each capacity addition.  This is a much simpler solution than SAN or scale-up NAS for a customer and avoids having multiple storage arrays for a single disk library or performance troubleshooting of certain filesystems, RAID groups, aggregates, etc.

 

General OneFS setup


Linux MediaAgents rely on a few key features in OneFS, primarily SmartConnect, which we discussed earlier for Windows MediaAgents.  Get SmartConnect working before you start any CommVault configuration.  Make sure your Linux MediaAgent receives unique "round-robin" IP addresses from the Isilon with consecutive 'nslookup' commands.

 

Windows MediaAgents should use a OneFS "static" IP address pool on Isilon.  Linux MediaAgents should use a OneFS "dynamic" IP address pool.  Why?  SMB2 is a stateful protocol and works best with fixed IP addresses that remain associated with a specific ethernet interface on an Isilon node.  NFSv3 is a stateless protocol and can benefit from "floating" IP addresses that move between interfaces when nodes get added or removed from the Isilon cluster (or rebooted).  So if you are sharing the Isilon cluster between both Windows and Linux MediaAgents, create a static IP address pool for Windows MediaAgents with a unique SmartConnect zone name.  Create a separate dynamic IP address pool for Linux MediaAgents with a separate unique SmartConnect zone name.  Both pools can be on the same subnet, but it is best to separate them and use static IPs for SMB2 (Windows) and dynamic IPs for NFSv3 (Linux).
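If you are carving out a separate dynamic pool for the Linux MediaAgents, it looks roughly like the following from the OneFS CLI.  This is only a sketch: the pool name, interface list, IP range, and SmartConnect zone name are made up for illustration, and the exact flags vary by OneFS version, so check 'isi network pools create --help' (the same settings can be made in the webUI network configuration pages):

# isi network pools create groupnet0.subnet0.pool-nfs --ranges=192.168.0.60-192.168.0.63 --alloc-method=dynamic --sc-dns-zone=isilon-nfs.keith.com --ifaces=1-4:10gige-1

The Windows MediaAgents keep using the existing static pool and its SmartConnect zone name, while the Linux MediaAgents mount NFS using the new dynamic zone name.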

 

The default "round-robin" SmartConnect client connection balancing policy is most likely best for most CommVault deployments.   Linux MediaAgents will mount their NFS shares infrequently and probably won't query SmartConnect very often.  The policy can always be adjusted if necessary but it's best to keep the default "round-robin" and baseline performance prior to making any changes.

 

Create a CommVault OneFS NFS export

 

OneFS is flexible about how hosts mount NFS exports.  Hosts can mount a OneFS NFS export directly or can mount subdirectories within the parent NFS export.  Linux MediaAgents will benefit from load balancing I/O across multiple Isilon mount paths.  So we have a choice: we can create an NFS export for each CommVault mount path, or create a single parent NFS export for the Linux host and individually mount export subdirectories for each mount path.  Either will work, but we can save some time by creating a single export per Linux host and then creating subdirectories within this parent export for the Linux host to mount as CommVault mount paths.

 

How many Isilon mount paths should be used by a Linux MediaAgent?  It depends!  There are no simple answers here; the number of Isilon nodes and the horsepower of the Linux hosts will determine how many mount paths are optimal.  A Linux host may be very busy with only a few mount paths and not need more I/O paths to the Isilon.  On the other hand, the number of mount paths may be the bottleneck and the host may be able to push more data with more mount paths.  There is no broad statement to make here; talk to CommVault to get official sizing for the application.  CommVault can estimate the amount of data the MediaAgent is capable of pushing to a disk library, which can then be spread across a few mount paths.  There is also a feature in CommVault to "validate" a mount path, which tests I/O.  Get a baseline for each mount path and size accordingly for the total number of mount paths required.
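Outside of CommVault's validate feature, a crude sequential-write check per mount path can also help establish a baseline.  A sketch, assuming the mount paths created later in this walkthrough; oflag=direct bypasses the client page cache so the number reflects the network and storage rather than local RAM, and the test file is removed afterwards:

# dd if=/dev/zero of=/mnt/commvaultM1/ddtest.bin bs=1M count=4096 oflag=direct
# rm -f /mnt/commvaultM1/ddtest.bin

Repeat per mount path.  This is only a rough single-stream number; real backup throughput depends on concurrency across paths and clients.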


Don't bypass SmartConnect by mounting the Isilon cluster by IP address in an attempt to manually load balance I/O.  Dynamic IP addresses will shuffle across Isilon node interfaces over time due to node additions, removals, and reboots; the IP address residing on a particular Isilon node will eventually move to another node.  Use the SmartConnect DNS name when mounting NFS shares to allow the cluster to load balance connections.

 

In my example below I have a single Linux MediaAgent.  I have created a parent directory for this MediaAgent (/ifs/linuxma) and four (4) subdirectories under this parent directory for my Linux host to mount, which will give me four (4) mount paths to present to CommVault.  The parent directory is exported with the default options.  The Isilon sysadmin work is done; we configure the Linux host next.

 

Isilon CLI


# pwd

/ifs/linuxma

eightdotfour-1# ls

mountfour mountone mountthree mounttwo

 

# isi nfs exports view 2

                     ID: 2

                   Zone: System

                  Paths: /ifs/linuxma

            Description: -

                Clients: -

           Root Clients: -

      Read Only Clients: -

     Read Write Clients: -

               All Dirs: Yes

             Block Size: 8.0k

           Can Set Time: Yes

       Case Insensitive: No

        Case Preserving: Yes

       Chown Restricted: No

    Commit Asynchronous: No

Directory Transfer Size: 128.0k

               Encoding: DEFAULT

<snip>
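For reference, the directory layout and export shown above could be created from the OneFS CLI with something along these lines.  The exact 'isi nfs exports create' syntax varies a little by OneFS version (and the export can just as easily be created in the webUI), so treat this as a sketch.  Note that "All Dirs" is enabled in the export view above, which is what allows the subdirectories to be mounted directly.

# mkdir -p /ifs/linuxma/mountone /ifs/linuxma/mounttwo /ifs/linuxma/mountthree /ifs/linuxma/mountfour
# isi nfs exports create /ifs/linuxma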

 

CommVault Setup of an Isilon disk library


Configuring a Linux MediaAgent is a two-step process.  We first mount the NFS shares from the Linux CLI and then finish by configuring the disk library in the CommVault webUI.

 

Mounting Isilon NFS export mount paths on Linux

 

I'll use /mnt to mount the Isilon exports, and I've created a subdirectory under /mnt for each of the four (4) Isilon subdirectories under my single NFS export.  The /mnt directories "commvaultM1", "commvaultM2", "commvaultM3", and "commvaultM4" will mount the "mountone", "mounttwo", "mountthree", and "mountfour" directories under the /ifs/linuxma export.  The OneFS /ifs/linuxma/mountone directory gets mounted as /mnt/commvaultM1, /ifs/linuxma/mounttwo as /mnt/commvaultM2, and so on.

 

Each mount command will cycle through the Isilon SmartConnect IP pool, assuming SmartConnect is working correctly.  A single IP address shared by all mounts will not load balance I/O; the mounts each need their own IP address returned from SmartConnect.  My lab has an IP address pool range of 192.168.0.51-53, and each of my mount commands should land on a different IP address from that pool.  Dynamic IP addresses get shuffled across Isilon nodes over time; the distribution doesn't have to be perfectly balanced across the Isilon cluster, but it should be balanced enough to avoid overwhelming a single Isilon node with write I/O.
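
A quick sanity check that SmartConnect is rotating through the pool is to resolve the SmartConnect zone name a few times in a row; each query should return a different address from the 192.168.0.51-53 range.

# for i in 1 2 3; do nslookup isilon.keith.com; done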


What mount options should I use for CommVault NFS?  I don't use any below; I'd rather get a baseline for performance and possibly adjust the mount options later.  Again, CommVault provides a function to "validate" a mount path, so test I/O with various mount options if you'd like to see whether any provide performance improvements.  Start with the basics and make tuning adjustments later.
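
If you want the mounts to come back after a reboot, /etc/fstab entries along these lines would work; this is just a sketch using default NFS options, consistent with the no-tuning baseline above.

isilon.keith.com:/ifs/linuxma/mountone    /mnt/commvaultM1  nfs  defaults  0 0

isilon.keith.com:/ifs/linuxma/mounttwo    /mnt/commvaultM2  nfs  defaults  0 0

isilon.keith.com:/ifs/linuxma/mountthree  /mnt/commvaultM3  nfs  defaults  0 0

isilon.keith.com:/ifs/linuxma/mountfour   /mnt/commvaultM4  nfs  defaults  0 0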

 

Linux MediaAgent CLI

 

# ls -lah /mnt

total 4.0K

drwxr-xr-x.  6 root root   78 Jul 31 14:45 .

dr-xr-xr-x. 17 root root 4.0K Jul 31 11:53 ..

drwxr-xr-x   2 root root    6 Jul 31 12:04 commvaultM1

drwxr-xr-x   2 root root    6 Jul 31 14:45 commvaultM2

drwxr-xr-x   2 root root    6 Jul 31 14:45 commvaultM3

drwxr-xr-x   2 root root    6 Jul 31 14:45 commvaultM4

 

# mount isilon.keith.com:/ifs/linuxma/mountone /mnt/commvaultM1

# mount isilon.keith.com:/ifs/linuxma/mounttwo /mnt/commvaultM2

# mount isilon.keith.com:/ifs/linuxma/mountthree /mnt/commvaultM3

# mount isilon.keith.com:/ifs/linuxma/mountfour /mnt/commvaultM4

 

# mount

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)

<snip>

isilon.keith.com:/ifs/linuxma/mountone on /mnt/commvaultM1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.53,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.0.53)

isilon.keith.com:/ifs/linuxma/mounttwo on /mnt/commvaultM2 type nfs (rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.51,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.0.51)

isilon.keith.com:/ifs/linuxma/mountthree on /mnt/commvaultM3 type nfs (rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.53,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.0.53)

isilon.keith.com:/ifs/linuxma/mountfour on /mnt/commvaultM4 type nfs (rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.52,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=192.168.0.52)

 

Add Isilon disk library

 

Use the CommVault webUI to complete the configuration of the Linux MediaAgent disk library.  I've included a screenshot of the Linux MediaAgent properties to start with; I have a CentOS 7 host installed with the CommVault MediaAgent software.

 

Following the screenshots below, we add a disk library, give it a name, associate it with an existing Linux MediaAgent, and add the first Isilon NFS mount path.  A single mount path is added during the initial creation; the additional mount paths are added immediately after with the "add mount path" GUI dialog.  The last screenshot is a summary of all four (4) Isilon NFS mount paths added to this disk library associated with the Linux MediaAgent.

 

LinuxMediaAgent properties - CommVault webUI

Screen Shot 2017-08-01 at 10.19.51 AM.png

 

Add disk library - CommVault webUI

Screen Shot 2017-08-01 at 10.20.03 AM.png

 

Add disk library - name and select MediaAgent - CommVault webUI

Screen+Shot+2017-08-01+at+10.20.28+AM.png

 

Add disk library - browse for local path - CommVault webUI

Screen Shot 2017-08-01 at 10.20.40 AM.png

 

Add disk library - select first mount path - CommVault webUI

Screen Shot 2017-08-01 at 10.20.53 AM.png

 

Add disk library complete - CommVault webUI

Screen Shot 2017-08-01 at 10.21.04 AM.png

 

Add additional mount path (repeat for all available mount paths)  - CommVault webUI

Screen Shot 2017-08-01 at 10.23.26 AM.png

 

Final view of all mount paths - CommVault webUI

Screen Shot 2017-08-01 at 10.24.30 AM.png

 

Change library properties

 

Our disk library now has all mount paths added and its full capacity available.  We now need to adjust the load balancing algorithm to balance I/O across all mount paths equally.  This setting can be viewed from the "mount paths" tab of the library properties.  Change the "mount path usage" from "fill and spill..." to "spill and fill...".  "Spill and fill..." cycles I/O across the mount paths equally rather than filling up a single mount path before moving to the next.

 

After making this change, the disk library is ready to use.  I have not included screenshots, but simply create a storage policy like we did earlier using this Linux MediaAgent and disk library.  Then assign the storage policy to a backup client to start using the library and MediaAgent.

 

Isilon library properties - CommVault webUI

Screen Shot 2017-08-01 at 10.27.15 AM.png

 

Mount path properties for Isilon library - "Fill and spill" - CommVault webUI

Screen Shot 2017-08-01 at 10.27.25 AM.png

 

Mount path properties for Isilon library - change to "Spill and fill" - CommVault webUI

Screen Shot 2017-08-01 at 10.27.37 AM.png

 

Performance tuning - readers and writers

 

As discussed earlier, you may want to tune client performance if you only have a single backup client or a small number of backup clients.  Adjust performance when you don't have a large degree of concurrency (many backup clients) or aren't fully utilizing the available Linux mount paths.  The default for a Windows client is two (2) readers (threads), and by default multiple readers per drive or mount point are not allowed.  So if you want a client to run more than two threads, or to run multiple threads per filesystem, see the steps below.

 

In my example, I have four (4) mount paths configured on my Linux MediaAgent disk library.  I'm configuring a Windows client to back up to the Linux MediaAgent and Isilon disk library.  The default settings on the Windows backup client will not take advantage of the four (4) paths (they default to two (2) readers and do not allow multiple readers per mount).  So we are going to adjust the number of readers to four (4) and allow multiple readers per drive/mount.

 

My client is going to back up a single Windows directory with a single large file and some small files.  The default settings would only run a single stream, since my content resides on a single drive, and would only use a single MediaAgent mount path.  By tuning performance I can drive my backup data from a single Windows client across all four (4) Isilon mount paths.

 

The screenshots below also show a running and a completed backup of the Windows client after making the performance tuning adjustments.  The running job shows four (4) streams, one to each Isilon mount path, and the job history shows four (4) streams.  I have a single large ~600MB file on my Windows client which is sent to a single mount path; the other mount paths are used for the small files within the client backup content.  A single large file is not split across multiple mount paths in this configuration, it is sent to a single path.


There is a feature in CommVault to "view contents" of each mount path, which will show unique data in our configuration.  Each stream sends unique data to its mount path, which can be validated by viewing each mount path's contents (no screenshots).

 

Subclient properties - advanced button - CommVault webUI

Screen Shot 2017-08-01 at 10.30.51 AM.png

 

Performance tab - advanced sub client properties - CommVault webUI

Screen Shot 2017-08-02 at 2.48.39 PM.png

 

Performance tab - change advanced sub client properties - CommVault webUI

Screen Shot 2017-08-01 at 10.31.01 AM.png

 

Running backup of client with performance changes - streams - CommVault webUI

Screen+Shot+2017-08-01+at+10.32.38+AM.png

 

Client backup completed job details - CommVault webUI

Screen Shot 2017-08-01 at 10.34.22 AM.png

 

Disk library capacity and quotas


Isilon NFS shares without quotas can be deceiving to CommVault, as we mentioned earlier.  Every NFS mount will "see" the entire Isilon cluster capacity if no quotas are used.  Mount several NFS exports from an Isilon cluster and each will appear to own the entire cluster capacity.  Combine those NFS mounts into a disk library and it will appear to be the size of the entire Isilon cluster multiplied by the number of mount paths.

 

Does this really matter?  CommVault will use a mount path until the free space on the mount path reaches the reserved space set on that path.  By default, CommVault reserves 2GB per mount path and will stop writing to a mount path when it hits this 2GB free reserve.  Keep the Isilon total capacity healthy per best practices and the NFS mounts will never run out of space; they will continue to grow as Isilon nodes are added for additional capacity.  The disk library, when viewed from the CommVault software, will appear to be larger than the physical Isilon, but there is no loss of functionality and backups will run just fine.

 

But what if I don't want the disk library to appear larger than the available Isilon capacity?  What if I want to put a limit on how much data the disk library can write to the Isilon?  Both are reasonable concerns and are why the SmartQuotas license is recommended for CommVault customers using Isilon.  Use quotas to limit the capacity of the NFS export, either at the parent directory or at the subdirectory level for Linux mount paths.  Hard directory quota limits are required to force Linux to see the NFS mount as a fixed size dictated by the quota.  CommVault will stop writing to the mount paths when the used capacity reaches the hard quota limit (minus the 2GB reserve).

 

What if I fill all mount paths and need to increase the capacity of the disk library?  Hard quotas will run out of space eventually, right?  Yes, just increase the quota and the export will automatically grow on both the Linux host and on the CommVault disk library mount paths.  CommVault checks for disk space updates every 30 minutes by default, so increase the quota and give it some time; the available capacity will increase automatically.
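
For reference, growing a hard quota is a one-liner from the OneFS CLI.  This is a sketch assuming OneFS 8.x 'isi quota quotas' syntax; verify the exact flags with 'isi quota quotas modify --help' on your cluster.

# isi quota quotas modify /ifs/linuxma directory --hard-threshold=20G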

 

Linux mount paths with no quotas

 

In my example below, I have an Isilon cluster that is ~56GB in total capacity and has ~20GB of free space.  I use four (4) NFS mounts from my Linux client when creating the disk library, so I have four (4) mounts each seeing the entire capacity and free space of the Isilon cluster by default with no quotas.  CommVault will report 4 x 56GB total capacity for the disk library (224GB total) and 4 x 20GB free space (80GB total).  CommVault reports these totals as capacity consumed and capacity free across all the library mount paths.

 

This will work fine but is confusing to the Linux and CommVault sysadmins.  The capacity free and consumed will update from the Isilon every 30 minutes, so any Isilon cluster capacity increases will automatically grow the shares.  And the consumed/free will be accurate for each individual mount path; the disk library "disk usage" tab will just be misleading since it totals all mount paths associated with the library and magnifies the actual capacity.

 

Isilon disk library properties - no quotas - CommVault webUI

Screen Shot 2017-08-02 at 4.27.30 PM.png

 

Isilon disk library mount paths - no quotas - CommVault webUI

Screen Shot 2017-08-02 at 4.27.44 PM.png

 

Adding OneFS quotas to Linux mount paths

 

I set the quota below on the Linux MediaAgent parent NFS export directory (/ifs/linuxma) to 10GB.  This could also be done on each of the individual mount paths, but it saves time to set a single quota at the parent directory.  Each of the NFS mounts now appears to the Linux host as 10GB total per mount instead of the 56GB seen previously with no quota.  See the CLI output for my quota on the Isilon cluster and then the Linux 'df' output for the NFS mounts.
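
For reference, creating that 10GB hard quota from the OneFS CLI might look like the sketch below.  I'm assuming OneFS 8.x 'isi quota quotas' syntax and the container/enforced flag names; verify with 'isi quota quotas create --help' on your cluster and compare against the quota view output that follows.

# isi quota quotas create /ifs/linuxma directory --hard-threshold=10G --container=true --enforced=true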

 

After waiting approximately 30 minutes, the CommVault webUI updates the library properties and the mount path disk space usage.  Notice how the CommVault webUI now sees the total library space as 40GB (4 x 10GB mounts) and has also updated the mount paths to show 10GB per mount path.  Ignore the 64% consumed above and the 0% consumed below; I had to delete a few things between screenshots.

 

Quota on LinuxMA root directory - OneFS CLI

 

# isi quota quota view /ifs/linuxma directory

                       Path: /ifs/linuxma

                       Type: directory

                  Snapshots: No

Thresholds Include Overhead: No

                      Usage

                          Files: 21

                  With Overhead: 134.50k

                   W/O Overhead: 32.66k

                       Over: -

                   Enforced: Yes

                  Container: Yes

                     Linked: -

                 Thresholds

                 Hard Threshold: 10.00G

                  Hard Exceeded: No

             Hard Last Exceeded: 1969-12-31T19:00:00

                       Advisory: -

              Advisory Exceeded: No

         Advisory Last Exceeded: -

                 Soft Threshold: -

                  Soft Exceeded: No

             Soft Last Exceeded: -

                     Soft Grace: -

 

Linux MA CLI

 

# df -h /mnt/*

Filesystem                                Size  Used Avail Use% Mounted on

isilon.keith.com:/ifs/linuxma/mountone     10G     0   10G   0% /mnt/commvaultM1

isilon.keith.com:/ifs/linuxma/mounttwo     10G     0   10G   0% /mnt/commvaultM2

isilon.keith.com:/ifs/linuxma/mountthree   10G     0   10G   0% /mnt/commvaultM3

isilon.keith.com:/ifs/linuxma/mountfour    10G     0   10G   0% /mnt/commvaultM4

 

Isilon disk library properties - with quotas - CommVault webUI

Screen Shot 2017-08-02 at 4.23.57 PM.png

 

Isilon disk library mount paths - with quotas - CommVault webUI

Screen+Shot+2017-08-02+at+3.07.12+PM.png

Summary

 

Isilon for CommVault is easy.  Create a single SMB share for all your Windows MediaAgents to mount via UNC path and you are done.  Create an NFS export for each of your Linux MediaAgents and NFS mount a few mount paths for load balancing.  Grow the Isilon cluster and both your Windows and Linux MediaAgent disk libraries grow automatically.  Your sysadmins will have much less busy work compared to a SAN or scale-up NAS alternative.


Thanks for reading, comments welcome!

Introduction to the Isilon Data Insights Connector

 

Are you the "do it yourself" type of Isilon sysadmin?  Have you ever looked at the OneFS API documentation and wondered how you could build your own monitoring and performance visualization tool?  Well, you are in luck!  The Isilon Data Insights Connector has been shared by Isilon Engineering on GitHub (https://github.com/Isilon/isilon_data_insights_connector) and allows you to collect monitoring and performance statistics from Isilon OneFS clusters, push those stats to an InfluxDB time-series database, and display the data using Grafana visualization dashboards.  In this post I'll explore why you would deploy a custom dashboard solution, briefly describe the components, and then run through the steps to implement it.

 

Why would I create my own performance monitoring tools when I already have Isilon InsightIQ?  Make no mistake, InsightIQ (https://www.emc.com/collateral/software/data-sheet/h8317-ds-isilon-insightiq.pdf - IIQ for short) is the preferred and supported method of monitoring all of your Isilon clusters.  InsightIQ is prebuilt and does not require much effort to set up - simply deploy the VMware IIQ OVA, point it at your Isilon clusters, and you are ready to monitor both performance and OneFS filesystem analytics.  IIQ allows you to create custom reports from the prebuilt set of metrics collected and even email a report out on a schedule to selected users.  To be very clear, InsightIQ is the supported way to monitor your Isilon clusters.

 

But what if I want a different view of the performance statistics that isn't available in InsightIQ?  What if I want to dig in and query the database where all of the time-series statistics are stored?  What if I want to correlate two different types of statistics over time to explore a relationship that I find interesting?  IIQ data is stored in a database within the IIQ host, but it is not available for customers to query or change.  And while InsightIQ provides many views of both performance and filesystem statistics, it may not give you that one view of a combination of statistics that could help monitor the behavior of an application.  The Isilon Data Insights Connector, when combined with InfluxDB and Grafana, gives you the ability to manage all of the collected data in your own open source database and tweak the visualization dashboards all you like.


Screenshot - Isilon Data Insights "Cluster Summary" Grafana dashboard

Screen Shot 2017-01-20 at 2.28.00 PM.png


Components of Isilon Data Insights Connector

 

The OneFS API (https://www.emc.com/collateral/TechnicalDocument/docu66301.pdf) allows access to Isilon cluster configuration and cluster data, including performance metrics, through a REST interface.  The Isilon Data Insights Connector is simply a set of Python scripts that control a daemon process which queries Isilon clusters for these metrics via the OneFS API.  Once queried, the Data Insights Connector needs somewhere to store the data, which is handled by InfluxDB (https://www.influxdata.com/time-series-platform/influxdb/).  InfluxDB is a time-series database that stores the queried performance statistics with their associated time stamps.  Lastly, we use Grafana (http://grafana.org/) to visualize this time-based performance data with dashboards that can be customized (see screenshot above).

 

Pretty simple.  The connector pulls data from OneFS via the API, stores it in an open source database (InfluxDB), and then we use an open source visualization tool (Grafana) to view the data.  All the components can run on a single Linux VM, which I show how to install in the sample steps below.  Another benefit is that you can keep as much data as you like in your open source database, with as much granularity as you like and without any "down-sampling".  This means you won't lose granularity of the data over time to save space, provided you allow your database to grow larger over time.  You can build very detailed dashboards going back as many months as you like if you have the storage capacity in InfluxDB to keep all data indefinitely (by default stored in /var/lib/influxdb/data).
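
Two quick ways to keep an eye on that growth from the Linux VM once everything is installed below: check the on-disk size of the InfluxDB data directory, and list the retention policies on the connector's database (isi_data_insights is the database name the connector uses, which you'll see again when configuring Grafana).

# du -sh /var/lib/influxdb/data

# influx -execute 'SHOW RETENTION POLICIES ON isi_data_insights'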

 

Prerequisites

 

- Existing Isilon cluster(s) with API support (OneFS 7.x, 8.x).  I'm using free Isilon simulators (https://www.emc.com/products-solutions/trial-software-download/isilon.htm).

- A single CentOS 7 minimal ISO install with access to the internet so that we can pull down packages via 'yum'.  I'm using a Linux virtual machine hosted on vSphere 6.

- Isilon Data Insights Connector (https://github.com/Isilon/isilon_data_insights_connector) which we will clone with 'git'.

 

Steps

 

Linux VM Preparation

 

1.  Install CentOS 7 from the minimal ISO.  Why CentOS 7?  Because earlier versions of CentOS bundle Python 2.6 packages into the OS for system configuration use.  You'll need the Python 2.7.x version that ships with CentOS 7 to deploy the Isilon Data Insights Connector successfully in this example.  I'm using the local Python installation rather than a Python virtual environment, which is *not* covered in this post.

 

In my example I've deployed a relatively small CentOS 7 VM for demonstration purposes.  I configured the VM with:

 

  • 4GB RAM
  • 1 virtual socket, 1 core per socket
  • 25GB disk capacity
  • 1 NIC configured during installation with DHCP and IPv4

 

2.  SSH into your new CentOS VM's IPv4 address and update yum.


yum -y update

 

3.  In my lab environment I'm running a DNS name server on my Windows domain controller.  Set your Linux hostname if you did not do so during the initial CentOS installation and make it a fully qualified domain name that resolves as part of your DNS setup (edit /etc/hostname).  My Linux host is named "influxdbgrafana.keith.com".
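
On CentOS 7 you can also set the hostname with hostnamectl instead of editing /etc/hostname by hand:

hostnamectl set-hostname influxdbgrafana.keith.com

hostnamectl status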

 

4.  Edit your /etc/resolv.conf file and add your DNS name server.  Again, I'm using a Windows domain controller for DNS, and I added the domain controller as the nameserver in my Linux host's /etc/resolv.conf file for name resolution.  FYI, DHCP will overwrite your resolv.conf on reboot and you will lose your changes if you don't change the default behavior.  You can handle this a variety of ways; I'm lazy and I force it (a sample resolv.conf follows the command below).


chattr +i /etc/resolv.conf
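
A sample /etc/resolv.conf for this lab; 192.168.0.15 is my Windows domain controller, the same DNS server you'll see in the nslookup output in step 6, so adjust both lines for your environment.

search keith.com

nameserver 192.168.0.15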

 

5.  Install nslookup.


yum -y install bind-utils

 

6.  Test name resolution between your Linux VM and your Isilon cluster SmartConnect DNS names.  Configuring DNS for Isilon is outside the scope of this post, but when DNS is working you should be able to resolve the Isilon cluster names successfully with nslookup.  In my example I have two clusters, demo.keith.com and demo7.keith.com, and both resolve to IP addresses when queried with nslookup.  This step is critical because without name resolution your Linux VM will not communicate with your Isilon clusters (you could use IP addresses, but I'd rather use DNS names).

 

# nslookup demo.keith.com

Server: 192.168.0.15

Address: 192.168.0.15#53

 

Name: demo.keith.com

Address: 192.168.0.53

 

# nslookup demo7.keith.com

Server: 192.168.0.15

Address: 192.168.0.15#53

 

Name: demo7.keith.com

Address: 192.168.0.61

 

7.  Disable SELinux to keep things simple.  Edit your /etc/selinux/config file, set "SELINUX=disabled", then reboot.

 

8.  There should be no firewall daemon running from a CentOS minimal install.  Just to be certain, check with this command (confirm it is not installed by verifying you get a "no such file or directory" error).


systemctl status firewalld

 

9.  Getting the date and time to sync between the Isilon cluster(s) and your Linux VM is critical for time-based graphs.  Without the date/time in sync, things will not work correctly.  All Isilon clusters and the Linux VM should be in the same time zone and pull from the same NTP source server.


I made my Windows domain controller the NTP server for my lab and use it to sync both my Isilon clusters and my Linux VM (via the NTP client installed below).  Configuring NTP on Isilon is out of scope for this post but is easy to do in the Isilon WebUI.


To configure the NTP client on Linux:


  • yum -y install ntp ntpdate
  • edit the /etc/ntp.conf file and add your NTP server name or IP address to the configuration (see the sample line after this list)
  • chkconfig ntpd on
  • ntpdate <your NTP server>
  • systemctl start ntpd
  • ntpq -p
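
The /etc/ntp.conf addition is a single server line.  In my lab it points at the Windows domain controller; I'm assuming it is the same 192.168.0.15 host that serves DNS, so substitute your own NTP source.

server 192.168.0.15 iburst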

 

The last line should show your NTP server was queried and responded with the correct time information.  See my sample output below:


# ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter

==============================================================================

*win-fqus1o8el2m 97.107.128.58    3 u  965 1024  377    0.226   -5.963  13.343

 

Data Insights Connector Installation

 

1.  Install the "git" package that we will use to clone the connector from GitHub.


yum -y install git

 

2.  Create a new directory or 'cd' into the directory where you want the connector files to reside.  I used /opt.


cd /opt

 

3.  Clone the connector from GitHub.


git clone https://github.com/Isilon/isilon_data_insights_connector.git

 

4.  Install the following two packages with yum; they will allow us to install the connector's requirements.


yum -y install epel-release python-pip

 

5.  'cd' into the cloned connector directory and use "pip" to install the requirements in the "requirements.txt" file.  These are the Python packages required for the Isilon connector.


pip install -r requirements.txt

 

6.  Copy the example config file to make it the real config file in the connector directory


cp example_isi_data_insights_d.cfg isi_data_insights_d.cfg

 

7.  Edit the isi_data_insights_d.cfg file and add your Isilon cluster name(s) to the "clusters:" line.  You will need to add the first Isilon cluster name on the SAME line as the "clusters:" string.  If you have a second Isilon cluster to monitor, you can add it to the same line separated by a space (like the example below) or on a new line.  Just make sure the first cluster name is on the same line as "clusters:" and things will work.

 

clusters: demo.keith.com demo7.keith.com

 

8.  Configure a yum repository for InfluxDB and install it with yum.  Edit your Linux /etc/yum.repos.d/influxdb.repo file and add the text:

 

[influxdb]

name = InfluxDB Repository - RHEL $releasever

baseurl = https://repos.influxdata.com/rhel/$releasever/$basearch/stable

enabled = 1

gpgcheck = 1

gpgkey = https://repos.influxdata.com/influxdb.key

 

Then run the commands:

 

yum install -y influxdb

service influxdb start

 

9.  Configure a yum repository for Grafana and install it with yum.  Edit your Linux /etc/yum.repos.d/grafana.repo file and add the text:

 

[grafana]

name=grafana

baseurl=https://packagecloud.io/grafana/stable/el/6/$basearch

enabled=1

gpgcheck=1

gpgkey=https://packagecloud.io/gpg.key https://grafanarel.s3.amazonaws.com/RPM-GPG-KEY-grafana

 

Then run the commands:

 

yum install -y grafana

service grafana-server start

 

10.  Start the Isilon Data Insights Connector with the Python script below and enter the root user credentials for each Isilon cluster when prompted.  You can optionally configure the credentials in the isi_data_insights_d.cfg file on the "clusters:" line; see the notes in the cfg file comments.  You can also decide whether to verify the SSL certificate for each connection; I am answering "n" in my example.  If successful, your output should look similar to what I've posted below.

 

# ./isi_data_insights_d.py start


Please provide the username used to access demo.keith.com via PAPI: root

Password:

Verify SSL cert [y/n]: n

Configured demo.keith.com as version 8 cluster, using SDK isi_sdk_8_0.

Please provide the username used to access demo7.keith.com via PAPI: root

Password:

Verify SSL cert [y/n]: n

Configured demo7.keith.com as version 7 cluster, using SDK isi_sdk_7_2.

Computing update intervals for stat group: cluster_cpu_stats.

Computing update intervals for stat group: cluster_network_traffic_stats.

Computing update intervals for stat group: cluster_client_activity_stats.

Computing update intervals for stat group: cluster_health_stats.

Computing update intervals for stat group: ifs_space_stats.

Computing update intervals for stat group: ifs_rate_stats.

Computing update intervals for stat group: node_load_stats.

Computing update intervals for stat group: cluster_disk_rate_stats.

Computing update intervals for stat group: cluster_proto_stats.

Computing update intervals for stat group: cache_stats.

Computing update intervals for stat group: heat_total_stats.

Configured stat set:

  Clusters: [demo, demo7]

  Update Interval: 300

  Stat Keys: set(['ifs.percent.free', 'ifs.bytes.free', 'ifs.bytes.used', 'ifs.bytes.avail', 'ifs.bytes.total', 'ifs.percent.avail', 'ifs.percent.used'])

Configured stat set:

  Clusters: [demo, demo7]

  Update Interval: 30

  Stat Keys: set(['cluster.protostats.lsass_out.total', 'node.open.files', 'cluster.protostats.siq.total', 'node.clientstats.connected.ftp', 'cluster.net.ext.bytes.out.rate', 'node.clientstats.active.smb2', 'cluster.protostats.siq', 'cluster.protostats.irp.total', 'ifs.ops.out.rate', 'node.clientstats.connected.cifs', 'cluster.cpu.idle.avg', 'cluster.net.ext.packets.out.rate', 'node.ifs.heat.rename.total', 'ifs.bytes.out.rate', 'cluster.protostats.irp', 'node.load.5min', 'node.memory.used', 'cluster.protostats.nlm.total', 'node.clientstats.active.hdfs', 'node.clientstats.connected.nlm', 'cluster.protostats.nfs4.total', 'cluster.protostats.nlm', 'node.ifs.heat.read.total', 'node.clientstats.active.nlm', 'cluster.health', 'cluster.disk.bytes.out.rate', 'cluster.disk.bytes.in.rate', 'node.ifs.heat.link.total', 'cluster.protostats.papi.total', 'node.ifs.heat.setattr.total', 'cluster.protostats.smb2.total', 'cluster.protostats.jobd', 'cluster.net.ext.packets.in.rate', 'cluster.protostats.lsass_out', 'cluster.protostats.hdfs.total', 'node.clientstats.connected.hdfs', 'cluster.disk.xfers.in.rate', 'node.clientstats.active.nfs4', 'node.clientstats.active.ftp', 'node.ifs.heat.deadlocked.total', 'cluster.protostats.nfs4', 'node.clientstats.connected.papi', 'node.ifs.cache', 'cluster.protostats.lsass_in.total', 'node.ifs.heat.lock.total', 'cluster.net.ext.errors.out.rate', 'node.clientstats.connected.siq', 'node.clientstats.active.lsass_out', 'node.ifs.heat.write.total', 'node.clientstats.active.cifs', 'cluster.protostats.nfs', 'cluster.node.count.all', 'node.load.15min', 'cluster.protostats.cifs', 'node.clientstats.active.nfs', 'cluster.net.ext.bytes.in.rate', 'cluster.protostats.nfs.total', 'node.ifs.heat.getattr.total', 'node.ifs.heat.unlink.total', 'cluster.protostats.http', 'node.ifs.heat.blocked.total', 'node.clientstats.active.jobd', 'cluster.protostats.hdfs', 'cluster.protostats.cifs.total', 'cluster.protostats.ftp', 'node.clientstats.active.http', 'node.load.1min', 'cluster.node.count.down', 'cluster.cpu.intr.avg', 'ifs.ops.in.rate', 'cluster.protostats.lsass_in', 'node.memory.cache', 'cluster.cpu.user.avg', 'cluster.disk.xfers.out.rate', 'cluster.protostats.http.total', 'cluster.disk.xfers.rate', 'node.clientstats.active.siq', 'cluster.net.ext.errors.in.rate', 'cluster.protostats.ftp.total', 'cluster.cpu.sys.avg', 'node.ifs.heat.contended.total', 'cluster.protostats.jobd.total', 'node.ifs.heat.lookup.total', 'cluster.protostats.smb2', 'node.clientstats.active.papi', 'node.clientstats.connected.http', 'ifs.bytes.in.rate', 'node.clientstats.connected.nfs', 'cluster.protostats.papi', 'node.memory.free', 'node.cpu.throttling'])

 

Configuring Grafana and importing sample dashboards

 

Everything is up and running, and we now need to configure Grafana to use the InfluxDB database as a data source and then import the four prebuilt sample dashboards.
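
Optionally, before touching Grafana, you can sanity-check from the Linux VM that the connector is writing to InfluxDB.  The influx CLI ships with the influxdb package, and isi_data_insights is the database the connector writes to (and the one we'll point Grafana at below).

# influx -execute 'SHOW DATABASES'

# influx -database isi_data_insights -execute 'SHOW MEASUREMENTS'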

 

1.  Log into Grafana, which runs on port 3000 of the Linux VM (http://ip.address:3000).  The default login and password are both "admin".

 

2.  Grafana will automatically prompt you to create a data source.  See the screenshot below; the data source type should be "InfluxDB" and the database name is "isi_data_insights".  The other options are captured in the screenshot; we are using basic authentication with no credentials (you can change this later if you like).

 

Screenshot - Configure Grafana Data Source

Screen Shot 2017-01-20 at 12.58.06 PM.png

 

3.  Download the zip file of the Isilon Data Insights Connector to the host/laptop you use to connect to the Grafana web page.  Extract the zip file and find the four (4) prebuilt dashboards, which are the four JSON files.  Note their location for the next step.

 

4.  Import the four JSON files as four dashboards.  Click the "new dashboard" button in the upper left-hand corner of the Grafana page and select the "Import" button at the bottom.

 

Screenshot - New Dashboard - Import

Screen Shot 2017-01-20 at 12.49.24 PM.png

 

Click the "Upload .json File" button and browse to the first of the four JSON files from the zip file in step #3.  Repeat for all four JSON files below:

 

grafana_cluster_capacity_utilization_dashboard.json

grafana_cluster_detail_dashboard.json

grafana_cluster_list_dashboard.json

grafana_cluster_protocol_dashboard.json

 

Screenshot - Upload .json file

Screen Shot 2017-01-20 at 12.49.33 PM.png

 

5.  You are done!  Pull up each dashboard to test and look at your clusters.  At first there will be no data; you will need to wait a few minutes for statistics to start populating the database (hint - set the refresh to every 30 seconds in the upper right-hand corner to see data appear as soon as it arrives).

 

Screenshot - Prebuilt Dashboards after import

Screen Shot 2017-01-25 at 5.34.15 PM.png

 

Using Grafana with Isilon Data Insights Connector

 

Prebuilt Dashboards

 

The four prebuilt dashboards give you the ability to do advanced cluster and performance monitoring without doing any customization.

 

Isilon Data Insights Cluster Summary

Single pane of glass view of all your clusters on one screen, with a few key real time metrics for monitoring total nodes, nodes down, alert status, CPU, capacity, NFS throughput/ops/latency, and SMB throughput/ops/latency, all with a user defined refresh rate.

 

Isilon Data Insights Cluster Detail

Detailed view of a single cluster with metrics for CPU, capacity utilization, protocol ops, client connections, open files, network traffic, filesystem throughput, disk throughput, network errors, job engine activity, filesystem events, and cache stats.  The dashboard can be used for real time monitoring with a user defined refresh rate or can be used to go back and look at historical data.  Both options are defined in the upper right hand corner with a time range and refresh rate.

 

Isilon Data Insights Cluster Capacity Utilization Table

Single view of every cluster's capacity utilization level.  Great for monitoring your clusters as they approach 90-95% full.

 

Isilon Data Insights Protocol Detail

Detailed view of a single cluster's protocol activity.  You can filter on a selected protocol (SMB/NFS/FTP/PAPI/SyncIQ) and get very detailed information including protocol connections, protocol ops graphed against cluster CPU, protocol operational mix, throughput, and latency.  Again, this dashboard can monitor real time stats or go back in time and do a detailed analysis of historical data.

 

Customizing Dashboards

 

This is probably what you have been waiting for: how do you edit and customize one of the prebuilt dashboards, or even build your own?  I won't go into every detail, but it's pretty easy since you have access to edit and change everything in the prebuilt dashboards.

 

Let's take a look at one of the charts to see how a graph is constructed.  I'm going to look at "Cluster SMB Operations and CPU", which charts SMB2 operations per second and correlates them with cluster CPU utilization (part of the Isilon Data Insights Protocol Detail dashboard).  Every dashboard and graph can be edited to show how it was built in Grafana.  Just click the graph and a menu pops up above it (screenshot below); select "Edit".

 

Screenshot - Cluster SMB2 Operations and CPU

Screen Shot 2017-01-26 at 8.29.59 AM.png

 

Once you edit a graph you will see the SELECT FROM WHERE style query that was used to query the InfluxDB database and build the graph.  Below you will see the two SELECT statements used for the SMB ops and CPU correlation.  The first query was used for the CPU utilization and the second for the SMB ops chart.

 

Screenshot - Graph Metrics

Screen Shot 2017-01-26 at 8.32.06 AM.png

 

What if I wanted to add a third query to this existing graph?  All I need to do is click the "+Add query" button and then create my own query from the interface.  I'm going to add a very simple query for "node.clientstats.active.smb2" to also graph the active SMB clients against the SMB ops and CPU.  See the screenshot for my added query.

 

Screenshot - Add a third query to an existing graph

Screen Shot 2017-01-26 at 8.38.11 AM.png

If I close the graph metric interface (X on the right) and go back to my main protocol dashboard, I can see the graph now has three metrics, including the active SMB clients I added with my query (screenshot below).

 

Screenshot - SMB ops and CPU with active clients

Screen Shot 2017-01-26 at 8.40.12 AM.png

 

Pretty easy, right?  Getting the queries right takes a bit of work, and the best way to build your own is to copy those used in the prebuilt dashboards.  You can even run queries manually from the Linux command line using the InfluxDB CLI (type 'influx' and then 'help').
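
As a starting point, here is the kind of thing you can run from the influx shell.  The measurement name cluster.cpu.user.avg, the value field, and the cluster tag are assumptions based on the stat keys listed earlier in this post; run SHOW MEASUREMENTS and SHOW TAG KEYS first to confirm what the connector actually wrote in your environment.

# influx

> USE isi_data_insights

> SHOW MEASUREMENTS

> SELECT mean("value") FROM "cluster.cpu.user.avg" WHERE time > now() - 1h GROUP BY time(5m), "cluster"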

 

Summary

 

Isilon InsightIQ is the preferred and supported product for Isilon cluster performance monitoring.  But what if you want to manage all of your Isilon performance data in an open source database and have full access to create your own visualization dashboards?  Then look at the Isilon Data Insights Connector project and use the steps above to build your own Linux VM with the connector, InfluxDB, and Grafana for full control of your Isilon monitoring and performance statistics.  The solution is easy to use and offers many possibilities for custom queries and dashboard views using open source products and without writing any code.  Please provide feedback through the GitHub page for any issues; this helps the community and keeps communications central to GitHub.  Thanks for reading, comments welcome!

Unstructured data and new Gartner magic quadrant

 

Gartner published their first ever Magic Quadrant for Distributed File Systems and Object Storage (Gartner Reprint) and recognized Dell EMC as a leader for both the Isilon (distributed file system) and ECS (object storage) storage platforms.  Unstructured data is growing at a pace that is unnerving; Gartner estimates 40% year-over-year growth in unstructured data for some enterprises.  How can anyone store that much data and keep costs down?  Consolidation of data and storage efficiency are two ways, and Isilon excels at both as seen in the report.

 

                                   gartner-mq-isilon-ecs.jpg

 

The report also predicts that by 2021 "...more than 80% of enterprise data will be stored in scale-out storage systems in enterprise and cloud data centers, up from 30% today".  Unstructured file data already makes up a large percentage of the total capacity stored by enterprises today and in 5 years will only get larger.  This is why both cloud providers and enterprises are looking to distributed scale-out systems rather than traditional NAS architectures to keep up with unstructured file growth.

 

Another platform becoming mainstream for unstructured data is Hadoop.  Hadoop runs its own distributed scale-out filesystem (HDFS) with the ability to store extremely large datasets and process that data using a distributed computing model.  Hadoop is traditionally run using servers with direct attached storage (DAS) but can also run using Isilon shared storage.  Both scale very well since capacity is added in the form of additional nodes (Hadoop compute nodes or Isilon nodes) that also add CPU, RAM, and networking with each capacity addition.  Hadoop and Isilon can combine to help solve this predicted unstructured data explosion with complementary architectures and solutions.

 

HBase

 

So what do you do with all this unstructured data once you store it?  Why keep petabytes of unstructured storage?  Things start to make more sense when you introduce Hadoop ecosystem frameworks and applications that can actually process this unstructured data.  HBase is a popular Hadoop component: a NoSQL database that runs on the Hadoop filesystem (HDFS).  HBase differs from a relational database like Oracle or Microsoft SQL Server in that it is a columnar database that stores database cells as flat files on HDFS, with multiple database hosts "owning" distributed pieces of those files.  Because of this architecture, HBase scales much larger than a typical relational database in terms of capacity and compute resources.  In order to scale, HBase needs a scale-out filesystem like HDFS to store ever-increasing volumes of data.

 

Let's dig a little deeper into HBase so we can understand the architecture before we talk about backups.  HBase scales by using multiple server roles to distribute the computing power over the HDFS filesystem.  The master HBase process (HMaster, coordinated by ZooKeeper) controls the compute-oriented region servers (HBase region servers in the diagram).  HBase database tables are broken up into key ranges called regions, each with a start key and an end key.  The regions are then assigned to region servers across the HBase cluster for a divide-and-conquer approach to scaling.  A metadata table maps all the regions to the various region servers.  So instead of one monolithic database host, we have many region servers that own "chunks" of the database files and can work in parallel to perform distributed jobs.

 

                         hqdefault.jpg

 

Source: An Introduction to Apache HBase - YouTube

 

Client writes to HBase are protected by a Write Ahead Log (WAL) that is written to HDFS and protects data that hasn't yet been committed to permanent storage (think transactional logging).  There are two levels of caching (read = BlockCache, write = MemStore), and data is eventually persisted to flat HFiles on HDFS.  One last concept: everything stored on HDFS is append-only, which makes all I/O a sequential write or a sequential read.  Writes coming into HBase are first appended to the WAL, placed into the MemStore, sorted, and eventually flushed to HDFS in the form of HFiles.

 

So how do I backup HBase?

 

Understanding the HBase architecture should lead you to the conclusion that you can't simply back up the flat HFiles while HBase is running.  A backup of a flat HFile with HBase running will not contain the in-flight data from the WAL or the MemStore unless HBase is shut down.  Shutting down HBase is disruptive to production environments, and backing up entire offline HFiles does not lend itself to any type of incremental backup.  So we need to look at alternative ways of backing up HBase tables while HBase is actively running, ways that capture the in-flight data and give options for incremental backups.

 

Cloudera has a great blog post from a few years ago describing the various approaches to backing up HBase; see the link below.  The backup methods outlined include snapshots, replication, export/copytable, API backups, and the brute-force offline HDFS backup process.  Let's briefly look at each method and decide which to explore when using Isilon as a target.

 

http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/

 

Snapshots

HBase snapshots are quick and easy to run, require no downtime, and are incremental by design.  Snapshots are simply metadata pointers to existing HFile blocks that allow you to roll a table back to a previous snapshot.  Snapshots are local to the HBase Hadoop cluster where they are taken and aren't technically a backup unless they are moved to secondary media.  We'll explore the process of taking snapshots and moving them to an Isilon as a backup method for HBase.

 

Replication

Replication from an active HBase Hadoop cluster to a passive HBase Hadoop cluster works for a disaster recovery scenario, but the same argument applies to replication as to snapshots: unless we move the data to secondary media (like an Isilon), there is a chance of rolling corruption.  We will not detail HBase replication but instead focus on ways to capture HBase table data and move it to an Isilon as a secondary media copy.

 

Export/copytable

Exporting a table is an alternative to snapshots; it exports the data in an HBase table to a plain flat file in HDFS.  The export command takes start time and stop time arguments (Unix time), which allows incremental backups when the commands are scripted.  Exported tables are also portable, which means we can move them to secondary media and still bring them back as a restore copy if necessary.  We will explore table exports to secondary media as a backup solution in addition to snapshots.

 

Copytable is similar to an export but requires interaction with Hadoop ZooKeeper services.  Isilon doesn't natively run ZooKeeper daemons for copytable compatibility, so we will omit copytable in this exercise.

 

API / offline backups

Developers can use various APIs to write custom backups, but we will omit those for this blog post.  Taking HBase offline for flat file backups is too disruptive and not conducive to incremental backups, so it will also be omitted.

 

Why Isilon for HBase backups?

 

So why is Isilon a good solution for Hadoop and specifically for HBase backups?  Why not just stand up a remote Hadoop cluster for backup storage?

 

Please note, my intent is to outline the process of backing up HBase running on HDFS direct attached storage (DAS) to a secondary Isilon NAS storage system.  If you are already using Isilon for shared HDFS storage, this blog post is not relevant!  Running HBase directly on Isilon already gives you independent storage snapshots on both the active and the passive sites, allowing a rollback if rolling corruption occurs.  If you are running HBase on DAS, then read on!

 

Native HDFS protocol on Isilon

Isilon is an enterprise-grade scale-out NAS storage system that can export HDFS shares to Hadoop clusters.  The HDFS protocol is supported on the Isilon by running namenode and datanode daemons within the Isilon OneFS operating system.  Hadoop clusters with an HDFS share on the Isilon can treat the Isilon as a remote Hadoop cluster and move data using the HDFS protocol.  There is no need to build a remote Hadoop cluster alongside an Isilon; the namenode and datanode services are embedded.  Data can be written to and read from the Isilon using HDFS calls, which makes it an ideal backup target for moving data off Hadoop clusters via HDFS.

 

 

                    image3_zpsxvrbpaol.jpg

 

Parallel mapreduce jobs to Isilon

Mapreduce jobs are parallel in nature because they run on distributed Hadoop compute nodes.  Isilon scales in a similar fashion by adding Isilon nodes to grow the cluster; the more Isilon nodes a cluster has, the more CPU, RAM, and ethernet connectivity it contains.  Highly parallelized mapreduce jobs work well with an Isilon because the Isilon distributes this work across multiple Isilon nodes for high bandwidth and efficient use of resources on both the Hadoop and the Isilon cluster.

 

Multi-protocol support

Data written to the Isilon using HDFS can be accessed using the other supported NAS protocols.  If your HBase cluster writes a backup to a remote Isilon using HDFS, the same data can then be shared out via NFS, SMB, HTTP, Swift, or FTP.  This makes your HBase snapshots or exports very portable, because you don't need a Hadoop client to access the data once it is written to the Isilon, and it means your workflow can eliminate the scratch spaces often used to get data from a Hadoop cluster to another location that supports NAS protocols.  Back up HBase tables to the Isilon and then pull them off from a Windows desktop using SMB.  Back up HBase snapshots to the Isilon and pull them off using a remote FTP client.  Use the Isilon to ingest data using its native HDFS protocol capabilities and access that same exact data using any other supported Isilon NAS protocol.

 

Storage efficiency and enterprise grade storage

Isilon is not only convenient for HBase backups, it brings efficiencies and enterprise features that are not available in native HDFS.  Isilon uses erasure coding rather than Hadoop's default 3x mirroring, so there are space savings when storing Hadoop data on Isilon that are not possible on native HDFS.  Isilon is by nature a scale-out filesystem with a single namespace and can grow extremely large simply and quickly by adding additional Isilon nodes (65PB raw at the time of this writing).  Isilon brings the enterprise storage features customers would expect from a Dell EMC solution, including storage snapshots, offsite storage replication, self-encrypting drives, storage quotas, and detailed storage analytics reporting.  So Isilon is not only convenient for Hadoop but brings additional storage benefits that can justify the use of a scale-out NAS system for Hadoop data.

 

The details of HBase backups to Isilon

 

Ok, let's finally get to the details!  First we'll show a bit of the setup and then talk about HBase exports and snapshots stored on an Isilon.

 

Setup

My setup is a Cloudera CDH 5.5 Hadoop cluster along with an Isilon running OneFS 8.0.0.2.  Getting the Isilon working with CDH is straightforward and documented in the Isilon Hadoop starter kit for Cloudera in the link below.  The document is intended for use with VMware and BDE (Big Data Extensions), but I've just used the "prepare Isilon" section and the "functional tests" and skipped the other sections.

 

EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions — EMC Isilon Hadoop Starter Kit for Cloudera …

 

When the Isilon setup is complete, I can issue the following command from any of my Hadoop nodes and the Isilon appears as simply a remote Hadoop cluster, with a DNS name of "cloudera.isilon.keith.com" in my setup.  Any Hadoop command that accepts a remote Hadoop cluster as a command line argument can use this remote Isilon.  All commands return standard HDFS protocol output since Isilon fully supports HDFS natively.

 

For example, if I run an 'ls' against the remote Isilon, I see the HDFS filesystem hosted on that remote Isilon, which looks exactly like a remote Hadoop cluster.

 

CDH node

# hdfs dfs -ls hdfs://cloudera.isilon.keith.com/

Found 6 items

-rw-r--r--   3 root  wheel               0 2016-11-03 08:48 hdfs://cloudera.isilon.keith.com/THIS.IS.MY.FILE

drwxr-xr-x   - hbase hbase               0 2016-11-03 12:28 hdfs://cloudera.isilon.keith.com/hbase

drwxr-xr-x   - root  wheel               0 2016-11-03 12:22 hdfs://cloudera.isilon.keith.com/home

drwxrwxr-x   - solr  solr                0 2016-11-03 08:56 hdfs://cloudera.isilon.keith.com/solr

drwxrwxrwt   - hdfs  supergroup          0 2016-11-03 08:56 hdfs://cloudera.isilon.keith.com/tmp

drwxr-xr-x   - hdfs  supergroup          0 2016-11-03 08:56 hdfs://cloudera.isilon.keith.com/user

 

This works well with the HBase table 'export' command and the HBase 'ExportSnapshot' command because both accept a remote Hadoop cluster as an argument, as long as it's a valid DNS name that supports HDFS.  All of our backup commands will run against this remote Isilon and will operate as if they are running against a remote Hadoop cluster (i.e., a remote Hadoop namenode).

 

The HBase table export and snapshot export commands spawn mapreduce jobs that split the work into manageable pieces across the Hadoop compute nodes.  This pairs well with Isilon, since an Isilon load balances connections across its member nodes for a truly distributed job, not only across the Hadoop compute nodes but also across the Isilon nodes, for better resource utilization and network bandwidth.

 

I've created a simple table named "analytics" on my HBase cluster and have imported approximately 65MB of data into it.

 

CDH node

hbase(main):008:0> list

TABLE                                                                       

analytics                                                                   

events                                                                      

2 row(s) in 0.1600 seconds

 

=> ["analytics", "events"]

 

Table export to local HDFS

Our first method for backing up an HBase table and sending it to secondary media (Isilon) is the HBase 'export' command.  The command simply takes a table name as input and an output directory as the target for the flat file HBase table backup.  The export process spawns a distributed mapreduce job and flushes in-flight data from the WAL and MemStore to create a portable flat file backup in the output directory.

 

Let's first do an export to the local HDFS filesystem on our CDH cluster.  I'm exporting the "analytics" table and putting the results in the local /hbase/new.export directory.

 

CDH node

# sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export analytics /hbase/new.export

16/11/07 14:14:59 INFO mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false

<snip>

16/11/07 14:15:10 INFO mapreduce.Job:  map 0% reduce 0%

16/11/07 14:15:26 INFO mapreduce.Job:  map 100% reduce 0%

16/11/07 14:15:27 INFO mapreduce.Job: Job job_1478545483489_0001 completed successfully

16/11/07 14:15:27 INFO mapreduce.Job: Counters: 30

  File System Counters

<snip>

  Bytes Written=68181737

 

The export was successful and wrote the 65MB flat export file to the local HDFS filesystem on my CDH cluster.

 

CDH node

# sudo -u hdfs hdfs dfs -ls /hbase/new.export

Found 2 items

-rw-r--r--   3 hdfs hbase          0 2016-11-07 14:15 /hbase/new.export/_SUCCESS

-rw-r--r--   3 hdfs hbase   68181737 2016-11-07 14:15 /hbase/new.export/part-m-00000

 

Table export to remote Isilon

We want to run the exact same export command against a remote Isilon directory to get the backup file off the source Hadoop cluster and onto secondary media, which is the Isilon.  What is nice about the HBase 'export' command is that it will accept the remote Isilon DNS name as the target, provided we precede it with the 'hdfs://' designation.  I'll run the same 'export' command, but instead of a local HDFS directory I will use a remote HDFS directory with the 'hdfs://' prefix followed by my remote Isilon DNS name.  The command runs exactly like a backup to a remote HDFS filesystem and moves the data via mapreduce to the remote Isilon cluster.

 

CDH node

# sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export analytics hdfs://cloudera.isilon.keith.com/hbase/new.export

16/11/07 14:22:10 INFO mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false

<snip>

16/11/07 14:22:13 INFO util.RegionSizeCalculator: Calculating region sizes for table "analytics".

16/11/07 14:22:13 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService

16/11/07 14:22:13 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x158402ca2070054

16/11/07 14:22:13 INFO zookeeper.ZooKeeper: Session: 0x158402ca2070054 closed

16/11/07 14:22:13 INFO zookeeper.ClientCnxn: EventThread shut down

16/11/07 14:22:13 INFO mapreduce.JobSubmitter: number of splits:1

16/11/07 14:22:13 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

16/11/07 14:22:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478545483489_0002

16/11/07 14:22:13 INFO impl.YarnClientImpl: Submitted application application_1478545483489_0002

16/11/07 14:22:13 INFO mapreduce.Job: The url to track the job: http://nodeone.keith.com:8088/proxy/application_1478545483489_0002/

16/11/07 14:22:13 INFO mapreduce.Job: Running job: job_1478545483489_0002

16/11/07 14:22:18 INFO mapreduce.Job: Job job_1478545483489_0002 running in uber mode : false

16/11/07 14:22:18 INFO mapreduce.Job:  map 0% reduce 0%

16/11/07 14:22:31 INFO mapreduce.Job:  map 100% reduce 0%

16/11/07 14:22:32 INFO mapreduce.Job: Job job_1478545483489_0002 completed successfully

16/11/07 14:22:32 INFO mapreduce.Job: Counters: 30

  File System Counters

<snip>

  Bytes Written=68181737

 

I've left a bit more of the logging text above, but the results are the same: a 65MB export file was written remotely to the Isilon using HDFS RPC calls and a distributed mapreduce job.  I'll confirm the output with an 'ls' command against the remote Isilon.

 

CDH node

# sudo -u hdfs hdfs dfs -ls hdfs://cloudera.isilon.keith.com/hbase/new.export

Found 2 items

-rw-r--r--   3 root wheel          0 2016-11-07 19:22 hdfs://cloudera.isilon.keith.com/hbase/new.export/_SUCCESS

-rw-r--r--   3 root wheel   68181737 2016-11-07 14:25 hdfs://cloudera.isilon.keith.com/hbase/new.export/part-m-00000

 

How would I run an incremental backup?  I would simply use the starttime and endtime arguments with my export command, both of which take milliseconds since the Unix epoch.  Basically I would run my first baseline export and then use the starttime/endtime arguments to run the incrementals from some type of script that tracks the time of the last backup and the current time.
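
Here's a rough sketch of what that incremental export could look like.  I haven't scripted this end to end, so treat the timestamps, the single-version argument, and the 'incr.export' output directory as illustrative placeholders (the usage is Export <table> <outputdir> [<versions> [<starttime> [<endtime>]]]).

#Illustrative incremental export: only rows written between START and END (epoch milliseconds)
START=1478545200000                     #end time recorded from the previous backup run
END=$(($(date +%s) * 1000))             #current time converted to epoch milliseconds
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export analytics hdfs://cloudera.isilon.keith.com/hbase/incr.export 1 $START $END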

 

Simple right?  Just use the standard HBase 'export' command, but instead of sending the output to a local HDFS filesystem directory we use the 'hdfs://' prefix and the DNS name of an Isilon with an HDFS NAS share.  The HDFS license is also free for any Isilon customer.

 

Snapshot export to remote Isilon

Snapshots, as discussed earlier, are metadata pointers to blocks in flat HFiles.  When the HBase data changes, the HFiles are appended and snapshots retain the metadata pointers to the previous versions of the flat HFiles.  Restoring a table from a previous snapshot simply reverts the HFiles to those older versions.

 

We also mentioned that local snapshots have a drawback since they aren't on secondary media.  We are looking to export these snapshots off the Hadoop cluster for a "safe" copy that we can use later if necessary.  We can't just move the snapshots, though; they would not be useful unless we also moved their corresponding flat HFiles along with them.  The HBase 'ExportSnapshot' command exports not only the snapshot (metadata pointers) but also the HFiles that the snapshot references.  The snapshot and its referenced HFiles are only useful together, so we must run the 'ExportSnapshot' command to get a usable snapshot exported to an Isilon.

 

I first create a snapshot of the "analytics" table using the HBase shell interface.  I'm using the 'snapshot' shell command to take the snapshot and verifying with the 'list_snapshots' command.  I've creatively named the snapshot "snapshot".

 

CDH node

# hbase shell

16/11/07 14:40:57 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.0-cdh5.7.4, rUnknown, Tue Sep 20 16:03:12 PDT 2016

 

hbase(main):001:0> snapshot 'analytics', 'snapshot'

0 row(s) in 0.4980 seconds

 

hbase(main):003:0> list_snapshots

SNAPSHOT                             TABLE + CREATION TIME                                       

snapshot                            analytics (Mon Nov 07 14:41:07 -0500 2016)                  

1 row(s) in 0.0250 seconds

 

I don't need to demonstrate this export to a local filesystem because my local Hadoop cluster already contains the snapshots and the HFiles.  So I'm going to export my snapshot named "snapshot" to my remote Isilon cluster with the 'hdfs://' prefix, and my output directory is a subdirectory called "snapshots" under the existing "hbase" remote directory.  I'm using 16 mappers for my mapreduce job; this can be adjusted as necessary based on the Hadoop compute resources.

 

CDH node

# sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot -copy-to hdfs://cloudera.isilon.keith.com/hbase/snapshots -mappers 16

 

16/11/07 14:43:01 INFO snapshot.ExportSnapshot: Copy Snapshot Manifest

16/11/07 14:43:02 INFO client.RMProxy: Connecting to ResourceManager at nodeone.keith.com/192.168.0.29:8032

16/11/07 14:43:04 INFO snapshot.ExportSnapshot: Loading Snapshot 'snapshot' hfile list

16/11/07 14:43:04 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

16/11/07 14:43:04 INFO mapreduce.JobSubmitter: number of splits:3

16/11/07 14:43:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478545483489_0004

16/11/07 14:43:04 INFO impl.YarnClientImpl: Submitted application application_1478545483489_0004

16/11/07 14:43:04 INFO mapreduce.Job: The url to track the job: http://nodeone.keith.com:8088/proxy/application_1478545483489_0004/

16/11/07 14:43:04 INFO mapreduce.Job: Running job: job_1478545483489_0004

16/11/07 14:45:58 INFO mapreduce.Job: Job job_1478545483489_0004 running in uber mode : false

16/11/07 14:45:58 INFO mapreduce.Job:  map 0% reduce 0%

16/11/07 14:46:04 INFO mapreduce.Job:  map 33% reduce 0%

16/11/07 14:46:10 INFO mapreduce.Job:  map 67% reduce 0%

16/11/07 14:46:11 INFO mapreduce.Job:  map 100% reduce 0%

16/11/07 14:46:11 INFO mapreduce.Job: Job job_1478545483489_0004 completed successfully

16/11/07 14:46:11 INFO mapreduce.Job: Counters: 37

  File System Counters

  FILE: Number of bytes read=0

  FILE: Number of bytes written=460278

  FILE: Number of read operations=0

  FILE: Number of large read operations=0

  FILE: Number of write operations=0

  HDFS: Number of bytes read=71588835

  HDFS: Number of bytes written=71588238

  HDFS: Number of read operations=15

  HDFS: Number of large read operations=0

  HDFS: Number of write operations=9

  Job Counters

  Launched map tasks=3

  Other local map tasks=3

  Total time spent by all maps in occupied slots (ms)=18384

  Total time spent by all reduces in occupied slots (ms)=0

  Total time spent by all map tasks (ms)=18384

  Total vcore-seconds taken by all map tasks=18384

  Total megabyte-seconds taken by all map tasks=18825216

  Map-Reduce Framework

  Map input records=3

  Map output records=0

  Input split bytes=597

  Spilled Records=0

  Failed Shuffles=0

  Merged Map outputs=0

  GC time elapsed (ms)=108

  CPU time spent (ms)=3570

  Physical memory (bytes) snapshot=980393984

  Virtual memory (bytes) snapshot=4701536256

  Total committed heap usage (bytes)=913833984

  org.apache.hadoop.hbase.snapshot.ExportSnapshot$Counter

  BYTES_COPIED=71588238

  BYTES_EXPECTED=71588238

  BYTES_SKIPPED=0

  COPY_FAILED=0

  FILES_COPIED=3

  FILES_SKIPPED=0

  MISSING_FILES=0

  File Input Format Counters

  Bytes Read=0

  File Output Format Counters

  Bytes Written=0

16/11/07 14:46:11 INFO snapshot.ExportSnapshot: Finalize the Snapshot Export

16/11/07 14:46:11 INFO snapshot.ExportSnapshot: Verify snapshot integrity

16/11/07 14:46:11 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS

16/11/07 14:46:11 INFO snapshot.ExportSnapshot: Export Completed: snapshot

 

I've left most of the output above so readers can see the results.  The output size differs slightly from the table export but is approximately the same.  The resulting files on the remote Isilon are a bit more complex than a simple export output.  The 'ExportSnapshot' command separates the flat HFiles from the metadata in different output directories and the names are obscured.  However, if I log into the console of my Isilon I can dig into the /hbase/snapshots directory and find my "analytics" snapshot flat file that is 65MB, just like my export output.  The names aren't friendly but the results are similar.

 

Isilon console

# pwd

/ifs/demo8/cloudera/hdfs/hbase/snapshots/archive/data/default/analytics/b89a2f75eb52c49169fb9e92426f02e5/day

demo8-3# ls -lah

total 98430

drwxr-xr-x    2 root  wheel    50B Nov  7 19:49 .

drwxr-xr-x    5 root  wheel    66B Nov  7 19:49 ..

-rw-r--r--    1 505   505      64M Nov  8 00:46 492380d4eacb496da6c566b547dc5763

 

So that's it, we took an existing HBase snapshot and moved the metadata and the flat file over to a remote Isilon using the HDFS protocol and the HBase 'ExportSnapshot' command.  This allows us to take snapshots and move that data off the Hadoop cluster and onto secondary media for a backup / archive.

 

HBase table restore from an export

The HBase 'export' command has a corresponding 'import' command for table restores from backup.  If we ever needed to pull a backup from the Isilon and restore a table, we could do everything remotely using the 'import' command and give the remote Isilon as the export source with the 'hdfs://' prefix.  This eliminates the need to copy around flat files; if we know the location on the Isilon we can just pull it directly to the HBase cluster.  Note, the table must exist with the correct HBase columns (schema if you will) for the import to work; this won't create the table from scratch.  Read the 'import' documentation for more info.
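
If the target table had been dropped entirely, a quick (hypothetical) way to pre-create it from the HBase shell before running the import is shown below.  The 'day' column family is an assumption based on the HFile path we saw in the snapshot section; use your table's real schema.

#Illustrative only: re-create an empty table with the correct column family before importing
hbase shell
create 'analytics', 'day'
exit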

 

The 'import' command is similar to the export command; I just give the remote location of the Isilon 'export' output and it will pull the data back into my "analytics" table.

 

CDH node

# sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Import analytics hdfs://cloudera.isilon.keith.com/hbase/new.export

16/11/08 09:20:03 INFO client.RMProxy: Connecting to ResourceManager at nodeone.keith.com/192.168.0.29:8032

16/11/08 09:20:04 INFO input.FileInputFormat: Total input paths to process : 1

16/11/08 09:20:04 INFO mapreduce.JobSubmitter: number of splits:1

16/11/08 09:20:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478545483489_0009

16/11/08 09:20:04 INFO impl.YarnClientImpl: Submitted application application_1478545483489_0009

16/11/08 09:20:04 INFO mapreduce.Job: The url to track the job: http://nodeone.keith.com:8088/proxy/application_1478545483489_0009/

16/11/08 09:20:04 INFO mapreduce.Job: Running job: job_1478545483489_0009

16/11/08 09:20:09 INFO mapreduce.Job: Job job_1478545483489_0009 running in uber mode : false

16/11/08 09:20:09 INFO mapreduce.Job:  map 0% reduce 0%

16/11/08 09:20:21 INFO mapreduce.Job:  map 67% reduce 0%

16/11/08 09:20:24 INFO mapreduce.Job:  map 100% reduce 0%

16/11/08 09:20:24 INFO mapreduce.Job: Job job_1478545483489_0009 completed successfully

16/11/08 09:20:24 INFO mapreduce.Job: Counters: 30

  File System Counters

  FILE: Number of bytes read=0

  FILE: Number of bytes written=152132

  FILE: Number of read operations=0

  FILE: Number of large read operations=0

  FILE: Number of write operations=0

  HDFS: Number of bytes read=68871857

  HDFS: Number of bytes written=0

  HDFS: Number of read operations=3

  HDFS: Number of large read operations=0

  HDFS: Number of write operations=0

  Job Counters

  Launched map tasks=1

  Rack-local map tasks=1

  Total time spent by all maps in occupied slots (ms)=11539

  Total time spent by all reduces in occupied slots (ms)=0

  Total time spent by all map tasks (ms)=11539

  Total vcore-seconds taken by all map tasks=11539

  Total megabyte-seconds taken by all map tasks=11815936

  Map-Reduce Framework

  Map input records=1000

  Map output records=1000

  Input split bytes=127

  Spilled Records=0

  Failed Shuffles=0

  Merged Map outputs=0

  GC time elapsed (ms)=123

  CPU time spent (ms)=9910

  Physical memory (bytes) snapshot=400203776

  Virtual memory (bytes) snapshot=1572040704

  Total committed heap usage (bytes)=327680000

  File Input Format Counters

  Bytes Read=68871730

  File Output Format Counters

  Bytes Written=0


 

Snapshot restores from exported snapshots

Snapshot restores are a bit trickier and require some more research on my part.  The documentation I've seen involves copying the remote metadata and HFiles back to the original local HBase cluster in the correct locations and then restoring the snapshot from the HBase shell.  The 'clone_snapshot' command can then be run from the HBase shell to create a table from the restored snapshot.  I may explore this in another blog post (a rough, untested sketch is below) but would welcome any feedback from those who have run this successfully.
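
Here's an untested sketch of how I believe that flow would look: pull the exported snapshot back from the Isilon into the local HBase root with 'ExportSnapshot', then clone it into a table from the HBase shell.  The 'hdfs:///hbase' destination (your local hbase.rootdir) and the new table name are assumptions, so verify against the documentation before relying on this.

#Untested sketch: copy the exported snapshot and its HFiles back from the Isilon
sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot -copy-from hdfs://cloudera.isilon.keith.com/hbase/snapshots -copy-to hdfs:///hbase -mappers 16

#Then clone the restored snapshot into a new table from the HBase shell
hbase shell
clone_snapshot 'snapshot', 'analytics_restored'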

 

Summary

 

So there you have it, two different ways to back up an HBase table and move the backups off to an Isilon cluster for safekeeping using standard HBase commands and the HDFS protocol.  No need to create NFS mounts on the Hadoop compute nodes, no need to set up Samba or FTP hosts, just a simple way to push backup data to a remote Isilon with a single HBase command.  No 3x HDFS mirroring, no need for a Hadoop backup cluster, no temporary landing zones for copying data, just a simple HDFS share on a scale-out NAS storage system!

 

As usual, thanks for reading this long post, comments welcome!

prod-icon-cloud-pools.png

 

Are you an Isilon customer?  Are you also an Iron Mountain customer?  Interested in taking advantage of economical cloud storage to extend OneFS into the cloud?  Great news, Iron Mountain now has a service offering that provides public cloud storage that is perfect as an Isilon CloudPools target!  The service was announced in April of 2016 (http://www.ironmountain.com/About-Us/Company-News/News-Categories/Press-Releases/2016/April/27.aspx) and is called IMCA (Iron Mountain Cloud Archive).  The service is based on EMC Elastic Cloud Storage (ECS - Elastic Cloud Storage - Object Storage Solutions | EMC) and is fully supported by Isilon OneFS for automated tiering of data to a cloud storage provider. 


The goal of this blog is to highlight some of the scoping, planning, and implementation steps required when using Iron Mountain's Cloud Archive (abbreviated as IMCA going forward) with Isilon CloudPools.  I've gone through the deployment and am sharing my results to save you time and hopefully help you plan with as much information as possible.

 

As of this blog posting Isilon CloudPools supports the following platforms as a CloudPools target:

 

- Service Providers running EMC ECS (for example, Iron Mountain's IMCA)

- EMC Virtustream

- Private cloud (internal on-premises EMC ECS or EMC Isilon)

- Amazon S3

- Microsoft Azure

 

So why look at Iron Mountain?  Iron Mountain has a great reputation for providing offsite data backup and recovery, so using their services for a CloudPools target makes perfect sense.  Many EMC customers also already have existing business relationships with Iron Mountain and trust Iron Mountain with their data.  So why not consider Iron Mountain for sending Isilon data offsite to a trusted service provider and partner of EMC?  Iron Mountain provides an excellent way to get started using EMC object storage with a well-known service provider.  And by the way, the CloudPools license is free if you tier to Iron Mountain since it uses EMC ECS, which is another benefit!

 

Isilon CloudPools Scoping

 

Reach out to your Isilon presales engineer to help you with this step.  You will need to determine two things:


- How much data on your Isilon cluster is inactive and can be moved to the cloud

- How you will move that data using the OneFS SmartPools software


Your Isilon presales engineer can assist you with scanning the data and finding the best metadata policy for CloudPools.  Why do you need to do this?  A rough estimate of cold or inactive data residing on your Isilon cluster could be inaccurate without the right tools and could lead to unexpected costs from your cloud provider.  Also, if you tier data that is still active it could lead to poor application performance since data will be recalled from the cloud provider which will result in higher storage response times (latency).  


What tools are used and how can you get this information?  See my previous blog post below for a detailed explanation of the process used and how your Isilon presales engineer will work with you to get the best information possible from your existing data. 

 

https://community.emc.com/blogs/keith/2016/05/06/sizing-an-isilon-cloudpools-configuration

 

Preparation for using IMCA

 

Iron Mountain IMCA account

 

Contact your Iron Mountain sales team (Contact Sales; Request a Quote - Iron Mountain) after you perform the Isilon scoping exercises and have a good idea of how much data you plan on tiering offsite.  Iron Mountain can give you more information regarding the IMCA service and create an IMCA account for you.  As a result they will provide the following information required for CloudPools:

 

• URI (https)

• Account username

• Account secret key

 

The Isilon CloudPools software will use all three attributes to establish a CloudPools target using the IMCA account information. 

 

Networking

 

Isilon CloudPools software will communicate with the IMCA service over the public internet.  Unless a VPN service is established with Iron Mountain, the network configuration must be evaluated for this exchange of information over a public (and insecure) connection.  Ports need to be opened on both the source side and the target side (Iron Mountain) firewalls for CloudPools to function correctly.

 

So what ports need to be opened?  More importantly, what source IP address (or addresses) should you provide to Iron Mountain for their firewall rules?  Most times your Isilon will have subnets and IP address pools configured for local LAN access and will reside behind routers that are public facing.  When traffic from your source Isilon needs to access a resource over the public internet (possibly NTP, DNS, etc.) the address will be remapped by the routers.  This is known as network address translation, or NAT.  The NAT IP addresses are the ones we are interested in and need to provide to Iron Mountain.

 

Take a simple example of my home network.  I have an Isilon simulator with three nodes and they are configured with an IP address pool on the 192.168.0.0 subnet.  Everything is behind a router which has a single public facing IP address (say a subnet like 75.189.213.0 for Chapel Hill, NC).  All CloudPools traffic originating from my Isilon cluster on the 192.168.0.0 subnet will remap to an address on the 75.189.213.0 subnet.  So I need to provide Iron Mountain with my single public facing router IP address on the 75.189.213.0 subnet and they would open firewall ports for my address to allow CloudPools traffic.
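
If you aren't sure what your outbound traffic NATs to, one quick sanity check (illustrative only, and assuming outbound HTTPS is already allowed) is to ask a third-party service such as ifconfig.me what public address it sees from a host on the same network path as the Isilon.

#Shows the public NAT address your outbound requests present to the internet
curl https://ifconfig.me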

 

Your datacenter networking could be more complex and require the Iron Mountain networking team to speak with your networking team.  Iron Mountain will open the ports below for the public facing IP address (or addresses) you provide to them.  You must also open these ports for the NAT address on your firewall.


CloudPools Ports:

443

9021

 

SSL Certificates


Isilon CloudPools software has the option to validate SSL certificates when performing any communication to the IMCA service.  Iron Mountain will need to provide you with the appropriate certificates (.pem files), which you will configure locally on your Isilon cluster.  Skipping the SSL validation is possible but not recommended since this connection is made over the public internet, so make sure you work with Iron Mountain to get these certificates prior to configuring CloudPools.  OneFS configuration of trusted certificates is discussed later in the blog.

 

Isilon Licensing

 

CloudPools software licenses are free when landing your CloudPools data on IMCA (based on EMC ECS).  Your local Isilon sales team can help you get a valid license key for CloudPools.  You can't configure CloudPools without a license, so request one prior to performing the work.

 

CloudPools also requires a OneFS SmartPools license since SmartPools is the tiering engine used to move data between OneFS tiers; CloudPools will not work without it.  You will need a valid SmartPools license as part of an existing software bundle (enterprise advanced bundle) or a standalone SmartPools license.  Again, talk to your local Isilon sales team about purchasing this license if you don't already have it.

 

Backups

 

Do you have only a single copy of your Isilon data?  Now is the time to address this topic since it's critical to understand that CloudPools is not a backup technology and does not create a second data copy.  CloudPools is a tiering and stubbing process: it applies SmartPools rules to determine which files it can tier and replaces those files with smaller stub files.  If the stub file is lost or corrupted the offsite tiered data is useless and the entire file is lost.


You need a second copy of your data, especially if you are thinking of using CloudPools!  Your options are to replicate a second copy of your data to a second Isilon via SyncIQ or to back up your data using NDMP and your favorite backup software.  CloudPools stub files will be small (under 1MB, probably a few KB) so they will be easy to back up.

 

Again, unprotected stub files introduce a high amount of risk to your business since they are required to access their associated data in the cloud.  There is no way to reconstruct the CloudPools data tiered to Iron Mountain without the stub files, and CloudPools does not send a copy of the stub files offsite.  Back up your data!

 

Implementation Steps

 

Test Basic Connectivity

 

Once you have the networking set up (firewall ports opened) and the IMCA account information, you can do some basic connectivity testing before configuring CloudPools.  This will save you time and effort since you can test connectivity independently of CloudPools.  The best way to test is from the Isilon command line console since your networking should be configured at this point to allow the Isilon to communicate outbound to the Iron Mountain ECS.  This is done via the 'curl' binary that is bundled within FreeBSD on each Isilon node.


Note, Cyberduck is a popular desktop software package used for connecting to an object store like the ECS-based IMCA service.  However, Cyberduck may not work as a connectivity test in this scenario since it does not run on the Isilon cluster, and Iron Mountain's firewall may not allow traffic that originates on a desktop to route to their ECS.  We gave the example of using NAT earlier; Cyberduck on a laptop may not translate to the same NAT IP address as traffic from the Isilon cluster.  Stick with the 'curl' command on the Isilon.

 

Run the command below as "root" from the Isilon command line console to verify basic connectivity.  The error below is expected because we cannot insecurely query the namespace but the connection succeeded. The URI below is an example, use the one provided by Iron Mountain when testing.  Use the URI provided by Iron Mountain when your IMCA account is created. 

 

# curl --insecure https://URI:9021/ping

 

<Error><Code>NoNamespaceForAnonymousRequest</Code><Message>Could not determine namespace from anonymous request. Please use a namespace BaseURL or include an x-emc-namespace header</Message><RequestId>0a82650a:156471dda22:29a90:9b5</RequestId></Error>#

 

This verifies we have end-to-end connectivity between the Isilon and the Iron Mountain ECS over port 9021.  If you ran the same curl command against another port besides 9021 it would time out, as in the example below.
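
As a quick negative test, you can prove that point with a connect timeout; port 9020 below is just an arbitrary example of a port Iron Mountain has not opened, so the command is expected to fail.

#Expected to time out after 10 seconds since only 443 and 9021 are open
curl --insecure --connect-timeout 10 https://URI:9020/ping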

 

Configure Certificates

 

Iron Mountain should provide all certificates required to communicate between your Isilon and the IMCA service in the form of .pem files.  Below is a brief set of steps to take those .pem files and add them to the Isilon configuration to allow validation of the SSL certificates when the Isilon communicates with IMCA.  Open a service request with EMC support if you feel you need assistance with this work.

 

1.  Get all required trusted certificates in .pem format and copy them to the Isilon filesystem

2.  From the Isilon console, login as "root"

3.  Copy the certificate .pem files to the directory /ifs/.ifsvar/modules/cloud/cacert

4.  Get the hash value of each .pem file with the command 'openssl x509 -hash -noout -in <cert.pem>' and record it

5.  Link each of the .pem files to the hash value output, appended with a suffix of ".0", using the command 'ln -s <cert.pem> <hash-val>.0'

6.  Verify all certificate .pem files are linked to their <hash-val>.0 using an 'ls' command in the directory /ifs/.ifsvar/modules/cloud/cacert (see the examples below).
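
As a worked example, steps 4 and 5 for the IMone.pem file would look something like this (the hash value here matches the 1bd4f080.0 link shown in the directory listing below):

# openssl x509 -hash -noout -in IMone.pem
1bd4f080
# ln -s IMone.pem 1bd4f080.0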

 

# pwd

/ifs/.ifsvar/modules/cloud/cacert

# ls -lah

total 335

drwxr-xr-x    2 root  wheel   391B Aug 31 21:27 .

drwxr-xr-x    5 root  wheel   1.3K Sep  6 21:11 ..

lrwxr-xr-x    1 root  wheel     9B Aug 31 12:25 1bd4f080.0 -> IMone.pem

lrwxr-xr-x    1 root  wheel    55B May 14 10:20 415660c1.0 -> ./VerisignClass3PublicPrimaryCertificationAuthority.pem

lrwxr-xr-x    1 root  wheel    29B May 14 10:20 653b494a.0 -> ./BaltimoreCyberTrustRoot.pem

lrwxr-xr-x    1 root  wheel     9B Aug 31 12:25 7d9c641e.0 -> IMtwo.pem

-rw-r--r--    1 root  wheel   1.2K May 14 10:20 BaltimoreCyberTrustRoot.pem

-rw-r--r--    1 root  wheel   2.1K Aug 31 12:22 IMone.pem

-rw-r--r--    1 root  wheel   1.8K Aug 31 12:23 IMtwo.pem

-rw-r--r--    1 root  wheel   1.7K Aug 31 21:26 Symantec.pem

-rw-r--r--    1 root  wheel   848B May 14 10:20 VerisignClass3PublicPrimaryCertificationAuthority.pem

lrwxr-xr-x    1 root  wheel    12B Aug 31 21:27 b204d74a.0 -> Symantec.pem

-rw-r--r--    1 root  wheel   2.1K Aug 31 07:58 certdata

-rw-r--r--    1 root  wheel   3.6K Aug 31 07:57 logfile

 

Configure CloudPools

 

The rest of the work is straightforward and performed on your Isilon cluster.  A few steps are needed to configure CloudPools to work with Iron Mountain and then a file pool policy is created to tier data to the cloud.  Open a service request with EMC support if you need help or encounter errors. 

 

Create Cloud Storage Account

 

Create a cloud storage account using the credentials supplied by Iron Mountain.  Don't skip SSL validation (set the value to false)!  The command below is an example; the webUI can also create the account, as shown in the screenshot below.

 

isi cloud accounts create --name CloudStorageAccountNameHere --type ecs --uri https://URL.Provided.By.Iron.Mountain:9021 --account-name ProvidedByIronMountain --key ProvidedByIronMountain --skip-ssl-validation false


Screen Shot 2016-09-08 at 11.38.24 AM.png

 

Create Cloud Pool

 

Create a cloud pool that is tied to the account that was created in the previous step.  This pool will get used by the file pool policy as a target for tiered data.  The command below is an example; the webUI can also create the pool, as shown in the screenshot below.

 

isi cloud pools create --name CloudPoolNameHere --type ecs --accounts CloudStorageAccountNameHere

 

Screen Shot 2016-09-08 at 11.40.11 AM.png

 

Create CloudPools File Pool Policy

 

Here we use the information gathered earlier in the scoping section to create a file pool policy that will tier cold and frozen data sets to the cloud pool and cloud account created above.  This policy will most likely use access time (atime) along with other metadata attributes to determine if a file can go offsite.  No example is given here because each deployment will be different and depends on the information gathered earlier.  Note, use encryption and compression when creating the file pool policy since the data is transferred to Iron Mountain over the public internet!  See the screenshot below for the encryption, compression, and cloud pool selection (the policy logic is omitted).


Screen Shot 2016-09-08 at 11.40.59 AM.png

 

Testing

 

Test the configuration by running a SmartPools job manually and by checking the CloudPools activity using the commands below.  When the CloudPools "archive" job is complete the data will be replaced by stubs if everything is working correctly.  Compare the output of 'ls' versus 'du' from the Isilon command line console to determine if a file was stubbed; the 'du' command will show a smaller size than the 'ls' command.

 

#Run the SmartPools job

isi job start smartpools

 

#View the CloudPools jobs

isi cloud jobs list

 

#Sample output below; use the command 'isi cloud jobs view <job.number>' to get detailed information about a job.  When the latest "archive" job completes (job 37 below) the tiering process is done.

 

 

ID   Description                             Effective State  Type

--------------------------------------------------------------------------------------

1    Write updated data back to the cloud    running          cache-writeback

2    Expire CloudPools cache                 running          cache-invalidation

3    Clean up cache and stub file metadata   running          local-garbage-collection

4    Clean up unreferenced data in the cloud running          cloud-garbage-collection

28                                           completed        archive

29                                           completed        archive

30                                           completed        archive

31                                           completed        archive

32                                           completed        archive

33                                           completed        archive

34                                           completed        archive

35                                           completed        archive

36                                           completed        archive

37                                           running          archive

--------------------------------------------------------------------------------------

 

#Compare 'ls' versus 'du' on data you expected to send to Iron Mountain.  In the example below the 600MB files have been stubbed out and the entire directory is only 53K (success).

ls -lah

total 55

drwxr-xr-x    2 root  wheel   103B Aug 17 11:41 .

drwxr-xr-x    3 root  wheel    55B Aug  5 11:43 ..

-rwx------ +  1 root  wheel   636M Jul 27  2015 CentOS-7-x86_64-Minimal-1503-01.iso

-rwx------ +  1 root  wheel   603M Jan 22  2016 CentOS-7-x86_64-Minimal-1511.iso

 

du -h

53K .

 

 

Thanks for reading, comments welcome!

In my last blog (https://community.emc.com/blogs/keith/2016/07/28/hadoop-on-isilon--configuring-a-command-line-gateway) I outlined how to deploy Cloudera CDH on Isilon shared storage and set up a Linux CLI gateway host that allowed users to submit Hadoop jobs on a perimeter edge host.  This allows users to do their work on a non-critical gateway host that runs no significant Hadoop roles other than gateway services.  I made an assumption that I would not use Kerberos in that configuration to keep things simple.

 

Is it realistic to ignore Kerberos for a production Hadoop cluster?  No!  I'll again reference the "Cloudera Security" document (http://www.cloudera.com/documentation/enterprise/5-5-x/PDF/cloudera-security.pdf) and point out that without any authentication or authorization we do not have a secure cluster and will run at "Level 0".  With Hadoop on Isilon, implementing authentication (Kerberos) and authorization (file and directory permissions) are necessary for moving to "Level 1" and getting your Hadoop cluster production ready while taking advantage of Isilon shared storage for multiprotocol access.

 

This post will cover more abstract concepts and less "how-to" content than normal since there is already an excellent guide written by Russ Stevenson on the topic (Cloudera CDH 5.7 with Isilon 8.0.0.1 and Active Directory Kerberos Implementation).  Consider this blog post supplemental and my goal is to explain how things work.  The first few times I went through this process I was lost and had more questions than answers.  Hopefully this can be a shortcut to those who find it and will consolidate some of the background information.

 

Also, please contact Cloudera and EMC when planning this type of work.  My goal is to help customers understand how Hadoop on Isilon works with Kerberos, but I do not intend this as a guide to use for your production environment.  Please talk to Cloudera and EMC (or your system integrator/partner) about planning this work and having their professional services folks help out.  It will save you time!

 

Concepts

 

Authentication

 

When you connect to a NAS share or HDFS filesystem, how does the system know if you are who you say you are?  If you are familiar with NFSv3 you will understand that asking an NFS client to announce their user ID (UID) and group ID (GID) is not secure since there is no real authentication involved.  A NAS system hosting NFSv3 shares will simply give a user access provided they present the correct UID/GID (among other things, this is oversimplified to make a point).  HDFS is similar in that it uses user and group account names instead of numbers.  So if a rogue user connected to a "Level 0" Hadoop cluster as a superuser account ("hdfs", "mapred") then HDFS would give that account full access without any further authentication.

 

Kerberos is the de facto standard for authentication due to its strong security capabilities. Instead of connecting to a share as userID '501' or as user 'hdfs', the Kerberos client connects to the Kerberos Key Distribution Center (KDC) and performs an encrypted authentication exchange by means of principals and encrypted keytabs.  Do you have Active Directory in your environment?  Then you are already using Kerberos when your Windows users authenticate their accounts with a domain controller.  Active Directory not only provides directory services (list of accounts and groups centrally managed) but also uses Kerberos to authenticate all the accounts in the directory.  MIT Kerberos is an alternative to using an Active Directory KDC but MIT Kerberos is not a directory service and is usually combined with a directory provider (AD or LDAP).

 

Hadoop is typically configured to use Kerberos to provide legitimate secure authentication.  Isilon is also typically configured to use Kerberos authentication by joining the Isilon to an Active Directory domain or by configuring an MIT KDC realm (https://mydocuments.emc.com/pub/en-us/isilon/onefs/7.2.1/ifs-pub-administration-guide-gui/05-ifs-br-authentication.htm).  When running Cloudera CDH on Isilon it's best to configure and test an insecure installation first ("Level 0").  Then the planning can start to enable Kerberos for secure authentication.  Whether you are using Hadoop on DAS (direct attached storage) or Hadoop on Isilon (for shared storage), the Kerberos concepts and configurations are very similar, but I will highlight the differences where appropriate.

 

Authorization

 

Once you've authenticated yourself through Kerberos, what data are you authorized to access?  Think of authenticating through Kerberos as the first step and authorization as the next step.  Authorization is handled through POSIX permissions on HDFS data and access control lists (ACLs).  POSIX permissions are well known Unix type permissions on the HDFS filesystem data that you manage through 'chown' and 'chmod'. HDFS ACLs (disabled by default) allow you to implement permissions that differ from traditional POSIX users and groups.

 

HDFS ACLs are not implemented within Isilon OneFS and cannot be used with Hadoop on Isilon!  See this link (Using CDH with Isilon Storage) for a reference from Cloudera.  Isilon is a multi-protocol scale-out NAS array that supports simultaneous access to the same data over SMB, NFS, and HDFS.  So with Isilon we really don't need HDFS ACLs since we can permission the data with Windows ACLs which are very common in most IT shops.

 

Let's take our CLI gateway setup as an example (https://community.emc.com/blogs/keith/2016/07/28/hadoop-on-isilon--configuring-a-command-line-gateway).  We created a Windows user in Active Directory, created an HDFS home directory (in /user) for that user, and then changed the owner on that user's home directory to match the Active Directory user ID as pulled from the Isilon.  Because the Isilon supports HDFS, NFS, and SMB access, there are already capabilities to use ACLs outside HDFS because HDFS is mounted on shared Isilon storage.  So if I want my AD user "keith" to use Unix-style POSIX permissions I can set them from the Isilon side or from HDFS.  If I want to use Windows ACLs I can share this user directory out via SMB (the HDFS mount /user/keith) and manage ACLs through Windows Explorer or the Isilon console.  So the Isilon already has ACL capabilities since it's a NAS system designed for NFS and SMB access.  We can use Windows ACLs, which are very common, and in most cases don't need HDFS ACLs.  Bottom line, one of the great things about Hadoop on Isilon is that Windows users can access their HDFS user directory through Windows Explorer while also accessing the exact same data through HDFS on Hadoop.
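
As a quick illustration of that flexibility, the same home directory can be permissioned from either side; the mode and the Isilon path below are just examples, and the commands should be run as an appropriately privileged user.

#From the Hadoop side, set POSIX bits over HDFS
hdfs dfs -chmod 700 /user/keith

#From the Isilon side, the same directory lives under your HDFS root on /ifs (path is illustrative)
chmod 700 /ifs/hadoop/user/keith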

 

Note, there are other ACLs in Hadoop that have nothing to do with file and directory permissions (such as job queues).  These are still supported, we are only talking about HDFS file and directory ACLs here.

 

Simple vs Kerberos Authentication

 

Hadoop (and Isilon) supports two types of HDFS authentication: simple and Kerberos.  Simple authentication means there really is no authentication, as we mentioned above; Hadoop "trusts" that a user is who they say they are with no verification.  You could even put a username in the HADOOP_USER_NAME environment variable to impersonate other users; there is no further verification and jobs will run as any user you like, including built-in superuser accounts like "hdfs" and "mapred".  A Cloudera CDH installation that has not yet had Kerberos enabled will default to simple authentication, and an Isilon access zone by default will also support simple authentication.  The Isilon also matches on UIDs and GIDs for HDFS data, so it's important to keep the Isilon permissions in sync with your external directory UIDs and GIDs for authorization (we did this in the last blog post when setting the UID/GID in HDFS instead of simply the friendly account name).  It's best to set up your Cloudera on Isilon cluster first with simple authentication to make sure everything is working before moving to Kerberos authentication.
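
To make that concrete, on a simple-authentication cluster nothing stops this kind of impersonation from any client host; no password or ticket is ever requested.

#With simple authentication this "just works" and runs with full superuser access
export HADOOP_USER_NAME=hdfs
hdfs dfs -ls /user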

 

Kerberos authentication is enabled both on the Cloudera cluster (Cloudera wizard to enable Kerberos) and on the Isilon (access zone setting to switch from simple authentication to Kerberos).  Once enabled, all HDFS tasks require a Kerberos ticket from the user submitting the command or job.  What does this mean?  Simply put, you can't do any work without first obtaining a ticket.  If you are running jobs from a command line, that means you need to 'kinit' prior to performing any work that requires access to HDFS.  Without this ticket ('kinit') all commands and jobs will fail with a "PriviledgedActionException" error, even a simple 'hdfs dfs -ls' command.

 

Simple authentication mode allows users to very easily impersonate superuser accounts (hdfs, mapred), and this happens when users submit jobs to the various Hadoop ecosystem components.  Say my user "keith" submits a map reduce job; Hadoop will run parts of the job as the "mapred" account by default without the user knowing or maliciously trying to impersonate that superuser account (this gets complicated, comments welcome).  When Kerberos is turned on this still happens but needs to be done securely via delegation tokens.  Why do we need this?  Say a user logs into a CLI gateway host, runs 'kinit', gets a Kerberos ticket, and submits a job using data in HDFS that they are authorized to access.  This job may run for a few minutes or a few hours.  That user may log off or destroy their Kerberos ticket, but we want the job to continue to run since it was submitted securely.  So a delegation token is used, which simply impersonates a user securely for a fixed amount of time (1 day) but can be renewed.  Kerberos authentication means delegation tokens will now be used between Cloudera and Isilon for an additional layer of security when users submit Hadoop jobs.

 

Proxy Users

 

Superuser accounts in Hadoop (mapred, hdfs, etc.) need the ability to submit Hadoop jobs on behalf of end users and groups.  As we mentioned in simple authentication mode above, this happens behind the scenes because there simply is no authentication in simple mode.  Superusers impersonate end users who submit jobs with no authentication.  If user "keith" submits a map reduce job, the superuser account "mapred" impersonates user "keith" when necessary.  When Kerberos is configured the same thing happens, except the superuser account must obtain Kerberos credentials.  That's not a problem because the Cloudera Kerberos wizard will create these superuser accounts in Active Directory and authenticate them via Kerberos when necessary (you don't get the password as the Hadoop admin!).

 

When using Hadoop on DAS (again, direct attached storage) this works automatically since the HDFS filesystem is self contained within the compute nodes.  Introducing Isilon as shared storage requires an extra step since the Isilon does not automatically know which superuser accounts are allowed to impersonate end users.  To enable this feature the Isilon has the concept of an Isilon proxy user to mimic this behavior (https://mydocuments.emc.com/pub/en-us/isilon/onefs/7.2.1/ifs-pub-administration-guide-cli/25-ifs-br-hadoop.htm).  When configuring a secure Kerberized configuration you need to create proxy users for every Hadoop superuser account that needs to securely impersonate end users, based on the type of jobs you plan on running.  The documentation describes how to do this but don't skip it; your jobs will fail!

 

Putting it all together

 

So how do we take our "level 0" Cloudera cluster with a CLI gateway (one last time -> https://community.emc.com/blogs/keith/2016/07/28/hadoop-on-isilon--configuring-a-command-line-gateway) and secure it with Kerberos?  Its not that difficult now that we've explained all the info above and since we already know things are working without Kerberos.

 

Recap of our environment

 

- CDH 5.5.4 running on four (4)  CentOS 6.8 VMs

- OneFS 7.2.1.1 running on three (3) virtual Isilon nodes

- One (1) Windows 2008 R2 host acting as a domain controller, DNS server, and certificate authority

 

Our CDH cluster nodes are:

 

- One (1) Linux VM running Cloudera Manager

- Two (2) Linux VMs running all CDH Hadoop roles (master nodes / compute nodes)

- One (1) Linux VM acting as our CLI gateway running only gateway roles, this is where users submit their Hadoop jobs via the CLI

 

Screen Shot 2016-07-14 at 8.51.20 PM.png

 

Our Kerberos setup will use the Windows domain controller as the Key Distribution Center (KDC) and we will not use MIT in this example (although you could).  Our steps will look like:

 

Configure SSSD on all nodes

 

If you remember, we had SSSD and all its required packages installed and configured on our CLI gateway host so that Active Directory users could log in to the Linux host with their AD credentials and submit Hadoop jobs.  Substitute your favorite commercial package for SSSD if you like.  We don't need SSSD on the Cloudera Manager host.


Why do we need SSSD on all hosts and not just the CLI gateway host?  With all the info we've covered in this post regarding simple versus Kerberos authentication, we now know that even though we were submitting jobs on the CLI gateway host as an AD user, the jobs were subsequently using superuser accounts to run without authenticating and without our knowledge.  The Hadoop nodes running core services had no knowledge of our Active Directory accounts; they were using superuser accounts like "mapred" and "hdfs".  However, if we want to use Kerberos we need every node in the Hadoop cluster (except the Cloudera Manager host) to recognize all of our directory accounts (AD accounts), so we must install and configure our SSSD package everywhere.


See my last blog post on how to configure SSSD; you can most likely copy the .conf files to your other hosts and start the services without much trouble.
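
For reference, a minimal sketch of what an AD-backed sssd.conf might contain is below.  The .conf files from the previous post are the real source of truth; the exact providers and options will depend on how you joined the hosts to the KEITH.COM domain.

#/etc/sssd/sssd.conf (illustrative only)
[sssd]
config_file_version = 2
services = nss, pam
domains = KEITH.COM

[domain/KEITH.COM]
id_provider = ad
access_provider = ad
cache_credentials = True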

 

Create Isilon proxy users for Hadoop superusers and add end users

 

Just follow the process described in the link (Using CDH with Isilon Storage).  The process is straightforward: create the proxy superusers with the commands given and then add your Hadoop users.  Better yet, put your Hadoop users in groups and assign groups to the proxy users (less work in the long run).  At this point you can't nest groups within groups, so just keep things simple.  You will need to know which AD users need to run Hadoop jobs and make sure you permission their HDFS /user home directory with the correct UIDs/GIDs from the Isilon perspective ('isi auth mapping token...').
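
A hedged example of what the Isilon side of this looks like is below; the access zone name and AD group name are made up, and the exact flags can vary by OneFS version, so check 'isi hdfs proxyusers create --help' and the linked documentation before running anything.

#Assumed syntax: allow the "mapred" superuser to impersonate members of an AD group in this access zone
isi hdfs proxyusers create mapred --zone=cloudera --add-group=hadoop-users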

 

Follow the Cloudera documented process for enabling Kerberos through the wizard (Enabling Kerberos Authentication Using the Wizard).


Note that OneFS 8.0 and up supports AES-256 encryption but OneFS 7.x does not!  This isn't really a problem if you follow the default steps and are OK with other encryption types; just use OneFS 8.0 if you need AES-256 explicitly.  You will also need to create an AD user, preferably in a new Active Directory organizational unit (OU), that has the ability to "Create, Delete and Manage User Accounts".  The documentation describes this well so just follow Cloudera's process.


Note, you may stall during the Cloudera Kerberos wizard when starting up the Hue service.  Again, follow the step-by-step process in this excellent blog (Cloudera CDH 5.7 with Isilon 8.0.0.1 and Active Directory Kerberos Implementation) to get past that error; it contains a link that specifically addresses how to start Hue.  The overall process is very well documented and can be used for the entire process of getting Kerberos enabled on CDH on Isilon (great job Russ!).

 

Test with your AD users

 

Now the moment of truth!  Log into your CLI gateway host as your AD user ("keith" in my case) and try to run an 'hdfs' command.  It will fail of course since we did not obtain a Kerberos ticket (if it succeeds there is something wrong or you have a Kerberos ticket cached).  Before you can do anything on the secure cluster you must obtain a ticket from the KDC (our Windows domain controller) so we need to run the 'kinit' command.

 

-sh-4.1$ kinit keith@KEITH.COM

Password for keith@KEITH.COM:

-sh-4.1$ klist -e

Ticket cache: FILE:/tmp/krb5cc_710801189

Default principal: keith@KEITH.COM

 

Valid starting     Expires            Service principal

08/04/16 08:40:10  08/04/16 18:40:11  keith/KEITH.COM@KEITH.COM

  renew until 08/11/16 08:40:10, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96

-sh-4.1$ hdfs dfs -ls /user/keith

Found 3 items

drwx------   - keith domain users          0 2016-08-04 08:28 /user/keith/.staging

drwxrwxrwx   - keith domain users          0 2016-08-04 08:25 /user/keith/QuasiMonteCarlo_1470313498632_1021303185

 

Success!  Now to submit a test job.  Our Kerberos ticket is valid for a fixed time period, so we don't have to run 'kinit' every time; we can continue to use the same ticket.  I'm going to submit a sample teragen job as my user "keith".  Notice the creation of the delegation token (as discussed above).


-sh-4.1$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 1000 /user/keith/out

16/08/04 08:40:39 INFO client.RMProxy: Connecting to ResourceManager at slavetwo.keith.com/192.168.0.20:8032

16/08/04 08:40:39 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 17 for keith on 192.168.0.54:8020

16/08/04 08:40:39 INFO security.TokenCache: Got dt for hdfs://cloudera.isilon.keith.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.54:8020, Ident: (HDFS_DELEGATION_TOKEN token 17 for keith)

16/08/04 08:40:39 INFO terasort.TeraSort: Generating 1000 using 2

16/08/04 08:40:39 INFO mapreduce.JobSubmitter: number of splits:2

16/08/04 08:40:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470244871757_0028

16/08/04 08:40:40 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.54:8020, Ident: (HDFS_DELEGATION_TOKEN token 17 for keith)

16/08/04 08:40:40 INFO impl.YarnClientImpl: Submitted application application_1470244871757_0028

16/08/04 08:40:40 INFO mapreduce.Job: The url to track the job: http://slavetwo.keith.com:8088/proxy/application_1470244871757_0028/

16/08/04 08:40:40 INFO mapreduce.Job: Running job: job_1470244871757_0028

16/08/04 08:40:56 INFO mapreduce.Job: Job job_1470244871757_0028 running in uber mode : false

16/08/04 08:40:56 INFO mapreduce.Job:  map 0% reduce 0%

16/08/04 08:41:07 INFO mapreduce.Job:  map 50% reduce 0%

16/08/04 08:41:13 INFO mapreduce.Job:  map 100% reduce 0%

16/08/04 08:41:14 INFO mapreduce.Job: Job job_1470244871757_0028 completed successfully

16/08/04 08:41:14 INFO mapreduce.Job: Counters: 31

  File System Counters

  FILE: Number of bytes read=0

  FILE: Number of bytes written=229962

  FILE: Number of read operations=0

  FILE: Number of large read operations=0

  FILE: Number of write operations=0

  HDFS: Number of bytes read=164

  HDFS: Number of bytes written=100000

  HDFS: Number of read operations=8

  HDFS: Number of large read operations=0

  HDFS: Number of write operations=4

  Job Counters

  Launched map tasks=2

  Other local map tasks=2

  Total time spent by all maps in occupied slots (ms)=11587

  Total time spent by all reduces in occupied slots (ms)=0

  Total time spent by all map tasks (ms)=11587

  Total vcore-seconds taken by all map tasks=11587

  Total megabyte-seconds taken by all map tasks=11865088

  Map-Reduce Framework

  Map input records=1000

  Map output records=1000

  Input split bytes=164

  Spilled Records=0

  Failed Shuffles=0

  Merged Map outputs=0

  GC time elapsed (ms)=67

  CPU time spent (ms)=640

  Physical memory (bytes) snapshot=227663872

  Virtual memory (bytes) snapshot=3020079104

  Total committed heap usage (bytes)=121634816

  org.apache.hadoop.examples.terasort.TeraGen$Counters

  CHECKSUM=2173251765740

  File Input Format Counters

  Bytes Read=0

  File Output Format Counters

  Bytes Written=100000

 

Success again!  We now have a secured Cloudera cluster communicating with Isilon shared storage both using Kerberos!

 

Still a CLI gateway host?

 

With our simple authentication setup our Active Directory users could only log into the CLI gateway host to submit jobs because the SSSD package was not deployed anywhere else.  Kerberos requires that we deploy SSSD everywhere if we want to integrate with AD, which means that now users can log into any of our Hadoop hosts and submit jobs (if they have ssh access of course).  After enabling Kerberos it's probably a good idea to lock down the critical Hadoop hosts running core CDH services and only allow users to access the CLI gateway host.

 

Thanks for reading, comments welcome!

I recently had a Hadoop question from a great EMC customer and partner concerning Cloudera CDH running on Isilon.  The question went something along the lines of...

 

"I was checking with you on active directory (AD) integration issues we are seeing on Isilon as HDFS is not able to recognize file and directory permissions with the special character “\” in the AD user name format of DOMAIN\USERNAME How can we change the format so that Cloudera Hadoop can recognize permissions on files and directories for active directory users on Isilon?"

 

The simple answer to this question is to set the Isilon OneFS active directory provider option --assume-default-domain to "yes" since it defaults to "no" (thanks Brian Radwanski and Russ Stevenson!).  Setting this option changes the user and group permission from the "DOMAIN\USERNAME" and "DOMAIN\GROUP" format to simply "USERNAME" and "GROUP" but still references the same active directory users and groups.
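
For reference, a hedged example of the OneFS command is below; the provider name is a placeholder for your joined AD domain, and it's worth confirming the exact syntax with 'isi auth ads modify --help' on your OneFS version.

#Assumed syntax: have the AD provider assume the default domain so names drop the DOMAIN\ prefix
isi auth ads modify KEITH.COM --assume-default-domain=yes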

 

But what does that really mean?  Why doesn't it work with the defaults?  What exactly does the problem look like in Hadoop with the option set to "no" and how does it behave differently with the flag set to "yes"?  I went through the process of setting this up on my home lab gear with the intent of fully understanding the problem, the fix, and to learn some more about Hadoop security along the way.  I spent a lot of time setting up components that I normally don't touch, so hopefully this helps others save some time.

 

NOTE - Please engage your EMC and Cloudera sales teams to help you plan the deployment of Hadoop on Isilon for your production environment.  This blog is not meant to be a run book or a substitute for professional services; engage EMC, Cloudera, and other partners to help you get your Hadoop on Isilon production ready!

 

Another quick caveat, I don't have Kerberos configured on my Hadoop cluster in this blog post.  Consider this a starting point on your way towards a fully secured cluster since it's easier to learn how things work first without Kerberos and then add it later.

 

Cloudera Security Overview

 

Take some time to read the guide "Cloudera Security" (http://www.cloudera.com/documentation/enterprise/5-5-x/PDF/cloudera-security.pdf) prior to going any further.  It does an excellent job describing the different aspects of Hadoop security (authentication, encryption, authorization, and auditing) and also the different levels of security (levels 0 through 3) for a Hadoop cluster.  In our scenario, we will be working with authentication and authorization.  We'll also assume a relatively insecure cluster, somewhere around the security guide's "level 1" since we'll simply work on giving active directory users access to the Hadoop cluster through a dedicated Linux gateway host and securing data with user and group permissions.

 

So what is a gateway host?  The Cloudera security guide shows a few diagrams where "gateway" services are presented to users while the critical Hadoop components are secured away from users.  So a gateway service or edge host is simply a non-critical host that is a member of the Hadoop cluster but doesn't run any critical roles, just gateway roles.  Those gateway roles could include the ability to communicate with the Hadoop cluster via API, Hue, HttpFS, or the command line for submitting jobs.  For this blog we'll assume just a command line (CLI) gateway service running on a Linux host that runs no critical Hadoop roles but allows users to submit jobs to the cluster via the command line.  In a production environment, this host would be open to the user community while the hosts running critical Hadoop roles would be unavailable for direct user interaction.

 

Ok, this sounds fairly simple right?  Well, it would be if we used only local accounts everywhere but most organizations would rather take advantage of existing directory services like Active Directory or LDAP.  If this CLI gateway host had to use local accounts it would introduce additional management tasks for sysadmins and wouldn't be a great solution.  However, if this gateway host was integrated with AD then one could simply authenticate through active directory and use existing infrastructure and account identities.  So deploy a Linux CLI gateway integrated with AD to solve both the authentication and authorization problems for accessing data on the Hadoop cluster.

 

So what does this have to do with Isilon?  Using Isilon shared storage for Hadoop means all the HDFS data is stored on an enterprise-grade scale-out NAS cluster that is easy to use and can grow very large (PB+ scale) with little effort.  For Cloudera, you simply select Isilon as a role during CDH installation, which results in the HDFS filesystem being stored on the Isilon and not on direct attached disks (DAS).  No need for namenodes, no datanodes, no 3x replication overhead headaches, and no need to add more compute nodes when you really only need more storage capacity.  What Isilon does deliver are the benefits of enterprise-grade Isilon data protection via efficient erasure coding, self-encrypting drives, storage snapshots, storage quotas, and storage replication.  Since the Isilon is used as the storage for the HDFS Hadoop filesystem, we have to integrate not only the CLI gateway host with AD but also the Isilon with the same AD so that the entire Hadoop cluster is secured for authentication (AD) and authorization (user/group permissions for users via ACLs and POSIX bits).

 

Let's get started!

 

Environment

 

I'm using an ESXi 6 host in my lab to run all my hosts for this setup.  Below are the versions and a screenshot of my host VMs.  Note that I'm hand building each CentOS host from scratch and treating the VMs as if they were bare metal hosts.  Virtualizing Hadoop with Cloudera and Isilon would normally look like the reference architecture co-developed by EMC and Cloudera (http://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_vmware_isilon.pdf).

 

  • ESXi 6 to host all VMs
  • CDH 5.5.4 - four (4) VMs using Windows DNS name resolution running CentOS 6.8
  • OneFS 7.2.1.1 - three (3) virtual Isilon nodes with name resolution working with the Windows DNS server
  • Windows 2008 R2  - one (1) host acting as domain controller, DNS server, and certificate authority

 

Screenshot of lab environment

Screen Shot 2016-07-14 at 8.51.20 PM.png


Prepare Isilon

 

You may already have a physical Isilon cluster, or you may want to use the Isilon simulator (https://www.emc.com/products-solutions/trial-software-download/isilon.htm) to build a test environment.  If you are using the simulator, follow the PDF instructions that come with the ZIP download and deploy on your platform of choice; I am using ESXi 6 and the VMware vSphere Converter Standalone Client.  I won't go into the details of deploying the Isilon simulator; contact your Isilon sales team if you need assistance or evaluation licenses.  Also, I'm using OneFS 7.2.1.1 in my example; OneFS 8.0 would also be a good place to start.

 

Once you have your simulator built or have a physical cluster ready, you will need to prepare your Isilon for Hadoop.  I won't go into detail about doing this prep work, but the process is documented well in the "EMC Isilon Hadoop Starter Kit for Cloudera" (EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions — EMC Isilon Hadoop Starter Kit for Cloudera).  Although the document is intended for virtualizing Hadoop on VMware with Big Data Extensions (we are not doing that in this blog example), we can use the sections of the document that apply to both virtualized and bare metal Hadoop.  Again, even though I have virtual CentOS hosts for Hadoop, I am not following the full-blown Big Data Extensions route; I am hand-building each VM as if it were bare metal rather than doing the automated VM deployment the BDE product would allow.  Skip straight to the "Prepare Isilon" section and follow all the instructions in that section.

 

After following the preparation steps in the Hadoop Starter Kit (HSK) instructions, you will have a separate access zone and IP address pool specifically for Cloudera Hadoop.  This access zone will also have a directory dedicated to Hadoop that will be the root directory used by Cloudera for HDFS.  It is also important to download the Isilon Hadoop Tools (GitHub - claudiofahey/isilon-hadoop-tools: Tools to deploy Hadoop on EMC Isilon) and run the isilon_create_users and isilon_create_directories scripts to get the local users, directories, and permissions in place for the Cloudera install.  Skipping this step will create more work later, so make sure to run these scripts!  Lastly, make sure you perform the DNS work documented; I used my Windows 2008 R2 domain controller as my DNS server to keep things simple.
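For reference, those helper scripts are run directly on the Isilon cluster (copy them over with scp first).  Below is a rough sketch; the flag names are from my memory of the HSK examples and my Hadoop access zone is named 'cloudera', so treat them as assumptions and check each script's usage output before running.

# Run on an Isilon node from the isilon-hadoop-tools directory (flags assumed, verify with the script usage)
bash isilon_create_users.sh --dist cdh --startuid 1000 --startgid 1000 --zone cloudera
bash isilon_create_directories.sh --dist cdh --zone cloudera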

 

Install Cloudera Manager & CDH

 

I have four (4) CentOS VMs that I'm treating as bare metal hosts and this is how I will use them:

 

  • one (1) Cloudera Manager host
  • two (2) master nodes to run CDH roles and act as the compute for my CDH cluster
  • one (1) CLI Gateway for user access to the CDH cluster

 

I'm not going to go into great detail on how to deploy a CDH cluster on Isilon; it's all documented well in the HSK we used above (EMC Isilon Hadoop Starter Kit for Cloudera with VMware Big Data Extensions — EMC Isilon Hadoop Starter Kit for Cloudera).  Jump to the "Install Cloudera Manager" section and follow the instructions for deploying CDH with Isilon.  Then follow the instructions for "Deploy a Cloudera Hadoop Cluster".  Use all your Linux hosts, but don't add your Isilon until you get to the "cluster setup" section, then add custom services with "Isilon" selected.  When you get to "role assignments", make sure you don't install anything on your CLI gateway host other than gateway services like the Isilon gateway, Spark gateway, etc.  This way you will have a single host for user interaction with the cluster that is not running any core Hadoop services.

 

Once the steps above are completed, follow the steps for "Adding a Hadoop User" and "Functional Tests".  Add the user and perform the functional testing on the CLI gateway host (with no roles other than gateway roles).  At this point you are using all local users on the Hadoop cluster, local users on the gateway host, and local users on the Isilon.  If all the tests are working (Yarn/MapReduce, Pig, Hive, etc) then you can be confident that CDH and the Isilon have been deployed correctly and Hadoop is operational with shared Isilon storage.  There is no AD integration at this point.

 

Prepare Windows Host (Active Directory, LDAPS, Certificate Services)

 

We assumed above that we'd use the Windows 2008 R2 host as a DNS server to get name resolution working in this environment.  Now we need to use the same Windows host for a few other roles necessary to allow integration of the CLI gateway host.  If you haven't already done so, promote this Windows server to be a domain controller.  Also, make sure you have name resolution working across all hosts and the Isilon (the CDH installation would have failed if you had no name resolution).

 

Your CLI gateway host will need the ability to interact securely with Active Directory over LDAPS (secure LDAP queries), which requires certificate services for SSL communications.  I struggled with this step since it's been a while since I've worked with SSL, so be prepared to spend some time on it.  I won't go into a step-by-step process, but I did find this link useful for getting started --> http://social.technet.microsoft.com/wiki/contents/articles/2980.ldap-over-ssl-ldaps-certificate.aspx

 

Basically, you first need to install the AD Certificate Services role on your W2K8R2 host and then duplicate the "Kerberos Authentication" template (name it whatever you like).  Follow the steps in the TechNet link and make sure you "allow the private key to be exported".  Follow the remaining steps to issue the template you created and then request it locally on the same W2K8R2 host.  In short, you are issuing the certificate for your AD domain and then requesting it locally on the same domain controller host.

 

How can you tell if LDAP is working locally within your AD domain?  Run LDP.EXE from the command line of your domain controller.  This executable can quickly test both LDAP connections (insecure) and LDAPS connections (secure with a certificate).  First try an insecure LDAP test to your AD domain name over port 389 (screenshot below) with no options checked.  If successful you will get a log message showing "Established connection to <your AD domain>" and "Retrieving base DSA information...".

 

LDAP test

Screen Shot 2016-07-27 at 9.44.05 AM.png

Successful LDAP Results

Screen Shot 2016-07-27 at 9.55.00 AM.png

 

Next try a secure LDAPS connection over SSL (certificate used).  Disconnect your previous LDP.EXE session and connect to your AD domain name over port 636 with "SSL" checked (screenshot below).  If successful, you will see the LDAP SSL messages first and then the "Established connection..." and "Retrieving base DSA information..." messages.  If you can't get an LDAPS connection working locally on your Windows host, keep working on it until you can; if the certificate is not working locally on your domain controller, it will also not work on your CLI gateway Linux host.

 

LDAPS Test

Screen Shot 2016-07-27 at 9.49.50 AM.png

 

Successful LDAPS Results

Screen Shot 2016-07-27 at 9.55.25 AM.png

Note, all of the above work (AD certificates, LDAPS) is 100% independent of the Isilon shared storage.  If you want to build a CLI gateway host with AD integration, you will need to perform this work regardless of whether you are using Isilon shared storage or DAS (direct attached storage).  Isilon joins an Active Directory domain with the domain administrator credentials and by default becomes an object in the AD "Computers" OU.  You do not need to set up AD Certificate Services or issue SSL certificates to get Isilon working with AD.  However, you do need to do this for your Linux host if you want it integrated with AD.

 

Speaking of the Isilon, if you haven't already, join the Isilon to your Active Directory domain using your AD domain administrator credentials.  Also add this AD domain to your Hadoop access zone as an authentication provider.  Both actions are very easily done through the GUI and won't be explained in detail here.
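For those who prefer the command line, the CLI equivalents look roughly like the sketch below.  The exact flag spellings vary between OneFS releases and I have not verified them against 7.2.1.1, so treat them as assumptions and confirm with 'isi auth ads create --help' and 'isi zone zones modify --help' first.

# Join the cluster to the AD domain (you will be prompted for the password)
isi auth ads create KEITH.COM --user=Administrator
# Add the AD provider to the Hadoop access zone ('cloudera' in my lab)
isi zone zones modify cloudera --add-auth-providers=lsa-activedirectory-provider:KEITH.COM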

 

Prepare Linux CLI Gateway Host (ldapsearch, certificates, SSSD)

 

Your Linux host will need a way to integrate with Active Directory and allow AD users to log in to the host via the CLI.  This is another step that took some work, and I found a few blogs helpful in getting it going.  We can refer to this topic as identity integration (as mentioned in the Cloudera Security guide), and there are a number of commercial products (Centrify, Quest) that can do this, as well as the open source SSSD package which I am using for this example.  The blog links below helped me and should be followed roughly (along with research of your own) to get through this process.

 

What I See: Redhat Integration with Active Directory using SSSD.  <-- testing LDAP and getting SSSD working

Certificate Installation with OpenSSL - Other People's Certificates <-- installing the AD certificate on the Linux host

 

LDAP

 

Our Linux client ultimately needs to query Active Directory securely using LDAPS and SSL certificates, but the first step is to confirm basic (insecure) LDAP connectivity.  Install the openldap-clients package on your Linux CLI gateway and test insecure LDAP connectivity to Active Directory.  A query like the example below should work at this point between the Linux host and the Windows domain controller.

 

ldapsearch -v -x -H ldap://<your.domain.controller>/ -D "cn=Administrator,cn=Users,dc=example,dc=com" -W -b "cn=Users,dc=example,dc=com"

 

Here is my sample test using my Windows domain controller host name and my KEITH.COM domain.  The credentials for Active Directory are passed with the -D flag and the search base is specified with the -b flag.

 

ldapsearch -v -x -H ldap://win-cd56ouv3urh.keith.com/ -D "cn=Administrator,cn=Users,dc=keith,dc=com" -W -b "cn=Users,dc=keith,dc=com"

 

I'm first prompted for the administrator password and then I'm given the full attributes of the "Users" OU on my Linux host (output not included for this example).  If this query completes successfully then you have basic LDAP over port 389 communication working.

 

Certificates

 

Now we need to install the AD certificate on our Linux host to communicate over SSL with the domain controller.  You'll need the openssl package installed on your Linux host and also a basic understanding of certificates, since I won't go into heavy detail in this blog.  Basically you will need to get the certificate from the domain controller, convert it to a PEM file, verify the certificate, find the hash of the certificate, and create a symbolic link in the certificate directory using the hash as the link name pointing to the actual certificate.

 

Wait, what?  For those who never work with certificates on Linux this is easier said than done, but it's nothing you can't Google your way through.  I first exported the AD Kerberos certificate that I issued earlier in my Windows host preparation step (the "Kerberos Authentication" duplicate).  I moved that to my Linux host, calculated the fingerprint (openssl x509 -noout -fingerprint -in ca-certificate-file), and copied it to my /etc/openldap/certs directory.  I then calculated the hash (openssl x509 -noout -hash -in ca-certificate-file) and created a symlink to the certificate using the hash as the link name.  In my example below, the hash is 20eb7ef1, which I can now see in /etc/openldap/certs.  If I look at that 20eb7ef1 file I'll see the actual certificate (not shown, but easy enough to do; you will see the "BEGIN CERTIFICATE" and "END CERTIFICATE" strings surrounding the certificate data).

 

# ls 20*

20eb7ef1
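Putting those steps together, here is a minimal sketch of the certificate work on the CLI gateway host.  I'm assuming the exported AD certificate was saved as ad-kerberos.cer in DER format; adjust the filenames to match your export.

# Convert the exported AD certificate from DER to PEM (skip if it is already PEM)
openssl x509 -inform der -in ad-kerberos.cer -out ad-kerberos.pem
# Verify the certificate and note its fingerprint
openssl x509 -noout -fingerprint -in ad-kerberos.pem
# Copy it into the OpenLDAP certificate directory and link it by hash name
cp ad-kerberos.pem /etc/openldap/certs/
cd /etc/openldap/certs
ln -s ad-kerberos.pem $(openssl x509 -noout -hash -in ad-kerberos.pem)

Some tools expect the hash link to end in ".0" (for example 20eb7ef1.0); if LDAPS lookups still complain about the certificate, try that variant.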

 

SSSD

 

SSSD is an open source package that integrates AD with a Linux host, allowing us to log in to the host with AD credentials (among other things).  I can't start this work until the Kerberos certificate from my domain controller is installed in the /etc/openldap/certs directory on my Linux host, which I did in the previous section.  We'll need the sssd package installed on our Linux host as well as sssd-client, krb5-workstation, samba, openldap-clients (should already be installed), openssl (already installed), and authconfig.

 

You will first need to create an LDAP bind user in Active Directory that SSSD will use to connect and communicate with AD.  Just create a service account in AD (something like "ldapuser") and set a password that never expires.

 

Next you'll need to configure your /etc/krb5.conf file so SSSD can communicate with Active Directory using Kerberos.  Set your default_realm to your AD domain name (example.com) and then configure your [realms] section with your AD name (example.com) and domain controller host name (host.example.com) for both the admin server and KDC.  Again, I'm going to skip the details of the conf file format, but it should be straightforward (Google it!).  If you have this working correctly you should be able to 'kinit' an AD user and then 'klist' that user's Kerberos ticket (below; please forgive the email auto-formatting, don't email this address).
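For reference, here is a minimal /etc/krb5.conf sketch for my KEITH.COM lab.  Realm names are case sensitive and should be uppercase; your realm and domain controller hostname will obviously differ, so treat this as a template rather than a drop-in file.

[libdefaults]
  default_realm = KEITH.COM
  dns_lookup_kdc = false
  dns_lookup_realm = false

[realms]
  KEITH.COM = {
    kdc = win-cd56ouv3urh.keith.com
    admin_server = win-cd56ouv3urh.keith.com
  }

[domain_realm]
  .keith.com = KEITH.COM
  keith.com = KEITH.COM

With that in place, the kinit/klist test looks like this: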

 

# kinit keith@KEITH.COM

Password for keith@KEITH.COM:

# klist -e

Ticket cache: FILE:/tmp/krb5cc_0

Default principal: keith@KEITH.COM

 

Valid starting    Expires            Service principal

07/27/16 15:53:34  07/28/16 01:53:37  krbtgt/KEITH.COM@KEITH.COM

  renew until 08/03/16 15:53:34, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96

 

With Kerberos configured and ldapsearch tested, we can now configure the SSSD conf file.  I have not seen a way to automatically generate the /etc/sssd/sssd.conf file; it needs to be created manually, with a sample config pasted in and edited for your specific environment.  I struggled a bit with the format and finally used the example in the blog mentioned previously (What I See: Redhat Integration with Active Directory using SSSD.).  I won't go into too much detail here: edit your [domain/example.com] section to match your environment (ldap_uri, ldap_schema, krb5_realm, etc.) and use your ldapuser (created above) as the ldap_default_bind_dn and the ldapuser password as the ldap_default_authtok.
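To give you an idea of the shape of the file, below is a trimmed-down sketch of the kind of /etc/sssd/sssd.conf used for this setup.  It is modeled on the blog example referenced above; every value (hostnames, search base, bind DN) is specific to my KEITH.COM lab, the referenced blog adds more AD attribute mappings that I've omitted here, and the file holds the bind password in clear text, so lock it down (chmod 600 /etc/sssd/sssd.conf).

[sssd]
config_file_version = 2
services = nss, pam
domains = keith.com

[domain/keith.com]
id_provider = ldap
auth_provider = krb5
chpass_provider = krb5
cache_credentials = true
enumerate = false

ldap_uri = ldap://win-cd56ouv3urh.keith.com
ldap_search_base = dc=keith,dc=com
ldap_schema = rfc2307bis
ldap_default_bind_dn = cn=ldapuser,cn=Users,dc=keith,dc=com
ldap_default_authtok = <ldapuser password>

# Use the AD certificate installed earlier for TLS-protected lookups
ldap_id_use_start_tls = true
ldap_tls_cacertdir = /etc/openldap/certs

krb5_realm = KEITH.COM
krb5_server = win-cd56ouv3urh.keith.com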

 

Once you have edited the sssd.conf file and think it's correct, enable sssd and update the system authentication configuration with the command:

 

authconfig --enablesssd --enablesssdauth --enablelocauthorize --enablekrb5 --update

 

Now try to start (or restart) the sssd service with the command:

 

service sssd start

 

If you have a misconfigured sssd.conf file the service will not start; go back and fix your configuration file (sorry).  If it starts OK, test by querying an AD user with the 'id' command.  Below I am querying the ldapuser I created in AD as the service account for SSSD.

 

#id ldapuser

uid=710801188(ldapuser) gid=710800513(Domain Users) groups=710800513(Domain Users)

 

A result with the UID/GID of an AD user means success!  I don't have a local user by this name on this host, and I can tell it's an AD user by the group "Domain Users".  You should now also be able to 'su' to an AD user or even log in to this host as that AD user, provided you have login permissions locally on that Linux host.

 

# su - ldapuser

-sh-4.1$ id

uid=710801188(ldapuser) gid=710800513(Domain Users) groups=710800513(Domain Users)

 

Note, the Linux oddjob package (oddjob-mkhomedir) is used to create a home directory locally on your Linux host for AD users; research and use that if you are interested.
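If you want those home directories created automatically at first login, the usual approach on CentOS 6 (an assumption on my part; I did not configure it above) is the oddjob-mkhomedir package plus the authconfig mkhomedir option:

# Install and start the oddjob home directory helper, then enable it via authconfig
yum install -y oddjob oddjob-mkhomedir
service oddjobd start && chkconfig oddjobd on
authconfig --enablemkhomedir --update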

 

Final Touches

 

If you've made it this far, give yourself a pat on the back!  The rest is fairly easy.  We now have a Linux host where AD users can log in with their AD identities; we don't have to create local Linux users (or local Isilon users) for Hadoop and can take full advantage of AD identities across both the Hadoop CLI gateway host and the Isilon.  Again, make sure you have joined the Isilon to your AD domain and have added that AD domain to your Hadoop access zone as a provider.

 

Select a user (or create a new user) in AD that needs to run Hadoop jobs on the Hadoop cluster.  Give the user the ability to log into the Linux host via 'ssh' (outside the scope of this blog) or simply test with 'su'.  I have a user named 'keith' in AD and I can test as this user on my CLI gateway host (below).

 

# id keith

uid=710801189(keith) gid=710800513(Domain Users) groups=710800513(Domain Users)

[root@cligateway certs]# su - keith

$ id

uid=710801189(keith) gid=710800513(Domain Users) groups=710800513(Domain Users)

 

Great, so how can I now run Hadoop jobs?  Well, by default your user will get an error when trying to submit jobs since there is no home directory for them in HDFS under /user.  If I try to run a Hadoop job as this user with no HDFS home directory, I'll get strange errors because the user is unknown to Hadoop.  So I have to create a home directory for my user in HDFS under /user (below) as the hdfs user (the Hadoop superuser).

 

sudo -u hdfs hdfs dfs -mkdir -p /user/keith

 

But wait, what permissions do I assign to that home directory in HDFS?  Before, when I used local users on the CLI gateway host, this was easy because I just used the local account name.  What about an AD account?  This is where we integrate with the Isilon, and this is where I hope to save you some time and aggravation.

 

Secret to making this all work

 

When I run the 'id' command against my AD user on the CLI gateway host, I get a strange UID/GID in the 700 million range.  DO NOT use this 700 million UID/GID on the user's home directory under HDFS /user!  It will not work since the Isilon does not use this UID/GID.  Remember, we are using shared storage for HDFS, so /user actually lives on the Isilon, not locally on direct attached storage.

 

# id keith

uid=710801189(keith) gid=710800513(Domain Users) groups=710800513(Domain Users)

 

So what is the correct UID/GID to assign to the user HDFS home directory?  We need to look at the user's token on the Isilon since this directory is on the Isilon.  We run a command from the Isilon to examine the mapping token (account info) and get the correct UID/GID for this sample user who needs to run Hadoop jobs from the gateway host.  Note, we are using an access zone (if we followed the steps in the HSK) so be sure to specify your Hadoop access zone in the --zone flag.

 

isi auth mapping token --user=KEITH\\keith --zone=cloudera

                  User

                      Name: keith

                        UID: 1000004

                        SID: S-1-5-21-2027596968-42473543-1993621826-1189

                    On Disk: S-1-5-21-2027596968-42473543-1993621826-1189

                    ZID: 2

                  Zone: cloudera

            Privileges: -

          Primary Group

                      Name: domain users

                        GID: 1000000

                        SID: S-1-5-21-2027596968-42473543-1993621826-513

                    On Disk: S-1-5-21-2027596968-42473543-1993621826-513

<snip>

 

Ah, now I can see that the UID is 1000004 and the GID is 1000000 for this AD user.  The Isilon will auto-generate a UID and GID in the 1 million range for every AD user unless you have AD integrated with Unix UIDs/GIDs and are using RFC 2307 (another topic we won't cover here).  So now I just have to 'chown' the user's HDFS home directory to the Isilon UID/GID so that the AD user can access the Hadoop cluster.  You can also 'chmod' to give appropriate access.

 

sudo -u hdfs hdfs dfs -chown 1000004:1000000 /user/keith

 

Now, to check what this looks like, run the command below on the CLI gateway host to look at the owner and group along with the permissions:

 

# sudo -u hdfs hdfs dfs -ls /user

Found 20 items

<snip>

drwxr-xr-x  - KEITH\keith KEITH\domain users          0 2016-07-27 16:11 /user/keith

<snip>

 

Great!  Now my AD user has an HDFS home directory with the owner and group set to the AD identity.  All is good, right?  Well, one last thing.  The DOMAIN\user format is confusing to Hadoop, and you will probably get a "Username: 'keith' not found. Make sure your client's username exists on the cluster" error.  What we want is "USERNAME" for the owner instead of "DOMAIN\USERNAME" and "GROUP" for the group instead of "DOMAIN\GROUP".  But we also want the owner and group to retain the AD identity.

 

How do we do this?  We set an option on the Isilon, of course!  The Isilon Active Directory provider has an option called --assume-default-domain which defaults to "no".  The help describes this option as "Specifies whether to look up unqualified user names in the primary domain. If set to no, the primary domain must be specified for each authentication operation."  If you set this option to "yes" you will get the short "USERNAME" and "GROUP" format instead of the format with the "DOMAIN\" prefix.  Just what we need!

 

On the Isilon I set the option to "yes" with the command below.  Note that I am not sharing this cluster with other existing users or workloads.  Be cautious and test this option if you are sharing the Isilon, since the setting is global for the AD provider on the cluster.

 

isi auth ads modify KEITH.COM --assume-default-domain=yes

 

What do the permissions look like now in HDFS?  No other changes are required, the "isi auth..." command above on the Isilon is enough to flip the format, see below.

 

# sudo -u hdfs hdfs dfs -ls /user

Found 20 items

<snip>

drwxr-xr-x  - keith    domain users          0 2016-07-27 16:11 /user/keith

<snip>

 

No more domain prefixes in the owner and group names!  Just a clean owner of "keith" and a group of "domain users"!  I can now su to that user and try some hdfs commands and a test Hadoop job (calculate pi).  Notice that the identity of my user is the same as before; nothing has changed other than removing the domain prefix from the owner and group names.

 

# su - keith

-sh-4.1$ id

uid=710801189(keith) gid=710800513(Domain Users) groups=710800513(Domain Users)

-sh-4.1$ hdfs dfs -ls /user/keith

Found 3 items

drwx------  - keith domain users          0 2016-07-14 18:52 /user/keith/.staging

-rw-r--r--  1 keith domain users        158 2016-07-14 18:54 /user/keith/in

drwxr-xr-x  - keith domain users          0 2016-07-14 18:54 /user/keith/out

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000

Number of Maps  = 10

Samples per Map = 1000

<snip>

Starting Job

<snip>

Job Finished in 2.801 seconds

Estimated value of Pi is 3.14080000000000000000

 

Success!  I can now login to my Linux CLI gateway host and run Hadoop jobs as an AD user.  Hopefully this was helpful, look forward to any comments!

Data needs protection, and keeping multiple copies is a best practice for EMC customers.  Isilon users can asynchronously replicate data between Isilon clusters using SyncIQ software for disaster recovery, backup, and archival requirements.  SyncIQ runs in parallel across all source and target cluster nodes to maximize performance and bandwidth scalability.  As you add nodes, you increase the resources available to move data between clusters with SyncIQ.  See the diagram below, taken from the SyncIQ best practices guide.

Screen Shot 2016-06-09 at 5.18.08 PM.png

https://www.emc.com/collateral/hardware/white-papers/h8224-replication-isilon-synciq-wp.pdf

 

What if you don't want to use every node during replication?  What happens if you limit replication to a subset of nodes on the source and target clusters?  This blog post will explore how SyncIQ replication works when every node is used and when replication is restricted to certain nodes on the source and target clusters.  Let's get started!

 

Overview of OneFS networking and SmartConnect

 

An Isilon cluster has multiple Ethernet ports per node that can be configured with multiple subnets and IP address pools.  Physical network interfaces can be assigned to each IP address pool in various combinations; a sysadmin can assign some or all network ports to each IP address pool based on their cluster design requirements.  Flexibility is the key here: you do not have to assign all physical network ports to an IP address pool, and many combinations are possible, especially if you have multiple Isilon node types.

 

OneFS SmartConnect software ties the networking together by assigning a unique DNS name to each subnet IP address pool and shaping front-end network traffic by DNS name.  Want to have an application use a certain subnet on fast S210 Isilon nodes but keep user home directories on dense X410 nodes?  Create a subnet and IP pool for your application that includes only the S210 physical network ports, and assign it a SmartConnect DNS name such as 'performancezone.isilon.yourdomain.com'.  Then create a subnet and IP pool for your home directories that includes only the X410 physical network ports, and assign it a SmartConnect DNS name such as 'generalzone.yourdomain.com'.  Add OneFS SmartPools software and you can 'pin' your home directory data to X410 disks and application data to your S210 disks.  See the diagram below, taken from the SmartConnect white paper.

Screen Shot 2016-06-13 at 3.52.38 PM.png

https://www.emc.com/collateral/hardware/white-papers/h8316-wp-smartconnect.pdf

 

SyncIQ and networking

 

OneFS SyncIQ replication can also take advantage of this networking flexibility: it can use all source/target nodes for replication or restrict replication to specific source/target subnet IP address pools.  SyncIQ is configured by creating policies, which define a subset of the cluster data to be replicated along with configuration options for that data.  Policies are configured on the source Isilon cluster, and the target Isilon cluster is defined in the policy.  Think of SyncIQ as a "push" configuration rather than a "pull" configuration, since it is configured on the source and pushes replication data to the target.
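For context, creating a policy like the one used later in this post from the CLI looks roughly like the sketch below.  I'm going from memory on the argument order, which also changed between OneFS 7.x and 8.x, so verify with 'isi sync policies create --help' on your release before using it.

# Sketch: a manually scheduled sync policy replicating /ifs/data on this
# cluster to /ifs/sourceisilon on the cluster named 'target'
isi sync policies create mySyncIQpolicy sync /ifs/data target /ifs/sourceisilon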

 

A SyncIQ policy defaults to using all nodes on the source cluster for replication (screenshot below).  Optionally, the source nodes can be restricted by subnet IP address pool so that only nodes in a specific subnet IP pool run SyncIQ.  On the target side, SyncIQ also defaults to using all nodes on the target cluster.  Restriction on the target cluster is done by selecting the "Connect only to the nodes..." checkbox (screenshot below).

 

SyncIQ policy source cluster restriction

Screen Shot 2016-06-13 at 5.33.45 PM.png

SyncIQ policy target cluster restriction

Screen Shot 2016-06-13 at 5.38.42 PM.png

What does it mean to restrict nodes?  Does this restriction limit network bandwidth by only using certain interfaces or does it limit cluster resources for replication?  What actually happens on the cluster during replication?

 

SyncIQ workers

 

SyncIQ moves data using worker threads on both the source and target cluster.  We mentioned above that a SyncIQ policy is a set of data to replicate with certain configuration attributes.  When a SyncIQ policy runs, OneFS spawns workers on both the source cluster and the target cluster and distributes the list of files to be replicated across all the workers spawned by the job.  The workers on the source cluster move data to the workers on the target cluster.  Worker threads on the source cluster are 'pworkers' and threads on the target cluster are 'sworkers'.

 

The number of workers spawned can be tuned to match the desired replication performance at the cost of increased resource utilization on the clusters.  The number of workers is defined by the SyncIQ policy and defaults to 3 per Isilon node on the source cluster (screenshot below).  Increasing the number of workers assigned to a policy can help replication run faster but will use more CPU on the Isilon cluster since more threads run simultaneously.  The default is recommended in most cases but can be increased following the best practices white paper (https://www.emc.com/collateral/hardware/white-papers/h8224-replication-isilon-synciq-wp.pdf).  If the source cluster and target cluster have the same number of nodes (or roughly the same), there will be the same number of threads running on the target cluster during replication (default 3 per node).

 

SyncIQ policy - worker threads per node

Screen Shot 2016-06-14 at 10.21.49 AM.png

There are limits in OneFS that will determine how many worker threads will run at any given time on the source and target clusters.  OneFS 7.x and earlier can have 100 SyncIQ policies total but only 5 policies can run at a time.  The default number of workers per node is 3 and the maximum is 8.  The maximum number of workers per job is 40 so the absolute maximum number of workers running at any given time is 200 (40 workers maximum per job, 5 jobs running maximum).  These limits are lifted in OneFS 8.x and above to 1000 SyncIQ policies maximum and 50 concurrent SyncIQ jobs.  The maximum number of workers in OneFS 8.x is determined by the number of CPUs in the cluster and the maximum workers per policy is determined by the number of nodes in the cluster.  Again, see the SyncIQ white paper for the details.

 

Last point about workers: if the source cluster and target cluster do not have the same number of nodes, the number of workers spawned negotiates down based on the smaller cluster.  Say a 10-node source cluster is replicating to a 5-node target cluster; we won't get our default 3 workers per node on the source cluster because we don't have the same resources on the target cluster.  In that example, we will see 15 workers on both clusters (smallest cluster is 5 nodes, 3 workers per node).  So SyncIQ limits the resources consumed to avoid overwhelming a smaller target cluster that doesn't have the same CPU resources as the source.

 

SyncIQ example - no restrictions

 

Let's look at an example now to see what happens during replication when we use the SyncIQ policy defaults of no restrictions.  I have a three (3) node source cluster called "source" and a three (3) node target cluster called "target".  I have a SyncIQ policy configured on the source cluster that does not restrict source nodes and does not restrict target nodes.  See the partial command output below for the properties of my sample SyncIQ policy; notice the Source Subnet, Source Pool, and Restrict Target Network fields: I am not specifying a source subnet or pool (no source restriction) and I am not restricting the target in this policy.

 

SyncIQ policy

                       Name: mySyncIQpolicy

                       Path: /ifs/data

                     Action: sync

                    Enabled: Yes

                     Target: target

                Description:

            Check Integrity: Yes

Source Include Directories: /ifs/data/small files

Source Exclude Directories: -

              Source Subnet: -

                Source Pool: -

      Source Match Criteria:

                Target Path: /ifs/sourceisilon

    Target Snapshot Archive: Yes

    Target Snapshot Pattern: SIQ_%{SrcCluster}_%{PolicyName}_%Y-%m-%d_%H-%M

Target Snapshot Expiration: 6M

      Target Snapshot Alias: SIQ_%{SrcCluster}_%{PolicyName}

Target Detect Modifications: Yes

    Source Snapshot Archive: No

    Source Snapshot Pattern:

Source Snapshot Expiration: Never

                   Schedule: Manually scheduled

                  Log Level: notice

          Log Removed Files: No

           Workers Per Node: 3

             Report Max Age: 1Y

           Report Max Count: 2000

            Force Interface: No

    Restrict Target Network: No

<snip>

 

Since I am not restricting replication on the source or target cluster, I will use the subnet "subnet0" and IP address pool "pool0" on both clusters, as seen below.  My source and target clusters are virtual (download here --> https://www.emc.com/products-solutions/trial-software-download/isilon.htm) and each has a single 'ext' interface per virtual Isilon node with three nodes per cluster.  Note the IP address ranges and physical interfaces in the output so we can reference them later.

 

Source cluster IP address pool

subnet0:pool0 - Default ext-1 pool

          In Subnet: subnet0

         Allocation: Static

             Ranges: 1

                     192.168.0.57-192.168.0.59

    Pool Membership: 3

                     1:ext-1 (up)

                     2:ext-1 (up)

                     3:ext-1 (up)

   Aggregation Mode: Link Aggregation Control Protocol (LACP)

        Access Zone: System (1)

<snip>

 

Target cluster IP address pool

subnet0:pool0 - Default ext-1 pool

          In Subnet: subnet0

         Allocation: Static

             Ranges: 1

                     192.168.0.60-192.168.0.62

    Pool Membership: 3

                     1:ext-1 (up)

                     2:ext-1 (up)

                     3:ext-1 (up)

   Aggregation Mode: Link Aggregation Control Protocol (LACP)

        Access Zone: System (1)

<snip>

 

I will run my SyncIQ policy and tail the /var/log/isi_migrate.log file on one of the source nodes and one of the target nodes.  Note that every Isilon node has an isi_migrate.log file, so you must watch the log file of a node participating in the replication job.  The commands I use are below, followed by the messages logged on one node in each cluster when the job kicks off.
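These are the only two commands needed; just remember the tail has to run on a node that is actually participating in the job.

# Start the policy manually on the source cluster
isi sync jobs start mySyncIQpolicy
# Follow replication activity on a participating node (source or target)
tail -f /var/log/isi_migrate.log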

 

Source isi_migrate.log

2016-06-14T17:31:32Z <3.6> source-1(id1) isi_migrate[73064]: coord[mySyncIQpolicy:1465925491]: source nodes: 3 nodes available: 1)192.168.0.57 2)192.168.0.58 3)192.168.0.59

<snip>

2016-06-14T17:31:32Z <3.6> source-1(id1) isi_migrate[73065]: primary[mySyncIQpolicy:1465925491]: Starting worker 0 on 'mySyncIQpolicy' with 192.168.0.60

2016-06-14T17:31:32Z <3.6> source-1(id1) isi_migrate[73067]: primary[mySyncIQpolicy:1465925491]: Starting worker 4 on 'mySyncIQpolicy' with 192.168.0.61

2016-06-14T17:31:32Z <3.6> source-1(id1) isi_migrate[73066]: primary[mySyncIQpolicy:1465925491]: Starting worker 2 on 'mySyncIQpolicy' with 192.168.0.62

<snip>

 

Target isi_migrate.log

2016-06-14T17:31:31Z <3.6> target-1(id1) isi_migrate[74116]: secondary[mySyncIQpolicy:1465925491]: 3 nodes available: 1)192.168.0.60 2)192.168.0.61 3)192.168.0.62

<snip>

2016-06-14T17:31:32Z <3.6> target-1(id1) isi_migrate[74121]: secondary[mySyncIQpolicy:1465925491]: Starting worker 8 on 'mySyncIQpolicy' from 192.168.0.57

2016-06-14T17:31:32Z <3.6> target-1(id1) isi_migrate[74119]: secondary[mySyncIQpolicy:1465925491]: Starting worker 7 on 'mySyncIQpolicy' from 192.168.0.58

2016-06-14T17:31:32Z <3.6> target-1(id1) isi_migrate[74120]: secondary[mySyncIQpolicy:1465925491]: Starting worker 6 on 'mySyncIQpolicy' from 192.168.0.59

 

Just as we expected, the source Isilon logs that it has three source nodes available (by IP address) and starts workers against the three available target nodes (again by IP address).  The target Isilon logs the same messages in reverse: it logs that it has three target nodes available and starts workers from the source IP addresses.  This is only a snippet from a single node's log on each cluster.  Each node in each cluster has its own isi_migrate.log; if I wanted info on all running threads I would have to look at the isi_migrate.log on every node in both clusters.

 

Now let's look at the SyncIQ processes running on each node using a 'ps' command and looking for the pworkers (source threads) and sworkers (target threads).  The isi_for_array command allows me to type my command once and run it on all nodes in the cluster.  In the output the clusters are named "source" and "target" and the three nodes each have a unique identifier (i.e., source-1, source-2, and source-3).

 

Source Isilon 'ps' output

# isi_for_array "ps -aux | grep pwork"

source-1: root   73065  0.0  1.4 170204 14248  ??  S     5:31PM   0:00.92 isi_migr_pworker

source-1: root   73066  0.0  1.4 170204 14184  ??  S     5:31PM   0:00.84 isi_migr_pworker

source-1: root   73067  0.0  1.4 170204 14192  ??  S     5:31PM   0:00.67 isi_migr_pworker

source-2: root   15087  0.0  1.4 170204 14328  ??  S     5:31PM   0:00.55 isi_migr_pworker

source-2: root   15088  0.0  1.4 170204 14312  ??  S     5:31PM   0:00.50 isi_migr_pworker

source-2: root   15089  0.0  1.4 170204 14332  ??  S     5:31PM   0:01.03 isi_migr_pworker

source-3: root   46644  0.0  1.6 170204 15860  ??  S     5:31PM   0:00.37 isi_migr_pworker

source-3: root   46645  0.0  1.6 170204 15924  ??  D     5:31PM   0:01.55 isi_migr_pworker

source-3: root   46646  0.0  1.6 170204 15840  ??  S     5:31PM   0:00.48 isi_migr_pworker

 

Target Isilon 'ps' output

# isi_for_array "ps -aux | grep swork"

target-3: root   70568  1.0  0.8 170204 16812  ??  D     5:31PM   0:00.75 isi_migr_sworker

target-3: root   70569  1.0  0.8 170204 16812  ??  D     5:31PM   0:00.31 isi_migr_sworker

target-3: root   70570  0.0  0.8 170204 16732  ??  S     5:31PM   0:00.02 isi_migr_sworker

target-2: root   70739  1.0  0.8 170204 16680  ??  D     5:31PM   0:00.85 isi_migr_sworker

target-2: root   70740  0.0  0.8 170204 16680  ??  D     5:31PM   0:00.52 isi_migr_sworker

target-2: root   70741  0.0  0.8 170204 16672  ??  D     5:31PM   0:00.55 isi_migr_sworker

target-1: root   74119  1.0  0.8 170204 17548  ??  D     5:31PM   0:01.56 isi_migr_sworker

target-1: root   74120  1.0  0.8 170204 17060  ??  D     5:31PM   0:02.70 isi_migr_sworker

target-1: root   74116  0.0  0.8 168108 17428  ??  S     5:31PM   0:00.05 isi_migr_sworker

target-1: root   74121  0.0  0.8 170204 17064  ??  D     5:31PM   0:00.27 isi_migr_sworker

 

My SyncIQ policy defaults to 3 workers per node (as seen earlier), and indeed it starts three pworker threads on each source node since there are no source cluster restrictions.  We can also see that, since I have an equal number of target cluster nodes, we get three sworker threads on each target node because there are no target node restrictions.  Notice the extra sworker thread on the target cluster; that extra sworker coordinates the running SyncIQ job across the target nodes.

 

So far so good: no restrictions means we default to 3 workers per node per cluster, and we can verify this by looking at the 'ps' output.  If we increased the number of workers per node in the SyncIQ policy we would see more running pworker and sworker pairs.  Network replication traffic will flow across all "ext" (external) interfaces in each IP address pool (see the command output listed earlier).

 

SyncIQ example - restrict source and target

 

Let's now look at an example of restricting source and target replication and see what impact this has on the pworkers and sworkers.  I will use the same source/target clusters, but I will first create a new subnet on each cluster for SyncIQ traffic and include only a single node's "ext" interface in each IP address pool.  Note the single IP address range and the single ext (external) interface in the output below.  The target cluster will need this SyncIQ address pool configured for SmartConnect, which simply means associating a DNS name with the address pool.

 

Source cluster restricted IP address pool

SyncIQ:SyncIQ

          In Subnet: SyncIQ

         Allocation: Static

             Ranges: 1

                     192.168.1.5-192.168.1.5

    Pool Membership: 1

                     1:ext-1 (up)

   Aggregation Mode: Link Aggregation Control Protocol (LACP)

        Access Zone: System (1)

<snip>

 

Target cluster restricted IP address pool

SyncIQ:SyncIQ

          In Subnet: SyncIQ

         Allocation: Static

             Ranges: 1

                     192.168.1.6-192.168.1.6

    Pool Membership: 1

                     1:ext-1 (up)

   Aggregation Mode: Link Aggregation Control Protocol (LACP)

        Access Zone: System (1)

<snip>

 

I then modify my SyncIQ policy to restrict the source nodes to the "SyncIQ:SyncIQ" subnet and address pool, and I also restrict the job to only run on target nodes in the target cluster SmartConnect zone (the DNS name specified for the SyncIQ IP address pool).
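From the CLI, that modification looks roughly like the sketch below.  The option names are my best recollection and simply mirror the "Source Subnet", "Source Pool", and "Restrict Target Network" fields in the policy output that follows, so confirm them with 'isi sync policies modify --help' on your OneFS version before using them.

# Sketch: restrict the policy to the SyncIQ subnet/pool on the source side
# and to the target cluster SmartConnect zone (flag names assumed; verify first)
isi sync policies modify mySyncIQpolicy --source-subnet=SyncIQ --source-pool=SyncIQ --restrict-target-network=on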

 

Restricted SyncIQ policy

                       Name: mySyncIQpolicy

                       Path: /ifs/data

                     Action: sync

                    Enabled: Yes

                     Target: target

                Description:

            Check Integrity: Yes

Source Include Directories: /ifs/data/small files

Source Exclude Directories: -

              Source Subnet: SyncIQ

                Source Pool: SyncIQ

      Source Match Criteria:

                Target Path: /ifs/sourceisilon

    Target Snapshot Archive: No

    Target Snapshot Pattern: SIQ_%{SrcCluster}_%{PolicyName}_%Y-%m-%d_%H-%M

Target Snapshot Expiration: Never

      Target Snapshot Alias: SIQ_%{SrcCluster}_%{PolicyName}

Target Detect Modifications: Yes

    Source Snapshot Archive: No

    Source Snapshot Pattern:

Source Snapshot Expiration: Never

                   Schedule: Manually scheduled

                  Log Level: notice

          Log Removed Files: No

           Workers Per Node: 3

             Report Max Age: 1Y

           Report Max Count: 2000

            Force Interface: No

    Restrict Target Network: Yes

<snip>

 

Let's run the job and see what happens.  I am going to watch /var/log/isi_migrate.log on the nodes with the IP addresses specified in the SyncIQ IP address pools (192.168.1.5 and 192.168.1.6), since every node has an isi_migrate.log file but may not be participating actively in the replication job.

 

Source isi_migrate.log

2016-06-14T18:23:50Z <3.6> source-1(id1) isi_migrate[75670]: coord[mySyncIQpolicy:1465928629]: source nodes: Restriction 'SyncIQ:SyncIQ' has 1 nodes: 1)192.168.1.5

<snip>

2016-06-14T18:23:50Z <3.6> source-1(id1) isi_migrate[75671]: primary[mySyncIQpolicy:1465928629]: Starting worker 0 on 'mySyncIQpolicy' with 192.168.1.6

2016-06-14T18:23:50Z <3.6> source-1(id1) isi_migrate[75672]: primary[mySyncIQpolicy:1465928629]: Starting worker 1 on 'mySyncIQpolicy' with 192.168.1.6

2016-06-14T18:23:50Z <3.6> source-1(id1) isi_migrate[75673]: primary[mySyncIQpolicy:1465928629]: Starting worker 2 on 'mySyncIQpolicy' with 192.168.1.6

 

Target isi_migrate.log

2016-06-14T18:23:50Z <3.6> target-1(id1) isi_migrate[75805]: secondary[mySyncIQpolicy:1465928629]: 1 nodes available: 1)192.168.1.6

2016-06-14T18:23:50Z <3.6> target-1(id1) isi_migrate[75806]: secondary[mySyncIQpolicy:1465928629]: Starting worker 0 on 'mySyncIQpolicy' from 192.168.1.5

2016-06-14T18:23:50Z <3.6> target-1(id1) isi_migrate[75807]: secondary[mySyncIQpolicy:1465928629]: Starting worker 1 on 'mySyncIQpolicy' from 192.168.1.5

2016-06-14T18:23:50Z <3.6> target-1(id1) isi_migrate[75808]: secondary[mySyncIQpolicy:1465928629]: Starting worker 2 on 'mySyncIQpolicy' from 192.168.1.5

 

We can see that the restriction is not just a networking interface restriction; the SyncIQ job enforces it by only running workers on the nodes in the source IP address pool and the target SmartConnect zone (the target IP address pool with a DNS name associated).  Let's look at the SyncIQ pworkers and sworkers to confirm.

 

Isilon source 'ps' output

# isi_for_array "ps -aux | grep pwork"

source-1: root   75671  0.0  1.4 170204 14504  ??  S     6:23PM   0:03.64 isi_migr_pworker

source-1: root   75672  0.0  1.4 170204 14208  ??  S     6:23PM   0:03.29 isi_migr_pworker

source-1: root   75673  0.0  1.4 170204 14208  ??  D     6:23PM   0:03.13 isi_migr_pworker

 

Isilon target 'ps' output

# isi_for_array "ps -aux | grep swork"

target-1: root   75806  1.0  1.0 173276 19728  ??  D     6:23PM   0:16.69 isi_migr_sworker

target-1: root   75807  1.0  0.9 173276 19428  ??  D     6:23PM   0:16.07 isi_migr_sworker

target-1: root   75808  1.0  0.8 170204 17568  ??  D     6:23PM   0:14.81 isi_migr_sworker

target-1: root   75805  0.0  0.8 168108 17528  ??  S     6:23PM   0:00.17 isi_migr_sworker

 

That confirms it: the restriction not only moves traffic over the single 'ext-1' interface defined in each IP address pool but also only spawns pworkers and sworkers on the nodes with those selected interfaces.  In our restricted example, only one source node interface (source-1/ext-1) is a member of the SyncIQ subnet/pool and only one interface (target-1/ext-1) is associated with the target subnet/pool.  When our SyncIQ policy runs, it only runs SyncIQ processing on the first source cluster node (source-1) and the first target cluster node (target-1), which means we only get three worker threads for this job (plus the extra coordinator sworker thread on the target).

 

I don't have to restrict both the source and target; I could configure the SyncIQ policy to restrict only the source and allow all target nodes to participate (or vice versa).  In that case, the total number of workers will negotiate down to the lowest number of nodes participating on either the source or the target.

 

Conclusion

 

Restricting source and/or target nodes in a SyncIQ policy limits the SyncIQ threads (pworkers and sworkers) to the nodes participating in the restricted policy's IP address pools.  It may seem like restricting the policy to certain physical network interfaces only forces the network traffic across those interfaces, but it also restricts the SyncIQ worker threads to the nodes participating in those IP address pools.  This results in fewer overall threads for the SyncIQ job because workers will not run on every node in a source or target cluster with a restriction.

 

Use the info in this blog to make your SyncIQ design decisions regarding restricting participant nodes.  A SyncIQ job with access to every node in the cluster will use more workers, consume more cluster CPU cycles, and finish faster.  Restricting a SyncIQ policy to certain cluster interfaces will only run workers on the subset of nodes participating in that subnet IP address pool, will use fewer cluster CPU cycles, and will take more time to complete.

CloudPools is here!  With the release of OneFS 8.0 we now have the opportunity to help Isilon customers tier infrequently accessed data to public or private cloud providers using the Isilon SmartPools policy engine they know and love.  "Infrequently accessed" is the key phrase here, since we are targeting cold or frozen data sets which are not typically accessed but cannot be deleted for business reasons.  The goal of this blog is to help size a CloudPools configuration with an existing Isilon cluster and determine how much data can be targeted for CloudPools using the tools available.

 

My current focus involves large service providers and outsourcers, which could be potential candidates for your CloudPools target data.  CloudPools can push your frozen data to a service provider using EMC gear or push this data to Amazon AWS and Microsoft Azure using a proprietary solution.  Engage with an EMC Isilon presales engineer or an EMC partner for help with this process; they will be glad to help gather this info and make design decisions.  You may think an Isilon cluster has a large amount of frozen data while talking to a business unit or application owner, but you will need to validate those numbers.  A solid design based on data collected from the Isilon cluster will help you make better decisions and, by doing the analysis prior to pushing data to a cloud provider, avoid sticker shock and performance problems in the future.

 

Why spend all this effort validating frozen data for CloudPools?  Cost and performance.  Data that is tiered out via CloudPools leaves your LAN environment and moves over the WAN.  LAN speeds and NAS protocols have typical performance levels measured in milliseconds, and good performance falls in a certain range (say 5-10ms, possibly 20ms).  Moving that data over the WAN using object protocols may not deliver the same response times.  So if an application frequently accesses data that has been stubbed via CloudPools, performance will suffer.  Additionally, service providers charge a rate for capacity consumed and typically allot only a small amount of data recalls or egress bandwidth per month.  If an application is consistently pulling data back from the service provider, the Isilon cluster will not only see decreased performance but your business will most likely incur additional cost from the data recalls, since you may overrun the egress allowance in your service provider agreement.  Size this solution correctly and you will have less risk of performance problems and hidden egress charges.

 

Why use CloudPools?  Why wouldn't we just add a tier of Isilon archive nodes (NL410s or HD400s) and call it a day?  Sizing the data sets will ultimately demonstrate the capacity and price points.  Paying a cloud provider 5 cents per GB per month (not a bad price) over the course of three years for 100TB would total $180,000, which could seem attractive to a business looking to utilize the public cloud.  Ignoring any discounting, those prices don't get better with scale: 500TB would be $900K over three years and could be much more expensive than purchasing a high density node tier or even an EMC ECS appliance for an internal object store solution.  Again, identifying the frozen data is key to making the decision to buy hardware or move data to the cloud.
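For clarity, here is the rough math behind those numbers, treating 1TB as 1,000GB and ignoring discounts, egress, and request fees:

100TB = 100,000GB x $0.05/GB/month = $5,000/month, or $180,000 over 36 months
500TB = 500,000GB x $0.05/GB/month = $25,000/month, or $900,000 over 36 months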

 

Will your application work with CloudPools?

 

Sizing for user home directories and department shares can be straightforward since there are no application concerns with code levels or file stubbing.  Simply work on a plan to move to OneFS 8.0, keeping in mind the EMC engineering target code list (https://support.emc.com/docu46145_Current-Isilon-Software-Releases.pdf?language=en_US).  Individual user interactions over SMB and NFS will work fine with file stubs if data must be recalled from the service provider.

 

Applications require more diligence since some solutions need certification from both the independent software vendor (ISV) and EMC.  If you can identify the application workflow, your EMC presales engineer or partner can help you determine whether that application has been certified with OneFS 8.0 and whether it has been certified to work with the file stubs used by CloudPools.  Again, your EMC or partner presales engineer can help you plan for interactions between the Isilon and the application.  Ideally the application stores data in some type of hierarchy, perhaps even organized by date, which makes infrequently accessed data easier to target for moving offsite.  The goal is to reduce risk by confirming certification and then moving ahead to target frozen data sets.

 

What type of cloud storage is appropriate?

 

OneFS 8.0 supports Amazon AWS, Microsoft Azure, EMC ECS, and EMC Isilon as CloudPools targets as of this writing (with Virtustream support coming soon).  EMC ECS and Isilon are not a concern since they are fully supported no matter what hardware is used.  The non-EMC providers were certified with CloudPools, so it's important to stick with a solution that was certified and not introduce the risk of pushing data to a class of service that was not tested by EMC.  This information will most likely become outdated soon!

 

Amazon AWS has several public cloud object storage services available with detailed SLAs and pricing calculators.  The goal of this blog is not to discuss each in detail but to point out that AWS S3 Standard is the only supported option.  As of this writing, AWS S3 Standard - Infrequent Access (IA) is not certified, nor is Amazon Glacier.  Glacier is not a real-time storage access platform; documentation from Amazon states a 3-5 hour response time for data access requests.  That will result in timeouts when users try to access stubbed data, so it is not the right solution at this point.  Amazon also has a tiering mechanism within AWS that moves data between platforms, which is not certified since the data will eventually reside on Glacier.  Stick with S3 Standard!

 

Microsoft Azure is easy, CloudPools supports Azure Blob storage, specifically block blobs.

 

How do we decide what to archive and how much data can we archive?

 

Let's size this solution!  Isilon customers with multiple node types will be familiar with SmartPools tiering and already have existing policies for moving data between tiers.  Those unfamiliar with SmartPools will need a SmartPools license and an overview of how Isilon tiers data based on defined metadata attributes.  Finding the right policy and estimating the amount of data that will be moved by that policy is the goal.  The "file matching criteria" will be our first decision to discuss with the business and application owners, and InsightIQ and/or Mitrend will be our tools for estimating the amount of data impacted by that policy.

 

Keep in mind the impact that performance and cost will have on our choices.  Using the "modify time" attribute is probably not the best choice for CloudPools.  Why?  Users and applications can read data daily without modifying it, which means you could be recalling data from the cloud provider much more often than desired, impacting application performance and incurring egress/recall fees from your provider.  Look at "access time" as your method of choice for tiering out to the cloud, or work with the application team to design a method that reduces risk.

 

SmartPools Filtering - what to archive?

 

The SmartPools file matching criteria will determine what data gets moved to the cloud.  Below are the matching methods used by SmartPools and some ideas about how each could be used with CloudPools.  This is by no means the definitive answer regarding the use of each criterion but a guide for working with your customers.

 

  • Filename - Does your application rename files when they age out, so that a naming pattern can be used to tier data?  Or maybe you script a rename operation for data that is no longer required?

 

  • Path - Can you move data to an /ifs/frozen path when it's no longer required and automatically move data in that path off the Isilon to the cloud?  Create a frozen directory as an archive bucket for old data that can't be deleted?

 

  • File Type - see filename

 

  • File Attribute - Do you use custom user defined attributes for old data?

 

  • Modified - Popular with existing customers that have not turned on access time tracking.  Probably not your first choice for CloudPools, since data can be accessed frequently without being modified, which means you run the risk of not understanding how much data you will recall from a cloud provider.  Performance can suffer and you may incur egress/recall costs since you will not understand the data access patterns.

 

  • Accessed - Probably the best attribute for CloudPools in the absence of any other information.  Access time tracking needs to be enabled (it is disabled by default); see the screenshot below and the CLI sketch that follows it for turning on access time tracking.

 

  • Metadata Changed - Access time is probably better, could be interesting for some applications

 

  • Created - Access time is probably better, a file created 2 years ago may be accessed daily

 

  • Size - Could be interesting in a combination rule, an application could consolidate smaller files into a very large file as part of an archive process in preparation for long term archival needs

 

Screenshot of the matching criteria

Screen Shot 2016-04-12 at 8.57.02 AM.png

Screenshot of access time tracking

Screen Shot 2016-04-12 at 8.57.56 AM.png
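Access time tracking is enabled cluster-wide, either in the WebUI (the file system settings page shown in the screenshot above) or from the CLI.  The sysctl names below are the ones I've seen referenced for OneFS; treat them as an assumption and verify against the documentation for your release before changing anything.  The grace period controls how often atime is actually updated (value in milliseconds), which keeps the write overhead of tracking low.

# Enable access time tracking cluster-wide (assumed sysctl names; verify first)
isi_sysctl_cluster efs.bam.atime_enabled=1
# Update atime at most once per day (86,400,000 ms grace period)
isi_sysctl_cluster efs.bam.atime_grace_period=86400000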

InsightIQ Reporting - how much data can we archive?

 

InsightIQ gives us a set of tools to estimate the amount of data we will push to a cloud provider using SmartPools file matching.  Again, access time tracking is off by default, which means we will not get any access time information or data sizes from InsightIQ by default.  The default reports show file counts by physical and logical size, which won't help other than to show the histogram of all data on the cluster.

 

File counts by physical size

Screen Shot 2016-04-12 at 8.58.46 AM.png

File counts by logical size

Screen Shot 2016-04-12 at 8.58.52 AM.png

 

By default, InsightIQ breaks the dataset down by modify time, which lets us estimate file counts and sizes by last modified date.  Modify time is probably not your best rule for CloudPools for the reasons discussed above, but it is the best you will get with the default OneFS settings.  The screenshot below shows the histogram from my simulator; I could break this down further by adjusting the "breakout by" options for logical or physical size.

 

File counts by last modified

Screen Shot 2016-04-12 at 8.58.58 AM.png

 

Mitrend - how much data can we archive using access time?

 

So how do you size a CloudPools configuration using access time when the default OneFS setting leaves access time tracking disabled?  You could turn on access time tracking (discussed later) or you could use the Mitrend scanner.  The scanner is a great visualization tool for file data and requires no changes on the Isilon cluster.  Just download the tool to a Windows VM and scan the shares on the Isilon for a detailed analysis, including access time estimates.  Better yet, scan the entire cluster by mapping to the "ifs" share or the hidden Windows admin share (ifs$), which makes things very easy when running the scanner as a domain administrator.

 

Download the executable from the Mitrend site (Mitrend.com), rename the executable per the instructions, and run the "file analysis" scanner.  Then use the wizard to add the UNC path to the "ifs" admin share (if possible) and run the scanner until it completes.  Use the wizard to automatically upload the results by adding your contact information.  The scanner is very fast and scans millions of files in a multithreaded fashion.  You will receive an email when the results are ready, containing an Excel configuration spreadsheet and a PowerPoint presentation with the results.

 

Mitrend download

Screen Shot 2016-04-12 at 9.01.17 AM.png

Mitrend scanner - file analysis

Screen Shot 2016-04-12 at 9.02.48 AM.png

Mitrend scanner - add a UNC path

Screen Shot 2016-04-12 at 9.03.08 AM.png

Mitrend scanner - example of the cluster IFS share

Screen Shot 2016-04-12 at 9.04.28 AM.png

Mitrend scanner - completed

Screen Shot 2016-04-12 at 9.05.12 AM.png

 

Mitrend Results

 

The PowerPoint you receive from Mitrend will include some statistics already available in InsightIQ reports, but it will also give you access time results without the need to turn on access time tracking on the cluster.  This saves time if you are reluctant to turn on access time tracking or have to wait for change control or a change window.  The screenshots below are from my virtual Isilon installation, so the numbers are small, but they give a good general idea of what to look for.  We can see there is a decent amount of data that has not been accessed in over two years (~1.5GB), which is a perfect candidate for CloudPools.  There is also some data that hasn't been accessed in one to two years that would also tier off the cluster nicely.  The pie chart shows that ~28% of the total data falls into this greater-than-one-year access pattern.

 

This is the best starting point for CloudPools sizing.  Pull the Mitrend report and present it to IT management and the application owners, showing the access patterns and a realistic capacity estimate for tiering off the Isilon to a cloud provider.  Modify time statistics and estimates are also included in the Mitrend output, so explain to the customer why modify time alone may not be a great idea unless it is combined with another rule, say access time plus modify time or any other combination the application team thinks makes sense.

 

Mitrend results histogram of access time

Screen Shot 2016-04-12 at 9.36.27 AM.png

Mitrend results pie chart of access time

Screen Shot 2016-04-12 at 9.36.37 AM.png

 

Access time tracking

 

Why is access time tracking disabled by default?  Is it ok to enable?  See the KB link below for the official view on this feature and the procedure to enable it.  Basically, you can negatively impact cluster performance if you configure access time tracking with too much precision.  In other words, setting it to update every hour is going to overwhelm cluster resources, while setting the value to one day is a better practice.  In our CloudPools example we wouldn't want anything as granular as an hour anyway, since we simply want to find old data that hasn't been accessed in a long time.

 

OneFS: How to enable access time tracking (atime)

https://support.emc.com/kb/303681

 

If and when you decide to tier data via CloudPools to a cloud provider, you will need to enable access time tracking to tier based on access time.  Open a support case if you are not comfortable with your customer making these changes and customer support can help.  The process is not complicated and the changes are shown in the screenshots below.  Again, set the precision to one day and nothing more granular.
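
For reference, the KB procedure boils down to two cluster-wide sysctl settings.  Treat the exact names and values below as my recollection and an assumption to verify against the KB above before making any change; the precision value is assumed to be in milliseconds (86400000 = one day).

# enable access time tracking cluster-wide (verify against the KB before running)
isi_sysctl_cluster efs.bam.atime_enabled=1

# set the atime update precision to one day (value assumed to be milliseconds)
isi_sysctl_cluster efs.bam.atime_grace_period=86400000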

 

Access time tracking settings

Screen Shot 2016-04-12 at 8.59.39 AM.png

Once access time tracking is enabled you will need to run an FSA job, which populates InsightIQ with the additional information.  Once this has been done, you can filter the InsightIQ reports by "accessed time" and get much the same functionality as a Mitrend report.  The screenshot below shows ~11,000 files between 8KB and 100MB that haven't been accessed in over two years, by both physical and logical size.  Note, you cannot filter by access time in InsightIQ until you turn on access time tracking!
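
The FSA job can be started on demand from the CLI rather than waiting for its schedule.  The command below reflects OneFS 8.0 job engine syntax as I recall it; confirm it against your cluster's documentation before use.

# start an FSAnalyze run so InsightIQ picks up the new access time data
isi job jobs start FSAnalyze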

 

InsightIQ - file count by physical size filtered by accessed time

Screen Shot 2016-04-12 at 9.28.12 AM.png

InsightIQ - file count by logical size filtered by accessed time

Screen Shot 2016-04-12 at 9.28.27 AM.png

A custom filter can also be used to slice the data by a specific attribute.  In this case we are filtering for all data last accessed one to two years ago, which again shows file counts graphed against physical and logical size.  Download these graphs to a CSV for access to the raw data, which can be used for further analysis.
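
As a tiny example of that further analysis, a one-liner can total a column from the exported CSV.  The file name and column layout here are purely hypothetical; check the header row of your actual export and adjust accordingly.

# sum the second column (assumed here to be the file count) of an exported report
awk -F, 'NR > 1 { total += $2 } END { print total }' insightiq_export.csv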

 

InsightIQ - custom filter

Screen Shot 2016-04-12 at 9.33.57 AM.png

InsightIQ - custom filter access time and physical size

Screen Shot 2016-04-12 at 9.34.08 AM.png

InsightIQ - custom filter access time and logical size

Screen Shot 2016-04-12 at 9.34.14 AM.png

Summary

 

Now that we've sized the solution, it should be clear that an Isilon customer should take the time to really understand how data is going to move off the cluster, since a poorly chosen policy can hurt application performance and rack up unexpected charges from a provider.  Engage EMC or partner presales teams and tools to help remove risk and add a degree of confidence when implementing CloudPools.  More information is better, and having a good grasp on how much data can be tiered off an Isilon cluster to the cloud will provide value to all parties involved.  Good luck!

 

Appendix - Operational concerns and service provider fees Q&A

 

How exactly should I tier data to a cloud provider?

Configure your cloud provider account and CloudPools tier following the OneFS 8.0 documentation.  Then add a SmartPools file policy that filters files based on the rule you agreed on with the application owners (access time, for example) and targets the CloudPools tier.  The next time the SmartPools job runs it will evaluate the cluster data against your rules and move matching data to the cloud provider.
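
If you don't want to wait for the scheduled run, the SmartPools job can be kicked off manually.  The command below is the OneFS 8.0 job engine syntax as I recall it; verify it against your cluster before relying on it.

# run the SmartPools job on demand so the new file policy is evaluated immediately
isi job jobs start SmartPools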

 

Note that you will not create a SmartPools rule to move data back from the cloud provider to the Isilon cluster when the inverse of that filter logic becomes true.  Simply put, unlike tiering between node pools, CloudPools does not recall data based on SmartPools file policy rules.  In OneFS 8.0, data must be recalled manually using the CLI; it is never recalled by a file policy rule.  This is why it's even more important to carefully consider your file filtering rules for CloudPools: the data moves to the cloud provider permanently (until manually recalled) and incurs recall/egress fees every time the stub is accessed by a user or application.
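
For completeness, a manual recall in OneFS 8.0 looks something along the lines of the command below.  Treat the exact syntax and options as an assumption to confirm in the CloudPools administration guide; the file path is simply the stub from the example later in this post.

# manually recall a stubbed file's payload back from the cloud tier
isi cloud recall /ifs/demo/archive/ecs.node.one.ova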

 

Will Isilon job engine activity recall files and incur fees?

No, not under normal conditions.  The default SyncIQ jobs only protect the stubs and will not pull the payload data from the cloud provider, and the same applies to default NDMP activity.  You can elect to do a "deep copy" for SyncIQ and NDMP, but that will of course pull all the data back from the cloud provider and incur recall fees.

 

Will I save space after the CloudPools stubbing process?

Yes.  The files are replaced with much smaller stubs, and the reclaimed space is visible to the Isilon storage admin when running 'isi status'.  See the examples below.

 

Starting capacity:

demo-1# isi status -q

Cluster Name: demo

Cluster Health:     [  OK ]

Cluster Storage:  HDD                 SSD Storage

Size:             42.7G (72.5G Raw)   0 (0 Raw)

VHS Size:         29.8G

Used:             24.7G (58%)         0 (n/a)

Avail:            18.0G (42%)         0 (n/a)

 

 

                   Health  Throughput (bps)  HDD Storage      SSD Storage

ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size

---+---------------+-----+-----+-----+-----+-----------------+-----------------

  1|192.168.0.51   | OK  | 131k| 2.9M| 3.0M| 6.1G/18.1G( 34%)|(No Storage SSDs)

  2|192.168.0.52   | OK  |    0|    0|    0| 6.5G/18.1G( 36%)|(No Storage SSDs)

  3|192.168.0.53   | OK  |    0| 133k| 133k| 6.5G/18.1G( 36%)|(No Storage SSDs)

  4|192.168.0.54   | OK  |    0|92.9k|92.9k| 5.7G/18.1G( 31%)|(No Storage SSDs)

---+---------------+-----+-----+-----+-----+-----------------+-----------------

Cluster Totals:          | 131k| 3.1M| 3.2M|24.7G/42.7G( 58%)|(No Storage SSDs)

 

Write a 1.18GB file:

demo-1# isi status -q

Cluster Name: demo

Cluster Health:     [  OK ]

Cluster Storage:  HDD                 SSD Storage

Size:             42.7G (72.5G Raw)   0 (0 Raw)

VHS Size:         29.8G

Used:             26.4G (62%)         0 (n/a)

Avail:            16.3G (38%)         0 (n/a)

 

 

                   Health  Throughput (bps)  HDD Storage      SSD Storage

ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size

---+---------------+-----+-----+-----+-----+-----------------+-----------------

  1|192.168.0.51   | OK  |65.6k| 1.9M| 2.0M| 6.5G/18.1G( 36%)|(No Storage SSDs)

  2|192.168.0.52   | OK  |    0|    0|    0| 6.9G/18.1G( 38%)|(No Storage SSDs)

  3|192.168.0.53   | OK  |    0|37.2k|37.2k| 6.9G/18.1G( 38%)|(No Storage SSDs)

  4|192.168.0.54   | OK  |    0|    0|    0| 6.1G/18.1G( 34%)|(No Storage SSDs)

---+---------------+-----+-----+-----+-----+-----------------+-----------------

Cluster Totals:          |65.6k| 1.9M| 2.0M|26.4G/42.7G( 62%)|(No Storage SSDs)

 

Look at the newly written file (not stubbed yet):

demo-1# ls -lah /ifs/demo/archive/ecs\ node\ one.ova

-rwxrwxrwx +  1 root  wheel   1.2G Jan 26 20:18 /ifs/demo/archive/ecs node one.ova

 

demo-1# du -h /ifs/demo/archive/ecs\ node\ one.ova

1.6G /ifs/demo/archive/ecs node one.ova

demo-1#

 

Run SmartPools and stub to cloud:

The file now consumes only 26KB on the cluster per 'du' but still shows its logical size of 1.2GB per 'ls':

 

demo-1# du -hs /ifs/demo/archive/ecs.node.one.ova

26K /ifs/demo/archive/ecs.node.one.ova

 

demo-1# ls -lh /ifs/demo/archive/ecs.node.one.ova

-rwxrwxrwx +  1 root  wheel   1.2G Jan 26 20:18 /ifs/demo/archive/ecs.node.one.ova

demo-1#

 

Cluster utilization now shows less "used" and more "available" capacity, with the file's ~1.6GB physical footprint (1.2GB logical plus protection overhead) freed up:

demo-1# isi status -q

Cluster Name: demo

Cluster Health:     [  OK ]

Cluster Storage:  HDD                 SSD Storage

Size:             42.7G (72.5G Raw)   0 (0 Raw)

VHS Size:         29.8G

Used:             24.8G (58%)         0 (n/a)

Avail:            17.9G (42%)         0 (n/a)

 

 

                   Health  Throughput (bps)  HDD Storage      SSD Storage

ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size

---+---------------+-----+-----+-----+-----+-----------------+-----------------

  1|192.168.0.51   | OK  | 148k| 290k| 437k| 6.1G/18.1G( 34%)|(No Storage SSDs)

  2|192.168.0.52   | OK  |    0|92.9k|92.9k| 6.5G/18.1G( 36%)|(No Storage SSDs)

  3|192.168.0.53   | OK  |    0|37.2k|37.2k| 6.5G/18.1G( 36%)|(No Storage SSDs)

  4|192.168.0.54   | OK  |    0|    0|    0| 5.7G/18.1G( 31%)|(No Storage SSDs)

---+---------------+-----+-----+-----+-----+-----------------+-----------------