

Back In Time With SnapRevert

Posted by trimbn Jul 27, 2016

There have been a couple of recent inquiries from the field about the SnapRevert job.


For example:


How does the snapshot revert job work and what does it do to the file system?


For context, SnapRevert is one of three main methods for restoring data from a OneFS snapshot. The options are:


  • Copying specific files and directories directly from the snapshot
  • Cloning a file from the snapshot
  • Reverting the entire snapshot via the SnapRevert job

 

Copying a file from a snapshot duplicates that file, which roughly doubles the amount of storage space it consumes. Even if the original file is deleted from HEAD, the copy of that file will remain in the snapshot. Cloning a file from a snapshot also duplicates that file. Unlike a copy, however, a clone does not consume any additional space on the cluster - unless either the original file or clone is modified.


However, the most efficient of these approaches is the SnapRevert job, which automates the restoration of an entire snapshot to its top level directory. This allows for quickly reverting to a previous, known-good recovery point - for example, in the event of a virus outbreak. The SnapRevert job can be run from the Job Engine WebUI, and requires adding the desired snapshot ID.


There are two main components to SnapRevert:


  • The file system domain that the objects are put into.
  • The job that reverts everything back to what's in a snapshot.

 

So what exactly is a SnapRevert domain? At a high level, a domain defines a set of behaviors for a collection of files under a specified directory tree. The SnapRevert domain is described as a ‘restricted writer’ domain, in OneFS parlance. Essentially, this is a piece of extra filesystem metadata and associated locking that prevents a domain’s files from being written to while a last known good snapshot is being restored.


Because the SnapRevert domain is essentially just a metadata attribute placed onto a file/directory, a best practice is to create the domain before there is data. This avoids having to wait for DomainMark (the aptly named job that marks a domain’s files) to walk the entire tree, setting that attribute on every file and directory within it.


The SnapRevert job itself actually uses a local SyncIQ policy to copy data out of the snapshot, discarding any changes to the original directory.  When the SnapRevert job completes, the original data is left in the directory tree.  In other words, after the job completes, the file system (HEAD) is exactly as it was at the point in time that the snapshot was taken.  The LINs for the files/directories don't change, because what's there is not a copy.


SnapRevert can be manually run from the OneFS WebUI by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button.


snaprevert_1.png


The job’s impact policy and relative priority can also be adjusted, if desired:

 

snaprevert_2.png

 

Before a snapshot is reverted, SnapshotIQ creates a point-in-time copy of the data that is being replaced. This enables the snapshot revert to be undone later, if necessary.


Additionally, individual files, rather than entire snapshots, can also be restored in place using the isi_file_revert command line utility. This can help drastically simplify virtual machine management and recovery.


Before creating snapshots, it’s worth considering that reverting a snapshot requires that a SnapRevert domain exist for the directory that is being restored. As such, it is recommended that you create SnapRevert domains for those directories while the directories are empty. Creating a domain for an empty (or sparsely populated) directory takes considerably less time.
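As a minimal sketch of that practice, using the DomainMark job and domain listing commands shown later in this article (the path is purely illustrative):

# mkdir -p /ifs/data/marketing

# isi job jobs start domainmark --root /ifs/data/marketing --dm-type SnapRevert

# isi_classic domain list -l

Because the directory is still empty, the DomainMark job has essentially nothing to walk and completes almost immediately; data can then be loaded into the directory with the domain attribute already in place.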


How do domains work?


Files may belong to multiple domains. Each file stores, in its inode’s extended attributes table, a set of domain IDs indicating which domains it belongs to. Files inherit this set of domain IDs from their parent directories when they are created or moved. The domain IDs refer to the domain settings themselves, which are stored in a separate system B-tree. These B-tree entries describe the type of the domain (flags) and various other attributes.

As mentioned, a Restricted-Write domain prevents writes to any files except by threads that are granted permission to do so. A SnapRevert domain that does not currently enforce Restricted-Write shows up as "(Writable)" in the CLI domain listing.


Occasionally, a domain will be marked as “(Incomplete)”. This means that the domain will not enforce its specified behavior. Domains created by the Job Engine are incomplete until all of the files that are part of the domain are marked as members of that domain. Since each file contains a list of domains of which it is a member, that list must be kept up to date for each file. The domain is incomplete until each file's domain list is correct.


In addition to SnapRevert, OneFS also currently uses domains for SyncIQ replication and SnapLock immutable archiving.


Creating a SnapRevert domain

 

A SnapRevert domain needs to be created on a directory before it can be reverted to a particular point in time snapshot. As mentioned before, the recommendation is to create SnapRevert domains for a directory while the directory is empty.

 

The root path of the SnapRevert domain must be the same as the root path of the snapshot. For example, a domain with a root path of /ifs/data/marketing cannot be used to revert a snapshot with a root path of /ifs/data/marketing/archive.

 

For example, for the snapshot DailyBackup_07-27-2016_12:00, which is rooted at /ifs/data/marketing:

 

1. First, set the SnapRevert domain by running the DomainMark job (which marks all the files):

 

# isi job jobs start domainmark --root /ifs/data/marketing --dm-type SnapRevert

 

2. Verify that the domain has been created:

 

# isi_classic domain list -l

 

Reverting a snapshot

 

In order to restore a directory back to the state it was in at the point in time when a snapshot was taken, you need to:

 

  • Create a SnapRevert domain for the directory.
  • Create a snapshot of the directory.

 

To accomplish this:

 

1. First, identify the ID of the snapshot you want to revert by running the isi snapshot snapshots view command and picking your PIT (point in time).

 

For example:

 

# isi snapshot snapshots view DailyBackup_07-27-2016_12:00

ID: 38

Name: DailyBackup_07-27-2016_12:00

Path: /ifs/data/marketing

Has Locks: No

Schedule: daily

Alias: -

Created: 2016-07-27T12:00:05

Expires: 2016-08-26T12:00:00

Size: 0b

Shadow Bytes: 0b

% Reserve: 0.00%

% Filesystem: 0.00%

State: active

 

2. Revert to a snapshot by running the isi job jobs start command. The following command reverts to snapshot ID 38 named DailyBackup_07-27-2016_12:00:

 

# isi job jobs start snaprevert --snapid 38

 

This can also be done from the WebUI, by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button.

 

snaprevert_3.png
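If you revert regularly, the snapshot ID can also be pulled straight from the view output and fed to the SnapRevert job in a small script. A rough sketch, using only the commands shown above (the snapshot name is an example; adjust the parsing if your output formatting differs):

SNAP="DailyBackup_07-27-2016_12:00"
# Parse the numeric ID from the 'ID:' field of the snapshot view output
SNAPID=`isi snapshot snapshots view "$SNAP" | awk '$1 == "ID:" {print $2}'`
echo "Reverting snapshot $SNAP (ID $SNAPID)"
isi job jobs start snaprevert --snapid $SNAPID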

 

Deleting a SnapRevert domain

 

If desired or required, SnapRevert domains can also be deleted using the job engine CLI:

 

1. Run the following command to delete the SnapRevert domain - in this example, for /ifs/data/marketing:

 

# isi job jobs start domainmark --root /ifs/data/marketing --dm-type SnapRevert --delete

 

2. Verify that the domain has been deleted:

 

# isi_classic domain list -l

 

 

More information about SnapshotIQ can also be found in the following blog article and white paper:

 

https://community.emc.com/community/products/isilon/blog/2016/03/02/onefs-snapshots-tech-overview

 

http://www.emc.com/collateral/white-papers/h15048-wp-next-gen-data-protection-snapshot-iq.pdf

Isilon presents a single unified permissioning model, in which multiprotocol clients can access the same files and a consistent security model is enforced. With Isilon and Hadoop leveraging existing Active Directory users, and by taking advantage of SFU-rfc2307 allocation of UIDs and GIDs, permissioning of directories and files can be done in a simple, unified manner. Since an authenticated user's Isilon access token holds both Windows SIDs and POSIX UIDs/GIDs, full multiprotocol file permissioning can be implemented with ease.

 

 

It is important to understand that file permissions on Isilon are implemented in the following way.

 

At a high level, Isilon permissions exist in one of two states:

  1. POSIX authoritative + Synthetic ACL

HDFS and NFS access is evaluated against the POSIX permissions.

SMB access is evaluated against an equivalent synthetic ACL generated on access.

(These two permissions are identical and provide the same access.)

 

1.png

 

 

 

  2. Real ACL

When a real ACL exists on a file, all access checks are made directly against the full ACL. This gives higher granularity and control over the permissions, and even in this state NFS and HDFS access checks are made against the ACL. The POSIX O:G:E (owner:group:everyone) permission is still visible, but it is representative only and does not define the permissions on the file. Since the ACL may contain any number of access control entries (ACEs), the entire list must be evaluated regardless of protocol. OneFS will still show a POSIX permission, but it is likely not reflective of the true underlying ACL.

 

 

2.png

 

 

Depending on your workflow and use case, either of the Isilon permission models can be leveraged, but it is suggested that you stick to one permission strategy for a given workflow. Isilon provides a lot of additional documentation on its permission model and strategy; search http://support.emc.com for additional documents.

 

The management of permissions also affects the behavior and state of an Isilon file permission, so it is suggested that you fully understand the behavior of permission tools before managing a permission, as you can change a file's state. For example, if you change a POSIX file-state permission with Windows Explorer, you can potentially change it to an ACL'd file state. This may not be what you intended.

 

 

Best Practice:
Always view permissions on the Isilon CLI with # ls -le and # ls -len; this shows the full Isilon OneFS permission state, either POSIX + synthetic ACL or a real ACL.
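A quick, hedged illustration of that check (the file path is hypothetical):

# ls -le /ifs/data/hadoop/results.csv

# ls -len /ifs/data/hadoop/results.csv

The first form shows the owner, group and ACL information by name; the -n variant shows numeric UIDs and GIDs, which is handy for confirming that the expected rfc2307 identities are actually on disk. If the listing reports a synthetic ACL, the file is still POSIX authoritative; if it reports a real ACL, access checks are evaluated against the full ACE list as described above.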

 

 

 

It is important to understand the permissioning strategy when moving from a simple HDFS authentication model to Kerberized users; previously, file permissions were based on local user UIDs, GIDs and usernames, and it is unlikely that true multiprotocol access was in the picture. As the Hadoop cluster is transitioned to Kerberos with an Active Directory using SFU-rfc2307, you are moving to Directory Service based identities. This gives increased flexibility with permissioning and allows true authenticated multiprotocol access.

 

 

Upon completion of the kerberization and Active Directory integration, a user running a job is now doing so in the context of an Active Directory account, and the permissions applied are based on the identities held by those objects.

 

 

As we transition to authenticated multiprotocol access, we now have to think about which permission model we wish to implement based on workflow and protocol access. Let's take a look at a few examples. There is no wrong answer, as both models work, but it will likely make more sense to select one permission model over the other based on the access patterns and the requirements.

 

 

1. Data will only be accessed via HDFS. The best permission model is to leverage the simple, standard POSIX model.

 

3.png

2. Data will now be accessed via HDFS and NFS. It still makes sense to use the standard POSIX model.

 

4.png
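As a hedged sketch of what that POSIX model can look like on disk (user, group and path names are hypothetical; the AD identities resolve to UIDs and GIDs via SFU-rfc2307):

# chown -R hdpuser1:hadoop-analysts /ifs/hadoop/project1

# chmod -R 770 /ifs/hadoop/project1

# ls -len /ifs/hadoop/project1

chown and chmod are sufficient here because the files remain POSIX authoritative, and the ls -len check confirms that the numeric UIDs and GIDs on disk match the identities in the users' access tokens.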

 

 

3. We have now introduced SMB, NFS and HDFS access to our data. In this example, NFS and HDFS are doing read-write operations, whereas we just need to provide some simple read-only capability to some Windows users via SMB. We can achieve this access pattern by still leveraging standard POSIX permissions, perhaps combined with restrictive SMB share permissions. (Since Windows access is simple and limited to read-only, a synthetic ACL will still likely meet our requirement.)

 

5.png

 

 

4. We now need to provide a more granular and complex set of permissions to our data: SMB, NFS and HDFS users all need read-write access, and we need to provide multiple AD groups with different permissions to our data. Since we now have to provide a much richer set of permissions, it makes sense to move our data to the ACL model, where we can apply a much more granular and complex set of ACEs to our files. It also likely means we will need to manage our permissions differently. (Remember, all access checks for the file now go through the ACL; the POSIX bits are just approximate and representative.) This model still meets our workflow requirements and provides consistent and valid data access for all our clients.

 

6.png

 

5. So which model do we use? Ultimately, the model selected depends on what your requirement is. There is no wrong answer here; the goal is to select the appropriate permissions for the workflow.

 

7.png

 

 

- Analyze the workflow

- Determine protocol requirements

- Understand the user access model

- Test and validate

- Implement

 

 

Bottom line, both permission models work and provide the same consistency of data access for your workflow. It is just about selecting and using the most appropriate method to meet your requirement and managing the data permissions correctly moving forward.

 

 

 

 

 

Isilon

russ_stevenson

Using Hadoop with Isilon - Isilon Info Hub

Hello world,

 

I’m an engineer at EMC.  This blog will demonstrate some use cases of the iiq_data_export utility for exporting FSA data for offline analysis and custom charge-back processing.

 

Back story

 

If you run OneFS 7.x code with InsightIQ, you can use the iiq_data_export utility to extract performance and file system reports.  While that’s great when it works, a very common gripe we hear is that the FS Analyze (FSA) job can take “forever” to complete.  The FSA job is a OneFS job that crawls the filesystem to gather metadata information about files.  InsightIQ consumes the results of the FSA job for filesystem reports.  Therefore, when the FSA job takes a long time to finish (days or even weeks), up-to-date file system reports aren’t readily available.  The FSA job in OneFS 7.x conducts a LIN-tree based scan on the filesystem to gather metadata about all files on the cluster.  If your cluster has a few billion files, the LIN-tree based scan can take a while to finish.

 

With the general availability of OneFS version 8.0.0 earlier this year, anyone is now able to deploy the latest version of OneFS in test environments and production clusters.  One of the first things you'll notice is the drastic improvement in FSA job completion time.  And there are good reasons for it.

 

For OneFS 8.0, Isilon Engineering dedicated substantial efforts to overhaul how the FSA job works.  The FSA job now performs a change-list based scan instead of a LIN-tree based scan.  What this means is that only the files that have changed since the last FSA job completion are scanned.  This cuts down the amount of time it takes to aggregate metadata, even for a cluster that hosts a few billion files.

 

Of course, in order for this feature to work, your cluster must be on OneFS 8.x code, and the first LIN-tree based scan has to complete. Subsequent FSA jobs will only touch the files changed since the last FSA completion.  I should also note that subsequent FSA jobs will be fully independent of previous FSA results.  Meaning, if FSA result “A” comes from a full LIN-tree based scan, and FSA result “B” comes from a subsequent change-list based scan, you can delete or unpin result “A” and still have full file system reporting capability by using result “B”.

 

Reports from the field tell us that FSA jobs now complete on a daily basis, so daily export of FSA data is possible. This opens doors for interesting uses of the FSA result data.  One potential use of reliable, fresh-daily FSA results is the ability to track application and project use of Isilon storage for the purpose of charge-back.

 

This blog answers a very common inquiry about how to programmatically export storage consumption for project folders under /ifs, instead of using the InsightIQ web interface and iterating through individual directories.

 

The location of the iiq_data_export utility

 

The iiq_data_export utility resides on the Linux server that runs the InsightIQ application.  Depending on your deployment method, your InsightIQ might be running on a physical Linux box or a virtual machine.  In either case, you need ssh access to the server.  The iiq_data_export utility can be executed by a non-root user.

 

The FSA export option of iiq_data_export

 

The iiq_data_export utility has two major functional areas.  It allows you to export performance stats or file system analytics data.  We’ll cover some uses of iiq_data_export for file system analytics in this blog.  Specifically, we will look at the “directory” data-module export option.

 

Using iiq_data_export, list FSA results for the cluster being monitored

 

The command to list available FSA results for the cluster is:

               iiq_data_export fsa list --reports <cluster_name>

 

Example, let’s suppose I have a cluster named “tme-sandbox”:

[10:41:39] rchang@VNODE0100:[~]:iiq_data_export fsa list --reports tme-sandbox

 

    Available Reports for: tme-sandbox Time Zone: EDT

================================================================================

    |ID |FSA Job Start                |FSA Job End               |Size        |

================================================================================

    |449 |Jun 06 2016, 10:00 PM        |Jun 06 2016, 10:31 PM     |4.806G      |

--------------------------------------------------------------------------------

    |455 |Jun 07 2016, 10:00 PM        |Jun 07 2016, 10:31 PM     |4.819G      |

--------------------------------------------------------------------------------

    |461 |Jun 08 2016, 10:00 PM        |Jun 08 2016, 10:30 PM     |4.817G      |

--------------------------------------------------------------------------------

    |467 |Jun 09 2016, 10:00 PM        |Jun 09 2016, 10:32 PM     |4.801G      |

--------------------------------------------------------------------------------

    |473 |Jun 10 2016, 10:00 PM        |Jun 10 2016, 10:30 PM     |92.933G     |

--------------------------------------------------------------------------------

    |479 |Jun 11 2016, 10:00 PM        |Jun 11 2016, 10:31 PM     |4.908G      |

--------------------------------------------------------------------------------

    |486 |Jun 12 2016, 10:00 PM        |Jun 12 2016, 10:31 PM     |4.816G      |

--------------------------------------------------------------------------------

    |492 |Jun 13 2016, 10:00 PM        |Jun 13 2016, 10:32 PM     |4.794G      |

--------------------------------------------------------------------------------

    |498 |Jun 14 2016, 10:00 PM        |Jun 14 2016, 10:30 PM     |4.816G      |

================================================================================

 

The ID column is the job number that is associated with that particular FS Analyze job engine job.  This is the ID number that you will provide to iiq_data_export to extract capacity information for your directory.

 

Exercise 1: Export first-level directories under /ifs

The command to export the first-level directories under /ifs from a specified cluster, for a specific FSA job is:

               iiq_data_export fsa export -c <cluster_name> --data-module directories -o <jobID>

 

Let’s suppose I want a listing of all first-level directories under /ifs from FSA job ID 473. I would use the “--data-module directories” option as follows:

 

[14:34:56] rchang@VNODE0100:[~/blog-work]:iiq_data_export fsa export -c tme-sandbox --data-module directories -o 473

    Successfully exported data to: directories_tme-sandbox_473_1467236098.csv


The resulting CSV file can be parsed with Excel or in another programmatic manner to derive the capacity consumption of the directories.  The output shows directory counts, file counts, and logical and physical capacity consumption.  Example:

CSV-parse.png

 

Exercise 2: Export specific directories under /ifs including 2nd and 3rd level directories

 

Now suppose you want the capacity information for a specific directory that is nested somewhere under the /ifs branch. In that case, you would use the “directory filter” option (shorthand -r).  The syntax is as follows:

 

iiq_data_export fsa export -c <cluster-name> --data-module directories -o <jobID> -r directory:<directory_under_ifs>

 

For example, the command below will extract directory information for /ifs/data/hdfs_dogfooding:


[16:14:31] rchang@VNODE0100:[~]:iiq_data_export fsa export -c tme-sandbox --data-module directories -o 473 -r directory:data/hdfs_dogfooding

 

 

    Successfully exported data to: directories_tme-sandbox_473_1466032486.csv

 

A quick look at this output file shows:

 

path[directory:/ifs/data/hdfs_dogfooding/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1465610442

/ifs/data/hdfs_dogfooding/user,52,2202,0,0,105103857955,136633898496,0

/ifs/data/hdfs_dogfooding/Shipit,14,12,0,0,337408759,340820992,0

/ifs/data/hdfs_dogfooding/benchmarks,5,22,0,0,104859576,134982144,0

/ifs/data/hdfs_dogfooding/tmp,35,21,0,0,4514730,7365120,0

/ifs/data/hdfs_dogfooding/hbase,27,32,0,0,11551,1380864,0

/ifs/data/hdfs_dogfooding/solr,1,0,0,0,0,2560,0

/ifs/data/hdfs_dogfooding/pyhdfs,1,0,0,0,0,2560,0
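If you'd rather skip Excel, a small awk sketch can turn this CSV into a quick capacity report. This assumes the column order shown above, with log_size_sum in column 6 and phys_size_sum in column 7 (both in bytes):

awk -F, 'NR>1 {printf "%-45s %10.2f GiB logical %10.2f GiB physical\n", $1, $6/2^30, $7/2^30}' directories_tme-sandbox_473_1466032486.csv

Each data row is converted from bytes to GiB, giving a per-directory summary that is easy to drop into a charge-back report.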

 

Caveats

 

There are a number of caveats around the iiq_data_export command that I should note:

 

  1. Currently with a single execution of the iiq_data_export command, we cannot extract more than one specific directory with the directory filter.  For example, if you issued -r directory:home -r directory:data/hdfs_dogfooding, only the second filter will be picked up by the command.  I will offer a quick bash script below to iterate through a user-defined list of directories.
  2. How far you can go down the /ifs tree depends on the FSA configuration from within InsightIQ. InsightIQ by default configures “directory filter maximum depth” to 5.  Meaning by default you could extract directory information as low as /ifs/level1/level2/level3/level4/level5.  Should you need to extract project directories from deeper than level 5, you could configure the FSA job to crawl deeper from within InsightIQ, as shown in the following image:

FSA-depth-config.png

Keep in mind that the larger the maximum depth, the more storage each individual FSA result will consume on the cluster.

 

FSA Extraction Script

 

Here’s a simple bash script I whipped up to iterate through a list of directories.

 

Create a simple file with the list of directories under /ifs you’d like to extract:

[15:33:28] rchang@VNODE0100:[~/blog-work]:cat dir_list.input

data/hdfs_dogfooding/Shipit/DB-Directory

data

home

 

 

Create this script (remember to substitute the cluster name and the job ID):

[15:33:33] rchang@VNODE0100:[~/blog-work]:cat export-fsa.bash

# Loop over each directory path listed in dir_list.input
for i in `cat dir_list.input`
do
   echo "Processing $i..."
   # Use the directory's base name in the output file name
   j=`basename $i`
   echo "Basename is $j"
   # Timestamp keeps successive exports from overwriting each other
   current_date_time="`date +%Y_%m_%d_%H%M%S_`"
   # Export the FSA "directories" data module for this path (remember to substitute your cluster name and job ID)
   iiq_data_export fsa export -c tme-sandbox --data-module directories -o 473 -r directory:$i -n fsa_export_$current_date_time$j.csv
done

 

Once executed, the resulting CSV file has the timestamp plus the directory’s base name:

[15:34:05] rchang@VNODE0100:[~/blog-work]:. export-fsa.bash

Processing data/hdfs_dogfooding/Shipit/DB-Directory...

Basename is DB-Directory

 

 

Successfully exported data to: fsa_export_2016_06_29_153408_DB-Directory.csv

 

 

Processing data...

Basename is data

 

 

Successfully exported data to: fsa_export_2016_06_29_153410_data.csv

 

 

Processing home...

Basename is home

 

 

    Successfully exported data to: fsa_export_2016_06_29_153411_home.csv

 

 

That’s it.  I welcome any comments and feedback.  Happy exporting!

With the recent publication of the high level overviews of deploying Kerberos authentication against Isilon and Hadoop on this blog, I thought I'd return and discuss some of the considerations around the configuration and methodologies used within OneFS to facilitate Kerberized Hadoop on Isilon.

 

One of the cornerstones of this implementation is leveraging Active Directory's ability to provide UNIX identities for users, as well as the normal SIDs, via additional schema attributes complying with rfc2307. Using these additional attributes, we can simplify user mapping and identity management on Isilon from a permissions management perspective. Using the rfc2307 extension is definitely not the only method to achieve this, but it does provide an elegant and simplified solution.

 

Let's discuss some of the considerations on OneFS when implementing Kerberized Hadoop with AD.

 

 

PREREQUISITES

  • The cluster must be joined correctly to the target Active Directory.
  • The Access Zone the HDFS root lives under is configured for this Active Directory provider.
  • All IP addresses within the required SmartConnect zone must be added to the reverse DNS with the same FQDN for the cluster delegation.
  • Isilon leverages the Active Directory schema extensions that support UNIX identities, known as Microsoft Services for UNIX (SFU) or Microsoft Identity Management for UNIX. These schema attributes extend Active Directory objects to provide UIDs and GIDs for user accounts in Active Directory.
  • Users running Hadoop jobs are Active Directory user principals with UNIX attributes allocated.

 

 

OneFS ACTIVE DIRECTORY SETTINGS

In order to enable Kerberized Hadoop authentication operations where Active Directory is the authentication authority, a couple of advanced options will need to be enabled on the Active Directory provider.

 

1.png

 

 

From the Isilon WebUI, navigate to Access > Authentication Providers > Active Directory > View Details > Advanced Active Directory Settings.

 

  • Enable - rfc2307: This leverages the Identity Management for UNIX services in the Active Directory schema
  • Map user/group into primary domain: Yes – Without this setting the domain name will need to be prefixed during user login.

 

The example below shows the advanced Active Directory settings utilized for the test domain FOO.COM. If the status indicator appears in any color other than green, the Active Directory provider is out of synchronization with OneFS and will need to be restored before continuing.

 

2.png

 

Currently, enabling rfc2307 for SFU support can be managed from the CLI, but the assume-default-domain switch is missing from the CLI. It will likely return in an MR shortly.

 

# isi auth ads modify --sfu-support=rfc2307 FOO.COM

# isi auth ads view --provider-name=FOO.COM -v

 

5.png

 

Having enabled these features, we can validate that lookups are working for both short and long names:

# isi auth mapping token --user=administrator --zone=rip2-cd1

# isi auth mapping token --user=administrator@FOO.COM --zone=rip2-cd1

 

 

6.png

 

7.png

 

 

 

 

SFU-RFC2307 Enablement on the Active Directory Provider

By leveraging the Active Directory provider with SFU support for rfc2307 enabled, we maintain consistent user and identity mapping between the users executing Hadoop jobs and Isilon. This allows the implementation of a standard Isilon permissioning model leveraging OneFS POSIX file permissions. Without SFU-rfc2307 support, Isilon would need to leverage user mapping against a separate LDAP provider that can provide UNIX UIDs and GIDs for the users.

 

I will discuss the permissioning model specifically in an upcoming post, but for great background, check out the series of multiprotocol posts I coauthored a while back.

 

 

 

So what is enabling SFU-rfc2307 doing for us? The short answer is that it provides UIDs and GIDs from Active Directory for our AD user accounts. Since our access tokens now contain Directory Service based UIDs/GIDs and SIDs, we can permission directly against these AD identities to support full multiprotocol access.

 

 

User in Active Directory:

3.png (User’s UNIX ID as seen in Active Directory)

Isilon user access token:

4.png (User token as seen on Isilon)

 

 

The token validates that the Active Directory provider is pulling the correct information from Active Directory and that the UNIX identities are present.

 

Since AD is now providing the correct UIDs for the users running jobs, the on-disk permissions will be based on UIDs and GIDs (as can be seen in the token), and the permission model utilized can easily be based on POSIX-authoritative permissions and managed with existing tools: chown and chmod.

 

This wraps up the high level overview of how to use the Active Directory provider for Kerberized Hadoop access by leveraging the SFU-rfc2307 extension in AD.

 

 

Next up: Isilon permissioning strategies for multiprotocol access with Kerberized Hadoop.

 

 

 

 

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub

If you've ever done a full test of SyncIQ, you know that you fail over to the target, verify everything works, and then fail back to restore the configuration to its steady state. Before issuing the failback, you run a resync-prep, which readies the original source to be restored as the active dataset. If you look at what happens during the resync-prep phase the first time you ever fail back a policy, you'll see output similar to this:

  

06/30/16 11:19:35  06/30/16 12:57:05 resync_prep Success

06/30/16 12:57:05  06/30/16 15:05:27 resync_prep_domain_mark Success

06/30/16 15:05:27  06/30/16 15:05:48 resync_prep_restore Success

06/30/16 15:05:48  06/30/16 15:05:51 resync_prep_finalize Success


So what is this domain_mark process?

A domain is simply a scoping mechanism for the data contained within a SyncIQ policy - that is, the directories and folders which are replicated. In normal operation, the SyncIQ target is read-only, so that data can only be updated by SyncIQ replication itself. This is what you want - all client I/O goes to your source and all changes are faithfully replicated to the target by SyncIQ. Setting the target read-only is achieved by performing a domain mark on the target, which happens automatically when the policy is sync'd for the first time. The domain mark process tags each existing LIN within the policy with a domain ID to mark the extent of the domain.

How does this work? When you run your policy for the first time, each incoming file and subdirectory is automatically marked as it is created on the target. If the target directory happens to already contain files, a process known as domain mark will explicitly mark any existing files. The domain on the target is then set to the restricted-write state – so that clients cannot write to it.

What about on subsequent incremental replications, as you add files on your source? Not to worry - any new files or directories created on your source automatically inherit the domain ID of the parent when they are replicated to the target, so there is no need to run the domain mark process again. So you're good to go - your target is guaranteed read-only.


Blog1.png

Now, suppose you're going to failover to your target, either for a test or in an actual disaster. On failover, SyncIQ sets the target to writable, and you direct your clients there.


Blog2.png

You do your testing or operations at the target site, until at some point, you're ready to initiate the failback to return to your original state. You almost certainly updated data while you were in failover, therefore you need to propagate those changes back to the source before failing back. The failback process restores the source to its last known good state and then resyncs it back from the target. This requires the source to be read-only, to guarantee the integrity of the data, and therefore the domain has to be marked, just like it was for the target when the policy was first run. At the completion of the domain mark, the data is resync'd from the target to the source, and the failback can then complete.


Failback first phase


Blog3.png

Failback second phase


Blog4.png

Remember, the domain only has to be marked once for any given policy. If the source domain hasn’t already been marked, the domain mark process runs during the resync-prep step of the failback, and it will require a tree walk. So if you haven't run a failback until there's a lot of data associated with your policy, that domain mark on the first failback can take a long time. It's not affecting your client I/O - that still proceeds on the target, but it does increase the duration of your failover test or your return to production. You could do a test failover cycle early on for each policy, as soon as you'd completed the initial sync.  If there's not much data in the source, it won't take much time. But you have to remember to do it, which could be easily overlooked.

You can also run the domain mark manually at any time - if it's already run previously on the specified domain, this command will have no effect. Use the CLI

isi job jobs start DomainMark --root=<SyncIQ source path> --dm-type=SyncIQ

or, from the WebUI, navigate to Cluster Management > Job Operations > Domain Mark > Start Job and specify the domain root path and domain type SyncIQ.

Blog5.png

An easier way in OneFS 8.0

OneFS 8.0 includes a flag in the advanced settings for SyncIQ policies which, when enabled, will cause the source to be automatically domain marked the next time the policy runs. So if you set this parameter on a policy when it is first defined, the failback process doesn't need to run the domain mark because it's already been taken care of.  The potentially time consuming domain mark step will then be eliminated from the first failback execution.

You can use the --accelerated-failback true|false option on the CLI (either when creating or modifying a policy), or check the option "Prepare policy for accelerated failback performance" in the WebUI.

BlogWebUI.png
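From the CLI, a hedged sketch of enabling this on an existing policy (the policy name is hypothetical; confirm the exact option spelling with isi sync policies modify --help on your release):

# isi sync policies modify my-dr-policy --accelerated-failback true

The next scheduled run of the policy then takes care of the domain mark, so the first resync-prep won't have to.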

Conclusion

In this article I've hopefully shed some light on a not so well understood piece of SyncIQ, and also highlighted one of the many smaller improvements in OneFS 8.0.




Cloud Recall

Posted by trimbn Jul 19, 2016

In response to the last blog article on CloudPools, the following question was asked:


“Since SmartPools has the ability to promote, or ‘up-tier’, data from an NL-node archive tier to an active X-node tier, is the same kind of functionality available for CloudPools?”


Unlike SmartPools, CloudPools does not use filepool policies to automatically and permanently retrieve data stored on a cloud tier back to a nodepool on the local cluster.


Instead, the ‘isi cloud recall’ command provides the ability to move files back onto the cluster from a cloud provider. In this case, the original files replace the stub files that were created when the data was first archived. Files can be recalled individually by name or by specifying a fully recursive directory path.

 

CloudPools provides the ability to cache a file’s blocks locally when it’s read. Note, though, that the actual file data itself will still live in the cloud.


Caching can be configured from the WebUI by browsing to Storage Pools > CloudPools Settings:


cloudpools8.png

 

Configurable cache parameters include:


  • Writeback frequency
  • Cache accessibility
  • Cache read-ahead
  • Cache expiration


Note:  If a customer wishes to change cloud providers, they will need to have the space to manually retrieve the full dataset, then apply a new policy to move it to the new cloud account.

 

So, recall is the process of copying data back from the cloud storage to the primary storage and making it a regular file (not stub). Recall of an archived file can be triggered by a manual recall request.


As shown in the command syntax below, you can enter a list of files to be retrieved, separated by commas or spaces, or use a regular-expression file filtering pattern. To ensure that files identified for recall are present in the cloud, OneFS scans the cluster for stub files prior to performing the recall. If you specify a filtering pattern, the files you want to recall must both match the pattern and have representative stub files on the cluster. You can also specify a directory name to recursively recall files throughout the directory.


When a manual-recall request is received, a recall job is created and a job ID is returned to the user. This job ID can be used to track the recall process and its status.

 

Here’s how to recall a CloudPools archived file from the OneFS CLI:

 

1. Manually recall the stub file by running the command, isi cloud recall -v.

 

# isi cloud recall /ifs/data/cp_test/cp_testfile.txt -v

Created job 44

 

2. Run isi cloud job view and use the job ID above to check recall job status.

 

# isi cloud job view 44

ID: 44

Description: [f] /ifs/data/cp_test/cp...

State: completed

Type: recall

Create Time: 2016-07-19T10:43:22

Modified Time: 2016-07-19T10:43:26

Completion Time: 2016-07-19T10:43:26

Job Engine Job: 86

Job Engine State: succeeded

Total Files: 1

Total Failed: 0

Total Pending: 0

Total Processing: 0

Total Succeeded: 1

 

3. Finally, check the stub flag using the command, isi get -D, to verify that the stub file has been recalled and has become a regular file again.

 

# isi get -D /ifs/data/cp_test/cp_testfile.txt | grep -i stub

* Stubbed: False

 

The Stubbed Flag is marked “False”, indicating that it’s a regular file.

 

Here’s the full CLI syntax for the isi cloud recall command:


# isi cloud recall -h

Description:

    Recall a stubbed file.

 

Usage:

    isi cloud recall <files>

        [{--recursive | -r} <boolean>]

        [{--verbose | -v}]

        [{--help | -h}]

 

Options:

    <files>...

        File names to be acted on. Specify --files for each additional file name.

    --recursive | -r <boolean>

        Whether the recall/archive should apply recursively to nested

        directories.

 

  Display Options:

    --verbose | -v

        Display more detailed information.

    --help | -h

        Display help for this command.

 

File Matching Criteria:

    Overview:

        The following arguments are allowed for specifying file matching criteria for the given cloud. A full description on how to form file matching criteria can be found in man isi-file-matching.

    Special keywords:

        --begin-filter

        --end-filter

        --and

        --or

        --operator=<value>

    Arguments:

        --accessed-time=<value>

        --birth-time=<value>

        --changed-time=<value>

        --custom-attribute=<value>

        --file-type=<value>

        --metadata-changed-time=<value>

        --name=<value>

        --path=<value>

        --size=<value>

    Example:

        --begin-filter --name=my_name --and --size=100MB --operator=lt

        --end-filter
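Putting the syntax above together, here are a couple of hedged examples built from the options and paths shown earlier (verify against the CLI help on your cluster before use):

# isi cloud recall /ifs/data/cp_test --recursive true -v

# isi cloud recall /ifs/data/cp_test --recursive true --begin-filter --size=100MB --operator=lt --end-filter -v

The first recalls every stubbed file under the directory; the second applies the file matching criteria so that only stubs smaller than 100MB are recalled.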

 

More information on SmartPools and tiering upwards can be found in the following blog article:


https://community.emc.com/community/products/isilon/blog/2015/04/08/smartpools-and-tiering-upwards

Following on from the Kerberization of Cloudera with Isilon post, Cloudera CDH 5.7 with Isilon 8.0.0.1 and Active Directory Kerberos Implementation, this blog will focus on getting the Hue service started.

 

On attempting to start the Hue service, you'll see the following errors relating to the KT_RENEWER service.

 

 

This is currently expected, and the following workaround will need to be applied to get the Hue service running. Stay tuned for updates on Isilon - Cloudera - Kerberos - Hue integration.

 

 

19.png

 

In order to get this service started, we will add a secondary HDFS instance and bind the Solr service to it.

 

1. Add a Service

 

2.png

 

2. Select HDFS

 

3.png

 

 

3. Assign the HDFS service as needed

 

4.png

 

 

4. Leave the defaults for the HDFS service

 

5.png

 

 

5. Add the service

 

6.png

 

 

6. Let the deployment of the service complete

 

 

7.png

 

 

7. Return to the Cloudera Manager Dashboard

 

8.png

 

 

We now bind the Solr service to the HDFS instance

 

 

8. On the Configuration tab of the Solr service, select the recently added HDFS Service

 

9.png

 

10.png

 

 

Save the configuration

Return to the Cloudera Manager Dashboard

 

12.png

 

Since the configuration has changed, we need to deploy the new configuration.

 

9. Deploy Client Configuration

 

13.png

 

14.png

 

15.png

 

 

 

10. Restart the services

 

 

16.png

 

17.png

 

 

18.png

 

19.png

 

20.png

 

 

Once the restart completes, all services are operational.

 

 

22.png

 

If any warnings exist on the HDFS service, these can be suppressed as needed.

 

23.png

 

 

All Kerberized services are now started.

 

24.png

 

 

 

Follow this blog for any updates with Isilon and Hue Services.

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub


Head in the cloud(s)

Posted by trimbn Jul 14, 2016

There have been several enquiries from the field recently around the fundamentals of OneFS and tiering to the cloud. So it seemed germane to cover this in a blog article:

 

Introduced in OneFS 8.0, CloudPools enables the movement and tiering of data from Isilon OneFS clusters to Amazon S3, Microsoft Azure, or Isilon-to-Isilon cloud storage, while clients retain access to that data via NFS and SMB. Benefits include allowing customers to free up space on their cluster and reduce overall storage costs. CloudPools provides the following primary features and capabilities:

 

  • Archiving files to Cloud Storage
  • Recalling files from the Cloud Storage
  • NFS & SMB protocol support for client access to the cloud data
  • Encryption and/or compression for data written to the cloud
  • Garbage collection for cloud objects whose retention period has expired
  • Feature interoperability with other OneFS storage management and data protection tools, including SmartPools, SyncIQ, SnapshotIQ, SmartLock, SmartQuotas, NDMP backup, etc

 

Before archiving files to cloud storage, we need to know which files will be archived and where to move them. A filepool policy defines the eligible files that will be moved to cloud storage and specifies the target cloud storage where the files will reside. The following steps are used to configure filepool policies. If you are unfamiliar with these steps, please consult the appropriate documentation.

 

Activate the SmartPools and CloudPools licenses.

From the OneFS command line interface, run the following command for each license, where <module name> is either smartpools or cloudpools:

isi license <module name>

Or, from the OneFS web administration interface, go to Cluster Management > Licenses.

 

Next, create a cloud storage account on the primary storage.

 

Navigate to File System > Storage Pools > CloudPools.

 

Provide the following information to create a new CloudPools account:

 

  • Cloud provider type
  • Provider account username
  • Key (password)
  • URI to the physical cloud storage

 

OneFS CloudPools supports the following cloud providers:

 

  • Amazon S3
  • Microsoft Azure
  • EMC ECS
  • Virtustream
  • Isilon to Isilon cloud storage


If you have an existing cloud account, you can modify the cloud account name, provider account username, key and URI.

 

cloudpools1.png

 

For example, to create a cloud account using Azure:

 

Assuming you have received similar information for your Azure account, for example:

 

Account user name:      cloudpoolsazure

Key:                               bMbeZWBhDEvv1qdJB5gEsEAPtfKzw6i45/yIMdvgP7/sMzY6HuucEaVPeoGdbNCFLXpRIYC8A==

URI:                               https://cloudpoolsazure.blob.core.windows.net

 

1. Create a cloud account using Azure as cloud storage

 

# isi cloud account cp_azure_accnt azure https://cloudpoolsazure.blob.core.windows.net cloudpoolsazure bMbeZWBhDEvv1qdJB5gEsEAPtfKzw6i45/yIMdvgP7/sMzY6HuucEaVPeoGdbNCFLXpRIYC8A==

 

2. Verify the account information

 

# isi cloud account view cp_azure_accnt

ID: cp_azure_accnt

Name: cp_azure_accnt

Type: azure

Account Username: cloudpoolsazure

URI: https://cloudpoolsazure.blob.core.windows.net

Bucket: d0007430856f6906ce9528f1f552a8b5b48a2i2

 

3. Create a CloudPool with a cloud storage account and provider type. Enter a CloudPool name and select a type; the system automatically updates the available accounts for the selected type. Select an account from the list of available accounts and click the Add button to add accounts to this CloudPool. Then click the Create a CloudPool button.

 

A CloudPool is a container comprising one or more cloud storage accounts. The user can add or remove cloud accounts from the CloudPool. When creating a new CloudPool, at least one cloud account must be specified.

 

cloudpools2.png

 

cloudpools3.png
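If you prefer the CLI for this step, the equivalent can likely be done with the isi cloud pools command. The exact arguments below are assumptions based on the account created above, so confirm them with isi cloud pools create --help before use (the pool name is hypothetical):

# isi cloud pools create cp_azure_pool azure --accounts cp_azure_accnt

# isi cloud pools view cp_azure_pool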


1.  Create a filepool policy to define the archiving criteria and specify a target CloudPool. The archiving criteria are used to identify files to be archived to the cloud. CloudPools uses the same policy criteria mechanism as SmartPools.

 

From the WebUI, navigate to Storage Pools > File Pool Policies and click Create a File Pool Policy.

 

2.  Enter a policy name and the file matching criteria. Check the Move to cloud storage checkbox under the bottom section, Apply CloudPools Actions to Selected Files, of the Create a File Pool Policy window. Click the Create Policy button to create a file pool policy.

 

The following example creates a policy that targets files that haven’t been modified in 6 months or more:

 

cloudpools4.png


3.  Configure CloudPools settings:

 

cloudpools5.png

 

Archive

 

Archiving is the process of moving file data from the cluster to the cloud and converting each file to a stub within OneFS. A file can be archived to the cloud either automatically, via the scheduled SmartPools job, or manually. Automatic archival scanning is performed by the scheduled SmartPools job. Manual archiving is initiated using the isi cloud archive or isi filepool apply commands.

 

When a manual archive request is received, an archive job is created and a job ID is returned to the user. The job ID can be used to track the archive progress and status. The user can specify a directory name in the archive command to recursively archive files throughout the directory. After archiving completes, the local file becomes a stub file and the stub flag of the file changes to ‘True’.

 

To manually initiate archiving:

 

1. Archive a file from the command line using the isi cloud archive -v command. This command returns a job ID.

 

# isi cloud archive /ifs/data/test/testfile_cli.txt -v

Created job 43

 

2. Use this job ID to check the archive job status.

 

# isi cloud job view 43

ID: 43

Description: [f] /ifs/data/test/t...

State: completed

Type: archive

Create Time: 2016-07-11T16:26:58

Modified Time: 2016-07-11T16:27:02

Completion Time: 2016-07-11T16:27:02

Job Engine Job: 77

Job Engine State: succeeded

Total Files: 1

Total Failed: 0

Total Pending: 0

Total Processing: 0

Total Succeeded: 1

 

3. Check the stub flag using the command, isi get -D, to make sure the file has been archived and has become a stub file:

 

# isi get -D /ifs/data/test/testfile_cli.txt | grep Stub

* Stubbed: True

Stubbed file flags 30 5

Stubbed file mtime 35 17

Stubbed file size 52 9

 

The Stubbed flag is marked ‘True’ and the file has become a stub.

 

Recall

 

Recall is the process of copying data back from the cloud storage to the primary storage and making it a regular file (not a stub). Recall of an archived file can be triggered by a manual recall request. When a manual recall request is received, a recall job is created and a job ID is returned to the user. The job ID can be used to track the recall progress and status.


The user can specify a directory name to recursively recall files through the directory.

 

To recall an archived file:

 

1. Manually recall the stub file by running the command, isi cloud recall -v.

 

# isi cloud recall /ifs/data/test/testfile_cli.txt -v

Created job 44

 

2. Run isi cloud job view and use the job ID above to check recall job status.

 

# isi cloud job view 44

ID: 44

Description: [f] /ifs/data/test/t...

State: completed

Type: recall

Create Time: 2016-07-11T16:43:22

Modified Time: 2016-07-11T16:43:26

Completion Time: 2016-07-11T16:43:26

Job Engine Job: 78

Job Engine State: succeeded

Total Files: 1

Total Failed: 0

Total Pending: 0

Total Processing: 0

Total Succeeded: 1

 

3. Check the stub flag using the command, isi get -D, to make sure the stub file has been recalled and has become a regular file.

 

# isi get -D /ifs/data/test/testfile_cli.txt | grep Stub

* Stubbed: False

 

The Stubbed Flag is marked “False”, indicating that it’s a regular file.

 

 

Integration with SnapshotIQ

 

The system provides allow or disallow options for the end user to control the archiving behavior of files with snapshots:

 

1. Allow archiving snapshots of HEAD with older non-stubbed snapshot versions.

2. Disallow archiving files with older non-stubbed snapshot versions.

 

This can be configured via both the WebUI and the CLI. From the CLI, run:

 

# isi cloud settings modify --archive-snapshot-files [disabled|enabled]

 

From the WebUI, go to FILE SYSTEM MANAGEMENT > Storage Pools > CloudPools Settings. The first check box, “All files that have had snapshots taken to be stored in the cloud”, is used to configure the archiving of files with snapshots.

 

cloudpools6.png

 

The CloudPools feature guarantees the point-in-time version access to the data in the snapshots for stub files.

 

Garbage Collection

 

When a stub file is deleted via NFS or SMB, or recalled and all references to the stub file removed, the system will remove the cloud objects after the retention period expires. The retention period can be set at the cluster level or per individual filepool policy. The policy level settings take precedence over the cluster level.

 

1. From WebUI, click File System Management, click Storage Pools, and click File Pool Policies. Click Create a File Pool Policy for a new policy or view/edit for the existing policy.

 

2. Click the Move to cloud storage checkbox and click Show Advanced CloudPool Setting at the bottom of the pop-up window. You will see three parameters for the data retention period: Cloud Data Retention Period, Incremental Backup Retention Period and Full Backup Retention Period. The incremental and full backup retention periods are used for NDMP and SyncIQ. When a backed-up stub is deleted, the cloud objects are garbage collected once the longest of the three periods has expired. For example, with a cloud data retention period of 30 days and a full backup retention period of 90 days, the cloud objects for a deleted, backed-up stub are kept for 90 days.

 

cloudpools7.png

 

The CloudPools default settings can also be configured via the isi cloud settings modify CLI command:

 

isi cloud settings modify

        [--default-accessibility (cached | no-cache)]

        [--default-cache-expiration <duration>]

        [--default-compression-enabled <boolean>]

        [--default-data-retention <duration>]

        [--default-encryption-enabled <boolean>]

        [--default-full-backup-retention <duration>]

        [--default-incremental-backup-retention <duration>]

        [--default-read-ahead <string>]

        [--default-writeback-frequency <duration>]

        [--default-archive-snapshot-files <boolean>]
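As a hedged example of adjusting these defaults (the flags come from the syntax above; the values, and in particular the 30D duration string, are assumptions to confirm against the CLI help on your release):

# isi cloud settings modify --default-compression-enabled true --default-encryption-enabled true --default-data-retention 30D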


 

More information about tiering data on Isilon clusters can also be found in the following blog articles:

 

https://community.emc.com/community/products/isilon/blog/2015/04/20/tiering-snapshots

 

https://community.emc.com/community/products/isilon/blog/2015/04/08/smartpools-and-tiering-upwards

 

https://community.emc.com/community/products/isilon/blog/2015/04/27/smartpools-storage-pool-changes-and-new-files

 

https://community.emc.com/community/products/isilon/blog/2015/06/02/smartpools-and-dynamic-file-placement

  
Also see the Isilon Cloudera Kerberos Installation Guide: http://www.emc.com/collateral/TechnicalDocument/docu84031.pdf

 

Continuing the series of posts regarding the deployment of Hadoop with Isilon, this post will outline a high level overview of the procedure to Kerberize a Cloudera CDH cluster with Isilon against Active Directory. It provides the core tasks needed to complete the setup; additional topics will be covered later or in upcoming documents.

 

This procedure is based on the following:

Isilon 8.0.0.1

CDH 5.7.1

 

OneFS 8.0.0.1 contains a number of updates that facilitate the integration and deployment of Kerberos against OneFS, so it is highly recommended to use this version.

 

 

This blog assumes the following Isilon Hadoop environment is configured and operational:

 

 

  • A dedicated Isilon Access Zone is in use (not the System zone).
  • The Isilon SmartConnect zone configuration is implemented per best practice for Isilon HDFS access.
  • The Isilon HDFS configuration is correctly configured.
  • Cloudera Manager is configured correctly for Isilon integration.
  • Cloudera Manager will manage and deploy the keytab and krb5.conf files.
  • A simple access model currently exists between Hadoop and Isilon; user UIDs and GIDs are correctly implemented and allow HDFS access to the Isilon HDFS root with UID and GID parity.
  • Hadoop jobs and services are fully operational.

 

If your environment deviates from any of these configurations, an alternative approach to Kerberization may be required, especially with regards to the management of keytabs and krb5.conf files. This procedure will not address all configurations or requirements; additional EMC services should be engaged when required.

 

This post also does not address Linux host kerberization, Directory Service integration, or the Isilon permissioning model for multiprotocol access following kerberization. I hope to address these at a later date.

 

 

Since we are integrating an existing Cloudera CDH cluster and Isilon into a pre-existing Microsoft Active Directory environment, the high level outline of the approach is:

 

1. Prepare and configure Active Directory for Isilon - Hadoop integration.
2. Prepare the Cloudera cluster and Linux hosts for kerberization.
3. Integrate the Isilon cluster into Active Directory.
4. Kerberize the CDH cluster via the Cloudera enable Kerberos wizard.
5. Complete the integration of Isilon and Cloudera.
6. Test and validate the Kerberized services.

 

 

This post is based on using a pre-existing Active Directory environment for Kerberos User Authentication.

To use an existing Active Directory domain for the cluster with the Cloudera Kerberos wizard, you must prepare the following:

 

- Isilon, Cloudera Manager and the compute cluster hosts all have the required network access to Active Directory and AD services.

- All DNS name resolution of the required Active Directory services is valid.

- Active Directory secure LDAP (LDAPS) connectivity has been configured.

- An Active Directory OU user container for principals has been created, for example "OU=Hadoop--Cluster,OU=People,dc=domain,dc=com".

- Active Directory administrative credentials with delegated control of "Create, delete, and manage user accounts" on the OU user container are implemented.

 

For additional information, see the Cloudera security documentation: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_intro_kerb.html

 

 

 

How Kerberos is Implemented:

Since the Isilon integrated Hadoop cluster is a mix of Linux hosts running the compute services and Isilon running the data services, Cloudera cannot effectively complete the Kerberization end-to-end. Because Isilon runs a clustered operating system, OneFS, ssh based remote management cannot configure and manage the kerberization of Isilon completely, nor does it need to. It can, however, still completely deploy and configure the Linux hosts.

 

Because of this, the kerberization of the Isilon integrated Hadoop cluster should be considered in the following context:

1. Isilon is Kerberized.
2. The Cloudera Kerberization wizard runs and deploys kerberization to the Linux hosts and Hadoop services.
3. Since both sets of systems are now fully Kerberized within the same KDC realm, Kerberized user access can occur between Isilon and the Hadoop cluster seamlessly.

 

 

 

Cloudera Pre-Configuration

  • Review that Cloudera 5.x or higher is running.
  • Forward and reverse DNS between all hosts is tested and validated. Test this with dig or ping.
  • All services are running (green) on the Cloudera Manager Dashboard.
  • All other Cloudera specific Kerberos requirements have been met: NTP, DNS, packages, etc.

 

 

Before launching the Cloudera Kerberization wizard, you need to make the following configuration customization and restart all services.

 

In the Isilon Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property for the Isilon service, set the value of the hadoop.security.token.service.use_ip property to FALSE.

 

This key may need creating:

 

0.png
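
If you want to confirm the setting has taken effect once the client configuration has been redeployed, a minimal check from any Hadoop host is to read the value back with the standard Hadoop client (hdfs getconf is generic Hadoop tooling, not Cloudera-specific); it should return false after the change above is deployed:

# hdfs getconf -confKey hadoop.security.token.service.use_ip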

 

 

Preparing Hosts for Kerberization

In order for Kerberization to be operational, the required client libraries must be installed: the OpenLDAP client libraries on the Cloudera Manager Server, and the Kerberos client libraries on ALL hosts.

 

See the following Cloudera doc for more information: Enabling Kerberos Authentication Using the Wizard

 

On a RHEL or CentOS system this can be achieved with the following commands; on other operating systems, install the appropriate packages:

# yum install krb5-workstation

# yum install krb5-libs

# yum install openldap-clients
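
A quick way to confirm the packages landed on a given host (assuming RHEL/CentOS and the package names above):

# rpm -qa | grep -E 'krb5-workstation|krb5-libs|openldap-clients'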

 

 

 

Isilon OneFS Configuration

This section covers the configuration required for OneFS to respond to requests for secure Kerberized HDFS authenticated by Active Directory.

The cluster must be joined correctly to the target Active Directory as a Provider.

The Access Zone the HDFS root lives under is configured for this Active Directory provider.

All IP addresses within the required SmartConnect Zone must be added to the reverse DNS with the same FQDN for the cluster delegation. All IPs should resolve back to the SmartConnect Zone name. This is required for Kerberos.
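
A simple way to spot-check the reverse DNS delegation from any Hadoop host is with dig; the IP address below is a hypothetical SmartConnect pool member, and the FQDN is the example zone name used later in this post:

# dig -x 10.1.1.21 +short

rip2-cd1.foo.com.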

 

 

Isilon SPN's

Because OneFS is a clustered file system running on multiple nodes but joined to Active Directory as a single Computer Object, the SPN requirements for Kerberized hadoop access are unique: in addition to the cluster SPN, additional SPN’s are required for the Access Zone through which HDFS NameNode access is made:

 

 

Review the registered SPN’s on the Isilon cluster and add the required SPN’s for the SmartConnect Zone name if needed.

#isi auth ads spn list --provider-name=<AD PROVIDER NAME>

 

The following example illustrates the required SPN’s:

Isilon Cluster Name - rip2.foo.com - SPN: hdfs/rip2.foo.com

Access Zone NN SmartConnect FQDN - rip2-cd1.foo.com - SPN's: hdfs/rip2-cd1.foo.com & HTTP/rip2-cd1.foo.com

 

 

For additional information on adding or modifying Isilon SPN’s in Active Directory see the Isilon CLI Administrative Guide.

 

 

Isilon Hadoop (HDFS) Changes

The following configuration changes are required on the HDFS Access Zone.

 

1. Disable simple authentication. This enforces Kerberos or delegation-token authentication only.

 

# isi hdfs settings modify --authentication-mode=kerberos_only --zone=rip2-cd1

 

17.png

 

 

2. Create the required Proxy Users

 

Proxy users are required so that specific hadoop service accounts can impersonate users when executing jobs; add the required proxy users as needed. More on proxy users in a later post; also review the Isilon CLI Administrative Guide. A hedged example follows below.
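
As a hedged example only (the exact syntax can vary by OneFS release; verify with isi hdfs proxyusers --help and the CLI guide), creating and reviewing a proxy user for a hypothetical hive service account in the example rip2-cd1 zone might look like:

# isi hdfs proxyusers create hive --zone=rip2-cd1

# isi hdfs proxyusers modify hive --add-user=hdpuser1 --zone=rip2-cd1

# isi hdfs proxyusers list --zone=rip2-cd1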

 

 

3. Increase the hdfs log level

 

#isi hdfs log-level modify --set=verbose

 

 

This completes the Isilon hadoop Active Directory setup:

- Isilon joined to Active Directory, Provider Online

- HDFS Access Zone has Active Directory provider added

- SPN's are correctly configured

- HDFS Service configured for kerberos_only

- DNS configuration is valid

 

 

Enable Kerberos on Cloudera

Having met all the prerequisites, the Cloudera cluster can be Kerberized. It is suggested to suspend all client and user activity on the Hadoop cluster prior to executing any Kerberization tasks.

 

01.png

 

From the Dashboard, Select Security

1.png

 

Enable Kerberos,

 

2.png

Having met all the prerequisites, check all the boxes

 

3.png

 

Continue,

 

5.png

 

Add the KDC Server Host FQDN

Add the Security Realm (the AD domain)

Add additional Encryption types (OneFS 8.0.x supports aes-256)

Set the OU to the delegated OU to be used for the Cloudera principals

 

 

It is recommended to manage host krb5.conf files through Cloudera Manager

6.png

 

Check the 'Manage krb5.conf files through Cloudera Manager' box,

7.png

 

8.png

 

Accept the defaults,

 

Since Cloudera Manager will create and manage all the principals, an AD OU with a delegated administrative account is used,

 

9.png

 

Enter the credentials for the AD user with Delegated access to the OU in the AD Domain.

 

10.png

 

Continue,

11.png

 

Continue,

 

12.png

 

Continue,

 

13.png

 

These ports can be left as default,

 

14.png

 

Yes, I am ready to restart the cluster now and Continue,

 

The kerberization wizard will start,

 

15.png

 

 

The wizard will create the required Principals in Active Directory,

 

16.png

 

 

Kerberos enablement will continue, and the services will attempt to restart,

 

18.png

 

 

The Hue service will fail and halt the wizard; this is a known issue and will need to be worked around.

 

19.png

 

 

In order to get the Hue service to start following Kerberization, follow this procedure ---> <coming soon>

 

Since the failure of the Hue service prevents the wizard from completing, we need to do the following:

 

Open another browser session to the CM Dashboard: http://<CM-URL>:7180 and review the state of the services.

 

You will likely see services in an unhealthy state; address each of these services individually, starting or restarting them as needed and monitoring the log files to get them started. Some services may just need a manual restart, but all services are now fully Kerberized.

 

 

20.png

 

 

Once ALL of the services EXCEPT Hue have started, you can close the other Kerberization wizard browser session; all services are Kerberized and the cluster is operational (except Hue), as seen below:

 

Getting the Hue Service started:

Getting the Hue Service Started on Kerberized Cloudera with Isilon

 

 

 

21.png

 

Address any configuration issues or alarms as needed. This completes the procedure for Kerberizing Cloudera with Isilon; we can now test it.

 

 

See my prior post for some basic kerberos testing methodology: Ambari HDP with Isilon 8.0.0.1 and Active Directory Kerberos Implementation

 

I'd suggest the following smoketests to validate that the cluster & Isilon are correctly Kerberized; a sketch of the commands follows the list.

1. Test without a valid ticket

(obtain a valid ticket)

2. Browse the Hadoop root

3. Write a file

4. Run a simple Yarn job; teragen
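
A minimal sketch of those smoketests from a cluster gateway host, assuming an AD test user and the standard CDH parcel location for the examples jar (adjust paths and names to your environment):

$ kdestroy

$ hadoop fs -ls /     (should fail with a GSS/Kerberos error)

$ kinit <ad user name>

$ hadoop fs -ls /

$ hadoop fs -touchz /tmp/kerberos_smoketest.txt

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000 /tmp/teragen_smoketest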

 

 

 

 

 

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub

This article walks through the upgrade of a deployment of Hortonworks Data Platform (HDP) using Ambari 2.2 when Isilon OneFS is used as the data storage for HDFS.

 

Choosing an upgrade guide

 

This guide applies to OneFS 7.2.1.3 or higher, OneFS 8.0.0.1 or higher, and later releases of OneFS.

 

There have been two previous articles on this topic. Last November we described how to upgrade HDP, which required manual changes to work. Use that guide if you are stuck on OneFS 7.2.1.1 or lower. In January OneFS was updated, making the process much easier. Use that guide if you are not yet on OneFS 7.2.1.3 or 8.0.0.1, or if you are not ready to upgrade to Ambari 2.2.

 

A table will help illustrate.

 

OneFS | Ambari | Guide
7.2.1 family (7.2.1.3 or later), 8.0.0 family (8.0.0.1 or later), or any later family | 2.2 or higher | Current page
7.2.1 family (7.2.1.2 or later) | 2.1 family, but not 2.1.2 because of JIRA 13414 | January 2016 guide
7.2.1.0 (with patch) or 7.2.1.1 | 2.1 family | November 2015 guide
7.2.0.3 or 7.2.0.4 | 2.1.0 | No upgrade possible to HDP 2.3

 

Ambari 2.2 mainstreamed the previously experimental skip of service checks that we used in the January 2016 guide. Ambari 2.2 also introduced a new workflow, Express Upgrade, which speeds up the process when your data scientists can tolerate disrupted jobs. This guide describes the improved Rolling Upgrade experience, and what to expect from Express Upgrade.

 

The Upgrade

Getting started

 

OneFS still needs to report to Ambari Server that it is at the target version. This step is unchanged. Here's how I described target version last time:

 

"This is the 4-number release version and 4 digit build number of HDP. The three HDP releases of 2.3 that are available today are 2.3.0.0-2557, 2.3.2.0-2950, and 2.3.4.0-3485. This can be determined from the release notes of the HDP build that you are upgrading to, or running the command 'hadoop version' on an existing deployment."

 

Currently we're at 2.4.2.0-258. Here's how to make the change:

 

isi hdfs settings modify --odp-version=[version] --zone=[zone]
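
For example, using the 2.4.2.0-258 build mentioned above against a hypothetical zone named zone1:

isi hdfs settings modify --odp-version=2.4.2.0-258 --zone=zone1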

 

Rolling Upgrade

 

Begin Ambari's upgrade steps and complete installation of packages. Click the Perform Upgrade button (it may have changed to a link named Upgrade: In Process; if so, click on this).

 

The Upgrade Options box will open:

 

HDP upgrade dialog.jpg

 

Click the blue link to take a look at the list of failed pre-upgrade checks. There will be two Requirements that fail because of the OneFS host configuration: Ambari expects NameNode High Availability to be enabled, and expects no Secondary NameNode. Those are okay because OneFS provides continuous availability internally; Hadoop's NNHA and Secondary NameNode are not truly applicable. You can ignore these errors.

Ambari upgrade warning and requirements.jpg

 

If the odp-version setting in OneFS was not previously set, this warning may appear. It can be ignored as well. It could also be corrected by setting odp-version on OneFS to the initial HDP version and then restarting the HDFS service in Ambari.


HDP version error Ambari 2-2.jpg


If there are any other required checks or warning messages, resolve those issues before continuing.

 

Back at the Upgrade Options, select the "Skip all Service Check failures" checkbox, then choose Rolling Upgrade and Proceed.

 

Skip Service Checks Ambari 2-2.jpg

 

 

There may be other actions necessary to complete HDP installation, but the OneFS and HDFS aspects will be complete.

 

Express Upgrade

 

No special instructions are needed for HDP upgrade using Express Upgrade with OneFS in the cluster. You may encounter the warning about the HDFS components not having version stamps, but that can be ignored.

 

Work through the HDP upgrade wizard and a satisfying "Upgrade finished" dialog is your reward!

 

Finalize upgrade Ambari 2-2.jpg.png

If Cloudera 5.7 was installed in a process similar to the sequence described in: https://community.emc.com/community/products/isilon/blog/2016/07/07/cloudera-and-isilon-implementation-part-2

 

Then the Impala service may not start or stay running correctly; a few additional configuration changes will need to be made to get this service running. Also take a look at the Cloudera doc: Using Impala with Isilon Storage

 

 

1.png

 

On further review, the Impala Daemon will not stay started correctly.

2.png

 

The process will continue to exit post start.

3.png

 

 

On reviewing the Impalad log we see the following info and errors.

 

-Short-circuit reads are not enabled

-block tracking is not properly enabled because of dfs.client.file-block-storage-locations.timeout.millis timeout

 

 

5.png

 

 

In order to correct these, from Cloudera Manager search for: dfs_client_read_shortcircuit

 

And select the Isilon : Enable HDFS Short-Circuit Reads

 

6.png

 

1. Enable HDFS Short-Circuit Read on the Gateway Default Group

 

7.png

 

2. In the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml properties for the Isilon service, add the dfs.client.file-block-storage-locations.timeout.millis property with a value of 10000

 

hdfs-site.xml

8.png

 

9.png

 

 

core-site.xml

10.png

 

3. Also, in the Isilon Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property for the Isilon service, set the value of the hadoop.security.token.service.use_ip property to FALSE.

11.png

 

Save and return to the Cloudera Manager Dashboard. Since configuration changes have been made, we need to redeploy the client configuration and restart services.

 

12.png

 

Select Deploy Client Configuration,

13.png

 

 

Deploy,

14.png

 

 

Monitor the deploy

15.png

 

 

We now need to restart the services affected by the configuration redeployment.

 

16.png

 

Restart all the services,

17.png

 

 

Restart,

18.png

 

 

Monitor the restarts,

19.png

 

 

 

You may still see alarms on services based on prior alert triggers

20.png

 

The alarms can be viewed for their status,

21.png

 

 

The alarms will time out and the status of Impala will return to healthy green.

 

22.png

 

The Impala service is now started and operational.
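
If you want to confirm that the new timeout actually reached the clients and that Impala answers queries, a hedged pair of checks from a cluster host (the impalad hostname here is hypothetical):

# hdfs getconf -confKey dfs.client.file-block-storage-locations.timeout.millis

# impala-shell -i impalad-host.foo.com -q 'show databases;'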

 

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub

This blog post continues where Cloudera and Isilon Implementation Part 1 leaves off:

 

 

In order to deploy a Hadoop cluster with Isilon, we need to select the Custom Services install; this allows us to select the required components.

 

z.png

 

Initially no services are selected,

aa.png

 

Select the Hadoop services you wish to deploy;

 

DO NOT SELECT HDFS; since we are deploying Isilon as the storage and effective NameNode, we don't need the Cloudera HDFS service.

 

MapReduce is also not recommended, as MapReduce2 is included in YARN.

But if your application is legacy and not written for YARN, MRv1 can be enabled and used.

bb.png

 

Having selected the services, continue and assign roles. Since this is a single-host deployment, all roles are deployed on the same host. Consult the Cloudera documentation for best practices regarding role assignments.

 

cc.pngdd.pngee.png

 

Leave the Isilon role as default; no need to change it.

ff.png

 

Continue,

 

gg.png

 

hh.png

 

Again, this is a test host, so all defaults are selected; consult the Cloudera documentation for best practices regarding databases.

Test the Database Connections,

ii.png

 

Continue,

 

On the Cluster Setup page, we need to point the following two parameters at the Isilon SmartConnect Zone; note the ports are different.

 

default_fs_name        hdfs://smartconnectzonename:8020

webhdfs_url                http://smartconnectzonename:8082/webhdfs/v1
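
It can be worth confirming from one of the Hadoop hosts that both Isilon endpoints answer before entering them; a hedged check using the WebHDFS REST API (LISTSTATUS) and a TCP probe of the NameNode RPC port, with the example SmartConnect Zone name standing in for your own:

# curl -i "http://smartconnectzonename:8082/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"

# nc -zv smartconnectzonename 8020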

 

jj.pngkk.png

ll.png

mm.png

nn.png

 

Assign the two Isilon parameters

oo.png

 

All other settings can be left as default currently,

 

Continue,

 

Cluster Setup will start,

pp.png

 

Review the setup as it runs,

 

qq.png

 

rr.png

Setup will complete; additional details can be seen by opening the specific services.

 

The setup will complete,

ss.png

 

Continue, the Hadoop cluster deployment has finished.

tt.png

uu.png

 

Finish and return to the main Cloudera Manager dashboard, review the status

 

 

 

It is not uncommon to see alarms and service-down issues on the dashboard. Review the alarms and services and triage as needed. Some services may just need restarting; follow standard protocols when starting these services:

-Start the service

-Monitor and review logs as needed

-Review the Isilon /var/log/hdfs.log (remember Isilon is a clustered system, so the logs on all nodes need reviewing); a hedged example follows below
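
On the Isilon side, a hedged way to review the HDFS log across every node at once is the OneFS isi_for_array utility, run as root from any node (adjust the tail length to taste):

# isi_for_array -s 'tail -n 50 /var/log/hdfs.log'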

 

123.png

 

 

It is also common to see configuration issues; address them and make the required changes as needed to resolve each one.

124.png

 

Restart services to resolve Alarms following configuration changes.

 

234.png

 

 

The Hadoop cluster and Services are now fully operational and can be tested.

 

vv.png

 

Basic functionality can be tested.

 

Browse the Isilon HDFS root:

hadoop fs -ls /

ww.png

 

Write to the Isilon HDFS root:

xx.png,

 

Run some basic smoketest jobs, such as pi or teragen/terasort/teravalidate, to test MapReduce; a hedged sketch follows below.
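
A short sketch of those jobs, assuming the standard CDH parcel location for the examples jar (adjust the path and output directories to your environment):

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 4 100

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000 /tmp/teragen_out

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /tmp/teragen_out /tmp/terasort_out

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate /tmp/terasort_out /tmp/teravalidate_out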

 

xxx.png

 

 

 

With Cloudera 5.7, you may notice that the Impala service has not started fully; some additional configuration changes are needed to get this service started.

 

345.png

 

 

 

The steps needed to get Impala running can be found here  ---- >  Get Cloudera 5.7 Impala starting with Isilon

 

 

Hopefully this doc gives a high-level overview of getting Cloudera CDH up and running against Isilon OneFS.

 

 

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub

The following post continues a series of high-level overview posts on Isilon and Hadoop implementations. It provides the core tasks needed to complete the setup and get a basic operational Hadoop cluster running with Isilon; additional topics will be covered later or in upcoming documents. Since the steps in this process are long, I'll break this post up into two parts.

 

 

 

 

This procedure is based on the following:

Isilon OneFS: 8.0.0.1

CDH 5 parcel: 5.7.1-1.cdh5.7.1.p0.11

 

 

OneFS 8.0.0.1 contains a number of updates to facilitate the integration and deployment of hadoop against OneFS; it is highly recommended to use this version. The procedure may require additional steps prior to 8.0.0.1 that are not documented in this post.

 

Before installing any Hadoop cluster, the OneFS supportability matrix should be consulted for compatibility: https://community.emc.com/docs/DOC-37101

 

 

This blog assumes the following Isilon Hadoop environment is configured and operational:

 

-Isilon is licensed for HDFS

-A dedicated Isilon Access Zone is in use (not the system zone).

-Isilon HDFS root directory in the Access Zone exists

-The Isilon SmartConnect Zone configuration is implemented per best practice for Isilon HDFS access.

-The Isilon HDFS configuration is correctly configured.

-A simple access model exists between Hadoop and Isilon; user UID & GID parity will exist.

 

The best approach to achieving parity is beyond the scope of this post and will be addressed in upcoming posts.

 

 

 

Assuming the Isilon is set up and configured for integration with Cloudera, we can begin the deployment of Cloudera Manager.

 

This post does not address the setup, configuration, and deployment of the Linux hosts used to run the Hadoop services. The Cloudera documentation should be consulted to set up and prepare the hosts correctly: Overview of Cloudera and the Cloudera Documentation Set. The post also does not address advanced Cloudera installs; the focus is to highlight the Isilon integration into the installer and how to complete the install.

 

A good overview of the procedure can be found here: Installation Path A - Automated Installation by Cloudera Manager (Non-Production Mode). This post begins with the download of the bits and installation of Cloudera Manager.

 

 

# wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

# chmod u+x cloudera-manager-installer.bin

# ./cloudera-manager-installer.bin

 

0.png

 

On running the installer, you'll get the following:

1.png

Next,

2.png

Next, Accept the Cloudera License

3.png

Yes,

4.png

Next, we will let Cloudera Manager install the JDK

5.png

Yes, accept the Oracle JDK license,

6.png

 

7.png

OK, note the URL and the user/pass for the Cloudera Manager WebUI

8.png

OK,

 

You can validate that the Cloudera Manager service is running; if you see problems, tail the cloudera-scm-server.log as you start the service.

 

# service cloudera-scm-server status

# cloudera-scm-server (pid  10487) is running...

# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

 

 

Log in to the Cloudera Manager WebUI;  user:admin, password:admin

a.png

 

Continue,

 

b.png

 

Select the Yes check box to accept the EULA and Continue,

 

c.png

 

Select the version you wish to deploy,

 

d.png

 

Continue,

 

e.png

 

Continue,

 

 

In this post we will deploy to just a single Linux host, but the process is the same when multiple hosts are used in the Hadoop cluster.

 

f.png

 

Add the FQDN of the Linux hosts to be deployed, Search,

 

g.png

 

On completion of the search, select the host(s) to deploy to,

 

h.png

 

Select to use Parcels

Select the CDH Stack you wish to deploy

 

i.png

 

Select the additional Parcels and Agent configuration as needed, and as supported by Isilon,

 

j.png

 

Continue,

k.png

 

Select to install the JDK,

Select to install the JUSEP (Java Unlimited Strength Encryption Policy) files if you intend to secure this cluster later, Continue,

 

l.png

 

We will not deploy in Single User Mode,  Continue

 

m.png

 

Provide the SSH credentials, either root password or SSH keys depending on how you set your Linux hosts up and wish to manage them, Continue

 

The installation will begin

 

n.png

 

Details,

o.png

 

Installation completes and the installer continues,

 

p.png

 

Parcels being downloaded,

 

q.png

 

Parcels being distributed

 

r.png

 

Parcels unpacked and activated

 

s.png

 

The host inspector will then validate hosts, versions and additional software installed

 

t.png

 

 

The installer will check and validate the hosts; if any deviations are seen, recommendations are presented to optimize the hosts. If the validation check fails, it is suggested to follow the recommendations and then re-run the validation.

 

Common errors are seen with:

- transparent_hugepage

- swappiness

 

Make the recommended changes to the hosts and Run Again; a hedged example of the changes follows below.
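
A hedged example of the usual remediation on RHEL/CentOS (these values reflect Cloudera's general guidance; they are not persistent across reboots, so also add them to rc.local/sysctl.conf as the inspector suggests):

# echo never > /sys/kernel/mm/transparent_hugepage/enabled

# echo never > /sys/kernel/mm/transparent_hugepage/defrag

# sysctl -w vm.swappiness=1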

 

u.pngv.pngw.pngx.pngy.png

 

Finish,

 

 

This completes part 1 of the install; deploying Hadoop services with Cloudera Manager is continued in Part 2.

 

Cloudera and Isilon Implementation Part 2  ---->

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub

The following post represents a high-level overview of the procedure to Kerberize an Ambari HDP cluster with Isilon against an Active Directory. It provides the core tasks needed to complete the setup; additional topics will be covered later or in upcoming documents.

 

This procedure is based on the following:

  • Isilon 8.0.0.1
  • Ambari 2.2.1.0
  • HDP Stack 2.4.2.0-258

 

OneFS 8.0.0.1 contains a number of updates to facilitate the integration and deployment of kerberos against OneFS; it is highly recommended to use this version. The procedure requires additional steps prior to 8.0.0.1 that are not documented in this post.

 

 

This blog assumes the following Isilon Hadoop environment is configured and operational.

 

  • A dedicated Isilon Access Zone is in use (not the system zone).
  • The Isilon SmartConnect Zone configuration is implemented per best practice for Isilon HDFS access.
  • The Isilon HDFS configuration is correctly configured.
  • Ambari is configured correctly for Isilon integration.
  • Ambari will manage and deploy keytab and krb5.conf files
  • A simple access model currently exists between Hadoop and Isilon; user UID & GID are correctly implemented and allow HDFS access to the Isilon HDFS root with UID & GID parity. Isilon and Hadoop Local User UID Parity
  • Hadoop jobs and services are fully operational.

 

If your environment deviates from any of these configurations, an alternative approach to Kerberization may be required, especially with regard to the management of keytabs and krb5.conf files. This procedure does not address all configurations or requirements; engage additional EMC services when required.

 

This post also does not address Linux host kerberization, Directory Service integration, or the Isilon permissioning model for multiprotocol access following kerberization. I hope to address these at a later date.

 

 

As we are integrating an existing Ambari HDP cluster with an Isilon into a pre-existing Microsoft Active Directory environment, the high-level approach is:

 

  • Prepare and configure the Active Directory for Isilon – Hadoop integration
  • Prepare the Ambari HDP cluster and hosts for kerberization
  • Integrate the Isilon cluster into Active Directory
  • Kerberize the HDP cluster via the Ambari wizard
  • Complete the integration of Isilon and HDP
  • Test and validate

 

 

This post is based on using a preexisting Active Directory environment for Kerberos User Authentication.

To use an existing Active Directory domain for the cluster with the Ambari wizard Kerberos Setup, you must prepare the following:


  • Isilon, Ambari Server and compute cluster hosts have all the required network access to Active Directory and Active Directory services
  • All DNS name resolution of required Active Directory Services is valid
  • Active Directory secure LDAP (LDAPS) connectivity has been configured.
  • Active Directory OU User container for principals has been created, For example "OU=Hadoop-Cluster,OU=People,dc=domain,dc=com"
  • Active Directory administrative credentials with delegated control of “Create, delete, and manage user accounts” on the OU User container are implemented.

For additional information, see the Hortonworks security documents.

 

 

How Kerberos is implemented here:

Since the Isilon-integrated Hadoop cluster is a mix of Linux hosts running the compute services and Isilon running the data services, Ambari cannot complete the Kerberization end-to-end. Because Isilon runs the clustered operating system ‘OneFS’, the Ambari agent cannot configure and manage the kerberization of Isilon completely, nor does it need to. It can, however, completely deploy and configure the Linux hosts.

 

Because of this, the kerberization of the Isilon-integrated Hadoop cluster should be considered in the following context:

  1. Isilon is Kerberized
  2. Ambari Kerberization wizard runs and deploys kerberization to Linux and Hadoop

Since both sets of systems are then fully Kerberized within the same KDC realm, Kerberized user access between Isilon and the Hadoop cluster occurs seamlessly.

 

 

Ambari Pre-Configuration

  • Review that Ambari 2.0 or higher is running.
  • Forward and reverse DNS between all hosts is tested and validated. Test this with dig or ping.
  • All services are running (green) on the Ambari Dashboard.
  • All other Ambari specific Kerberos requirements have been met; NTP, DNS, packages etc.

1.png

 

 

Before launching the Ambari Kerberization wizard, you must make two configuration customizations and restart all services.

1. In HDFS -> Custom core-site, set "hadoop.security.token.service.use_ip" to "false" in core-site.xml.

This key may need creating:

2.png

 

Key after addition:

3.png

 

2. In MapReduce2 -> Advanced mapred-site add "`hadoop classpath`:" to the beginning of "mapreduce.application.classpath". Note the colon and backticks (but do not copy the quotation marks).

Locate the mapreduce.application.classpath key and add `hadoop classpath`:, save and restart the service.

 

4.png
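
The backticked command is expanded by the shell when containers launch, so the classpath comes from each host's local Hadoop client; if you are curious what it resolves to, you can run the same command on any cluster host:

$ hadoop classpath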

 

 

Isilon OneFS Configuration

This section covers the configuration required for OneFS to respond to requests for secure Kerberized HDFS authenticated by Active Directory.

  • The cluster must be joined correctly to the target Active Directory.
  • The Access Zone the HDFS root lives under is configured for this Active Directory provider.
  • All IP addresses within the required SmartConnect Zone must be added to the reverse DNS with the same FQDN for the cluster delegation. All IP's should resolve back to the SmartConnect Zone. This is required for kerberos.

 

Isilon SPN's

Because OneFS is a clustered file system running on multiple nodes but joined to Active Directory as a single Computer Object, the SPN requirements for Kerberized hadoop access are unique: in addition to the cluster SPN, additional SPN’s are required for the Access Zone through which HDFS NameNode access is made:

6.png

 

Review the registered SPN’s on the Isilon cluster and add the required SPN’s for the SmartConnect Zone name if needed.

#isi auth ads spn list --provider-name=<AD PROVIDER NAME>

 

The following example illustrates the required SPN’s:

Isilon Cluster Name - rip2.foo.com – SPN: hdfs/rip2.foo.com

Access Zone NN SmartConnect FQDN - hdfs/rip2-horton1.foo.com & HTTP/rip2-horton1.foo.com


7.png

 

For additional information on adding or modifying Isilon SPN’s in Active Directory see the Isilon CLI Administrative Guide.

 

 

 

Isilon Hadoop (HDFS) Changes

The following configuration changes are required on the HDFS Access Zone.

 

1. Disable simple authentication. This enforces Kerberos or delegation-token authentication only.


# isi hdfs settings modify --authentication-mode=kerberos_only --zone=rip2-horton1


8.png


2. Create the required Proxy Users


Proxy users are required so that specific hadoop service accounts can impersonate users when executing jobs; add the required proxy users as needed. More on proxy users in a later post; also review the Isilon CLI Administrative Guide.



3. Increase the hdfs log level

#isi hdfs log-level modify --set=verbose

9.png



This completes the Isilon hadoop Active Directory setup:

- Isilon joined to Active Directory, Provider Online

- HDFS Access Zone has Active Directory provider added

- SPN's are correctly configured

- HDFS Service configured for kerberos_only

- DNS configuration is valid



Kerberize Ambari Wizard

The following outlines the steps to run the Ambari Kerberization wizard and any customization required to allow the wizard to integrate with Isilon upon completion. It is suggested to stop all user activity on the Hadoop cluster prior to executing a Kerberization task.


1. Enable Kerberization

From the Ambari WebUI, Select Admin and Kerberos

 

   a.png


Then ‘Enable Kerberos’


b.png

 

Proceed at the warning.

c.png

 



2. Getting Started

At the ‘Get Started’ screen, select ‘Existing Active Directory’ as the type of KDC you plan to use. In order to proceed, select all the check boxes to agree that you have met and completed all the prerequisites. This document does not include direction on setting up and completing these requirements; for additional information on meeting these prerequisites, it is suggested that the Hortonworks Security Guide is consulted for Ambari, and Microsoft documentation is consulted for Active Directory information and configuration guidance.

Once you have met and selected the checkboxes for all the prerequisites, the wizard can continue.

 

d.png

 

The Ambari Kerberos Wizard will request information related to the KDC, LDAP URL, REALM, Active Directory OU, and delegated Ambari user account, as shown below. The account will be used to bind to Active Directory and create all the Ambari-required principals in Active Directory.

 

 

3. Configure Kerberos

Enter the required information about the KDC and Test the KDC Connection.

    • KDC Host – An Active Directory Domain Controller
    • Realm Name – The name of the Kerberos realm you are joining
    • LDAP URL – The LDAP URL of the Directory Domain Controller; adding port 636 allows secure ldap.
    • Container DN – The OU that delegated access was granted on
    • Domains – (optional) A comma separated list of domain names to map server host names to realm names

 

e.png

 

    • Kadmin host – An Active Directory Domain Controller

    • Admin Principal – The Active Directory User account with delegated rights

    • Admin password – Password for the Admin Principal


f.png


The Advanced Kerberos-env setting should be reviewed, but no changes are required. As of OneFS 8.0.0.0 aes-256 encryption is supported.

 

g.png

h.png

 

The Advanced krb5-conf setting should be reviewed, but no changes are required.

i.png

 

Once all configuration has been addressed, proceed with the wizard; Next.

k.png

 

 

The Wizard will deploy and configure the Kerberos Clients to all hosts.

 

l.png



Even though the Kerberos client and configuration are not actually being pushed to Isilon at this time, the step will appear to succeed and will report success.


m.png



Ensure the successful deployment and test of the Kerberos Clients.

n.png

 

Click, Next to continue.

 

 

 

4. Ambari Principals customization for Isilon Integration

A number of changes need to be made to the principals that will be used and created by the Kerberization wizard:

Ambari creates user principals in the form ${username}-${clustername}@${realm}, then uses hadoop.security.auth_to_local in core-site.xml to map the principals into just ${username} on the file system.

 

Isilon does not honor the mapping rules, so you must remove the -${clustername} from all principals in the "Ambari Principals" section. Isilon will strip off the @${realm}, so no aliasing is necessary. In my Ambari cluster running HDFS, YARN, MapReduce2, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics and Spark,

 

I make the following modifications in the "General" tab:

 

  • Smokeuser Principal Name: ${cluster-env/smokeuser}-${cluster_name}@${realm} => ${cluster-env/smokeuser}@${realm}
  • Spark.history.kerberos.principal: ${spark-env/spark_user}-${cluster_name}@${realm} => ${spark-env/spark_user}@${realm}
  • HBase user principal: ${hbase-env/hbase_user}-${cluster_name}@${realm} => ${hbase-env/hbase_user}@${realm}
  • HDFS user principal: ${hadoop-env/hdfs_user}-${cluster_name}@${realm} => ${hadoop-env/hdfs_user}@${realm}


Additional Principals will require updating if these services are running.

  • Storm principal name: ${storm-env/storm_user}-${cluster_name}@${realm} => ${storm-env/storm_user}@${realm}
  • accumulo_principal_name: ${accumulo-env/accumulo_user}-${cluster_name}@${realm} => ${accumulo-env/accumulo_user}@${realm}
  • trace.user: tracer-${cluster_name}@${realm} => tracer@${realm}


o.png

(you can see the modified principals with the orange reset arrow)

 

 

5. Ambari Created User Principals

Ambari creates user principals, some of which are different from their UNIX usernames. Again, since Isilon does not honor the mapping rules, you must modify the principal names to match their UNIX usernames. Make the following modifications to the principal and the keytab name in the "Advanced" tab:

 

  • HDFS > dfs.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}
  • HDFS > dfs.namenode.keytab.file = ${keytab_dir}/nn.service.keytab  => ${keytab_dir}/hdfs.service.keytab

 

  • HDFS > dfs.secondary.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}
  • HDFS > dfs.secondary.namenode.keytab.file = ${keytab_dir}/nn.service.keytab  => ${keytab_dir}/hdfs.service.keytab

 

  • HDFS > dfs.datanode.kerberos.principal = dn/_HOST@${realm}  => hdfs/_HOST@${realm}
  • HDFS > dfs.datanode.keytab.file = ${keytab_dir}/dn.service.keytab => ${keytab_dir}/hdfs.service.keytab

 

  • MapReduce2 > mapreduce.jobhistory.principal = jhs/_HOST@${realm} => mapred/_HOST@${realm}
  • MapReduce2 > mapreduce.jobhistory.keytab = ${keytab_dir}/jhs.service.keytab => ${keytab_dir}/mapred.service.keytab

 

  • YARN > yarn.nodemanager.principal = nm/_HOST@${realm} =>  yarn/_HOST@${realm}
  • YARN > yarn.nodemanager.keytab = ${keytab_dir}/nm.service.keytab => ${keytab_dir}/yarn.service.keytab

 

  • YARN > yarn.resourcemanager.principal = rm/_HOST@${realm} => yarn/_HOST@${realm}
  • YARN > yarn.resourcemanager.keytab = ${keytab_dir}/rm.service.keytab => ${keytab_dir}/yarn.service.keytab

 

  • Falcon > *.dfs.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}

 

The changes to the HDFS and MapReduce2 principals are illustrated below.

p.png

 

q.png

 

 

After configuring the appropriate principals, press "Next". At the "Confirm Configuration" screen, press Next.

 

r.png

 

 

 

6. Confirm Configuration

Review the configuration and proceed with Next. Exiting the wizard here will remove all configuration and customizations, and they will need re-entering.

 

s.png

 

Download the CSV and review it; the CSV file contains all the principals and keytabs that Ambari will create in Active Directory. The list contains principals and keytabs for Isilon, but these keytabs will not be distributed to the Isilon cluster; Isilon kerberization has already occurred and is implemented through joining Active Directory.

 

t.png

 

 

7. Stop Services

This will stop all the Hadoop services in Ambari; all user activity will stop.

 

u.png

Services on all hosts will stop.

 

v.png

 

 

On successful stopping of all services, proceed Next.

 

w.png

 

8. Kerberize Cluster

The Kerberization wizard will begin execution of the Kerberization of the Ambari Services, create principals in Active Directory and distribute keytabs.

x.png

 

 

Following the creation of principals, you can view all the Active Directory principals in the Hadoop OU.

 

y.png

 

 

Note: The UPN and sAMAccountName differ in Active Directory; this does not present any problems in simple installation. Complex custom installs may require additional configuration to enable Isilon multi-protocol functionality to operate correctly. More on this in later posts.

 

z.png

 

 

Kerberization of the cluster completes successfully!

 

1a.png

 

Since Ambari created principals in AD for the Isilon cluster during the deployment of Kerberos that are not required, these need removing from Active Directory.

Remove the following users from the Hadoop OU:

hdfs/<isilon-clustername>

HTTP/<isilon-clustername>

 

 

2a.png


Remove the user AD principals auto-created by the Ambari Kerberization wizard for the Isilon cluster;

 

Following removal of the users.

 

aa.png

 

 

 

9. Start and Test Services

 

The wizard will now attempt to start all the Kerberized Hadoop services on Ambari

 

5a.png

 

If some services fail to start, they can always be restarted; it is common to see some failures. Review the startup logs of the service and monitor the Isilon /var/log/hdfs.log while services are starting to see what is happening.

 

6a.png

 

If some services do fail, move on and troubleshoot each service independently.

 

 

On completion of the Kerberos wizard the configuration can be seen in Ambari.

 

7a.png

 

A few services need restarting,

 

8a.png

 

On restarting these services, all HDFS services are running and the cluster is green.

 

9a.png

 

This completes the Kerberos deployment of the Hadoop services; Ambari has Kerberized the Hadoop cluster and Isilon is a valid Active Directory provider. We can now test and validate that Kerberos authentication is operational against the Isilon HDFS data.

 

 

 

Test and Validation Hadoop Services

In order to validate the newly Kerberized cluster, a few simple tests should be run.

 

1. No kerberos Ticket Test

Since the cluster is now Kerberized and Isilon is enforcing kerberos_only access to the HDFS root, any simple hadoop commands will fail if you do not have a valid kerberos ticket. This is a good test to validate that simple authentication is no longer permitted.

 

1b.png

2b.png

 

 

2. Valid Kerberos Ticket Test

Get a kerberos ticket for your test user using a kinit command:  $kinit <ad user name>

3b.png

 

Execute a simple HDFS directory listing:  $hadoop fs -ls /

 

4b.png

 

 

3. Execute a simple file system write

Create a simple file on the Isilon Hadoop root: $hadoop fs -touchz /user/hdpuser3/This_file_testing_Kerberos.txt

 

5b.png

 

4. Run a simple yarn job without a valid Kerberos ticket; you will see lots of kerberos errors.

 

 

7b.png

 

5. Run a simple yarn job that accesses the file system; here's a simple teragen (a hedged example command is shown below). In the output you'll see the delegation token used to execute the kerberized job.
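
A hedged example of such a job, assuming the standard HDP examples jar location and an output directory under the test user's home (adjust both to your environment):

$ hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 10000 /user/hdpuser3/teragen_out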

 

9b.png10b.png11b.png

 

If you see issues with running Kerberized jobs, you can increase the kerberos logging to show you a lot more details:

Having Kerberos Authentication Issue, DEBUG it
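
A hedged example of turning up client-side Kerberos debugging for a single command (these are standard Hadoop/JVM knobs, not Isilon-specific):

$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true"

$ HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /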

 

 

 

This about wraps it up. Clearly this is a very large topic, and this post's goal was to provide a high-level overview of the considerations and procedure for Kerberizing an Ambari HDP cluster against an Isilon.

 

 

 
