This post is a high-level overview of the procedure to Kerberize an Ambari HDP cluster with Isilon against Active Directory. It covers the core tasks needed to complete the setup; additional topics will be covered later or in upcoming documents.

 

This procedure is based on the following:

  • Isilon 8.0.0.1
  • Ambari 2.2.1.0
  • HDP Stack 2.4.2.0-258

 

OneFS 8.0.0.1 contains a number of updates that facilitate the integration and deployment of Kerberos against OneFS, so using this version is highly recommended. Releases prior to 8.0.0.1 require additional steps that are not documented in this post.

 

 

This post assumes the following Isilon Hadoop environment is configured and operational:

 

  • A dedicated Isilon Access Zone is in use (not the system zone).
  • The Isilon SmartConnect Zone configuration is implemented per best practice for Isilon HDFS access.
  • The Isilon HDFS configuration is correctly configured.
  • Ambari is configured correctly for Isilon integration.
  • Ambari will manage and deploy keytab and krb5.conf files.
  • A simple access model currently exists between Hadoop and Isilon; user UID & GID are correctly implemented and allow HDFS access to the Isilon HDFS root with UID & GID parity (see the post "Isilon and Hadoop Local User UID Parity").
  • Hadoop jobs and services are fully operational.

 

If your environment deviates from any of these configurations, an alternative approach to Kerberization may be required, especially with regard to management of keytabs and krb5.conf files. This procedure will not address all configurations or requirements. Additional EMC services should be engaged when required.

 

This post also does not address Linux host Kerberization, Directory Service integration, or the Isilon permissioning model for multiprotocol access following Kerberization. I hope to address these at a later date.

 

 

As we are integrating an existing Ambari HDP cluster with Isilon into a pre-existing Microsoft Active Directory environment, the high-level approach is:

 

  • Prepare and configure the Active Directory for Isilon – Hadoop integration
  • Prepare the Ambari HDP cluster and hosts for kerberization
  • Integrate the Isilon cluster into Active Directory
  • Kerberize the HDP cluster via the Ambari wizard
  • Complete the integration of Isilon and HDP
  • Test and validate

 

 

This post is based on using a preexisting Active Directory environment for Kerberos User Authentication.

To use an existing Active Directory domain for the cluster with the Ambari wizard Kerberos Setup, you must prepare the following:


  • Isilon, Ambari Server and compute cluster hosts have all the required network access to Active Directory and Active Directory services.
  • All DNS name resolution of required Active Directory services is valid.
  • Active Directory secure LDAP (LDAPS) connectivity has been configured.
  • An Active Directory OU user container for principals has been created, for example "OU=Hadoop-Cluster,OU=People,dc=domain,dc=com".
  • Active Directory administrative credentials with delegated control of “Create, delete, and manage user accounts” on the OU user container are implemented.
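
The DNS and LDAPS prerequisites above can be spot-checked from the Ambari Server host before starting the wizard. The following is a minimal sketch, not part of the original procedure: it assumes `dig` and the OpenLDAP `ldapsearch` client are installed, and the function names, the domain controller name and the bind account in the usage note are placeholders made up for illustration.

```shell
# kdc_srv_records DOMAIN
# List the Kerberos SRV records that Active Directory publishes in DNS.
kdc_srv_records() {
    dig +short SRV "_kerberos._tcp.$1"
}

# ldaps_check DC_HOST BIND_UPN OU_DN
# Bind over LDAPS (port 636) as the delegated account and read back the
# DN of the OU container. Prompts for the account password (-W).
ldaps_check() {
    ldaps_dc=$1
    ldaps_bind=$2
    ldaps_ou=$3
    ldapsearch -H "ldaps://${ldaps_dc}:636" -D "$ldaps_bind" -W \
        -b "$ldaps_ou" -s base -LLL dn
}
```

For example, `kdc_srv_records domain.com` followed by `ldaps_check dc1.domain.com ambari-admin@DOMAIN.COM "OU=Hadoop-Cluster,OU=People,dc=domain,dc=com"`. If the bind fails with a certificate error, the domain controller's CA certificate probably still needs to be trusted on the host.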

For additional information, see the Hortonworks security documents.

 

 

How Kerberos is implemented here:

Because the Isilon-integrated Hadoop cluster is a mix of Linux hosts running the compute services and Isilon running the data services, Ambari cannot complete the Kerberization end-to-end. Isilon runs a clustered operating system, OneFS, so the Ambari agent cannot fully configure and manage the Kerberization of Isilon, nor does it need to. It can, however, fully deploy and configure the Linux hosts.

 

Because of this, the Kerberization of the Isilon-integrated Hadoop cluster should be considered in the following context:

  1. Isilon is Kerberized
  2. Ambari Kerberization wizard runs and deploys kerberization to Linux and Hadoop

Since both sets of systems are now fully Kerberized within the same KDC realm, Kerberized user access can occur between the Isilon and Hadoop cluster seamlessly.

 

 

Ambari Pre-Configuration

  • Verify that Ambari 2.0 or higher is running.
  • Forward and reverse DNS between all hosts is tested and validated. Test this with dig or ping.
  • All services are running (green) on the Ambari Dashboard.
  • All other Ambari specific Kerberos requirements have been met; NTP, DNS, packages etc.
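
The forward and reverse DNS check mentioned above can be scripted with `dig`. This is a minimal sketch under a few assumptions: the function name `check_dns` is made up for illustration, each host has an IPv4 A record, and the PTR record is expected to return exactly the FQDN you passed in.

```shell
# check_dns HOST
# Succeeds only if HOST resolves to an IPv4 address and that address
# resolves back to HOST.
check_dns() {
    cd_host=$1
    # Keep the first IPv4 address; this skips any CNAME lines dig prints.
    cd_ip=$(dig +short "$cd_host" | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)
    if [ -z "$cd_ip" ]; then
        echo "FAIL forward: $cd_host"
        return 1
    fi
    # dig -x prints the PTR target with a trailing dot; strip it.
    cd_ptr=$(dig +short -x "$cd_ip" | head -n 1 | sed 's/\.$//')
    if [ "$cd_ptr" != "$cd_host" ]; then
        echo "FAIL reverse: $cd_host -> $cd_ip -> ${cd_ptr:-none}"
        return 1
    fi
    echo "OK $cd_host <-> $cd_ip"
}
```

Run it in a loop over every Ambari host and the SmartConnect Zone name, for example `for h in host1.foo.com host2.foo.com; do check_dns "$h"; done` (host names here are placeholders).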

1.png

 

 

Before launching the Ambari Kerberization wizard, you must make two configuration customizations and restart all services.

1. In HDFS -> Custom core-site, set "hadoop.security.token.service.use_ip" to "false".

This key may need to be created:

2.png

 

Key after addition:

3.png

 

2. In MapReduce2 -> Advanced mapred-site add "`hadoop classpath`:" to the beginning of "mapreduce.application.classpath". Note the colon and backticks (but do not copy the quotation marks).

Locate the mapreduce.application.classpath key, add `hadoop classpath`: to the beginning of its value, then save and restart the service.

 

4.png
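
After the restart, one way to confirm that the first customization reached a compute host is to read the effective client configuration with `hdfs getconf`. This is a minimal sketch, assuming the `hdfs` client is on the PATH of the host you run it from; the function name `check_use_ip` is made up for illustration.

```shell
# check_use_ip
# Succeeds only if the effective value of
# hadoop.security.token.service.use_ip on this host is "false".
check_use_ip() {
    cu_val=$(hdfs getconf -confKey hadoop.security.token.service.use_ip 2>/dev/null)
    if [ "$cu_val" = "false" ]; then
        echo "OK: hadoop.security.token.service.use_ip=false"
    else
        echo "NOT SET: hadoop.security.token.service.use_ip=${cu_val:-unknown}"
        return 1
    fi
}
```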

 

 

Isilon OneFS Configuration

This section covers the configuration required for OneFS to respond to requests for secure Kerberized HDFS authenticated by Active Directory.

  • The cluster must be joined correctly to the target Active Directory.
  • The Access Zone the HDFS root lives under is configured for this Active Directory provider.
  • All IP addresses within the required SmartConnect Zone must be added to reverse DNS with the same FQDN as the cluster delegation. All IPs should resolve back to the SmartConnect Zone name. This is required for Kerberos.

 

Isilon SPNs

OneFS is a clustered file system running on multiple nodes, but it is joined to Active Directory as a single Computer Object, so the SPN requirements for Kerberized Hadoop access are unique. The required SPNs for Hadoop access are as follows; note that additional SPNs are required for the Access Zone that HDFS NameNode access is made through:

6.png

 

Review the registered SPNs on the Isilon cluster and add the required SPNs for the SmartConnect Zone name if needed.

# isi auth ads spn list --provider-name=<AD PROVIDER NAME>

 

The following example illustrates the required SPNs:

Isilon Cluster Name - rip2.foo.com – SPN: hdfs/rip2.foo.com

Access Zone NN SmartConnect FQDN - hdfs/rip2-horton1.foo.com & HTTP/rip2-horton1.foo.com
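
To compare the registered SPNs against the ones this example calls for, the output of the list command can be piped through a small filter. This is a minimal sketch: the function names are made up for illustration, and because the exact column layout of the `isi` output is not assumed, the check is a simple case-insensitive substring match on each line.

```shell
# required_spns CLUSTER_FQDN SMARTCONNECT_FQDN
# Print the three SPNs described in the example above.
required_spns() {
    printf 'hdfs/%s\n' "$1"
    printf 'hdfs/%s\n' "$2"
    printf 'HTTP/%s\n' "$2"
}

# missing_spns CLUSTER_FQDN SMARTCONNECT_FQDN
# Reads a registered-SPN listing on stdin and prints every required SPN
# that does not appear anywhere in it (substring match, any case).
missing_spns() {
    ms_registered=$(cat)
    required_spns "$1" "$2" | while read -r ms_spn; do
        printf '%s\n' "$ms_registered" | grep -qiF -- "$ms_spn" \
            || printf '%s\n' "$ms_spn"
    done
}
```

On the Isilon cluster this could be used as `isi auth ads spn list --provider-name=<AD PROVIDER NAME> | missing_spns rip2.foo.com rip2-horton1.foo.com`; no output means all three SPNs are present.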


7.png

 

For additional information on adding or modifying Isilon SPNs in Active Directory, see the Isilon CLI Administrative Guide.

 

 

 

Isilon Hadoop (HDFS) Changes

The following configuration changes are required on the HDFS Access Zone.

 

1. Disable simple authentication. This enforces Kerberos or delegation token authentication only.


# isi hdfs settings modify --authentication-mode=kerberos_only --zone=rip2-horton1


8.png


2. Create the required Proxy Users


Proxy users are required for service account impersonation so that specific Hadoop services can execute jobs; add the required proxy users as needed. Proxy users will be covered in more detail in a later post; also review the Isilon CLI Administrative Guide.



3. Increase the HDFS log level

# isi hdfs log-level modify --set=verbose

9.png



This completes the Isilon hadoop Active Directory setup:

- Isilon joined to Active Directory, Provider Online

- HDFS Access Zone has Active Directory provider added

- SPNs are correctly configured

- HDFS Service configured for kerberos_only

- DNS configuration is valid



Kerberize Ambari Wizard

The following outlines the steps to run the Ambari Kerberization wizard and any customization required to allow the wizard to integrate with Isilon upon completion. It is suggested to stop all user activity on the Hadoop cluster prior to executing a Kerberization task.


1. Enable Kerberization

From the Ambari web UI, select Admin and then Kerberos.

 

   a.png


Then select ‘Enable Kerberos’.


b.png

 

Proceed past the warning.

c.png

 



2. Getting Started

At the ‘Get Started’ screen, select ‘Existing Active Directory’ as the type of KDC you plan to use. To proceed, select all the check boxes to confirm that you have met and completed all the prerequisites. This document does not cover setting up and completing these requirements; consult the Hortonworks Security Guide for Ambari, and the Microsoft documentation for Active Directory information and configuration guidance.

Once you have met and selected the checkboxes for all the prerequisites, the wizard can continue.

 

d.png

 

The Ambari Kerberos wizard will request the KDC, LDAP URL, realm, Active Directory OU and delegated Ambari user account, as shown below. The account will be used to bind to Active Directory and create all the principals Ambari requires in Active Directory.

 

 

3. Configure Kerberos

Enter the required information about the KDC and Test the KDC Connection.

    • KDC Host – An Active Directory Domain Controller
    • Realm Name – The name of the Kerberos realm you are joining
    • LDAP URL – The LDAP URL of the Active Directory Domain Controller; use the ldaps:// scheme with port 636 for secure LDAP (LDAPS).
    • Container DN – The OU that delegated access was granted on
    • Domains – (optional) A comma separated list of domain names to map server host names to realm names

 

e.png

 

    • Kadmin host – An Active Directory Domain Controller

    • Admin Principal – The Active Directory User account with delegated rights

    • Admin password – Password for the Admin Principal


f.png


The Advanced Kerberos-env setting should be reviewed, but no changes are required. As of OneFS 8.0.0.0 aes-256 encryption is supported.

 

g.png

h.png

 

The Advanced krb5-conf setting should be reviewed, but no changes are required.

i.png

 

Once all configuration has been addressed, click Next to proceed with the wizard.

k.png

 

 

The Wizard will deploy and configure the Kerberos Clients to all hosts.

 

l.png



Even though the Kerberos client and configuration are not actually pushed to Isilon at this time, the wizard will appear to do so and will report success.


m.png



Confirm that the Kerberos clients were deployed and tested successfully.

n.png

 

Click Next to continue.

 

 

 

4. Ambari Principals customization for Isilon Integration

A number of changes need to be made to the principals that will be used and created by the Kerberization wizard:

Ambari creates user principals in the form ${username}-${clustername}@${realm}, then uses hadoop.security.auth_to_local in core-site.xml to map the principals into just ${username} on the file system.

 

Isilon does not honor the mapping rules, so you must remove the -${cluster_name} from all principals in the "Ambari Principals" section. Isilon will strip off the @${realm}, so no aliasing is necessary.

In my Ambari cluster running HDFS, YARN, MapReduce2, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics and Spark, I make the following modifications in the "General" tab:

 

  • Smokeuser Principal Name: ${cluster-env/smokeuser}-${cluster_name}@${realm} => ${cluster-env/smokeuser}@${realm}
  • Spark.history.kerberos.principal: ${spark-env/spark_user}-${cluster_name}@${realm} => ${spark-env/spark_user}@${realm}
  • HBase user principal: ${hbase-env/hbase_user}-${cluster_name}@${realm} => ${hbase-env/hbase_user}@${realm}
  • HDFS user principal: ${hadoop-env/hdfs_user}-${cluster_name}@${realm} => ${hadoop-env/hdfs_user}@${realm}


Additional principals will require updating if these services are running:

  • Storm principal name: ${storm-env/storm_user}-${cluster_name}@${realm} => ${storm-env/storm_user}@${realm}
  • accumulo_principal_name: ${accumulo-env/accumulo_user}-${cluster_name}@${realm} => ${accumulo-env/accumulo_user}@${realm}
  • trace.user: tracer-${cluster_name}@${realm} => tracer@${realm}


o.png

(you can see the modified principals with the orange reset arrow)
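
Every edit in this step is the same transformation: drop the literal -${cluster_name} that sits immediately before the @. As a sanity check on what the edited fields should look like, it can be expressed as a one-line `sed` substitution. This is only an illustrative sketch (the function name is made up); the wizard fields themselves are still edited by hand in the Ambari UI.

```shell
# strip_cluster_suffix TEMPLATE
# Remove the literal "-${cluster_name}" that precedes the "@" in an
# Ambari principal template. Pass the template in single quotes so the
# shell does not try to expand ${...}.
strip_cluster_suffix() {
    printf '%s\n' "$1" | sed 's/-\${cluster_name}@/@/'
}

strip_cluster_suffix '${cluster-env/smokeuser}-${cluster_name}@${realm}'
strip_cluster_suffix '${hadoop-env/hdfs_user}-${cluster_name}@${realm}'
```

The two calls print ${cluster-env/smokeuser}@${realm} and ${hadoop-env/hdfs_user}@${realm}, with no hyphen left before the @.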

 

 

5. Ambari Created User Principals

Ambari creates user principals, some of which differ from their UNIX usernames. Again, since Isilon does not honor the mapping rules, you must modify the principal names to match their UNIX usernames. Make the following modifications to the principal and keytab names in the "Advanced" tab:

 

  • HDFS > dfs.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}
  • HDFS > dfs.namenode.keytab.file = ${keytab_dir}/nn.service.keytab  => ${keytab_dir}/hdfs.service.keytab

 

  • HDFS > dfs.secondary.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}
  • HDFS > dfs.secondary.namenode.keytab.file = ${keytab_dir}/nn.service.keytab  => ${keytab_dir}/hdfs.service.keytab

 

  • HDFS > dfs.datanode.kerberos.principal = dn/_HOST@${realm}  => hdfs/_HOST@${realm}
  • HDFS > dfs.datanode.keytab.file = ${keytab_dir}/dn.service.keytab => ${keytab_dir}/hdfs.service.keytab

 

  • MapReduce2 > mapreduce.jobhistory.principal = jhs/_HOST@${realm} => mapred/_HOST@${realm}
  • MapReduce2 > mapreduce.jobhistory.keytab = ${keytab_dir}/jhs.service.keytab => ${keytab_dir}/mapred.service.keytab

 

  • YARN > yarn.nodemanager.principal = nm/_HOST@${realm} =>  yarn/_HOST@${realm}
  • YARN > yarn.nodemanager.keytab = ${keytab_dir}/nm.service.keytab => ${keytab_dir}/yarn.service.keytab

 

  • YARN > yarn.resourcemanager.principal = rm/_HOST@${realm} => yarn/_HOST@${realm}
  • YARN > yarn.resourcemanager.keytab = ${keytab_dir}/rm.service.keytab => ${keytab_dir}/yarn.service.keytab

 

  • Falcon > *.dfs.namenode.kerberos.principal = nn/_HOST@${realm} => hdfs/_HOST@${realm}
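
The list above follows one rule: the short service name in front of the slash (nn, dn, jhs, nm, rm) is replaced with the UNIX user that runs the service, and the keytab file name follows suit. A minimal sketch of that rule, with function names made up for illustration, can be used to double-check the values you type into the wizard:

```shell
# unix_user_for SHORTNAME
# Map a default service short name to the UNIX user used in the list above.
unix_user_for() {
    case "$1" in
        nn|dn) echo hdfs ;;
        jhs)   echo mapred ;;
        nm|rm) echo yarn ;;
        *)     echo "$1" ;;   # anything else is left unchanged
    esac
}

# rewrite_principal 'nn/_HOST@${realm}'  ->  'hdfs/_HOST@${realm}'
rewrite_principal() {
    rp_short=${1%%/*}     # text before the first "/"
    rp_rest=${1#*/}       # text after the first "/"
    printf '%s/%s\n' "$(unix_user_for "$rp_short")" "$rp_rest"
}

# rewrite_keytab '${keytab_dir}/nn.service.keytab'
#   ->  '${keytab_dir}/hdfs.service.keytab'
rewrite_keytab() {
    rk_dir=${1%/*}        # directory part
    rk_file=${1##*/}      # file name
    rk_short=${rk_file%%.*}
    printf '%s/%s.%s\n' "$rk_dir" "$(unix_user_for "$rk_short")" "${rk_file#*.}"
}
```

As with the principal templates, pass the values in single quotes so the shell leaves ${realm} and ${keytab_dir} untouched.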

 

The changes to the HDFS and MapReduce2 principals are illustrated below.

p.png

 

q.png

 

 

After configuring the appropriate principals, press "Next". At the "Confirm Configuration" screen, press Next.

 

r.png

 

 

 

6. Confirm Configuration

Review the configuration and click Next. Exiting the wizard here will discard all configuration and customizations, and they will need to be re-entered.

 

s.png

 

Download the CSV and review it; the CSV file contains all the principals and keytabs that Ambari will create in Active Directory. The list contains principals and keytabs for Isilon, but these keytabs will not be distributed to the Isilon cluster. Isilon Kerberization has already occurred and is implemented through joining Active Directory.
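
Two things are worth looking for in the downloaded file: rows that mention the Isilon SmartConnect name, and any principal that still carries the cluster-name suffix. The sketch below makes no assumption about the CSV column layout and simply matches whole lines; the function names, the file name kerberos.csv and the cluster name in the usage note are placeholders.

```shell
# isilon_rows CSV_FILE ISILON_FQDN
# Print every row of the CSV that mentions the Isilon FQDN (any case).
isilon_rows() {
    grep -iF -- "$2" "$1"
}

# unstripped_rows CSV_FILE CLUSTER_NAME
# Print every row that still contains "-CLUSTER_NAME@", i.e. a principal
# whose cluster suffix was not removed in the wizard.
unstripped_rows() {
    grep -F -- "-$2@" "$1"
}
```

For example, `isilon_rows kerberos.csv rip2-horton1.foo.com` lists the Isilon entries, and `unstripped_rows kerberos.csv mycluster` should print nothing if every principal was edited.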

 

t.png

 

 

7. Stop Services

This step stops all the Hadoop services in Ambari; all user activity will stop.

 

u.png

Services on all hosts will stop.

 

v.png

 

 

Once all services have stopped successfully, click Next.

 

w.png

 

8. Kerberize Cluster

The wizard now Kerberizes the Ambari services, creates the principals in Active Directory and distributes the keytabs.

x.png

 

 

Following the creation of principals, you can view all the Active Directory principals in the Hadoop OU.

 

y.png

 

 

Note: The UPN and sAMAccountName differ in Active Directory; this does not present any problems in a simple installation. Complex custom installs may require additional configuration to enable Isilon multi-protocol functionality to operate correctly. More on this in later posts.

 

z.png

 

 

Kerberization of the cluster completes successfully!

 

1a.png

 

During the Kerberos deployment, Ambari created principals in AD for the Isilon cluster that are not required; these need to be removed from Active Directory.

Remove the following users from the Hadoop OU:

hdfs/<isilon-clustername>

HTTP/<isilon-clustername>

 

 

2a.png


Remove the AD user principals that the Ambari Kerberization wizard created automatically for the Isilon cluster.

The Hadoop OU following removal of the users:

 

aa.png

 

 

 

9. Start and Test Services

 

The wizard will now attempt to start all the Kerberized Hadoop services in Ambari.

 

5a.png

 

If some services fail to start, they can always be restarted. It is common to see some failures. Review the startup logs of the service, and monitor /var/log/hdfs.log on the Isilon cluster while services are starting to see what is happening.

 

6a.png

 

If some services do fail, move on and troubleshoot each service independently.

 

 

On completion of the Kerberos wizard the configuration can be seen in Ambari.

 

7a.png

 

A few services need restarting:

 

8a.png

 

Once these services have restarted, all HDFS services are running and the cluster is green.

 

9a.png

 

This completes the Kerberos deployment of the Hadoop services: Ambari has Kerberized the Hadoop cluster, and Isilon is a valid Active Directory provider. We can now test and validate that Kerberos authentication is operational against the Isilon HDFS data.

 

 

 

Test and Validate Hadoop Services

To validate the newly Kerberized cluster, a few simple tests should be run.

 

1. No Kerberos Ticket Test

Since the cluster is now Kerberized and Isilon is enforcing kerberos_only access to the HDFS root, any simple hadoop command will fail if you do not have a valid Kerberos ticket. This is a good test to confirm that simple authentication is no longer accepted.

 

1b.png

2b.png

 

 

2. Valid Kerberos Ticket Test

Get a Kerberos ticket for your test user using a kinit command:  $ kinit <ad user name>

3b.png

 

Execute a simple HDFS directory listing:  $ hadoop fs -ls /

 

4b.png

 

 

3. Execute a simple file system write

Create a simple file on the Isilon Hadoop root: $ hadoop fs -touchz /user/hdpuser3/This_file_testing_Kerberos.txt

 

5b.png

 

4. Run a simple YARN job without a valid Kerberos ticket; you will see a lot of Kerberos errors.

 

 

7b.png

 

5. Run a simple YARN job that accesses the file system. Here's a simple teragen; in the output you'll see the delegation token used to execute the Kerberized job.

 

9b.png

10b.png

11b.png
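
The ticket and no-ticket checks in tests 1 to 3 can be strung together into one repeatable script. This is a minimal sketch with made-up function names; it assumes `kinit`, `kdestroy` and `hadoop` are on the PATH, that the hadoop CLI exits non-zero when authentication fails, and that the test user has a home directory under /user on the Isilon HDFS root.

```shell
# expect_fail CMD...: passes only when CMD exits non-zero.
expect_fail() {
    if "$@" >/dev/null 2>&1; then
        echo "UNEXPECTED SUCCESS: $*"
        return 1
    fi
    echo "OK (failed as expected): $*"
}

# expect_ok CMD...: passes only when CMD exits zero.
expect_ok() {
    if "$@" >/dev/null 2>&1; then
        echo "OK: $*"
    else
        echo "FAILED: $*"
        return 1
    fi
}

# kerberos_smoke_test AD_USER
kerberos_smoke_test() {
    ks_user=$1
    kdestroy >/dev/null 2>&1                  # drop any cached ticket
    expect_fail hadoop fs -ls / || return 1   # test 1: no ticket, must fail
    kinit "$ks_user" || return 1              # prompts for the AD password
    expect_ok hadoop fs -ls / || return 1     # test 2: listing with a ticket
    expect_ok hadoop fs -touchz "/user/$ks_user/This_file_testing_Kerberos.txt"
}
```

Calling `kerberos_smoke_test hdpuser3` reproduces the checks shown in the screenshots for that user; if the first check reports an unexpected success, simple authentication is still being accepted somewhere.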

 

If you see issues running Kerberized jobs, you can increase the Kerberos logging to show a lot more detail; see the post "Having Kerberos Authentication Issue, DEBUG it".
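
As a rough illustration of what turning up the logging can look like on a client host, the sketch below sets three commonly used switches in the current shell: the Hadoop client logger, the JVM Kerberos debug property, and MIT Kerberos tracing. The function name is made up, and whether KRB5_TRACE has any effect depends on the Kerberos client libraries installed on the host.

```shell
# enable_krb_debug
# Turn on verbose client-side logging for the current shell session.
enable_krb_debug() {
    # Hadoop CLI: log at DEBUG level to the console.
    export HADOOP_ROOT_LOGGER="DEBUG,console"
    # JVM: print Kerberos protocol debugging; keep any options already set.
    export HADOOP_OPTS="${HADOOP_OPTS:+$HADOOP_OPTS }-Dsun.security.krb5.debug=true"
    # MIT Kerberos tools (kinit, klist): trace to standard error.
    export KRB5_TRACE=/dev/stderr
}
```

After calling `enable_krb_debug`, rerun the failing command (for example `hadoop fs -ls /`) in the same shell and compare the client output with /var/log/hdfs.log on the Isilon cluster.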

 

 

 

That about wraps it up. Clearly this is a very large topic, and this post's goal was to provide a high-level overview of the considerations and procedure for Kerberizing an Ambari HDP cluster against Isilon.