Also see the Isilon Cloudera Kerberos Installation Guide: http://www.emc.com/collateral/TechnicalDocument/docu84031.pdf

 

Continuing the series of posts on deploying Hadoop with Isilon, this post gives a high-level overview of the procedure to Kerberize a Cloudera CDH cluster with Isilon against Active Directory. It covers the core tasks needed to complete the setup; additional topics will be covered later or in upcoming documents.

 

This procedure is based on the following:

Isilon 8.0.0.1

CDH 5.7.1

 

OneFS 8.0.0.1 contains a number of updates that facilitate the integration and deployment of Kerberos against OneFS, so it is highly recommended to use this version.

 

 

This blog assumes the following Isilon Hadoop environment is configured and operational:

 

 

A dedicated Isilon Access Zone is in use (not the system zone).

The Isilon SmartConnect Zone configuration is implemented per best practice for Isilon HDFS access.

The Isilon HDFS configuration is correctly configured.

Cloudera Manager is configured correctly for Isilon integration.

Cloudera Manager will manage and deploy keytab and krb5.conf files.

A simple access model currently exists between Hadoop and Isilon; user UID & GID are correctly implemented and allow HDFS access to the Isilon HDFS root with UID & GID parity.

Hadoop jobs and services are fully operational.

 

If your environment deviates from any of these configurations, an alternative approach to Kerberization may be required, especially with regard to the management of keytabs and krb5.conf files. This procedure will not address all configurations or requirements. Additional EMC services should be engaged when required.

 

This post also does not address Linux host Kerberization, Directory Service integration, or the Isilon permissioning model for multiprotocol access following Kerberization. I hope to address these at a later date.

 

 

Since we are integrating an existing Cloudera CDH cluster with Isilon into a pre-existing Microsoft Active Directory environment, the high-level approach is:

 

Prepare and configure the Active Directory for Isilon – Hadoop integration

Prepare the Cloudera cluster and Linux hosts for kerberization

Integrate the Isilon cluster into Active Directory

Kerberize the CDH cluster via the Cloudera Enable Kerberos wizard

Complete the integration of Isilon and Cloudera

Test and validate Kerberized services

 

 

This post is based on using a pre-existing Active Directory environment for Kerberos User Authentication.

To use an existing Active Directory domain for the cluster with the Cloudera Kerberos wizard, you must prepare the following:

 

-Isilon, Cloudera Manager and the compute cluster hosts have all the required network access to Active Directory and AD services

-All DNS name resolution of required Active Directory Services is valid

-Active Directory secure LDAP (LDAPS) connectivity has been configured.

-An Active Directory OU User container for principals has been created, for example "OU=Hadoop--Cluster,OU=People,dc=domain,dc=com"

-Active Directory administrative credentials with delegated control of “Create, delete, and manage user accounts” on the OU User container are implemented.
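
The LDAPS and OU prerequisites above can be checked from the Cloudera Manager host before the wizard is run. The following is a minimal sketch rather than a definitive procedure: the function names are hypothetical, the domain controller name and bind account are placeholders, and it assumes the OpenLDAP client tools are installed and the host already trusts the certificate authority that issued the domain controller's certificate.

```shell
# Sketch only: host names, bind account and OU are placeholders.

# Build the LDAPS URI for a domain controller (636 is the standard LDAPS port).
ldaps_uri() {
    printf 'ldaps://%s:636\n' "$1"
}

# Look up the OU that will hold the principals, binding as the delegated account.
# ldapsearch prompts for the password (-W) and exits non-zero if the
# TLS connection, the bind, or the search for the OU fails.
check_ou() {
    local dc_host=$1 bind_dn=$2 ou_dn=$3
    ldapsearch -H "$(ldaps_uri "$dc_host")" -x -D "$bind_dn" -W \
        -b "$ou_dn" -s base '(objectClass=organizationalUnit)' dn
}

# Example (not run here):
#   check_ou dc1.domain.com 'cm-admin@domain.com' \
#       'OU=Hadoop--Cluster,OU=People,dc=domain,dc=com'
```

A successful lookup only shows that the OU is reachable over LDAPS with those credentials; it does not prove that the delegated "Create, delete, and manage user accounts" rights are in place.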

 

For additional information, see the Cloudera security documentation: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_intro_kerb.html

 

 

 

How Kerberos is Implemented:

Since the Isilon-integrated Hadoop cluster is a mix of Linux hosts running the compute services and Isilon running the data services, Cloudera cannot complete the Kerberization end to end. Because Isilon runs a clustered operating system, OneFS, Cloudera's SSH-based remote management cannot fully configure and manage the Kerberization of Isilon, nor does it need to. It can still completely deploy and configure the Linux hosts.

 

Because of this, the Kerberization of the Isilon-integrated Hadoop cluster should be considered in the following context:

Isilon is Kerberized

Cloudera Kerberization wizard runs and deploys kerberization to Linux and Hadoop services

Since both sets of systems are now fully Kerberized within the same KDC realm, Kerberized user access can occur between the Isilon and Hadoop cluster seamlessly.

 

 

 

Cloudera Pre-Configuration

Verify that Cloudera 5.x or higher is running.

Forward and reverse DNS between all hosts is tested and validated. Test this with dig or ping.

All services are running (green) on the Cloudera Manager Dashboard.

All other Cloudera specific Kerberos requirements have been met; NTP, DNS, packages etc.
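
The forward and reverse DNS check mentioned above can be scripted with dig. This is a minimal sketch, assuming dig is installed on the host running the check; the function names and the host names in the example are hypothetical.

```shell
# Sketch: forward and reverse lookup for each Hadoop host.

# Normalise a DNS name: drop the trailing dot that dig prints and lower-case it.
norm_name() {
    printf '%s\n' "${1%.}" | tr '[:upper:]' '[:lower:]'
}

# Resolve name -> IP -> name and report whether the round trip matches.
check_host() {
    local fqdn=$1 ip ptr
    # Keep only address lines, in case the name is a CNAME.
    ip=$(dig +short "$fqdn" A | grep -E '^[0-9.]+$' | head -n 1)
    if [ -z "$ip" ]; then
        echo "FAIL $fqdn: no A record"
        return 1
    fi
    ptr=$(dig +short -x "$ip" | head -n 1)
    if [ "$(norm_name "$ptr")" = "$(norm_name "$fqdn")" ]; then
        echo "OK   $fqdn <-> $ip"
    else
        echo "FAIL $fqdn -> $ip -> ${ptr:-no PTR record}"
        return 1
    fi
}

# Example (not run here):
#   for h in cm1.foo.com worker1.foo.com worker2.foo.com; do check_host "$h"; done
```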

 

 

Before launching the Cloudera Kerberization wizard, you need to make the following configuration change and restart all services.

 

In the "Isilon Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml" property of the Isilon service, set hadoop.security.token.service.use_ip to false.

 

This key may need creating:

 

0.png
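
If the safety valve is edited as XML rather than through the name/value fields, the entry would look roughly like the following fragment. This is a sketch of the standard Hadoop property format, not a copy of the screenshot.

```xml
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
</property>
```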

 

 

Preparing Hosts for Kerberization

For Kerberization to work on all Hadoop hosts, the required client libraries must be installed. The OpenLDAP client libraries should be installed on the Cloudera Manager Server, and the Kerberos client libraries should be installed on ALL hosts.

 

See the following Cloudera doc for more information: Enabling Kerberos Authentication Using the Wizard

 

On a RHEL or CentOS system this can be achieved with the following; on other operating systems, install the equivalent packages:

# yum install krb5-workstation

# yum install krb5-libs

# yum install openldap-clients
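
After the packages are installed, a quick check that the client tools are on the PATH can save a failed wizard run. A minimal sketch follows; the function name is hypothetical, and it relies on kinit and klist being provided by krb5-workstation and ldapsearch by openldap-clients.

```shell
# Print each named command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (not run here):
#   missing_tools kinit klist            # on every Hadoop host
#   missing_tools kinit klist ldapsearch # on the Cloudera Manager Server
```

No output means every tool was found.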

 

 

 

Isilon OneFS Configuration

This section covers the configuration required for OneFS to respond to requests for secure Kerberized HDFS authenticated by Active Directory.

The cluster must be joined correctly to the target Active Directory as a Provider.

The Access Zone the HDFS root lives under is configured for this Active Directory provider.

All IP addresses within the required SmartConnect Zone must be added to reverse DNS with the same FQDN for the cluster delegation. All IPs should resolve back to the SmartConnect Zone name. This is required for Kerberos.
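
The reverse DNS requirement for the SmartConnect pool can be checked with a short loop. This is a sketch under assumptions: the function names are hypothetical, the IP addresses in the example are placeholders for the pool's actual range, and dig must be available on the host running it.

```shell
# Sketch: every SmartConnect pool IP should have a PTR record naming the zone.

# Succeeds when a PTR answer (as printed by dig, with a trailing dot)
# names the SmartConnect zone, ignoring case.
ptr_is_zone() {
    local ptr=$1 zone=$2
    [ "$(printf '%s' "${ptr%.}" | tr '[:upper:]' '[:lower:]')" = \
      "$(printf '%s' "$zone" | tr '[:upper:]' '[:lower:]')" ]
}

# Print the pool IPs whose reverse lookup does not return the zone name.
bad_pool_ips() {
    local zone=$1 ip
    shift
    for ip in "$@"; do
        ptr_is_zone "$(dig +short -x "$ip" | head -n 1)" "$zone" || echo "$ip"
    done
}

# Example (not run here):
#   bad_pool_ips rip2-cd1.foo.com 10.0.0.11 10.0.0.12 10.0.0.13
```

Any address printed needs its PTR record corrected before Kerberized access through that node will work reliably.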

 

 

Isilon SPNs

OneFS is a clustered file system running on multiple nodes, but it is joined to Active Directory as a single Computer Object, so the SPN requirements for Kerberized Hadoop access are unique. In addition to the SPN for the cluster name, additional SPNs are required for the Access Zone through which HDFS NameNode access is made:

 

 

Review the registered SPNs on the Isilon cluster and add the required SPNs for the SmartConnect Zone name if needed.

# isi auth ads spn list --provider-name=<AD PROVIDER NAME>

 

The following example illustrates the required SPNs:

Isilon Cluster Name - rip2.foo.com - SPN: hdfs/rip2.foo.com

Access Zone NN SmartConnect FQDN - rip2-cd1.foo.com - SPNs: hdfs/rip2-cd1.foo.com & HTTP/rip2-cd1.foo.com
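
The comparison between required and registered SPNs can be scripted. The sketch below derives the three SPNs from the example and reports any that are absent from a listing supplied on standard input; the function names are hypothetical, and the match is a simple case-insensitive text search that does not parse the listing format.

```shell
# Print the SPNs this post calls for, given the cluster name
# and the SmartConnect zone FQDN.
required_spns() {
    local cluster=$1 zone=$2
    printf '%s\n' "hdfs/$cluster" "hdfs/$zone" "HTTP/$zone"
}

# Read the registered SPN listing on stdin and print each required SPN
# that does not appear in it.
missing_spns() {
    local registered spn
    registered=$(cat)
    required_spns "$1" "$2" | while read -r spn; do
        printf '%s\n' "$registered" | grep -qiwF -- "$spn" || echo "$spn"
    done
}

# Example (not run here), piping in the output of the list command:
#   isi auth ads spn list --provider-name=<AD PROVIDER NAME> \
#       | missing_spns rip2.foo.com rip2-cd1.foo.com
```

No output means all three SPNs were found in the listing.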

 

 

For additional information on adding or modifying Isilon SPNs in Active Directory, see the Isilon CLI Administrative Guide.

 

 

Isilon Hadoop (HDFS) Changes

The following configuration changes are required on the HDFS Access Zone.

 

1. Disable simple authentication. This enforces Kerberos or delegation token authentication only.

 

# isi hdfs settings modify --authentication-mode=kerberos_only --zone=rip2-cd1

 

17.png

 

 

2. Create the required Proxy Users

 

Proxy users are required so that the service accounts of specific Hadoop services can impersonate users when executing jobs; add the required proxy users as needed. Proxy users will be covered in more detail in a later post; also review the Isilon CLI Administrative Guide.
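
As an illustration of what adding proxy users might involve, the sketch below only prints candidate commands for review; it does not run them. The `isi hdfs proxyusers create` syntax shown is an assumption based on the OneFS 8.0 CLI and should be confirmed in the Isilon CLI Administrative Guide. The helper name, the service names and the user "alice" are hypothetical.

```shell
# Sketch: print (do not run) a proxy-user command for review.
# Assumed OneFS 8.0 syntax, to be verified for your release:
#   isi hdfs proxyusers create <name> --zone=<zone> --add-user=<user>
proxyuser_cmd() {
    local proxy=$1 zone=$2 member=$3
    printf 'isi hdfs proxyusers create %s --zone=%s --add-user=%s\n' \
        "$proxy" "$zone" "$member"
}

# Example: allow the hive and oozie service accounts to impersonate
# a hypothetical user "alice" in the rip2-cd1 zone.
for svc in hive oozie; do
    proxyuser_cmd "$svc" rip2-cd1 alice
done
```

Which services need proxy users, and which users or groups they may impersonate, depends on the services deployed in the cluster.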

 

 

3. Increase the hdfs log level

 

# isi hdfs log-level modify --set=verbose

 

 

This completes the Isilon Hadoop Active Directory setup:

- Isilon joined to Active Directory, Provider Online

- HDFS Access Zone has Active Directory provider added

- SPNs are correctly configured

- HDFS service configured for kerberos_only

- DNS configuration is valid

 

 

Enable Kerberos on Cloudera

Having met all the prerequisites, the Cloudera cluster can be Kerberized. It is suggested to suspend all client and user activity on the Hadoop cluster prior to executing any Kerberization tasks.

 

01.png

 

From the Dashboard, select Security.

1.png

 

Enable Kerberos,

 

2.png

Having met all the prerequisites, check all the boxes

 

3.png

 

Continue,

 

5.png

 

Add the KDC Server Host FQDN

Add the Security Realm (the AD domain)

Add additional encryption types (OneFS 8.0.x supports aes-256)

Modify the OU for the delegated Cloudera OU to be used for Principals

 

 

It is recommended to manage host krb5.conf files through Cloudera Manager

6.png

 

Check the "Manage krb5.conf files through Cloudera Manager" option.

7.png

 

8.png

 

Accept the defaults,

 

Since Cloudera Manager will create and manage all the principals, an AD OU with a delegated administrative account is used.

 

9.png

 

Enter the credentials for the AD user with Delegated access to the OU in the AD Domain.

 

10.png

 

Continue,

11.png

 

Continue,

 

12.png

 

Continue,

 

13.png

 

These ports can be left as default,

 

14.png

 

Select "Yes, I am ready to restart the cluster now" and Continue.

 

The kerberization wizard will start,

 

15.png

 

 

The wizard will create the required Principals in Active Directory,

 

16.png

 

 

Kerberos enablement will continue, and the services will attempt to restart.

 

18.png

 

 

The Hue service will fail and halt the wizard. This is a known issue that needs to be worked around.

 

19.png

 

 

To get the Hue service to start following Kerberization, use the following procedure: <coming soon>

 

Since the failure of the Hue service prevents the wizard from completing, we need to do the following:

 

Open another browser session to the CM Dashboard: http://<CM-URL>:7180 and review the state of the services.

 

You will likely see services in an unhealthy state. We can address each of these individually, starting or restarting them as needed and monitoring the log files to get them started. Some services may just need a manual restart. All services are now fully Kerberized.

 

 

20.png

 

 

Once ALL services EXCEPT HUE have started, you can close the Kerberization wizard browser session. All services are Kerberized and the cluster is operational (except Hue), as seen below:

 

Getting the Hue Service started:

Getting the Hue Service Started on Kerberized Cloudera with Isilon

 

 

 

21.png

 

Address any configuration issues or alarms as needed. This completes the procedure for Kerberizing Cloudera with Isilon; we can now test it.

 

 

See my prior post for some basic Kerberos testing methodology: Ambari HDP with Isilon 8.0.0.1 and Active Directory Kerberos Implementation

 

I suggest the following smoke tests to validate that the cluster and Isilon are correctly Kerberized.

1. Test without a valid ticket

(obtain a valid ticket)

2. Browse the Hadoop root

3. Write a file

4. Run a simple YARN job, such as teragen
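
The four smoke tests can be sketched as a single function run from any Hadoop client host. Everything specific here is an assumption: the function names are hypothetical, the principal is a placeholder, the user is assumed to be able to write to /tmp in HDFS, and the examples jar path assumes a parcel-based CDH install.

```shell
# Sketch of the smoke tests listed above.

# Succeeds only when the given command fails (used for the "no ticket" test).
expect_fail() {
    if "$@" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

smoke_test() {
    local principal=$1
    # Assumed location of the MapReduce examples jar on a parcel install.
    local jar=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

    # 1. Without a ticket, HDFS access should be refused.
    kdestroy 2>/dev/null
    expect_fail hdfs dfs -ls / || { echo "FAIL: listing worked without a ticket"; return 1; }

    # Obtain a valid ticket (prompts for the password).
    kinit "$principal" || return 1

    # 2. Browse the Hadoop root.
    hdfs dfs -ls / || return 1

    # 3. Write a file.
    echo "kerberos smoke test" > /tmp/krb-smoke.txt
    hdfs dfs -put -f /tmp/krb-smoke.txt /tmp/ || return 1

    # 4. Run a simple YARN job.
    yarn jar "$jar" teragen 10000 "/tmp/teragen-$$" || return 1

    echo "all smoke tests passed"
}

# Example (not run here):
#   smoke_test hadoop-user@FOO.COM
```

With the HDFS log level set to verbose on Isilon, the OneFS HDFS log can be watched alongside this to see how each request is authenticated.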

 

 

 

 

 

 

 

russ_stevenson

Isilon

Using Hadoop with Isilon - Isilon Info Hub