Also see the Isilon Cloudera Kerberos Installation Guide: http://www.emc.com/collateral/TechnicalDocument/docu84031.pdf
Continuing the series of posts on deploying Hadoop with Isilon, this post outlines the high-level procedure to Kerberize a Cloudera CDH cluster with Isilon against Active Directory. It covers the core tasks needed to complete the setup; additional topics will be covered later or in upcoming documents.
This procedure is based on the following:
OneFS 8.0.x contains a number of updates that facilitate the integration and deployment of Kerberos against OneFS; it is highly recommended to use this version or later.
This blog assumes the following Isilon Hadoop environment is configured and operational:
A dedicated Isilon Access Zone is in use (not the system zone).
The Isilon SmartConnect Zone configuration is implemented per best practice for Isilon HDFS access.
The Isilon HDFS configuration is correctly configured.
Cloudera Manager is configured correctly for Isilon integration.
Cloudera Manager will manage and deploy keytab and krb5.conf files
A simple access model currently exists between Hadoop and Isilon; user UID & GID are correctly implemented and allow HDFS access to the Isilon HDFS root with UID & GID parity.
Hadoop jobs and services are fully operational.
If your environment deviates from any of these configurations, an alternative approach to Kerberization may be required, especially with regard to the management of keytabs and krb5.conf files. This procedure will not address all configurations or requirements. Additional EMC services should be engaged when required.
This post also does not address Linux host Kerberization, Directory Service integration, or the Isilon permissioning model for multiprotocol access following Kerberization. I hope to address these at a later date.
Since we are integrating an existing Cloudera CDH cluster with Isilon into a pre-existing Microsoft Active Directory environment, the high-level approach is:
Prepare and configure the Active Directory for Isilon – Hadoop integration
Prepare the Cloudera cluster and Linux hosts for kerberization
Integrate the Isilon cluster into Active Directory
Kerberize the CDH cluster via the Cloudera enable Kerberos wizard
Complete the integration of Isilon and Cloudera
Test and validate Kerberized services
This post is based on using a pre-existing Active Directory environment for Kerberos User Authentication.
To use an existing Active Directory domain for the cluster with the Cloudera Kerberos wizard, you must prepare the following:
- Isilon, Cloudera Manager, and the compute cluster hosts have all required network access to Active Directory and AD services
- DNS name resolution of all required Active Directory services is valid
- Active Directory secure LDAP (LDAPS) connectivity has been configured
- An Active Directory OU user container for principals has been created, for example "OU=Hadoop-Cluster,OU=People,dc=domain,dc=com"
- Active Directory administrative credentials with delegated control of "Create, delete, and manage user accounts" on the OU user container are implemented
For additional information, see the Cloudera security documentation: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_intro_kerb.html
How Kerberos is Implemented:
Since the Isilon-integrated Hadoop cluster is a mix of Linux hosts running the compute services and Isilon running the data services, Cloudera Manager cannot complete the Kerberization end-to-end. Because Isilon runs the clustered operating system OneFS, SSH-based remote management cannot fully configure and manage the Kerberization of Isilon, nor does it need to; it can, however, completely deploy and configure the Linux hosts.
Because of this, the kerberization of the Isilon integrated Hadoop cluster should be considered in the following context:
Isilon is Kerberized
Cloudera Kerberization wizard runs and deploys kerberization to Linux and Hadoop services
Since both sets of systems are now fully Kerberized within the same KDC realm, Kerberized user access can occur between the Isilon and Hadoop cluster seamlessly.
Verify that Cloudera 5.x or later is running.
Forward and reverse DNS between all hosts is tested and validated. Test this with dig or ping.
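The forward and reverse DNS checks can be run quickly with dig; the hostname and address below are placeholders for your own environment:

```shell
# Forward lookup of a cluster host (example hostname)
dig +short cm1.foo.com
# Reverse lookup of the address returned above; it should resolve
# back to the same FQDN
dig +short -x 10.1.1.10
```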
All services are running (green) on the Cloudera Manager Dashboard.
All other Cloudera specific Kerberos requirements have been met; NTP, DNS, packages etc.
Before launching the Cloudera Kerberization wizard, you need to make the following configuration customization and restart all services.
In the Isilon Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property for the Isilon service, set the value of the hadoop.security.token.service.use_ip property to FALSE.
This property may need to be created:
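If the property does not already exist, a snippet along these lines can be pasted into the Isilon Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml:

```xml
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
</property>
```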
Preparing Hosts for Kerberization
In order for Kerberization to be operational on all Hadoop hosts, the required client libraries must be installed. The OpenLDAP client libraries should be installed on the Cloudera Manager Server, and the Kerberos client libraries should be installed on ALL hosts.
See the following Cloudera doc for more information: Enabling Kerberos Authentication Using the Wizard
On a RHEL or CentOS system this can be achieved with the following; on other operating systems, install the appropriate equivalent packages:
# yum install krb5-workstation
# yum install krb5-libs
# yum install openldap-clients
Isilon OneFS Configuration
This section covers the configuration required for OneFS to respond to requests for secure Kerberized HDFS authenticated by Active Directory.
The cluster must be joined correctly to the target Active Directory as a Provider.
The Access Zone the HDFS root lives under is configured for this Active Directory provider.
All IP addresses within the required SmartConnect zone must be added to reverse DNS with the same FQDN used for the cluster delegation; all IPs should resolve back to the SmartConnect zone name. This is required for Kerberos.
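The reverse records can be verified per IP; the pool addresses and zone name below are examples only:

```shell
# Each IP in the SmartConnect pool should resolve back to the
# SmartConnect zone FQDN (example addresses and zone name)
for ip in 10.1.2.11 10.1.2.12 10.1.2.13; do
  dig +short -x $ip    # expect the SmartConnect zone FQDN, e.g. rip2-cd1.foo.com.
done
```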
Since OneFS is a clustered file system running on multiple nodes but joined to Active Directory as a single computer object, the SPN requirements for Kerberized Hadoop access are unique. In addition to the cluster SPN, SPNs are required for the Access Zone through which HDFS NameNode access is made:
Review the registered SPN’s on the Isilon cluster and add the required SPN’s for the SmartConnect Zone name if needed.
#isi auth ads spn list --provider-name=<AD PROVIDER NAME>
The following example illustrates the required SPN’s:
Isilon Cluster Name - rip2.foo.com - SPN: hdfs/rip2.foo.com
Access Zone NN SmartConnect FQDN - rip2-cd1.foo.com - SPN's: hdfs/rip2-cd1.foo.com & HTTP/rip2-cd1.foo.com
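If the SmartConnect zone SPNs are missing from the list, they can be added from the OneFS CLI. The provider name, zone name, and exact flags below are illustrative; syntax may vary by OneFS version, so confirm against the CLI Administration Guide:

```shell
# List currently registered SPNs (provider name is an example)
# isi auth ads spn list --provider-name=FOO.COM

# Add the missing SPNs for the NameNode SmartConnect zone name
# (flags shown are indicative for OneFS 8.0.x and may differ by version)
# isi auth ads spn create --provider-name=FOO.COM --spn=hdfs/rip2-cd1.foo.com --user=Administrator
# isi auth ads spn create --provider-name=FOO.COM --spn=HTTP/rip2-cd1.foo.com --user=Administrator
```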
For additional information on adding or modifying Isilon SPN’s in Active Directory see the Isilon CLI Administrative Guide.
Isilon Hadoop (HDFS) Changes
The following configuration changes are required on the HDFS Access Zone.
1. Disable simple authentication. This enforces Kerberos or delegation token authentication only.
# isi hdfs settings modify --authentication-mode=kerberos_only --zone=rip2-cd1
2. Create the required Proxy Users
Proxy users are required for service-account impersonation so that specific Hadoop services can execute jobs; add the required proxy users as needed. More on proxy users in a later post; also review the Isilon CLI Administration Guide.
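As a sketch, a proxy user can be created per service account in the HDFS Access Zone; the service name, zone, and group below are placeholders, and the exact flags should be confirmed against the CLI Administration Guide for your OneFS version:

```shell
# Allow the hive service account to impersonate members of a group
# (zone, account, and group names are examples)
# isi hdfs proxyusers create hive --zone=rip2-cd1 --add-group=hadoop-users
```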
3. Increase the hdfs log level
#isi hdfs log-level modify --set=verbose
This completes the Isilon hadoop Active Directory setup:
- Isilon joined to Active Directory, Provider Online
- HDFS Access Zone has Active Directory provider added
- SPN's are correctly configured
- HDFS Service configured for kerberos_only
- DNS configuration is valid
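The checklist above can be spot-checked from the OneFS CLI; the zone and provider names are examples, and flags may vary slightly by OneFS version:

```shell
# isi auth ads list                        # AD provider joined and online
# isi zone zones view --zone=rip2-cd1      # AD provider listed in the Access Zone
# isi hdfs settings view --zone=rip2-cd1   # authentication mode: kerberos_only
# isi auth ads spn list --provider-name=FOO.COM   # SPNs registered
```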
Enable Kerberos on Cloudera
Having met all the prerequisites, the Cloudera cluster can be Kerberized. It is suggested to suspend all client and user activity on the Hadoop cluster before executing any Kerberization tasks.
From the Dashboard, select Security.
Having met all the prerequisites, check all the boxes.
Add the KDC Server Host FQDN.
Add the Security Realm (the AD domain).
Add additional encryption types (OneFS 8.0.x supports aes-256).
Modify the OU to the delegated Cloudera OU to be used for principals.
It is recommended to manage host krb5.conf files through Cloudera Manager; check Manage krb5.conf through Cloudera Manager and accept the defaults.
Since Cloudera Manager will create and manage all the principals, an AD OU with a delegated administrative account is used.
Enter the credentials for the AD user with delegated access to the OU in the AD domain.
These ports can be left at their defaults.
Select Yes, I am ready to restart the cluster now and click Continue.
The Kerberization wizard will start.
The wizard will create the required principals in Active Directory.
Kerberos enablement will continue, and services will attempt to restart.
The Hue service will fail and halt the wizard; this is a known issue and will need to be worked around.
To get the Hue service to start following Kerberization, follow this procedure: <coming soon>
Since the failure of the Hue service prevents the wizard from completing, we need to do the following:
Open another browser session to the CM Dashboard (http://<CM-URL>:7180) and review the state of the services.
You will likely see services in an unhealthy state; address each of these services individually, starting or restarting them as needed and monitoring the log files. Some services may just need a manual restart, but all services are now fully Kerberized.
Once they are ALL started, except Hue, you can close the other Kerberization wizard browser; all services are Kerberized and the cluster is operational (except Hue), as seen below:
Getting the Hue Service started:
Address any configuration issues or alarms as needed. This completes the procedure for Kerberizing Cloudera with Isilon; we can now test it.
See my prior post for some basic Kerberos testing methodology: Ambari HDP with Isilon 220.127.116.11 and Active Directory Kerberos Implementation
I'd suggest the following smoke tests to validate that the cluster and Isilon are correctly Kerberized.
1. Test without a valid ticket
2. Obtain a valid ticket
3. Browse the Hadoop root
4. Write a file
5. Run a simple YARN job (teragen)
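The smoke tests above can be run from any cluster host; the user, realm, and examples-jar path below are placeholders typical of a CDH parcel install and an AD realm, so adjust them for your environment:

```shell
# 1. Without a ticket, access should be refused with a GSS/Kerberos error
$ kdestroy
$ hadoop fs -ls /

# 2. Obtain a ticket as an AD user (example principal)
$ kinit hduser1@FOO.COM
$ klist

# 3. Browse the Hadoop root
$ hadoop fs -ls /

# 4. Write a file
$ hadoop fs -put /etc/hosts /tmp/hosts.test

# 5. Run a simple YARN job (jar path is typical for a CDH parcel install)
$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 1000 /tmp/teragen.test
```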