Making clusters with Blueprints

 

Yes, you can use Ambari Blueprints if your data is stored in OneFS.

 

It's like a Hadoop replicator

 

Ambari Blueprints have been around since Ambari 1.6, and provide an API that makes Ambari cluster creation and configuration effortless and repeatable. You no longer have to restart a deployment because you chose the wrong radio button in the install wizard, or deep-debug services to discover you forgot a single critical setting that has to be set every time.

 

Actually, it's like a replicator and a protocol droid rolled into one -- Star Trek and Star Wars. You use it to talk to Ambari in the machine language that defines services, the same language Ambari uses to talk to itself. The community continues to improve Blueprints, too: Ambari 2.2 includes Ranger support, configuration recommendations, and a tech preview of true Kerberos automation. If this is new to you, the latest Ambari documentation has an intro, and it points to a helpful wiki for the API.

 

Because the OneFS integration with Ambari is unique, there are a few requirements that need to be repeated in every deployment. Blueprints to the rescue!

 

How Isilon engineering uses it

 

We use blueprints to quickly deploy test clusters that are the same every time. We automate cluster creation by spinning up enough virtual machines with vCenter to satisfy the blueprint, then we install Ambari across them and submit the blueprint to Ambari. We are confident that as we test new Ambari and HDP versions, we're looking at something built the same way every time. We use the OneFS platform API to create zones on existing systems, or to configure a short-lived virtualized OneFS cluster alongside the virtual HDP cluster.

 

Do It Yourself

 

Here's the workflow and some JSON samples so that you can get your hands dirty.

 

There are two key elements. First, a blueprint file includes all of the necessary settings and a description of which service components should be deployed together, and how many of each type of host should be created. Second, a cluster creation template provides Ambari with the details for the hosts that it should actually deploy on.

 

OneFS

 

Before you start the Ambari deployment, get OneFS ready: configure the necessary service users in your AD or as local users, provide the NameNode and Ambari server addresses, and complete the necessary HDFS settings. If you find yourself repeatedly creating and configuring OneFS access zones in addition to Hadoop clusters, OneFS has a great API for management. Find out more in our OneFS SDK info hub.
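As one illustration of that management API, here's how you might point an access zone's HDFS settings at your Ambari server over PAPI. This is a sketch only: the endpoint path, API version, and field names are assumptions based on the OneFS 8.0 PAPI and should be verified against the SDK docs, and the cluster, zone, and host names are placeholders.

```python
import base64
import json
import urllib.request

def papi_put(cluster, path, payload, user="root", password="password"):
    """Build an authenticated PUT request against the OneFS platform API (PAPI)."""
    req = urllib.request.Request(
        f"https://{cluster}:8080{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    req.add_header("Content-Type", "application/json")
    return req

# Assumed endpoint: per-zone HDFS settings, pointing the zone at Ambari.
# ("hdp-zone" and both host names are hypothetical.)
req = papi_put(
    "onefs-sc.emc.com",
    "/platform/3/protocols/hdfs/settings?zone=hdp-zone",
    {"ambari_server": "ambari.example.com", "ambari_namenode": "onefs-sc.emc.com"},
)
# Send with urllib.request.urlopen(req) once credentials and certs are in place.
```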

 

Blueprint

 

Here are the things that OneFS needs for the deployment to be successful.

 

  • A host group for OneFS, with a cardinality of 1
  • The OneFS host group includes NAMENODE, DATANODE, SECONDARY_NAMENODE, METRICS_MONITOR
    • Add KERBEROS_CLIENT if you're doing a Kerberos deployment
  • A few default settings overridden
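These requirements lend themselves to a quick automated check before you submit anything to Ambari. Here's a minimal sketch, assuming the blueprint JSON has already been loaded into a dict with json.load; the function name and return style are mine, not part of any Ambari tooling.

```python
# Components that must appear in the OneFS host group (KERBEROS_CLIENT is
# required in addition for Kerberized deployments).
REQUIRED_ONEFS_COMPONENTS = {"NAMENODE", "DATANODE", "SECONDARY_NAMENODE", "METRICS_MONITOR"}

def check_onefs_group(blueprint, group_name="onefs_group", kerberized=False):
    """Return 'ok' if the OneFS host group meets the requirements above,
    otherwise a short description of what is wrong."""
    required = set(REQUIRED_ONEFS_COMPONENTS)
    if kerberized:
        required.add("KERBEROS_CLIENT")
    for group in blueprint.get("host_groups", []):
        if group.get("name") != group_name:
            continue
        present = {c["name"] for c in group.get("components", [])}
        missing = required - present
        if missing:
            return "missing components: " + ", ".join(sorted(missing))
        if group.get("cardinality") != "1":
            return "OneFS host group cardinality must be '1'"
        return "ok"
    return "no host group named " + group_name
```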

 

Here's a simple example:

 

{
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.4"
  },
  "host_groups" : [
    {
      "name" : "onefs_group",
      "components" : [
        {"name" : "DATANODE"},
        {"name" : "NAMENODE"},
        {"name" : "SECONDARY_NAMENODE"},
        {"name" : "METRICS_MONITOR"}
      ],
      "cardinality" : "1"
    },
    {
      "name" : "master_group",
      "components" : [
        {"name" : "APP_TIMELINE_SERVER"},
        {"name" : "HISTORYSERVER"},
        {"name" : "RESOURCEMANAGER"},
        {"name" : "HDFS_CLIENT"},
        {"name" : "MAPREDUCE2_CLIENT"},
        {"name" : "NODEMANAGER"},
        {"name" : "YARN_CLIENT"},
        {"name" : "ZOOKEEPER_CLIENT"},
        {"name" : "ZOOKEEPER_SERVER"}
      ],
      "cardinality" : "1"
    },
    {
      "name" : "client_group",
      "components" : [
        {"name" : "HDFS_CLIENT"},
        {"name" : "MAPREDUCE2_CLIENT"},
        {"name" : "NODEMANAGER"},
        {"name" : "YARN_CLIENT"},
        {"name" : "ZOOKEEPER_CLIENT"},
        {"name" : "ZOOKEEPER_SERVER"}
      ],
      "cardinality" : "2"
    }
  ],
  "configurations" : [
    {
      "hdfs-site" : {
        "dfs.client-write-packet-size" : "131072",
        "dfs.namenode.http-address" : "%HOSTGROUP::onefs_group%:8082",
        "dfs.namenode.https-address" : "%HOSTGROUP::onefs_group%:8080"
      }
    },
    {
      "yarn-site" : {
        "yarn.scheduler.capacity.node-locality-delay" : "0"
      }
    }
  ]
}

 

Here's why each of these settings was included:

 

| Scope | Key | Value | Note |
|---|---|---|---|
| hdfs-site | dfs.client-write-packet-size | 131072 | The default block size on OneFS is larger than the Hadoop default of 65536. |
| hdfs-site | dfs.namenode.http-address | \<SmartConnect>:8082 | OneFS uses custom ports. |
| hdfs-site | dfs.namenode.https-address | \<SmartConnect>:8080 | OneFS uses custom ports. |
| yarn-site | yarn.scheduler.capacity.node-locality-delay | 0 | Controls how many "scheduling opportunities" (one opportunity is one heartbeat from any NodeManager) pass before the scheduler gives up on node locality and tries rack locality. Since all data is node-local to OneFS and never node-local to any compute client, waiting for a client that has the data node-local is pointless. |
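If you maintain blueprints for several stacks, you can merge these OneFS overrides into any existing blueprint programmatically instead of editing each one by hand. A minimal sketch; the helper name and structure are mine:

```python
# The OneFS-required overrides, in the same shape Ambari Blueprints expect
# in the "configurations" list.
ONEFS_OVERRIDES = [
    {"hdfs-site": {
        "dfs.client-write-packet-size": "131072",
        "dfs.namenode.http-address": "%HOSTGROUP::onefs_group%:8082",
        "dfs.namenode.https-address": "%HOSTGROUP::onefs_group%:8080",
    }},
    {"yarn-site": {"yarn.scheduler.capacity.node-locality-delay": "0"}},
]

def apply_onefs_overrides(blueprint):
    """Merge the OneFS-required settings into a blueprint dict in place,
    creating the configurations list or config scopes as needed."""
    configs = blueprint.setdefault("configurations", [])
    for override in ONEFS_OVERRIDES:
        (scope, settings), = override.items()
        existing = next((c for c in configs if scope in c), None)
        if existing is None:
            configs.append({scope: dict(settings)})
        else:
            existing[scope].update(settings)
    return blueprint
```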

 

Cluster creation in one step

 

In the cluster creation template, pass the FQDN for the OneFS SmartConnect name server. Here's a sample template:

 

{
  "blueprint" : "cluster-with-onefs",
  "host_groups" : [
    {
      "name" : "onefs_group",
      "hosts" : [
        {"fqdn" : "onefs-sc.emc.com"}
      ]
    },
    {
      "name" : "master_group",
      "hosts" : [
        {"fqdn" : "the-masters.emc.com"}
      ]
    },
    {
      "name" : "client_group",
      "hosts" : [
        {"fqdn" : "client-0.emc.com"},
        {"fqdn" : "client-1.emc.com"}
      ]
    }
  ]
}
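With both files in hand, registering the blueprint and creating the cluster are two POSTs to the Ambari REST API; those endpoints are documented in the Ambari wiki. Here's a sketch using only the Python standard library. The server URL, cluster name, and credentials are placeholders:

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical Ambari server

def ambari_post(base_url, path, payload, user="admin", password="admin"):
    """Build an authenticated POST for the Ambari REST API. Ambari rejects
    write operations that lack the X-Requested-By header."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    req.add_header("X-Requested-By", "ambari")
    req.add_header("Content-Type", "application/json")
    return req

# Step 1: register the blueprint under the name the template references.
# with open("blueprint.json") as f:
#     urllib.request.urlopen(
#         ambari_post(AMBARI, "/api/v1/blueprints/cluster-with-onefs", json.load(f)))
#
# Step 2: create the cluster from the template. Ambari returns a request
# resource you can poll to watch provisioning progress.
# with open("cluster-template.json") as f:
#     urllib.request.urlopen(
#         ambari_post(AMBARI, "/api/v1/clusters/my-onefs-cluster", json.load(f)))
```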

 

Kerberization gets easier

 

As mentioned above, Ambari now has kerberization through Blueprints in tech preview. You'll need Ambari 2.2.0 and OneFS 8.0.0.1 or higher to use it. Really, be sure to use the most recent version of Ambari 2.2 because Apache devs are quickly fixing bugs and adding features in each minor release.

 

You'll need:

  • KERBEROS_CLIENT added to the OneFS host group as well
  • HDP settings that need to be created or overridden when OneFS is in a Kerberized cluster (for example, the use_ip setting for YARN)
  • Principal name changes

 

There are more settings to override and principal name considerations for OneFS with Kerberos, so be sure to look through the configuration walkthrough we published recently. Have a look at the comments, too. Implement each of the settings and principals that are relevant to your cluster.

 

Principals need to be declared in your blueprint or cluster creation template using a new security section. Here's a simple example using the same principal names as described in the Kerberos ECN post.

 

  "security" : {
    "type" : "KERBEROS",
    "kerberos_descriptor" : {
      "properties" : {
        "realm" : "YOURREALMHERE",
        "keytab_dir" : "/etc/security/keytabs"
      },
      "identities" : [
        {
          "principal" : {
            "type" : "user",
            "value" : "ambari-qa@${realm}"
          },
          "name" : "smokeuser"
        }
      ],
      "services" : [
        {
          "name" : "HDFS",
          "components" : [
            {
              "name" : "NAMENODE",
              "identities" : [
                {
                  "principal" : {
                    "configuration" : "hadoop-env/hdfs_principal_name",
                    "type" : "user",
                    "local_username" : "hdfs",
                    "value" : "hdfs@${realm}"
                  },
                  "name" : "hdfs"
                }
              ]
            }
          ]
        },
        {
          "name" : "MAPREDUCE2",
          "components" : [
            {
              "name" : "HISTORYSERVER",
              "identities" : [
                {
                  "principal" : {
                    "configuration" : "mapred-site/mapreduce.jobhistory.principal",
                    "type" : "service",
                    "local_username" : "mapred",
                    "value" : "mapred/_HOST@${realm}"
                  },
                  "name" : "history_server_jhs"
                }
              ]
            }
          ]
        },
        {
          "name" : "YARN",
          "components" : [
            {
              "name" : "NODEMANAGER",
              "identities" : [
                {
                  "principal" : {
                    "configuration" : "yarn-site/yarn.nodemanager.principal",
                    "type" : "service",
                    "local_username" : "yarn",
                    "value" : "yarn/_HOST@${realm}"
                  },
                  "name" : "nodemanager_nm"
                }
              ]
            },
            {
              "name" : "RESOURCEMANAGER",
              "identities" : [
                {
                  "principal" : {
                    "configuration" : "yarn-site/yarn.resourcemanager.principal",
                    "type" : "service",
                    "local_username" : "yarn",
                    "value" : "yarn/_HOST@${realm}"
                  },
                  "name" : "resource_manager_rm"
                }
              ]
            }
          ]
        }
      ]
    }
  }

 

As Kerberos in Blueprints transitions to a fully supported Ambari feature, we'll loop back with any additional considerations needed for a successful deployment with OneFS.