I wanted to create a few posts revisiting the networking options for designing and configuring network connectivity between Hadoop clusters and OneFS. The topic and its recommended best practices have changed significantly over the last few years as OneFS has evolved and changes were made to HDFS on Isilon.

 

Since the compute services and clients ultimately connect to Isilon via the DNS name defined in the file system URI, there are a number of options to consider when creating a SmartConnect network pool strategy for integration with a Hadoop compute cluster:

  • IP Network Pools – One or many IP address pools; NameNode, DataNode or Single pool, SmartConnect
  • Dynamic or Static pools
  • HDFS racks implemented

 

Because every node in an Isilon cluster can act as both a NameNode and a DataNode, there are several ways to deploy the network configuration so that it makes the best use of the Isilon nodes and your clients. Before we get started, let’s recap a few concepts.

Remember: Pools are about segregating interfaces and traffic, while Allocation methods are about address failover behavior.

 

IP Address Pools

IP address pools are assigned within a subnet and consist of one or more IP address ranges. You can partition nodes and network interfaces into logical IP address pools. IP address pools are also utilized when configuring SmartConnect DNS zones and client connection management.

 

You can add network interfaces to IP address pools to associate address ranges with a node or a group of nodes. SmartConnect settings that manage DNS query responses and client connections are configured at the IP address pool level.

 

SmartConnect Zones

Clients can connect to the Isilon clusters through a specific IP address or through a name that represents an IP address pool. You can configure a SmartConnect DNS zone name for each IP address pool. The zone name must be a fully qualified domain name. SmartConnect requires that you add a new name server (NS) record that references the SmartConnect service IP address in the existing authoritative DNS zone that contains the cluster. You must also provide a zone delegation to the fully qualified domain name (FQDN) of the SmartConnect zone in your DNS infrastructure.

 

Static IP Allocation

Assigns one IP address to each network interface added to the IP address pool, but does not guarantee that all IP addresses are assigned. IP addresses do not fail over if an interface becomes unavailable.

 

Dynamic IP Allocation

Assigns IP addresses to each network interface added to the IP address pool until all IP addresses are assigned. This guarantees a response when clients connect to any IP address in the pool. If a network interface becomes unavailable, its IP addresses are automatically moved to other available network interfaces in the pool, as determined by the IP address failover policy.
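To make the difference between the two allocation methods concrete, here is a minimal Python sketch of a toy pool. It is not OneFS code: the function names, the interface labels and the round-robin placement of failed-over addresses are all assumptions made for illustration (OneFS offers several failover policies).

```python
from collections import defaultdict


def allocate(ip_range, interfaces, method):
    """Return {interface: [ips]} for a toy IP address pool (illustrative only).

    static : one IP per interface; any leftover IPs stay unassigned.
    dynamic: every IP is assigned, spread round-robin over the interfaces.
    """
    assignment = defaultdict(list)
    if method == "static":
        for iface, ip in zip(interfaces, ip_range):
            assignment[iface].append(ip)
    elif method == "dynamic":
        for i, ip in enumerate(ip_range):
            assignment[interfaces[i % len(interfaces)]].append(ip)
    else:
        raise ValueError(f"unknown allocation method: {method}")
    return dict(assignment)


def fail_interface(assignment, failed, method):
    """Model one interface going down.

    Returns (new_assignment, unreachable_ips). With dynamic allocation the
    failed interface's IPs move to the surviving interfaces (round robin is
    an assumed policy here); with static allocation they become unreachable.
    """
    remaining = {k: list(v) for k, v in assignment.items() if k != failed}
    orphaned = list(assignment.get(failed, []))
    if method == "dynamic" and remaining:
        survivors = sorted(remaining)
        for i, ip in enumerate(orphaned):
            remaining[survivors[i % len(survivors)]].append(ip)
        orphaned = []
    return remaining, orphaned
```

With six addresses and three interfaces, the static pool hands out three addresses and loses one when an interface fails, while the dynamic pool hands out all six and keeps all six reachable after the failure.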

 

Virtual HDFS Racks

OneFS enables you to define a subset of node interfaces on the Isilon cluster through a pool and an associated group of Hadoop compute clients as a virtual HDFS rack. Virtual HDFS racks allow you to fine-tune client connectivity by directing Hadoop compute clients to preferentially connect to a specific set of nodes; these could be located on the same switch or belong to a faster node class, depending on your network and cluster topology.

 

In a simple topology, all Isilon nodes act as both NameNodes and DataNodes, implemented as a single IP pool/SmartConnect zone. A client requests access via the SmartConnect FQDN associated with the HDFS root. To determine which NameNode the client connects to, a DNS query is made against the SmartConnect zone name, and any node in the cluster is returned per normal SmartConnect behavior (1 - 4). The client then makes a NameNode request to that specific Isilon node (5 & 6), and the node responds with the Isilon node to connect to for access to the data blocks (this can be any node in the IP pool assigned to the SmartConnect zone). The client then makes a DataNode connection to that Isilon node (7 & 8).
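The numbered steps above can be sketched as a toy Python model. This is only an illustration of the request flow, not how OneFS is implemented: `ToyCluster`, `resolve`, `namenode_request` and `read_file` are hypothetical names, and both the DNS answer rotation and the DataNode choice are assumed to be simple round robin.

```python
from itertools import cycle


class ToyCluster:
    """Toy single-pool cluster: every node is both a NameNode and a DataNode."""

    def __init__(self, pool_ips):
        self.pool_ips = list(pool_ips)
        # Assumed round-robin rotation for DNS answers and DataNode choices.
        self._dns_rotation = cycle(self.pool_ips)
        self._datanode_rotation = cycle(self.pool_ips)

    def resolve(self, zone_name):
        """Steps 1-4: a DNS query for the SmartConnect zone returns one pool IP."""
        return next(self._dns_rotation)

    def namenode_request(self, namenode_ip, path):
        """Steps 5-6: the contacted node answers with a DataNode address."""
        if namenode_ip not in self.pool_ips:
            raise ValueError("address is not in the cluster pool")
        return next(self._datanode_rotation)


def read_file(cluster, zone_name, path):
    """Walk through the flow and return the (NameNode, DataNode) pair used."""
    namenode = cluster.resolve(zone_name)
    datanode = cluster.namenode_request(namenode, path)
    # Steps 7-8: the client would now open a DataNode connection to `datanode`.
    return namenode, datanode
```

Calling `read_file` repeatedly against the same `ToyCluster` shows successive clients landing on different nodes for both the NameNode and the DataNode role, all drawn from the one pool.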

 

1.png

 

 

When a rack is introduced, all nodes still act as both NameNode and DataNode, but the response for which DataNode to connect to can be managed. The process is similar, except that when the client queries the NameNode (5 & 6) for a DataNode to connect to, the cluster consults the defined rack to determine whether the requesting client IP should connect only to a specific set of nodes (7 & 8), namely the ones defined by the rack allocation. This benefits architectures where the clients and the Isilon nodes are located within the same rack/switch, because it limits cross-switch traffic. Note: NameNode traffic can still cross switches, since all nodes are in the same pool and SmartConnect can return any Isilon node as the NameNode for the client to connect to. NameNode traffic is significantly smaller than DataNode traffic, so this should not be an issue.


 

2.png

 

 

A virtual HDFS rack is the association between a range of Hadoop client source IPs and an Isilon IP pool. The base implementation of racks requires a minimum of two pools, but it may contain more.

  1. The NameNode pool; this pool likely contains all the nodes in the cluster that will provide HDFS protocol access. HDFS clients make connections to this pool's SmartConnect name for NameNode requests.
  2. A DataNode pool; this contains all the Isilon nodes that you wish to provide DataNode access.

 

The rack definition can be explicit, naming a specific range of Hadoop client IPs:

 

# isi hdfs racks list --zone=zone1-cdh
Name    Client IP Ranges          IP Pools
------------------------------------------------------------
/rack1  10.99.36.1-10.99.36.124   subnet0:hadoop-pool-cdh1
------------------------------------------------------------
Total: 1

 

A rack can also be defined as a default rack, where the client IP range covers all source IPs:

 

# isi hdfs racks list --zone=zone1-cdh
Name    Client IP Ranges          IP Pools
------------------------------------------------------------
/rack1  0.0.0.0-255.255.255.255   subnet0:hadoop-pool-cdh1
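As a rough illustration of how a client source IP could be matched against rack definitions like the two above, here is a small Python sketch using the standard `ipaddress` module. The `Rack` class and `pools_for_client` function are hypothetical names, and the first-match rule and the "no match returns nothing" behavior are assumptions for the example, not documented OneFS behavior.

```python
import ipaddress


def parse_range(text):
    """Parse 'start-end' (as shown in the rack listing) into two addresses."""
    start, end = (ipaddress.ip_address(part.strip()) for part in text.split("-"))
    return start, end


class Rack:
    """Toy rack: client IP ranges associated with one or more IP pools."""

    def __init__(self, name, client_ranges, ip_pools):
        self.name = name
        self.client_ranges = [parse_range(r) for r in client_ranges]
        self.ip_pools = list(ip_pools)

    def matches(self, client_ip):
        ip = ipaddress.ip_address(client_ip)
        return any(start <= ip <= end for start, end in self.client_ranges)


def pools_for_client(racks, client_ip):
    """Return (rack name, pools) for the first matching rack, else (None, []).

    First-match ordering is an assumption made for this sketch.
    """
    for rack in racks:
        if rack.matches(client_ip):
            return rack.name, rack.ip_pools
    return None, []
```

With the explicit range, a client at 10.99.36.50 maps to subnet0:hadoop-pool-cdh1 while a client at 10.99.36.200 matches no rack; with the 0.0.0.0-255.255.255.255 range, every IPv4 client maps to the rack's pool.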

 

 

In earlier versions of OneFS, it was recommended to use multiple IP pools and racks for all HDFS configurations. As improvements and new features were introduced into OneFS, this best practice has evolved and now depends on the cluster architecture and on how clients and nodes are racked.

 

In the next post we will look at how to implement IP pool strategies and racks, if they are needed at all.


Part 2: SmartConnect, Network Pools and HDFS Racks for Hadoop Part 2

 

 

Using Hadoop with Isilon - Isilon Info Hub

russ_stevenson
