In the previous blog article, we looked at some of the architectural drivers and decision points when designing clusters to support large datasets. To build on this theme, the next couple of articles will focus on cluster hardware considerations at scale.

 

A key decision for performance, particularly in a large cluster environment, is the type and quantity of nodes deployed. Heterogeneous clusters can be architected with a wide variety of node styles and capacities in order to meet the needs of a varied dataset and a wide spectrum of workloads. These node styles encompass several hardware generations and fall loosely into four main categories, or tiers:


  • Extreme performance (all-flash)
  • Performance
  • Hybrid/Utility
  • Archive

 

While heterogeneous clusters can easily include multiple hardware classes and configurations with a minimum of three of each, the best practice of simplicity for building large clusters holds true here too. The smaller the disparity in hardware style across the cluster, the less opportunity there is for overloading, or bullying, the more capacity-oriented nodes. Some points to consider are:


  • Ensure all nodes contain at least one SSD.
  • 20 nodes is the maximum number that OneFS will stripe data across.
  • At a node pool size of 40 nodes, Gen6 hardware achieves sled, chassis and neighborhood level protection.
  • When comparing equivalent Gen6 and earlier generation node types, consider the number of spindles rather than just the overall capacity (see the sketch after this list).
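To illustrate the spindle-count comparison, here is a minimal Python sketch that weighs candidate node pools by spindle count as well as raw capacity. The drive counts and drive sizes below are hypothetical placeholders, not published specifications for any particular node model.

```python
# Illustrative sketch: compare candidate node pools by spindle count,
# not just raw capacity. The figures below are hypothetical examples,
# not published specifications for any specific node model.

from dataclasses import dataclass

@dataclass
class NodePool:
    name: str
    node_count: int
    drives_per_node: int   # spinning disks per node (assumed figure)
    tb_per_drive: float    # raw TB per drive (assumed figure)

    @property
    def spindles(self) -> int:
        return self.node_count * self.drives_per_node

    @property
    def raw_tb(self) -> float:
        return self.spindles * self.tb_per_drive

    @property
    def spindles_per_raw_tb(self) -> float:
        return self.spindles / self.raw_tb

pools = [
    NodePool("newer-denser-pool", node_count=20, drives_per_node=15, tb_per_drive=8.0),
    NodePool("older-pool",        node_count=20, drives_per_node=12, tb_per_drive=4.0),
]

for p in pools:
    print(f"{p.name}: {p.spindles} spindles, {p.raw_tb:.0f} TB raw, "
          f"{p.spindles_per_raw_tb:.3f} spindles per raw TB")
```

A pool with fewer, larger drives may match an older pool on capacity while offering far fewer spindles to service I/O, which is exactly the trap the bullet above warns against.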


Consider the physical cluster layout and environmental factors when designing and planning for a large cluster installation. These factors include:


  • Redundant power supply
  • Airflow and cooling
  • Rackspace requirements
  • Floor tile weight constraints
  • Networking requirements
  • Cabling distance limitations


The following table details the physical dimensions, weight, power draw, and thermal properties for the range of Gen 6 chassis:


 

| Model | Tier | Height | Width | Depth | RU | Weight | Max Watts | Normal Watts | Max BTU | Normal BTU |
|-------|------|--------|-------|-------|----|--------|-----------|--------------|---------|------------|
| F800 | All-flash performance | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 35 in / 88.9 cm | 4RU | 169 lbs (77 kg) | 1764 | 1300 | 6019 | 4436 |
| H600 | Performance | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 35 in / 88.9 cm | 4RU | 213 lbs (97 kg) | 1990 | 1704 | 6790 | 5816 |
| H500 | Hybrid/Utility | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 35 in / 88.9 cm | 4RU | 248 lbs (112 kg) | 1906 | 1312 | 6504 | 4476 |
| H400 | Hybrid/Utility | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 35 in / 88.9 cm | 4RU | 242 lbs (110 kg) | 1558 | 1112 | 5316 | 3788 |
| A200 | Archive | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 35 in / 88.9 cm | 4RU | 219 lbs (100 kg) | 1460 | 1052 | 4982 | 3584 |
| A2000 | Archive | 4U (4 x 1.75 in) | 17.6 in / 45 cm | 39 in / 99.06 cm | 4RU | 285 lbs (129 kg) | 1520 | 1110 | 5186 | 3788 |
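To help with the environmental planning items listed earlier, the following minimal Python sketch totals rack units, weight, worst-case power draw, and heat load for a proposed rack layout using the chassis figures from the table above. The per-rack facility limits are placeholder assumptions; substitute the actual constraints for your data center.

```python
# Minimal rack-planning sketch using the Gen 6 chassis figures from the table
# above. The per-rack facility limits are illustrative assumptions; substitute
# the real constraints (floor loading, power, cooling) for your data center.

CHASSIS = {
    # model: (rack_units, weight_lbs, max_watts, max_btu)
    "F800":  (4, 169, 1764, 6019),
    "H600":  (4, 213, 1990, 6790),
    "H500":  (4, 248, 1906, 6504),
    "H400":  (4, 242, 1558, 5316),
    "A200":  (4, 219, 1460, 4982),
    "A2000": (4, 285, 1520, 5186),
}

# Hypothetical per-rack limits: usable RU, floor tile weight, power, cooling.
RACK_LIMITS = {"rack_units": 42, "weight_lbs": 2000, "watts": 17000, "btu": 60000}

def rack_budget(chassis_counts: dict[str, int]) -> dict[str, int]:
    """Sum RU, weight, worst-case power and heat for a proposed rack."""
    totals = {"rack_units": 0, "weight_lbs": 0, "watts": 0, "btu": 0}
    for model, count in chassis_counts.items():
        ru, lbs, watts, btu = CHASSIS[model]
        totals["rack_units"] += ru * count
        totals["weight_lbs"] += lbs * count
        totals["watts"] += watts * count
        totals["btu"] += btu * count
    return totals

if __name__ == "__main__":
    plan = {"H500": 4, "A200": 2}          # six chassis (24 nodes) in one rack
    totals = rack_budget(plan)
    for key, value in totals.items():
        limit = RACK_LIMITS[key]
        status = "OK" if value <= limit else "EXCEEDS LIMIT"
        print(f"{key:>10}: {value:>7} / {limit}  {status}")
```

Running the sketch for a candidate layout quickly shows whether weight or power, rather than rack space, is the binding constraint for a dense rack of chassis.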

 

 

Isilon’s backend network is analogous to a distributed systems bus. Each node has two backend interfaces for redundancy that run in an active/passive configuration. The primary interface is connected to the primary switch, and the secondary interface to a separate switch.


Older clusters utilized DDR Infiniband controllers, which required copper CX4 cables with a maximum cable length of 10 meters. After factoring in cable dressing to maintain some form of organization within the racks and cable trays, all the racks containing Isilon nodes needed to be in close physical proximity to each other, either in the same rack row or close by in an adjacent row.


Newer generation nodes use either QDR Infiniband or 10/40 Gb Ethernet over multi-mode fiber (SC), which extends the cable length limitation to 100 meters. This means that a cluster can now span multiple rack rows, floors, and even buildings, if necessary. This solves the floor space problem but introduces new ones: to perform any physical administration activity on nodes, you must know where the equipment is located, and potentially have administrative resources in both locations or travel back and forth between them.
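As a quick sanity check during layout planning, a few lines of Python can validate planned node-to-switch runs against these limits. The 10 meter and 100 meter figures come from the paragraphs above; the cable inventory itself is a made-up example.

```python
# Validate planned backend cable runs against the length limits discussed
# above: 10 m for copper CX4 (DDR Infiniband) and 100 m for multi-mode fiber
# (QDR Infiniband or 10/40 GbE). The cable runs listed are fictitious examples.

MAX_LENGTH_M = {
    "cx4_copper": 10,        # DDR Infiniband, copper CX4
    "multimode_fiber": 100,  # QDR Infiniband or 10/40 GbE over MMF
}

# (node, switch, media, planned run length in meters, including dressing slack)
planned_runs = [
    ("node-01", "be-switch-a", "multimode_fiber", 35),
    ("node-02", "be-switch-a", "multimode_fiber", 112),
    ("node-17", "be-switch-b", "cx4_copper", 8),
]

for node, switch, media, length in planned_runs:
    limit = MAX_LENGTH_M[media]
    verdict = "ok" if length <= limit else f"TOO LONG (limit {limit} m)"
    print(f"{node} -> {switch} [{media}, {length} m]: {verdict}")
```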


Ethernet Backend

The table below shows the various Isilon node types and their respective backend network support. As we can see, Infiniband is the common denominator for the backend interconnect, so it is the required network type for any legacy cluster that contains Gen5 and earlier node types. For new Gen6 deployments, Ethernet is the preferred medium, particularly for large clusters.

 

 

| Node Type / Backend Network | F800 | H600 | H500 | H400 | A200 | A2000 | S210 | X210 | X410 | NL410 | HD400 |
|-----------------------------|------|------|------|------|------|-------|------|------|------|-------|-------|
| 10 Gb Ethernet              |      |      |      | ✓    | ✓    | ✓     |      |      |      |       |       |
| 40 Gb Ethernet              | ✓    | ✓    | ✓    |      |      |       |      |      |      |       |       |
| Infiniband (QDR)            | ✓    | ✓    | ✓    | ✓    | ✓    | ✓     | ✓    | ✓    | ✓    | ✓     | ✓     |
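The compatibility rules in the table can be expressed as a short Python helper: any pre-Gen6 node type forces an Infiniband backend, while an all-Gen6 mix can use Ethernet, with mixed 10/40 GbE configurations depending on the switch capabilities covered below. This is a planning sketch based on the table above, not an official compatibility checker.

```python
# Sketch: given a planned node mix, decide whether the backend must be
# Infiniband (any pre-Gen6 node type present) or can be Ethernet, and if so
# which port speeds the backend switches need to provide. Based on the
# compatibility table above; a planning aid, not an official tool.

ETHERNET_SPEED = {            # Gen6 node types and their backend Ethernet speed
    "F800": "40 GbE", "H600": "40 GbE", "H500": "40 GbE",
    "H400": "10 GbE", "A200": "10 GbE", "A2000": "10 GbE",
}
LEGACY_IB_ONLY = {"S210", "X210", "X410", "NL410", "HD400"}   # QDR Infiniband only

def backend_options(node_types: set[str]) -> str:
    if node_types & LEGACY_IB_ONLY:
        return "Infiniband (QDR) required - legacy node types present"
    speeds = {ETHERNET_SPEED[n] for n in node_types}
    if len(speeds) == 1:
        return f"Ethernet ({speeds.pop()}) or Infiniband (QDR)"
    # Mixing 10 GbE and 40 GbE nodes needs a switch with breakout cables or an
    # additional line card (see the switch tables below), or Infiniband.
    return "Mixed 10/40 GbE Ethernet (switch-dependent) or Infiniband (QDR)"

print(backend_options({"F800", "H600"}))    # Ethernet (40 GbE) or Infiniband (QDR)
print(backend_options({"H500", "A2000"}))   # Mixed 10/40 GbE Ethernet ... or Infiniband (QDR)
print(backend_options({"H400", "X410"}))    # Infiniband (QDR) required - legacy node types present
```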

 

 

Currently, only Dell EMC Isilon approved switches are supported for backend Ethernet and Infiniband cluster interconnection.

 

  • 40GbE is supported for the F800, H600, and H500 nodes:
    • Celestica D4040 – 32 ports
    • Arista DCS-7308 – 32-64 ports, or 64-144 ports with up to 3 additional line cards

 

| Vendor | Model | Isilon Model Code | Backend Port Qty | Port Type | Rack Units | 40 GbE Nodes | Mixed Nodes (10 & 40 GbE) |
|--------|-------|-------------------|------------------|-----------|------------|--------------|---------------------------|
| Celestica | D4040 | 851-0259 | 32 | All 40 GbE | 1 | Less than 32 | Supports breakout cables: total 96 x 10 GbE nodes |
| Arista | DCS-7308 | 851-0261 | 64 | All 40 GbE | 13 | Greater than 32 and less than 64 (includes two 32-port line cards) | No breakout cables, but supports addition of a 10 GbE line card |
| Arista | | 851-0282 | Leaf upgrade (32 ports) | All 40 GbE | | Greater than 64 and less than 144 (max 3 leaf upgrades) | |

 

 

  • 10GbE is supported for the A200 and A2000 nodes intended for archive workflows:
    • Celestica D2024 – 24 ports
    • Celestica D2060 – 24-48 ports
    • Arista DCS-7304 – 48-96 ports, or 96-144 ports with one additional line card

 

| Vendor | Model | Isilon Model Code | Backend Port Qty | Port Type | Rack Units | All 10 GbE Nodes | Mixed Nodes (10 & 40 GbE) |
|--------|-------|-------------------|------------------|-----------|------------|------------------|---------------------------|
| Celestica | D2024 | 851-0258 | 24 | 24 x 10 GbE, 2 x 40 GbE | 1 | Up to 24 nodes | Not supported |
| Celestica | D2060 | 851-0257 | 48 | 48 x 10 GbE, 6 x 40 GbE | 1 | 24 to 48 nodes | Not supported |
| Arista | DCS-7304 | 851-0260 | 96 | 48 x 10 GbE, 4 x 40 GbE | 8 | 48 to 96 nodes (two 48-port line cards included) | 40 GbE line card can be added |
| Arista | | 851-0283 | Leaf upgrade (48 ports) | | | 96 to 144 nodes (max 1 leaf upgrade) | |
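Pulling the two Ethernet switch tables together, a short Python sketch can shortlist which backend switch options cover a planned node count at a given speed. The port and node figures are taken from the tables above; treat the output as a starting point for a design discussion rather than a substitute for the official sizing guidance, and remember that each cluster requires a redundant pair of backend switches.

```python
# Shortlist backend Ethernet switch options for a planned node count, using
# the port/node figures from the two tables above. A planning sketch only;
# verify against the current Dell EMC Isilon compatibility and sizing guides.
# Note: every cluster needs a redundant PAIR of whichever switch is chosen.

SWITCH_OPTIONS = {
    "40 GbE": [
        ("Celestica D4040", "up to 32 nodes (32 ports)"),
        ("Arista DCS-7308", "33-64 nodes (two 32-port line cards included)"),
        ("Arista DCS-7308 + leaf upgrades", "65-144 nodes (max 3 x 32-port upgrades)"),
    ],
    "10 GbE": [
        ("Celestica D2024", "up to 24 nodes (24 ports)"),
        ("Celestica D2060", "24-48 nodes (48 ports)"),
        ("Arista DCS-7304", "48-96 nodes (two 48-port line cards included)"),
        ("Arista DCS-7304 + leaf upgrade", "96-144 nodes (max 1 x 48-port upgrade)"),
    ],
}

def parse_max(desc: str) -> int:
    """Pull the upper node bound out of a description like '33-64 nodes (...)'."""
    head = desc.split(" nodes")[0].replace("up to ", "")
    return int(head.split("-")[-1])

def shortlist(speed: str, node_count: int) -> list[str]:
    return [name for name, desc in SWITCH_OPTIONS[speed]
            if node_count <= parse_max(desc)]

# Example: 70 performance nodes on a 40 GbE backend.
print(shortlist("40 GbE", 70))   # ['Arista DCS-7308 + leaf upgrades']
```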

 

 

Be aware that the use of patch panels is not supported for Isilon cluster backend connections, regardless of overall cable lengths. All connections must be a single link on a single cable, directly between the node and the backend switch. Also, the backend Ethernet and Infiniband switches must not be reconfigured or used for any traffic beyond a single cluster.


Infiniband Backend

As the cluster grows, cable length limitations can become a challenge. A review of the current rack layout and node location is a great exercise to avoid downtime.


  • To upgrade an Infiniband switch, unplug the IB cable from the switch side first. Be aware that there is power on the cable, and an electrical short or static discharge can fry the IB card. Use of an anti-static wrist strap for grounding is strongly encouraged.
  • It is recommended to upgrade to OneFS 8.0.0.6 or later, which has an enhanced Infiniband backend throttle detection and back-off algorithm.
  • Ensure the IB switch is up to date with the latest firmware.
  • The CELOG events and alerts for a cluster’s Infiniband backend are fairly limited. For large clusters with managed switches, the recommendation is to implement additional SNMP monitoring and health checks for the backend (see the sketch after this list).
  • A pair of redundant backend switches for the exclusive use of a single cluster is a hard requirement. 
  • If the cluster is backed by Intel 12800 IB switches, periodic switch reboots are recommended. 
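As a minimal example of the SNMP monitoring recommendation above, the sketch below polls a managed backend switch for interface operational status using standard IF-MIB objects. It assumes the pysnmp library and a switch with SNMPv2c enabled and a read-only community configured; the hostname, community string, and port indexes are hypothetical, and the checks should be adapted to your switch vendor's MIBs.

```python
# Minimal SNMP health-check sketch for a managed backend switch, assuming
# pysnmp (synchronous hlapi) and SNMPv2c with a read-only community string.
# Polls IF-MIB ifOperStatus for a handful of node-facing ports and reports
# anything that is not 'up' (1). Hostname, community, and ifIndex values are
# placeholders - adapt them to your environment.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

SWITCH_HOST = "be-switch-a.example.com"   # hypothetical backend switch
COMMUNITY = "public"                      # replace with your read community
PORT_INDEXES = [1, 2, 3, 4]               # ifIndex values of node-facing ports

def if_oper_status(host: str, if_index: int) -> int:
    error_indication, error_status, _, var_binds = next(
        getCmd(SnmpEngine(),
               CommunityData(COMMUNITY, mpModel=1),          # SNMPv2c
               UdpTransportTarget((host, 161), timeout=2, retries=1),
               ContextData(),
               ObjectType(ObjectIdentity("IF-MIB", "ifOperStatus", if_index))))
    if error_indication or error_status:
        raise RuntimeError(f"SNMP error for ifIndex {if_index}: "
                           f"{error_indication or error_status}")
    return int(var_binds[0][1])

if __name__ == "__main__":
    for idx in PORT_INDEXES:
        status = if_oper_status(SWITCH_HOST, idx)
        state = "up" if status == 1 else f"NOT UP (status={status})"
        print(f"{SWITCH_HOST} ifIndex {idx}: {state}")
```

A check like this can be scheduled alongside the cluster's own health checks so that a downed backend link is caught before it is compounded by a second failure.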

 

For Infiniband clusters that are anticipated to grow beyond 48 nodes, the current large cluster switches are the 6RU Mellanox SX6506 and the 9RU Mellanox SX6512. The details of these two switches are outlined in the table below:

 

| Vendor | Model | Backend Port Qty | Port Type | Rack Units | Cable Type |
|--------|-------|------------------|-----------|------------|------------|
| Mellanox | SX6506 | 90 | FDR Infiniband | 6RU | QSFP+ copper or fiber |
| Mellanox | SX6512 | 144 | FDR Infiniband | 9RU | QSFP+ copper or fiber |

 

Further information on monitoring, diagnosing and resolving backend Infiniband network issues is available in the Isilon Infiniband troubleshooting guide.