Take the following scenario:
“A six node X410 cluster with 1TB drives has a recommended protection level of +2d:1n. The cluster is already 87% full, so a new node addition is required to increase the capacity. At seven nodes the recommended protection level changes to +3d:1n1d. What’s the most efficient way to do this?”
In essence, should the protection level be changed before or after add the new node?
Since the main objective here is efficiency, limiting the amount of protection and layout work that OneFS has to perform is desirable.
In this case, both node addition and a change in protection level require the cluster’s restriper daemon to run. This entails two long running operations: First, to balance data evenly across the seven node cluster. Then to increase the on-disk data protection.
Say a new node is added first, and then the cluster protection is changed to the recommend level for the new configuration, the process would look like:
1) Add a new node to the cluster
2) Let rebalance finish
3) Configure the data protection level to +3d:1n1d
4) Allow the restriper to complete the re-protection
However, by addressing the protection level change first, all the data restriping can be performed more efficiently and in a single step:
1) Change the protection level setting to +3d:1n1d
2) Add nodes (immediately after changing the protection level)
3) Let rebalance finish
In addition to reducing the amount of work the cluster has to do, this streamlined process also has the benefit of getting data re-protected at the new recommended level more quickly.
OneFS protects and balances data by writing file blocks across multiple drives on different nodes. This process is known as ‘restriping’ in Isilon jargon. The Job Engine defines a restripe exclusion set that contains those jobs which involve file system management, protection and on-disk layout. The restripe set encompasses the following jobs:
Balance free space in a cluster
Scans file system after device failure to ensure all files remain protected
Locate and clear media-level errors from disks
Runs AutoBalance and Collect jobs concurrently
Applies the default file policy (unless SmartPools is activated)
Protect shadow stores
Protects and moves data between tiers of nodes within cluster
Manages OneFS version upgrades
Each of the Job Engine jobs has an associated restripe goal, which can be displayed with the following command:
# isi_gconfig -t job-config | grep restripe_goal
The different restriper functions operate as follows, where each in the path is a superset of the previous:
The following table describes the action and layout goal of each restriper function:
Always restripe using the retune layout goal. Originally intended to optimize layout for performance, but has instead become a synonym for ‘force restripe’.
Attempt to balance utilization between drives etc. Also address all conditions implied by REPROTECT.
Change the protection level to more closely match the policy if the current cluster state allows wider striping or more mirrors. Re-evaluate the disk pool policy and SSD strategy. Also address all conditions implied by REPAIR.
Replaces any references to restripe_from (down/smartfailed) components. Also fixes recovered writes.
Here’s how the various Job Engine jobs (as reported by the isi_gconfig –t job-config command above) align with the four restriper goals:
The retune goal moves the current disk pool to the bottom of the list, increasing the likelihood (but not guaranteeing) that another pool will be selected as the restripe target. This is useful, for example, in the event of a significant drive loss in one of the disk pools that make up the node’s pool (eg. disk pool 4 suffers loss of 2+ drives and it becomes > 90% full). Using a retune goal more ‘quickly’ forces rebalance to the other pools.
So, an efficient approach to the earlier cluster expansion scenario is to change protection and then add the new node. A procedure for this is as follows:
1. Reconfigure the protection level to the recommended setting for the appropriate node pool(s). This can be done from the WebUI by navigating to File System > Storage Pools > SmartPools and editing the appropriate node pool(s):
2. Especially for larger clusters (ie. twenty nodes or more), of if there’s a mix of node hardware generations, it’s helpful to do some prep work upfront prior to adding a node. Prior to adding node(s):
a. Image any new node to the same OneFS version that the cluster is running.
b. Ensure that any new nodes have the correct versions of node and drive firmware, plus any patches that may have been added, before joining to the cluster.
c. If the nodes are from different hardware generations or configuration, ensure that they fit within the Node Compatibility requirements for the cluster’s OneFS version.
3. Set the Job Engine daemon to ‘disable’ an hour or so prior to adding new node(s) to help ensure a clean node join. This can be done with the following command:
# isi services –a isi_job_d disable
4. Add the new node(s) and verify the healthy state of the exapanded cluster:
a. Confirm there are no un-provisioned drives:
# disi –I diskpools ls | grep –i “Unprovisioned drives”
b. Check that the node(s) joined the existing pools
# isi storagepool list
5. Restart the Job Engine:
# isi services –a isi_job_d enable
6. After adding all nodes, the recommendation for a cluster with SSDs is to run AutoBalanceLin with an impact policy of ‘LOW’ or OFF_HOURS. For example:
# isi job jobs start autobalancelin --policy LOW
7. To ensure the restripe is going smoothly, monitor the disk IO (‘DiskIn’ and ‘DiskOut’ counters) using the following command:
# isi statistics system –nall –-oprates –-nohumanize
Monitor the ‘DiskIn’ and ‘DiskOut’ counters. Between 2500-5000 disk IOPS is pretty healthy for nodes containing SSDs.
8. Additionally, cancelling Mediascan and/or MultiScan and pausing FSanalysis will reduce resource contention and allow the AutoBalanceLIN job to complete more efficiently.
# isi job jobs cancel mediascan
# isi job jobs cancel multiscan
# isi job jobs pause fsanalyze
Finally, it’s worth periodically running and reviewing the Isilon Advisor health check report - especially pre and post configuring changes and adding new nodes to the cluster.
The Isilon Advisor diagnostic tool will help verify that the OneFS configuration is as expected and verify there are no cluster issues.