Got the following question from the field recently:
I have a cluster with a primary X410 pool and an archive NL410 pool. There is a nightly job that moves inactive files from primary to archive. However, can I set it up so that when I copy files to a folder they go directly to the NL pool without waiting for the nightly job to run?
The answer to the above is yes, with a couple of caveats.
Since the filepool policy applies to the directory, any new files written to it will automatically inherit the settings from the parent directory. Typically, there is not much variance between the directory and the new file. So, assuming the settings are correct, the file is written straight to the desired pool or tier, with the appropriate protection, etc. This applies to access protocols like NFS and SMB, as well as copy commands like ‘cp’ issued directly from the OneFS command line interface (CLI). However, if the file settings differ from the parent directory, the SmartPools job will correct them and restripe the file. This will happen when the job next runs, rather than at the time of file creation.
By contrast, simply moving a file into the directory (for example, with the UNIX mv command) does not change where its data lives. The file keeps its existing layout until a SmartPools, SetProtectPlus, MultiScan, or AutoBalance job runs to completion. Since each of these jobs can perform a re-layout of data, this is when the file will be re-assigned to the desired NL pool. The file movement can be verified by running the following command from the OneFS CLI:
# isi get -dD <dir>
So the key is whether you’re doing a copy (that is, a new write) or not. As long as you’re doing writes and the parent directory of the destination has the appropriate file pool policy applied, you should get the behavior you want.
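To make the copy-versus-move distinction concrete, here is a minimal Python sketch. It is a toy model, not OneFS code: the `ToyCluster` class, its methods, and the pool names `x410_pool` and `nl410_pool` are all hypothetical, and only the immediate parent directory is consulted for brevity. A new write inherits the destination directory's pool at once, while a rename keeps the file on its old pool until a restriping job (here a single `run_smartpools_job` method standing in for any of the jobs listed above) runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of directory-inherited placement; not a OneFS API."""
    dir_policy: dict[str, str]   # parent directory -> target pool (hypothetical)
    default_pool: str
    files: dict[str, str] = field(default_factory=dict)  # file path -> current pool

    def target_for(self, path: str) -> str:
        parent = path.rsplit("/", 1)[0]   # immediate parent only, for brevity
        return self.dir_policy.get(parent, self.default_pool)

    def write(self, path: str) -> None:
        """A new write (NFS/SMB copy or on-cluster cp) inherits the parent's policy."""
        self.files[path] = self.target_for(path)

    def move(self, src: str, dst: str) -> None:
        """A rename (mv) changes only the namespace; the data stays where it was."""
        self.files[dst] = self.files.pop(src)

    def run_smartpools_job(self) -> None:
        """Re-lay out every file so that it sits on its target pool."""
        for path in list(self.files):
            self.files[path] = self.target_for(path)


cluster = ToyCluster(dir_policy={"/ifs/archive": "nl410_pool"}, default_pool="x410_pool")
cluster.write("/ifs/archive/new.dat")            # copy: lands on nl410_pool immediately
cluster.write("/ifs/work/old.dat")               # lands on x410_pool
cluster.move("/ifs/work/old.dat", "/ifs/archive/old.dat")
print(cluster.files["/ifs/archive/old.dat"])     # x410_pool (still on the old pool)
cluster.run_smartpools_job()
print(cluster.files["/ifs/archive/old.dat"])     # nl410_pool
```

The point of the model is the timing: placement is decided when data is written, so only operations that rewrite data (a copy, or a restriping job) change the pool.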
One thing to note: If the actual operation that is desired is really a move rather than a copy, it may be faster to change the file pool policy and then do a recursive “isi filepool apply --recurse” on the affected files.
There’s negligible difference between using an NFS or SMB client and performing the copy on-cluster via the OneFS CLI. As mentioned above, using isi filepool apply will be slightly quicker than a straight copy and delete, since the copy is parallelized above the filesystem layer.
Let’s take a quick file pools refresher…
File pools is the SmartPools logic layer, where user-configurable policies govern where data is placed, how it is protected and accessed, and how it moves among the Node Pools and Tiers. This is conceptually similar to storage ILM (information lifecycle management), but does not involve file stubbing or other file system modifications. File Pools allow data to be automatically moved from one type of storage to another within a single cluster to meet performance, space, cost or other requirements, while retaining its data protection settings.
For the scenario above, a File Pool policy can be crafted which dictates that anything written to the path /ifs/path1 is placed directly on the Archive tier.
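As an illustration of what such a path-based policy selects, here is a minimal Python sketch. The `PathPolicy` class and the policy name `archive_path1` are hypothetical and do not reflect OneFS syntax; the policy is assumed to match the directory itself and everything beneath it. Comparing path components (rather than raw string prefixes) keeps a directory such as /ifs/path10 from matching by accident.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PathPolicy:
    """Toy model of a path-based file pool policy; not OneFS syntax."""
    name: str
    path: str          # directory the policy is anchored on
    target_tier: str   # tier that matching files are written to

    def matches(self, file_path: str) -> bool:
        """True if file_path is at or below self.path, compared component-wise."""
        p = PurePosixPath(file_path)
        root = PurePosixPath(self.path)
        return p == root or root in p.parents


archive_path1 = PathPolicy(name="archive_path1", path="/ifs/path1", target_tier="Archive")

for f in ["/ifs/path1/report.pdf", "/ifs/path1/2023/q4/log.txt", "/ifs/path10/other.dat"]:
    print(f, "->", archive_path1.target_tier if archive_path1.matches(f) else "no match")
# /ifs/path1/report.pdf -> Archive
# /ifs/path1/2023/q4/log.txt -> Archive
# /ifs/path10/other.dat -> no match
```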
To simplify management, there are defaults in place for Node Pool and File Pool settings which handle basic data placement, movement, protection and performance. All of these can also be configured via the simple and intuitive UI, delivering deep granularity of control. Also provided are customizable template policies which are optimized for archiving, extra protection, performance and VMware files.
When a SmartPools job runs, data may be moved or undergo a protection or layout change. There are no stubs: the file system itself does the work, so no transparency or data access risks apply.
Data movement is parallelized with the resources of multiple nodes being leveraged for speedy job completion. While a job is in progress all data is completely available to users and applications.
The performance of different nodes can also be augmented with the addition of system cache or Solid State Drives (SSDs). Within a File Pool, SSD ‘Strategies’ can be configured to place a copy of that pool’s metadata, or even some of its data, on SSDs in that pool.
Overall system performance impact can be configured to suit the peaks and lulls of an environment’s workload: both the time or frequency of any SmartPools job and the amount of resources allocated to SmartPools can be changed. For extremely high-utilization environments, a sample File Pool policy can be used to match SmartPools run times to non-peak computing hours. While the resources required to execute SmartPools jobs are low and the defaults work for the vast majority of environments, that extra control can be beneficial when system resources are heavily utilized.
SmartPools file pool policies can be used to broadly control the three principal attributes of a file:
1. Where a file resides.
- Node Pool
2. The file performance profile (I/O optimization setting).
- SmartCache write caching
3. The protection level of a file.
- Parity protected (+1n to +4n, +2d:1n, etc)
- Mirrored (2x – 8x)
A file pool policy is built on a file attribute that the policy can match on. The attributes a file pool policy can use are any of: File Name, Path, File Type, File Size, Modified Time, Create Time, Metadata Change Time, Access Time or User Attributes.
Once the file attribute is set to select the appropriate files, further settings and actions can be added. For example, if the attribute is File Size, additional settings are available to dictate thresholds (all files bigger than…, smaller than…). Next, actions are applied: move to Node Pool x, set protection level y, and lay out for access setting z.
- File Name: Specifies file criteria based on the file name
- Path: Specifies file criteria based on where the file is stored
- File Type: Specifies file criteria based on the file-system object type
- File Size: Specifies file criteria based on the file size
- Modified Time: Specifies file criteria based on when the file was last modified
- Create Time: Specifies file criteria based on when the file was created
- Metadata Change Time: Specifies file criteria based on when the file metadata was last modified
- Access Time: Specifies file criteria based on when the file was last accessed
- User Attributes: Specifies file criteria based on custom attributes (see below)
‘And’ and ‘Or’ operators allow for the combination of criteria within a single policy for flexible, granular data manipulation.
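The structure described above (attribute criteria with thresholds, combined with ‘And’ and ‘Or’, plus a set of actions) can be sketched in Python as composable predicates. This is a toy model under stated assumptions: `FileInfo`, `FilePoolPolicy`, `Actions` and the helper functions are hypothetical names, only three of the attributes are modelled, and values such as `"+2d:1n"` and `"streaming"` are illustrative labels rather than validated OneFS settings.

```python
import fnmatch
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int       # bytes
    mtime: float    # last-modified time, seconds since the epoch


# A criterion is a predicate over file attributes.
Criterion = Callable[[FileInfo], bool]


def name_matches(pattern: str) -> Criterion:
    return lambda f: fnmatch.fnmatchcase(f.name, pattern)


def larger_than(n: int) -> Criterion:
    return lambda f: f.size > n


def modified_before(days: float, now: float) -> Criterion:
    cutoff = now - days * 86400
    return lambda f: f.mtime < cutoff


def all_of(*criteria: Criterion) -> Criterion:   # 'And'
    return lambda f: all(c(f) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:   # 'Or'
    return lambda f: any(c(f) for c in criteria)


@dataclass(frozen=True)
class Actions:
    node_pool: str    # where matching files should reside
    protection: str   # requested protection level (illustrative label)
    access: str       # I/O optimization setting (illustrative label)


@dataclass(frozen=True)
class FilePoolPolicy:
    criteria: Criterion
    actions: Actions

    def evaluate(self, f: FileInfo) -> Optional[Actions]:
        """Return the actions to apply to f, or None if f does not match."""
        return self.actions if self.criteria(f) else None


now = 1_700_000_000.0  # fixed timestamp so the example is deterministic
archive_media = FilePoolPolicy(
    # (name is *.mov AND size > 1 GiB) OR (not modified for 90 days)
    criteria=any_of(all_of(name_matches("*.mov"), larger_than(1 << 30)),
                    modified_before(90, now)),
    actions=Actions("nl410_pool", "+2d:1n", "streaming"),
)
print(archive_media.evaluate(FileInfo("clip.mov", 2 << 30, now)))
```

Building criteria as small functions keeps each attribute test independent, so ‘And’/‘Or’ nesting of any depth falls out of two combinators.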
As we saw earlier, data governed by File Pool policies that dictate placement based on path typically lands on the correct node pool or tier without a SmartPools job running. Data governed by File Pool policies based on attributes other than path is written to the Disk Pool with the highest available capacity and then, if necessary, moved to match a File Pool policy when the next SmartPools job runs. This ensures that write performance is not sacrificed for initial data placement.
Any data not covered by a File Pool policy is moved to a tier that can be selected as a default for exactly this purpose. If no Disk Pool has been selected for this purpose, SmartPools will default to the Node Pool with the most available capacity.
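The placement rules just described can be summarized in a short Python sketch: path-based policies apply at write time, other policies wait for the next job, and unmatched data falls back to a configured default tier or, failing that, the pool with the most free capacity. All names here (`Policy`, `initial_pool`, `pool_after_job`, the pool names) are hypothetical, each policy is reduced to a predicate on the file path, and "first matching policy wins" is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Policy:
    """Toy file pool policy: a predicate on the file path plus a target pool."""
    matches: Callable[[str], bool]
    target: str
    path_based: bool   # True if the policy selects files by path only


def pool_with_most_free(free_bytes: dict[str, int]) -> str:
    return max(free_bytes, key=free_bytes.__getitem__)


def initial_pool(path: str, policies: Sequence[Policy],
                 free_bytes: dict[str, int]) -> str:
    """Pool a newly written file lands on, before any SmartPools job runs."""
    for p in policies:
        if p.path_based and p.matches(path):
            return p.target                 # path-based placement applies at write time
    return pool_with_most_free(free_bytes)  # other policies wait for the job


def pool_after_job(path: str, policies: Sequence[Policy],
                   free_bytes: dict[str, int],
                   default_tier: Optional[str]) -> str:
    """Pool a file ends up on once the next SmartPools job has run."""
    for p in policies:                      # first matching policy wins (assumption)
        if p.matches(path):
            return p.target
    if default_tier is not None:            # data not covered by any policy
        return default_tier
    return pool_with_most_free(free_bytes)


free = {"x410_pool": 10 * 2**40, "nl410_pool": 50 * 2**40}
policies = [
    Policy(lambda p: p.startswith("/ifs/path1/"), "nl410_pool", path_based=True),
    Policy(lambda p: p.endswith(".tmp"), "x410_pool", path_based=False),
]
print(initial_pool("/ifs/scratch/a.tmp", policies, free))          # nl410_pool (most free)
print(pool_after_job("/ifs/scratch/a.tmp", policies, free, None))  # x410_pool
```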