There have been several questions from the field recently around how to determine file and directory quantity information on an Isilon cluster when InsightIQ analytics data is not available.
While InsightIQ and the associated Filesystem Analyze job (FSA) provide very accurate and rich file count and distribution metrics, sometimes the size of a dataset, cluster load, lack of a license, or the duration of the FSA job makes running it undesirable. In these cases, there are a couple of other approaches that can be used instead.
A job engine job called Lincount scans the filesystem (relatively) quickly and returns the total count of LINs (logical inodes). The LIN count is equivalent to the total file and directory count on a cluster. The job runs at LOW priority by default and is the fastest method of determining object count on OneFS, assuming no other job has run to completion.
To kick off the Lincount job, the following command can be run from the OneFS command line interface (CLI):
# isi job start lincount
The output from this will be along the lines of “Added job [job ID]”.
Note: The number in square brackets is the job ID.
To view results, run the following from the CLI:
# isi job reports view [job ID]
# isi job reports view 52
LinCount phase 1 (2015-06-17T09:33:33)
Elapsed time 1 seconds
Job mode LinCount
LINs traversed 1722
SINs traversed 0
The "LINs traversed" metric indicates that 1722 files and directories were found.
Note: The Lincount job will also include snapshot revisions of LINs in its count.
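When the count needs to be fed into a script or a spreadsheet, the "LINs traversed" line can be pulled out of the report text with standard tools. The following is a minimal sketch rather than an official utility: the function name lins_traversed is hypothetical, and it assumes the report layout shown above, where the label is followed by whitespace and then the count.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "LINs traversed" field
# from job report text supplied on stdin.
# Assumption: the line looks like "LINs traversed <count>", as in the
# sample report above. Exits non-zero if no such line is found.
lins_traversed() {
    awk '/^[[:space:]]*LINs traversed/ { print $NF; found = 1; exit }
         END { if (!found) exit 1 }'
}

# Sample report text copied from the example above.
report='LinCount phase 1 (2015-06-17T09:33:33)
Elapsed time 1 seconds
Job mode LinCount
LINs traversed 1722
SINs traversed 0'

# Prints: 1722
printf '%s\n' "$report" | lins_traversed
```

On a cluster, the same function could be fed directly from the report command, for example "isi job reports view 52 | lins_traversed". Bear in mind that the figure includes snapshot revisions of LINs, as noted above.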
If you don’t wish to run the Lincount job for whatever reason, other recently completed jobs can also be used to determine a rough estimate of the file system’s object count.
For example, looking at the output from a recently run MultiScan job yields:
# isi job reports view 53 -v
MultiScan phase 1 (2015-06-17T11:27:27)
Elapsed time 16 seconds
Rebalance/Bytes 3069554688 bytes (2.859G)
--- snip ---
This approach also distinguishes between files and directories, in addition to LINs, and provides a count for each.
Note: The disk scan phase of Autobalance can over-report LINs. However, the AutobalanceLin job will show an accurate count. If accuracy matters more than other factors, use the Lincount job output instead.
If licensed and configured, SmartQuotas can be leveraged to help determine the file and directory count on a cluster. Quotas have the advantage of directory/subdirectory-level configuration, and advisory quotas don’t have any capacity utilization restrictions or repercussions for cluster users. However, the SmartQuotas job does take some time and resources to run the initial scan and, in the case of heavily nested quotas, can lead to general performance degradation. As such, configuring top-level /ifs quotas is not a recommended practice.
Note: Quotas can be used to calculate a file and directory count that includes snapshot revisions, provided the quota is configured to include snaps in its accounting via the “--snaps=true” configuration option.
So, to summarize, for directory-level granularity (and in the absence of FSA data), using SmartQuotas advisory quotas is the most viable option. However, if a file count for the entire cluster is required, run the Lincount job or view the output from a recently completed MultiScan job.