In a previous article, we took a look at OneFS’ in-line dedupe functionality, the newest component of the in-line data reduction suite. To complement this, OneFS 8.2.1 provides six principal reporting methods for obtaining efficiency information with in-line data reduction:
- Using the ‘isi statistics data-reduction’ CLI command
- Via the ‘isi compression’ CLI command
- Via the ‘isi dedupe’ CLI command and WebUI chart
- From the ‘isi get -O’ CLI command
- Configuring SmartQuotas reporting
- OneFS WebUI Cluster Dashboard
Let's look at each of these in a bit more detail:
1) The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:
# isi statistics data-reduction
Recent Writes (5 mins)              Cluster Data Reduction
Logical data            339.50G     Est. logical data                 1.37T
Zero-removal saved      112.00k
Deduplication saved     432.00k     Dedupe saved                      1.41G
Compression saved       146.64G     Est. compression saved          199.82G
Preprotected physical   192.87G     Est. preprotected physical        1.18T
Protection overhead     157.26G     Est. protection overhead        401.22G
Protected physical      350.13G     Protected physical                1.57T
Deduplication ratio      1.00:1     Est. dedupe ratio                1.00:1
Compression ratio        1.76:1     Est. compression ratio           1.17:1
Data reduction ratio     1.76:1     Est. data reduction ratio        1.17:1
Efficiency ratio         0.97:1     Est. storage efficiency ratio    0.87:1
The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.
The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:
339.50 / 192.87 = 1.76 Or a Data Reduction ratio of 1.76:1
Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:
339.50 / 350.13 = 0.97 Or an Efficiency ratio of 0.97:1
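The two divisions above can be reproduced in a few lines of Python. This is a minimal sketch for checking the arithmetic, not part of OneFS. The function and variable names are illustrative assumptions, and the figures are the ‘Recent Writes’ values from the sample output.

```python
def ratio(numerator_gb: float, denominator_gb: float) -> float:
    """Return numerator / denominator rounded to two decimals, as in the CLI output."""
    return round(numerator_gb / denominator_gb, 2)


# 'Recent Writes' values from the sample output above, in GB.
logical_data = 339.50            # effective
preprotected_physical = 192.87   # usable
protected_physical = 350.13      # raw

data_reduction_ratio = ratio(logical_data, preprotected_physical)
efficiency_ratio = ratio(logical_data, protected_physical)

print(f"Data reduction ratio: {data_reduction_ratio:.2f}:1")
print(f"Efficiency ratio:     {efficiency_ratio:.2f}:1")
```

An efficiency ratio below 1:1, as here, means the protection overhead outweighs the data reduction savings for that sample window.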
2) From the OneFS CLI, the ‘isi compression stats’ command provides the option to either view or list compression statistics. When run in ‘view’ mode, the command returns the compression ratio for both compressed and all writes, plus the percentage of incompressible writes, for a prior five-minute (300 seconds) interval. For example:
# isi compression stats view
stats for 300 seconds at: 2018-12-14 11:30:06 (1544815806)
compression ratio for compressed writes: 1.28:1
compression ratio for all writes: 1.28:1
incompressible data percent: 76.49%
total logical blocks: 2681232
total physical blocks: 2090963
writes for which compression was not attempted: 0.02%
Note that if the ‘incompressible data’ percentage is high in a mixed cluster, there’s a strong likelihood that the majority of writes are going to a non-F810 pool.
The ‘isi compression stats’ CLI command also accepts the ‘list’ argument, which consolidates a series of recent reports into a list of the compression activity across the file system. For example:
# isi compression stats list
Statistic   compression  overall  incompressible  logical    physical  compression
            ratio        ratio    %               blocks     blocks    skip %
1544811740  3.07:1       3.07:1   10.59%          68598      22849     1.05%
1544812340  3.20:1       3.20:1   7.73%           4142       1293      0.00%
1544812640  3.14:1       3.14:1   8.24%           352        112       0.00%
1544812940  2.90:1       2.90:1   9.60%           354        122       0.00%
1544813240  1.29:1       1.29:1   75.23%          10839207   8402380   0.00%
The ‘isi compression stats’ data is used for calculating the right-hand side estimated ‘Cluster Data Reduction’ values in the ‘isi statistics data-reduction’ command described above. It also provides a count of logical and physical blocks and compression ratios, plus the percentage metrics for incompressible and skipped blocks.
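As a sanity check, dividing the logical block count by the physical block count reproduces the reported overall ratio for most of the samples in the ‘list’ output. The first sample differs slightly, so treat this relationship as an observation about this particular output rather than a documented formula. A short Python sketch, with illustrative names:

```python
# (epoch, logical blocks, physical blocks) copied from the sample 'list' output.
samples = [
    (1544812340, 4142, 1293),
    (1544812640, 352, 112),
    (1544812940, 354, 122),
    (1544813240, 10839207, 8402380),
]


def block_ratio(logical_blocks: int, physical_blocks: int) -> float:
    """Logical-to-physical block ratio, rounded to two decimals."""
    return round(logical_blocks / physical_blocks, 2)


ratios = {epoch: block_ratio(logical, physical) for epoch, logical, physical in samples}
for epoch, value in ratios.items():
    print(f"{epoch}  {value:.2f}:1")
```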
The value in the ‘statistic’ column at the left of the table represents the epoch timestamp for each sample. This epoch value can be converted to a human readable form using the ‘date’ CLI command. For example:
# date -d <value>
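Since the ‘date’ syntax for epoch values differs between platforms, the same conversion can also be done in Python. This is a minimal sketch using only the standard library. The value is the timestamp from the ‘isi compression stats view’ example, and the result is expressed in UTC, whereas the cluster reported it in its local time zone.

```python
from datetime import datetime, timezone

epoch = 1544815806  # timestamp from the 'isi compression stats view' example
as_utc = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(as_utc.isoformat())  # 2018-12-14T19:30:06+00:00
```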
3) From the OneFS CLI, the ‘isi dedupe stats’ command provides cluster deduplication data usage and savings statistics, in both logical and physical terms. For example:
# isi dedupe stats
Cluster Physical Size: 86.14T
Cluster Used Size: 4.44T
Logical Size Deduplicated: 218.81G
Logical Saving: 182.56G
Estimated Size Deduplicated: 271.92G
Estimated Physical Saving: 226.88G
In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency. Similarly, the OneFS WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.
Be aware that the deduplication statistics do not include zero block removal savings. Since zero block removal is technically not a result of data deduplication, it is tracked separately, but it is included as part of the overall data reduction ratio.
Note that while OneFS 8.2.1 tracks statistics for how often zero blocks are removed, there is no current method to determine how much logical space is being saved by zero block elimination. Zero block report enhancement is planned for a future OneFS release.
4) In addition to the ‘isi statistics data-reduction’ and ‘isi compression’ commands, OneFS 8.2.1 also adds a ‘-O’ logical overlay flag to the ‘isi get’ CLI utility for viewing a file’s compression details. For example:
# isi get -DDO file1
* Size: 167772160
* PhysicalBlocks: 10314
* LogicalSize: 167772160
Metatree logical blocks:
zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000
The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.
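For scripting purposes, the ‘Metatree logical blocks’ line can be split into per-type counts. The following Python sketch assumes the line keeps the whitespace-separated ‘name=value’ layout shown in the example. The function name is hypothetical and not part of any OneFS tooling.

```python
def parse_block_counts(line: str) -> dict[str, int]:
    """Split whitespace-separated 'name=value' pairs into a dict of integer counts."""
    counts = {}
    for pair in line.split():
        name, _, value = pair.partition("=")
        counts[name] = int(value)
    return counts


# Line copied from the sample 'isi get -DDO' output above.
line = "zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000"
counts = parse_block_counts(line)
print(counts["compressed"])
```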
When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.
A file that for whatever reason cannot be compressed will be reported as such:
5) In OneFS 8.2.1, Isilon SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of compression and deduplication. On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.
Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:
# isi quota quotas list
Type       AppliesTo  Path  Snap  Hard  Soft  Adv  Used     Efficiency
directory  DEFAULT    /ifs  No    -     -     -    2.3247T  1.29 : 1
More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:
# isi quota quotas view /ifs directory
Thresholds Include Overhead: No
Physical(With Overhead): 1.80T
Logical(W/O Overhead): 2.33T
Efficiency(Logical/Physical): 1.29 : 1
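The reported efficiency is simply the logical capacity divided by the physical capacity. The Python sketch below parses the sample output and recomputes the ratio. It assumes the ‘key: value’ layout shown above and that both capacities carry the same ‘T’ unit suffix. Real output may use other units, which this sketch does not handle.

```python
view_output = """\
Thresholds Include Overhead: No
Physical(With Overhead): 1.80T
Logical(W/O Overhead): 2.33T
Efficiency(Logical/Physical): 1.29 : 1
"""

# Split each row at the first ': ' into a field name and its value.
fields = {}
for row in view_output.splitlines():
    key, _, value = row.partition(": ")
    fields[key] = value

# Assumption: both capacities are reported in the same unit (T).
physical_tb = float(fields["Physical(With Overhead)"].rstrip("T"))
logical_tb = float(fields["Logical(W/O Overhead)"].rstrip("T"))

efficiency = round(logical_tb / physical_tb, 2)
print(f"{efficiency:.2f} : 1")
```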
To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:
Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.
The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe. This is in contrast to a history of stats over time, as reported in the ‘isi statistics data-reduction’ CLI command output, described above. As such, the efficiency ratio for the entire quota directory will reflect what is actually there. Note that the quota directory efficiency ratio and other statistics are not yet available via the platform API as of OneFS 8.2.1.
6) In OneFS 8.2.1, the OneFS WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.
Be aware that, while all of the above storage efficiency tools are available on any cluster running OneFS 8.2.1, the in-line compression metrics will only be relevant for clusters containing F810 node pools.
It is challenging to broadly characterize the in-line dedupe performance overhead with any accuracy since it depends on various factors, including the degree of duplication in the data set, whether matches are found against other LINs or SINs, etc. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they also enjoy an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of in-line dedupe. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.