Got a question from the field asking whether a deduplicated file gets reported by and counted against SmartQuotas, and if there’s a performance penalty accessing that deduplicated file.


With OneFS, deduplicated files appear no different from regular files to standard quota policies, regardless of whether the file has been deduplicated by SmartDedupe, OneFS in-line deduplication, or both. This is also true if the file is a clone or has been containerized by OneFS Small File Storage Efficiency (SFSE), both of which also use shadow stores, and also for in-line compression.


However, if the quota accounting is configured for ‘physical size’, which includes data-protection overhead, the additional space used by the shadow store will not be accounted for by the quota.

 

In OneFS 8.2.1, SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of deduplication (plus in-line compression, if available and enabled). On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.


dedupe-quota-1.png


Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:


# isi quota quotas list
Type      AppliesTo  Path          Snap  Hard  Soft  Adv  Used     Efficiency
-----------------------------------------------------------------------------
directory DEFAULT    /ifs          No    -     -     -    2.3247T  1.29 : 1
-----------------------------------------------------------------------------
Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory
                        Path: /ifs
                        Type: directory
                   Snapshots: No
 Thresholds Include Overhead: No
                       Usage
                           Files: 4245818
         Physical(With Overhead): 1.80T
           Logical(W/O Overhead): 2.33T
    Efficiency(Logical/Physical): 1.29 : 1
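
As the field label suggests, the efficiency figure is simply the logical size divided by the physical size. The following Python sketch reproduces that arithmetic from the sample values above. The helper names (parse_size, efficiency_ratio) are hypothetical and not part of OneFS, and the sketch assumes the size suffixes are binary units.

```python
# Minimal sketch: recompute the quota efficiency ratio from the sample output.
# parse_size() and efficiency_ratio() are illustrative helpers, not OneFS APIs.
# Assumption: suffixes k/M/G/T/P denote binary (1024-based) units.

UNIT_POWERS = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_size(text: str) -> float:
    """Convert a size string such as '1.80T' into a number of bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNIT_POWERS:
        return float(text[:-1]) * 1024 ** UNIT_POWERS[suffix]
    return float(text)  # no suffix: value is already in bytes


def efficiency_ratio(logical: str, physical: str) -> float:
    """Efficiency = logical (without overhead) / physical (with overhead)."""
    return parse_size(logical) / parse_size(physical)


ratio = efficiency_ratio("2.33T", "1.80T")
print(f"Efficiency(Logical/Physical): {ratio:.2f} : 1")
```

With the values shown above (2.33T logical, 1.80T physical) this gives the same 1.29 : 1 figure that SmartQuotas reports.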

Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.

 

dedupe-quota-2.png

 

To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe. This is in contrast to a history of stats over time, as reported in the ‘isi statistics data-reduction’ CLI command output, described below. As such, the efficiency ratio for the entire quota directory will reflect what is actually there.


In addition to SmartQuotas, OneFS provides several other reporting methods for obtaining efficiency information about deduplication, and data reduction in general. The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction
Recent Writes (5 mins)              Cluster Data Reduction
----------------------------------  -----------------------------------------
Logical data            339.50G     Est. logical data             1.37T
Zero-removal saved      112.00k
Deduplication saved     432.00k     Dedupe saved                  1.41G
Compression saved       146.64G     Est. compression saved        199.82G
Preprotected physical   192.87G     Est. preprotected physical    1.18T
Protection overhead     157.26G     Est. protection overhead      401.22G
Protected physical      350.13G     Protected physical            1.57T
Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1
Compression ratio       1.76:1      Est. compression ratio        1.17:1
Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1
Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:


339.50 / 192.87 = 1.76    Or a Data Reduction ratio of 1.76:1


Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:


339.50 / 350.13 = 0.97    Or an Efficiency ratio of 0.97:1
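
The two calculations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic using the ‘recent writes’ column from the sample output. The dictionary keys and function names are made up for this example and do not correspond to any OneFS API.

```python
# Minimal sketch of the ratio arithmetic described above.
# Values are the 'Recent Writes' figures from the sample output, in GiB.
# All names here are illustrative, not OneFS identifiers.

recent_writes_gib = {
    "logical_data": 339.50,           # effective
    "preprotected_physical": 192.87,  # usable
    "protected_physical": 350.13,     # raw
}


def data_reduction_ratio(stats: dict) -> float:
    """Logical data divided by preprotected physical."""
    return stats["logical_data"] / stats["preprotected_physical"]


def efficiency_ratio(stats: dict) -> float:
    """Logical data divided by protected physical."""
    return stats["logical_data"] / stats["protected_physical"]


print(f"Data reduction ratio {data_reduction_ratio(recent_writes_gib):.2f}:1")
print(f"Efficiency ratio     {efficiency_ratio(recent_writes_gib):.2f}:1")
```

An efficiency ratio below 1:1, as here, indicates that the protection overhead outweighs the data reduction savings for that window.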


In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the ‘isi dedupe stats’ CLI command output reflects the sum of both in-line dedupe and SmartDedupe efficiency.


# isi dedupe stats
      Cluster Physical Size: 86.14T
          Cluster Used Size: 4.44T
  Logical Size Deduplicated: 218.81G
             Logical Saving: 182.56G
Estimated Size Deduplicated: 271.92G
  Estimated Physical Saving: 226.88G

 

Similarly, the WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.


dedupe-quota-3.png

 

OneFS’ WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

dedupe-quota-4.png

 

SmartDedupe also deduplicates common blocks within the same file, resulting in even better data efficiency.

 

InsightIQ, Isilon’s multi-cluster reporting and trending analytics suite, is also integrated with and able to report in detail on SmartDedupe. This is available from the performance reporting section of IIQ, by selecting “Deduplication” as the “Report Type”. Also, included in the data provided by the File Systems Reporting section, is a report detailing the space savings efficiency delivered by deduplication.


So how does SmartDedupe play with the other storage management and data protection tools in the OneFS portfolio? Let’s take a look:


When deduplicated files are replicated to another Isilon cluster via SyncIQ, or backed up to a tape device, the deduplicated files are inflated (or rehydrated) back to their original size, since they no longer share blocks on the target Isilon cluster. However, once replicated data has landed, SmartDedupe can be run on the target cluster to provide the same space efficiency benefits as on the source.


Shadow stores are not transferred to target clusters or backup devices. Because of this, deduplicated files do not consume less space than non-deduplicated files when they are replicated or backed up. To avoid running out of space on target clusters or tape devices, it is important to verify that the sum of the storage space saved and the storage space consumed does not exceed the available space on the target cluster or tape device. To reduce the amount of storage space consumed on a target Isilon cluster, you can configure deduplication for the target directories of your replication policies. Although this will deduplicate data on the target directory, it will not allow SyncIQ to transfer shadow stores. Deduplication is still performed post-replication, via a deduplication job running on the target cluster.
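
The capacity check described above amounts to adding the space saved by deduplication back onto the space consumed, and comparing the result with the free space on the target. Here is a rough Python sketch of that check. The function names and all input figures are hypothetical and chosen purely for illustration. In practice the inputs would come from the source and target clusters' own reporting.

```python
# Minimal sketch: will a deduplicated data set fit on the target once it is
# rehydrated? All names and input values are illustrative assumptions.

GIB = 1024 ** 3
TIB = 1024 ** 4


def rehydrated_size(consumed_bytes: float, saved_bytes: float) -> float:
    """Space needed once files no longer share blocks (consumed + saved)."""
    return consumed_bytes + saved_bytes


def fits_on_target(consumed_bytes: float, saved_bytes: float,
                   target_free_bytes: float) -> bool:
    """True if the rehydrated data does not exceed the target's free space."""
    return rehydrated_size(consumed_bytes, saved_bytes) <= target_free_bytes


# Hypothetical inputs for illustration only.
source_consumed = 4.44 * TIB
source_saved = 226.88 * GIB
target_free = 5.0 * TIB

needed = rehydrated_size(source_consumed, source_saved)
print(f"Space needed on target: {needed / TIB:.2f} TiB")
print("Fits on target:", fits_on_target(source_consumed, source_saved, target_free))
```

A real check would also want some headroom for protection overhead and data growth on the target, which this sketch leaves out.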


Because files are backed up as if the files were not deduplicated, backup and replication operations are not faster for deduplicated data. You can deduplicate data while the data is being replicated or backed up. It’s also worth noting that OneFS NDMP backup data won’t be deduped unless deduplication is provided by the backup vendor’s DMA software. However, compression is often provided natively by the backup tape or VTL device instead.


SmartDedupe does not deduplicate the data stored in a snapshot. However, snapshots can be created of deduplicated data. If a snapshot is taken of a deduplicated directory, and then the contents of that directory are modified, the shadow stores will be transferred to the snapshot over time. Because of this, more space will be saved on a cluster if deduplication is run prior to enabling snapshots. If deduplication is enabled on a cluster that already has a significant amount of data stored in snapshots, it will take time before the snapshot data is affected by deduplication. Newly created snapshots will contain deduplicated data, but older snapshots will not.


If a snapshot revert is planned, it is good practice to perform it before running a deduplication job. Restoring a snapshot will cause many of the files on the cluster to be overwritten. Any deduplicated files are reverted back to normal files if they are overwritten by a snapshot revert. However, once the snapshot revert is complete, deduplication can be run on the directory again and the resulting space savings will persist on the cluster.


Dedupe is also fully compatible with SmartLock, OneFS’ data retention and compliance product. SmartDedupe delivers storage efficiency for immutable archives and write once, read many (or WORM) protected data sets.

However, OneFS will not deduplicate files that span SmartPools pools or tiers, or that have different protection levels set. This is to avoid potential performance or protection asymmetry which could occur if portions of a file live on different classes of storage.

