There have been a couple of recent questions from the field about how SmartDedupe interacts with other OneFS storage management and data protection features, so it seemed worth touching on the subject in a blog post.
SyncIQ Replication and SmartDedupe
When deduplicated files are replicated to another Isilon cluster via SyncIQ, or backed up to a tape device, they are inflated (or rehydrated) back to their original size, since they no longer share blocks on the target. However, once replicated data has landed, SmartDedupe can be run on the target cluster to provide the same space efficiency benefits as on the source.
Shadow stores are not transferred to target clusters or backup devices. Because of this, deduplicated files do not consume less space than non-deduplicated files when they are replicated or backed up. To avoid running out of space on target clusters or tape devices, it is important to verify that the storage space consumed on the source plus the storage space saved by deduplication (in other words, the full rehydrated size of the data) does not exceed the available space on the target cluster or tape device. To reduce the amount of storage space consumed on a target Isilon cluster, you can configure deduplication for the target directories of your replication policies. Although this will deduplicate data in the target directory, it will not allow SyncIQ to transfer shadow stores. Deduplication is still performed post-replication, via a deduplication job running on the target cluster.
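The space check described above is simple arithmetic, and can be sketched as follows. This is a minimal illustration only: the function names and the byte counts are hypothetical, the inputs would need to come from your own cluster reporting, and the sketch ignores any difference in protection overhead between source and target.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def rehydrated_size(consumed_bytes: int, saved_bytes: int) -> int:
    """Space the data needs once it no longer shares blocks.

    consumed_bytes: space the deduplicated data set occupies on the source.
    saved_bytes:    space that deduplication is saving on the source.
    """
    return consumed_bytes + saved_bytes


def fits_on_target(consumed_bytes: int, saved_bytes: int,
                   target_free_bytes: int) -> bool:
    """True if the rehydrated data set fits in the target's free space."""
    return rehydrated_size(consumed_bytes, saved_bytes) <= target_free_bytes


if __name__ == "__main__":
    # Hypothetical example: 6 TiB consumed on the source, 4 TiB saved by dedupe.
    needed = rehydrated_size(6 * TIB, 4 * TIB)
    print(f"Space needed on target: {needed / TIB:.1f} TiB")
    print("Fits in 8 TiB free: ", fits_on_target(6 * TIB, 4 * TIB, 8 * TIB))
    print("Fits in 12 TiB free:", fits_on_target(6 * TIB, 4 * TIB, 12 * TIB))
```

In this hypothetical case the target needs room for the full 10 TiB, even though the source only consumes 6 TiB, until a deduplication job has run on the target.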
Backup and SmartDedupe
Because files are backed up as if they were not deduplicated, backup and replication operations are not faster for deduplicated data. A deduplication job can, however, run while the data is being replicated or backed up.
Note: OneFS NDMP backup data won’t be deduplicated unless deduplication is provided by the backup vendor’s DMA software. However, compression is often provided natively by the backup tape or VTL device.
Snapshots and SmartDedupe
SmartDedupe will not deduplicate the data stored in a snapshot. However, snapshots can be created of deduplicated data. If a snapshot is taken of a deduplicated directory, and then the contents of that directory are modified, the shadow stores will be transferred to the snapshot over time. Because of this, more space will be saved on a cluster if deduplication is run prior to enabling snapshots.
If deduplication is enabled on a cluster that already has a significant amount of data stored in snapshots, it will take time before the snapshot data is affected by deduplication. Newly created snapshots will contain deduplicated data, but older snapshots will not.
If you plan to revert a snapshot, it is good practice to do so before running a deduplication job. Restoring a snapshot will cause many of the files on the cluster to be overwritten, and any deduplicated files that are overwritten by a snapshot revert become normal files again. However, once the snapshot revert is complete, deduplication can be run on the directory again and the resulting space savings will persist on the cluster.
SmartLock and SmartDedupe
SmartDedupe is also fully compatible with OneFS SmartLock, Isilon’s data retention and compliance product. Dedupe delivers storage efficiency for immutable archives and write once, read many (or WORM) protected data sets.
SmartQuotas and SmartDedupe
OneFS SmartQuotas accounts for deduplicated files as if they consumed both shared and unshared data. From the quota side, deduplicated files appear no different from regular files to standard quota policies. However, if the quota is configured to include data-protection overhead, the additional space used by the shadow store will not be accounted for by the quota.
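To make the accounting difference concrete, here is a small sketch of the model described above. It is an illustration under stated assumptions, not OneFS code: the `DedupedFile` class, both functions, and all sizes are hypothetical, and protection overhead is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable

MIB = 1024 ** 2  # bytes in one mebibyte


@dataclass
class DedupedFile:
    unshared_bytes: int  # data unique to this file
    shared_bytes: int    # data this file references in a shadow store


def quota_usage(files: Iterable[DedupedFile]) -> int:
    """Usage as a standard quota sees it: every file is charged for its
    shared and unshared data, as if it had never been deduplicated."""
    return sum(f.unshared_bytes + f.shared_bytes for f in files)


def physical_usage(files: Iterable[DedupedFile], shadow_store_bytes: int) -> int:
    """Space actually occupied (before protection overhead): each file's
    unique data plus a single copy of the shared data in the shadow store."""
    return sum(f.unshared_bytes for f in files) + shadow_store_bytes


if __name__ == "__main__":
    # Hypothetical example: three 100 MiB files that share the same 80 MiB.
    files = [DedupedFile(20 * MIB, 80 * MIB) for _ in range(3)]
    print("Quota usage (MiB):   ", quota_usage(files) // MIB)
    print("Physical usage (MiB):", physical_usage(files, 80 * MIB) // MIB)
```

Under these assumptions the quota reports the full size of all three files, while the data occupies far less space on disk; the gap between the two numbers is the deduplication saving, which the quota does not see.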
SmartPools and SmartDedupe
SmartDedupe will not deduplicate files that span SmartPools node pools or tiers, or that have different protection levels set. This is to avoid potential performance or protection asymmetry which could occur if portions of a file live on different classes of storage.
InsightIQ and SmartDedupe
InsightIQ, Isilon’s multi-cluster reporting and trending analytics suite, is integrated with SmartDedupe. Included in the data provided by the File System Analytics module is a report detailing the space savings delivered by deduplication.