In a previous article, we took a look at OneFS’ in-line dedupe functionality, the newest component of the in-line data reduction suite. To complement this, OneFS 8.2.1 provides six principal reporting methods for obtaining efficiency information with in-line data reduction:

 

  • Using the ‘isi statistics data-reduction’ CLI command
  • Via the ‘isi compression’ CLI command
  • Via the ‘isi dedupe’ CLI command and WebUI chart
  • From the ‘isi get -O’ CLI command
  • Configuring SmartQuotas reporting
  • OneFS WebUI Cluster Dashboard

 

Let's look at each of these in a bit more detail:

 

1)  The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction

Recent Writes (5 mins)              Cluster Data Reduction

----------------------------------  -----------------------------------------

Logical data            339.50G     Est. logical data             1.37T

Zero-removal saved      112.00k

Deduplication saved     432.00k     Dedupe saved                  1.41G

Compression saved       146.64G     Est. compression saved        199.82G

Preprotected physical   192.87G     Est. preprotected physical    1.18T

Protection overhead     157.26G     Est. protection overhead      401.22G

Protected physical      350.13G     Protected physical            1.57T

Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1

Compression ratio       1.76:1      Est. compression ratio        1.17:1

Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1

Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:

 

339.50 / 192.87 = 1.76        Or a Data Reduction ratio of 1.76:1

 

Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:

 

339.50 / 350.13 = 0.97        Or an Efficiency ratio of 0.97:1
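
The two calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative rather than part of any OneFS tooling, and the inputs are the ‘Recent Writes’ values copied by hand from the sample output.

```python
def data_reduction_ratio(logical, preprotected_physical):
    """Logical (effective) data divided by preprotected physical (usable) data."""
    return logical / preprotected_physical


def efficiency_ratio(logical, protected_physical):
    """Logical (effective) data divided by protected physical (raw) data."""
    return logical / protected_physical


# 'Recent Writes' values from the sample output, all in the same 'G' unit.
logical = 339.50
preprotected_physical = 192.87
protected_physical = 350.13

print(f"Data reduction ratio: {data_reduction_ratio(logical, preprotected_physical):.2f}:1")
print(f"Efficiency ratio:     {efficiency_ratio(logical, protected_physical):.2f}:1")
```

An efficiency ratio below 1:1, as in this sample, simply means that the protection overhead added to the recent writes was larger than the space recovered by data reduction.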


2)  From the OneFS CLI, the ‘isi compression stats’ command provides the option to either view or list compression statistics. When run in ‘view’ mode, the command returns the compression ratio for both compressed and all writes, plus the percentage of incompressible writes, for a prior five-minute (300 seconds) interval. For example:


# isi compression stats view

stats for 300 seconds at: 2018-12-14 11:30:06 (1544815806))

compression ratio for compressed writes:        1.28:1

compression ratio for all writes:               1.28:1

incompressible data percent:                    76.49%

total logical blocks:                           2681232

total physical blocks:                          2090963

writes for which compression was not attempted: 0.02%


Note that if the ‘incompressible data’ percentage is high in a mixed cluster, there’s a strong likelihood that the majority of writes are going to a non-F810 pool.
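
For scripted monitoring, the ‘view’ output can be split into label and value pairs. The sketch below is illustrative only: parse_stats_view is a hypothetical helper, the sample text is pasted from the output above, and the final comparison assumes that the ‘all writes’ ratio corresponds to total logical blocks divided by total physical blocks. The sample figures are consistent with that assumption, but the command output does not state it.

```python
SAMPLE = """\
compression ratio for compressed writes:        1.28:1
compression ratio for all writes:               1.28:1
incompressible data percent:                    76.49%
total logical blocks:                           2681232
total physical blocks:                          2090963
writes for which compression was not attempted: 0.02%
"""


def parse_stats_view(text):
    """Return a {label: value} dict from 'label: value' lines (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as '1.28:1' stay intact.
        label, sep, value = line.partition(":")
        if sep and value.strip():
            stats[label.strip()] = value.strip()
    return stats


stats = parse_stats_view(SAMPLE)
logical_blocks = int(stats["total logical blocks"])
physical_blocks = int(stats["total physical blocks"])

# Assumption: overall ratio = logical blocks / physical blocks.
print(f"logical/physical blocks: {logical_blocks / physical_blocks:.2f}:1")
print(f"reported for all writes: {stats['compression ratio for all writes']}")
```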

The ‘isi compression stats’ CLI command also accepts the ‘list’ argument, which consolidates a series of recent reports into a list of the compression activity across the file system. For example:


# isi compression stats list

Statistic    compression  overall  incompressible  logical   physical  compression
             ratio        ratio    %               blocks    blocks    skip %
1544811740   3.07:1       3.07:1   10.59%          68598     22849     1.05%
1544812340   3.20:1       3.20:1   7.73%           4142      1293      0.00%
1544812640   3.14:1       3.14:1   8.24%           352       112       0.00%
1544812940   2.90:1       2.90:1   9.60%           354       122       0.00%
1544813240   1.29:1       1.29:1   75.23%          10839207  8402380   0.00%


The ‘isi compression stats’ data is used for calculating the right-hand side estimated ‘Cluster Data Reduction’ values in the ‘isi statistics data-reduction’ command described above. It also provides a count of logical and physical blocks and compression ratios, plus the percentage metrics for incompressible and skipped blocks.

The value in the ‘statistic’ column at the left of the table represents the epoch timestamp for each sample. This epoch value can be converted to a human readable form using the ‘date’ CLI command. For example:


# date -d <value>
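
Where the samples are being post-processed away from the cluster, the same conversion can be done in Python. This is a minimal sketch using only the standard library; epoch_to_utc is an illustrative name, the timestamp is the one printed in the ‘view’ example earlier, and UTC is chosen here purely so the result is reproducible.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert an epoch timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(epoch_to_utc(1544815806).isoformat())  # 2018-12-14T19:30:06+00:00
```

The ‘view’ header shows 11:30:06 for this same sample, which is consistent with a cluster clock set eight hours behind UTC.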

 

3)  From the OneFS CLI, the ‘isi dedupe stats’ command provides cluster deduplication data usage and savings statistics, in both logical and physical terms. For example:

 

# isi dedupe stats

      Cluster Physical Size: 86.14T

          Cluster Used Size: 4.44T

  Logical Size Deduplicated: 218.81G

             Logical Saving: 182.56G

Estimated Size Deduplicated: 271.92G

  Estimated Physical Saving: 226.88G

 

In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency. Similarly, the OneFS WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.

 

inline-dedupe2_1.png

 

Be aware that the deduplication statistics do not include zero block removal savings. Since zero block removal is technically not due to data deduplication, it is tracked separately but is included as part of the overall data reduction ratio.

 

Note that while OneFS 8.2.1 tracks statistics for how often zero blocks are removed, there is no current method to determine how much logical space is being saved by zero block elimination. Zero block report enhancement is planned for a future OneFS release.


4)  In addition to the ‘isi statistics data-reduction’ and ‘isi compression’ commands, OneFS 8.2.1 also adds a ‘-O’ logical overlay flag to the ‘isi get’ CLI utility for viewing a file’s compression details. For example:


# isi get -DDO file1

* Size:           167772160

* PhysicalBlocks: 10314

* LogicalSize:    167772160

PROTECTION GROUPS

lbn0: 6+2/2

2,11,589365248:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

2,4,691601408:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

Metatree logical blocks:

zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000

 

The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.
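
To tally these entries across a larger file, the per-entry block counts can be extracted with a regular expression. The sketch below rests on several assumptions: summarize_overlay is a hypothetical helper, the pattern is inferred only from the sample lines above, and entries whose address is ‘0,0,0’ are treated as the sparse blocks of each chunk.

```python
import re

SAMPLE = """\
2,11,589365248:8192[COMPRESSED]#6
0,0,0:8192[COMPRESSED]#10
2,4,691601408:8192[COMPRESSED]#6
0,0,0:8192[COMPRESSED]#10
"""

# address:size[STATE]#count, as inferred from the sample output (assumption).
ENTRY = re.compile(
    r"^(?P<addr>\d+,\d+,\d+):(?P<size>\d+)\[(?P<state>[A-Z]+)\]#(?P<count>\d+)$"
)


def summarize_overlay(text):
    """Count physical and sparse blocks in overlay entries (hypothetical helper)."""
    totals = {"physical": 0, "sparse": 0}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # Skip headers and anything that is not an overlay entry.
        # Assumption: an all-zero address marks the sparse part of a chunk.
        kind = "sparse" if match["addr"] == "0,0,0" else "physical"
        totals[kind] += int(match["count"])
    return totals


totals = summarize_overlay(SAMPLE)
logical = totals["physical"] + totals["sparse"]
print(totals, f"logical-to-physical ratio {logical / totals['physical']:.2f}:1")
```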

When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.

A file that for whatever reason cannot be compressed will be reported as such:


4,6,900382720:8192[INCOMPRESSIBLE]#1

 

5)  In OneFS 8.2.1, Isilon SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of compression and deduplication. On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.

 

inline-dedupe2_2.png

 

Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:

 

# isi quota quotas list

Type      AppliesTo  Path  Snap  Hard  Soft  Adv  Used     Efficiency
-----------------------------------------------------------------------------
directory DEFAULT    /ifs  No    -     -     -    2.3247T  1.29 : 1
-----------------------------------------------------------------------------
Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory

                        Path: /ifs
                        Type: directory
                   Snapshots: No
 Thresholds Include Overhead: No
                       Usage
                           Files: 4245818
         Physical(With Overhead): 1.80T
           Logical(W/O Overhead): 2.33T
    Efficiency(Logical/Physical): 1.29 : 1
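
The efficiency figure can be sanity-checked from the two capacity values in the same output. This is a minimal sketch with the values copied by hand from the example; quota_efficiency is an illustrative name, not a OneFS function.

```python
def quota_efficiency(logical_without_overhead, physical_with_overhead):
    """Efficiency as labelled in the quota view: logical divided by physical."""
    return logical_without_overhead / physical_with_overhead


# Values from the example output, both in 'T' units.
ratio = quota_efficiency(2.33, 1.80)
print(f"Efficiency: {ratio:.2f} : 1")
```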


To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.


inline-dedupe2_3.png

 

The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe. This is in contrast to a history of stats over time, as reported in the ‘isi statistics data-reduction’ CLI command output, described above. As such, the efficiency ratio for the entire quota directory will reflect what is actually there. Note that the quota directory efficiency ratio and other statistics are not yet available via the platform API as of OneFS 8.2.1.

 

6)  In OneFS 8.2.1, the OneFS WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

inline-dedupe2_4.png

 

Be aware that, while all of the above storage efficiency tools are available on any cluster running OneFS 8.2.1, the in-line compression metrics will only be relevant for clusters containing F810 node pools.

 

It is challenging to broadly characterize the in-line dedupe performance overhead with any accuracy, since it depends on various factors including the degree of duplication in the data set and whether matches are found against other LINs or SINs. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they will also enjoy an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of in-line dedupe. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.