
The recent series of articles focused on data reduction spawned several questions around the mechanics of block sharing in OneFS. So it seemed like a good opportunity to explore this in a bit more depth.

Within OneFS, the shadow store is a class of system file that contains blocks which can be referenced by different files, thereby providing a mechanism that allows multiple files to share common data. Shadow stores were first introduced in OneFS 7.0, initially supporting Isilon file clones, and indeed there are many overlaps between cloning and deduplicating files. As we will see, a variant of the shadow store is also used as a container for file packing in OneFS Small File Storage Efficiency (SFSE), often used in archive workflows such as healthcare PACS and DICOM systems.

Architecturally, each shadow store can contain up to 256 blocks, with each block able to be referenced by 32,000 files. If this 32K reference limit is exceeded, a new shadow store is created. Additionally, shadow stores do not reference other shadow stores. All blocks within a shadow store must be either sparse or point at an actual data block. And snapshots of shadow stores are not allowed, since shadow stores have no hard links.
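
As a quick sanity check, assuming the standard 8KB OneFS block size used throughout this article, a single shadow store can therefore address at most 2MB of shared data before a new store is created:

# echo "$((256 * 8192)) bytes"

2097152 bytes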

Shadow stores contain the physical addresses and protection for data blocks, just like normal file data. However, a fundamental difference between a shadow store and a regular file is that the former doesn’t contain all the metadata typically associated with traditional file inodes. In particular, time-based attributes (creation time, modification time, etc.) are explicitly not maintained.

 

Consider the shadow store information for a regular, undeduped file (file.orig):

 

# isi get -DDD file.orig | grep -i shadow

*  Shadow refs:        0

zero=36 shadow=0 ditto=0 prealloc=0 block=28

 

A second copy of this file (file.dup) is then created and then deduplicated:

 

# isi get -DDD file.* | grep -i shadow

*  Shadow refs:        28

zero=36 shadow=28 ditto=0 prealloc=0 block=0

*  Shadow refs:        28

zero=36 shadow=28 ditto=0 prealloc=0 block=0

 

As we can see, the block count of the original file has now become zero and the shadow count for both the original file and its copy is incremented to 28. Additionally, if another file copy is added and deduplicated, the same shadow store info and count is reported for all three files. It’s worth noting that even if the duplicate file(s) are removed, the original file will still retain the shadow store layout.

 

Each shadow store has a unique identifier called a shadow inode number (SIN). But, before we get into more detail, here’s a table of useful terms and their descriptions:

 

  • Inode: Data structure that keeps track of all data and metadata (attributes, metatree blocks, etc.) for files and directories in OneFS.
  • LIN: Logical Inode Number, which uniquely identifies each regular file in the filesystem.
  • LBN: Logical Block Number, which identifies the block offset for each block in a file.
  • IFM Tree or Metatree: Encapsulates the on-disk and in-memory format of the inode. File data blocks are indexed by LBN in the IFM B-tree, or file metatree. This B-tree stores protection group (PG) records keyed by the first LBN. To retrieve the record for a particular LBN, the first key before the requested LBN is read. The retrieved record may or may not contain actual data block pointers.
  • IDI: Isi Data Integrity checksum. IDI checkcodes help avoid data integrity issues which can occur when hardware provides the wrong data, for example. Hence IDI is focused on the path to and from the drive, and checkcodes are implemented per OneFS block.
  • Protection Group (PG): Encompasses the data and redundancy associated with a particular region of file data. The file data space is broken up into sections of 16 x 8KB blocks called stripe units. These correspond to the N in N+M notation; there are N+M stripe units in a protection group.
  • Protection Group Record: Record containing the block addresses for a data stripe. There are five types of PG records: sparse, ditto, classic, shadow, and mixed. The IFM B-tree uses the B-tree flag bits, the record size, and an inline field to identify the five types of records.
  • BSIN: Base Shadow Store, containing cloned or deduped data.
  • CSIN: Container Shadow Store, containing packed data (containers of files).
  • SIN: Shadow Inode Number; a LIN for a shadow store, containing blocks that are referenced by different files. Refers to a shadow store.
  • Shadow Extent: Shadow extents contain a Shadow Inode Number (SIN), an offset, and a count. Shadow extents are not included in the FEC calculation since protection is provided by the shadow store.
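
To put the protection group sizing above in concrete terms: each stripe unit is 16 x 8KB = 128KB, so at the 4+2 protection level shown in the next example a protection group spans four data stripe units (512KB of file data) plus two FEC stripe units:

# echo "$((16 * 8192)) bytes per stripe unit, $((4 * 16 * 8192)) bytes of file data per 4+2 protection group"

131072 bytes per stripe unit, 524288 bytes of file data per 4+2 protection group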

 

 

Blocks in a shadow store are identified with a SIN and LBN (logical block number).

 

# isi get -DD /ifs/data/file.dup | fgrep -A 4 -i "protection group"

PROTECTION GROUPS

       lbn 0: 4+2/2

4000:0001:0067:0009@0#64

               0,0,0:8192#32

 

A SIN is essentially a LIN that is dedicated to a shadow store file, and SINs are allocated from a subset of the LIN range. Just as every standard file is uniquely identified by a LIN, every shadow store is uniquely identified by a SIN. It is easy to tell if you are dealing with a shadow store because the SIN will begin with 4000. For example, in the output above:

4000:0001:0067:0009

 

Correspondingly, in the protection group (PG), shadow references are represented as:

  • SIN
  • Block size
  • LBN
  • Run


The referencing protection group will not contain valid IDI data; this is maintained by the shadow store itself. FEC parity, if required, will be computed assuming a zero block.

When a file references data in a shadow store, it contains meta-tree records that point to the shadow store. This meta-tree record contains a shadow reference, which comprises a SIN and LBN pair that uniquely identifies a block in a shadow store.

A set of extension blocks within the shadow store holds the reference count for each shadow store data block. The reference count for a block is adjusted each time a reference is created or deleted from any other file to that block. If a shadow store block’s reference count drops to zero, it is marked as deleted, and the ShadowStoreDelete job, which runs periodically, deallocates the block.

Be aware that shadow stores are not directly exposed in the filesystem namespace. However, shadow stores and relevant statistics can be viewed using the ‘isi dedupe stats’, ‘isi_sstore list’ and ‘isi_sstore stats’ command line utilities.
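
For example, the utilities above can be invoked as follows (output is omitted here, since its format varies by OneFS release), and the ShadowStoreDelete job can also be started manually if desired:

# isi dedupe stats

# isi_sstore list

# isi_sstore stats

# isi job jobs start ShadowStoreDelete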

Cloning

In OneFS, files can easily be cloned using the ‘cp -c’ command line utility. Shadow store(s) are created during the file cloning process, where the ownership of the data blocks is transferred from the source to the shadow store.


shadow_store_1.png

 

Cloning uses logical references to shadow stores: the source file’s protection group(s) are moved to a shadow store, and that PG is then referenced by both the source file and the destination clone file. After cloning a file, both the source and the destination data blocks refer to an offset in a shadow store. In some instances, data may also be copied directly from the source to the newly created shadow store.

Dedupe

As we have seen in the recent blog articles, shadow stores are also used for both OneFS in-line deduplication and post-process SmartDedupe. The principal difference with dedupe, as compared to cloning, is the process by which duplicate blocks are detected.

 

shadow_store_2.png

 

Since in-line dedupe and SmartDedupe use different hashing algorithms, the indexes for each are not shared directly. However, the work performed by each dedupe solution can be leveraged by the other. For instance, if SmartDedupe writes data to a shadow store, when those blocks are read, the read hashing component of in-line dedupe will see those blocks and index them.

SmartDedupe post-process deduplication is compatible with in-line data reduction and vice versa. In-line compression is able to compress OneFS shadow stores. However, for SmartDedupe to process compressed data, the SmartDedupe job must first decompress it in order to perform deduplication, which incurs additional resource overhead.

Currently, neither SmartDedupe nor in-line dedupe is immediately aware of the duplicate matches that the other finds. Both in-line dedupe and SmartDedupe could dedupe blocks containing the same data to different shadow store locations, but OneFS is unable to consolidate those shadow blocks together. When blocks are read from a shadow store into L1 cache, they are hashed and added into the in-memory index, where they can be used by in-line dedupe.

Unlike SmartDedupe, in-line dedupe can deduplicate a run of consecutive blocks to a single block in a shadow store. The SmartDedupe job, by contrast, has to spend additional effort to ensure that contiguous file blocks are generally stored in adjacent blocks within the shadow store. If not, both read and degraded-read performance may be impacted.

 

Small File Storage Efficiency

A class of specialized shadow stores is also used as containers for storage efficiency, allowing the packing of small files into larger structures that can be FEC protected.

 

shadow_store_3.png

 

These shadow stores differ from regular shadow stores in that they are deployed as single-reference stores. Additionally, container shadow stores are optimized to isolate fragmentation, support tiering, and live in a separate subset of the ID space from regular shadow stores (4080:xxxx:xxxx:xxxx).

The final article in this SmartDedupe series focuses on estimating the potential effectiveness of SmartDedupe on a particular dataset on a cluster.

To complement the actual SmartDedupe job, a dry-run Dedupe Assessment job is also provided to help estimate the amount of space savings that will be seen by running deduplication on a particular directory or set of directories. The dedupe assessment job reports a total potential space saving. The assessment does not differentiate a fresh run from the case where a previous dedupe job has already done some sharing on the files in that directory, nor does it provide incremental differences between instances of the job. Isilon recommends running the assessment job once on a specific directory prior to starting an actual dedupe job on that directory.

The assessment job runs similarly to the actual dedupe job, but uses a separate configuration. It also does not require a product license and can be run prior to purchasing SmartDedupe in order to determine whether deduplication is appropriate for a particular data set or environment. This can be configured from the WebUI by browsing to File System > Deduplication > Settings and adding the desired directory(s) in the ‘Assess Deduplication’ section.

smartdedupe_assess_1.png


Alternatively, the following CLI syntax will achieve the same result:

# isi dedupe settings modify --add-assess-paths /ifs/data

Once the assessment paths are configured, the job can be run from either the CLI or WebUI. For example:

smartdedupe_assess_2.png


Or, from the CLI:

# isi job types list | grep -i assess

DedupeAssessment Yes      LOW 

# isi job jobs start DedupeAssessment

Once the job is running, its progress can be viewed by first listing the jobs to determine its job ID.

# isi job jobs list

ID   Type             State   Impact Pri  Phase  Running Time

---------------------------------------------------------------

919  DedupeAssessment Running Low     6    1/1 -

---------------------------------------------------------------

Total: 1

And then viewing the job ID as follows:

# isi job jobs view 919

ID: 919

Type: DedupeAssessment

State: Running

Impact: Low

Policy: LOW

Pri: 6

Phase: 1/1

       Start Time: 2020-03-21T21:59:26

     Running Time: 35s

Participants: 1, 2, 3

Progress: Iteration 1, scanning files, scanned 61 files, 9 directories, 4343277 blocks, skipped 304 files, sampled 271976 blocks, deduped 0 blocks, with 0 errors and 0 unsuccessful dedupe attempts

Waiting on job ID: -

Description: /ifs/data

The running job can also be controlled and monitored from the WebUI:

smartdedupe_assess_3.png


Under the hood, the dedupe assessment job uses a separate index table from the actual dedupe process. Plus, for the sake of efficiency, the assessment job also samples fewer candidate blocks than the main dedupe job, and obviously does not actually perform deduplication. This means that, often, the assessment will provide a slightly conservative estimate of the actual deduplication efficiency that’s likely to be achieved.

Using the sampling and consolidation statistics, the assessment job provides a report which estimates the total dedupe space savings in bytes. This can be viewed from the CLI using the following syntax:

# isi dedupe reports view 919

    Time: 2020-03-21T22:02:18

  Job ID: 919

Job Type: DedupeAssessment

Reports

        Time: 2020-03-21T22:02:18

     Results:

Dedupe job report:{

    Start time = 2020-Mar-21:21:59:26

    End time = 2020-Mar-21:22:02:15

    Iteration count = 2

    Scanned blocks = 9567123

    Sampled blocks = 383998

    Deduped blocks = 2662717

    Dedupe percent = 27.832

    Created dedupe requests = 134004

    Successful dedupe requests = 134004

Unsuccessful dedupe requests = 0

    Skipped files = 328

    Index entries = 249992

    Index lookup attempts = 249993

    Index lookup hits = 1

}

Elapsed time:                      169 seconds

Aborts:                              0

Errors:                              0

Scanned files:                      69

Directories:                        12

1 path:

/ifs/data

CPU usage:                         max 81% (dev 1), min 0% (dev 2), avg 17%

Virtual memory size:               max 341652K (dev 1), min 297968K (dev 2), avg 312344K

Resident memory size:              max 45552K (dev 1), min 21932K (dev 3), avg 27519K

Read:                              0 ops, 0 bytes (0.0M)

Write:                             4006510 ops, 32752225280 bytes (31235.0M)

Other jobs read:                   0 ops, 0 bytes (0.0M)

Other jobs write:                  41325 ops, 199626240 bytes (190.4M)

Non-JE read:                       1 ops, 8192 bytes (0.0M)

Non-JE write:                      22175 ops, 174069760 bytes (166.0M)

Or from the WebUI, by browsing to Cluster Management > Job Operations > Job Types:

smartdedupe_assess_4.png


As indicated, the assessment report for job 919 in this case identified potential data savings of 27.8% from deduplication.
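
This headline figure can be cross-checked against the raw counters in the report above, since the dedupe percentage is simply the deduped block count divided by the scanned block count:

# awk 'BEGIN {printf "%.3f\n", 2662717 / 9567123 * 100}'

27.832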

Note that the SmartDedupe dry-run estimation job can be run without any licensing requirements, allowing an assessment of the potential space savings that a dataset might yield before making the decision to purchase a license for the full product.

Deduplication is a compromise, as with many things in life. In order to gain increased levels of storage efficiency, additional cluster resources (CPU, memory and disk IO) are utilized to find and execute the sharing of common data blocks.

Another important performance impact consideration with dedupe is the potential for data fragmentation. After deduplication, files that previously enjoyed contiguous on-disk layout will often have chunks spread across less optimal file system regions. This can lead to slightly increased latencies when accessing these files directly from disk, rather than from cache.

To help reduce this risk, SmartDedupe will not share blocks across node pools or data tiers, and will not attempt to deduplicate files smaller than 32KB in size. On the other end of the spectrum, the largest contiguous region that will be matched is 4MB.

Because deduplication is a data efficiency product rather than a performance enhancing tool, in most cases the consideration will be around cluster impact management. This applies both to client data access performance, since, by design, multiple files will be sharing common data blocks, and to dedupe job execution, as additional cluster resources are consumed to detect and share commonality.

The first deduplication job run will often take a substantial amount of time to run, since it must scan all files under the specified directories to generate the initial index and then create the appropriate shadow stores. However, deduplication job performance will typically improve significantly on the second and subsequent job runs (incrementals), once the initial index and the bulk of the shadow stores have already been created.

If incremental deduplication jobs do take a long time to complete, this is most likely indicative of a data set with a high rate of change. If a deduplication job is paused or interrupted, it will automatically resume the scanning process from where it left off.

As mentioned previously, deduplication is a long running process that involves multiple job phases that are run iteratively. SmartDedupe typically processes around 1TB of data per day, per node.

Deduplication can significantly increase the storage efficiency of data. However, the actual space savings will vary depending on the specific attributes of the data itself. As mentioned above, the deduplication assessment job can be run to help predict the likely space savings that deduplication would provide on a given data set.

For example, virtual machines files often contain duplicate data, much of which is rarely modified. Deduplicating similar OS type virtual machine images (VMware VMDK files, etc, that have been block-aligned) can significantly decrease the amount of storage space consumed. However, as noted previously, the potential for performance degradation as a result of block sharing and fragmentation should be carefully considered first.

Isilon SmartDedupe does not deduplicate across files that have different protection settings. For example, if two files share blocks, but file1 is parity protected at +2:1, and file2 has its protection set at +3, SmartDedupe will not attempt to deduplicate them. This ensures that all files and their constituent blocks are protected as configured. Additionally, SmartDedupe won’t deduplicate files that are stored on different node pools. For example, if file1 and file2 are stored on tier 1 and tier 2 respectively, and tier1 and tier2 are both protected at 2:1, OneFS won’t deduplicate them. This helps guard against performance asynchronicity, where some of a file’s blocks could live on a different tier, or class of storage, than others.

OneFS performance resource management provides statistics for the resources used by jobs - both cluster-wide and per-node. This information is provided via the ‘isi statistics workload’ CLI command. Available in a ‘top’ format, this command displays the top jobs and processes, and periodically updates the information.

For example, the following syntax shows, and indefinitely refreshes, the top five processes on a cluster:


# isi statistics workload --limit 5 --format=top

last update:  2019-01-23T16:45:25 (s)ort: default

CPU  Reads Writes    L2   L3   Node SystemName      JobType

  1. 1.4s 9.1k 0.0        3.5k 497.0 2    Job:  237       IntegrityScan[0]
  2. 1.2s 85.7 714.7      4.9k 0.0  1    Job:  238       Dedupe[0]
  3. 1.2s 9.5k 0.0        3.5k 48.5 1    Job:  237       IntegrityScan[0]
  4. 1.2s 7.4k 541.3      4.9k 0.0  3    Job: 238        Dedupe[0]
  5. 1.1s 7.9k 0.0        3.5k 41.6 2    Job:  237       IntegrityScan[0]

    

From the output, we can see that two Job Engine jobs are in progress: Dedupe (job ID 238), which runs at low impact and priority level 4, is contending with IntegrityScan (job ID 237), which runs by default at medium impact and priority level 1.

The resource statistics tracked per job, per job phase, and per node include CPU, reads, writes, and L2 & L3 cache hits. Unlike the output from the standard ‘top’ command, this per-job breakdown makes it easier to diagnose individual job resource issues.

Below are some examples of typical space reclamation levels that have been achieved with SmartDedupe.

Be aware that these dedupe space savings values are provided solely as rough guidance. Since no two data sets are alike (unless they’re replicated), actual results can vary considerably from these examples.

 

Workflow / Data Type                Typical Space Savings
Virtual Machine Data                35%
Home Directories / File Shares      25%
Email Archive                       20%
Engineering Source Code             15%
Media Files                         10%

 

SmartDedupe is included as a core component of Isilon OneFS but requires a valid product license key in order to activate. This license key can be purchased through your Isilon account team. An unlicensed cluster will show a SmartDedupe warning until a valid product license has been purchased and applied to the cluster.

License keys can be easily added via the ‘Activate License’ section of the OneFS WebUI, accessed by navigating via Cluster Management > Licensing.

For optimal cluster performance, observing the following SmartDedupe best practices is recommended.


  • Deduplication is most effective when applied to data sets with a low rate of change – for example, archived data.
  • Enable SmartDedupe to run at subdirectory level(s) below /ifs.
  • Avoid adding more than ten subdirectory paths to the SmartDedupe configuration policy.
  • SmartDedupe is ideal for home directories, departmental file shares and warm and cold archive data sets.
  • Run SmartDedupe against a smaller sample data set first to evaluate performance impact versus space efficiency.
  • Schedule deduplication to run during the cluster’s low usage hours – i.e. overnight, weekends, etc.
  • After the initial dedupe job has completed, schedule incremental dedupe jobs to run every two weeks or so, depending on the size and rate of change of the dataset.
  • Always run SmartDedupe with the default ‘low’ impact Job Engine policy.
  • Run the dedupe assessment job on a single root directory at a time. If multiple directory paths are assessed in the same job, you will not be able to determine which directory should be deduplicated.
  • When replicating deduplicated data, to avoid running out of space on the target, it is important to verify that the logical data size (i.e. the amount of storage space saved plus the actual storage space consumed) does not exceed the total available space on the target cluster (see the quick check after this list).
  • Run a deduplication job on an appropriate data set prior to enabling a snapshots schedule.
  • Where possible, perform any snapshot restores (reverts) before running a deduplication job. And run a dedupe job directly after restoring a prior snapshot version.
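
As a quick illustration of the replication sizing check above (the capacity figures here are purely illustrative, loosely modeled on the ‘isi dedupe stats’ output shown later in this article), the logical data size is simply the consumed capacity plus the reported dedupe saving, and it must fit within the target cluster’s available space:

# used=236 saved=25 target_free=300

# echo "logical size: $((used + saved)) TB versus ${target_free} TB available on the target"

logical size: 261 TB versus 300 TB available on the target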


With dedupe, there’s always a trade-off between cluster resource consumption (CPU, memory, disk), the potential for data fragmentation, and the benefit of increased space efficiency. Therefore, SmartDedupe is not ideally suited for heavily trafficked data or high performance workloads.


  • Depending on an application’s I/O profile and the effect of deduplication on the data layout, read and write performance and overall space savings can vary considerably.
  • SmartDedupe will not permit block sharing across different hardware types or node pools to reduce the risk of performance asymmetry.
  • SmartDedupe will not share blocks across files with different protection policies applied.
  • OneFS metadata, including the deduplication index, is not deduplicated.
  • Deduplication is a long running process that involves multiple job phases that are run iteratively.
  • SmartDedupe will not attempt to deduplicate files smaller than 32KB in size.
  • Dedupe job performance will typically improve significantly on the second and subsequent job runs, once the initial index and the bulk of the shadow stores have already been created.
  • SmartDedupe will not deduplicate the data stored in a snapshot. However, snapshots can certainly be created of deduplicated data.
  • If deduplication is enabled on a cluster that already has a significant amount of data stored in snapshots, it will take time before the snapshot data is affected by deduplication. Newly created snapshots will contain deduplicated data, but older snapshots will not.


SmartDedupe is one of several components of OneFS that enable Isilon to deliver a very high level of raw disk utilization. Another major storage efficiency attribute is the way that OneFS natively manages data protection in the file system. Unlike most file systems that rely on hardware RAID, OneFS protects data at the file level and, using software-based erasure coding, allows most customers to enjoy raw disk space utilization levels in the 80% range or higher. This is in contrast to the industry mean of around 50-60% raw disk capacity utilization. SmartDedupe serves to further extend this storage efficiency headroom, bringing an even more compelling and demonstrable TCO advantage to primary file based storage.

SmartDedupe post-process deduplication is compatible with in-line data reduction and vice versa. In-line compression is able to compress OneFS shadow stores. However, for SmartDedupe to process compressed data, the SmartDedupe job must first decompress it in order to perform deduplication, which incurs additional resource overhead.


Currently, neither SmartDedupe nor in-line dedupe is immediately aware of the duplicate matches that the other finds. Both in-line dedupe and SmartDedupe could dedupe blocks containing the same data to different shadow store locations, but OneFS is unable to consolidate those shadow blocks together. When blocks are read from a shadow store into L1 cache, they are hashed and added into the in-memory index, where they can be used by in-line dedupe.


Inline Dedupe                                               SmartDedupe
Globally enabled                                            Directory tree based
Will process small files                                    Skips files < 32KB (by default)
Will dedupe sequential runs of blocks of same data to       Can only dedupe between files
single blocks
Per-node, non-persistent in-memory index                    Large persistent on-disk index
Can convert copy operations to clone                        Post process only
Opportunistic                                               Exhaustive

Unlike SmartDedupe, in-line dedupe can deduplicate a run of consecutive blocks to a single block in a shadow store. However, typically it is best to avoid running SmartDedupe on node pools that are already utilizing in-line dedupe.

As we saw in the previous article in this series, SmartDedupe operates at the directory level, targeting all files and directories underneath one or more root directories.

SmartDedupe not only deduplicates identical blocks in different files, it also matches and shares identical blocks within a single file. For two or more files to be deduplicated, the following two attributes must be the same:

  • Disk pool policy ID
  • Protection policy

If either of these attributes differs between two or more matching files, their common blocks will not be shared. SmartDedupe also does not deduplicate files that are smaller than 32KB, because the resource consumption overhead outweighs the small storage efficiency benefit.
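
Before running a dedupe job, these attributes can be compared from the CLI using the ‘isi get’ command seen earlier in this article; its -D option adds further layout detail, which on most releases includes the disk pool information (the file names below are purely illustrative):

# isi get /ifs/data/file1 /ifs/data/file2

# isi get -D /ifs/data/file1 | grep -i pool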

There are two principal elements to managing deduplication in OneFS. The first is the configuration of the SmartDedupe process itself. The second involves the scheduling and execution of the Dedupe job. These are both described below.

SmartDedupe works on data sets which are configured at the directory level, targeting all files and directories under each specified root directory. Multiple directory paths can be specified as part of the overall deduplication job configuration and scheduling.

smartdedupe_1.png

Similarly, the dedupe directory paths can also be configured from the CLI via the isi dedupe settings modify command. For example, the following command targets /ifs/data and /ifs/home for deduplication:

# isi dedupe settings modify --paths /ifs/data,/ifs/home

 

Bear in mind that the permissions required to configure and modify deduplication settings are separate from those needed to run a deduplication job. For example, a user’s role must have job engine privileges to run a deduplication job. However, in order to configure and modify dedupe configuration settings, they must have the deduplication role privileges.

SmartDedupe can be run either on-demand (started manually) or via a predefined schedule. This is configured via the cluster management ‘Job Operations’ section of the WebUI.

smartdedupe_2.png

The recommendation is to schedule and run deduplication during off-hours, when the rate of data change on the cluster is low. If clients are continually writing to files, the amount of space saved by deduplication will be minimal because the deduplicated blocks are constantly being removed from the shadow store.

To modify the parameters of the dedupe job itself, run the isi job types modify command. For example, the following command configures the deduplication job to be run every Saturday at 12:00 AM:

# isi job types modify Dedupe --schedule "Every Saturday at 12:00 AM"


For most clusters, after the initial deduplication job has completed, the recommendation is to run an incremental deduplication job once every two weeks.

The amount of disk space currently saved by SmartDedupe can be determined by viewing the cluster capacity usage chart and deduplication reports summary table in the WebUI. The cluster capacity chart and deduplication reports can be found by navigating to File System Management > Deduplication > Summary.

smartdedupe_3.png

In addition to the bar chart and accompanying statistics (above), which graphically represents the data set and space efficiency in actual capacity terms, the dedupe job report overview field also displays the SmartDedupe savings as a percentage.

SmartDedupe space efficiency metrics are also provided via the ‘isi dedupe stats’ CLI command:

# isi dedupe stats

Cluster Physical Size: 676.8841T

Cluster Used Size: 236.3181T

Logical Size Deduplicated: 29.2562T

Logical Saving: 25.5125T

Estimated Size Deduplicated: 42.5774T

Estimated Physical Saving: 37.1290T

 

Since OneFS 8.2.1, SmartQuotas now reports the capacity saving from deduplication, and data reduction in general, as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of compression and deduplication. On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.

smartdedupe_3-1.png

Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:


# isi quota quotas list

Type      AppliesTo  Path           Snap  Hard Soft  Adv  Used Efficiency

-----------------------------------------------------------------------------

directory DEFAULT    /ifs           No -     -     - 2.3247T 1.29 : 1

-----------------------------------------------------------------------------

Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:

 

# isi quota quotas view /ifs directory

                        Path: /ifs

                        Type: directory

                   Snapshots: No

Thresholds Include Overhead: No

                       Usage

                           Files: 4245818

         Physical(With Overhead): 1.80T

           Logical(W/O Overhead): 2.33T

Efficiency(Logical/Physical): 1.29 : 1
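
The reported ratio is simply the logical capacity divided by the physical capacity from the output above:

# awk 'BEGIN {printf "%.2f\n", 2.33 / 1.80}'

1.29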

 

To configure SmartQuotas for data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.

 

smartdedupe_3-2.png

 

The efficiency ratio is a single, point-in-time metric that is calculated per quota directory and includes the combined effect of SmartDedupe plus in-line data reduction. This is in contrast to a history of stats over time, as reported in the ‘isi statistics data-reduction’ CLI command output. As such, the efficiency ratio for the entire quota directory will reflect what is actually there, and the same data is also available via the platform API.


The OneFS WebUI cluster dashboard also now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

smartdedupe_3-3.png

 

The Job Engine parallel execution framework provides comprehensive run time and completion reporting for the deduplication job.

Once the dedupe job has started working on a directory tree, the resulting space savings it achieves can be monitored in real time. While SmartDedupe is underway, job status is available at a glance via the progress column in the active jobs table. This information includes the number of files, directories and blocks that have been scanned, skipped and sampled, and any errors that may have been encountered.

Additional progress information is provided in an Active Job Details status update, which includes an estimated completion percentage based on the number of logical inodes (LINs) that have been counted and processed.

smartdedupe_4.png

 

Once the SmartDedupe job has run to completion, or has been terminated, a full dedupe job report is available. This can be accessed from the WebUI by navigating to Cluster Management > Job Operations > Job Reports, and selecting the ‘View Details’ action button on the desired Dedupe job line item.

smartdedupe_5.png

 

 

The job report contains the following relevant dedupe metrics.

 

 

  • Start time: When the dedupe job started.
  • End time: When the dedupe job finished.
  • Scanned blocks: Total number of blocks scanned under the configured path(s).
  • Sampled blocks: Number of blocks that OneFS created index entries for.
  • Created dedupe requests: Total number of dedupe requests created. A dedupe request is created for each matching pair of data blocks. For example, if three data blocks all match, two requests are created: one request to pair file1 and file2 together, and another to pair file2 and file3 together.
  • Successful dedupe requests: Number of dedupe requests that completed successfully.
  • Failed dedupe requests: Number of dedupe requests that failed. If a dedupe request fails, it does not mean that the job also failed. A deduplication request can fail for any number of reasons; for example, the file might have been modified since it was sampled.
  • Skipped files: Number of files that were not scanned by the deduplication job. The primary reason is that the file has already been scanned and hasn’t been modified since. Another reason for a file to be skipped is if it’s less than 32KB in size; such files are considered too small to provide enough space saving benefit to offset the fragmentation they will cause.
  • Index entries: Number of entries that currently exist in the index.
  • Index lookup attempts: Cumulative total number of lookups performed by prior and current deduplication jobs. A lookup is when the deduplication job attempts to match a block that has been indexed with a block that hasn’t been indexed.
  • Index lookup hits: Total number of lookup hits from earlier deduplication jobs plus the number of hits from this deduplication job. A hit is a match of a sampled block with a block already in the index.

 

smartdedupe_6.png


Dedupe job reports are also available from the CLI via the ‘isi job reports view <job_id>’ command.

From an execution and reporting stance, the Job Engine considers the ‘dedupe’ job to comprise a single process or phase. The Job Engine events list will report that Dedupe Phase 1 has ended and succeeded. This indicates that an entire SmartDedupe job, including all four internal dedupe phases (sampling, duplicate detection, block sharing, and index update), has successfully completed. For example:

# isi job events list --job-type dedupe

Time                Message

------------------------------------------------------

2020-03-01T13:39:32 Dedupe[1955] Running

2020-03-01T13:39:32 Dedupe[1955] Phase 1: begin dedupe

2020-03-01T14:20:32 Dedupe[1955] Phase 1: end dedupe


2020-03-01T14:20:32 Dedupe[1955] Succeeded

 

For deduplication reporting across multiple OneFS clusters, SmartDedupe is also integrated with Isilon’s InsightIQ cluster reporting and analysis product. A report detailing the space savings delivered by deduplication is available via InsightIQ’s File Systems Analytics module.

The last few articles have focused on in-line data reduction. Now, by popular demand and in a related vein, we’ll turn our attention to the next component in OneFS’ storage efficiency tools: SmartDedupe.

 

OneFS SmartDedupe maximizes the storage efficiency of a cluster by decreasing the amount of physical storage required to house an organization’s data. Efficiency is achieved by scanning the on-disk data for identical blocks and then eliminating the duplicates. This approach is commonly referred to as post-process, or asynchronous, deduplication.

 

This is in contrast to in-line dedupe (described in a previous post), which performs deduplication in real time as data is written to the cluster. The following table compares and contrasts some of the key attributes of the two approaches:


In-line Dedupe                                              SmartDedupe
F810 and H5600 node support only                            All node types
Globally enabled                                            Directory tree based
CityHash-128 algorithm                                      SHA-1 algorithm
Will process small files (16KB and up)                      Skips files < 32KB (by default)
Will dedupe sequential runs of blocks of same data to       Can only dedupe between files
single blocks
Per-node, non-persistent in-memory index                    Large persistent on-disk index
Can convert copy operations to clone                        Post process only
Opportunistic                                               Exhaustive

A fundamental difference is in the underlying algorithm each employs: SmartDedupe uses SHA-1 hashing, whereas OneFS in-line deduplication uses 128-bit CityHash. Since they use different hashing algorithms, the indexes for each are not shared directly. However, the work performed by each dedupe solution can be leveraged by the other. For instance, if SmartDedupe writes data to a shadow store, when those blocks are read, the read hashing component of in-line dedupe will see those blocks and index them.


With post-process deduplication, new data is first stored on the storage device and then a subsequent process analyzes the data looking for commonality. This means that the initial file-write or modify performance is not impacted, since no additional computation is required in the write path. As such, SmartDedupe can be run on more capacity-oriented nodes in a lower data tier, whereas in-line data reduction is constrained to the higher performance F810 and H5600 nodes.

 

Currently, neither SmartDedupe nor in-line dedupe is immediately aware of the duplicate matches that the other finds. Both in-line dedupe and SmartDedupe could dedupe blocks containing the same data to different shadow store locations, but OneFS is unable to consolidate those shadow blocks together. When blocks are read from a shadow store into L1 cache, they are hashed and added into the in-memory index, where they can be used by in-line dedupe.

 

Anyway, back to SmartDedupe… In essence, SmartDedupe helps to maximize the storage efficiency of a cluster by decreasing the amount of physical storage required to house any given dataset. Efficiency is achieved by scanning the on-disk data for identical blocks and then eliminating the duplicates. This approach is commonly referred to as post-process, or asynchronous, deduplication.

 

On discovering duplicate blocks, SmartDedupe moves a single copy of those blocks to a special set of files known as shadow stores. During this process, duplicate blocks are removed from the actual files and replaced with pointers to the shadow stores.

With post-process deduplication, new data is first stored on the storage device and then a subsequent process analyzes the data looking for commonality. This means that initial file write or modify performance is not impacted, since no additional computation is required in the write path.

 

Architecturally, SmartDedupe comprises five principal components:

 

  • Deduplication Control Path
  • Deduplication Job
  • Deduplication Engine
  • Shadow Store
  • Deduplication Infrastructure

 

The SmartDedupe job itself is a highly distributed background process that orchestrates deduplication across all the nodes in the cluster. Job control encompasses file system scanning, detection and sharing of matching data blocks, in concert with the Deduplication Engine.


dedup_1-2.png

 

The SmartDedupe control path is the user interface portion, comprising the OneFS WebUI, command line interface and platform API, and is responsible for managing the configuration, scheduling and control of the deduplication job.

 

SmartDedupe works on data sets which are configured at the directory level, targeting all files and directories under each specified root directory. Multiple directory paths can be specified as part of the overall deduplication job configuration and scheduling. By design, the deduplication job will automatically ignore (not deduplicate) the reserved cluster configuration information located under the /ifs/.ifsvar/ directory, and also any file system snapshots.

 

It’s worth noting that the RBAC permissions required to configure and modify the deduplication settings are separate from those needed to actually run a deduplication job. For example, a user’s role must have job engine privileges to run a deduplication job. However, in order to configure and modify dedupe configuration settings, they must have the deduplication role privileges.

 

One of the most fundamental components of OneFS SmartDedupe, and deduplication in general, is ‘fingerprinting’. In this part of the deduplication process, unique digital signatures, or fingerprints, are calculated using the SHA-1 hashing algorithm, one for each 8KB data block in the sampled set.
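
Purely as a conceptual illustration (this is not the actual SmartDedupe code path, and it assumes the stock FreeBSD ‘dd’ and ‘sha1’ utilities are available on the node), the SHA-1 fingerprint of an individual 8KB block, say the third block of a file, could be generated as follows:

# dd if=/ifs/data/file.orig bs=8192 skip=2 count=1 2>/dev/null | sha1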

 

When SmartDedupe runs for the first time, it scans the data set and selectively samples blocks from it, creating the fingerprint index. This index contains a sorted list of the digital fingerprints, or hashes, and their associated blocks. After the index is created, the fingerprints are checked for duplicates. When a match is found, during the sharing phase, a byte-by-byte comparison of the blocks is performed to verify that they are absolutely identical and to ensure there are no hash collisions. Then, if they are determined to be identical, the block’s pointer is updated to the already existing data block and the new, duplicate data block is released.

 

Hash computation and comparison is only utilized during the sampling phase. For the actual block sharing phase, full data comparison is employed. SmartDedupe also operates on the premise of variable length deduplication, where the block matching window is increased to encompass larger runs of contiguous matching blocks.

 

OneFS shadow stores are file system containers that allow data to be stored in a shareable manner. This allows files to contain both physical data and pointers, or references, to shared blocks in shadow stores. Shadow stores were originally introduced into OneFS to support file clones, and there are many overlaps between cloning and deduplicating files. Shadow stores are also used for in-line dedupe and file packing in OneFS SFSE (Small File Storage Efficiency).

 

Shadow stores are similar to regular files, but don’t contain all the metadata typically associated with regular file inodes. In particular, time-based attributes (creation time, modification time, etc.) are explicitly not maintained.

 

For example, consider the shadow store information for a regular, undeduped file (file.orig):

 

# isi get -DDD file.orig | grep -i shadow

*  Shadow refs:        0

         zero=36 shadow=0 ditto=0 prealloc=0 block=28

 

A second copy of this file (file.dup) is then created and then deduplicated:

 

# isi get -DDD file.* | grep -i shadow

*  Shadow refs:        28

         zero=36 shadow=28 ditto=0 prealloc=0 block=0

*  Shadow refs:        28

         zero=36 shadow=28 ditto=0 prealloc=0 block=0

 

As we can see, the block count of the original file has now become zero and the shadow count for both the original file and its copy is incremented to 28. Additionally, if another file copy is added and deduplicated, the same shadow store info and count is reported for all three files.

 

It’s worth noting that, even if duplicate file(s) are removed, the original file still retains the shadow store layout.

 

Each shadow store can contain up to 256 blocks, with each block able to be referenced by 32,000 files. If this 32K reference limit is exceeded, a new shadow store is created. Additionally, shadow stores do not reference other shadow stores. All blocks within a shadow store must be either sparse or point at an actual data block. And snapshots of shadow stores are not allowed, since shadow stores have no hard links.

 

Deduplication is performed in parallel across the cluster by the OneFS Job Engine via a dedicated deduplication job, which distributes worker threads across all nodes. This distributed work allocation model allows SmartDedupe to scale linearly as an Isilon cluster grows and additional nodes are added.

 

The control, impact management, monitoring and reporting of the deduplication job is performed by the Job Engine in a similar manner to other storage management and maintenance jobs on the cluster.

 

dedup_2.png

dedup_3.png


While deduplication can run concurrently with other cluster jobs, only a single instance of the deduplication job, albeit with multiple workers, can run at any one time. Although the overall performance impact on a cluster is relatively small, the deduplication job does consume CPU and memory resources.

 

Architecturally, the deduplication job, and the supporting dedupe infrastructure, comprise the following four phases:

 

dedup_4-1.png

 

Because the SmartDedupe job is typically long running, each of the phases are executed for a set time period, performing as much work as possible before yielding to the next phase. When all four phases have been run, the job returns to the first phase and continues from where it left off. Incremental dedupe job progress tracking is available via the OneFS Job Engine reporting infrastructure.

 

Phase 1 - Sampling

 

In the sampling phase, SmartDedupe performs a tree-walk of the configured data set in order to collect deduplication candidates for each file.

 

dedup_5.png

 

The rationale is that a large percentage of shared blocks can be detected with only a small sample of data blocks represented in the index table. By default, the sampling phase selects one block from every sixteen blocks of a file as a deduplication candidate. For each candidate, a key/value pair consisting of the block’s fingerprint (SHA-1 hash) and file system location (logical inode number and byte offset) is inserted into the index. Once a file has been sampled, it is flagged and won’t be re-scanned until it has been modified. This drastically improves the performance of subsequent deduplication jobs.
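
For example, with 8KB blocks and the default 1-in-16 sampling ratio, a 1GiB file contributes 8,192 candidate blocks to the index out of its 131,072 total blocks:

# echo $(( (1024 * 1024 * 1024 / 8192) / 16 ))

8192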

 

Phase 2 – Duplicate Detection

 

During the duplicate detection phase, the dedupe job scans the index table for fingerprints (or hashes) that match those of the candidate blocks.

 

dedup_6.png

 

If the index entries of two files match, a request entry is generated. In order to improve deduplication efficiency, a request entry also contains pre and post limit information. This describes the number of blocks in front of and behind the matching block within which the block sharing phase should search for a larger matching data chunk, and typically aligns with a OneFS protection group’s boundaries.

 

 

Phase 3 – Block Sharing

 

For the block sharing phase the deduplication job calls into the shadow store library and dedupe infrastructure to perform the sharing of the blocks.

 

dedup_7.png

 

Multiple request entries are consolidated into a single sharing request, which is processed by the block sharing phase, and ultimately results in the deduplication of the common blocks. The file system searches for contiguous matching regions before and after the matching blocks in the sharing request; if any such regions are found, they will also be shared. Blocks are shared by writing the matching data to a common shadow store and creating references from the original files to this shadow store.

 

Phase 4 – Index Update

 

The index table is populated with the sampled and matching block information gathered during the previous three phases. After a file has been scanned by the dedupe job, OneFS may not find any matching blocks in other files on the cluster. Once a number of other files have been scanned, if a file continues to not share any blocks with other files on the cluster, OneFS will remove the index entries for that file. This helps prevent OneFS from wasting cluster resources searching for unlikely matches. SmartDedupe scans each file in the specified data set once, after which the file is marked, preventing subsequent dedupe jobs from re-scanning the file until it has been modified.

 

SmartDedupe post-process deduplication is compatible with in-line data compression and vice versa. In-line compression is able to compress OneFS shadow stores. However, for SmartDedupe to process compressed data, the SmartDedupe job must first decompress it in order to perform deduplication, which incurs additional resource overhead.

For the final article in this in-line data reduction series, we’ll turn our attention to efficiency estimation tools.


Firstly, OneFS includes a dry-run Dedupe Assessment job to help estimate the amount of space savings that will be seen on a dataset. Run against a specific directory or set of directories on a cluster, the dedupe assessment job reports a total potential space savings. The job uses its own separate configuration, does not require a product license, and can be run prior to purchasing F810 or H5600 hardware to determine whether deduplication is appropriate for a particular data set or environment.

 

inline-dedupe4_1.png

 

The dedupe assessment job uses a separate index table from both in-line dedupe and SmartDedupe. For efficiency, the assessment job also samples fewer candidate blocks and does not actually perform deduplication. Using the sampling and consolidation statistics, the job provides a report which estimates the total dedupe space savings in bytes.


inline-dedupe4_2.png


The dedupe assessment job can also be run from the OneFS command line (CLI):


# isi job jobs start DedupeAssessment


Alternatively, in-line deduplication can be enabled in assessment mode:


# isi dedupe inline settings modify --mode assess


Once the job has completed, review the following three metrics from each node:


# sysctl efs.sfm.inline_dedupe.stats.zero_block

# sysctl efs.sfm.inline_dedupe.stats.dedupe_block

# sysctl efs.sfm.inline_dedupe.stats.write_block

 

The formula to calculate the estimated dedupe rate from these statistics is:


dedupe_block / write_block * 100 = dedupe%
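
Assuming the three counters return plain integer values (and that write_block is non-zero), the calculation can be performed directly on a node from the shell. For example:

# awk -v d="$(sysctl -n efs.sfm.inline_dedupe.stats.dedupe_block)" -v w="$(sysctl -n efs.sfm.inline_dedupe.stats.write_block)" 'BEGIN {printf "%.1f%%\n", d / w * 100}'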


Note that the dedupe assessment does not differentiate the case of a fresh run from the case where a previous SmartDedupe job has already performed some sharing on the files in that directory. Isilon recommends that the user should run the assessment job once on a specific directory, since it does not provide incremental differences between instances of the job.


Similarly, the Dell Live Optics Dossier utility can be used to estimate the potential benefits of Isilon’s in-line data compression on a data set. Dossier is available for Windows and has no dependency on an Isilon cluster. This makes it useful for analyzing and estimating efficiency across real data in situ, without the need for copying data onto a cluster. The Dossier tool operates in three phases:


  • Discovery: Users manually browse and select the root folders on the local host to analyze.
  • Collection: Once the paths to the folders have been selected, Dossier will begin walking the file system trees for the target folders. This process will likely take up to several hours for large file systems. Walking the file system has a similar impact to a malware/anti-virus scan in terms of the CPU, memory, and disk resources utilized during the collection. A series of customizable options allow the user to deselect more invasive operations and govern the CPU and memory resources allocated to the Dossier collector.
  • Reporting: Users upload the resulting .dossier file to create a PowerPoint report.


To obtain a Live Optics Dossier report, first download, extract and run the Dossier collector. Local and remote UNC paths can be added for scanning. Ensure you are authenticated to the desired UNC path before adding it to Dossier’s ‘custom paths’ configuration. Be aware that the Dossier compression option only processes the first 64KB of each file to determine its compressibility. Additionally, the default configuration samples only 5% of the dataset, but this is configurable with a slider. Increasing this value improves the accuracy of the estimation report, albeit at the expense of extended job execution time.

 

inline-dedupe4_3.png

 

The compressibility scan executes rapidly, with minimal CPU and memory resource consumption. It also provides thread and memory usage controls, progress reporting, and a scheduling option to allow throttling of scanning during heavy usage windows, etc.


When the scan is complete, a ‘*.dossier’ file is generated. This file is then uploaded to the Live Optics website:

 

inline-dedupe4_4.png

 

Once uploaded and processed, a PowerPoint report is generated in real time and delivered via email.

 

inline-dedupe4_5.png


Compression reports are easy to comprehend. If multiple SMB shares or paths are scanned, a summary is generated at the beginning of the report, followed by the details of each individually selected path.


Live Optics Dossier can be found at URL https://app.liveoptics.com/tools/dossier


Documentation is available at: https://support.liveoptics.com/hc/en-us/articles/229590207-Dossier-User-Guide


When running the Live Optics Dossier tool, keep the following considerations in mind. Dossier does not use exactly the same algorithm as the OneFS hardware in-line compression; it models software compression rather than hardware compression, so actual results will generally be better than the Dossier report suggests. There will be some data for which Dossier overestimates compression, for example files whose first blocks are significantly more compressible than later blocks. The tool is intended to be run against SMB shares on any storage array or DAS, and has no NFS export support.


The Dossier tool can also take a significant amount of time to run against a large data set. By default, it only samples a portion (the first 64KB) of each file, so results can be inaccurate. Dossier only provides the size of the uncompressed and compressed data; it does not provide performance estimates for different compression algorithms. It also doesn’t attempt to compress files with certain known extensions which are generally incompressible.

In the previous article, we took a look at OneFS’ in-line data reduction functionality. To complement this, OneFS 8.2.2 provides six principal reporting methods for obtaining efficiency information with in-line data reduction:

  • The ‘isi statistics data-reduction’ CLI command
  • The ‘isi compression’ CLI command
  • The ‘isi dedupe’ CLI command and WebUI chart
  • The ‘isi get -O’ CLI command
  • SmartQuotas reporting
  • The OneFS WebUI cluster dashboard storage efficiency summary

 

1.     Isi Statistics Data-reduction Command:


The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction

Recent Writes (5 mins)              Cluster Data Reduction

----------------------------------  -----------------------------------------

Logical data            339.50G     Est. logical data             1.37T

Zero-removal saved      112.00k

Deduplication saved     432.00k     Dedupe saved                  1.41G

Compression saved       146.64G     Est. compression saved        199.82G

Preprotected physical   192.87G     Est. preprotected physical    1.18T

Protection overhead     157.26G     Est. protection overhead      401.22G

Protected physical      350.13G     Protected physical            1.57T

Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1

Compression ratio       1.76:1      Est. compression ratio        1.17:1

Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1

Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

 

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:

 

339.50 / 192.87 = 1.76        Or a Data Reduction ratio of 1.76:1

 

Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:

 

339.50 / 350.13 = 0.97        Or an Efficiency ratio of 0.97:1
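For a quick sanity check, the same arithmetic can be reproduced directly from the cluster shell. This is just an illustration using the figures from the output above, with awk doing the rounding:

# awk 'BEGIN { printf "%.2f\n", 339.50 / 192.87 }'

1.76

# awk 'BEGIN { printf "%.2f\n", 339.50 / 350.13 }'

0.97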

 

2.     Isi Compression Stats Command:


From the OneFS CLI, the ‘isi compression stats’ command provides the option to either view or list compression statistics. When run in ‘view’ mode, the command returns the compression ratio for both compressed and all writes, plus the percentage of incompressible writes, for a prior five-minute (300 seconds) interval. For example:

 

# isi compression stats view

stats for 300 seconds at: 2018-12-14 11:30:06 (1544815806)

compression ratio for compressed writes:        1.28:1

compression ratio for all writes:               1.28:1

incompressible data percent:                    76.49%

total logical blocks:                           2681232

total physical blocks:                          2090963

writes for which compression was not attempted: 0.02%

If the ‘incompressible data’ percentage is high in a mixed cluster, there’s a strong likelihood that the majority of writes are going to a non-F810 pool.
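To track just this metric, the relevant line can be pulled straight from the command output. A simple illustration, assuming the field labels shown in the example above:

# isi compression stats view | grep -i 'incompressible data percent'

incompressible data percent:                    76.49%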

The ‘isi compression stats’ CLI command also accepts the ‘list’ argument, which consolidates a series of recent reports into a list of the compression activity across the file system. For example:


# isi compression stats list

Statistic    compression ratio  overall ratio  incompressible %  logical blocks  physical blocks  compression skip %

1544811740   3.07:1             3.07:1         10.59%            68598           22849            1.05%
1544812340   3.20:1             3.20:1         7.73%             4142            1293             0.00%
1544812640   3.14:1             3.14:1         8.24%             352             112              0.00%
1544812940   2.90:1             2.90:1         9.60%             354             122              0.00%
1544813240   1.29:1             1.29:1         75.23%            10839207        8402380          0.00%


The ‘isi compression stats’ data is used for calculating the right-hand side estimated ‘Cluster Data Reduction’ values in the ‘isi statistics data-reduction’ command described above. It also provides a count of logical and physical blocks and compression ratios, plus the percentage metrics for incompressible and skipped blocks.

The value in the ‘statistic’ column at the left of the table represents the epoch timestamp for each sample. This epoch value can be converted to a human readable form using the ‘date’ CLI command. For example:

# date -d 1544813240

Fri Dec 14 11:31:34 PST 2018
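
To convert every sample timestamp in the ‘list’ output in one pass, a simple shell loop along the following lines can be used. This is a rough sketch that assumes the epoch value is the first field of each data row; on some platforms the equivalent flag is ‘date -r’ rather than ‘date -d’:

# for ts in $(isi compression stats list | awk '/^[0-9]/ {print $1}'); do date -d $ts; done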

 

3.     Isi Dedupe Stats Command and WebUI chart:

 

From the OneFS CLI, the ‘isi dedupe stats’ command provides cluster deduplication data usage and savings statistics, in both logical and physical terms. For example:

 

# isi dedupe stats

      Cluster Physical Size: 86.14T

          Cluster Used Size: 4.44T

  Logical Size Deduplicated: 218.81G

             Logical Saving: 182.56G

Estimated Size Deduplicated: 271.92G

  Estimated Physical Saving: 226.88G

 

In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency. Similarly, the OneFS WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.

 

 

 

The deduplication statistics do not include zero block removal savings. Since zero block removal is technically not a form of data deduplication, it is tracked separately, but it is included as part of the overall data reduction ratio.

 

Note that while OneFS 8.2.2 tracks statistics for how often zero blocks are removed, there is no current method to determine how much logical space is being saved by zero block elimination. Zero block report enhancement is planned for a future OneFS release.

 

4.     Isi Get Overlay Statistics:


In addition to the ‘isi statistics data-reduction’ and ‘isi compression’ commands, OneFS 8.2.2 also adds a ‘-O’ logical overlay flag to the ‘isi get’ CLI utility for viewing a file’s compression details. For example:


# isi get -DDO file1

* Size:           167772160

* PhysicalBlocks: 10314

* LogicalSize:    167772160

PROTECTION GROUPS

lbn0: 6+2/2

2,11,589365248:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

2,4,691601408:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

Metatree logical blocks:

zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000

 

The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.
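
For a quick, informal check of how much of a file’s data is stored compressed, the tagged extents in the overlay output can simply be counted. This just greps the output shown above, so the count reflects tagged extent entries rather than individual blocks:

# isi get -DDO file1 | grep -c COMPRESSED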


When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.
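
For an individual file, a quick comparison of the logical size reported by ‘ls -l’ with the on-disk usage reported by ‘du’ gives a similar sanity check; if compression has occurred, ‘du’ should report noticeably less than the file’s logical size:

# ls -l file1

# du -h file1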


A file that for whatever reason cannot be compressed will be reported as such:


4,6,900382720:8192[INCOMPRESSIBLE]#1

 

5.     SmartQuotas Data Reduction Efficiency Reporting

In OneFS 8.2.2, Isilon SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of compression and deduplication. On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.


inline-dedupe2_2.png

 

Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:

 

# isi quota quotas list

Type      AppliesTo  Path           Snap  Hard Soft  Adv  Used Efficiency

-----------------------------------------------------------------------------

directory DEFAULT    /ifs           No -     -     - 2.3247T 1.29 : 1

-----------------------------------------------------------------------------

Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory

                        Path: /ifs

                        Type: directory

                   Snapshots: No

 Thresholds Include Overhead: No

                       Usage

                           Files: 4245818

         Physical(With Overhead): 1.80T

           Logical(W/O Overhead): 2.33T

    Efficiency(Logical/Physical): 1.29 : 1


To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


inline-dedupe2_3.png


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.
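
A comparable tracking quota can also be created from the CLI. The following is a minimal sketch; creating a directory quota with no limits results in an accounting-only quota, but the exact flags for logical-size accounting can vary between OneFS releases, so verify the options against the CLI help for your version:

# isi quota quotas create /ifs directory

# isi quota quotas list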


The efficiency ratio is a single, point-in-time metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe savings. This is in contrast to the history of statistics over time reported in the ‘isi statistics data-reduction’ CLI command output, described above. As such, the efficiency ratio for the entire quota directory reflects what is actually stored there. Note that the quota directory efficiency ratio and other statistics are not yet available via the platform API as of OneFS 8.2.2.

 

6.     OneFS WebUI Cluster Dashboard Storage Efficiency Summary

In OneFS 8.2.2, the OneFS WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can also be accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

inline-dedupe2_4.png

 

Be aware that, while all of the above storage efficiency tools are available on any cluster running OneFS 8.2.2, the in-line compression metrics will only be relevant for clusters containing F810 and/or H5600 node pools.


It is challenging to broadly characterize the in-line dedupe performance overhead with any accuracy, since it is dependent on various factors, including the duplicity of the data set and whether matches are found against other LINs or SINs. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they will also enjoy an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of in-line dedupe. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.

trimbn

OneFS In-line Data Reduction

Posted by trimbn Feb 17, 2020

The new OneFS 8.2.2 release extends the in-line data reduction suite to the H5600 deep-hybrid platform, in addition to the all-flash F810.


inline-dedupe_0-1.png

 

With the H5600, in-line compression and deduplication are both performed in software within each node. This is in contrast to the F810 platform, where each node includes an FPGA hardware adapter that off-loads compression from the CPU. The software compression code path is also used as a fallback in the event of an F810 hardware failure and, in a mixed cluster, for nodes without a hardware offload capability. Both hardware and software compression implementations are DEFLATE compatible.

 

The H5600 deep-hybrid chassis is available in the following storage configurations:

 

Hard Drive Capacity   SSD Capacity      Encryption (SED)   Chassis Capacity (Raw)
10 TB                 3.2TB SSD         No                 800 TB
12 TB                 2 x 3.2TB SSDs    No                 960 TB
10 TB SED             3.2TB SSD         Yes                800 TB

 

Similarly, the F810 all-flash chassis is available with the following storage options:

 

Drive Capacity   Storage Medium            Encryption (SED)   Chassis Capacity (Raw)
3.8 TB           Solid state drive (SSD)   No                 228 TB
7.7 TB           Solid state drive (SSD)   No                 462 TB
15.4 TB          Solid state drive (SSD)   No                 924 TB
15.4 TB SED      Solid state drive (SSD)   Yes                924 TB

 

When in-line data reduction is enabled on a cluster, data from network clients is accepted as is and makes its way through the OneFS write path until it reaches the BSW engine, where it is broken up into individual chunks. The in-line data reduction write path comprises three main phases:

 

inline-dedupe_1.png

 

If both in-line compression and deduplication are enabled on a cluster, zero block removal is performed first, followed by dedupe, and then compression. This order allows each phase to reduce the scope of work for each subsequent phase.

 

The in-line data reduction zero block removal phase detects blocks that contain only zeros and prevents them from being written to disk. This reduces disk space requirements, reduces the amount of work that both in-line deduplication and compression need to perform, and avoids unnecessary writes to SSD, resulting in increased drive longevity.

 

Next in the pipeline is in-line dedupe. While Isilon has offered a native file system deduplication solution for several years, this was always accomplished post-process, by scanning the data after it had been written to disk. With in-line data reduction, deduplication is now performed in real time, as data is written to the cluster. Storage efficiency is achieved by scanning the data for identical blocks as it is received and then eliminating the duplicates.

 

When a duplicate block is discovered, in-line dedupe moves a single copy of the block to a special set of files known as shadow stores. These are file system containers which can contain both physical data and pointers, or references, to shared blocks. Shadow stores are similar to regular files but are hidden from the file system namespace, so cannot be accessed via a pathname. A shadow store typically grows to a maximum size of 2GB, which is around 256K blocks, with each block able to be referenced by 32,000 files. If the reference count limit is reached, a new block is allocated, which may or may not be in the same shadow store. Additionally, shadow stores do not reference other shadow stores. Plus, snapshots of shadow stores are not permitted because the data contained in shadow stores cannot be overwritten.

 

When a client writes a file to a node pool configured for in-line deduplication on a cluster, the write operation is divided up into whole 8KB blocks. Each of these blocks is then hashed and its cryptographic ‘fingerprint’ compared against an in-memory index for a match. At this point, one of the following operations will occur:

 

1.     If a match is discovered with an existing shadow store block, a byte-by-byte comparison is performed. If the comparison is successful, the data is removed from the current write operation and replaced with a shadow reference.

 

2.     When a match is found with another LIN, the data is written to a shadow store instead and replaced with a shadow reference. Next, a work request is generated and queued that includes the location for the new shadow store block, the matching LIN and block, and the data hash. A byte-by-byte data comparison is performed to verify the match and the request is then processed.

 

3.     If no match is found, the data is written to the file natively and the hash for the block is added to the in-memory index.

 

In order for in-line deduplication to be performed on a write operation, the following conditions need to be true:

 

  • In-line dedupe must be globally enabled on the cluster.
  • The current operation is writing data (ie. not a truncate or write zero operation).
  • The ‘no_dedupe’ flag is not set on the file.
  • The file is not a special file type, such as an alternate data stream (ADS) or an EC (endurant cache) file.
  • Write data includes fully overwritten and aligned blocks.
  • The write is not part of a ‘rehydrate’ operation.
  • The file has not been packed (containerized) by SFSE (small file storage efficiency).

 

OneFS in-line deduplication uses the 128-bit CityHash algorithm, which is fast but not cryptographically strong, which is why a byte-by-byte verification is performed before blocks are shared (described below). This contrasts with OneFS’ post-process SmartDedupe, which uses SHA-1 hashing.

 

Each F810 or H5600 node in a cluster with in-line dedupe enabled has its own in-memory hash index that it compares block ‘fingerprints’ against. The index lives in system RAM and is allocated using physically contiguous pages and accessed directly with physical addresses. This avoids the need to traverse virtual memory mappings and does not incur the cost of translation lookaside buffer (TLB) misses, minimizing deduplication performance impact.

 

The maximum size of the hash index is governed by a pair of sysctl settings: one caps the size at 16GB, and the other limits it to 10% of total RAM. The stricter of the two constraints applies. While these settings are configurable, the recommended best practice is to use the default configuration, and any changes should only be made under the supervision of Dell EMC support.
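
As an illustration of how the two limits interact, consider an F810 node with 256GB of RAM (as in the ‘isi_hw_status’ example later in this article): the 10% bound works out to 25.6GB, which exceeds the 16GB ceiling, so the 16GB ceiling applies. A trivial hedged calculation, not an OneFS command:

# awk -v ram=256 'BEGIN { ten = ram * 0.10; cap = 16; printf "%d GB\n", (ten < cap ? ten : cap) }'

16 GB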

 

Since in-line dedupe and SmartDedupe use different hashing algorithms, the indexes for each are not shared directly. However, each dedupe solution can leverage the work performed by the other. For instance, if SmartDedupe writes data to a shadow store, when those blocks are read, the read hashing component of in-line dedupe will see those blocks and index them.

 

When a match is found, in-line dedupe performs a byte-by-byte comparison of each block to be shared to avoid the potential for a hash collision. Data is prefetched prior to the byte-by-byte check and then compared against the L1 cache buffer directly, avoiding unnecessary data copies and adding minimal overhead. Once the matching blocks have been compared and verified as identical, they are then shared by writing the matching data to a common shadow store and creating references from the original files to this shadow store.

 

In-line dedupe samples every whole block written and handles each block independently, so it can aggressively locate duplicate blocks. If a contiguous run of matching blocks is detected, in-line dedupe will merge the results into regions and process them efficiently.

 

In-line dedupe also detects dedupe opportunities from the read path: blocks are hashed as they are read into L1 cache and inserted into the index. If an entry already exists for that hash, in-line dedupe knows there is a block sharing opportunity between the block it just read and the one previously indexed. It combines that information and queues a request to an asynchronous dedupe worker thread. As such, it is possible to deduplicate a data set purely by reading it all. To help mitigate the performance impact, the hashing is performed out-of-band in the prefetch path, rather than in the latency-sensitive read path.

 

Compression, the third and final phase of the in-line data reduction pipeline, occurs as files are written to the cluster via a connected client session. Similarly, files are re-inflated on demand as they are read by clients.

 

Unlike the H5600, the F810 includes an FPGA-based hardware offload engine, residing on the backend PCI-e network adapter, which performs real-time data compression using a proprietary implementation of DEFLATE at the highest level of compression, while incurring minimal to no performance penalty for highly compressible datasets.

 

The compression engine comprises three main components:

 

Engine Component       Description
Search Module          The LZ77 search module analyzes in-line file data chunks for repeated patterns.
Encoding Module        Performs data compression (Huffman encoding) on target chunks.
Decompression Module   Regenerates the original file from the compressed chunks.

 

In addition to dual-port 40Gb Ethernet interfaces, each F810 node’s data reduction off-load adapter contains an FPGA chip, which is dedicated to the compression of data received via client connections to the node. These cards reside in the backend PCI-e slot in each of the four nodes. The two Ethernet ports in each adapter are used for the node’s redundant backend network connectivity.

 

The table below illustrates the relationship between the effective to usable and effective to raw ratios for the F810 and H5600 platforms:


inline-dedupe3_0.png

 

When a file is written to OneFS using in-line data compression, the file’s logical space is divided up into equal sized chunks called compression chunks. Compaction is used to create 128KB compression chunks, with each chunk comprising sixteen 8KB data blocks. This is optimal since 128KB is the same chunk size that OneFS uses for its data protection stripe units, providing simplicity and efficiency, by avoiding the overhead of additional chunk packing.

 

For example, consider the following 128KB chunk:

 

compression_1.png

 

After compression, this chunk is reduced from sixteen to six 8KB blocks in size. This means that this chunk is now physically 48KB in size. OneFS provides a transparent logical overlay to the physical attributes. This overlay describes whether the backing data is compressed or not and which blocks in the chunk are physical or sparse, such that file system consumers are unaffected by compression. As such, the compressed chunk is logically represented as 128KB in size, regardless of its actual physical size. The orange sector in the illustration above represents the trailing, partially filled 8KB block in the chunk. Depending on how each 128KB chunk compresses, the last block may be under-utilized by up to 7KB after compression.
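
Using the figures from this example, the chunk’s footprint and savings can be tallied as a simple worked calculation (illustrative arithmetic only, not OneFS output):

# awk 'BEGIN { logical = 16 * 8; physical = 6 * 8; printf "logical=%dKB physical=%dKB saved=%dKB\n", logical, physical, logical - physical }'

logical=128KB physical=48KB saved=80KB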

 

Efficiency savings must be at least 8KB (one block) for compression to occur; otherwise, that chunk or file will be passed over and remain in its original, uncompressed state. For example, a 16KB file that yields 8KB (one block) of savings would be compressed. Once a file has been compressed, it is then protected with Forward Error Correction (FEC) parity blocks; since fewer data blocks need protecting, fewer FEC blocks are required, providing further overall storage savings.

 

Note that compression chunks will never cross node pools. This avoids the need to de-compress or recompress data to change protection levels, perform recovered writes, or otherwise shift protection-group boundaries.

 

Compression and deduplication can significantly increase the storage efficiency of data. However, the actual space savings will vary depending on the specific attributes of the data itself.

 

Configuration of OneFS in-line data reduction is via the command line interface (CLI), using the ‘isi compression’ and ‘isi dedupe inline’ commands. There are also utilities provided to decompress, or rehydrate, compressed and deduplicated files if necessary. Plus, there are tools for viewing on-disk capacity savings that in-line data reduction has generated.


The ‘isi_hw_status’ CLI command can be used to verify the node type(s) in a cluster. For example:


# isi_hw_status -i | grep Product

Product: F810-4U-Single-256GB-1x1GE-2x40GE SFP+-24TB SSD

 

Since compression configuration is binary, either on or off across a cluster, it can be easily controlled via the OneFS command line interface (CLI). For example, the following syntax will enable compression and verify the configuration:


# isi compression settings view

    Enabled: No

# isi compression settings modify --enabled=True

# isi compression settings view

    Enabled: Yes

 

Be aware that in-line compression is enabled by default on new H5600 and F810 clusters.

 

In a mixed cluster containing other node styles in addition to compression nodes, files will only be stored in a compressed form on the H5600 and F810 node pool(s). Data that is written or tiered to storage pools of other hardware styles is uncompressed on the fly as it moves between pools. A node without in-line compression support can act as an initiator for compressed writes, performing the compression in software, to a compression node pool. However, this may generate significant CPU overhead for lower-powered nodes, such as A-series hardware, and provides only software fallback compression with lower compressibility.

 

While there are no visible userspace changes when files are compressed, the ‘isi get’ CLI command provides a straightforward method to verify whether a file is compressed. If compression has occurred, both the ‘disk usage’ and the ‘physical blocks’ metrics reported by the ‘isi get -DD’ CLI command will be reduced. Additionally, at the bottom of the command’s output, the logical block statistics will report the number of compressed blocks. For example:


Metatree logical blocks:

zero=260814 shadow=0 ditto=0 prealloc=0 block=2 compressed=1328

 

For more detailed information, the –O flag, which displays the logical overlay, can be used with the ‘isi get’ command.

OneFS in-line data compression can be disabled from the CLI with the following syntax:


# isi compression settings modify --enabled=False

# isi compression settings view

    Enabled: No

 

Since in-line deduplication configuration is binary, either on or off across a cluster, it can be easily controlled via the OneFS command line interface (CLI). For example, the following syntax will enable in-line deduplication and verify the configuration:


# isi dedupe inline settings view

    Mode: disabled

    Wait: -

   Local: -


# isi dedupe inline settings modify --mode enabled

# isi dedupe inline settings view

    Mode: enabled

    Wait: -

   Local: -

 

Note that in-line deduplication is disabled by default on new H5600 and F810 clusters.


If deduplication has occurred, both the ‘disk usage’ and the ‘physical blocks’ metrics reported by the ‘isi get -DD’ CLI command will be reduced. Additionally, at the bottom of the command’s output, the logical block statistics will report the number of shadow blocks. For example:


Metatree logical blocks:

zero=260814 shadow=362 ditto=0 prealloc=0 block=2 compressed=0
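
As with compression, a quick grep against the ‘isi get -DD’ output surfaces just the relevant counters. A simple illustration, assuming the field labels shown in the examples above, with ‘file1’ standing in for the file of interest:

# isi get -DD file1 | grep -iE 'physical|shadow'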

 

OneFS in-line data deduplication can be disabled from the CLI with the following syntax:


# isi dedupe inline settings modify --mode disabled

# isi dedupe inline settings view

    Mode: disabled

    Wait: -

   Local: -

 

Additionally, OneFS in-line data deduplication can also be paused from the CLI with the following syntax:


# isi dedupe inline settings modify --mode paused

The next few articles will continue to review the new features and functionality of the OneFS 8.2.2 ‘Beachcomber’ release. Next up, we’ll explore the topic of performance dataset monitoring and the new functionality that 8.2.2 introduces.


As clusters increase in scale and the number of competing workloads place demands on system resources, more visibility is required in order to share cluster resources equitably. OneFS partitioned performance monitoring helps define, monitor and react to performance-related issues on the cluster. This allows storage admins to pinpoint resource consumers, helping to identify rogue workloads, noisy neighbor processes, or users that consume excessive system resources.


Partitioned performance monitoring sees the addition of NFS protocol support in 8.2.2, and can be used to define workloads and view the associated performance statistics – protocols, disk ops, read/write bandwidth, CPU, IOPs, etc. Workload definitions can be quickly and simply configured to include any combination of directories, exports, shares, paths, users, clients and access zones. Customized settings and filters can be crafted to match specific workloads for a dataset that meets the required criteria, and reported statistics are refreshed every 30 seconds. Workload monitoring is also key for show-back and charge-back resource accounting.

 

Workload:  A set of identification metrics and resource consumption metrics. Example: {username:nick, zone_name:System} consumed {cpu:1.2s, bytes_in:10K, bytes_out:20M, …}

Dataset:  A specification of identification metrics to aggregate workloads by, and the workloads collected that match that specification. Example: {username, zone_name}

Filter:  A method for including only workloads that match specific identification metrics. Example: {zone_name:System}

 

Each resource listed below is tracked by certain stages of partitioned performance monitoring to provide statistics within a performance dataset, and for limiting specific workloads.


Resource Name

Definition

First Introduced

CPU Time

Measures CPU utilization. There are two different measures of this at the moment; raw measurements are taken in CPU cycles, but they are normalized to microseconds before aggregation.

OneFS 8.0.1

Reads

A count of blocks read from disk (including SSD). It generally counts 8 KB file blocks, though 512-byte inodes also count as a full block. These are physical blocks, not logical blocks, which doesn’t matter much for reads, but is important when analyzing writes.

OneFS 8.0.1

Writes

A count of blocks written to disk; or more precisely, to the journal. As with reads, 512-byte inode writes are counted as full blocks; for files, 8 KB blocks. Since these are physical blocks, writing to a protected file will count both the logical file data and the protection data.

OneFS 8.0.1

L2 Hits

A count of blocks found in a node’s L2 (Backend RAM) cache on a read attempt, avoiding a read from disk.

OneFS 8.0.1

L3 Hits

A count of blocks found in a node’s L3 (Backend SSD) cache on a read attempt, replacing a read from disk with a read from SSD.

OneFS 8.0.1

Protocol Operations

  • Protocol (smb1,smb2,nfs3, nfs4)
  • NFS in OneFS 8.2.2
  • SMB in OneFS 8.2
  • For SMB 1, this is the number of ops (commands) on the wire with the exception of the NEGOTIATE op.
  • For SMB 2/3 this is the number of chained ops (commands) on the wire, with the exception of the NEGOTIATE op.
  • The counted op for chained ops will always be the first op.
  • SMB NEGOTIATE ops will not be associated with a specific user.

OneFS 8.2.2

Bytes In

A count of the amount of data received by the server from a client, including the application layer headers but not including TCP/IP headers.

OneFS 8.2

Bytes Out

A count of the amount of data sent by the server to a client, including the application layer headers but not including TCP/IP headers.

OneFS 8.2

Read/Write/Other Latency Total

Sum of times taken from start to finish of ops as they run through the system identical to that provided by isi statistics protocol. Specifically, this is the time in between LwSchedWorkCreate and the final LwSchedWorkExecuteStop for the work item. Latencies are split between the three operations types, read/write/other, with a separate resource for each.

Use Read/Write/Other Latency Count to calculate averages

OneFS 8.2

Read/Write/Other Latency Count

Count of times taken from start to finish of ops as they run through the system identical to that provided by isi statistics protocol. Latencies are split between the three operations types, read/write/other, with a separate resource for each.

Used to calculate the average of Read/Write/Other Latency Total

OneFS 8.2

Workload Type

  • Dynamic (or blank) - Top-N tracked workload
  • Pinned - Pinned workload
  • Overaccounted - The sum of all stats that have been counted twice within the same dataset, used so that a workload usage % can be calculated.
  • Excluded - The sum of all stats that do not match the current dataset configuration. This is for workloads that do not have an element specified that is defined in the category, or for workloads in filtered datasets that do not match the filter conditions.
  • Additional - The amount of resources consumed by identifiable workloads not matching any of the above. Principally any workload that has dropped off of the top-n.
  • System - The amount of resources consumed by the kernel.
  • Unknown - The amount of resources that we could not attribute to any workload, principally due to falling off of kernel hashes of limited size.

OneFS 8.2

 

Identification Metrics are the client attributes of a workload interacting with OneFS through Protocol Operations, or System Jobs or Services. They are used to separate each workload into administrator-defined datasets.

Metric Name

Definition

First Introduced

System Name

The system name of a given workload. For services started by isi_mcp/lwsm/isi_daemon this is the service name itself. For protocols this is inherited from the service name. For jobs this is the job id in the form "Job: 123".

OneFS 8.0.1

Job Type + Phase

A short containing the job type as the first n bytes, and the phase as the rest of the bytes. There are translations for job type to name, but not job phase to name.

OneFS 8.0.1

Username

The user as reported by the native token. Translated back to username if possible by IIQ / stat summary view.

OneFS 8.2

Local IP

IP Address, CIDR Subnet or IP Address range of the node serving that workload. CIDR subnet or range will only be output if a pinned workload is configured with that range. There is no overlap between addresses/subnets/ranges for workloads with all other metrics matching.

OneFS 8.2

Remote IP

IP Address, CIDR Subnet or IP Address range of the client causing this workload. CIDR subnet or range will only be output if a pinned workload is configured with that range. There is no overlap between addresses/subnets/ranges for workloads with all other metrics matching.

OneFS 8.2

Protocol

Protocol enumeration index. Translated to string by stat.

  • smb1, smb2
  • nfs3, nfs4

OneFS 8.2 & OneFS 8.2.2

Zone

The zone id of the current workload. If zone id is present all username lookups etc should use that zone, otherwise it should use the default "System" zone. Translation to string performed by InsightIQ / summary view.

OneFS 8.0.1

Group

The group that the current workload belongs to. Translated to string name by InsightIQ / summary view. For any dataset with group defined as an element the primary group will be tracked as a dynamic workload (unless there is a matching pinned workload in which case that will be used instead). If there is a pinned workload/filter with a group specified, the additional groups will also be scanned and tracked. If multiple groups match then stats will be double accounted, and any double accounting will be summed in the "Overaccounted" workload within the category.

OneFS 8.2

IFS Domain

The partitioned performance IFS domain and respective path LIN that a particular file belongs to, determined using the inode. Domains are not tracked using dynamic workloads unless a filter is created with the specified domain. Domains are created/deleted automatically by configuring a pinned workload or specifying a domain in a filter. A file can belong to multiple domains in which case there will be double accounting within the category. As with groups any double accounting will be summed in the "Overaccounted" workload within the category. The path must be resolved from the LIN by InsightIQ or the Summary View.

OneFS 8.2

SMB Share Name

The name of the SMB share that the workload is accessing through, provided by the smb protocol. Also provided at the time of actor loading are the Session ID and Tree ID to improve hashing/dtoken lookup performance within the kernel.

OneFS 8.2

NFS Export ID

The ID of the NFS export that the workload is accessing through, provided by the NFS protocol.

OneFS 8.2.2

Path

Track and report SMB traffic on a specified /ifs directory path. Note that NFS traffic under a monitored path is excluded.

OneFS 8.2.2

 

  So how does this work in practice? From the CLI, the following command syntax can be used to create a standard performance dataset monitor:


# isi performance dataset create --name <name> <metrics>


For example:


# isi performance dataset create --name my_dataset username zone_name


To create a dataset that requires filters, use:


# isi performance dataset create --name <name> <metrics> --filters <filter-metrics>

 

# isi performance dataset create --name my_filtered_dataset username zone_name --filters zone_name


For example, to monitor the NFS exports in access zones:


# isi performance datasets create --name=dataset01 export_id zone_name

# isi statistics workload list --dataset=dataset01


Or, to monitor by username for NFSv3 traffic only:


# isi performance datasets create --name=ds02 username protocol --filters=protocol

# isi performance filters apply ds02 protocol:nfs3

# isi statistics workload list --dataset=ds02


Other performance dataset operation commands include:


# isi performance dataset list

# isi performance dataset view <name|id>

# isi performance dataset modify <name|id> --name <new_name>

# isi performance dataset delete <name|id>

 

A dataset will display the top 1024 workloads by default. Any remainder will be aggregated into a single additional workload.


If you want a workload to always be visible, it can be pinned using the following syntax:


# isi performance workload pin <dataset_name|id> <metric>:<value>


For example:

# isi performance workload pin my_dataset username:nick zone_name:System


Other workload operation commands include:

# isi performance workload list <dataset_name|id>

# isi performance workload view <dataset_name|id> <workload_name|id>

# isi performance workload modify <dataset_name|id> <workload_name|id> --name <new_name>

# isi performance workload unpin <dataset_name|id> <workload_name|id>


Multiple filters can also be applied to the same dataset. A workload will be included if it matches any of the filters. Any workload that doesn’t match a filter will be aggregated into an excluded workload.


The following CLI command syntax can be used to apply a filter:


# isi performance filter apply <dataset_name|id> <metric>:<value>


For example:


# isi performance filter apply my_filtered_dataset zone_name:System


Other filter options include:


# isi performance filter list <dataset_name|id>

# isi performance filter view <dataset_name|id> <filter_name|id>

# isi performance filter modify <dataset_name|id> <filter_name|id> --name <new_name>

# isi performance filter remove <dataset_name|id> <filter_name|id>


The following syntax can be used to enable path tracking. For example, to monitor traffic under /ifs/data:


# isi performance datasets create --name=dataset1 path

# isi performance workloads pin dataset1 path:/ifs/data/


Be aware that NFS traffic under a monitored path is currently not reported. For example:


nfs_partitioned_perf_1.png


Viewing Statistics


# isi statistics workload --dataset <dataset_name|id>

# isi statistics workload --dataset my_dataset

    CPU  BytesIn  BytesOut    Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName    ZoneName  WorkloadType

-----------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4    5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms      nick      System             -
  1.2ms    10.0K     20.0M   56.0   40.0     0.0  0.0  0.0        0.0us         0.0us         0.0us      mary      System        Pinned
 31.4us     15.1      11.7    0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us      nick  Quarantine             -
166.3ms      0.0       0.0    0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -           -      Excluded
 31.6ms      0.0       0.0    0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -           -        System
 70.2us      0.0       0.0    0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -           -       Unknown
  0.0us      0.0       0.0    0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -           -    Additional
  0.0us      0.0       0.0    0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -           - Overaccounted

-----------------------------------------------------------------------------------------------------------------------------------------

Total: 8

The command also accepts the standard statistics flags, such as ‘--numeric’, ‘--sort’, ‘--totalby’, etc.

 

Other useful commands include the following:


To list all available identification metrics:


# isi performance metrics list

# isi performance metrics view <metric>


To view/modify the quantity of top workloads collected per dataset:


# isi performance settings view

# isi performance settings modify <n_top_workloads>

 

To assist with troubleshooting, the validation of the configuration is thorough, and errors are output directly to the CLI. Name lookup failures, for example UID to username mappings, are reported in an additional column in the statistics output. Errors in the kernel are output to /var/log/messages and protocol errors are written to the respective protocol log.


Note that statistics are updated every 30 seconds and, as such, a newly created dataset will not show up in the statistics output until the update has occurred. Similarly, an old dataset may be displayed until the next update occurs.


A dataset with a filtered metric specified but with no filters applied will not output any workloads. Paths and Non-Primary groups are only reported if they are pinned or have a filter applied. Paths and Non-Primary groups may result in work being accounted twice within the same dataset, as they can match multiple workloads. The total amount over-accounted within a dataset is aggregated into the Overaccounted workload.


As mentioned previously, the NFS and SMB protocols are now supported in OneFS 8.2.2. Other primary protocol monitoring support, such as HDFS, will be added in a future release.

 

In addition to protocol stats, OneFS also includes job performance resource monitoring, which provides statistics for the resources used by jobs - both cluster-wide and per-node. Available in a ‘top’ format, this command displays the top jobs and processes, and periodically updates the information.


For example, the following syntax shows, and indefinitely refreshes, the top five processes on a cluster:


# isi statistics workload --limit 5 --format=top

 

last update:  2019-06-19T06:45:25 (s)ort: default

 

CPU   Reads Writes      L2    L3    Node  SystemName        JobType

  1. 1.4s  9.1k  0.0         3.5k  497.0 2     Job:  237         IntegrityScan[0]
  2. 1.2s  85.7  714.7       4.9k  0.0   1     Job:  238         Dedupe[0]
  3. 1.2s  9.5k  0.0         3.5k  48.5  1     Job:  237         IntegrityScan[0]
  4. 1.2s  7.4k  541.3       4.9k  0.0   3     Job:  238         Dedupe[0]
  5. 1.1s  7.9k  0.0         3.5k  41.6  2     Job:  237         IntegrityScan[0] 


The resource statistics tracked per job, per job phase, and per node include CPU, reads, writes, and L2 & L3 cache hits. Unlike the output from the ‘top’ command, this makes it easier to diagnose individual job resource issues, etc.

trimbn

OneFS Large File Support

Posted by trimbn Feb 4, 2020

A couple of weeks back, OneFS 8.2.2 was launched, adding a bevy of new features to the Isilon portfolio. We’ll take a peek at this new functionality over the course of the next few blog articles, kicking it off with large file support.


The largest file size that OneFS currently supports is raised to 16TiB in the new OneFS 8.2.2 release. This is a fourfold increase over previous OneFS versions, up from a maximum of 4TiB in prior releases.

 

16TB_file_1.PNG.png

 

This helps enable additional applications and workloads that typically deal with large files, for example videos & images, seismic analysis workflows, as well as a destination or staging area for backups and large database dumps.


So let’s take a quick look at this new functionality...


Firstly, large file support is available for free. No special license is required to activate large file support in OneFS 8.2.2 and, once enabled, files larger than 4TiB may be written to and/or exist on the system. However, large file support cannot be disabled once enabled.


In order for OneFS 8.2.2 to support files larger than 4TiB, adequate space is required in all of a cluster’s disk pools in order to avoid a potential performance impact. As such, the following requirements must be met in order to enable large file support:

 

 

Version:  A cluster must be running OneFS 8.2.2 in order to enable large file support.

Disk Pool:  A maximum sized file (16TiB) plus protection can consume no more than 10% of any disk pool. This translates to a minimum disk pool size of 160TiB plus protection.

SyncIQ Policy:  All SyncIQ remote clusters must be running OneFS 8.2.2 and also satisfy the restrictions for minimum disk pool size and SyncIQ policies.

 

Note that the above restrictions will be removed in a future release, allowing support for large (>4TiB) file sizes on all cluster configurations.
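
The disk pool requirement follows directly from the 10% rule: the largest supported file (16TiB) divided by 0.10 gives the minimum usable pool capacity, before protection overhead is added. A trivial hedged calculation for illustration only:

# awk 'BEGIN { printf "%.0f TiB\n", 16 / 0.10 }'

160 TiB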

The following procedure can be used to configure a cluster for 16TiB file support:

 

16TB_file_2.PNG.png

 

Once a cluster is happily running OneFS 8.2.2, the ‘isi_large_file -c’ CLI utility will verify that the cluster’s disk pools and existing SyncIQ policies meet the requirements listed above. For example:

 

# isi_large_file -c

Checking cluster compatibility with large file support...

 

NOTE:

Isilon requires ALL clusters in your data-center that are part of

any SyncIQ relationship to be running on versions of OneFS compatible

with large file support before any of them can enable it.  If any

cluster requires upgrade to a compatible version, all SyncIQ policies

in a SyncIQ relationship with the upgraded cluster will need to resync

before you can successfully enable large file support.

 

* Checking SyncIQ compatibility...

- SyncIQ compatibility check passed

 

* Checking cluster disk space compatibility...

- The following disk pools do not have enough usable storage capacity to support large files:

 

Disk Pool Name                Members                                    Usable  Required  Potential  Capable  Add Nodes

--------------------------------------------------------------------------------------------------------------------------

h500_30tb_3.2tb-ssd_128gb:2   2-3,6,8,10-11,13-16,18-19:bay3,6,9,12,15    107TB     180TB        89T        N          X
h500_30tb_3.2tb-ssd_128gb:3   2-3,6,8,10-11,13-16,18-19:bay4,7,10,13,16   107TB     180TB        89T        N          X
h500_30tb_3.2tb-ssd_128gb:4   2-3,6,8,10-11,13-16,18-19:bay5,8,11,14,17   107TB     180TB        89T        N          X
h500_30tb_3.2tb-ssd_128gb:9   1,4-5,7,9,12,17,20-24:bay5,7,11-12,17       107TB     180TB        89T        N          X
h500_30tb_3.2tb-ssd_128gb:10  1,4-5,7,9,12,17,20-24:bay4,6,10,13,16       107TB     180TB        89T        N          X
h500_30tb_3.2tb-ssd_128gb:11  1,4-5,7,9,12,17,20-24:bay3,8-9,14-15        107TB     180TB        89T        N          X

 

The cluster is not compatible with large file support:

  - Incompatible disk pool(s)

 

Here, the output shows that none of the pools meets the 10% disk pool rule above; each contains insufficient usable capacity to allow large file support to be enabled. In this case, additional nodes would need to be added.

 

The following table explains the detail of output categories above:

 

Disk Pool Name:  Node pool name and this disk pool ID.

Members:  Current nodes and bays in this disk pool.

Usable:  Current usable capacity of this disk pool.

Required:  Usable capacity required for this disk pool to support large files.

Potential:  The maximum usable capacity this disk pool could support at the target node count.

Capable:  Whether this disk pool has the size of disk and number per node to support large files.

Add Nodes:  If this disk pool is capable, how many more nodes need to be added.

 

Once the validation confirms that the cluster meets the requirements, the following CLI command can then be run to enable large file support:

 

# isi_large_file -e

 

Upon successfully enabling large file support, the ‘cluster full’ alert threshold is automatically lowered to 85% from the OneFS default of 95%. This is to ensure that adequate space is available for large file creation, repair, and restriping. Additionally, any SyncIQ replication partners must also be running OneFS 8.2.2, adhere to the above minimum disk pool size, and have the large file feature enabled.

 

Any disk pool management commands that violate the large file support requirements are not allowed. Once enabled, disk pools are periodically checked for compliance and OneFS will alert if a disk pool fails to meet the minimum size requirement.

 

If Large File Support is enabled on a cluster, any SyncIQ replication policies will only succeed with remote clusters that are also running 8.2.2 and have Large File Support enabled. All other SyncIQ policies will fail until the appropriate remote clusters are upgraded and have large file support switched on.


16TB_file_3.PNG.png

 

Be aware that, once enabled, large file support cannot be disabled on a cluster – regardless of whether it’s a SyncIQ source or target, or not participating in replication. This may impact future expansion planning for the cluster and all of its SyncIQ replication partners.

 



When the maximum file size is exceeded, OneFS typically returns an ‘EFBIG’ error. This is translated to an error message of “File too large”. For example:

 

# dd if=/dev/zero of=16TB_file.txt bs=1 count=2 seek=16384g

dd: 16TB_file.txt: File too large

1+0 records in

0+0 records out

  0 bytes transferred in 0.000232 secs (0 bytes/sec)

trimbn

OneFS Antivirus & ICAP - Part 2

Posted by trimbn Jan 20, 2020

In this second article of the antivirus series, we'll take a look at policies, exclusions, and global configuration, plus some monitoring and sizing ideas.


The OneFS WebUI and CLI can be used to configure antivirus policies, adjust settings, and manage antivirus scans and reports.


icap_3.png

 

Antivirus scanning can be enabled or disabled via the check-box at the top of the page. Similarly, the AV settings can be viewed and changed via the CLI.


# isi antivirus settings view

           Fail Open: Yes

        Glob Filters: -

Glob Filters Enabled: No

Glob Filters Include: No

       Path Prefixes: /ifs/data

              Repair: Yes

       Report Expiry: 1Y

       Scan On Close: Yes

        Scan On Open: Yes

Scan Cloudpool Files: No

   Scan Size Maximum: 1.00G

             Service: Yes

          Quarantine: Yes

            Truncate: No

 

For example, the following syntax will change the maximum file scanning size from 1GB to 100 MB:


# isi antivirus settings modify --scan-size-maximum 100M


To prevent specific files from being scanned by antivirus scans, from the WebUI navigate to Data Protection > Antivirus > Settings and configure filters based on file size, name, etc.


icap_4.png


To exclude files based on file name, select Enable filters and configure either inclusions or exclusions. Specify one or more filters, which can include the following wildcard characters:


Wildcard Character   Description
*                    Matches any string in place of the asterisk.
[ ]                  Matches any characters contained in the brackets, or a range of characters separated by a dash.
?                    Matches any character in place of the question mark.

 

Be aware that these filters apply globally to all antivirus scans.


OneFS can be configured to automatically scan files as they are accessed by users from the WebUI by navigating to Data Protection > Antivirus > Settings. In the On-Access Scans area, specify whether you want files to be scanned as they are accessed. 


Open:  To require that all files be scanned before they are opened by a user, select ‘Enable scan of files on open’, and then specify whether you want to allow access to files that cannot be scanned by selecting or clearing ‘Enable file access when scanning fails’.

Close:  To scan files after they are closed, select ‘Enable scan of files on close’.


Note that on-access scans operate independently of antivirus policies.


For example, the following syntax will disable scanning on file open:


# isi antivirus settings modify --scan-on-open no


The amount of time OneFS retains antivirus reports before automatically deleting them can be configured via the WebUI by navigating to Data Protection > Antivirus > Settings > Reports and specifying a retention period.

To add an ICAP server from the WebUI, navigate to Data Protection > Antivirus > ICAP Servers, select Add an ICAP Server, enter its IP address,  and click Enable.


Or, from the CLI:


# isi antivirus servers create --enabled <url>


An antivirus policy that causes specific files to be scanned for viruses each time the policy is run can be crafted from the WebUI by navigating to Data Protection > Antivirus > Policies and creating an Antivirus Policy. Name the policy, specify the directory(s) that you want to scan in the Paths field, set the preferred recursion depth (full or number of subdirectories), and configure a schedule if desired. Note that scheduled policies can also be run manually at any time.


To run the policy only manually:  Click ‘Manual’.

To run the policy according to a schedule:  Click ‘Scheduled’, then specify how often you want the policy to run.

 

Individual files can also be manually scanned for viruses. For example, the following CLI syntax will initiate a scan of the /ifs/data/infected file:

 

# isi antivirus scan /ifs/data/infected

Result: Succeeded

Report ID: R:5e1d083c:6f86


To quarantine a file to prevent it from being accessed by users, from the WebUI, browse to Data Protection > Antivirus > Detected Threats and select More > Quarantine File in the appropriate row of the Antivirus Threat Reports table. Or from the CLI:


# isi antivirus quarantine /ifs/data/infected


The quarantine status of a file can be inspected as follows:

# isi antivirus status /ifs/data/infected

File: /ifs/data/infected

  Last Scan: Never

Quarantined: Yes

 

It can also easily be un-quarantined, or released:

# isi antivirus release /ifs/data/infected

# isi antivirus status /ifs/data/infected

File: /ifs/data/infected

  Last Scan: Never

Quarantined: No

 

If a threat is detected in a file, and the file is irreparable and no longer needed, you can manually remove the file. For example, the following command deletes the /ifs/data/infected file:


# rm /ifs/data/infected


When sizing the ICAP servers for a cluster, the number of ICAP servers deployed per Isilon node (ICAP/node) is often used as the primary metric. With multiple ICAP servers per node, OneFS distributes files to the ICAP servers in an equally weighted, round-robin manner and does not consider the processing power of the ICAP servers when allocating files. Because of this, try to keep the configuration and resource allocation (CPU, memory, network) of each ICAP server relatively equal to avoid scanning bottlenecks.

 

  • If ICAP servers are virtual machines, their resources should be dedicated (CPU, memory, network) and the OS optimized to minimize swapping and latency.
  • Network latency is a significant factor to keep in mind when planning a cluster ICAP solution. Where possible, ensure that network routing is symmetric (e.g. switch stacks, hops, delay, static routes, sbr, etc) and keep latency to a minimum.
  • The majority of infected files tend to be <10MB in size, so reducing the file size to be scanned is also advisable. Select and setup a routine for updating the file types and sizes that would be scanned or skipped.
  • Scan only the data that is necessary, and ensure the cluster is running a current OneFS version.
  • For clusters with heavy workloads or high rate of change, consider scheduling scans during low periods instead of on access/close.
  • Round-robin scanning task allocation is per node, rather than across the cluster. This can potentially lead to variable congestion on individual ICAP servers, depending on how clients connect to and load the cluster.
  • If the cluster is running SyncIQ replication and has a heavy workload, it is also prudent to stagger replication and antivirus scanning activity.
  • Consider creating a separate RBAC account for antivirus operations.


The following guidelines are a useful starting point for ICAP server sizing:


ICAP Attribute

Details

ICAP servers

  • Policy scan: Minimum of two ICAP servers for a small cluster, increasing as cluster grows.
  • On-access scan: At least one dedicated ICAP server per node.

ICAP threads

Test different thread numbers to determine the best value for your environment. For example:

  • McAfee: 50 to 100
  • Symantec: ~20

Network bandwidth

Suggested network connectivity bandwidth for ICAP servers, depending on the average file size:

  • <1 MB average file size: 1Gbps for ICAP servers
  • >1 MB average file size: 10Gbps for ICAP servers

CPU Load

In general, the scanning workload for ICAP servers is CPU intensive. If ICAP server CPU utilization exceeds 95%, either increase the CPU resources of the ICAP servers or increase the number of ICAP servers per cluster.

 

The number of ICAP server threads is one of the primary ICAP server-side tunables, and recommendations vary widely across vendors and products. However, the  ‘too_busy’ status and 'failed to scan' ratio are useful in gauging whether a cluster’s ICAP server(s) are too busy to handle further requests.


Firstly, OneFS reports the status of ICAP servers connected to isi_avscan_d, and this can be dumped to a logfile and viewed using the following command:


# kill -USR2 `ps -auxw | grep -i avscan | grep -v grep | awk '{print $2}'`


All of the isi_avscan_d daemon’s state information is logged to the file /var/log/isi_avscan_d.log. The following CLI command can be used to parse the ICAP server status from this file. For example:


# cat /var/log/isi_avscan_d.log | grep "too_busy"

2020-01-08T23:15:22Z <3.6> tme-sandbox-3 isi_avscan_d[71792]: [0x80070ba00]    too_busy: true


If the ‘too_busy’ field is set to ‘true’, as above, this typically indicates that an ICAP server is overloaded, suggesting that there are insufficient ICAP servers for the workload. In this case, the recommendation is to add more ICAP servers until the too_busy state is reported as ‘false’ for all ICAP servers. Conversely, be aware that having an ICAP server to cluster node ratio that is too high can also lead to performance issues. This becomes more apparent on large clusters with a high rate of change.
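
To get a quick sense of how often the connected ICAP servers report busy over time, the occurrences in the log can simply be tallied. This assumes the log line format shown above:

# grep -c 'too_busy: true' /var/log/isi_avscan_d.log
# grep -c 'too_busy: false' /var/log/isi_avscan_d.log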


Secondly, the ‘failed to scan’ ratio can be calculated from the ‘failed’ and ‘scanned’ stats available via the following sysctl command:


# sysctl efs.bam.av.stats | egrep -i 'failed|scanned'


The formula for determining the ‘failed to scan’ ratio is:


(‘Failed’ number / ‘Scanned’ number) x 100 = Failed to scan %


If this percentage is much above zero, consider adding additional ICAP servers, or increasing bandwidth to existing servers if they’re network-bound.
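
For example, with hypothetical counts of 3 failed and 14210 scanned files, the percentage can be computed at the shell with 'bc':

# echo "scale=2; 3 * 100 / 14210" | bc
.02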

trimbn

OneFS Antivirus & ICAP

Posted by trimbn Jan 13, 2020

It appears that security is top of mind currently, with several customer discussions of late around OneFS and antivirus practices. So, it seemed like a useful topic to review in a couple of blog articles.

 

To help address this, OneFS provides support for ICAP (Internet Content Adaptation Protocol), enabling real-time scanning of a cluster's dataset for computer viruses, malware, and other threats. To do this, OneFS sends files to an ICAP server running third-party antivirus scanning software, which scrutinizes the files for viruses and other threat signatures. If a threat is detected, OneFS typically alerts cluster admins by firing an event, displaying near real-time summary information, and documenting the threat in an antivirus scan report. Here's a high-level view of a typical OneFS antivirus architecture.


icap_1.png


OneFS can also be configured to either request that ICAP servers attempt to repair infected files, or to protect users against potentially dangerous files by truncating or quarantining infected files. Before OneFS sends a file to be scanned, it ensures that the scan is not redundant. If a file has already been scanned and has not been modified, OneFS will not send the file to be scanned unless the virus database on the ICAP server has been updated since the last scan. Note that Antivirus scanning is available only if all nodes in the cluster are connected to the external network (NANON configurations are not supported).


OneFS works with antivirus software that conforms to the ICAP standard, and the following list includes the supported and most widely used antivirus vendors:


Vendor

Details

Symantec

Scan Engine 5.2 and later

Trend Micro

Interscan Web Security Suite 3.1 and later

Kaspersky

Anti-Virus for Proxy Server 5.5 and later

McAfee

VirusScan Enterprise 8.7 and later with VirusScan Enterprise for Storage 1.0 and later

 

OneFS can be configured to send files to be scanned prior to opening, after they are closed, or both. Sending files to be scanned after they are closed is faster but less secure, whereas scanning before they are opened is slower but safer. If antivirus is configured for scanning files after they are closed, when a user creates or modifies a file on the cluster, OneFS queues the file to be scanned. It then sends the file to an ICAP server to be scanned when convenient. In this configuration, users can always access files without any delay. However, it is possible that after a user modifies or creates a file, a second user might access the file before the file is scanned. If a virus was introduced to the file from the first user, the second user would be able to access the infected file. Similarly, if an ICAP server is unable to scan a file, that file will still be accessible to users.


If a cluster is configured to scan files before they are opened, when a user attempts to download a file, OneFS first sends the file to an ICAP server to be checked. The user cannot access that file until the scan is complete. Scanning files before they are opened is more secure, however it does add access latency.


OneFS can also be configured to deny access to files that cannot be scanned by an ICAP server, which can further increase the delay. For example, if no ICAP servers are available, users will not be able to access any files until an ICAP server becomes available again. If OneFS is set to scan files before they are opened, it is recommended that it also be configured to scan files after they are closed. Scanning files on both open and close will not necessarily increase security, but it usually improves data availability: if a file has already been scanned since it was last modified, and the ICAP server's virus database has not been updated since that scan, the scan on open can be skipped.
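
For reference, the on-access behavior is governed by the global antivirus settings. Here's a minimal sketch that enables scanning on both open and close, assuming a '--scan-on-close' option alongside the '--scan-on-open' option shown elsewhere in this series (confirm the flag names with 'isi antivirus settings modify --help'):

# isi antivirus settings modify --scan-on-open yes --scan-on-close yes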


Antivirus scanning policies can be crafted that send files from a specified directory to be scanned. OneFS Antivirus policies target a specific directory tree on the cluster and can either be run manually at any time or scheduled for automatic execution. Exclusion rules can also be configured to prevent a policy from sending certain files within the specified root directory, based on the size, name, or extension of the file.


Antivirus scans are managed by the OneFS Job Engine and function similarly to and contend with other system jobs. Note that antivirus policies do not target snapshots, and only on-access scans include snapshots.


Antivirus allows specific file(s) to be manually sent to an ICAP server for scanning at any time. For example, if a virus is detected in a file but the ICAP server is unable to repair it, that file can be re-sent to the ICAP server after the virus database has been updated, at which point the ICAP server may be able to repair the file. Individual files can also be scanned to test the connection between the cluster and the ICAP servers.


In summary, OneFS offers three flavors of AV scan which include: 


AV Scan Type

Description

On-access

Sends file to ICAP server(s) for scanning prior to opening, after closing, or both. Before opening is slower but safer, after closing is faster but less secure.

AntiVirus Policy

Scheduled or manual directory tree-based scans executed by the OneFS Job Engine.

Individual File

Specific individual files sent to ICAP server(s) for targeted scanning, initiated via OneFS CLI command.

 

In the event that an ICAP server detects a threat or an infected file, OneFS can be configured to respond in one of the following ways (a hedged CLI configuration sketch follows the table):


Response

Description

Alert

All threats that are detected cause an event to be generated in OneFS at the warning level, regardless of the threat response configuration.

Repair

The ICAP server attempts to repair the infected file before returning the file to OneFS.

Quarantine

OneFS quarantines the infected file. A quarantined file cannot be accessed by any user. However, a quarantined file can be removed from quarantine by the root user if the root user is connected to the cluster through secure shell (SSH). If you back up your cluster through NDMP backup, quarantined files will remain quarantined when the files are restored. If you replicate quarantined files to another Isilon cluster, the quarantined files will continue to be quarantined on the target cluster. Quarantines operate independently of access control lists (ACLs).

Truncate

OneFS truncates the infected file. When a file is truncated, OneFS reduces the size of the file to zero bytes to render the file harmless.

It is recommended that you do not apply this setting. If you truncate files without attempting to repair them, you might delete data unnecessarily.

Repair or quarantine

Attempts to repair infected files. If an ICAP server fails to repair a file, OneFS quarantines the file. If the ICAP server repairs the file successfully, OneFS sends the file to the user. Repair or quarantine can be useful if you want to protect users from accessing infected files while retaining all data on a cluster.

Alert only

Only generates an event for each infected file. It is recommended that you do not apply this setting.

Repair only

Attempts to repair infected files. Afterwards, OneFS sends the files to the user, whether or not the ICAP server repaired the files successfully. It is recommended that you do not apply this setting. If you only attempt to repair files, users will still be able to access infected files that cannot be repaired.

Quarantine

Quarantines all infected files. It is recommended that you do not apply this setting. If you quarantine files without attempting to repair them, you might deny access to infected files that could have been repaired.
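
As a hedged CLI sketch of the recommended 'repair or quarantine' behavior, assuming the '--repair', '--quarantine', and '--truncate' options to 'isi antivirus settings modify' (confirm the exact flag names for your release via the command's --help output):

# isi antivirus settings modify --repair yes --quarantine yes --truncate no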

 

OneFS automatically generates an antivirus scan report each time that a policy is run. It also generates a global status report every 24 hours which includes all the on-access scans that occurred during the day. AV scan reports typically contain the following information: 


Criteria

Description

Start

The time that the scan started.

End

The time that the scan ended.

Number

The total number of files scanned.

Size

The total size of the files scanned.

Packets

The total network traffic sent.

Throughput

The network throughput that was consumed by virus scanning.

Success

Whether the scan succeeded.

Infection total

The total number of infected files detected.

Name

The names of infected files.

Threat

The threats associated with infected files.

Response

How OneFS responded to detected threats.

 

The available scans can be viewed from the CLI as follows:

# isi antivirus reports scans list

ID              Policy ID       Status Start               Files  Infections

--------------------------------------------------------------------------------


R:5e1d0e66:7f8b 1b8028028048580 Started   2020-01-14T00:42:14 1      0

R:5e1d0896:706a MANUAL          Succeeded 2020-01-14T00:17:26 0      0

R:5e1d083c:6f86 MANUAL          Succeeded 2020-01-14T00:15:56 0      0

RO5e1d0480      SCAN_ON_OPEN    Started 2020-01-14T00:00:30 0      0

RO5e1bb300      SCAN_ON_OPEN    Finish 2020-01-13T00:00:31 0      0

 

More detail on a particular scan is available via:

 

# isi antivirus reports scans view R:5e1d0e66:7f8b

        ID: R:5e1d0e66:7f8b

Policy ID: 1b8028028048580

    Status: Started

     Start: 2020-01-14T00:42:14

       End: 2020-01-14T00:42:15

  Duration: Now

     Files: 716

Infections: 0

Bytes Sent: 4242360130

      Size: 4241602042

    Job ID: 5363

 

Similarly, threats can be viewed using the following CLI syntax:

 

# isi antivirus reports threats list

Scan ID         File Remediation  Threat  Time

----------------------------------------------------------------------------------------------------

R:5d240ee9:2d62 /ifs/data/suspect.tar.gz Skipped              2019-12-09T03:50:01

----------------------------------------------------------------------------------------------------

Total: 1

 

And, details of a particular threat via:

 

# isi antivirus reports threats view <id>

 

For example:

 

# isi antivirus reports threats view R:5d240ee9:2d62

Threat id 'R:5d240ee9:2d62' is not valid.

 

Or from the WebUI, by navigating to Data Protection > AntiVirus > Detected Threats:

 

icap_2.png

 

That's it for now. In the next article in this AntiVirus series, we'll take a look at policies, exclusions, global configuration, and some monitoring and sizing ideas.

In the previous article, we looked at the scope of the ‘isi get’ CLI command. To complement this, OneFS also provides the ‘isi set’ utility, which allows configuration of OneFS-specific file attributes.

 

This command works similarly to the UNIX ‘chmod’ command, but on OneFS-centric attributes, such as protection, caching, encoding, etc. As with isi get, files can be specified by path or LIN. Here are some examples of the command in action.

 

For example, the following syntax will recursively configure a protection policy of +2d:1n on /ifs/data/testdir1 and its contents:


# isi set -R -p +2:1 /ifs/data/testdir1


To enable the write caching coalescer on testdir1 and its contents, run:


# isi set -R -c on /ifs/data/testdir1


With the addition of the -n flag, no changes are actually made. Instead, the list of files and directories that would have write caching enabled is returned:


# isi set -R -n -c on /ifs/data/testdir2


The following command will configure ISO-8859-1 filename encoding on testdir3 and contents:


# isi set -R -e ISO-8859-1 /ifs/data/testdir3


To configure streaming layout on the file ‘test1’, run:


# isi set -l streaming test1


The following syntax will set a metadata-write SSD strategy on testdir1 and its contents:


# isi set -R -s metadata-write /ifs/data/testdir1


To perform a restripe operation on file2:


# isi set -r file2


To configure write caching on file2 via its LIN address, rather than file name:


# isi set -c on -L `isi get -DD file2 | grep -i LIN: | awk '{print $3}'`

1:0054:00f6

 

If you set streaming access, isi get reports that streaming prefetch is enabled:


# isi get file2.tst

default   6+2/2 concurrency on file2.tst

# isi set -a streaming file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming on    file2.tst

 

For streaming layout, the ‘@’ suffix indicates how many drives the file is spread across. Streaming layout optimizes for a larger number of spindles than concurrency or random.

 

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 concurrency on file2.tst

# isi set -l streaming file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming/@18 on    file2.tst

 

You can specify the number of drives to spread a file across with ‘isi set -d’:


# isi set -d 6 file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming/@6 on    file2.tst

 

The following table describes in more detail the various flags and options available for the isi set command:

 

Command Option

Description

-f

Suppresses warnings on failures to change a file.

-F

Includes the /ifs/.ifsvar directory content and any of its subdirectories. Without -F, the /ifs/.ifsvar directory content and any of its subdirectories are skipped. This setting allows the specification of potentially dangerous, unsupported protection policies.

-L

Specifies file arguments by LIN instead of path.

-n

Displays the list of files that would be changed without taking any action.

-v

Displays each file as it is reached.

-r

Performs a restripe on specified file.

-R

Sets protection recursively on files.

-p <policy>

Specifies protection policies in the following forms: +M Where M is the number of node failures that can be tolerated without loss of data.

+M must be a number from 1 through 4.

+D:M Where D indicates the number of drive failures and M indicates number of node failures that can be tolerated without loss of data. D must be a number from 1 through 4 and M must be any value that divides into D evenly. For example, +2:2 and +4:2 are valid, but +1:2 and +3:2 are not.

Nx Where N is the number of independent mirrored copies of the data that will be stored. N must be a number, with 1 through 8 being valid choices.

-w <width>

Specifies the number of nodes across which a file is striped. Typically, w = N + M, but width can also mean the total number of nodes that are used. You can set a maximum width policy of 32, but the actual protection is still subject to the limitations on N and M.

-c {on | off}

Specifies whether write-caching (coalescing) is enabled.

-g <restripe goal>

Used in conjunction with the -r flag, -g specifies the restripe goal. The following values are valid:

  • repair
  • reprotect
  • rebalance
  • retune

-e <encoding>

Specifies the encoding of the filename.

-d <@r drives>

Specifies the minimum number of drives that the file is spread across.

-a <value>

Specifies the file access pattern optimization setting, i.e. default, streaming, random, custom, or disabled.

-l <value>

Specifies the file layout optimization setting. This is equivalent to setting both the -a and -d flags. Values are concurrency, streaming, or random

--diskpool <id | name>

Sets the preferred diskpool for a file.

-A {on | off}

Specifies whether file access and protections settings should be managed manually.

-P {on | off}

Specifies whether the file inherits values from the applicable file pool policy.

-s <value>

Sets the SSD strategy for a file. The following values are valid:

avoid: Writes all associated file data and metadata to HDDs only. The data and metadata of the file are stored so that SSD storage is avoided, unless doing so would result in an out-of-space condition.

metadata: Writes file data to HDDs. One mirror of the metadata for the file is placed on SSD storage if possible, but the strategy for data is to avoid SSD storage.

metadata-write: Writes file data to HDDs and metadata to SSDs, when available. All copies of metadata for the file are on SSD storage if possible, and the strategy for data is to avoid SSD storage.

data: Uses SSD node pools for both data and metadata. Both the metadata for the file and user data (one copy if using mirrored protection, all blocks if FEC) are on SSD storage if possible.

<file> {<path> | <lin>}

Specifies a file by path or LIN.

--nodepool <id | name>

Sets the preferred nodepool for a file.

--packing {on | off}

Enables storage-efficient packing of small files into a shadow store container.

--mm-[access | packing | protection] { on|off}

The ‘manually manage’ prefix flag for the access, packing, and protection options described above. This ‘--mm’ flag controls whether the SmartPools job will act on the specified attribute of the file: ‘on’ means SmartPools will ignore it (the attribute is manually managed), and ‘off’ means SmartPools will manage it.
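
For example, combining the options above, a small file could be packed into a shadow store container and its packing attribute flagged as manually managed so that the SmartPools job leaves it alone. The file path here is purely illustrative:

# isi set --packing on --mm-packing on /ifs/data/smallfile.txt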

trimbn

OneFS "Isi Get" CLI Command

Posted by trimbn Jan 2, 2020

One of the lesser publicized but highly versatile tools in OneFS is the ‘isi get’ command line utility. It can often prove invaluable for generating a vast array of useful information about OneFS file system objects. In its most basic form, the command outputs the following information:

 

  • Protection policy
  • Protection level
  • Layout strategy
  • Write caching strategy
  • File name

 

For example:

 

# isi get /ifs/data/file2.txt

POLICY LEVEL     PERFORMANCE      COAL FILE

default             4+2/2     concurrency      on file2.txt

 

Here’s what each of these categories represents:

 

POLICY:  Indicates the requested protection for the object, in this case a text file. This policy field is displayed in one of three colors:

 

Requested Protection Policy

Description

Green

Fully protected

Yellow

Degraded protection under a mirroring policy

Red

Under-protection using FEC parity protection

 

LEVEL:  Displays the current actual on-disk protection of the object. This can be either FEC parity protection or mirroring. For example:

 

Protection  Level

Description

+1n

Tolerate failure of 1 drive OR 1 node (Not Recommended)

+2d:1n

Tolerate failure of 2 drives OR 1 node

+2n

Tolerate failure of 2 drives OR 2 nodes

+3d:1n

Tolerate failure of 3 drives OR 1 node

+3d:1n1d

Tolerate failure of 3 drives OR 1 node AND 1 drive

+3n

Tolerate failure of 3 drives or 3 nodes

+4d:1n

Tolerate failure of 4 drives or 1 node

+4d:2n

Tolerate failure of 4 drives or 2 nodes

+4n

Tolerate failure of 4 nodes

2x to 8x

Mirrored over 2 to 8 nodes, depending on configuration

 

PERFORMANCE:  Indicates the on-disk layout strategy, for example:

 

Data Access Setting

Description

On Disk Layout

Caching

Concurrency

Optimizes for current load on cluster, featuring many simultaneous clients. Recommended for mixed workloads.

Stripes data across the minimum number of drives required to achieve the configured data protection level.

Moderate prefetching

Streaming

Optimizes for streaming of a single file. For example, fast reading by a single client.

Stripes data across a larger number of drives.

Aggressive prefetching

Random

Optimizes for unpredictable access to a file. Performs almost no cache prefetching.

Stripes data across the minimum number of drives required to achieve the configured data protection level.

Little to no prefetching

 

COAL:  Indicates whether the coalescer, OneFS’s NVRAM-based write cache, is enabled. The coalescer provides failure-safe buffering to ensure that writes are efficient and that read-modify-write operations are avoided.

 

The isi get command also provides a number of additional options to generate more detailed information output. As such, the basic command syntax for isi get is as follows:

 

isi get {{[-a] [-d] [-g] [-s] [{-D | -DD | -DDC}] [-R] <path>}

| {[-g] [-s] [{-D | -DD | -DDC}] [-R] -L <lin>}}

 

Here’s the description for the various flags and options available for the command:

 

Command Option

Description

-a

Displays the hidden "." and ".." entries of each directory.

-d

Displays the attributes of a directory instead of the contents.

-g

Displays detailed information, including snapshot governance lists.

-s

Displays the protection status using words instead of colors.

-D

Displays more detailed information.

-DD

Includes information about protection groups and security descriptor owners and groups.

-DDC

Includes cyclic redundancy check (CRC) information.

-L <LIN>

Displays information about the specified file or directory. Specify as a file or directory LIN.

-O

Displays any logical overlay information and a compressed block count when viewing a file’s details.

-R

Displays information about the subdirectories and files of the specified directories.

 

The following command shows the detailed properties of a directory, /ifs/data. Note that the output has been truncated slightly to aid readability:


# isi get -D data 

POLICY   W LEVEL PERFORMANCE COAL ENCODING      FILE              IADDRS

default       4x/2 concurrency on  N/A ./ <1,36,268734976:512>, <1,37,67406848:512>, <2,37,269256704:512>, <3,37,336369152:512> ct: 1459203780 rt: 0

*************************************************

* IFS inode: [ 1,36,268734976:512, 1,37,67406848:512, 2,37,269256704:512, 3,37,336369152:512 ]

*************************************************

*  Inode Version:      6

*  Dir Version:        2

*  Inode Revision:     6

*  Inode Mirror Count: 4

*  Recovered Flag:     0

*  Restripe State:     0

*  Link Count:         3

*  Size:               54

*  Mode:               040777

*  Flags:              0xe0

*  Stubbed:            False

*  Physical Blocks:    0

*  LIN:                1:0000:0004 

*  Logical Size:       None

*  Shadow refs:        0

*  Do not dedupe:      0

*  Last Modified:      1461091982.785802190

*  Last Inode Change:  1461091982.785802190

*  Create Time:        1459203780.720209076

*  Rename Time:        0

*  Write Caching:      Enabled

*  Parent Lin          2

*  Parent Hash:        763857

*  Snapshot IDs:       None

*  Last Paint ID:      47

*  Domain IDs:         None

*  LIN needs repair:   False

*  Manually Manage:

*       Access         False

*       Protection     True

*  Protection Policy:  default

*  Target Protection:  4x

*  Disk pools:         policy any pool group ID -> data target z x410_136tb_1.6tb-ssd_256gb:32(32), metadata target x410_136tb_1.6tb-ssd_256gb:32(32)

*  SSD Strategy:       metadata-write

*  SSD Status:         complete

*  Layout drive count: 0

*  Access pattern: 0

*  Data Width Device List:

*  Meta Width Device List:

*

*  File Data (78 bytes):

*    Metatree Depth: 1

*  Dynamic Attributes (40 bytes):

        ATTRIBUTE                OFFSET SIZE

        New file attribute       0 23

        Isilon flags v2          23 3

        Disk pool policy ID      26 5

        Last snapshot paint time 31     9

*************************************************

 

*  NEW FILE ATTRIBUTES

*  Access attributes:  active

*  Write Cache: on

*  Access Pattern:  concurrency

*  At_r: 0

*  Protection attributes:  active

*  Protection Policy:  default

*  Disk pools:         policy any pool group ID

*  SSD Strategy:       metadata-write

*

*************************************************

 

Here is what some of these output items indicate:

 

Item

Description

1

OneFS command to display the file system properties of a directory or file.

2

The directory's data access pattern is set to concurrency.

3

Write caching (coalescer) is turned on.

4

Inode on-disk locations

5

Primary LIN.

6

Indicates disk pools that the data and metadata are targeted to.

7

the SSD strategy is set to metadata-write.

8

Files that are added to the directory are governed by these settings, most of which can be changed by applying a file pool policy to the directory.

 

From the WebUI, a subset of the ‘isi get –D’ output is also available from the OneFS File Explorer. This can be accessed by browsing to File System > File System Explorer and clicking on ‘View Property Details’ for the file system object of interest.


One question that is frequently asked is how to find where a file's inodes live on the cluster. The ‘isi get -D’ command output makes this fairly straightforward to answer. Take the file /ifs/data/file1, for example:


# isi get -D /ifs/data/file1 | grep -i "IFS inode"

* IFS inode: [ 1,9,8388971520:512, 2,9,2934243840:512, 3,8,9568206336:512 ]


This shows the three inode locations for the file in the *,*,*:512 notation. Let’s take the first of these:


1,9,8388971520:512


From this, we can deduce the following:

 

  • The inode is on node 1, drive 9 (logical drive number).
  • 8388971520 is the block address of the inode on that drive.
  • The inode block is 512 bytes in size (note: OneFS data blocks are 8KB in size).
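
The inode location tuples can also be pulled apart programmatically. Here's a small sketch, assuming the 'IFS inode' output format shown above, which prints the node, drive, and block address of each inode mirror:

# isi get -D /ifs/data/file1 | grep -i "IFS inode" | tr -d '[]' | awk '{ for (i=4; i<=NF; i++) { gsub(",$", "", $i); split($i, t, ","); split(t[3], b, ":"); printf "node %s, drive %s, block address %s\n", t[1], t[2], b[1] } }'

For the example file above, this would print:

node 1, drive 9, block address 8388971520
node 2, drive 9, block address 2934243840
node 3, drive 8, block address 9568206336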


Another example of where isi get can be useful is in mapping between a file system object’s pathname and its LIN (logical inode number). This might be for translating a LIN returned by an audit logfile or job engine report into a valid filename, or finding an open file from vnodes output, etc.


For example, say you wish to know which configuration file is being used by the cluster’s DNS service:


1.  First, inspect the busy_vnodes output and filter for DNS:


# sysctl efs.bam.busy_vnodes | grep -i dns

vnode 0xfffff8031f28baa0 (lin 1:0066:0007) is fd 19 of pid 4812: isi_dnsiq_d

 

This, among other things, provides the LIN of the configuration file held open by the isi_dnsiq_d process.


2.  The output can be further refined to just the LIN address as such:


# sysctl efs.bam.busy_vnodes | grep -i dns | awk '{print $4}' | sed -E 's/\)//'

1:0066:0007


3.  This LIN address can then be fed into ‘isi get’ using the ‘-L’ flag, and a valid name and path for the file will be output:


# isi get -L `sysctl efs.bam.busy_vnodes | grep -i dns | grep -v "(lin 0)" | awk '{print $4}' | sed -E 's/\)//'`

A valid path for LIN 0x100660007 is /ifs/.ifsvar/modules/flexnet/flx_config.xml


This confirms that the XML configuration file in use by isi_dnsiq_d is flx_config.xml.


OneFS 8.2.1 and later also adds a ‘-O’ logical overlay flag to the ‘isi get’ CLI utility for viewing a file’s compression details. For example:


# isi get -DDO file1

* Size:           167772160

* PhysicalBlocks: 10314

* LogicalSize:    167772160

PROTECTION GROUPS

lbn0: 6+2/2

2,11,589365248:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

2,4,691601408:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

Metatree logical blocks:

zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000

 

The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.
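
To put numbers on this example: a sixteen-block chunk represents 16 x 8KB = 128KB of logical data, so storing it in six physical 8KB blocks (48KB) equates to a per-chunk compression ratio of roughly 128/48, or about 2.7:1, before protection overhead is applied.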


When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.

A file that for whatever reason cannot be compressed will be reported as such:

4,6,900382720:8192[INCOMPRESSIBLE]#1

So, to recap, the ‘isi get’ command provides information about an individual or set of file system objects.

trimbn

OneFS SmartQuotas and Dedupe

Posted by trimbn Dec 17, 2019

Got a question from the field asking whether a deduplicated file gets reported by and counted against SmartQuotas, and if there’s a performance penalty accessing that deduplicated file.


With OneFS, deduplicated files appear no differently than regular files to standard quota policies, regardless of whether the file has been deduplicated by SmartDedupe or OneFS in-line deduplication – or both. This is also true if the file is a clone or has been containerized by OneFS Small File Storage Efficiency (SFSE), both of which also use shadow stores, and also for in-line compression.


However, if the quota accounting is configured for ‘physical size’, which includes data-protection overhead, the additional space used by the shadow store will not be accounted for by the quota.

 

In OneFS 8.2.1, SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of deduplication (plus in-line compression, if available and enabled). On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.


dedupe-quota-1.png


Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:


# isi quota quotas list

Type      AppliesTo Path           Snap  Hard Soft  Adv  Used Efficiency

-----------------------------------------------------------------------------

directory DEFAULT    /ifs           No    - -     -    2.3247T 1.29 : 1

-----------------------------------------------------------------------------

Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory

                        Path: /ifs

                        Type: directory

Snapshots: No

Thresholds Include Overhead: No

                       Usage

                           Files: 4245818

Physical(With Overhead): 1.80T

Logical(W/O Overhead): 2.33T

Efficiency(Logical/Physical): 1.29 : 1

Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.
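
The equivalent tracking quota can also be created from the CLI. Here's a minimal sketch, assuming that omitting any hard, soft, or advisory thresholds yields an accounting-only quota (check 'isi quota quotas create --help' for the accounting and threshold options available in your release):

# isi quota quotas create /ifs directory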

 

dedupe-quota-2.png

 

To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe. This is in contrast to the history of statistics over time reported in the ‘isi statistics data-reduction’ CLI command output, described below. As such, the efficiency ratio for the entire quota directory will reflect what is actually there.


In addition to SmartQuotas, OneFS provides several other reporting methods for obtaining efficiency information about deduplication, and data reduction in general. The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction

Recent Writes (5 mins)              Cluster Data Reduction

----------------------------------  -----------------------------------------

Logical data            339.50G     Est. logical data             1.37T

Zero-removal saved      112.00k

Deduplication saved     432.00k     Dedupe saved                  1.41G

Compression saved       146.64G     Est. compression saved        199.82G

Preprotected physical   192.87G     Est. preprotected physical    1.18T

Protection overhead     157.26G     Est. protection overhead      401.22G

Protected physical      350.13G     Protected physical            1.57T

Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1

Compression ratio       1.76:1      Est. compression ratio        1.17:1

Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1

Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:


339.50 / 192.87 = 1.76    Or a Data Reduction ratio of 1.76:1


Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:


339.50 / 350.13 = 0.97    Or an Efficiency ratio of 0.97:1
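
These calculations are easy to reproduce at the shell, for example with 'bc' (the extra decimal places simply round to the 1.76:1 and 0.97:1 ratios reported above):

# echo "scale=4; 339.50 / 192.87" | bc
1.7602
# echo "scale=4; 339.50 / 350.13" | bc
.9696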


In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency.


# isi dedupe stats

      Cluster Physical Size: 86.14T

          Cluster Used Size: 4.44T

  Logical Size Deduplicated: 218.81G

             Logical Saving: 182.56G

Estimated Size Deduplicated: 271.92G

  Estimated Physical Saving: 226.88G

 

Similarly, the WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.


dedupe-quota-3.png

 

OneFS’ WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

dedupe-quota-4.png

 

SmartDedupe also deduplicates common blocks within the same file, resulting in even better data efficiency.

 

InsightIQ, Isilon’s multi-cluster reporting and trending analytics suite, is also integrated with and able to report in detail on SmartDedupe. This is available from the performance reporting section of IIQ, by selecting “Deduplication” as the “Report Type”. Also, included in the data provided by the File Systems Reporting section, is a report detailing the space savings efficiency delivered by deduplication.


So how does SmartDedupe play with the other storage management and data protection tools in OneFS portfolio? Let’s take a look:


When deduplicated files are replicated to another Isilon cluster via SyncIQ, or backed up to a tape device, the deduplicated files are inflated (or rehydrated) back to their original size, since they no longer share blocks on the target Isilon cluster. However, once replicated data has landed, SmartDedupe can be run on the target cluster to provide the same space efficiency benefits as on the source.


Shadow stores are not transferred to target clusters or backup devices. Because of this, deduplicated files do not consume less space than non-deduplicated files when they are replicated or backed up. To avoid running out of space on target clusters or tape devices, it is important to verify that the total amount of storage space saved and storage space consumed does not exceed the available space on the target cluster or tape device. To reduce the amount of storage space consumed on a target Isilon cluster, you can configure deduplication for the target directories of your replication policies. Although this will deduplicate data on the target directory, it will not allow SyncIQ to transfer shadow stores. Deduplication is still performed post-replication, via a deduplication job running on the target cluster.
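
As a hedged sketch of that target-side configuration, assuming the '--paths' option to 'isi dedupe settings modify' (option names vary by release, so confirm with the command's --help output) and using a purely illustrative target directory, the target path could be added to the dedupe path list and the SmartDedupe job started manually:

# isi dedupe settings modify --paths /ifs/synciq-target
# isi job jobs start Dedupe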


Because files are backed up as if the files were not deduplicated, backup and replication operations are not faster for deduplicated data. You can deduplicate data while the data is being replicated or backed up. It’s also worth noting that OneFS NDMP backup data won’t be deduped unless deduplication is provided by the backup vendor’s DMA software. However, compression is often provided natively by the backup tape or VTL device instead.


SmartDedupe does not deduplicate the data stored in a snapshot. However, snapshots can be created of deduplicated data. If a snapshot is taken of a deduplicated directory, and then the contents of that directory are modified, the shadow stores will be transferred to the snapshot over time. Because of this, more space will be saved on a cluster if deduplication is run prior to enabling snapshots. If deduplication is enabled on a cluster that already has a significant amount of data stored in snapshots, it will take time before the snapshot data is affected by deduplication. Newly created snapshots will contain deduplicated data, but older snapshots will not.


If you intend to revert a snapshot, it is also good practice to do so before running a deduplication job. Restoring a snapshot can cause many of the files on the cluster to be overwritten. Any deduplicated files are reverted back to normal files if they are overwritten by a snapshot revert. However, once the snapshot revert is complete, deduplication can be run on the directory again and the resulting space savings will persist on the cluster.


Dedupe is also fully compatible with SmartLock, OneFS’ data retention and compliance product. SmartDedupe delivers storage efficiency for immutable archives and write once, read many (or WORM) protected data sets.

However, OneFS will not deduplicate files that span SmartPools pools or tiers, or that have different protection levels set. This is to avoid potential performance or protection asymmetry which could occur if portions of a file live on different classes of storage.


