In the first article of this series, we explored the architecture and configuration of OneFS Small File Storage Efficiency. Next, we'll take a look at SFSE monitoring & reporting, defragmentation, plus some considerations and recommended practices.


There are three main CLI commands that report on the status and effect of SFSE. These are:

 

  • isi job reports view <job_id>
  • isi_packing --fsa
  • isi_sfse_assess

 

When running the isi job reports view command, enter the job ID as an argument. In the command output, the ‘Files packed’ field indicates how many files have been successfully containerized. For example, for job ID 1018:

 

# isi job reports view -v 1018

SmartPools[1018] phase 1 (2019-02-31T10:29:47)

---------------------------------------------

Elapsed time                        12 seconds

Working time                        12 seconds

Group at phase end                  <1,6>: { 1:0-5, smb: 1, nfs: 1, hdfs: 1, swift: 1, all_enabled_protocols: 1}

Errors

‘dicom’:

      {‘Policy Number’: 0,

      ‘Files matched’: {‘head’:512, ‘snapshot’: 256}

      ‘Directories matched’: {‘head’: 20, ‘snapshot’: 10},

      ‘ADS containers matched’: {‘head’:0, ‘snapshot’: 0},

      ‘ADS streams matched’: {‘head’:0, ‘snapshot’: 0},

      ‘Access changes skipped’: 0,

‘Protection changes skipped’: 0,

‘Packing changes skipped’: 0,

‘File creation templates matched’: 0,

‘Skipped packing non-regular files’: 2,

‘Files packed’: 48672,

‘Files repacked’: 0,

‘Files unpacked’: 0,

},

}   
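For scripted monitoring, the packing counters can be pulled out of the report text. The following is a minimal sketch rather than a supported tool: it assumes the report has been captured as a string, uses an excerpt of the output above as sample input, and the helper name packing_counters is hypothetical.

```python
import re

# Excerpt of the job report shown above (quotes written as plain ASCII here).
REPORT = """
'Skipped packing non-regular files': 2,
'Files packed': 48672,
'Files repacked': 0,
'Files unpacked': 0,
"""

# Field names containing "pack", wrapped in straight or typographic quotes,
# followed by a colon and an integer value.
QUOTE = r"[\u2018\u2019']"
PATTERN = re.compile(
    QUOTE + r"([^\u2018\u2019'\n]*pack[^\u2018\u2019'\n]*)" + QUOTE + r":\s*(\d+)",
    re.IGNORECASE,
)

def packing_counters(report_text):
    """Return the packing-related counters found in a job report as a dict."""
    return {name: int(value) for name, value in PATTERN.findall(report_text)}

counters = packing_counters(REPORT)
print(counters["Files packed"])
```

A rising ‘Files packed’ count across successive SmartPools runs indicates that newly eligible files are still being containerized.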

 

The second command, isi_packing --fsa, provides a storage efficiency percentage in the last line of its output. This command requires InsightIQ to be licensed on the cluster and a successful run of the file system analysis (FSA) job.


If FSA has not been run previously, it can be kicked off with the 'isi job jobs start FSAnalyze' command. For example:

 

# isi job jobs start FSAnalyze

Started job [1018]

 

When this job has completed, run:

 

# isi_packing --fsa --fsa-jobid 1018

FSAnalyze job: 1018 (Mon Feb 29 22:01:21 2019)

Logical size:  47.371T

Physical size: 58.127T

Efficiency:    81.50%

 

In this case, the storage efficiency achieved after containerizing the data is 81.50%, as reported by isi_packing.
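The reported percentage is consistent with dividing the logical size by the physical size. The short sketch below reproduces the figure from the sample output; the function name storage_efficiency is illustrative, and the formula is inferred from the numbers shown rather than taken from the tool's documentation.

```python
def storage_efficiency(logical, physical):
    """Efficiency as a percentage: logical size divided by physical size."""
    if physical <= 0:
        raise ValueError("physical size must be positive")
    return 100.0 * logical / physical

# Values from the isi_packing output above, both in the same unit (T).
pct = storage_efficiency(47.371, 58.127)
print(f"Efficiency: {pct:.2f}%")  # prints "Efficiency: 81.50%"
```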

If you don't specify an FSAnalyze job ID, the --fsa option defaults to the results of the last successful FSAnalyze job run.

Be aware that the isi_packing --fsa command reports on the whole /ifs file system. This means that the overall utilization percentage can be misleading if other, non-containerized data is also present on the cluster.

 

There is also a storage efficiency assessment tool available in OneFS 8.2. This can be run from the CLI with the following syntax:

# isi_sfse_assess <options>

 

The tool’s output presents the estimated storage efficiency as raw space savings (both a total and a percentage) and as a reduction in protection group (PG) overhead. For example:

 

SFSE estimation summary:

* Raw space saving: 1.7 GB (25.86%)

* PG reduction: 25978 (78.73%)

 

 

When containerized files with shadow references are deleted, truncated, or overwritten, unreferenced blocks can be left in shadow stores. These blocks are later freed, which can leave holes that reduce storage efficiency.


sfse_2.png

 

The actual efficiency loss depends on the protection level layout used by the shadow store. Smaller protection group sizes are more susceptible, as are containerized files, since all the blocks in containers have at most one referring file and the packed sizes (file size) are small.


In OneFS 8.2, a shadow store defragmenter is added to reduce the fragmentation that results from overwrites and deletes of files. This defragmenter is integrated into the ShadowStoreDelete job. The defragmentation process works by dividing each containerized file into logical chunks (~32MB each) and assessing each chunk for fragmentation.


sfse_3.png

 

If the storage efficiency of a fragmented chunk is below target, that chunk is processed by evacuating the data to another location. The default target efficiency is 90% of the maximum storage efficiency available with the protection level used by the shadow store. Larger protection group sizes can tolerate a higher level of fragmentation before the storage efficiency drops below this threshold.
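The threshold logic can be illustrated with a small sketch. It assumes that the maximum efficiency of an N+M protection layout is N / (N + M), which is the usual erasure-coding ratio and not something stated here; the function names and the example layouts are hypothetical. Only the 90% default target comes from the text above.

```python
def max_efficiency(data_units, fec_units):
    """Assumed best-case efficiency of an N+M layout: N / (N + M)."""
    return data_units / (data_units + fec_units)

def needs_defrag(chunk_efficiency, data_units, fec_units, target_ratio=0.90):
    """True if a chunk's efficiency falls below the target threshold."""
    threshold = target_ratio * max_efficiency(data_units, fec_units)
    return chunk_efficiency < threshold

# Hypothetical 4+2 layout: maximum ~0.667, so the threshold is ~0.60.
print(needs_defrag(0.55, 4, 2))   # True: below the threshold
print(needs_defrag(0.62, 4, 2))   # False: above the threshold

# Hypothetical 16+2 layout: maximum ~0.889, so the threshold is ~0.80.
print(needs_defrag(0.75, 16, 2))  # True: below the threshold
```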


In OneFS 8.2, the ‘isi_sstore list’ command is enhanced to display fragmentation and efficiency scores. For example:


# isi_sstore list -v                    

              SIN lsize   psize   refs filesize  date       sin type underfull frag score efficiency

4100:0001:0001:0000 128128K 192864K 32032 128128K Sep 20 22:55 container no       0.01        0.66

 

The fragmentation score is the ratio of holes in the data where FEC is still required, whereas the efficiency value is a ratio of logical data blocks to total physical blocks used by the shadow store. Fully sparse stripes don't need FEC so are not included. The rule of thumb is that lower fragmentation scores and higher efficiency scores are better.
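As a sanity check, the efficiency column in the sample row above matches the ratio of the lsize and psize columns. The sketch below assumes whitespace-separated columns in the order shown and sizes expressed with a K suffix; the helper size_kib is hypothetical.

```python
def size_kib(text):
    """Convert a size such as '128128K' to an integer count of K units."""
    if not text.endswith("K"):
        raise ValueError(f"unexpected size format: {text!r}")
    return int(text[:-1])

# Sample row from the isi_sstore list -v output above.
row = ("4100:0001:0001:0000 128128K 192864K 32032 128128K "
       "Sep 20 22:55 container no 0.01 0.66")

fields = row.split()
lsize = size_kib(fields[1])   # logical size
psize = size_kib(fields[2])   # physical size
efficiency = lsize / psize

print(round(efficiency, 2))   # prints 0.66
```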


The defragmenter does not require a license to run and is disabled by default in OneFS 8.2. It can be easily activated using the following CLI command:


# isi_gconfig -t defrag-config defrag_enabled=true


Once enabled, the defragmenter can be started via the job engine’s ShadowStoreDelete job, either from the OneFS WebUI or via the following CLI command:


# isi job jobs start ShadowStoreDelete


The defragmenter can also be run in an assessment mode. This reports on and helps to determine the amount of disk space that will be reclaimed, without moving any actual data. The ShadowStoreDelete job can run the defragmenter in assessment mode but the statistics generated are not reported by the job. The isi_sstore CLI command has a ‘defrag’ option and can be run with the following syntax to generate a defragmentation assessment:


# isi_sstore defrag -d -a -c -p -v

Processed 1 of 1 (100.00%) shadow stores, space reclaimed 31M

Summary:

Shadows stores total: 1

Shadows stores processed: 1

Shadows stores skipped: 0

Shadows stores with error: 0

Chunks needing defrag: 4

Estimated space savings: 31M


Isilon Small File Storage Efficiency for Archive is not free. There’s always a trade-off between cluster resource consumption (CPU, memory, disk), the potential for data fragmentation, and the benefit of improved space utilization. As such, it's worth bearing the following in mind:


  • This is a storage efficiency product, not a performance product.
  • A valid Isilon SmartPools software license is required in order to configure small file storage efficiency on a cluster.
  • The time to retrieve a packed archive image should not be much greater than that of an unpacked image, unless fragmentation has occurred.
  • Configuration is only via the OneFS CLI, rather than the WebUI, at this point.
  • After enabling a filepool policy, the first SmartPools job may take a relatively long time due to packing work, but subsequent runs should be much faster.
  • For clusters using CloudPools, stubbed files cannot be containerized.
  • SyncIQ data will be unpacked during replication, so SmartPools will need to be licensed and packing configured on the target cluster.
  • If the data is in a snapshot, it won’t be packed – only HEAD file data will be containerized.
  • The isi_packing --fsa command reports on the whole filesystem, so the overall utilization percentage can be misleading if other, non-containerized data is also present on the cluster.
  • Alternate data streams (ADS, i.e. the streams themselves, not the parent files) will not be containerized by default.
  • Packing and unpacking are logically preserving actions: they do not cause logical changes to a file and therefore will not trigger snapshot COW.
  • If you’ve already run Isilon SmartDedupe data deduplication software on your data, you won’t see much additional benefit because your data is already in shadow stores.
  • If you run SmartDedupe against packed data, the deduped files will be skipped.
  • You can clone files with packed data.
  • Containerization is managed by the SmartPools job. However, the SmartPoolsTree job, isi filepool apply, and isi set will also be able to perform file packing.


Similarly, some recommended best practices for SFSE include:


  • Only enable storage efficiency on an archive workflow with a high percentage of small files.
  • Ensure that the majority of the logical space used on the cluster is for small files. In this case, small files are considered to be those less than 512 KB in size.
  • The default minimum age for packing is anything over one day, and this will override anything configured in the filepool policy.
  • Limit changes (overwrites and deletes) to containerized files, since these cause fragmentation and impact both file read performance and storage efficiency.
  • Ensure there’s sufficient free space available on the cluster before unpacking any containerized data.
  • Ensure the archive solution being used does not natively perform file containerization, or the benefits of Isilon small file storage efficiency will likely be negated.
  • Use a path based filepool policy for configuration, where possible, rather than more complex filepool filtering logic.
  • Don’t configure the maximum file size value inside the file pool filter itself. Instead set this parameter via the isi_packing command.
  • Use SFSE to archive static small file workloads, or those with only moderate overwrites and deletes.
  • If necessary, run the defragmentation job on a defined schedule (e.g. weekly) to eliminate fragmentation.