
OneFS Snapshot Scheduling

Posted by trimbn Dec 11, 2019

One question that frequently crops up from the field is what snapshot schedule to configure on a particular cluster.


SnapshotIQ scheduling allows cluster administrators to automatically generate snapshots according to a pre-defined itinerary. While there definitely isn’t a ‘one size fits all’ recommendation to make, three of the main drivers for this decision are:

 

  • Recovery point objective (RPO)
  • Available cluster capacity
  • Dataset rate of change

 

An organization’s data security, availability, and disaster recovery policy will often answer the first question: how much data loss is acceptable? Many companies define explicit service level agreements (SLAs) around the availability of their data. RPO is the acceptable amount of data loss that can be tolerated. With an RPO of 30 minutes, for example, a half hour is the maximum amount of time that can elapse since the last backup or snapshot was taken.


While OneFS does not require any cluster capacity to be exclusively reserved for snapshots, obviously snaps do consume space. Furthermore, this space will grow the more HEAD data changes, and as more snapshots are retained.
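
As a rough, back-of-the-envelope sizing exercise (the figures below are purely hypothetical assumptions, not OneFS output), the additional capacity consumed by retained snapshots can be approximated as dataset size x daily rate of change x retention period. For example, for a 100TB dataset with a 2% daily change rate and 30 days of snapshot retention:

# echo "100 * 0.02 * 30" | bc

60.00

In this case, roughly 60TB of additional capacity would be consumed by retained snapshots, which helps frame both the schedule frequency and retention decisions.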


OneFS snapshot schedules can be configured at daily, weekly, monthly or yearly intervals, with single or multiple job frequency per schedule, and down to a per-minute granularity.

There are two main strategies for snapshot scheduling:

  • Ordered Deletion
  • Unordered Deletion


Ordered deletion is suited to data sets with a low rate of change, such as archive or other cold data; whereas unordered deletion, which retains considerably fewer snapshots, is recommended for more active data, or clusters with limited capacity available.

The following table provides a recommended snapshot schedule for both ordered and unordered deletion configurations:

snap_schedule2.png

 

The following CLI command will create a schedule for hourly snapshots of the /ifs/data/media directory and its contents, plus a one month retention setting:


# isi snapshot schedules create hourly /ifs/data/media HourlyBackup_%m-%d-%Y_%H:%M "Every day every hour" --duration 1M


To configure a similar schedule from the WebUI, navigate to Data Protection > Snapshots > Snapshot Schedules and click the ‘Create a Schedule’ button.


snapshot_sched_1.png


On the other hand, the following commands create an unordered deletion configuration for /ifs/data/prod that generates snapshots at hourly, daily, weekly and monthly cadences:

 

# isi snapshot schedules create every-other-hour /ifs/data/prod EveryOtherHourBackup_%m-%d-%Y_%H:%M "Every day every 2 hours" --duration 1D


# isi snapshot schedules create daily /ifs/data/prod Daily_%m-%d-%Y_%H:%M "Every day at 12:00 AM" --duration 1W


# isi snapshot schedules create weekly /ifs/data/prod Weekly_%m-%d-%Y_%H:%M "Every Saturday at 12:00 AM" --duration 1M


# isi snapshot schedules create monthly /ifs/data/prod Monthly_%m-%d-%Y_%H:%M "The 1st Saturday of every month at 12:00 AM" --duration 3M

 

Existing snapshot schedules can be viewed from the CLI with the following command:

# isi snapshot schedules list

ID Name

---------------------

1 every-other-hour

2 daily

3 weekly

4 monthly

---------------------

 

More detailed information about a particular snapshot schedule is also available. For example, the following command will display more context about the ‘every-other-hour’ schedule above:

# isi snapshot schedules view every-other-hour

ID: 1

Name: every-other-hour

Path: /ifs/data/prod

Pattern: EveryOtherHourBackup_%m-%d-%Y_%H:%M

Schedule: Every day every 2 hours

Duration: 1D

Alias: -

Next Run: 2019-12-10T17:00:00

Next Snapshot: EveryOtherHourBackup_12-10-2019_17:00

 

Another important consideration when configuring snapshot schedules at any level of scale is the snapshot naming convention. If you schedule snapshots to be automatically generated, either according to a snapshot schedule or a replication policy, you can assign a snapshot naming pattern that determines how the snapshots are named. Snapshot naming patterns contain variables that include information about how and when the snapshot was created.

The following variables can be included in a snapshot naming pattern:

 

  • %A: Day of the week
  • %a: Abbreviated day of the week. For example, if the snapshot is generated on a Sunday, %a is replaced with ‘Sun’
  • %B: Name of the month
  • %b: Abbreviated name of the month. For example, if the snapshot is generated in September, %b is replaced with ‘Sep’
  • %C: First two digits of the year
  • %c: The time and day. This variable is equivalent to specifying %a %b %e %T %Y
  • %d: Two-digit day of the month
  • %e: Day of the month. A single-digit day is preceded by a blank space
  • %F: The date. This variable is equivalent to specifying %Y-%m-%d
  • %G: The year. This variable is equivalent to specifying %Y. However, if the snapshot is created in a week that has less than four days in the current year, the year that contains the majority of the days of the week is displayed. The first day of the week is calculated as Monday. For example, if a snapshot is created on Sunday, January 1, 2017, %G is replaced with 2016, because only one day of that week is in 2017.
  • %g: The abbreviated year. This variable is equivalent to specifying %y
  • %H: The hour, represented on the 24-hour clock. Single-digit hours are preceded by a zero. For example, if a snapshot is created at 1:45 AM, %H is replaced with 01
  • %I: The hour, represented on the 12-hour clock. Single-digit hours are preceded by a zero. For example, if a snapshot is created at 1:45 PM, %I is replaced with 01
  • %j: The numeric day of the year. For example, if a snapshot is created on February 1, %j is replaced with 32
  • %k: The hour, represented on the 24-hour clock
  • %l: The hour, represented on the 12-hour clock. Single-digit hours are preceded by a blank space. For example, if a snapshot is created at 1:45 AM, %l is replaced with 1
  • %M: Two-digit minute
  • %m: Two-digit month
  • %p: AM or PM
  • %{PolicyName}: Name of the replication policy that the snapshot was created for. This variable is valid only if specifying a snapshot naming pattern for a replication policy
  • %R: The time. This variable is equivalent to specifying %H:%M
  • %r: The time. This variable is equivalent to specifying %I:%M:%S %p
  • %S: Two-digit second
  • %s: The second, represented in POSIX time
  • %{SrcCluster}: Name of the source cluster of the replication policy that the snapshot was created for. Valid only if specifying a snapshot naming pattern for a replication policy
  • %T: The time. Equivalent to specifying %H:%M:%S
  • %U: Two-digit numerical week of the year
  • %u: Numerical day of the week. For example, if a snapshot is created on Sunday, %u is replaced with 7
  • %V: Two-digit numerical week
  • %v: Day of snapshot creation. Equivalent to specifying %a-%b-%Y
  • %W: Two-digit numerical week of the year that the snapshot was created in
  • %X: Time that the snapshot was created. Equivalent to specifying %H:%M:%S
  • %Y: Year the snapshot was created
  • %y: Last two digits of the snapshot creation year
  • %Z: Time zone that the snapshot was created in
  • %z: Offset from UTC of the time zone that the snapshot was created in
  • %+: Time and date of snapshot creation. Equivalent to specifying %a %b %e %X %Z %Y
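
Most of these variables mirror the standard strftime(3) conversion specifiers, so a quick, hedged way to preview how a configured pattern will expand is to pass the same pattern string to the ‘date’ command from the cluster shell (the resulting timestamp below is purely illustrative):

# date +HourlyBackup_%m-%d-%Y_%H:%M

HourlyBackup_12-11-2019_17:00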

 

 

Similarly, automatic snapshot deletion can also be configured per schedule, at anywhere from an hourly through a yearly cadence.

Snapshot attributes such as name and expiry date can easily be changed. For example, the following command will cause the snapshot ‘HourlyBackup_06-15-2018_22:00’ to expire at 2:30 AM on 30th December 2019:


# isi snapshot snapshots modify HourlyBackup_06-15-2018_22:00 --expires 2019-12-30T02:30

A snapshot schedule can also be easily modified. However, any changes to a schedule are applied only to snapshots generated after the modifications are made; existing snapshots are not affected by schedule modifications. If the alias of a snapshot schedule is modified, the new alias is assigned to the next snapshot generated based on the schedule. However, the old alias is not removed from the last snapshot it was assigned to, and it will remain attached to that snapshot unless you manually remove it.

For example, the following command causes snapshots created according to the snapshot schedule hourly_prod_snap to be deleted 15 days after they are created:

# isi snapshot schedules modify hourly_prod_snap --duration 15D


Similarly, deleting a snapshot schedule will not remove snapshots that were previously generated according to the schedule.

The following command will delete the snapshot schedule named ‘hourly_prod_snap’:

# isi snapshot schedules delete hourly_prod_snap


You can configure a snapshot schedule to assign a snapshot alias to the most recent snapshot created by the schedule. As described above, the alias will be assigned to the next snapshot generated based on the schedule, while the old alias remains attached to the previous snapshot until it is manually removed.

For example, the following command will configure the snapshot schedule WeeklySnapshot to use the alias ‘LatestWeekly’:

# isi snapshot schedules modify WeeklySnapshot --alias LatestWeekly


It’s worth noting that a snapshot schedule cannot span multiple days. For example, you cannot specify to begin generating snapshots at 5:00 PM Monday and end at 5:00 AM Tuesday. Instead, to continuously generate snapshots for a period greater than a day, two individual snapshot schedules are required.

 

In order to generate snapshots from 5:00 PM Monday to 5:00 AM Tuesday, for example, create one schedule that generates snapshots from 5:00 PM to 11:59 PM on Monday, and another schedule that generates snapshots from 12:00 AM to 5:00 AM on Tuesday.

 

For mixed-node clusters, a related question to snapshot schedule frequency is which storage tier of the cluster should house the snapshots. This can be set, along with a specific protection level and SSD strategy for the snapshots, in the SmartPools file pool policy configuration. For example, from the WebUI browse to File System > Storage Pools > File Pools and select the desired policy.

 

snapshot_sched_3.png

 

SnapshotIQ also provides a number of global snapshot settings, including:


  • Control of auto-creation of scheduled snapshots
  • Deletion of expired snapshots
  • The ability to enable and disable the snapshot service
  • Per-protocol and complete control of snapshot visibility and accessibility

 

These global snapshot storage settings can be accessed and configured in the WebUI by browsing to Data Protection > Snapshots > Settings:


snapshot_sched_4.png

 

Or from the CLI, via:


# isi snapshot settings view

 

The following table provides a description of the global snapshot configuration settings:

 

  • Autodelete: Determines whether snapshots are automatically deleted according to their expiration dates.
  • Reserve: Specifies the percentage of disk space on the cluster that is reserved for snapshots.
  • NFS Root Accessible: Determines whether snapshot directories are accessible through NFS.
  • NFS Root Visible: Determines whether snapshot directories are visible through NFS.
  • NFS Subdir Accessible: Determines whether snapshot subdirectories are accessible through NFS.
  • SMB Root Accessible: Determines whether snapshot directories are accessible through SMB.
  • SMB Root Visible: Determines whether snapshot directories are visible through SMB.
  • SMB Subdir Accessible: Determines whether snapshot subdirectories are accessible through SMB.
  • Local Root Accessible: Determines whether snapshot directories are accessible through an SSH connection or the local console.
  • Local Root Visible: Determines whether snapshot directories are visible through an SSH connection or the local console.
  • Local Subdir Accessible: Determines whether snapshot subdirectories are accessible through an SSH connection or the local console.

Received an interesting snapshot restore inquiry from the field and thought it was worth incorporating into a blog article. The scenario is this: a large amount of data needs to be restored on a cluster. Unfortunately, the SnapshotIQ policies are configured at the root /ifs level and it is not feasible to restore every subdirectory under the snapshot. Although the files themselves are not that large, the subdirectories contain anywhere from thousands to tens of millions of files. Restores are taking a very long time when copying the directories manually.


So, there are two main issues at play here:


  • Since the snapshot is taken at a higher level in the directory tree than the data to be recovered, and the entire snapshot cannot be restored in place, using the SnapRevert job is not an option here.
  • The sheer quantity of files involved means that a manual, serial restore of the data would be incredibly time consuming.


Fortunately, there is a solution that involves using replication. SyncIQ allows for snapshot subdirectories to be included or excluded, plus also provides the performance benefit of parallel job processing.


SyncIQ contains an option, only available via the command line (CLI), which allows replication out of a snapshot.


The procedure is as follows:


1)     Create a snapshot of a root directory.

# isi snapshot snapshots create --name snaptest3 /ifs/data


2)     List the available snapshots and select the desired instance.

 

For example:


# isi snapshot list

ID Name Path

----------------------------------------------------

6 FSAnalyze-Snapshot-Current-1529557209 /ifs

8    snaptest3                             /ifs/data

----------------------------------------------------

Total: 2


Note that there are a couple of caveats:


  • The subdirectory to be restored must still exist in the HEAD filesystem (i.e. it must not have been deleted since the snapshot was taken).
  • You cannot replicate data from a SyncIQ-generated snapshot.

 

3)     Create a local SyncIQ replication policy with the snapshot source as the original location and a new directory location on ‘localhost’ as the destination. The ‘--source-include-directories’ argument lists the desired subdirectory (or subdirectories) to restore.

 

For example, via the CLI:

 

# isi sync policies create snapshot_sync3 sync /ifs/data localhost /ifs/file_sync3 --source-include-directories /ifs/data/local_qa

 

Or via the WebUI:

 

SyncIQ_snapshot_replication_1.png

 

Note:  You cannot configure the snapshot into the policy, or set source=snapshot.


4)     Next, run the sync job to replicate a subset of a snapshot. This step is CLI only (not WebUI) since the SyncIQ policy needs to be executed with ‘--source-snapshot’ argument specified.

 

For example:


# isi sync job start snapshot_sync3 --source-snapshot=snaptest3


Note: This command is essentially a change root for the single run of the SyncIQ Job.
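
To monitor the replication progress, the running job can also be checked from the CLI. For example (SyncIQ job and report details are also available in the WebUI under Data Protection > SyncIQ):

# isi sync jobs list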


5)     Finally, rename the original directory to something else with mv, and then rename the restore location to the original name.

 

For example:

 

# mv /ifs/data/local_qa /ifs/data/local_qa_old

# mv /ifs/file_sync3/local_qa /ifs/data/local_qa


If you do not have a current replication license on your cluster, you can enable the OneFS SyncIQ trial license from the WebUI by browsing to Cluster Management > Licensing.


Using SyncIQ in this manner is a very efficient way to recover large amounts of data from within snapshots. However, this scenario also illustrates one of the drawbacks of taking snapshots at the root directory level. Consider whether it’s more advantageous to configure snapshot schedules to capture at the subdirectory level instead.


OneFS Snapshot Tiering

Posted by trimbn Nov 25, 2019

Within OneFS, data tiering falls under the purview of SmartPools, and snapshot tiering is no different. SmartPools file pool policies can be crafted to identify logical groups of files (or file pools) and storage operations can be specified and applied to these files.

 

Be aware that a SmartPools license must be activated before creating file pool policies, and SmartPools or higher administrative privilege is required for configuration.

 

File pool policies have two components:

 

  1. File-matching criteria that define a file pool
  2. Actions to be applied to the file pool

 

File pools can be defined based on characteristics such as file type, size, path, and birth, change, or access timestamps. These criteria can then be combined with Boolean operators (i.e. AND, OR). In addition to file-matching criteria, a variety of actions can be applied to the file pool, including:

 

  • Identifying data and snapshot storage tiers
  • Defining data and snapshot SSD strategies
  • Enabling or disabling SmartCache
  • Setting requested protection and data-access optimization parameters

 

The Snapshot Storage Target setting is applied to each file version by SmartPools. When a snapshot is taken (i.e. on copy on write), the existing pool setting is simply preserved. This means that the snapshot will initially be written to the default data pool and then moved: the SmartPools job subsequently finds the snapshot version and moves it to the desired pool or tier during the next scheduled SmartPools job run.

 

To configure the Snapshot Storage Target setting from the WebUI, browse to Storage Pools > File Pool Policies > Edit Default Policy Details. For example, the following will configure SmartPools to store snapshots on the cluster’s ‘archive’ tier:

 

snapshot_tiering_1.png

 

The same can be achieved from the CLI using the 'isi filepool default-policy modify' command. For example:

 

# isi filepool default-policy modify --snapshot-storage-target archive

 

In addition to the storage target, the preferred Snapshot SSD strategy can also be configured here. The available options are:

 

  • Metadata: Place a copy of snapshot metadata on SSD for read acceleration
  • Metadata-write: Place all snapshot metadata on SSD for read and write acceleration
  • Data: Place all snapshot data and metadata on SSD
  • Avoid: No snapshot data or metadata on SSD

 

 

The following CLI command, for example, will place a mirror of the snapshot metadata on SSD, providing metadata read acceleration:

 

# isi filepool default-policy modify --snapshot-ssd-strategy metadata
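
If desired, both of the snapshot placement attributes covered above can be set in a single invocation, for example combining the ‘archive’ storage target from the earlier example with the metadata SSD strategy:

# isi filepool default-policy modify --snapshot-storage-target archive --snapshot-ssd-strategy metadata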

 

Similarly, for regular files, SmartPools determines which node pool to write to based on one of two situations: if a file matches a file pool policy based on directory path, that file will be written into the node pool dictated by the file pool policy immediately. If the file matches a file pool policy based on any other criteria besides path name, SmartPools will write it to the node pool with the most available capacity. If that file pool policy places it on a different node pool than the highest-capacity node pool, the file will be moved when the next scheduled SmartPools job runs.


snapshot_tiering_3.png

 

Under the covers, when the SmartPools or FilePolicy job runs, it caches a policy on directories that it thinks will be applied to children of that directory.  When files are created they start out with that policy. At the next scheduled SmartPools job run, if a different policy matches from the configured file pool rules, it is applied at that time.


OneFS Snapshot Deletion

Posted by trimbn Nov 19, 2019

Received a recent enquiry about snapshot deletion, and thought it was worth elaborating upon in a blog article:

OneFS Snapshots are created at the directory-level instead of the volume-level, thereby providing a high degree of granularity. However, they are a point in time immutable copy of a specified subset of OneFS’ data, so can’t be altered in any way once taken - beyond a full deletion. As such, removing a portion of an existing snapshot is not an option: Deleting an Isilon snapshot is an all-or-nothing event.


There are a couple of useful OneFS CLI commands that show how much space is consumed by snapshots:


First up, the ‘isi_classic snapshot usage’ command will display the existing snapshots and their disk usage. For example:


# isi_classic snapshot usage

FSAnalyze-Snapshot-Current-1530077114             51G     n/a (R) 0.00% (T)

SIQ-Failover-snapshot_sync3-2019-10-22            4.0K     n/a (R) 0.00% (T)

SIQ-Failover-snapsync-2019-10-22_12-02            4.0K     n/a (R) 0.00% (T)

Snapshot: 2019Oct22, 08:20:05 AM                  1.9G     n/a (R) 0.00% (T)

[snapid 57, delete pending]                          0     n/a (R) 0.00% (T)

snaptest1 6.0K     n/a (R)    0.00% (T)

snaptest2 70K     n/a (R)    0.00% (T)

snaptest3                                         1.3M     n/a (R) 0.00% (T)


In addition to the name of the snapshot and the amount of space the snapshot takes up, the percentage of the snapshot reserved space this accounts for (R), and the percentage of the total filesystem space this accounts for (T) are also displayed.


Secondly, the ‘isi snapshot view’ command can be used to find more detailed info for an individual snapshot. This includes the snapshot path, alias, ID, whether there are any locks, expiry date, etc. For example:

 

# isi snapshot view FSAnalyze-Snapshot-Current-1530077114

ID: 56

Name: FSAnalyze-Snapshot-Current-1530077114

Path: /ifs

Has Locks: No

Schedule: -

  Alias Target ID: -

Alias Target Name: -

Created: 2019-10-26T22:25:14

Expires: -

Size: 50.764G

Shadow Bytes: 0

% Reserve: 0.00%

     % Filesystem: 0.00%

State: active

 

 

Snapshots can be automatically deleted on a preconfigured schedule, or manually deleted via the ‘isi snapshot snapshots delete’ CLI command.



Usage:

    isi snapshot snapshots delete { <snapshot> | --schedule <string> | --type

(alias | real) | --all }

[{--force | -f}]

[{--verbose | -v}]

[{--help | -h}]

 

Options:

<snapshot>

Delete a single snapshot.

<schedule>

Delete all snapshots created by the given schedule.

    <type> (alias | real)

Delete all snapshots of the specified type.

    --all

Delete all snapshots.

 

 

Let’s look at a simple example:


1)  The following snapshot usage command lists the available snapshots and their size, ordered by age:


# isi_classic snapshot usage

CBsnapshot                                      85K      n/a (R)    0.00% (T)

Hourly - prod 6.0K     n/a (R)    0.00% (T)

SIQ-Failover-CBpolicy1-2019-10-29_13-0          6.0K     n/a (R) 0.00% (T)

Daily_2019-11-12_12:00                          584M     n/a (R) 0.00% (T)

Weekly_2019-11-11_12:00                         6.0K     n/a (R) 0.00% (T)

 

From this output, we can see the snapshot ‘Daily_2019-11-12_12:00’ is 584MB in size and appears to be a viable candidate for deletion.

 

2)  The following CLI command will return the snapshot ID.   


# isi snapshot snapshots list | grep "Daily_2019-11-12_12:00" | awk '{print $1}'

110


3)  Next, we can use the snap ID to verify the snapshot details to ensure its deletion is desirable:


# isi snapshot snapshots view `isi snapshot snapshots list | grep Daily_2019-11-12_12:00| awk '{print $1}'`

 

ID: 110

Name: Daily_2019-11-12_12:00

Path: /ifs

        Has Locks: No

Schedule: Daily @ Noon

  Alias Target ID: -

Alias Target Name: -

          Created: 2019-11-12T12:00:06

          Expires: -

Size: 582.45M

     Shadow Bytes: 0

        % Reserve: 0.00%

     % Filesystem: 0.00%

State: active

 

The output confirms that it’s the correct snapshot, its size, and that it’s not locked, etc.

 

4)  The following syntax will delete the snapshot ID 110, after prompting for confirmation:

 

# isi snapshot snapshots delete 110

Are you sure? (yes/[no]):


5)  A detailed report of the SnapshotDelete job can then be viewed from the WebUI. This can be found by browsing to Job Operations > Job Reports, filtering for ‘SnapshotDelete’, and selecting ‘View Details’ for the desired job.

 

When it comes to deleting snapshots, there are a couple of rules of thumb to keep in mind:

 

  • The oldest snapshot can be removed very quickly. An ordered deletion is the deletion of the oldest snapshot of a directory, and is a recommended best practice for snapshot management. An unordered deletion is the removal of a snapshot that is not the oldest in a directory, and can often take approximately twice as long to complete and consume more cluster resources than ordered deletions.

 

  • Where possible, avoid deleting snapshots from the middle of a time range. Newer snapshots are mostly pointers to older snapshots, so they look larger than they really are; removing the newer snapshots will not free up much space, whereas deleting the oldest snapshot ensures you will actually free up the space. You can determine snapshot order (if not by name or date) by using the ‘isi snapshot list -l’ command, as shown in the example after this list. The snapshot IDs (first column) are non-conserved, serial values.

  • Avoid deleting SyncIQ snapshots (snapshots with names that start with SIQ), unless the only remaining snapshots on the cluster are SyncIQ snapshots, and the only way to free up space is to delete those SyncIQ snapshots. Deleting SyncIQ snapshots resets the SyncIQ policy state, which requires a reset of the policy and potentially a full sync or initial diff sync. A full sync or initial diff sync could take many times longer than a regular snapshot-based incremental sync.
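
For instance, a simple way to display snapshots in creation order, oldest first, is to sort the listing numerically on the ID column (a minimal sketch, assuming the snapshot ID remains the first column as noted above):

# isi snapshot list -l | sort -n -k1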

 

So what happens under the hood? Upon deleting a snapshot, OneFS immediately modifies some of the tracking data and the snapshot disappears from view. However, the actual behind-the-scenes clean-up of the snapshot can involve a fair amount of work, which is performed in the second phase of the SnapshotDelete job. There is no requirement for reserved space for snapshots in OneFS: snapshots can use as much or as little of the available file system space as desired.
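
To confirm whether any snapshot reserve is configured on a cluster, the global settings output shown earlier can simply be filtered. For example:

# isi snapshot settings view | grep -i reserve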

In the example below, snapshot ID 100 is being deleted. To accomplish this, any changes will likely need to be moved to the prior snapshot (ID 98), because that snapshot will no longer be able to read forward.

 

snap_delete_1.png

 

Snapshot 100 has two changed blocks: block 0 and block 4.  Block 4 was also changed after snapshot 98 was taken, so snapshot 98 already holds its own copy and snapshot 100’s block 4 can simply be deleted, but block 0 needs to be moved over to snapshot 98.


snap_delete_2.png

 

It’s worth noting that SnapshotDelete will only run if the cluster is in a fully available state, i.e., no drives or nodes are down.

 

If you have old, large snapshots consuming space and the cluster does not have a current SnapshotIQ license, contact Dell EMC Isilon Technical Support to discuss your options and assistance with deleting the old snapshots.

There have been a couple of recent inquiries from the field around SMB opportunistic locking so it seemed like an appropriate topic to dig into a bit in an article.

 

Under certain conditions, opportunistic locks, or oplocks, can enable a storage device and client to aggressively cache data – helping to boost performance. More specifically, oplocks allow a Windows client to cache read-ahead data, writes, opens, closes, and byte-range lock acquisitions.

 

With SMB2.1 and onward, in addition to oplocks, Microsoft also introduced the concept of leases. These provide more flexible and granular caching for clients, while also allowing for lock upgrades.

 

Here’s a brief rundown on how SMB and NFS support locks and leases:

 

  • SMB1, SMB2.0: Oplocks are defined and used in the SMB1 protocol. These are fully supported in OneFS.
  • SMB2.1, SMB3: Oplocks are still supported, but leases are also included in the protocol and offer a number of improvements over oplocks. These are fully supported in OneFS.
  • NFSv3: No provision in the protocol for leases or oplocks.
  • NFSv4: Optional support for file and directory delegations, which are similar to SMB leases. These are not currently supported by OneFS.

 

When a Windows client attempts to open a file, it can request no oplock, a batch oplock, or an exclusive oplock. Once the open has passed the access and share mode checks, OneFS must do one of the following:

 

1)  Grant the client its requested oplock on the file (exclusive or batch).

2) Grant the client a lower-level (level II) oplock on the file.

3) Not grant an oplock on the file at all.

 

The various oplocks types, ranked from the least to the most amount of caching, include:

 

  • Level II (shared): Level II oplocks, also referred to as shared oplocks, grant clients the ability to cache the results of read operations. This means a client can prefetch data that an application may want to read, as well as retain old read data, allowing its reads to be more efficient. Multiple clients can hold level II oplocks at the same time, but all existing level II oplocks are broken when a client tries to write data to the file.
  • Exclusive: Exclusive oplocks grant clients the ability to retain read data, like level II oplocks, but also allow clients to cache data and metadata writes and byte-range lock acquisitions. Unlike level II oplocks, a client cannot be granted an exclusive oplock if the file is already opened. If a client is granted an exclusive oplock, it is able to cache writes to the file, cache metadata changes (such as timestamps, but not ACLs) and cache range locks of the file via byte-range locking. As soon as there is another opener, either from the same client or a different client, the cluster must request to break the exclusive oplock, in order to guarantee the second opener has access to the most up-to-date data.
  • Batch: Batch oplocks are identical to exclusive oplocks, except that they allow clients to cache open/close operations. The origins of this type of oplock are from the days of DOS batch files; batch files were opened and closed for every line of the script to be executed.



 

 

There are two types of oplock breaks: level I breaks and level II breaks. An oplock break occurs when an oplock is contended, due to a conflicting file operation. In OneFS, contention occurs when an operation on one File ID (a File ID, or FID, is the ‘handle’ that SMB uses to refer to a file) conflicts with a currently held oplock on a different FID, either on the same or a different client. When an oplock contends with an operation, the oplock is broken. The OneFS rules governing oplock contention are:

 

  1. A level II oplock contends with modifying operations, such as writes and truncates, as well as byte-range lock acquisitions.
  2. An exclusive oplock contends with an open operation, except for stat-only opens.
  3. A batch oplock contends with an open, delete, or rename operation.



 

 

Contention can occur if the operations are from either the same or a different Windows client. However, an operation on a file ID does not contend against that FID’s own oplock, since FIDs must be different to contend. That said, opening the same file a second time will typically contend with the first opening of the file, since the second opening will return a different FID.

 

The two level I oplocks, exclusive and batch, are broken in different ways. An exclusive oplock is broken when the file it pertains to has been requested for opening. Batch oplocks are broken when the same file is opened from a different client or when the file is deleted or renamed.

 

When OneFS needs to break a level I oplock, it allows the client a chance to perform all of the operations that it has cached. Before the cluster can respond to the open request from the second client, it waits for an acknowledgment of the oplock break from the first client. The first client now has the chance to either flush cached metadata or data or send byte-range locking requests.

 

Once the client has flushed its cached operations, it relinquishes its oplock either by closing the file or acknowledging to the cluster that it has downgraded its oplock. When a client decides to downgrade its oplock it either accepts a level II oplock or informs the cluster that it does not require an oplock at all. After the client has acknowledged the oplock break, OneFS is free to respond to the open request from the second client. It may also give the second client a level II oplock, allowing it to cache read data. OneFS waits up to 30 seconds for an acknowledgment of its oplock break request, after which it considers the client unresponsive and times out.

 

A FID’s level II oplock is broken when a modifying operation or a byte range lock acquisition is performed on a different FID. The cluster informs the first FID that its oplock has been broken and that it can no longer cache read data. Unlike the exclusive oplock break, OneFS does not require an oplock break acknowledgment from the client and can continue processing the write request right away.

 

Leases are similar to, and compatible with, oplocks, but superior in a number of areas:


  • Leases contend based on a client key, rather than a FID, so two different applications on a client accessing the same file can share a lease whereas they cannot share an oplock.
  • There are more lease types, namely:

 

  • Read (R): A Read lease (shared) indicates that there are multiple readers of a stream and no writers. This supports client read caching (similar to a Level II oplock).
  • Read-Handle (RH): A Read-Handle lease (shared) indicates that there are multiple readers of a stream, no writers, and that a client can keep a stream open on the cluster even though the local user on the client machine has closed the stream. This supports client read caching and handle caching (Level II plus handle caching).
  • Read-Write (RW): A Read-Write lease (exclusive) allows a client to open a stream for exclusive access and allows the client to perform arbitrary buffering. This supports client read caching and write caching (Level I Exclusive).
  • Read-Write-Handle (RWH): A Read-Write-Handle lease (exclusive) allows a client to keep a stream open on the cluster even though the local accessor on the client machine has closed the stream. This supports client read caching, write caching, and handle caching (Level I Batch).



 

To globally enable oplocks on a cluster’s shares from the WebUI, navigate to Protocols > Windows Sharing (SMB) > Default Share Settings > Advanced Settings > Oplocks and check the ‘Enable oplocks’ checkbox.


oplocks_1.png


The same can be done for individual shares, from the desired share’s advanced configuration menu.


From the CLI, the syntax to enable oplocks on a share named ‘test’ is:


# isi smb shares modify test --oplocks Yes


To verify the configuration:


# isi smb shares view test | grep -i oplocks

Oplocks: Yes

 

Similarly, the syntax to disable oplocks on the ‘test’ share is:

 

# isi smb shares modify test --oplocks No

 

To re-enable oplocks, the following command can be used:

 

# isi smb shares modify test --revert-oplocks

 

The following gconfig syntax can be used to disable leases:

 

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases=0

 

Note that the above oplocks configuration is unaffected by this config change to leases.
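
To confirm the current lease setting, the same gconfig key can be read back (a hedged example; running isi_gconfig with just the key, and no value assignment, should print its current setting):

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases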

 

Similarly, to re-enable leases, the following command can be used:

 

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases=1


When using either the OneFS WebUI or the platform API, all communications are encrypted using Transport Layer Security (TLS). TLS requires a certificate that serves two principal functions: granting permission to use encrypted communication via Public Key Infrastructure (PKI), and authenticating the identity of the certificate's holder. OneFS defaults to the best supported version of TLS based on the client request.

 

An Isilon cluster initially contains a self-signed certificate for this purpose. The existing self-signed certificate can be used, or it can be replaced with a third-party certificate authority (CA)-issued certificate. If the self-signed certificate is used, when it expires it must be replaced with either a third-party (public or private) CA-issued certificate or another self-signed certificate that is generated on the cluster. The default locations for the server.crt and server.key files are:


  • TLS certificate: /usr/local/apache2/conf/ssl.crt/server.crt
  • TLS certificate key: /usr/local/apache2/conf/ssl.key/server.key
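
For example, to quickly check the subject and validity dates of the certificate currently installed at the default location above, standard openssl syntax can be used (an illustrative check, not a OneFS-specific command):

# openssl x509 -noout -subject -dates -in /usr/local/apache2/conf/ssl.crt/server.crt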


The following steps can be used to replace the existing TLS certificate with a public or private third-party certificate authority (CA)-issued TLS certificate.


1) Connect to a cluster node via SSH, log in as root, and create a backup directory:


# mkdir /ifs/data/backup/


2) Set the permissions on the backup directory to 700:


# chmod 700 /ifs/data/backup


3) Copy the server.crt and server.key files to the backup directory:


# cp /usr/local/apache2/conf/ssl.crt/server.crt /ifs/data/backup/server.crt.bak

# cp /usr/local/apache2/conf/ssl.key/server.key /ifs/data/backup/server.key.bak


4) Create a temporary directory for the files:


# mkdir /ifs/local


5) Set the temporary directory permissions to 700:


# chmod 700 /ifs/local


6) Change to the temporary directory:


# cd /ifs/local


7) Generate a new Certificate Signing Request (CSR) and a new key by running the following command, where <common-name> identifies the new .key and .csr files. Eventually, the files will be renamed, copied back to the default location and deleted. Although any name can be selected, the recommendation is to use the Common Name for the new TLS certificate (for example, the server FQDN or server name, such as isilon.example.com). This helps distinguish the new files from the originals.


# openssl req -new -nodes -newkey rsa:1024 -keyout <common-name>.key -out <common-name>.csr


8) When prompted, type the information to be incorporated into the certificate request. After entering this information, the .csr and .key files appear in the /ifs/local directory.


9) Send the contents of the .csr file from the cluster to the Certificate Authority (CA) for signing.


10) When you receive the signed certificate (now a .crt file) from the CA, copy the certificate to /ifs/local/<common-name>.crt (where <common-name> is the name you assigned earlier).


11) To verify the attributes in the TLS certificate, run the following command using the name that you assigned earlier:


# openssl x509 -text -noout -in <common-name>.crt
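
To check just the new certificate's expiration date, for instance, the following openssl variant can also be used:

# openssl x509 -noout -enddate -in <common-name>.crt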


12) Run the following five commands to install the certificate and key, and restart the isi_webui service. In the commands, replace <common-name> with the name that you assigned earlier.


# isi services -a isi_webui disable

# chmod 640 <common-name>.key

# isi_for_array -s 'cp /ifs/local/<common-name>.key /usr/local/apache2/conf/ssl.key/server.key'

# isi_for_array -s 'cp /ifs/local/<common-name>.crt /usr/local/apache2/conf/ssl.crt/server.crt'

# isi services -a isi_webui enable


13) Verify that the installation succeeded. For instructions, see the Verify a TLS certificate update section of this guide.


14) Delete the temporary files from the /ifs/local directory:


# rm /ifs/local/<common-name>.csr /ifs/local/<common-name>.key /ifs/local/<common-name>.crt


15) Delete the backup files from the /ifs/data/backup directory:


# rm /ifs/data/backup/server.crt.bak /ifs/data/backup/server.key.bak

 

The following steps replace an expired self-signed TLS certificate by generating a new certificate based on the existing server key.


1) Open a secure shell (SSH) connection to any node in the cluster and log in as root.


2) Create a backup directory by running the following command:


# mkdir /ifs/data/backup/


3) Set the permissions on the backup directory to 700:


# chmod 700 /ifs/data/backup


4) Make backup copies of the existing server.crt and server.key files by running the following two commands:


# cp /usr/local/apache2/conf/ssl.crt/server.crt /ifs/data/backup/server.crt.bak

# cp /usr/local/apache2/conf/ssl.key/server.key /ifs/data/backup/server.key.bak


Note: If files with the same names exist in the backup directory, either overwrite the existing files, or, to save the old backups, rename the new files with a timestamp or other identifier.


5) Create a temporary directory to hold the files while you complete this procedure:


# mkdir /ifs/local


6) Set the permissions on the temporary directory to 700:


# chmod 700 /ifs/local


7) Change to the temporary directory:


# cd /ifs/local


8) At the command prompt, run the following two commands to create a certificate that will expire in one year (365 days). Increase or decrease the value for -days to generate a certificate with a different expiration date.


# cp /usr/local/apache2/conf/ssl.key/server.key ./

# openssl req -new -days 365 -nodes -x509 -key server.key -out server.crt


Note: the -x509 value is a certificate format.


9) When prompted, type the information to be incorporated into the certificate request. When you finish entering the information, a renewal certificate is created, based on the existing (stock) server key. The renewal certificate is named server.crt and it appears in the /ifs/local directory.


10) To verify the attributes in the TLS certificate, run the following command:


# openssl x509 -text -noout -in server.crt


11) Run the following five commands to install the certificate and key, and restart the isi_webui service:


# isi services -a isi_webui disable

# chmod 640 server.key

# isi_for_array -s 'cp /ifs/local/server.key \ /usr/local/apache2/conf/ssl.key/server.key'

# isi_for_array -s 'cp /ifs/local/server.crt \ /usr/local/apache2/conf/ssl.crt/server.crt'

# isi services -a isi_webui enable


12) Verify that the installation succeeded.


TLS certificate renewal or replacement requires you to provide data such as a fully qualified domain name and a contact email address. When you renew or replace a TLS certificate, you are asked to provide data in the format that is shown in the following example:


You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN.

There are quite a few fields but you can leave some blank. For some fields there will be a default value; if you enter '.', the field will be left blank.

-----

Country Name (2 letter code) [AU]:US

State or Province Name (full name) [Some-State]:Washington

Locality Name (eg, city) []:Seattle

Organization Name (eg, company) [Internet Widgits Pty Ltd]:Company

Organizational Unit Name (eg, section) []:System Administration

Common Name (e.g. server FQDN or YOUR name) []:localhost.example.org

Email Address []:support@example.com


In addition, if you are requesting a third-party CA-issued certificate, you should include additional attributes that are shown in the following example:


Please enter the following 'extra' attributes to be sent with your certificate request


A challenge password []:password

An optional company name []:Another Name

 

13) Delete the temporary files from the /ifs/local directory:


# rm /ifs/local/server.crt /ifs/local/server.key


14)  Delete the backup files from the /ifs/data/backup directory:


# rm /ifs/data/backup/server.crt.bak /ifs/data/backup/server.key.bak


OneFS Patches

Posted by trimbn Oct 23, 2019

In the previous article on OneFS Healthchecks, we introduced the notion of the RUP, or roll-up patch. This generated a few questions from the field, so it seemed like a good blog topic. In this article we’ll take a look at the OneFS patch installation process which has been significantly refined and simplified in 8.2.1.

 

In previous releases the patching process could prove burdensome, often requiring multiple service restarts and reboots during patch installation.


To address this, OneFS 8.2.1 includes the following principal features in its enhanced patching process:


  • Supports installing a patch without uninstalling the previous version
  • Only requires a single node reboot
  • Reduces service stop and start to only once per service
  • Supports patching isi_upgrade_agent_d
  • Reduces security concerns during RUP installation
  • Supports patching the underlying Patch System
  • Utilizes the same familiar ‘isi upgrade patch’ CLI command syntax


So, let’s look at what’s involved in installing a new ‘roll-up patch’, or RUP, under 8.2.1.


1)  First, check for existing patches on the cluster:


# isi upgrade patch list

Patch Name Description            Status

UGA-August                          Installed

Total: 1

 

In this example, the CLI command verifies that patch ‘UGA-August’ is installed on the cluster. Patching activity is logged in a patch database, located at /ifs/.ifsvar/patch/patch.db

 

2)  Next, install a new patch directly (ie. without any uninstallation). In this case the September RUP, UGA-September, is being installed:


# isi upgrade patch install UGA-September.pkg

The below patches are deprecated by this patch and will be removed automatically:

- UGA-August

Would you like to proceed? (yes/[no]): yes

Requested install of patch UGA-September.

 

# isi upgrade patch list

Patch Name Description Status

UGA-August AdHoc

UGA-September Installing

Total: 2

 

Note that the previous patch, UGA-August, is now listed in the ‘AdHoc’ state, which means this patch is to be automatically deprecated/removed by a new patch installation. However, at this point it is still installed and effective on the cluster.


3)  After the installation, check for the correct installation of the new patch:


# isi upgrade patch list

Patch Name Description Status

UGA-September                                      Installed

Total: 1

 

If any issues are encountered with the patch installation process, please contact Isilon support immediately. That said, the state can be verified with the “isi upgrade patches list” command.


Additionally, patch installation logs are available under /var/log/isi_pkg.


Pertinent log messages include:

 

  • Check for deprecated patches on the cluster:
    2019-10-16T02:20:05Z <3.6> ezhou2-6t972a3-1 isi_pkg[64413]: pkg_request: begin check_delete_deprecated
  • Unregister the deprecated/old RUP patch in the cluster patch DB; the deprecated RUP patch's status becomes 'AdHoc':
    2019-10-16T02:20:15Z <3.6> ezhou2-6t972a3-1 isi_pkg[64462]: Unregistered 'RUP1-s' from all versions
  • Remove the deprecated RUP patch's files at the 'INSTALL_EXEC' stage:
    2019-10-16T02:20:59Z <3.6> ezhou2-6t972a3-1 isi_pkg[64694]: Remove deprecated patch 'RUP1-s'
  • Remove deprecated RUP patches from the installed DB at the 'INSTALL_COMMIT' stage:
    2019-10-16T02:21:15Z <3.6> ezhou2-6t972a3-1 isi_pkg[64865]: Removing patch from installed DB,patch 'RUP1-s' hash 'a5a33e47d5a423f1b68970d88241af53'
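
As a quick, hedged way to confirm that a deprecated patch was cleaned up, the patch logs can be searched for these messages. For example (the exact log file layout under /var/log/isi_pkg may vary by release):

# grep -ri 'deprecated' /var/log/isi_pkg*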

 

Note that the patch removal or un-installation process has not changed in OneFS 8.2.1.


Additionally, the installation of firmware patches (drive DSP or node NFP) is not covered by this feature.


OneFS Healthchecks

Posted by trimbn Oct 16, 2019

Another area of OneFS that was recently redesigned and streamlined is Healthchecks. Previously, system health checks on Isilon were prone to several challenges. The available resources were a mixture of on and off-cluster tools, often with separate user interfaces. They were also typically reactive in nature and spread across Isilon Advisor, IOCA, self-service tools, etc. To address these concerns, the new OneFS Healthcheck feature creates a single, common framework for system health check tools, considerably simplifying both the user experience and ease of development and deployment. This affords the benefits of proactive risk management and reduced resolution time, resulting in overall improved cluster uptime.


OneFS Healthchecks make no changes to the cluster and are complementary to other monitoring services such as CELOG. On detection of an issue, a healthcheck displays an actionable message detailing the problem and recommended corrective activity. If the action is complicated or involves decisions, a knowledge-base (KB) article will often be referenced. Alternatively, if no user action is possible or the remediation path is unclear, the recommendation will typically be to contact Dell EMC Isilon support.

Healthcheck functions include warning about a non-recommended configuration, automatically detecting known issues with current usage and configuration, and identifying problems and anomalies in the environment where the cluster is deployed (network, AD, etc).

OneFS currently provides sixteen checklist categories containing more than two hundred items, including eighty three IOCA (Isilon On-Cluster Analysis) checks. These are:


  • All: All available checks
  • Avscan: Checklist to determine the overall health of AVScan
  • Cluster_capacity: Checklist to determine the overall capacity health for a pool or cluster
  • Infiniband: Checklist to determine the overall health of the Infiniband backend
  • IOCA: Pre-existing perl script that assesses the overall health of a cluster. Checklist contains all integrated IOCA items.
  • Job_engine: Job Engine-related health checks
  • Log_level: Checklist to determine the overall health of log-level
  • NDMP: Checklist to determine the overall health of NDMP
  • NFS: Checklist to determine the overall health of NFS
  • NTP: Checklist to determine the overall health of time synchronization
  • Post-upgrade: Checklist to determine post-upgrade cluster health
  • Pre-upgrade: Checklist to determine pre-upgrade cluster health
  • SmartConnect: Checklist to determine the overall health of SmartConnect
  • SmartPools: Checklist to determine the overall health of SmartPools
  • SMB: Checklist to determine the overall health of SMB
  • Snapshot: Checklist to determine the overall health of snapshots
  • Synciq: Checklist to determine the overall health of SyncIQ


Under the hood, a OneFS health check is a small script which assesses the vitality of a particular aspect of an Isilon cluster. It’s run on-cluster via the new healthcheck framework (HCF) and returns both a status and value:

 

  • Status: OK, WARNING, CRITICAL, EMERGENCY, or UNSUPPORTED
  • Value: 100 is healthy; 0 is not


The following terminology is defined and helpful in understanding the Healthcheck framework:


  • Item: Script that checks a specific thing
  • Checklist: Group of related Items for easy use
  • Evaluation: One instance of running an Item or Checklist
  • Freshness: Each Item has a ‘freshness’ value which defines whether it’s new or cached from a previous evaluation
  • Parameter: Additional information provided to the Item(s)
  • Result: Output of one Evaluation
  • RUP: Roll-up Patch, the delivery vehicle for new OneFS Healthchecks and patches

 

CLI commands:


The healthchecks themselves automatically run daily. They can also be managed via the OneFS CLI using a dedicated set of ‘isi healthcheck’ commands. For example, the following syntax will display all the checklist categories available:


# isi healthcheck checklists list


To list or view details of the various individual checks available within each category, use the ‘items’ argument and grep to filter by category. For example, the following command will list all the snapshot checks:


# isi healthcheck items list | grep -i snapshot

fsa_abandoned_snapshots        Per cluster   Warns if the FSAnalyze job has failed or has left excess snapshots on the cluster after a failure

ioca_checkSnapshot             Per cluster   Checks if the Snapshot count is approaching cluster limit of 20,000, whether Autodelete is set to yes, and checks snapshot logs. Checks snapshot logs for EIN/EIO/EDEADLK/Failed to create snapshot

old_snapshots                  Per cluster   Checks for the presence of snapshots older than 1 month

snapshot_count                 Per cluster   Verify the snapshot counts on the cluster conform to the limits.

  1. Active snapshot count - Number of active snapshots in the system.
  2. In-delete snapshot count - Number of snapshots pending delete.


The details of an individual check, in this case ‘old_snapshots’, can be displayed using the following syntax:


# isi healthcheck items view old_snapshots

Name: old_snapshots

Summary: Checks for the presence of snapshots older than 1 month

Scope: Per cluster

Freshness: Now

Parameters:

freshness_days(38)  *

Description: * OK: There are no unusually old snapshots stored on the cluster

* WARNING: At least one snapshot stored on the cluster is over one month old.

This does not necessarily constitute a problem and may be intentional, but such

snapshots may consume a considerable amount of storage. Snapshots may be viewed

with 'isi snapshot snapshots list', and marked for automatic removal with 'isi

snapshot snapshots delete <snapshot name>'

 

The full suite of checks for a particular category (or ‘all’) can be run as follows. For example, to kick off the snapshot checks:


# isi healthcheck run snapshot


The ‘evaluations’ argument can be used to display when each set of healthchecks was run. In this case, listing and grep’ing for snapshots will show when the test suite was executed, whether it completed, and whether it passed, etc:


# isi healthcheck evaluations list | grep -i snapshot

snapshot20191014T2046 Completed - Pass - /ifs/.ifsvar/modules/health-check/results/evaluations/snapshot20191014T2046

 

The ‘evaluations view’ argument can be used to display the details of a particular healthcheck run, including whether it completed, whether it passed, specifics of any failures, and the location of the pertinent logfile:

 

# isi healthcheck evaluations view snapshot20191014T2046

ID: snapshot20191014T2046

Checklist: snapshot

Overrides: -

Parameters: {}

Run Status: Completed

Result: Pass

Failure: -

Logs: /ifs/.ifsvar/modules/health-check/results/evaluations/snapshot20191014T2046

 

New health checks are included in Roll-Up Patches, or RUPs (previously known as Service Packs), for common versions of OneFS, specifically 8.0.0.7, 8.1.0.2, 8.1.0.4, 8.1.2, 8.1.3, 8.2.0, 8.2.1. The RUPs for these releases are typically delivered monthly and new checks are added to subsequent RUPs.

 

With the delivery of each new RUP for a particular release, the core OneFS release is also rebuilt to include the latest health checks and patches. This means that the customer download URL for a OneFS release will automatically include the latest pre-installed RUP, thereby removing an additional patching/reboot requirement from the cluster’s maintenance cycle. The checks run across all nodes and are typically run daily. The results are also automatically incorporated into ‘isi_phone_home’ data.


OneFS Instant Secure Erase

Posted by trimbn Oct 7, 2019

There are several notable problems with many common drive retirement practices. Although not all of them are related to information security, many still result in excess cost. For example, companies that decide to re-purpose their hardware may choose to overwrite the data rather than erase it completely. The process itself is both time consuming and a potential data security risk: since re-allocated sectors on the drives are not covered by the overwrite process, some old information will remain on disk.

 

Another option is to degauss and physically shred drives when the storage hardware is retired. Degaussing can yield mixed results since different drives require unique optimal degauss strengths. This also often leads to readable data being left on the drive which can obviously constitute a significant security risk.


Thirdly, there is the option to hire professional disposal services to destroy the drive. However, the more people handling the data, the higher the data vulnerability. Total costs can also increase dramatically because of the need to publish internal reports and any auditing fees.


To address these issues, OneFS 8.2.1 introduces Instant Secure Erase (ISE). ISE enables the cryptographic erasure of non-SED drives in an Isilon cluster, providing customers with the ability to erase the contents of a drive after smartfail.


But first, some useful terminology:


  • Cryptographic Erase: The ‘SANITIZE’ command sets for SCSI and ATA drives, defined by the T10 and T13 technical committees respectively.
  • Instant Secure Erase (ISE): The industry term referring to the drive’s ‘cryptographic erase’ capability.
  • isi_drive_d: The OneFS drive daemon that manages the various drive states/activities, mapping devices to physical drive slots, and supporting firmware updates.

 

So OneFS ISE uses the ‘cryptographic erase’ command to erase proprietary user data on supported drives. ISE is enabled by default and is automatically performed when OneFS smartfails a supported drive.


instant_secure_erase_1.png

 

ISE can also be run manually against a specific drive. To do this, it sends standard commands to the drive, depending on its interface type. For example:


  • SCSI: “SANITIZE (cryptographic)”
  • ATA: “CRYPTO SCRAMBLE EXT”


If the drive firmware supports the appropriate command above, it swaps out the drive’s data encryption key, rendering the data on the storage media unreadable.


instant_secure_erase_2.png

 

In order to use ISE, the following conditions must be met:


  • The cluster is running OneFS 8.2.1 (Acela)
  • The node is not a SED-configuration (for automatic ISE action upon smartfail)
  • User has privileges to run related CLI commands (for manually performed ISE)
    • For example, the privilege to run ‘isi_radish’
  • Cluster contains currently supported drives:
    • SCSI / ATA interface
    • Supports “cryptographic erase” command
  • The target drive is present

 

instant_secure_erase_3.png


ISE can be run by the following methods:


1)  Via the isi_drive_d daemon during a drive Smartfail.

    • If the node is non-SED configuration
    • Configurable through ‘drive config’


2)  Manually, by running the ‘isi_radish’ command.


Additionally, ISE can also be invoked programmatically by executing the python ‘isi.hw.bay’ module.

 

As mentioned previously, ISE is enabled by default, but it can be easily disabled from the OneFS CLI with the following syntax:


# isi devices drive config modify --instant-secure-erase no


The following CLI command can also be used to manually run ISE:


# isi_radish -S <bay/dev>
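For instance, a minimal sketch, assuming a hypothetical device named da3 (substitute the actual bay number or device name reported by ‘isi devices drive list’ for your cluster):


# isi_radish -S da3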


ISE provides fairly comprehensive logging, and the results differ slightly depending on whether it is run manually or automatically during a smartfail. Additionally, the ‘isi devices drive list’ CLI command output will display the drive state. For example:


State        Context
SMARTFAIL    During the ISE action
REPLACE      After ISE finishes

 

 

Note that an ISE failure or error will not block the normal smartfail process.


For a manual ISE run against a specific drive, the results are both displayed on the OneFS CLI console and written to /var/log/messages.


The ISE logfile warning messages include:


Running ISE:

  • “Attempting to erase smartfailed drive in bay N ...”
  • “Drive in bay N is securely erased”
  • (isi_drive_history.log) “is securely erased: bay:N unit:N dev:daN Lnum:N seq:N model:X …”

ISE not supported:

  • “Drive in bay N is not securely erased, because it doesn't support crypto sanitize.”

ISE disabled in drive config:

  • “Smartfailed drive in bay N is not securely erased. instant-secure-erase disabled in drive_d config.”

ISE error:

  • “Drive in bay N is not securely erased, attempt failed.”
  • “Drive in bay N is not securely erased, can't determine if it supports crypto sanitize.”
  • (isi_drive_history.log) “failed to be securely erased: bay:N unit:N dev:daN Lnum:N seq:N model:X …”

 

When troubleshooting ISE, a good first move is using the CLI ‘grep’ utility to search for the keyword ‘erase’ in log files.
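For example, a minimal starting point, assuming the standard /var/log/messages location referenced above (other drive-related logs can be searched the same way):


# grep -i erase /var/log/messages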


ISE was successful but took too long to run:

  • Duration depends on the drive model, but is usually under one minute.
  • ISE may block other processes from accessing the drive while it runs.

ISE reports an error:

  • This is usually due to CAM error(s) while sending the sanitize commands.
  • Check the console, /var/log/messages, and dmesg output for errors during the ISE activity timeframe:
    • Did CAM report an error?
    • Did the device driver or expander report an error?
    • Did the drive/device drop during the sanitize activity?

For the final article in this in-line data reduction series, we’ll turn our attention to deduplication and compression efficiency estimation tools.


Firstly, OneFS includes a dry-run Dedupe Assessment job to help estimate the amount of space savings that will be seen on a dataset. Run against a specific directory or set of directories on a cluster, the dedupe assessment job reports a total potential space savings. The job uses its own separate configuration, does not require a product license, and can be run prior to purchasing F810 hardware to determine whether deduplication is appropriate for a particular data set or environment.

inline-dedupe4_1.png

The dedupe assessment job uses a separate index table from both in-line dedupe and SmartDedupe. For efficiency, the assessment job also samples fewer candidate blocks and does not actually perform deduplication. Using the sampling and consolidation statistics, the job provides a report which estimates the total dedupe space savings in bytes.

inline-dedupe4_2.png

The dedupe assessment job can also be run from the OneFS command line (CLI):


# isi job jobs start DedupeAssessment


Alternatively, in-line deduplication can be enabled in assessment mode:


# isi dedupe inline settings modify --mode assess


Once the job has completed, review the following three metrics from each node:


# sysctl efs.sfm.inline_dedupe.stats.zero_block

# sysctl efs.sfm.inline_dedupe.stats.dedupe_block

# sysctl efs.sfm.inline_dedupe.stats.write_block

 

The formula to calculate the estimated dedupe rate from these statistics is:


dedupe_block / write_block * 100 = dedupe%
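As an illustrative sketch only, this per-node calculation can be scripted directly from the cluster shell with standard awk, assuming the sysctl OIDs above return raw block counts (in practice, guard against a zero write_block count):


# DEDUPE=$(sysctl -n efs.sfm.inline_dedupe.stats.dedupe_block)

# WRITE=$(sysctl -n efs.sfm.inline_dedupe.stats.write_block)

# echo "$DEDUPE $WRITE" | awk '{printf "estimated dedupe rate: %.1f%%\n", ($1 / $2) * 100}'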


Note that the dedupe assessment does not differentiate a fresh run from the case where a previous SmartDedupe job has already performed some sharing on the files in that directory. Isilon recommends running the assessment job once against a specific directory, since it does not provide incremental differences between instances of the job.


Similarly, the Dell Live Optics Dossier utility can be used to estimate the potential benefits of Isilon’s in-line data compression on a data set. Dossier is available for Windows and has no dependency on an Isilon cluster. This makes it useful for analyzing and estimating efficiency across real data in situ, without the need for copying data onto a cluster. The Dossier tool operates in three phases:


  • Discovery: Users manually browse and select root folders on the local host to analyze.
  • Collection: Once the folder paths have been selected, Dossier walks the file system trees of the target folders. This process will likely take up to several hours for large file systems, and has a similar impact to a malware/anti-virus scan in terms of the CPU, memory, and disk resources utilized during the collection. A series of customizable options allows the user to deselect more invasive operations and govern the CPU and memory resources allocated to the Dossier collector.
  • Reporting: Users upload the resulting .dossier file to create a PowerPoint report.


To obtain a Live Optics Dossier report, first download, extract and run the Dossier collector. Local and remote UNC paths can be added for scanning. Ensure you are authenticated to the desired UNC path before adding it to Dossier’s ‘custom paths’ configuration. Be aware that the Dossier compression option only processes the first 64KB of each file to determine its compressibility. Additionally, the default configuration samples only 5% of the dataset, but this is configurable with a slider. Increasing this value improves the accuracy of the estimation report, albeit at the expense of extended job execution time.

 

inline-dedupe4_3.png

 

The compressibility scan executes rapidly, with minimal CPU and memory resource consumption. It also provides thread and memory usage controls, progress reporting, and a scheduling option to allow throttling of scanning during heavy usage windows, etc.


When the scan is complete, a ‘*.dossier’ file is generated. This file is then uploaded to the Live Optics website:

 

inline-dedupe4_4.png

 

Once uploaded and processed, a PowerPoint report is generated in real time and delivered via email.

 

inline-dedupe4_5.png

 

Compression reports are easy to comprehend. If multiple SMB shares or paths are scanned, a summary is generated at the beginning of the report, followed by the details of each individually selected path.


Live Optics Dossier can be found at URL:   https://app.liveoptics.com/tools/dossier


Documentation is at:  https://support.liveoptics.com/hc/en-us/articles/229590207-Dossier-User-Guide


When running the Live Optics Dossier tool, please keep the following considerations in mind. Dossier does not use exactly the same algorithm as the OneFS hardware in-line compression engine; its estimate is based on software compression rather than hardware compression, so actual results will generally be better than the Dossier report suggests.


Note the following additional Dossier caveats:

  • Dossier may overestimate compression for some data, for example files whose first blocks are significantly more compressible than later blocks.
  • It is intended to be run against SMB shares on any storage array or DAS; there is no NFS export support.
  • It can take a significant amount of time to run against a large data set.
  • By default, it only samples a portion (the first 64KB) of each file, so results can be inaccurate.
  • It only reports the uncompressed and compressed data sizes; it does not provide performance estimates for different compression algorithms.
  • It does not attempt to compress files with certain known extensions which are generally incompressible.

As we've seen in the last couple of articles, compression and deduplication can significantly increase the storage efficiency of data. However, the actual space savings often can and will vary dramatically depending on the specific attributes of the data itself.

 

The following table illustrates the relationship between the effective to usable and effective to raw ratios for the three drive configurations in which the F810 chassis is available (3.8 TB, 7.6 TB, and 15.4 TB SSDs):


inline-dedupe3_1.png


Let's take a look at descriptions for the various OneFS reporting metrics, such as those returned by the ‘isi statistics data-reduction’ command described in the previous blog article. The following attempts, where appropriate, to equate the Isilon nomenclature with more general industry terminology:


inline-dedupe3_2.png


The interrelation of the data capacity metrics described above can be illustrated as follows:

 

inline-dedupe3_3.png

 

The preprotected physical (usable) value is derived by subtracting the protection overhead from the protected physical (raw) metric. Similarly, the difference in size between preprotected physical (usable) and logical data (effective) is the efficiency savings. If OneFS SmartDedupe is also licensed and running on the cluster, this data reduction savings value will reflect a combination of compression, in-line deduplication and post-process deduplication savings.


As with most things in life, data efficiency is a compromise. To gain increased levels of storage efficiency, additional cluster resources (CPU, memory and disk I/O) are utilized to compress, deduplicate, and re-inflate files. As such, the following factors can affect the performance of in-line data reduction and the I/O performance of compressed and deduplicated pools:


  • Application and the type of dataset being used
  • Data access pattern (for example, sequential versus random access, the size of the I/O)
  • Compressibility and duplicity of the data
  • Amount of total data
  • Average file size
  • Nature of the data layout
  • Hardware platform: the amount of CPU, RAM, and type of storage in the system
  • Amount of load on the system
  • Level of protection


Clearly, hardware offload compression will perform considerably better, both in terms of speed and efficiency, than the software fallback option, both on F810 nodes where the hardware compression engine has been disabled and on all other node types where software data reduction is the only available option.

Another important performance impact consideration with in-line data efficiency is the potential for data fragmentation. After compression or deduplication, files that previously enjoyed contiguous on-disk layout will often have chunks spread across less optimal file system regions. This can lead to slightly increased latencies when accessing these files directly from disk, rather than from cache.


Because in-line data reduction is a data efficiency feature rather than a performance enhancing tool, in most cases the consideration will be around managing cluster impact, both from the client data access performance perspective and from the data reduction execution perspective, since additional cluster resources are consumed when shrinking and inflating files.


With in-line data reduction enabled on F810 nodes, highly incompressible data sets may experience a small performance penalty. Conversely, for highly compressible and duplicate data there may be a performance boost. Workloads performing small, random operations will likely see a small performance degradation.

Since they reside on the same card, the compression FPGA engine shares PCI-e bandwidth with the node’s backend Ethernet interfaces. In general, there is plenty of bandwidth available. However, a best practice is to run incompressible performance streaming workflows on F810 nodes with in-line data reduction disabled to avoid any potential bandwidth limits. In general, rehydration requires considerably less overhead than compression.


When considering effective usable space on a cluster with in-line data reduction enabled, bear in mind that every capacity saving from file compression and deduplication also serves to reduce the per-TB compute ratio (CPU, memory, etc). For performance workloads, the recommendation is to size for performance (IOPS, throughput, etc) rather than effective capacity.


Similarly, it is challenging to broadly characterize the in-line dedupe performance overhead with any accuracy, since it depends on various factors including the duplicity of the data set and whether matches are found against other LINs or SINs. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they enjoy an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of in-line dedupe. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.

 

In-line data reduction is included as a core component of Isilon OneFS 8.2.1 on the F810 hardware platform and does not require a product license key to activate. In-line compression is enabled by default and in-line deduplication can be activated via the following command: 


# isi dedupe inline settings modify --enabled=True


Note that an active Isilon SmartQuotas license is required to use quota reporting. An unlicensed cluster will show a SmartQuotas warning until a valid product license has been purchased and applied to the cluster. License keys can be easily added via the ‘Activate License’ section of the OneFS WebUI, accessed by navigating via Cluster Management > Licensing.


Below are some examples of typical space reclamation levels that have been achieved with OneFS in-line data efficiency. These data efficiency space savings values are provided solely as rough guidance. Since no two data sets are alike (unless they’re replicated), actual results can and will vary considerably from these examples.

 

Workflow / Data Type              Typical Efficiency Ratio    Typical Space Savings
Home Directories / File Shares    1.3:1                       25%
Engineering Source Code           1.4:1                       30%
EDA Data                          2:1                         50%
Genomics Data                     2.2:1                       55%
Oil and Gas                       1.4:1                       30%
Pre-compressed Data               N/A                         No savings

 

To calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:

 

339.50 / 192.87 = 1.76        Or a Data Reduction ratio of 1.76:1

 

Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:

 

339.50 / 350.13 = 0.97        Or an Efficiency ratio of 0.97:1


In a previous article, we took a look at OneFS’ in-line dedupe functionality, the newest component of the in-line data reduction suite. To complement this, OneFS 8.2.1 provides six principal reporting methods for obtaining efficiency information with in-line data reduction:

 

  • Using the ‘isi statistics data-reduction’ CLI command
  • Via the ‘isi compression’ CLI command
  • Via the ‘isi dedupe’ CLI command and WebUI chart
  • From the ‘isi get -O’ CLI command
  • Configuring SmartQuotas reporting
  • OneFS WebUI Cluster Dashboard

 

Let's look at each of these in a bit more detail:

 

1)  The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction

Recent Writes (5 mins)              Cluster Data Reduction

----------------------------------  -----------------------------------------

Logical data            339.50G     Est. logical data             1.37T

Zero-removal saved      112.00k

Deduplication saved     432.00k     Dedupe saved                  1.41G

Compression saved       146.64G     Est. compression saved        199.82G

Preprotected physical   192.87G     Est. preprotected physical    1.18T

Protection overhead     157.26G     Est. protection overhead      401.22G

Protected physical      350.13G     Protected physical            1.57T

Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1

Compression ratio       1.76:1      Est. compression ratio        1.17:1

Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1

Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:

 

339.50 / 192.87 = 1.76        Or a Data Reduction ratio of 1.76:1

 

Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:

 

339.50 / 350.13 = 0.97        Or an Efficiency ratio of 0.97:1
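These calculations can be quickly reproduced from the cluster shell with standard awk, for example (using the sample values above):


# awk 'BEGIN {printf "data reduction: %.2f:1  efficiency: %.2f:1\n", 339.50/192.87, 339.50/350.13}'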


2)  From the OneFS CLI, the ‘isi compression stats’ command provides the option to either view or list compression statistics. When run in ‘view’ mode, the command returns the compression ratio for both compressed and all writes, plus the percentage of incompressible writes, for a prior five-minute (300 seconds) interval. For example:


# isi compression stats view

stats for 300 seconds at: 2018-12-14 11:30:06 (1544815806)

compression ratio for compressed writes:        1.28:1

compression ratio for all writes:               1.28:1

incompressible data percent:                    76.49%

total logical blocks:                           2681232

total physical blocks:                          2090963

writes for which compression was not attempted: 0.02%


Note that if the ‘incompressible data’ percentage is high in a mixed cluster, there’s a strong likelihood that the majority of writes are going to a non-F810 pool.

The ‘isi compression stats’ CLI command also accepts the ‘list’ argument, which consolidates a series of recent reports into a list of the compression activity across the file system. For example:


# isi compression stats list

Statistic    Compression ratio  Overall ratio  Incompressible %  Logical blocks  Physical blocks  Compression skip %
1544811740   3.07:1             3.07:1         10.59%            68598           22849            1.05%
1544812340   3.20:1             3.20:1         7.73%             4142            1293             0.00%
1544812640   3.14:1             3.14:1         8.24%             352             112              0.00%
1544812940   2.90:1             2.90:1         9.60%             354             122              0.00%
1544813240   1.29:1             1.29:1         75.23%            10839207        8402380          0.00%


The ‘isi compression stats’ data is used for calculating the right-hand side estimated ‘Cluster Data Reduction’ values in the ‘isi statistics data-reduction’ command described above. It also provides a count of logical and physical blocks and compression ratios, plus the percentage metrics for incompressible and skipped blocks.

The value in the ‘statistic’ column at the left of the table represents the epoch timestamp for each sample. This epoch value can be converted to a human readable form using the ‘date’ CLI command. For example:


# date -d <value>
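For instance, to convert the first timestamp from the sample output above, something like the following should work (assuming a BSD-style date utility with the ‘-r’ epoch flag, as found on OneFS nodes):


# date -r 1544811740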

 

3)  From the OneFS CLI, the ‘isi dedupe stats’ command provides cluster deduplication data usage and savings statistics, in both logical and physical terms. For example:

 

# isi dedupe stats

      Cluster Physical Size: 86.14T

          Cluster Used Size: 4.44T

  Logical Size Deduplicated: 218.81G

             Logical Saving: 182.56G

Estimated Size Deduplicated: 271.92G

  Estimated Physical Saving: 226.88G

 

In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency. Similarly, the OneFS WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.

 

inline-dedupe2_1.png

 

Be aware that the deduplication statistics do not include zero block removal savings. Since zero block removal is technically not due to data deduplication, it is tracked separately, but it is included as part of the overall data reduction ratio.

 

Note that while OneFS 8.2.1 tracks statistics for how often zero blocks are removed, there is no current method to determine how much logical space is being saved by zero block elimination. Zero block report enhancement is planned for a future OneFS release.


4)  In addition to the ‘isi statistics data-reduction’ and ‘isi compression’ commands, OneFS 8.2.1 also sees the addition of a ‘-O’ logical overlay flag to the ‘isi get’ CLI utility for viewing a file’s compression details. For example:


# isi get -DDO file1

* Size:           167772160

* PhysicalBlocks: 10314

* LogicalSize:    167772160

PROTECTION GROUPS

lbn0: 6+2/2

2,11,589365248:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

2,4,691601408:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

Metatree logical blocks:

zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000

 

The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.

When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.
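For example, a quick check might use the standard utilities (file1 here being the same example file as in the ‘isi get’ output above):


# du -h file1

# df -h /ifs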

A file that for whatever reason cannot be compressed will be reported as such:


4,6,900382720:8192[INCOMPRESSIBLE]#1

 

5)  In OneFS 8.2.1, Isilon SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of compression and deduplication. On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.

 

inline-dedupe2_2.png

 

Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:

 

# isi quota quotas list

Type      AppliesTo  Path           Snap  Hard Soft  Adv  Used Efficiency

-----------------------------------------------------------------------------

directory DEFAULT    /ifs           No -     -     - 2.3247T 1.29 : 1

-----------------------------------------------------------------------------

Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory

                        Path: /ifs

                        Type: directory

Snapshots: No

Thresholds Include Overhead: No

                       Usage

                           Files: 4245818

Physical(With Overhead): 1.80T

Logical(W/O Overhead): 2.33T

Efficiency(Logical/Physical): 1.29 : 1


To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.


inline-dedupe2_3.png
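Alternatively, a minimal CLI sketch, assuming the positional path and type arguments shown in the ‘isi quota quotas view’ example above and that an accounting-only (tracking) quota with no thresholds is sufficient:


# isi quota quotas create /ifs directory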

 

The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe savings. This is in contrast to a history of statistics over time, as reported in the ‘isi statistics data-reduction’ CLI command output described above. As such, the efficiency ratio for the entire quota directory reflects what is actually there. Note that the quota directory efficiency ratio and other statistics are not yet available via the platform API as of OneFS 8.2.1.

 

6)  In OneFS 8.2.1, the OneFS WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can also be accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

inline-dedupe2_4.png

 

Be aware that, while all of the above storage efficiency tools are available on any cluster running OneFS 8.2.1, the in-line compression metrics will only be relevant for clusters containing F810 node pools.

 

It is challenging to broadly characterize the in-line dedupe performance overhead with any accuracy, since it depends on various factors including the duplicity of the data set and whether matches are found against other LINs or SINs. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they enjoy an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of in-line dedupe. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.

Got a recent question from the field asking explicitly how FilePolicy and/or regular SmartPools handles open files and thought this might be of broader interest.

 

A customer is ingesting multiple live streams onto the F800 pool of their mixed cluster. At the same time they want to run SP jobs to start moving these onto another tier to do work on them. The files might still be open and writing into the F800 pool when the job runs to start tiering them over to the H5600 pool.


Process:

  • A file is open and being accessed on the F800 pool (pool A)
  • The SmartPools job kicks off
  • Files on F800 pool A (ideally including open files) should be moved to the H5600 pool

 

Will this work?


In short, yes. Customers routinely have ‘down-tiering’ workflows and the proposed process above should work as intended. A SmartPools or FilePolicy job should move the files transparently, even if they’re open and being modified. However, be aware that restriping large directories across tiers can cause brief latency.


Under the hood, the locks OneFS uses to provide consistency inside the file system (internal) are separate from the file locks provided for consistency between applications (external). OneFS can move metadata and blocks of data around while the file is locked by an application. The restriper also does work in small chunks to minimize disruption. Note that directories are the one place where the restriper has a higher impact, since directories require more consistency locking, and this will be addressed in a future release.

trimbn

OneFS In-line Deduplication

Posted by trimbn Sep 9, 2019

The freshly minted OneFS 8.2.1 release introduces in-line deduplication to Isilon’s portfolio as part of the in-line data reduction suite, and is available on a cluster with the following characteristics:

 

  • F810 cluster or node pool
  • 40 Gb/s Ethernet backend
  • Running OneFS 8.2.1


When in-line data reduction is enabled on a cluster, data from network clients is accepted as is and makes its way through the OneFS write path until it reaches the BSW engine, where it is broken up into individual chunks. The in-line data reduction write path comprises three main phases:

 

  • Zero Block Removal
  • In-line Deduplication
  • In-line Compression

 

If both in-line compression and deduplication are enabled on a cluster, zero block removal is performed first, followed by dedupe, and then compression. This order allows each phase to reduce the scope of work for each subsequent phase.


inline-dedupe_1.png

 

The in-line data reduction zero block removal phase detects blocks that contain only zeros and prevents them from being written to disk. This both reduces disk space requirements and avoids unnecessary writes to SSD, resulting in increased drive longevity.

 

Zero block removal occurs first in the OneFS in-line data reduction process. As such, it has the potential to reduce the amount of work that both in-line deduplication and compression need to perform. The check for zero data does incur some overhead. However, for blocks that contain non-zero data the check is terminated on the first non-zero data found, which helps to minimize the impact.

 

Zero block removal occurs when either of the following is written:


  • A full 8KB block of zeroes
  • A partial block of zeroes being written to a sparse or preallocated block


In the second case, the write will convert the block to sparse if it is not already. A partial block of zeroes written to a non-sparse, non-preallocated block will not be zero eliminated.

 

While Isilon has offered a native file system deduplication solution for several years, until OneFS 8.2.1 this was always accomplished by scanning the data after it has been written to disk, or post-process. With in-line data reduction, deduplication is now performed in real time as data is written to the cluster. Storage efficiency is achieved by scanning the data for identical blocks as it is received and then eliminating the duplicates.


inline-dedupe_2.png

 

When a duplicate block is discovered, in-line deduplication moves a single copy of the block to a special set of files known as shadow stores. OneFS shadow stores are file system containers that allow data to be stored in a sharable manner. As such, files on OneFS can contain both physical data and pointers, or references, to shared blocks in shadow stores.

 

Shadow stores were first introduced in OneFS 7.0, initially supporting Isilon OneFS file clones, and there are many overlaps between cloning and deduplicating files. The other main consumer of shadow stores is OneFS Small File Storage Efficiency. This feature maximizes the space utilization of a cluster by decreasing the amount of physical storage required to house the small files that comprise a typical healthcare dataset.


Shadow stores are similar to regular files but are hidden from the file system namespace, so cannot be accessed via a pathname. A shadow store typically grows to a maximum size of 2GB, which is around 256K blocks, with each block able to be referenced by 32,000 files. If the reference count limit is reached, a new block is allocated, which may or may not be in the same shadow store. Additionally, shadow stores do not reference other shadow stores. And snapshots of shadow stores are not permitted because the data contained in shadow stores cannot be overwritten.


When a client writes a file to an F810 node pool on a cluster, the write operation is divided up into whole 8KB blocks. Each of these blocks is then hashed, and its ‘fingerprint’ is compared against an in-memory index for a match. At this point, one of the following operations will occur:

 

1)  If a match is discovered with an existing shadow store block, a byte-by-byte comparison is performed. If the comparison is successful, the data is removed from the current write operation and replaced with a shadow reference.

 

2)  When a match is found with another LIN, the data is written to a shadow store instead and replaced with a shadow reference. Next, a work request is generated and queued that includes the location for the new shadow store block, the matching LIN and block, and the data hash. A byte-by-byte data comparison is performed to verify the match and the request is then processed.

 

3)  If no match is found, the data is written to the file natively and the hash for the block is added to the in-memory index.

 

In order for in-line deduplication to be performed on a write operation, the following conditions need to be true:

 

  • In-line dedupe must be globally enabled on the cluster.
  • The current operation is writing data (ie. not a truncate or write zero operation).
  • The ‘no_dedupe’ flag is not set on the file.
  • The file is not a special file type, such as an alternate data stream (ADS) or an EC (endurant cache) file.
  • Write data includes fully overwritten and aligned blocks.
  • The write is not part of a rehydrate operation.
  • The file has not been packed (containerized) by SFSE (small file storage efficiency).

 

OneFS in-line deduplication uses the 128-bit CityHash algorithm, chosen primarily for its speed (candidate matches are always verified with a byte-by-byte comparison before any blocks are shared). This is in contrast to post-process SmartDedupe, which uses SHA-1 hashing.

 

Each F810 node in a cluster with in-line dedupe enabled has its own in-memory hash index that it compares block ‘fingerprints’ against. The index lives in system RAM and is allocated using physically contiguous pages and accessed directly with physical addresses. This avoids the need to traverse virtual memory mappings and does not incur the cost of translation lookaside buffer (TLB) misses, minimizing deduplication performance impact.

 

The maximum size of the hash index is governed by a pair of sysctl settings, one of which caps the size at 16GB, and the other which limits the maximum size to 10% of total RAM.  The strictest of these two constraints applies.  While these settings are configurable, the recommended best practice is to use the default configuration. Any changes to these settings should only be performed under the supervision of Isilon support.
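For example, on a hypothetical node with 96GB of RAM, the 10% limit works out to roughly 9.6GB, which is stricter than the 16GB cap, so the index would be limited to approximately 9.6GB; on a node with 256GB of RAM, the 16GB cap would apply instead.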

 

Since in-line dedupe and SmartDedupe use different hashing algorithms, the indexes for each are not shared directly. However, the work performed by each dedupe solution can be leveraged by the other. For instance, if SmartDedupe writes data to a shadow store, when those blocks are read, the read hashing component of in-line dedupe will see those blocks and index them.

 

When a match is found, in-line dedupe performs a byte-by-byte comparison of each block to be shared to avoid the potential for a hash collision. Data is prefetched prior to the byte-by-byte check and then compared against the L1 cache buffer directly, avoiding unnecessary data copies and adding minimal overhead. Once the matching blocks have been compared and verified as identical, they are then shared by writing the matching data to a common shadow store and creating references from the original files to this shadow store.


inline-dedupe_3.png

 

In-line dedupe samples every whole block written and handles each block independently, so it can aggressively locate block duplicity.  If a contiguous run of matching blocks is detected, in-line dedupe will merge the results into regions and process them efficiently.

 

In-line dedupe also detects dedupe opportunities from the read path, and blocks are hashed as they are read into L1 cache and inserted into the index. If an existing entry exists for that hash, in-line dedupe knows there is a block sharing opportunity between the block it just read and the one previously indexed. It combines that information and queues a request to an asynchronous dedupe worker thread.  As such, it is possible to deduplicate a data set purely by reading it all. To help mitigate the performance impact, the hashing is performed out-of-band in the prefetch path, rather than in the latency-sensitive read path.

 

Since in-line deduplication is enabled or disabled globally across a cluster, it can be easily controlled via the OneFS command line interface (CLI). For example, the following syntax will enable in-line deduplication and verify the configuration:


# isi dedupe inline settings view

    Mode: disabled

    Wait: -

   Local: -

# isi dedupe inline settings modify --mode enabled

# isi dedupe inline settings view

    Mode: enabled

    Wait: -

   Local: -

 

Note that in-line deduplication is disabled by default on a new F810 cluster running OneFS 8.2.1.

While there are no visible userspace changes when files are deduplicated, if deduplication has occurred, both the ‘disk usage’ and the ‘physical blocks’ metric reported by the ‘isi get -DD’ CLI command will be reduced. Additionally, at the bottom of the command’s output, the logical block statistics will report the number of shadow blocks. For example:


Metatree logical blocks:

zero=260814 shadow=362 ditto=0 prealloc=0 block=2 compressed=0

 

OneFS in-line data deduplication can be disabled from the CLI with the following syntax:


# isi dedupe inline settings modify --mode disabled

# isi dedupe inline settings view

    Mode: disabled

    Wait: -

   Local: -

 

OneFS in-line data deduplication can be paused from the CLI with the following syntax:


# isi dedupe inline settings modify --mode paused

 

OneFS in-line data deduplication can be run in assess mode from the CLI with the following syntax:


# isi dedupe inline settings modify --mode assess

 

Problems with in-line dedupe may generate the following OneFS events and alerts:


Event Category    Alert Condition                                  Event ID
Health            Inline dedupe index allocation failed            400180001
Health            Inline dedupe index allocation in progress       400180002
Availability      Inline dedupe not supported                      400180003
Health            Inline dedupe index is smaller than requested    400180004
Health            Inline dedupe index has non standard layout      400180005

 

 

In the event that in-line deduplication encounters an unrecoverable error, it will restart the write operation with in-line dedupe disabled. If any of the above alert conditions occur, please contact Isilon Technical Support for further evaluation.

trimbn

OneFS NDMP Enhancements

Posted by trimbn Sep 5, 2019

The demand for storage is continuing to grow exponentially and all predictions suggest it will continue to expand at a very aggressive rate for the foreseeable future. To effectively protect a file system in the multi-petabyte size range requires an extensive use of multiple data availability and data protection technologies.


In tandem with this trend, the demand for ways to protect and manage that storage also increases. Today, several strategies for data protection are available and in use. If data protection is perceived as a continuum, at the beginning lies high availability. Without high availability technologies such as drive, network and power redundancy, data loss and its subsequent recovery would be considerably more prevalent.


Technologies like replication, synchronization and snapshots, in addition to traditional NDMP-based backup, are mainstream and established within the data protection realm. Snapshots offer rapid, user-driven restores without the need for administrative assistance, while synchronization and replication provide valuable tools for business continuance and offsite disaster recovery.


Some of these methods are biased towards cost efficiency but have a higher risk associated with them, and others represent a higher cost but also offer an increased level of protection. Two ways to measure cost versus risk from a data protection point of view are:


  • Recovery Time Objective (RTO): RTO is the allotted amount of time within a Service Level Agreement (SLA) to recover data.


For example, an RTO of four hours means data must be restored and made available within four hours of an outage.


  • Recovery Point Objective (RPO): RPO is the acceptable amount of data loss that can be tolerated per an SLA.


With an RPO of 30-minutes, this is the maximum amount of time that can elapse since the last backup or snapshot was taken.


The following chart illustrates how the core components of the Isilon data protection portfolio align with the notion of an availability and protection continuum and associated recovery objectives.


ndmp_1.png


OneFS’ NDMP solution, at the high end of the recovery objective continuum, receives a number of feature and functionality enhancements in OneFS 8.2. These include:


New Feature                          Benefit
NDMP Redirector and Throttler        CPU usage management for NDMP backup and restore operations
ComboCopy for CloudPools             More options for backing up CloudPools files
Fibre Channel/Ethernet controller    2-way NDMP solution for Gen 6 Isilon nodes

 

NDMP is an open-standard protocol that provides interoperability with leading data-backup products and Isilon supports both NDMP versions 3 and 4. OneFS also provides support for both direct NDMP (referred to as 2-way NDMP), and remote NDMP (referred to as 3-way NDMP) topologies.


In the remote, 3-way NDMP scenario, there are no fibre channel connectors present in the Isilon cluster. Instead, the DMA uses NDMP over the LAN to instruct the cluster to start backing up data to the tape server - either connected via Ethernet or directly attached to the DMA host. In this model, the DMA also acts as the Backup/Media Server.


ndmp_2.png


During the backup, file history is transferred from the cluster via NDMP over the LAN to the backup server, where it is maintained in a catalog. In some cases, the backup application and the tape server software both reside on the same physical machine.


Direct, 2-way NDMP is typically the more efficient of the two models and results in the fastest transfer rates. Here, the data management application (DMA) uses NDMP over the Ethernet front-end network to communicate with the Isilon cluster.


ndmp_3.png


On instruction, the cluster, which is also the NDMP tape server, begins backing up data to one or more tape devices which are attached to it via Fibre Channel.


The DMA, a separate server, controls the tape library’s media management. File History, the information about files and directories, is transferred from the cluster via NDMP to the DMA, where it is maintained in a catalog.

Prior to 8.2,  2-way NDMP typically involved running the NDMP sessions primarily on dedicated Backup Accelerator (BA) nodes within a cluster. However, the BA nodes required that the cluster use an Infiniband, rather than Ethernet, backend.


OneFS 8.2 now enables a Fibre Channel HBA (host bus adapter) to be installed in Isilon Gen 6 storage nodes with an Ethernet backend to support 2-way NDMP to tape devices and VTLs. The FC card itself is a hybrid 4-port (2 x 10GbE and 2 x 8Gb FC) HBA. These HBAs are installed in a paired configuration, meaning a Gen 6 chassis will contain either two or four cards. When installed, a hybrid card replaces a node’s front-end Ethernet NIC and is supported on new Gen 6 hardware, plus legacy Gen 6 nodes via an upgrade process.


There are several tools in OneFS to facilitate information gathering and troubleshooting of the hybrid HBAs:


CLI Utility          Description
camcontrol           Provides general system status.
mt                   Standard UNIX command for controlling tape devices.
chio                 Standard UNIX command for controlling media changers.
sysctl dev.ocs_fc    Used to gather configuration information.

 

For example, the following syntax can be used to quickly verify whether tape devices are present on a cluster:


# camcontrol devlist

<STK L180 0306>                    at scbus1 target 5 lun 0 (pass23,ch0)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 1 (sa0,pass24)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 2 (sa1,pass25)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 3 (sa2,pass26)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 4 (sa3,pass27)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 5 (sa4,pass28)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 6 (sa5,pass29)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 7 (sa6,pass30)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 8 (sa7,pass31)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun 9 (sa8,pass32)

<IBM ULTRIUM-TD5 8711>             at scbus1 target 5 lun a (sa9,pass33)

<STK L180 0306>                    at scbus1 target 6 lun 4 (pass34,ch1)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun 5 (sa10,pass35)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun 6 (sa11,pass36)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun 7 (sa12,pass37)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun 8 (sa13,pass38)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun 9 (sa14,pass39)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun a (sa15,pass40)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun b (sa16,pass41)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun c (sa17,pass42)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun d (sa18,pass43)

<IBM ULTRIUM-TD3 8711>             at scbus1 target 6 lun e (sa19,pass44)

<HP Ultrium 5-SCSI I6RZ>           at scbus1 target 12 lun 0 (sa20,pass45)

<HP Ultrium 5-SCSI I6RZ>           at scbus1 target 13 lun 0 (sa21,pass46)

<HP Ultrium 5-SCSI I6RZ>           at scbus1 target 14 lun 0 (sa22,pass47)

<HP Ultrium 5-SCSI I6RZ>           at scbus1 target 15 lun 0 (sa23,pass48)

<QUANTUM Scalar i500 681G>         at scbus1 target 15 lun 1 (pass49,ch2)

 

In this case, the presence of ‘sa’ devices in the output confirms the presence of tapes.
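A quick way to count those tape devices is to filter the same output with standard grep, for example:


# camcontrol devlist | grep -c 'sa[0-9]'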


Also introduced in OneFS 8.2 is an NDMP Redirector for 2-way Backup, which automatically redistributes 2-Way NDMP backup or restore operations.


Each NDMP session runs with multiple threads of execution involving data movement. When multiple such NDMP sessions run on a single node, resource constraints come into play, which can slow down the NDMP processes or other processes running on the same node. The current NDMP architecture does not provide a method to fan out sessions to multiple nodes because of DMA and tape infrastructure constraints. In addition, with the introduction of the Gen 6 (Infinity) platform, dedicated Backup Accelerator nodes are no longer part of the architecture, which results in NDMP operations running on storage nodes with limited memory.


Having multiple NDMP operations running on a single node can therefore cause performance issues. The NDMP Redirector allows an NDMP operation to be redirected to different nodes in the cluster, which reduces memory and CPU contention on the node where the session was initiated and results in better overall load balancing and performance.


By having NDMP workloads redirected to different nodes, OneFS overcomes the potential for high memory utilization of NDMP workflows on a single node such as the Backup Accelerator. Also, the DMA can view the cluster as a whole, plus communicate with individual nodes to initiate a backup job with the desired load balancing criteria.


When a DMA initiates a backup session, it communicates with the data server and tape server via a series of messages sent through the cluster. In order for local backup sessions to have their data server or tape server migrate to other nodes, the following four capabilities are required:


  • Resource discovery: Nodes in the cluster with available tape devices need to be discovered.
  • Resource allocation: Resources are assigned to operations dynamically, based on resource location and system load.
  • NDMP session load: Statistics on active NDMP sessions and the load on each node are required as load balancing criteria.
  • Agent: Runs on a node to redirect the data server or tape server to the appropriate NDMP nodes.

 

The above operations involve intercepting and modifying the internal responses to a few protocol messages, and are built on top of the existing implementation.


Note that the NDMP redirector is only configurable through the CLI, via the following syntax:


# isi ndmp settings global modify --enable-redirector true


Also included in 8.2 is an NDMP Throttler, which manages the CPU usage of NDMP backup and restore operations. This feature is designed specifically for Gen 6 hardware, which allows local NDMP operations on storage nodes. It operates by limiting NDMP’s CPU usage such that it does not overwhelm the nodes and impact client I/O and other cluster services. Like the redirector, the NDMP throttler is also only configurable through the CLI. For example:


# isi ndmp settings global modify --enable-throttler true

# isi ndmp settings global view

                        Service: False

                           Port: 10000

                            DMA: generic

           Bre Max Num Contexts: 64

 MSB Context Retention Duration: 300

 MSR Context Retention Duration: 600

         Stub File Open Timeout: 10

              Enable Redirector: False

               Enable Throttler: False

        Throttler CPU Threshold: 50

In addition, CPU usage is controlled by a CPU threshold value.  The default value is ‘50’, which means NDMP sessions should use less than 50% of CPU resources on each node. This threshold value is configurable, for example:


# isi ndmp settings global modify --throttler-cpu-threshold 80


The throttler influences both 2-Way and 3-Way data server operations and its settings are global to all the nodes in the cluster.


Finally, OneFS 8.2 also introduces NDMP Combo Copy for CloudPools. Stub files can contain sparse data, and CloudPools maintains a map cataloging each stub file’s non-sparse regions. Recalling a stub file from CloudPools can recover sparseness of the file. However, during a deep copy backup, NDMP does not recognize the sparse map and treats all CloudPools stub files as fully populated files. This means all stub files are expanded to their full size with sparse regions filled with zeros. This prolongs the backup time and enlarges the backup stream. In addition, the sparseness of the original files cannot be restored during a recovery. To address this, NDMP Combo Copy maintains the sparseness of CloudPools sparse files during a deep copy backup.

Here are the three CloudPools copy options available in OneFS 8.2:


Copy Type      Operation    Value    Description
Deep Copy      Backup       0x100    Back up files as regular (unarchived) files; files can only be restored as regular files.
Shallow Copy   Backup       0x200    Back up files as SmartLink files without file data; files can only be restored as SmartLink files.
Combo Copy     Backup       0x400    Back up files as SmartLink files with file data; files can be restored as regular files or SmartLink files.
Deep Copy      Restore      0x100    Restore files as regular files.
Shallow Copy   Restore      0x200    Restore files as SmartLink files.

 

Note that both DeepCopy and ComboCopy backups recall file data from the cloud, but the data is used only for backup purposes and is not stored on disk. However, be aware that recalling file data may incur charges from cloud vendors.


Under the hood, CloudPools divides up file data into 20MB chunks, with each described by a CDO (cloud data object) that maps the chunk into cloud objects. Each CDO has a bit map to represent non-sparse regions of the data chunk. NDMP can read the CDOs of a stub file in order to construct a sparse map for the file, and then just back up non-sparse regions. On tape, the file looks like a regular sparse file, and therefore can be restored appropriately during a recovery.

 

Configuration is via the CLI, using the following syntax:

 

# isi ndmp settings variables create <backup path> <variable name> <value>


The <backup path> argument specifies a specific backup root, which is the same as the value of FILESYSTEM environment variable during backup.


The /BACKUP and /RESTORE arguments are global to all backups and restores respectively.


For example, the following CLI syntax configures ndmp for combo copy:


# isi ndmp settings variables create /BACKUP BACKUP_OPTIONS 0x400


And for a deep restore:


# isi ndmp settings variables create /RESTORE RESTORE_OPTIONS 0x100


Also included in OneFS 8.2 is an NDMP version check, which prevents the recovery of incompatible SmartLink files. NDMP backup automatically includes information about the features activated by a SmartLinked file. Similarly, NDMP restore validates the features required by a SmartLinked file and skips the file if the target cluster does not support those features.


For example, Cloudpools provides both AWS S3 v2 and v4 authentication support. If v4 Auth is required, then a Smartlinked file cannot be recovered to a cluster which only supports v2 Auth. Similarly, if SmartLinked files are backed up as stubs on a cluster running OneFS 8.2, which debuts CloudPools v2.0, they cannot be restored to a cluster running an earlier version of OneFS and CloudPools v1.0.


Note that there is no configuration option for disabling version checking.
