trimbn

OneFS Antivirus & ICAP - Part 2

Posted by trimbn Jan 20, 2020

In this second article in the AntiVirus series, we'll take a look at policies, exclusions, global configuration, and some monitoring and sizing ideas.


The OneFS WebUI and CLI can be used to configure antivirus policies, adjust settings, and manage antivirus scans and reports.


icap_3.png

 

Antivirus scanning can be enabled or disabled via the check-box at the top of the page. Similarly, the AV settings can be viewed and changed via the CLI.


# isi antivirus settings view

           Fail Open: Yes

        Glob Filters: -

Glob Filters Enabled: No

Glob Filters Include: No

       Path Prefixes: /ifs/data

              Repair: Yes

       Report Expiry: 1Y

       Scan On Close: Yes

        Scan On Open: Yes

Scan Cloudpool Files: No

   Scan Size Maximum: 1.00G

             Service: Yes

          Quarantine: Yes

            Truncate: No

 

For example, the following syntax will change the maximum file scanning size from 1GB to 100 MB:


# isi antivirus settings modify --scan-size-maximum 100M


To exclude specific files from antivirus scanning, navigate in the WebUI to Data Protection > Antivirus > Settings and configure filters based on file size, name, etc.


icap_4.png


To exclude files based on file name, select Enable filters and configure either inclusions or exclusions. Specify one or more filters, which can include the following wildcard characters:


  • * : Matches any string in place of the asterisk.
  • [ ] : Matches any characters contained in the brackets, or a range of characters separated by a dash.
  • ? : Matches any character in place of the question mark.

 

Be aware that these filters apply globally to all antivirus scans.
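
The same filters can also be set from the CLI. As a hedged sketch (the flag names are assumed to mirror the ‘Glob Filters’ fields in the ‘isi antivirus settings view’ output earlier), the following would exclude temporary files from all scans:


# isi antivirus settings modify --glob-filters-enabled yes --glob-filters "*.tmp"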


OneFS can be configured to automatically scan files as they are accessed by users. From the WebUI, navigate to Data Protection > Antivirus > Settings and, in the On-Access Scans area, specify whether you want files to be scanned as they are accessed.


  • Open: To require that all files be scanned before they are opened by a user, select Enable scan of files on open, and then specify whether you want to allow access to files that cannot be scanned by selecting or clearing Enable file access when scanning fails.
  • Close: To scan files after they are closed, select Enable scan of files on close.


Note that on-access scans operate independently of antivirus policies.


For example, the following syntax will disable scanning on file open:


# isi antivirus settings modify --scan-on-open no


The amount of time OneFS retains antivirus reports before automatically deleting them can be configured via the WebUI by navigating to Data Protection > Antivirus > Settings > Reports and specifying a retention period.
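
The retention period can also be set from the CLI. As a hedged example (assuming the flag name follows the ‘Report Expiry’ field in the settings output above), the following would reduce report retention from one year to three months:


# isi antivirus settings modify --report-expiry 3M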

To add an ICAP server from the WebUI, navigate to Data Protection > Antivirus > ICAP Servers, select Add an ICAP Server, enter its IP address or URL, and click Enable.


Or, from the CLI:


# isi antivirus servers create <url> --enabled yes


An antivirus policy that causes specific files to be scanned for viruses each time the policy is run can be crafted from the WebUI by navigating to Data Protection > Antivirus > Policies and creating an Antivirus Policy. Name the policy, specify the directory or directories that you want to scan in the Paths field, set the preferred recursion depth (full, or a number of subdirectories), and configure a schedule if desired. Note that scheduled policies can also be run manually at any time. A rough CLI equivalent is sketched after the table below.


  • Run the policy only manually: Click ‘Manual’.
  • Run the policy according to a schedule: Click ‘Scheduled’, then specify how often you want the policy to run.
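
As referenced above, a rough CLI equivalent can be sketched as follows. The policy name, path, and schedule string are purely illustrative, and the exact flag names are assumptions rather than taken from this article:


# isi antivirus policies create WkendScan --paths /ifs/data/media --schedule "Every Saturday at 12:00 AM"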

 

Individual files can also be manually scanned for viruses. For example, the following CLI syntax will initiate a scan of the /ifs/data/infected file:

 

# isi antivirus scan /ifs/data/infected

Result: Succeeded

Report ID: R:5e1d083c:6f86


To quarantine a file and prevent it from being accessed by users, browse in the WebUI to Data Protection > Antivirus > Detected Threats and select More > Quarantine File in the appropriate row of the Antivirus Threat Reports table. Or, from the CLI:


# isi antivirus quarantine /ifs/data/infected


The quarantine status of a file can be inspected as follows:

# isi antivirus status /ifs/data/infected

File: /ifs/data/infected

  Last Scan: Never

Quarantined: Yes

 

It can also easily be un-quarantined, or released:

# isi antivirus release /ifs/data/infected

# isi antivirus status /ifs/data/infected

File: /ifs/data/infected

  Last Scan: Never

Quarantined: No

 

If a threat is detected in a file, and the file is irreparable and no longer needed, you can manually remove the file. For example, the following command deletes the /ifs/data/infected file:


# rm /ifs/data/infected


When sizing the ICAP servers for a cluster, the number of ICAP servers deployed per Isilon node (ICAP/node) is often used as the primary metric. With multiple ICAP servers per node, OneFS distributes files to the ICAP servers in an equally weighted, round-robin manner and does not consider the processing power of the ICAP servers when allocating files. Because of this, try to keep the configuration and resource allocation (CPU, memory, network) of each ICAP server relatively equal to avoid scanning bottlenecks.

 

  • If ICAP servers are virtual machines, their resources should be dedicated (CPU, memory, network) and the OS optimized to minimize swapping and latency.
  • Network latency is a significant factor to keep in mind when planning a cluster ICAP solution. Where possible, ensure that network routing is symmetric (e.g. switch stacks, hops, delay, static routes, source-based routing, etc.) and keep latency to a minimum.
  • The majority of infected files tend to be <10MB in size, so reducing the maximum file size to be scanned is also advisable. Select and set up a routine for updating the file types and sizes that are scanned or skipped.
  • Scan only the data that is necessary, and ensure the cluster is running a current OneFS version.
  • For clusters with heavy workloads or high rate of change, consider scheduling scans during low periods instead of on access/close.
  • Round-robin scanning task allocation is per node, rather than across cluster. This can potentially lead to variable congestion on individual ICAP servers, depending on how clients connect and load a cluster.
  • If the cluster is running SyncIQ replication and has a heavy workload, it is also good to stagger the activity.
  • Consider creating a separate RBAC account for AntiVirus operations.


The following guidelines are a useful starting point for ICAP server sizing:


ICAP Attribute

Details

ICAP servers

  • Policy scan: Minimum of two ICAP servers for a small cluster, increasing as cluster grows.
  • On-access scan: At least one dedicated ICAP server per node.

ICAP threads

Test different thread numbers to determine the best value for your environment. For example:

  • McAfee: 50 to 100
  • Symantec: ~20

Network bandwidth

Suggested network connectivity bandwidth for ICAP servers, depending on the average file size:

  • <1 MB average file size: 1Gbps for ICAP servers
  • >1 MB average file size: 10Gbps for ICAP servers

CPU Load

In general, the scanning workload for ICAP servers is CPU intensive. If ICAP server CPU utilization >95%, either increase CPU of the ICAP servers or raise the ICAP servers per cluster ratio.

 

The number of ICAP server threads is one of the primary ICAP server-side tunables, and recommendations vary widely across vendors and products. However, the ‘too_busy’ status and ‘failed to scan’ ratio are useful in gauging whether a cluster’s ICAP server(s) are too busy to handle further requests.


Firstly, OneFS reports the status of ICAP servers connected to isi_avscan_d, and this can be dumped to a logfile and viewed using the following command:


# kill -USR2 `ps -auxw | grep -i avscan | grep -v grep | awk '{print $2}'`


All of the isi_avscan_d daemon’s state information is logged to the file /var/log/isi_avscan_d.log. The following CLI command can be used to parse the ICAP server status from this file. For example:


# cat /var/log/isi_avscan_d.log | grep "too_busy"

2020-01-08T23:15:22Z <3.6> tme-sandbox-3 isi_avscan_d[71792]: [0x80070ba00]    too_busy: true


If the ‘too_busy’ field is set to ‘true’, as above, this typically indicates that an ICAP server is overloaded, suggesting that there are insufficient ICAP servers for the workload. In this case, the recommendation is to add more ICAP servers until the too_busy state is reported as ‘false’ for all ICAP servers. Conversely, be aware that having an ICAP server to cluster node ratio that is too high can also lead to performance issues. This becomes more apparent on large clusters with a high rate of change.


Secondly, the ‘failed to scan’ ratio can be calculated from the ‘failed’ and ‘scanned’ stats available via the following sysctl command:


# sysctl efs.bam.av.stats | egrep -i 'failed|scanned'


The formula for determining the ‘failed to scan’ ratio is:


(‘Failed’ number / ‘Scanned’ number) x 100 = Failed to scan %
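
For example, 25 failed scans against 12,500 scanned files would give (25 / 12500) x 100 = 0.2%.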


If this percentage is much above zero, consider adding additional ICAP servers, or increasing bandwidth to existing servers if they’re network-bound.

trimbn

OneFS Antivirus & ICAP

Posted by trimbn Jan 13, 2020

It appears that security is top of mind currently, with several customer discussions of late around OneFS and antivirus practices. So, it seemed like a useful topic to review in a couple of blog articles.

 

To that end, OneFS provides support for ICAP (Internet Content Adaptation Protocol), enabling real-time scanning of a cluster’s dataset for computer viruses, malware, and other threats. To do this, OneFS sends files to an ICAP server running third-party antivirus scanning software, which scrutinizes the files for viruses and other threat signatures. If a threat is detected, OneFS typically alerts cluster admins by firing an event, displaying near real-time summary information, and documenting the threat in an antivirus scan report. Here’s a high-level view of a typical OneFS antivirus architecture.


icap_1.png


OneFS can also be configured to either request that ICAP servers attempt to repair infected files, or to protect users against potentially dangerous files by truncating or quarantining infected files. Before OneFS sends a file to be scanned, it ensures that the scan is not redundant. If a file has already been scanned and has not been modified, OneFS will not send the file to be scanned unless the virus database on the ICAP server has been updated since the last scan. Note that Antivirus scanning is available only if all nodes in the cluster are connected to the external network (NANON configurations are not supported).


OneFS works with antivirus software that conforms to the ICAP standard, and the following list includes the supported and most widely used antivirus vendors:


  • Symantec: Scan Engine 5.2 and later
  • Trend Micro: Interscan Web Security Suite 3.1 and later
  • Kaspersky: Anti-Virus for Proxy Server 5.5 and later
  • McAfee: VirusScan Enterprise 8.7 and later, with VirusScan Enterprise for Storage 1.0 and later

 

OneFS can be configured to send files to be scanned prior to opening, after they are closed, or both. Sending files to be scanned after they are closed is faster but less secure, whereas scanning before they are opened is slower but safer. If antivirus is configured for scanning files after they are closed, when a user creates or modifies a file on the cluster, OneFS queues the file to be scanned. It then sends the file to an ICAP server to be scanned when convenient. In this configuration, users can always access files without any delay. However, it is possible that after a user modifies or creates a file, a second user might access the file before the file is scanned. If a virus was introduced to the file from the first user, the second user would be able to access the infected file. Similarly, if an ICAP server is unable to scan a file, that file will still be accessible to users.


If a cluster is configured to scan files before they are opened, when a user attempts to download a file, OneFS first sends the file to an ICAP server to be checked. The user cannot access that file until the scan is complete. Scanning files before they are opened is more secure, however it does add access latency.


OneFS can also be configured to deny access to files that cannot be scanned by an ICAP server, which can further increase the delay. For example, if no ICAP servers are available, users will not be able to access any files until an ICAP server becomes available again. If OneFS is set to scan files before they are opened, it is recommended that it also be configured to scan files after they are closed. Scanning files on both open and close will not necessarily increase security, but it will usually improve data availability: a file that has already been scanned since it was last modified does not need to be re-scanned on open, provided the virus database on the ICAP server has not been updated since the previous scan.
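
A hedged CLI sketch of that combined configuration, re-using the ‘isi antivirus settings modify’ command (the --scan-on-close and --fail-open flag names are assumed to match the ‘Scan On Close’ and ‘Fail Open’ fields in the ‘isi antivirus settings view’ output):


# isi antivirus settings modify --scan-on-open yes --scan-on-close yes --fail-open yes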


Antivirus scanning policies can be crafted that send files from a specified directory to be scanned. OneFS Antivirus policies target a specific directory tree on the cluster and can either be run manually at any time or scheduled for automatic execution. Exclusion rules can also be configured to prevent a policy from sending certain files within the specified root directory, based on the size, name, or extension of the file.


Antivirus scans are managed by the OneFS Job Engine and function similarly to, and contend with, other system jobs. Note that antivirus policies do not target snapshots; only on-access scans include snapshots.


Antivirus allows specific files to be manually sent to an ICAP server for scanning at any time. For example, if a virus is detected in a file but the ICAP server is unable to repair it, that file can be re-sent to the ICAP server after the virus database has been updated, and the ICAP server may then be able to repair the file. You can also scan individual files to test the connection between the cluster and ICAP servers.


In summary, OneFS offers three flavors of AV scan which include: 


  • On-access: Sends files to ICAP server(s) for scanning prior to opening, after closing, or both. Before opening is slower but safer; after closing is faster but less secure.
  • AntiVirus Policy: Scheduled or manual directory tree-based scans executed by the OneFS Job Engine.
  • Individual File: Specific individual files sent to ICAP server(s) for targeted scanning, initiated via OneFS CLI command.

 

In the event that an ICAP server does detect a threat and/or an infected file, OneFS can be configured to respond in one of the following ways:


Response

Description

Alert

All threats that are detected cause an event to be generated in OneFS at the warning level, regardless of the threat response configuration.

Repair

The ICAP server attempts to repair the infected file before returning the file to OneFS.

Quarantine

OneFS quarantines the infected file. A quarantined file cannot be accessed by any user. However, a quarantined file can be removed from quarantine by the root user if the root user is connected to the cluster through secure shell (SSH). If you back up your cluster through NDMP backup, quarantined files will remain quarantined when the files are restored. If you replicate quarantined files to another Isilon cluster, the quarantined files will continue to be quarantined on the target cluster. Quarantines operate independently of access control lists (ACLs).

Truncate

OneFS truncates the infected file. When a file is truncated, OneFS reduces the size of the file to zero bytes to render the file harmless.

It is recommended that you do not apply this setting. If you truncate files without attempting to repair them, you might delete data unnecessarily.

Repair or quarantine

Attempts to repair infected files. If an ICAP server fails to repair a file, OneFS quarantines the file. If the ICAP server repairs the file successfully, OneFS sends the file to the user. Repair or quarantine can be useful if you want to protect users from accessing infected files while retaining all data on a cluster.

Alert only

Only generates an event for each infected file. It is recommended that you do not apply this setting.

Repair only

Attempts to repair infected files. Afterwards, OneFS sends the files to the user, whether or not the ICAP server repaired the files successfully. It is recommended that you do not apply this setting. If you only attempt to repair files, users will still be able to access infected files that cannot be repaired.

Quarantine

Quarantines all infected files. It is recommended that you do not apply this setting. If you quarantine files without attempting to repair them, you might deny access to infected files that could have been repaired.

 

OneFS automatically generates an antivirus scan report each time that a policy is run. It also generates a global status report every 24 hours which includes all the on-access scans that occurred during the day. AV scan reports typically contain the following information: 


Criteria

Description

Start

The time that the scan started.

End

The time that the scan ended.

Number

The total number of files scanned.

Size

The total size of the files scanned.

Packets

The total network traffic sent.

Throughput

The network throughput that was consumed by virus scanning.

Success

Whether the scan succeeded.

Infection total

The total number of infected files detected.

Name

The names of infected files.

Threat

The threats associated with infected files.

Response

How OneFS responded to detected threats.

 

The available scans can be viewed from the CLI as follows:

# isi antivirus reports scans list

ID              Policy ID       Status    Start                Files  Infections

--------------------------------------------------------------------------------

R:5e1d0e66:7f8b 1b8028028048580 Started   2020-01-14T00:42:14  1      0

R:5e1d0896:706a MANUAL          Succeeded 2020-01-14T00:17:26  0      0

R:5e1d083c:6f86 MANUAL          Succeeded 2020-01-14T00:15:56  0      0

RO5e1d0480      SCAN_ON_OPEN    Started   2020-01-14T00:00:30  0      0

RO5e1bb300      SCAN_ON_OPEN    Finish    2020-01-13T00:00:31  0      0

 

More detail on a particular scan is available via:

 

# isi antivirus reports scans view R:5e1d0e66:7f8b

        ID: R:5e1d0e66:7f8b

Policy ID: 1b8028028048580

    Status: Started

     Start: 2020-01-14T00:42:14

       End: 2020-01-14T00:42:15

  Duration: Now

     Files: 716

Infections: 0

Bytes Sent: 4242360130

      Size: 4241602042

    Job ID: 5363

 

Similarly, threats can be viewed using the following CLI syntax:

 

# isi antivirus reports threats list

Scan ID         File Remediation  Threat  Time

----------------------------------------------------------------------------------------------------

R:5d240ee9:2d62 /ifs/data/suspect.tar.gz Skipped              2019-12-09T03:50:01

----------------------------------------------------------------------------------------------------

Total: 1

 

And, details of a particular threat via:

 

# isi antivirus reports threats view <id>

 

For example:

 

# isi antivirus reports threats view R:5d240ee9:2d62

Threat id 'R:5d240ee9:2d62' is not valid.

 

Or from the WebUI, by navigating to Data Protection > AntiVirus > Detected Threats:

 

icap_2.png

 

That's it for now. In the next article in this AntiVirus series, we'll take a look at policies, exclusions, global configuration, and some monitoring and sizing ideas.

In the previous article, we looked at the scope of the ‘isi get’ CLI command. To complement this, OneFS also provides the ‘isi set’ utility, which allows configuration of OneFS-specific file attributes.

 

This command works similarly to the UNIX ‘chmod’ command, but on OneFS-centric attributes, such as protection, caching, encoding, etc. As with ‘isi get’, files can be specified by path or LIN. Here are some examples of the command in action.

 

For example, the following syntax will recursively configure a protection policy of +2d:1n (specified as +2:1) on /ifs/data/testdir1 and its contents:


# isi set -R -p +2:1 /ifs/data/testdir1


To enable the write caching coalescer on testdir1 and its contents, run:


# isi set -R -c on /ifs/data/testdir1


With the addition of the -n flag, no changes will actually be made. Instead, the list of files and directories that would have write caching enabled is returned:


# isi set -R -n -c on /ifs/data/testdir2


The following command will configure ISO-8859-1 filename encoding on testdir3 and contents:


# isi set -R -e ISO-8859-1 /ifs/data/testdir3


To configure streaming layout on the file ‘test1’, run:


# isi set -l streaming test1


The following syntax will set a metadata-write SSD strategy on testdir1 and its contents:


# isi set -R -s metadata-write /ifs/data/testdir1


To perform a file restripe operation on file2:


# isi set -r file2


To configure write caching on file2 via its LIN address, rather than file name:


# isi set -c on -L `isi get -DD file2 | grep -i LIN: | awk '{print $3}'`

1:0054:00f6

 

If you set streaming access, isi get reports that streaming prefetch is enabled:


# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 concurrency on    file2.tst

# isi set -a streaming file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming on    file2.tst

 

For streaming layout, the ‘@’ suffix notation indicates how many drives the file is written over. Streaming layout optimizes for a larger number of spindles than concurrency or random.

 

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 concurrency on file2.tst

# isi set -l streaming file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming/@18 on    file2.tst

 

You can specify the number of drives to spread a file across with ‘isi set -d’:


# isi set -d 6 file2.tst

# isi get file2.tst

POLICY    LEVEL PERFORMANCE COAL  FILE

default   6+2/2 streaming/@6 on    file2.tst

 

The following table describes in more detail the various flags and options available for the isi set command:

 

Command Option

Description

-f

Suppresses warnings on failures to change a file.

-F

Includes the /ifs/.ifsvar directory content and any of its subdirectories. Without -F, the /ifs/.ifsvar directory content and any of its subdirectories are skipped. This setting allows the specification of potentially dangerous, unsupported protection policies.

-L

Specifies file arguments by LIN instead of path.

-n

Displays the list of files that would be changed without taking any action.

-v

Displays each file as it is reached.

-r

Performs a restripe on specified file.

-R

Sets protection recursively on files.

-p <policy>

Specifies protection policies in the following forms: +M Where M is the number of node failures that can be tolerated without loss of data.

+M must be a number from 1 through 4.

+D:M Where D indicates the number of drive failures and M indicates number of node failures that can be tolerated without loss of data. D must be a number from 1 through 4 and M must be any value that divides into D evenly. For example, +2:2 and +4:2 are valid, but +1:2 and +3:2 are not.

Nx Where N is the number of independent mirrored copies of the data that will be stored. N must be a number, with 1 through 8 being valid choices.

-w <width>

Specifies the number of nodes across which a file is striped. Typically, w = N + M, but width can also mean the total of the number of nodes that are used. You can set a maximum width policy of 32, but the actual protection is still subject to the limitations on N and M.

-c {on | off}

Specifies whether write-caching (coalescing) is enabled.

-g <restripe goal>

Used in conjunction with the -r flag, -g specifies the restripe goal. The following values are valid:

  • repair
  • reprotect
  • rebalance
  • retune

-e <encoding>

Specifies the encoding of the filename.

-d <@r drives>

Specifies the minimum number of drives that the file is spread across.

-a <value>

Specifies the file access pattern optimization setting. Ie. default, streaming, random, custom, disabled.

-l <value>

Specifies the file layout optimization setting. This is equivalent to setting both the -a and -d flags. Values are concurrency, streaming, or random

--diskpool <id | name>

Sets the preferred diskpool for a file.

-A {on | off}

Specifies whether file access and protections settings should be managed manually.

-P {on | off}

Specifies whether the file inherits values from the applicable file pool policy.

-s <value>

Sets the SSD strategy for a file. The following values are valid:

  • avoid: Writes all associated file data and metadata to HDDs only. The data and metadata of the file are stored so that SSD storage is avoided, unless doing so would result in an out-of-space condition.
  • metadata: Writes both file data and metadata to HDDs. One mirror of the metadata for the file is on SSD storage if possible, but the strategy for data is to avoid SSD storage.
  • metadata-write: Writes file data to HDDs and metadata to SSDs, when available. All copies of metadata for the file are on SSD storage if possible, and the strategy for data is to avoid SSD storage.
  • data: Uses SSD node pools for both data and metadata. Both the metadata for the file and user data (one copy if using mirrored protection, all blocks if FEC) are on SSD storage if possible.

<file> {<path> | <lin>}

Specifies a file by path or LIN.

--nodepool <id | name>

Sets the preferred nodepool for a file.

--packing {on | off}

Enables storage efficient packing of a small file into a shadow store container.

--mm-[access | packing | protection] {on | off}

The ‘manually manage’ prefix flag for the access, packing, and protection options described above. This ‘--mm’ flag controls whether the SmartPools job will act on the specified file or not. On means SmartPools will ignore the file, and vice versa.

trimbn

OneFS "Isi Get" CLI Command

Posted by trimbn Jan 2, 2020

One of the lesser publicized but highly versatile tools in OneFS is the ‘isi get’ command line utility. It can often prove invaluable for generating a vast array of useful information about OneFS file system objects. In its most basic form, the command outputs the following information:

 

  • Protection policy
  • Protection level
  • Layout strategy
  • Write caching strategy
  • File name

 

For example:

 

# isi get /ifs/data/file2.txt

POLICY    LEVEL    PERFORMANCE    COAL    FILE

default   4+2/2    concurrency    on      file2.txt

 

Here’s what each of these categories represents:

 

POLICY:  Indicates the requested protection for the object, in this case a text file. This policy field is displayed in one of three colors:

 

  • Green: Fully protected
  • Yellow: Degraded protection under a mirroring policy
  • Red: Under-protection using FEC parity protection

 

LEVEL:  Displays the current actual on-disk protection of the object. This can be either FEC parity protection or mirroring. For example:

 

Protection  Level

Description

+1n

Tolerate failure of 1 drive OR 1 node (Not Recommended)

+2d:1n

Tolerate failure of 2 drives OR 1 node

+2n

Tolerate failure of 2 drives OR 2 nodes

+3d:1n

Tolerate failure of 3 drives OR 1 node

+3d:1n1d

Tolerate failure of 3 drives OR 1 node AND 1 drive

+3n

Tolerate failure of 3 drives or 3 nodes

+4d:1n

Tolerate failure of 4 drives or 1 node

+4d:2n

Tolerate failure of 4 drives or 2 nodes

+4n

Tolerate failure of 4 drives OR 4 nodes

2x to 8x

Mirrored over 2 to 8 nodes, depending on configuration

 

PERFORMANCE:  Indicates the on-disk layout strategy, for example:

 

  • Concurrency: Optimizes for current load on cluster, featuring many simultaneous clients. Recommended for mixed workloads. On-disk layout: stripes data across the minimum number of drives required to achieve the configured data protection level. Caching: moderate prefetching.
  • Streaming: Optimizes for streaming of a single file, for example fast reading by a single client. On-disk layout: stripes data across a larger number of drives. Caching: aggressive prefetching.
  • Random: Optimizes for unpredictable access to a file, performing almost no cache prefetching. On-disk layout: stripes data across the minimum number of drives required to achieve the configured data protection level. Caching: little to no prefetching.

 

COAL:  Indicates whether the Coalescer, OneFS’s NVRAM-based write cache, is enabled. The coalescer provides failure-safe buffering to ensure that writes are efficient and read-modify-write operations are avoided.

 

The isi get command also provides a number of additional options to generate more detailed information output. As such, the basic command syntax for isi get is as follows:

 

isi get {{[-a] [-d] [-g] [-s] [{-D | -DD | -DDC}] [-R] <path>}

| {[-g] [-s] [{-D | -DD | -DDC}] [-R] -L <lin>}}

 

Here’s the description for the various flags and options available for the command:

 

Command Option

Description

-a

Displays the hidden "." and ".." entries of each directory.

-d

Displays the attributes of a directory instead of the contents.

-g

Displays detailed information, including snapshot governance lists.

-s

Displays the protection status using words instead of colors.

-D

Displays more detailed information.

-DD

Includes information about protection groups and security descriptor owners and groups.

-DDC

Includes cyclic redundancy check (CRC) information.

-L <LIN>

Displays information about the specified file or directory. Specify as a file or directory LIN.

-O

Displays any logical overlay information and a compressed block count when viewing a file’s details.

-R

Displays information about the subdirectories and files of the specified directories.

 

The following command shows the detailed properties of a directory, /ifs/data. Note that the output has been truncated slightly to aid readability:


# isi get -D data 

POLICY   W LEVEL PERFORMANCE COAL ENCODING      FILE              IADDRS

default       4x/2 concurrency on  N/A ./ <1,36,268734976:512>, <1,37,67406848:512>, <2,37,269256704:512>, <3,37,336369152:512> ct: 1459203780 rt: 0

*************************************************

* IFS inode: [ 1,36,268734976:512, 1,37,67406848:512, 2,37,269256704:512, 3,37,336369152:512 ]

*************************************************

*  Inode Version:      6

*  Dir Version:        2

*  Inode Revision:     6

*  Inode Mirror Count: 4

*  Recovered Flag:     0

*  Restripe State:     0

*  Link Count:         3

*  Size:               54

*  Mode:               040777

*  Flags:              0xe0

*  Stubbed:            False

*  Physical Blocks:    0

*  LIN:                1:0000:0004 

*  Logical Size:       None

*  Shadow refs:        0

*  Do not dedupe:      0

*  Last Modified:      1461091982.785802190

*  Last Inode Change:  1461091982.785802190

*  Create Time:        1459203780.720209076

*  Rename Time:        0

*  Write Caching:      Enabled

*  Parent Lin          2

*  Parent Hash:        763857

*  Snapshot IDs:       None

*  Last Paint ID:      47

*  Domain IDs:         None

*  LIN needs repair:   False

*  Manually Manage:

*       Access         False

*       Protection     True

*  Protection Policy:  default

*  Target Protection:  4x

*  Disk pools:         policy any pool group ID -> data target z x410_136tb_1.6tb-ssd_256gb:32(32), metadata target x410_136tb_1.6tb-ssd_256gb:32(32)

*  SSD Strategy:       metadata-write

*  SSD Status:         complete

*  Layout drive count: 0

*  Access pattern: 0

*  Data Width Device List:

*  Meta Width Device List:

*

*  File Data (78 bytes):

*    Metatree Depth: 1

*  Dynamic Attributes (40 bytes):

        ATTRIBUTE                OFFSET SIZE

        New file attribute       0 23

        Isilon flags v2          23 3

        Disk pool policy ID      26 5

        Last snapshot paint time 31     9

*************************************************

 

*  NEW FILE ATTRIBUTES

*  Access attributes:  active

*  Write Cache: on

*  Access Pattern:  concurrency

*  At_r: 0

*  Protection attributes:  active

*  Protection Policy:  default

*  Disk pools:         policy any pool group ID

*  SSD Strategy:       metadata-write

*

*************************************************

 

Here is what some of the key fields in this output indicate:

  • The ‘isi get -D’ command displays the detailed file system properties of a directory or file.
  • The directory's data access pattern is set to concurrency.
  • Write caching (coalescer) is turned on.
  • The IADDRS and IFS inode entries show the inode's on-disk locations.
  • The LIN field shows the object's primary LIN.
  • The Disk pools field indicates the disk pools that the data and metadata are targeted to.
  • The SSD strategy is set to metadata-write.
  • Files that are added to the directory are governed by these settings, most of which can be changed by applying a file pool policy to the directory.

 

From the WebUI, a subset of the ‘isi get –D’ output is also available from the OneFS File Explorer. This can be accessed by browsing to File System > File System Explorer and clicking on ‘View Property Details’ for the file system object of interest.


One question that is frequently asked is how to find where a file's inodes live on the cluster. The ‘isi get -D’ command output makes this fairly straightforward to answer. Take the file /ifs/data/file1, for example:


# isi get -D /ifs/data/file1 | grep -i "IFS inode"

* IFS inode: [ 1,9,8388971520:512, 2,9,2934243840:512, 3,8,9568206336:512 ]


This shows the three inode locations for the file in the *,*,*:512 notation. Let’s take the first of these:


1,9,8388971520:512


From this, we can deduce the following:

 

  • The inode is on node 1, drive 9 (logical drive number).
  • The logical inode number is 8388971520.
  • It’s an inode block that’s 512 bytes in size (Note: OneFS data blocks are 8kB in size).


Another example of where isi get can be useful is in mapping between a file system object’s pathname and its LIN (logical inode number). This might be for translating a LIN returned by an audit logfile or job engine report into a valid filename, or finding an open file from vnodes output, etc.


For example, say you wish to know which configuration file is being used by the cluster’s DNS service:


1.  First, inspect the busy_vnodes output and filter for DNS:


# sysctl efs.bam.busy_vnodes | grep -i dns

vnode 0xfffff8031f28baa0 (lin 1:0066:0007) is fd 19 of pid 4812: isi_dnsiq_d

 

This, among other things, provides the LIN for the isi_dnsiq_d process.


2.  The output can be further refined to just the LIN address as such:


# sysctl efs.bam.busy_vnodes | grep -i dns | awk '{print $4}' | sed -E 's/\)//'

1:0066:0007


3.  This LIN address can then be fed into ‘isi get’ using the ‘-L’ flag, and a valid name and path for the file will be output:


# isi get -L `sysctl efs.bam.busy_vnodes | grep -i dns | grep -v "(lin 0)" | awk '{print $4}' | sed -E 's/\)//'`

A valid path for LIN 0x100660007 is /ifs/.ifsvar/modules/flexnet/flx_config.xml


This confirms that the XML configuration file in use by isi_dnsiq_d is flx_config.xml.


OneFS 8.2.1 and later also sees the addition of a ‘-O’ logical overlay flag to ‘isi get’ CLI utility for viewing a file’s compression details. For example:


# isi get -DDO file1

* Size:           167772160

* PhysicalBlocks: 10314

* LogicalSize:    167772160

PROTECTION GROUPS

lbn0: 6+2/2

2,11,589365248:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

2,4,691601408:8192[COMPRESSED]#6

0,0,0:8192[COMPRESSED]#10

Metatree logical blocks:

zero=32 shadow=0 ditto=0 prealloc=0 block=0 compressed=64000

 

The logical overlay information is described under the ‘protection groups’ output. This example shows a compressed file where the sixteen-block chunk is compressed down to six physical blocks (#6) and ten sparse blocks (#10). Under the ‘Metatree logical blocks’ section, a breakdown of the block types and their respective quantities in the file is displayed - including a count of compressed blocks.
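
To put rough numbers on this, with OneFS's 8KB block size a sixteen-block chunk represents 128KB of logical data, so compressing it down to six physical blocks means roughly 48KB is actually written for that chunk (before protection overhead), with the remaining ten blocks of the chunk left sparse.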


When compression has occurred, the ‘df’ CLI command will report a reduction in used disk space and an increase in available space. The ‘du’ CLI command will also report less disk space used.

A file that for whatever reason cannot be compressed will be reported as such:

4,6,900382720:8192[INCOMPRESSIBLE]#1

So, to recap, the ‘isi get’ command provides information about an individual or set of file system objects.

trimbn

OneFS SmartQuotas and Dedupe

Posted by trimbn Dec 17, 2019

Got a question from the field asking whether a deduplicated file gets reported by and counted against SmartQuotas, and if there’s a performance penalty accessing that deduplicated file.


With OneFS, deduplicated files appear no differently than regular files to standard quota policies, regardless of whether the file has been deduplicated by SmartDedupe or OneFS in-line deduplication – or both. This is also true if the file is a clone or has been containerized by OneFS Small File Storage Efficiency (SFSE), both of which also use shadow stores, and also for in-line compression.


However, if the quota accounting is configured for ‘physical size’, which includes data-protection overhead, the additional space used by the shadow store will not be accounted for by the quota.

 

In OneFS 8.2.1, SmartQuotas has been enhanced to report the capacity saving from in-line data reduction as a storage efficiency ratio. SmartQuotas reports efficiency as a ratio across the desired data set as specified in the quota path field. The efficiency ratio is for the full quota directory and its contents, including any overhead, and reflects the net efficiency of deduplication (plus in-line compression, if available and enabled). On a cluster with licensed and configured SmartQuotas, this efficiency ratio can be easily viewed from the WebUI by navigating to ‘File System > SmartQuotas > Quotas and Usage’.


dedupe-quota-1.png


Similarly, the same data can be accessed from the OneFS command line via the ‘isi quota quotas list’ CLI command. For example:


# isi quota quotas list

Type      AppliesTo Path           Snap  Hard Soft  Adv  Used Efficiency

-----------------------------------------------------------------------------

directory DEFAULT    /ifs           No    - -     -    2.3247T 1.29 : 1

-----------------------------------------------------------------------------

Total: 1

 

More detail, including both the physical (raw) and logical (effective) data capacities, is also available via the ‘isi quota quotas view <path> <type>’ CLI command. For example:


# isi quota quotas view /ifs directory

                        Path: /ifs

                        Type: directory

Snapshots: No

Thresholds Include Overhead: No

                       Usage

                           Files: 4245818

Physical(With Overhead): 1.80T

Logical(W/O Overhead): 2.33T

Efficiency(Logical/Physical): 1.29 : 1

Creating and configuring a directory quota is a simple procedure and can be performed from the WebUI, as follows:


Navigate to ‘File System > SmartQuotas > Quotas and Usage’ and select ‘Create a Quota’. In the create pane, set the Quota type to ‘Directory quota’, add the preferred top-level path to report on, select ‘File system logical size’ for Quota Accounting, and set the Quota Limits to ‘Track storage without specifying a storage limit’. Finally, select the ‘Create Quota’ button to confirm the configuration and activate the new directory quota.
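
A similar tracking quota can be sketched from the CLI. The path below is illustrative, and the accounting options (such as whether snapshots or protection overhead are included) are set via additional flags that vary by OneFS release, so treat this as an assumption rather than a verified recipe:


# isi quota quotas create /ifs/data/prod directory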

 

dedupe-quota-2.png

 

To configure SmartQuotas for in-line data efficiency reporting, create a directory quota at the top-level file system directory of interest, for example /ifs. The efficiency ratio is a single, point-in-time efficiency metric that is calculated per quota directory and includes the sum of in-line compression, zero block removal, in-line dedupe and SmartDedupe. This is in contrast to a history of stats over time, as reported in the ‘isi statistics data-reduction’ CLI command output, described below. As such, the efficiency ratio for the entire quota directory will reflect what is actually there.


In addition to SmartQuotas, OneFS provides several other reporting methods for obtaining efficiency information about deduplication, and data reduction in general. The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:


# isi statistics data-reduction

Recent Writes (5 mins)              Cluster Data Reduction

----------------------------------  -----------------------------------------

Logical data            339.50G     Est. logical data             1.37T

Zero-removal saved      112.00k

Deduplication saved     432.00k     Dedupe saved                  1.41G

Compression saved       146.64G     Est. compression saved        199.82G

Preprotected physical   192.87G     Est. preprotected physical    1.18T

Protection overhead     157.26G     Est. protection overhead      401.22G

Protected physical      350.13G     Protected physical            1.57T

Deduplication ratio     1.00:1      Est. dedupe ratio             1.00:1

Compression ratio       1.76:1      Est. compression ratio        1.17:1

Data reduction ratio    1.76:1      Est. data reduction ratio     1.17:1

Efficiency ratio        0.97:1      Est. storage efficiency ratio 0.87:1

 

The ‘recent writes’ data to the left of the output provides precise statistics for the five-minute period prior to running the command. By contrast, the ‘cluster data reduction’ metrics on the right of the output are slightly less real-time but reflect the overall data and efficiencies across the cluster. This is designated by the ‘Est.’ prefix, denoting an ‘estimated’ value.

The ratio data in each column is calculated from the values above it. For instance, to calculate the data reduction ratio, the ‘logical data’ (effective) is divided by the ‘preprotected physical’ (usable) value. From the output above, this would be:


339.50 / 192.87 = 1.76    Or a Data Reduction ratio of 1.76:1


Similarly, the ‘efficiency ratio’ is calculated by dividing the ‘logical data’ (effective) by the ‘protected physical’ (raw) value. From the output above, this yields:


339.50 / 350.13 = 0.97    Or an Efficiency ratio of 0.97:1


In-line dedupe and post-process SmartDedupe both deliver very similar end results, just at different stages of data ingestion. Since both features use the same core components, the results are combined. As such, the isi dedupe stats output reflects the sum of both in-line dedupe and SmartDedupe efficiency.


# isi dedupe stats

      Cluster Physical Size: 86.14T

          Cluster Used Size: 4.44T

  Logical Size Deduplicated: 218.81G

             Logical Saving: 182.56G

Estimated Size Deduplicated: 271.92G

  Estimated Physical Saving: 226.88G

 

Similarly, the WebUI’s deduplication savings histogram combines the efficiency savings from both in-line dedupe and SmartDedupe.


dedupe-quota-3.png

 

OneFS’ WebUI cluster dashboard now displays a storage efficiency tile, which shows physical and logical space utilization histograms and reports the capacity saving from in-line data reduction as a storage efficiency ratio. This dashboard view is displayed by default when opening the OneFS WebUI in a browser and can be easily accessed by navigating to ‘File System > Dashboard > Cluster Overview’.

 

dedupe-quota-4.png

 

SmartDedupe also deduplicates common blocks within the same file, resulting in even better data efficiency.

 

InsightIQ, Isilon’s multi-cluster reporting and trending analytics suite, is also integrated with and able to report in detail on SmartDedupe. This is available from the performance reporting section of IIQ, by selecting “Deduplication” as the “Report Type”. Also, included in the data provided by the File Systems Reporting section, is a report detailing the space savings efficiency delivered by deduplication.


So how does SmartDedupe play with the other storage management and data protection tools in the OneFS portfolio? Let’s take a look:


When deduplicated files are replicated to another Isilon cluster via SyncIQ, or backed up to a tape device, the deduplicated files are inflated (or rehydrated) back to their original size, since they no longer share blocks on the target Isilon cluster. However, once replicated data has landed, SmartDedupe can be run on the target cluster to provide the same space efficiency benefits as on the source.


Shadow stores are not transferred to target clusters or backup devices. Because of this, deduplicated files do not consume less space than non-deduplicated files when they are replicated or backed up. To avoid running out of space on target clusters or tape devices, it is important to verify that the total amount of storage space saved and storage space consumed does not exceed the available space on the target cluster or tape device. To reduce the amount of storage space consumed on a target Isilon cluster, you can configure deduplication for the target directories of your replication policies. Although this will deduplicate data on the target directory, it will not allow SyncIQ to transfer shadow stores. Deduplication is still performed post-replication, via a deduplication job running on the target cluster.


Because files are backed up as if the files were not deduplicated, backup and replication operations are not faster for deduplicated data. You can deduplicate data while the data is being replicated or backed up. It’s also worth noting that OneFS NDMP backup data won’t be deduped unless deduplication is provided by the backup vendor’s DMA software. However, compression is often provided natively by the backup tape or VTL device instead.


SmartDedupe does not deduplicate the data stored in a snapshot. However, snapshots can be created of deduplicated data. If a snapshot is taken of a deduplicated directory, and then the contents of that directory are modified, the shadow stores will be transferred to the snapshot over time. Because of this, more space will be saved on a cluster if deduplication is run prior to enabling snapshots. If deduplication is enabled on a cluster that already has a significant amount of data stored in snapshots, it will take time before the snapshot data is affected by deduplication. Newly created snapshots will contain deduplicated data, but older snapshots will not.


It is also good practice to revert a snapshot before running a deduplication job. Restoring a snapshot will cause many of the files on the cluster to be overwritten. Any deduplicated files are reverted back to normal files if they are overwritten by a snapshot revert. However, once the snapshot revert is complete, deduplication can be run on the directory again and the resulting space savings will persist on the cluster.


Dedupe is also fully compatible with SmartLock, OneFS’ data retention and compliance product. SmartDedupe delivers storage efficiency for immutable archives and write once, read many (or WORM) protected data sets.

However, OneFS will not deduplicate files that span SmartPools pools or tiers, or that have different protection levels set. This is to avoid potential performance or protection asymmetry which could occur if portions of a file live on different classes of storage.



trimbn

OneFS Snapshot Scheduling

Posted by trimbn Dec 11, 2019

One question that frequently crops up from the field is what snapshot schedule to configure on a particular cluster.


SnapshotIQ scheduling allows cluster administrators to automatically generate snapshots according to a pre-defined itinerary. While there definitely isn’t a ‘one size fits all’ recommendation to make, three of the main drivers for this decision are:

 

  • Recovery point objective (RPO)
  • Available cluster capacity
  • Dataset rate of change

 

An organization’s data security, availability, and disaster recovery policy will often answer the first question – how much? Many companies define explicit service level requirements (SLAs) around the availability of their data. RPO is the acceptable amount of data loss that can be tolerated. With an RPO of 30-minutes, for example, a half hour is the maximum amount of time that can elapse since the last backup or snapshot was taken.


While OneFS does not require any cluster capacity to be exclusively reserved for snapshots, obviously snaps do consume space. Furthermore, this space will grow the more HEAD data changes, and as more snapshots are retained.


OneFS snapshot schedules can be configured at daily, weekly, monthly or yearly intervals, with single or multiple job frequency per schedule, and down to a per-minute granularity.

There are two main strategies for snapshot scheduling:

  • Ordered Deletion
  • Unordered Deletion


Ordered deletion is suited to data sets with a low rate of change, such as archive or other cold data; whereas unordered deletion, which retains considerably fewer snapshots, is recommended for more active data, or clusters with limited capacity available.

The following table provides a recommended snapshot schedule for both ordered and unordered deletion configurations:

snap_schedule2.png

 

The following CLI command will create a schedule for hourly snapshots of the /ifs/data/media directory and its contents, plus a one month retention setting:


# isi snapshot schedules create hourly /ifs/data/media HourlyBackup_%m-%d-%Y_%H:%M "Every day every hour" --duration 1M


To configure a similar schedule from the WebUI, navigate to Data Protection > Snapshots > Snapshot Schedules and click on the ‘Create a Schedule’ button.


snapshot_sched_1.png


On the other hand, the following commands create a set of unordered deletion schedules for /ifs/data/prod that generate snapshots at every-other-hour, daily, weekly and monthly cadences:

 

# isi snapshot schedules create every-other-hour /ifs/data/prod EveryOtherHourBackup_%m-%d-%Y_%H:%M "Every day every 2 hours" --duration 1D


# isi snapshot schedules create daily /ifs/data/prod Daily_%m-%d-%Y_%H:%M "Every day at 12:00 AM" --duration 1W


# isi snapshot schedules create weekly /ifs/data/prod Weekly_%m-%d-%Y_%H:%M "Every Saturday at 12:00 AM" --duration 1M


# isi snapshot schedules create monthly /ifs/data/prod Monthly_%m-%d-%Y_%H:%M "The 1 Saturday of every month at 12:00 AM" --duration 3M

 

Existing snapshot schedules can be viewed from the CLI with the following command:

# isi snapshot schedules list

ID Name

---------------------

1 every-hour

2 daily

3 weekly

4 monthly

---------------------

 

More detailed information about a particular snapshot is also available. For example, the following command will display more context about the ‘every-hour’ schedule above:

# isi snapshot schedules view every-hour

ID: 1

Name: every-other-hour

Path: /ifs/data/media

Pattern: EveryOtherHourBackup_%m-%d-%Y_%H:%M

Schedule: Every day every 2 hours

Duration: 1D

Alias: -

Next Run: 2019-12-10T17:00:00

Next Snapshot: EveryHourBackup_12-10-2019_17:00

 

Another important consideration when configuring snapshot schedules at any level of scale is the snapshot naming convention. If you schedule snapshots to be automatically generated, either according to a snapshot schedule or a replication policy, you specify a snapshot naming pattern that determines how the snapshots are named. Snapshot naming patterns contain variables that include information about how and when the snapshot was created. A worked example follows the variable table below.

The following variables can be included in a snapshot naming pattern:

 

Variable

Description

%A

Day of the week

%a

Abbreviated day of the week. Ie. if the snapshot is generated on a Sunday, %a will have value ‘Sun’

%B

Name of the month

%b

Abbreviated name of the month. Ie. if the snapshot is generated in September, %b will have value ‘Sep’

%C

First two digits of the year

%c

The time and day. This variable is equivalent to specifying %a %b %e %T %Y

%d

Two-digit day of the month

%e

Day of the month. A single digit day is preceded by a blank space

%F

The date. This variable is equivalent to %Y-%m-%d

%G

The year. This variable is equivalent to specifying %Y. However, if the snapshot is created in a week that has less than four days in the current year, the year that contains the majority of the days of the week is displayed. The first day of the week is calculated as Monday. Ie, if a snapshot is created on Sunday, January 1, 2020, %G is replaced with 2019, because only one day of that week is in 2019.

%g

The abbreviated year. This variable is equivalent to specifying %y.

%H

The hour. The hour is represented on the 24-hour clock. Single-digit hours are preceded by a zero. For example, if a snapshot is created at 1:45 AM, %H is replaced with 01

%I

The hour represented on the 12-hour clock. Single-digit hours are preceded by a zero. Ie. if a snapshot is created at 1:45 PM, %I is replaced with 01

%j

The numeric day of the year. Ie. If a snapshot is created on February 1, %j is replaced with 32 .

%k

The hour represented on the 24-hour clock.

%l

The hour represented on the 12-hour clock. Single-digit hours are preceded by a blank space. Ie, if a snapshot is created at 1:45 AM, %l is replaced with 1

%M

Two-digit minute

%m

Two-digit month

%p

AM or PM

%{PolicyName}

Name of the replication policy that the snapshot was created for. This variable is only valid if specifying a snapshot naming pattern for a replication policy.

%R

The time. This variable is equivalent to specifying %H:%M

%r

The time. This variable is equivalent to specifying %I:%M:%S %p

%S

Two-digit second

%s

The second represented in POSIX time

%{SrcCluster}

The name of the source cluster of the replication policy that the snapshot was created for. Valid only if specifying a snapshot naming pattern for a replication policy

%T

The time. Equivalent to %H:%M:%S

%U

Two-digit numerical week of the year

%u

Numerical day of the week. Ie. If a snapshot is created on Sunday, %u has value 7

%V

Two-digit numerical week

%v

Day of snapshot creation. Equivalent to %a-%b-%Y

%W

Two-digit numerical week of the year that the snapshot was created in

%X

Time that snapshot was created. Equivalent to %H:%M:%S

%Y

Year the snapshot was created

%y

Last two digits of snapshot creation year

%Z

Time zone the snapshot was created in

%z

Offset from UTC time of time zone snapshot was created in

%+

Time and date of snapshot creation. Equivalent to %a %b %e %X %Z %Y
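
As a worked example of these variables, the pattern HourlyBackup_%m-%d-%Y_%H:%M used in the schedule earlier would, for a snapshot generated at 5:00 PM on January 14, 2020, expand to HourlyBackup_01-14-2020_17:00.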

 

 

Similarly, automatic snapshot deletion can also be configured per defined schedule at an hourly through yearly range.

Snapshot attributes such as name and expiry date can easily be changed. For example, the following command will cause the snapshot ‘HourlyBackup_06-15-2018_22:00’ to expire at 2:30 AM on 30th December 2019:


# isi snapshot snapshots modify HourlyBackup_06-15-2018_22:00 --expires 2019-12-30T02:30

A snapshot schedule can also be easily modified. However, any changes to a schedule are applied only to snapshots generated after the modifications are made. Existing snapshots are not affected by schedule modifications. If the alias of a snapshot schedule is modified, the alias is assigned to the next snapshot generated based on the schedule. However, the old alias is not removed from the last snapshot that it was assigned to. Unless you manually remove the old alias, the alias will remain attached to the last snapshot that it was assigned to.

For example, the following command causes snapshots created according to the snapshot schedule hourly_prod_snap to be deleted 15 days after they are created:

# isi snapshot schedules modify hourly_prod_snap --duration 15D


Similarly, deleting a snapshot schedule will not remove snapshots that were previously generated according to the schedule.

The following command will delete the snapshot schedule named ‘hourly_prod_snap’:

# isi snapshot schedules delete hourly_prod_snap


A snapshot schedule can be configured to assign a snapshot alias to the most recent snapshot created by the schedule. The alias is assigned to the next snapshot generated based on the schedule; however, the old alias is not automatically removed from the last snapshot it was assigned to and will remain attached until manually removed.

For example, the following command will configure the snapshot schedule WeeklySnapshot to use the alias ‘LatestWeekly’:

# isi snapshot schedules modify WeeklySnapshot --alias LatestWeekly


It’s worth noting that a snapshot schedule cannot span multiple days. For example, a single schedule cannot begin generating snapshots at 5:00 PM Monday and end at 5:00 AM Tuesday. To continuously generate snapshots for a period greater than a day, two individual snapshot schedules are required.

 

In order to generate snapshots from 5:00 PM Monday to 5:00 AM Tuesday, for example, create one schedule that generates snapshots from 5:00 PM to 11:59 PM on Monday, and another schedule that generates snapshots from 12:00 AM to 5:00 AM on Tuesday.
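As a rough sketch of this (the schedule names, path, pattern and schedule strings below are purely illustrative; the exact schedule grammar should be verified against the ‘isi snapshot schedules create’ help output for your release):

# isi snapshot schedules create mon_evening /ifs/data Snap_%m-%d-%Y_%H:%M "Every Monday every hour between 5:00 PM and 11:59 PM" --duration 7D

# isi snapshot schedules create tue_morning /ifs/data Snap_%m-%d-%Y_%H:%M "Every Tuesday every hour between 12:00 AM and 5:00 AM" --duration 7D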

 

For mixed node clusters, associated with the snapshot schedule frequency question may also be a decision as to which storage tier of a cluster to house the snapshots on.  This can be set, along with a specific protection level and SSD strategy for the snapshot, in the SmartPools file pool policy configuration. For example, from the WebUI browse to File System > Storage Pools > File Pools and select the desired policy.

 

snapshot_sched_3.png

 

SnapshotIQ also provides a number of global snapshot settings, including:


  • Control of auto-creation of scheduled snapshots
  • Deletion of expired snapshots
  • The ability to enable and disable the snapshot service
  • Per-protocol and complete control of snapshot visibility and accessibility

 

These global snapshot storage settings can be accessed and configured in the WebUI by browsing to Data Protection > Snapshots > Settings:


snapshot_sched_4.png

 

Or from the CLI, via:


# isi snapshot settings view

 

The following table provides a description of the global snapshot configuration settings:

 

Attribute

Description

Autodelete

Determines whether snapshots are automatically deleted according to their expiration dates.

Reserve

Specifies the percentage of disk space on the cluster that is reserved for snapshots.

NFS Root Accessible

Determines whether snapshot directories are accessible through NFS.

NFS Root Visible

Determines whether snapshot directories are visible through NFS.

NFS Subdir Accessible

Determines whether snapshot subdirectories are accessible through NFS.

SMB Root Accessible

Determines whether snapshot directories are accessible through SMB. 

SMB Root Visible

Determines whether snapshot directories are visible through SMB.

SMB Subdir Accessible

Determines whether snapshot subdirectories are accessible through SMB.

Local Root Accessible

Determines whether snapshot directories are accessible through an SSH connection or the local console.

Local Root Visible

Determines whether snapshot directories are visible through an SSH connection or the local console.

Local Subdir Accessible

Determines whether snapshot subdirectories are accessible through an SSH connection or the local console.
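These settings can also be changed from the CLI via ‘isi snapshot settings modify’. For example, the following sketch (the flag name here is assumed to mirror the corresponding field shown by ‘isi snapshot settings view’) would hide the snapshot root directory from NFS clients:

# isi snapshot settings modify --nfs-root-visible no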

Received an interesting snapshot restore inquiry from the field and thought it was worth incorporating into a blog article. The scenario is this: A large amount of data needs to be restored on a cluster. Unfortunately, the SnapshotIQ policies are configured at the root /ifs level and it is not feasible to restore every subdirectory under the snapshot. Although the files themselves are not that large, the subdirectories contain anywhere from thousands to tens of millions of files, and restores are taking a very long time when copying the directories manually.


So, there are two main issues at play here:


  • Since the snapshot is taken at the root of the directory tree and the entire snapshot cannot be restored in place, using the SnapRevert job is not an option here.
  • The sheer quantity of files involved means that a manual, serial restore of the data will be incredibly time consuming.


Fortunately, there is a solution that involves using replication. SyncIQ allows for snapshot subdirectories to be included or excluded, plus also provides the performance benefit of parallel job processing.


SyncIQ contains an option, only available via the command line (CLI), which allows replication out of a snapshot.


The procedure is as follows:


1)     Create a snapshot of a root directory.

# isi snapshot snapshots create --name snaptest3 /ifs/data


2)     List the available snapshots and select the desired instance.

 

For example:


# isi snapshot list

ID Name Path

----------------------------------------------------

6 FSAnalyze-Snapshot-Current-1529557209 /ifs

8    snaptest3                             /ifs/data

----------------------------------------------------

Total: 2


Note that there are a couple of caveats:


  • The subdirectory to be restored must still exist in the HEAD filesystem (ie. not have been deleted since the snapshot was taken).
  • You cannot replicate data from a SyncIQ generated snapshot.

 

3)     Create a local SyncIQ replication policy with the snapshot source as the original location and a new directory location on ‘localhost’ as the destination. The ‘--source-include-directories’ argument lists the desired subdirectory(s) to restore.

 

For example, via the CLI:

 

# isi sync policies create snapshot_sync3 sync /ifs/data localhost /ifs/file_sync3 --source-include-directories /ifs/data/local_qa

 

Or via the WebUI:

 

SyncIQ_snapshot_replication_1.png

 

Note: The snapshot itself cannot be configured into the policy; there is no option to set the policy source to a snapshot.


4)     Next, run the sync job to replicate a subset of a snapshot. This step is CLI only (not WebUI) since the SyncIQ policy needs to be executed with ‘--source-snapshot’ argument specified.

 

For example:


# isi sync job start snapshot_sync3 --source-snapshot=snaptest3


Note: This command is essentially a change root for the single run of the SyncIQ Job.
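Before moving on to the rename in the next step, it’s worth confirming that the replication job has finished. For example, using the standard SyncIQ CLI reporting commands (output will vary):

# isi sync jobs list

# isi sync reports list | grep snapshot_sync3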


5)     Finally, rename the original directory to something else with mv, and then rename the restore location to the original name.

 

For example:

 

# mv /ifs/data/local_qa /ifs/data/local_qa_old

# mv /ifs/file_sync3/local_qa /ifs/data/local_qa


If you do not have a current replication license on your cluster, you can enable the OneFS SyncIQ trial license from the WebUI by browsing to Cluster Management > Licensing.


Using SyncIQ in this manner is a very efficient way to recover large amounts of data from within snapshots. However, this scenario also illustrates one of the drawbacks of taking snapshots at the root directory level. Consider whether it’s more advantageous to configure snapshot schedules to capture at the subdirectory level instead.

trimbn

OneFS Snapshot Tiering

Posted by trimbn Nov 25, 2019

Within OneFS, data tiering falls under the purview of SmartPools, and snapshot tiering is no different. SmartPools file pool policies can be crafted to identify logical groups of files (or file pools) and storage operations can be specified and applied to these files.

 

Be aware that a SmartPools license must be activated before creating file pool policies, and SmartPools or higher administrative privilege is required for configuration.

 

File pool policies have two components:

 

  1. File-matching criteria that define a file pool
  2. Actions to be applied to the file pool

 

File pools can be defined based on characteristics such as file type, size, path, birth, change, access timestamps, etc. These criteria can then be combined with Boolean operators (ie. AND, OR). In addition to file-matching criteria, a variety of actions can be applied to the file pool, including:

 

  • Identifying data and snapshot storage tiers
  • Defining data and snapshot SSD strategies
  • Enabling or disabling SmartCache
  • Setting requested protection and data-access optimization parameters

 

The Snapshot Storage Target setting is applied to each file version by SmartPools. When a snapshot is taken (ie. on copy on write), the existing pool setting is simply preserved, so the snapshot will initially be written to the default data pool. The SmartPools job subsequently finds the snapshot version and moves it to the desired pool or tier during its next scheduled run.

 

To configure the Snapshot Storage Target setting from the WebUI, browse to Storage Pools > File Pool Policies > Edit Default Policy Details. For example, the following will configure SmartPools to store snapshots on the cluster’s ‘archive’ tier:

 

snapshot_tiering_1.png

 

The same can be achieved from the CLI using the 'isi filepool modify' command. For example:

 

# isi filepool default-policy modify --snapshot-storage-target archive

 

In addition to the storage target, the preferred Snapshot SSD strategy can also be configured here. The available options are:

 

SSD Strategy

Description

Metadata

Place a copy of snapshot metadata on SSD for read acceleration

Metadata-write

Place all snapshot metadata on SSD for read & write acceleration

Data

Place all snapshot data and metadata on SSD

Avoid

No snapshot data or metadata on SSD

 

 

The following CLI command, for example, will place a mirror of the snapshot metadata on SSD, providing metadata read acceleration:

 

# isi filepool default-policy modify --snapshot-ssd-strategy metadata

 

Similarly, for regular files, SmartPools determines which node pool to write to based on one of two situations. If a file matches a file pool policy based on directory path, that file will be written into the node pool dictated by the file pool policy immediately. If the file matches a file pool policy based on any criteria other than path name, SmartPools will write it to the node pool with the most available capacity; if that policy places the file on a different node pool than the highest-capacity node pool, the file will be moved when the next scheduled SmartPools job runs.


snapshot_tiering_3.png

 

Under the covers, when the SmartPools or FilePolicy job runs, it caches a policy on directories that it thinks will be applied to children of that directory.  When files are created they start out with that policy. At the next scheduled SmartPools job run, if a different policy matches from the configured file pool rules, it is applied at that time.
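If waiting for the next scheduled run isn’t desirable, the SmartPools job can also be kicked off manually from the CLI. For example:

# isi job jobs start SmartPools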

trimbn

OneFS Snapshot Deletion

Posted by trimbn Nov 19, 2019

Received a recent enquiry about snapshot deletion, and thought it was worth elaborating upon in a blog article:

OneFS snapshots are created at the directory level instead of the volume level, thereby providing a high degree of granularity. However, a snapshot is a point-in-time, immutable copy of a specified subset of OneFS data, so it cannot be altered in any way once taken, beyond a full deletion. As such, removing a portion of an existing snapshot is not an option: deleting an Isilon snapshot is an all-or-nothing event.


There are a couple of useful OneFS CLI commands that show how much space is consumed by snapshots:


First up, the ‘isi_classic snapshot usage’ command will display the existing snapshots and their disk usage. For example:


# isi_classic snapshot usage

FSAnalyze-Snapshot-Current-1530077114             51G     n/a (R) 0.00% (T)

SIQ-Failover-snapshot_sync3-2019-10-22            4.0K     n/a (R) 0.00% (T)

SIQ-Failover-snapsync-2019-10-22_12-02            4.0K     n/a (R) 0.00% (T)

Snapshot: 2019Oct22, 08:20:05 AM                  1.9G     n/a (R) 0.00% (T)

[snapid 57, delete pending]                          0     n/a (R) 0.00% (T)

snaptest1 6.0K     n/a (R)    0.00% (T)

snaptest2 70K     n/a (R)    0.00% (T)

snaptest3                                         1.3M     n/a (R) 0.00% (T)


In addition to the name of the snapshot and the amount of space the snapshot takes up, the percentage of the snapshot reserved space this accounts for (R), and the percentage of the total filesystem space this accounts for (T) are also displayed.


Secondly, the ‘isi snapshot view’ command can be used to find more detailed info for an individual snapshot. This includes the snapshot path, alias, ID, whether there are any locks, expiry date, etc. For example:

 

# isi snapshot view FSAnalyze-Snapshot-Current-1530077114

ID: 56

Name: FSAnalyze-Snapshot-Current-1530077114

Path: /ifs

Has Locks: No

Schedule: -

  Alias Target ID: -

Alias Target Name: -

Created: 2019-10-26T22:25:14

Expires: -

Size: 50.764G

Shadow Bytes: 0

% Reserve: 0.00%

     % Filesystem: 0.00%

State: active

 

 

Snapshots can be automatically deleted on a preconfigured schedule, or manually deleted via the ‘isi snapshot snapshots delete’ CLI command.



Usage:

    isi snapshot snapshots delete { <snapshot> | --schedule <string> | --type

(alias | real) | --all }

[{--force | -f}]

[{--verbose | -v}]

[{--help | -h}]

 

Options:

<snapshot>

Delete a single snapshot.

<schedule>

Delete all snapshots created by the given schedule.

    <type> (alias | real)

Delete all snapshots of the specified type.

    --all

Delete all snapshots.

 

 

Let’s look at a simple example:


1)  The following snapshot usage command lists the available snapshots and their size, ordered by age:


# isi_classic snapshot usage

CBsnapshot                                      85K      n/a (R)    0.00% (T)

Hourly - prod 6.0K     n/a (R)    0.00% (T)

SIQ-Failover-CBpolicy1-2019-10-29_13-0          6.0K     n/a (R) 0.00% (T)

Daily_2019-11-12_12:00                          584M     n/a (R) 0.00% (T)

Weekly_2019-11-11_12:00                         6.0K     n/a (R) 0.00% (T)

 

From this output, we can see the snapshot ‘Daily_2019-11-12_12:00’ is 584MB in size and appears to be a viable candidate for deletion.

 

2)  The following CLI command will return the snapshot ID.   


# isi snapshot snapshots list | grep "Daily_2019-11-12_12:00" | awk '{print $1}'

110


3)  Next, we can use the snap ID to verify the snapshot details to ensure its deletion is desirable:


# isi snapshot snapshots view `isi snapshot snapshots list | grep Daily_2019-11-12_12:00| awk '{print $1}'`

 

ID: 110

Name: Daily_2019-11-12_12:00

Path: /ifs

        Has Locks: No

Schedule: Daily @ Noon

  Alias Target ID: -

Alias Target Name: -

          Created: 2019-11-12T12:00:06

          Expires: -

Size: 582.45M

     Shadow Bytes: 0

        % Reserve: 0.00%

     % Filesystem: 0.00%

State: active

 

The output confirms that it’s the correct snapshot, its size, and that it’s not locked, etc.

 

4)  The following syntax will delete the snapshot ID 110, after prompting for confirmation:

 

# isi snapshot snapshots delete 110

Are you sure? (yes/[no]):


5)  A detailed report of the SnapshotDelete job can then be viewed from the WebUI. This can be found by browsing to Job Operations > Job Reports, filtering for ‘SnapshotDelete’, and selecting ‘View Details’ for the desired job.
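The same information can also be gleaned from the CLI, assuming the standard job engine reporting commands (the job ID below is hypothetical):

# isi job reports list | grep -i snapshotdelete

# isi job reports view 1234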

 

When it comes to deleting snapshots, there are a couple of rules of thumb to keep in mind:

 

  • The oldest snapshot can be removed very quickly. An ordered deletion is the deletion of the oldest snapshot of a directory, and is a recommended best practice for snapshot management. An unordered deletion is the removal of a snapshot that is not the oldest in a directory, and can often take approximately twice as long to complete and consume more cluster resources than ordered deletions.

 

  • Where possible, avoid deleting snapshots from the middle of a time range. Newer snapshots are mostly pointers to older snapshots, and they look larger than they really are. Removing the newer snapshots will not free up much space. Deleting the oldest snapshot ensures you will actually free up the space. You can determine snapshot order (if not by name or date) by using the isi snapshot list -l command. The snapshot IDs (first column) are non-conserved, serial values.

  • Avoid deleting SyncIQ snapshots (snapshots with names that start with SIQ), unless the only remaining snapshots on the cluster are SyncIQ snapshots, and the only way to free up space is to delete those SyncIQ snapshots. Deleting SyncIQ snapshots resets the SyncIQ policy state, which requires a reset of the policy and potentially a full sync or initial diff sync. A full sync or initial diff sync could take many times longer than a regular snapshot-based incremental sync.

 

So what happens under the hood? Upon deleting a snapshot, OneFS immediately modifies some of the tracking data and the snapshot disappears from view. However, the actual behind-the-scenes clean-up of the snapshot can involve a fair amount of work, which is performed in the second phase of the SnapshotDelete job. There is no requirement for reserved space for snapshots in OneFS: snapshots can use as much or as little of the available file system space as desired.

In the example below, snapshot ID 100 is being deleted. To accomplish this, any changes will likely need to be moved to the prior snapshot (ID 98), because that snapshot will no longer be able to read forward.

 

snap_delete_1.png

 

Snapshot 100 has two changed blocks: block 0 and block 4. Block 4 was also changed between snapshot 98 and snapshot 100, so snapshot 98 already holds its own copy and snapshot 100’s block 4 can simply be deleted. Block 0, however, was unchanged in that interval, so it needs to be moved over to snapshot 98.


snap_delete_2.png

 

It’s worth noting that SnapshotDelete will only run if the cluster is in a fully available state, i.e., no drives or nodes are down.

 

If you have old, large snapshots consuming space and the cluster does not have a current SnapshotIQ license, contact Dell EMC Isilon Technical Support to discuss your options and assistance with deleting the old snapshots.

There have been a couple of recent inquiries from the field around SMB opportunistic locking so it seemed like an appropriate topic to dig into a bit in an article.

 

Under certain conditions, opportunistic locks, or oplocks, can enable a storage device and client to aggressively cache data – helping to boost performance. More specifically, oplocks allow a Windows client to cache read-ahead data, writes, opens, closes, and byte-range lock acquisitions.

 

With SMB2.1 and onward, in addition to oplocks, Microsoft also introduced the concept of leases. These provide more flexible and granular caching for clients, while also allowing for lock upgrades.

 

Here’s a brief rundown on how SMB and NFS support locks and leases:

 

Protocol

Details

SMB1, SMB2.0

Oplocks are defined and used in the SMB1 protocol. These are fully supported in OneFS.

SMB2.1, SMB3

Oplocks are still supported but leases are also included in the protocol. These offer a number of improvements over oplocks. These are fully supported in OneFS.

NFSv3

No provision in the protocol for leases or oplocks.

NFSv4

Optional support for file and directory delegations which are similar to SMB leases. These are not currently supported by OneFS.

 

When a Windows client attempts to open a file, it can request no oplock, request a batch oplock, or request an exclusive oplock. Once the open has passed the access and share mode checks, OneFS must do one of the following:

 

1)  Grant the client its requested oplock on the file (exclusive or batch).

2) Grant the client a lower-level (level II) oplock on the file.

3) Not grant an oplock on the file at all.

 

The various oplocks types, ranked from the least to the most amount of caching, include:

 

Oplock Class

Details

Level II (shared)

Level II oplocks, also referred to as shared oplocks, grant clients the ability to cache the results of read operations. This means a client can prefetch data that an application may want to read, as well as retain old read data, allowing its reads to be more efficient. Multiple clients can hold level II oplocks at the same time, but all existing level II oplocks are broken when a client tries to write data to the file.



Exclusive

Exclusive oplocks grant clients the ability to retain read data, like level II oplocks, but also allow clients to cache data and metadata writes and byte-range lock acquisitions. Unlike level II oplocks, a client cannot be granted an exclusive oplock if the file is already opened. If a client is granted an exclusive oplock, it is able to cache writes to the file, cache metadata changes (such as timestamps, but not ACLs) and cache range locks of the file via byte-range locking. As soon as there is another opener, either from the same client or a different client, the cluster must request to break the exclusive oplock, in order to guarantee the second opener has access to the most up-to-date data.



Batch

Batch oplocks are identical to exclusive oplocks, except that they allow clients to cache open/close operations. The origins of this type of oplock are from the days of DOS batch files; batch files were opened and closed for every line of the script to be executed.



 

 

There are two types of oplock breaks: level I breaks and level II breaks. An oplock break occurs when an oplock is contended, due to a conflicting file operation. In OneFS, contention occurs when an operation on one File ID (a File ID, or FID, is the ‘handle’ that SMB uses to refer to a file) conflicts with a currently held oplock on a different FID, either on the same or a different client. When an oplock contends with an operation, the oplock is broken. The OneFS rules governing oplock contention are:

 

Rule

Details

1

A level II oplock contends with modifying operations, such as writes and truncates, as well as byte-range lock acquisitions.



2

An exclusive oplock contends with an open operation, except for stat-only opens.



3

A batch oplock contends with an open, delete, or rename operation.



 

 

Contention can occur whether the operations are from the same or a different Windows client. However, an operation on a file ID does not contend against that FID’s own oplock, since FIDs must be different to contend. Opening the same file a second time will typically contend with the first opening of the file, since the second open returns a different FID.

 

The two level I oplocks, exclusive and batch, are broken in different ways. An exclusive oplock is broken when the file it pertains to has been requested for opening. Batch oplocks are broken when the same file is opened from a different client or when the file is deleted or renamed.

 

When OneFS needs to break a level I oplock, it allows the client a chance to perform all of the operations that it has cached. Before the cluster can respond to the open request from the second client, it waits for an acknowledgment of the oplock break from the first client. The first client now has the chance to either flush cached metadata or data or send byte-range locking requests.

 

Once the client has flushed its cached operations, it relinquishes its oplock either by closing the file or by acknowledging to the cluster that it has downgraded its oplock. When a client decides to downgrade its oplock, it either accepts a level II oplock or informs the cluster that it does not require an oplock at all. After the client has acknowledged the oplock break, OneFS is free to respond to the open request from the second client. It may also give the second client a level II oplock, allowing it to cache read data. OneFS waits up to 30 seconds for an acknowledgment of its oplock break request, after which it considers the client unresponsive and times out.

 

A FID’s level II oplock is broken when a modifying operation or a byte-range lock acquisition is performed on a different FID. The cluster informs the first FID that its oplock has been broken and that it can no longer cache read data. Unlike the exclusive oplock break, OneFS does not require an oplock break acknowledgment from the client and can continue processing the write request right away.

 

Leases are similar to, and compatible with, oplocks, but superior in a number of areas:


  • Leases contend based on a client key, rather than a FID, so two different applications on a client accessing the same file can share a lease whereas they cannot share an oplock.
  • There are more lease types, namely:

 

Lease Type

Details

Read (R)

A Read lease (shared) indicates that there are multiple readers of a stream and no writers. This supports client read caching (similar to Level II oplock).



Read-Handle (RH)

A Read-Handle lease (shared) indicates that there are multiple readers of a stream, no writers, and that a client can keep a stream open on the cluster even though the local user on the client machine has closed the stream. This supports client read caching and handles caching (level II plus handle caching).



Read-Write (RW)

A Read-Write lease (exclusive) allows a client to open a stream for exclusive access and allows the client to perform arbitrary buffering. This supports client read caching and write caching (Level I Exclusive).



Read-Write-Handle (RWH)

Read-Write-Handle (RWH) lease (exclusive) allows a client to keep a stream open on the cluster even though the local accessor on the client machine has closed the stream. This supports client read caching, write caching, and handle caching (Level I Batch).



 

To globally enable oplocks on a cluster’s shares from the WebUI, navigate to Protocols > Windows Sharing (SMB) > Default Share Settings > Advanced Settings > Oplocks and check the ‘Enable oplocks’ checkbox.


oplocks_1.png


The same can be done for individual shares, from the desired share’s advanced configuration menu.


From the CLI, the syntax to enable oplocks on a share named ‘test’ is:


# isi smb shares modify test --oplocks Yes


To verify the configuration:


# isi smb shares view test | grep -i oplocks

Oplocks: Yes

 

Similarly, the syntax to disable oplocks on the ‘test’ share is:

 

# isi smb shares modify test --oplocks No

 

To re-enable oplocks, the following command can be used:

 

# isi smb shares modify test --revert-oplocks

 

The following gconfig syntax can be used to disable leases:

 

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases=0

 

Note that the above oplocks configuration is unaffected by this config change to leases.

 

Similarly, to re-enable leases, the following command can be used:

 

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases=1
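To check the current lease setting, the same gconfig key can be queried (this assumes isi_gconfig echoes a key’s value when invoked without an assignment):

# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases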


When using either the OneFS WebUI or Platform API, all communications are encrypted using Transport Layer Security (TLS). TLS requires a certificate that serves two principal functions: granting permission to use encrypted communication via Public Key Infrastructure (PKI), and authenticating the identity of the certificate's holder. OneFS defaults to the best supported version of TLS based on the client request.

 

An Isilon cluster initially contains a self-signed certificate for this purpose. The existing self-signed certificate can be used, or it can be replaced with a third-party certificate authority (CA)-issued certificate. If the self-signed certificate is used, when it expires it must be replaced with either a third-party (public or private) CA-issued certificate or another self-signed certificate generated on the cluster. The following are the default locations for the server.crt and server.key files.


  • TLS certificate: /usr/local/apache2/conf/ssl.crt/server.crt
  • TLS certificate key: /usr/local/apache2/conf/ssl.key/server.key


The following steps can be used to replace the existing TLS certificate with a public or private third-party certificate authority (CA)-issued TLS certificate.


1) Connect to a cluster node via SSH, log in as root, and create a backup directory:


# mkdir /ifs/data/backup/


2) Set the permissions on the backup directory to 700:


# chmod 700 /ifs/data/backup


3) Copy the server.crt and server.key files to the backup directory:


# cp /usr/local/apache2/conf/ssl.crt/server.crt /ifs/data/backup/server.crt.bak

# cp /usr/local/apache2/conf/ssl.key/server.key /ifs/data/backup/server.key.bak


4) Create a temporary directory for the files:


# mkdir /ifs/local


5) Set the temporary directory permissions to 700:


# chmod 700 /ifs/local


6) Change to the temporary directory:


# cd /ifs/local


7) Generate a new Certificate Signing Request (CSR) and a new key by running the following command, substituting <common name> with a name that identifies the new .key and .csr files. Eventually, the files will be renamed, copied back to the default location, and deleted. Although any name can be selected, the recommendation is to use the Common Name for the new TLS certificate (for example, the server FQDN or server name, such as isilon.example.com). This helps distinguish the new files from the originals.


# openssl req -new -nodes -newkey rsa:1024 -keyout <common name>.key -out <common name>.csr


8) When prompted, type the information to be incorporated into the certificate request. After entering this information, the <common name>.csr and <common name>.key files appear in the /ifs/local directory.


9) Send the contents of the .csr file from the cluster to the Certificate Authority (CA) for signing.


10) When you receive the signed certificate (now a .crt file) from the CA, copy the certificate to /ifs/local/<common name>.crt (where <common name> is the name you assigned earlier).


11) To verify the attributes in the TLS certificate, run the following command using the name that you assigned earlier:


# openssl x509 -text -noout -in <common name>.crt


12) Run the following five commands to install the certificate and key, and restart the isi_webui service. In the commands, replace <common name> with the name that you assigned earlier.


# isi services -a isi_webui disable

# chmod 640 <common name>.key

# isi_for_array -s 'cp /ifs/local/<common name>.key /usr/local/apache2/conf/ssl.key/server.key'

# isi_for_array -s 'cp /ifs/local/<common name>.crt /usr/local/apache2/conf/ssl.crt/server.crt'

# isi services -a isi_webui enable


13) Verify that the installation succeeded. For instructions, see the Verify a TLS certificate update section of this guide.
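As a quick sanity check, the active certificate can also be inspected remotely with OpenSSL from any client. The hostname below is a placeholder, and this assumes the WebUI is listening on its default port of 8080:

# echo QUIT | openssl s_client -connect isilon.example.com:8080 2>/dev/null | openssl x509 -noout -subject -dates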


14) Delete the temporary files from the /ifs/local directory:


# rm /ifs/local/<common name>.csr /ifs/local/<common name>.key /ifs/local/<common name>.crt


15) Delete the backup files from the /ifs/data/backup directory:


# rm /ifs/data/backup/server.crt.bak /ifs/data/backup/server.key.bak

 

The following steps replace an expired self-signed TLS certificate by generating a new certificate based on the existing server key.


1) Open a secure shell (SSH) connection to any node in the cluster and log in as root.


2) Create a backup directory by running the following command:


# mkdir /ifs/data/backup/


3) Set the permissions on the backup directory to 700:


# chmod 700 /ifs/data/backup


4) Make backup copies of the existing server.crt and server.key files by running the following two commands:


# cp /usr/local/apache2/conf/ssl.crt/server.crt /ifs/data/backup/server.crt.bak

# cp /usr/local/apache2/conf/ssl.key/server.key /ifs/data/backup/server.key.bak


Note: If files with the same names exist in the backup directory, either overwrite the existing files, or, to save the old backups, rename the new files with a timestamp or other identifier.


5) Create a temporary directory to hold the files while you complete this procedure:


# mkdir /ifs/local


6) Set the permissions on the temporary directory to 700:


# chmod 700 /ifs/local


7) Change to the temporary directory:


# cd /ifs/local


8) At the command prompt, run the following two commands to create a certificate that will expire in one year (365 days). Increase or decrease the value for -days to generate a certificate with a different expiration date.


# cp /usr/local/apache2/conf/ssl.key/server.key ./

# openssl req -new -days 365 -nodes -x509 -key server.key -out server.crt


Note: the -x509 option outputs a self-signed certificate rather than a certificate signing request.


9) When prompted, type the information to be incorporated into the certificate request. When you finish entering the information, a renewal certificate is created, based on the existing (stock) server key. The renewal certificate is named server.crt and it appears in the /ifs/local directory.


10) To verify the attributes in the TLS certificate, run the following command:


# openssl x509 -text -noout -in server.crt


11) Run the following five commands to install the certificate and key, and restart the isi_webui service:


# isi services -a isi_webui disable

# chmod 640 server.key

# isi_for_array -s 'cp /ifs/local/server.key /usr/local/apache2/conf/ssl.key/server.key'

# isi_for_array -s 'cp /ifs/local/server.crt /usr/local/apache2/conf/ssl.crt/server.crt'

# isi services -a isi_webui enable


12) Verify that the installation succeeded.


TLS certificate renewal or replacement requires you to provide data such as a fully qualified domain name and a contact email address. When you renew or replace a TLS certificate, you are asked to provide data in the format that is shown in the following example:


You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN.

There are quite a few fields but you can leave some blank. For some fields there will be a default value; if you enter '.', the field will be left blank.

-----

Country Name (2 letter code) [AU]:US

State or Province Name (full name) [Some-State]:Washington

Locality Name (eg, city) []:Seattle

Organization Name (eg, company) [Internet Widgits Pty Ltd]:Company

Organizational Unit Name (eg, section) []:System Administration

Common Name (e.g. server FQDN or YOUR name) []:localhost.example.org

Email Address []:support@example.com


In addition, if you are requesting a third-party CA-issued certificate, you should include additional attributes that are shown in the following example:


Please enter the following 'extra' attributes to be sent with your certificate request


A challenge password []:password

An optional company name []:Another Name

 

13) Delete the temporary files from the /ifs/local directory:


# rm /ifs/local/server.key /ifs/local/server.crt


14)  Delete the backup files from the /ifs/data/backup directory:


# rm /ifs/data/backup/server.crt.bak /ifs/data/backup/server.key.bak

trimbn

OneFS Patches

Posted by trimbn Oct 23, 2019

In the previous article on OneFS Healthchecks, we introduced the notion of the RUP, or roll-up patch. This generated a few questions from the field, so it seemed like a good blog topic. In this article we’ll take a look at the OneFS patch installation process which has been significantly refined and simplified in 8.2.1.

 

In previous releases the patching process could prove burdensome, often requiring multiple service restarts and reboots during patch installation.


To address this, OneFS 8.2.1 includes the following principal features in its enhanced patching process:


  • Supports installing patch without uninstalling previous version
  • Only requires a single node reboot
  • Reduces service stop and start to only once per service
  • Supports patching isi_upgrade_agent_d
  • Reduces security concerns during RUP installation
  • Supports patching the underlying Patch System
  • Utilizes the same familiar ‘isi upgrade patch’ CLI command syntax


So, let’s look at what’s involved in installing a new ‘roll-up patch’, or RUP, under 8.2.1.


1)  First, check for existing patches on the cluster:


# isi upgrade patch list

Patch Name Description            Status

UGA-August                          Installed

Total: 1

 

In this example, the CLI command verifies that patch ‘UGA-August’ is installed on the cluster. Patching activity is logged in a patch database, located at /ifs/.ifsvar/patch/patch.db

 

2)  Next, install a new patch directly (ie. without any uninstallation). In this case the September RUP, UGA-September, is being installed:


# isi upgrade patch install UGA-September.pkg

The below patches are deprecated by this patch and will be removed automatically:

- UGA-August

Would you like to proceed? (yes/[no]): yes

Requested install of patch UGA-September.

 

# isi upgrade patch list

Patch Name Description Status

UGA-August AdHoc

UGA-September Installing

Total: 2

 

Note that the previous patch, UGA-August, is now listed in the ‘AdHoc’ state, which means this patch will be automatically deprecated and removed by the new patch installation. However, at this point it is still installed and effective on the cluster.


3)  After the installation, check for the correct installation of the new patch:


# isi upgrade patch list

Patch Name Description Status

UGA-September                                      Installed

Total: 1

 

If any issues are encountered with the patch installation process, please contact Isilon support immediately. That said, the state can be verified with the “isi upgrade patches list” command.


Additionally, patch installation logs are available under /var/log/isi_pkg.
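A quick way to confirm patch activity is to search these logs for the patch name in question. For example (the patch name is taken from the example above, and the exact log file naming may vary by release):

# isi_for_array -s 'grep -ri uga-september /var/log/isi_pkg*'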


Pertinent log messages include:

 

Log message

Action

2019-10-16T02:20:05Z <3.6> ezhou2-6t972a3-1 isi_pkg[64413]: pkg_request: begin check_delete_deprecated

Check the deprecated patches on cluster

2019-10-16T02:20:15Z <3.6> ezhou2-6t972a3-1 isi_pkg[64462]: Unregistered 'RUP1-s' from all versions

Unregister the deprecated/old RUP patch in the cluster patch DB; the deprecated RUP patch status becomes “AdHoc”

2019-10-16T02:20:59Z <3.6> ezhou2-6t972a3-1 isi_pkg[64694]: Remove deprecated patch 'RUP1-s'

Deprecated RUP patches’ files will be removed at the stage of “INSTALL_EXEC”.

2019-10-16T02:21:15Z <3.6> ezhou2-6t972a3-1 isi_pkg[64865]: Removing patch from installed DB,patch 'RUP1-s' hash 'a5a33e47d5a423f1b68970d88241af53'

Deprecated RUP patches will be removed from installed DB at the stage of “INSTALL_COMMIT”.

 

Note that the patch removal or un-installation process has not changed in OneFS 8.2.1.


Additionally, the installation of firmware patches (drive DSP or node NFP) is not covered by this feature.

trimbn

OneFS Healthchecks

Posted by trimbn Oct 16, 2019

Another area of OneFS that was recently redesigned and streamlined is Healthchecks. Previously, system health checks on Isilon were prone to several challenges. The available resources were a mixture of on and off-cluster tools, often with separate user interfaces. They were also typically reactive in nature and spread across Isilon Advisor, IOCA, self-service tools, etc. To address these concerns, the new OneFS Healthcheck feature creates a single, common framework for system health check tools, considerably simplifying both the user experience and ease of development and deployment. This affords the benefits of proactive risk management and reduced resolution time, resulting in overall improved cluster uptime.


OneFS Healthchecks make no changes to the cluster and are complementary to other monitoring services such as CELOG. On detection of an issue, a healthcheck displays an actionable message detailing the problem and recommended corrective activity. If the action is complicated or involves decisions, a knowledge-base (KB) article will often be referenced. Alternatively, if no user action is possible or the remediation path is unclear, the recommendation will typically be to contact Dell EMC Isilon support.

Healthcheck functions include warning about a non-recommended configuration, automatically detecting known issues with current usage and configuration, and identifying problems and anomalies in the environment where the cluster is deployed (network, AD, etc).

OneFS currently provides sixteen checklist categories containing more than two hundred items, including eighty-three IOCA (Isilon On-Cluster Analysis) checks. These are:


Category

Description

All

All available checks

Avscan

Checklist to determine the overall health of AVScan

Cluster_capacity

Checklist to determine the overall capacity health for a pool or cluster

Infiniband

Checklist to determine the overall health of the Infiniband backend

IOCA

Pre-existing perl script that assesses the overall health of a cluster. Checklist contains all integrated IOCA items.

Job_engine

Job Engine-related health checks

Log_level

Checklist to determine the overall health of log-level

NDMP

Checklist to determine the overall health of NDMP

NFS

Checklist to determine the overall health of nfs

NTP

Checklist to determine the overall health of time synchronization

Post-upgrade

Checklist to determine post-upgrade cluster health

Pre-upgrade

Checklist to determine pre-upgrade cluster health

SmartConnect

Checklist to determine the overall health of SmartConnect

SmartPools

Checklist to determine the overall health of SmartPools

SMB

Checklist to determine the overall health of smb

Snapshot

Checklist to determine the overall health of snapshots.

Synciq

Checklist to determine the overall health of SyncIQ


Under the hood, a OneFS health check is a small script which assesses the vitality of a particular aspect of an Isilon cluster. It’s run on-cluster via the new healthcheck framework (HCF) and returns both a status and value:

 

Health Attribute

Description

Status

OK, WARNING, CRITICAL, EMERGENCY, UNSUPPORTED

Value

100 is healthy; 0 is not.


The following terminology is defined and helpful in understanding the Healthcheck framework:


Type

Description

Item

Script that checks a specific thing

Checklist

Group related Items for easy use

Evaluation

One instance of running an Item or Checklist

Freshness

Each item has a ‘freshness’ value which defines whether it’s new or cached from a previous evaluation

Parameter

Additional information provided to the item(s)

Result

Output of one Evaluation

RUP

Roll-up Patch: The delivery vehicle for new OneFS Healthchecks and patches.

 

CLI commands:


The healthchecks themselves automatically run daily. They can also be managed via the OneFS CLI using a dedicated set of ‘isi healthcheck’ commands. For example, the following syntax will display all the checklist categories available:


# isi healthcheck checklists list


To list or view details of the various individual checks available within each category, use the ‘items’ argument and grep to filter by category. For example, the following command will list all the snapshot checks:


# isi healthcheck items list | grep -i snapshot

fsa_abandoned_snapshots        Per cluster   Warns if the FSAnalyze job has failed or has left excess snapshots on the cluster after a failure

ioca_checkSnapshot             Per cluster   Checks if the Snapshot count is approaching cluster limit of 20,000, whether Autodelete is set to yes, and checks snapshot logs. Checks snapshot logs for EIN/EIO/EDEADLK/Failed to create snapshot

old_snapshots                  Per cluster   Checks for the presence of snapshots older than 1 month

snapshot_count                 Per cluster   Verify the snapshot counts on the cluster conform to the limits.

  1. Active snapshot count - Number of active snapshots in the system.
  2. In-delete snapshot count - Number of snapshots pending delete.


The details of an individual check, in this case ‘old_snapshots’, can be displayed using the following syntax:


# isi healthcheck items view old_snapshots

Name: old_snapshots

Summary: Checks for the presence of snapshots older than 1 month

Scope: Per cluster

Freshness: Now

Parameters:

freshness_days(38)  *

Description: * OK: There are no unusually old snapshots stored on the cluster

* WARNING: At least one snapshot stored on the cluster is over one month old.

This does not necessarily constitute a problem and may be intentional, but such

snapshots may consume a considerable amount of storage. Snapshots may be viewed

with 'isi snapshot snapshots list', and marked for automatic removal with 'isi

snapshot snapshots delete <snapshot name>'

 

The full suite of checks for a particular category (or ‘all’) can be run as follows. For example, to kick of the snapshot checks:


# isi healthcheck run snapshot


The ‘evaluations’ argument can be used to display when each set of healthchecks was run. In this case, listing and grep’ing for snapshots will show when the test suite was executed, whether it completed, and whether it passed, etc:


# isi healthcheck evaluations list | grep -i snapshot

snapshot20190924T2046 Completed - Pass - /ifs/.ifsvar/modules/health-check/results/evaluations/snapshot20191014T2046

 

The ‘evaluations view’ argument can be used to display the details of a particular healthcheck run, including whether it completed, whether it passed, specifics of any failures, and the location of the pertinent logfile:

 

# isi healthcheck evaluations view snapshot20191014T2046

ID: snapshot20191014T2046

Checklist: snapshot

Overrides: -

Parameters: {}

Run Status: Completed

Result: Pass

Failure: -

Logs: /ifs/.ifsvar/modules/health-check/results/evaluations/snapshot20191014T2046

 

New health checks are included in Roll-Up Patches, or RUPs (previously known as Service Packs), for common versions of OneFS, specifically 8.0.0.7, 8.1.0.2, 8.1.0.4, 8.1.2, 8.1.3, 8.2.0, 8.2.1. The RUPs for these releases are typically delivered monthly and new checks are added to subsequent RUPs.

 

With the delivery of each new RUP for a particular release, the core OneFS release is also rebuilt to include the latest health checks and patches. This means that the customer download URL for a OneFS release will automatically include latest pre-installed RUP, thereby removing an additional patching/reboot requirement from the cluster’s maintenance cycle. The checks run across all nodes and are typically run daily. The results are also automatically incorporated into ‘isi_phone_home’ data.

trimbn

OneFS Instant Secure Erase

Posted by trimbn Oct 7, 2019

There are several notable problems with many common drive retirement practices. Although not all of them are related to information security, many still result in excess cost. For example, companies that decide to re-purpose their hardware may choose to overwrite the data rather than erase it completely. The process itself is both time consuming and a potential data security risk, since re-allocated sectors on the drives are not covered by the overwrite process, meaning that some old information will remain on disk.

 

Another option is to degauss and physically shred drives when the storage hardware is retired. Degaussing can yield mixed results since different drives require unique optimal degauss strengths. This also often leads to readable data being left on the drive which can obviously constitute a significant security risk.


Thirdly, there is the option to hire professional disposal services to destroy the drive. However, the more people handling the data, the higher the data vulnerability. Total costs can also increase dramatically because of the need to publish internal reports and any auditing fees.


To address these issues, OneFS 8.2.1 introduces Instant Secure Erase (ISE). ISE enables the cryptographic erasure of non-SED drives in an Isilon cluster, providing customers with the ability to erase the contents of a drive after smartfail.


But first, some useful terminology:


Term

Definition

Cryptographic Erase

The ‘SANITIZE’ command sets for SCSI and ATA drives are defined by the T10 and T13 technical committees, respectively.

Instant Secure Erase

The industry term referring to the drive’s ‘cryptographic erase’ capability.

isi_drive_d

The OneFS drive daemon that manages the various drive states/activities, mapping devices to physical drive slots, and supporting firmware updates.

 

So OneFS ISE uses the ‘cryptographic erase’ command to erase proprietary user data on supported drives. ISE is enabled by default and is automatically performed when OneFS smartfails a supported drive.


instant_secure_erase_1.png

 

ISE can also be run manually against a specific drive. To do this, it sends standard commands to the drive, depending on its interface type. For example:


  • SCSI: “SANITIZE (cryptographic)”
  • ATA: “CRYPTO SCRAMBLE EXT”


If the drive firmware supports the appropriate above command, it swaps out the Data Encryption key to render data on the storage media unreadable.


instant_secure_erase_2.png

 

In order to use ISE, the following conditions must be met:


  • The cluster is running OneFS 8.2.1 (Acela)
  • The node is not a SED-configuration (for automatic ISE action upon smartfail)
  • User has privileges to run related CLI commands (for manually performed ISE)
    • For example, the privilege to run ‘isi_radish’
  • Cluster contains currently supported drives:
    • SCSI / ATA interface
    • Supports “cryptographic erase” command
  • The target drive is present

 

instant_secure_erase_3.png


ISE can be run by the following methods:


1)  Via the isi_drive_d daemon during a drive Smartfail.

    • If the node is non-SED configuration
    • Configurable through ‘drive config’


2)  Manually, by running the ‘isi_radish’ command.


Additionally, it can also be invoked programmatically by executing the python ‘isi.hw.bay’ module.

 

As mentioned previously, ISE is enabled by default, but it can be easily disabled from the OneFS CLI with the following syntax:


# isi devices drive config modify --instant-secure-erase no


The following CLI command can also be used to manually run ISE:


# isi_radish -S <bay/dev>


ISE provides fairly comprehensive logging, and the results differ slightly depending on whether it is run manually or automatically during a smartfail. Additionally, the ‘isi devices drive list’ CLI command output will display the drive state. For example:


State

Context

SMARTFAIL

During ISE action

REPLACE

After ISE finish

 

 

Note that an ISE failure or error will not block the normal smartfail process.


For a manual ISE run against a specific drive, the results are both displayed on the OneFS CLI console and written to /var/log/messages.


The ISE logfile warning messages include:


Action

Log Entry

Running ISE

“Attempting to erase smartfailed drive in bay N ...”,

“Drive in bay N is securely erased”

(isi_drive_history.log) “is securely erased: bay:N unit:N dev:daN Lnum:N seq:N model:X …”

ISE not supported

“Drive in bay N is not securely erased, because it doesn't support crypto sanitize.”

ISE disabled in drive config

“Smartfailed drive in bay N is not securely erased. instant-secure-erase disabled in drive_d config.”

ISE error

“Drive in bay N is not securely erased, attempt failed.”

“Drive in bay N is not securely erased, can't determine if it supports crypto sanitize.”

(isi_drive_history.log) “failed to be securely erased: bay:N unit:N dev:daN Lnum:N seq:N model:X …”

 

When troubleshooting ISE, a good first move is using the CLI ‘grep’ utility to search for the keyword ‘erase’ in log files.


Symptom

Detail

ISE was successful but took too long to run

  • It depends on drive model, but usually < 1 minute
  • It may block other process from accessing the drive.

ISE reports error

  • Usually it’s due to CAM error(s) sending sanitize commands
  • Looking at console & /var/log/messages & dmesg for errors during ISE activity timeframe
    • Did CAM report error?
    • Did the device driver / expander report error?
    • Did the drive/device drop during sanitize activity?

For the final article in this in-line data reduction series, we’ll turn our attention to deduplication and compression efficiency estimation tools.


Firstly, OneFS includes a dry-run Dedupe Assessment job to help estimate the amount of space savings that will be seen on a dataset. Run against a specific directory or set of directories on a cluster, the dedupe assessment job reports a total potential space savings. The job uses its own separate configuration, does not require a product license, and can be run prior to purchasing F810 hardware to determine whether deduplication is appropriate for a particular data set or environment.

inline-dedupe4_1.png

The dedupe assessment job uses a separate index table from both in-line dedupe and SmartDedupe. For efficiency, the assessment job also samples fewer candidate blocks and does not actually perform deduplication. Using the sampling and consolidation statistics, the job provides a report which estimates the total dedupe space savings in bytes.

inline-dedupe4_2.png

The dedupe assessment job can also be run from the OneFS command line (CLI):


# isi job jobs start DedupeAssessment


Alternatively, in-line deduplication can be enabled in assessment mode:


# isi dedupe inline settings modify --mode assess


Once the job has completed, review the following three metrics on each node:


# sysctl efs.sfm.inline_dedupe.stats.zero_block

# sysctl efs.sfm.inline_dedupe.stats.dedupe_block

# sysctl efs.sfm.inline_dedupe.stats.write_block

 

The formula to calculate the estimated dedupe rate from these statistics is:


dedupe_block / write_block * 100 = dedupe%
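For instance, a back-of-the-envelope calculation of the estimated dedupe percentage on a single node might look like the following (a sketch, assuming a POSIX-style shell plus the standard sysctl and bc utilities):

# DEDUPE=$(sysctl -n efs.sfm.inline_dedupe.stats.dedupe_block)

# WRITE=$(sysctl -n efs.sfm.inline_dedupe.stats.write_block)

# echo "scale=2; $DEDUPE * 100 / $WRITE" | bc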


Note that the dedupe assessment does not differentiate between a fresh run and the case where a previous SmartDedupe job has already performed some sharing on the files in that directory. Isilon recommends running the assessment job only once on a specific directory, since it does not provide incremental differences between instances of the job.


Similarly, the Dell Live Optics Dossier utility can be used to estimate the potential benefits of Isilon’s in-line data compression on a data set. Dossier is available for Windows and has no dependency on an Isilon cluster. This makes it useful for analyzing and estimating efficiency across real data in situ, without the need for copying data onto a cluster. The Dossier tool operates in three phases:


Dossier Phase

Description

Discovery

Users manually browse and select root folders on the local host to analyze.

Collection

Once the paths to folders have been selected, Dossier will begin walking the file system trees for the target folders. This process will likely take up to several hours for large file systems. Walking the filesystem has a similar impact to a malware/anti-virus scan in terms of the CPU, memory, and disk resources that will be utilized during the collection. A series of customizable options allow the user to deselect more invasive operations and govern the CPU and memory resources allocated to the Dossier collector.

Reporting

Users upload the resulting .dossier file to create a PowerPoint report.


To obtain a Live Optics Dossier report, first download, extract and run the Dossier collector. Local and remote UNC paths can be added for scanning. Ensure you are authenticated to the desired UNC path before adding it to Dossier’s ‘custom paths’ configuration. Be aware that the Dossier compression option only processes the first 64KB of each file to determine its compressibility. Additionally, the default configuration samples only 5% of the dataset, but this is configurable with a slider. Increasing this value improves the accuracy of the estimation report, albeit at the expense of extended job execution time.

 

inline-dedupe4_3.png

 

The compressibility scan executes rapidly, with minimal CPU and memory resource consumption. It also provides thread and memory usage controls, progress reporting, and a scheduling option to allow throttling of scanning during heavy usage windows, etc.


When the scan is complete, a ‘*.dossier’ file is generated. This file is then uploaded to the Live Optics website:

 

inline-dedupe4_4.png

 

Once uploaded and processed, a PowerPoint report is generated in real time and delivered via email.

 

inline-dedupe4_5.png

 

Compression reports are easy to comprehend. If multiple SMB shares or paths are scanned, a summary is generated at the beginning of the report, followed by the details of each individually selected path.


Live Optics Dossier can be found at URL:   https://app.liveoptics.com/tools/dossier


Documentation is at:  https://support.liveoptics.com/hc/en-us/articles/229590207-Dossier-User-Guide


When running the Live Optics Dossier tool, please keep the following considerations in mind. Dossier does not use exactly the same algorithm as OneFS hardware in-line compression; it models software compression rather than hardware compression, so actual results will generally be better than the Dossier report suggests.


Note that there will be some data for which Dossier overestimates compression, for example files whose first blocks are significantly more compressible than later blocks. Dossier is intended to be run against SMB shares on any storage array or DAS; it has no NFS export support, and it can take a significant amount of time to run against a large data set. By default, it only samples a portion (the first 64KB) of each file, so results can be inaccurate. Dossier only reports the size of the uncompressed and compressed data; it does not provide performance estimates for different compression algorithms, and it does not attempt to compress files with certain known extensions which are generally incompressible.
