
In this blog post I want to share some information and experience regarding Isilon and SAS Grid Computing.

 

What is SAS Grid Computing [6]?

 

“SAS Grid Computing is an efficient way to analyze enormous amounts of data. SAS Grid Computing enables applications to deliver value in a highly effective manner for SAS analytics, data integration, data mining, and  business intelligence, while enabling fine-tuning of the SAS grid environment to allow multiple applications to efficiently and dynamically use a virtual IT infrastructure.” [1]


Shared filesystems survey:

 

From an infrastructure point of view, it is important to know that a SAS Grid application (running on multiple nodes in the grid) requires access to the same data on every node; therefore, a shared filesystem is required.

 

SAS surveyed shared filesystems in [3], covering several representatives: IBM GPFS, Quantum StorNext, and GFS2. They also evaluated network-attached storage systems with NFS and CIFS.

 

In the case of NFS, SAS tested Isilon and made the following statement:

 

“The benchmarks were run on a variety of devices including EMC® Isilon® […] NFS benefits from isolating file system metadata from file system data. Some storage devices like Isilon allow this isolation. Excellent results were obtained from devices [like Isilon] that both isolate file system metadata and utilize SSD's for file system metadata.” [3]

 

SAS workload profile:

 

Another important attribute of SAS Grid Computing is that its workload tends to be very sequential and requires a very large amount of bandwidth. A sizing guideline from SAS is to provide more than 75 MB/s of storage bandwidth for each compute core. As you can see, these environments easily require multiple GB/s of bandwidth: for example, a grid of four nodes with 16 cores each would already call for 64 × 75 MB/s ≈ 4.8 GB/s.

 

“Generally SAS I/O workloads are sequential reads and writes, and fall into the percentage range of 50/50 to 60/40 reads versus writes. For the purposes of this presentation, we will use the 50/50 read/write split. SAS will automatically adjust the I/O size based on the data set sizes. For data sets larger than a few MB, SAS will use 128KB chunks.” [2]


Simulating a SAS workload:

 

If you want to simulate a general SAS workload with a 50/50 sequential mix, FIO [5] is the tool you should use. FIO is capable of performing alternating sequential read and write operations on a single file. In our tests we observed that this is very different from other benchmark tools such as iozone, which can only start one thread doing sequential reads and another doing sequential writes.

There is a major difference on the client side that results from memory handling: without tuning the Linux kernel, you get much lower bandwidth results with FIO than with iozone. We also tested with a real SAS application, and its performance and behavior were very close to the FIO test. If you want to read how to set up a FIO test, please have a look at [2].
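To illustrate, here is a minimal fio job file along these lines; it is a sketch only, and the mount point, file size, and job count are placeholders you would adjust for your environment:

[global]
# plain read()/write() calls through the page cache, similar to SAS I/O
ioengine=psync
# alternating sequential reads and writes against the same file
rw=rw
# 50/50 read/write split
rwmixread=50
# SAS uses 128 KB I/O for data sets larger than a few MB
bs=128k
# per-job file size; should exceed client RAM to defeat caching
size=32g
# placeholder: NFS mount point of the Isilon cluster
directory=/mnt/isilon

[sas-sim]
# one job per simulated SAS process
numjobs=8

Save this as, say, sasgrid.fio and run it with "fio sasgrid.fio".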


Client tuning:

 

The kernel tuning required to achieve this bandwidth is not unusual and is well known from tuning Oracle databases running on NFS. [4]

 

For our testing we used the following kernel memory parameters:

 

vm.swappiness = 0
vm.dirty_expire_centisecs = 10
vm.dirty_writeback_centisecs = 10
vm.dirty_ratio = 10

In essence, these parameters tell the kernel to flush dirty pages from memory much more frequently.
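If it helps, these values can be applied at runtime with sysctl as shown below (add them to /etc/sysctl.conf to make them persistent across reboots):

# apply the tuning at runtime (run as root)
sysctl -w vm.swappiness=0
sysctl -w vm.dirty_expire_centisecs=10
sysctl -w vm.dirty_writeback_centisecs=10
sysctl -w vm.dirty_ratio=10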

 

Isilon/EMC whitepaper:

 

Besides these personal experiences with Isilon and SAS, we (EMC) ran a benchmark with SAS a while ago. [1] The resulting white paper is worth reading to better understand a SAS Grid Computing infrastructure on Isilon. Please note that this paper is not up to date: it was done with the previous OneFS major release (6.5). There will be an update based on a newer OneFS 7.x release.

 

 

 

In conclusion, Isilon not only achieves excellent performance but is also much less complex than other solutions. Furthermore, it can scale with growing SAS Grid Computing environments.

 

 

[1] http://www.emc.com/collateral/emc-perspective/h10515-ep-isilon-for-sas-grid.pdf

[2] http://support.sas.com/resources/papers/proceedings13/479-2013.pdf

[3] http://support.sas.com/rnd/scalability/papers/SurveyofSharedFilepaper_20130315final.pdf

[4] http://www.redhat.com/promo/summit/2010/presentations/summit/decoding-the-code/fri/scott-945-tuning/summit_jbw_2010_presentation.pdf (page 13)

[5] http://freecode.com/projects/fio

[6] http://www.sas.com/

Introduction to JSON

I mentioned in the last post that the PAPI uses JSON as the data-interchange format, so it's critical that you understand how to leverage it if you want to use the PAPI to create, modify, or delete resources. You can learn more about JSON at www.json.org, but the key principle is that it is completely programming-language independent and is built on two structures:

  • A collection of name/value pairs
  • An ordered list of values
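For example, the following made-up snippet combines both structures: the outer braces hold name/value pairs, and the square brackets hold an ordered list of values (the names and values are purely illustrative):

{
    "path": "/ifs/home/user1",
    "thresholds": { "hard": 10000000 },
    "users": ["user1", "user2", "user3"]
}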

Example output from getting a system object would look like this:

{
    "<object>": {
        "<property>": <value>,
        ...
    }
}

 

So, how do we know what JSON text we need to POST to create an object? It's simple: we can just take a look at the PAPI self-documentation by sending the following request: "GET /platform/1/quota/quotas?describe". Here is the relevant PowerShell code:

# Get PAPI self-documentation for quotas
# ($baseurl and $headers were built as shown in the previous post)
$resource = "/platform/1/quota/quotas?describe"
$uri = $baseurl + $resource
$ISIObject = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get
$ISIObject

You can see in the output below that the self-documentation will tell you exactly what you need to POST to create a new quota. Pay close attention to the required properties, since they may not be the same properties required for the corresponding "isi" command.

[Screenshot: PAPI self-documentation output for the quotas resource]

Now that we know what's required for the POST, the following is an example JSON string we can use to create a directory hard quota:

$QuotaObject = @"

{"type":"directory","include_snapshots": false,"container": true, "path": /ifs/home/user1", "enforced": true, "thresholds": {"hard":10000000},"thresholds_include_overhead": false}

"@ 

 

With the JSON string completed, all that's left is to build the Invoke-RestMethod parameters and submit.  Notice in the code below that we specify the JSON string as the body of the POST and that the content type is "application/json":

$headers = @{"Authorization" = "Basic $($EncodedPassword)"}
# POST target is the quotas collection (not the ?describe URL used above)
$resource = "/platform/1/quota/quotas"
$uri = $baseurl + $resource
$ISIObject = Invoke-RestMethod -Uri $uri -Headers $headers -Body $QuotaObject -ContentType "application/json; charset=utf-8" -Method POST
Write-Host "   Resulting Quota ID: " $ISIObject.id

 

Putting It All Together

So let's use everything we've learned so far to script what would normally be a tedious, manual process. Let's assume you have many home directories for your users under a single parent directory (ex. "/ifs/home") and you want to set a directory quota on each of these directories. We already know how to set the quota on each individual directory based on the information above, but how do we get the path to each user home directory? The answer is that we can leverage the Isilon RESTful Access to the Namespace (RAN) API to access the namespace just like we did for other resources. The following code will get the subdirectories of a specified path and then set a directory quota on each subdirectory:

# Get subdirectories of the specified path
# ($path is a script parameter, e.g. "ifs/home", without a leading slash,
# since it is appended directly to '/namespace/')
$resource = '/namespace/' + $path
$uri = $baseurl + $resource
$ISIObject = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get

# Loop through each subdirectory and set the quota
ForEach ($folder in $ISIObject.children) {

    # Build the full quota path from the parent path and the child's name
    $quotapath = '/' + $path + '/' + $folder.name

    # Create the quota ($quotasize, the hard threshold in bytes, is a script parameter)
    $resource = "/platform/1/quota/quotas"
    Write-Host "Setting a $quotasize byte quota on $quotapath"

    $QuotaObject = @"
{"type": "directory", "include_snapshots": false, "container": true, "path": "$quotapath", "enforced": true, "thresholds": {"hard": $quotasize}, "thresholds_include_overhead": false}
"@

    $headers = @{"Authorization" = "Basic $($EncodedPassword)"}
    $uri = $baseurl + $resource
    $ISIObject2 = Invoke-RestMethod -Uri $uri -Headers $headers -Body $QuotaObject -ContentType "application/json; charset=utf-8" -Method POST
    Write-Host "   Resulting Quota ID: " $ISIObject2.id
}

 

Here is the output from running the script attached to this post:

[Screenshot: running the quota script from the command line]

[Screenshot: quota-creation output]

Hopefully, between these two blog posts, you now have all of the information you need to create your own automation scripts using PowerShell, the Platform API, and the RAN API.

Introduction

As a NAS Specialist at EMC, I frequently get customer requests for scripts to automate tasks on Isilon. To that end, over the past year or so, I've developed several PowerShell scripts to automate tasks on Isilon clusters using SSH and "isi" CLI commands. With the introduction of the new RESTful Platform API (PAPI), there is now a much easier and more elegant way to automate and manage Isilon clusters using PowerShell. This first blog post on the topic will show you the basics of how to connect to a cluster using PowerShell and the PAPI, and I'll create additional posts to demonstrate more complex examples (ex. creating quotas).

Before we dig into Powershell specifics, I highly recommend you download and read the latest PAPI reference guide on support.emc.com (7.0.2 can be found here).

Because the PAPI provides access to the cluster via REST, we can manipulate resources using HTTP methods like GET, POST, PUT, and DELETE. The representations of objects and collections are exchanged as JSON-formatted documents (more on that later).

To use the PAPI you will, of course, need to enable HTTP on the cluster (Protocols --> HTTP Settings --> Enable HTTP).

[Screenshot: enabling HTTP in the OneFS web administration interface]

PowerShell v3

Because we will be working with REST and JSON, it's best if you use PowerShell v3, because it has built-in cmdlets (ex. "Invoke-RestMethod") that simplify RESTful access. You can check your current PowerShell version by looking at the $PSVersionTable value:

[Screenshot: $PSVersionTable output]
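For instance, this one-liner prints just the version number:

# Check the current PowerShell version
$PSVersionTable.PSVersion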

If you don't have version 3, you will first need to make sure you have .NET Framework version 4, found here.

You can then download and install the appropriate v3 management framework here.


Connecting to the cluster

Take a look at the attached script and you'll see that it accepts parameters for an Isilon IP address or hostname, a username, and a password:

# Accept input parameters

Param([String]$isilonip,[String]$username,[String]$password)
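For example, a hypothetical invocation might look like this (the script name, IP address, and credentials are placeholders):

# Run the script against a cluster (all values are placeholders)
.\isilon-papi-demo.ps1 -isilonip 192.168.1.100 -username root -password "MyPassword"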

 

To access the resources on the cluster, we will use "Invoke-RestMethod" with the appropriate parameters. This cmdlet takes several parameters such as a URI string, a body (used for a POST), headers, etc. The following code will build the header information we need and the base URL for access.

# Encode basic authorization header and create baseurl
$EncodedAuthorization = [System.Text.Encoding]::UTF8.GetBytes($username + ':' + $password)
$EncodedPassword = [System.Convert]::ToBase64String($EncodedAuthorization)
$headers = @{"Authorization" = "Basic $($EncodedPassword)"}
$baseurl = 'https://' + $isilonip + ":8080"


Now we need to decide which resource we want to access (quotas in this example) and add it to the base URL to create the final URI that we'll pass to "Invoke-RestMethod". If you take a look at the PAPI documentation, you'll see that a collection of resources is accessed in the following format: "/platform/1/<namespace>/<collection-name>". So if, for example, we want to get the collection of objects representing all of the quotas defined on the cluster, we need to GET "/platform/1/quota/quotas".

$resourceurl = "/platform/1/quota/quotas"

$uri = $baseurl + $resourceurl

 

All that's left now is to call Invoke-RestMethod and then access the returned object that contains the quota collection:

$ISIObject = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get
$ISIObject.quotas

 

So, let's run the script to see what we get (the output of course will be much more interesting if you have at least one quota defined):

[Screenshot: script input]

[Screenshot: quota output]

Voila, we see a list of all of the quotas (and their properties) defined on the cluster!

Notice that I didn't do anything to format the output I got back from Invoke-RestMethod? That's the beauty of using this cmdlet in PSv3: it automatically converts the JSON output into objects.
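As a quick illustration, once the JSON has been converted you can work with the quotas like any other PowerShell objects (the property names used here, such as type, path, and enforced, come from the quota JSON structure and are illustrative):

# Filter and project the converted quota objects like any native objects
$ISIObject.quotas | Where-Object { $_.type -eq "directory" } | Select-Object path, enforced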

 

That's all for this post. In the next post, I'll cover the slightly more complicated process of creating a resource object (I'll create a quota as an example).
