Advanced Troubleshooting of an Isilon Cluster Part 5

NOTE: This topic is part of the Uptime Information Hub.





Troubleshooting performance issues (cont'd)

This is a continuation of the troubleshooting performance issues series.


Troubleshooting using log files

Many of the log files in OneFS are under the /var/log directory. The cluster also has its own event-logging and alerting subsystem (CELOG), which coalesces events and maintains them across the cluster. You can access these events through the command-line interface (CLI), the WebUI, and the Platform API (PAPI). OneFS 7.1 and later also include the EMC Common Event Enabler.
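
As a rough illustration of a first pass over the files under /var/log, the sketch below counts case-insensitive matches of a search pattern in each file directly under a log directory and lists the files with the most matches first. The function name count_log_matches is illustrative and is not an OneFS command; the sketch relies only on the standard grep and sort utilities.

```shell
#!/bin/sh
# count_log_matches PATTERN [DIR]
# Print "<count> <file>" for every regular file directly under DIR
# (default /var/log) that contains PATTERN, highest count first.
# The function name is illustrative; it is not an OneFS command.
count_log_matches() {
    pattern=$1
    dir=${2:-/var/log}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip subdirectories
        # grep -c prints the number of matching lines; -i ignores case
        n=$(grep -c -i -e "$pattern" "$f" 2>/dev/null)
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s %s\n' "$n" "$f"
        fi
    done | sort -rn
}
```

For example, count_log_matches error /var/log lists the files that mention "error" most often. Compressed, rotated logs are not decompressed by this sketch, so treat the counts as a starting point for where to look, not as a complete tally.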


Much of the logging is handled through the traditional Unix syslog(3) interface. To adjust logging levels, change the template/override files that the master control program (MCP) uses to construct the /etc/syslog.conf file.


The Windows protocol-handling daemons have their own commands to change logging levels:

  isi auth log-level --set=debug
  isi smb log-level --set=debug


Elevated logging levels can generate a large volume of log data, which in turn affects performance. Be cautious when changing these levels, and limit the amount of time that the file system runs at elevated logging levels.
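
One way to keep a debug window short is to script it so that the level is always set back afterwards. The sketch below wraps the two log-level commands shown above: it raises a subsystem to debug, waits for a fixed time, and then sets the level that you pass in. The function name debug_window and the ISI variable are illustrative and are not part of OneFS. The restore level is deliberately a required argument, because the normal level for your cluster is not stated here; note the current level before you start.

```shell
#!/bin/sh
# ISI names the CLI binary. It is a variable only so that the sketch can
# be dry-run with a stand-in, for example ISI=echo.
ISI=${ISI:-isi}

# debug_window SUBSYSTEM SECONDS RESTORE_LEVEL
#   SUBSYSTEM      auth or smb (the two commands shown above)
#   SECONDS        how long to stay at the debug level
#   RESTORE_LEVEL  level to return to afterwards (caller-supplied; this
#                  sketch does not assume a default)
debug_window() {
    subsystem=$1
    seconds=$2
    restore=$3
    $ISI "$subsystem" log-level --set=debug || return 1
    # A trapped INT or TERM no longer kills the script, so the restore
    # step below still runs if the wait is interrupted.
    trap ':' INT TERM
    sleep "$seconds"
    trap - INT TERM
    $ISI "$subsystem" log-level --set="$restore"
}
```

Setting ISI=echo and running debug_window smb 1 info prints the two commands that would be issued without touching the cluster, which is a reasonable check before using the sketch on a production system.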


Some services/daemons can have their logging levels toggled between regular and debug by sending them a SIGHUP signal (signal number 1).
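
Sending the signal from a shell uses the standard kill utility. This article does not list which daemons support the toggle, so confirm that for the daemon in question first: a process that does not handle SIGHUP is terminated by it by default. The sketch below takes a PID as input (the function name send_hup is illustrative), sends SIGHUP, and reports whether the process is still running a moment later.

```shell
#!/bin/sh
# send_hup PID
# Send SIGHUP (signal 1) to one process, then check that the process
# still exists. The function name is illustrative.
send_hup() {
    pid=$1
    kill -HUP "$pid" || return 1
    sleep 1                         # give the process a moment to react
    # kill -0 sends no signal; it only tests whether the PID exists
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running"
    else
        echo "process $pid exited after SIGHUP" >&2
        return 2
    fi
}
```

A return status of 2 means that the process went away after the signal, which suggests that it does not treat SIGHUP as a logging toggle.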

Updating cluster software and firmware

To help prevent hardware events, it is critical to keep node and drive firmware up to date. Keep your cluster current by installing the most recent maintenance releases, patches, and firmware so that you have the latest fixes for known issues. The Current Isilon Software Releases document and the Uptime Information Hub landing page list the recommended and most recent releases for OneFS software modules and drive firmware. Check these resources every six months to find out whether there are new releases that you might want to install. You can also subscribe to product updates on EMC Online Support to receive notifications when new releases and documentation are available.


Monitoring cluster alerts

EMC Technical Advisories (ETAs) and EMC Security Advisories (ESAs) alert you to potential hardware or software issues that could cause serious negative impacts to a production environment, such as data loss, data unavailability, loss of system functionality, or anything that could result in a significant safety or security risk. The advisories include specific details about the issue, and instructions to help prevent or alleviate the problem. See the ETAs and ESAs in effect for OneFS.


For more information on setting up alerts, watch the video, How to Set Up an Email Event Notification in OneFS When a Cluster Reaches Capacity.

Performing preventive maintenance

Request an Isilon health check. The Isilon health check evaluates the status of your cluster's hardware, software, firmware, events, and fundamental settings. If you're running OneFS 6.5.5 or later, and you have an active maintenance agreement, you can request a health check by creating a Service Request at: Select Administrative for the Service Request type, and select 5-Scheduled Event for Problem Severity. Type “Isilon Health Check Request” in the Problem Summary box.


If you're responsible for monitoring the cluster's physical environment, review the EMC Isilon Cluster Preventative Maintenance Checklist to determine environmental thresholds and how to check them. For example, you should check the ambient temperature of the data center and the temperature of the nodes every day. Also, you should check the power distribution unit and ventilation paths every week.


It's also important to monitor alerts. Stay on top of the built-in alerts by using the OneFS logging/event system, which can send alert emails, trigger SNMP traps, and so on. You can configure which events trigger email notifications. If you enable SupportIQ or EMC Secure Remote Services Gateway (ESRS) (see instructions in the following section), all of the critical alerts will be sent to EMC Online Support so that we can promptly act on them.


Enabling remote access to EMC Online Support

It is very important to enable SupportIQ or EMC Secure Remote Services Gateway (ESRS). ESRS has additional capabilities and will soon replace SupportIQ. Either service enables the cluster to transmit alerts and configuration files to EMC Online Support, making it possible for EMC Isilon support to provide the most timely support for issues. The information collected by SupportIQ or ESRS helps EMC Online Support proactively respond to issues that would otherwise require you to initiate contact with support and manually gather and upload log files. With either SupportIQ or ESRS enabled, issues can often be resolved before they affect operations. If remote troubleshooting is required, and we receive your permission, EMC can use the ESRS connection to establish a remote support session. Some sites have security policies that do not allow these remote services. However, if you are not in that situation, we strongly recommend enabling one of these services.


There are many shared benefits between SupportIQ and ESRS. For example:

  • Both are included in warranty or maintenance, at no additional charge.
  • Both send identical system-generated alerts to EMC.
  • Both enable logs to be gathered by EMC.
  • Both provide security features, such as encryption, along with two-way connectivity (monitoring and remote support).


Benefits of ESRS

    • ESRS is a consistent EMC product connectivity solution. The ESRS Gateway is a single solution that supports all connectivity-enabled EMC products, driving a consistent conduit for remote support activities and a consistent experience, regardless of the product mix within your EMC environment.
    • ESRS includes industry-leading security features, such as Advanced Encryption Standard (AES) 256-bit encryption and RSA digital certificates. The solution is also Federal Information Processing Standard (FIPS) 140-2 validated. It enables proactive, around-the-clock remote monitoring and repair through a two-way encrypted connection between your Isilon systems and EMC Online Support.
    • ESRS offers the customer an option to allow or deny remote support activity through use of the optional ESRS Policy Manager application and the “ask for approval” setting.
    • Customers have the ability to audit all remote support activity in order to stay in compliance with internal business and industry requirements, by using the ESRS Policy Manager.
    • ESRS is integrated with EMC Online Support. Use the My Products table to view Isilon cluster names, status of connectivity, configuration details, proactive alerts, and more.
    • An internal review has demonstrated many business benefits for ESRS-connected products. Based on the sample data, an ESRS connection enables 5 times faster problem resolution and 15 percent higher levels of availability, and makes EMC 3 times as likely to resolve support issues at initial contact.


Let's take a look at the comparative features of SupportIQ and ESRS.


[Figure: SupportIQ and ESRS feature comparison]


Note: You must be running OneFS 7.1 or later to use ESRS, and the ESRS Gateway 2.24 or later must be installed in your environment. ESRS is included in all standard warranty and maintenance agreements, at no additional charge. To upgrade the ESRS gateway to the latest version, go to

For more information on how to enable SupportIQ, see article 16537 on the EMC Online Support site.

