Advanced Troubleshooting of an Isilon Cluster Part 7

NOTE: This topic is part of the Uptime Information Hub.

 

Troubleshooting performance issues (cont'd)

This is the last part of this series on troubleshooting performance issues.

Managing permissions and access

In modern environments, it has become common to have both Windows and Unix/POSIX clients that need to operate on common data in a secure and reliable fashion. To handle permissions and access, OneFS relies on three core functions: authentication, identity management, and authorization (AIMA). Because of the complexity involved, these functions are also fertile ground for issues.

One common source of support cases is access problems that are ultimately diagnosed as configuration issues. For this reason, we cannot stress enough how important it is to read the available documentation, which covers the theory as well as a number of commonly encountered situations with full setup instructions. For more information about AIMA, see the EMC Isilon Multiprotocol Data Access with a Unified Security Model white paper and the Identities, Access Tokens, and the Isilon OneFS User Mapping Service white paper. These documents provide high-level descriptions of how OneFS handles multiprotocol scenarios, details about user mapping, and instructions for configuring protocols to achieve the desired behavior.

Probably the most common class of AIMA support cases is permissions issues. Most often, these issues involve the inability to access a file, directory, or program because of insufficient permissions. Occasionally, they involve inappropriate access to data that should be protected. These two kinds of issues have key differences, but they also overlap to some extent.

Managing hardware events

The most common hardware event you will see in a storage subsystem is a disk failure. OneFS closely monitors the drives and proactively removes from use any drive that shows signs of impending failure. Despite this safety measure, drives sometimes fail without warning. When that happens, the system automatically starts the FlexProtect job, or the FlexProtectLin job if there are SSDs on the cluster, to repair all affected data and bring it back to full protection. Do not remove the drive from the cluster until the repair job has completed successfully and the drive bay is marked as empty.
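
The drive-replacement rule above can be summarized as a small decision sketch. This is a hypothetical Python model for illustration only: the `JobState` values, the `DriveBay` fields, and the `repair_job_for` and `safe_to_remove` functions are assumptions and do not correspond to any OneFS command, API, or log output. The sketch encodes just the two points stated above: which repair job runs, and when a failed drive may be pulled.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Hypothetical job states for illustration; not actual OneFS job engine names.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class DriveBay:
    # Hypothetical model of a single drive bay after a drive failure.
    repair_job_state: JobState
    bay_marked_empty: bool


def repair_job_for(cluster_has_ssds: bool) -> str:
    """Return the repair job the article says OneFS starts automatically."""
    return "FlexProtectLin" if cluster_has_ssds else "FlexProtect"


def safe_to_remove(bay: DriveBay) -> bool:
    """A failed drive may be pulled only when BOTH conditions hold:
    the repair job completed successfully and the bay is reported empty."""
    return bay.repair_job_state is JobState.SUCCEEDED and bay.bay_marked_empty


if __name__ == "__main__":
    print(repair_job_for(cluster_has_ssds=True))               # FlexProtectLin
    print(safe_to_remove(DriveBay(JobState.RUNNING, False)))   # False
    print(safe_to_remove(DriveBay(JobState.SUCCEEDED, True)))  # True
```

Note that a successful job alone is not enough in this model: the bay must also be marked empty, which mirrors the two-part condition in the text.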

 

The NVRAM journal on the Isilon nodes is critical to operation. For this reason, it is backed by multiple batteries, which are tested continuously. If there are any issues with the batteries, we strongly recommend contacting EMC Online Support.

 

Although a node can run on a single power supply, it should do so for only a limited time, because this places undue stress on the remaining supply. Replace the failed supply as soon as possible.


EMC Isilon uses error-correcting code (ECC) memory in the cluster nodes. The system monitors ECC errors and raises alerts about them. Occasional, isolated errors are not a cause for concern. However, if you are unsure, contact EMC Online Support.

 

Conclusion

When troubleshooting an Isilon cluster, remember that some issues, such as Isilon data integrity (IDI) errors, require immediate attention from EMC Online Support. If you are not sure how to resolve an issue, contact EMC Online Support immediately. Your data is our utmost concern, and we will do everything we can to protect it.

 
