Advanced Troubleshooting of an Isilon Cluster Part 6

NOTE: This topic is part of the Uptime Information Hub.

 

Troubleshooting performance issues (cont'd)

This is a continuation of the troubleshooting performance issues series.

 

Managing protocol issues

If you have issues with a particular protocol, there are several things to consider. For SMB, if you see a client/server interaction where Isilon is the server and you think that the behavior is incorrect, try reproducing the same interaction against a Windows server. Taking packet captures of each interaction and comparing them can be very informative.

 

Some of the protocols (SMB, in particular) have a long and varied history. The EMC SMB service does not support all pre-SMB1 (LM0.12) commands, and it does not support clients that cannot or will not use Unicode instead of ASCII. This includes some photocopiers that implement a very old SMB stack. EMC supports NFS versions 2 and 3 and, if you enable it, version 4. Version 1 is not supported. NFS over UDP is supported, but we strongly recommend against it because of issues inherent to UDP as a transport. For HDFS, we maintain a list of tested and supported Hadoop distributions.

 

Another issue that affects SMB is the use of protocol accelerators over a WAN. These devices are very useful, but because they necessarily change the protocol that they are accelerating, they can also cause significant compatibility issues. If you suspect such a device, we recommend performance testing with the device disabled or out of the loop.

 

Finally, firewalls can break protocols in subtle and unusual ways. In particular, NFS file locking in versions prior to version 4 requires multiple open ports in both directions for the protocol to operate correctly.

 

NFS-specific issues

The NFS file protocol has a fairly long history. The jump to version 4 involves an enormous change in implementation. Unlike version 3 and earlier, NFS version 4 is "stateful," meaning that the server keeps track of client state, such as open files and locks. OneFS supports versions 2 and 3 and, optionally, version 4 (disabled by default).

 

Some common issues that you may encounter with NFS:

 

NFS fields

NFS fields are part of the NFS specification. They are returned by the GETATTR and READDIRPLUS calls of the NFS protocol. This means, for example, that listing files by using the ls command on the client will trigger these calls. The fileid (not to be confused with the similarly named filehandle) represents the inode number of the file on the server. It is specified as a 64-bit value in the NFSv3 specification. On an Isilon cluster, for every file in the /ifs directory, except for /ifs itself, the inode number is guaranteed to be too large to fit in 32 bits. The issue is that certain older clients cannot translate this larger value into the inode number that they report locally, for example through the stat() system call. Therefore, older clients (especially older releases of AIX, and even newer client operating system releases running older client code) can fail when these large values are returned. To work around this, there is a per-export option that forces OneFS to truncate the returned fileid to just the lower 32 bits.
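
To make "the lower 32 bits" concrete, here is a minimal sketch of the arithmetic involved. It is not OneFS code, and the fileid value and function names are made up for illustration.

```python
# Illustrative arithmetic only; this is not how OneFS is implemented.
MASK_32 = 0xFFFFFFFF  # a value with only the lower 32 bits set


def fits_in_32_bits(fileid: int) -> bool:
    """Return True if the fileid can be stored in an unsigned 32-bit field."""
    return 0 <= fileid <= MASK_32


def truncate_fileid(fileid: int) -> int:
    """Keep only the lower 32 bits of a 64-bit fileid."""
    return fileid & MASK_32


if __name__ == "__main__":
    example_fileid = 0x1_0000_2A5C  # hypothetical 64-bit fileid
    print(fits_in_32_bits(example_fileid))       # False
    print(hex(truncate_fileid(example_fileid)))  # 0x2a5c
```

Because the upper bits are discarded, two fileids that differ only in those bits would, in principle, map to the same truncated value.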

 

If you are seeing unexpected access to exports, particularly client access that you think should be denied, check for overlapping exports. If you create a restricted export of /ifs/data, but /ifs is exported to all users and submounts are allowed, all users will be able to access /ifs/data by means of the /ifs export.
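
As a rough aid for spotting overlapping exports, the following sketch checks a list of export paths for cases where one path contains another. The path list is hypothetical, and you would need to gather the real export paths from your own cluster configuration. The sketch looks only at paths, not at client restrictions or submount settings.

```python
from pathlib import PurePosixPath


def is_under(child: str, parent: str) -> bool:
    """Return True if child is a strict subdirectory of parent."""
    return PurePosixPath(parent) in PurePosixPath(child).parents


def find_overlaps(export_paths):
    """Return (broader, narrower) pairs where one export path contains another."""
    overlaps = []
    for broader in export_paths:
        for narrower in export_paths:
            if is_under(narrower, broader):
                overlaps.append((broader, narrower))
    return overlaps


if __name__ == "__main__":
    # Hypothetical export paths, for illustration only.
    exports = ["/ifs", "/ifs/data", "/ifs/home"]
    for broader, narrower in find_overlaps(exports):
        print(f"{narrower} is also reachable through the export of {broader}")
```

Each reported pair is a candidate for review: if the broader export is less restricted than the narrower one, clients may be able to bypass the restriction.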

 

NFSv3 is not "firewall friendly." Because the protocol is stateless and relies on add-on services that use additional ports, such as the lockd locking daemon, you must take extreme care with firewall rules.
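
When you suspect a firewall, a quick first check is whether the client can open TCP connections to the relevant ports. The sketch below assumes the conventional ports 111 (portmapper) and 2049 (NFS). The host name is a placeholder, and the ports used by lockd and related services vary, so treat the port list as an assumption to adjust for your environment. It tests only the client-to-server direction over TCP.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = "cluster.example.com"  # placeholder host name
    ports = {111: "portmapper", 2049: "nfs"}  # assumed conventional ports
    for port, name in ports.items():
        state = "open" if tcp_port_open(host, port) else "blocked or closed"
        print(f"{name} ({port}/tcp): {state}")
```

A successful connection here does not prove that locking works, because traffic in the reverse direction must also be allowed.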

 

SMB-specific issues

The SMB protocol also has a very long history. OneFS supports many modern clients, but older clients attempting to use pre-SMB1 protocol operations will not function properly.

 

Share ACLs

A common source of confusion about file access over SMB is that there are two sets of permissions to deal with: the share ACL and the file system permissions. Each SMB share has a share ACL, which is the first hurdle a client must clear to gain file access. If the share ACL does not grant access, the file system permissions are not even checked. In this sense, the share ACL is a subtractive system. In other words, with the exception of the "run as root" flag, the share ACL never gives extra permissions; it only takes them away. For example, a share ACL entry that gives a group read access means the group can only read files, even if the file system permissions would have allowed write access. After the share ACL check has passed, the on-disk permissions are compared to the access token for the client. If the "run as root" flag is set in the share ACL, access is performed as root, and permissions are bypassed. This can be useful for administrative purposes, but the primary use case should be data migration.
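
The two-stage check described above can be modeled as a set intersection. This is a simplified sketch with made-up names and only two rights; real share ACLs and file system permissions are far richer, and this is not how OneFS evaluates access internally.

```python
ALL_RIGHTS = frozenset({"read", "write"})  # simplified set of rights for the model


def effective_rights(share_rights, fs_rights, run_as_root=False):
    """Model the rights a client ends up with on a file reached through a share.

    share_rights: rights the share ACL grants the client (empty means no access)
    fs_rights:    rights the file system permissions grant the client's token
    run_as_root:  True if the share ACL marks the client as "run as root"
    """
    if run_as_root:
        # Access is performed as root, so permissions are bypassed.
        return set(ALL_RIGHTS)
    if not share_rights:
        # The share ACL denies access; file system permissions are never checked.
        return set()
    # The share ACL can only take rights away, never add them.
    return set(share_rights) & set(fs_rights)


if __name__ == "__main__":
    # Share grants read only, file system would allow read and write.
    print(effective_rights({"read"}, {"read", "write"}))  # {'read'}
```

The intersection captures the subtractive behavior: a right is effective only if both the share ACL and the file system permissions allow it.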
