45 Replies. Latest reply: Aug 12, 2014 8:27 AM by justin_kidd

Ask the Expert: SMB Protocol on an Isilon Cluster

Stephanie McBride

Welcome to this EMC Support Community Ask the Expert conversation.

This discussion will focus on supporting the SMB Protocol on an Isilon Cluster, including:

 

  • Differences between SMB1 and SMB2
  • What do the various isi auth and isi smb configuration options do
  • What logs and commands are used to diagnose issues
  • General troubleshooting concepts for SMB on an Isilon Cluster

 

Your host:

Peter Abromitis has been in support for over 8 years and specializes in the Windows Protocol area. He is currently the Subject Matter Expert for Windows Protocols within Isilon Support. That role covers troubleshooting problems with SMB1, SMB2, Active Directory, and permissions using standard Isilon tools and packet traces; helping and developing TSEs as they progress through their careers; and driving supportability needs into OneFS to make life easier for both customers and support engineers when dealing with issues on an Isilon Cluster.


Please see the posts below to read the full discussion. For a high-level summary of some of the key topics, Pete has posted this document: https://community.emc.com/docs/DOC-26337

  • 1. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Stephanie McBride

    This Ask the Expert event is now open for comments and questions. We look forward to an interesting and informative discussion!

  • 2. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    To get the ball rolling, have you ever wondered what the most common support case for SMB we get is?

     

    Permissions!

     

    Permission cases make up about 25% of our overall case work.  I have found that we can split permission cases into two types of case:

     

    1.) General Windows Permission

    2.) Multi-Protocol Permission

     

    With general Windows permission cases, you troubleshoot the issue exactly the same way you would troubleshoot a permission issue on a Windows server.  Therefore, it is important to understand how share permissions and file system permissions interact with each other.

     

    For multi-protocol permission issues, it becomes more complex as Isilon OneFS has a very advanced ACL Policy that can be configured.

     

    I will not get into the details of our AIMA (Authentication, Identity Management, Authorization) engine here as that will be covered in a future Ask The Expert event.

     

    I will however provide some general pointers to troubleshooting a permission problem.

     

    The first thing I like to do is connect to Start -> Run ->  \\cluster (do not add a share to the end)

     

    The reason I connect to just the root of the cluster is that it is a good way to test authentication.  If the connection fails, you should stop troubleshooting a permission problem and focus your efforts on authentication.

     

    After you have proven that you can connect to the cluster without any issues, I collect the following data to determine why permission is being denied:

     

    1.) Collect the Unix version of the user's token (this may not return anything if multi-protocol is not in use):

     

        For 6.5

        isi auth mapping token --name=username

        For 7.x

       isi auth mapping token --user=username

     

    2.) Collect the Windows version of the user's token (note the \\ between domain and username)

     

        For 6.5

        isi auth mapping token --name=domain\\username

        For 7.x

        isi auth mapping token --user=domain\\username

     

    3.) Collect the share output

     

        For 6.5

        isi smb permission list --sharename=<problem share>

        For 7.x

        isi smb shares view --share=<problem share>

     

    4.) Collect both ls -led and ls -lend output of the problem file and each directory above it

     

         ls -led /ifs/data/file1.txt

         ls -led /ifs/data

         ls -led /ifs

     

         ls -lend /ifs/data/file1.txt

         ls -lend /ifs/data

         ls -lend /ifs
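    For deep paths, the list of directories to check in step 4 can be generated rather than typed by hand. The following is a minimal sketch, not an Isilon tool: path_chain is a hypothetical helper name, and it assumes the path of interest lives under /ifs.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): print a path and every
# parent directory above it, stopping once /ifs has been printed.
path_chain() {
    p=$1
    while :; do
        printf '%s\n' "$p"
        # Stop at /ifs, or at / or . so that paths outside /ifs
        # and relative paths cannot loop forever.
        if [ "$p" = "/ifs" ] || [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# Prints /ifs/data/file1.txt, /ifs/data and /ifs, one per line.
path_chain /ifs/data/file1.txt

# On a cluster node, each path could then be fed to both ls variants:
# path_chain /ifs/data/file1.txt | while read -r p; do
#     ls -led "$p"; ls -lend "$p"
# done
```

    The ls loop is left commented out because the -e flag is specific to the cluster's ls and will fail elsewhere.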

     

    Once you have collected the data above, the process to resolve the permission problem is as follows:

     

    1.) Note the user's group memberships from steps 1 and 2

     

    2.) Verify the user is listed directly, or is a member of a group that is listed, in the share permissions from step 3

     

    3.) Verify the user is listed directly, or is a member of a group that is listed, in the file system permissions from step 4

     

    To provide an example, let's say that I have a user Pete who is unable to write to a share:

     

    1.) Collect isi auth mapping output

    isi-ess-east-1# isi auth mapping token --name=domain\\pete

    Initial name: pete

    Final Token                                                                                 

    --------------------------------------------------------------------------------------------

    Primary uid: pete (1502)

    Primary user sid: pete (SID:S-1-5-21-321531391-2185564565-1823270536-1014)

    Primary gid: pete (1502)

    Primary group sid: SID:S-1-5-21-321531391-2185564565-1823270536-1000

    On-disk user identity: pete (1502)

    On-disk group identity: pete (1502)

    Additional Identities:

        admin (GID:10)

        unixusers (SID:S-1-5-21-321531391-2185564565-1823270536-1029)

        unixusers (GID:2003)

     

    2.) Collect the share permission output:

    isi-ess-east-1# isi smb permission list --sharename=ITGroup

    SMB Share Permissions:

    Sharename:      ITGroup

            Account                    Acct Type  Perm Type  Permission 

            Everyone                   Builtin    Allow      Read          << Pete is a member of Everyone

            staff                      Group      Allow      Full Control  << Pete is not a member of staff

     

    3.) Collect ls -led and ls -lend output of the paths (I am truncating the output)

    isi-ess-east-1# ls -led /ifs/data/itgroup

    drwxrwxrwx +  2 root  wheel  0 Jul 15 09:33 /ifs/data/itgroup

    OWNER: user:root

    GROUP: group:wheel

    CONTROL:dacl_auto_inherited,sacl_auto_inherited,dacl_protected

    0: user:root allow dir_gen_all

    1: creator_owner allow dir_gen_all,object_inherit,container_inherit,inherit_only

    2: group:Administrators allow dir_gen_all,object_inherit,container_inherit

    3: everyone allow dir_gen_read,dir_gen_write,dir_gen_execute,std_delete,object_inherit,container_inherit << Note this gives the Everyone Group Write Permission

    4: group:Users allow dir_gen_all,object_inherit,container_inherit

     

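    Therefore, the problem in this scenario is at the share level.  Pete is a member of Everyone, which gets only Read at the share, and he is not a member of staff, which gets Full Control.  The share permission caps what the file system permission can grant, so the share-level Read overrides the file system permission of Everyone Read/Write.  Thus, Pete can only read, not write.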
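    The way the two permission layers combine in this example can be sketched in a few lines of shell. This is a simplified illustration only, not how OneFS evaluates access: effective_rights is a hypothetical helper, and the right names (read, write, execute) are stand-ins for the real share and ACL entries.

```shell
#!/bin/sh
# Simplified model: a user ends up with only the rights granted at
# BOTH the share level and the file system level.
# Arguments are space-separated lists of illustrative right names.
effective_rights() {
    share_rights=$1
    fs_rights=$2
    result=""
    for r in $share_rights; do
        # Keep the right only if the file system also grants it.
        case " $fs_rights " in
            *" $r "*) result="${result:+$result }$r" ;;
        esac
    done
    printf '%s\n' "$result"
}

# Pete: Read at the share (via Everyone), read/write/execute on disk.
effective_rights "read" "read write execute"   # prints: read
```

    Under this model, granting Pete write access requires a change at the share (for example, adding him to staff), since the file system already allows it.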

     

    I hope this helps; Happy Permission Troubleshooting!

  • 3. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    soetingr

    Hi,

     

    I am very interested to know when standard Windows tools like MMC, RMTSHARE and SETACL will be supported by Isilon in combination with access zones.

     

    Regards,

     

    Robert

  • 4. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    My understanding is that we are targeting the MMC feature for Waikiki (7.2).

  • 5. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    dynamox

    Peter,

     

    We are on 6.5.5.12 and continue to have intermittent issues with the lsassd process, where users get prompted for credentials even though they are logged in to AD.  The only way to fix it is to run killall -9 lsassd.  We have been told many times "oh, it's fixed in the next version of OneFS"; we upgrade only to be disappointed again because the issue continues.  Will this ever be resolved?  (We are not upgrading to 7.x.)

  • 6. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Phil

    How can I find out the SMB version currently in use on a cluster?

    The common commands 'smbstatus'  or 'smbd -V' or similar are not working at the CLI.

  • 7. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    Hello dynamox,

     

    Great question.  Unfortunately, the answer is: it depends.  I am not familiar with your specific issue, so I am going to speak to what we generally see.  The lsassd service handles authentication requests, and when things are working normally it should be in an Online state.  If there is a problem with domain connectivity, lsassd goes into an Offline state.  While it is Offline, a client may or may not be able to authenticate, depending on the authentication mechanism it uses.  The lsassd service stays Offline for 5 minutes, at which point it performs a new Domain Controller discovery and selects a new DC.  The 5 minutes is tunable:

     

    In 6.5.x

    isi auth config modify --check-online-interval=

     

    In 7.x

    isi auth ads modify --check-online-interval=

     

    How is authentication impacted when lsassd goes Offline?

    -- If a user connects to a cluster and the client chooses to use NTLM for authentication, it will fail because in an Offline state we do not have a connection to a Domain Controller.

    -- If a user connects to a cluster and it uses Kerberos:

        -- If the user connected earlier and we already have the SID from the user token resolved to a username in our SID Cache, it will work.

        -- If the user connects and we do not have the SID in our SID Cache, it will fail as we will be unable to complete a SID2Name lookup to the domain controller.
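    The cases above amount to a small decision table, sketched here in shell. It only restates the rules from this post. auth_outcome and its argument values are hypothetical names, and treating every Online request as succeeding is a simplifying assumption.

```shell
#!/bin/sh
# Hypothetical restatement of the rules above; not an Isilon command.
#   $1 = lsassd state:        online | offline
#   $2 = client mechanism:    ntlm | kerberos
#   $3 = SID already cached:  yes | no
auth_outcome() {
    state=$1
    mech=$2
    sid_cached=$3
    if [ "$state" = "online" ]; then
        # Assumption: with a healthy DC connection, authentication works.
        echo "succeed"
        return
    fi
    case "$mech" in
        ntlm)
            # NTLM needs a live connection to a Domain Controller.
            echo "fail" ;;
        kerberos)
            # Kerberos works only if the SID can be resolved from cache.
            if [ "$sid_cached" = "yes" ]; then
                echo "succeed"
            else
                echo "fail"
            fi ;;
        *)
            echo "unknown" ;;
    esac
}

auth_outcome offline ntlm yes       # prints: fail
auth_outcome offline kerberos yes   # prints: succeed
auth_outcome offline kerberos no    # prints: fail
```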

     

    Are existing user connections impacted when lsassd goes Offline?

    -- No, the existing user connections will continue as normal.  The only time they will experience an issue is if the client does something to trigger a new authentication request.  Even in that scenario, the new authentication request will most likely work, because it is probably using Kerberos and our SID Cache is populated.

     

    Why does lsassd go Offline?

    Our lsassd process goes Offline when it detects problems with connectivity to a domain controller.  The type of failure it detects determines whether lsassd goes Offline or triggers a failover to another DC.  This process is documented in the following KB:

    Active Directory Discovery and Failover for OneFS

     

    Why was my answer "It depends?"

    Lsassd can go Offline because of an external event (a DC reset our TCP Connection) or an internal event (a bug with Lsassd).  If it happens to be an external event, the resolution will need to come from the DC side.  From the sounds of it, since support has declared your issue fixed in a newer release, they are indicating it is a bug so an upgrade would be relevant.  If the problem continues after the upgrade, it may have been an external event all along or it may be a new defect.  Either way, if you are on a fixed version, the best thing to do is contact support and collect the necessary data for root cause.

     

    What data should I collect so support can resolve the issue?

    I am glad you asked.  Here is a step-by-step action plan for collecting the data we need to resolve the issue.

     

    1.) Make the following directory:

    mkdir /ifs/data/Isilon_Support/DomainOfflineIssue

     

    2.) Start the packet traces (you will have to modify this command for the specific interfaces in your cluster, i.e. lagg0 may be em0, and you will also need to put your DC IPs in)

    isi_for_array 'tcpdump -s 0 -i lagg0 -w /ifs/data/Isilon_Support/DomainOfflineIssue/`hostname`.$(date +%m%d%Y_%H%M%S).lagg0.pcap -- host <ip of dc1 in cluster site> or host <ip of dc2 in cluster site> &'

    isi_for_array 'tcpdump -s 0 -i lagg1 -w /ifs/data/Isilon_Support/DomainOfflineIssue/`hostname`.$(date +%m%d%Y_%H%M%S).lagg1.pcap -- host <ip of dc1 in cluster site> or host <ip of dc2 in cluster site> &'

     

    3.) Turn on lsassd debug logging

    isi_for_array -s 'isi auth log-level --set=debug'

     

    4.) Wait for the domain to report offline

     

    5.) After domain offline occurs run the following to stop the traces

    isi_for_array -s 'pkill -9 tcpdump'

     

    6.) Turn off lsassd debug logging

    isi_for_array -s 'isi auth log-level --set=error'

     

    7.) Copy lsassd logs to case directory

    isi_for_array -s 'ls /var/log/lsassd.log | cut -d / -f 4 | while read foo; do bar=$(cp "/var/log/$foo" /ifs/data/Isilon_Support/DomainOfflineIssue/`hostname`.$foo);done'

     

    8.) Upload all the data

    isi_gather_info -n 1 --nologs -s "isi_hw_status -i" -f /ifs/data/Isilon_Support/DomainOfflineIssue

     

    9.) Perform a full log gather

    isi_gather_info

  • 8. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    philippspohr, we have since moved from Samba to a Likewise implementation, which is why those commands no longer work.  We do not have an equivalent to smbd -V.  For smbstatus you can run the following:

     

    For 6.5.x

    isi smb session list

    isi smb file list

     

    For 7.x

    isi smb sessions list

    isi smb openfiles list

  • 9. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    brewcityninja

    We are having the exact same issue on OneFS Version 7.0.1.5 and have had a ticket open with support since April.  Our case has been classified as a bug for about a month, so it's really interesting to see someone else with this issue.

  • 10. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    prashant_shah

    This might be a pretty basic question but I haven't yet found a good explanation for this.  When setting up an SMB share, we are given two choices.  "Apply Windows Default ACLs" and "Do not change existing permissions".  When I attended training, we were advised to do "Do not change existing permissions".  However, when I ran into issues at a client site and called support, I was told to use the other option.  What is the difference between the two and what's the general use case?  Thanks.

  • 11. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Mark

    Why isi_for_array tcpdump vs isi_netlogger?

  • 12. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    Hello prashant_shah,

     

    This option is often misunderstood, so I am glad you asked.

     

    When a cluster is set up, /ifs is configured with the following default permissions:

    ISI7021-1# ls -led /ifs

    drwxrwxrwx    9 root  wheel  158 Jul 17 07:46 /ifs

    OWNER: user:root

    GROUP: group:wheel

    SYNTHETIC ACL

    0: user:root allow dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child

    1: group:wheel allow dir_gen_read,dir_gen_write,dir_gen_execute,delete_child

    2: everyone allow dir_gen_read,dir_gen_write,dir_gen_execute,delete_child

     

    If you create a directory through the WebUI or CLI, the directory will get the following permissions:

    ISI7021-1# ls -led /ifs/tmp

    drwxr-xr-x    2 root  wheel  0 Jul 17 07:46 /ifs/tmp

    OWNER: user:root

    GROUP: group:wheel

    SYNTHETIC ACL

    0: user:root allow dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child

    1: group:wheel allow dir_gen_read,dir_gen_execute

    2: everyone allow dir_gen_read,dir_gen_execute

     

    If you create a new share pointing to the /ifs/tmp directory and select "Do not change existing permissions", it will leave the permissions as:

    ISI7021-1# ls -led /ifs/tmp

    drwxr-xr-x    2 root  wheel  0 Jul 17 07:46 /ifs/tmp

    OWNER: user:root

    GROUP: group:wheel

    SYNTHETIC ACL

    0: user:root allow dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child

    1: group:wheel allow dir_gen_read,dir_gen_execute

    2: everyone allow dir_gen_read,dir_gen_execute

     

    If you create a new share pointing to the /ifs/tmp directory and select "Apply Windows Default ACLs", the equivalent of the following will be run against the directory:

    chmod -D /ifs/tmp

    chmod -c dacl_auto_inherited,dacl_protected /ifs/tmp

    chmod +a# 0 group Administrators allow dir_gen_all,object_inherit,container_inherit /ifs/tmp

    chmod +a# 1 group creator_owner allow dir_gen_all,object_inherit,container_inherit,inherit_only /ifs/tmp

    chmod +a# 2 group everyone allow dir_gen_read,dir_gen_execute /ifs/tmp

    chmod +a# 3 group Users allow dir_gen_read,dir_gen_execute,object_inherit,container_inherit /ifs/tmp

    chmod +a# 4 group Users allow std_synchronize,add_file,add_subdir,container_inherit /ifs/tmp

     

    That ends up converting the ACL to:

    ISI7021-1# ls -led /ifs/tmp

    drwxrwxr-x +  2 root  wheel  0 Jul 17 07:46 /ifs/tmp

    OWNER: user:root

    GROUP: group:wheel

    CONTROL:dacl_auto_inherited,dacl_protected

    0: group:Administrators allow dir_gen_all,object_inherit,container_inherit

    1: creator_owner allow dir_gen_all,object_inherit,container_inherit,inherit_only

    2: everyone allow dir_gen_read,dir_gen_execute

    3: group:Users allow dir_gen_read,dir_gen_execute,object_inherit,container_inherit

    4: group:Users allow std_synchronize,add_file,add_subdir,container_inherit

     

    This may or may not be a good thing for the permissions on your directories.  Let's say that /ifs/tmp was an NFS export and you explicitly wanted those Mode Bit rights set due to Unix client application requirements.  By selecting the "Apply Windows Default ACLs" option, you have now overwritten the original ACL, which may break the application.  Thus, there is risk associated with using "Apply Windows Default ACLs" with a currently existing directory.

     

    On the flip side, let's say that /ifs/tmp was a brand new directory created from the CLI that you want Windows users to be able to create and delete files in.  When creating the share, if you set "Do not change existing permissions" and then had the users attempt to save files there, they would get access denied because "Everyone" only gets Read access.  In fact, even as Administrator, you would not be able to modify the security tab of the directory to add Windows users because the Mode Bits limit access to only root.

     

    In summary, a pretty good rule of thumb is as follows:

    -- If you have an existing directory structure that you want to add a share to, you most likely do not want to change the ACL so you should select the "Do not change existing permissions" option.

    -- If you are creating a new share for a new directory you will likely be changing permissions to the ACL to grant Windows users rights to perform operations.  Thus, you should set the "Apply Windows Default ACLs" option and then once the share is created, go into the Security tab from Windows and assign permissions to users as needed.

  • 13. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    Peter Abromitis

    Hello Mark,

     

    isi_netlogger is a wrapper for tcpdump and is cluster aware.  That is why you do not need to run it with isi_for_array: it has the -c switch to select all nodes.

     

    Tcpdump is native to FreeBSD and is not cluster aware; therefore, you have to use isi_for_array when you want to run it across multiple nodes.

     

    I used to use isi_netlogger quite a bit but have since switched to just using tcpdump.  One of my favorite commands to run depending on the scenario is:

    tcpdump -s 0 -i <interface> -w /ifs/data/Isilon_Support/`hostname`.$(date +%m%d%Y_%H%M%S).<interface>.pcap &


    Or if I need multiple nodes:


    isi_for_array 'tcpdump -s 0 -i <interface> -w /ifs/data/Isilon_Support/`hostname`.$(date +%m%d%Y_%H%M%S).<interface>.pcap' &


    Also, when taking traces for SMB issues, please make sure to use the -s 0 switch.  By default tcpdump will truncate frames to 96 bytes.  People used to set it to around 400 for SMB1 but if you do that for SMB2, you will lose compounded commands so it is best to capture the entire frame.
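    The file naming convention used in the tcpdump commands above (node name, timestamp, interface) can be wrapped in a small helper so every capture is named consistently. This is a minimal sketch: pcap_name is a hypothetical function, and the default directory is simply the one used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: build a capture file name of the form
#   <dir>/<hostname>.<MMDDYYYY_HHMMSS>.<interface>.pcap
#   $1 = interface name (e.g. lagg0)
#   $2 = output directory (optional)
pcap_name() {
    iface=$1
    dir=${2:-/ifs/data/Isilon_Support}
    # Fall back to uname -n where the hostname command is unavailable.
    node=$(hostname 2>/dev/null || uname -n)
    printf '%s/%s.%s.%s.pcap\n' "$dir" "$node" "$(date +%m%d%Y_%H%M%S)" "$iface"
}

# Show the name that would be used for a capture on lagg0.
pcap_name lagg0

# A full-frame capture could then be started with:
# tcpdump -s 0 -i lagg0 -w "$(pcap_name lagg0)" &
```

    The tcpdump line is left commented out because it needs a real interface and root privileges.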

  • 14. Re: Ask the Expert: SMB Protocol on an Isilon Cluster
    prashant_shah

    Great explanation!  Thank you.
