Dell EMC Unity: CAVA server went offline and recovered intermittently

           

   Article Number: 531210    Article Version: 3    Article Type: Break Fix
   

 


Product:

 

Dell EMC UnityVSA, Dell EMC Unity Hybrid, Dell EMC Unity Family, Dell EMC Unity All Flash

 

Issue:

 

 

Issue Description:   
    1. The customer configured CAVA and intermittently received the error below: the CAVA server went offline and came back online within a few seconds.
    The following error logs appear in EMCSystemLogFile.log:
    "2018-12-28T02:18:22.331Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "ERROR" "13:10510021" :: "The virus checker server xx.xx.xx.xx has encountered an error and is no longer operational.(Error: ERROR_AUTH 64)" :: Category=User Component=DART_VC   
    "2018-12-28T02:18:41.486Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "NOTICE" "13:1051001d" :: "Virus checker server xx.xx.xx.xx is online." :: Category=User Component=DART_VC   
    "2018-12-28T02:35:42.361Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "ERROR" "13:10510021" :: "The virus checker server xx.xx.xx.xx has encountered an error and is no longer operational.(Error: ERROR_AUTH 64)" :: Category=User Component=DART_VC   
    "2018-12-28T02:35:51.678Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "NOTICE" "13:1051001d" :: "Virus checker server xx.xx.xx.xx is online." :: Category=User    
   
    The same error messages were also reported in c4_safe_ktrace.log, for example:
    2018/12/28-21:32:25.865594 115K     7F86FEB3E709     sade:VC: 3:33:[Rick]  The virus checker server 160.46.85.196 has encountered an error and is no longer operational.(Error: ERROR_AUT   
    2018/12/28-21:32:25.865596    0     7F86FEB3E709     sade:VC: 3:33:[Rick]  H 64)   
    2018/12/28-21:32:25.865617   19     7F8790D83702     sade:SOCK_STREAM: 3:[core]  T_DISCON_REQ: (fd=0x1a0, NS=0) ShutdownStream() request UNSUPPORTED!   
    2018/12/28-21:32:25.865622    4     7F8790D83702     sade:SOCK_STREAM: 3:[core]  T_DISCON_REQ: (fd=0x1a0, NS=0)  Handling as DisconnectCloseStream()!   
    --   
    2018/12/28-21:32:53.272520    0     7FC1D8ADC70A      std:PSMSYS:PSMSYS:psmDataAreaClose(1:13): PENDED...   
    2018/12/28-21:32:55.127022 1.8M     7F86FEB3E702     sade:VC: 5:29:[Rick]  Virus checker server 160.46.85.196 is online.   
    2018/12/28-21:32:55.128201 1177     7F8790DC7704     sade:SOCK_STREAM: 3:[core]  T_DISCON_REQ: (fd=0x1cb, NS=0) ShutdownStream() request UNSUPPORTED!   
    2018/12/28-21:32:55.128206    3     7F8790DC7704     sade:SOCK_STREAM: 3:[core]  T_DISCON_REQ: (fd=0x1cb, NS=0)  Handling as DisconnectCloseStream()!    
   
    2. The error occurred randomly, with no discernible pattern. The CAVA service itself did not appear to be impacted.
   
    3. KB#462457 was followed, but no time skew was found between the CAVA server, Unity, and the Windows DC server.
   
    4. Network traces show that when the issue occurred, the CAVA server sent an FSCTL_VALIDATE_NEGOTIATE_INFO Ioctl request, but Unity did not respond and instead closed the TCP connection with [FIN, ACK].
    # Good case example   
    5817    0.433916    CAVA_IP    CIFS_IP    SMB2    156    Tree Connect Request Tree: \\rick\CHECK$   
    5820    0.433983    CIFS_IP     CAVA_IP    SMB2    138    Tree Connect Response   
    5823    0.434164    CAVA_IP    CIFS_IP    SMB2    212    Ioctl Request FSCTL_VALIDATE_NEGOTIATE_INFO   
    5826    0.434217    CIFS_IP     CAVA_IP    SMB2    194    Ioctl Response FSCTL_VALIDATE_NEGOTIATE_INFO  <=========== good case   
   
    # Bad case example   
    1806    0.096779    CAVA_IP    CIFS_IP    SMB2    156    Tree Connect Request Tree: \\rick\CHECK$   
    1807    0.096854    CIFS_IP     CAVA_IP    SMB2    138    Tree Connect Response   
    1818    0.097124    CAVA_IP    CIFS_IP    SMB2    212    Ioctl Request FSCTL_VALIDATE_NEGOTIATE_INFO   
    1820    0.097200    CIFS_IP     CAVA_IP    TCP    54    445 → 52006 [FIN, ACK] Seq=85 Ack=261 Win=304 Len=0   <=========== bad case   
   
    5. This issue can also impact CIFS data copies, for example when using emcopy to migrate data from VNX to Unity.
    When multiple copies run simultaneously, different ones fail intermittently each time with the error below.
    Client OS : Microsoft (build 9200)   
    TH000 : 02:28:08 : ERROR (53) : unable get server info from \\cifsserver.exmaple.net
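
To gauge how often and for how long the CAVA server is reported offline, the offline and online events in EMCSystemLogFile.log can be paired up. The following is a minimal Python sketch, not a Dell EMC tool. It assumes the quoted-field layout shown in the excerpts in item 1, where the first quoted field is the timestamp and the seventh is the message ID. The names `outage_windows` and `SAMPLE` are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Message IDs taken from the log excerpts in this article.
OFFLINE_ID = "13:10510021"   # "... is no longer operational"
ONLINE_ID = "13:1051001d"    # "Virus checker server ... is online."

# Every record begins with a series of double-quoted fields.
FIELD = re.compile(r'"([^"]*)"')


def parse_time(stamp):
    # Timestamps look like 2018-12-28T02:18:22.331Z
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def outage_windows(lines):
    """Return (offline_time, online_time, seconds) for each offline/online pair."""
    windows = []
    offline_at = None
    for line in lines:
        fields = FIELD.findall(line)
        if len(fields) < 7:
            continue  # not a record in the assumed layout
        stamp, msg_id = fields[0], fields[6]
        if msg_id == OFFLINE_ID and offline_at is None:
            offline_at = parse_time(stamp)
        elif msg_id == ONLINE_ID and offline_at is not None:
            online_at = parse_time(stamp)
            seconds = (online_at - offline_at).total_seconds()
            windows.append((offline_at, online_at, seconds))
            offline_at = None
    return windows


# Two offline/online pairs copied from the excerpt in item 1.
SAMPLE = r'''
"2018-12-28T02:18:22.331Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "ERROR" "13:10510021" :: "The virus checker server xx.xx.xx.xx has encountered an error and is no longer operational.(Error: ERROR_AUTH 64)" :: Category=User Component=DART_VC
"2018-12-28T02:18:41.486Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "NOTICE" "13:1051001d" :: "Virus checker server xx.xx.xx.xx is online." :: Category=User Component=DART_VC
"2018-12-28T02:35:42.361Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "ERROR" "13:10510021" :: "The virus checker server xx.xx.xx.xx has encountered an error and is no longer operational.(Error: ERROR_AUTH 64)" :: Category=User Component=DART_VC
"2018-12-28T02:35:51.678Z" "n1988006_spa" "Kittyhawk_safe" "26056" "unix/spa/root" "NOTICE" "13:1051001d" :: "Virus checker server xx.xx.xx.xx is online." :: Category=User
'''

if __name__ == "__main__":
    for start, end, secs in outage_windows(SAMPLE.splitlines()):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: offline {secs:.1f}s")
```

Short outage windows of a few seconds, repeating without a pattern, match the symptom described in this article. Longer or permanent outages likely point to a different cause.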
                                                           

 

 

Cause:

 

 

This is a Unity software bug: Unity needs to validate the negotiate information against the initial connection, not against the bound connection. The defect does cause customer-visible issues.
     
                                                           

 

 

Change:

 

 

The issue occurred on Unity code 4.4.0.1536311042 with Windows Server 2016.

 

 

Resolution:

 

 

This problem occurs only with SMB version 3.0.2; if the client switches to 3.1 or 2.1, it works. With 3.0.2, a workaround is available.
    Workaround: Disable Multichannel on the Windows server or CAVA server.
        
    Run PowerShell as administrator, then:
    PS C:\windows\system32> Get-SmbClientConfiguration   
        
        
    ConnectionCountPerRssNetworkInterface : 4   
    DirectoryCacheEntriesMax              : 16   
    DirectoryCacheEntrySizeMax            : 65536   
    DirectoryCacheLifetime                : 10   
    DormantFileLimit                      : 1023   
    EnableBandwidthThrottling             : True   
    EnableByteRangeLockingOnReadOnlyFiles : True   
    EnableInsecureGuestLogons             : False   
    EnableLargeMtu                        : True   
    EnableLoadBalanceScaleOut             : True   
    EnableMultiChannel                    : True   
    EnableSecuritySignature               : True   
    ExtendedSessionTimeout                : 1000   
    FileInfoCacheEntriesMax               : 64   
    FileInfoCacheLifetime                 : 10   
    FileNotFoundCacheEntriesMax           : 128   
    FileNotFoundCacheLifetime             : 5   
    KeepConn                              : 600   
    MaxCmds                               : 50   
    MaximumConnectionCountPerServer       : 32   
    OplocksDisabled                       : False   
    RequireSecuritySignature              : False   
    SessionTimeout                        : 60   
    UseOpportunisticLocking               : True   
    WindowSizeThreshold                   : 8   
        
    PS C:\windows\system32> Set-SmbClientConfiguration -EnableMultiChannel $false   
        
    Confirm   
    Are you sure you want to perform this action?   
    Performing operation 'Modify' on Target 'SMB Client Configuration'.   
    [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): yes   
    PS C:\windows\system32> Get-SmbClientConfiguration   
        
        
    ConnectionCountPerRssNetworkInterface : 4   
    DirectoryCacheEntriesMax              : 16   
    DirectoryCacheEntrySizeMax            : 65536   
    DirectoryCacheLifetime                : 10   
    DormantFileLimit                      : 1023   
    EnableBandwidthThrottling             : True   
    EnableByteRangeLockingOnReadOnlyFiles : True   
    EnableInsecureGuestLogons             : False   
    EnableLargeMtu                        : True   
    EnableLoadBalanceScaleOut             : True   
    EnableMultiChannel                    : False   
    EnableSecuritySignature               : True   
    ExtendedSessionTimeout                : 1000   
    FileInfoCacheEntriesMax               : 64   
    FileInfoCacheLifetime                 : 10   
    FileNotFoundCacheEntriesMax           : 128   
    FileNotFoundCacheLifetime             : 5   
    KeepConn                              : 600   
    MaxCmds                               : 50   
    MaximumConnectionCountPerServer       : 32   
    OplocksDisabled                       : False   
    RequireSecuritySignature              : False   
    SessionTimeout                        : 60   
    UseOpportunisticLocking               : True   
    WindowSizeThreshold                   : 8
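
When the workaround has to be confirmed on several hosts, the captured Get-SmbClientConfiguration output can be checked with a script rather than read by eye. The following is a minimal Python sketch, not part of the official procedure. It assumes the output was saved as plain text in the "Name : Value" list layout shown above. The function names `parse_smb_client_config` and `multichannel_disabled` are hypothetical.

```python
import re

# Matches list-style lines such as "EnableMultiChannel    : False".
# Prompt lines ("PS C:\...>") and confirmation text do not match, because
# the name must be a single word directly followed by the colon separator.
SETTING = re.compile(r"^\s*(\w+)\s*:\s*(.*?)\s*$")


def parse_smb_client_config(text):
    """Return a dict of setting name -> value (as strings) from captured output."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def multichannel_disabled(text):
    """True only if the captured output reports EnableMultiChannel as False."""
    return parse_smb_client_config(text).get("EnableMultiChannel") == "False"
```

If a transcript contains several dumps, as in the session above, the last occurrence of each setting wins, so the function reflects the state after the change. A missing EnableMultiChannel line is treated as "not disabled" so that an incomplete capture does not pass the check.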