Dell EMC Unity: Create Replication session hangs at 66% on step "Create replication session after provisioning destination" (User Correctable)

           

   Article Number: 529599     Article Version: 3     Article Type: Break Fix
   

 


Product:

 

Dell EMC Unity Family

 

Issue:

 

 

The array was set up for asynchronous replication. When an attempt was made to create a replication session, the job hung indefinitely at 66% and neither completed nor failed.
     
                                                           

 

 

Cause:

 

 

The network could not handle the current Unity MTU setting. The network team needs to confirm which MTU is appropriate for the network.

 

 

Change:

 

 

Remote array was added successfully and connection shows no issues:   
   
    uemcli /remote/sys show -detail   
    ...   
          Operational status           = OK (0x2)   
          Health state                 = OK (5)   
          Health details               = "Communication with the replication host is established. No action is required."   
   
    Basic ping tests succeed.   
    Interconnect validation does not report any issues (/nas/bin/nas_cel -interconnect -validate).   
    Ethernet port MTU was set to 1500.   
    An attempt was made to create a replication session but it hung at 66% with the job state as follows:   
   
    uemcli /sys/task/job show -detail   
   
          ID                  = N-1660   
          Type                = Replication Service   
          Title               = Create replication session after provisioning destination   
          State               = Running   
          Result description  =   
          User                = local/admin   
          Step                = 3 of 3 (Create replication session)   
          Start time          = 2019-01-16 13:40:10   
          Elapsed time        = 1d 03h 01m 20s   
          Estimated time left = 1s   
          Percent complete    = 66%
                                                           

 

 

Resolution:

 

 

   

      Run a ping test with a specified packet size to find the largest MTU that the network supports. The working MTU is the largest size that succeeds in the ping command plus 28 bytes: 20 bytes for the IP header and 8 bytes for the ICMP Echo Request header. For example, to test an MTU of 1500, specify a size of 1472 in the ping command.
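
      The 28-byte arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the Unity tooling; the function names are hypothetical, and it assumes an IPv4 header with no options (20 bytes).

```python
# Overhead added on top of the ping payload (the value passed to "ping -s").
IPV4_HEADER_BYTES = 20   # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP Echo Request header
OVERHEAD_BYTES = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES  # 28


def payload_for_mtu(mtu: int) -> int:
    """Return the ping -s value that produces a packet of exactly `mtu` bytes."""
    if mtu <= OVERHEAD_BYTES:
        raise ValueError("MTU must be larger than the 28-byte header overhead")
    return mtu - OVERHEAD_BYTES


def mtu_for_payload(payload: int) -> int:
    """Return the packet size (MTU) that a given ping -s value corresponds to."""
    if payload < 0:
        raise ValueError("payload size cannot be negative")
    return payload + OVERHEAD_BYTES


print(payload_for_mtu(1500))  # 1472, the size used to test an MTU of 1500
print(mtu_for_payload(1300))  # 1328, matching "1300(1328) bytes of data"
```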
            
      Usage: ping [-aAbBdDfhLnOqrRUvV] [-c count] [-i interval] [-I interface]     
                  [-m mark] [-M pmtudisc_option] [-l preload] [-p pattern] [-Q tos]     
                  [-s packetsize] [-S sndbuf] [-t ttl] [-T timestamp_option]     
                  [-w deadline] [-W timeout] [hop1 ...] destination     
     
              
      Example of a ping failure at MTU 1500:     
     
         spb:~> ping -M do -s 1472 -I <Source_Rep_Int_IP> -c 10 <Dest_Rep_Int_IP>     
         PING <Dest_Rep_Int_IP> (<Dest_Rep_Int_IP>) from <Source_Rep_Int_IP> : 1472(1500) bytes of data.     
         From <Source_Rep_Int_IP> icmp_seq=1 Frag needed and DF set (mtu = 1500)     
         ...     
         --- <Dest_Rep_Int_IP> ping statistics ---     
         0 packets transmitted, 0 received, +10 errors     
     
      At certain size values the ping received no response at all.
     
      Example of a successful test at a lower MTU value:     
             
         spb:~> ping -M do -s 1300 -I <Source_Rep_Int_IP> -c 10 <Dest_Rep_Int_IP>     
         PING <Dest_Rep_Int_IP> (<Dest_Rep_Int_IP>) from <Source_Rep_Int_IP> : 1300(1328) bytes of data.     
         1308 bytes from <Dest_Rep_Int_IP>: icmp_seq=1 ttl=64 time=31.4 ms     
         ...     
         --- <Dest_Rep_Int_IP> ping statistics ---     
         10 packets transmitted, 10 received, 0% packet loss, time 9012ms     
         rtt min/avg/max/mdev = 31.435/31.482/31.703/0.192 ms     
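
      Rather than trying sizes by hand, the search for the largest working size could be automated. The following is a minimal sketch, not a Dell EMC tool: the function names are hypothetical, it assumes a Linux host whose ping accepts the options shown in the usage text above, and it assumes ping exits with status 0 when a reply is received. The binary search also assumes that every size below a working size works too, which may not hold on a network that silently drops certain sizes, so confirm the result with a manual ping.

```python
import subprocess
from typing import Callable, Optional


def ping_df(size: int, source_ip: str, dest_ip: str) -> bool:
    """Send one ping with the Don't Fragment bit set; True if a reply arrived."""
    cmd = ["ping", "-M", "do", "-s", str(size), "-I", source_ip,
           "-c", "1", "-W", "2", dest_ip]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    # Assumption: exit status 0 means at least one reply was received.
    return result.returncode == 0


def largest_working_payload(probe: Callable[[int], bool],
                            low: int = 0,
                            high: int = 1472) -> Optional[int]:
    """Binary-search the largest size in [low, high] for which probe() is True.

    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid works: the answer is mid or larger
        else:
            high = mid - 1   # mid fails: the answer is smaller than mid
    return low
```

      Example use, with the replication interface addresses filled in by the operator: `size = largest_working_payload(lambda s: ping_df(s, "<Source_Rep_Int_IP>", "<Dest_Rep_Int_IP>"))`. If a size is returned, the corresponding MTU is that size plus 28.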
     
             
      If the ping fails at an MTU of 1500, the options are:

      1. Set the MTU to the lower value that works for the network. From Unity OE 4.4 onward, this can be done in the GUI under Settings > Access > Ethernet. Earlier code requires a support engagement.

      Or

      2. Have the network team determine why the network cannot support the desired MTU value.

      Once the MTU is set to a functional value, the sessions that were hung at 66% should be created successfully.

                                                             

 

 

Notes:

 

 

Log extracts:   
   
    Ktrace/sade logs from the source array show the following events repeating constantly:   
         
       B       01/19/19 15:00:15.628 sade             79831702 c4_safe_ktrace   SOCK_STREAM: 3:[core] T_DISCON_REQ: (fd=0x126, NS=0) ShutdownStream() request UNSUPPORTED!   
       B       01/19/19 15:00:15.628 sade             79831702 c4_safe_ktrace   SOCK_STREAM: 3:[core] T_DISCON_REQ: (fd=0x126, NS=0) Handling as DisconnectCloseStream()!   
       B       01/19/19 15:00:17.559 sade             4731d701 c4_safe_ktrace   DIC: 3:[core] <DicXmlSyncMsgService> Recv Cmd from <Dest_Rep_Int_IP> failed (41=Disconnected)   
       B       01/19/19 15:00:17.559 sade             4731d701 c4_safe_ktrace   CMD: 3:[core] DicXmlSyncRequest::sendMessage sendCmd failed:41   
       B       01/19/19 15:00:17.559 sade             4731d701 c4_safe_ktrace   CMD: 6:[core] CmdReplicatev2CreatePri::stampSecDm - BUSY!!! dicstatus not ok 41   
         
         
    On the destination side we can see the following error in the cemtracer_dataprotection log:   
         
    [RESTClient] ERROR - {0:506:928641866}[14796|1058|d7dffb40][parseCSRFToken @ /builds/storage/KH/upc-Unity.19/infrastructureproviders/components/RESTClient/src/HttpWrapper.cpp:341] no csrfToken found in response: HTTP/1.1 200 OK