VxRack-Flex: All SDCs disconnected from SDS

           

   Article Number:     533542                                   Article Version: 2     Article Type:    Break Fix 
   

 


Product:

 

VxFlex Product Family, VxRack Flex Series

 

Issue:

 

 

During an RCM upgrade on a VxRack Flex system, one server lost communication between its SDS and all SDCs.

 

 

Cause:

 

 

For an undetermined reason, the duplex and speed on vmnic0 were reset to half duplex and auto-negotiate.

    When the system was brought back up, ScaleIO attempted to drive I/O over this NIC, and the I/O kept failing because ScaleIO was trying to use full duplex at a speed of 10000 Mbps.

    This caused the Cisco switch to err-disable the port vmnic0 was connected to, because the switch interpreted the repeated link resets as port flapping due to a network error.
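
    The misconfiguration can be confirmed from the ESXi host. As a quick sketch (assuming shell/SSH access to the host; vmnic0 is the adapter from this case, adjust the name as needed):

    # esxcli network nic list
    # esxcli network nic get -n vmnic0

    Both commands report the adapter's current speed and duplex, which in this scenario showed half duplex and auto-negotiate instead of the expected full duplex at 10000 Mbps.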
   
    Here are some records from the vmkernel.log demonstrating the flapping caused by this scenario:    
   
    2019-05-01T15:46:03.721Z cpu26:86039)netschedHClk: NetSchedHClkWatchdogSysWorld:4552: vmnic0: link up event received, device running at 10000 Mbps so setting queue depth to 86460 bytes with expected 1310 bytes/us   
    2019-05-01T15:46:04.648Z cpu42:85995)netschedHClk: NetSchedHClkWatchdogSysWorld:4364: vmnic0: hclk scheduler instance clean up   
    2019-05-01T15:46:04.649Z cpu45:85995)netschedHClk: NetSchedHClkDoFlushQueue:3874: vmnic0: dropping 42 packets from queue netsched.pools.persist.default   
    2019-05-01T15:46:04.649Z cpu45:85995)netschedHClk: NetSchedHClkDoFlushQueue:3874: vmnic0: dropping 501 packets from queue netsched.pools.vm.67108869   
    2019-05-01T15:46:04.650Z cpu45:85995)netschedHClk: NetSchedHClkDoFlushQueue:3874: vmnic0: dropping 122 packets from queue netsched.pools.persist.mgmt   
    2019-05-01T15:46:04.653Z cpu45:85995)netschedHClk: NetSchedHClkWatchdogSysWorld:4475: vmnic0: watchdog world (worldID = 85995) exits   
    2019-05-01T15:46:07.015Z cpu42:66250)ixgben: ixgben_CheckTxHang:1762: vmnic0: false hang detected on TX queue 0   
    2019-05-01T15:46:12.017Z cpu42:66250)ixgben: ixgben_CheckTxHang:1762: vmnic0: false hang detected on TX queue 0   
    2019-05-01T15:46:12.649Z cpu38:65725)ixgben: indrv_UplinkReset:1520: indrv_UplinkReset : vmnic0 device reset started   
    2019-05-01T15:46:12.649Z cpu38:65725)ixgben: indrv_UplinkQuiesceIo:1483: Stopping I/O on vmnic0   
    2019-05-01T15:46:12.740Z cpu52:66252)ixgben: ixgben_CheckLink:2514: Link is down for device vmnic0 (0x4307f4af4540)   
    2019-05-01T15:46:12.740Z cpu52:66252)netschedHClk: NetSchedHClkNotify:2908: vmnic0: link down notification   
    2019-05-01T15:46:12.740Z cpu52:66252)netschedHClk: NetSchedHClkDoFlushQueue:3874: vmnic0: dropping 211 packets from queue netsched.pools.vm.67108869   
    2019-05-01T15:46:12.740Z cpu38:65725)ixgben: indrv_DeviceReset:2382: Device Resetting vmnic0   
    2019-05-01T15:46:12.740Z cpu38:65725)ixgben: indrv_Stop:1950: stopping vmnic0   
    2019-05-01T15:46:13.013Z cpu38:65725)ixgben: indrv_UplinkStartIo:1460: Starting I/O on vmnic0   
    2019-05-01T15:46:13.130Z cpu38:65725)ixgben: indrv_UplinkReset:1540: indrv_UplinkReset : vmnic0 device reset completed   
    2019-05-01T15:46:13.177Z cpu52:66252)ixgben: ixgben_CheckLink:2514: Link is up for device vmnic0 (0x4307f4af4540)   
    2019-05-01T15:46:13.177Z cpu52:66252)netschedHClk: NetSchedHClkNotify:2900: vmnic0: link up notification   
    2019-05-01T15:46:13.341Z cpu52:66252)ixgben: ixgben_CheckLink:2514: Link is down for device vmnic0 (0x4307f4af4540)   
    2019-05-01T15:46:13.341Z cpu52:66252)netschedHClk: NetSchedHClkNotify:2908: vmnic0: link down notification   
    2019-05-01T15:46:13.348Z cpu52:66252)ixgben: ixgben_CheckLink:2514: Link is up for device vmnic0 (0x4307f4af4540)   
    2019-05-01T15:46:13.348Z cpu52:66252)netschedHClk: NetSchedHClkNotify:2900: vmnic0: link up notification   
    2019-05-01T15:46:13.573Z cpu52:66252)ixgben: ixgben_CheckLink:2514: Link is down for device vmnic0 (0x4307f4af4540)   
    2019-05-01T15:46:13.573Z cpu52:66252)netschedHClk: NetSchedHClkNotify:2908: vmnic0: link down notification
                                                           

 

 

Change:

 

 

In this case the change was an upgrade of the VxRack Flex RCM.

 

 

Resolution:

 

 

1) Set the correct settings on the vmnic port. In this case the correct settings were full duplex and a speed of 10000 Mbps (a sketch of the esxcli commands is shown below).
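
    As a rough sketch (assuming shell/SSH access to the ESXi host), the current settings can be checked and corrected with esxcli; vmnic0 is the adapter from this case:

    # esxcli network nic get -n vmnic0
    # esxcli network nic set -n vmnic0 -S 10000 -D full

    The same change can also be made from the vSphere Client on the host's physical adapter settings.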
2) Bounce the port on the Cisco switch:
   
    SSH (for example, with PuTTY) to the Cisco switch that owns the port that needs to be bounced.
    Display all of the ports on the switch:   
    #show interfaces status   
   
    In this case, this was the bad status we saw:
    Eth1/2/3      316     eth  access down    linkFlapErrDisabled        auto(D) --    
   
    Now bounce the port:
    # enable   
    # configure terminal   
    (config)# interface Ethernet 1/2/3   
    (config-subif)# shutdown   
    (config-subif)# no shutdown   
    (config-subif)# end   

      # show interfaces Ethernet 1/2/3 status   

   

The port should now show as connected, and the ScaleIO issue will rectify itself once ScaleIO starts driving I/O over that NIC again.
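
      As an optional check once the port is back up (a sketch, assuming scli access on the primary MDM of the ScaleIO/VxFlex cluster), confirm that all SDCs have reconnected:

      # scli --login --username admin
      # scli --query_all_sdc

      Every SDC should be reported as connected; if any remain disconnected, re-verify the vmnic speed/duplex settings and the switch port state.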