PowerPath for AIX: Possible Data Unavailable (DU) or host crash during a firmware upgrade on Dell Compellent arrays

Article Number: 538106
Article Version: 2
Article Type: Break Fix

Product:

PowerPath for AIX

Issue:

Environment:
    OS: AIX (any version)
    Dell EMC software: PowerPath for AIX, versions earlier than 7.0
    Dell EMC array: Dell Compellent array (any type, in virtual mode or legacy mode)

If an SC controller reboot on a Dell Compellent array takes longer than 15 seconds, PowerPath detects an "ALL_PATHS_DEAD" condition.

Note: This KB article does not apply to hosts connected to a Dell Compellent array through a VPLEX. For a similar issue in a VPLEX environment, see "VPLEX: Possible Data Unavailable (DU) during a firmware upgrade on Dell Compellent Arrays".

Cause:

The failover process between controllers on Dell Compellent arrays can take longer than 15 seconds, which is the hard-coded timeout that applies when "dynamic tracking" is enabled. When failover exceeds this timeout, PowerPath for AIX can report an "ALL_PATHS_DEAD" condition, because the alternate paths to the data are not yet available.
            
Dell Compellent array behavior:

Legacy mode:

A Dell Compellent array has two controllers that handle front-end connections from the host initiators. Each controller contains HBAs with multiple ports, and each HBA is assigned to a separate fault domain, so the host initiators have a redundant path to each Dell Compellent LUN. Access to the target LUNs is distributed across both controllers: for example, LUNs 1, 3, 5, and so on are accessed through controller 1, and LUNs 0, 2, 4, and so on through controller 2.
     
If one controller shuts down or reboots, the front-end virtual ports that reside on that controller and belong to the fault domain are transferred to the remaining controller, and the host initiators establish sessions with the remaining controller. As a result, all host initiator connections reach the target LUNs through a single controller instead of being distributed across both. This failover, in which LUNs are transferred from one controller to the other, can take up to 55 seconds, and access to the affected devices is rejected during that time. When the failover takes longer than the 15-second dynamic tracking timeout, this causes an ALL_PATHS_DEAD condition.
            
After the rebooted controller has been restored, the Dell Compellent array administrator can manually rebalance the front-end ports to redistribute the load across both controllers.
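The legacy-mode behavior described above can be illustrated with a small Python model. It is a sketch only: the odd/even split reproduces the article's example rather than a rule the array enforces, the helper names (`initial_owner`, `fail_over`, `rebalance`) are hypothetical, and the rebalance step assumes the administrator restores the original distribution.

```python
# Simplified model of legacy-mode LUN ownership (illustration only).
# The odd/even split follows the article's example; all names are hypothetical.
from typing import Dict

Owners = Dict[int, int]  # LUN number -> controller (1 or 2) serving it


def initial_owner(lun: int) -> int:
    # Article's example: LUNs 1, 3, 5, ... via controller 1; 0, 2, 4, ... via controller 2.
    return 1 if lun % 2 == 1 else 2


def fail_over(owners: Owners, failed: int) -> Owners:
    # Every LUN served by the failed controller moves to the remaining controller.
    remaining = 2 if failed == 1 else 1
    return {lun: remaining if ctrl == failed else ctrl for lun, ctrl in owners.items()}


def rebalance(owners: Owners) -> Owners:
    # Assumption: the manual rebalance restores the original distribution.
    return {lun: initial_owner(lun) for lun in owners}


if __name__ == "__main__":
    owners = {lun: initial_owner(lun) for lun in range(6)}
    print(owners)                      # {0: 2, 1: 1, 2: 2, 3: 1, 4: 2, 5: 1}
    after = fail_over(owners, failed=1)
    print(after)                       # {0: 2, 1: 2, 2: 2, 3: 2, 4: 2, 5: 2}
    print(rebalance(after) == owners)  # True
```

While the failed controller is down, every LUN in the model is served by one controller, which is the window during which access is rejected and the dynamic tracking timeout can expire.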
     
Virtual mode:
     
A Dell Compellent array has two controllers to handle front-end connections from the host initiators, but only one controller is attached to the front end. Access to the target LUNs goes through that one controller only.
     
If one controller shuts down or reboots, the front-end virtual ports that reside on that controller and belong to the fault domain are transferred to the remaining controller, and the host initiators establish sessions with it. As a result, all host initiator connections reach the target LUNs through the second controller. This failover, in which LUNs are transferred from one controller to the other, can take up to 55 seconds, and access to any device is rejected during that time. If the failover takes more than 15 seconds (the hard-coded dynamic tracking timeout), an ALL_PATHS_DEAD condition occurs.
     
When the first controller goes down during the upgrade, the initiators are disconnected and must log in to the newly updated controller; this process can take longer than 15 seconds (up to 60 seconds).
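The timing argument in this section reduces to comparing the outage window against the 15-second dynamic tracking timeout. The Python sketch below makes that comparison explicit using the figures quoted in this article; the 10-second "fast failover" value is an illustrative assumption, and `all_paths_dead` is a hypothetical, simplified model, not PowerPath or AIX code.

```python
# Simplified timing check (illustration only, not PowerPath or AIX code).
DYNTRK_TIMEOUT_S = 15  # hard-coded timeout when dynamic tracking is enabled


def all_paths_dead(outage_s: float, timeout_s: float = DYNTRK_TIMEOUT_S) -> bool:
    # In this model, paths are lost if access is still rejected after the timeout.
    return outage_s > timeout_s


if __name__ == "__main__":
    scenarios = {
        "fast failover (assumed)": 10,
        "worst-case LUN failover": 55,
        "worst-case initiator login": 60,
    }
    for name, seconds in scenarios.items():
        print(f"{name}: {seconds}s -> ALL_PATHS_DEAD={all_paths_dead(seconds)}")
```

Running it prints False for the 10-second case and True for the 55- and 60-second cases, matching the condition under which PowerPath reports ALL_PATHS_DEAD.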

Change:

Dell Compellent SC controller reboot or non-disruptive upgrade (NDU)

Resolution:

At the time of writing, there is no PowerPath solution. PowerPath Engineering is evaluating how to handle the Dell Compellent array behavior, and this KB article will be updated when plans are finalized.
   
The only way to ensure that an AIX host connected to a Dell Compellent array survives a Dell Compellent NDU is to use MPIO together with the shim driver developed by Dell Engineering and included in the "Dell Storage Software Suite for AIX" (https://www.dell.com/support/home/us/en/19/drivers/driversdetails?driverid=591xm).
   
Note that the failover time depends on factors such as the number of devices configured on the array, whether clones are configured, and the current activity. In many cases the failover completes in less than 15 seconds and is handled transparently.