How to correct a consistent LUN violation: vMotion fails between ESX hosts because of a LUN address mismatch between servers (VMAX, GateKeeper, RDM)

Environment:

EMC Hardware: Symmetrix VMAX Series

EMC SW: Solutions Enabler

 

Description:

A consistent lun violation exists and the user is trying to correct it.

 

When the user tries to enable consistent_lun, an error is returned stating that a consistent LUN violation exists.

 

 

Cause:

Initiator Groups are part of more than one Cascaded IG without consistent_lun enabled. 

A new host IG was added to the parent initiator groups in a different order from the one in which the existing host IGs were originally added.

 

 

 

Resolution:

Procedure:

 

1. Obtain a backup of the user's masking database.

 

          symaccess -sid xxx backup -file aclxbackup.txt

 

2. Copy the file to a laptop running Cygwin and Solutions Enabler, or to a Linux VM with Solutions Enabler, for offline troubleshooting.

 

3. Check the masking views in question to see whether multiple LUN addresses are in use.

Note: This command sorts the output and adds a counter showing how many HLUs exist for each device in the masking view. Any device with a count greater than 1 indicates a problem.

 

          symaccess -f aclxbackup.txt show view cluster_view -detail | grep '^[01234]' | sort -u | awk '{print $1}' | uniq -c

Sample output:

          2 0E7B   
          2 0E7E

 

4. Examine the output of the same command more closely, this time without the count.

Note: In this example, for the masking view "cluster_view", dev 0E7B has 2 different HLU addresses. This means that 2 initiator groups see the device with a different LUN number. This is only a problem if the LUN in question is used for RDM and vMotion is required between hosts in the cluster.

 

          symaccess -f aclxbackup.txt show view cluster_view -detail | grep '^[01234]' | sort -u

Sample output:

          0E7B    06E:0     0
          0E7B    06E:0    44

 

5. To determine which initiators are affected, display the full detail of the masking view.

 

          symaccess -f aclxbackup.txt show view cluster_view -detail

 

6. Search for the string "0E7B    06E:0     0" and scroll back to see which initiators are using it. This can be done with a text editor, e.g. Notepad++.

 

Masking View Name           : cluster_view

Last update time            : 12:35:54 PM on Wed Mar 05,2014

View last update time       : 12:35:54 PM on Wed Mar 05,2014

 

Initiator Group Name        : cluster_ig

 

   Host Initiators

     {

       IG   : host01_ig   ----All initiators Grouped here will share the same lun addresses for all luns in the masking view

       IG   : host02_ig

       IG   : host03_ig

     }

 

Port Group Name             : cluster_pg

 

 

   Director Identification

     {

       FA-6E:0

       FA-8E:0

       FA-10E:0

       FA-11E:0

     }

 

Storage Group Name          : cluster_sg

 

   Number of Storage Groups : 0

   Storage Group Names      : None

 

Sym            Host

Dev     Dir:P  Lun

------  -----  ----

0E7B    06E:0     0

              08E:0     0

              10E:0     0

              11E:0     0


7. Search for the string "0E7B    06E:0    44 " and scroll backward in the file to see which initiators are using this.

 

Initiator Group Name        : host04_ig *  ---   The second instance is the one with the alternate lun IDs.

   Host Initiators
     {
       WWN  : xxxxxxxxxxxe2d5 [alias: xxxxxxxxxxxxe2d5/xxxxxxxxxxxe2d5]
       WWN  : xxxxxxxxxxxxe2c3 [alias: xxxxxxxxxxxxe2c3/xxxxxxxxxxxxe2c3]

 

 

8. At this point we can determine that all hosts except host04 are using the same LUN addresses. This is because host04_ig is used in multiple masking views.

 

9. To enable consistent_lun on host04_ig and add it back to the original parent group, you may need to remove it from the other masking views as well. The following command shows which masking views host04_ig is part of.

 

symaccess -f aclxbackup.txt list devinfo -ig host04_ig | awk '{print $3}' | grep view | sort -u

cluster_view

sqlcluster_view


10. The inconsistent LUN addressing is caused by host04_ig having been added to the two parent initiator groups after they were created, but in a different order from the original creation order. Adding host04_ig to cluster_ig first ensures that host04 gets the same LUN addresses as the other nodes in this cluster. The following steps correct this.


11. Put host04 into maintenance mode and evacuate any VMs to the remaining cluster nodes with vMotion, then remove host04_ig from both parent initiator groups.

symaccess -sid 0342 -name cluster_ig -type initiator remove -ig host04_ig
symaccess -sid 0342 -name sqlcluster_ig -type initiator remove -ig host04_ig

 


12. Add the host back in the order in which the masking views were originally created; in this case, cluster_ig first.

symaccess -sid 0342 -name cluster_ig -type initiator add -ig host04_ig
symaccess -sid 0342 -name sqlcluster_ig -type initiator add -ig host04_ig


13. Remove the host from maintenance mode and rescan the SCSI controllers.
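
Steps 3 through 7 can also be scripted against saved view detail output instead of being searched by hand. The following is a minimal sketch and not part of the original procedure. The function names hlu_conflicts and ig_for_lun and the file name view_detail.txt are hypothetical, and the awk patterns assume the layout shown in the sample output above: device ID, Dir:Port, and host LUN are the first three columns of each device row, and each "Initiator Group Name" line precedes the rows it applies to.

```shell
#!/bin/sh
# Sketch only: parses text saved from
#   symaccess -f aclxbackup.txt show view cluster_view -detail > view_detail.txt
# The column layout is assumed to match the sample output in this article.

# Print every device/port pair that is presented with more than one host LUN.
hlu_conflicts() {
  awk '
    # Device rows: hex device ID, Dir:Port, then the host LUN.
    $1 ~ /^[0-9A-F]+$/ && $2 ~ /^[0-9]+[A-Z]:[0-9]+$/ && NF >= 3 {
      key = $1 " " $2
      if (!seen[key, $3]++) {
        sep = (count[key]++ > 0) ? "," : ""
        luns[key] = luns[key] sep $3
      }
    }
    END {
      for (key in count)
        if (count[key] > 1) print key, luns[key]
    }
  ' "$1" | sort
}

# Print the initiator group whose section contains the given device row.
ig_for_lun() {
  awk -v dev="$2" -v port="$3" -v lun="$4" '
    /^Initiator Group Name/ { sub(/^[^:]*:[ \t]*/, ""); ig = $1; next }
    # Appending "" forces string comparison so hex IDs such as 0E70 are not read as numbers.
    ($1 "") == (dev "") && ($2 "") == (port "") && ($3 "") == (lun "") { print ig }
  ' "$1"
}
```

For example, `hlu_conflicts view_detail.txt` lists each conflicting device and port with its host LUNs, and `ig_for_lun view_detail.txt 0E7B 06E:0 44` names the initiator group that sees device 0E7B at host LUN 44.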

 

NOTE: Adding host04_ig to sqlcluster_ig second does not guarantee that LUN addresses will be consistent across all cluster views in which it is present. That is only possible with the consistent_lun flag, which should be set when the initiator groups are created.

 

Currently, the only way to set the consistent_lun flag after an initiator group has been created is to remove the group from the masking view and create a new masking view with consistent_lun enabled on the initiator groups. This is done in a phased approach to ensure minimal impact.

 


Procedure to set consistent_lun for all hosts in a cascaded initiator group using a staging initiator group and a staging masking view:
Note:   This requires downtime on one node at a time; however, vMotion should minimise any impact.

1. Put the host into maintenance mode and evacuate all VMs with vMotion.


2. Make a note of all the volumes and LUN addresses in use:

 

          symaccess -sid xxx show view cluster_view -detail
          symaccess -sid xxx show view sqlcluster_view -detail

 

Storage Group Name: cluster_sg

Number of Storage Groups: 0

Storage Group Names: None

 

 

Sym            Host

 

Dev     Dir:P  Lun

------  -----  ----

0E7B    06E:0     0

              08E:0     0

              10E:0     0

              11E:0     0


3. Remove the host initiator group from the parent IGs of the cluster masking views.

symaccess -sid xxx -name cluster_ig -type initiator remove -ig host04_ig
symaccess -sid xxx -name sqlcluster_ig -type initiator remove -ig host04_ig


4. Enable consistent_lun on the host initiator group.

symaccess -sid xxx -name host04_ig -type initiator set consistent_lun on

5. Create a new parent IG with consistent_lun enabled and add the host IG to it.

symaccess -sid xxx -name new_cluster_ig create -type init -consistent_lun
symaccess -sid xxx -name new_cluster_ig -type init -ig host04_ig add

6. Create a new storage group for the staging masking view and add the first LUN, the one that had HLU 0.

symaccess -sid xxx -name new_cluster_sg create -type storage
symaccess -sid xxx -name new_cluster_sg -type storage add dev 0E7B
  <<Dev is the HLU 0 device gathered from the view detail output noted earlier>>

7. Create a new masking view using the new parent IG, the new storage group, and the original port group.

symaccess -sid xxx create view -name new_cluster_view -sg new_cluster_sg -ig new_cluster_ig -pg cluster_pg

8. Add the remaining LUNs to the new SG, making sure to use the same LUN IDs noted from the original masking view detail output.

symaccess -sid xxx -name new_cluster_sg -type storage add dev 0E7C -lun 1

9. Rescan the host and verify that the LUN addresses and Symmetrix volumes match the original cluster configuration.


10. Remove from maintenance mode.


11. Repeat steps 1, 3, 4, 5 (the add command only, since the new parent IG already exists), 9, and 10 for the remaining hosts in the cluster.

 

Note: For the final host in the cluster, the original masking view and the original parent IG must be deleted. The host IG can then be set to consistent_lun and added to the new parent initiator group as described above.

symaccess -sid xxx delete view -name cluster_view
symaccess -sid xxx delete -name cluster_sg -type storage

 


12. Following this activity, delete the original storage group and rename the staging masking view, storage group, and initiator group.

symaccess -sid xxx rename view -name new_cluster_view -new_name cluster_view
symaccess -sid xxx rename -name new_cluster_sg -type storage -new_name cluster_sg
symaccess -sid xxx rename -name new_cluster_ig -type init -new_name cluster_ig
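
Step 2 of the staging procedure asks for a record of each volume and its LUN address, which steps 6 and 8 then rely on. As a hedged sketch rather than an EMC-supplied tool, that record could be extracted from saved view detail text. The function name dev_lun_map and the file name cluster_view_detail.txt are hypothetical, and the pattern assumes the same column layout as the sample output in this article (device ID, Dir:Port, host LUN).

```shell
#!/bin/sh
# Sketch only: lists one "device host_LUN" pair per device, taken from the
# first row in which each device appears in saved view detail text, e.g.
#   symaccess -sid xxx show view cluster_view -detail > cluster_view_detail.txt
dev_lun_map() {
  awk '
    # Device rows: hex device ID, Dir:Port, then the host LUN.
    $1 ~ /^[0-9A-F]+$/ && $2 ~ /^[0-9]+[A-Z]:[0-9]+$/ && NF >= 3 && !seen[$1]++ {
      print $1, $3
    }
  ' "$1"
}
```

Running `dev_lun_map cluster_view_detail.txt` before removing the host gives a list to check against when adding devices with `-lun` in step 8 and when verifying the rescan in step 9.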

 

 

Reference:

EMC Support Solution Number: 184306