I have seen that happen on a CLARiiON when I set the failover mode to 4, went back later, and it had changed to 1. Have you run the relevant commands on your VMware hosts to change to ALUA as well?
Here is a link to someone who had a similar issue, with all the commands he had to run to change VMware to use ALUA. Alternatively, you could use PowerPath to handle all the path management for you.
Anyway, have a read of the above; it may help you out.
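In case it helps, the basic checks look roughly like this. The host names, SP IPs, and LUN IDs are just placeholders, so verify the exact flags against the naviseccli and vSphere CLI references for your FLARE release and ESX build.

On the array side, set (or re-confirm) failover mode 4 for the host record, or use the Failover Setup Wizard in Unisphere:

naviseccli -h <SP-A-IP> storagegroup -sethost -host <esx-hostname> -failovermode 4 -arraycommpath 1

On each ESX 4.x host, confirm the CLARiiON ALUA claim rule is there and that the devices are actually being claimed by it (the devices should show "Storage Array Type: VMW_SATP_ALUA_CX"):

esxcli nmp satp listrules | grep ALUA_CX
esxcli nmp device list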
I had ALUA working fine on both the VNX and VMware sides. I had actually seen the site you mentioned when I was doing the research to configure ALUA.
I just can't figure out why adding an additional WWN to the host record on the VNX would reset its mode back to 1. I would think once it's set for the host, it should stay, no matter what HBA/WWN changes you make.
I'm just trying to find out if it's a bug (or "undocumented feature"), or if there's an intentional reason to reset it back.
This might be related to what you're experiencing:
The following is a Primus(R) eServer solution:
Solution Class: 3.X Compatibility
Goal Why is the failover mode on the array changing from 4 (ALUA) to 1 after a Storage Processor reboot or an NDU?
Fact Product: CLARiiON CX4 Series
Fact EMC Firmware: FLARE Release 30
Fact Product: VMware ESX Server 4.0
Fact Product: VMware ESX Server 4.1
[NOT] Fact This statement does not apply: Product: VMware ESX Server 5.x
Symptom After a storage processor reboot (either because of a non-disruptive upgrade [NDU] or other reboot event), the failover mode for the ESX 4.x hosts changed from 4 (ALUA) to 1 on all host initiators.
Cause On this particular array, a Host LUN 0 was not configured for each Storage Group, which caused the array to present a "LUNZ" to the host. All host initiators had been configured to failover mode 4 (ALUA). When the storage processor rebooted due to a non-disruptive upgrade (NDU) and the connection was reestablished, the ESX host saw the LUNZ as an active/passive device and sent a command to the array to set the failover mode to 1. This changed the failover mode settings for all the LUNs in the Storage Group, and since the Failover Policy on the host was set to FIXED, the host lost access to the LUNs while one SP was rebooting.
Fix VMware will fix this issue in an upcoming patch for ESX 4.0 and 4.1. ESX 5.x does not have this issue.
To work around this issue, you can bind a small LUN, add it to the Storage Group, and configure it as Host LUN 0 (zero). You will need to reboot each host after adding the HLU 0. Each Storage Group will need an HLU 0. See solution emc57314 for information on changing the HLU.
These are the directions from VMware for the workaround:
- Present a 1.5 GB or larger LUN 0 to all ESX hosts. (This volume does not need to be formatted, but must be equal to or larger than 1.5 GB.)
- Roll a reboot through all hosts to guarantee that they are seeing the LUN 0 instead of the LUNZ. A rescan may work, but a reboot guarantees that they will not have any legacy data for the CommPath volume.
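For what it's worth, once you've bound a small LUN (via Unisphere, or naviseccli, depending on whether it lives in a RAID group or a pool), adding and verifying it as HLU 0 is roughly the following. The SP IP, storage group name, and LUN ID are placeholders, so check the flags against the naviseccli reference for your FLARE release:

naviseccli -h <SP-A-IP> storagegroup -addhlu -gname <storage-group-name> -hlu 0 -alu <new-LUN-ID>
naviseccli -h <SP-A-IP> storagegroup -list -gname <storage-group-name>

The -list output should show the new LUN mapped at HLU 0. Repeat the -addhlu for every Storage Group, then roll the host reboots as described above.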
Thanks for the info.
It does sound like a similar effect, but I'm not sure it was the same cause. A couple of days before the issue with the reset, we had both SPs replaced. During that time, as each SP went down and then came back up, VMware did exactly what it was supposed to: the paths to the down SP showed "dead," but the other paths worked fine.
The issue we had later was when we added additional host initiators to the existing host record on the VNX. Still, it sounds a lot like the effect we had, so I'll go ahead and set up a small LUN 0 in the group.
Actually, on the CX700, we had a 5 GB LUN 0 we had set up years ago to address some of the other "LUNZ" issues we had with VMware. When we set up the VNX, apparently that LUN wasn't brought over. Fortunately, the other LUNs were created on the VNX with non-zero host LUN values, so it will be easy to create a small LUN 0.
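Once the new LUN 0 is in each storage group, I'll probably sanity-check it from the ESX side after the rescan/reboot with something like:

esxcfg-mpath -b
esxcli nmp device list

and make sure each host now reports a real DGC device at LUN 0, claimed by VMW_SATP_ALUA_CX, rather than the LUNZ pseudo-LUN.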