12 Replies Latest reply: Dec 15, 2012 11:04 AM by Christopher Imes

Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI

AndMar

Hello guys,

 

I am configuring a VNX5300 to connect to vSphere 5.0 over iSCSI. According to the following link, the best practice is to configure the VNX this way, with the native multipathing plugin:

 

NMP    VMW_SATP_DEFAULT_AA    VMW_PSP_FIXED

 

http://partnerweb.vmware.com/comp_guide2/detail.php?deviceCategory=san&productid=19518

 

I tried several configurations on the VNX, but I never get VMW_SATP_DEFAULT_AA on the vSphere server; I always get VMW_SATP_CX.

 

Could you please help me understand how to configure the Failover Mode and the Initiator Type for the vSphere host on the VNX array?

 

Thanks in advance

Andrew

  • 1. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Firstly let me direct you to two very good documents regarding vSphere and CX/VNX connectivity.  They are both available from PowerLink via the breadcrumb trails below:

     

    1) EMC Host Connectivity Guide for VMware ESX Server
    Home > Support > Technical Documentation and Advisories > Host Connectivity/HBAs > Installation/Configuration

     

    2) TechBook: Using EMC VNX Storage with VMware vSphere

    Home > Support > Technical Documentation and Advisories > TechBooks

     


    SATP (Storage Array Type Plug-in) and (Unisphere) Host initiator Failover Modes
    ================================================================
    Firstly, VMW_SATP_* is a reference to an SATP (Storage Array Type Plug-in); which one you see depends on the vendor's implementation of the extension to the PSA (Pluggable Storage Architecture) framework.  When connected to a VNX/CLARiiON you will only ever use/see one of two (and will never see the generic VMW_SATP_DEFAULT_AA):

     

    1) VMW_SATP_CX
    2) VMW_SATP_ALUA_CX

     

    These are defined by the host initiators' "Failover Mode" as set in "Connectivity Status" within Unisphere.  With your environment, there are only two possible choices for ESXi (and ESX for 4.x).

     

    1) Failover Mode 1 (FM1) = PNR (Passive Not Ready)


    2) Failover Mode 4 (FM4) = ALUA (Asymmetric Logical Unit Access)
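
As a quick reference, the pairing between the two failover modes and the two SATPs (spelled out further down in this reply) can be written as a small lookup. This is only a sketch; the dictionary and function names are made up for illustration and are not part of any EMC or VMware tool.

```python
# Pairing taken from this reply: FM1 (PNR) shows up as VMW_SATP_CX,
# FM4 (ALUA) shows up as VMW_SATP_ALUA_CX. Names below are illustrative only.
FAILOVER_MODE_TO_SATP = {
    1: ("PNR", "VMW_SATP_CX"),
    4: ("ALUA", "VMW_SATP_ALUA_CX"),
}


def expected_satp(failover_mode: int) -> str:
    """Return the SATP an ESX/ESXi host should claim for a failover mode."""
    try:
        return FAILOVER_MODE_TO_SATP[failover_mode][1]
    except KeyError:
        raise ValueError(
            f"failover mode {failover_mode} is not one of the two discussed here (1 or 4)"
        )


print(expected_satp(1))
print(expected_satp(4))
```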

     

    In summary, the Failover Mode defines how the array responds to I/O arriving via a path to the non-owning SP.  Without going into detail here, I'd like to refer you to an old but still relevant white paper; its diagram showing the redirection should speak the proverbial "thousand words":

     

    http://www.emc.com/collateral/hardware/white-papers/h2890-emc-clariion-asymm-active-wp.pdf

     

     

    It is fair to say that, where supported, ALUA (which simulates an ACTIVE/ACTIVE array model and reduces the need for trespasses) is the best choice, but both the array and the host have to support it.  With regard to ESX/ESXi, ALUA is supported with:

    1) ESX/ESXi 4.x (this is when it was first introduced by VMware)

    2) FLARE 28.5 patch 704 (or newer)

    a) ALUA array support was actually released with FLARE 26 (this is when EMC first introduced it, but the initial implementation only supports SCSI-2)

     

    b) With the VNX, ALUA has been supported since its initial release

     

    Therefore, since you always get VMW_SATP_CX, your Failover Mode must be set to 1 (or it was changed to 4 and the ESXi 5 servers haven't yet been rebooted, as is required when changing from one mode to the other, but I'm assuming this is not the case for you).  So your first step should be to change the Failover Mode to 4 (ALUA), since you meet/exceed the requirements above.  This can be done within Unisphere:

     

    1) Using the "Failover Wizard" in the menu to the right when clicking into "Hosts"

     

    2) Or, for each path associated with the registered host in "Connectivity Status":

    a) Highlight the registered host (will select all paths associated with it or can select individual paths)

    b) Use same settings as before but only modify the "Failover Mode"

     

    Then when updated:

     

    3) Reboot the ESXi 5 hosts

     

    4) Confirm the host recognizes the modified settings (Storage Array Type = VMW_SATP_ALUA_CX)

    Also, from the host's perspective the array advertises itself as an ACTIVE/ACTIVE architecture (even though there is still LUN ownership, and the upper redirector forwards requests over the CMI channel as necessary, as you will have read), so all paths will now show as "Active" instead of half of them "Inactive" (as was the case with Failover Mode = 1) when viewed within the vSphere Client.  However, by default the host will only use the optimal paths, as described below.
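
Step 4 can be checked across many LUNs by scanning the text printed by "esxcli storage nmp device list" on ESXi 5.x. The sketch below assumes that output has one unindented device ID per device, followed by indented "Field: value" lines. The sample text is hand-written and abbreviated, not captured from a real host, and the second device ID is made up.

```python
# Hand-written, abbreviated sample in the assumed shape of
# `esxcli storage nmp device list` output. NOT captured from a real host;
# the second device ID is invented for illustration.
SAMPLE = """\
naa.60060160caf02e00b4845fd80148e111
   Storage Array Type: VMW_SATP_ALUA_CX
   Path Selection Policy: VMW_PSP_FIXED
naa.60060160caf02e00b5845fd80148e111
   Storage Array Type: VMW_SATP_CX
   Path Selection Policy: VMW_PSP_MRU
"""


def parse_devices(text):
    """Group indented 'Field: value' lines under the preceding device ID."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def not_yet_alua(devices, wanted="VMW_SATP_ALUA_CX"):
    """Return device IDs whose SATP is not the one expected after the change."""
    return sorted(d for d, f in devices.items() if f.get("Storage Array Type") != wanted)


print(not_yet_alua(parse_devices(SAMPLE)))
```

Any device still listed after the reboot is worth a second look in "Connectivity Status".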

     


    NMP (Native Multipathing Plugin)
    ==========================
    Then, once set to ALUA and confirmed (after a reboot) that the SATP has been updated, you then have two choices for the PSP (Path Selection Policy):

    1) Round-Robin
    2) Fixed (default in ESXi 5)

     

    NOTE: in ESX 4.x (not relevant to this conversation), there was the introduction of a PSP called: "FIXED with ARRAY PREFERENCE".  The observed behavior was as follows:

     

    The ARRAY PREFERENCE from my experience aligned more with the “optimized” path, meaning the current owner and not the default owner.  Thus if you have 30 hosts all booting up at different times they would choose the optimized path which could be different depending on when the host was rebooted and what the current owner of the path was.  Without AP and FM4 any path that responds the quickest would be chosen.  Of course having paths chosen by default owner would be nice, but I don’t believe that was part of it.

     

    However, this PSP no longer exists in ESXi 5; it basically made choices for you with regard to the "Preferred Path".  Its decision tree also made no effort to balance the paths, so it was possible that, per SP and per host, the same preferred path was always chosen.  It also happens to be the default PSP in ESX/ESXi 4 when the SATP is VMW_SATP_ALUA_CX.

     

    Depending on which perspective you take, as discussed later, you may prefer one over the other, but it is incorrect to state that EMC or VMware only supports one or the other.  Both are valid choices (when configured with ALUA), but each has its own management concerns.

     


    PROS/CONS

    ==========

    ROUND-ROBIN balances the load across the paths better than one can ever do with FIXED; guaranteed.  RR sends I/O down one (optimal) path, then the next, but never simultaneously.  It is possible, though, that if trespassed LUNs are not managed (for example, after an NDU or code upgrade), everything ends up running on one SP, overloading it.  The solution is of course to manage the trespassed LUNs over time, keeping in mind that a trespassed LUN isn’t a result of using RR; RR simply has no mechanism to fail back.  Whatever condition prompted the trespass originally would have occurred with either FIXED or RR.
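
To make the "one path, then the next, never simultaneously" behavior concrete, here is a toy model of RR path rotation. It is a sketch, not VMware's implementation: the path labels are hypothetical, and the 1000-I/O switch point and the useANO option are the ones described under CONFIGURATION later in this reply.

```python
from itertools import cycle


def eligible_paths(paths, use_ano=False):
    """paths: list of (name, is_optimized). Non-optimized paths are used only
    when use_ano is set or when no optimized path is left."""
    optimized = [name for name, is_opt in paths if is_opt]
    if use_ano or not optimized:
        return [name for name, _ in paths]
    return optimized


def round_robin(paths, total_ios, iops_limit=1000, use_ano=False):
    """Return {path: I/O count}, moving to the next path after iops_limit I/Os."""
    names = eligible_paths(paths, use_ano)
    if not names:
        raise ValueError("no paths available")
    counts = {}
    rotation = cycle(names)
    remaining = total_ios
    while remaining > 0:
        path = next(rotation)
        burst = min(iops_limit, remaining)
        counts[path] = counts.get(path, 0) + burst
        remaining -= burst
    return counts


# Hypothetical labels: two paths to the owning SP (optimized), two to its peer.
PATHS = [("SPA-0", True), ("SPA-1", True), ("SPB-0", False), ("SPB-1", False)]
print(round_robin(PATHS, 5000))
print(round_robin(PATHS, 5000, use_ano=True))
```

With the defaults, only the two optimized paths carry I/O; turning on the useANO behavior spreads the same I/O across the peer SP's paths as well.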

     

    More than once, I've heard the statement "ROUND-ROBIN causes trespass storms", or been asked whether it does.  I wanted to share my thoughts on it and make some points about the possible native Path Selection Policies (PSP) when PowerPath/VE is not used.

     

    More often than not, what clients running ROUND-ROBIN call a “trespass storm” is simply that, over time, the LUNs have explicitly/implicitly (ALUA) trespassed and, without a failback mechanism, remained on the peer SP (unlike FIXED, which eventually restores them to the assigned preferred path).  A client who hasn’t been monitoring their trespassed LUNs with an RR configuration suddenly sees many/all of their LUNs on the peer SP and calls that a “trespass storm”.  Technically, a true “trespass storm” would be seen in Unisphere as a LUN bouncing continuously back and forth between the default and peer SP.  Barring that scenario, FIXED actually causes more trespasses than RR when taken literally.  Under normal conditions, if a LUN is going to trespass, again it isn’t because of FIXED or RR; the question is whether it reverts to the original owner when the original path(s) are available again.  With FIXED it would trespass once more (back to what would be the original default owner in a properly configured environment), but with RR it would remain on the peer SP; so in this literal example FIXED produces twice as many trespasses.  One can even take it further and suggest that FIXED can cause trespass storms if the original issue that caused the trespass is intermittent.
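
The "twice as many trespasses" argument can be counted with a deliberately simplified model. It assumes a single LUN, treats "fail" and "restore" as events on the paths to the default owner SP, and reduces the difference between FIXED and RR to whether failback happens. That is an illustration of the reasoning above, not of how the array or NMP is implemented.

```python
def count_trespasses(events, failback):
    """events: sequence of 'fail' / 'restore' for the paths to the default owner.
    failback=True models FIXED (returns to the preferred path), False models RR.
    Returns (number of trespasses, SP that owns the LUN at the end)."""
    owner = "default"
    trespasses = 0
    for event in events:
        if event == "fail" and owner == "default":
            owner = "peer"
            trespasses += 1
        elif event == "restore" and owner == "peer" and failback:
            owner = "default"
            trespasses += 1
    return trespasses, owner


single_outage = ["fail", "restore"]
intermittent = ["fail", "restore"] * 3

print("FIXED, single outage:", count_trespasses(single_outage, failback=True))
print("RR, single outage:   ", count_trespasses(single_outage, failback=False))
print("FIXED, intermittent: ", count_trespasses(intermittent, failback=True))
print("RR, intermittent:    ", count_trespasses(intermittent, failback=False))
```

In this model RR ends with the LUN on the peer SP, which is exactly the state that has to be cleaned up later with a manual trespass.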

     

    Personally, when I am with a client, I mention both possible choices: RR or FIXED (of course with anything ESX 4.0+ and at least FLARE 28.5 patch 704 they should be running ALUA, but there is never an argument there).  It would imho be a disservice not to mention both options, which are each valid, and leave it up to the client to choose.  Even our documentation mentions both solutions, and for every example where FIXED is recommended there is an equal number of statements where ROUND-ROBIN is suggested.  Because of the failback mechanism, many will suggest that FIXED is best practice, but in a well-managed environment ROUND-ROBIN can most certainly be implemented (except in an MSCS environment).

     

     

    CONFIGURATION

    =============

    1) ROUND-ROBIN
    a) Better distributes the load across the fabric than anyone can do manually with FIXED (guaranteed) by sending by default 1000 I/O down one optimal path then 1000 I/O down the other (never simultaneously though)

    b) However, you will need to manage trespassed LUNs.  Of course I don’t expect clients to do it in the GUI, which is cumbersome; instead I leave them with the following commands:

    naviseccli -h <SPA> trespass mine
    naviseccli -h <SPB> trespass mine

     

    c) I also remind clients, as tempting as it may be, to not enable “useANO=1” (use Active-Non Optimized); they will eventually read about it.
    By setting this, you are telling your hosts to include the non-optimal paths even in a healthy environment where all configured paths from the host to the VNX are available for I/O.  A non-optimal path would be a path from the host to the owning SP's peer then through the CMI (CLARiiON Messaging Interface) then to the owning SP.  By leaving it at 0 (default), then unoptimized paths reported by the array won't be included in the RR path selection until optimized paths become unavailable.  With ALUA configured all paths will show ACTIVE; however, only the optimal paths or those associated with the current SP owner will show ACTIVE (I/O).

     

    d) Also, there is a way of changing RR's default of 1000 I/Os.  I’m indifferent about it personally, but the client will eventually read about it.  EMC has a good whitepaper on the results of changing from 1000 to 1 and the effects on different I/O profiles.  I'll supply the command for the sake of completeness (ESX/ESXi 4.x syntax first, then the ESXi 5.x equivalent).

     

    esxcli nmp roundrobin setconfig --device <device UID> --type iops --iops <count>
    esxcli storage nmp psp roundrobin deviceconfig set --device <device UID> --type iops --iops <count>

     

    e) Make RR the default PSP (path selection policy) for the ALUA SATP (storage array type plugin)
    - Currently, FIXED is the default PSP when ALUA is used
    - Depending on the version of ESX, the command to change this behavior is as follows:

    ESX 4.x (reboot required):

    esxcli nmp satp setdefaultpsp --satp=VMW_SATP_ALUA_CX --psp=VMW_PSP_RR

    ESXi 5.x (reboot not required):

    esxcli storage nmp satp set -s VMW_SATP_ALUA_CX -P VMW_PSP_RR

     

    f) Install the Path Management feature of the Virtual Storage Integrator plug-in available from PowerLink via the following breadcrumb trail:

     

    Home > Support > Software Downloads and Licensing > Downloads T-Z > Virtual Storage Integrator (VSI)

     

    - In bulk, can manage the PSP for “EMC devices” (versus manually modifying individually per LUN on each host)
    - Keep in mind (whether or not you agree with the behavior) that this only affects what is currently presented, for instance, unless you change the default PSP with the commands above, any new LUNs that are presented will use FIXED and the admin will need to rerun the plugin (or of course, change the default behavior).

     

    While you are in PowerLink, you may also want to install the other relevant VSI features:

     

    - Storage Viewer

    - Unified Storage Management

     


    2) FIXED
    a) Unlike RR, this has a mechanism to failback (preferred path)

     

    b) However, FIXED has its own management concerns, as you have to select the preferred path manually (the VSI Path Management feature does not assist with the preferred path):
    - You need to make sure the path corresponds to the default owner, or else you unintentionally force a trespass
    - Furthermore, you need to do this for each host and for each LUN
    - Also, the assumption is that clients are manually balancing the LUNs as best they can so that one path (per host) is not utilized more than the other
    - It is fair to say that for each example of clients not managing their trespassed LUNs with RR and eventually running entirely on one SP, there are just as many examples of clients with misconfigured preferred paths.  For instance, imagine a scenario where host 1 has its preferred path for a LUN on SPA and host 2 has its preferred path for the same LUN on SPB.
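
The two FIXED pitfalls above (a preferred path that does not land on the default owner, and hosts that disagree about the same LUN) are easy to audit once the settings are written down. The sketch below assumes you have collected them by hand into two dictionaries; the host names, LUN names and data layout are hypothetical.

```python
def audit_preferred_paths(default_owner, preferred):
    """default_owner: {lun: 'SPA' | 'SPB'}
    preferred: {host: {lun: SP that the host's preferred path lands on}}
    Returns a list of (lun, problem, detail) tuples."""
    problems = []
    for lun, owner in sorted(default_owner.items()):
        chosen = {host: sps[lun] for host, sps in preferred.items() if lun in sps}
        if len(set(chosen.values())) > 1:
            problems.append((lun, "hosts disagree", chosen))
        mismatched = sorted(h for h, sp in chosen.items() if sp != owner)
        if mismatched:
            problems.append((lun, "not on default owner", mismatched))
    return problems


# Hypothetical inventory matching the scenario described above.
DEFAULT_OWNER = {"LUN_3": "SPA", "LUN_4": "SPB"}
PREFERRED = {
    "host1": {"LUN_3": "SPA", "LUN_4": "SPB"},
    "host2": {"LUN_3": "SPB", "LUN_4": "SPB"},
}

for problem in audit_preferred_paths(DEFAULT_OWNER, PREFERRED):
    print(problem)
```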

     


    So what would be the best solution?  It would be one that load balances and has visibility to the queue on the SP’s, can send I/O down the optimized (pool of) paths simultaneously, and has a fail-back mechanism.  Of course, PowerPath/VE offers this but is not a solution for all as it requires a vSphere Enterprise/Enterprise Plus license and a PowerPath/VE license (trial is available: emc.com/powerpath-ve-trial).

  • 2. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Sorry, a few corrections:

     

    1) "ALUA (Asymmetric Logical Unit Access)"

    Not the mess in the beginning of the response.

     

    2) Firstly, "VMW_SATP_* is reference..."

    Typo in just the beginning but correctly listed throughout

     

     

    Also, I meant to make a quick note above that ALUA (Failover Mode 4) is also one of the prerequisites for VAAI (vStorage APIs for Array Integration) support required by the host.  This hardware acceleration/offload feature and its primitives have been discussed in detail in other posts.

  • 3. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    AndMar

    Hello Chris,

     

    Thank you for your detailed answer and the URLs provided. To keep it short, I had not written down the whole story behind my question...

     

    The point is, I've opened an SR with VMware because I received a lot of disconnection warning messages from the VNX, and I found that the hosts (ESXi 5) are the cause of these disconnections. (The server is a Cisco UCS C200M2 with a Broadcom NetXtreme II 5709 quad-port NIC with iSCSI HBA/TOE.)

     

    They told me that ESXi 5 only supports the option mentioned above (hearing that was weird, because ESX 4.1 supported ALUA and other nice features); that's why I mentioned only VMW_SATP_CX. I was not considering ALUA as "configurable" in this circumstance.

     

    Anyway, imho the problem is related to the hardware and not to the configuration (and your answer helps me state that), because I've tested all the SATP and PSP options available, but I always get the same error messages in the vmkernel log.

     

    Moreover, I found that I'm not the first one experiencing problems with that NIC...

     

     

    2012-03-14T17:14:40.459Z cpu0:4804)bnx2i::0x41001360ad10: bnx2i_conn_stop::vmnic9 - sess 0x41000d70af48 conn 0x41000d70b2d0, icid 31, cmd stats={p=0,a=1,ts=20653,tc=20652}, ofld_conns 8

    2012-03-14T17:14:40.459Z cpu0:4804)iscsi_linux: [vmhba40: H:8 C:0 T:1] session blocked

    2012-03-14T17:14:40.459Z cpu8:5297)WARNING: LinScsi: SCSILinuxAbortCommands:1798:Failed, Driver bnx2i, for vmhba40

    2012-03-14T17:14:40.569Z cpu0:4804)bnx2i::0x41001360ad10: bnx2i_ep_disconnect: vmnic9: disconnecting ep 0x41001321a0b0 {31, 14dc00}, conn 0x41000d70b2d0, sess 0x41000d70af48, hba-state 1, num active conns 8

    2012-03-14T17:14:40.570Z cpu6:4102)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x2a (0x412400ec1c80) to dev "naa.60060160caf02e00b4845fd80148e111" on path "vmhba40:C0:T1:L3" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:EVAL

    2012-03-14T17:14:40.570Z cpu6:4102)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.60060160caf02e00b4845fd80148e111" state in doubt; requested fast path state update...

    2012-03-14T17:14:40.570Z cpu6:4102)ScsiDeviceIO: 2316: Cmd(0x412400ec1c80) 0x2a, CmdSN 0x4f0 to dev "naa.60060160caf02e00b4845fd80148e111" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

    2012-03-14T17:14:45.404Z cpu0:4804)<1>bnx2i::0x41001360ad10: conn update: icid 32 - MBL 0x40000 FBL 0x0MRDSL_I 0x20000 MRDSL_T 0x10000

    2012-03-14T17:14:45.405Z cpu0:4804)iscsi_linux: [vmhba40: H:8 C:0 T:1] session unblocked
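
When there are many such entries, the H:/D:/P: triplet (host, device and plugin status) in the NMP and ScsiDeviceIO lines can be pulled out with a short script. This is a minimal sketch; the function name is made up, and the sample line is copied from the log above.

```python
import re

# Matches the "H:0x8 D:0x0 P:0x0" status triplet. It deliberately requires the
# 0x prefix so the unrelated "[vmhba40: H:8 C:0 T:1]" address is not matched.
STATUS_RE = re.compile(r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)")


def scsi_status(line):
    """Return host/device/plugin status as integers, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {"host": host, "device": device, "plugin": plugin}


LINE = ('ScsiDeviceIO: 2316: Cmd(0x412400ec1c80) 0x2a, CmdSN 0x4f0 to dev '
        '"naa.60060160caf02e00b4845fd80148e111" failed H:0x8 D:0x0 P:0x0 '
        'Possible sense data: 0x0 0x0 0x0.')

print(scsi_status(LINE))
print(scsi_status("iscsi_linux: [vmhba40: H:8 C:0 T:1] session blocked"))
```

A non-zero host status together with a zero device status, as here, means the failure was reported on the host side rather than returned by the array as a SCSI error.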

     

    If you have any ideas about this, I would be grateful if you could share them with me.

     

    Thanks again

    Andrew

  • 4. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Thanks for the clarification.

     

    I was able to dig up the following from VMware Communities.  As you mentioned, many people are having issues when using the Broadcom iSCSI driver (configured as Hardware iSCSI).  While not ideal, the iSCSI Software Adapter is always an option for any dedicated iSCSI HBA/TOE card, and not surprisingly people have reported it as a workaround.  If you choose this option, remember to configure the iSCSI VMkernel port bindings, per best practice, as you would if they were generic 1GbE NICs.  ESXi 5 now provides a GUI to perform this task, but in 4.x it could only be performed via the CLI: esxcli swiscsi nic add

     

    NOTE: in ESXi 5.x, VMkernel port binding can still be done from the CLI with slightly modified syntax:  esxcli iscsi networkportal add

     

     

    It seems that Broadcom recently acknowledged the issue with a test driver, as suggested by one of the later responses in that thread, from 3/10/2012 (5 days ago):

     

    http://communities.vmware.com/thread/276107?start=0&tstart=0

     

    [...]

    I wrote a lot of emails from broadcom in this case. And?
    Finally a solution!

     

    I got a new test driver for the Broadcom iSCSI adapter.

     

    Now everything works as it should offloading and properly supports iscsi.

     

    [...]

     

     

    Finally, probably just a reminder, even though I only addressed the specific question you had regarding initiator settings (failover mode) and PSP, make sure you reference the two guides I noted earlier regarding iSCSI connectivity.  For instance the usual best practices for iSCSI connectivity:

     

    1) Separate subnets for each adapter and the corresponding SP ports they will connect to (also separate from the "mgmt" network)

    2) Disable Delayed Ack

    3) Review single vSwitch (multiple VMkernel ports while still maintaining separate subnets of course) vs. separate vSwitch for each NIC/VMkernel ports

    4) VMkernel Port binding (VMkernel to network adapters mapping)

    etc.
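
The first item, separate subnets per adapter, is the kind of rule that is easy to get wrong on paper, so here is a minimal sketch that checks an addressing plan before anything is configured. All addresses are made-up examples and the function name is hypothetical; it uses only Python's standard ipaddress module.

```python
import ipaddress


def check_iscsi_subnets(adapters, mgmt_cidr):
    """adapters: {VMkernel IP in CIDR form: [SP port IPs that adapter logs in to]}
    Returns a list of human-readable problems (empty if the plan looks sane)."""
    mgmt = ipaddress.ip_network(mgmt_cidr)
    problems = []
    seen = set()
    for vmk_cidr, sp_ports in adapters.items():
        net = ipaddress.ip_interface(vmk_cidr).network
        for sp_ip in sp_ports:
            if ipaddress.ip_address(sp_ip) not in net:
                problems.append(f"{sp_ip} is not in {net}")
        if net.overlaps(mgmt):
            problems.append(f"{net} overlaps management network {mgmt}")
        if net in seen:
            problems.append(f"{net} is used by more than one adapter")
        seen.add(net)
    return problems


# Made-up example plans: one subnet per adapter, each reaching one port per SP.
GOOD = {
    "10.0.1.11/24": ["10.0.1.100", "10.0.1.101"],
    "10.0.2.11/24": ["10.0.2.100", "10.0.2.101"],
}
BAD = {
    "10.0.1.11/24": ["10.0.1.100"],
    "10.0.1.12/24": ["10.0.2.100"],
}

print(check_iscsi_subnets(GOOD, "192.168.10.0/24"))
print(check_iscsi_subnets(BAD, "192.168.10.0/24"))
```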

  • 5. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    AndMar wrote:

     

    The point is, I've opened an SR with VMware because I received a lot of disconnection warning messages from the VNX, and I found that the hosts (ESXi 5) are the cause of these disconnections. (The server is a Cisco UCS C200M2 with a Broadcom NetXtreme II 5709 quad-port NIC with iSCSI HBA/TOE.)

     

    They told me that ESXi 5 only supports the option mentioned above (hearing that was weird, because ESX 4.1 supported ALUA and other nice features); that's why I mentioned only VMW_SATP_CX. I was not considering ALUA as "configurable" in this circumstance.

     

    Interesting. If they are suggesting that only a Failover Mode of 1 (PNR) (or just VMW_SATP_CX) is supported with ESXi 5, then I will only say that this is not true as a general statement.  It would imply that you couldn't benefit from the hardware offload of VAAI, as ALUA is one of its prerequisites.  However, I definitely don't want to second-guess their comment, so I'll assume there is something specific to your environment that disqualifies it (though I am personally not seeing the culprit).  Without looking through the VMware compatibility guides, from EMC's perspective we support the following for iSCSI adapters:

    "All 1 Gb/s or 10 Gb/s NICs for iSCSI connectivity, as supported by the Server/OS vendor."

     

    However, still search the "ESM by Host" PDF for any reference to this specific adapter (NOTE: you won't find it separately when building results from "Advanced Query" or "Solutions and Wizards" as you will with FC HBA's):

     

    https://elabnavigator.emc.com/vault/pdf/esm_by_host.pdf

     

    You'll only find the following comment:

     

    [..]

    Broadcom iSCSI boot is supported with following adapters

    - Broadcom 57710 based cards

    - Broadcom 57711 based cards

    - Broadcom 5708

    - Broadcom 5709

    [..]

  • 6. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Simply wanted to mention that an EMC KB article is "In Progress" and should be made public shortly.  It makes mention of the ESX Driver: bnx2i (which the OP is using as noted in the pasted error logs above)

     

    emc290457: "iSCSI Logout Info=0x0120071d [Target NopTimeout] with ESX 5.0 Host and NetXtreme II NIC"

     

    [...]

    When installing ESX(i) 5 and using the NetXtreme II NIC with TOE based on Broadcom 57711 chip (ESX Driver bnx2i) make sure to download and install latest Driver CD for Broadcom NetXtreme II Network/iSCSI/FCoE Driver. This Driver CD is available in the download Section from VMware.

    [...]

  • 7. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Baif

    vSphere 5.1 and EMC Storage Multipathing

     

    kb.vmware.com/kb/2034799 Load balancing using Round Robin multipathing policy on EMC VNX arrays on vsphere 5.1

    kb.vmware.com/kb/2034797 Load balancing using Round Robin multipathing policy on EMC Symmetrix arrays on vsphere 5.1

    PowerPath/VE fails to load on vSphere 5.1

     

     

    kb.vmware.com/kb/2034796 VMware and EMC have identified two issues with PowerPath/VE 5.7 and VMware vSphere 5.1.

  • 8. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Yes, now that vSphere 5.1 is out, a few things change when talking about vSphere Native Multipathing, as Round-Robin is now the default PSP for ALUA/failover mode 4 (or rather for the VMW_SATP_ALUA_CX SATP), whereas the default was FIXED in 5.0 and FIXED "with array preference" in 4.x.

     

    RR PSP also now provides proper path rebalancing when used with VNX OE 32 (and a mentioned backport to FLARE 30 in the comments section of the following article from Chad).  Baif, please review the following articles for more detail.

     

    http://virtualgeek.typepad.com/virtual_geek/2012/08/vmworld-2012-vmware-emc-storagethe-best-gets-better.html

     

    http://velemental.com/2012/09/07/fixedround-robin-in-5-1-and-a-simple-powercli-block-pathing-module/

     

     

    What I find interesting is that in some cases before these recent enhancements, it was found (as seen in Chad's video) that with Fixed the paths may not have been rebalanced properly after all, and manually trespassing LUNs back was required.  Therefore, our argument before these enhancements/fixes that Fixed provided failback didn't always hold.  Interesting.

  • 9. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    A1exp

    I've got a cx3-40 that I can only run in failover mode 1 (vmw_satp_cx).

     

    Having read all the info about round robin in 5.1 I can't see any issues with swapping to RR but I can't find anybody that mentions it's okay for active/passive, all talk is about ALUA arrays.

     

    Are there any issues with going to RR with vmw_satp_cx using 5.1?

  • 10. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    A1exp,

     

    Firstly, welcome to the forums and thank you for being an EMC customer.

     

    As you have probably already identified (but I'm obligated to mention it), ESXi 5.x is not a supported/validated solution with the CX3.  Part of it has to do with the array being EOL (but not yet EOPS or EOSL).  You will notice its absence when you review:

     

    1) VMware HCL (in reference to storage arrays):

    http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san

     

     

    2) E-Lab Interoperability

    https://elabnavigator.emc.com/do/navigator.jsp

     

    a) Via "Advanced Query", attempts to gather "Base Connectivity" results for: any CX3 models and ESXi 5.1 won't return results

     

    b) "Simple Support Matrix" will remind you that an RPQ (Request for Product Qualification) is required for either NMP or PowerPath/VE

     

    With that formality out of the way, the reason you are only finding (recent) discussions related to ALUA is that, moving forward, "All VMware HCL support and all E-Lab testing is now only done with failovermode 4" **

     

     

    Also, you have correctly identified that while the CX3 supports ALUA, which was introduced with FLARE 26, integration with ESX/ESXi 4.x+ requires SCSI-3 reservation support, which was introduced in a later revision; FLARE 26 only supports SCSI-2 reservations.  As mentioned in the original post above, FLARE 04.28.000.5.704+ technically provides that mechanism (both SCSI-2 and SCSI-3 reservations) and was the minimum supported at the time, but recently "E-Lab and VMware jointly made the decision to support only ALUA on FLARE R30 and VNX OE for Block" **.

     

     

    Finally, to your specific question: while MRU is the recommended and default PSP when using failover mode 1, Round-Robin is also a valid configuration even for arrays that don't support ALUA (w/ SCSI-3 reservations), except of course on 3.x, where it was experimental.  The only thing we actually call out and explicitly don't recommend when using failover mode 1 (PNR) is Fixed, due to the possibility of path thrashing, where LUNs constantly trespass between SPs.  ALUA, as a reminder, minimizes LUN trespasses by instead routing I/O via the non-optimal path (up to the decision to do an implicit trespass).

     

    You can find just a few of many examples of this documented as follows:

     

    1) emc301503: "What are the recommended path failover policies for VMware ESX/ESXi NMP (native multipath plugin) ?"

     

    For Active/Active and ALUA enabled storage arrays, fixed is the default policy. RoundRobin is supported and can be a good option to improve performance.

     

    For Active/Passive arrays MRU is the recommended policy. RoundRobin is supported and can be used to improve performance.

     

     

    2) Host Connectivity Guide for VMware ESX Server (search support.emc.com)

     

    "For VNX series and CLARiiON systems not capable of ALUA
    failover mode support, the MRU or Round Robin policies are
    recommended to avoid path thrashing."

     

    WARNING: I often see the following passage from the Connectivity Guide quoted as suggesting Round-Robin is supported with only ALUA, but that is an incorrect assumption.  It is simply talking about choices not restrictions:

     

    The use of ALUA failover mode allows users to choose from two
    failover policies for VMware Native Multipathing (NMP):
    - Fixed
    - Round Robin

     

     

    On the other hand, Fixed is the default and generally recommended PSP for arrays that support failover mode 4 with ESX/ESXi 4.x and ESXi 5.0.  However, barring any bugs of course, Round-Robin is also a valid configuration, which was my point in the lengthy post above.  As we all know now, that changed with ESXi 5.1, where Round-Robin is the default PSP for the SATP VMW_SATP_ALUA_CX, as path restore was (finally) introduced.

     

    Keep in mind, there are certain applications that do not support Round-Robin even though all else does.  The often mentioned example is MSCS, which doesn't support this PSP for the required RDMs.

     

     

    ** For the 2 quoted comments above you can refer to the KB article:

     

    emc99467: "What are the Initiator, Arraycommpath, and Failovermode settings for PowerPath, DMP, PVLinks, and native failover software?"

     

    Specifically refer to the embedded link to the PDF which has been updated with:

     

    1) All VMware HCL support and all E-Lab testing is now only done with failovermode 4

    2) E-Lab and VMware jointly made the decision to support only ALUA on FLARE R30 and VNX OE for Block

  • 11. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    A1exp

    Thanks for taking the time to write the post Chris, that's very helpful.

     

    We'll be moving up to a VNX shortly but we've got several cx3's we're trying to squeeze some life out of!

  • 12. Re: Failover mode and Initiator Type best Practices for VNX5300 VSphere 5 iSCSI
    Christopher Imes

    Actually, I just caught one important note you made.  You are running ESXi 5.1, and that is also when Round-Robin introduced path restore.  Unfortunately, everything above holds true except on ESXi 5.1.  Just as I mentioned that we recommend not using Fixed except with ALUA due to path thrashing, the same technically applies to Round-Robin as implemented in ESXi 5.1.

     

    Therefore, the following statement:

     

    "For VNX series and CLARiiON systems not capable of ALUA
    failover mode support, the MRU or Round Robin policies are
    recommended to avoid path thrashing."

     

    would apply only to ESX/ESXi 4.x and ESXi 5.0 (again, ESX/ESXi 3.5 Round-Robin was experimental).  It cannot be recommended with ESXi 5.1 (then again, as noted above, CX3 with ESXi 5.x isn't a validated/supported configuration anyway).
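
Pulling the thread together, the PSP guidance given above for each failover mode and ESX/ESXi release can be summarized as a lookup. This only restates what was said in this thread; it is not an official support matrix, the short labels (MRU, FIXED, FIXED_AP, RR) and the function name are illustrative, and supportability of a given array still has to be checked in E-Lab and the VMware HCL.

```python
# (failover mode, ESX/ESXi release) -> guidance as stated in this thread.
# FIXED_AP stands for "FIXED with array preference".
GUIDANCE = {
    (1, "4.x"): {"default": "MRU", "also_valid": ["RR"], "avoid": ["FIXED"]},
    (1, "5.0"): {"default": "MRU", "also_valid": ["RR"], "avoid": ["FIXED"]},
    (1, "5.1"): {"default": "MRU", "also_valid": [], "avoid": ["FIXED", "RR"]},
    (4, "4.x"): {"default": "FIXED_AP", "also_valid": ["FIXED", "RR"], "avoid": []},
    (4, "5.0"): {"default": "FIXED", "also_valid": ["RR"], "avoid": []},
    (4, "5.1"): {"default": "RR", "also_valid": ["FIXED"], "avoid": []},
}


def psp_guidance(failover_mode, release):
    """Look up the guidance for failover mode 1 or 4 on '4.x', '5.0' or '5.1'."""
    try:
        return GUIDANCE[(failover_mode, release)]
    except KeyError:
        raise ValueError(f"no guidance recorded for FM{failover_mode} on {release}")


print(psp_guidance(1, "5.1"))
print(psp_guidance(4, "5.0"))
```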