ETA emc296739: CLARiiON, VNX: Improperly shutting down a CLARiiON or VNX array can lead to data loss
Safely Shutting Down Your CLARiiON or VNX Array
What Is the Issue?
CLARiiON and VNX arrays have multiple features in place to protect your data during most shutdown methods, both planned and unplanned. Standby power supplies (SPSs) protect the enclosure(s) that hold the storage processors, as well as the first tray of disks (bus 0, enclosure 0). Extra private space is reserved for the “vault” on the first four disks of this battery-protected enclosure. If power is lost while write cache is still enabled, write cache performs an emergency dump to the “vault,” which ensures that data that has not yet been written to disk remains protected.
While most methods of shutting down an array fully protect user data, a small number of operations can result in the loss of data in write cache. If write cache is lost, any LUNs that had data in write cache are marked “cache dirty,” and that data cannot be assigned to disk. This creates a temporary “data unavailable” (DU) situation that must be cleared and recovered by Engineering, and the most recent writes to the array may be lost.
Things to Avoid
Splitting LUNs across bus 0, enclosure 0 and other enclosures should be done with great caution or avoided altogether. In a total power failure, the SPS supplies power to the storage processors and to the disks in bus 0, enclosure 0, which contains the vault. This allows the storage system to save the contents of write cache to disk. However, power to the non-vault Disk Array Enclosures (DAEs) is not maintained, because they have no battery backup. As a result, if a LUN is bound across bus 0, enclosure 0 and other enclosures such that it can stay up in degraded mode after losing one or more of its disks outside bus 0, enclosure 0, that configuration can incur a full rebuild after every storage system power down. See solution emc98039 for more information on LUN binding considerations across enclosures.
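The rebuild risk described above comes down to a simple count: how many of a LUN's disks sit outside bus 0, enclosure 0, compared with how many disks its RAID type can lose. The sketch below is a hypothetical planning aid, not an EMC tool. It assumes disk IDs in Bus_Enclosure_Disk form (for example "0_0_4"), assumes that every disk outside bus 0, enclosure 0 loses power at once, and uses an assumed fault-tolerance table (RAID 1/0 is left out because its tolerance depends on which mirror pairs lose disks). The names `Lun`, `FAULT_TOLERANCE`, and `rebuilds_after_power_down` are illustrative only.

```python
from dataclasses import dataclass

# Assumed fault tolerance: how many member disks each RAID type can lose
# while the LUN stays up in degraded mode. RAID 1/0 is omitted because its
# tolerance depends on which mirror pairs the lost disks belong to.
FAULT_TOLERANCE = {"r0": 0, "r1": 1, "r3": 1, "r5": 1, "r6": 2}

# Bus 0, enclosure 0: the only disk enclosure kept powered by the SPS.
VAULT_ENCLOSURE = (0, 0)


@dataclass
class Lun:
    name: str
    raid_type: str    # key into FAULT_TOLERANCE
    disks: list[str]  # disk IDs in Bus_Enclosure_Disk form, e.g. "0_0_4"


def enclosure_of(disk_id: str) -> tuple[int, int]:
    """Return (bus, enclosure) for a Bus_Enclosure_Disk disk ID."""
    bus, enclosure, _slot = (int(part) for part in disk_id.split("_"))
    return (bus, enclosure)


def rebuilds_after_power_down(lun: Lun) -> bool:
    """True if the LUN would stay up, degraded, after every disk outside
    bus 0, enclosure 0 loses power; that degraded state is what leads to
    a full rebuild once power returns."""
    in_vault = sum(1 for d in lun.disks if enclosure_of(d) == VAULT_ENCLOSURE)
    outside = len(lun.disks) - in_vault
    if in_vault == 0 or outside == 0:
        return False  # the LUN is not split across the vault enclosure
    return outside <= FAULT_TOLERANCE[lun.raid_type]


if __name__ == "__main__":
    luns = [
        Lun("LUN_10", "r5", ["0_0_5", "0_0_6", "0_0_7", "0_0_8", "1_0_0"]),
        Lun("LUN_11", "r5", ["1_0_1", "1_0_2", "1_0_3", "1_0_4", "1_0_5"]),
        Lun("LUN_12", "r5", ["0_0_9", "0_0_10", "1_0_6", "1_0_7", "1_0_8"]),
    ]
    for lun in luns:
        print(lun.name, rebuilds_after_power_down(lun))
```

With these sample LUNs, only LUN_10 is flagged: it is RAID 5 with four disks in bus 0, enclosure 0 and one disk elsewhere, so it can run degraded after the other DAEs lose power. LUN_11 has no disks in the vault enclosure, and LUN_12 has more disks outside it than RAID 5 can lose, so neither stays up in degraded mode.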
Standby Power Supply (SPS) units must be cabled and configured properly so that they supply power to the storage processor enclosures and to the first enclosure of disks (bus 0, enclosure 0), which contains the vault drives. Weekly SPS tests verify that the SPS is configured properly. An improper configuration can cause cached data to be lost if power to the vault drives is not maintained.
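The cabling requirement above can be expressed as a simple inventory check: every component that must survive a power loss long enough for the cache dump needs a feed from an SPS. This is a hypothetical sketch for reviewing a site's power records, not an EMC utility. The component names, the "SPS A"/"SPS B"/"PDU" labels, and the rule that at least one SPS-backed feed is sufficient are all assumptions made for illustration.

```python
# Components that must have an SPS-backed feed (names are illustrative).
REQUIRED_ON_SPS = {"SP enclosure", "DAE bus 0 enclosure 0"}


def sps_coverage_gaps(power_map: dict[str, set[str]]) -> set[str]:
    """Return the required components with no feed from an SPS.

    power_map maps each component to the names of the power sources that
    feed it; any source whose name starts with "SPS" is treated as
    battery-backed (an assumption of this sketch)."""
    return {
        component
        for component in REQUIRED_ON_SPS
        if not any(src.startswith("SPS") for src in power_map.get(component, set()))
    }


if __name__ == "__main__":
    power_map = {
        "SP enclosure": {"SPS A", "SPS B"},
        "DAE bus 0 enclosure 0": {"PDU 1", "PDU 2"},  # miscabled: no SPS feed
        "DAE bus 1 enclosure 0": {"PDU 1", "PDU 2"},  # expected: not SPS-backed
    }
    print(sps_coverage_gaps(power_map))  # {'DAE bus 0 enclosure 0'}
```

A check like this only reviews recorded cabling; the weekly SPS test remains the way to confirm the real configuration.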
Safe Power Down Methods
Approved methods of shutting down each array model can be found in the customer documentation, located here:
- CLARiiON - http://powerlink.emc.com/public/CLARiiON_All_Models/FAQs/welcome.htm
- VNX - https://mydocs.emc.com/VNX
However, the Procedure Generator should be the first choice for CLARiiON and VNX power-down and power-up procedures. The Procedure Generator is updated continually, whereas the other documentation may at times be out of date.
In an emergency power down where you have very little time to prepare, the safest method is generally to cut power to the racks into which all of the array components should be plugged. This removes power from all components, including the SPS, simultaneously. The SPS then detects the power loss and allows write cache to dump safely to the vault drives.
In a planned power down, it is safest to zero out and disable write cache before shutting off power. This ensures that write cache will not be lost, and it prevents LUNs from being marked cache dirty and taken offline.
The most common incorrect way to power down an array, and one that can result in “data unavailable” and a prolonged recovery, is removing power directly from the storage processors or the vault drives (bus 0, enclosure 0) without giving write cache an opportunity to dump. Never remove the cable between an SPS unit and the storage processors or the vault drives, and never remove both storage processors at the same time.
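The emergency, planned, and incorrect power-down cases above can be summarized in a toy model of which LUNs end up marked cache dirty. This is a hypothetical sketch of the rules stated in this article, not a description of array firmware. It assumes the SPS is cabled correctly, and the labels `RACK_CUT` and `DIRECT_CUT` and the function `luns_marked_cache_dirty` are illustrative names only.

```python
# Hypothetical labels for the two power-down methods being compared.
RACK_CUT = "rack"      # power cut at the racks feeding every component, SPS included
DIRECT_CUT = "direct"  # power pulled straight from the SPs or bus 0, enclosure 0


def luns_marked_cache_dirty(dirty_pages_by_lun: dict[str, int], method: str) -> set[str]:
    """Return the LUNs that would be marked cache dirty after a power down.

    dirty_pages_by_lun maps each LUN to the number of write-cache pages
    not yet written to disk when power is removed."""
    if method == RACK_CUT:
        # The SPS sees the power loss and keeps the SPs and vault drives up
        # long enough for write cache to dump to the vault.
        return set()
    if method == DIRECT_CUT:
        # No chance to dump: every LUN with unwritten cached data is affected.
        return {lun for lun, pages in dirty_pages_by_lun.items() if pages > 0}
    raise ValueError(f"unknown power-down method: {method!r}")


if __name__ == "__main__":
    dirty = {"LUN_10": 128, "LUN_11": 0, "LUN_12": 7}
    print(sorted(luns_marked_cache_dirty(dirty, DIRECT_CUT)))  # ['LUN_10', 'LUN_12']
    print(luns_marked_cache_dirty(dirty, RACK_CUT))            # set()
    # Planned shutdown: write cache zeroed out and disabled first.
    flushed = {lun: 0 for lun in dirty}
    print(luns_marked_cache_dirty(flushed, DIRECT_CUT))        # set()
```

The last case only shows why zeroing out write cache protects LUNs; it does not make pulling power directly from the storage processors or vault drives an approved procedure.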
For integrated arrays (especially Unified VNX), another common issue is completing the correct shutdown procedure for only the Block or only the File component of the array without being aware of the relationships and dependencies between the two. For that reason, following the recommended procedures referenced above for integrated VNX systems is strongly advised.
Recovery from Cache Dirty/Unassigned LUNs
In the event that you have offline LUNs marked “cache dirty can’t assign” after a storage system power cycle, contact EMC Technical Support to recover the LUNs.
For more information, refer to EMC Knowledgebase article emc296739.