This post is dedicated to my friends at Adlon.
We at EMC normally talk a lot about disaster recovery. We have plenty of demos on it. We live DR. At least at our customers' sites.
And in our own production. No outages at all.
But sometimes the shoemaker's son goes barefoot.
Not because he does not have the tools, but maybe because he gets lazy when it comes to himself :-)
I am outing myself here :-) for my demo lab ...
Due to a lack of DR hosts (and not a lack of storage :-) ), not all of my Hyper-V VMs are disaster protected by a DR host or cluster. (Well, supporters can donate hosts if they want. There is enough open space on my EMC cycling shirt for sponsors.)
So, does my demo environment really need to be disaster protected? Not really, it's a sandbox. I could reinstall it from scratch ...
But what if a disaster really happens?
When does it happen?
Do I need the lab when it happens?
Do I want to reinstall from scratch? Not really.
It happened to me yesterday.
One day before vacation ...
Here is briefly what happened:
Thursday, 1pm, partner workshop with Adlon, one of our Cloud OS partners, in Ravensburg
My personal disaster happens.
We had already spent half a day on integrations from EMC into Microsoft, like ESI and PowerShell.
Then it was time to do some Azure Pack and SCVMM demos.
I tried to connect to my SCVMM host, but the console failed.
Pretty soon I figured out that the Hyper-V host running SCVMM and one node of my SQL Servers had crashed.
Well, it did not really crash; it was in a dimension somewhere between here and the cloud. I still received pings, but remote management via WMI/CIM/WSMan no longer worked.
OK, plan B: I ran through the demos with videos, a plan I normally don't like.
Friday, 10am, home office
6 hours till vacation, and I do not want to leave my lab in a non-working state.
Encouraged by the Adlon folks, I thought it would be a good idea to practice what I am always preaching.
Yes, dogfooding: eat your own meal! Do your DR!
Also, I had to finish a presentation on SCVMM with some screenshots, so I need my SCVMM VM.
The easy way would be to power the failed host off and on.
But how about doing a disaster test rather than a "reboot"?
My failed host, Agent-J, is still only reacting to pings. The lab admin is not available, and unfortunately I currently do not have KVM access to that host.
What are my options? I have 17 VMs on that host, 5 of them guest-clustered. Good, ignore them.
The others are application replicated (Exchange machines).
But one is not protected.
My SCVMM host. Normally not that important ... unless you use some self-service stuff like Azure Pack :-)
The last backup is from Tuesday. Could be plan C.
Plan B: Unmask the production LUN from the host and present it to another host. Might be an option, but if data gets corrupted ...
Plan A: The host is running on a RecoverPoint-protected VNX5300 array. Why not replicate the LUN to a DR site and do some testing first?
Obviously, the LUN is not in a CG (consistency group) right now. No problem.
Step 1: Create a CG using ESI's RecoverPoint plugin. Pretty simple: it creates a DR LUN on my array in the DR site.
(You may notice the red cross for my production host in the picture :-) )
Creating the CG took only a few minutes, plus some more minutes to sync the 2 TB of data to the remote array.
I waited until the 2 TB were in sync.
Not too long, but a good time to grab a coffee.
A few weeks ago I blogged about something I called SRM_4_Hyperv. A good chance to test this thing now in a real-world disaster.
Rather than running the full automation, I wanted to test the individual steps to verify that everything works in a real disaster :-)
RecoverPoint has a PowerShell integration through EMC's ESIPSToolKit, so the automation is a no-brainer.
So I ran the script in the PowerShell ISE.
The first thing I test is in which of my 2 sites production is currently running for that host/volume:
Fine, production is on Agent-J, array VNX5300C, protected by RecoverPoint site RPA_C.
Then I check whether I can do an orderly shutdown/unmount of the production host/volumes:
Since the host is no longer manageable, I enter my forced process.
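The decision between the orderly and the forced path can be sketched roughly like this. It is a minimal illustration in Python, not the actual SRM_4_Hyperv script (which is PowerShell). The function names, the three state labels and the idea of probing the default WinRM HTTP port 5985 are assumptions made for the example.

```python
import socket


def tcp_port_open(host: str, port: int = 5985, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    5985 is the default WinRM (WSMan) HTTP port; using it as the
    management probe is an assumption of this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def host_state(ping_ok: bool, management_ok: bool) -> str:
    """Classify a host from two health signals (hypothetical labels)."""
    if management_ok:
        return "healthy"
    # Pings alone say nothing about the management stack.
    return "hung" if ping_ok else "down"


def failover_mode(state: str) -> str:
    """Only a manageable host can shut down its VMs and unmount cleanly."""
    return "orderly" if state == "healthy" else "forced"


if __name__ == "__main__":
    # Agent-J on that Friday: pings came back, WMI/CIM/WSMan did not.
    state = host_state(ping_ok=True, management_ok=False)
    print(state, "->", failover_mode(state))
```

In a real run, `ping_ok` would come from an ICMP check and `management_ok` from something like `tcp_port_open("agent-j")` or an actual WSMan request.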
This is the Point where i enable the Volume Access on the Remote site.
Now comes the neat trick: I do not need to specify a DR host up front. I can select it dynamically, based on the DR capacity in my DR site.
In this Example, Agent-K was selected as the DR Host.
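A dynamic selection like this could look roughly as follows. This is a hedged Python sketch, not the logic of my script: the selection criterion (most free memory among reachable hosts), the `DrHost` type and all numbers are assumptions, and apart from Agent-K the host names are invented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrHost:
    name: str
    free_memory_gb: float
    reachable: bool = True


def select_dr_host(hosts: Sequence[DrHost],
                   required_memory_gb: float) -> Optional[DrHost]:
    """Pick the reachable host with the most free memory that still fits.

    Returns None when no host in the DR site has enough capacity.
    """
    candidates = [
        h for h in hosts
        if h.reachable and h.free_memory_gb >= required_memory_gb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_memory_gb)


if __name__ == "__main__":
    # Made-up inventory of the DR site; only the name Agent-K is real.
    dr_site = [
        DrHost("Agent-K", free_memory_gb=96),
        DrHost("dr-host-2", free_memory_gb=32),
        DrHost("dr-host-3", free_memory_gb=128, reachable=False),
    ]
    chosen = select_dr_host(dr_site, required_memory_gb=16)
    print(chosen.name if chosen else "no DR capacity available")
```

Other criteria (CPU load, number of running VMs, free paths to the array) would slot into the same filter-then-rank shape.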
After presenting and rescanning the LUNs, the volume gets discovered and mounted on Agent-K.
Now is the best time for a selective import. One method is automated testing of the VM configs using Compare-VM. Ben Armstrong has written a good description of that. This is what I normally use in my script when I do the automated failover.
When I do a selective failover, I prefer to use the Import wizard in Hyper-V Manager:
I did the above for my SCVMM VM. Since the machine could not be shut down remotely and files were open, the guest needed to run Check Disk once to get into a consistent state.
After that my VM was up and running.
After confirming a good state, it is a good time to fail over the complete replication to the remote site and replicate back to the old production:
For people who do not know RecoverPoint: when doing a failover, we first take the last point in time of synchronization. That guarantees that every IO has made it to the remote site. If, for whatever reason, that image does not work (e.g. a rolling disaster), we can choose from
A: A consistent bookmark (triggered by backups like VSS, self-defined bookmarks, events etc.)
B: Any point in time, on a timescale down to microseconds
This gives us by far the most granular disaster recovery points!
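The fallback order described above can be sketched like this. It is an illustrative Python model only, not RecoverPoint's API: the `RecoveryPoint` type, the `kind` labels, trying bookmarks before arbitrary points in time, and all timestamps are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional

# Lower number = tried first. Preferring bookmarks over arbitrary
# points in time is an assumption of this sketch.
PRIORITY = {"latest": 0, "bookmark": 1, "point_in_time": 2}


@dataclass(frozen=True)
class RecoveryPoint:
    timestamp: datetime
    kind: str  # "latest", "bookmark" or "point_in_time"


def pick_recovery_point(
    points: Iterable[RecoveryPoint],
    image_is_usable: Callable[[RecoveryPoint], bool],
) -> Optional[RecoveryPoint]:
    """Return the first usable image: latest, then bookmarks, then any PiT."""
    # Stable sort: newest first, then grouped by kind priority.
    newest_first = sorted(points, key=lambda p: p.timestamp, reverse=True)
    ordered = sorted(newest_first, key=lambda p: PRIORITY[p.kind])
    for point in ordered:
        if image_is_usable(point):
            return point
    return None


if __name__ == "__main__":
    # Made-up rolling disaster: everything written from 10:10 on is bad.
    corruption_start = datetime(2014, 1, 1, 10, 10)
    points = [
        RecoveryPoint(datetime(2014, 1, 1, 10, 15), "latest"),
        RecoveryPoint(datetime(2014, 1, 1, 10, 12), "bookmark"),
        RecoveryPoint(datetime(2014, 1, 1, 9, 0), "bookmark"),
        RecoveryPoint(datetime(2014, 1, 1, 10, 9, 59), "point_in_time"),
    ]
    chosen = pick_recovery_point(points, lambda p: p.timestamp < corruption_start)
    print(chosen)
```

In practice the `image_is_usable` test is the manual part: mount the image on the DR host, import the VM and see whether it comes up.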
To sum up:
Hyper-V is rock solid and fully crash recoverable with 2012 R2. You may also want to consider VM checkpoints in combination with array-based snapshots or bookmarks for consistency points, but a crash recovery always works.
BTW: It took me 20 minutes during breakfast to sync the 2 TB of data and do the assisted failover to my remote site.
The original site LUN is currently blocked from host access, so if the host reboots, it will not be able to start the failed-over machines.
Having a DR strategy: good
Having Hyper-V VMs protected by RecoverPoint: priceless
Friday, 11am. Work: done. DR: done. Ready to go off skiing for a week ...