Well, my question isn't really about troubleshooting, but if you have the time I'd appreciate the advice. If this is not the appropriate venue for this question, please say so.
I'm new to VMware/VNXe. I'm currently replacing some aging hardware (some of it 10+ years old). I recently purchased a VNXe3100 and built a 3-node VM cluster (ESXi 5). I'm only likely to have about 8-10 VMs, all lightly loaded. I haven't put anything critical on it yet (just a print server where I can revert back to physical hardware easily if needed).
I have been looking around and couldn't find much in the way of best practices or rules of thumb, e.g. is one large pool with one large VMFS volume a "bad" thing to do?
My setup is:
Two RAID5 sets on the VNXe, one is allocated to an iSCSI server and the other to a shared file server for CIFS shares.
One large data pool for each storage type.
I do not have snapshots enabled on the storage pool used for the iSCSI server. The recommended snapshot space of 235% seemed like a lot of storage to give up for what I saw as little gain. Correct me if I'm wrong here, but when would one really be likely to recover a snapshot of a VMFS volume? I would think I'd be much better off with snapshots on the VMware side, so I could roll back an individual VM rather than the entire VMFS volume.
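For what it's worth, the reserve math that scared me off looks like this (a quick sketch; the 1TB pool size is just an example figure, and 235% is the reserve the wizard recommended):

```python
# Quick sketch of what the recommended snapshot reserve costs.
# The 1 TB pool size is just an example, not my actual configuration.
pool_usable_gb = 1000        # space actually presented to the iSCSI server
reserve_pct = 235            # recommended snapshot reserve

reserve_gb = pool_usable_gb * reserve_pct / 100
total_gb = pool_usable_gb + reserve_gb

print(f"Snapshot reserve:        {reserve_gb:.0f} GB")   # 2350 GB
print(f"Total capacity consumed: {total_gb:.0f} GB")     # 3350 GB
```

So more than twice the usable space goes to the reserve, which is why I skipped it.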
Just wanted to make sure I'm not doing anything stupid before I put production VMs on here and find out the only recourse is to delete the entire datastore and start over.
Thanks for your question.
How many disks do you have in it, and what type?
We are using 28-disk R5 pools on a 3300, and soon we will have a 3100 with 20-disk R5 pools. On the 3300 we are using 2TB VMFS volumes. The issue with big volumes is that you tend to put too many VMs on them. 8-10 VMs should be fine, depending on the load of course. I would personally go with the bigger pools and not with the smaller RGs.
On the snapshots, I agree with you.
I have a 3100 with dual SPs, the extra 4x1GB Ethernet modules, and eleven 600GB SAS disks (two 5-disk R5 sets with a hot spare).
I have one set allocated to iSCSI and the other to shared folders (each on a different SP). It seemed like a good idea to keep the iSCSI and shared folders on separate spindle sets - do you agree? I have the two base Ethernet ports on each SP serving the iSCSI server (set up more or less per the VNXe HA whitepaper, using two HP ProCurve 2910 switches). The shared folder server is using ports on the Ethernet module.
So at the moment, given the size of my environment, you'd agree that just making one large (1.5-2TB, thick provisioned) VMFS volume with no snapshotting is a reasonable decision? In the future, if I need more space, I'll add a disk enclosure and another R5 set and create a second VMFS volume rather than growing the first.
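To put numbers on that sizing, here's the rough capacity arithmetic for one of my 5-disk R5 sets (raw figures only; the formatted capacity the array actually presents will come out somewhat lower):

```python
# Rough usable capacity of one 5-disk RAID 5 set of 600 GB drives.
# Raw arithmetic only; formatted capacity on the array will be lower.
drive_gb = 600
disks = 5
parity_drives = 1            # RAID 5 gives up one drive's worth per set

usable_gb = (disks - parity_drives) * drive_gb
print(f"Usable per R5 set: {usable_gb} GB")   # 2400 GB raw
```

So a 1.5-2TB thick volume should fit comfortably on one set.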
For multipathing, round robin or most recently used?
I'm planning backups with Backup Exec (using the NDMP plugin for the CIFS server on the VNXe, the virtual infrastructure agent to back up the VM disks for DR, as well as agents in the VMs for file-level backup), going to LTO5 tape.
Any other things I should watch out for as an ignorant newbie?
Your question is perfect for this forum, and I love talking about these kinds of details. I'll focus on the drives for this response.
You mention two R5 RAID groups on the VNXe, so we know you have ten drives configured for user data. The volume manager on the VNXe will slice these drives up, using logic to balance disk consumption across spindles. Now comes the decision point: a single pool, or split pools? I'm happy to lay out the checks and balances of a decision like this one.
What multiple pools offer is IO segregation. You know with certainty which storage is tapping into which spindles. It's handy for troubleshooting. For the SAN performance buff, it means you get 5 drives' worth of IOPS per server, which is calculable. You know which systems will be impacted by a specific drive removal or fault. There is comfort in that certainty.
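To put rough numbers on "calculable", here's a back-of-the-envelope sketch. The ~150 IOPS per SAS spindle, the 70/30 read/write mix, and the RAID 5 write penalty of 4 are all generic rules of thumb, not VNXe-specific figures:

```python
# Back-of-the-envelope front-end IOPS for a RAID 5 set or pool.
# All inputs are generic rules of thumb, not measured VNXe numbers.
def raid5_frontend_iops(spindles, iops_per_spindle=150, read_fraction=0.7):
    """Host-visible IOPS after accounting for the RAID 5 write penalty."""
    backend_iops = spindles * iops_per_spindle
    # Each host write costs 4 back-end IOs (read data, read parity,
    # write data, write parity); each read costs 1.
    write_fraction = 1 - read_fraction
    return backend_iops / (read_fraction + 4 * write_fraction)

print(f"5-drive set:   ~{raid5_frontend_iops(5):.0f} IOPS")    # ~395
print(f"10-drive pool: ~{raid5_frontend_iops(10):.0f} IOPS")   # ~789
```

The point is simply that a dedicated 5-drive set gives you a predictable per-server budget, while a shared 10-drive pool gives everyone access to a larger shared budget.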
What you lose with multiple pools is simplicity. You spend more time managing the system. There is also always that moment when you have 30GB free in one pool and 25GB in another, and you realize you wish you could carve out 50GB.
What a single pool offers is simplicity of management. The slices of drive capacity that make up the storage will be spread across all the drives, so that's no problem. Life is easier and your storage is more readily consumable. There is also a potential performance benefit from a very active stream tapping into all 10 drives rather than just 5.
What you don't know is who is where - there are slices of each server's storage on all the drives. This extends the impact of a double-faulted RG, for example. It also increases the chance of spindle contention, which occurs when servers request more IOPS than a drive can service within normal latency windows.
Now that the options are laid out, you asked for an opinion, so I'll give one. I am all for simplicity of management, so a single pool per tier of storage is my default option. The workloads of an iSCSI server and a file server can be complementary, so I would spread the load across all the drives possible unless I had a good reason to segregate a specific workload. Remember that write cache is shared and mirrored between the SPs, so the idea of isolating server storage is not going to hold. Placing one server per SP is smart, though - it lets you maximize your processing power.
And let's not forget - at some point you'll want to reclaim space. Planning for a double-faulted RG on 11 drives is like planning for lightning to strike every single day. I always find it easier when all my eggs are in the same basket. Plan well, configure a hot spare like you did, and have a backup for the worst case.
So there you have it - there's a breakdown of my logic related to drive layout. We'll get to more details tomorrow.
I would also go with the 10-disk pool. Like Matt mentioned, sometimes you need IO segregation, and then one pool isn't an option. Putting all of a database server's disks (db, log, tmp, backup) in the same pool might not be the best option.
Matt also mentioned the impact of a double-faulted RG. What does a RAID group have to do with a pool? A pool contains two or more RAID groups, and data is striped across the RGs. The VNXe can use up to four RGs in a pool. If you create a pool with 10 disks, it will have two 5-disk RGs in it, and data will be striped across the two RGs. If you then add ten more disks to that pool, the data that was already there is still striped only across the first two RGs. New datastores created on the extended pool can use all 20 spindles. But if the whole capacity of the 10-disk pool is already in use when the 10 disks are added, a new datastore will be striped only across the two new RGs.
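A toy model of that last point (this is my own illustration of the behavior described above, not EMC's actual slice allocator):

```python
# Toy model of pool slice placement: each new slice goes to whichever RG
# currently has the most free space. Illustrative only - this is NOT
# EMC's real allocator, just a sketch of the behavior discussed above.
def create_datastore(rgs, size_gb):
    """Allocate 1 GB slices, preferring the RG with the most free space."""
    used = set()
    for _ in range(size_gb):
        candidates = [rg for rg in rgs if rg["free_gb"] > 0]
        if not candidates:
            raise RuntimeError("pool is full")
        rg = max(candidates, key=lambda r: r["free_gb"])
        rg["free_gb"] -= 1
        used.add(rg["name"])
    return sorted(used)

# A 10-disk pool whose two original RGs are already full, then extended
# with two new 5-disk RGs:
pool = [{"name": "RG1", "free_gb": 0}, {"name": "RG2", "free_gb": 0},
        {"name": "RG3", "free_gb": 1600}, {"name": "RG4", "free_gb": 1600}]
print(create_datastore(pool, 100))   # ['RG3', 'RG4'] - new spindles only
```

In other words, a datastore created after the extension of a full pool only ever touches the new RGs, which is why Matt suggests migrating.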
Definitely multipathing with round robin. I ran some tests with and without RR and got about 100MBps more throughput from the VNXe 3300 with RR enabled. I'm still testing performance on the 3100, and it seems that ESX 4.x with two separate ports and RR performs better than link aggregation.
OK, so a 10-disk pool it is... any advice on the easiest way to change things around? As I said previously, I have the two pools, each with one RG. I don't really have any data in the pool allocated to the shared folders. Can I just delete the shared folder server, recycle the second pool, add those disks to the first pool that currently holds the VMFS volume, and then recreate the shared folder server using the one and only pool?
I'd rather not have to delete the pool with my vmfs volume on it if it can be avoided.
Yes, you can recycle the RG with the CIFS share on it and then extend the first pool with the recycled five disks. You don't need to delete the server, just the datastore. I would also suggest creating a new VMFS datastore on the extended pool and then moving data from the current datastore to the new one, to get the benefit of all 10 spindles. The VMFS volume that you have now will still be on one RG after the extension.
Henri hit an expert level point that's worth repeating:
The VNXe's volume manager stripes across the available drives at creation time. Since this process will add 5 more drives we want to take advantage of, you'll want to make a new datastore and migrate the data over. There is no dynamic restriping, which is important to keep in mind as you plan out capacity consumption.
Keep the questions coming.
Understood. Thanks to both of you for pointing that out; it's something I definitely would have overlooked. Is there any way to tell which drives a volume is striped across?
I deleted my shared folder datastore, recycled the disks, and added them to the remaining pool. I created the new volume and upgraded it to VMFS-5. I changed the multipathing from Fixed to RR on each host.
So far, so good. Now for the fun part, moving things over... unfortunately, I only have Essentials Plus, so no Storage vMotion for me.
Once I get that done, I'll delete my now empty volume, recycle, and create my shared folder datastore again.
I'm assuming when I delete my old VMFS volume from the VNXe it will remove it from vCenter as well, or is this a more manual process?
Did I miss anything?
Thanks again for your help,
I didn't use the Storage vMotion eval, and since I'm almost done moving things over, I'll save that for a real emergency :-)
I moved everything over to the new volume except my vCenter appliance. I've been busy with other things the last few days (it's more of a side project that I'm working on when I have a few minutes here and there).
Hopefully I'll get it moved over and clear out the old volume this weekend.