|Article Number: 529853||Article Version: 3||Article Type: How To|
VxRail Appliance Family,VxRail Appliance Series,VxRail 460 and 470 Nodes,VxRail E Series Nodes,VxRail G Series Nodes,VxRail Gen 5,VxRail Gen2 Hardware,VxRail P Series Nodes,VxRail S Series Nodes,VxRail V Series Nodes,VxRail Software
About VxRail Remote Support's failed drive troubleshooting and hardware replacement process:
Timely resolution is a main focus of VxRail Support's approach to situations where drives may need to be replaced. In general, all that is required to send replacement drives is to identify the faulted disk and ensure the correct part(s) are matched for dispatch. In most cases, this can be done quickly, and arrangements to send any needed replacements should be made promptly.
It is also important to confirm that drive replacement is the appropriate solution and that the correct hardware information has been identified, to avoid sending incorrect hardware and to avoid unexpected problems during or after replacement. Sometimes it may appear that a drive has failed because vCenter or host clients show a disk or disk group in an unexpected state. Even if a drive shows 'Permanent Device Loss' (PDL), that drive may not actually be faulted, and replacement might not resolve the issue.
The steps below are intended to be quick while maximizing the chance of identifying the root issue, the appropriate resolution, and any additional impact or problems that should be addressed. Barring complications, all verification steps can be completed and a determination made to send a replacement disk in less than 30 minutes, unless checks indicate an issue that requires a different approach.
Steps to determine if hardware replacement should resolve the issue:
1. Check iDRAC (Dell) or the BMC (Phoenix/Quanta), or their hardware logs, for drives marked as failed. Excluding SATADOM, M.2, or other devices unrelated to vSAN storage, a drive marked as failed here generally means that the hardware really is faulted and needs to be replaced.
***Note: Even if a drive shows as failed in hardware sensors and at the vSAN level, the same failure sometimes still appears after the drive has been replaced. In these cases, the expected next point of troubleshooting is the backplane on the host.
2. Check vCenter and the host to determine whether a vSAN drive is marked PDL. On a host with a suspected drive failure, you can run the following from the command line:
esxcli vsan storage list (if the 'In CMMDS' field for a disk is "false", communication to that disk is lost)
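When a host has many disks, the CMMDS check in step 2 can be tedious to do by eye. The following is a minimal sketch, not an official support tool, that scans saved output of esxcli vsan storage list and reports devices whose CMMDS value is false. The function name disks_not_in_cmmds and the SAMPLE text are hypothetical, and the sketch assumes each disk's entry contains a 'Device:' line followed by an 'In CMMDS:' line; verify this against the actual output in your environment.

```python
from typing import List


def disks_not_in_cmmds(esxcli_output: str) -> List[str]:
    """Return device names whose 'In CMMDS' value is false.

    Assumes the text is saved output of `esxcli vsan storage list`, where
    each disk entry has a 'Device:' line before its 'In CMMDS:' line.
    """
    lost = []
    device = None
    for raw in esxcli_output.splitlines():
        line = raw.strip()
        if line.startswith("Device:"):
            # Remember which disk the following fields belong to.
            device = line.split(":", 1)[1].strip()
        elif line.startswith("In CMMDS:"):
            value = line.split(":", 1)[1].strip().lower()
            if value == "false" and device is not None:
                lost.append(device)
    return lost


# Hypothetical, heavily trimmed sample; real output has more fields per disk.
SAMPLE = """\
naa.5000000000000001
   Device: naa.5000000000000001
   Is SSD: true
   In CMMDS: true
naa.5000000000000002
   Device: naa.5000000000000002
   Is SSD: false
   In CMMDS: false
"""

if __name__ == "__main__":
    for name in disks_not_in_cmmds(SAMPLE):
        print("Communication lost to disk:", name)
```

A disk reported here is only a candidate for further checks. As noted above, a disk that appears lost at the vSAN level is not necessarily faulted hardware, so the result should be read together with the iDRAC/BMC check in step 1.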
Steps AFTER determining that drive replacement should resolve the issue:
1. The VxRail Support Engineer (TSE) checks for and notes any additional issues that might impact drive replacement or that will need to be addressed afterwards.
2. Collect and upload logs to the Service Request (SR) if not already done. Expected logs are TSR logs (for Dell hardware environments), the VxRail Manager log bundle, and vCenter and host logs. *See information about collecting VxRail logs at https://support.emc.com/kb/333684
3. TSE verifies customer contact and shipping details are correct.
4. Customer chooses whether to replace the drives themselves or to have a field resource sent on site to do it.
5. TSE updates case notes with details about the situation, upcoming disk replacement, and any special considerations.
6. TSE provides node replacement documentation (customer can also download from Solve). If 'De-duplication and Compression' is enabled on the cluster, TSE should attach https://support.emc.com/kb/528355 to the SR, ensure it is provided to the customer, and include it in 'CE Instructions' if a field resource will be sent.
7. TSE creates a work order for dispatch of part(s) and labor (if needed). At this point, SR ownership moves from VxRail 'Remote Support' to 'Field Support'. If the original SR was not specifically for a failed drive, the TSE should create a new SR for the dispatch and retain ownership of the original one.
8. Someone from Dell EMC's scheduling team reaches out to the customer contact to finalize shipping/dispatch details.
9. VxRail Remote Support may be re-engaged as needed by the customer/partner or by on-site field resources through the original SR or a new one.