Symptoms
RecoverPoint GUI reported "RPA down"
RecoverPoint Appliance rebooted unexpectedly
This is a hardware-related issue on Intel-based RPAs. The IPMI event logs show that an SMI timeout occurred shortly before the RPA reboot.
From log: files/home/kos/kbox/utilities/regulate_reboot/detailed_startup_information.txt
reboot Tue Dec 6 21:27:46 UTC 2016
From log: processes/usr/bin/ipmitoolselelist or files/home/kos/sel/sel.log
59 | 12/06/2016 | 21:16:33 | Unknown SMI Timeout | State Asserted
5a | 12/06/2016 | 21:16:58 | Unknown SMI Timeout | State Asserted
Affected versions: All versions running on Intel hardware RPAs
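The correlation between the SEL entries and the reboot time in the excerpts above can be checked with a short script. The following is a minimal sketch, not a supported tool: the function names and the 30-minute window are hypothetical choices made for illustration, and it assumes the SEL timestamps use the same time zone (UTC) as the reboot entry.

```python
from datetime import datetime, timedelta

# Sample lines copied from the log excerpts in this article.
SEL_LINES = [
    "59 | 12/06/2016 | 21:16:33 | Unknown SMI Timeout | State Asserted",
    "5a | 12/06/2016 | 21:16:58 | Unknown SMI Timeout | State Asserted",
]
REBOOT_LINE = "reboot Tue Dec 6 21:27:46 UTC 2016"


def parse_reboot_time(line):
    """Parse a line such as 'reboot Tue Dec 6 21:27:46 UTC 2016'."""
    parts = line.split()
    # parts: ['reboot', 'Tue', 'Dec', '6', '21:27:46', 'UTC', '2016']
    stamp = " ".join([parts[2], parts[3], parts[4], parts[6]])
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")


def parse_sel_line(line):
    """Split one SEL line into (record_id, timestamp, sensor, state)."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    try:
        when = datetime.strptime(fields[1] + " " + fields[2],
                                 "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None  # e.g. entries without a valid timestamp
    return fields[0], when, fields[3], fields[4]


def smi_timeouts_before(sel_lines, reboot_time, window=timedelta(minutes=30)):
    """Return SMI Timeout events logged within `window` before the reboot.

    The 30-minute default is an arbitrary value chosen for this sketch.
    """
    hits = []
    for line in sel_lines:
        entry = parse_sel_line(line)
        if entry is None:
            continue
        record_id, when, sensor, _state = entry
        gap = reboot_time - when
        if "SMI Timeout" in sensor and timedelta(0) <= gap <= window:
            hits.append((record_id, when, gap))
    return hits


if __name__ == "__main__":
    reboot = parse_reboot_time(REBOOT_LINE)
    for record_id, when, gap in smi_timeouts_before(SEL_LINES, reboot):
        print(f"SEL {record_id}: SMI Timeout at {when}, {gap} before reboot")
```

With the sample lines, both SEL entries fall roughly 11 minutes before the recorded reboot, which matches the pattern described in this article.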
Cause
According to Intel documentation: "SMI stands for system management interrupt and is an interrupt that gets generated so the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts). If this interrupt times out the system is frozen." In this case, the prolonged SMI timeout froze the RPA and caused it to go down.
Resolution
The affected RPA should recover on its own after approximately 6-10 minutes. If it does not, power cycle the RPA.
Monitor the affected RPA. If the issue persists, the RPA will need to be replaced.
RecoverPoint, RecoverPoint CL, RecoverPoint EX, RecoverPoint Gen5 Server, RecoverPoint SE