RecoverPoint for VM: ESX splitter (kdriver) crash when un-protecting or deleting a production volume

Article Number: 500497                              Article Version:                               Article Type: Break Fix


Product:

RecoverPoint, RecoverPoint for Virtual Machines, RecoverPoint for VMs for VSPEX BLUE, RecoverPoint for Virtual Machines 4.3 P1, RecoverPoint for Virtual Machines 4.3 SP1, RecoverPoint for Virtual Machines 4.3 SP1 P1

 

Issue:

The ESX splitter (kdriver) crashes when un-protecting or deleting a production volume (or any volume with a backlog).

 

Symptoms found in the logs:

 

Note: in the affected versions, kdriver is expected to crash whenever a production volume (or any volume with a backlog) is removed from RP, because the removal also removes the volume from kdriver's HostReplication state.

(a) In the kdriver logs before the "HERE" print, or in the kdriver core:

Before v4.3.1, kdriver would assert:

 

NO TOPIC: errno=0 Assertion failed: iter != m_VolumeBacklogs.end() Line 169 File Backlog.cc

From v4.3.1 onward, kdriver prints the following message instead, but in most cases it will not yet have been written to the log file when the crash occurs. If a kdriver core exists (by default, a kdriver core exists only from v5.0 onward), the message should be found there. In any case, finding this message in one of the affected versions definitely indicates this problem. Note that the number after "vol = " is the volume GUID in decimal.

 

2017/04/11 00:44:53.039 - #1 - 34286/34249 - Backlog::closeTask: Backlog task is already closed! vol = 371842328648135397
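Because the volume GUID in this message is printed in decimal, it can help to convert it to hexadecimal when comparing it against identifiers that appear in hex elsewhere in the logs. The following is a minimal sketch, not a RecoverPoint tool: the function name is hypothetical, the pattern is taken from the log line quoted above, and the 16-digit zero padding is an assumption. The article does not state whether the volume GUID corresponds to any other identifier, such as the kboxlun value shown in the HostReplication lines.

```python
import re

# Pattern copied from the "already closed" message quoted in this article.
CLOSE_TASK_RE = re.compile(r"Backlog task is already closed! vol = (\d+)")


def volume_guid_from_log(line):
    """Return (decimal, hex) forms of the volume GUID, or None if no match."""
    match = CLOSE_TASK_RE.search(line)
    if match is None:
        return None
    guid = int(match.group(1))
    # Assumption: show the GUID as a zero-padded 64-bit hex value.
    return guid, format(guid, "#018x")


line = ("2017/04/11 00:44:53.039 - #1 - 34286/34249 - Backlog::closeTask: "
        "Backlog task is already closed! vol = 371842328648135397")
print(volume_guid_from_log(line))
```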

(b) If the log in (a) is not found, the following log gives some indication. In the kdriver logs before the "HERE" statement:

 

2017/05/05 01:03:11.131 - #2 - 20143025/20142987 - HostReplication:  changing state of kboxlun 0x68fd86504304250f from Option(state=FAIL-ALL()) to state=DETACHING()


(c) In kdriver.log.startup we can see a segmentation fault:

Segmentation fault
Fri Jul  8 18:01:57 UTC 2016) launch_kdriver_watchdog: KDriver stopped / died / killed
Fri Jul  8 18:01:58 UTC 2016) launch_kdriver_watchdog: Killing kdriver_heartbeats.sh
Fri Jul  8 18:01:58 UTC 2016) launch_kdriver_watchdog: Sleeping for 10 seconds
Fri Jul  8 18:02:08 UTC 2016) launch_kdriver_watchdog: Next action = open
Fri Jul  8 18:02:08 UTC 2016) launch_kdriver_watchdog: Launching kdriver_heartbeats.sh
Fri Jul  8 18:02:08 UTC 2016) launch_kdriver_watchdog: Launching run_kdriver.sh
Fri Jul  8 18:02:08 UTC 2016) kdriver_heartbeats: deleting old /scratch/log/kdriver-heartbeats
Fri Jul  8 18:02:08 UTC 2016) run_kdriver: run_kdriver.sh - starting
Fri Jul  8 18:02:08 UTC 2016) run_kdriver: Launching KDriver
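The three symptom groups above can be checked mechanically when reviewing collected logs. The sketch below is illustrative only: the signature names and the function are hypothetical, and the regular expressions are derived solely from the sample lines quoted in this article, so they may need adjusting if the log format differs in other builds.

```python
import re

# Signatures derived from the log excerpts in this article (assumed stable).
SIGNATURES = {
    "a_assert": re.compile(
        r"Assertion failed: iter != m_VolumeBacklogs\.end\(\)"),
    "a_already_closed": re.compile(
        r"Backlog::closeTask: Backlog task is already closed! vol = \d+"),
    "b_detaching": re.compile(
        r"HostReplication:\s+changing state of kboxlun 0x[0-9a-fA-F]+ "
        r"from .* to state=DETACHING\(\)"),
    "c_segfault": re.compile(r"^Segmentation fault\s*$"),
}


def find_symptoms(lines):
    """Map each signature name to the 1-based line numbers where it matched."""
    hits = {name: [] for name in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


sample = [
    "NO TOPIC: errno=0 Assertion failed: iter != m_VolumeBacklogs.end() "
    "Line 169 File Backlog.cc",
    "2017/05/05 01:03:11.131 - #2 - 20143025/20142987 - HostReplication:  "
    "changing state of kboxlun 0x68fd86504304250f from "
    "Option(state=FAIL-ALL()) to state=DETACHING()",
    "Segmentation fault",
]
for name, line_numbers in find_symptoms(sample).items():
    print(name, line_numbers)
```

A hit on either (a) signature is the strong indicator. The (b) and (c) signatures are only supporting evidence and should be read together with the affected versions listed below.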


Affected versions: 4.3, 4.3.P1, 4.3.SP1, 4.3.SP1.P1, 4.3.SP1.P2, 4.3.SP1.P3

 

Splitter type(s): ESX (vSCSI splitter)

 

Cause:

When a volume with a backlog is removed from RP, RP closes the backlog task twice, which causes kdriver to crash.
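The double close can be pictured with a toy model. This is not kdriver code: the class and method names are hypothetical, and the assumption that the first close removes the volume's entry from the backlog map is inferred only from the assertion text quoted in the symptoms (iter != m_VolumeBacklogs.end()).

```python
class Backlog:
    """Toy stand-in for a per-volume backlog registry (hypothetical)."""

    def __init__(self):
        self._volume_backlogs = {}

    def open_task(self, vol):
        self._volume_backlogs[vol] = "open"

    def close_task(self, vol):
        # Mirrors the shape of the quoted assertion: the entry must exist.
        if vol not in self._volume_backlogs:
            raise AssertionError("iter != m_VolumeBacklogs.end()")
        # Assumption: closing the task removes the volume's entry.
        del self._volume_backlogs[vol]


def remove_volume_twice(backlog, vol):
    """Close the same backlog task twice, as described in the cause."""
    backlog.close_task(vol)
    try:
        backlog.close_task(vol)
    except AssertionError as err:
        return f"second close failed: {err}"
    return "second close succeeded"


backlog = Backlog()
backlog.open_task(371842328648135397)
print(remove_volume_twice(backlog, 371842328648135397))
# prints "second close failed: iter != m_VolumeBacklogs.end()"
```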

 

Change:

Un-protecting or deleting a production volume

 

Resolution:

The issue is fixed in:

4.3.SP1.P4
5.0