A job fails and then retries successfully, but reports show it only as failed.

Product:

Data Protection Advisor (DPA), all versions

 

 

Description:

A backup job shows as failed in reports, even though according to the backup application, the retry of the job was successful.

The successful retry of the job does not appear in the reports.

 

The DPA Collector log (collector.log) on the node that collects the job data shows timeout errors for the backup application commands that DPA runs to collect that data.

 

 

Resolution:

In large backup application configurations, or on very busy backup servers, the command that gathers job information can take a long time to complete. If the command runs longer than the Job Monitor request timeout configured in DPA, the data is not collected. The DPA Collector cannot collect partial command output: if the command times out, any output produced up to that point is discarded. A successful retry reported in the discarded output is therefore never recorded, so reports continue to show the job as failed.
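The all-or-nothing behavior described above can be illustrated with a minimal Python sketch. This is not DPA's implementation; the function name `collect` and the demo commands are hypothetical and only model the idea that output is kept when the command finishes within the timeout and dropped entirely otherwise.

```python
import subprocess
import sys


def collect(command, timeout_seconds):
    """Return the command's complete output, or None if it timed out.

    Hypothetical model of all-or-nothing collection: nothing the command
    printed before the timeout is kept.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The command overran the timeout, so partial output is discarded.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in commands; a real job monitor command would go here instead.
    fast = [sys.executable, "-c", "print('job data')"]
    slow = [sys.executable, "-c",
            "import time; print('partial', flush=True); time.sleep(5)"]
    print(collect(fast, 30))    # complete output is returned
    print(collect(slow, 0.5))   # timed out, so nothing is returned
```

In this model the slow command has already printed a line when the timeout fires, yet the caller receives nothing, which mirrors why a successful retry can be missing from reports.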

 

This can often be resolved by increasing the timeout for the Job Monitor request in DPA.

 

In the DPA GUI/Console, for the node running the Collector where the command is timing out, do the following:

 

Right-click the node, select Configuration, and then select Properties.

 

In the node Properties window, select the Requests or Assignations tab, and then click the Job Monitor request for the backup application.

In the Options section, click on the box at the right of the timeout field.

Change the timeout to a value long enough for the command to complete.

 

An appropriate value can be found by experimentation. Alternatively, obtain a good estimate by running the Job Monitor request's command manually, outside of DPA, and noting how long it takes to complete.
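One way to take that measurement is sketched below in Python. The helper names `time_command` and `suggest_timeout` are hypothetical, and the 1.5x headroom factor is an assumption for illustration, not a DPA recommendation. Replace the stand-in command with the actual command used by the Job Monitor request.

```python
import math
import subprocess
import sys
import time


def time_command(command):
    """Run the command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, capture_output=True, check=False)
    return time.perf_counter() - start


def suggest_timeout(elapsed_seconds, headroom=1.5):
    """Round the measured duration up to whole seconds with some headroom.

    The headroom factor is an assumed safety margin, not a documented value.
    """
    return math.ceil(elapsed_seconds * headroom)


if __name__ == "__main__":
    # Stand-in command; substitute the backup application's command here.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"command took {elapsed:.2f} s")
    print(f"candidate timeout: {suggest_timeout(elapsed)} s")
```

Measuring at a busy time of day gives a more conservative estimate than measuring on an idle backup server.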


Note: In some environments, this value may need to be tuned periodically as the backup environment grows or changes.

 

Also note that setting the timeout to a value longer than the interval at which the request runs (for example, the Job Monitor runs every 5 minutes and the timeout is set to 10 minutes) does not cause the current request to be interrupted when the next run is due. The current request runs to completion or until the timeout is reached. The next run of the request does not start until the previous run has completed or timed out.
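The effect of a timeout longer than the run interval can be illustrated with a small simulation in Python. This is a simplified model, not DPA's scheduler: it assumes a delayed run starts as soon as the previous run ends, which the text above does not specify. The function name `simulate_runs` and the durations are hypothetical, and all times are in minutes.

```python
def simulate_runs(period, timeout, durations):
    """Return a (start, end, timed_out) tuple for each run.

    period    -- interval between scheduled runs
    timeout   -- maximum time one run may take
    durations -- how long each run's command would take if never cut off

    Assumption: a run that is due while the previous one is still active
    starts as soon as the previous run completes or times out.
    """
    runs = []
    previous_end = 0
    for index, duration in enumerate(durations):
        scheduled = index * period
        # Runs never overlap: wait for the previous run to end.
        start = max(scheduled, previous_end)
        timed_out = duration > timeout
        end = start + min(duration, timeout)
        runs.append((start, end, timed_out))
        previous_end = end
    return runs


if __name__ == "__main__":
    # Job Monitor every 5 minutes with a 10-minute timeout.
    for start, end, timed_out in simulate_runs(5, 10, [8, 12, 3]):
        status = "timed out" if timed_out else "completed"
        print(f"start={start:>2} end={end:>2} {status}")
```

With these inputs, the run due at minute 5 starts at minute 8 and the run due at minute 10 starts at minute 18, so a long timeout trades less frequent collection for a better chance that each run completes.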