EMC M&R | ViPR SRM | Watch4net: Health graphs and reports are missing for some Watch4net or ViPR SRM hosts on Linux installations

           

   Article Number: 516547     Article Version: 3     Article Type: Break Fix
   

 


Product:

 

ViPR SRM 3.0, ViPR SRM 3.5, ViPR SRM 3.6, ViPR SRM 3.7, ViPR SRM 4.0, ViPR SRM 4.1, ViPR SRM, Watch4net, Watch4net 6.6, Watch4net 6.5, Watch4net 6.4, Watch4net 6.3

 

Issue:

 

 

   

         
  • A lost+found directory located in the installation directory, owned by and accessible only to the root user, causes the health collector to stop collecting data.
  • The health collector logs show the following SEVERE error:

      SEVERE   -- [2017-12-19 23:27:36 AEDT] -- CollectorWorkerCommandTask::run(): null
      java.lang.NullPointerException
         at com.watch4net.apg.common.jmxutils.VMUtils.listPrivateJVMs(VMUtils.java:176)
         at com.watch4net.apg.common.jmxutils.VMUtils.getVmInfos(VMUtils.java:53)
         ....
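
To confirm that a given log file contains this error signature, a small helper along the following lines may be useful. It is only a sketch: the function name has_vmutils_npe is hypothetical, and the log file path is deliberately left as an argument because this article does not state where the health collector log is stored.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given log file contains the
# VMUtils.listPrivateJVMs frame from the stack trace shown above.
# The log file path is supplied by the caller; it is not given in this article.
has_vmutils_npe() {
    log_file=$1
    # -F treats the pattern as a fixed string, -q suppresses output
    grep -qF 'VMUtils.listPrivateJVMs' "$log_file"
}
```

Usage, with a placeholder path: has_vmutils_npe /path/to/health-collector.log && echo "error signature found"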
                                                             

 

 

Cause:

 

 

When the Watch4net/ViPR SRM installation directory is the mount point of a file system, the fsck utility creates a lost+found directory in the root of that file system each time it is run to check and repair it. fsck is run on the file system after a system crash, when the file system is flagged as "dirty".
   
    The lost+found directory is accessible by the root user only. Because the health collector runs as user "apg", it cannot access the lost+found directory. This stops the health collector's search for JVM instances on the host and, as a result, it does not collect any data.
   
    Example:   
   
    A binary installation is performed in /app/APG, where /app/APG is the mount point of a file system. When fsck is run on that file system, it creates the directory /app/APG/lost+found/, which is accessible to the root user only. Running 'find /app/APG -name ".*"' as user "apg" on the host then reports a "permission denied" message for the lost+found directory.
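
    The check described in the example can be sketched as a small shell function. This is a minimal illustration, not part of the product: the function name check_lost_found is hypothetical, and /app/APG is only the example path used above.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a lost+found directory sits directly
# under the given installation directory. If it does, its owner and
# permissions are printed with ls -ld and the function returns 0.
check_lost_found() {
    install_dir=$1
    if [ -d "$install_dir/lost+found" ]; then
        ls -ld "$install_dir/lost+found"
        return 0
    fi
    return 1
}
```

    For example, check_lost_found /app/APG prints the directory entry if it exists. An owner of root with permissions limited to the owner (for example drwx------) matches the situation described above.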
                                                           

 

 

Resolution:

 

 

A fix for this problem is scheduled for EMC M&R/Watch4net 6.9.
   
    As a workaround for earlier versions, remove the lost+found directory and its contents from the SRM installation directory, then restart the health collector. Repeat this each time the health graphs and reports stop showing data.
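
    The removal step of the workaround could be scripted roughly as follows. This is a sketch under stated assumptions: the function name remove_lost_found is hypothetical, it must be run as root because the directory is accessible to root only, and the command for restarting the health collector is not given in this article, so it is left to the operator.

```shell
#!/bin/sh
# Hypothetical helper: removes lost+found (and its contents) from the given
# installation directory. Run as root. Returns 2 if no directory was passed,
# 1 if there is no lost+found directory to remove, 0 on success.
remove_lost_found() {
    install_dir=$1
    # Refuse an empty argument so the target can never become /lost+found
    [ -n "$install_dir" ] || return 2
    target="$install_dir/lost+found"
    [ -d "$target" ] || return 1
    rm -rf -- "$target"
}
# After a successful removal, restart the health collector using the
# procedure for your installation (not specified in this article).
```

    For example, remove_lost_found /app/APG, using the example path from the Cause section. Because fsck places recovered file fragments in lost+found, it may be worth reviewing the directory contents before deleting them.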