EMC M&R | ViPR SRM | Watch4net: Health graphs and reports are missing for some Watch4net or ViPR SRM hosts on Linux installations


ViPR SRM 3.0, ViPR SRM 3.5, ViPR SRM 3.6, ViPR SRM 3.7, ViPR SRM 4.0, ViPR SRM 4.1, ViPR SRM, Watch4net, Watch4net 6.6, Watch4net 6.5, Watch4net 6.4, Watch4net 6.3






  • A lost+found directory in the installation directory that is owned by, and accessible only to, the root user causes the health collector to stop collecting data.
  • The health collector logs show the following SEVERE error:
      SEVERE   -- [2017-12-19 23:27:36 AEDT] -- CollectorWorkerCommandTask::run(): null
      java.lang.NullPointerException
        at com.watch4net.apg.common.jmxutils.VMUtils.listPrivateJVMs(VMUtils.java:176)
        at com.watch4net.apg.common.jmxutils.VMUtils.getVmInfos(VMUtils.java:53)
        ....
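
To confirm that a host is affected, the collector log can be searched for the error signature shown above. The following is a minimal sketch, not an EMC-provided tool: the function name shows_health_collector_npe is hypothetical, and the caller must supply the log text, because the log file location depends on the installation.

```python
def shows_health_collector_npe(log_text: str) -> bool:
    """Return True if the text contains all parts of the error signature.

    The markers are taken from the SEVERE entry quoted in this article.
    The check is a plain substring search, so it works whether the stack
    trace is on one line or spread over several lines.
    """
    markers = (
        "CollectorWorkerCommandTask::run()",
        "java.lang.NullPointerException",
        "VMUtils.listPrivateJVMs",
    )
    return all(marker in log_text for marker in markers)
```

A typical use would be to read the health collector log into a string and pass it to the function. A True result only indicates that the signature is present; it does not by itself prove that a lost+found directory is the cause.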






When the Watch4net/ViPR SRM installation directory is the mount point of a file system, fsck creates a lost+found directory in the root of that file system each time it is run to check and repair it. fsck runs on the file system after a system crash, when the file system is flagged as "dirty".
The lost+found directory is accessible only to the root user. Because the health collector runs as user "apg", it cannot access the lost+found directory. This halts the health collector's search for JVM instances on the host, and as a result it collects no data.
For example, suppose a binary installation is performed in /app/APG, where /app/APG is the mount point of a file system. When fsck runs on that file system, it creates the directory /app/APG/lost+found/, which is accessible only to the root user. Running 'find /app/APG -name ".*"' as user "apg" on the host then reports a "permission denied" message for the lost+found directory.
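
The same check can be scripted so that it can be run as root on behalf of the "apg" user. The sketch below is an illustration only: the names can_read_dir and blocked_dirs are hypothetical, and the permission test is deliberately simplified to the classic owner/group/other mode bits, ignoring ACLs and supplementary group memberships.

```python
import os
import stat


def can_read_dir(st: os.stat_result, uid: int, gid: int) -> bool:
    """Approximate whether uid/gid may list and traverse a directory.

    Simplification (assumption): only the owner/group/other mode bits
    are considered; ACLs and supplementary groups are ignored.
    """
    if uid == 0:
        return True  # root is not restricted by mode bits
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 0o7
    elif st.st_gid == gid:
        bits = (st.st_mode >> 3) & 0o7
    else:
        bits = st.st_mode & 0o7
    return bits & 0o5 == 0o5  # both read (4) and execute (1) are needed


def blocked_dirs(root: str, uid: int, gid: int) -> list:
    """List directories under root that uid/gid cannot read and traverse."""
    blocked = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # symbolic links are skipped in this sketch
            if not can_read_dir(st, uid, gid):
                blocked.append(path)
    return blocked
```

On a real host, the uid and gid could be looked up with pwd.getpwnam("apg") and the function called with the installation directory, for example /app/APG. A lost+found directory created by fsck would then be expected to appear in the returned list.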






A fix for this problem is scheduled for EMC M&R/Watch4net 6.9.
For earlier versions, the workaround is to remove the lost+found directory and its contents from the SRM installation directory and then restart the health collector. Repeat this each time the health graphs and reports stop receiving data.
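
The removal step of the workaround can be sketched as follows. This is a minimal illustration under stated assumptions: the function name remove_lost_found is hypothetical, the script must be run as root on a real host because lost+found is accessible only to root, and restarting the health collector is not shown because the restart command depends on the installation.

```python
import os
import shutil


def remove_lost_found(install_dir: str) -> bool:
    """Delete install_dir/lost+found and its contents, if present.

    Returns True when a directory was removed and False when there was
    nothing to remove. A symbolic link with that name is left untouched.
    """
    target = os.path.join(install_dir, "lost+found")
    if os.path.islink(target) or not os.path.isdir(target):
        return False
    shutil.rmtree(target)
    return True
```

For the example installation described earlier, the call would be remove_lost_found("/app/APG"), followed by a manual restart of the health collector. Any files that fsck recovered into lost+found are deleted as well, so review them first if they may be needed.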