ScaleIO: "The SVM used disk space exceeds the configured threshold" alert in AMS

           

   Article Number: 514642     Article Version: 4     Article Type: Break Fix
   

 


Product:

 

ScaleIO Ready Node-PowerEdge 13G, ScaleIO Software 2.0.1.4, ScaleIO Software 2.0.1.3, ScaleIO Software 2.0.1.2, ScaleIO Software 2.0.1.1

 

Issue:

 

 

   

      Issue Description   

   

      The alert "The SVM used disk space exceeds the configured threshold" is raised in an AMS environment.

   

      Scenario   

   

      A ScaleIO Ready Node (AMS) reports "The SVM used disk space exceeds the configured threshold".

   

      Symptoms   

   

      Log lines like the following appear in the AMS trace files:

   
2017-09-27 09:23:30,859 [hw-sampling-thread-910] DEBUG c.e.s.a.alert.HardwareAlertsProducer - Adding alert: [NODE_SVM_OS_DISK_USED_SPACE_HIGH,ALERT_HIGH,com.emc.ecs.api.model.ams.hardware.gen.NodeTransientProperties@5b61a4f5,SvmUsedDiskSpacePercentage]    
   

      In addition, free disk space on the primary disk is low.
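
      Both symptoms can be checked from a shell. The following is a minimal sketch, not a vendor tool: the function names are illustrative, and the trace file path must be supplied by the caller because this article does not state where the AMS trace files are located.

```shell
# Count occurrences of the alert in an AMS trace file.
# $1 is the trace file path (not given in this article; supply your own).
count_svm_disk_alerts() {
    grep -c 'NODE_SVM_OS_DISK_USED_SPACE_HIGH' "$1"
}

# Print the used-space percentage of the file system that holds a path.
# Defaults to the root file system; -P keeps each file system on one line.
used_space_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { print $5 }'
}
```

      For example, "count_svm_disk_alerts <trace file>" prints the number of matching log lines, and "used_space_percent /" prints the usage of the root file system.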

   

      Impact   

   

      The primary disk can fill up completely (100% used).

                                                             

 

 

Cause:

 

 

   

      Root cause   

   

      The AMS periodically samples the MegaRAID controller and writes the command output to the file MegaSAS.log. Over time this file can grow large enough to fill the primary disk.

                                                             

 

 

Resolution:

 

 

   

      Workaround     
      Truncate the content of the MegaSAS.log file.   

   
cp /dev/null /root/MegaSAS.log    
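
      The same truncation can be wrapped in a small guarded function that reports how much space was released. This is only a sketch: the name truncate_log is illustrative, and the redirection ": > file" is used here as an equivalent to the "cp /dev/null" command above.

```shell
# Truncate a log file in place and report how many bytes were released.
# Truncating (rather than deleting) keeps the file itself, so a process
# that has it open can keep writing to it.
truncate_log() {
    log_file=$1
    if [ ! -f "$log_file" ]; then
        echo "not a regular file: $log_file" >&2
        return 1
    fi
    # Record the size before truncation (strip any padding from wc).
    size_before=$(wc -c < "$log_file" | tr -d ' ')
    : > "$log_file"
    echo "released $size_before bytes from $log_file"
}
```

      For example: truncate_log /root/MegaSAS.log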
   

      Sometimes the disk space reported in the "df" output is not updated after the truncation. This happens because the process that had the large file open is still running, so the freed space is not yet reflected. In that case, either restart that process or reboot the server.
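
      To decide which process to restart, it helps to see which processes still hold the file open. The sketch below scans /proc/<pid>/fd on Linux; the function name is illustrative, and it needs to run as root to see every process. Where installed, "lsof <file>" or "fuser <file>" give similar information.

```shell
# Print the PIDs of processes that hold the given file open, by scanning
# the file descriptor links under /proc (Linux only).
pids_holding_file() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanish or cannot be read.
        link=$(readlink "$fd" 2>/dev/null) || continue
        # A file deleted while still open carries a " (deleted)" suffix.
        case $link in
            "$target" | "$target (deleted)")
                pid=${fd#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}
```

      For example: pids_holding_file /root/MegaSAS.log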

   

      In rare cases the disk space is still reported incorrectly even after a reboot:

   
Before reboot:
[root@localhost ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdv4        50G   50G  325M 100% /
/dev/sdv2      1014M  182M  833M  18% /boot
/dev/sdv1       200M  9.8M  191M   5% /boot/efi
/dev/sdv5        50G   33M   50G   1% /home
tmpfs           4.6G     0  4.6G   0% /run/user/0
[root@localhost ~]# du -sh /*
0       /bin
160M    /boot
19M     /dev
30M     /etc
0       /home
0       /lib
0       /lib64
0       /media
0       /mnt
982M    /opt
0       /proc
68K     /root
9.1M    /run
0       /sbin
0       /srv
0       /sys
8.0K    /tmp
1014M   /usr
134M    /var

After reboot:
[root@localhost ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdv4        50G   49G  1.2G  98% /
/dev/sdv5        50G   33M   50G   1% /home
/dev/sdv2      1014M  182M  833M  18% /boot
/dev/sdv1       200M  9.8M  191M   5% /boot/efi
[root@localhost ~]# du -sh /*
0       /bin
160M    /boot
18M     /dev
30M     /etc
0       /home
0       /lib
0       /lib64
0       /media
0       /mnt
100M    /opt
0       /proc
68K     /root
9.1M    /run
0       /sbin
0       /srv
0       /sys
1.4M    /tmp
1014M   /usr
140M    /var
   

      In that case there might be a file system problem. Boot the host into Rescue Mode and run the appropriate "fsck" against the root file system to fix it.
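
      The mismatch in the output above (the file system reports about 49G used on /, while "du" accounts for far less) can be measured directly. The following is a rough sketch with an illustrative function name. A large positive difference points to space that is not reachable through the directory tree, which can have several causes, so treat the result as a hint rather than a diagnosis.

```shell
# Compare the used space reported by the file system (df) with the space
# reachable through the directory tree (du) for one mount point, in KiB.
compare_df_du() {
    mount_point=${1:-/}
    # Third column of POSIX-format df output is the used space in KiB.
    df_used=$(df -Pk "$mount_point" | awk 'NR == 2 { print $3 }')
    # -x keeps du on this file system; unreadable entries are skipped.
    du_used=$(du -skx "$mount_point" 2>/dev/null | awk '{ print $1 }')
    echo "df=${df_used}K du=${du_used}K diff=$((df_used - du_used))K"
}
```

      For example: compare_df_du /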

   

      Impacted versions   

   

      All AMS versions   

   

      Fixed in version   

   

      TBD