VPLEX: False positive sms/1 events generated during VPLEX CD log collection on VS6

Article Number: 534010    Article Version: 2    Article Type: Break Fix

Product:

VPLEX VS6, VPLEX GeoSynchrony 6.0, VPLEX GeoSynchrony 6.1, VPLEX Series, VPLEX GeoSynchrony, VPLEX Local, VPLEX Metro

Issue:

The sms/1 events generated during VPLEX CD log collection on VS6 systems are false positives.

  1.  The issue is seen when "collect-diagnostics" is run on VS6 systems: sms/1 events are generated while the collection runs. These sms/1 events are false positives, for the following reasons:

    •  The directors (verified on both subnets) remained pingable throughout the CD log collection, and the pings never dropped.
    •  The firmware logs were monitored continuously on the SMS, and logging between the directors and the SMS kept working, which would not be possible during a genuine network outage.
    •  No evidence of any hardware fault was found.
    •  No sms/1 events were generated after the CD log collection completed.
    •  These observations were consistent across all cases.

  2.  Because these sms/1 events are false positives, the end user can be told to ignore them for the time being and be assured that they pose no risk.

Cause:

During CD log collection, the /opt/zephyr/bin/zdt -c / -e diags command (which obtains hardware details) takes about 10 minutes to complete, and it takes about as long when run directly at the director level. No sms/1 events were seen once this command had completed, which indicates that this command causes the sms/1 events. In the excerpt below, the command starts at 14:41:51 UTC and completes at 14:51:15 UTC, and every sms/1 event is logged between those two times.
   
    Executing CLI/Director commands     
      =======================================================================     
      2019-05-12 14:41:46 UTC: ***Executing regedit -d /root/full-register-dump.txt ; cat /root/full-register-dump.txt on directors...     
      2019-05-12 14:41:51 UTC: ****Completed regedit -d /root/full-register-dump.txt ; cat /root/full-register-dump.txt on directors.     
      2019-05-12 14:41:51 UTC: ***Executing /opt/zephyr/bin/zdt -c / -e diags on directors...   >>>>NOTE     
            
      128.221.252.36:5988:null:1:<3>2019/05/12 14:43:58.442: sms/1 management network failure: address 128.221.252.36 is unreachable     
      128.221.253.35:5988:null:1:<3>2019/05/12 14:46:00.734: sms/1 management network failure: address 128.221.253.35 is unreachable     
      128.221.252.38:5988:null:1:<3>2019/05/12 14:48:28.323: sms/1 management network failure: address 128.221.252.38 is unreachable     
      128.221.253.36:5988:null:1:<3>2019/05/12 14:49:22.501: sms/1 management network failure: address 128.221.253.36 is unreachable     
      128.221.252.38:5988:null:1:<3>2019/05/12 14:49:28.393: sms/1 management network failure: address 128.221.252.38 is unreachable     
      128.221.252.35:5988:null:1:<3>2019/05/12 14:49:56.440: sms/1 management network failure: address 128.221.252.35 is unreachable     
      128.221.253.36:5988:null:1:<3>2019/05/12 14:50:22.568: sms/1 management network failure: address 128.221.253.36 is unreachable     
            
      2019-05-12 14:51:15 UTC: ****Completed /opt/zephyr/bin/zdt -c / -e diags on directors.     >>>>NOTE     
            
      2019-05-12 14:51:15 UTC: ***Executing smartctl -a /dev/sda1 on directors...     
      2019-05-12 14:51:17 UTC: ****Completed smartctl -a /dev/sda1 on directors.
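The timing argument above can be checked mechanically. The Python sketch below is a minimal illustration, not a Dell EMC tool: it extracts the start and end of the zdt step from the CD log output and the sms/1 events from firmware.log lines, then lists any event that falls outside that window. The regular expressions assume the line formats shown in the excerpt above, the sketch assumes both logs use the same clock (UTC), and the function names are hypothetical.

```python
import re
from datetime import datetime

# Patterns assume the line formats shown in the excerpt above
# (an assumption, not a documented log specification).
ZDT_STEP = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC: \*+(Executing|Completed) /opt/zephyr/bin/zdt"
)
SMS1_EVENT = re.compile(
    r"<\d+>(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+): sms/1 management network failure: "
    r"address (\S+) is unreachable"
)


def zdt_window(cd_lines):
    """Return (start, end) of the zdt diags step; either side is None if not found."""
    start = end = None
    for line in cd_lines:
        m = ZDT_STEP.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            if m.group(2) == "Executing":
                start = ts
            else:
                end = ts
    return start, end


def sms1_events(log_lines):
    """Return (timestamp, director_address) tuples for every sms/1 line."""
    events = []
    for line in log_lines:
        m = SMS1_EVENT.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
            events.append((ts, m.group(2)))
    return events


def events_outside(events, start, end):
    """Events not within [start, end]; an empty list fits the false-positive reading."""
    return [(ts, addr) for ts, addr in events if not (start <= ts <= end)]


if __name__ == "__main__":
    cd_log = [
        "2019-05-12 14:41:51 UTC: ***Executing /opt/zephyr/bin/zdt -c / -e diags on directors...",
        "2019-05-12 14:51:15 UTC: ****Completed /opt/zephyr/bin/zdt -c / -e diags on directors.",
    ]
    fw_log = [
        "128.221.252.36:5988:null:1:<3>2019/05/12 14:43:58.442: sms/1 management network failure: address 128.221.252.36 is unreachable",
        "128.221.253.36:5988:null:1:<3>2019/05/12 14:50:22.568: sms/1 management network failure: address 128.221.253.36 is unreachable",
    ]
    start, end = zdt_window(cd_log)
    events = sms1_events(fw_log)
    print(f"zdt step: {start} -> {end}, sms/1 events: {len(events)}")
    print("outside window:", events_outside(events, start, end))
```

An empty "outside window" list is consistent with this article's diagnosis. Any event outside the zdt window would call for the full procedure in the Resolution section.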

Resolution:

Use the following procedure to confirm whether the sms/1 events seen during VPLEX CD log collection are false positives.

    Prerequisites/pre-checks:

  1.  Confirm that the VPLEX hardware is VS6.
  2.  Ensure that no other management tasks or activities are performed during the test; run only the VPLEX CD log collection.
  3.  Verify the overall health of the system and the cluster status. Both should be good, with no known connectivity issues or hardware faults.
  4.  Before starting the procedure, review firmware.log for any recent, periodic, or ongoing streams of sms/1 events. If there are any, they may indicate a genuine network connection failure between the management server and the directors.
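For the last pre-check (reviewing firmware.log for sms/1 events), one quick way to spot recent or ongoing activity is to count events per director address over a lookback period. The Python sketch below is illustrative only. It assumes the sms/1 line format shown in the Cause section excerpt and a 24-hour lookback (an arbitrary choice). It also assumes the current log is the firmware.log file in /var/log/VPlex/cli/. The function name is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed firmware.log line format, taken from the sms/1 lines in the Cause section.
SMS1_EVENT = re.compile(
    r"<\d+>(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+: sms/1 .*?address (\S+) is unreachable"
)


def recent_sms1_counts(log_lines, now, lookback=timedelta(hours=24)):
    """Count sms/1 events per director address with timestamps in [now - lookback, now]."""
    counts = Counter()
    for line in log_lines:
        m = SMS1_EVENT.search(line)
        if not m:
            continue
        # Fractional seconds are dropped; second resolution is enough here.
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if now - lookback <= ts <= now:
            counts[m.group(2)] += 1
    return counts


if __name__ == "__main__":
    # On the management server, something like the following (assumes the log
    # timestamps and datetime.now() share a time zone):
    #   with open("/var/log/VPlex/cli/firmware.log") as f:
    #       print(recent_sms1_counts(f, datetime.now()))
    sample = [
        "128.221.252.38:5988:null:1:<3>2019/05/12 14:48:28.323: sms/1 management network failure: address 128.221.252.38 is unreachable",
        "128.221.252.38:5988:null:1:<3>2019/05/12 14:49:28.393: sms/1 management network failure: address 128.221.252.38 is unreachable",
        "128.221.252.35:5988:null:1:<3>2019/05/12 14:49:56.440: sms/1 management network failure: address 128.221.252.35 is unreachable",
    ]
    counts = recent_sms1_counts(sample, now=datetime(2019, 5, 12, 15, 0, 0))
    for address, n in counts.most_common():
        print(f"{address}: {n} sms/1 event(s) in the lookback window")
```

A non-empty result before the test starts does not prove a fault by itself. It means the pre-check has not passed and the events should be investigated before treating any later sms/1 events as false positives.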
    Procedure:

  1.  Open multiple PuTTY sessions to the management server. In each one, start a continuous ping to a director with the command "ping -i <interval> <director-IP>", so that a couple of directors are pinged on both subnets (.252 and .253).
  2.  In another PuTTY session on the management server, monitor the latest firmware.log in the /var/log/VPlex/cli/ folder. Confirm that the cabling between the management server and the directors is good and that no sms/1 events are streaming before going on to the next step.
  3.  In another PuTTY session on the management server, open vplexcli and run the command: collect-diagnostics --verbose --noextended
  4.  Monitor the firmware.log opened in step 2:

    1.  Intermittent sms/1 events are generated for multiple directors while the CD log collection is in progress.
    2.  Even though sms/1 events are generated, logging between those directors and the management server continues to work, which would not be possible during a genuine network outage.
    3.  The director pings started in step 1 should continue to succeed.
    4.  Confirm that the cable connections, especially between the management server and the directors, are intact.

  5.  Once the CD log collection is complete:

    1.  Confirm that no more sms/1 events are generated.
    2.  Stop the pings to the directors and check the reported packet loss; ideally it is 0%.
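The final check reads the packet loss reported by the pings started at the beginning of the procedure. When a Linux ping is stopped with Ctrl+C, it prints a summary such as "600 packets transmitted, 600 received, 0% packet loss, time ...". The Python sketch below parses that summary for each director. It assumes the iputils ping output format found on typical Linux hosts. The function names and sample summaries are illustrative and are not taken from a real collection.

```python
import re

# Assumed iputils-style summary, e.g.
# "600 packets transmitted, 600 received, 0% packet loss, time 599702ms"
# The lazy ".*?" also skips an optional "+N errors," field.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received.*?([\d.]+)% packet loss")


def packet_loss(ping_output):
    """Return (transmitted, received, loss_percent) from a ping summary, or None."""
    m = SUMMARY.search(ping_output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def check_director_pings(outputs):
    """Map each director address to True when its ping run reported 0% packet loss."""
    results = {}
    for address, text in outputs.items():
        parsed = packet_loss(text)
        # A missing or unparsable summary is treated as a failed check.
        results[address] = parsed is not None and parsed[2] == 0.0
    return results


if __name__ == "__main__":
    sample = {  # illustrative summaries only
        "128.221.252.35": "600 packets transmitted, 600 received, 0% packet loss, time 599702ms",
        "128.221.253.36": "600 packets transmitted, 597 received, 0.5% packet loss, time 599810ms",
    }
    for address, ok in check_director_pings(sample).items():
        print(address, "OK (0% loss)" if ok else "packet loss detected, investigate")
```

Treating an unparsable summary as a failure is a deliberate choice in this sketch. A ping that was never stopped cleanly, or whose output was lost, should not count as evidence that the network stayed up.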

Notes: