ECS: Troubleshooting ECS performance issues

           

   Article Number: 538583     Article Version: 3     Article Type: Break Fix
   

 


Product:

 

Elastic Cloud Storage,ECS Appliance

 

Issue:

 

 

How to troubleshoot ECS performance issues.

      Performance-related issues:

  • One or more types of operations are slow. This is the most common issue, since the problem is noticed from the end application(s).
  • One or more nodes are slow or have high resource utilization. This may be raised by monitoring of the ECS system or found during normal usage.
                                                             

 

 

Cause:

 

 

Multiple factors can contribute, such as network-related problems, high load, and the usage pattern.

 

 

Resolution:

 

 

   

       
   

      Questions to consider when experiencing performance degradation. (Expect customer support to request answers to the questions below.)

   
         
  • What behaviour or symptoms are you experiencing that indicate a performance problem?
  • What is the impact of the problem?
  • When was the performance problem first noticed?
  • Has anything changed recently in the environment? Software or hardware? Load? Network? Firewall? Load balancer?
  • Can the problem be expressed in terms of latency or run time?
  • What is the environment? Software and hardware being used? Versions? Configuration? Customer application?
  • What is the average file size (large/small files)?
  • Are reads, writes, deletes, or updates affected, or are all methods affected?
  • Are you able to read/write via other access methods? Is one specific application affected, or are all applications affected?
  • What access method is being used?
    • S3
    • Swift
    • Centera SDK (CAS). Is there a transformation, ECSsync, or other migration in progress?
    • File system access:
      • Windows using CIFS (GeoDrive)
      • Linux using NFS
  • What is the application access pattern?

      POST     creates/renames objects
      GET      retrieves object data
      PUT      updates object attributes
      DELETE   removes objects and metadata from the system
      HEAD     corresponds to each GET method. A HEAD request looks exactly like a GET request except that the method name is HEAD instead of GET. The response to a HEAD request includes only headers; it does not include a response body.

  • Is your application connecting to all nodes or only to some individual nodes? Is a load balancer configured?
  • Can you supply the Namespace, Bucket, secret, and UID of the affected application so that we can perform similar tests on the ECS?
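Several of the questions above (whether the problem can be expressed as latency, and which operations are affected) are easier to answer with numbers. The following is a minimal sketch, not an ECS tool: the names `time_call` and `summarize` are hypothetical, and the timed operation is a local stand-in that should be replaced with whatever request the affected application actually makes.

```python
import statistics
import time
from collections import defaultdict


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(samples):
    """Group (operation, seconds) pairs by operation and summarize each group.

    Returns {operation: {"count", "mean", "median", "max"}} with times in seconds.
    """
    grouped = defaultdict(list)
    for operation, seconds in samples:
        grouped[operation].append(seconds)
    return {
        op: {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            "max": max(vals),
        }
        for op, vals in grouped.items()
    }


if __name__ == "__main__":
    samples = []
    for _ in range(5):
        # Stand-in workload (assumption); substitute a real GET/PUT/DELETE call.
        _, elapsed = time_call(sum, range(1000))
        samples.append(("GET", elapsed))
    for op, stats in summarize(samples).items():
        print(op, stats)
```

Recording the operation name with each sample makes it possible to state whether only one type of operation is slow or all of them are, and comparing the median with the maximum helps separate a consistently slow operation from occasional outliers.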
   

          

   

      Information to collect on ECS if you are opening an SR with customer support, to help narrow down the problem:

   
         
  • Application logs from the affected application. Logs specific to a failed or delayed request are the most useful.
  • Any observations from the ECS GUI, such as a node offline or other critical failures shown in the dashboard menu.
  • Answers to the questions from the previous section.
  • An xDoctor run, if the affected system does not have dial-home capability.
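Because logs for failed or delayed requests are the most useful, it can help to pull those requests out before attaching logs to the SR. The sketch below assumes the application's log entries have already been parsed into dictionaries with `method`, `status`, and `seconds` keys. That record layout, the function name `flag_requests`, and the 1-second threshold are all illustrative assumptions, not an ECS or application log format.

```python
def flag_requests(records, slow_after_seconds=1.0):
    """Return the records that failed (HTTP status >= 400) or exceeded the threshold.

    Each record is a dict with hypothetical keys: "method", "status", "seconds".
    A "reason" key is added to every returned record.
    """
    flagged = []
    for rec in records:
        failed = rec["status"] >= 400
        slow = rec["seconds"] > slow_after_seconds
        if failed or slow:
            flagged.append({**rec, "reason": "failed" if failed else "slow"})
    return flagged


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        {"method": "GET", "status": 200, "seconds": 0.05},
        {"method": "PUT", "status": 500, "seconds": 0.20},
        {"method": "GET", "status": 200, "seconds": 3.10},
    ]
    for rec in flag_requests(sample):
        print(rec)
```

The threshold should be set from what the application normally sees rather than taken from this example.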