ScaleIO: Known problems, errors and Fixed Issues documented in ScaleIO 2.0.1.3 Release Notes

           

   Article Number:     513870                                   Article Version: 5     Article Type:    Break Fix 
   

 


Product:

 

ScaleIO Software

 

Issue:

 

 

Fixed in ScaleIO 2.0.1.3:
    See the ScaleIO 2.0.1.3 Release Notes for current fixes and known problems with workarounds:
    https://support.emc.com/docu82892_ScaleIO_2.0.1.3_Release_Notes.pdf?language=en_US&language=en_US   
     
                                                           

 

 

Change:

 

 

Fixed in ScaleIO 2.0.1.3:
    See the ScaleIO 2.0.1.3 Release Notes for current fixes and known problems with workarounds:
    https://support.emc.com/docu82892_ScaleIO_2.0.1.3_Release_Notes.pdf?language=en_US&language=en_US   
     
                                                           

 

 

Resolution:

 

 

   

      Fixed in ScaleIO 2.0.1.3:
      See the ScaleIO 2.0.1.3 Release Notes for current fixes and known problems with workarounds:
      https://support.emc.com/docu82892_ScaleIO_2.0.1.3_Release_Notes.pdf?language=en_US&language=en_US     
     
      Issue number, Severity, Product component, and Problem summary
     
      82168518 SCI-17542 1-High MDM In a system with low available memory, changing the SDS performance profile to "high" might cause it to disconnect for a few seconds.  Doing this to multiple SDSs in parallel might result in  temporary data unavailability.         
         
          84223528 SCI-19755 1-High MDM When an SDS is in maintenance mode, if the Master MDM is restarted or a switchover is done, and the new Master MDM is on a server that is under extreme memory load, some capacity might become unavailable until exiting maintenance mode.         
                    
          6495094 SCI-21189 1-High MDM Entering an unknown GUID in the map_volume_to_sdc command might cause the MDM to restart.
         
          84793404 SCI-20980 1-High MDM, SDS When the MDM disconnects from multiple SDS nodes, a data DEGRADED event may be observed.
     
      83293640, 84794292, 84793404, 83291602 SCI-18294 1-High MDM, Security When running the bind function as part of the LDAP authentication, the LDAP server might not reply. The thread stays "stuck" waiting on the reply until it times out.

      Fixed Issues

      83549926, 82583102 SCI-13257 1-High SDC The Linux mount “discard” option does not use SCSI commands to perform its space reclamation action. No capacity is trimmed within the ScaleIO volume whenever a "discard" command is issued to a scini device. An attempt to run the “fstrim” command on a mounted scini device results in an error message, reporting that the device does not support this operation. This issue seems to also affect the “trim” plug-in in XenServer environments: XenServer’s VGs are trimmed, however the ScaleIO volume does not reflect the trimmed capacity.
     
      79505790 SCI-18491 1-High SDS When a physical SSD device of an SDS is pulled out of a server, the SDS might not reconnect back to the ScaleIO system.
     
      84322862 SCI-19967 1-High SDS CloudLink generates the device mapper names based on the device name (for example, /dev/sdX will always be mapper/svm_sdX).  In rare cases, when the device paths change after an SDS reboot (the /dev/sdX dev name), the ScaleIO system does not reconfigure to match with the CloudLink persistent device mapper /dev/mapper/svm_sdX. This will prevent the SDS from adding the devices back.  The solution, implemented in v2.0.1.3, is to reconfigure the SDS devices with the CloudLink mapper after each SDS restart.     
     
      6382254 SCI-20823 1-High SDS When an SDS is in a rebuild process and network disconnections are encountered, the rebuild fails without reporting to the MDM that the SDS is in degraded status. As a result, the MDM does not trigger a role switch, and I/O errors occur.
     
      83941192 SCI-19238 2-Medium CLI, LIA, SDS The output returned from the hardware awareness scli --query_sds_device_info command does not reflect actual disk information.     
     
      80912480 SCI-16125 2-Medium SDC Running the Windows 'chkdsk' utility on a ScaleIO system installed on Windows 2008 returns errors.  SR number 81165758, 81212730, 81450412
     
      SCI-16500 2-Medium SDC When an I/O failure occurs, the ScaleIO system logs an error message for each incident. Under high I/O stress, when I/O errors are encountered, error messages are displayed many times (due to many retries). This depletes the machine’s resources (mainly CPU used by syslog). As a result, the ESXi system may freeze for several minutes.     
      6334236 SCI-20618 3-Low General A space character is missing in the ScaleIO syslog event reporting. This does not conform to RFC 5424.
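
      For the syslog conformance issue above (SCI-20618), the following is a minimal sketch of how a received message could be checked against the RFC 5424 header layout, in which the fields are separated by single space characters. The source does not say which space was missing, so the sample messages and the function name has_rfc5424_header are illustrative assumptions, not ScaleIO output or tooling.

```python
import re

# RFC 5424 layout: <PRI>VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID SP STRUCTURED-DATA [SP MSG]
# Simplified check: field contents are only loosely validated.
RFC5424_LINE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2})"
    r" (?P<timestamp>-|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{1,6})?(?:Z|[+-]\d{2}:\d{2}))"
    r" (?P<hostname>\S+)"
    r" (?P<app_name>\S+)"
    r" (?P<procid>\S+)"
    r" (?P<msgid>\S+)"
    r" (?P<sd>-|\[.+\])"
    r"(?: (?P<msg>.*))?$"
)


def has_rfc5424_header(line: str) -> bool:
    """Return True if the line has a space-separated RFC 5424 header."""
    match = RFC5424_LINE.match(line)
    if match is None:
        return False
    # PRI = facility * 8 + severity, so the maximum valid value is 191.
    return int(match.group("pri")) <= 191


if __name__ == "__main__":
    # Hypothetical sample messages, not taken from a ScaleIO system.
    good = "<34>1 2003-10-11T22:14:15.003Z host.example.com app - ID47 - event text"
    bad = "<34>1 2003-10-11T22:14:15.003Zhost.example.com app - ID47 - event text"
    print(has_rfc5424_header(good))
    print(has_rfc5424_header(bad))
```

      A check like this could be run on a sample of messages captured at the syslog server before and after upgrading, to confirm that the receiver's parser sees well-formed headers.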
     
      SCI-11490 1-High AMS When upgrading a large system, and the GUI and AMS reside on the same Windows machine, the GUI session might get an out of memory exception.     
     
      SCI-11517 1-High AMS In some cases, in scale configurations, the GUI might reach over 2 GB memory usage.     
     
      SCI-11865 1-High AMS In a system with 50 (or more) nodes, the Hardware view may show the Add Node button instead of the node view.
     
      SCI-13154 1-High AMS In rare cases, during an upgrade, an "Unexpected Error" message might be displayed while waiting for the ESX to enter into maintenance mode.  The upgrade fails because the SVM cannot be powered off because AMS fails to identify the correct state of the ESX.     
     
      SCI-17903 1-High AMS  The Renew Certificate process runs without verifying whether the certificate is new. When the current certificate is being used for renewal, this results in the system entering maintenance mode unnecessarily.     
     
      SCI-18627 1-High AMS When large and small block sizes are handled simultaneously, some IOs go through DAS Cache and some go directly to the disks.  If the node has a single NUMA (Non-Uniform Memory Access) socket, in some cases threads from different processes (DAS Cache, SDS, etc.) run on the same core, exhausting its resources.  Combined with the flushing from DAS Cache to the disks, this causes “hiccups” or “IO wait” that result in SDS disconnection.
         
      SCI-10494 1-High AMS, GUI While running an upgrade (ScaleIO and BMC) on a 2U4N ScaleIO Node, the connection between the GUI and the AMS may be lost.

      SCI-14168 1-High AMS-GUI The AMS GUI may freeze when performing actions that take longer than usual, such as adding many devices at once, configuring the cache settings during deployment, and similar actions.
     
     
      SCI-20501 1-High DAS Cache In stressed IO scenarios and when DAS cache is destaging a full cache, some disk timeouts may occur, which can lead to disk failure.  This issue occurs on previous versions of the firmware and driver of the Perc H730 controller in a ScaleIO ESXi environment with RDM solution configuration. Using the latest certified versions fixes this issue.     
     
      SCI-4124 1-High Device If devices are loaded after the SDS service starts, the SDS reports device errors.     
     
      SCI-17132 1-High ESX, MDM In rare cases of very high I/O load on disks in ESXi  environment, the SVM may freeze for short periods of time.  This can cause MDM network disconnections and in some extreme cases, MDM cluster switchover. The issue is an ESXi issue.     
     
      SCI-11621 1-High GUI In very large-scale systems (100k volumes+), the GUI might start up slower than expected. Also, the user might experience slower response in various volume.     
     
      SCI-13138 1-High General In rare cases, after multiple SDS reconnections, the rebuild process might get stuck.
     
      SCI-14935 1-High IM Server The ScaleIO System Analysis feature (415), which enables you to identify potential issues with your ScaleIO system,  doesn't identify ScaleIO devices that were set to be in 'Offline' state.     
     
      SCI-18978 1-High IM Web, MDM When upgrading ScaleIO from v2.0.0.3 to a version prior to v2.0.1.3 (v2.0.1, v2.0.1.1, or v2.0.1.2), Standby Tie-Breakers become Standby Slaves.     
     
      SCI-13037 1-High LIA On Windows servers, after upgrading from v1.32 to v2.0 (which leaves the system in "Non Secure mode"), when extending from 3 to 5-node cluster, the operation fails due to LIA panic.   

   

      SCI-12342 1-High MDM In rare cases, even though an MDM has more than one network interface, the MDM might switch from the Master to Slave role when only one of the network interfaces experiences connection disruption.   

   

      SCI-12683 1-High MDM In rare cases, when running a script that repeatedly causes an SDS to enter and exit maintenance mode, the SDS might go offline and come back online.   

   

      SCI-13896 1-High MDM When there are SDSs in Maintenance Mode, disconnecting an SDS that is not in Maintenance Mode, might lead to I/O errors.  No data loss occurs in such a case.  All the data will be available once the SDS reconnects, or upon exiting from Maintenance Mode.   

   

      SCI-16521 1-High MDM In a network environment with connectivity and high latency issues, when replacing an MDM or changing cluster mode, in rare cases, the Master MDM restarts and the commands time out. After the Master MDM restarts, the system will continue as usual. It is possible that the replace or change mode action will succeed in spite of the timeout.

   

      SCI-16978 1-High MDM After an unexpected termination of the MDM process, one or more SDS devices might report being in an error state.  This is a false error that needs to be cleared to continue normal operation.   

   

      SCI-17341 1-High MDM In very rare cases, after MDM switchover when an SDS repeatedly fails to rejoin the cluster due to extreme delays of SDS responses, IO may fail to this SDS, but the expected forward rebuild is not initiated.   

   

      SCI-12515 1-High MDM, SDS During extreme load on a network with high latency simulation, ESX nodes might present the error event "Power-on Reset occurred."   

   

      SCI-18936 1-High MDM, Upgrade  When upgrading a 5-node cluster ScaleIO system, the rollback functionality of a 5-node cluster is not operational.   

   

      SCI-6836 1-High NDU, SDS After upgrade from v1.32.x to v2.0.x.x (fixed in v2.0.1.3), emergency DRL hardening does not work.  The emergency DRL hardening does work if a clean install was performed (rather than an upgrade).

   

      SCI-17338 1-High NDU, Security When upgrading ScaleIO components from v1.32.x to v2.0.1, v2.0.1.1, or v2.0.1.2, the following configurations are deleted:     
      1. Restricted SDC mode.
      2. Remote read only limit.
      3. Remote Syslog.
     
      Upgrades to v2.0.1.3 are not affected. However, if the system was already upgraded to one of the earlier versions, these configurations were deleted and must be manually reinstated.
     
      SCI-17403 1-High REST Gateway When upgrading the ScaleIO Gateway from v2.0.0.x to 2.0.1 or from v2.0.0.x to a newer version of 2.0.0.x, the following parameters are deleted from the gatewayUser.properties file: Trusted.LIA.IPs, lockbox.persistent.path, and SSH.Key.Path.
     
      SCI-12639 1-High SDS Frequent and long network disconnections might result in an IO error.     
     
      SCI-12928 1-High SDS In some cases, the SDS might remain in join-pending state after clearing device error.     
     
      SCI-13517 1-High SDS ScaleIO volumes formatted with exFAT may cause delays in the restart of the SDS process, when the restart was initiated during IOs.     
     
      SCI-17694 1-High SDS When adding SDSs in parallel, if one add request fails for a legitimate reason, for example a communication error, you may not be able to add that SDS again.     
     
      SCI-21363 1-High SDS CloudLink generates the device mapper names based on the device name (for example, /dev/sdX will always be mapper/svm_sdX). In rare cases, when the device paths change after an SDS reboot (the /dev/sdX dev name), the ScaleIO system does not reconfigure to match with the CloudLink persistent device mapper /dev/mapper/svm_sdX. This will prevent the SDS from adding the devices back.  The solution, implemented in v2.0.1.3, is to reconfigure the SDS devices with the CloudLink mapper after each SDS restart.     
     
      SCI-10481 2-Medium AMS Closing the Hardware Deployment Wizard during the Deploy stage causes the Discovering Nodes process to run endlessly, with the Deploy button remaining enabled.     
     
      SCI-11494 2-Medium AMS An upgrade operation finished successfully, but returned an exception to the user.     
     
      SCI-18401 2-Medium AMS-GUI The NIC properties and status are not displayed correctly (or may not be displayed at all) in the GUI.     
     
      SCI-16423 2-Medium CLI, MDM When using the SCLI to reduce the RFcache page size from the 64K default size, the memory usage for the RFcache kernel driver in the ScaleIO VMs may cause a memory starvation event of the SDS and MDM processes.  This will cause both processes to fail to start.
     
      SCI-11845 2-Medium GUI In a large-scale environment, when you sort the Backend view by Storage Pools, and then click the "+" button quickly twice, the Storage Pools may be presented twice.     
     
      SCI-17571 2-Medium GW When extending a ScaleIO system using the Installation Manager, the spare capacity percentage is calculated automatically by the system. If the initial spare capacity was set manually, it may cause problems since there may not be enough free capacity for the new Storage Pools beyond what was initially set.     
     
      SCI-16709 2-Medium IM Server When logging in to the Gateway (Installation Manager), the 5-second delay after entering incorrect login credentials is not enforced.
     
      SCI-14045 2-Medium MDM The system may enter an endless rebuild when entering an SDS into Maintenance Mode and then repeatedly exiting after a few minutes.
     
      SCI-16929 2-Medium MDM In rare cases, when changing the cluster configuration or removing cluster nodes, the MDM may become unresponsive to SCLI commands, and will restart itself after 5 minutes.  After the MDM process restarts, the system will resume normal operation mode.     
     
      SCI-12945 2-Medium SDS In a scenario where all the following conditions apply, IO errors might be encountered:     
      1. A device error was encountered on an SDS, and a rebuild has been performed.
      2. IOs are running.
      3. The Clear Device Error command is executed, using the GUI or CLI.
     
     
      SCI-13101 2-Medium SDS In rare cases, after clearing a device error, the SDS might restart.     
     
      SCI-10597 3-Low AMS In the Upgrade window, the error message displayed when one of the MDMs is down is unclear.
     
      SCI-7243 3-Low GUI During the Add Devices stage of the Add Nodes process, specifically, when applying devices to a Storage Pool,  Storage Pool capacity size may temporarily be displayed as twice the actual size. After completion, the capacity is again displayed at its real size.     
     
      SCI-13096 3-Low General In rare cases, when one of the SDS nodes is stuck in join-pending state, the MDM might present wrong statistics that take into account the last successful SDS's report.     
     
      SCI-15349 3-Low MDM In a vSphere ScaleIO Plug-In environment, when using the scli command "--replace_cluster_mdm" to replace a slave MDM with a standby MDM, the following error is displayed: "A timeout occurred".     
     
      SCI-11839 3-Low SDS Adding a mapped ScaleIO volume, or any partition on it, as an SDS device in the same Storage Pool might result in data conflict issues.  This is more sensitive on Windows OS based nodes, as there is currently no distinction between normal & ScaleIO physical drives in the disk management utility.     
            
      SCI-17641 3-Low SDS When an SDS is in maintenance mode, you cannot toggle between the performance parameters (Default and High).
            
      SCI-15718 3-Low Tools When a hostname is used for the syslog server, messages stop arriving after the server's IP address changes.
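
      For SCI-17403 above, where a Gateway upgrade deletes parameters from the gatewayUser.properties file, a minimal sketch such as the following could report which of the three parameters are absent after an upgrade. The key names are copied verbatim from the issue description. The location of the file is not given in this article, so the path must be supplied by the user, and the parser is a simplified assumption that handles only plain key=value lines.

```python
from pathlib import Path

# Parameter names copied from the SCI-17403 description.
REQUIRED_KEYS = ("Trusted.LIA.IPs", "lockbox.persistent.path", "SSH.Key.Path")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that do not appear in the properties text."""
    props = parse_properties(text)
    return [key for key in required if key not in props]


if __name__ == "__main__":
    import sys

    # Usage: python check_gateway_props.py <path to gatewayUser.properties>
    content = Path(sys.argv[1]).read_text()
    for key in missing_keys(content):
        print(f"missing: {key}")
```

      Saving the output of such a check (or a copy of the file) before the upgrade makes it easier to restore the deleted values afterwards.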