A dual SP reboot can occur when upgrading FLARE code after issuing a commit.

Product:

 

CLARiiON Series

 

Description:

 

A dual SP reboot can occur when upgrading FLARE code after issuing a commit.  The Commit fails and a check condition is generated:  PSM_BUGCHECK_DATA_AREA_ACCESS_TIMER_EXCEEDED

 

Thin LUN failure during a non-disruptive upgrade (NDU).

 

The issues detailed below can affect the uptime of CLARiiON CX4 arrays running a FLARE Release 30 revision earlier than Release 30.509.

 

EMC has implemented several important changes to the FLARE Release 30 software. These changes are part of EMC’s effort to continuously improve product quality. This ETA provides guidance to improve uptime and robustness of EMC CX4 Series arrays and Celerra Network Servers.  All Release 30.511 fixes are documented in the Release Notes available on Powerlink. Consult the Release Notes for a full list of issues addressed in a particular release. EMC recommends planning to upgrade CLARiiON arrays at least twice a year for the best uptime/robustness experience.

 

  • Media errors during a Proactive Copy (PACO) on an R3, R5, or R6 RAID group can lead to stale data.
    During proactive sparing, FLARE reads from the PACO candidate and writes to the PACO spare.  If there are any media errors, the data is reconstructed from the other disks in the RAID Group.  This reconstructed data is not written to the PACO spare.  As a result, stale data remains on the PACO spare.  A subsequent data verify will return a coherency error when the data is read. A write to the lost position will correct the issue for that stripe.  Data loss could occur, and the resulting coherency errors may require an unbind/rebind/restore.

    Occurrence potential: High

    Caution! A proactive copy could be initiated automatically if one or more hot spare drives are configured on the system. The only workaround to avoid this bug on a system running Release 30 (R30.508 or earlier) is to remove/unbind all hot spare drives on the system.
     
  • A dual SP reboot can occur when upgrading FLARE code after issuing a commit.  The Commit fails and a check condition is generated:  PSM_BUGCHECK_DATA_AREA_ACCESS_TIMER_EXCEEDED.
    The reboot is a result of an I/O that is unable to complete within a specified time limit.  The root cause is a deadlock condition that occurs between two operations that need to acquire a lock on the same RAID group.

    Occurrence potential: Medium
     
  • Dual SP Reboot. Arrays running FAST (Auto Tiering) are susceptible to a dual storage processor (SP) reboot that can cause a data unavailable condition.  A code problem results in a deadlock due to a lock not being released at the appropriate time. The deadlock prompts FLARE to reset the storage processors to clear the deadlock.

    Occurrence potential: Medium
     
  • Thin LUN failure during NDU.

    If I/O is ongoing during an upgrade (pre-R30 -> R30+), the upgrade can fail and leave the Thin LUNs offline and unavailable to the host.

    Occurrence potential: Medium
     
  • LUN Compression does not work for LUNs greater than 2 terabytes.
    The problem is encountered when an attempt is made to compress a pool LUN larger than 2 TB.  A data variable was previously set too low, but is set correctly in Release 30.509.

    Occurrence potential: Medium
     
  • Unisphere displays Initiator connectivity status incorrectly.
    Arraycommpath appears as disabled when in fact it is enabled for the selected host. CLI displays the setting correctly.  This is the most common mode setting.

    Occurrence potential: Medium

     
  • Unisphere memory leak
    The leak occurs with Thin Provisioning (Management Server - Thin Provisioning Provider).

    Occurrence potential: High
     
  • Single SP Virtual Provisioning timeout panic
    Lock contention between Background Zeroing and MLU polling causes the timeout.  The panic occurs following the commit of a non-disruptive upgrade (NDU) from Release 28 or Release 29 to Release 30.

    Occurrence potential: High
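
The proactive copy behavior described in the first item above can be illustrated with a minimal sketch. This is not FLARE code: the function names, the in-memory representation of disks, and the use of single-parity XOR (as in RAID 3 or RAID 5; RAID 6 uses a second parity and is not modeled) are all assumptions made for illustration. The point it shows is that a block rebuilt after a media error must also be written to the spare; skipping that write is what leaves stale data behind.

```python
from functools import reduce
from typing import List, Optional

# Hypothetical model: a block is raw bytes; None stands for a media error on read.
Block = Optional[bytes]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together (single-parity reconstruction)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def proactive_copy(candidate: List[Block], peers: List[List[bytes]]) -> List[bytes]:
    """Copy every stripe element from the PACO candidate to a new spare.

    peers[i] holds the blocks of the other disks (data and parity) for
    stripe i. Both arguments are illustrative stand-ins for real drives.
    """
    spare: List[bytes] = []
    for stripe, block in enumerate(candidate):
        if block is None:
            # Media error: rebuild the block from the other disks ...
            block = xor_blocks(peers[stripe])
        # ... and write it to the spare, whether it was read or rebuilt.
        spare.append(block)
    return spare
```

For example, with a candidate that returns a media error on its second stripe, `proactive_copy([d0, None], [[d1, p0], [e1, p1]])` yields a spare whose second block equals the XOR of `e1` and `p1`, which is the block the candidate originally held.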

 

Resolution:

Upgrade to FLARE Release 30.511 or later for fixes for all of these issues.
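
When reviewing several arrays, the comparison against the fixed level can be scripted. The sketch below is only an illustration: it assumes the revision is available as a five-field dotted string such as `04.30.000.5.509`, where the second field is the release and the last field is the patch level. That format, and the helper names, are assumptions and are not taken from this solution; adjust the parsing to whatever your array actually reports.

```python
from typing import Tuple

# Fixed level named in the Resolution section: Release 30.511.
FIXED_LEVEL: Tuple[int, int] = (30, 511)


def parse_flare_revision(revision: str) -> Tuple[int, int]:
    """Return (release, patch) from an ASSUMED 'GG.RR.xxx.y.PPP' string."""
    parts = revision.strip().split(".")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized revision format: {revision!r}")
    return int(parts[1]), int(parts[4])


def at_or_above_fixed_level(revision: str) -> bool:
    """True if the revision is Release 30.511 or later."""
    return parse_flare_revision(revision) >= FIXED_LEVEL


if __name__ == "__main__":
    for rev in ("04.30.000.5.509", "04.30.000.5.511"):
        status = "has the fixes" if at_or_above_fixed_level(rev) else "upgrade recommended"
        print(f"{rev}: {status}")
```

A result of `False` only means the array is below Release 30.511; it does not by itself confirm that any of the issues above has occurred.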

To schedule an upgrade, contact your EMC service representative. Please quote this solution ID (emc258303).

Identical SP check codes can have multiple causes. Do not automatically assume that when an SP bugcheck code matches a solution that the symptom or resolution will be the same. Consult with Technical Support Level 2 to determine whether the problem and resolution are the ones indicated in this solution.

To determine which component faulted, Technical Support Level 2 requires dump files and SPCollect files.

EMC recommends upgrading CLARiiON arrays no less than twice a year to take advantage of the latest fixes.

 

For more information, refer to Primus solution “emc258303”.