Connectrix B-Series: MAPS daemon terminated, causing the CPs to go out of sync.

           

   Article Number:     496168                                   Article Version: 3     Article Type:    Break Fix 
   

 


Product:

 

Connectrix, Connectrix ED-DCX-B, Connectrix ED-DCX-4S-B, Connectrix ED-DCX8510-4B, Connectrix ED-DCX8510-8B

 

Issue:

 

 

    Impact:   
    None   
        
    Environment:   
    EMC Hardware: Connectrix ED-DCX-B   
    EMC Hardware: Connectrix ED-DCX-4S   
    EMC Hardware: Connectrix ED-8510-4B   
    EMC Hardware: Connectrix ED-8510-8B   
    Brocade Software: Fabric OS 7.3.1   
        
    Problem:   
    I.   
    An HA out-of-sync message appears in the errdump, and MAPS switch status changes are logged frequently (every minute).   
    Errdump:   
    2017/02/17-07:49:24, [MAPS-1020], 32312, SLOT 6 | FID 128, WARNING, DCX, Switch wide status has changed from HEALTHY to MARGINAL.
    2017/02/17-07:50:24, [MAPS-1020], 32313, SLOT 6 | FID 128, WARNING, DCX, Switch wide status has changed from MARGINAL to HEALTHY.
    2017/02/17-07:51:24, [MAPS-1020], 32314, SLOT 6 | FID 128, WARNING, DCX, Switch wide status has changed from HEALTHY to MARGINAL.
    2017/02/17-07:52:24, [MAPS-1020], 32315, SLOT 6 | FID 128, WARNING, DCX, Switch wide status has changed from MARGINAL to HEALTHY.
   
   
    After a while, the MAPS daemon (mdd) terminated:   
    [KSWD-1002], 51122/32327, SLOT 6 | FFDC | CHASSIS, WARNING, DCX, Detected termination of process mdd:2209., hasm_swd.c, line: 191, comp:insmod, ltime:
    [FSSM-1003], 51123/32328, SLOT 7 | CHASSIS, WARNING, DCX, HA State out of sync., OID:0x003e0007, hasm_sgi.c, line: 2139, comp:hamd, ltime:
    [RAS-5001], 51124/0, SLOT 6 | CHASSIS, INFO, DCX, Message KSWD-1002 caused FFDC event, ffdcd.c, line: 1030, comp:raslogd, ltime:
    [RAS-1001], 51125/32329, SLOT 6 | CHASSIS, INFO, DCX, First failure data capture (FFDC) event occurred., ffdcd.c, line: 1031, comp:raslogd, ltime:
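
The log pattern above can also be looked for offline, for example in a saved errdump capture. The following is a minimal Python sketch, not a vendor tool: the function name scan_errdump and the keys it returns are hypothetical, and the regular expression assumes lines formatted like the excerpts in this article.

```python
import re
from datetime import datetime

# Matches RASLog lines shaped like the excerpts in this article, with or
# without a leading timestamp, e.g.
#   2017/02/17-07:49:24, [MAPS-1020], 32312, SLOT 6 | FID 128, WARNING, ...
#   [KSWD-1002], 51122/32327, SLOT 6 | FFDC | CHASSIS, WARNING, ...
LINE_RE = re.compile(
    r"^\s*(?:(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}),\s*)?"
    r"\[(?P<msg_id>[A-Z0-9]+-\d+)\],"
)


def scan_errdump(text):
    """Summarize the message IDs discussed in this article (hypothetical helper)."""
    maps_times = []
    mdd_terminated = False
    ha_out_of_sync = False
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        msg_id = match.group("msg_id")
        if msg_id == "MAPS-1020" and match.group("ts"):
            # Switch-wide status change; keep the timestamp to measure spacing.
            maps_times.append(
                datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S")
            )
        elif msg_id == "KSWD-1002" and "process mdd" in line:
            # Software watchdog reported that the mdd process terminated.
            mdd_terminated = True
        elif msg_id == "FSSM-1003":
            # HA state out of sync.
            ha_out_of_sync = True
    # Seconds between consecutive MAPS-1020 status changes.
    gaps = [(b - a).total_seconds() for a, b in zip(maps_times, maps_times[1:])]
    return {
        "maps_status_changes": len(maps_times),
        "status_change_gaps_s": gaps,
        "mdd_terminated": mdd_terminated,
        "ha_out_of_sync": ha_out_of_sync,
    }
```

Status changes spaced about 60 seconds apart, together with mdd_terminated and ha_out_of_sync both set, would correspond to the sequence described in problem I. The sketch only reports what is in the text it is given; it does not confirm the defect.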
   
        
    II.   
    Another possible symptom is a MAPS daemon (MDD) panic and restart on the active CP; as a result, the director loses HA SYNC status.   
    This can happen when the standby CP goes in and out of a faulty state very frequently, and HA cannot be recovered even after the standby CP is replaced.
                                                           

 

 

Cause:

 

 

The MAPS daemon (mdd) was terminated due to a NULL pointer dereference in the code.   
    The MAPS daemon panicked and did not restart on the active CP, and as a result the director lost HA SYNC.   
        
    Brocade DEFECT000618464
                                                           

 

 

Change:

 

 

None                                                           

 

 

Resolution:

 

 

Fix:   
    Upgrade to Fabric OS v7.4.1e, v7.4.2, v8.0.2, or v8.1.0.   
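
To compare a running firmware string against the releases listed above, a small helper along these lines may be useful. It is a sketch under stated assumptions, not a vendor-supplied check: the names parse_fos_version and has_fix are hypothetical, and the code assumes that later releases on the same major.minor branch keep the fix, which this article does not state.

```python
import re

# Earliest release listed in this article's Fix line for each major.minor
# branch. v7.4.2 is also listed and sorts after v7.4.1e on the 7.4 branch.
FIXED_BY_BRANCH = {
    (7, 4): (7, 4, 1, "e"),
    (8, 0): (8, 0, 2, ""),
    (8, 1): (8, 1, 0, ""),
}

# Accepts strings such as "v7.4.1e" or "7.3.1"; an optional single
# lowercase letter after the patch number is treated as the patch suffix.
VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)([a-z]?)")


def parse_fos_version(version):
    """Split a Fabric OS version string into (major, minor, patch, letter)."""
    match = VERSION_RE.match(version.strip())
    if not match:
        raise ValueError(f"unrecognized Fabric OS version: {version!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)


def has_fix(version):
    """Return True or False for branches listed in the article, else None.

    ASSUMPTION: releases at or after the listed one on the same branch
    contain the fix. None means the article lists no release on that branch.
    """
    parsed = parse_fos_version(version)
    threshold = FIXED_BY_BRANCH.get(parsed[:2])
    if threshold is None:
        return None
    return parsed >= threshold
```

For example, has_fix("v7.3.1") returns None because no 7.3 release is listed in the Fix line, so that branch would need an upgrade to one of the listed releases.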
        
    Workaround:   
    For problem I, there is no workaround; the mdd daemon terminated on the standby CP, then came back online and returned to sync automatically.   
        
    For problem II, schedule a maintenance window and reboot the active CP. This will be disruptive because the CPs are not in sync.