DCNM 10.1(x) Hangs or Stops Responding

Article Number: 499525                              Article Version: 3                               Article Type: Break Fix


Product:

Connectrix MDS-Series Data Center Network Manager

 

Issue:

When DCNM 10.1(x) runs on a virtual server without reserved resources, the Web Client or SAN Client stops loading or responding until DCNM services are restarted.

 

 

Cause:

Conditions (all must be met):

DCNM 10.1(x) is being used.

Performance Monitor / Collection feature is in use.

DCNM is on a virtual server with shared resources.

When these conditions are met, SNMP timeout messages appear in DCNM's logs, and the Web Client in particular hangs and stays unresponsive until DCNM services are stopped and restarted. Java "out of memory" errors also appear.

 

 

---- Further Details on Cause:

 

 

Cisco has confirmed that in DCNM 10.1(x), the Performance Monitor code uses a lot of CPU to process its tasks. When the server has reserved CPU this isn’t a problem. On a server with shared resources, Java cannot acquire enough processing power to complete its SNMP tasks, so heap memory slowly builds up until services come to a standstill.

 

 

This can be verified by logging in to the server's internal debug port and watching the server's Heap Memory. If usage increases over time and approaches the listed memory limit, this issue is the likely cause. You can also have Cisco TAC install the Java JDK and use its jmap tool to see which active processes are using the most memory. If it is snmpd, this issue is the likely cause.

 

 

Server Debug Port Login Instructions:

Log in to the internal debug port once a day at https://<serveraddress>/serverinfo (user credentials: admin/nbv_12345) and take a snapshot of Heap Memory Usage.

Compare the daily snapshots to see whether heap usage is increasing over time.
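
The comparison of daily snapshots can be sketched in Python. This is a minimal illustration, not a DCNM tool: it assumes the Heap Memory Usage values are read manually from the debug page and typed in, and the names HeapSnapshot, heap_growth_per_day, looks_like_match, and the 80% "close to the limit" threshold are all hypothetical choices made for this example. The sample numbers are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeapSnapshot:
    day: int        # day index of the reading (0 = first day)
    used_mb: float  # Heap Memory Usage value copied by hand from the debug page


def heap_growth_per_day(snapshots: List[HeapSnapshot]) -> float:
    """Return the least-squares slope of used heap (MB) against day."""
    n = len(snapshots)
    if n < 2:
        raise ValueError("need at least two snapshots")
    mean_x = sum(s.day for s in snapshots) / n
    mean_y = sum(s.used_mb for s in snapshots) / n
    sxx = sum((s.day - mean_x) ** 2 for s in snapshots)
    if sxx == 0:
        raise ValueError("snapshots must span more than one day")
    sxy = sum((s.day - mean_x) * (s.used_mb - mean_y) for s in snapshots)
    return sxy / sxx


def looks_like_match(snapshots: List[HeapSnapshot],
                     limit_mb: float,
                     near_limit_fraction: float = 0.8) -> bool:
    """True if heap usage is trending up AND the latest reading is near the limit.

    near_limit_fraction is an assumed threshold, not a documented value.
    """
    growth = heap_growth_per_day(snapshots)
    latest = max(snapshots, key=lambda s: s.day)
    return growth > 0 and latest.used_mb >= near_limit_fraction * limit_mb


if __name__ == "__main__":
    # Invented example readings, one per day.
    readings = [HeapSnapshot(0, 1000.0), HeapSnapshot(1, 1200.0),
                HeapSnapshot(2, 1400.0), HeapSnapshot(3, 1600.0)]
    print("growth (MB/day):", heap_growth_per_day(readings))
    print("likely match:", looks_like_match(readings, limit_mb=2000.0))
```

A steady positive slope on its own is only suggestive; the article's criterion is growth over time combined with usage approaching the listed memory limit, which is why the sketch checks both.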

 

 

Change:

No change other than customer is using DCNM 10.1(x).

 

Resolution:

Upgrade to, or perform a fresh install of, Data Center Network Manager 10.2(1).

 

There are two options:

Request reserved CPU for the DCNM virtual server. This alleviates the issue.

Upgrade to DCNM 10.2(1), in which the Performance Monitor (PM) collection code has been improved so that it runs much faster and consumes less CPU.

 

 
