2 Replies Latest reply: Apr 19, 2017 12:42 PM by jim.kunysz

OneFS 8.0.x.x Event Alerts

jim.kunysz

We recently upgraded from 7.2 to 8.0.0.4 and we no longer receive the same event alerts in the WebUI.

For example, the WebUI has been reporting all nodes as 'healthy', but when I ran isi_hw_status, I found that node 4 was reporting a Faulted Power Supply.

When I reviewed the WebUI for configuring the Event Alerts, under Edit an Alert, there are the 10 default Event Group Categories but nowhere to drill down into which events are reported for each group; adding another Event Group ID apparently requires you to already 'know' which Event ID you want to add. The Isilon Web Administration guide doesn't provide a list of Event Group IDs or break down what the default Event Group categories explicitly report on.

Is there any documentation that provides this level of detail? If the WebUI won't report a failed node power supply, I wonder what other hardware conditions it won't report. That makes me distrust the WebUI for event alerts and forces me to write monitoring scripts instead.
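
For what it's worth, the kind of monitoring script I have in mind would look roughly like the sketch below. It only scans the text that isi_hw_status prints for lines mentioning a fault or failure. The keyword list and both function names (find_problem_lines, check_local_node) are my own assumptions, not anything from Isilon documentation, and I'm assuming Python 3.7+ is available wherever the command runs.

```python
import re
import subprocess

# Keywords treated as signs of a hardware problem. This list is an assumption,
# not taken from Isilon documentation; adjust it to match the real output.
PROBLEM_PATTERN = re.compile(r"\b(fault(ed)?|fail(ed|ure)?)\b", re.IGNORECASE)


def find_problem_lines(status_text):
    """Return the lines of status_text that mention a fault or failure."""
    return [
        line.strip()
        for line in status_text.splitlines()
        if PROBLEM_PATTERN.search(line)
    ]


def check_local_node(command=("isi_hw_status",)):
    """Run the status command on the local node and return suspicious lines.

    Hypothetical helper: it assumes the command is on PATH and writes its
    report to standard output.
    """
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=False
    )
    return find_problem_lines(result.stdout)
```

Calling check_local_node() on each node from cron and mailing any non-empty result would be the simplest way to use it. The parsing is kept separate from the command call so it can be checked against saved output first.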

 

Thanks.

 

Jim

  • 1. Re: OneFS 8.0.x.x Event Alerts
    Phil Lam

    Your failed power supply could be an indication of a BMC/CMC issue, that is, a failed BMC/CMC on that node.

     

    466373 : S210, X210, X410, NL410 or HD400 shows event: 'Node's Baseboard Management Controller (BMC) and/or Chassis Management Controller (CMC) are unresponsive.' https://support.emc.com/kb/466373

     

    Step 1 - Reset BMC/CMC:

    First, each node will need to be power cycled and drained to reset the BMC to a clean state. This avoids potential upgrade issues due to a developing BMC hang condition. Shut down the node by either running 'shutdown -p now' via the CLI or selecting the shutdown option in the Web UI. Once the node is powered off, unplug both power cords, wait one minute, reconnect the power cords and restart the node by pressing the power button on the back of the node...

  • 2. Re: OneFS 8.0.x.x Event Alerts
    jim.kunysz

    Ahh, I'll have to review whether the BMC remediation steps have been performed on that cluster. I'll check and provide an update.

     

    Thanks.