Isilon Gen6: Gen6 cluster shows event 'HW_INFINITY_I2C_FAULT' but nodes are up and running

           

Article Number: 503528    Article Version: 15    Article Type: Break Fix
   

 


Product:

 

Isilon, Isilon Gen6, Isilon F800, Isilon H400, Isilon H500, Isilon H600, Isilon A200_A2000

 

Issue:

 

 

Gen6 clusters can sometimes show the critical-level event 'HW_INFINITY_I2C_FAULT' for multiple nodes even though all nodes are up and running. The event generally cancels after a few minutes, but may recur later. The event implies that an I2C communication error has occurred, but it does not necessarily indicate a hardware failure. Apart from the event itself, you should not experience any adverse effects on your node or cluster. Because these I2C messages often pass between two nodes, it is common to see this event on more than one node in a chassis.

 

 

Cause:

 

 

An issue has been identified with the shipping firmware for the BMC component in certain Gen6 nodes that appears to be causing many of these error messages. The issue has been resolved in Gen6 BMC firmware 23.90, which was released in Node Firmware Package 10.2.1. This issue has no functional impact; the node's redundant I2C buses should ensure that necessary system communication continues normally.

 

 

Resolution:

 

 

The issue has been addressed in Gen6 BMC firmware version 23.90, which was released in Node Firmware Package 10.2.1. This node firmware package also contains fixes for several other Gen6-specific issues. In addition, OneFS 8.1.0.4 and later include improvements that prevent this event from sometimes being raised unnecessarily at the critical level instead of the warning level, as well as improvements to the firmware upgrade process. It is recommended that you upgrade the cluster to OneFS 8.1.0.4 or later, and then to Node Firmware Package 10.2.1 or later, in that order, during your next scheduled maintenance window or at the earliest opportunity. If upgrading OneFS is not an option at this time, upgrading directly to Node Firmware Package 10.2.1 should still resolve this spurious error message.
   
    PLEASE NOTE: Upgrading node firmware and/or OneFS requires each node being upgraded to reboot at the end of its upgrade cycle. This reboot is a required part of the upgrade process and cannot be avoided.
   
    The latest Node Firmware Package and OneFS installation package, along with their accompanying Release Notes, can be downloaded from the support.emc.com website. Instructions for installing the firmware package and updating the cluster firmware, as well as instructions for updating OneFS, can be found in the respective Release Notes.
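
The upgrade recommendation in the Resolution section can be sketched as a short version check. This is an illustrative sketch only, not an Isilon tool: the function names `parse_version` and `upgrade_plan` and the `can_upgrade_onefs` parameter are hypothetical, and the sketch assumes you have already read the installed OneFS and Node Firmware Package versions from the cluster and that they are plain dotted numeric strings.

```python
# Minimum versions taken from the Resolution section of this article.
MIN_ONEFS = (8, 1, 0, 4)
MIN_FIRMWARE_PACKAGE = (10, 2, 1)


def parse_version(text):
    """Turn a dotted version string such as '8.1.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def upgrade_plan(onefs_version, firmware_package_version, can_upgrade_onefs=True):
    """Return the recommended upgrade steps, in the order they should be run.

    Hypothetical helper: it only compares version numbers and does not
    contact a cluster or perform any upgrade.
    """
    steps = []
    # OneFS comes first, because 8.1.0.4 and later improve the firmware
    # upgrade process and the severity level of this event.
    if can_upgrade_onefs and parse_version(onefs_version) < MIN_ONEFS:
        steps.append("Upgrade OneFS to 8.1.0.4 or later")
    # The firmware package carries BMC firmware 23.90, which resolves the issue.
    if parse_version(firmware_package_version) < MIN_FIRMWARE_PACKAGE:
        steps.append("Install Node Firmware Package 10.2.1 or later")
    return steps


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    for step in upgrade_plan("8.1.0.2", "10.1.6"):
        print(step)
```

Passing `can_upgrade_onefs=False` models the case where a OneFS upgrade is not possible at this time; the plan then contains only the firmware package step, which should still resolve the spurious event. An empty list means both minimum versions are already met.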