Limiting downtime for SMB workflows during OneFS upgrades

NOTE: This topic is part of the Uptime Information Hub.

 

Are you planning to upgrade the version of OneFS running on your EMC Isilon cluster? Do your clients connect to the cluster over the SMB protocol? If so, this article describes your upgrade options and things to consider before you upgrade.

 

Upgrade types

Depending on the version of OneFS running on your cluster, and the version of OneFS you want to upgrade to, your cluster can be upgraded in one of two ways: a rolling upgrade or a simultaneous upgrade. The downtime you might experience during a OneFS upgrade is directly related to the type of upgrade you perform.

 

What is a rolling upgrade?

During a rolling upgrade, OneFS upgrades and reboots the nodes in the cluster one at a time. The version of the OneFS operating system running on a node is upgraded, the node is rebooted, and then the next node is upgraded and rebooted, until all nodes in the cluster have been upgraded. Because only one node is being upgraded at any given time, the bulk of the cluster's bandwidth and redundancy remains available throughout the upgrade process.
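
The one-node-at-a-time sequence can be sketched as a small simulation. This is only an illustrative model, not OneFS code: the node names and the rolling_upgrade function are hypothetical, and the sketch shows nothing more than which nodes keep serving clients while each node is upgraded and rebooted.

```python
def rolling_upgrade(nodes):
    """Return one (upgrading_node, serving_nodes) pair per upgrade step.

    Hypothetical model: exactly one node is offline at each step,
    and every other node keeps serving clients.
    """
    steps = []
    for node in nodes:
        serving = [n for n in nodes if n != node]
        steps.append((node, serving))
    return steps


# Hypothetical four-node cluster.
nodes = ["node1", "node2", "node3", "node4"]
steps = rolling_upgrade(nodes)

for upgrading, serving in steps:
    print(f"upgrading {upgrading}; still serving: {', '.join(serving)}")
```

In this model a four-node cluster always has three nodes available, which is why most of the cluster's bandwidth and redundancy remains available during the upgrade.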


The cluster remains online during a rolling upgrade and continues serving clients with little interruption in service, although SMB client connections to a node are disconnected when that node is rebooted during the upgrade.


During a rolling upgrade, if an SMB client is disconnected when a node reboots, the SMB client can reconnect to a different node in the cluster. However, it's possible that the node that the client reconnects to will be rebooted later during the upgrade. For this reason, upgrades are ideally scheduled in advance during a maintenance window and users connected to the cluster through an SMB client are asked to proactively disconnect from the cluster before the upgrade begins.
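
To illustrate why a client can be disconnected more than once, here is a minimal sketch that counts disconnections for a single client. Everything in it is an assumption made for illustration: the count_disconnects function, the node names, and the two reconnect policies are hypothetical, and real SMB clients do not choose nodes this way. The sketch only shows that the number of disconnections depends on which node a client lands on after each reboot.

```python
def count_disconnects(node_order, start_node, choose_node):
    """Count how often one client is disconnected during a rolling upgrade.

    node_order  -- order in which nodes are upgraded and rebooted
    start_node  -- node the client is connected to at the start
    choose_node -- hypothetical policy: (candidates, upgraded) -> node
    """
    current = start_node
    upgraded = []      # nodes that have already been upgraded and rebooted
    disconnects = 0
    for node in node_order:
        if node == current:
            disconnects += 1
            # The rebooting node is unavailable, so pick another one.
            candidates = [n for n in node_order if n != node]
            current = choose_node(candidates, upgraded)
        upgraded.append(node)
    return disconnects


def prefer_not_upgraded(candidates, upgraded):
    """Worst case: land on a node that will still be rebooted later."""
    for n in candidates:
        if n not in upgraded:
            return n
    return candidates[0]


def prefer_upgraded(candidates, upgraded):
    """Best case: land on a node that has already been upgraded."""
    for n in candidates:
        if n in upgraded:
            return n
    return candidates[0]


nodes = ["node1", "node2", "node3", "node4"]
worst = count_disconnects(nodes, "node1", prefer_not_upgraded)
best = count_disconnects(nodes, "node4", prefer_upgraded)
print(f"worst case: {worst} disconnects, best case: {best} disconnect")
```

Under these assumptions, a client that keeps landing on nodes that have not yet been upgraded is disconnected once per node, while a client that lands on an already upgraded node is disconnected only once. Asking users to disconnect before the maintenance window avoids the problem entirely.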


Rolling upgrades are supported only between specified releases. The versions you can perform rolling upgrades between depend on various factors, including changes in functionality between releases, which bug fixes are included in the version you’re upgrading to, and known issues that affect the upgrade process. As a rule, you cannot perform a rolling upgrade between major releases or between some minor releases. To determine whether a rolling upgrade is an option between the versions of OneFS you are upgrading from and to, read the release notes for the version of OneFS that you are upgrading to. OneFS release notes are available on the EMC Online Support site.

 

What is a simultaneous upgrade?

A simultaneous upgrade upgrades OneFS and restarts all of the nodes in the cluster at the same time. During a simultaneous upgrade, all SMB clients are disconnected at once, and they must reconnect when the upgrade is complete. However, this disconnection occurs only once during the upgrade process. A simultaneous upgrade is faster than a rolling upgrade but requires a temporary interruption in service to all clients while the cluster is rebooted. Data is inaccessible for the time that it takes to complete the reboot process, and all input/output (I/O) requests by clients connected to the cluster are suspended during the simultaneous reboot.


Although all clients are disconnected during a simultaneous upgrade, NFS clients reconnect automatically and resume operations. Clients connected to the cluster through other protocols, particularly SMB 1 and SMB 2 clients, must reestablish their connections. Note: This is a limitation of the SMB protocol.

 

Ultimately, in SMB environments, choosing between a rolling upgrade and a simultaneous upgrade means choosing between continued service with intermittent disconnections over a longer period of time and a temporary but complete interruption of service.
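
The trade-off can be made concrete with a rough back-of-the-envelope model. The sketch below is hypothetical throughout: the UpgradeEstimate type, the function names, and the timing figures (15 minutes per node, 20 minutes for a whole-cluster reboot) are illustrative assumptions, not measured or documented OneFS values. Substitute estimates from your own environment.

```python
from dataclasses import dataclass


@dataclass
class UpgradeEstimate:
    kind: str
    window_minutes: int              # total length of the upgrade window
    full_outage_minutes: int         # time with no node serving clients
    max_disconnects_per_client: int  # upper bound in this simplified model


def estimate_rolling(node_count, minutes_per_node):
    # Nodes are upgraded one at a time, so the window grows with the
    # node count, but the cluster as a whole never goes offline.
    return UpgradeEstimate(
        kind="rolling",
        window_minutes=node_count * minutes_per_node,
        full_outage_minutes=0,
        max_disconnects_per_client=node_count,
    )


def estimate_simultaneous(minutes_for_cluster):
    # All nodes restart together: one disconnection per client, but
    # data is inaccessible for the whole reboot.
    return UpgradeEstimate(
        kind="simultaneous",
        window_minutes=minutes_for_cluster,
        full_outage_minutes=minutes_for_cluster,
        max_disconnects_per_client=1,
    )


# Assumed figures for a hypothetical four-node cluster.
rolling = estimate_rolling(node_count=4, minutes_per_node=15)
simultaneous = estimate_simultaneous(minutes_for_cluster=20)

for estimate in (rolling, simultaneous):
    print(estimate)
```

With these assumed figures, the rolling upgrade takes longer and may disconnect a client several times but never makes data fully inaccessible, while the simultaneous upgrade is shorter and disconnects each client once at the cost of a complete outage.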

 

Things to consider

  1. Schedule upgrades in advance during a maintenance window. Ask users of SMB clients to disconnect from the cluster before the maintenance window begins.
  2. It's best to initiate a reboot from the command-line interface (CLI). The CLI displays more detailed information than the web interface. You can also launch a screen session on the CLI console, which enables you to resume from where you left off if you get disconnected when a node reboots.
  3. Read the documentation before upgrading your cluster. For instructions about how to plan an upgrade, prepare the cluster for upgrade, and perform an upgrade of the operating system, see the OneFS Upgrade Planning and Process Guide. For a list of supported upgrade paths to the version of OneFS to which you plan to upgrade, see the OneFS release notes for that version of OneFS on the EMC Online Support site.