
For the last time, welcome back! Here we are at the end of the series, with almost everything covered, and what wasn't will be covered today. For our last look at VMAX & OpenStack Ocata we are going to delve deeper into the topic of troubleshooting. I know, I have covered troubleshooting in all of the articles so far, but I felt it could still use a dedicated article. We won't be looking at troubleshooting individual features, but instead troubleshooting the setup and configuration of your environment, and where to look if something isn't working as expected.

 

How to properly troubleshoot issues in your environment

When I am troubleshooting any kind of issue with Cinder & VMAX I always follow the same series of tasks to determine what is wrong. Almost every time I find the issue is human error on my part and a quick fix is all that is needed to get things running smoothly. When beginning the process of determining what is wrong in my environment I follow these steps:

  1. Check the Cinder logs for warning or error statements (if you are seeing incorrect behaviour for operations involving attaches to instances, be sure to check the Nova logs also!).
  2. If nothing is apparent from the standard Cinder/Nova logs, enable debug mode for the service, restart the service, and attempt the operation again. Debug-level reporting may provide better insight into what is going on, and in the event that you need to escalate the issue, debug-level logs from your environment will be required.
  3. After you investigate the logs, whether or not you find an indication of what the problem may be, check the configuration of your VMAX back end(s) in cinder.conf to ensure all required parameters are included and correct, and that the back end(s) are included in the enabled_backends parameter in the [DEFAULT] section.
  4. If everything appears to be correct with your back end config in cinder.conf, check your associated XML configuration files: are all required tags included and all values correct?
  5. If all configuration seems correct, is your ECOM server accessible and running?
  6. If the ECOM is fine, is there successful connectivity in your storage network between your controller and the VMAX?
  7. Lastly, check that your SSL certificates are valid and imported into your distro correctly. You can also specify the path to the certificate itself; I recommend trying this as well to be doubly sure.

 

Following the steps above I am usually able to fix any problem myself. As mentioned previously in the series, almost every problem encountered is a configuration issue and can be isolated and fixed using the steps above.

 

Enabling debug mode in OpenStack

To enable debug mode for any service in OpenStack, navigate to that service's .conf file in its installation directory and set the debug flag to true in the [DEFAULT] configuration group. Restart the service to see the change reflected in the service log files.
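
For example, to turn on debug logging for the Cinder volume service, the change would look like this in /etc/cinder/cinder.conf (the restart command will vary by distro and installation method):

[DEFAULT]
debug = True

$ sudo systemctl restart cinder-volume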

 

Commonly encountered configuration errors

To give you an idea of where to look or what to do if you encounter configuration issues with the VMAX Cinder drivers, I will go over some of the most commonly met issues, how to spot them, and how to fix them. An important thing to look for in the logs when diagnosing an issue is some indication of the specific resource that is the problem, be it a volume type referenced in the error logs, or a service or resource which is inaccessible. Identifying this will narrow your search for the problem dramatically.

 

Note: For each of the problems and solutions below, it is necessary to restart the Cinder services after the changes so they propagate through the system. If the issue is with Nova, then the Nova services need to be restarted, and so on. To save me writing it every time, it is safe to assume that after each config change you will need to restart the required services so the changes take effect.

 

Misconfigured back end stanza in cinder.conf

If the back end specified in the enabled_backends parameter in the [DEFAULT] section does not match the back end configuration group name, you will get a 'failed to initialize driver' error in the Cinder volume logs:

 

StanzaWrongSpelling.PNG.png

StanzaWrongSpelling2.PNG.png

 

The screenshots above hint first that a cinder-volume group was not found, and then tell us the affected Cinder service (cinder-volume) and the associated problematic VMAX volume type. This makes sense, as there is an inconsistency between the back end specified in the enabled_backends parameter in the [DEFAULT] section and the name defined for the VMAX configuration group.

 

StanzaWrongSpelling3.PNG.png


Fixing this issue is easy: just rename the configuration group so that it matches what is specified in the enabled_backends parameter. Restart the Cinder services after the change and everything should be fine!
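
As an illustration, the configuration group name in square brackets must match the enabled_backends entry character for character. The names, driver path, and file locations below are examples only, so check them against the installation & setup article for your release:

[DEFAULT]
enabled_backends = VMAX_ISCSI_DIAMOND

[VMAX_ISCSI_DIAMOND]
volume_driver = cinder.volume.drivers.dell_emc.vmax.iscsi.VMAXISCSIDriver
cinder_emc_config_file = /etc/cinder/cinder_emc_config_VMAX_ISCSI_DIAMOND.xml
volume_backend_name = VMAX_ISCSI_DIAMOND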


Misconfigured VMAX back end XML configuration file

There are a few things which can go wrong here, so I will cover them all in one go. First up, incorrect spelling, case-sensitivity, and the impact they have. When you specify the XML configuration file in cinder.conf, if the path is incorrect or the XML filename is wrong, you will see the error in the screenshot below in the Cinder volume logs. To fix this, either correct the path to the XML file in cinder.conf or rename the XML file itself so it matches what is in cinder.conf. Restart the cinder-volume service after the change to clear the error.

 

WrongXMLName1.PNG.png

 

The next most commonly encountered issue is misspelled or missing tags within the XML file. When you are creating your back end XML configuration file, ensure that all required tags are included, correctly spelled, and given correct values. After fixing any problems found in your XML file, restart the cinder-volume service to clear the error. For a full breakdown of the XML tags in use, check part 1 of this series, 'Installation & Setup', section 7 - 'Create your VMAX volume type XML configuration file'.
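
As a rough reminder of the expected shape of the file, a back end XML configuration file looks something like the sketch below. The values are placeholders, and the exact set of required tags for your setup is covered in the installation & setup article:

<?xml version="1.0" encoding="UTF-8"?>
<EMC>
  <EcomServerIp>10.10.10.10</EcomServerIp>
  <EcomServerPort>5989</EcomServerPort>
  <EcomUserName>admin</EcomUserName>
  <EcomPassword>password</EcomPassword>
  <PortGroups>
    <PortGroup>OS-PORTGROUP1-PG</PortGroup>
  </PortGroups>
  <Array>000123456789</Array>
  <Pool>SRP_1</Pool>
</EMC>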

 

Misspelled XML tags are a bit trickier to spot from the logs, but thankfully once identified they are an easy fix. I have highlighted a few different parts of the log screenshot below to make things a bit clearer.

 

WrongXMLName2.PNG.png

 

We know from the top of the error log that the issue is with the volume type VMAX_ISCSI_DIAMOND, but that alone isn't enough. The next parts let us know that a call was made to gather_info from the config_file. Just below this we get an important piece of info: the parseString error relates to an XML function. From these pieces of info alone we can deduce there is a problem with the volume type XML configuration file. The last line, 'ExpatError: mismatched tag: line 4, column 28', tells us the exact line and position of the error. Let's go have a look...

 

WrongXMLParam.PNG.png

 

After opening the XML config file it is immediately obvious what the issue is: an incorrectly defined XML tag, misspelled on this occasion. Fix the misspelling of the tag and restart the Cinder services to clear the error and get the expected behaviour. I have removed some values from the screenshot above for security reasons; these tags should all have values relating to your own environment.

 

If you happen to incorrectly specify a parameter value, say port groups for example, you might not realise the error until later when you attempt certain operations in OpenStack. Sticking with port groups as the example, if you specify an incorrect or non-existent VMAX port group in your XML file, you won't know about it until you try to attach a volume from that volume type to an instance or copy an image to it. That's because the error won't occur until we hit an operation where the port group is necessary.

 

WrongXMLParam2.PNG.png

 

Thankfully the cinder-volume logs are very good, and in this case they will tell you exactly what is going on. Fix the incorrect value and restart services to clear the error.

 

SSL Certificate Troubleshooting

Starting with Solutions Enabler 8.3, SSL-encrypted communication is enabled by default, so it must be configured for use in our environment. There are only a few things which can go wrong when configuring SSL, so we won't have to look too far if we get errors back about it. The main issues with SSL are:

  • SSL parameters not included in back end in cinder.conf
  • Invalid SSL certificate
  • Certificate not imported into distro correctly
  • Path to SSL certificate is invalid

 

If you don't include the required SSL parameters in your back end stanza in cinder.conf, you will see an error like the one in the screenshot below in the cinder-volume logs. The indicator that the issue is with the SSL config for your back end is the line 'CIMError: (0, "The web server returned a bad status line ''''")'. The web server in this instance is our ECOM server, which is the server we are trying to connect to using SSL certs; the bad status line, although nondescript, tells us enough to know to look at the SSL config.

 

SSLerror1-NoSettings.PNG.png

 

When checking your SSL config in cinder.conf make sure that you have the following included for each and every VMAX back end. The driver_ssl_cert_path is optional, you only need to include the direct path if you do not import the certificates directly into your system.

driver_ssl_cert_verify = True

driver_use_ssl = True

driver_ssl_cert_path = /my_location/ca_cert.pem
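
If you choose to import the certificate into the system trust store instead of using driver_ssl_cert_path, the commands are distro-specific. As a rough guide (the file names are placeholders):

Ubuntu: $ sudo cp ca_cert.pem /usr/local/share/ca-certificates/ca_cert.crt ; sudo update-ca-certificates

RHEL/CentOS/Fedora: $ sudo cp ca_cert.pem /etc/pki/ca-trust/source/anchors/ ; sudo update-ca-trust extract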

 

If you are having issues with certs loaded into the system you might encounter the error below in the cinder-volume logs. There is a known issue surrounding system certs and permissions, but luckily the optional parameter driver_ssl_cert_path will clear this error for you when the Cinder services are restarted.

 

SSL_unable_to_load_cert.PNG.png

 

If you are having issues with the cert itself being verified, you will see an error in the cinder-volume logs similar to the screenshot below. It is easy to determine the issue here: 'certificate verify failed' is self-explanatory - the cert could not be verified for use with the ECOM server. To get around this I would recommend obtaining a new cert from the ECOM server; if that still does not fix the issue, ensure that the ECOM server specified in your XML file is the same one that you are trying to pull the certs from. Once you get the new cert from the ECOM server, either update it in your system-loaded certs or update the path to the cert in cinder.conf, and restart the Cinder services for the change to take effect.
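
One way to pull the certificate currently presented by the ECOM server, so you can inspect it or import/point to it as described above, is with openssl (the host name is a placeholder and 5989 assumes the default ECOM SSL port):

$ openssl s_client -showcerts -connect my_ecom_host:5989 </dev/null | openssl x509 -outform PEM > ca_cert.pem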

 

SSL_wrong_cert.PNG.png

 

When specifying your ECOM host name in the associated XML configuration file of your VMAX back end, it is important to remember that it is the host name that is used here and not the fully qualified domain name (FQDN). To explain a bit better, an FQDN may look like a host name but it is actually a host name and a domain name together:

 

Host name: ecom_openstack

Domain name: openstack.prod.com

FQDN: ecom_openstack.openstack.prod.com

 

If you specify the FQDN in the XML file instead of the host name, you will get an error back stating that the host name supplied does not match the common name in the x509 certificate (example screenshot below). To fix this issue, just remove the domain name part from the XML config file, leaving only the host name of the ECOM server, and as always, restart the Cinder services for the changes to take effect.

 

SSL_fqdn.PNG.png

 


SMI-S/ECOM Server Troubleshooting

When installing Solutions Enabler and the SMI-S/ECOM components, the process is fairly self-explanatory; the prompts for user input only ask if you want to change values from their recommended defaults, and there isn't anything complicated about the process from start to finish.

 

Note: When installing Solutions Enabler & SMI-S, SMI-S is not set to install by default; you must explicitly choose to install this component when installing Solutions Enabler!


The ECOM is usually installed at /opt/emc/ECIM/ECOM/bin on Linux and C:\Program Files\EMC\ECIM\ECOM\bin on Windows. After you install and configure the ECOM, go to that directory and run TestSmiProvider.exe on Windows or ./TestSmiProvider on Linux. Use the dv command in TestSmiProvider to ensure your VMAX arrays are added.
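
For reference, a session on a Linux ECOM host might look roughly like the following; accept the connection defaults when prompted, then issue dv at the interactive prompt (the prompt format and the array listing in the output will vary with your setup):

$ cd /opt/emc/ECIM/ECOM/bin
$ ./TestSmiProvider
(localhost:5988) ? dv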

 

Note: You must discover storage arrays on the SMI-S server before you can use the VMAX drivers. Follow the instructions in the SMI-S release notes. For detailed installation & configuration instructions please see the 'Solutions Enabler 8.3.0 Installation & Configuration Guide' and the 'ECOM Deployment and Configuration Guide'.

 

The ECOM server can become unresponsive or stop operating correctly; in this case it is best just to restart the ECOM server to see if that helps. To restart the ECOM server, use the following command on the ECOM server itself (note: the command below assumes the ECOM server is installed and run from the default install location):

 

$ cd /opt/emc/ECIM/ECOM/bin ; ./ECOM -d -c /opt/emc/ECIM/ECOM/conf

 

If, when checking the cinder-volume logs, you see a reference to ECONNREFUSED, this is typically indicative of the ECOM server being inaccessible. Examples of this error are below; the error may vary from occurrence to occurrence, but the key part of the log remains the same: the error or exception at the bottom referencing ECONNREFUSED. If you face this issue in your environment, restart the ECOM using the command above, then check that your network connections are still working as intended and that there is communication between your controller node & ECOM server; a simple ping request can confirm this. Both screenshots below show different ways the same error can be reported in the Cinder volume logs.
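
As a quick sanity check from the controller node, something like the following can confirm basic reachability (the host name is hypothetical, and 5989 assumes the default ECOM SSL port; use 5988 if you are not using SSL):

$ ping -c 4 my_ecom_host

$ nc -zv my_ecom_host 5989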

 

EcomDown.PNG.png

EcomDown2.PNG.png

 

 

PyWBEM Troubleshooting

PyWBEM is the client which allows the VMAX Cinder drivers to speak to the ECOM server in order to perform system management tasks. It is installed during the configuration of your OpenStack Cinder & VMAX environment, and although there is only one step required for setup it can on occasion produce some problems.

 

How you install PyWBEM varies depending on which version of Python you are using on your OpenStack nodes. If you are using Python 2 in your environment, please install PyWBEM 0.7.0 natively using the appropriate command below:

 

Ubuntu: $ sudo apt-get install python-pywbem=0.7.0

RHEL/CentOS/Fedora: $ sudo yum install pywbem-0.7.0

OpenSUSE: $ sudo zypper install python-pywbem=0.7.0

 

If you are using Python 3, please install PyWBEM versions 0.8.4 or 0.9.0 using pip, or 0.7.0 using native package installation:

 

All (pip): $ sudo pip install pywbem=={0.9.0/0.8.4}

Ubuntu: $ sudo apt-get install python-pywbem=0.7.0

RHEL/CentOS/Fedora: $ sudo yum install pywbem-0.7.0

OpenSUSE: $ sudo zypper install python-pywbem=0.7.0

 

Note: At the time of Ocata's release, PyWBEM 0.9.0 was the most up-to-date version available. Since then, version 0.10.0 has been released; however, this version has not been verified for use with the VMAX Cinder drivers, so we recommend using versions 0.7.0, 0.8.4 or 0.9.0 as outlined above.

 

If you install an incorrect version of PyWBEM for your environment, you will see an error in the cinder-volume logs similar to the one in the screenshot below. PyWBEM is installed but it is not the correct version; as a result the connection is closed without returning any data (the client times out). Although this looks like any other connection error, we know it is related to PyWBEM thanks to the last trace before the error message, which specifically references the PyWBEM package.
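
A quick way to confirm which version is actually installed is to query the package directly; pick the command matching your install method (package names below assume the defaults used earlier in this article):

Pip: $ pip show pywbem

Ubuntu native: $ dpkg -l python-pywbem

RHEL/CentOS/Fedora native: $ rpm -q pywbem

OpenSUSE native: $ zypper info python-pywbem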

 

PyWbemWrong.PNG.png

 

To correct this issue, run the following commands (dependent on your previous installation method):

 

Ubuntu Native: $ sudo apt-get remove --purge -y python-pywbem

RHEL/CentOS/Fedora Native: $ sudo yum remove pywbem

OpenSUSE Native: $ sudo zypper remove --clean-deps python-pywbem

Pip: $ sudo pip uninstall pywbem

 

Reinstall PyWBEM afterwards using the correct installation method outlined in the installation & setup guide of this series of articles.

 

When you install PyWBEM on your system, another package is installed at the same time as a dependency of PyWBEM - M2Crypto. There is no need to get into the specifics of what this package does; what is important in the context of this article is that from time to time this dependency does not install correctly and can cause issues with PyWBEM operations. Issues with M2Crypto manifest themselves in the Cinder volume logs in such a way that it looks like PyWBEM was not installed:

 

PyWbemMissing.PNG.png

 

Fixing this issue requires M2Crypto to be completely removed (purged) from the system and reinstalled through PyWBEM again (when purging M2Crypto from your system PyWBEM will be removed along with it). Depending on how you installed PyWBEM, the method of removal will vary (apt-get remove vs. pip uninstall):

 

 

Ubuntu:

$ sudo apt-get remove --purge -y python-m2crypto
$ sudo pip uninstall pywbem
$ sudo apt-get install python-pywbem

RHEL/CentOS/Fedora:

$ sudo yum remove python-m2crypto
$ sudo pip uninstall pywbem
$ sudo yum install pywbem

OpenSUSE:

$ sudo zypper remove --clean-deps python-m2crypto
$ sudo pip uninstall pywbem
$ sudo zypper install python-pywbem

 

 

Volume Type Troubleshooting

There is the possibility of human error during the creation of volume types in OpenStack, as it is a manual process: each volume type has to be created, then given a volume_backend_name property to tie it to the back end specified in cinder.conf. There are of course more properties you can associate with volume types to provide additional functionality such as QoS, but I am only going to focus on setting up the volume type at its simplest level. For troubleshooting specific functionality (which may involve adding new properties to a given volume type), please see the respective article where I discuss that piece of functionality.
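
For reference, creating a volume type and tying it to a back end looks something like the following; the type and back end names are hypothetical and must match what is in your cinder.conf:

$ openstack volume type create VMAX_ISCSI_DIAMOND

$ openstack volume type set --property volume_backend_name=VMAX_ISCSI_DIAMOND VMAX_ISCSI_DIAMOND

$ openstack volume type show VMAX_ISCSI_DIAMOND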

 

One thing that might happen is a misspelling in either the volume_backend_name key or its associated value. If the value is misspelled or not specified in cinder.conf, when you go to create a volume in OpenStack using that volume type you will get an immediate 'error' status on the volume. Whilst it might not be immediately obvious what has happened, and with no indication in the cinder-volume logs, we can look at the cinder-scheduler logs to see what has gone wrong. From the scheduler logs it is possible to determine first that no weighed back end was found for the volume (no weighed back end means no valid or usable back end), and then from a subsequent error message that no valid back end was found.

 

VolumeType_WrongNoneExistent.PNG.png

 

If we dig into the configuration of the back end in this case, we find that the volume_backend_name used for the volume type is not the same as the volume_backend_name specified in the back end stanza in cinder.conf, hence the 'no valid back end found' error. To fix this problem you have two options: either delete the volume type and create it again with the correct key/value pair, or change the value of volume_backend_name to what is specified in cinder.conf. There is no need to restart Cinder services after either change to the volume type; both take effect immediately. The same error as above will appear in the cinder-scheduler logs if volume_backend_name is not included or misspelled: an error will be thrown notifying you that no valid back end was found.

 

Network Connectivity Troubleshooting

There are two network types supported by the VMAX drivers for Cinder: iSCSI & Fibre Channel. Whilst we recommend going to your storage admin to diagnose any issues with the storage network in your environment, there are a few checks we can do beforehand to determine if there is a connectivity issue or if more troubleshooting is required. As every environment is different, we will only go through some basic checks to test connectivity, plus some additional checks for iSCSI multipathing where it has been configured for use.

 

Note: I am assuming that at this point you have set up your port group for use with OpenStack and completed any other VMAX-related configuration.

 

iSCSI Troubleshooting

Troubleshooting iSCSI environments is not difficult at the host level, as we can use commands such as ping and iscsiadm to determine if we have connectivity and can discover and log in to iSCSI targets on the VMAX. We are going to simulate an error in an iSCSI environment whereby volumes are not accessible via the provided port group. In this scenario I am trying to create a bootable volume, but when it comes to copying the image to the volume there are a number of messages which indicate a problem before we actually see the error/exception.

 

iSCSI_error1.PNG.png

 

There are a number of exceptions thrown afterwards, but the most relevant of these is the first. The first error message lets us know that there is a problem with the iSCSI connector, indicating a connectivity issue between the Cinder controller node and the VMAX.

 

iSCSI_error2.PNG.png

 

With the errors pointing at a connectivity issue, the next place to look is the port groups designated for use by that volume type: are the port groups valid? If so, is the port status of each port in the port group 'ON'? If the ports are marked as on, you can test connectivity to them by using ping commands against the IP interfaces assigned to each iSCSI target port.

 

Note: You can only ping the VMAX iSCSI target ports when there is a valid masking view. An attach operation creates this masking view, but if you are testing iSCSI connectivity before using OpenStack, ensure you have masking views set up in advance of testing.

 

iSCSI_error3.PNG.png

 

Ping commands show that both IP interfaces on the VMAX are accessible via the storage network from the Cinder controller node, so we can deduce at this point that it is not a wider network problem; if it were, it is very likely that the ping commands would fail here and no response would be returned to the controller. With network connectivity confirmed, the next step is to test connectivity to the iSCSI targets using iscsiadm commands. The first command to test is iscsiadm discovery, whereby we check whether the target is accessible over the network.
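
A discovery attempt looks like the following, where the address is the iSCSI IP interface on the VMAX (the IP below is a placeholder):

$ sudo iscsiadm -m discovery -t sendtargets -p 10.10.10.10:3260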

 

iSCSI_error6.PNG.png

 

With the iSCSI target inaccessible via the IP interface, we are starting to narrow in on our issue. With the ports marked as 'on' in Unisphere, we know it isn't an issue there, so we can check the iSCSI interfaces themselves through the iSCSI dashboard. A quick check reveals that although the ports are up, as seen in the port group, the associated iSCSI targets are not attached to an IP interface. Once these targets are attached to an IP interface, we can run the same iscsiadm commands again to test connectivity.

 

iSCSI_error7.PNG.png

 

Successful iscsiadm discovery commands will return the IP address, port, and IQN of the iSCSI target (screenshot above). We can take this one step further to see whether it is possible to log in to the iSCSI target, confirming the functionality needed for our VMAX/OpenStack attach operations.
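
The login test uses the IQN and portal returned by the discovery command; both values below are placeholders:

$ sudo iscsiadm -m node -T iqn.1992-04.com.emc:example-target -p 10.10.10.10:3260 --login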

 

iSCSI_error8.PNG.png

 

When we return to OpenStack now to attempt another bootable volume creation, there are no issues this time with volume connectivity or with launching an instance and attaching the bootable volume to it.

 

There are other errors which may be presented when running the iscsiadm commands, an example being where the IP interfaces are up but the ports behind them are offline (found after checking the port status in the associated port group).

 

iSCSI_error4.PNG.png

 

The important thing to remember when troubleshooting iSCSI connections is the point at which the testing steps outlined here fail; that usually points towards the underlying issue:

  1. Are the ports in the port group online? If not, enable them and try the operation again.
  2. Are the IP interfaces accessible via ping commands from the controller node? If not, check the IP interfaces in the iSCSI dashboard in Unisphere, create/enable them if necessary, and try the operation again.
  3. Is it possible to discover the iSCSI targets behind the IP interface using iscsiadm discovery commands? Is the target IQN returned? If not, attach the iSCSI target to the IP interface and try the operation again.
  4. Is it possible to log in to the iSCSI target? Do you get a successful login notification back? If not, check the iSCSI setup with your storage admin; if the IQN is discoverable there might be restrictions on logging in.

 

iSCSI Multipath Troubleshooting

When troubleshooting iSCSI multipath, the process is similar to troubleshooting standard iSCSI connections; the only difference is that instead of just checking connectivity between hosts in your environment, there are some additional configuration checks required. Setting up multipath requires a number of packages to be installed in the environment to support the functionality, along with extra flags to be set on the Cinder & Nova nodes and a VMAX-specific multipath configuration file on each Nova node. I will go over the configuration checks in this section; troubleshooting the connections is exactly the same as troubleshooting standard iSCSI connections.

 

The first step in the process of troubleshooting iSCSI multipath is to ensure that all the required packages are installed. Checking the packages varies from distro to distro, but to display just the minimal required info use the commands below:

 

Ubuntu: $ sudo dpkg-query -W package_name

RedHat/SLES: $ sudo rpm -q --info package_name

mpio_1.PNG.png


If the package is shown in the output from the command then it is installed successfully on the node; if you get a 'package not found' error then there is a problem with the package installation or it is not installed. Reinstall the missing package and attempt the iSCSI multipath operation again. Also, each of the packages listed as required for iSCSI multipath needs to be installed on every node in your environment, so it is imperative that you check each node; having one node without the packages will result in failures in multipath operations.
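
As an illustration on Ubuntu, a check for the multipath-related packages might look like the following; the package names shown are the commonly required ones, so cross-check the full list against the installation & setup guide for this series:

$ sudo dpkg-query -W multipath-tools sg3-utils sysfsutils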

 

If each node has all of the required packages installed, the next step is to check the /etc/multipath.conf file, which contains VMAX-specific configuration information for multipath functionality. The contents of the multipath file are detailed in the installation & setup guide from the start of this series; check this file to make sure it matches what is in the guide. The multipath.conf file needs to be present on all Nova compute nodes in your environment; having one node without the configuration file will result in failures in multipath operations.

 

In addition to the multipath configuration file being required on all Nova compute nodes, there are extra flags which must be set on all Cinder and Nova nodes. On all Nova compute nodes, add the following flag in the [libvirt] section of /etc/nova/nova.conf:

 

iscsi_use_multipath = True

 

On all Cinder controller nodes, set the multipath flag to true in the [DEFAULT] section of /etc/cinder/cinder.conf:

 

use_multipath_for_image_xfer = True

 

That is it in terms of required setup for multipath. At the end of all this you should have:

  1. The required packages installed on all nodes
  2. The VMAX multipath.conf file on all Nova compute nodes
  3. The iscsi_use_multipath = True flag set in the [libvirt] section of /etc/nova/nova.conf on all Nova compute nodes
  4. The use_multipath_for_image_xfer = True flag set in cinder.conf on all Cinder controller nodes

 

Once all the required steps are complete and you have checked packages and node configurations, restart the iSCSI & OpenStack services to ensure the changes are being propagated through the environment:

 

Ubuntu:

$ service open-iscsi restart
$ service multipath-tools restart
$ service nova-compute restart
$ service cinder-volume restart

RHEL/CentOS/SLES/openSUSE:

$ systemctl restart open-iscsi
$ systemctl restart multipath-tools
$ systemctl restart nova-compute
$ systemctl restart cinder-volume

 

Once all of the configuration checks are complete and services are restarted, try to perform an operation in OpenStack in which multipath is exercised with VMAX as the storage back end. If the operation is not successful and you are still getting errors back, work through the iSCSI troubleshooting section above, pinging your iSCSI IP interfaces and running the iscsiadm discovery and login commands. If you find that some paths work but others do not, then you need to investigate those individual paths to see why they are unusable.
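
To see the state of each path on a compute node, multipath -ll lists every multipath device along with the status of its individual paths, which helps to identify the failing ones:

$ sudo multipath -ll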

 

Fibre Channel (FC) SAN Troubleshooting

FC SAN troubleshooting is more complicated than troubleshooting iSCSI environments; much of the setup and configuration is done by the SAN admin and is thus out of the scope of this guide, as each environment is inherently different from the next. If Zone Manager is used to manage your fabric, the official OpenStack Ocata documentation on Zone Manager might provide some useful information. Apart from that, all we can check is that the FC ports are up on the VMAX and that the HBAs are up on the host and logged in to the fabric. You can find detailed information about the FC HBAs in the folder /sys/class/fc_host/:

 

fc_1.PNG.png

 

The directories host2 and host4 in the example above contain information specific to each adapter, such as node name (WWN), port name (WWN), type, speed, state, etc. Using the directory host names we can find detailed information about the HBAs using the systool command:

 

$  systool -c fc_host -v host2

fc_3.png

 

The most important parts of the output from the systool command are 'port_state' and 'fabric_name'. The port state indicates whether the HBA is offline or online, and a value in the fabric name indicates the HBA is logged in to a SAN fabric. If the port state is offline or there is no fabric name, you need to get your SAN admin to take a closer look to determine why.
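
If systool is not available on the host, the same details can be read directly from sysfs; the host number and output values below are examples only:

$ cat /sys/class/fc_host/host2/port_state
Online

$ cat /sys/class/fc_host/host2/fabric_name
0x10000005331159ef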

 

Miscellaneous Issues

Oslo rpc_response_timeout

OpenStack Oslo uses an open standard for messaging middleware known as AMQP. This messaging middleware (the RPC messaging system) enables the OpenStack services that run on multiple servers to talk to each other.

 

By default, the RPC messaging client is set to time out after 60 seconds, meaning that if any operation you perform takes longer than 60 seconds to complete, the operation will time out and fail.

 

rpc_timeout.PNG.png

 

Changing this default is very straightforward in OpenStack: you only need to change the rpc_response_timeout flag value in cinder.conf and nova.conf on all Cinder and Nova nodes, then restart the services for the increased timeout value to take effect.

 

What to change this value to will depend entirely on your own environment; you might only need to increase it slightly, or if your environment is under heavy network load it could need a bit more time than normal. Fine-tuning is required here: change the value and run intensive operations to determine if your timeout value matches your environment's requirements.
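
The change itself is a single line in the [DEFAULT] section of cinder.conf and nova.conf; the value of 300 seconds below is just an example starting point:

[DEFAULT]
rpc_response_timeout = 300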

 

Nova Block Device Allocation

Another operation in OpenStack with a default timeout is block device allocation in Nova. Similar to rpc_response_timeout, when an operation using block device allocation exceeds the default timeout of 60 seconds it will fail. As we are working with block storage on VMAX, this timeout may be exceeded if your environment is under heavy load or if the block device being allocated is bigger than your normal block device. This error will appear with the message 'block device mapping invalid'; looking in the Nova compute logs will provide more insight into what is going on. We can see from the screenshot below that there was an issue waiting for block device allocation to complete (note: I changed these values to force this error, so it is not normal to see a failure after waiting 2 seconds and 2 attempts).

 

block_mapping1.PNG.png

 

To increase the default block device allocation timeouts in Nova, change the values of the following flags in nova.conf on all Nova nodes and restart the Nova services afterwards for the changes to take effect.

 

block_device_allocate_retries - Number of times to retry block device allocation on failures

block_device_allocate_retries_interval - Time interval (in seconds) between consecutive retries

block_device_creation_timeout - Time in seconds to wait for a block device to be created
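
These flags live in the [DEFAULT] section of nova.conf; the values below are examples only and should be tuned for your environment:

[DEFAULT]
block_device_allocate_retries = 120
block_device_allocate_retries_interval = 3
block_device_creation_timeout = 300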

 

And it's a wrap!

I would like to thank you for joining me for this series I have put together over the last few weeks. I hope I have covered everything there is in the VMAX & OpenStack ecosphere, but if there is anything you believe I have missed then let me know in the comments below or via private message and I will see what I can do! Next time around it won't be Ocata I am looking at, but the next release up in the OpenStack cycle: Pike! There are lots, and I mean lots, of changes coming in the next release, each of which I am really excited to write about and share with the world, so expect more as the time grows closer! Until then, good day!