SP3 is an exciting release, especially for those of us who are in a rush to get NetWorker 9 updates. Why is that? Because SP3 brings many changes too, and all of them - at least on paper - look nice and rather welcome. You know very well that things are changing when you read the following in the official docs:
Changes to mdb? Oh yeah baby. The very same change seen in NW9 is now pushed down to NW8. And the excitement doesn't stop there, as there is more to come. Let's quickly explore what's new and changed:
- As seen above, we have a new mdb. The new mdb is no longer based on the WiSS database - it now uses SQLite. The nsrmmdbd process handles the migration from WiSS to SQLite automatically during NetWorker startup after the upgrade (3 stages). The database migration does not require any user intervention. If the database migration does not complete, the nsrmmdbd process notifies you that the migration has failed, and the legacy WiSS database continues to run and process jobs until the migration is successful. EMC recommends that you stop external applications that are linked to the NetWorker media database during the migration, for example, DPA or DPA search activity. I briefly touched on this subject in Part II: What is NetWorker 9? What are benefits? Check this out:
Object caching - A targeted cache facility that operates independently of file system or database caching to maintain recently used objects in memory for subsequent requests.
Parallel request processing (or simply multi-threading) - The database in previous versions of NetWorker was single-threaded, which means it could handle only one database request at a time. As a result, long transactions could delay the performance of any other transactions (for example, an operational request would have been queued until the bootstrap backup was completed). With multi-threading functionality, the database can handle multiple requests in parallel, so that operational requests are picked up and handled immediately without requiring completion of the bootstrap backup.
Request handling - Any request that takes multiple seconds gets logged. You can choose to log all requests by setting dbgcommand to level 1 or higher in debug mode.
Bootstrap compatibility - The mechanism being used for SQLite is the same as the WiSS database mechanism. If a problem occurs after upgrading, you can use a bootstrap taken from the previous system and recover that information directly into the SQLite database. If you must temporarily roll back NetWorker but have already performed backups using NetWorker 8.2 SP3 that you want to keep, you can perform a bootstrap backup and recover the media database into the previous version, and that data is recovered into the WiSS database.
- NetWorker 8.2 SP3 features enhancements for clone controlled replication (CCR). These improvements include an auto multi streaming (AMS) feature to improve the DD to DD replication performance. Also included are changes to load balancing so that the load (save sets to clone) is spread evenly across the multi-threaded nsrclone process.
- NetWorker 8.2 SP3 features improvements to the NetWorker server. The responsibilities of the NetWorker server include serving RAP queries, updates to the resource database, scheduling save groups, and acting as a broker for device allocation. NetWorker 8.2 SP3 includes several enhancements to improve server throughput with a special focus on eliminating server unresponsiveness. These improvements include the following:
- Resolved blocking issues when communicating with unresponsive storage nodes.
- Improved responsiveness when a storage node fails.
- Internal DNS caches used by NetWorker daemons are now automatically populated by the nsrd process.
- Capability to start nsrsnmds in a rolling startup to prevent an unmanageable load on the nsrd process when many storage nodes are configured.
- User rights obtained during nsrauth queries are now cached. I'm not 100% sure here, but I suspect this means that previously every request had to be rechecked, which was kind of annoying and most likely brought some performance hit. Now this is cached, so it should be way faster and without impact on the rest of ops.
- nsrwatch went through plastic surgery. Actually, it went through an intensive bodybuilding process too:
- nsrwatch command line monitoring tool is updated to match much of the functionality in the Monitoring window in NMC
- refined and more flexible window presentation, including multiple window views
- move windows, and navigation hot keys
- basic device manipulation - label, mount, and unmount
- savegroup control - start, stop, and real-time group process
- New OVA files. You can recognize those as being version 22.214.171.124 (0.5TB, 4TB, EXT4 external proxy and upgrading ISO). Note that if you run 126.96.36.199 (only that version) you must get a patch before you upgrade.
- nsrrpcinfo -p output went through changes. I'm not sure why this is announced as something new in SP3, as I have already seen it in 188.8.131.52 (I didn't test previous SP2 releases). Here is an example from one server (Linux based) running a recent SP2 version:
- Mac OS known as El Capitan is now supported by NW8.
- Wait! There is more for Mac OS folks. You can use DDBoost now too.
- LTO7 is now supported as well (its native capacity is not set correctly, but the impact of that is pretty much nonexistent apart from some messages which 99% of people ignore).
- NetWorker's DNS cache time to live (TTL) values are now modifiable using the following NSRLA resource attributes (assuming referring to Negative cache):
- Positive DNS cache TTL: 1800;
- Negative DNS cache TTL: 1800;
- Another DNS related improvement: reverse name resolution is now needed less frequently, which improves performance.
- When restoring a block based backup (BBB) volume from the command line (CLI), NetWorker now prompts for confirmation before starting a destructive recovery. Obviously someone might have burned their fingers here. It is always good to have confirmation when doing destructive things.
- In NetWorker 8.2 SP3, the libDDBoost library is updated from version 184.108.40.206 to 220.127.116.11. What does that mean for the DDOS version you use (if using DDBoost)? Here is the hint (so the minimum DDOS version here should be 5.5):
- In NetWorker 8.2 SP3, two parameters are added to nsrclone to manage concurrency during Data Domain to Data Domain replication with CCR: max_total_dd_streams and max_concurrent_groups. Both parameters are set internally in NetWorker nsrclone via nsrcloneconfig (if you are not familiar with that config file, you can read Concurrent cloning and DDOS 5.4.x first). Here is how it works. The max_total_dd_streams parameter specifies the maximum number of Data Domain replication streams that each nsrclone process can use. When running multiple nsrclone processes in parallel, you can set this parameter to restrict each process to a lower number of Data Domain replication streams. Use the following formula as a guideline to determine the value: max_total_dd_streams = total DD source replication streams / number of concurrently scheduled nsrclone processes. The maximum, and default, value for this parameter is 256. You can change the value from the default setting by modifying the parameter in /nsr/debug/nsrcloneconfig (e.g., if the DD supports 240 replication streams, you can restrict nsrclone to a maximum of 240 replication streams via max_total_dd_streams=240). The max_concurrent_groups parameter specifies the maximum number of groups into which the total save sets are divided. Based on the number of save sets, all groups are equally balanced. This parameter can be used to increase the number of parallel nsrclone threads per nsrclone process. The load (total number of save sets to clone) is spread equally across the nsrclone threads for load balancing. This ensures that all threads started by nsrclone complete their actions at approximately the same time. The default value for this parameter is 16.
- NetApp folks will love this; the following support is added for NetApp SnapVault and NetApp SnapMirror operations to replicate data snapshots in C-mode:
- Support covers snapshot creation, snapshot replication, snapshot restore, and rollover of snapshots to tape or VTL storage.
- SnapVault takes a full PiT snapshot and stores only new and changed data.
- The source and destination volumes (qtrees) must exist before you configure a replication. The volumes may be on the same or separate NetApp devices on the same or separate servers, which may be NetApp virtual machines, and both volumes must be in an online state.
- The source volume must be read-writable and the destination volume must be a data protection type.
- You must configure SnapVault and SnapMirror replication policies on the NetApp devices.
- Validate the replication configuration by performing an initial replication operation. This makes the replication policies available as selections in the NetWorker client wizard.
- If the snapshot destination volume is written to, the destination becomes the new source volume and the relationship switches.
- To enable data recovery from NetApp replications, you must specify the NSR_MOUNTPOINT_NAME variable. NetApp devices on Linux operating systems cannot use a temporary mount point.
- In-place recoveries and out-of-place recoveries of an entire directory or save set are not supported. You may recover one or more files, provided the files do not make up the entire contents of a directory.
- A snapshot must be restored to a unique destination volume that is not used for other SnapMirror operations. If the same snapshot exists on the destination volume.
- Isilon FAST lovers got attention too. NetWorker 8.2 SP3 adds support for using the Isilon FAST incrementals feature.
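To make the nsrclone sizing guideline from the CCR item above a bit more concrete, here is a minimal Python sketch. It only mirrors the formula and the defaults quoted from the release notes (256 streams, 16 groups). The function names are made up for illustration, and the round-robin split is an assumption about how "equally balanced" groups could look, not a description of what nsrclone does internally.

```python
# Sketch only: mirrors the guideline formula quoted above.
# Function names and the round-robin strategy are illustrative assumptions.

MAX_TOTAL_DD_STREAMS_LIMIT = 256  # maximum (and default) value per the text
DEFAULT_MAX_CONCURRENT_GROUPS = 16  # default value per the text


def max_total_dd_streams(dd_source_streams: int, concurrent_nsrclone: int) -> int:
    """Streams each nsrclone process may use, capped at the documented maximum."""
    if dd_source_streams < 1 or concurrent_nsrclone < 1:
        raise ValueError("both arguments must be positive")
    per_process = dd_source_streams // concurrent_nsrclone
    return max(1, min(per_process, MAX_TOTAL_DD_STREAMS_LIMIT))


def balance_save_sets(save_sets, max_concurrent_groups=DEFAULT_MAX_CONCURRENT_GROUPS):
    """Spread save sets round-robin over at most max_concurrent_groups groups."""
    groups = min(max_concurrent_groups, len(save_sets))
    buckets = [[] for _ in range(groups)]
    for index, save_set in enumerate(save_sets):
        buckets[index % groups].append(save_set)
    return buckets


# A DD with 240 source replication streams shared by 4 scheduled nsrclone runs.
value = max_total_dd_streams(240, 4)
print(f"max_total_dd_streams={value}")  # max_total_dd_streams=60
```

The printed line has the same shape as the max_total_dd_streams=240 example from the text, so it could be compared against what is currently in /nsr/debug/nsrcloneconfig.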
I won't lie - there are a couple of features here that almost made me upgrade as soon as I read about them: mdb and server performance improvements in first place, replication enhancements second, and so on. But my policy so far has been never to apply a release until it is within its third or fourth release. What does that mean? Apart from patches for NetWorker (that would be Z in W.X.Y.Z, and you can find an overview in NetWorker release management), NetWorker engineering also releases service packs (that is, Y in W.X.Y.Z) like this one. While patches are released on ftp, service packs are released on the support site and are downloadable from the web or ftp. Also, their patch list is contained within the release notes. In this case, 8.2 SP3 translates to 18.104.22.168, and the key question every serious backup admin may ask is whether this is the right version for them. For example, 22.214.171.124 was recently released at almost the same time; which is better - 126.96.36.199 or 188.8.131.52? To get an idea, we need to compare the fixes inside already released versions, and you must make up your own mind. Obviously, a new SP is built at a certain point in time and most likely does not contain all the patches that 184.108.40.206 might, but it may also contain new features, RFEs or fixes you might be interested in.
Here is the list of 220.127.116.11 patches, along with the other releases in which each fix has also been seen.
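As a small illustration of the W.X.Y.Z scheme described above, here is a hedged Python sketch that splits a version string into its four parts. The class and function names are invented for this example, the version strings are arbitrary placeholders rather than real releases, and only the meaning of Y (service pack) and Z (patch) comes from the text.

```python
from typing import NamedTuple


class NwVersion(NamedTuple):
    """Four-part version; only Y and Z have a meaning stated in the text."""
    w: int
    x: int
    service_pack: int  # Y in W.X.Y.Z
    patch: int         # Z in W.X.Y.Z


def parse_version(text: str) -> NwVersion:
    """Parse 'W.X.Y.Z' into its numeric parts, rejecting anything else."""
    parts = text.strip().split(".")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected W.X.Y.Z, got {text!r}")
    return NwVersion(*(int(p) for p in parts))


# Placeholder versions, chosen only to show the ordering.
older_sp_patched = parse_version("8.2.1.6")
newer_sp_fresh = parse_version("8.2.2.1")
print(newer_sp_fresh > older_sp_patched)  # True
```

Tuple ordering only says which version number is higher. It says nothing about whether the newer service pack contains every fix from a heavily patched older one, which is exactly why the fix lists have to be compared.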
|ID||Description||Fixed also in|
|240584||Failure of block based backups renamed recoveries of a single file within a cluster volume.||18.104.22.168, 22.214.171.124|
|239549||After upgrade to 8.2.x from 8.0.1, nsrvim binary runs on NetWorker server instead of hypervisor.||126.96.36.199, 188.8.131.52|
|239529||Storage node nsrsnmd database keeps re-starting on Windows storage nodes.||184.108.40.206, 220.127.116.11|
|239310||Unknown operation messages logged in daemon.raw log following nsrim -X running.||NA|
|237895||Direct-NDMP clone job from Windows 2012 R2 NetWorker server fails if NDMP save set spans multiple tapes.||18.104.22.168|
|236308||Restarted VADP group notification marks group as failed for completed group.||22.214.171.124|
|235186||Fixed issue of NetWorker client 8.1 failures on Linux.||126.96.36.199, 188.8.131.52|
|235162||After migrating from Solaris to Linux and upgrading from 184.108.40.206 to 220.127.116.11 nsrd cored.||18.104.22.168|
|242238||Synthetic full backup failed due to verification failure.||22.214.171.124|
|242040||Avoid nsrexecd unresponsiveness by performing DNS lookups outside of lock protection in liblocal/is_mynam.c.||NA|
|241542||NetWorker does not show block based backups in NMC Recover window.||126.96.36.199|
|241149||Percentage completion for successfully Cloned save sets is showing less than 100%.||188.8.131.52, 184.108.40.206|
|239567||When using the group clone option "clone on each save set completion" and nothing was backed up, there was a clone failure notification triggered.||220.127.116.11, 18.104.22.168|
|237810||Windows 2008 R2 NetWorker server daemon.raw not updated timely affecting automated runtime rollover and rendering.||22.214.171.124, 126.96.36.199|
|239369||nsradmin fails in visual mode when tried to create an NSR device resource.||188.8.131.52|
|243281||xdrfr_destroy() does not NULL out pointers after freeing memory.||NA|
|243582||gstclreport does not honor "-C Client name" switch in the Data Domain Statistics report.||184.108.40.206, 220.127.116.11|
|242746||Since upgrade to 18.104.22.168, every GSS failure causes an increased number of message 200k lines generated for 32 clients with certificate issues.||NA|
|242705||nsrd coredumps due to "assert" in nsrsvc_run().||22.214.171.124|
|237378||nsrexecd killing nsrdasv by SIGKILL causes Domino server failure.||126.96.36.199, 188.8.131.52|
|237019||Semaphore timeouts result in core dumps owing to data being freed while still in use by another thread.||184.108.40.206, 8.2.2.4, 220.127.116.11|
|230290||jobquery group detail output is incorrect and results in bad data in NMC.||18.104.22.168, 22.214.171.124|
|244929||Backup Overview in NMC Recover window is unable to change to a monthly schedule on Japanese environment.||126.96.36.199, 188.8.131.52|
|243716||After an upgrade to NetWorker 184.108.40.206, NMC no longer shows the Clone Start Time under Show Clone History.||220.127.116.11, 18.104.22.168|
|243027||Granular level restore operation selecting multiple items fails with error, "Object reference not set to an instance of an object."||22.214.171.124|
|242999||nsrjobd emits excessive error messages of the following: parent job not found.||126.96.36.199, 188.8.131.52|
|242410||The size of NDMP save set spanning multiple tapes reported under successfully completed section under Group details in NMC Monitoring tab is incorrect.||184.108.40.206, 220.127.116.11|
|244864||Recover wizard stalls at 5% progress when recovering from a clone volume.||18.104.22.168|
|246567||After upgrade to NetWorker 8.2.x, nsrsnmd is unstable on more storage nodes on backup server when trying to start a node and start NetWorker.||NA|
|241053||NetWorker does not move to the next file when encounters an I/O error reading a file.||22.214.171.124, 126.96.36.199, 188.8.131.52|
|239893||Performance enhancement for NetWorker CCR a.k.a DD-DD Managed File Replication.||NA|
|247305||BBB recovers from NMC fails with the following error: "unable to connect to Networker server."||184.108.40.206, 220.127.116.11|
|247366||NMC in French reports successful clone status as Echec (Failed).||NA|
|246925||NetWorker server stops responding, after a Storage node does not respond on time.||NA|
|247484||NDMP backups failed when save set names contained special characters on Windows.||18.104.22.168, 22.214.171.124, 126.96.36.199|
|232803||The NMC Software Administration wizard stops responding at 33%.||188.8.131.52, 184.108.40.206, 220.127.116.11|
|249237||NMC database backup fails after update to version 18.104.22.168 savepsm: FAILED HY000 -697 Permission denied.||22.214.171.124, 126.96.36.199|
|248858||gstclreport returns null followed by usage when -C Status is used.||188.8.131.52, 184.108.40.206|
|247419||Snapshot discovery fails with Path not found message.||NA|
|246787||Fixed typo in man pages for nsrpush: nsr_push.||NA|
|246661||To avoid confusion, the Next button is now disabled in the Select the Recovery hosts page in the Recover window if a file system does not have any backup files.||NA|
|245187||Cloning from a DD to an LTO volume becomes unresponsive with the following error: Media Notice: Volume `xxx' ineligible for this operation; Need a different volume from pool `xxx.'||220.127.116.11, 18.104.22.168|
|250070||Clients with the 'Data Domain Device' option selected fail to clone to tape media, since this value is applied to the clone-write (re-save), with the following error message: 'no matching IP interface data domain devices for save of client.'||NA|
|250709||NetWorker version 22.214.171.124 Build 815 became unresponsive very often.||NA|
|251485||When you update the NetWorker server to version 126.96.36.199 from version 8.1.x.x, after selecting the Restore option in NMC, a blank gray window appears with an OK button only.||NA|
|247691||VADP restores fail for virtual machines with snapshots at the time of backup in VC/ESX version 6.0.||188.8.131.52|
|251822||nsrsnmd startup caused high load on server and failed with 100's of storage nodes.||NA|
|249664||When you type the command nsradmin -p nsrexec -C -y "NSR peer information", all peer errors were not corrected.||184.108.40.206|
|250028||Incorrect message received when tape drives become full.||NA|
It is hard to make a call based on the above - especially since this is not the full list. If you know the bug ID of an issue you hit in the past, you may wish to check it with support before proceeding with the upgrade to SP3.
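If you want to slice the table above yourself, a rough Python sketch like the following may help. It assumes the rows keep the |ID||Description||Fixed also in| shape used in this post and that NA means the fix was not seen in another release; the function names are made up, and the three sample rows are copied as-is from the table.

```python
def parse_fix_row(line):
    """Split one '|ID||Description||Fixed also in|' row into its three cells."""
    cells = line.strip().strip("|").split("||")
    if len(cells) != 3:
        raise ValueError(f"unexpected row shape: {line!r}")
    bug_id, description, fixed_in = (cell.strip() for cell in cells)
    # Assumption: 'NA' marks a fix that was not seen in any other release.
    versions = [] if fixed_in == "NA" else [v.strip() for v in fixed_in.split(",")]
    return bug_id, description, versions


def only_in_this_release(rows):
    """Bug IDs whose 'Fixed also in' column lists no other release."""
    return [bug_id for bug_id, _, versions in map(parse_fix_row, rows) if not versions]


sample_rows = [
    "|239310||Unknown operation messages logged in daemon.raw log following nsrim -X running.||NA|",
    "|244864||Recover wizard stalls at 5% progress when recovering from a clone volume.||18.104.22.168|",
    "|247366||NMC in French reports successful clone status as Echec (Failed).||NA|",
]

print(only_in_this_release(sample_rows))  # ['239310', '247366']
```

The same parser could be flipped around to list fixes that also exist in a release you already run, which is the more useful view when deciding whether SP3 gives you anything new.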