NetWorker Exchange failure – The group or resource is not in the correct state to perform the requested operation.

As with any Windows system backup, we sadly have to deal with VSS. VSS is the bane of every backup administrator's existence. Add in Exchange, and you have another layer of services that require the correct orchestration to ensure protection.

In today’s post, we revisit our old friend VSS and provide some additional insight into NetWorker and the associated services that help coordinate backups.

I had some new Exchange systems to add to the backup rotation. They were running Exchange 2013 on W2k12.

We started to see the following errors:

NetWorker error messages:
APPLICATIONS:\Microsoft Exchange 2013: Backup of [APPLICATIONS:\Microsoft Exchange 2013] failed
rsnap_vss_save:NMM .. error caught calling RM to commit the replica.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 024168 ERROR:The VSS shadow copy for Exchange Replication Service was not successful because of an error that occurred earlier.
83394:nsrsnap_vss_save:NMM .. cluster failover or application role change after a replica created may have caused snapshot creation failure, try to restart the backup.
Microsoft DiskPart version 6.3.9600
Copyright (C) 1999-2013 Microsoft Corporation.
On computer: exchange-node
Automatic mounting of new volumes enabled.
37959:nsrsnap_vss_save:The group or resource is not in the correct state to perform the requested operation.
63335:nsrsnap_vss_save:NMM backup failed to complete successfully.
Internal error.
102333 1447959921 3 0 0 15084 14984 0 server nsrsnap_vss_save NSR error 21 Exiting with failure. 0
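When a log repeats the same writer error, as the NMM output above does three times, it can help to reduce it to the distinct VSS errors before digging into the event log. The following is a minimal Python sketch, not an EMC tool: the function name summarize_vss_errors is hypothetical, and the regular expression is written against the exact wording of the lines above, so other NMM versions may phrase the message differently.

```python
import re

# Matches "The error is <NAME>. The code is: <0x...>." as printed by
# nsrsnap_vss_save in the log above (wording assumed stable).
VSS_ERROR_RE = re.compile(r"The error is (\w+)\. The code is: (0x[0-9a-fA-F]+)")


def summarize_vss_errors(log_text: str) -> dict[str, str]:
    """Map each distinct VSS error name to its hex code, ignoring repeats."""
    found: dict[str, str] = {}
    for name, code in VSS_ERROR_RE.findall(log_text):
        found.setdefault(name, code.lower())
    return found


if __name__ == "__main__":
    line = (
        "49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service "
        "is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: "
        "0x800423f3. Check the application event log for more information.\n"
    )
    print(summarize_vss_errors(line * 3))
```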

Corrective actions taken:
Rebooted the Exchange servers
Removed the NetWorker client and NMM

We also saw the following in Event Viewer:
[RpcHttp] Marking ClientAccess 2010 server server(https://server/rpc/rpcproxy.dll) as unhealthy due to exception: System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
[Autodiscover] Marking ClientAccess 2010 server server (https://server) as unhealthy due to exception: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.Exchange.HttpProxy.ProtocolPingStrategyBase.Ping(Uri url)

The description for Event ID 5068 from source Replication API Service cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

2015 11 19 00:00:01

The specified resource type cannot be found in the image file

PS: (CPSImportService::GetOperationStatus) ERR: Error for operation 1
NMM .. Error during import of snapshot: PS .. Error completing snapshot import
NMM .. Registration of snapshot set failed — PS .. Error completing snapshot import
Microsoft Exchange VSS Writer backup failed. No log files were truncated. Instance d93e725e-486d-440f-8397-93061460ee02. Database 8bee9b3c-8fff-405e-a42d-b739d05caadd.
The Microsoft Exchange Replication service VSS Writer (Instance d93e725e-486d-440f-8397-93061460ee02) failed with error FFFFFFFC when processing the backup completion event.
An internal transport certificate will expire soon. Thumbprint:EE063967EB62D17668AD392286EFF96622F84BFC, hours remaining: 1594

What did we do?

We rebooted the system and attempted to restart some of the associated VSS services. No joy. I opened a ticket with EMC, and Anthony gave me a call back. I had worked with him before, so I was glad to hear from him.

On your Exchange server, run the following commands and send back their output:


  1. vssadmin list writers
  2. vssadmin list shadows
  3. If the above command lists any shadow copies, perform the following steps:
  4. Type diskshadow
  5. Type list shadows all
  6. Type delete shadows all
  7. Stop all NetWorker services on the client and verify that all of them are stopped except nsrpm.
  8. While the NetWorker services are stopped, restart the Replication Manager RMAgentPS service and make sure the other Replication Manager service (Replication Manager Exchange Interface) is stopped as well.
  9. Restart the NetWorker services.
  10. At a command prompt, type tasklist | findstr nsr to confirm that the NetWorker processes are running again.
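Steps 1 and 2 produce plain-text reports that are easy to eyeball, but on a busy server a small parser helps spot a writer that is missing or not stable. The sketch below is hypothetical Python that parses saved vssadmin list writers output. It assumes the usual "Writer name / State / Last error" layout, and it assumes the Exchange writer is named "Microsoft Exchange Writer", which you should confirm against your own output.

```python
import re
from dataclasses import dataclass


@dataclass
class Writer:
    name: str
    state: str = ""
    last_error: str = ""


def parse_writers(output: str) -> list[Writer]:
    """Turn saved 'vssadmin list writers' text into one Writer per block."""
    writers: list[Writer] = []
    for raw in output.splitlines():
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if key == "Writer name":
            writers.append(Writer(name=value.strip("'")))
        elif key == "State" and writers:
            # "[1] Stable" -> "Stable"
            writers[-1].state = re.sub(r"^\[\d+\]\s*", "", value)
        elif key == "Last error" and writers:
            writers[-1].last_error = value
    return writers


def find_problems(
    writers: list[Writer], required: tuple[str, ...] = ("Microsoft Exchange Writer",)
) -> list[str]:
    """List missing required writers and writers not 'Stable' / 'No error'."""
    names = {w.name for w in writers}
    issues = [f"missing writer: {name}" for name in required if name not in names]
    for w in writers:
        if w.state != "Stable" or w.last_error != "No error":
            issues.append(f"{w.name}: state={w.state}, last error={w.last_error}")
    return issues
```

A writer that reports "Stable" but a non-empty last error (as the Exchange writer did here with a retryable error) is still flagged, which matches the "Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE" message above.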

We retried the backup and it worked.

Here is a little insight into some of these services and the roles they play. We are already familiar with nsrexecd and PowerSnap. We also had to restart the RMAgentPS service, which executes operations for the Replication Manager client. In addition, we had to stop the Replication Manager Exchange Interface service, which executes Exchange commands for Replication Manager. In the end, there were some stale shadow copies, and we needed to restart the required services in the correct order. Of course, also remember to make sure the Exchange VSS writer is present.






Troubleshooting Conflicting NSR Peer Information Errors

Recently, our server team migrated some data to a new server, keeping the same name and IP for the client.

After reinstalling the NetWorker client, I ran a test backup and noticed the following error filling up the log:

Error: Conflicting NSR peer information resources detected for host. Please see server log for more information.

This is pretty easy to fix; a quick Google search for this error will turn up the solution. But what does it mean, and why is this error being produced?

The following is from the man page:

The NSR peer information resource is used by NetWorker authentication daemon nsrexecd. Resources of this type are populated/created by NetWorker. They are used to hold the identity and certificate of remote NetWorker installations that the local installation communicated with in the past. These resources are similar to known_hosts file used by ssh(1). Once a NetWorker installation (client, server, or storage node) communicates with a remote NetWorker install (client, server, or storage node), a NSR peer information resource will be created on each host and will contain information about the peer (i.e. identity and certificate). During this initial communication, each host will send information about itself to the peer. This information includes the NW instance name, NW instance ID, and the certificate. After this initial communication, each NetWorker install will use the registered peer certificate to validate future communications with that peer.

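So it goes without saying that when a system is rebuilt, or a new system is built with a previously used name, its certificate changes. The peer information that other NetWorker hosts recorded for that name no longer matches, which produces the conflict error.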
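To make the known_hosts comparison concrete, here is a toy Python model of the behavior the man page describes. It is not NetWorker code: the PeerStore class, its check and forget methods, and the use of a plain string in place of a certificate are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


class PeerConflict(Exception):
    """Raised when a known peer presents a different certificate."""


@dataclass
class PeerStore:
    # Hypothetical stand-in for one host's NSR peer information resources:
    # peer name -> certificate (a plain string here instead of a real cert).
    peers: dict[str, str] = field(default_factory=dict)

    def check(self, name: str, certificate: str) -> None:
        known = self.peers.get(name)
        if known is None:
            # First contact: remember the peer, as ssh does with known_hosts.
            self.peers[name] = certificate
        elif known != certificate:
            # Same name, new certificate (e.g. a rebuilt client): refuse.
            raise PeerConflict(f"conflicting peer information for {name}")

    def forget(self, name: str) -> None:
        # Analogous to deleting the NSR peer information resource.
        self.peers.pop(name, None)


if __name__ == "__main__":
    server = PeerStore()
    server.check("client01", "cert-old")      # initial registration
    try:
        server.check("client01", "cert-new")  # client was rebuilt
    except PeerConflict as err:
        print(err)
    server.forget("client01")
    server.check("client01", "cert-new")      # accepted after cleanup
```

The model also shows why the fix has two halves: each side keeps its own store, so the stale entry has to be removed on both the server and the client.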

The resolution below is from the following link:


Delete the NSR Peer Information of the NetWorker Server on the client/storage node.
Then delete the NSR Peer Information for the client/storage node from the NetWorker Server.

Follow the steps below to delete the NSR peer information on the NetWorker server and on the client.

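On the NetWorker server:

1. At the NetWorker server command line, go to /nsr/res.
2. Type the following commands, then answer y when prompted to confirm the deletion:

nsradmin -p nsrexec
print type:nsr peer information; name:client_name
delete

Specify the name of the client/storage node in place of client_name.

On the client/storage node:

1. At the client/storage node command line, go to /nsr/res.
2. Type the following commands, then answer y when prompted to confirm the deletion:

nsradmin -p nsrexec
print type:nsr peer information; name:server_name
delete

Specify the name of the NetWorker server in place of server_name.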





The most recent checkpoint for the VBA appliance is outdated

Recently, I found that one of our VBA appliances was producing the following error in vSphere:

error> The most recent checkpoint for the VBA appliance is outdated

The checkpoint list showed that the most recent checkpoints dated from July 14:

cp.20150714161043 Tue Jul 14 10:10:43 2015 valid hfs --- nodes 1/1 stripes 344
cp.20150714172020 Tue Jul 14 11:20:20 2015 valid hfs --- nodes 1/1 stripes 344
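The checkpoint tags encode their creation time, so the age of the newest one is easy to compute. A minimal Python sketch, assuming the tag timestamp is UTC: cp.20150714172020 lines up with the 11:20 MDT time in the listing above and in the status.dpn output below, which is six hours behind UTC. How old a checkpoint must be before the appliance raises the warning is not stated here, so the one-day comparison in the demo is only an example.

```python
from datetime import datetime, timedelta, timezone


def checkpoint_time(tag: str) -> datetime:
    """Parse a tag such as 'cp.20150714172020' as a UTC timestamp."""
    return datetime.strptime(tag.removeprefix("cp."), "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )


def newest_checkpoint_age(tags: list[str], now: datetime) -> timedelta:
    """How long ago the most recent checkpoint in `tags` was taken."""
    return now - max(checkpoint_time(tag) for tag in tags)


if __name__ == "__main__":
    tags = ["cp.20150714161043", "cp.20150714172020"]
    # "Tue Jul 21 15:35:44 2015 UTC" from the status.dpn header.
    now = datetime(2015, 7, 21, 15, 35, 44, tzinfo=timezone.utc)
    age = newest_checkpoint_age(tags, now)
    print(age, age > timedelta(days=1))  # example threshold only
```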

I opened a ticket with EMC, and they provided the following procedure.

dpnctl status
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: gsan status: up
dpnctl: INFO: MCS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: axionfs status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Unattended startup status: enabled.

Running status.dpn gives us a little more information on some of the maintenance services:

Tue Jul 21 09:35:44 MDT 2015 [VBA01] Tue Jul 21 15:35:44 2015 UTC (Initialized Tue Nov 4 20:34:05 2014 UTC)
Node IP Address Version State Runlevel Srvr+Root+User Dis Suspend Load UsedMB Errlen %Full Percent Full and Stripe Status by Disk
0.0 7.0.62-10 ONLINE fullaccess mhpu+0hpu+0hpu 4 false 0.58 7586 15696646 6.1% 6%(onl:116) 6%(onl:116) 6%(onl:115)
Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable
System ID: 1415133245@00:50:56:88:1A:CC
All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0hpu)
System-Status: ok
Access-Status: full
Last checkpoint: cp.20150714172020 finished Tue Jul 14 11:20:40 2015 after 00m 20s (OK)
No GC yet
Last hfscheck: finished Tue Jul 14 11:28:07 2015 after 07m 19s >> checked 343 of 343 stripes (OK)


Although the maintenance window is running, cp, gc, and hfscheck were all suspended:


Maintenance windows scheduler capacity profile is active.
WARNING: cp is suspended permanently.
WARNING: gc is suspended permanently.
WARNING: hfscheck is suspended permanently.
The maintenance window is currently running.
Next backup window start time: Tue Jul 21 20:00:00 2015 MDT
Next maintenance window start time: Wed Jul 22 08:00:00 2015 MDT

The following commands permanently resume hfscheck, gc, and cp:

avmaint sched resume hfscheck --permanent --ava
avmaint sched resume gc --permanent --ava
avmaint sched resume cp --permanent --ava
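If status.dpn reports several suspended tasks, the matching resume commands can be derived from its WARNING lines. A small Python sketch, assuming the warnings use the exact wording shown above; the function name resume_commands is hypothetical, and it only builds the command strings without running anything on the appliance.

```python
import re

# "WARNING: gc is suspended permanently." -> "gc"
SUSPENDED_RE = re.compile(r"WARNING: (\w+) is suspended permanently\.")


def resume_commands(status_text: str) -> list[str]:
    """Build one avmaint resume command per permanently suspended task."""
    return [
        f"avmaint sched resume {task} --permanent --ava"
        for task in SUSPENDED_RE.findall(status_text)
    ]


if __name__ == "__main__":
    status = (
        "WARNING: cp is suspended permanently.\n"
        "WARNING: gc is suspended permanently.\n"
        "WARNING: hfscheck is suspended permanently.\n"
    )
    for command in resume_commands(status):
        print(command)
```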

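You may also want to create a checkpoint manually. Suspend the maintenance windows scheduler, take the checkpoint, and then resume the scheduler: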
dpnctl stop maint
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: Suspending maintenance windows scheduler…

avmaint checkpoint --ava
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

dpnctl start maint
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: Resuming maintenance windows scheduler…
dpnctl: INFO: maintenance windows scheduler resumed.
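The three commands above can be wrapped so that the maintenance windows scheduler is always resumed, even if the checkpoint fails. This is a minimal Python sketch, assuming it runs on the appliance as a user that can call dpnctl and avmaint; the function names are hypothetical, and the runner is injectable so the sequence can be checked without touching a real system.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)


def manual_checkpoint(run: Runner = run_checked) -> None:
    """Suspend maintenance, take a checkpoint, then always resume maintenance."""
    run(["dpnctl", "stop", "maint"])
    try:
        run(["avmaint", "checkpoint", "--ava"])
    finally:
        # Resume the scheduler even if the checkpoint command failed.
        run(["dpnctl", "start", "maint"])
```

On the appliance this would be called as manual_checkpoint(). The try/finally is the design point: forgetting dpnctl start maint after a failed checkpoint would leave maintenance suspended, which is roughly how this appliance got into trouble in the first place.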



NetWorker Build 783 released

NetWorker Build 783
Publication Date: 2015-JUN-08

234898 ESC		NW_VSS		Escalation 23776:Large SQLDatabase Backups are not possible(VDI Backup)
234595 ESC		NW_Console	Escalation 23700:VBA: VMware protection policy details shows another days backup information
233518 ESC		NW_VSS		Escalation 23701:NMSQL recovered the different data than what is being requested
232892 ESC		NetWorker	Escalation 23391: device mismatch errors in other devices after device deletion that prevents devices from working.
232110 ESC		NetWorker	Escalation 23548:nsrjobd crashes while performing RMAN recovery
229042 ESC		NW_VSS		Escalation 23085:nsrsnap_vss_save crashes in a 25 node Hyper-V Windows 2012 R2 core setup
225356 ESC		NW_VSS		Escalation 22686:NW00162213-NW162213:EXCHANGE 2010 SP3 RU5 backups failing with error VSS_E_WRITERERROR_RETRYABLE
233253 BUG		NetWorker	VBA stuck in query pending forever in case of problems
232507 BUG		NetWorker	ESC 23654 - VBA Policy status shows as failed but not clients are listed in NMC
232306 BUG		NetWorker	ESC 23393:VBA jobs show nothing in waiting to run in NMC even when there are jobs in queued state on VBA and failed VMs show no error
185641 (NW160391) BUG	NetWorker	No info to user when backups are not run due to No Eligible Proxies during hot-add only backup mode
206681 (NW159953) ESC	NetWorker	Cannot label blank tape after upgrade to NetWorker version 8.1 on AIX - different than NW156909
206213 (NW157573) ESC	NetWorker	Library getting down after upgraded from to even to
204138 (NW159899) ESC	NetWorker	auto inventory of HP MSL libraries doesnt work in 8.1
200652 (NW161429) ESC	NetWorker	Skips of scheduled clone jobs show as interrupted in the NW gui After upgrading to NW




Cannot access NetWorker VBA GUI

I had an issue recently where the VBA configuration and FLR GUIs were inaccessible. Stopping and starting Tomcat with the emwebapp script was easy enough, but it didn't resolve the problem. EMC provided the following process, which also re-registers the certificate.


1. Stop emwebapp
emwebapp.sh --stop

2. Back up existing keystore
cp /root/.keystore /root/.keystore.sav

3. List tomcat certificate – should see 1 certificate
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat

4. Delete tomcat certificate from keystore
/usr/java/latest/bin/keytool -delete -alias tomcat -keystore /root/.keystore -storepass changeit

5. List tomcat certificate again – should return empty
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat

6. Regenerate certificate using SHA256
/usr/java/latest/bin/keytool -genkeypair -v -alias tomcat -keyalg RSA -sigalg SHA256withRSA -keystore /root/.keystore -storepass changeit -keypass changeit -validity 3650 -dname "CN=localhost.localdom, OU=Avamar, O=EMC, L=Irvine, S=California, C=US"

7. List tomcat certificate again – should see 1 certificate
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat

8. Start emwebapp.sh
emwebapp.sh --start
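For repeat use, the same steps can be kept as argument lists rather than copied from a web page, where "smart" quotes around the -dname value easily creep in. A Python sketch, assuming it runs as root on the appliance with emwebapp.sh on the PATH; the helper names are hypothetical. Step 5 (listing after the delete) is left out because keytool is expected to report an error once the alias is gone, which would stop the run.

```python
import shlex
import subprocess

KEYTOOL = "/usr/java/latest/bin/keytool"
KEYSTORE = "/root/.keystore"
PASSWORD = "changeit"  # store/key password used in the steps above
DNAME = "CN=localhost.localdom, OU=Avamar, O=EMC, L=Irvine, S=California, C=US"


def certificate_steps() -> list[list[str]]:
    """Steps 1-4 and 6-8 above as argument lists (step 5 omitted)."""
    list_cert = [KEYTOOL, "-list", "-keystore", KEYSTORE,
                 "-storepass", PASSWORD, "-alias", "tomcat"]
    return [
        ["emwebapp.sh", "--stop"],
        ["cp", KEYSTORE, KEYSTORE + ".sav"],
        list_cert,
        [KEYTOOL, "-delete", "-alias", "tomcat", "-keystore", KEYSTORE,
         "-storepass", PASSWORD],
        [KEYTOOL, "-genkeypair", "-v", "-alias", "tomcat", "-keyalg", "RSA",
         "-sigalg", "SHA256withRSA", "-keystore", KEYSTORE,
         "-storepass", PASSWORD, "-keypass", PASSWORD,
         "-validity", "3650", "-dname", DNAME],  # DNAME stays one argument
        list_cert,
        ["emwebapp.sh", "--start"],
    ]


def run_steps(steps: list[list[str]], dry_run: bool = True) -> None:
    for argv in steps:
        print(shlex.join(argv))  # show the exact command being issued
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_steps(certificate_steps())  # dry run: print only
```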



Troubleshooting NetWorker Disaster Recovery Backup Failures

Occasionally, I have found that all backup savesets complete except for the disaster recovery portion.

One thing to check is that all volumes are online, which is required for the disaster recovery backup to complete.

Open a command prompt on the client and start the diskpart utility.


Always rescan first:

DISKPART>rescan


When the rescan completes, list the volumes to see whether any are offline:

DISKPART>list vol

Identify the offline volume, select it, and bring it online:

DISKPART>select volume 1

DISKPART>online volume
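The same diskpart session can be scripted with diskpart /s, which reads commands from a text file. The hypothetical Python helper below only builds and writes that file; the volume numbers must come from your own list vol output, and volume 1 in the demo is simply the one used above. The file name online_volumes.txt is also just an example.

```python
from pathlib import Path


def diskpart_script(volumes: list[int]) -> str:
    """Rescan, list volumes, then select and online each given volume."""
    lines = ["rescan", "list volume"]
    for number in volumes:
        lines += [f"select volume {number}", "online volume"]
    return "\n".join(lines) + "\n"


def write_script(volumes: list[int], path: str = "online_volumes.txt") -> Path:
    """Save the script so it can be run with: diskpart /s online_volumes.txt"""
    target = Path(path)
    target.write_text(diskpart_script(volumes), encoding="ascii")
    return target


if __name__ == "__main__":
    print(diskpart_script([1]), end="")
```

Run the saved file from an elevated command prompt, then retry the disaster recovery backup.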





Unexpected Connection error with NetWorker and VBA

I was really looking forward to another idyllic day as a Backup Administrator.

Those days usually begin with troubleshooting a few backup failures, drinking a lot of coffee, and planning world domination. Sadly, I actually had to do some work today instead.

I found all my VMware protection policies had failed the previous evening.

error: Unable to connect to VBA, error Cannot establish session to VBA.
We logged on to vSphere and attempted to browse the Backup Recovery area, where we found the following error:

error: An unexpected connection error occurred and the cause could not be determined, Please check your EBR configuration screen to troubleshoot, or contact an administrator.

We rebooted the appliance, and it seemed to fix the issue for one. We ended up delving into the nuts and bolts of the various resources required to run a VBA backup. Let's review these components.

EBR Config GUI

You can access the configuration interface by browsing to:


Log in with the appliance's root credentials. The default password is 8RttoTriz.

Note: I don't know about you, but try as I might, I could not access this via Chrome.


In the GUI, you can view the running services and restart them if required. You can also configure the connection to NetWorker. To do this, you will need to know the password for the vmuser account. Near as I can tell, this is an ID internal to NetWorker that is used to establish communication. The default password is "changeme".


We hadn't changed this password. If you would like to change it, you can do so on the NetWorker server properties page under the Misc tab. We also weren't sure what the password was. Yeah, I know.

Re-establish communication between the appliance and NetWorker

Go to the NetWorker Config tab. Here you can enter the password and save it to re-establish the interface with NetWorker (and confirm that the password is correct). You will need to reboot the appliance afterwards.




This resolved our issue, and we can now browse the Backup Recovery interface in vSphere. With communication re-established, backups should run tonight. Fingers crossed.




NetWorker Build 774 has been released.


It can be downloaded from ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/8.2/
This package contains the following cumulative fixes:

ID Details
(NW161917) ESC	NetWorker Escalation 22509: NSM DB2 PIT restore doesn't restore connecting directories' ACL on AIX
(NW161624) ESC	NetWorker Escalation 22307: Device discovery raises alerts if udev-named library handle already configured: 14249:dvdetect: 'skipped as requested'
ESC	NetWorker Escalation 22567: Error counts not correctly handled with nsrsnmd & cdi changes
(NW159916) ESC 	NetWorker Escalation 21778: Disable "label" operation in AMM functionality: DataDomain devices unmount with "RPC severe Lost connection to media database"



Troubleshooting Cloning – Part 2 – Clone Wars

The clone wars continue. As you may recall from Part 1, my NetWorker-optimized clones have been hanging with this misleading error:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

To further complicate things, control over the clone from the console can be limited. I had been using the jobkill utility; Preston has a great write-up on it. The problem is that after the clone is killed, the NMC console still shows it as running. Attempts to restart the clone via NMC resulted in the following error:

1 1429638078 event task manager Task aborted: task ‘clone.name Clone’ is already running

So is jobkill really murdering all the required processes? Let's take a look.

My new environment is Windows, so let's visit our old friend Task Manager. Here we want to look for nsrtask and nsrclone processes. First, nsrclone. I've had to obfuscate the output, but what I found were the expected running clones; the specific job I needed to restart was not listed.



The same cannot be said for the clone job's associated nsrtask process: there was indeed one still hanging around.


After killing it, the state in NMC changed from Running to Interrupted. I could then restart the job.
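Task Manager works, but the same check can be done from a command prompt with tasklist /FO CSV /NH, which prints one quoted CSV row per process. The Python sketch below assumes that output has been saved to text; the function name is hypothetical and the PIDs in the test are made up. It only lists candidate processes: deciding which nsrtask belongs to the stuck clone, and ending it (for example with taskkill /PID <pid> /F), is still a judgment call.

```python
import csv
import io


def find_processes(
    tasklist_csv: str, images: tuple[str, ...] = ("nsrtask.exe", "nsrclone.exe")
) -> list[tuple[str, int]]:
    """Return (image name, PID) pairs from 'tasklist /FO CSV /NH' output."""
    wanted = {name.lower() for name in images}
    matches: list[tuple[str, int]] = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Columns: image name, PID, session name, session#, memory usage.
        if len(row) >= 2 and row[0].lower() in wanted:
            matches.append((row[0], int(row[1])))
    return matches
```

Using the CSV format rather than the default table avoids having to split on column widths, and the csv module handles the comma inside memory values such as "10,512 K".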




All this just to get my clone going again. Still, this is some progress, as I had previously been restarting NetWorker, which interrupted the service and other running clones.



Troubleshooting Cloning – Part 1

I've had some issues with cloning recently. It's been an interesting problem that will require more analysis, but for now I have a workaround that has helped narrow it down. It all started with this error on a running clone:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

After confirming that the device was mounted, I started to search for other causes. As usual, I found a great post over at Preston's blog, where he explains:

“A core component in NetWorker’s media database design is that a saveset can only ever have one instance on a piece of media. This applies as equally to failed as complete saveset instances. The net result is that this error/situation will occur because it’s meant to – NetWorker doesn’t permit more than one instance of a saveset to appear on the same piece of physical media.”

What I surmise happens is that a failure occurs during the clone operation, and the aborted saveset on the destination then needs to be removed or excluded. I have not yet been able to formulate an mminfo command to find aborted clone savesets; see Part 2 when published. For now, I would be satisfied to exclude the savesets. The clone job in question provides DR protection for one of my NetWorker servers; that is, it clones index, bootstrap, and filesystem saves to an offsite Data Domain. For reasons I cannot explain here, this clone was configured to go back 6 months. I know.



So after the clone had run for a few minutes, I opened the clone properties and hit the "Preview saveset" button. It identified a particular group of savesets from one specific date in the past, and that date alone. My assumption is that those savesets already exist on the destination and the clone job is not intelligent enough to identify and skip them. The "waiting for volume" error is really misleading, at least to me. After I greatly narrowed the number of days to look back and reran it, the clone job completed successfully.

Stay tuned for Part 2: Identifying aborted clone savesets. Do you know how? Comment below!

