6
November

AWS – Introduction to Glacier


1
November

AWS – Introduction to VPCs


27
October

AWS – Introduction to EBS


25
October

AWS – Introduction to EC2


24
October

AWS – Introduction to S3


13
June

A new start

I started this blog in 2012 when I was assigned to a new client and had to get up to speed with NetWorker. The purpose was to capture everything I learned here. Four years and close to 80 posts later, it has been an invaluable tool for capturing and sharing knowledge. It’s not unusual for me to look up past issues here, or to google them only to be redirected to my own blog. I never considered an issue resolved until it was captured here. It’s not unusual for others to find their way here too. Stats for the last month show 656 hits, mostly from India and the U.S. It’s good to know I’m not here alone screaming in the dark.

 

 

Today I find myself at the start of a new opportunity with many great challenges and things to learn. So expect this blog to be not only a place for capturing knowledge about NetWorker and Avamar, but also a repository for new lessons related to my old friend NetBackup, as well as storage and virtualization.


13
June

Getting familiar with cmode

Had a request from my client to upload some logs. The logs were needed for some c-mode systems, and collecting them required access to the systemshell. Having recently done some troubleshooting on 7-mode, I found the process not altogether unfamiliar, but slightly different.

The diag user is required, so let’s check its status. Is it locked? Do we know the password? Let’s hope.

blob0::> security login show -username diag

Username  Application  Method  Role Name  Acct locked
diag      console      passwd  admin      no

Enter advanced privilege mode:
set -privilege advanced

After confirming the password, we can access the systemshell:

blob0::>system node systemshell -node blob0n01

Fascinating, I know


29
November

Avamar – expiring snapups

 

 


In a perfect world, you should never need this command. My friend Ian Anderson, in a great Ask the Experts session, spoke of achieving “Avamar Zen”.

Avamar Zen is a state of harmony in which you have achieved a steady state of data ingestion versus data expiration; ideally, more data is expiring and being cleaned up by garbage collection than is coming in. However, zen can be hard to achieve. Avamar is an amazing product. If the SEs have done their job and sized it properly, you should reach steady state. What sometimes happens is that the client is so impressed they begin adding more systems and workloads that were outside the initial sizing scope.

Years ago, when I was embedded onsite, we ran into such an issue. It wasn’t so much about adding too many systems as about one system in particular. We had some groups, one configured to cross mount points and another to protect only local data. A co-worker spun up a new system and, instead of checking with me, added it to Avamar himself, in the wrong group.

The next day I arrived to find my Avamar grid full, and the data had also been replicated over to the secondary. Quite the mess. So just roll the system back to a previous checkpoint, you may think? Unfortunately, when an Avamar system has reached capacity, there is not enough space for the overhead required to perform the checkpoint rollback. Now, let’s meet our friend expire-snapups.

What does it do? What do you think it does? It expires snaps! Awesome, right? What is really cool is how it does this. The command takes switches that let you granularly target specific data to remove. For example, if you wanted to remove all data from Nov 30, 2015 and the previous 25 days, you would run the following:

expire-snapups --before='2015-11-30' --days=25 --domain=/ > do-expire.sh

This will create a script in tmp, do-expire.sh, which you can then run to wipe out the offending data.
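Before running anything destructive, it can help to double-check which calendar dates the switches cover. The following is a minimal Python sketch of that arithmetic, based only on the description above (the `before` date plus the preceding `days` days). The function name `expiry_window` is hypothetical, and the exact boundary behaviour of the real switches is an assumption here, so confirm it against the tool's own help output.

```python
from datetime import date, timedelta


def expiry_window(before, days):
    """Return (oldest, newest) dates for a window ending at `before`.

    Assumption: the window is the `before` date plus the previous `days`
    days, as described in the post. This is not taken from Avamar docs.
    """
    newest = date.fromisoformat(before)
    oldest = newest - timedelta(days=days)
    return oldest, newest


if __name__ == "__main__":
    oldest, newest = expiry_window("2015-11-30", 25)
    print("Window covers", oldest.isoformat(), "through", newest.isoformat())
```

For the example command, this gives a window running from 2015-11-05 through 2015-11-30.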

There are other switches available to target specific data and clients. When complete, settle in for a long garbage collect to run and turf the offending client.

When complete, I hope you can achieve “Avamar Zen” as I did. I also changed the admin password, so my helpful co-worker could not repeat the same mistake.

 


20
November

NetWorker Exchange failure – The group or resource is not in the correct state to perform the requested operation.

As with any Windows system backup, we sadly have to deal with VSS. VSS is the bane of any backup administrator’s existence. Add in Exchange, and you have another layer of services that must be orchestrated correctly to ensure protection.

In today’s post, we revisit our old friend VSS and provide some additional insight into NetWorker and the associated services that help coordinate backups.

I had some new Exchange systems to add to backup rotation. These had Exchange 2013 on W2k12.

We started to see the following errors:

NetWorker error messages:
APPLICATIONS:\Microsoft Exchange 2013: Backup of [APPLICATIONS:\Microsoft Exchange 2013] failed
rsnap_vss_save:NMM .. error caught calling RM to commit the replica.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 024168 ERROR:The VSS shadow copy for Exchange Replication Service was not successful because of an error that occurred earlier.
83394:nsrsnap_vss_save:NMM .. cluster failover or application role change after a replica created may have caused snapshot creation failure, try to restart the backup.
Microsoft DiskPart version 6.3.9600
Copyright (C) 1999-2013 Microsoft Corporation.
On computer: exchange-node
Automatic mounting of new volumes enabled.
37959:nsrsnap_vss_save:The group or resource is not in the correct state to perform the requested operation.
.
63335:nsrsnap_vss_save:NMM backup failed to complete successfully.
Internal error.
102333 1447959921 3 0 0 15084 14984 0 server nsrsnap_vss_save NSR error 21 Exiting with failure. 0
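
When a log repeats the same writer error many times, a quick tally makes it easier to see which VSS error actually dominates. Below is a minimal Python sketch that pulls the error name and hex code out of lines shaped like the ones above. The function name `summarize_vss_errors` is hypothetical, and the regular expression is tuned only to the message wording shown in this post; other NMM versions may word it differently.

```python
import re
from collections import Counter

# Matches the wording seen in this post's log excerpt, e.g.
# "The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3."
VSS_ERROR = re.compile(r"The error is (\w+)\. The code is: (0x[0-9a-fA-F]+)")


def summarize_vss_errors(log_text):
    """Count (error name, hex code) pairs found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = VSS_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2).lower())] += 1
    return dict(counts)


# Two lines copied from the excerpt above, plus one line without a code.
SAMPLE_LOG = """\
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
49931:nsrsnap_vss_save:RM .. 027114 ERROR:Exchange Replication Service is stable. The error is VSS_E_WRITERERROR_RETRYABLE. The code is: 0x800423f3. Check the application event log for more information.
63335:nsrsnap_vss_save:NMM backup failed to complete successfully.
"""

if __name__ == "__main__":
    for (name, code), count in summarize_vss_errors(SAMPLE_LOG).items():
        print(name, code, "seen", count, "time(s)")
```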

Corrective actions taken:
Rebooted Exchange servers
Removed networker client , NMM,

And some of the following from the event viewer:
[RpcHttp] Marking ClientAccess 2010 server server(https://server/rpc/rpcproxy.dll) as unhealthy due to exception: System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
**************************************************************************************************************************
[Autodiscover] Marking ClientAccess 2010 server server (https://server) as unhealthy due to exception: System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.Exchange.HttpProxy.ProtocolPingStrategyBase.Ping(Uri url)
****************************************************************************************************************************

The description for Event ID 5068 from source Replication API Service cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

2015 11 19 00:00:01
exchange-node

The specified resource type cannot be found in the image file
***********************************************************************************************************************

********************************************************************************************
PS: (CPSImportService::GetOperationStatus) ERR: Error for operation 1
*********************************************************************************************
NMM .. Error during import of snapshot: PS .. Error completing snapshot import
**********************************************************************************************.
NMM .. Registration of snapshot set failed — PS .. Error completing snapshot import
****************************************************************************************************
Microsoft Exchange VSS Writer backup failed. No log files were truncated. Instance d93e725e-486d-440f-8397-93061460ee02. Database 8bee9b3c-8fff-405e-a42d-b739d05caadd.
********************************************************************************************************************
The Microsoft Exchange Replication service VSS Writer (Instance d93e725e-486d-440f-8397-93061460ee02) failed with error FFFFFFFC when processing the backup completion event.
********************************************************************************************************************
An internal transport certificate will expire soon. Thumbprint:EE063967EB62D17668AD392286EFF96622F84BFC, hours remaining: 1594
******************************************************************************************************************

What did we do?

We rebooted the system and attempted to restart some of the associated VSS services. No joy. I opened a ticket with EMC, and Anthony gave me a call back. I had worked with him before, so I was glad to hear from him.

On your Exchange Server run the following commands and give their output:

 

  1. vssadmin list writers
  2. vssadmin list shadows
  3. If the above command lists any shadow copies, perform the following steps:
  4. Type diskshadow
  5. Type list shadows all
  6. Type delete shadows all
  7. Stop all NetWorker services on the client, and verify that all services are stopped aside from nsrpm.
  8. While the NetWorker services are stopped, restart the Replication Manager RMAgentPS service and make sure the other Replication Manager service is stopped as well.
  9. Restart the NetWorker services.
  10. At a cmd prompt, type tasklist | findstr nsr
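
The first step in the list, checking the VSS writers, is easy to automate once the output has been saved to a file. The following is a minimal Python sketch that flags writers that are not stable or that report an error. It assumes the commonly seen `vssadmin list writers` layout (`Writer name:`, `State:` and `Last error:` lines); the sample text is illustrative and was not captured from the systems in this post, and the function names are hypothetical.

```python
def parse_vss_writers(output):
    """Parse `vssadmin list writers` text into {writer name: {state, last_error}}."""
    writers = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            # Writer names are printed inside single quotes.
            current = line.split(":", 1)[1].strip().strip("'")
            writers[current] = {}
        elif current and line.startswith("State:"):
            writers[current]["state"] = line.split(":", 1)[1].strip()
        elif current and line.startswith("Last error:"):
            writers[current]["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(writers):
    """Return the sorted names of writers that are not stable or report an error."""
    return sorted(
        name
        for name, info in writers.items()
        if "Stable" not in info.get("state", "")
        or info.get("last_error") != "No error"
    )


# Illustrative sample only (assumed layout, not output from this post).
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Writer'
   State: [8] Failed
   Last error: Retryable error
"""

if __name__ == "__main__":
    print(unhealthy_writers(parse_vss_writers(SAMPLE_OUTPUT)))
```

A writer missing from the parsed result altogether is also worth noticing, since the Exchange VSS writer has to be present for the backup to work.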

We retried the backup and it worked.

A little insight into some of these services and the roles they play. We are familiar with nsrexecd and PowerSnap. We also had to restart the RMAgentPS service, which executes operations for the Replication Manager client. In addition, we had to stop the Replication Manager Exchange Interface service, which executes Exchange commands for Replication Manager. In the end, there were some stale shadows, and we needed to restart the required services in the correct order. Of course, also remember to make sure the Exchange VSS writer is present.

 

 

 


29
July

Troubleshooting Conflicting NSR peer Information errors

Recently, our server team migrated some data to a new server, keeping the same name and IP for the client.

After reinstalling the NetWorker client, I ran a test backup and noticed the following filling up the log:

Error: Conflicting NSR peer information resources detected for host. Please see server log for more information.

This is pretty easy to fix; a quick google of the error will turn up the solution. But what does it mean, and why is the error being produced?

The following is from the man page:

The NSR peer information resource is used by NetWorker authentication daemon nsrexecd. Resources of this type are populated/created by NetWorker. They are used to hold the identity and certificate of remote NetWorker installations that the local installation communicated with in the past. These resources are similar to known_hosts file used by ssh(1). Once a NetWorker installation (client, server, or storage node) communicates with a remote NetWorker install (client, server, or storage node), a NSR peer information resource will be created on each host and will contain information about the peer (i.e. identity and certificate). During this initial communication, each host will send information about itself to the peer. This information includes the NW instance name, NW instance ID, and the certificate. After this initial communication, each NetWorker install will use the registered peer certificate to validate future communications with that peer.

So, it goes without saying that when a system is rebuilt or a new system is built with a previously used name, this certificate will change.

The resolution below is from the following link:

https://community.emc.com/docs/DOC-20085

Delete the NSR Peer Information of the NetWorker Server on the client/storage node.
Then delete the NSR Peer Information for the client/storage node from the NetWorker Server.

Please follow the steps given below to delete the NSR peer information on NetWorker Server and on the Client.

1. At NetWorker server command line, go to the location /nsr/res
2. Type the command:

nsradmin -p nsrexec
print type:nsr peer information; name:client_name
delete
y

Specify the name of the client/storage node in the place of client_name.

1. At the client/storage node command line, go to the location /nsr/res
2. Type the command:

nsradmin -p nsrexec
print type:nsr peer information
delete
y
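
If you have to repeat this cleanup for several hosts, it can help to generate the lines to type rather than retyping them. Below is a minimal Python sketch that only builds the text of the nsradmin session shown above; it does not run nsradmin. The function name `peer_cleanup_commands` and the host name `client01` are hypothetical. Note one assumption: the sketch always includes a `name:` filter, whereas the client-side query quoted above omits it, so on the client you would pass the NetWorker server's name.

```python
def peer_cleanup_commands(peer_name):
    """Return the lines to type to delete one NSR peer information resource.

    On the NetWorker server, pass the client or storage node name.
    On the client, pass the NetWorker server name (the name filter on the
    client side is an assumption; the quoted steps omit it).
    """
    if not peer_name or ";" in peer_name or "\n" in peer_name:
        raise ValueError("peer_name must be a single plain host name")
    return [
        "nsradmin -p nsrexec",
        f"print type:nsr peer information; name:{peer_name}",
        "delete",
        "y",
    ]


if __name__ == "__main__":
    # client01 is a placeholder host name.
    for line in peer_cleanup_commands("client01"):
        print(line)
```

Printing the query first and reading what it matches before answering y keeps the delete limited to the one stale entry.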

 

 

