29 July

Troubleshooting Conflicting NSR Peer Information Errors

Recently, our server team migrated some data to a new server, keeping the same name and IP for the client.

After reinstalling the NetWorker client, I ran a test backup and noticed the following error filling up the log:

Error: Conflicting NSR peer information resources detected for host. Please see server log for more information.

This is easy to fix, and a quick Google search for the error will turn up the solution. But what does it mean, and why is the error produced?

The following is from the man page:

The NSR peer information resource is used by NetWorker authentication daemon nsrexecd. Resources of this type are populated/created by NetWorker. They are used to hold the identity and certificate of remote NetWorker installations that the local installation communicated with in the past. These resources are similar to known_hosts file used by ssh(1). Once a NetWorker installation (client, server, or storage node) communicates with a remote NetWorker install (client, server, or storage node), a NSR peer information resource will be created on each host and will contain information about the peer (i.e. identity and certificate). During this initial communication, each host will send information about itself to the peer. This information includes the NW instance name, NW instance ID, and the certificate. After this initial communication, each NetWorker install will use the registered peer certificate to validate future communications with that peer.

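So when a system is rebuilt, or a new system is built with a previously used name, its certificate changes. The peers it communicated with before still hold the old certificate in their NSR peer information resources, so the new certificate fails validation and NetWorker reports conflicting peer information.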
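To make the known_hosts comparison concrete, here is a minimal Python sketch of trust-on-first-use peer records. It is a hypothetical model, not NetWorker's implementation: PeerStore, check(), delete() and the certificate strings are illustrative names I made up. It shows why a rebuilt host with the same name trips the check, and why deleting the stale record clears it.

```python
# Hypothetical trust-on-first-use model of NSR peer information records.
# Not NetWorker code: all names here are illustrative.
from dataclasses import dataclass, field


class ConflictingPeerInfo(Exception):
    """A known peer presented a different certificate than the one registered."""


@dataclass
class PeerStore:
    # Peer name -> certificate registered at first contact.
    records: dict = field(default_factory=dict)

    def check(self, peer: str, certificate: str) -> str:
        known = self.records.get(peer)
        if known is None:
            # First contact: register the peer's identity and certificate.
            self.records[peer] = certificate
            return "registered"
        if known == certificate:
            # Later contact: the registered certificate validates the peer.
            return "trusted"
        # Same name, different certificate, e.g. the host was rebuilt.
        raise ConflictingPeerInfo(f"conflicting peer information for {peer}")

    def delete(self, peer: str) -> None:
        # Analogous to deleting the NSR peer information resource.
        self.records.pop(peer, None)


if __name__ == "__main__":
    store = PeerStore()
    print(store.check("client01", "cert-A"))  # registered
    print(store.check("client01", "cert-A"))  # trusted
    try:
        store.check("client01", "cert-B")  # client01 rebuilt, new certificate
    except ConflictingPeerInfo as err:
        print(err)
    store.delete("client01")
    print(store.check("client01", "cert-B"))  # registered again
```

In NetWorker the record exists on both sides of the conversation, which is why the resolution deletes it on the server and on the client.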

The resolution below is from the following EMC Community article:

https://community.emc.com/docs/DOC-20085

Delete the NSR Peer Information of the NetWorker Server on the client/storage node.
Then delete the NSR Peer Information for the client/storage node from the NetWorker Server.

Follow the steps below to delete the NSR peer information on the NetWorker server and on the client.

1. At the NetWorker server command line, go to the /nsr/res directory
2. Run the following commands:

nsradmin -p nsrexec
print type:nsr peer information; name:client_name
delete
y

Replace client_name with the name of the client/storage node.

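1. At the client/storage node command line, go to the /nsr/res directory
2. Run the following commands:

nsradmin -p nsrexec
print type:nsr peer information
delete
y

Note that this query has no name attribute, so it selects every NSR peer information resource on the client, not only the entry for the NetWorker server.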
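If you do this often, the same keystrokes can be scripted. The sketch below rests on an assumption I have not verified against EMC documentation: that nsradmin reads these commands from standard input exactly as it does interactively, including the y confirmation. If the query matches more than one resource, nsradmin may ask for confirmation more than once, so check the output. build_peer_delete_commands() and delete_peer_info() are hypothetical helper names, and the default dry_run=True only returns the command text.

```python
# Sketch: feed the nsradmin commands from the steps above through stdin.
# ASSUMPTION: nsradmin accepts these commands on stdin as it does
# interactively. The delete is irreversible; try it on a test host first.
import subprocess
from typing import Optional


def build_peer_delete_commands(peer_name: Optional[str] = None) -> str:
    """Return the nsradmin command text used in the steps above.

    With peer_name, only that peer's resource is selected (server-side step).
    Without it, every NSR peer information resource is selected (client step).
    """
    query = "type:nsr peer information"
    if peer_name:
        query += f"; name:{peer_name}"
    return "\n".join([f"print {query}", "delete", "y"]) + "\n"


def delete_peer_info(peer_name: Optional[str] = None, dry_run: bool = True) -> str:
    commands = build_peer_delete_commands(peer_name)
    if dry_run:
        return commands  # show what would be sent, change nothing
    result = subprocess.run(
        ["nsradmin", "-p", "nsrexec"],
        input=commands,
        text=True,
        capture_output=True,
        check=True,
        cwd="/nsr/res",  # as in the steps; the path differs on Windows
    )
    return result.stdout
```

For example, delete_peer_info("client_name") returns the server-side command text without touching anything; pass dry_run=False only once you are happy with it.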

9 July

NetWorker 8.2.1.4 Build 783 released

NetWorker 8.2.1.4 Build 783
Publication Date: 2015-JUN-08

234898 ESC		NW_VSS		Escalation 23776:Large SQLDatabase Backups are not possible(VDI Backup)
234595 ESC		NW_Console	Escalation 23700:VBA: VMware protection policy details shows another days backup information
233518 ESC		NW_VSS		Escalation 23701:NMSQL recovered the different data than what is being requested
232892 ESC		NetWorker	Escalation 23391: device mismatch errors in other devices after device deletion that prevents devices from working.
232110 ESC		NetWorker	Escalation 23548:nsrjobd crashes while performing RMAN recovery
229042 ESC		NW_VSS		Escalation 23085:nsrsnap_vss_save crashes in a 25 node Hyper-V Windows 2012 R2 core setup
225356 ESC		NW_VSS		Escalation 22686:NW00162213-NW162213:EXCHANGE 2010 SP3 RU5 backups failing with error VSS_E_WRITERERROR_RETRYABLE
233253 BUG		NetWorker	VBA stuck in query pending forever in case of problems
232507 BUG		NetWorker	ESC 23654 - VBA Policy status shows as failed but not clients are listed in NMC
232306 BUG		NetWorker	ESC 23393:VBA jobs show nothing in waiting to run in NMC even when there are jobs in queued state on VBA and failed VMs show no error
185641 (NW160391) BUG	NetWorker	No info to user when backups are not run due to No Eligible Proxies during hot-add only backup mode
206681 (NW159953) ESC	NetWorker	Cannot label blank tape after upgrade to NetWorker version 8.1 on AIX - different than NW156909
206213 (NW157573) ESC	NetWorker	Library getting down after upgraded from 8.0.1.1 to 8.1.0.2 even to 8.1.0.3
204138 (NW159899) ESC	NetWorker	auto inventory of HP MSL libraries doesnt work in 8.1
200652 (NW161429) ESC	NetWorker	Skips of scheduled clone jobs show as interrupted in the NW gui After upgrading to NW 8.1.1.6

8 July

Cannot access NetWorker VBA GUI

I had an issue recently where the VBA configuration and FLR GUIs were inaccessible. Stopping and starting Tomcat with the emwebapp script was easy enough, but it didn’t fix the problem. EMC provided the following process to regenerate the Tomcat certificate.

1. Stop emwebapp
emwebapp.sh --stop

2. Back up existing keystore
cp /root/.keystore /root/.keystore.sav

3. List the tomcat certificate (you should see 1 certificate)
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat

4. Delete the tomcat certificate from the keystore
/usr/java/latest/bin/keytool -delete -alias tomcat -keystore /root/.keystore -storepass changeit

5. List the tomcat certificate again (it should return nothing)
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat

6. Regenerate the certificate using SHA256
/usr/java/latest/bin/keytool -genkeypair -v -alias tomcat -keyalg RSA -sigalg SHA256withRSA -keystore /root/.keystore -storepass changeit -keypass changeit -validity 3650 -dname "CN=localhost.localdom, OU=Avamar, O=EMC, L=Irvine, S=California, C=US"

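7. List the tomcat certificate again (you should see 1 certificate)
/usr/java/latest/bin/keytool -list -keystore /root/.keystore -storepass changeit -alias tomcat
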
8. Start emwebapp
emwebapp.sh --start
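Steps 2 through 7 can also be kept as argument lists, which sidesteps the curly-quote problem: the -dname value is passed as a single argument, so no shell quoting is involved. This is a sketch under the assumptions of the steps above (keytool path, /root/.keystore, the changeit password); KEYTOOL, regenerate_cmds() and run() are names I made up. Stop emwebapp first and start it afterwards as in steps 1 and 8. With dry_run=True it only prints the commands.

```python
# Sketch of steps 2-7 above; paths and password are taken from those steps.
import shutil
import subprocess

KEYTOOL = "/usr/java/latest/bin/keytool"
KEYSTORE = "/root/.keystore"
STOREPASS = "changeit"
DNAME = "CN=localhost.localdom, OU=Avamar, O=EMC, L=Irvine, S=California, C=US"


def list_cmd():
    # Steps 3, 5 and 7: list the tomcat entry.
    return [KEYTOOL, "-list", "-keystore", KEYSTORE,
            "-storepass", STOREPASS, "-alias", "tomcat"]


def regenerate_cmds():
    return [
        list_cmd(),  # step 3: expect one certificate
        [KEYTOOL, "-delete", "-alias", "tomcat",
         "-keystore", KEYSTORE, "-storepass", STOREPASS],  # step 4
        list_cmd(),  # step 5: expect no tomcat entry
        [KEYTOOL, "-genkeypair", "-v", "-alias", "tomcat",
         "-keyalg", "RSA", "-sigalg", "SHA256withRSA",
         "-keystore", KEYSTORE, "-storepass", STOREPASS,
         "-keypass", STOREPASS, "-validity", "3650",
         "-dname", DNAME],  # step 6
        list_cmd(),  # step 7: expect one certificate again
    ]


def run(dry_run=True):
    if not dry_run:
        shutil.copy2(KEYSTORE, KEYSTORE + ".sav")  # step 2: back up keystore
    for cmd in regenerate_cmds():
        if dry_run:
            print(" ".join(cmd))
            continue
        # Return codes are not checked: the step 5 listing is expected to
        # report that the alias no longer exists. Read the output instead.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    run(dry_run=True)
```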

3 July

Troubleshooting NetWorker Disaster Recovery Backup Failures

Occasionally, I have found that all backup savesets complete except for the disaster recovery portion.

One thing to check is to ensure all volumes are online. This is required for the disaster recovery backup to complete.

Open a command line on the client and use the diskpart utility.

C:\>diskpart

Always rescan first:

DISKPART>rescan

When the rescan completes, list the volumes to see whether any are offline:

DISKPART>list vol

Identify the offline volume, select it, and bring it online (volume 1 in this example):

DISKPART>select volume 1

DISKPART>online volume
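Once you know which volume is offline, the same commands can be put in a script file and run with diskpart /s. A minimal sketch, assuming you have already identified the offline volume numbers from list volume (they differ from system to system); diskpart_script() and run_diskpart() are hypothetical helper names. Run it from an elevated prompt on the client.

```python
# Sketch: write the diskpart commands above to a file for `diskpart /s`.
# ASSUMPTION: the offline volume numbers were read from "list volume" first.
import subprocess
import tempfile
from pathlib import Path


def diskpart_script(offline_volumes):
    lines = ["rescan", "list volume"]
    for number in offline_volumes:
        lines += [f"select volume {number}", "online volume"]
    lines.append("list volume")  # confirm the volumes are now online
    return "\n".join(lines) + "\n"


def run_diskpart(offline_volumes, dry_run=True):
    script = diskpart_script(offline_volumes)
    if dry_run:
        return script  # show the script without running diskpart
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "online_volumes.txt"
        path.write_text(script, encoding="ascii")
        # Windows only; needs an elevated (administrator) prompt.
        result = subprocess.run(["diskpart", "/s", str(path)],
                                capture_output=True, text=True)
    return result.stdout
```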

26 June

Unexpected Connection Error with NetWorker and VBA

I was really looking forward to another idyllic day as a Backup Administrator.

Those days usually begin with troubleshooting a few backup failures, drinking a lot of coffee and planning world domination. Sadly, I actually had to do some work today instead.

I found all my VMware protection policies had failed the previous evening.

error: Unable to connect to VBA, error Cannot establish session to VBA.

We logged on to vSphere and attempted to browse the Backup Recovery area, where we found the following error:

error: An unexpected connection error occurred and the cause could not be determined, Please check your EBR configuration screen to troubleshoot, or contact an administrator.

We rebooted the appliance, and it seemed to fix the issue for one. We ended up delving into the nuts and bolts of the various resources required to create a VBA backup. Let’s review these components.

EBR Config GUI

You can access the configuration interface by browsing to:

https://VBA:8543/ebr-configure/

Log in with the appliance’s root credentials. The default password is 8RttoTriz.

Note: I don’t know about you, but try as I might, I could not access this via Chrome.
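Before fighting with browsers, it is worth confirming that something is listening on port 8543 at all. A minimal sketch using Python’s socket module; port_open() is a hypothetical helper, and you would pass your own appliance’s host name. A successful connection only shows that the port is open, not that the EBR services behind it are healthy.

```python
# Sketch: check whether a TCP port (8543, from the URL above) accepts connections.
import socket


def port_open(host: str, port: int = 8543, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Usage would be something like port_open("your-vba-host"); if it returns False, look at the network path or the appliance services before the browser.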

VMUSER

In the GUI you can view running services and restart them if required. You can also configure the connection to NetWorker. To do this, you will need the password for the vmuser account. As near as I can tell, this is an ID internal to NetWorker that is used to establish communication. The default password is “changeme”.

CHANGING THE VMUSER PASSWORD

We hadn’t changed this password. If you would like to change it, you can do so on the NetWorker server properties page under the Misc tab. We also were not sure what the password was. Yeah, I know.

Re-establish communication between the appliance and NetWorker

Go to the NetWorker Config tab. Here you can enter the password and save it to re-establish the interface with NetWorker, which also confirms the password. You will need to reboot the appliance afterwards.

This did resolve our issue and we can now browse the Backup Recovery interface in vSphere. With this re-established backups should run tonight. Fingers crossed.

21 April

Troubleshooting Cloning – Part 2 – Clone Wars

The clone wars continue. As you may recall from Part 1, my NetWorker optimized clones have been hanging with this misleading error:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

To further complicate things, control over the clone from the console can be limited. I had been using the jobkill utility; Preston has a great write-up on it. The issue is that after killing the clone, the NMC console still shows it as running. Attempts to restart the clone via NMC resulted in the following error:

1 1429638078 event task manager Task aborted: task ‘clone.name Clone’ is already running

So is jobkill really murdering all the required processes? Let’s take a look.

My new environment is Windows, so let’s visit our old friend Task Manager. Here we want to look for nsrtask and nsrclone processes. First, nsrclone. I’ve had to obfuscate the output, but what I found were the expected running clones; the specific job I needed to restart was not listed.

[Screenshot: nsrclone processes in Task Manager]

The same cannot be said for the clone job’s associated nsrtask process: there was indeed one still hanging around.

[Screenshot: the lingering nsrtask process in Task Manager]

After killing it, the state in NMC changed from Running to Interrupted. I could then restart the job.

[Screenshot: the clone state shown as Interrupted in NMC]

All this just to get my clone going again. This is some progress as I had previously been restarting NetWorker, interrupting the service and other running clones.
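The same check can be done from a command prompt with tasklist, which is handy when Task Manager is not convenient. A minimal sketch, assuming the processes run as nsrtask.exe and nsrclone.exe; clone_processes() and the sample output in the test are hypothetical. tasklist does not show which clone job a process belongs to, so confirm that before ending anything with taskkill /PID followed by the process ID.

```python
# Sketch: find NetWorker clone-related processes in `tasklist /FO CSV /NH` output.
# ASSUMPTION: the image names are nsrtask.exe and nsrclone.exe.
import csv
import io
import subprocess

CLONE_IMAGES = {"nsrtask.exe", "nsrclone.exe"}


def parse_tasklist_csv(text):
    """Return (image_name, pid) pairs from headerless tasklist CSV output."""
    rows = csv.reader(io.StringIO(text))
    return [(row[0], int(row[1])) for row in rows if len(row) >= 2]


def clone_processes(text):
    return [(name, pid) for name, pid in parse_tasklist_csv(text)
            if name.lower() in CLONE_IMAGES]


def list_clone_processes():
    # Windows only.
    out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                         capture_output=True, text=True, check=True).stdout
    return clone_processes(out)
```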

7 April

Troubleshooting Cloning – Part 1

I’ve had some issues with cloning recently. It’s an interesting problem that will require more analysis, but for now I have a workaround that has helped narrow it down. It all started with this error on a running clone:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

After confirming that the device was mounted, I started to look for other causes. As usual, I found a great post over at Preston’s blog, where he explains:

“A core component in NetWorker’s media database design is that a saveset can only ever have one instance on a piece of media. This applies as equally to failed as complete saveset instances. The net result is that this error/situation will occur because it’s meant to – NetWorker doesn’t permit more than one instance of a saveset to appear on the same piece of physical media.”

What I surmise happens is that a failure occurs during the clone operation, leaving an aborted saveset on the destination that then needs to be removed or excluded. I have not yet been able to formulate an mminfo command to find aborted clone savesets; see Part 2 when published. For now, I would be satisfied to exclude the savesets. The clone job in question provides DR protection for one of my NetWorker servers: it clones index, bootstrap and filesystem saves to an offsite DD. For reasons I cannot explain here, this clone was configured to go back 6 months. I know.

After the clone had run for a few minutes, I opened the clone properties and clicked the “Preview saveset” button. It identified a particular group of savesets from one specific date in the past, and that date alone. My assumption is that those savesets already exist on the destination and the clone job is not intelligent enough to identify and skip them. The “waiting for volume” error is really misleading, at least to me. After I greatly narrowed the number of days to look back and reran it, the clone job completed successfully.
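Preston’s rule and my assumption can be modelled in a few lines. This is a hypothetical sketch, not how NetWorker selects savesets: given the savesets a clone job would pick and the ssids already present on the destination volume, it separates those that can be cloned from those that would be a second instance on the same media. Saveset, clone_candidates() and the sample data in the test are made up.

```python
# Hypothetical model of the one-instance-per-volume rule quoted above.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Saveset:
    ssid: int
    saved: date


def clone_candidates(selected, already_on_destination):
    """Split selected savesets into (cloneable, blocked).

    A saveset is blocked when its ssid already has an instance on the
    destination volume, whether that instance completed or aborted.
    """
    cloneable, blocked = [], []
    for ss in selected:
        target = blocked if ss.ssid in already_on_destination else cloneable
        target.append(ss)
    return cloneable, blocked
```

In my case, the blocked list would have been that single date in the past; narrowing the look-back window simply kept those savesets out of the selection.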

Stay tuned for Part 2: Identifying aborted clone savesets. Do you know how? Comment below!

25 March

EMC NetWorker Technical Advisory – Failed recovery may result in data loss

EMC published an advisory today specific to the following NetWorker versions. Data loss may be experienced when performing command line recoveries on the client and when performing recoveries from the Recover Wizard on the NetWorker Management Console (NMC).

EMC Software: NetWorker Server: NetWorker 8.0 through 8.0.0.7
EMC Software: NetWorker Server: NetWorker 8.0 SP1 through 8.0.1.6
EMC Software: NetWorker Server: NetWorker 8.0 SP2 through 8.0.2.6
EMC Software: NetWorker Server: NetWorker 8.0 SP3 through 8.0.3.7
EMC Software: NetWorker Server: NetWorker 8.0 SP4 through 8.0.4.1
EMC Software: NetWorker Server: NetWorker 8.1 through 8.1.0.5
EMC Software: NetWorker Server: NetWorker 8.1 SP1 through 8.1.1.9
EMC Software: NetWorker Server: NetWorker 8.1 SP2 through 8.1.2.2
EMC Software: NetWorker Server: NetWorker 8.2 through 8.2.0.4

Files on the local file system may be deleted after a failed recovery to the original file location. The issue does not occur if the recovery is directed to a location other than the original one. The cause is as follows: if NetWorker is unable to read the header of the data source used for the recovery, it handles the error by removing the contents of the file on the target system. However, at this point in the recovery nothing has been written to the target, so the existing (original) data on the target system is lost.

EMC addressed this issue in the following NetWorker Client versions, and strongly recommends that impacted customers install the release when possible.

  • NetWorker Client 8.0.4.2 and later
  • NetWorker Client 8.1.2.3 and later
  • NetWorker Client 8.2.0.5 and later
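If you have a lot of hosts to check, the version ranges above reduce to a simple comparison. A minimal sketch; parse_version() and is_affected() are hypothetical helpers, the ranges are copied from the advisory list, and a version is treated as affected if it falls inside any listed range (so 8.0 SP1 is read as 8.0.1.x).

```python
# Affected ranges from the advisory, as inclusive (first, last) version tuples.
AFFECTED = [
    ((8, 0, 0, 0), (8, 0, 0, 7)),  # 8.0 through 8.0.0.7
    ((8, 0, 1, 0), (8, 0, 1, 6)),  # 8.0 SP1 through 8.0.1.6
    ((8, 0, 2, 0), (8, 0, 2, 6)),  # 8.0 SP2 through 8.0.2.6
    ((8, 0, 3, 0), (8, 0, 3, 7)),  # 8.0 SP3 through 8.0.3.7
    ((8, 0, 4, 0), (8, 0, 4, 1)),  # 8.0 SP4 through 8.0.4.1
    ((8, 1, 0, 0), (8, 1, 0, 5)),  # 8.1 through 8.1.0.5
    ((8, 1, 1, 0), (8, 1, 1, 9)),  # 8.1 SP1 through 8.1.1.9
    ((8, 1, 2, 0), (8, 1, 2, 2)),  # 8.1 SP2 through 8.1.2.2
    ((8, 2, 0, 0), (8, 2, 0, 4)),  # 8.2 through 8.2.0.4
]


def parse_version(text):
    """Turn "8.1" into (8, 1, 0, 0) so versions compare as tuples."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def is_affected(version_text):
    v = parse_version(version_text)
    return any(first <= v <= last for first, last in AFFECTED)
```

For example, is_affected("8.1.1.9") is True, while the fixed client version 8.1.2.3 returns False.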

19 March

NetWorker 8.2.1.1 released

NetWorker 8.2.1.1 Build 753 has been released.
It can be downloaded from ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/8.2/8.2.1.1/
This package contains the following cumulative fixes:
ID Details
226673 (NW162239) ESC  Escalation 22704:Browseable recover: expand_check() function experiences massive delays after upgrade to 8.1.x from 7.x on AIX clients
225294 ESC  Escalation 22923:savepnpc commands with level skip hangs backup
225029 BUG  MMDB latency to respond to VBA savesets query, causes Restore Tab malfunction to list available backups
223984 ESC  Escalation 22756:Unable to set extended attribute ‘security.selinux’ Operation not permitted
223513 ESC  Escalation 22121:jobquery core during DPA 6.1 data collection with NetWorker 8.1.0.5
223175 (NW162157) ESC  Escalation 22649:nsrd/nsrmmdbd dead-lock during relabel
222890 (NW162215) ESC  Escalation 22688:NW:Avamar: Avamar does not delete all savesets even though nsravamar.raw shows them deleted
222833 (NW162150) ESC  Escalation 22643:[BZ: 232867] 8dot3name setting is lost after BMR using BMR_8.1.0.199 or above
206724 (NW161663) ESC  Expired cleaning tape (0 uses left) is being used for cleaning
206664 (NW162021) ESC  [MIGRATED TO BZ]Everytime a backup starts for a UNIX/Linux client, NetWorker queries LDAP for root account
204302 (NW161964) ESC  Failed recover deletes existing original file or folder on file system
199061 (NW161544) ESC  NMC is not displaying the ‘enabled/disabled’ field correctly after upgrade to 8.1.1.3
198242 (NW161619) ESC  Inconsistency on used space reported on NMC, mminfo and Disk manager for AFTD device
190825 (NW154749) ESC  NetWorker server client parallelism silently changed to 12 after a restart when parallelism configured to <1
190583 ESC  snmd will not come up on slow systems owing to too short snmd poll timeout

19 March

EBR not sending summary report

This was a new one. My VMware admin dropped me a note wondering why he had not been receiving a backup summary report from NetWorker. I didn’t even know this was a configurable option! Sure enough, after poking around the Web Client interface, I found it.

Then we found this very informative message in the log.

Some quick research showed this is a known issue. It typically occurs around daylight saving time changes and is caused by a mismatch in time values between the summary report timer and the database. Some people reported success with rebooting the appliance. There was also some indication that editing the email option and saving it without changes would rectify the issue.
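To see why daylight saving time can produce this kind of mismatch, consider the same local wall-clock time converted to UTC with the offsets before and after the change. This is a generic illustration, not the appliance’s actual timer logic; it assumes US Eastern time with fixed offsets (UTC-5 standard, UTC-4 daylight) so it needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the example self-contained (no tz database needed).
EST = timezone(timedelta(hours=-5), "EST")  # standard time
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight saving time


def to_utc(wall_clock: datetime, offset: timezone) -> datetime:
    """Interpret a naive local time with the given offset and convert to UTC."""
    return wall_clock.replace(tzinfo=offset).astimezone(timezone.utc)


# A report scheduled for 08:00 local, the day after US DST began in 2015.
scheduled = datetime(2015, 3, 9, 8, 0)
computed_with_old_offset = to_utc(scheduled, EST)  # 13:00 UTC
computed_with_new_offset = to_utc(scheduled, EDT)  # 12:00 UTC
print(computed_with_old_offset - computed_with_new_offset)  # 1:00:00
```

If one component keeps a value computed with the old offset while the other uses the new one, the two disagree by an hour, which is the kind of mismatch the known issue describes.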