3
July

Troubleshooting NetWorker Disaster Recovery Backup Failures

Occasionally, I have found that all backup savesets complete with the exception of the disaster recovery portion.

One thing to check is to ensure all volumes are online. This is required for the disaster recovery backup to complete.

Open a command line on the client and use the diskpart utility.

C:\>diskpart

Always rescan first:

DISKPART>rescan

When the rescan completes, list the volumes to see if any are offline:

DISKPART>list vol

Identify the offline volume, select it, and bring it online:

DISKPART>select volume 1

DISKPART>online volume
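
If you run this check often, spotting the offline volume can be scripted. Below is a minimal Python sketch, assuming that the row for an offline volume in the list vol output contains the word Offline. The sample text and its column layout are hypothetical, not captured from a real system.

```python
import re

def offline_volumes(list_vol_output):
    """Return the volume numbers whose row contains the word 'Offline'."""
    offline = []
    for line in list_vol_output.splitlines():
        # Data rows start with "Volume <number>"; the header row has "###" instead.
        match = re.match(r"\s*\*?\s*Volume\s+(\d+)\b", line)
        if match and re.search(r"\bOffline\b", line, re.IGNORECASE):
            offline.append(int(match.group(1)))
    return offline

# Hypothetical sample in the general shape of `list vol` output.
SAMPLE = """\
  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     C   OS           NTFS   Partition    100 GB  Healthy    Boot
  Volume 1     E   Data         NTFS   Partition    500 GB  Offline
"""

print(offline_volumes(SAMPLE))
```

A volume whose label happens to contain the word Offline would also be flagged, so treat the result as a hint and confirm it in diskpart before bringing anything online.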

 

 


26
June

Unexpected Connection error with NetWorker and VBA

I was really looking forward to another idyllic day as a Backup Administrator.

Those days usually begin with troubleshooting a few backup failures, drinking a lot of coffee, and planning world domination. Sadly, I actually had to do some work today instead.

I found all my VMware protection policies had failed the previous evening.

error: Unable to connect to VBA, error Cannot establish session to VBA.

I logged on to vSphere and attempted to browse the backup recovery area, where we found the following error:

error: An unexpected connection error occurred and the cause could not be determined, Please check your EBR configuration screen to troubleshoot, or contact an administrator.

We rebooted the appliance and it seemed to fix the issue for one. We ended up delving into the nuts and bolts of the various resources required to create a VBA backup. Let’s review these components.

EBR Config GUI

You can access the configuration interface by browsing to:

https://VBA:8543/ebr-configure/

Log in with the root credentials of the appliance. The default password is 8RttoTriz.

Note: I don’t know about you, but try as I might, I could not access this via Chrome.
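
When the page will not load, it helps to confirm that the port answers at all before blaming the browser. Here is a minimal Python sketch that only tests whether a TCP connection to port 8543 can be opened. The function name is my own, and "VBA" in the usage comment is the same placeholder hostname used in the URL above.

```python
import socket

def port_reachable(host, port=8543, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Usage (replace "VBA" with the appliance hostname or IP address):
#   port_reachable("VBA")
```

A True result only means something is listening on the port. It says nothing about whether the EBR services behind it are healthy.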

VMUSER

In the GUI you can view the running services and restart them if required. You can also configure the connection to NetWorker. To do this you will need to know the password for the vmuser account. As near as I can tell, this is an ID internal to NetWorker that is used to establish communication. The default password is “changeme”.

CHANGING THE VMUSER PASSWORD

We hadn’t changed this password. If you would like to change it, this can be done on the NetWorker server properties page under the misc tab. We also were not sure what the password was. Yeah, I know.

Re-establish communication between the appliance and NetWorker

Go to the NetWorker Config tab. Here you can enter the password and save, which re-establishes the interface with NetWorker (and confirms the password). You will need to reboot the appliance after this.

[Screenshot: 6-24-2015 2-30-16 PM]

 

 

This did resolve our issue, and we can now browse the Backup Recovery interface in vSphere. With this re-established, backups should run tonight. Fingers crossed.

 


20
May

NetWorker 8.2.1.3 Build 774 has been released.

It can be downloaded from ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/8.2/8.2.1.3/
This package contains the following cumulative fixes:

ID Details
229347 (NW161917) ESC  NetWorker Escalation 22509: NSM DB2 PIT restore doesn't restore connecting directories' ACL on AIX
227398 (NW161624) ESC  NetWorker Escalation 22307: Device discovery raises alerts if udev-named library handle already configured: 14249:dvdetect: 'skipped as requested'
204435 ESC  NetWorker Escalation 22567: Error counts not correctly handled with nsrsnmd & cdi changes
192149 (NW159916) ESC  NetWorker Escalation 21778: Disable "label" operation in AMM functionality: DataDomain devices unmount with "RPC severe Lost connection to media database"


21
April

Troubleshooting Cloning – Part 2 – Clone Wars

The clone wars continue. As you may recall from Part 1, my NetWorker optimized clones have been hanging with this misleading error:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

To further complicate things, control over the clone from the console can be limited. I had been using the jobkill utility. Preston has a great write-up on it here. The issue is that after killing the clone, the NMC console still shows it as running. Attempts to restart the clone via NMC resulted in the following error:

1 1429638078 event task manager Task aborted: task ‘clone.name Clone’ is already running

So is jobkill really murdering all the required processes? Let’s take a look.

My new environment is Windows, so let’s visit our old friend Task Manager. Here we want to look for nsrtask and nsrclone processes. First, nsrclone. I’ve had to obfuscate the output, but what I found were the appropriate running clones. The specific job I needed to restart was not listed.

 

[Screenshot: nsrclone]

The same cannot be said for the clone job’s associated nsrtask process. There was indeed a process still hanging around.

[Screenshot: nsrtask]

After killing it, the state in NMC changed from Running to Interrupted. I could then restart the job.
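
The same hunt can be done from a command prompt with tasklist /FO CSV /NH instead of scrolling through Task Manager. Here is a minimal Python sketch that pulls the PIDs for a given image name out of that CSV output. The image names (nsrtask.exe, nsrclone.exe) and every value in the sample are assumptions for illustration.

```python
import csv
import io

def pids_for_image(tasklist_csv, image_name):
    """Return PIDs from `tasklist /FO CSV /NH` output whose image name matches."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Columns: image name, PID, session name, session number, memory usage.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids

# Hypothetical sample; PIDs and memory figures are made up.
SAMPLE = '''"nsrclone.exe","4312","Services","0","25,140 K"
"nsrtask.exe","5120","Services","0","12,884 K"
"nsrtask.exe","6044","Services","0","13,020 K"
'''

print(pids_for_image(SAMPLE, "nsrtask.exe"))
```

This only lists candidates. Working out which nsrtask belongs to the hung clone is still a manual step, and killing the wrong one will interrupt a healthy job.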

[Screenshot: interrupted]

 

 

All this just to get my clone going again. This is progress, as I had previously been restarting NetWorker, which interrupted the service and other running clones.


7
April

Troubleshooting Cloning – Part 1

I’ve had some issues with cloning recently. It’s been an interesting issue that will require some more analysis, but for now I have a workaround that has helped narrow the issue down. It all started with this error on a running clone:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

After checking that the device was mounted, I started to search for other issues. As usual, I found a great post over at Preston’s blog. There he explains:

“A core component in NetWorker’s media database design is that a saveset can only ever have one instance on a piece of media. This applies as equally to failed as complete saveset instances. The net result is that this error/situation will occur because it’s meant to – NetWorker doesn’t permit more than one instance of a saveset to appear on the same piece of physical media.”

So what I surmise happens is that there is a failure during the clone operation. The aborted saveset on the destination needs to be removed or excluded. I have not yet been able to formulate an mminfo command to find aborted clone savesets. See Part 2 when published. For now I would be satisfied to exclude the savesets. The clone job in question provides DR protection for one of my NetWorker servers. That is, it clones index, bootstrap and filesystem saves to an offsite DD. For reasons I cannot explain here at this time, this clone was configured to go back 6 months. I know.

 

 

So after the clone had run for a few minutes, I opened the clone properties and hit the “Preview saveset” button. It identified a particular group of savesets from one specific date in the past, and that date alone. My assumption is that those savesets already exist on the destination and the clone job is not intelligent enough to identify and skip them. The error “waiting for volume” is really misleading, at least to me. After I greatly narrowed the number of days to look back and reran it, the clone job completed successfully.
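
To picture the one-instance-per-volume rule, here is a minimal Python sketch that splits clone candidates into those that can be written and those that would collide with an instance already on the destination. It is only a model of the rule as I understand it, not how NetWorker works internally. The function name and the saveset IDs are hypothetical, and it assumes you already have both lists from somewhere (mminfo output, for example).

```python
def split_candidates(candidate_ssids, ssids_on_destination):
    """Separate savesets that can be cloned from those already on the destination."""
    already_there = set(ssids_on_destination)
    to_clone = [s for s in candidate_ssids if s not in already_there]
    blocked = [s for s in candidate_ssids if s in already_there]
    return to_clone, blocked

# Hypothetical saveset IDs for illustration only.
candidates = ["1001", "1002", "1003", "1004"]
on_destination = ["1002", "1004", "2000"]  # includes aborted instances

to_clone, blocked = split_candidates(candidates, on_destination)
print("can clone:", to_clone)
print("would collide:", blocked)
```

If my assumption is right, anything in the second list is what leaves the clone waiting for a writable volume.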

Stay tuned for Part 2 - Identifying aborted clone savesets. Do you know how? Comment below!


7
April

This is embarrassing

That said, I’m never afraid to admit what I don’t know or when I’ve made a mistake. In hindsight, I can’t believe it took me this long to find this. In my defense, this has not been a requirement in many of the NetWorker environments I’ve managed, until recently.

My client has some remote sites, some with limited bandwidth, that we are attempting to back up over the wire to a DD at the home office. The issue is that the clients at the site get a little cranky when the backup hijacks their bandwidth during core business hours. We had been manually killing the backup job upon arriving in the office. The other day I was looking at the client properties and realized there were some variables whose function I did not know. One was hard limit. There you can set the runtime of the client backup in minutes. So yeah, I never realized there was a way to set a defined backup window for NetWorker clients. There, I said it! In my defense, other backup products I had worked with have defined backup window resources.
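
Since the value is entered in minutes, the only real work is the arithmetic from the backup start time to the start of business hours. Here is a minimal Python sketch of that calculation. The function name and the example times are my own, and it assumes the backup should stop exactly when the office opens.

```python
from datetime import datetime, timedelta

def hard_limit_minutes(backup_start, business_open):
    """Minutes from the backup start until the next business opening (both HH:MM)."""
    start = datetime.strptime(backup_start, "%H:%M")
    end = datetime.strptime(business_open, "%H:%M")
    if end <= start:
        # The office opens the following morning.
        end += timedelta(days=1)
    return int((end - start).total_seconds() // 60)

# Example: a backup starting at 22:00 with the office opening at 07:30.
print(hard_limit_minutes("22:00", "07:30"))
```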

 


25
March

AvOpener

Do you have multiple Avamar grids? You probably do. You may have even more with multiple versions. Check out the AvOpener tool:

ftp://avamar_ftp:anonymous@ftp.avamar.com/software/scripts/AvOpener.jar

Drop this on your desktop and use it to open the MCS Console GUI. You can also create favorites and customize it for your environment. Note that it is not officially supported by EMC.


25
March

EMC NetWorker Technical Advisory – Failed recovery may result in data loss

EMC published an advisory today specific to the following NetWorker versions. Data loss may be experienced when performing command line recoveries on the client and when performing recoveries from the Recover Wizard on the NetWorker Management Console (NMC).

EMC Software: NetWorker Server: NetWorker 8.0 through 8.0.0.7
EMC Software: NetWorker Server: NetWorker 8.0 SP1 through 8.0.1.6
EMC Software: NetWorker Server: NetWorker 8.0 SP2 through 8.0.2.6
EMC Software: NetWorker Server: NetWorker 8.0 SP3 through 8.0.3.7
EMC Software: NetWorker Server: NetWorker 8.0 SP4 through 8.0.4.1
EMC Software: NetWorker Server: NetWorker 8.1 through 8.1.0.5
EMC Software: NetWorker Server: NetWorker 8.1 SP1 through 8.1.1.9
EMC Software: NetWorker Server: NetWorker 8.1 SP2 through 8.1.2.2
EMC Software: NetWorker Server: NetWorker 8.2 through 8.2.0.4

Files on the local file system may be deleted after a failed recovery to the original file location. This issue does not occur if the recovery is directed to a location other than the original data location. The cause is as follows: if NetWorker is unable to read the header of the data source used for the recovery, it processes the error and removes the contents of the file on the target system. However, at this point in the recover process nothing has been written to the target, so the existing (original) data on the target system is removed.

EMC addressed this issue in the following NetWorker Client versions, and strongly recommends that impacted customers install the release when possible.

  • NetWorker Client 8.0.4.2 and later
  • NetWorker Client 8.1.2.3 and later
  • NetWorker Client 8.2.0.5 and later
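
With a mixed estate, checking each version by eye against the list gets tedious. Here is a minimal Python sketch that encodes the affected ranges from the advisory and tests a version string against them. It assumes that "SP n" corresponds to the third component of the version number (8.0 SP1 is 8.0.1.x), which matches the upper bounds listed. The function names are my own.

```python
# Affected ranges as listed in the advisory: (lowest, highest), inclusive.
AFFECTED_RANGES = [
    ((8, 0, 0, 0), (8, 0, 0, 7)),
    ((8, 0, 1, 0), (8, 0, 1, 6)),
    ((8, 0, 2, 0), (8, 0, 2, 6)),
    ((8, 0, 3, 0), (8, 0, 3, 7)),
    ((8, 0, 4, 0), (8, 0, 4, 1)),
    ((8, 1, 0, 0), (8, 1, 0, 5)),
    ((8, 1, 1, 0), (8, 1, 1, 9)),
    ((8, 1, 2, 0), (8, 1, 2, 2)),
    ((8, 2, 0, 0), (8, 2, 0, 4)),
]

def parse_version(text):
    """Turn '8.1.2.3' into (8, 1, 2, 3), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts[:4])

def is_affected(version_text):
    """Return True if the version falls inside any listed range."""
    version = parse_version(version_text)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)

for v in ["8.1.2.2", "8.1.2.3", "8.2.0.4", "8.2.0.5"]:
    print(v, "affected" if is_affected(v) else "not in the listed ranges")
```

Versions outside the listed ranges are simply reported as not listed. The advisory itself remains the authority on what is and is not affected.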


19
March

NetWorker 8.2.1.1 released

NetWorker 8.2.1.1 Build 753 has been released.
It can be downloaded from ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/8.2/8.2.1.1/
This package contains the following cumulative fixes:
ID Details
226673 (NW162239) ESC  Escalation 22704:Browseable recover: expand_check() function experiences massive delays after upgrade to 8.1.x from 7.x on AIX clients
225294 ESC  Escalation 22923:savepnpc commands with level skip hangs backup
225029 BUG  MMDB latency to respond to VBA savesets query, causes Restore Tab malfunction to list available backups
223984 ESC  Escalation 22756:Unable to set extended attribute ‘security.selinux’ Operation not permitted
223513 ESC  Escalation 22121:jobquery core during DPA 6.1 data collection with NetWorker 8.1.0.5
223175 (NW162157) ESC  Escalation 22649:nsrd/nsrmmdbd dead-lock during relabel
222890 (NW162215) ESC  Escalation 22688:NW:Avamar: Avamar does not delete all savesets even though nsravamar.raw shows them deleted
222833 (NW162150) ESC  Escalation 22643:[BZ: 232867] 8dot3name setting is lost after BMR using BMR_8.1.0.199 or above
206724 (NW161663) ESC  Expired cleaning tape (0 uses left) is being used for cleaning
206664 (NW162021) ESC  [MIGRATED TO BZ]Everytime a backup starts for a UNIX/Linux client, NetWorker queries LDAP for root account
204302 (NW161964) ESC  Failed recover deletes existing original file or folder on file system
199061 (NW161544) ESC  NMC is not displaying the ‘enabled/disabled’ field correctly after upgrade to 8.1.1.3
198242 (NW161619) ESC  Inconsistency on used space reported on NMC, mminfo and Disk manager for AFTD device
190825 (NW154749) ESC  NetWorker server client parallelism silently changed to 12 after a restart when parallelism configured to <1
190583 ESC  snmd will not come up on slow systems owing to too short snmd poll timeout


19
March

EBR not sending summary report

This was a new one. My VMware admin dropped me a note wondering why he had not been receiving a backup summary report from NetWorker. I didn’t even know this was a configurable option. Sure enough, after poking around the Web client interface, I found it.

[Screenshot: 3-19-2015 4-11-53 PM]

 

 

Then we found this very informative message in the log.

[Screenshot: 3-19-2015 4-15-19 PM]

 

Some quick research found that this is a known issue. It typically occurs around daylight saving time and is caused by a mismatch in time value between the summary report timer and the database. Some success has been reported with rebooting the appliance. There was also some indication that editing the email option and saving without changes would rectify the issue.

 

 
