7
April

Troubleshooting Cloning – Part 1

I’ve had some issues with cloning recently. It’s been an interesting problem that will require more analysis, but for now I have a workaround that has helped narrow it down. It all started with this error on a running clone:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

After confirming that the device was mounted, I started searching for other causes. As usual, I found a great post over at Preston’s blog, where he explains:

“A core component in NetWorker’s media database design is that a saveset can only ever have one instance on a piece of media. This applies as equally to failed as complete saveset instances. The net result is that this error/situation will occur because it’s meant to – NetWorker doesn’t permit more than one instance of a saveset to appear on the same piece of physical media.”

So what I surmise happens is that a failure occurs during the clone operation, and the aborted saveset instance on the destination then needs to be removed or excluded. I have not yet been able to formulate an mminfo command to find aborted clone savesets; see Part 2 when it is published. For now I would be satisfied to exclude the savesets. This specific clone job provides DR protection for one of my NetWorker servers: it clones index, bootstrap and filesystem saves to an offsite DD. For reasons I cannot explain here, this clone was configured to look back 6 months. I know.

 

 

So after the clone had run for a few minutes, I opened the clone properties and hit the “Preview saveset” button. It had identified a particular group of savesets from one specific date in the past, and that date alone. My assumption is that those savesets already exist on the destination and the clone job is not intelligent enough to identify and skip them. The “waiting for volume” error is really misleading, at least to me. After I greatly narrowed the number of days to look back and reran it, the clone job completed successfully.
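A minimal Python sketch of the exclusion idea follows. It assumes you have already saved two plain-text mminfo reports with one SSID per line, for example from mminfo -q "volume=<volume>" -r ssid run once against the source volume and once against the destination volume (the volume names are placeholders). It only splits the candidate SSIDs into those safe to clone and those that already have an instance on the destination; it does not tell aborted instances apart from complete ones, and the SSIDs in the example are sample values.

```python
def parse_ssids(report: str) -> list[str]:
    """Return the SSIDs in a one-column mminfo '-r ssid' report."""
    ssids = []
    for line in report.splitlines():
        token = line.strip()
        if token.isdigit():  # skips any header line and blank lines
            ssids.append(token)
    return ssids


def split_candidates(source_report: str, dest_report: str) -> tuple[list[str], list[str]]:
    """Split source SSIDs into (safe to clone, already on destination)."""
    on_dest = set(parse_ssids(dest_report))
    to_clone: list[str] = []
    already_there: list[str] = []
    for ssid in parse_ssids(source_report):
        if ssid in on_dest:
            already_there.append(ssid)
        else:
            to_clone.append(ssid)
    return to_clone, already_there


if __name__ == "__main__":
    # Sample reports; in practice these would be saved mminfo output.
    source = "ssid\n4065016450\n3997907656\n3981130503\n"
    dest = "ssid\n3997907656\n"
    ok, skip = split_candidates(source, dest)
    print("clone:", ok)
    print("skip:", skip)
```

The "skip" list is what you would leave out of the clone selection, which is roughly what narrowing the look-back window achieved here.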

Stay tuned for Part 2: Identifying aborted clone savesets. Do you know how? Comment below!


7
April

This is embarrassing

That said, I’m never afraid to admit what I don’t know or when I’ve made a mistake. In hindsight, I can’t believe it took me this long to find this. In my defense, this has never been a requirement in many of the NetWorker environments I’ve managed, until recently.

My client has some remote sites, some with limited bandwidth, that we are attempting to back up over the wire to a DD at the home office. The issue is that the folks at those sites get a little cranky when the backup hijacks their bandwidth during core business hours. We had been manually killing the backup job upon arriving in the office. The other day I was looking at the client properties and realized there were some attributes whose function I had no idea about. One was hard limit, where you can set the runtime of the client backup in minutes. So yeah, I never realized there was a way to set a defined backup window for NetWorker clients. There, I said it! In my defense, other backup products I had worked with have defined backup window resources.
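Because hard limit takes a value in minutes, you need to work out how long the window is between when the backup starts and when business hours begin. The small Python helper below is a sketch of that arithmetic, including windows that cross midnight. The start and stop times in the example are hypothetical, and it is worth checking the exact semantics of the field in your NetWorker version and leaving some headroom below the computed value.

```python
from datetime import datetime, timedelta


def window_minutes(start: str, stop: str) -> int:
    """Minutes from start to stop, both given as HH:MM on a 24-hour clock.

    If stop is at or before start, the window is assumed to cross midnight.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(stop, fmt)
    if t1 <= t0:
        t1 += timedelta(days=1)  # window runs past midnight
    return int((t1 - t0).total_seconds() // 60)


if __name__ == "__main__":
    # Hypothetical: backup starts at 20:00, the remote office opens at 07:00.
    print(window_minutes("20:00", "07:00"))  # 660
```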

 


25
March

AvOpener

Do you have multiple Avamar grids? You probably do. You may even have several running different versions. Check out the AvOpener tool:

ftp://avamar_ftp:anonymous@ftp.avamar.com/software/scripts/AvOpener.jar

Drop this on your desktop and use it to open the MCS Console GUI. You can also create favorites and customize it for your environment. Not officially supported by EMC.


25
March

EMC NetWorker Technical Advisory – Failed recovery may result in data loss

EMC published an advisory today specific to the following NetWorker versions. Data loss may be experienced when performing command line recoveries on the client and when performing recoveries from the Recover Wizard on the NetWorker Management Console (NMC).

EMC Software: NetWorker Server: NetWorker 8.0 through 8.0.0.7
EMC Software: NetWorker Server: NetWorker 8.0 SP1 through 8.0.1.6
EMC Software: NetWorker Server: NetWorker 8.0 SP2 through 8.0.2.6
EMC Software: NetWorker Server: NetWorker 8.0 SP3 through 8.0.3.7
EMC Software: NetWorker Server: NetWorker 8.0 SP4 through 8.0.4.1
EMC Software: NetWorker Server: NetWorker 8.1 through 8.1.0.5
EMC Software: NetWorker Server: NetWorker 8.1 SP1 through 8.1.1.9
EMC Software: NetWorker Server: NetWorker 8.1 SP2 through 8.1.2.2
EMC Software: NetWorker Server: NetWorker 8.2 through 8.2.0.4

Files on the local file system may be deleted after a failed recovery to the original file location. This issue does not occur if the recovery is directed to a location other than the original data location. The cause: if NetWorker is unable to read the header of the data source used for the recovery, it handles the error by removing the contents of the file on the target system. However, at this point in the recovery process nothing has been written to the target, so the existing (original) data on the target system is simply removed.

EMC addressed this issue in the following NetWorker Client versions, and strongly recommends that impacted customers install the release when possible.

  • NetWorker Client 8.0.4.2 and later
  • NetWorker Client 8.1.2.3 and later
  • NetWorker Client 8.2.0.5 and later
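If you have a lot of clients to check, the comparison is easy to script. The Python sketch below encodes only the fixed client releases listed above: a client is flagged when its version is in the 8.0, 8.1 or 8.2 line and sorts below that line's fixed release. It expects a plain dotted version string such as 8.1.1.9; how you collect versions from your clients is left open, and it is no substitute for reading the advisory itself.

```python
# Fixed client releases from the advisory, keyed by (major, minor).
FIXED = {
    (8, 0): (8, 0, 4, 2),
    (8, 1): (8, 1, 2, 3),
    (8, 2): (8, 2, 0, 5),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '8.1.1.9' into (8, 1, 1, 9); missing parts are padded with 0."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def needs_update(version: str) -> bool:
    """True if the version is in an affected line and below its fix."""
    v = parse_version(version)
    fixed = FIXED.get(v[:2])
    return fixed is not None and v < fixed


if __name__ == "__main__":
    for v in ["8.1.1.9", "8.1.2.3", "8.2.0.4"]:
        print(v, "needs update" if needs_update(v) else "ok")
```

Comparing integer tuples rather than strings avoids the classic trap where "8.1.10" would sort before "8.1.9".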


19
March

NetWorker 8.2.1.1 released

NetWorker 8.2.1.1 Build 753 has been released.
It can be downloaded from ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/8.2/8.2.1.1/
This package contains the following cumulative fixes:
ID (NW ID) Type Details
226673 (NW162239) ESC  Escalation 22704:Browseable recover: expand_check() function experiences massive delays after upgrade to 8.1.x from 7.x on AIX clients
225294 ESC  Escalation 22923:savepnpc commands with level skip hangs backup
225029 BUG  MMDB latency to respond to VBA savesets query, causes Restore Tab malfunction to list available backups
223984 ESC  Escalation 22756:Unable to set extended attribute ‘security.selinux’ Operation not permitted
223513 ESC  Escalation 22121:jobquery core during DPA 6.1 data collection with NetWortker 8.1.0.5
223175 (NW162157) ESC  Escalation 22649:nsrd/nsrmmdbd dead-lock during relabel
222890 (NW162215) ESC  Escalation 22688:NW:Avamar: Avamar does not delete all savesets even though nsravamar.raw shows them deleted
222833 (NW162150) ESC  Escalation 22643:[BZ: 232867] 8dot3name setting is lost after BMR using BMR_8.1.0.199 or above
206724 (NW161663) ESC  Expired cleaning tape (0 uses left) is being used for cleaning
206664 (NW162021) ESC  [MIGRATED TO BZ]Everytime a backup starts for a UNIX/Linux client, NetWorker queries LDAP for root account
204302 (NW161964) ESC  Failed recover deletes existing original file or folder on file system
199061 (NW161544) ESC  NMC is not displaying the ‘enabled/disabled’ field correctly after upgrade to 8.1.1.3
198242 (NW161619) ESC  Incosintency on used space reported on NMC, mminfo and Disk manager for AFTD device
190825 (NW154749) ESC  NetWorker server client parallelism silently changed to 12 after a restart when parallelism configured to <1
190583 ESC  snmd will not come up on slow systems owing to too short snmd poll timeout


19
March

EBR not sending summary report

This was a new one. My VMware admin dropped me a note wondering why he had not been receiving a backup summary report from NetWorker. I didn’t even know this was a configurable option, but sure enough, after poking around the Web Client interface, I found it.


Then we found this very informative message in the log.


Some quick research found this is a known issue. It typically occurs around daylight saving time changes and is caused by a mismatch in time value between the summary report timer and the database. Some users reported success after rebooting the appliance, and there was some indication that editing the email option and saving it without changes would also rectify the issue.

 

 


18
March

New to NetWorker?


Congratulations! You have inherited some NetWorker infrastructure, either by your defined career path or by misfortune. You have a rudimentary knowledge of backup technologies, but you’re not sure where to start. This post is for you. First, head over to support.emc.com and sign up for an account. You may need to contact site support to associate your new user ID with your site ID. What is your site ID? Glad you asked.

Things you should have when opening a support call

  • Site ID
  • Host ID
  • NetWorker version
  • Platform

Hopefully somebody you work with has the site ID; if not, your local sales rep might. Failing that, contact support and have your ID associated. Once it is associated, you can open support requests online and select your site from a drop-down list.

The host id is a unique ID NetWorker assigns when the software is installed. It is not to be confused with the host id of your NetWorker server or data zone.

  1. Open the NetWorker server’s NetWorker Management Console (Console) interface.
  2. Select NetWorker Administration.
  3. In the Administration interface, click the Configuration button.
  4. Right-click Registrations in the navigation tree, then right-click the NetWorker evaluation license (or any NetWorker license) in the Registrations area of the screen. The Properties window appears.
  5. In the Configuration area of the Properties window, the Host ID is the last of the parameters displayed.
  6. Click OK or Cancel to leave the Properties window.

Your version is easy enough to find: in the NMC, select Help > About.

With your new EMC ID you should also be able to access https://community.emc.com/. The ECN is your portal for all things EMC. Great forums; sometimes you might actually find an answer there. If you post, be sure to give as much info about your infrastructure as possible: version, platform, etc. There are some great, smart NetWorker guys who hang out there, and leaving out this context is a massive pet peeve.

Going back to support.emc.com. There you can find the NetWorker support portal. It’s pretty sexy. Here you can see other recommended resources. Remember, NetWorker modules have separate support pages. You can subscribe to the page and it will be saved in your product list for the future. Also I love the service life by version section as well as the shortcuts to open support tickets or live chat with EMC support.


Next, check out the documentation section. Find the documentation portfolio for your NetWorker version. In there you will find the admin guide as well as a wealth of other information. One document that is worth a look if you’re new to NetWorker is “Theory of Core Operations”. It may not be bundled in the portfolio, but it is there if you search. The information in this guide is primarily intended to familiarize new EMC developers, test engineers, technical support engineers, product specialists, instructors, course developers, and information developers with NetWorker concepts.
This document addresses the EMC concept of “core engineering” function for the NetWorker product for Windows and UNIX operating systems.

Last but not least is http://nsrd.info/. This site is owned and administered by Preston de Guise. I’m not sure when he sleeps. He frequently posts in-depth content, and his blog is a fantastic library of all things NetWorker. Be sure to check out his micro manuals. He has a new one called Turbo Charged NetWorker, which he intends to update regularly. He also publishes yearly NetWorker administrator survey results, which he uses to capture trends in how NetWorker is used. Wow.

You may also want to pick up his book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. A great read, I’m sure. I bought it a while ago and promise to read it one day.

So there you go. Did I miss anything? Comment below!


5
March

Backups are collaborative

This is an open letter to Windows server admins.

I have had the luxury of managing large backup environments at a few different organizations over the years. It’s a great way for me to apply my skills in a very niche area, and it’s good for my colleagues too. Why for them? Because then they don’t have to manage backups as part of their daily operations. If I have learned anything since focusing in this area, it’s that nobody thinks about backups. Nobody wants to, whether it’s in the early stages of a large system build, where we really should be brought in to consult and ensure that our backup application can ingest the new load, or in daily ops. It is somehow expected that we will be there, ready to drink from the veritable data fire hose and provide near-zero-downtime recovery.

The point I want to make is that, ever since the first tape drive was connected to the first mainframe, demands have required ever tighter integration between the backup application and your data. Whether it is a backup agent protecting your database, the vStorage API protecting your VM, or VSS capturing a backup of your Windows server to ensure recoverability, they all require collaboration between backup administrators and systems administrators. Are you a Windows administrator? Are you familiar with the VSS process? If not, you should be. Like the vStorage API and RMAN, VSS is just a mechanism that allows a backup application to capture the data. I don’t own the VSS component, although I’m probably more familiar with it, and better at resolving VSS issues, than any Windows administrator.

If there is one thing I can assure you of, it is this: VSS issues are not unique to any one backup product. Google “VSS backup issue” and you will find a multitude of forums for an array of products, filled with angry, annoyed backup administrators fuming over VSS. I generally can, and do, fix most VSS issues without having to bother an admin. Sometimes I can’t, and the only thing I can offer is to reboot the system. Is there a better solution? Yes: call Microsoft! Open a ticket, run some VSS traces. In short, work with me. Don’t treat me as a nuisance who is once again requesting a reboot.

Collaborate with me, because one day you will need me far more than I need you.


3
March

Purging Data with NetWorker and Data Domain

Purging data is sometimes required. There could be a legal requirement or, more likely, you are out of space on disk storage. Here is what I recently had to do.

Some data was identified to be purged: specifically, three clients.

The following tasks need to be completed:

  • Identify the save sets
  • Build the batch file
  • Run nsrim -X
  • Run a clean on the DD

Identifying the save sets

There is a ton of info out there on mminfo. This is the command we will use to identify which save sets to remove. The command has two parts: a query portion (-q), where we feed in the required variables, and a report portion (-r), where we narrow the output to the specific data we need. My command resembled the following.

mminfo -avot -q "client=clientname,volume=volumename.001" -r client,volume,ssid,cloneid
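Since three clients were in scope here, it may be easier to generate the same mminfo call per client than to edit it by hand each time. This Python sketch builds the argument list exactly as in the command above and can run it with subprocess; the client and volume names are placeholders, and it assumes mminfo is on the PATH of the machine running the script. Passing a list rather than a single command string avoids having to get the shell quoting right.

```python
import subprocess


def mminfo_args(client: str, volume: str) -> list[str]:
    """Build the mminfo report command shown above for one client/volume."""
    query = f"client={client},volume={volume}"
    return ["mminfo", "-avot", "-q", query, "-r", "client,volume,ssid,cloneid"]


def run_report(client: str, volume: str) -> str:
    """Run mminfo and return its standard output as text.

    Assumes mminfo is on the PATH. check=False because the exit status for
    a query that matches nothing is not assumed here; inspect stderr if the
    output comes back empty.
    """
    result = subprocess.run(
        mminfo_args(client, volume), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder names; substitute your own clients and volume.
    for client in ["clientA", "clientB", "clientC"]:
        print(" ".join(mminfo_args(client, "volumename.001")))
```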

I ran the command and redirected the output to a text file. The output resembled the following.

client  volume      ssid        cloneid
XXXXX   volume.001  4065016450  1397439105
XXXXX   volume.001  3997907656  1397439175
XXXXX   volume.001  3981130503  1397439239
XXXXX   volume.001  3947576123  1397439291
XXXXX   volume.001  3930798958  1397439341
XXXXX   volume.001  3914021948  1397439548

I’m really only concerned with the last two columns, the SSID and clone ID, as these are what the nsrmm command needs to delete the data from the NetWorker databases.
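The same extraction of the last two columns can be done without a spreadsheet. The Python function below is a sketch that assumes the report layout shown above (client, volume, ssid, cloneid, separated by whitespace): it takes the last two fields of each row and skips the header and blank lines. Splitting on any run of whitespace also sidesteps the uneven spacing you can get in the raw output.

```python
def parse_report(text: str) -> list[tuple[str, str]]:
    """Extract (ssid, cloneid) pairs from 'client volume ssid cloneid' rows.

    The last two whitespace-separated fields of each row are taken as the
    SSID and clone ID; rows where either is not numeric (such as the
    header) are skipped, as are blank or short lines.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        ssid, cloneid = fields[-2], fields[-1]
        if ssid.isdigit() and cloneid.isdigit():
            pairs.append((ssid, cloneid))
    return pairs


if __name__ == "__main__":
    sample = (
        "client volume ssid cloneid\n"
        "\n"
        "XXXXX volume.001 4065016450 1397439105\n"
        "XXXXX  volume.001 3997907656 1397439175\n"
    )
    for ssid, cloneid in parse_report(sample):
        print(ssid, cloneid)
```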

The command we will use will look like this.

nsrmm -dy -S SSID/CloneID

Now that we have this, we can build the batch file, as this is a Windows system.

Building the batch

Just a lot of Excel-fu here. I saved the output from the mminfo command to a text file, then imported it into Excel using Data > From Text. I selected Delimited and then selected Space as the delimiter. This inserted my data into the columns nicely.


Next, I’ll delete everything in columns A and B. In column A I will enter our nsrmm command as above. Then I’ll select the cell and drag it down to autofill the cells below.


Next, I formatted all the cells as Number and then entered the following formula into cell E2.

=CONCATENATE(A3," ",D3,"/",E3)

This will merge our command in column A, add a space, then our SSID, and insert a / to separate it from the clone ID, which is tacked on at the end, all output to one cell. Select the cell and drag down to auto-populate. You will then have a column filled with individual commands. That entire column can be pasted into a batch file.
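If you would rather skip the spreadsheet entirely, a few lines of Python can produce the same commands. This sketch takes (ssid, cloneid) pairs, for example parsed from the mminfo report, and emits one nsrmm -dy -S ssid/cloneid line per save set instance; write_batch saves them with Windows line endings, and the file name in the example is hypothetical. Review the generated file before running it, since every line deletes data.

```python
from pathlib import Path


def batch_lines(pairs: list[tuple[str, str]]) -> list[str]:
    """One nsrmm delete command per (ssid, cloneid) pair."""
    return [f"nsrmm -dy -S {ssid}/{cloneid}" for ssid, cloneid in pairs]


def write_batch(pairs: list[tuple[str, str]], path: str) -> int:
    """Write the commands to a .bat file with CRLF line endings.

    Returns the number of commands written.
    """
    lines = batch_lines(pairs)
    body = "".join(line + "\r\n" for line in lines)
    Path(path).write_bytes(body.encode("ascii"))
    return len(lines)


if __name__ == "__main__":
    sample = [("4065016450", "1397439105"), ("3997907656", "1397439175")]
    print("\n".join(batch_lines(sample)))
    # write_batch(sample, "purge_savesets.bat")  # hypothetical file name
```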


When complete, run the batch file. I had over 2,000 rows, so this took a while.

Run nsrim -X

nsrim -X runs a consistency check of the media database and wraps up the purging of this data from NetWorker.

Run a clean on the DD

Start the DD clean. It can take some time, so it is best to run it when things are quiet. Here we can see we did not win a lot in the way of cleanable data.

 


Why is that? Reducing data retention has a limited effect. When data is expired, the pointers can be removed, but the underlying data is still retained wherever other backups reference it. Reducing retention is not always a positive thing, as it can lead to a smaller pool of data to deduplicate against. Here, however, we removed clients in their entirety. I can only assume that the data on these clients was already highly deduplicated and there was actually precious little unique data to reclaim. So our results are what they are. I hope yours are better. Let me know. Comment below.
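To see why deleting whole clients can free so little, it helps to think of each client's backups as references to deduplicated segments. The Python sketch below is a deliberately simplified toy model, not how Data Domain cleaning is actually implemented: a segment only becomes reclaimable when no remaining client references it. The client and segment names are made up.

```python
def cleanable_segments(backups: dict[str, set[str]], deleted: set[str]) -> set[str]:
    """Segments referenced only by the deleted clients (toy dedup model)."""
    kept: set[str] = set()
    removed: set[str] = set()
    for client, segments in backups.items():
        # Collect segment references from deleted vs. remaining clients.
        (removed if client in deleted else kept).update(segments)
    return removed - kept


if __name__ == "__main__":
    backups = {
        "client1": {"s1", "s2", "s3"},
        "client2": {"s2", "s3", "s4"},
        "client3": {"s3", "s4", "s5"},
    }
    print(sorted(cleanable_segments(backups, {"client1"})))  # ['s1']
```

In this model, deleting client1 drops three segment references but frees only one segment, because the other two are still referenced by client2 and client3.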

 

 


25
February

Protecting the vCenter server database

 

Had a few VBA backup failures this morning. I had isolated the failures to my client’s Vblock and was just about to start digging into the problem when the VMware admin contacted me and advised that he had found the issue. Apparently we had configured a VBA backup of the vCenter Server. I’m surprised we had not seen this issue before. While a quiesced snapshot of the database virtual machine is being taken or deleted, the vCenter Server can lose connectivity to the database. What is actually happening beneath the hood is interesting and pretty


“The vCenter database layer (Vdb) replays the failed SQL statement requests to continue the vCenter operation. During the replay process, if it turns out that the previously failed SQL statement has been committed to the database, and if there is a unique constraint definition on the specific table, the ODBC driver reports the unique constraint violated error to the VMware VirtualCenter Server service and the service shuts down to prevent corruption of the vCenter Server database.”

VMware reconfirms what I have always considered a best practice for protecting any application:

Currently, VMware does not support quiesced snapshots of virtual machines running the vCenter Server database. 

To work around this issue, use one of these options:
  • If quiesced snapshots are created by backup software to back up the virtual machine data, either use:
    • A backup solution that provides application-level quiescing.
    • Backup Agents in the guest operating system. Note: Any backup agent that quiesces the file system causes the issue described in this article.

  • If snapshots are created manually during virtual machine maintenance (for example, guest os patching, configuration changes), deselect the Quiesce guest file system option while taking the virtual machine snapshot.

See: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2003674

 

