21
April

Troubleshooting Cloning – Part 2 – Clone Wars

The clone wars continue. As you may recall from the Part 1, my NetWorker optimized clones have been hanging with this misleading error:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

To further complicate things, control over the clone from the console can be limited. I had been using the jobkill utility. Preston has a great write up on it here. The issue is that after killing the clone, the NMC console shows it as running still? Attempts to restart the clone via NMC resulted in the following error:

1 1429638078 event task manager Task aborted: task ‘clone.name Clone’ is already running

So is the job kill, really murdering all the required processes? Lets take a look.

My new env is Windows. So lets visit our old friend the task manager, here we want to look for nsrtask and nsrclone processes. First nsrclone. I’ve had to obfuscate the output, but what I found were the appropriate running clones. The specific job I needed to restart was not listed.

 

nsrclone

The same cannot be said for the clone jobs associated nsrtask process. There was indeed a process still hanging around.

nsrtask

After killing it, the state in NMC changed from Running to Interrupted. I could then restart the job.

interuppted

 

 

All this just to get my clone going again. This is some progress as I had previously been restarting NetWorker, interrupting the service and other running clones.

no comments

7
April

Troubleshooting Cloning – Part 1

I’ve had some issues with cloning recently. It’s been an interesting issue that will require some more analysis, but for now I have a work around that has helped narrow the issue down.  It all started with this error on a running clone:

Waiting for 1 Writable volume on backup pool ‘Device’ disk(s) on nsr_server

After checking the device is mounted I started to search for other issues. As usual I found a great post over at Preston’s blog. There he explains:

“A core component in NetWorker’s media database design is that a saveset can only ever have one instance on a piece of media. This applies as equally to failed as complete saveset instances.The net result is that this error/situation will occur because it’s meant to – NetWorker doesn’t permit more than one instance of a saveset to appear on the same piece of physical media.”

So what I surmise happens is there is a failure during the clone operation. The aborted saveset on the destination needs to be removed or excluded. I have not been able to formulate an mminfo command to find aborted clone savesets yet? See part 2 when published. For now I would be satisfied to exclude the savesets. This specific clone job in question provides DR protection for one of my NetWorker servers. That is, it clones index, bootstrap and filesystem saves to an offsite DD. For reasons I cannot explain here at this time this clone was configured to go back 6 months. I know.

 

 

So after the clone had run for a few minutes I opened the clone properties and hit the “Preview saveset” button. What I found was there was a particular group of savesets from a specific date in the past identified and that date alone. My assumption is those savesets already exist on the destination and the clone job is not intelligent enough to identify and skip. The error “waiting for volume” is really misleading, at least to me. After greatly narrowing to number of days to look back and rerunning the clone job completed successfully.

Stay tuned for Part 2 -Identifying aborted clone savesets. Do you know how? Comment below!

no comments