25
February

Protecting the vCenter server database

 

Had a few VBA backup failures this morning. I had isolated the failures to my client’s VBLOCK and was just about to start digging into the problem when the VMware admin contacted me and advised he had found the issue. Apparently we had configured a VBA backup of the vCenter server. I’m surprised we had not seen this issue before. While taking a quiesced snapshot of the database virtual machine, or while deleting that snapshot, the vCenter server can lose connectivity to the database. What is actually happening beneath the hood is interesting:


“The vCenter database layer (Vdb) replays the failed SQL statement requests to continue the vCenter operation. During the replay process, if it turns out that the previously failed SQL statement has been committed to the database, and if there is a unique constraint definition on the specific table, the ODBC driver reports the unique constraint violated error to the VMware VirtualCenter Server service and the service shuts down to prevent corruption of the vCenter Server database.”

VMware reconfirms what I have always considered a best practice for protecting any application:

Currently, VMware does not support quiesced snapshots of virtual machines running the vCenter Server database. 

To work around this issue, use one of these options:
  • If quiesced snapshots are created by backup software to back up the virtual machine data, either use:
    • A backup solution that provides application-level quiescing.
    • Backup Agents in the guest operating system. Note: Any backup agent that quiesces the file system causes the issue described in this article.

  • If snapshots are created manually during virtual machine maintenance (for example, guest OS patching, configuration changes), deselect the Quiesce guest file system option while taking the virtual machine snapshot.

See: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2003674

 


20
February

What Does Total Source Capacity Mean?

Recently, when reviewing a report on a client’s monthly backup statistics, an issue was highlighted with NetWorker’s native license conformance output.

[Screenshot: license conformance output]

 

 

As we can see, the output is not super informative. I have to admit, I was completely ignorant of exactly what Total Source Capacity meant. Now it seems self-explanatory, but it took my SE a few attempts at explaining before it sank through the first few layers of my skull. I found this description on an EMC website that sums it up far better than I ever could.

“Data is measured as the largest aggregate full backup or synthetic full backup which is the combination of full backups plus incremental backups that are performed for all protected data by the NetWorker software over a two-month period (60 days). This is irrespective of where the data is backed up, for example, from tape, disk, VTL, Avamar ® Data Store, or Data Domain. The quantity of pre-deduplicated data is included in the calculation.”

So the measurement of total source capacity is not really directly tied to data moved or data stored, but rather the aggregate of the largest full backup of all your protected clients over a 60-day period. My next challenge was how to determine this number. This particular client does use DPA, and I was getting ready to attempt to create a report that might pull out this information. Some Google-fu found a recent question posted in the EMC community forum from a user looking for the exact same thing. After a bump, some dude named Gareth came through with a deeply buried report template in DPA that would address this.

https://community.emc.com/message/866670?et=watches.email.thread#866670

 

The report is called “Estimated Protected Capacity”. You can find this report under Status / Backup in the report menu. Just set your time period to 60 days and voilà!
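
To make the definition concrete, here is a rough sketch of the arithmetic as I read it: take the largest full backup per client inside the 60-day window, then add those up. The input format (one "client size_gb" line per full backup) and the sample clients and sizes are made up for illustration. This is not how NetWorker or DPA actually produces the figure.

```shell
#!/bin/sh
# Hypothetical input on stdin: one line per full backup taken in the
# last 60 days, formatted as "client size_gb". All values are invented.
total_source_capacity() {
  awk '
    NF >= 2 {
      size = $2 + 0
      # Remember only the largest full backup seen for each client.
      if (!($1 in max) || size > max[$1]) max[$1] = size
    }
    END {
      # Sum the per-client maxima to get the aggregate figure.
      total = 0
      for (c in max) total += max[c]
      print total
    }
  '
}

printf '%s\n' \
  'exch01 500' \
  'exch01 520' \
  'filer101 2000' \
  'filer101 1900' \
  'sql01 300' | total_source_capacity
```

With the sample data this prints 2820, that is 520 + 2000 + 300. Note that incrementals and deduplication ratios do not enter into it at all, which is the part that took me a while to absorb.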


11
April

Configuring NMM Exchange Backups

First, I know what you are all wondering: Dan, how do you feel about Stephen Colbert being named as David Letterman’s replacement in 2015? I find Stephen Colbert both charming and handsome and look forward to meeting the real Colbert when he debuts. It’s a good thing Mr. Colbert can rely on his many talents to help fulfill a role as a late night talk show host. Without those skills he may have ended up a lowly data recovery specialist in Calgary faced with an NMM configuration.

That, my friends, is what we call a segue in the biz.

 

 

Recently I rejoined the professional services team at my company. I had been embedded for some time, but let’s not talk about that. Suffice to say I’m excited to return to PS as a trusted adviser on all things NetWorker. Those words are a complete fabrication of reality and were proven so when I attempted to configure NMM backups for my client’s Exchange systems.

Given that this was my first project upon returning to PS, I was eager to put a “W” up on the board. I had thought I was prepared, having reviewed the NMM documentation. I found it slightly convoluted and confusing in some parts, but this is not an unusual circumstance for my brain. Perhaps I was delusional in assuming that the documentation contained all the knowledge needed to be successful? Yes, yes I was.

I don’t have the time to re-write the NMM documentation here. What I can do is fill you in on some glaring omissions from the documentation that EMC support was able to inform me of.

After adding the client, check the client resources.

Not sure how critical this was, but EMC support advised that a pool should not be specified.

[Screenshot]

Ensure the DAG entry is correct.

[Screenshot]

Ensure all DNS aliases are entered.

[Screenshot]

The remote access area should contain entries for all the Exchange nodes.

[Screenshot]

The administration field should contain the following: an entry for the NMMBackupUser (this user should have been created with the required permissions and a mailbox assigned), plus entries for the Exchange nodes and the Exchange DAG name.

[Screenshot]

Again, not sure how important, but EMC support zeroed out these entries on the group properties.

[Screenshot]

The snapshot policy and pool should be identified under the group properties.

[Screenshot]

Finally, ensure you have the following entries in administration under setup in the NetWorker Server properties.

[Screenshot]


EMC documentation is pretty good, but I don’t know how these very critical items could be missed. I searched the NMM documentation portfolio for the following items.

I hope this post helps other users, VARs and EMC staff in any future NMM implementations. All said, it seems to work well when you get it going.


29
July

Ask the expert?

I’ll be hosting a discussion over at the ECN network on NetWorker day-to-day operations.

https://community.emc.com/thread/177390

 

I know, expert? I’ll try my best to impart something that may or may not resemble wisdom. I’m not making any promises.

 


4
June

Creating NetWorker Client Repositories

Creating client repositories is pretty easy. All you need is the package and the command line. It does get a little tricky when you need to create a cross-platform repository, that is, a Windows client repository on a Linux NetWorker server.

For same OS, it’s easy enough.  Move the package over to the NetWorker server. You may want to look at the LGTO meta file. It will break down the variables for how they should be entered into the command line.

The switches below denote the product, version, platform and path to the client files.

nsrpush -a -p NetWorker -v 8.0.1.2 -P linux_x86 -m /tmp/linux_x86 -U

Success adding product from:
/tmp/linux_x86
Add to repository status: succeeded

Cross-platform is a little more of a pain. You need to place the Windows client files on an existing Windows client as well as on the NetWorker server. Then we specify that Windows client in the command along with the path.

Below we add -c for the client name. This is the name of an existing NetWorker client where we have placed the new client files. -C gives the path to the files on that client, and -m is the path to the same files on our *nix NetWorker server.

nsrpush -a -p NetWorker -P win_x64 -v 8.0.1.2 -m /tmp/win_x64 -c cwf161 -C 'D:\8.0.1.2\nw80sp1_win_x64\win_x64' -W
Hostname and mount point recorded.
Success adding product from:
/tmp/win_x64

 

 


2
May

Troubleshooting NDMP backup issues

This morning I came in to find the NDMP backups of the Celerra had hung overnight. The other filer was fine, as were all the client backups.

So let’s go through the list of things to check.

1. Is the filer up?

I know this sounds dumb, but really, start with the simplest solution. Yes, the filer is online.

2. Is the NDMP service running on the filer?

To confirm run the following command from your NetWorker server.

inquire -N filer_name

You will be prompted to enter the associated credentials. It will then return a list of devices attached to the filer. Compare this against what is in NetWorker. Yup! It all looks good.

3. Is there an issue with the media database?

I’ve seen this before, where there is media available to use but NetWorker is not using it. This should be associated with some alerts indicating “unable to allocate media”. I wasn’t seeing any such alert. At the time I had only scratch tapes, that is, returned, previously used media. I added some brand new, never used tapes to see if that would work. I also ran some of the database checks (nsrim -X). No joy. If the problem is related in some way to this, I’ll need some help from the vendor.

4. Are there any processes running for these jobs?

# ps -ef | grep FILER101
root 1512 3029 0 May01 ? 00:00:04 /usr/sbin/nsrndmp_save -T dump -F :sapvol01:uxvol01:filenetvol01 -s cls213.company.ca -c FILER101.company.ca -g FILER101_04 -LL -t 1367328945 -l 7 -q -W 78 -N /epvol08 /epvol08
root 1942 3047 0 02:14 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 SQL Prod SQL_DB_Prod
root 4735 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 7 -N 1 FILER101_07
root 10453 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 5 -N 1 FILER101_05
root 11767 16287 0 11:47 pts/3 00:00:00 grep FILER101
root 13881 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 3 -N 1 FILER101_03
root 14461 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -s cls213.company.ca -c FILER101.company.ca -g FILER101_03 -LL -t 1366767465 -l 2 -q -W 78 -N /epvol07 /epvol07
root 14914 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 1 -N 1 FILER101_01
root 15502 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -F :epvol11 -s cls213.company.ca -c FILER101.company.ca -g FILER101_01 -LL -t 1367367626 -l 4 -q -W 78 -N /epvol10 /epvol10
root 21381 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 6 -N 1 FILER101_06
root 26306 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 2 -N 1 FILER101_02
root 26418 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -F :SEIS:epvol01:epvol02:epvol03 -s cls213.company.ca -c FILER101.company.ca -g FILER101_02 -LL -l full -q -W 78 -N /EP /EP
root 30062 3047 0 Apr30 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 4 -N 1 FILER101_04

5. Finally, check the logs.
Check /nsr/cores and the daemon log.

In this instance I found this message scrolling through.
83446 05/02/2013 02:13:28 PM nsrmmd NSR warning Ignoring shutdown request for nsrmmd 3 with pid 6870 because it’s currently busy.

Let’s look at this PID…

root 6870 1 0 Apr16 ? 00:01:07 /usr/sbin/nsrmmd -b 2 -N 364266497 -n 3 -s cls213.company.ca -r FILER101 -t cls213.company.ca

My plan was to kill and restart it, but before I could, the jobs took off. I guess the process restarted on its own.
Really, this process is not specific to NDMP. Just wanted to map out my process when troubleshooting.
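
Step 5 is the one I would script first. Here is a minimal sketch that pulls the PID out of any “Ignoring shutdown request” lines and shows the matching process. The log path /nsr/logs/daemon.log is an assumption about where the rendered daemon log lives on your server, and the function name is my own.

```shell
#!/bin/sh
# Read daemon log text on stdin and print each distinct PID named in
# "Ignoring shutdown request for nsrmmd N with pid P" warnings.
busy_nsrmmd_pids() {
  sed -n 's/.*Ignoring shutdown request for nsrmmd [0-9]* with pid \([0-9]*\).*/\1/p' | sort -u
}

# Assumed location of the rendered daemon log; adjust for your server.
LOG=/nsr/logs/daemon.log
if [ -r "$LOG" ]; then
  for pid in $(busy_nsrmmd_pids < "$LOG"); do
    # Show the full command line so you can see which device it serves.
    ps -fp "$pid" || true
  done
fi
```

This only tells you which nsrmmd is refusing to shut down. Whether to kill it is still a judgment call.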


21
April

More fun with NSRADMIN

If you are responsible for maintenance and operations of some NetWorker infrastructure, you are doing yourself a huge disservice not getting to know NSRADMIN.

There is some documentation out there, and this micromanual written by Preston over at nsrd.info is invaluable. So, let’s talk about a use case that was of great help to me recently.

I have a large client that was migrating their VMs into a new vCenter. Task 1 was to update the application information variable on all the client resources with the name of the new vCenter. My friend and co-worker informed me that you can now edit multiple clients simultaneously with the NMC on NetWorker 8! I was pretty happy with this news for two reasons.

1. As previously stated, I’m not very smart. While I was sure there was going to be a way to do this with NSRADMIN, I wasn’t sure how.

2. I’m also lazy and had no desire or time to figure this out.

I attempted to do this via the NMC initially. I didn’t take the time to note the error, but it was something to the effect of “Could not edit 6 of 18 selected clients…”. I kept trying with different blocks of clients and consistently had this error returned. Eventually I gave up and had more success with NSRADMIN. Here is what I did.

nsradmin> show name:; application information:
nsradmin> print type: NSR client; group: DEV4_VADP

The above returns a list of all clients in the group and the associated application information.

name: server_nameDPU101;
application information: VADP_HYPERVISOR=vdc01vmvc01.domain.com;

name: server_nameDXU101;
application information: "VADP_HYPERVISOR=vdc01vmvc03.domain.com";

 

To update this variable for all the clients in the group, run the following.

nsradmin> update application information: VADP_HYPERVISOR=vsphere4.domain.com

You will be prompted to confirm for each client. I have not been able to find a way to force this. This could be scripted to make it even easier, but I was hammering this on the fly in the middle of the night and had to get this done as I wanted to go back to bed.
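
For the scripted version, here is a rough sketch. It only builds a command file that mirrors the two nsradmin steps above. nsradmin does take an input file with -i, but that it then applies each update without asking is my assumption and I have not verified it here, so try it on a test group first. The function name is hypothetical.

```shell
#!/bin/sh
# Print nsradmin commands that select every client in one group and
# repoint its VADP_HYPERVISOR value. Arguments: group name, new vCenter.
build_vadp_update() {
  group=$1
  vcenter=$2
  printf '. type: NSR client; group: %s\n' "$group"
  printf 'update application information: VADP_HYPERVISOR=%s\n' "$vcenter"
}

# Group and vCenter names are the ones from the session above.
build_vadp_update DEV4_VADP vsphere4.domain.com

# To use it on the NetWorker server, save the output to a file and run:
#   nsradmin -i /path/to/that/file
# (assumption: no per-client confirmation prompt in -i mode)
```

One caution: as far as I know, update replaces the whole application information attribute, so any other values a client carries there would need to be included on the update line as well.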

Lazy? Remember?

 

 


16
March

Log File Size Management

[Image: networker is down]

 

I got to thinking about log file size management after I had an issue this week.

I came into the office to find NetWorker a smoking wreck, as pictured above. It was unresponsive and our /nsr directory was rapidly running out of space. We had just added 200 GB a few weeks ago. It was now 99% full with only 16 GB left. Within a couple of hours that space would be filled also.

I had assumed that the index was the culprit. Perhaps there was huge new data load on one of the clients or the filer?

I was working with the storage team to get some space added when they informed me it was the daemon.raw and .log files that were over 160 GB in size and growing rapidly. I’m still not 100% sure what the cause was. We had recently upgraded from 7.6sp3 to 8.0 sp1. There had been some issues with NDMP backups and I had requested the storage team increase the debug level of the NDMP daemon on the filer. I don’t think this would affect the daemon files, though.

Anyway, we also found there were some runaway NDMP processes on the filers and killed them.
With the crisis averted it was time to do some analysis on log file mgmt.

I know what you are thinking. Great way to spend a Saturday afternoon :)

Again, nsradmin is your friend!

To trim the log files I restarted NetWorker. This rolled them over and I then deleted the old ones.
Here are our current log files.

-bash-3.2$ ls -lh daemon*
-rw-r--r-- 1 root root 36M Feb 19 12:26 daemon_130219_122717.log
-rw-r--r-- 1 root root 56M Feb 19 12:26 daemon_130219_122717.raw
-rw-r--r-- 1 root root 4.1M Feb 21 10:01 daemon_130221_104654.log
-rw-r--r-- 1 root root 6.8M Feb 21 10:01 daemon_130221_104654.raw
-rw-r--r-- 1 root root 19M Feb 28 10:29 daemon_130228_115135.raw
-rw-r--r-- 1 root root 11M Feb 28 10:29 daemon_130228_115136.log
-rw-r--r-- 1 root root 59M Mar 11 09:20 daemon_130311_092040.raw
-rw-r--r-- 1 root root 47M Mar 11 09:20 daemon_130311_092041.log
-rw-r--r-- 1 root root 13M Mar 16 12:29 daemon.log
-rw-r--r-- 1 root root 25M Mar 8 11:37 daemon.log2
-rw-r--r-- 1 root root 14M Mar 16 12:29 daemon.raw

So we can see our files aren’t out of control. Let’s look at our log management settings.

[root@cls###~]# nsradmin -p nsrexec
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin> . type:NSR log
Current query set

Here we see a few directives

nsradmin> print
type: NSR log;
administrator: root, "user=root,host=cls###.WDenergy.ca";
owner: NetWorker;
maximum size MB: 2;
maximum versions: 10;
runtime rendered log: /nsr/logs/daemon.log;
runtime rollover by size: Disabled;
runtime rollover by time: ;
name: daemon.raw;
log path: /nsr/logs/daemon.raw;

So here we can see the max file size is set for 2 MB, but the associated directive is disabled. So, our log files can and will grow unabated.

Let’s enable it. NetWorker will then do an hourly check on the file and roll it over once it is past the maximum size. A maximum of 10 log archives will be kept.

nsradmin> . type:NSR log;name:daemon.raw
Current query set
nsradmin> print
type: NSR log;
administrator: root, "user=root,host=cls###.WDenergy.ca";
owner: NetWorker;
maximum size MB: 2;
maximum versions: 10;
runtime rendered log: /nsr/logs/daemon.log;
runtime rollover by size: Disabled;
runtime rollover by time: ;
name: daemon.raw;
log path: /nsr/logs/daemon.raw;
nsradmin> update runtime rollover by size: Enabled
runtime rollover by size: Enabled;
Update? y
updated resource id 12.0.230.115.0.0.0.0.82.37.185.74.0.0.0.0.10.204.4.77(3)

Let’s view and confirm the change.
nsradmin> print
type: NSR log;
administrator: root, "user=root,host=cls###.WDenergy.ca";
owner: NetWorker;
maximum size MB: 2;
maximum versions: 10;
runtime rendered log: /nsr/logs/daemon.log;
runtime rollover by size: Enabled;
runtime rollover by time: ;
name: daemon.raw;
log path: /nsr/logs/daemon.raw;
nsradmin> . type:NSR log;name:daemon.log

Had to wait a little bit, but less than an hour later the rollover happened!

 

[root@cls### logs]# ls -lh daemon*
-rw-r--r-- 1 root root 36M Feb 19 12:26 daemon_130219_122717.log
-rw-r--r-- 1 root root 56M Feb 19 12:26 daemon_130219_122717.raw
-rw-r--r-- 1 root root 4.1M Feb 21 10:01 daemon_130221_104654.log
-rw-r--r-- 1 root root 6.8M Feb 21 10:01 daemon_130221_104654.raw
-rw-r--r-- 1 root root 19M Feb 28 10:29 daemon_130228_115135.raw
-rw-r--r-- 1 root root 11M Feb 28 10:29 daemon_130228_115136.log
-rw-r--r-- 1 root root 59M Mar 11 09:20 daemon_130311_092040.raw
-rw-r--r-- 1 root root 47M Mar 11 09:20 daemon_130311_092041.log
-rw-r--r-- 1 root root 14M Mar 16 13:59 daemon_130316_135910.raw
-rw-r--r-- 1 root root 13M Mar 16 13:59 daemon_130316_135911.log
-rw-r--r-- 1 root root 75K Mar 16 14:09 daemon.log
-rw-r--r-- 1 root root 25M Mar 8 11:37 daemon.log2
-rw-r--r-- 1 root root 24K Mar 16 14:09 daemon.raw
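
Rollover settings aside, I would still want a cheap early warning before /nsr fills up again. A minimal sketch, assuming the logs live in /nsr/logs and that 100 MB is a sensible threshold (both are my choices for illustration, and the function name is made up):

```shell
#!/bin/sh
# Print daemon log files under a directory that are larger than a
# threshold given in MB. Arguments: directory, threshold in MB.
big_daemon_logs() {
  dir=$1
  limit_mb=$2
  # find -size counts 512-byte blocks, so 1 MB is 2048 blocks.
  find "$dir" -type f -name 'daemon*' -size +"$((limit_mb * 2048))" -print
}

# Example: anything over 100 MB in the assumed log directory.
if [ -d /nsr/logs ]; then
  big_daemon_logs /nsr/logs 100
fi
```

Dropped into cron with the output mailed to you, this would have flagged the runaway daemon.raw long before it reached 160 GB.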

 

 


7
March

The Transformation Continues…Backup Game Day Is Back

How do you turn backup and recovery into an offensive strategy that delivers game-changing business results? More than 4,000 of you tuned in last fall to hear the first part of the story, and on Monday, March 11, we’re back with the [...]


6
March

NetWorker vStorage API missing from client

Wow, this was infuriating.
After upgrading to NetWorker 8 I noticed some VADP backups I had been testing started failing.

I found this message in the output

c:\mnt\cwf3019.hq.huskyenergy.com\cwf3019.hq.huskyenergy.com\tmp.
Temporary Directory for VADP created.
Temporary vmMntLoc Directory for VADP created.
*** vStorage API driver (\??\C:\Program Files\Legato\nsr\plugins\VDDK\bin\AMD64\vstor2-mntapi10-shared.sys) not installed by NetWorker. This may cause potential compatibility issues. ***

Yeah!? NetWorker can’t find the vStorage API?! Really?!

Then I noticed NetWorker’s new default install path is C:\Program Files\EMC NetWorker. Thank you EMC for updating that. You only bought Legato 9 years ago.

The former install path was C:\Program Files\Legato

I searched the registry on my proxy and found this key

vstor2-mntapi10-shared

Then updated the path in the registry to C:\Program Files\EMC NetWorker\

Also of note: if you like to create clients by right-clicking and selecting Copy, then for VADP clients you need to ensure the VADP_VM_NAME entry in the application information field on the Apps & Modules tab is updated.
