Troubleshooting NDMP backup issues

This morning I came in to find that the NDMP backups of the Celerra had hung overnight. The other filer was fine, as were all the client backups.

So let's go through the list of things to check.

1. Is the filer up?

I know this sounds dumb, but really, start with the simplest check. In this case, yes, the filer is online.

2. Is the NDMP service running on the filer?

To confirm, run the following command from your NetWorker server:

inquire -N filer_name

You will be prompted to enter the associated credentials. It will then return a list of devices attached to the filer. Compare this against what is in NetWorker. Yup! It all looks good.

3. Is there an issue with the media database?

I’ve seen this before: media is available to use, but NetWorker is not using it. This is usually accompanied by alerts indicating “unable to allocate media”, but I wasn’t seeing any such alert. At the time I had only scratch tapes, that is, previously used media that had been returned. I added some brand-new, never-used tapes to see if that would work. I also ran a database check with nsrim -X. No joy. If the problem is related to the media database in some way, I’ll need some help from the vendor.

4. Are there any processes running for these jobs?

# ps -ef | grep FILER101
root 1512 3029 0 May01 ? 00:00:04 /usr/sbin/nsrndmp_save -T dump -F :sapvol01:uxvol01:filenetvol01 -s cls213.company.ca -c FILER101.company.ca -g FILER101_04 -LL -t 1367328945 -l 7 -q -W 78 -N /epvol08 /epvol08
root 1942 3047 0 02:14 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 SQL Prod SQL_DB_Prod
root 4735 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 7 -N 1 FILER101_07
root 10453 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 5 -N 1 FILER101_05
root 11767 16287 0 11:47 pts/3 00:00:00 grep FILER101
root 13881 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 3 -N 1 FILER101_03
root 14461 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -s cls213.company.ca -c FILER101.company.ca -g FILER101_03 -LL -t 1366767465 -l 2 -q -W 78 -N /epvol07 /epvol07
root 14914 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 1 -N 1 FILER101_01
root 15502 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -F :epvol11 -s cls213.company.ca -c FILER101.company.ca -g FILER101_01 -LL -t 1367367626 -l 4 -q -W 78 -N /epvol10 /epvol10
root 21381 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 6 -N 1 FILER101_06
root 26306 3047 0 May01 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 2 -N 1 FILER101_02
root 26418 3029 0 May01 ? 00:00:02 /usr/sbin/nsrndmp_save -T dump -F :SEIS:epvol01:epvol02:epvol03 -s cls213.company.ca -c FILER101.company.ca -g FILER101_02 -LL -l full -q -W 78 -N /EP /EP
root 30062 3047 0 Apr30 ? 00:00:00 /usr/sbin/savegrp -I -C FILER101 Day 4 -N 1 FILER101_04
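
With this many lines it is easy to lose track of which groups actually have a save worker running. Below is a minimal sketch that filters the listing down to the nsrndmp_save processes and pulls out the group each one belongs to. The function name list_ndmp_saves is made up for illustration and is not a NetWorker tool, and it assumes the Linux `ps -ef` column layout shown above.

```shell
# Hypothetical helper (not a NetWorker tool): read `ps -ef` output on stdin
# and print "PID STIME GROUP" for every nsrndmp_save worker.
# Assumes the Linux `ps -ef` layout shown above, where field 8 is the command.
list_ndmp_saves() {
  awk '$8 ~ /nsrndmp_save$/ {
    grp = "-"                       # placeholder when no -g option is present
    for (i = 9; i < NF; i++)        # scan the arguments for "-g <group>"
      if ($i == "-g") grp = $(i + 1)
    print $2, $5, grp
  }'
}

# Example: ps -ef | grep FILER101 | list_ndmp_saves
```

Comparing its output with the savegrp lines shows which groups have a savegrp process but no nsrndmp_save behind it. In the listing above that appears to be the case for FILER101_05, FILER101_06 and FILER101_07.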

5. Finally, check the logs.
Check /nsr/cores and the daemon log.

In this instance I found this message scrolling through.
83446 05/02/2013 02:13:28 PM nsrmmd NSR warning Ignoring shutdown request for nsrmmd 3 with pid 6870 because it’s currently busy.
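
When a message like this scrolls by repeatedly, it helps to pull out just the PIDs it names. Here is a small sketch that does so. The function name busy_nsrmmd_pids is hypothetical, and it assumes the log is plain text in the format of the line above. The log path in the example is also an assumption: on some installs the daemon log is a raw file that has to be rendered with nsr_render_log first.

```shell
# Hypothetical helper: read daemon log text on stdin and print the unique PIDs
# named in "Ignoring shutdown request for nsrmmd ... with pid N" warnings.
busy_nsrmmd_pids() {
  awk '/Ignoring shutdown request for nsrmmd/ {
    for (i = 1; i < NF; i++)        # the PID is the word right after "pid"
      if ($i == "pid") print $(i + 1)
  }' | sort -un
}

# Example (log path is an assumption; adjust for your install):
#   busy_nsrmmd_pids < /nsr/logs/daemon.log
```

Each PID it prints can then be inspected with `ps -fp <pid>`.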

Let's look at this PID…

root 6870 1 0 Apr16 ? 00:01:07 /usr/sbin/nsrmmd -b 2 -N 364266497 -n 3 -s cls213.company.ca -r FILER101 -t cls213.company.ca

My plan was to kill the process and restart it, but before I could, the jobs took off. I guess the process restarted on its own.
Really, this troubleshooting process is not specific to NDMP. I just wanted to map out the steps I go through.
