What is this?

This is basically where I write down stuff that I work with at my job as a GIS Technical Analyst (previously system administrator). I do it because it's practical for documentation purposes (although I remove anything that might pose a security risk), and I hope it can be of use to someone out there. I frequently search the net for help myself, and this is my way of contributing.

Tuesday, November 17, 2009

NetApp FAS 2040 CIFS backup performance benchmark


We recently invested in a new filer from NetApp. Unfortunately, we can't do agent-based backups of the large CIFS filesystem on the filer, and NDMP dump to tape is also problematic. Actually, the NDMP backup itself is not a problem, but restore is. NDMP backups initiated from Data Protector do a raw block copy rather than a file copy of the individual files on the filesystem, which makes restores quite interesting.

We still chose to go for NetApp, though, because we trust that the snapshot functionality will allow us to run fewer backups to tape (hopefully as few as one or two a month), which makes high speed less of a critical issue.

The obvious solution is to do a normal Network share backup from DP (in production this will probably be done on a snapshot, but for now this will do). Basically I'm going to run a series of tests to determine the number of data streams and virtual tape devices which will give the best performance. Note that there are no other users attached to the NetApp filer or the Ethernet switch.
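In the test descriptions that follow, the number of data streams equals the number of drives (load balancing Min/Max) times the concurrency per drive. A minimal Python sketch of that relationship follows. The function name is hypothetical, and the cap at the number of backup objects (one stream per share at most) is an assumption made for illustration, not documented Data Protector behaviour.

```python
def effective_streams(drives: int, concurrency: int, objects: int) -> int:
    """Return how many data streams can run at the same time.

    drives      -- number of tape drives in use (load balancing Min/Max)
    concurrency -- streams allowed per drive
    objects     -- number of backup objects (here: shares)

    Assumption: each object is read by at most one stream, so the
    object count caps the total.
    """
    if drives < 1 or concurrency < 1 or objects < 0:
        raise ValueError("drives and concurrency must be >= 1, objects >= 0")
    return min(drives * concurrency, objects)


# The test data consists of 7 shares.
for drives, concurrency in [(1, 1), (2, 2), (1, 7), (2, 4)]:
    streams = effective_streams(drives, concurrency, objects=7)
    print(f"{drives} drive(s) x concurrency {concurrency} -> {streams} stream(s)")
```

Under that assumption, 2 drives with a concurrency of 4 allow 8 streams on paper, but only 7 can be busy when there are 7 shares to read.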

The results are considerably slower than what we can expect from a larger backup, due to the relatively large overhead of small jobs.

I noticed that this type of backup is very CPU intensive on the Cell Manager. It would pretty much max out at 100% constantly during the backup. Memory was not a problem, though.

Setup

Server: HP ProLiant DL380 G3 / Windows 2003 x86 / 2 GB RAM / Dual Intel Xeon 2.8 GHz
Backup software: Data Protector 6.0
Network: HP ProCurve 2824 switch (single gigabit connection)
Storage system: NetApp FAS 2040
Tape library: HP 6636 VLS (Virtual tape library) connected to the DP Cell Manager by FC.
Test data: A collection of user home folders on 7 shares containing 39108 files (19932 MB)


Test Results

Test 1
1 data stream / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 1)
Total 1427 seconds = 13,97 MB/sec

Test 2
2 data streams / 2 drives (Load balancing Min:2 / Max:2 and Concurrency: 1 per drive)
Total 1128 seconds = 17,67 MB/sec

Test 3
4 data streams / 2 drives (Load balancing Min:2 / Max:2 and Concurrency: 2 per drive)
Total 1002 seconds = 19,9 MB/sec

Test 4
2 data streams / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 2 per drive)
Total 1135 seconds = 17,56 MB/sec

Test 5
3 data streams / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 3 per drive)
Total 1015 seconds = 19,64 MB/sec

Test 6
4 data streams / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 4 per drive)
Total 983 seconds = 20,28 MB/sec

Test 7
5 data streams / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 5 per drive)
Total 994 seconds = 20,05 MB/sec

Test 8
7 data streams / 1 drive (Load balancing Min:1 / Max:1 and Concurrency: 7 per drive)
Total 964 seconds = 20,68 MB/sec

Test 9
8 (7) data streams / 2 drives (Load balancing Min:2 / Max:2 and Concurrency: 4 per drive)
Total 960 seconds = 20,76 MB/sec

Test 10
I decided to increase the amount of data to 37508 MB and rerun the backup with the settings from Test 8 to see if the results would be any better.
Total 1597 seconds = 23,49 MB/sec
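
The MB/sec figures above are simply the amount of data divided by the run time. A minimal Python sketch that recomputes them from the sizes and durations listed in the tests follows. The function and variable names are made up for illustration, and the output uses a dot as the decimal separator.

```python
def throughput_mb_per_sec(size_mb: float, seconds: float) -> float:
    """Average throughput of a backup run in MB per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


# (test name, data size in MB, run time in seconds), taken from the tests above
RUNS = [
    ("Test 1", 19932, 1427),
    ("Test 2", 19932, 1128),
    ("Test 3", 19932, 1002),
    ("Test 4", 19932, 1135),
    ("Test 5", 19932, 1015),
    ("Test 6", 19932, 983),
    ("Test 7", 19932, 994),
    ("Test 8", 19932, 964),
    ("Test 9", 19932, 960),
    ("Test 10", 37508, 1597),
]

for name, size_mb, seconds in RUNS:
    rate = throughput_mb_per_sec(size_mb, seconds)
    print(f"{name}: {rate:.2f} MB/sec")
```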




I was a little disappointed with the results. They are considerably slower than what I experienced with the old HP MSA1000 SAN using a backup agent. On the other hand - I knew that this type of backup is slow.

It also seems that you won't gain much in terms of speed from using more than 3 data streams, but you will stress the NetApp filer a little more. I noticed that CPU usage on the filer rises with the number of streams: one stream averaged 9% and 8 streams averaged 14%.
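
To put a number on the diminishing returns, here is a small Python sketch that compares the single-drive runs (Tests 1, 4, 5, 6, 7 and 8) against the one-stream baseline. The names are hypothetical; the run times are the ones measured above.

```python
# streams -> run time in seconds, single-drive tests only
SINGLE_DRIVE_RUNS = {1: 1427, 2: 1135, 3: 1015, 4: 983, 5: 994, 7: 964}


def time_saved_percent(baseline_s: float, run_s: float) -> float:
    """Run time saved relative to the baseline, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (baseline_s - run_s) / baseline_s * 100


baseline = SINGLE_DRIVE_RUNS[1]
for streams in sorted(SINGLE_DRIVE_RUNS):
    saved = time_saved_percent(baseline, SINGLE_DRIVE_RUNS[streams])
    print(f"{streams} stream(s): {saved:.1f}% faster than 1 stream")
```

Going from 1 to 3 streams cuts the run time by roughly 29%, while going from 3 to 7 streams adds only a few percentage points on top of that.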

Perhaps I can achieve slightly better results by using a faster CPU on the Data Protector Cell Manager server, but I doubt we're talking about anything higher than 30 MB/sec at best.

On the other hand - it really does not matter much on a day-to-day basis if I'm only going to do a full backup to tape a couple of times a month. A 3 TB volume should take approximately 36 hours to finish.
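
The 36-hour figure can be sanity-checked with a few lines of Python. This is a rough sketch which assumes the rate from Test 10 holds for the whole volume, which a real 3 TB backup may not do. The function name is made up, and both common readings of "3 TB" are shown.

```python
def estimated_hours(size_mb: float, rate_mb_per_sec: float) -> float:
    """Estimated backup duration in hours at a constant rate."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate_mb_per_sec must be positive")
    return size_mb / rate_mb_per_sec / 3600


# Rate measured in Test 10: 37508 MB in 1597 seconds
RATE = 37508 / 1597

decimal_tb = 3 * 1000 * 1000   # 3 TB counted as 3,000,000 MB
binary_tb = 3 * 1024 * 1024    # 3 TB counted as 3,145,728 MB

print(f"3 TB (decimal): {estimated_hours(decimal_tb, RATE):.1f} hours")
print(f"3 TB (binary):  {estimated_hours(binary_tb, RATE):.1f} hours")
```

Both readings land close to the 36-hour estimate, one slightly below and one slightly above.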