What is this?

This is basically where I write down stuff that I work with at my job as a GIS Technical Analyst (previously system administrator). I do it because it's practical for documentation purposes (although, I remove stuff that might be a security breach) and I hope it can be of use to someone out there. I frequently search the net for help myself, and this is my way of contributing.

Monday, March 29, 2010

HP VLS6636 deduplication real world performance

A few months ago we installed a new virtual tape library - an HP VLS6636. The library and our Data Protector 6.11 backup server are connected through an FC switch, and most server backups are done via regular gigabit ethernet.

Currently the library has 24 x 476 (500) GB SATA disks. That is a total of approx. 11 TB raw capacity, which after initialization leaves 8.8 TB physical capacity. 1.04 TB is reserved by the system (temp space for deduplication etc.), so the space left for backup data is 7.76 TB.
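For anyone who wants to redo the capacity arithmetic with their own disk counts, here is a minimal sketch. The figures are the ones quoted above; the variable names are mine, and I'm assuming decimal units (1 TB = 1000 GB), which may not match how the VLS reports capacity.

```python
# Capacity figures from the post; names and decimal units are assumptions.
DISKS = 24
DISK_GB = 476          # formatted size of each nominal 500 GB disk
PHYSICAL_TB = 8.8      # physical capacity after initialization
RESERVED_TB = 1.04     # reserved by the system (dedup temp space etc.)

raw_tb = DISKS * DISK_GB / 1000          # raw capacity in decimal TB
usable_tb = PHYSICAL_TB - RESERVED_TB    # space left for backup data

print(f"raw capacity: {raw_tb:.1f} TB")
print(f"usable for backup data: {usable_tb:.2f} TB")
```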

The total size of all our backup data is approx. 3.04 TB, with an average of 0.26 TB of differential data daily. It's all kinds of data - but mainly Windows Server OS. My goal was to hold 5 weeks of full and differential (Mon-Thu) backups plus some manual backups, and in addition leave room for some future growth. The rough calculation for our current needs is like this:

Full backup: 5 x 3.04 TB
Differential backup: 4 x 0.26 TB x 5 (4 days a week for 5 weeks)
A total of 15.2 + 5.2 = 20.4 TB
+ 10 TB growth and manual backups = 30.4 TB

In other words - we require roughly 4 times the available space on the VLS6636. Time to start filling it up :)
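The rough calculation above can be sketched as a few lines of Python, which makes it easy to plug in other retention periods. The function name and parameters are my own, and the 7.76 TB figure is the usable space mentioned earlier. Treat it as back-of-the-envelope arithmetic, not a sizing tool.

```python
def logical_need_tb(full_tb, diff_tb, weeks, diff_days_per_week, extra_tb=0.0):
    """Logical (pre-deduplication) space needed for the retention period."""
    fulls = weeks * full_tb                         # one full backup per week
    diffs = weeks * diff_days_per_week * diff_tb    # differentials Mon-Thu
    return fulls + diffs + extra_tb

USABLE_TB = 7.76  # space left for backup data on the VLS6636

current = logical_need_tb(3.04, 0.26, weeks=5, diff_days_per_week=4)
with_growth = logical_need_tb(3.04, 0.26, weeks=5, diff_days_per_week=4,
                              extra_tb=10.0)

# Overall ratio the deduplication has to deliver for everything to fit.
required_ratio = with_growth / USABLE_TB

print(f"current need: {current:.1f} TB")
print(f"with growth and manual backups: {with_growth:.1f} TB")
print(f"required dedup ratio: {required_ratio:.1f}:1")
```

With the numbers from this post, the required ratio comes out just under 4:1.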



The first reading (7.1) is done after two full backups and a few manually defined backups.

The increase in logical data usage after 4.2 is due to a couple of full backups outside the regular backup schedules, so you're in fact looking at 7 full backup sets. Notice that even though the logical data increases sharply, there's hardly any increase in used capacity.

After 11.3 you'll notice a sharp increase in the logical data and a decrease in available capacity. This is due to changes in backup procedures for virtual servers (VRanger Pro upgrade). Basically the same VMs are being dumped to file, but the dump and metadata files look different, so the VLS6636 naturally interprets them as new data, which ruins the nice stats. I expect the data usage to go back to what it was prior to 11.3 once the backup data from the old VM backup procedures has expired.

This illustrates how important it is to keep things like this in mind when using deduplication technology. Seemingly small changes can have a significant impact on storage needs. This also goes for the backup jobs. Two similar jobs with different names will be treated as completely different sets, and thus they will not deduplicate against each other at all.

All in all, as long as the backup definitions and data are fairly static in terms of structure and contents, the deduplication works very well. Even the differential backups deduplicate quite well; I would estimate an average of 5:1.

When it comes to speed, the VLS6636 performs quite decently. We see speeds of up to 60 MB/sec when using 4 simultaneous data streams. On average, a mixed backup to the VLS runs at approx. 40 MB/sec. The backup server is fairly old, so I would not be surprised if the results are even better with a new backup server.
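To put those rates in perspective, here is a small sketch that converts a throughput figure into transfer time. The helper name is mine, I'm assuming decimal units (1 TB = 1,000,000 MB), and I'm pretending the whole 3.04 TB full set is written at one sustained rate. That is a simplification, since real jobs are spread over several servers and schedules.

```python
def hours_to_transfer(size_tb, rate_mb_per_s):
    """Hours needed to move size_tb at a sustained rate, decimal units."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_per_s / 3600

FULL_SET_TB = 3.04  # total size of one full backup set, from the post

for rate in (40, 60):  # average and peak MB/sec quoted above
    hours = hours_to_transfer(FULL_SET_TB, rate)
    print(f"{rate} MB/sec: about {hours:.1f} hours")
```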

We have also restored files for verification purposes, and haven't noticed any problems. Yet ;-)