How to modernise your backup infrastructure
- 15 September, 2010 22:00
Ever get the feeling that your backup system is behind the times? Do you read trade magazines and wonder if you're the only one still using an antiquated backup system? The first thing you should know is that you're not the only one. But your backup system could probably use some modernisation.
New technologies have changed the nature of the backup game in a fundamental way, with disk playing an increasingly important role and tape moving further into the background. Many of the liabilities and performance issues that have dogged datacentre backups forever now have plausible technology solutions, provided those solutions are applied carefully and dovetail with primary storage strategy. It is truly a new day.
Before you contemplate a modernisation plan, you need a working understanding of new high-speed disk-based solutions, schemes that reduce the volume of data being replicated, and how real-time data protection techniques actually work. With that under your belt, you can start to apply those advancements to the real-world data protection problems every datacentre faces.
The disk in the middle
D2D2T (disk-to-disk-to-tape) strategies have gained popularity in recent years due to the great disparity between the devices being backed up (disks), the network carrying the backup, and the devices receiving the backup (tape). The average throughput of a disk drive 15 years ago was approximately 4MBps to 5MBps, and the most popular tape drive was 256KBps, so the bottleneck was the tape drive.
Fast-forward to today, and we have 70MBps disk drives, but tape drives that want 120MBps. Disks got 15 to 20 times faster, but tape drives got almost 500 times faster! Tape is no longer the bottleneck; it's starving to death. This is especially true when you realise that most backups are incremental and hold on to a tape drive for hours on end -- all the while moving only a few gigabytes of data.
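The arithmetic behind that claim is easy to check. A quick sketch, using the figures above (the 4.5MBps disk value is simply the midpoint of the 4MBps-to-5MBps range):

```python
# Throughput figures from the article, in MBps (treating 1MBps as 1,000KBps).
disk_1995, disk_2010 = 4.5, 70.0     # mid-1990s disk drive vs. today's
tape_1995, tape_2010 = 0.256, 120.0  # 256KBps tape drive vs. 120MBps

disk_speedup = disk_2010 / disk_1995   # roughly 15.5x
tape_speedup = tape_2010 / tape_1995   # nearly 470x
print(f"disk got {disk_speedup:.0f}x faster, tape got {tape_speedup:.0f}x faster")
```

The gap between those two ratios is the whole story: the device feeding the tape drive simply cannot keep up with the device consuming the data.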
D2D2T strategies solve this problem by placing a high-speed buffer between the fragmented, disk-based file systems and databases being backed up and the hungry tape drive. This buffer is a disk-based storage system designed to receive slow backups and supply them very quickly to a high-speed tape drive.
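The decoupling at the heart of D2D2T can be modelled as a producer-consumer queue. This is a toy sketch, not any vendor's implementation: several slow backup streams trickle data onto a "disk" buffer concurrently, and only then does a single fast pass stream everything to "tape".

```python
import queue
import threading

# Toy model of a D2D2T staging buffer: slow, concurrent backup streams land
# on the disk buffer; one fast consumer later drains it sequentially, the
# way a disk pool keeps a high-speed tape drive fed.
buffer = queue.Queue()  # stands in for the disk-based staging area

def backup_client(name, chunks):
    # Each client trickles its backup in at incremental-backup speed.
    for i in range(chunks):
        buffer.put(f"{name}-chunk{i}")

clients = [threading.Thread(target=backup_client, args=(f"host{n}", 3))
           for n in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()

# Only once the slow backups have landed on disk do we stream to tape
# in one sequential, full-speed pass.
to_tape = []
while not buffer.empty():
    to_tape.append(buffer.get())
print(f"streamed {len(to_tape)} chunks to tape in one fast pass")
```

The tape drive never sits idle waiting on a slow incremental; it sees one continuous, fast stream.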
The challenge faced by some customers (especially large ones) was that many backup systems didn't know how to share a large disk system and use it for backups. Sure, they could back up to a disk drive, but what if you needed to share that disk drive among multiple backup servers? Many backup products still can't do that, especially with Fibre Channel-connected disk drives. Enter the virtual tape library, or VTL. It solved this sharing problem by presenting the disk drives as tape libraries, which the backup software products had already learned how to share. Now you could share a large disk system among multiple servers.
In addition, customers more familiar with a tape interface were presented with a very easy transition to backing up to disk. Another approach to creating a shareable disk target is the intelligent disk target, or IDT. Vendors of IDT systems felt the best approach was to use the NFS or CIFS protocol to present the disk system to the backup system. These protocols also allowed for easy sharing among multiple backup servers.
But both VTL and IDT vendors had a fundamental problem: The cost of disk made their systems cost effective as staging devices only. Customers stored a single night's backups on disk and then quickly streamed them off to tape. They wanted to store more backups on disk, but they couldn't afford it. Enter data deduplication.
The magic of data deduplication
Typical backups create duplicate data in two ways: repeated full backups and repeated incrementals of the same file when it changes multiple times. A deduplication system identifies both situations and eliminates redundant files, reducing the amount of disk necessary to store your backups by anywhere from 10:1 to 50:1 and beyond, depending on the level of redundancy in your data.
Deduplication systems also work their magic at the subfile level. To do so, they identify segments of data (a segment is typically smaller than a file but bigger than one byte) that are redundant with other segments and eliminate them. The most obvious use for this technology is to allow users to switch from disk staging strategies (where they're storing only one night's worth of backups) to disk backup strategies (where they're storing all onsite backups on disk).
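The subfile mechanism can be sketched in a few lines. This is a deliberately simplified illustration using fixed-size segments and SHA-256 fingerprints; real products typically use variable-size, content-defined segments and KB-scale segment sizes:

```python
import hashlib

# Minimal sketch of subfile (segment-level) deduplication: split each backup
# stream into fixed-size segments, fingerprint each segment, and store only
# the segments not already in the pool.
SEGMENT_SIZE = 8  # bytes here for brevity; real systems use KB-scale segments

store = {}          # fingerprint -> segment data (the dedupe pool)
backup_index = []   # ordered fingerprints, enough to rebuild every stream

def dedupe_backup(data: bytes):
    for i in range(0, len(data), SEGMENT_SIZE):
        seg = data[i:i + SEGMENT_SIZE]
        fp = hashlib.sha256(seg).hexdigest()
        store.setdefault(fp, seg)     # redundant segments cost nothing extra
        backup_index.append(fp)

# Two "full backups" of the same data where only the tail changed:
dedupe_backup(b"AAAAAAAABBBBBBBBCCCCCCCC")
dedupe_backup(b"AAAAAAAABBBBBBBBDDDDDDDD")

restored = b"".join(store[fp] for fp in backup_index)
print(len(backup_index), "segments referenced,", len(store), "stored")
```

Six segments are referenced across the two fulls, but only four are actually stored; the unchanged segments of the second full consume no new disk, which is exactly what makes keeping many backups on disk affordable.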
There are two main types of deduplication:
- Target dedupe systems allow customers to send traditional backups to a storage system that will then dedupe them; they are typically used in medium to large datacentres and perform at high speed.
- Source dedupe systems use different backup software to eliminate the redundant data from the very beginning of the process and serve to back up remote offices and mobile users.
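The source approach is worth a sketch of its own, because it changes what crosses the wire. In this hypothetical handshake (the function and variable names are illustrative, not any product's API), the client fingerprints segments locally and ships only the ones the server has never seen, which is why source dedupe suits remote offices and mobile users on thin links:

```python
import hashlib

# Hypothetical source-dedupe exchange: fingerprint at the client, send only
# segments the server doesn't already hold.
server_store = set()   # fingerprints the backup server already has

def source_side_backup(segments):
    sent = 0
    for seg in segments:
        fp = hashlib.sha256(seg).hexdigest()
        if fp not in server_store:     # ask the server: "have you seen this?"
            server_store.add(fp)       # only now does data cross the wire
            sent += 1
    return sent

day1 = [b"os-image", b"app-data", b"user-docs"]
day2 = [b"os-image", b"app-data", b"user-docs-v2"]  # one segment changed
sent1 = source_side_backup(day1)
sent2 = source_side_backup(day2)
print(sent1, "segments sent on day 1,", sent2, "on day 2")
```

Day one sends everything; day two sends only the single changed segment. Over a WAN link, that difference is what makes remote-office backup practical.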
Backing up as you go
CDP (continuous data protection) is another increasingly popular disk-based backup technology. Think of it as replication with an Undo button. Every time a block of data changes on the system being backed up, it is transferred to the CDP system. However, unlike replication, CDP stores changes in a log, so you can undo those changes at a very granular level. In fact, you can recover the system to literally any point in time at which data was stored within the CDP system.
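The "replication with an Undo button" idea falls out of one design decision: log every change instead of overwriting the previous copy. A toy journal, assuming simple `(timestamp, block, value)` entries, shows why any point in time becomes recoverable:

```python
# Toy CDP journal: every block write is appended to a time-ordered log,
# never applied destructively, so the volume can be rebuilt as of any
# recorded moment.
journal = []

def cdp_write(t, block, value):
    journal.append((t, block, value))   # log the change; overwrite nothing

def recover_to(t):
    """Replay the log up to time t to rebuild the volume's state then."""
    volume = {}
    for ts, block, value in journal:
        if ts <= t:
            volume[block] = value
    return volume

cdp_write(1, "blk0", "v1")
cdp_write(2, "blk1", "v1")
cdp_write(3, "blk0", "v2")   # say this write was a corruption

print(recover_to(2))  # state from just before the bad write
print(recover_to(3))  # state including it
```

Plain replication would have kept only the final state, corruption included; the log is what lets you dial back to the moment before.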
A near-CDP system works in similar fashion except that it has discrete points in time to which it can recover. To put it another way, near-CDP combines snapshots with replication. Typically, a snapshot is taken on the system being backed up, whereupon that snapshot is replicated to another system that holds the backup. Why take the snapshot on the source before replication? Because only at the source can you typically quiesce the application writing to the storage so that the snapshot will be a meaningful one.
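The contrast with true CDP is easiest to see side by side. In this minimal near-CDP sketch (again illustrative, not a product's mechanism), state is copied to the backup system only at snapshot times, so those are the only recovery points:

```python
import copy

# Near-CDP sketch: snapshot the source at discrete times and replicate each
# snapshot to the backup system. Recovery points exist only at snapshot
# times, not at every write -- the key difference from true CDP.
source = {}
replica_snapshots = {}   # snapshot_time -> full copy held on the backup system

def take_and_replicate_snapshot(t):
    # A real system would first quiesce the application here, so that the
    # snapshot is application-consistent; this sketch just copies state.
    replica_snapshots[t] = copy.deepcopy(source)

source["blk0"] = "v1"
take_and_replicate_snapshot(10)
source["blk0"] = "v2"
source["blk1"] = "v1"
take_and_replicate_snapshot(20)

# We can recover to time 10 or time 20, but nothing in between.
print("recovery points:", sorted(replica_snapshots))
```

Whether those discrete points are good enough depends on how much data you can afford to lose between snapshots; for many applications, a snapshot every few minutes is a perfectly acceptable trade for the simplicity.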