One of the most basic things to understand in back-up and recovery is the concept of back-up levels and what they mean.
Without a proper understanding of what they are and how they work, companies can adopt bad practices that range from wasted bandwidth and storage to actually missing important data on their back-ups.
Understanding these concepts is also crucial when selecting new data-protection products or services.
A full back-up contains all data in the entire system. A full back-up of the C:\ drive in Windows contains every file on the C: drive. A full back-up of a Windows system should contain a copy of every file on every drive on the machine or VM (e.g. C:\, D:\, F:\, etc.).
The same goes for a full back-up of a UNIX or Linux machine; it contains every file on every file system on the machine (e.g. /, /home, /opt, etc.).
The only files that should be left out of a full back-up are those specifically excluded by the configuration.
For example, many system administrators choose to exclude directories that will have no value during a restore (e.g. /boot or /dev), or contain transient files (e.g. C:\Windows\TEMP in Windows, or /tmp in Linux).
There are two philosophies about which files should be included in or excluded from back-ups: back up everything and exclude what you know you don't need, or select only what you want to back up.
The former is the safer option; the latter saves some space on your back-up system. Some people see it as a waste to back up application files, such as the directory into which you have installed Oracle or SQL Server.
They believe they would simply reload the application during a restore. The risk of this approach is that someone will place valuable data in a directory that is not selected for back-up.
For example, if you select only /home1 or D:\Data to be backed up, how will the back-up system know if someone adds /home2 or E:\Data?
This is why it is much safer to back up everything and exclude only the files you know you don't need, even if doing so takes up some additional space.
An exception might be a strongly controlled environment where all data is always loaded in the same place and you have a well-orchestrated solution for replacing the operating system and applications during a restore.
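To make the risk concrete, here is a minimal Python sketch that applies both policies to the same hypothetical file list. The function names, the paths and the exclusion list are illustrative assumptions, not any back-up product's configuration syntax; a real product would walk the live file systems rather than a hard-coded list.

```python
from pathlib import PurePosixPath


def is_under(path: str, roots: list[str]) -> bool:
    """Return True if `path` is one of `roots` or lies beneath one of them."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)


def select_by_exclusion(all_files: list[str], excludes: list[str]) -> list[str]:
    """Back up everything except files under an excluded directory."""
    return [f for f in all_files if not is_under(f, excludes)]


def select_by_inclusion(all_files: list[str], includes: list[str]) -> list[str]:
    """Back up only files under an explicitly selected directory."""
    return [f for f in all_files if is_under(f, includes)]


# Hypothetical file list: /home2 was created after the back-up policy was written.
files = [
    "/etc/hosts",
    "/home1/alice/report.doc",
    "/home2/bob/budget.xls",
    "/tmp/session.lock",
]

excluded_policy = select_by_exclusion(files, ["/tmp", "/dev"])
included_policy = select_by_inclusion(files, ["/home1"])

print("exclude what you don't need:", excluded_policy)
print("select what you want:       ", included_policy)
```

Under the exclusion policy the file in the newly added /home2 is protected; under the inclusion policy it is silently skipped. Matching on whole path components (via `PurePosixPath.parents`) rather than raw string prefixes also keeps a rule for /home1 from accidentally matching /home10.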
An incremental back-up typically backs up all data that has changed since the last back-up of any kind.
Historically, such back-ups were file-based, meaning that they backed up every file that had changed since the last back-up.
The challenge with this from a modern data-protection standpoint is that we are trying in every way to minimise the I/O impact of back-ups on the server (especially when backing up VMs), and backing up a 10 GB file because 1 MB of it has changed is not very efficient.
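A file-based incremental can be sketched as a timestamp comparison. This toy Python model (the `FileInfo` record, the sizes and the timestamps are all hypothetical) selects every file modified after the previous back-up and totals the bytes it would copy; a real agent would typically read modification times from the file system, for example via `os.stat()`.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    mtime: float  # last-modification timestamp (seconds)


def file_incremental(files: list[FileInfo], last_backup_time: float) -> list[FileInfo]:
    """Select every file modified since the previous back-up of any level."""
    return [f for f in files if f.mtime > last_backup_time]


# Hypothetical catalog: 1 MB inside the 10 GB database file changed after the
# last back-up, so its mtime is newer; the notes file has not been touched.
files = [
    FileInfo("/data/sales.mdf", 10 * GB, mtime=200.0),
    FileInfo("/data/notes.txt", 4_096, mtime=50.0),
]

selected = file_incremental(files, last_backup_time=100.0)
bytes_copied = sum(f.size_bytes for f in selected)
print([f.path for f in selected], f"{bytes_copied / GB:.0f} GB copied")
```

Because the unit of selection is the whole file, the 1 MB change inside the database file still costs 10 GB of reads and transfer.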
This is why many vendors have switched to block-based incremental back-ups, which back up only the blocks that have changed. This is most commonly done when back-up software protects VMware or Hyper-V virtual machines through the hypervisors' APIs.
The back-up application tells the relevant API that it is performing a block-based incremental back-up, and the API returns a list of the blocks to back up.
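Hypervisors track changed blocks themselves (VMware calls this Changed Block Tracking; Hyper-V uses Resilient Change Tracking), so the back-up application does not have to scan the disk. The Python sketch below swaps in a different, simpler technique that produces the same kind of list: hashing fixed-size blocks and comparing them with the hashes recorded at the previous back-up. The 4 KiB block size and the function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return a SHA-256 digest for each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes: list[bytes], new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks that are new or differ from the previous back-up."""
    new_hashes = block_hashes(new_data, block_size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or h != old_hashes[i]
    ]


# A 1 MiB disk image in which one 4 KiB region is overwritten.
disk_v1 = bytes(1024 * 1024)
baseline = block_hashes(disk_v1)          # recorded at the previous back-up

disk_v2 = bytearray(disk_v1)
disk_v2[8192:8192 + BLOCK_SIZE] = b"\xff" * BLOCK_SIZE   # overwrite block 2

changed = changed_blocks(baseline, bytes(disk_v2))
print(changed, len(changed) * BLOCK_SIZE, "bytes to copy")
```

Only one block of the 256 in the image is flagged. The trade-off of the hashing approach is that it still reads every block; change tracking inside the hypervisor avoids that read, which is why it matters for minimising I/O on the host.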
Although the term has meant different things over the years, it is now widely accepted that a differential back-up backs up all data that has changed since the last full back-up.
This type of back-up was much more in vogue in the days of tape, as it minimised the number of tapes required for a restore. A restore needed the latest full, followed by the latest differential, followed by every incremental taken since that differential.
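The difference between the two levels is easiest to see by tracking what each back-up captures. In this toy Python model (the day-by-day schedule and the single-letter item names are hypothetical), a differential collects everything changed since the last full, while an incremental collects only what changed since the previous back-up of any level.

```python
def backup_contents(schedule: list[str], changes: list[set[str]]) -> list[set[str]]:
    """Return the set of items each day's back-up captures.

    schedule[d] is "full", "diff" or "incr"; changes[d] holds the items
    modified on day d. This toy model treats every item ever written as
    still present, so a full captures all of them.
    """
    everything: set[str] = set()
    since_full: set[str] = set()   # changed since the last full
    since_any: set[str] = set()    # changed since the last back-up of any level
    contents = []
    for level, changed in zip(schedule, changes):
        everything |= changed
        since_full |= changed
        since_any |= changed
        if level == "full":
            contents.append(set(everything))
            since_full, since_any = set(), set()
        elif level == "diff":
            contents.append(set(since_full))
            since_any = set()          # a differential resets only the "since any" window
        else:  # "incr"
            contents.append(set(since_any))
            since_any = set()
    return contents


schedule = ["full", "incr", "incr", "diff", "incr"]
changes = [{"a", "b"}, {"c"}, {"d"}, {"e"}, {"f"}]

for day, (level, items) in enumerate(zip(schedule, backup_contents(schedule, changes))):
    print(day, level, sorted(items))
```

Restoring day 4 therefore takes the full from day 0, the differential from day 3 and the incremental from day 4; the incrementals from days 1 and 2 are not needed because the differential already contains their changes.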
If you are still doing tape-based back-ups, consider moving from weekly fulls to a monthly full, weekly differentials, and daily incrementals.
A restore will need to load at most one more back-up than it would under a weekly-full schedule, yet the change saves a tremendous amount of tape and network bandwidth. This schedule has long been popular with those still using tape.
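The "one more back-up" claim can be checked with a short Python sketch. It assumes a hypothetical 28-day month with a back-up every day, and computes, for each day, how many back-ups a restore would have to load under each schedule.

```python
def weekly_full(days: int) -> list[str]:
    """Full every 7th day, incremental on the other days."""
    return ["full" if d % 7 == 0 else "incr" for d in range(days)]


def monthly_full(days: int) -> list[str]:
    """Full on day 0, differential every 7th day after that, incremental otherwise."""
    return ["full" if d == 0 else "diff" if d % 7 == 0 else "incr" for d in range(days)]


def restore_chain(schedule: list[str], day: int) -> list[int]:
    """Days whose back-ups must be loaded to restore the state at `day`.

    Latest full, then the latest differential after it (if any), then every
    incremental taken after that point.
    """
    last_full = max(d for d in range(day + 1) if schedule[d] == "full")
    diffs = [d for d in range(last_full + 1, day + 1) if schedule[d] == "diff"]
    start = diffs[-1] if diffs else last_full
    chain = [last_full] + ([start] if diffs else [])
    chain += [d for d in range(start + 1, day + 1) if schedule[d] == "incr"]
    return chain


DAYS = 28  # hypothetical four-week month
for name, schedule in [("weekly full", weekly_full(DAYS)),
                       ("monthly full + weekly diff", monthly_full(DAYS))]:
    worst = max(len(restore_chain(schedule, d)) for d in range(DAYS))
    print(f"{name}: worst-case restore loads {worst} back-ups")
# weekly full: worst-case restore loads 7 back-ups
# monthly full + weekly diff: worst-case restore loads 8 back-ups
```

The worst case falls on the last day of a week: a full plus six incrementals in the first schedule, versus the monthly full, a differential and six incrementals in the second.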
The advent of disk and deduplication has made full and differential back-ups passé. As mentioned previously, the reason we did the occasional full and differential back-ups was to minimise the number of tapes necessary to perform a restore.
This no longer applies in the world of disk back-ups. As long as a product has been architected to fully utilise disk, restoring data from thousands of incrementals should take no more time than restoring it from a single full.
This is because the back-up system simply keeps a record of where every file or block resides in its storage, and during a restore it transfers those files or blocks back to the client.
How those files or blocks got there is irrelevant in a modern back-up world. Forever-incremental back-up (one initial full followed only by incrementals), especially when implemented with a block-based approach, is the most efficient way to update your back-up repository with the latest information from each back-up client.
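A forever-incremental repository can be sketched as a catalog that records, for every block of the client's disk, where its newest copy is stored. In this hypothetical Python `BlockCatalog` (not any product's on-disk format), restore cost depends only on how many blocks the image contains, not on how many back-ups delivered them.

```python
class BlockCatalog:
    """Toy forever-incremental repository (hypothetical, not a product's format).

    Every back-up appends only the blocks it received; `latest` remembers,
    for each block index of the client's disk, where the newest copy lives.
    """

    def __init__(self) -> None:
        self.store: list[bytes] = []      # append-only block storage
        self.latest: dict[int, int] = {}  # block index -> position in store

    def ingest(self, blocks: dict[int, bytes]) -> None:
        """Record one back-up: a mapping of block index -> block contents."""
        for index, payload in blocks.items():
            self.store.append(payload)
            self.latest[index] = len(self.store) - 1

    def restore(self) -> bytes:
        """Rebuild the disk image from the newest copy of each block.

        Assumes block indices are contiguous from 0 (true for this example).
        """
        return b"".join(self.store[self.latest[i]] for i in sorted(self.latest))


catalog = BlockCatalog()
catalog.ingest({i: bytes([i]) * 4 for i in range(4)})  # the one and only full
for n in range(1000):                                   # 1,000 incrementals
    catalog.ingest({n % 4: n.to_bytes(4, "big")})       # each changes one block

image = catalog.restore()
print(len(catalog.latest), "blocks read;", len(catalog.store), "blocks stored")
```

After one full and 1,000 incrementals the restore still reads exactly four blocks. The append-only `store` also keeps older block versions, which a real system could use to restore earlier points in time.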
Windows systems use a file attribute called the archive bit to determine whether a file has changed since the last back-up. Any modification to a file sets its archive bit, after which a back-up of any level will back it up.
After a full or incremental back-up copies the file, the back-up application clears the archive bit, so the file will not be backed up again until it is modified or the next full back-up runs. Differential back-ups leave the bit set, which is why each differential picks up everything that has changed since the last full.
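The archive-bit rules can be modelled in a few lines of Python. This is a simplified, in-memory sketch with a hypothetical `File` class; on a real Windows system the bit is the `FILE_ATTRIBUTE_ARCHIVE` attribute, which Python exposes on Windows through `os.stat().st_file_attributes`. The model follows the common convention that full and incremental back-ups clear the bit while differentials leave it set.

```python
class File:
    """Minimal stand-in for a file and its archive bit (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.archive = True  # a newly created file starts with the bit set

    def modify(self) -> None:
        self.archive = True  # any write sets the bit again


def run_backup(files: list[File], level: str) -> list[str]:
    """Return the names copied by a back-up of `level`, updating archive bits.

    full: copy every file, then clear every bit
    incr: copy files whose bit is set, then clear the bits
    diff: copy files whose bit is set, but leave the bits set
    """
    if level == "full":
        copied = [f.name for f in files]
    else:
        copied = [f.name for f in files if f.archive]
    if level in ("full", "incr"):
        for f in files:
            f.archive = False
    return copied


a, b = File("a.doc"), File("b.xls")
files = [a, b]
print(run_backup(files, "full"))  # ['a.doc', 'b.xls']
a.modify()
print(run_backup(files, "diff"))  # ['a.doc']  (bit left set)
print(run_backup(files, "incr"))  # ['a.doc']  (bit now cleared)
print(run_backup(files, "incr"))  # []
```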
Many back-up purists dislike the archive bit, if only because it should really be called the back-up bit, since back-ups are not archives.
Another problem is that two back-up applications using the archive bit on the same system will step on each other: each clears the bit, so a change captured by one application's incremental can be missed by the other's.
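The conflict is easy to reproduce in a model. In this hypothetical Python sketch, two products (labelled `app_A` and `app_B` purely for illustration) share one set of archive bits; whichever runs first clears the bit, and the other never sees the change.

```python
def incremental(app: str, archive_bits: dict[str, bool],
                repositories: dict[str, list[str]]) -> None:
    """Copy every file whose archive bit is set into `app`'s repository, then clear it.

    The bits are shared, OS-level state, so clearing them affects every
    application on the host.
    """
    for name, is_set in archive_bits.items():
        if is_set:
            repositories.setdefault(app, []).append(name)
            archive_bits[name] = False


# Both files were already backed up, so their bits start cleared.
archive_bits = {"ledger.xls": False, "notes.txt": False}
repositories: dict[str, list[str]] = {}

archive_bits["ledger.xls"] = True                  # a user edits the spreadsheet
incremental("app_B", archive_bits, repositories)   # second product runs first
incremental("app_A", archive_bits, repositories)   # primary product finds nothing
print(repositories)  # {'app_B': ['ledger.xls']}
```

`app_A` ends up with no copy of the edited spreadsheet, even though its own incremental ran after the edit.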
Most companies' move to virtualisation, the use of back-up APIs that operate at the virtualisation layer, and block-based incremental back-ups have together made the archive bit less important than it used to be.
It really only applies to host-based back-ups, which are becoming rarer every day.
(Reporting by W. Curtis Preston, Network World)