BACKUP VS ARCHIVE – WHAT DIFFERENCE DOES IT MAKE?
Some words have several meanings, and it is easy to confuse their purpose and usage. For instance, what is the difference between continually and continuously? They are often used interchangeably, but there are key differences. According to Dictionary.com, continually means “very often; at regular or frequent intervals,” and continuously means “unceasingly; constantly; without interruption.” For example, one executive continually reads the stock reports at noon each day, while another reads the stock reports continuously, all day long without stopping. The latter would be quite tiresome and unproductive. Similarly, backup and archive are sometimes used interchangeably, but important distinctions exist, especially in practice. Let’s take a closer look.
Backup and Archive – It Makes a Difference
Backup (often called Backup and Restore) means making a copy of current data so that it can be used to restore the data if the original is corrupted, deleted or destroyed, whether unintentionally or intentionally. We discussed these data destroyers in a previous BlogBytes called the “Backup Blacklist.” Backups are often done incrementally and kept for set lengths of time (seven days, two weeks, etc.) based on user-set policies that consider the value of the data, as well as internal and external requirements.
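To make the retention idea concrete, here is a minimal Python sketch of a time-based policy. The function name `expired_backups` and the seven-day window are illustrative assumptions, not the behavior of any particular backup product.

```python
from datetime import date, timedelta

def expired_backups(backup_dates, today, retention_days):
    """Return the backup dates that fall outside the retention window.

    Hypothetical helper: a real backup tool applies its own policy engine.
    """
    cutoff = today - timedelta(days=retention_days)
    # Anything taken before the cutoff is past its retention period.
    return [taken for taken in backup_dates if taken < cutoff]

# Fifteen daily backups, checked against a seven-day retention policy.
daily = [date(2024, 3, day) for day in range(1, 16)]
to_prune = expired_backups(daily, today=date(2024, 3, 15), retention_days=7)
```

In this sketch the backups taken before the cutoff are candidates for pruning, while the most recent week stays available for restore.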
An archive stores a single instance of the data or data sets, a historical collection explicitly chosen for potential long-term future use. A key distinction here is that archived data is the original or single copy and is typically no longer in current use.
Some treat multiple backups as their data archive and unnecessarily create a mountain of information that consumes space and drives up costs. Once data is no longer in the ‘active’ category but a single copy is still needed in case it is recalled, it should be moved to the archive for long-term retention. This practice can free up primary storage and backup space, lessen backup management overhead, and make the information easier to classify and retrieve from the archive, rather than trying to sift through multiple copies of backed-up information.
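As a rough illustration of that move, the Python sketch below sorts records into "active" and "archive" groups by how recently they were used. The one-year threshold, the function name and the sample file names are all assumptions made for the example; each organization would set its own criteria.

```python
from datetime import datetime, timedelta

def split_active_and_archive(records, now, inactive_after_days=365):
    """Split (name, last_used) pairs into active names and archive candidates.

    Illustrative only: the 365-day default is an assumed policy value.
    """
    cutoff = now - timedelta(days=inactive_after_days)
    active, archive = [], []
    for name, last_used in records:
        # Data untouched since before the cutoff is no longer 'active'.
        if last_used < cutoff:
            archive.append(name)
        else:
            active.append(name)
    return active, archive

records = [
    ("q3_forecast.xlsx", datetime(2023, 11, 15)),
    ("2019_audit.pdf", datetime(2019, 6, 30)),
]
active, archive = split_active_and_archive(records, now=datetime(2024, 1, 1))
```

Only the archive candidates would then be moved off primary storage and dropped from the regular backup cycle.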
Mine that Data
Most archive management software provides capabilities not typically found in backup processes, such as cataloging and metadata search. Information Lifecycle Governance (ILG) enables that cataloging, along with eDiscovery, defensible deletion and data retention, on the lowest-cost data storage infrastructure. The metadata could include names, labels, data types, owners, dates and more. This provides the means to mine the archive to address a variety of important requests, such as an internal audit, customer inquiry or regulatory requirement. Data stored on tape can be mined too, and with the Linear Tape File System (LTFS), mining metadata on LTO tape becomes even easier, especially when used with archive management software offered by a variety of LTFS-supporting providers.
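A metadata catalog can be pictured as a simple list of records that is filtered by field. The Python sketch below is a toy, in-memory stand-in, not the interface of any archive management product; the `ArchiveEntry` fields mirror the metadata listed above, and the entries themselves are made up.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ArchiveEntry:
    # Field names follow the metadata mentioned in the text (hypothetical schema).
    name: str
    label: str
    data_type: str
    owner: str
    created: date

def search_catalog(catalog, **criteria):
    """Return entries whose metadata matches every given field exactly."""
    return [
        entry for entry in catalog
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]

catalog = [
    ArchiveEntry("2019_audit.pdf", "audit", "pdf", "finance", date(2019, 6, 30)),
    ArchiveEntry("contract_0042.pdf", "legal", "pdf", "legal", date(2018, 2, 1)),
    ArchiveEntry("sensor_dump.csv", "research", "csv", "engineering", date(2020, 9, 9)),
]

# An internal audit request: find everything labeled 'audit' owned by finance.
audit_hits = search_catalog(catalog, label="audit", owner="finance")
```

The point is that the search runs against the catalog, so the request can be answered without reading through the archived data itself.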
Where Do Tape and Disk Fit?
Enterprise disk and flash are used for primary data storage and can be backed up to other storage devices, including deduplicated disk and tape. Deduplication reduces the amount of space typically needed to store data, thereby stretching the storage investment dollar. Bear in mind, though, that deduplication needs to see multiple copies of the same data set to do its dedupe magic. When the original file is put in the archive for long-term storage, it is the only copy. Therefore, deduplication will have little effect on this single version of the original data that is now in the archive. That and other benefits make tape the prime choice for archive storage to contain costs while providing protection for original content. This can put money back into the enterprise piggy bank.
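The reason deduplication helps backups but not a single archived copy can be shown with a small Python sketch. It assumes whole-object deduplication keyed on a SHA-256 hash, which is a simplification: real products commonly deduplicate at the block or chunk level. The function name and sample data are illustrative.

```python
import hashlib

def stored_bytes_after_dedup(blobs):
    """Bytes kept when identical objects are stored only once.

    Simplified whole-object dedup; assumed for illustration only.
    """
    seen = set()
    total = 0
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            # First time this content is seen, so it must be stored.
            seen.add(digest)
            total += len(blob)
    return total

data = b"x" * 1000

# Seven nightly backups of an unchanged file collapse to one stored copy.
backup_raw = 7 * len(data)
backup_stored = stored_bytes_after_dedup([data] * 7)

# A single archived original has no duplicates to remove.
archive_stored = stored_bytes_after_dedup([data])
```

Here the seven backups shrink from 7,000 bytes to 1,000, while the lone archive copy stays at 1,000 bytes with nothing saved.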
One last note: since the archive contains original, single-copy data, make a second copy on tape and move it offsite for the ultimate protection. Essentially, it is a backup of the archive. LTO tape is low cost…do it now.
To sum up, with carefully chosen descriptors, backup and archive processes need to occur continually to help protect data, contain costs, and provide management information to keep the enterprise running continuously.