Deduplication provides an efficient way to transmit and store data by identifying and eliminating duplicate blocks of data during backups.
Deduplication offers the following benefits:
- Optimizes use of storage media by eliminating duplicate blocks of data.
- Reduces network traffic by sending only unique data during backup operations.
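The idea behind both benefits can be sketched in a few lines. This is a minimal illustration, not any product's implementation: it assumes fixed-size blocks and uses a SHA-256 digest as each block's identity, and the `BlockStore` class and the tiny `BLOCK_SIZE` are hypothetical names chosen for the example.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks for illustration; real systems use much larger ones


class BlockStore:
    """Hypothetical store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def write(self, data):
        """Store data; return the ordered digests (the "recipe") needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # a duplicate block is not stored again
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore()
first = store.write(b"AAAABBBBAAAACCCC")   # 4 blocks, 3 of them unique
second = store.write(b"AAAABBBBDDDD")      # 3 blocks, only DDDD is new
```

Seven logical blocks were written, but `store.blocks` holds only four, and each backup can still be rebuilt in full from its recipe.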
Inline and post-process
- It lengthens the time to complete the backup, leading to longer backup windows, degraded performance during business hours, and the inability to start the next backup because the previous backup job is still running.
With post-process deduplication, the backup is briefly placed on disk-based staging storage prior to being deduplicated.
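The difference between the two modes is mostly a matter of when the deduplication pass runs. The sketch below is a simplified model under stated assumptions: blocks are already split, SHA-256 digests identify them, and the function names and the "peak staged blocks" figure are hypothetical, introduced only to make the staging cost visible.

```python
import hashlib


def dedupe(blocks, index):
    """Add unseen blocks to index (digest -> block); return the digests in order."""
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        index.setdefault(digest, block)
        recipe.append(digest)
    return recipe


def inline_backup(blocks, index):
    """Inline: deduplicate while the backup runs, so nothing is staged."""
    return dedupe(blocks, index), 0  # (recipe, peak staged blocks)


def post_process_backup(blocks, index):
    """Post-process: land the raw backup on staging first, deduplicate afterwards."""
    staging = list(blocks)           # the full, undeduplicated backup sits on staging
    peak_staged = len(staging)
    recipe = dedupe(staging, index)  # deduplication pass runs after the backup lands
    staging.clear()                  # staging space is reclaimed
    return recipe, peak_staged


backup = [b"alpha", b"beta", b"alpha", b"alpha"]
inline_index, post_index = {}, {}
inline_recipe, inline_staged = inline_backup(backup, inline_index)
post_recipe, post_staged = post_process_backup(backup, post_index)
```

Both paths end with the same two unique blocks stored. In this model the only difference is that the post-process path temporarily needs staging capacity for all four blocks.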
Source-side deduplication typically uses a deduplication engine on the client that checks for duplicates against a centrally located deduplication index, usually on the backup server or media server.
However, running source-side deduplication adds hashing, a processor-intensive operation, to the client.
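A rough sketch of that exchange follows. It assumes the client hashes each block with SHA-256 and asks the index which digests are missing before sending anything. `CentralIndex`, `missing`, and `upload` are hypothetical names, not a real backup product's API, and the small cost of sending the digests themselves is ignored.

```python
import hashlib


class CentralIndex:
    """Hypothetical stand-in for the index on the backup or media server."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def missing(self, digests):
        """Return the digests the server does not hold yet."""
        return {d for d in digests if d not in self.blocks}

    def upload(self, digest, block):
        self.blocks[digest] = block


def source_side_backup(blocks, index):
    """Hash on the client, then send only blocks the index lacks. Returns bytes sent."""
    hashed = [(hashlib.sha256(b).hexdigest(), b) for b in blocks]  # the client-side CPU cost
    needed = index.missing(d for d, _ in hashed)
    sent = 0
    for digest, block in hashed:
        if digest in needed:
            index.upload(digest, block)
            needed.discard(digest)  # do not resend a repeat within the same backup
            sent += len(block)
    return sent


index = CentralIndex()
monday = [b"x" * 100, b"y" * 100, b"x" * 100]
tuesday = [b"x" * 100, b"y" * 100, b"z" * 100]
sent_monday = source_side_backup(monday, index)
sent_tuesday = source_side_backup(tuesday, index)
```

Each backup is 300 bytes, but only 200 bytes cross the network on Monday and 100 on Tuesday. The saving is paid for by the hashing in the first line of `source_side_backup`, which runs on the client.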
Target-side deduplication is generally better suited to data-intensive environments. It runs the deduplication at the storage level, removing the need for clients to have enough ‘horsepower’.
A final criterion to review when evaluating deduplication technologies is how long data is retained: the more data that is examined, the greater the likelihood that duplicates are found, and hence the greater the space savings.
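A back-of-the-envelope model shows how retention affects savings. The numbers here are purely hypothetical: every daily backup is assumed to contain 100 blocks, of which 5 are new relative to the previous day, and all retained backups are deduplicated against each other.

```python
def dedup_ratio(days_retained, blocks_per_backup=100, changed_per_day=5):
    """Logical blocks retained divided by unique blocks actually stored.

    Assumes a hypothetical workload in which each daily backup differs
    from the previous one by exactly `changed_per_day` new blocks.
    """
    logical = blocks_per_backup * days_retained
    unique = blocks_per_backup + changed_per_day * (days_retained - 1)
    return logical / unique


for days in (1, 7, 30, 90):
    print(f"{days:>3} days retained -> {dedup_ratio(days):.1f}:1")
```

Under these assumptions the ratio rises with every additional day retained, because each new backup adds 100 logical blocks but only 5 stored ones. Real workloads will have different change rates, so the figures are only indicative.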