As deduplication takes hold and IT organizations gain maturity in its use, it’s only natural that users will seek greater optimization of the technology. One would think that global deduplication would be on that optimization checklist. Curiously, recent ESG data protection research found that the ability to deduplicate across systems, as opposed to just within a system, ranked near the bottom of respondents’ evaluation and selection criteria—well below their more pressing concerns such as cost, ease of implementation/use, impact on backup/recovery performance, integration with existing backup processes, and scalability.
What is global deduplication? Deduplication in backup finds redundant data and ensures that only unique data is written, so data is stored more efficiently. Duplicates can be identified in two ways: locally, within a single system (backup data passing through an individual system is compared only with data that has passed through that same system), or globally, across systems (backup data passing through an individual system is compared with data that has passed through that system as well as the other systems in the domain). The latter, global deduplication, can result in higher deduplication ratios since there are more comparisons and, therefore, more chances to find duplicate data.
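To make the local-versus-global distinction concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the fixed 4-byte chunking, the SHA-256 fingerprints, the sample streams, and the function names are all assumptions made for the example.

```python
import hashlib

def chunks(data: bytes, size: int = 4):
    """Split data into fixed-size chunks (an assumption; chunking schemes vary)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def unique_chunks_written(stream: bytes, index: set) -> int:
    """Write only chunks whose fingerprint is not yet in the index.

    Returns the number of chunks that had to be written.
    """
    written = 0
    for chunk in chunks(stream):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in index:
            index.add(fingerprint)
            written += 1
    return written

# Hypothetical backup streams arriving at two separate deduplication systems.
system_a = b"AAAABBBBCCCC"
system_b = b"BBBBCCCCDDDD"

# Local deduplication: each system keeps its own index and cannot see the other's data.
local_written = (unique_chunks_written(system_a, set())
                 + unique_chunks_written(system_b, set()))

# Global deduplication: both systems check against one shared index.
shared_index = set()
global_written = (unique_chunks_written(system_a, shared_index)
                  + unique_chunks_written(system_b, shared_index))

print(local_written, global_written)
```

In this toy case the two systems hold overlapping data, so the shared index writes fewer chunks than the two isolated indexes do. How much is saved in practice depends entirely on how much data is actually common across systems.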
Global deduplication is often a byproduct of the solution's underlying architecture or approach to performing redundancy checks, such as source-side software or a grid or clustered-node architecture. Backup software solutions that offer deduplication at the global level include CommVault Simpana, EMC Avamar, and Symantec NetBackup PureDisk. Backup target devices with global deduplication include ExaGrid EX Series, FalconStor FDS, HP VLS, IBM ProtecTIER, NEC HydraStor, and Sepaton DeltaStor.
Are the potentially higher reduction ratios afforded by global deduplication the real benefit? They are a benefit, but probably not the biggest one. Poor reduction ratios are not the real issue with solutions that lack global deduplication. The bigger issue is the inefficiency of managing the silos that develop in local-only deduplication solutions.
Data growth rates in the 10-30% range are the norm. Deduplication helps stem the need to scale the backup environment. However, at some point, backup throughput and/or capacity requirements are going to “break” the solution. Adding more backup infrastructure can get you back on track, but it introduces a new problem: more points of management.
What the global deduplication approaches have in common is a multi-node architecture with the ability to manage multiple deduplication systems as one. The throughput scalability, high availability, and load balancing benefits of these architectures are what deserve the attention. These features reduce administrative overhead, oftentimes the larger burden in the backup TCO equation, and, importantly, tie back to the top purchasing criteria mentioned earlier, including cost, ease of use, performance, and scalability.
Read more of Lauren’s blog entries at Data Protection Perspectives.





