Introduction
ESG Lab conducted this Validation Report in November 2009 prior to the public announcement of Albireo. At the time, we evaluated a Beta version of Albireo as Permabit was in early discussions under NDA with storage OEMs and beginning initial implementations. Permabit publicly announced Albireo High Performance Data Optimization Software on June 7, 2010.
Background
Back in 2001, when Data Domain was founded and EMC purchased Belgian startup FilePool for its Centera product line, few in the industry had heard of data deduplication. Fewer still were aware that Permabit, founded in 2000, was hard at work developing scalable data deduplication technology. Today, it’s hard to find anyone in the storage industry who hasn’t heard about data deduplication.
As shown in Figure 1, a recently completed ESG survey of 398 North American IT decision makers indicates extreme interest in using data deduplication technology for data protection. In fact, 43% of respondents currently utilize or plan to use some type of data deduplication technology to eliminate redundant data.[1] Within the data protection market, ESG expects that interest in and adoption of deduplication will increase significantly over the next five years as it reaches mainstream market adoption.
Interest outside of the backup and recovery market is growing as well. Solutions with built-in deduplication support are being used for disk-based archival (e.g., EMC Centera, Permabit Enterprise Archive). In the primary storage market, data deduplication has been added to network attached storage (NAS) systems (e.g., NetApp FAS). As a matter of fact, ESG’s research indicates that space efficiency and data reduction rank second and third, respectively, in the most important features and attributes when considering a NAS solution. This survey of decision makers within enterprise-class organizations indicates that 30% would not purchase a NAS system without data reduction and 46% would strongly prefer a solution with this attribute.[2] These are surprisingly strong results since the majority of NAS solutions don’t support data deduplication. Given the dramatic benefits that have been realized with existing data deduplication technologies, ESG believes that it’s only a matter of time before policy-based deduplication support is added to primary block-based storage systems as well.
Introducing Permabit Deduplication Technology
Permabit’s field-proven data deduplication engine is now available as a software development kit (SDK) that provides unified data deduplication advisory services. The Albireo SDK is unique in its ability to provide a wide variety of deduplication services. The Albireo SDK can be used to:
- Reduce capacity within file- or block-based storage systems (e.g., NAS appliances or FC disk arrays)
- Process objects at the sub-file or single instance storage (SIS) level
- Remove duplicates in real-time (a.k.a., inline) or later after data has arrived (post-process)
- Eliminate application performance concerns
- Scale from a single storage controller to a cluster of storage controllers or a cluster of deduplication appliances communicating with a storage system over an industry standard Ethernet interface
Permabit was founded in 2000 by MIT engineers with a goal of developing a scalable enterprise-class archive product with built-in data deduplication. That product, now known as the Permabit Enterprise Archive, is shown towards the left in Figure 2. Industry standard servers are arranged in a grid, with each server acting as either an access or storage node. The storage nodes are packed full of high capacity SATA drives and presented as a network attached file system (NAS). Servers can be added to the grid for increased performance and capacity.
In the Permabit Enterprise Archive product, a global pool with deduplication technology is implemented within a grid of servers. Permabit has extracted this core deduplication technology to create the Albireo software development kit. Albireo deduplication advisory services are accessed by software running within a storage system through an application programming interface (API) provided by Permabit. A storage system with Albireo running within a single controller attached to a number of drive enclosures is shown towards the right in Figure 2.
Albireo indexes chunks of data and provides advisory notifications when duplication is detected. The software running in the storage system decides whether to take the advice. If the advice is taken, it is the responsibility of the storage system software to update references to duplicate data. The Permabit Albireo API supports both block and stream-based access methods. The stream-based API provides content-aware segmentation to optimize deduplication based on the file type. Duplicate advisory services can be provided synchronously, or asynchronously using a registered callback. Taken together, the Albireo SDK provides a flexible and powerful array of deduplication services.
ESG Lab Validation
ESG Lab evaluated the Albireo SDK during two days of hands-on testing at Permabit headquarters in Cambridge, Massachusetts. The evaluation began with an overview of how Albireo deduplication advisory services are used within an existing storage system. As shown in Figure 3, the software running within a storage system uses an API call to send new data and addressing information to Albireo. Albireo ingests the data and computes a SHA-256 hash. A two-stage lookup (memory and, if needed, disk) is performed to see if the data has already been stored. If the data is a duplicate, the API returns the location of the pre-existing data. If the data is new, Albireo remembers the SHA-256 hash and the chunk’s location so that it can detect duplicate data in the future.
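The lookup flow described above can be illustrated with a small Python sketch. This is not the Albireo API: the class name AdvisoryIndex, the method index_block, and the location strings are hypothetical, a short hash prefix stands in for Permabit's patented first-stage memory lookup, and a dictionary stands in for the on-disk second stage.

```python
import hashlib
from typing import Dict, Optional, Set


class AdvisoryIndex:
    """Toy stand-in for a deduplication advisory index (hypothetical, not Albireo)."""

    def __init__(self) -> None:
        # Stage 1: compact in-memory filter keyed by a short hash prefix (assumption).
        self._memory_stage: Set[bytes] = set()
        # Stage 2: full digest -> location; a dict stands in for the on-disk index.
        self._disk_stage: Dict[bytes, str] = {}

    def index_block(self, data: bytes, location: str) -> Optional[str]:
        """Return the location of an earlier identical chunk, or None if the chunk is new."""
        digest = hashlib.sha256(data).digest()
        prefix = digest[:4]
        if prefix in self._memory_stage:
            # Possible duplicate: confirm against the full digest in stage 2.
            existing = self._disk_stage.get(digest)
            if existing is not None:
                return existing
        # New chunk: remember its hash and location for future lookups.
        self._memory_stage.add(prefix)
        self._disk_stage[digest] = location
        return None


index = AdvisoryIndex()
first = index.index_block(b"A" * 4096, "lba:0")    # new data
second = index.index_block(b"A" * 4096, "lba:8")   # same content at a new address
third = index.index_block(b"B" * 4096, "lba:16")   # different content
```

In this sketch the index only reports where matching data already lives; acting on that advice is left to the caller, which mirrors the advisory model described in this report.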
ESG Lab evaluated the Albireo SDK using a pair of programs developed by Permabit:
PBSCAN: This utility uses Albireo advisory services to determine the savings that can be achieved with deduplication. The program is routinely used at prospective customer sites to determine the benefits of deduplication with real-world data sets.
DD2FS: An open-source user-space file system was modified to illustrate the ease of integrating Albireo into an existing file system.
Each test program was implemented as a single process with the Albireo engine running as a separate service. The pbscan utility was implemented with multiple threads working in parallel. As shown in Figure 4, the test bed used for this phase of the evaluation consisted of a single quad core Intel Xeon 3 GHz processor with 4 GB of RAM. For test purposes only and to more easily discern the resource utilization by Albireo, all but one of the processor cores were disabled in BIOS.
While ESG Lab testing was performed with a single processor core, it should be noted that the extreme scalability of the Albireo engine has been proven in production customer environments. Later in this report, we’ll take a look at how Permabit’s underlying deduplication technology is routinely deployed within clusters of multi-core servers.
Capacity Savings
ESG Lab used the Albireo-enabled pbscan utility and real-world application data collected from Permabit’s production IT servers to evaluate the capacity savings that can be achieved with Permabit deduplication advisory services. The results for two data sets are summarized in Figure 5 and Table 1.


What the Numbers Mean
- A 43 GB set of common office productivity files (e.g. documents, spreadsheets, presentations) was reduced by 32.67%.
- Four VMware virtual server images with a total capacity of 157 GB were reduced to only 4.3 GB. VMware virtual server images are often highly redundant—especially when each of the virtual machines is running the same guest operating system as was the case for this test. In this example, 97% of the capacity required for the VMware images can be saved with Albireo enabled deduplication.
The experiment was repeated for a pair of Microsoft Hyper-V images and a week’s worth of Microsoft Exchange backup images. One of the backup images was a full backup; the other images were incremental. The results for all of the data types evaluated by ESG Lab are summarized in Figure 6 and Table 2. Note that the savings are shown as a deduplication ratio instead of a percentage of capacity saved.


What the Numbers Mean
- A pair of Microsoft Hyper-V images was reduced by a factor of 2.1:1
- A single week’s worth of Microsoft Exchange backups (weekly full, daily incremental) was reduced from 2,203 GB to 501 GB, providing an excellent deduplication ratio of 7.4:1
- A deduplication ratio of 36.2:1 was recorded for four VMware images.
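Because this report quotes savings both as a percentage and as a ratio, a short conversion sketch may help when comparing the figures. The function names are illustrative only; the arithmetic is the standard relationship between the two measures.

```python
def ratio_to_percent_saved(ratio: float) -> float:
    """Convert a deduplication ratio (e.g. 36.2 for 36.2:1) to percent of capacity saved."""
    return (1.0 - 1.0 / ratio) * 100.0


def percent_saved_to_ratio(percent: float) -> float:
    """Convert percent of capacity saved to a deduplication ratio."""
    return 100.0 / (100.0 - percent)


vmware_percent = ratio_to_percent_saved(36.2)   # about 97.2
hyperv_percent = ratio_to_percent_saved(2.1)    # about 52.4
half_saved_ratio = percent_saved_to_ratio(50.0) # 2.0, i.e. 2:1
```

Note how quickly the ratio grows as the percentage approaches 100: a 2:1 ratio already saves half the capacity, while the step from 90% to 97% saved more than triples the ratio.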
Why This Matters
Data proliferation is a challenge for IT professionals within organizations of all sizes. By eliminating redundant data, data deduplication can significantly reduce capacity requirements. Reducing capacity requirements reduces the cost of storing and protecting data. ESG Lab testing with real-world application data has confirmed that Permabit Albireo deduplication advisory services can be used to reduce capacity requirements for primary storage arrays, disk-based archives, and backups. ESG Lab recorded an outstanding deduplication rate of 97% (36.2:1) for four VMware virtual server images.
Ease of Integration
ESG Lab evaluated the ease of adding deduplication to an existing storage solution using the FUSE open-source user-space file system framework.[3] The FUSE file system is built over the native ext3 Linux file system. There’s no need to patch or recompile the kernel. Permabit created a file system using FUSE named dd2fs. The dd2fs file system was modified to use Albireo to identify and eliminate duplicates. Six Albireo API calls and 52 lines of supporting code were added to the 1,563 line FUSE demonstration program. Synchronous (inline) and asynchronous (post-processing) deduplication modes were demonstrated.
As shown in Figure 7, the update_block function passes data to be written to the Albireo uds_index_block API. A return value of 1 indicates that the block is a duplicate and that it can share storage with the canonical chunk. After a bit of bookkeeping and error checking, the storage associated with the duplicate block is freed. Other than this key function call, the majority of the Albireo-related code changes were isolated to initialization, shutdown, and handling callbacks when operating in asynchronous (post-process) mode.
The FUSE demo was run to see Albireo-enabled deduplication in action. As shown in Figure 8, a 64 KB file full of random data was written to an empty Albireo-enabled file system. A copy of the file with a new name was added. The Linux df and du utilities were used to verify that the user’s view of the file system included the space consumed by both files, yet only a single file’s worth of disk capacity was consumed. The Linux md5sum utility was used to verify that the files were the same. As each file was deleted, the file system capacity and underlying disk capacity were checked.
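The behavior observed in the demo, where two identical files appear at full size to the user while consuming one file's worth of disk, can be sketched with a toy reference-counted block store. This is not dd2fs or Albireo code: the DedupStore class, its method names, and the 4 KB block size are assumptions made for illustration.

```python
import hashlib
import os
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size for this sketch


class DedupStore:
    """Toy file store in which identical blocks share storage via reference counts."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}     # digest -> stored block
        self.refcount: Dict[bytes, int] = {}     # digest -> number of references
        self.files: Dict[str, List[bytes]] = {}  # file name -> ordered block digests

    def write_file(self, name: str, data: bytes) -> None:
        if name in self.files:
            self.delete_file(name)
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest in self.blocks:
                self.refcount[digest] += 1       # duplicate: share the existing block
            else:
                self.blocks[digest] = block      # new block: store it once
                self.refcount[digest] = 1
            digests.append(digest)
        self.files[name] = digests

    def delete_file(self, name: str) -> None:
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:       # last reference gone: free the block
                del self.refcount[digest]
                del self.blocks[digest]

    def logical_bytes(self) -> int:
        """Capacity as the user sees it (sum of file sizes)."""
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_bytes(self) -> int:
        """Capacity actually consumed by stored blocks."""
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = os.urandom(64 * 1024)                  # 64 KB of random data, as in the demo
store.write_file("original.bin", payload)
store.write_file("copy.bin", payload)
logical_after_copy = store.logical_bytes()
physical_after_copy = store.physical_bytes()
store.delete_file("original.bin")
physical_after_first_delete = store.physical_bytes()
store.delete_file("copy.bin")
physical_after_second_delete = store.physical_bytes()
```

Deleting the first file leaves the shared blocks in place because the copy still references them; the space is released only when the last reference is removed.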

Why This Matters
Deduplication is valuable technology that can be frustratingly hard to develop and debug. The field-proven and patented deduplication technology at the core of the Permabit Albireo API incorporates over 25 man years of development. Integrating Albireo using six well-documented API calls consumed only 52 lines of code. ESG Lab is confident that an experienced storage systems architect working alongside a Permabit engineer can complete a proof of concept integration in two weeks or less.
Resource Utilization
Identifying duplicate data is a resource intensive operation. First, data needs to be fed into a deduplication engine. Moving a lot of data can consume a lot of bandwidth. Next, the engine needs an algorithm which can be used to quickly, and accurately, identify duplicates. Most deduplication solutions use cryptographic hashing functions to identify duplicates, but hashing a lot of data can consume a lot of CPU horsepower. Last, but not least, the deduplication solution needs to maintain an index of previously processed data to find and keep track of duplicates. Most deduplication solutions use a two stage lookup, with the first lookup occurring in memory and the second occurring on disk. Indexing lots of possibly duplicate data can consume a significant amount of memory. Each of these resource issues can have an effect on the overall performance of a storage solution.
ESG Lab’s analysis of CPU, memory, and performance efficiency began with a review of Permabit’s patented two stage deduplication detection algorithm.[4] The patent describes a highly efficient two stage index residing in memory and on disk. This novel approach uses a combination of bit sampling and byte differencing to provide a first stage memory lookup that executes very quickly, consumes very little memory, and has a very low probability of false positives. ESG Lab has confirmed that the Permabit deduplication advisory services engine:
- Delivers fast in-memory deduplication lookups that take between 9 and 17 microseconds[5]
- Requires less than 3.5 bytes of memory to index each chunk of deduplicated data[6]
- Allows a single server to deduplicate up to 48 TB of data with under 2.5 GB of RAM[7]
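A rough sizing sketch shows how the per-chunk memory figure translates into index memory. The report does not state which chunk size or unit convention underlies the 48 TB figure, so the 64 KB chunk size, the decimal terabyte, and the function name below are assumptions; the 3.5 bytes per entry is the upper bound quoted above.

```python
def index_memory_bytes(capacity_bytes: int, chunk_bytes: int,
                       bytes_per_entry: float = 3.5) -> float:
    """Estimate in-memory index size: one entry per chunk of indexed data."""
    chunks = capacity_bytes // chunk_bytes
    return chunks * bytes_per_entry


TB = 10**12  # decimal terabyte (assumption)

mem_64k = index_memory_bytes(48 * TB, 64 * 1024)  # 64 KB chunks
mem_4k = index_memory_bytes(48 * TB, 4 * 1024)    # 4 KB chunks need 16x the entries

print(f"64 KB chunks: {mem_64k / 2**30:.2f} GiB")
print(f" 4 KB chunks: {mem_4k / 2**30:.2f} GiB")
```

The estimate scales inversely with chunk size, which is consistent with the footnoted caveat that the capacity a single server can index depends on the chunk size in use.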
A 42 GB set of office productivity files (documents, spreadsheets, presentations, PDFs, etc.) was processed by the Albireo-enabled scan utility. The scan utility was used to quantify the processor, memory, and performance impact of Permabit deduplication advisory services. The results of three tests were compared:
- Scan only: opened the file system and read 42 GB of office productivity files from beginning to end. The 42 GB was spread over 26,094 files.
- Scan and index (64 KB chunk): opened and passed all of the data within each file to the Albireo API. The Albireo API was used to identify and keep track of 64 KB chunks of data in a Permabit index of duplicate candidates.
- Scan and index (4 KB chunk): repeated the second test with a chunk size of 4 KB.
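The scan-and-index tests can be approximated with a short sketch that splits data into fixed-size chunks, hashes each chunk with SHA-256, and counts the bytes that would remain after deduplication. This is not pbscan: the function name scan_chunks and the synthetic in-memory data are hypothetical, and a Python set stands in for the Permabit index.

```python
import hashlib
from typing import Iterable, Tuple


def scan_chunks(blobs: Iterable[bytes], chunk_size: int) -> Tuple[int, int]:
    """Return (total_bytes, unique_bytes) after fixed-size chunk deduplication."""
    seen = set()
    total = unique = 0
    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:      # first time this content has been seen
                seen.add(digest)
                unique += len(chunk)
    return total, unique


def page(value: int) -> bytes:
    """A 4 KB page filled with a single byte value (synthetic test data)."""
    return bytes([value]) * 4096


# Two 64 KB files that differ only in their first 4 KB page.
file_a = b"".join(page(i) for i in range(16))
file_b = page(99) + b"".join(page(i) for i in range(1, 16))

total_64k, unique_64k = scan_chunks([file_a, file_b], 64 * 1024)
total_4k, unique_4k = scan_chunks([file_a, file_b], 4 * 1024)
```

With 64 KB chunks the two files look entirely different, while 4 KB chunks expose the fifteen shared pages. Smaller chunks can find more duplicates at the cost of more index entries and more lookups.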
Linux utilities were used to record CPU and memory utilization. Trace output was used to record the average latency for each Albireo deduplication lookup. In particular, trace data recorded all of the latency associated with a Permabit deduplication advisory services call, including the SHA-256 ingest, the first pass memory index, and a second order index operation for likely duplicate candidates. The elapsed time needed to process the 42 GB file set was used to calculate throughput. The results are summarized in Table 3 and Figure 9.


What the Numbers Mean
- The scan only test spent most of the time reading from disk and consumed very little CPU (less than 1%).
- Permabit deduplication advisory services consumed approximately half (28% to 59%) of a 3 GHz Xeon processor core.
- Scanning and indexing with a larger chunk size consumes slightly more CPU. This is due to the fact that bigger chunks of data were passing through the CPU intensive SHA-256 algorithm.
- A SHA-256 scan and index with a 4 KB chunk size incurred only 43 microseconds of latency. That’s less than 1% of overhead compared to a typical disk-based latency of 5 milliseconds.
- A single 3 GHz CPU core running at less than 75% CPU utilization was able to sustain 124 MB/sec of throughput. ESG Lab is confident that Albireo can deliver significantly more aggregate throughput by scaling the number of processes, processor cores, and servers working in parallel.
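The overhead and throughput figures above follow from simple arithmetic, sketched below. The function names are illustrative, and the use of 1,000 MB per GB is an assumption, since the report does not say which convention was used.

```python
def overhead_percent(added_latency_us: float, base_latency_ms: float) -> float:
    """Added latency as a percentage of a baseline latency."""
    return added_latency_us / (base_latency_ms * 1000.0) * 100.0


def seconds_to_process(data_gb: float, throughput_mb_per_s: float,
                       mb_per_gb: int = 1000) -> float:
    """Elapsed time needed to process a data set at a sustained throughput."""
    return data_gb * mb_per_gb / throughput_mb_per_s


lookup_overhead = overhead_percent(43, 5)   # 43 microseconds against a 5 ms disk access
scan_seconds = seconds_to_process(42, 124)  # 42 GB at 124 MB/sec

print(f"Lookup overhead: {lookup_overhead:.2f}%")
print(f"Scan time: {scan_seconds:.0f} seconds")
```

Under these assumptions the lookup adds well under 1% to a typical disk access, and the 42 GB file set is processed in under six minutes on a single core.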
Why This Matters
Data deduplication is a resource intensive operation that can have a dramatic impact on the overall cost and performance of a storage solution. ESG Lab is confident that resource efficient Permabit deduplication advisory services can be used to architect a cost effective solution that provides a virtually limitless pool of globally deduplicated capacity using industry standard server hardware.
Maturity
ESG Lab performed a high level assessment of Permabit’s software development processes to understand the maturity of the Albireo SDK. The bulk of the code within the Albireo SDK has been used within Permabit’s shipping products for more than six years. Permabit has been using agile software development processes for more than seven years.
Agile software development refers to a group of software development methodologies based on iterative development where requirements and solutions evolve through collaboration between self-organizing cross-functional teams. The term was coined in the year 2001 with the formulation of the Agile Manifesto.[8] Agile methods generally promote a disciplined project management process that encourages frequent inspection and adaptation; a leadership philosophy that encourages teamwork, self-organization, and accountability; a set of engineering best practices that allow for rapid delivery of high-quality software; and a business approach that aligns development with customer needs and company goals.
The code at the core of Albireo SDK is continuously integrated and tested on a two week iteration cycle. Stories captured in an online Wiki are used to manage requirements. Unit, functional, and stress tests are highly automated and run continuously. Developers write the majority of unit and system level tests. All changes are peer reviewed. Face-to-face cross functional interaction is embraced. Reducing code complexity is valued and recognized.
Why This Matters
Data deduplication is complicated technology. When it works as designed, it saves capacity and money. When it fails, it can corrupt data. Storage system vendors looking to add deduplication technology to an existing product must be absolutely sure that the deduplication algorithm will not fail. Rigorous design processes and continuous testing are needed to ensure that the deduplication implementation is bug free. While a more rigorous review is recommended for organizations considering a partnership with Permabit, ESG Lab is very impressed with the maturity and stability of Permabit’s software development and QA processes.
Scalability and Fault Tolerance
Permabit’s global deduplication algorithm is designed to run within a grid for maximum scalability and fault tolerance. Permabit builds systems that can scale to 4.6 petabytes today (that’s 4,600 terabytes). Its Enterprise Archive product is continuously tested with deduplication services running on a grid of three or more servers. A typical entry-level grid at a customer site, comprised of access and storage nodes, has deduplication services running on 11 nodes. The largest grid deployed in production at a customer site is 38 nodes.
ESG Lab performed a series of tests on a nine node Permabit Enterprise Archive system to determine whether deduplication advisory services continue running after multiple hardware failures. A long running directory level file copy operation was started. As shown in Figure 10, a node was powered off and a drive was removed on a different node. The system remained available and the copy operation completed without error.

Why This Matters
ESG Lab’s experience with nearly all of the vendors offering data deduplication solutions indicates that scaling a fault tolerant pool of global deduplication is a difficult task that can take years to complete. ESG Lab has confirmed that the global deduplication technology at the heart of the Permabit Albireo SDK has been deployed on a grid of up to 38 servers in production environments. An error injection test by ESG Lab proved that deduplication services remain available after both a drive and a server failure.
ESG Lab Validation Highlights
- An open-source user-space file system was modified to use Albireo deduplication advisory services using six API calls and only 52 lines of code.
- Synchronous and asynchronous APIs were used to implement inline and post-process data deduplication, respectively.
- The Permabit SDK identified potential capacity savings ranging from 33% to 97% for real-world applications including office productivity files, virtual server images, and e-mail backups.
- Fast and resource efficient deduplication indexing was confirmed. Less than 3.5 bytes of memory per index entry was recorded. Ingest and deduplication detection running at speeds of up to 124 MB/sec was recorded using approximately half of a 3 GHz Intel Xeon processor core.
- Permabit’s agile software development processes were audited.
- Systems in the lab and in the field confirm that Permabit global deduplication advisory services have been deployed on grids of up to 38 servers.
- An error injection test confirmed that the Albireo deduplication services running within a field-proven Permabit Enterprise Archive solution remain available after both a drive and a server failure.
Issues to Consider
- The Permabit Albireo SDK detects duplicate data, but it does not actually remove it. Removing duplicates and maintaining pointers to duplicate data is implemented within the storage system using the Permabit SDK. Data structures that map and keep track of duplicate data references are needed to take advantage of the Permabit SDK. This is a trivial consideration for NAS systems, which use an inode map to keep track of data on disk. For modern block-based disk arrays, this service is often available, but where it is not, it may add to the complexity of, and the resources required for, Albireo integration.
- An Albireo-enabled storage solution continues operating even if Albireo becomes unavailable. If and when Albireo becomes unavailable, the system is unable to detect new duplicates but data access is unaffected. This is due to the fact that Albireo provides deduplication advice and the storage system uses that advice to maintain data integrity.
- While the Permabit architecture has been designed to use any hashing algorithm, Permabit has been using the SHA-256 algorithm for years. While the 256 bit SHA hashing algorithm virtually eliminates the risk of deduplication induced data corruption due to a hash collision, it does add CPU overhead compared to less rigorous algorithms with shorter digests (e.g., 128 bit MD5).
- The API integration and capacity savings results presented in this report were collected using relatively simple test programs running on a single server. Estimating the effort required to evaluate, architect, and implement a solution using a production storage system is beyond the scope of this report. Similarly, estimating the savings that can be achieved with your customer’s data is beyond the scope of this report. Testing in your lab, with your storage system, and with your data is strongly recommended.
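The second consideration above, that a storage system keeps working when the advisory service is unavailable, can be sketched as a write path that treats deduplication advice as optional. Everything here is hypothetical: the write_block function, the advise callable, the use of ConnectionError to signal an outage, and the dictionaries standing in for block storage and the reference map. The byte comparison before sharing is one possible way for the storage system to verify advice and is not described in this report.

```python
from typing import Callable, Dict, Optional


def write_block(storage: Dict[str, bytes], refs: Dict[str, str],
                address: str, data: bytes,
                advise: Callable[[bytes, str], Optional[str]]) -> bool:
    """Write data at address; return True if the block was deduplicated."""
    try:
        existing = advise(data, address)
    except ConnectionError:
        existing = None  # advisory service down: skip dedup, keep serving writes
    # The storage system owns integrity: verify the advice before sharing storage.
    if existing is not None and storage.get(existing) == data:
        refs[address] = existing
        return True
    storage[address] = data
    refs[address] = address
    return False


def unavailable(data: bytes, address: str) -> Optional[str]:
    """Simulated outage of the advisory service."""
    raise ConnectionError("advisory service unreachable")


storage: Dict[str, bytes] = {}
refs: Dict[str, str] = {}
stored_plain = write_block(storage, refs, "lba:0", b"abc", unavailable)
stored_shared = write_block(storage, refs, "lba:8", b"abc", lambda d, a: "lba:0")
```

During the simulated outage the block is simply stored without deduplication; once advice is available again, the second write shares the existing block.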
The Bigger Truth
One of ESG Lab’s first projects was a 2004 validation of a disk-based backup appliance with built-in data deduplication from Data Domain.[9] Since then, data deduplication has evolved into the hottest, most paradigm shifting technology to hit the storage industry since the UC Berkeley RAID papers were published in 1989. Like RAID, data deduplication quickly permeated the storage market due to its outstanding value proposition. Storage administrators struggling to finish backups within shrinking windows were able to reduce the capacity required to retain backups on disk by 90% or more. The value of this new technology was clearly compelling: data deduplication reduced the cost of disk-based backups, putting it on equal, or better, footing than tape. Backups that finish within a shrinking window and quick ad-hoc restores from disk had suddenly become economically feasible.
In recent years, data deduplication has begun to permeate the storage industry. A number of startups, including Diligent, Sepaton, and Exagrid, followed Data Domain into the disk-based backup appliance market. Since then, all of the major systems vendors have added deduplicating backup appliances to their portfolios. More recently, all of the major backup software vendors have added deduplication to their offerings. Content addressable disk-based solutions with embedded data deduplication technology were introduced in the archive market. Permabit was among the first vendors to enter this growing market. Deduplication has been used to reduce WAN traffic within primary and secondary storage replication solutions. And last, but not least, data deduplication is beginning to take hold in the primary storage market, with NetApp and Microsoft leading the way (Deduplication for FAS and Single Instance Store within Windows Storage Server, respectively). In ESG’s opinion, it’s simply a matter of time before data deduplication gains wide market acceptance in the nearline archive and online primary storage markets.
Data deduplication technology has driven a number of strategic acquisitions. ADIC purchased deduplication technology from Rocksoft for $63M (ADIC has since been acquired by Quantum). IBM acquired Diligent in a deal rumored to be worth between $160M and $200M. Those acquisitions were dwarfed by EMC’s recent acquisition of Data Domain, where a bidding war with NetApp drove the value of the deal up to $2.1B in cash.
So what’s the big deal with deduplication? It’s actually rather simple: deduplication reduces storage capacity requirements up to 99% for primary data and backups stored on disk. In other words, IT managers can squeeze up to a hundred times more out of each dollar they spend on disk capacity. Even as data deduplication becomes more of a feature than a product, the value is clearly compelling. While one could argue that the hype and valuations of deduplication solutions in the backup arena have gotten a bit ahead of the market, it’s clear to ESG that we haven’t seen the peak in the archive and primary storage markets yet.
The deduplication market has begun to mature in recent years. As the feature becomes more of a check-off item within backup solutions, vendors are leveraging differences in architectures and implementations to grow market share. Aside from the usual delineations based on price and performance, vendors are competing based on the finer differences between deduplication solutions: object vs. block-based, inline vs. post-process, fixed vs. variable length, and global vs. islands of deduplication. Permabit has extracted the core deduplication technology from a field-proven archiving solution with a goal of delivering a deduplication algorithm that can be used to architect solutions with any of these attributes in mind. In other words, instead of arguing the merits of one implementation vs. another, Permabit enables a vendor to implement multiple alternatives and just say yes.
ESG Lab has confirmed that the Permabit Albireo deduplication advisory services work as advertised. Inline and post-process deduplication support was added to a user space file system with only six Albireo function calls and 52 lines of code. The capacity of real-world data sets was reduced by between 33% and 97%. The patented deduplication lookup and indexing algorithm was fast and efficient. Permabit deduplication was observed running on more than one server for a scalable global pool of deduplication and fault tolerance. ESG Lab saw no interruption in access when errors were injected on a nine node Permabit Enterprise Archive cluster.
ESG Lab’s experience with nearly all of the vendors that have brought data deduplication solutions to the market indicates that correctly implementing data deduplication is a difficult task that requires man years of effort. Performance, resource efficiency, and scalability have proven to be particularly challenging for a number of vendors. On top of the technical challenges, this relatively new and valuable technology has a growing number of patent portfolios that need to be navigated.
Speaking of patents, Permabit has been awarded a total of 16 patents covering diverse areas in data protection and archive, and many more filings are pending in similar areas. The growing portfolio includes patents in the areas of hash-based deduplication for scalable file and object data storage, encrypted deduplication, memory based snapshots, and many other features of Permabit’s product line. ESG Lab was particularly impressed with the resource efficient two-stage indexing method described in Patent 7457813, Storage System for Random Blocks of Data. Highlights of that well-claimed patent are summarized in the Resource Utilization section of this report.
ESG Lab is confident that the flexibility provided by the Albireo SDK is unique in the industry. ESG Lab has confirmed that Albireo can be used with file or block storage. It provides deduplication services at the object or sub-file level. It supports an inline and post-processing programming model with minimal performance and resource impact. Running over a grid, it can be used to create a global pool of deduplication with predictably scalable performance and rock solid reliability. It provides deduplication capacity savings that are far greater than can be achieved with compression. Stream support can be used to provide content aware data deduplication for objects that are misaligned with the block boundaries of an underlying file system. This allows data stored in container formats (e.g., TAR and ZIP files) to be intelligently deduplicated.
Last, but not least, the Permabit Albireo SDK was designed with quick and easy integration in mind. Based on hands-on experience with the Permabit Albireo SDK, ESG Lab believes that Permabit deduplication can be tested within an existing storage system in a matter of weeks. Given the growing size of the market for capacity reduction and the high cost of developing a deduplication solution, organizations considering the merits of adding data deduplication to an existing storage system should seriously consider a test drive of Permabit’s field proven, patent protected deduplication algorithm.
Appendix

[1] Source: ESG Research Report, Data Protection Market Trends, January 2008.
[2] Source: ESG Research Report, Enterprise Storage System Survey, November 2008.
[4] US Patent 7457813, Storage System for Random Blocks of Data, Nov 25, 2008.
[5] Confirmed via a review of traces of code execution through the in memory index code path.
[6] Confirmed via memory usage comparisons presented later in this section.
[7] Depending on the size of the chunks that are used for deduplication and whether Permabit memory constrained mode is in use. Memory constrained mode was not tested by ESG Lab.
[8] Agilemanifesto.org
[9] See: ESG Lab Report, The Data Domain DD200 Restorer, February 2004.






Due to overwhelming demand, we asked ESG Labs to post this Lab Report as is. Permabit has made many enhancements to the Albireo API and in performance since the lab tests were first conducted. We thought this was a good starting point – please contact us with questions and for up-to-date information at 617-252-9600, bd@perambit.com or visit www.permabit.com. Mike Ivanov VP, Marketing Permabit Technology mivanov@permabit.com
Correction to this report: "ESG believes that it’s only a matter of time before policy-based deduplication support is added to primary block-based storage systems as well." NetApp has offered dedupe on its block-based SAN storage systems for nearly four years now. Any NetApp FAS system can operate concurrently as a NAS or SAN device, and both volumes and LUNs can be enabled for deduplication. In addition, NetApp's V-Series can dedupe 3rd party SAN storage arrays, for example a V-Series can front-end and dedupe data stored on EMC Clariions or Sym's. More info here: http://blogs.netapp.com/drdedupe/2010/05/netapp-luns-and-deduplication.html#tp
Great point Larry. The report starts with a review of where dedupe has taken hold in the market for backup and archive and we point to the future using NetApp dedupe for primary storage as an example. We were blinded by NetApp's dedupe wins with NFS and VMware and overlooked the fact that NetApp's unified architecture extends the benefits of dedupe to block protocols as well. This highlights a problem we have in the industry when we get hung up on a storage protocol instead of a storage solution. What matters here is that NetApp has supported deduplication for primary storage for nearly four years. Due to NetApp's unified architecture, it doesn't matter if the deduplicated primary storage is accessed via a file- or block-based protocol. I believe we captured this a bit better towards the end of the report when we stated: ...data deduplication is beginning to take hold in the primary storage market, with NetApp and Microsoft leading the way (Deduplication for FAS and Single Instance Store within Windows Storage Server, respectively). In ESG’s opinion, it’s simply a matter of time before data deduplication gains wide market acceptance in the nearline archive and online primary storage markets.