<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Enterprise Strategy Group &#187; Data Reduction Software</title>
	<atom:link href="http://www.enterprisestrategygroup.com/category/by-coverage-area/information-and-risk-management/data-protection/data-reduction-software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.enterprisestrategygroup.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Fri, 03 Sep 2010 20:46:55 +0000</lastBuildDate>
	<language></language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Dedupe Delivery Models: Symantec Takes An Appliance Route?</title>
		<link>http://www.enterprisestrategygroup.com/2010/08/dedupe-delivery-models-symantec-takes-an-appliance-route/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/08/dedupe-delivery-models-symantec-takes-an-appliance-route/#comments</comments>
		<pubDate>Fri, 06 Aug 2010 12:35:17 +0000</pubDate>
		<dc:creator>kevin</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[appliance]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[NetBackup]]></category>
		<category><![CDATA[PureDisk]]></category>
		<category><![CDATA[Symantec]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17699</guid>
		<description><![CDATA[Chris Mellor at the Channel Register posted an article suggesting that Symantec is delivering its NetBackup PureDisk on an appliance.  This got me thinking about the different delivery models vendors choose and what buyers prefer.  Does it matter if deduplication is delivered in software or a hardware appliance? We’re still in the early phases of [...]]]></description>
			<content:encoded><![CDATA[<p>Chris Mellor at the Channel Register posted an <a href="http://www.channelregister.co.uk/2010/08/04/symantec_netbackup_5000/" target="_blank">article</a> suggesting that <a href="http://www.symantec.com/" target="_blank">Symantec</a> is delivering its NetBackup PureDisk on an  appliance.  This got me thinking about the different delivery models vendors  choose and what buyers prefer.  Does it matter if deduplication is delivered in  software or a hardware appliance?</p>
<p>We’re still in the early phases of deduplication adoption, with less than 40%  of respondents to <a href="http://www.esg-global.com/" target="_blank">ESG</a> <a href="../../../../../2010/04/2010-data-protection-trends/" target="_blank">data protection research</a> having deployed the technology in  backup environments.  Of these, ESG’s research found that adopters of  deduplication selected a mix of hardware and software deployment approaches: 40%  implement it in backup software, 26% in backup hardware (appliance or disk  storage system), and 34% in a combination of hardware and software.</p>
<p>Deduplication as a feature of software solutions may be less expensive, but  there could be more time and technical acumen required for installation,  configuration, and performance tuning, and any hardware necessary to create a  whole solution will need to be acquired. Appliance-based solutions are  pre-assembled with all of the required components, offering end-users a more  plug-and-play installation and configuration experience, and since they are  purpose-built they may deliver better performance. One drawback to hardware  appliances is that they may underutilize system resources since purchases are  typically made to accommodate future growth.</p>
<p>Organizations with fewer production servers and less capacity tend to favor  software-only approaches while those with more production servers and capacity  take a hybrid approach, applying hardware to some workloads and software to  others.</p>
<p>Both adopter and planned adopter respondents ranked cost, ease of  installation/use, and impact on backup/recovery performance as the top three  purchase criteria for deduplication.  Interestingly, dedupe buyers using  hardware-only approaches were influenced most in their decision by a  relationship with an existing vendor.  I wonder if that will be the case with  Symantec’s appliance approach.</p>
<p>What’s your preference and why?</p>
<p>Read more of Lauren&#8217;s blog posts at <a href="http://www.dataprotectionperspectives.com/" target="_blank">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/08/dedupe-delivery-models-symantec-takes-an-appliance-route/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Reduction is Coming</title>
		<link>http://www.enterprisestrategygroup.com/2010/07/data-reduction-is-coming/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/07/data-reduction-is-coming/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 14:46:48 +0000</pubDate>
		<dc:creator>kevin</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Brian Babineau]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Ocarina]]></category>
		<category><![CDATA[Permabit]]></category>
		<category><![CDATA[StoreWize]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17604</guid>
		<description><![CDATA[When Paul Revere raced through the villages of Massachusetts yelling “The Red Coats are Coming,” I bet you a lot more people would have listened if there were two British soldiers chasing him.  In other words, if Paul Revere had tangible proof or some “examples,” his word would have meant a lot more. ESG has [...]]]></description>
			<content:encoded><![CDATA[<p>When Paul Revere raced through the villages of Massachusetts yelling “The Red  Coats are Coming,” I bet you a lot more people would have listened if there were  two British soldiers chasing him.  In other words, if Paul Revere had tangible  proof or some “examples,” his word would have meant a lot more.</p>
<p>ESG has been saying that primary data reduction would come to a storage environment near you.  We even pointed at <a href="http://www.netapp.com/us/" target="_blank">NetApp</a> as proof.  The response was “one vendor does not make a trend.”  <a href="http://www.emc.com/" target="_blank">EMC</a> talked about it quietly, but no one gave them any credit.  So, let’s say it again.  “Data reduction is coming to PRIMARY storage; data reduction is coming to PRIMARY storage.”  <a href="http://www.dell.com/" target="_blank">Dell</a> now owns <a href="http://www.ocarinanetworks.com/" target="_blank">Ocarina</a> and <a href="http://www.ibm.com/us/en/" target="_blank">IBM</a> bought up compression-technology leader <a href="http://www.storewize.com/" target="_blank">Storwize</a>.  Someone will be smart enough to own or partner with <a href="http://www.permabit.com/" target="_blank">Permabit</a> soon.  We have three concrete examples, and more are likely on the way.  Convinced?  One big bullet from a musket in honor of Mr. Revere:</p>
<ul>
<li>Storage solution vendors are going to try and help you save less data.    Please listen to us this time and try it.   It will save you money.  If not, ESG  may have to stop blogging and tweeting and buy a horse.</li>
</ul>
<p>Read Brian&#8217;s other blog entries at <a href="http://www.itbulletins.com/" target="_blank">IT BULLETins</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/07/data-reduction-is-coming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>eChannelLine USA &#8211; IBM buys data-compression vendor Storwize</title>
		<link>http://www.enterprisestrategygroup.com/2010/07/echannelline-usa-ibm-buys-data-compression-vendor-storwize/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/07/echannelline-usa-ibm-buys-data-compression-vendor-storwize/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 12:54:11 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Brian Babineau]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[In The News]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Storwize]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17619</guid>
		<description><![CDATA[The deal continues a trend in which the big storage vendors are clearly moving towards the reduction of primary storage, said Brian Babineau, senior consulting analyst at the Enterprise Strategy Group (ESG). via eChannelLine USA &#8211; IBM buys data-compression vendor Storwize.]]></description>
			<content:encoded><![CDATA[<p>The deal continues a trend in which the big storage vendors are clearly moving towards the reduction of primary storage, said Brian Babineau, senior consulting analyst at the Enterprise Strategy Group (ESG).</p>
<p>via <a href="http://www.echannelline.com/usa/story.cfm?item=25987" target="_blank">eChannelLine USA &#8211; IBM buys data-compression vendor Storwize</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/07/echannelline-usa-ibm-buys-data-compression-vendor-storwize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Primary Deduplication Musical Chairs: Dell Nabs Ocarina</title>
		<link>http://www.enterprisestrategygroup.com/2010/07/primary-deduplication-musical-chairs-dell-nabs-ocarina/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/07/primary-deduplication-musical-chairs-dell-nabs-ocarina/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 18:19:33 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[Backup Exec]]></category>
		<category><![CDATA[capacity optimization]]></category>
		<category><![CDATA[CommVault]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Exar]]></category>
		<category><![CDATA[GreenBytes]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Ocarina]]></category>
		<category><![CDATA[Permabit]]></category>
		<category><![CDATA[PowerVault]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Storwize]]></category>
		<category><![CDATA[Symantec]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17531</guid>
		<description><![CDATA[Data deduplication, data reduction, capacity optimization &#8230; call it what you will, but the end game is to optimize data transfer and storage. The impact on backup/secondary storage has been significant—both for the vendors with the technology and those IT organizations that have deployed it. While the excitement to date has been centered on deduplication [...]]]></description>
			<content:encoded><![CDATA[<p>Data deduplication, data reduction, capacity optimization &#8230; call it what you will, but the end game is to optimize data transfer and storage.  The impact on backup/secondary storage has been significant—both for the vendors with the technology and those IT organizations that have deployed it.  While the excitement to date has been centered on deduplication in backup, the shift to primary storage is on … especially since <a href="http://www.dell.com" target="_blank">Dell</a> announced yesterday that it plans to acquire <a href="http://www.ocarinanetworks.com/" target="_blank">Ocarina Networks</a>, a provider of content-aware data compression and deduplication solutions for unstructured data.</p>
<p>So how does the acquisition benefit Dell?  Dell sells storage systems for primary and backup data, and wants to provide features that lower data management costs for end-users.  But where does the Ocarina technology fit considering Dell’s existing partnerships with <a href="http://www.commvault.com" target="_blank">CommVault</a> (Simpana on PowerVault DL systems), <a href="http://www.emc.com">EMC </a>(resell Data Domain) and <a href="http://www.symantec.com" target="_blank">Symantec</a> (Backup Exec on PowerVault DL systems)?  What’s the relationship between data deduplicated on primary storage and data deduplicated in backup processes?</p>
<p>In an earlier <a href="http://www.dataprotectionperspectives.com/2010/06/deduping-upstream" target="_blank">post</a>, I discussed the benefits of deduplicating “upstream” (as close to the source of data as possible) with solutions from the likes of <a href="http://www.exar.com/" target="_blank">Exar</a>, <a href="http://www.getgreenbytes.com/" target="_blank">GreenBytes</a>, <a href="http://www.netapp.com" target="_blank">NetApp</a>, <a href="http://www.permabit.com" target="_blank">Permabit</a>, and <a href="http://www.storwize.com/" target="_blank">Storwize</a>.  One of the theories is that data can be backed up in its optimized state, that is, without having to be “rehydrated” or “reinflated” to its non-optimized format for backup processes.  This would enable an end-to-end deduplication strategy for data.  I’m not quite sure how this plays out with some of the primary deduplication solutions, including Ocarina.</p>
<p>With this gotcha in mind, it would, therefore, make sense that Dell would need deduplication for primary data and other solutions (CommVault, EMC and Symantec) for backup data.  The technologies from these different vendors could be complementary.  I don’t know how efficient a story that is for Dell customers …  it’s definitely a wrinkle Dell will have to iron out as it brings Ocarina-optimized storage to market and positions it versus other capacity optimization technologies in its portfolio.</p>
<p>Read more of Lauren&#8217;s blog entries at <a href="http://www.dataprotectionperspectives.com" target="_blank">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/07/primary-deduplication-musical-chairs-dell-nabs-ocarina/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Backup to Disk Gains. VTL Wanes?</title>
		<link>http://www.enterprisestrategygroup.com/2010/07/backup-to-disk-gains-vtl-wanes/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/07/backup-to-disk-gains-vtl-wanes/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 17:35:20 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[D2D2T]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[disk backup]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[tape]]></category>
		<category><![CDATA[VTL]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17454</guid>
		<description><![CDATA[When the news broke that NetApp was abandoning future development of its NearStore virtual tape library (VTL) solution, there was much speculation in the industry that VTL had run its course. While the popularity of file-interface disk targets—especially those with deduplication capabilities—has gained, VTLs appear to have waned. Recently published ESG research on The State [...]]]></description>
			<content:encoded><![CDATA[<p>When the <a href="http://www.dataprotectionperspectives.com/2010/02/netapp-nearstore-no-more/" target="_blank">news broke </a>that <a href="http://www.netapp.com" target="_blank">NetApp</a> was abandoning future development of its NearStore virtual tape library (VTL) solution, there was much speculation in the industry that VTL had run its course.  While the popularity of file-interface disk targets—especially those with deduplication capabilities—has gained, VTLs appear to have waned.  Recently published <a href="http://www.esg-global.com" target="_blank">ESG</a> research on <a href="http://www.enterprisestrategygroup.com/2010/06/the-state-of-virtual-tape-library-technology/?utm_source=ConstantContact&amp;utm_medium=Email&amp;utm_campaign=NewsletterJune10" target="_blank"><em>The State of Virtual Tape Library Technology</em> </a>provides insight into respondents’ usage of and plans for VTL technology.</p>
<p>What ESG’s research reveals is a mixed outlook for VTLs: there is widespread use of and ongoing interest in VTL technology; however, the technology may simply fill niche requirements for select organizations.  A survey of over 500 IT professionals revealed that VTL deployments are a function of data volume and organizations’ reliance on physical tape.  Companies with multiple terabytes of data to back up daily require high-performance solutions such as VTL in order to meet backup windows, and VTL has higher appeal to those organizations leveraging both disk and tape.</p>
<p>Most IT organizations (80%) are using disk as the initial on-site backup target.  Retention time on disk in D2D2T strategies has been extended from one week (66%) in 2008 to one month or more (68%) in 2010.  The latter number jumps to 80% for VTL users in 2010.  This is likely fueled by the availability of deduplication: more VTL users employ deduplication than those organizations that back up to other forms of disk targets.</p>
<p>VTLs will likely face mounting pressure from competing disk backup targets that are perceived as more cost-effective.  However, organizations with retention mandates requiring physical tape, and data volume that prevents IT from meeting backup windows are likely to find relief in VTL technology.</p>
<p>Read more of Lauren&#8217;s blog entries at <a href="http://www.dataprotectionperspectives.com" target="_blank">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/07/backup-to-disk-gains-vtl-wanes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HP Data Protector and Deduplication Solutions: Scalability and Performance from the Core to the Edge</title>
		<link>http://www.enterprisestrategygroup.com/2010/06/hp-data-protector-and-deduplication-solutions-scalability-and-performance-from-the-core-to-the-edge/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/06/hp-data-protector-and-deduplication-solutions-scalability-and-performance-from-the-core-to-the-edge/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 20:10:29 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lab Reports]]></category>
		<category><![CDATA[Tony Palmer]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[StorageWorks]]></category>
		<category><![CDATA[VLS]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17264</guid>
		<description><![CDATA[A large number of organizations have deployed disk-to-disk backup technologies to improve the speed and reliability of their backup and disaster recovery operations. A growing number of these organizations look to data deduplication to enhance retention periods and reduce the cost of storage for backups and disaster recovery.  This ESG Lab Validation Report examines Hewlett [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">A large number of organizations have deployed disk-to-disk backup technologies to improve the speed and reliability of their backup and disaster recovery operations. A growing number of these organizations look to data deduplication to enhance retention periods and reduce the cost of storage for backups and disaster recovery.  This ESG Lab Validation Report examines <a href="http://www.hp.com/" target="_blank">Hewlett Packard</a>’s family of backup and recovery solutions that combine the power of HP StorageWorks Virtual Library Systems (VLS) in the data center and the agility of HP D2D appliances in remote offices, tied together with HP Data Protector backup and recovery software. Special attention was paid to ease of implementation as well as the solution’s ability to improve the speed and reliability of disk-based data protection while reducing the cost of disk capacity and network bandwidth. Some of the issues associated with choosing a deduplication solution are also explored.</div>
<h2>Background</h2>
<p>While deduplication can reduce the cost of the raw storage required to store and replicate backup data on disk, integration with the existing ecosystem is crucial. As shown in Figure 1, recently completed ESG research indicates that ease of implementation, performance impacts, and integration with existing backup processes are key concerns.<a href="#_ftn1">[1]</a> Robust management, edge to core replication, tape integration, and deduplication options are important considerations as well, especially within large enterprise-class organizations. The diverse family of Hewlett Packard data protection solutions is ideally suited to address these, and other, concerns.</p>
<div class="graph_top">Figure 1. Data Deduplication Evaluation   Criteria</div>
<p><img class="aligncenter size-full wp-image-17268" title="HPdedupeF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF1.png" alt="" width="619" height="372" /></p>
<h3>Deduplication</h3>
<p>Choosing a deduplication strategy should include a discussion of how and where deduplication should occur. There are two basic locations where deduplication can occur: at the source, typically accomplished via a software agent running on the client machine, or at the target, which involves either writing directly to the device or running a software agent on the media server to perform deduplication.  All deduplication includes some level of overhead. If it occurs at the source, that overhead occurs on the client machine or media server and may have an impact on backup performance due to the software required on client systems, which can consume processing, storage, and/or network resources to deduplicate data.  When deduplication is performed at the target, the overhead is incurred in the device where data is being written.</p>
<p>In addition, it is important to differentiate between a pure software approach, in which the software runs on an industry-standard platform, and an appliance-based (or storage hardware-based) solution. With a software solution, users have flexibility in their choice of physical storage, but must ensure that the system has enough I/O bandwidth to support their performance needs, sufficient system resources to support desired deduplication rates, and the right storage security to prevent potential data loss. An appliance-based solution can address many of these concerns, but locks users into a particular hardware platform.</p>
<p>HP’s D2D and VLS are target-side deduplication appliances which are designed to provide cost-effective, easy-to-deploy deduplication and are tuned for optimal performance on each hardware platform.</p>
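<p>To make the source-versus-target distinction concrete, the following is a minimal Python sketch under stated assumptions, not a description of how HP’s D2D, VLS, or any other product works. It assumes fixed-size 4 KB chunks and SHA-256 fingerprints, and <code>ChunkStore</code>, <code>backup_target_side</code>, and <code>backup_source_side</code> are hypothetical names introduced only for illustration. The point is simply where the hashing work is done and how many bytes cross the network in each model.</p>
<pre><code class="language-python">import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def split_chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical backup target that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # maps a fingerprint to the chunk bytes

    def missing(self, fingerprints):
        """Return the fingerprints the store does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.chunks]

    def put(self, chunk):
        """Fingerprint a chunk and store it if it is new."""
        fp = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(fp, chunk)
        return fp


def backup_target_side(data, store):
    """Client sends every chunk; the target hashes and deduplicates."""
    sent = 0
    for chunk in split_chunks(data):
        sent += len(chunk)  # all bytes cross the network
        store.put(chunk)    # hashing overhead is paid on the target
    return sent


def backup_source_side(data, store):
    """Client hashes first and sends only the chunks the target lacks."""
    # Hashing overhead is paid on the client before anything is sent.
    by_fp = {hashlib.sha256(c).hexdigest(): c for c in split_chunks(data)}
    sent = 0
    for fp in store.missing(list(by_fp)):
        sent += len(by_fp[fp])
        store.put(by_fp[fp])
    return sent
</code></pre>
<p>In this sketch both functions leave the same unique chunks in the store. The difference is that the target-side path transfers every byte and hashes on the device, while the source-side path spends client CPU on hashing in exchange for sending less data.</p>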
<h2>HP Data Protection Solutions</h2>
<p>Hewlett Packard offers diverse data protection hardware and software to address the concerns of enterprises from the core data center to the smallest remote office. Figure 2 shows the HP family of data protection solutions as they might be deployed in a typical distributed enterprise.</p>
<div class="graph_top">Figure 2. HP Data Protection Solutions</div>
<p><img class="aligncenter size-full wp-image-17269" title="HPdedupeF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF2.png" alt="" width="602" height="298" />HP StorageWorks VLS virtual tape libraries provide a high-performance primary backup target with optional deduplication in the FC SAN enterprise data center, while HP StorageWorks D2D serves as an easy-to-manage data protection appliance in mid-market data centers and remote offices with deduplication and WAN-efficient replication across sites. HP Data Protector software provides a solution to manage the D2D and the VLS not only in single-site backup environments but also in replicated solutions. While this report focuses on the benefits of a total HP solution, it’s important to note that HP products integrate well with third-party products. Data Protector works with any inline deduplication solution, and the D2D and VLS can be used with third-party backup software. However, the end-to-end, single-vendor solution including replication enablement and integration is a unique HP offering.</p>
<p>Benefits of HP StorageWorks Data Protection Solutions</p>
<p><strong>VLS:</strong></p>
<ul>
<li><strong>Highly scalable capacity and performance:</strong> Up to 4800 MB/sec of throughput and 1280 TB of usable storage.</li>
<li><strong>The entire capacity of the system can be presented as a single virtual library target:</strong> The system can scale without time consuming reconfiguration and rebalancing of backup software and backup jobs.</li>
<li><strong>Accelerated deduplication: </strong>Fast hardware compression, in combination with post process deduplication, provides capacity efficiency without impact to backup windows.</li>
<li><strong>WAN efficient replication:</strong> Fast, cost-effective disaster recovery capability between data centers and remote sites.</li>
</ul>
<p><strong>D2D:</strong></p>
<ul>
<li><strong>Distributed capacity and performance:</strong> The D2D Backup Systems offer high performance multi-streaming backup speeds of up to 720 GB/hour.</li>
<li><strong>HP Dynamic deduplication: </strong>Enables longer term data retention on disk and WAN efficient replication.</li>
<li><strong>WAN efficient replication:</strong> Fast, cost-effective disaster recovery capability between multiple remote sites.</li>
<li><strong>Ease of use and deployment:</strong> HP’s D2D Backup Systems are designed for easy installation and deployment in mid-sized business environments and are ready to deploy right out of the box.</li>
</ul>
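<p>As a rough sanity check on the throughput figures quoted above (up to 4800 MB/sec for VLS and up to 720 GB/hour for D2D), the short Python sketch below puts both into the same units. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB) and sustained peak rates, which real backup jobs will not necessarily reach; the function names are hypothetical.</p>
<pre><code class="language-python">SECONDS_PER_HOUR = 3600
MB_PER_GB = 1000       # decimal units assumed
MB_PER_TB = 1_000_000  # decimal units assumed


def mb_per_sec_to_tb_per_hour(mb_per_sec):
    """Convert a throughput in MB/sec to TB/hour."""
    return mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


def gb_per_hour_to_mb_per_sec(gb_per_hour):
    """Convert a throughput in GB/hour to MB/sec."""
    return gb_per_hour * MB_PER_GB / SECONDS_PER_HOUR


def hours_to_back_up(data_tb, tb_per_hour):
    """Hours needed to move data_tb terabytes at a sustained rate."""
    return data_tb / tb_per_hour


vls_tb_per_hour = mb_per_sec_to_tb_per_hour(4800)  # 17.28 TB/hour at the quoted peak
d2d_mb_per_sec = gb_per_hour_to_mb_per_sec(720)    # 200.0 MB/sec at the quoted peak
</code></pre>
<p>Under these assumptions, the quoted VLS peak corresponds to roughly 17 TB per hour and the quoted D2D peak to 200 MB/sec, which is consistent with the positioning of VLS for the data center and D2D for smaller sites.</p>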
<p>As of the publication of this report, HP has refreshed the entire D2D product family and introduced a new high-end model, the D2D 4312. The D2D 4312 has more processing power and offers higher capacity than previous-generation D2D models.  The new D2D product family runs a new 64-bit data deduplication technology called HP StoreOnce.  HP’s vision is to port StoreOnce deduplication technology to several HP platforms, including HP Data Protector.  While ESG tested the previous-generation D2D appliances, the scenarios depicted and conclusions drawn in this report still apply.</p>
<p><strong>Data Protector:</strong></p>
<ul>
<li><strong>Advanced Backup to Disk</strong> &#8211; 24/7 information access and quick disaster recovery.</li>
<li><strong>Multiple Recovery Point/Recovery Time Objectives</strong> &#8211; Achieve business-driven recovery objectives.</li>
<li><strong>Manage data effectively</strong> &#8211; within existing budgets and infrastructure, even as the quantity of data grows.</li>
<li><strong>Centralized Data Protection</strong> &#8211; Protect data on distributed physical and virtual infrastructures.</li>
<li><strong>Broad Interoperability</strong> &#8211; Integrates with partner solutions including NetApp, Data Domain, IBM ProtecTIER, and supports any third party inline deduplication target appliance.</li>
</ul>
<h1>ESG Lab Validation</h1>
<p>ESG Lab performed hands-on evaluation and testing of HP’s data protection solutions at an HP facility in Fort Collins, CO. Testing was designed to demonstrate the scalability, performance, and ease of management  of HP’s solutions from the point of view of a typical enterprise systems administrator integrating HP’s disk-based solutions into an existing tape environment.</p>
<h2>Ease of Deployment and Integration</h2>
<p>The test environment, shown in Figure 3, was used throughout testing. Testing began with a one-node HP StorageWorks VLS9000 array with deduplication, physically installed and powered up as it would be by HP professional services for an enterprise customer, in a data center environment with HP Data Protector software installed. Other elements typical of an enterprise environment, such as physical tape libraries and  HP StorageWorks D2D appliances, were also present in the test bed.<a href="#_ftn2">[2]</a></p>
<div class="graph_top">Figure 3. The ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-17270" title="HPdedupeF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF3.png" alt="" width="597" height="287" /></p>
<h3>ESG Lab Testing</h3>
<p>ESG Lab initially logged in to the Command View VLS console by pointing Internet Explorer at the administrative IP address on the VLS9000 library and entering the administrator username and password. Next, the Create Virtual Library Wizard was launched. The wizard, shown in Figure 4, walks the administrator through the steps to create a virtual tape library, asking for details such as library type.</p>
<div class="graph_top">Figure 4. Creating a Virtual Tape  Library with VLS</div>
<p><img class="aligncenter size-full wp-image-17271" title="HPdedupeF4" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF4.png" alt="" width="605" height="293" />ESG Lab selected an HP ESL E-Series to match the physical library type already installed in the test environment. Next, and more important, the virtual tape drive type and quantity, as well as the capacity and quantity of virtual tape cartridges, were set, as seen in Figure 5.</p>
<div class="graph_top">Figure 5. Virtual Tape Library Created</div>
<p><img class="aligncenter size-full wp-image-17272" title="HPdedupeF5" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF5.png" alt="" width="587" height="297" />In less than two minutes, the Create Virtual Library Wizard was completed and the virtual tape library was configured and presented on the SAN.  Next, ESG Lab started the HP Data Protector Autoconfigure Wizard. The HP Data Protector Autoconfigure Wizard, as seen in Figure 6, discovers new backup target devices and prepares Data Protector to use them.</p>
<div class="graph_top">Figure 6. Configuring a Virtual Tape Library in Data Protector</div>
<p><img class="aligncenter size-full wp-image-17273" title="HPdedupeF6" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF6.png" alt="" width="487" height="328" />The Data Protector Autoconfigure Wizard took about two minutes to discover and add the Virtual Tape Library and its virtual tape drives to Data Protector. As Figure 7 shows, within five minutes of sitting down at the keyboard, ESG Lab was running a full backup of the first server to the HP VLS9000.</p>
<div class="graph_top">Figure 7. Running the First Backup in Data Protector</div>
<p><img class="aligncenter size-full wp-image-17274" title="HPdedupeF7" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF7.png" alt="" width="503" height="190" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="687" valign="top">
<h1>Why This Matters</h1>
<p>ESG Research found that 46% of early adopters of deduplication solutions indicated ease of deployment as the single most important factor in purchasing a deduplication solution.<a href="#_ftn3">[3]</a> This is especially important for enterprise-class appliances deployed in large, complex environments where backup policies span hundreds of servers and dozens of applications, stretching resources to the limit. ESG Lab has confirmed that an HP StorageWorks VLS is extremely easy to configure and manage. The system was dropped into an existing tape environment and was performing backups in less than five minutes using familiar tools and methodologies.</p></td>
</tr>
</tbody>
</table>
<h2>Scalability and Performance</h2>
<p>One of the fundamental advantages of VTL backup is the ability to run many backup streams concurrently using multiple virtual tape drives. A single tape drive can only perform one backup at a time.  To get more than one backup job running at the same time, more tape drives must be added and run in parallel. A disk-based backup and recovery solution with many random access disk drives emulating many virtual tape drives can run many backup jobs simultaneously.  The random access nature of disk also provides improved performance when locating individual files to be restored.</p>
<h3>ESG Lab Testing</h3>
<p>ESG Lab performed backups using a single-node VLS9000 as a target first and then repeated the test after upgrading the VLS to two nodes in order to examine its relative performance as storage capacity is scaled. A full backup of multiple servers was simulated using the HP tapeperf utility set to generate data with 2:1 compressibility.<a href="#_ftn4">[4]</a> The first iteration ran with two servers running 10 backup streams; the second iteration was performed with three servers running 16 streams. Performance scaled nearly linearly when the second node was added. The screen capture in Figure 8 shows the VLS Command View Console during the two-node test, running at 1,164 MB/sec.</p>
<div class="graph_top">Figure 8. Two-Node VLS System Performance</div>
<p><img class="aligncenter size-full wp-image-17275" title="HPdedupeF8" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF8.png" alt="" width="559" height="308" />Figure 9 shows actual performance results obtained in ESG Lab testing projected out to a fully populated, eight-node system. Detailed results for each test run are shown in Table 1.</p>
<div class="graph_top">Figure 9. VLS Performance Scaling</div>
<p><img class="aligncenter size-full wp-image-17276" title="HPdedupeF9" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF9.png" alt="" width="588" height="282" /></p>
<div class="graph_top">Table 1: Raw VLS9000 Backup Results</div>
<p><img class="aligncenter size-full wp-image-17285" title="HPdedupeT1" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeT1.png" alt="" width="617" height="104" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>When a node was added to the VLS, performance scaled nearly linearly with no degradation.</li>
<li>An eight-node system should be able to achieve 4,656 MB/sec of sustained backup throughput, representing the ability to protect nearly 128 TB of data in an eight-hour backup window.</li>
</ul>
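<p>The projection above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the ESG Lab test tooling; the function names are illustrative, and it assumes perfectly linear scaling and binary units (1 TB = 1,024 x 1,024 MB), which is the unit convention under which the 128 TB figure appears to round correctly.</p>
<pre>
```python
# Worked arithmetic for the scaling projection.
# Assumptions (not stated in the report): binary units and
# perfectly linear scaling as nodes are added.

MB_PER_TB = 1024 * 1024  # assumption: 1 TB = 1024 * 1024 MB


def projected_throughput_mb_s(measured_mb_s, measured_nodes, target_nodes):
    """Scale a measured aggregate rate linearly to a larger node count."""
    return measured_mb_s * target_nodes / measured_nodes


def tb_protected(rate_mb_s, window_hours):
    """Data moved at a sustained rate over a backup window, in TB."""
    return rate_mb_s * window_hours * 3600 / MB_PER_TB


two_node_rate = 1164  # MB/sec, from the two-node test in Figure 8
eight_node_rate = projected_throughput_mb_s(two_node_rate, 2, 8)

print(eight_node_rate)                             # 4656.0
print(round(tb_protected(two_node_rate, 1), 2))    # 4.0 TB per hour
print(round(tb_protected(eight_node_rate, 8), 1))  # 127.9 TB in 8 hours
```
</pre>
<p>With decimal units the same rates work out slightly higher, so the headline numbers should be read as rounded values.</p>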
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="708" valign="top">
<h1>Why This Matters</h1>
<p>ESG research<a href="#_ftn5">[5]</a> has found that the number one challenge   enterprises report with their data protection processes and technologies is   the need to reduce backup and recovery times. Backup administrators have been   struggling for years to get nightly backups completed before business resumes   in the morning.  Quicker recoveries are   also needed to increase user productivity and meet service level   agreements.</p>
<p>ESG Lab validated through direct test and audit that HP’s VLS 9000 can linearly scale aggregate backup throughput as nodes are added. In other words, a single VLS9000 disk backup system can be used to protect up to 128 TB of data in an eight-hour shift and restore individual files in a matter of seconds. Accelerated deduplication in the VLS series means that users can meet the protection requirements of a large number of servers with one system, enabling zero-impact deduplication while lowering acquisition costs and operational complexity.</p></td>
</tr>
</tbody>
</table>
<h2>Remote Office Protection</h2>
<p>The family of HP disk and tape data protection solutions, combined with HP Data Protector software, can be configured to create an automated, edge-to-core data protection topology (see Figure 10) that spans multiple sites and provides disk-to-disk-to-tape (D2D2T) functionality. HP StorageWorks D2D appliances provide WAN-efficient movement of data between sites and storage tiers while Data Protector provides the single point of management and catalog for backup data—regardless of where it resides (remote office or corporate data center), what type of media it is stored on (disk or tape), or its age (recent backup or long term archive). D2D disk-based backup and replication appliances support data deduplication to reduce the resources required to store backup images on disk and replicate backup images over a WAN.</p>
<div class="graph_top">Figure 10. D2D WAN Efficient Replication</div>
<p><img class="aligncenter size-full wp-image-17277" title="HPdedupeF10" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF10.png" alt="" width="555" height="372" /></p>
<h3>ESG Lab Testing</h3>
<p>ESG Lab used HP Data Protector software to configure, automate, and track the migration of backup data residing on HP D2D data protection appliances, as shown previously in Figure 3. An edge-to-core D2D2T data protection strategy was implemented using an HP D2D appliance located in a simulated remote office. Remote office backup data was replicated over a simulated WAN to an HP D2D in a corporate data center with data movement carried out by the D2D systems. The object copy capabilities of the Data Protector software were used to write a copy of the data to removable media in a Fibre Channel SAN-attached HP StorageWorks MSL tape library. First, ESG Lab logged in to the D2D web management console, shown in Figure 11.</p>
<div class="graph_top">Figure 11. The D2D Management Interface</div>
<p><img class="aligncenter size-full wp-image-17278" title="HPdedupeF11" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF11.png" alt="" width="575" height="428" />The first backup was ‘seeded’, or replicated locally, over a Gigabit Ethernet LAN connection. Seeding is often employed by users who have large data sets at remote offices because it allows the first bulk transfer of data to complete very quickly. The D2D device is then shipped to the central data center, and from then on updates require much less bandwidth thanks to deduplication. The first bulk replication is illustrated in Figure 12.</p>
<div class="graph_top">Figure 12. Initial Replication of a Remote Office Dataset</div>
<p><img class="aligncenter size-full wp-image-17279" title="HPdedupeF12" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF12.png" alt="" width="602" height="231" />Replication of the first backup transferred 53 GB of data in 10 minutes and 40 seconds over an unrestricted Gigabit Ethernet connection. Once the first full backup was completely replicated to the target D2D appliance, ESG Lab inserted the ‘Network Nightmare’ WAN simulator and restricted throughput to 2 megabits per second to simulate a nearby WAN connection between the remote office and data center.</p>
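<p>The value of seeding is easy to see from the numbers above. The sketch below is illustrative only; it assumes decimal units (1 GB = 8,000 megabits) and ignores protocol overhead, and the function names are not taken from any HP tool.</p>
<pre>
```python
# Why the first replication is seeded over the LAN.
# Assumptions: decimal units and no protocol overhead.

MBIT_PER_GB = 8000  # assumption: 1 GB = 1000 MB = 8000 megabits


def rate_mbit_s(gigabytes, seconds):
    """Average transfer rate in Mbit/sec."""
    return gigabytes * MBIT_PER_GB / seconds


def transfer_hours(gigabytes, link_mbit_s):
    """Time to push a data set through a link without deduplication."""
    return gigabytes * MBIT_PER_GB / link_mbit_s / 3600


seed_seconds = 10 * 60 + 40  # 10 minutes and 40 seconds

print(rate_mbit_s(53, seed_seconds))    # 662.5 Mbit/sec over the LAN
print(round(transfer_hours(53, 2), 1))  # 58.9 hours over a 2 Mbit/sec WAN
```
</pre>
<p>Under these assumptions, sending the same 53 GB over the 2 Mbit/sec link without seeding or deduplication would take roughly two and a half days.</p>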
<div class="graph_top">Figure 13. Capacity Efficient Replication</div>
<p><img class="aligncenter size-full wp-image-17280" title="HPdedupeF13" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF13.png" alt="" width="602" height="240" />The first incremental backup was performed and automatically replicated over the 2 Mbit/sec simulated WAN connection. Replication of the incremental backup transferred 5 GB of deduplicated data in 1 hour, 17 minutes and 49 seconds.  This means the D2D transferred at the equivalent of 11 Mbit/sec over the 2 Mbit/sec WAN connection. By comparison, the second full backup resulted in the equivalent of 22.9 Mbit/sec over the same link. The higher virtual throughput is due to the greater volume of duplicate data in a full backup.</p>
<p>Figure 14 shows actual and projected deduplication capacity savings over 30 days of backups on a weekly full, daily incremental backup schedule. The capacity savings over 30 days was projected at 92%—an 11.94:1 deduplication ratio.</p>
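<p>A deduplication ratio and a capacity savings percentage are two views of the same quantity. The short sketch below shows the standard conversion; the function names are illustrative.</p>
<pre>
```python
# Converting between a deduplication ratio and capacity savings.


def savings_from_ratio(ratio):
    """Fraction of capacity saved for a ratio such as 11.94 (meaning 11.94:1)."""
    return 1 - 1 / ratio


def ratio_from_savings(savings):
    """Deduplication ratio implied by a savings fraction such as 0.92."""
    return 1 / (1 - savings)


print(round(savings_from_ratio(11.94) * 100, 1))  # 91.6
print(round(ratio_from_savings(0.92), 1))         # 12.5
```
</pre>
<p>An 11.94:1 ratio therefore corresponds to about 91.6% savings, which rounds to the 92% quoted above.</p>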
<div class="graph_top">Figure 14. Deduplication Capacity Savings Over Time</div>
<p><img class="aligncenter size-full wp-image-17281" title="HPdedupeF14" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF14.png" alt="" width="595" height="343" />Finally, ESG Lab copied the latest full backup to tape using the copy-to-tape capabilities of Data Protector with the D2D, which can be used for archiving. D2D frees up space by expiring source media (virtual tapes) rather than deleting the source data (files). This ensures that data that exists in multiple backups is not deleted until all references to it are deleted.</p>
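<p>The expiry behavior described above is commonly implemented with reference counting. The sketch below is a hypothetical illustration of that general technique, not HP’s implementation; the class, method, and chunk names are invented for the example.</p>
<pre>
```python
# Reference-counted expiry: a chunk is freed only when the last
# virtual cartridge that references it has been expired.
# Hypothetical illustration; not the D2D implementation.
from collections import Counter


class ChunkStore:
    def __init__(self):
        self.refs = Counter()  # chunk id -> number of references
        self.cartridges = {}   # cartridge name -> list of chunk ids

    def write_cartridge(self, name, chunk_ids):
        """Record a backup and count a reference for each chunk it uses."""
        self.cartridges[name] = list(chunk_ids)
        self.refs.update(chunk_ids)

    def expire_cartridge(self, name):
        """Drop a cartridge and return the chunks that became unreferenced."""
        freed = []
        for chunk_id in self.cartridges.pop(name):
            self.refs[chunk_id] -= 1
            if self.refs[chunk_id] == 0:
                del self.refs[chunk_id]
                freed.append(chunk_id)
        return freed


store = ChunkStore()
store.write_cartridge("full_1", ["a", "b", "c"])
store.write_cartridge("incr_1", ["a", "d"])

print(store.expire_cartridge("full_1"))  # ['b', 'c']
print(store.expire_cartridge("incr_1"))  # ['a', 'd']
```
</pre>
<p>Chunk "a" survives the first expiry because the incremental backup still references it.</p>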
<div class="graph_top">Figure 15. Copy to Tape</div>
<p><img class="aligncenter size-full wp-image-17282" title="HPdedupeF15" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF15.png" alt="" width="592" height="246" />In 20 minutes, the copy to tape was complete.</p>
<div class="graph_top">Figure 16. Copy to Tape Complete</div>
<p><img class="aligncenter size-full wp-image-17283" title="HPdedupeF16" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF16.png" alt="" width="578" height="346" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>A growing number of organizations are struggling to   protect information assets residing in remote and branch offices. Most are   alarmed at the rate of data growth in these locations. Many lack the IT staff   and expertise needed to manage traditional tape-based protection methods.   Many more are frustrated with the cost and complexity of managing tape media   at remote offices. Disk-based backup and replication to a corporate data   center reduces the complexity and risk, but, until recently, it’s been too   expensive to justify due to the cost of remote office disk capacity and the   WAN connectivity required.</p>
<p>At the time of testing, the street price started at $4,499 for a 2 TB system, including dynamic deduplication and low-bandwidth replication. ESG Lab found the HP StorageWorks D2D to be an affordable, simple, and effective solution for the protection of valuable information assets residing in remote and branch offices. Data Protector provides a single point of management and catalog for local and replicated backup data that reduces complexity and cost for distributed environments.</p></td>
</tr>
</tbody>
</table>
<h2>Cost-Efficient Protection</h2>
<p>Organizations of all sizes are struggling to meet the conflicting challenges associated with macro-level global financial uncertainty and micro-level information storage growth and complexity. A growing number of IT managers are turning to virtualization and consolidation technologies to meet these challenges. With a focus on scalability, automated management using rich software tools, and capacity-efficient pricing, HP’s data protection solutions are an excellent example of solutions that are purpose-built to address these issues.</p>
<p>ESG Lab created a total cost of ownership (TCO) model based on a hypothetical backup environment with multiple remote offices, a major data center, and remote replication for disaster protection.  The scenario examines cost-savings associated with moving from a tape-based to a VLS and D2D-based backup and recovery strategy, although it still assumes use of tape for long-term archive of backups. Costs were broken down by category:  capital expenditures, administrative costs, tape costs, maintenance costs, power and cooling costs, and total floorspace costs.  The cumulative costs for both tape- and disk-based backups were calculated annually over a five year period.  A number of assumptions were made and included in the calculations based on what a current IT organization might have in place for equipment, WAN connectivity, backup and restore policies, and capacity and performance requirements.<a href="#_ftn6">[6]</a> Comparisons were made between the total cost of ownership of a traditional tape infrastructure with no replication or deduplication and an HP Data Protection environment with disk-based backup, deduplication, and replication.</p>
<div class="graph_top">Figure 17. The HP Data Protection Solutions Advantage Over Five Years</div>
<p><img class="aligncenter size-full wp-image-17284" title="HPdedupeF17" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeF17.png" alt="" width="586" height="355" />Figure 17 shows the total cost a hypothetical end-user would incur over five years when comparing a traditional tape environment to a backup to disk environment with replication and deduplication. The inflection point, where the disk environment becomes less costly than the tape environment, occurs just before the end of year two.</p>
<div class="graph_top">Table 2. Five Year Cost Breakdown by Category</div>
<p><img class="aligncenter size-full wp-image-17437" title="7-6-2010 4-52-42 PM" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/7-6-2010-4-52-42-PM.png" alt="" width="621" height="168" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>The total cost of ownership of tape alone is roughly 69% higher than HP disk-based data protection with deduplication and replication.</li>
<li>Data Protector software provides significant savings due to licensing based on data stored as opposed to data protected. Deduplication reduces licensing costs.</li>
<li>The tape solution is more expensive in part due to the cost of acquiring tape media and the added complexity of managing the distributed tape infrastructure.</li>
<li>While eliminating daily off-siting of tapes represents significant savings, tape is still the most cost-efficient method for long term archive of backups, and most organizations will replicate deduplicated data to a remote site for copy to tape for this purpose.</li>
</ul>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="704" valign="top">
<h1>Why This Matters</h1>
<p>Until recently, extending the benefits of a D2D2T protection strategy to remote and branch offices has been impractical. The cost of disk and WAN bandwidth for remote offices often can’t be justified. If a disk-based storage system is used for replication, the backup software can’t keep track of where the copies reside. Remote offices typically do not have the experienced IT staff needed to effectively administer tape or disk.</p>
<p>With HP StorageWorks D2D reducing the cost of disk capacity and WAN bandwidth and HP Data Protector to manage the data protection environment, ESG Lab has confirmed that HP’s Data Protection solutions can extend the benefits of a centrally managed D2D2T strategy to an entire organization.</p></td>
</tr>
</tbody>
</table>
<h1>ESG Lab Validation Highlights</h1>
<ul>
<li>Within five minutes of sitting down at the keyboard, ESG Lab was running a full backup of the first server to the HP VLS9000.</li>
<li>The VLS 9000 demonstrated near linear performance scalability, achieving an impressive 4 TB/hour with a two-node configuration.</li>
<li>The HP D2D Backup System achieved 81% bandwidth efficiency, transferring a 5 GB incremental backup across a 2 Mbit/sec simulated WAN link in just under 1 hour and 18 minutes for an effective throughput of more than 11 Mbit/sec. Replicating a second full backup yielded an even more impressive 90% bandwidth efficiency, transferring a 54 GB full backup across the same 2 Mbit/sec simulated WAN link in just over 6 hours and 42 minutes, for an effective throughput of more than 22.9 Mbit/sec.</li>
<li>The HP Data Protection solution suite demonstrated faster, more reliable backups and restores at a significantly lower total cost of ownership than a tape environment.</li>
</ul>
<h1>Issues to Consider</h1>
<ul>
<li>As with all VTLs today, when a cartridge is deleted or expired in a backup application, space on the VLS is not reclaimed until the cartridge is deleted, expired, or overwritten via the VLS management application. Integration with backup applications to automatically trigger a delete or expire in the VLS when a cartridge is expired in the backup application would be a useful enhancement.</li>
<li>While the VLS, D2D, and Data Protector all have easy to use management interfaces, a single “manager of managers” that integrated all three products and provided an overall view of an entire enterprise environment would be of great value to administrators.</li>
<li>While ESG is confident that one or more HP StorageWorks D2D Backup Systems can be used to meet the performance needs of a mid-sized organization,  D2D systems with more capacity and horsepower could reduce cost and complexity within larger mid-sized organizations.  HP has advised ESG that the new line of D2D Backup Systems, announced in June 2010, has been designed with these considerations in mind.</li>
</ul>
<h1>The Bigger Truth</h1>
<p>ESG Lab conducted its first hands-on testing of Hewlett Packard’s enterprise VTL, the VLS 6000, in 2006 and then validated the VLS 9000 in 2008. Testing and discussions with end-users have confirmed that HP’s disk-based backup solutions fit seamlessly into existing backup environments while providing dramatic performance and capacity reduction benefits compared to legacy tape-based methods. The HP StorageWorks D2D, aimed at delivering deduplication and WAN-efficient replication to smaller, remote offices, completes a comprehensive, enterprise-wide edge-to-core data protection architecture, managed by HP Data Protector, that goes beyond disk-based backup.</p>
<p>During this independent lab validation, ESG Lab confirmed HP’s edge to core capabilities in support of large enterprises as well as deduplication support across disk, replication, and tape throughout the product line. HP’s comprehensive offering with capacity efficient pricing provides high performance data deduplication technology to deliver dramatic disk capacity savings while offering scalable, predictable performance.</p>
<p>A modest two-node VLS configuration tested by ESG Lab was able to back up at a sustained 4 TB/hour.  Based on the nearly linear scalability observed by ESG,  an eight-node system should be able to protect data at 16 TB/hour. Easy to navigate, web-based management enabled ESG Lab to manage backups for a remote office from creation, through replication, and finally to tape—all using HP Data Protector software—giving rise to an edge-to-core data protection strategy covering remote and branch offices as well as multiple data centers. Direct attach to tape capability enables enterprises to meet offsite and deep archive requirements using familiar tools and techniques while keeping tape copy traffic off the SAN.</p>
<p>HP Data Protector software exemplifies the company’s depth and breadth of end-to-end solutions for backup and recovery encompassing disk and tape. It should bring significant value to customers grappling with the challenges associated with cost-effective management of their data protection resources.</p>
<p>In essence, deduplication has become a crucial component of disk to disk backups, but when considering competing methods for implementation, customers should consider the tradeoffs and what’s best for their organization: ease of implementation, cost, and bandwidth all play an important role.</p>
<p>ESG Lab believes that the combination of enterprise class performance and scalability—along with comprehensive storage management software and services—provides a unique approach for optimizing data protection and recovery strategy in the enterprise. Hewlett Packard customers can now retain more data for fast and reliable restores and longer retention periods while minimizing impact on backups with accelerated deduplication. Combined with the consolidated data management provided by Data Protector, customers have a wide choice of configurations which can be used to dramatically increase the role of disk in the protection of critical data.</p>
<h1>Appendix</h1>
<div class="graph_top">Table 3. ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-17287" title="HPdedupeT3" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/HPdedupeT3.png" alt="" width="635" height="726" /></p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Source: ESG Research Report, <em>Data Protection Survey</em>, to be published in Q2 2010.</p>
<p><a name="_ftn2">[2]</a> Detailed configuration information can be found in the Appendix.</p>
<p><a name="_ftn3">[3]</a> Source: ESG Research Report, <a href="http://www.enterprisestrategygroup.com/2010/04/2010-data-protection-trends/" target="_blank"><em>Data Protection Survey</em></a>, April 2010.</p>
<p><a name="_ftn4">[4]</a> Full testing configuration is described in detail in the Appendix.</p>
<p><a name="_ftn5">[5]</a> Source: ESG Research Report, <a href="http://www.enterprisestrategygroup.com/2010/04/2010-data-protection-trends/" target="_blank"><em>Data Protection Survey</em></a>, April 2010.</p>
<p><a name="_ftn6">[6]</a> Assumptions and parameters can be found in the Appendix.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/06/hp-data-protector-and-deduplication-solutions-scalability-and-performance-from-the-core-to-the-edge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>eChannelLine USA &#8211; ChannelBIZ: Easing deduplication fear factors for customers</title>
		<link>http://www.enterprisestrategygroup.com/2010/06/echannelline-usa-channelbiz-easing-deduplication-fear-factors-for-customers/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/06/echannelline-usa-channelbiz-easing-deduplication-fear-factors-for-customers/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 13:19:54 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[In The News]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[tape]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=17199</guid>
		<description><![CDATA[Seamless tape integration with deduplication is critical because, as industry analyst Enterprise Strategy Group points out, an estimated 62% of data protection customers use disk-to-disk-to-tape. via eChannelLine USA &#8211; ChannelBIZ: Easing deduplication fear factors for customers.]]></description>
			<content:encoded><![CDATA[<p>Seamless tape integration with deduplication is critical because, as industry analyst Enterprise Strategy Group points out, an estimated 62% of data protection customers use disk-to-disk-to-tape.</p>
<p>via <a href="http://www.echannelline.com/usa/story.cfm?item=25836" target="_blank">eChannelLine USA &#8211; ChannelBIZ: Easing deduplication fear factors for customers</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/06/echannelline-usa-channelbiz-easing-deduplication-fear-factors-for-customers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Permabit Albireo: Empowering Unified Deduplication</title>
		<link>http://www.enterprisestrategygroup.com/2010/06/permabit-albireo-empowering-unified-deduplication/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/06/permabit-albireo-empowering-unified-deduplication/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 14:02:22 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Brian Garrett]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lab Reports]]></category>
		<category><![CDATA[Albireo]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Permabit]]></category>
		<category><![CDATA[SDK]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=16888</guid>
		<description><![CDATA[Permabit’s field-proven data deduplication engine is now available as a software library and software development kit (SDK) that provides unified data deduplication advisory services.  This ESG Lab Validation report documents the results of hands-on testing of the Permabit Albireo SDK with a focus on ease of integration into existing storage systems, capacity savings that can [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract"><a href="http://www.permabit.com" target="_blank">Permabit</a>’s field-proven data deduplication engine is now available as a software library and software development kit (SDK) that provides unified data deduplication advisory services.  This ESG Lab Validation report documents the results of hands-on testing of the Permabit Albireo SDK with a focus on ease of integration into existing storage systems, capacity savings that can be achieved with real-world data, resource usage, and fault tolerance.</div>
<h1>Introduction</h1>
<p>ESG Lab conducted this Validation Report in November 2009 prior to the public announcement of Albireo.  At the time, we evaluated a Beta version of Albireo as Permabit was in early discussions under NDA with storage OEMs and beginning initial implementations.  Permabit publicly announced Albireo High Performance Data Optimization Software on June 7, 2010.</p>
<h2>Background</h2>
<p>Back in 2001, when Data Domain was founded and EMC purchased Belgian startup FilePool for its Centera product line, few in the industry had heard of data deduplication. Fewer still were aware that Permabit, founded in 2000, was hard at work developing scalable data deduplication technology. Today, it’s hard to find anyone in the storage industry who hasn’t heard about data deduplication.</p>
<p>As shown in Figure 1, a recently completed ESG survey of 398 North American IT decision makers indicates extreme interest in using data deduplication technology for data protection. In fact, 43% of respondents currently utilize or plan to use some type of data deduplication technology to eliminate redundant data.<a href="#_ftn1">[1]</a> Within the data protection market, ESG expects that interest in and adoption of deduplication will increase significantly over the next five years as it reaches mainstream market adoption.</p>
<div class="graph_top">Figure 1. Widespread Interest in Data Deduplication</div>
<p><img class="aligncenter size-full wp-image-16910" title="PermabitAlbireoF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF1.png" alt="" width="614" height="300" />Interest outside of the backup and recovery market is growing as well.  Solutions with built-in deduplication support are being used for disk-based archival (e.g., EMC Centera, Permabit Enterprise Archive).  In the primary storage market, data deduplication has been added to network attached storage (NAS) systems (e.g., NetApp FAS).   As a matter of fact, ESG’s research indicates that space efficiency and data reduction rank second and third, respectively, in the most important features and attributes when considering a NAS solution.  This survey of decision makers within enterprise-class organizations indicates that 30% would not purchase a NAS system without data reduction and 46% would strongly prefer a solution with this attribute.<a href="#_ftn2">[2]</a> These are surprisingly strong results since the majority of NAS solutions don’t support data deduplication.   Given the dramatic benefits that have been realized with existing data deduplication technologies,   ESG believes that it’s only a matter of time before policy-based deduplication support is added to primary block-based storage systems as well.</p>
<h2>Introducing Permabit Deduplication Technology</h2>
<p>Permabit’s field-proven data deduplication engine is now available as a software development kit (SDK) that provides unified data deduplication advisory services.    The Albireo SDK is unique in its ability to provide a wide variety of deduplication services.  The Albireo SDK can be used to:</p>
<ul>
<li>Reduce capacity within file- or block-based storage systems (e.g., NAS appliances or FC disk arrays)</li>
<li>Process objects at the sub-file or single instance storage (SIS) level</li>
<li>Remove duplicates in real-time (a.k.a., inline) or later after data has arrived (post-process)</li>
<li>Eliminate application performance concerns</li>
<li>Scale from a single storage controller to a cluster of storage controllers or a cluster of deduplication appliances communicating with a storage system over an industry standard Ethernet interface</li>
</ul>
<p>Permabit was founded in 2000 by MIT engineers with a goal of developing  a scalable enterprise-class archive product with built-in data deduplication.   That product, now known as the Permabit Enterprise Archive, is shown towards the left in Figure 2. Industry standard servers are arranged in a grid, with each server acting as either an access or storage node. The storage nodes are packed full of high capacity SATA drives and presented as a network attached file system (NAS).   Servers can be added to the grid for increased performance and capacity.</p>
<div class="graph_top">Figure 2. Introducing Permabit Albireo Deduplication Advisory Services</div>
<p><img class="aligncenter size-full wp-image-16911" title="PermabitAlbireoF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF2.png" alt="" width="577" height="262" />In the Permabit Enterprise Archive product, a global pool with deduplication technology is implemented within a grid of servers.   Permabit has extracted this core deduplication technology to create the Albireo software development kit.   Albireo deduplication advisory services are accessed by software running within a storage system through an application programming interface (API) provided by Permabit.   A storage system with Albireo running within a single controller attached to a number of drive enclosures is shown towards the right in Figure 2.</p>
<p>Albireo indexes chunks of data and provides advisory notifications when duplication is detected. The software running in the storage system decides whether to take the advice. If the advice is taken, it is the responsibility of the storage system software to update references to duplicate data.  The Permabit Albireo API supports both block and stream-based access methods. The stream-based API provides content-aware segmentation to optimize deduplication based on the file type. Duplicate advisory services can be provided synchronously, or asynchronously using a registered callback.  Taken together, the Albireo SDK provides a flexible and powerful array of deduplication services.</p>
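<p>The advisory flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the Albireo API: the class <code>AdvisoryIndex</code> and its methods <code>index_block</code>, <code>post_block</code>, and <code>flush</code> are hypothetical names, and a plain in-memory dictionary stands in for the real index. It shows a synchronous call that returns advice immediately and an asynchronous call that delivers the same advice later through a registered callback.</p>

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Advice = Optional[int]  # address of an earlier identical chunk, or None for new data


class AdvisoryIndex:
    """Toy stand-in for a deduplication advisory service (hypothetical, not Albireo)."""

    def __init__(self) -> None:
        # Maps a chunk fingerprint to the address of the first chunk seen with it.
        self._seen: Dict[bytes, int] = {}
        # Chunks queued by the asynchronous path, waiting to be indexed.
        self._pending: List[Tuple[int, bytes, Callable[[int, Advice], None]]] = []

    def index_block(self, address: int, data: bytes) -> Advice:
        """Synchronous path: return the address of an earlier identical chunk, if any."""
        digest = hashlib.sha256(data).digest()
        canonical = self._seen.setdefault(digest, address)
        return None if canonical == address else canonical

    def post_block(self, address: int, data: bytes,
                   callback: Callable[[int, Advice], None]) -> None:
        """Asynchronous path: queue the chunk; advice is delivered later via callback."""
        self._pending.append((address, data, callback))

    def flush(self) -> None:
        """Index every queued chunk and hand the advice to its registered callback."""
        pending, self._pending = self._pending, []
        for address, data, callback in pending:
            callback(address, self.index_block(address, data))
```

<p>In either mode the result is only advice: the caller learns the address of an earlier identical chunk (or nothing, for new data) and remains responsible for deciding whether to share storage and for updating its own references.</p>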
<h1>ESG Lab Validation</h1>
<p>ESG Lab evaluated the Albireo SDK during two days of hands-on testing at Permabit headquarters in Cambridge, Massachusetts.  The evaluation began with an overview of how Albireo deduplication advisory services are used within an existing storage system.  As shown in Figure 3, the software running within a storage system uses an API call to send new data and addressing information to Albireo.   Albireo ingests the data and performs a SHA-256 hash. A two stage lookup (memory and if needed disk) is performed to see if the data has already been stored. If the data is a duplicate, the API returns the location of the pre-existing data.  If the data is new, Albireo remembers the SHA-256 hash and the chunk’s location so that it can detect duplicate data in the future.</p>
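<p>The lookup sequence described above (hash the data, check memory, check disk only if needed, then either return the existing location or remember the new one) can be illustrated with the following sketch. It is a simplified model under stated assumptions, not Permabit's implementation: the name <code>TwoStageIndex</code> is hypothetical, a set of 4-byte hash prefixes stands in for the memory stage, and a dictionary stands in for the on-disk index.</p>

```python
import hashlib
from typing import Dict, Optional, Set


class TwoStageIndex:
    """Illustrative two-stage duplicate lookup (hypothetical simplification)."""

    def __init__(self, prefix_bytes: int = 4) -> None:
        self.prefix_bytes = prefix_bytes
        self.memory_stage: Set[bytes] = set()    # short hash prefixes held in RAM
        self.disk_stage: Dict[bytes, int] = {}   # stand-in for the full on-disk index
        self.disk_lookups = 0                    # how often the second stage was consulted

    def lookup_or_add(self, data: bytes, location: int) -> Optional[int]:
        """Return the location of a pre-existing copy, or None after recording new data."""
        digest = hashlib.sha256(data).digest()
        prefix = digest[: self.prefix_bytes]
        if prefix in self.memory_stage:
            # Possible duplicate: confirm against the full hash in the second stage.
            self.disk_lookups += 1
            existing = self.disk_stage.get(digest)
            if existing is not None:
                return existing
        # New data: remember the hash and location so later copies are detected.
        self.memory_stage.add(prefix)
        self.disk_stage[digest] = location
        return None
```

<p>The point of the first stage is that data seen for the first time is normally rejected in memory, so the slower second stage is consulted only for likely duplicates.</p>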
<div class="graph_top">Figure 3. Permabit Albireo Deduplication Advisory Services</div>
<p><img class="aligncenter size-full wp-image-16912" title="PermabitAlbireoF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF3.png" alt="" width="434" height="352" />ESG Lab evaluated the Albireo SDK using a pair of programs developed by Permabit:</p>
<p><strong>PBSCAN:</strong> This utility uses Albireo advisory services to determine the savings that can be achieved with deduplication. The program is routinely used at prospective customer sites to determine the benefits of deduplication with real-world data sets.</p>
<p><strong>DD2FS:</strong> An open-source user-space file system was modified to illustrate the ease of integrating Albireo into an existing file system.</p>
<p>Each test program was implemented as a single process with the Albireo engine running as a separate service. The pbscan utility was implemented with multiple threads working in parallel. As shown in Figure 4, the test bed used for this phase of the evaluation consisted of a single quad core Intel Xeon 3 GHz processor with 4 GB of RAM. For test purposes only, and to more easily discern the resource utilization by Albireo, all but one of the processor cores were disabled in BIOS.</p>
<div class="graph_top">Figure 4. The ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-16913" title="PermabitAlbireoF4" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF4.png" alt="" width="494" height="378" />While ESG Lab testing was performed with a single processor core, it should be noted that the extreme scalability of the Albireo engine has been proven in production customer environments.  Later in this report, we’ll take a look at how Permabit’s underlying deduplication technology is routinely deployed within clusters of multi-core servers.</p>
<h2>Capacity Savings</h2>
<p>ESG Lab used the Albireo-enabled pbscan utility and real-world application data collected from Permabit’s production IT servers to evaluate the capacity savings that can be achieved with Permabit deduplication advisory services.   The results for two data sets are summarized in Figure 5 and Table 1.</p>
<div class="graph_top">Figure 5. Capacity Savings</div>
<p><img class="aligncenter size-full wp-image-16914" title="PermabitAlbireoF5" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF5.png" alt="" width="576" height="292" /></p>
<div class="graph_top">Table 1: Capacity Savings</div>
<p><img class="aligncenter size-full wp-image-16920" title="PermabitAlbireoT1" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoT1.png" alt="" width="637" height="98" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>A 43 GB set of common office productivity files (e.g. documents, spreadsheets, presentations) was reduced by 32.67%.</li>
<li>Four VMware virtual server images with a total capacity of 157 GB were reduced to only 4.3 GB. VMware virtual server images are often highly redundant, especially when each of the virtual machines is running the same guest operating system, as was the case for this test. In this example, 97% of the capacity required for the VMware images can be saved with Albireo-enabled deduplication.</li>
</ul>
<p>The experiment was repeated for a pair of Microsoft Hyper-V images and a week’s worth of Microsoft Exchange backup images.  One of the backup images was a full backup; the other images were incremental.  The results for all of the data types evaluated by ESG Lab are summarized in Figure 6 and Table 2.  Note that the savings are shown as a deduplication ratio instead of a percentage of capacity saved.</p>
<div class="graph_top">Figure 6. Deduplication Ratios</div>
<p><img class="aligncenter size-full wp-image-16915" title="PermabitAlbireoF6" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF6.png" alt="" width="570" height="299" /></p>
<div class="graph_top">Table 2: Deduplication Ratios</div>
<p><img class="aligncenter size-full wp-image-16921" title="PermabitAlbireoT2" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoT2.png" alt="" width="621" height="166" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>A pair of Microsoft Hyper-V images was reduced by a factor of 2.1:1</li>
<li>A single week’s worth of Microsoft Exchange backups (weekly full, daily incremental) was reduced from 2,203 GB to 501 GB, providing an excellent deduplication ratio of 7.4:1</li>
<li>A deduplication ratio of 36.2:1 was recorded for four VMware images.</li>
</ul>
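<p>Because the results above are quoted both as a percentage of capacity saved and as a deduplication ratio, it can help to convert between the two. The helper names below are illustrative only; the arithmetic is simply saved = 1 - 1/ratio.</p>

```python
def ratio_to_percent_saved(ratio: float) -> float:
    """Convert a deduplication ratio (36.2 means 36.2:1) to percent of capacity saved."""
    return (1.0 - 1.0 / ratio) * 100.0


def percent_saved_to_ratio(percent: float) -> float:
    """Convert percent of capacity saved to a deduplication ratio (N means N:1)."""
    return 100.0 / (100.0 - percent)
```

<p>For example, the 36.2:1 ratio reported for the VMware images corresponds to roughly 97% of capacity saved, and the 32.67% reduction reported for office files corresponds to a ratio of roughly 1.5:1.</p>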
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Data proliferation is a challenge for IT professionals   within organizations of all sizes.  By   eliminating redundant data, data deduplication can significantly reduce   capacity requirements.  Reducing   capacity requirements reduces the cost of storing and protecting data.</p>
<p>ESG Lab testing with real-world application data has confirmed that Permabit Albireo deduplication advisory services can be used to reduce capacity requirements for primary storage arrays, disk-based archives, and backups. ESG Lab recorded an outstanding deduplication rate of 97% (36.2:1) for four VMware virtual server images.</p></td>
</tr>
</tbody>
</table>
<h2>Ease of Integration</h2>
<p>ESG Lab evaluated the ease of adding deduplication to an existing storage solution using the FUSE open-source user-space file system framework.<a href="#_ftn3">[3]</a> The FUSE file system is built over the native ext3 Linux file system.  There’s no need to patch or recompile the kernel.   Permabit created a file system using FUSE named dd2fs. The dd2fs file system was modified to use Albireo to identify and eliminate duplicates.   Six Albireo API calls and 52 lines of supporting code were added to the 1,563 line FUSE demonstration program.  Synchronous (inline) and asynchronous (post-processing) deduplication modes were demonstrated.</p>
<p>As shown in Figure 7, the update_block function passes data to be written to the Albireo uds_index_block API.  A return value of 1 indicates that the block is a duplicate and that it can share storage with the canonical chunk. After a bit of bookkeeping and error checking, the storage associated with the duplicate block is freed.  Other than this key function call, the majority of the Albireo-related code changes were isolated to initialization, shutdown, and handling callbacks when operating in asynch/post-process mode.</p>
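<p>The write path described above might look roughly like the following Python sketch. It is not the dd2fs code: <code>ToyBlockStore</code> and its fields are hypothetical, and the internal <code>_index_block</code> helper is an assumed stand-in for the advisory call that returns the canonical block number rather than the status value described above. The sketch shows the bookkeeping that falls to the storage system: verifying the advice, freeing the duplicate block, and sharing the canonical one.</p>

```python
import hashlib
from typing import Dict


class ToyBlockStore:
    """Hypothetical block store whose write path consults a duplicate advisor."""

    def __init__(self) -> None:
        self.physical: Dict[int, bytes] = {}   # physical block number to stored data
        self.refcount: Dict[int, int] = {}     # physical block number to reference count
        self.block_map: Dict[str, int] = {}    # logical name to physical block number
        self._index: Dict[bytes, int] = {}     # stand-in for the advisory index
        self._next_block = 0

    def _index_block(self, data: bytes, block: int) -> int:
        """Stand-in advisor: return the canonical block number for this data."""
        digest = hashlib.sha256(data).digest()
        return self._index.setdefault(digest, block)

    def update_block(self, name: str, data: bytes) -> bool:
        """Write a block; return True if it now shares storage with an earlier block."""
        block = self._next_block
        self._next_block += 1
        self.physical[block] = data
        canonical = self._index_block(data, block)
        # The advice is only advice: confirm the canonical block still holds this data.
        duplicate = canonical != block and self.physical.get(canonical) == data
        if duplicate:
            # Take the advice: free the new copy and reference the canonical block.
            del self.physical[block]
            block = canonical
        self.refcount[block] = self.refcount.get(block, 0) + 1
        self.block_map[name] = block
        return duplicate
```

<p>Overwrites and deletes are deliberately left out to keep the sketch short; a real file system would also decrement reference counts and release a block when its count reaches zero.</p>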
<div class="graph_top">Figure 7. Integrating Albireo</div>
<p><img class="aligncenter size-full wp-image-16916" title="PermabitAlbireoF7" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF7.png" alt="" width="560" height="328" />The FUSE demo was run to see Albireo-enabled deduplication in action. As shown in Figure 8, a 64 KB file full of random data was written to an empty Albireo-enabled file system. A copy of the file with a new name was added. The Linux <em>df</em> and <em>du</em> utilities were used to verify that the user’s view of the file system included the space consumed by both files, yet only a single file’s worth of disk capacity was consumed. The Linux <em>md5sum</em> utility was used to verify that the files were the same. As each file was deleted, the file system capacity and underlying disk capacity were checked.</p>
<div class="graph_top">Figure 8. Validating Albireo-Enabled Deduplication</div>
<p><img class="aligncenter size-full wp-image-16917" title="PermabitAlbireoF8" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF8.png" alt="" width="584" height="418" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Deduplication is valuable technology that can be frustratingly hard to develop and debug. The field-proven and patented deduplication technology at the core of the Permabit Albireo API incorporates over 25 man years of development. Integrating Albireo using six well-documented API calls consumed only 52 lines of code. ESG Lab is confident that an experienced storage systems architect working alongside a Permabit engineer can complete a proof of concept integration in two weeks or less.</p></td>
</tr>
</tbody>
</table>
<h2>Resource Utilization</h2>
<p>Identifying duplicate data is a resource intensive operation.     First, data needs to be fed into a deduplication engine.  Moving a lot of data can consume a lot of bandwidth.  Next, the engine needs an algorithm which can be used to quickly, and accurately, identify duplicates.    Most deduplication solutions use cryptographic hashing functions to identify duplicates, but hashing a lot of data can consume a lot of CPU horsepower.  Last, but not least, the deduplication solution needs to maintain an index of previously processed data to find and keep track of duplicates.  Most deduplication solutions use a two stage lookup, with the first lookup occurring in memory and the second occurring on disk.  Indexing lots of possibly duplicate data can consume a significant amount of memory.   Each of these resource issues can have an effect on the overall performance of a storage solution.</p>
<p>ESG Lab’s analysis of CPU, memory, and performance efficiency began with a review of Permabit’s patented two stage deduplication detection algorithm. <a href="#_ftn4">[4]</a> The patent describes a highly efficient two stage index residing in memory and on disk.  This novel approach uses a combination of bit sampling and byte differencing to provide a first stage memory lookup that executes very quickly, consumes very little memory, and has a very low probability of false positives.  ESG Lab has confirmed that the Permabit deduplication advisory services engine:</p>
<ul>
<li>Delivers fast in-memory deduplication lookups that take between 9 and 17 microseconds<a href="#_ftn5">[5]</a></li>
<li>Requires less than 3.5 bytes of memory to index each chunk of deduplicated  data<a href="#_ftn6">[6]</a></li>
<li>Allows a single server to deduplicate up to 48 TB of data with under 2.5 GB of RAM<a href="#_ftn7">[7]</a></li>
</ul>
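<p>The relationship between per-entry memory cost, chunk size, and indexed capacity can be estimated with simple arithmetic. The function below is an illustrative back-of-the-envelope model, not a Permabit sizing tool: it assumes one index entry per chunk, treats 3.5 bytes as an upper bound on the cost per entry, and leaves the chunk size as a parameter because the chunk size behind the 48 TB figure is not stated here.</p>

```python
def index_memory_bytes(capacity_bytes: float, chunk_bytes: int,
                       bytes_per_entry: float = 3.5) -> float:
    """Estimate index RAM assuming one entry per chunk of indexed data."""
    chunks = capacity_bytes / chunk_bytes
    return chunks * bytes_per_entry


# Assumed inputs: 48 TB taken as 48 * 10**12 bytes, indexed in 64 KB chunks.
estimate_gib = index_memory_bytes(48 * 10**12, 64 * 1024) / 2**30
```

<p>Under these assumptions the estimate comes to a little under 2.4 GiB. The same arithmetic with 4 KB chunks gives an index sixteen times larger, which illustrates why the memory needed for a given capacity depends heavily on the chunk size.</p>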
<p>A 42 GB set of office productivity files (documents, spreadsheets, presentations, PDFs, etc.) was processed by the Albireo-enabled scan utility.  The scan utility was used to quantify the processor, memory, and performance impact of Permabit deduplication advisory services. The results of three tests were compared:</p>
<ol>
<li><strong>Scan only</strong>: opened the file system and read 42 GB of office productivity files from beginning to end. The 42 GB was spread over 26,094 files.</li>
<li><strong>Scan and index (64 KB chunk)</strong>: opened and passed all of the data within each file to the Albireo API. The Albireo API was used to identify and keep track of 64 KB chunks of data in a Permabit index of duplicate candidates.</li>
<li><strong>Scan and index (4 KB chunk)</strong>:  repeated the second test with a chunk size of 4 KB.</li>
</ol>
<p>Linux utilities were used to record CPU and memory utilization. Trace output was used to record the average latency for each Albireo deduplication lookup. In particular, trace data recorded all of the latency associated with a Permabit deduplication advisory services call, including the SHA-256 ingest, the first pass memory index, and a second order index operation for likely duplicate candidates. The elapsed time needed to process the 42 GB file set was used to calculate throughput. The results are summarized in Table 3 and Figure 9.</p>
<div class="graph_top">Table 3: Resource Utilization Analysis</div>
<p><img class="aligncenter size-full wp-image-16922" title="PermabitAlbireoT3" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoT3.png" alt="" width="640" height="131" /></p>
<div class="graph_top">Figure 9. CPU Utilization Analysis</div>
<p><img class="aligncenter size-full wp-image-16918" title="PermabitAlbireoF9" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF9.png" alt="" width="484" height="299" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>The scan only test spent most of the time reading from disk and consumed very little CPU (less than 1%).</li>
<li>Permabit deduplication advisory services consumed approximately half (28% to 59%) of a 3 GHz Xeon processor core.</li>
<li>Scanning and indexing with a larger chunk size consumes slightly more CPU. This is due to the fact that bigger chunks of data were passing through the CPU intensive SHA-256 algorithm.</li>
<li>A SHA-256 scan and index with a 4 KB chunk size incurred only 43 microseconds of latency.  That’s less than 1% of overhead compared to a typical disk-based latency of 5 milliseconds.</li>
<li>A single 3 GHz CPU core running at less than 75% CPU utilization was able to sustain 124 MB/sec of throughput.   ESG Lab is confident that Albireo can deliver significantly more aggregate throughput by scaling the number of processes, processor cores, and servers working in parallel.</li>
</ul>
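<p>Two of the figures above are simple derived quantities, and the arithmetic is sketched below. The function names are illustrative; the only inputs taken from this report are the 43 microsecond lookup latency and the 5 millisecond (5,000 microsecond) disk latency used for comparison.</p>

```python
def overhead_percent(added_us: float, baseline_us: float) -> float:
    """Added latency as a percentage of a baseline latency (both in microseconds)."""
    return added_us / baseline_us * 100.0


def throughput_mb_per_sec(megabytes: float, elapsed_seconds: float) -> float:
    """Average throughput for a data set processed in elapsed_seconds."""
    return megabytes / elapsed_seconds
```

<p>For instance, <code>overhead_percent(43, 5000)</code> evaluates to about 0.86, which is consistent with the "less than 1%" overhead noted above.</p>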
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Data deduplication is a resource intensive operation that can have a dramatic impact on the overall cost and performance of a storage solution. ESG Lab is confident that resource efficient Permabit deduplication advisory services can be used to architect a cost effective solution that provides a virtually limitless pool of globally deduplicated capacity using industry standard server hardware.</p></td>
</tr>
</tbody>
</table>
<h2>Maturity</h2>
<p>ESG Lab performed a high level assessment of Permabit’s software development processes to understand the maturity of the Albireo SDK.  The bulk of the code within the Albireo SDK has been used within Permabit’s shipping products for more than six years.  Permabit has been using agile software development processes for more than seven years.</p>
<p>Agile software development refers to a group of <a title="Software development methodologies" href="http://en.wikipedia.org/wiki/Software_development_methodologies">software development methodologies</a> based on iterative development where requirements and solutions evolve through collaboration between self-organizing <a title="Cross-functional team" href="http://en.wikipedia.org/wiki/Cross-functional_team">cross-functional teams</a>. The term was coined in 2001 with the formulation of the <a title="Agile Manifesto" href="http://en.wikipedia.org/wiki/Agile_Manifesto">Agile Manifesto</a>.<a href="#_ftn8">[8]</a> Agile methods generally promote a disciplined project management process that encourages frequent inspection and adaptation; a leadership philosophy that encourages teamwork, self-organization, and accountability; a set of engineering best practices that allow for rapid delivery of high-quality software; and a business approach that aligns development with customer needs and company goals.</p>
<p>The code at the core of Albireo SDK is continuously integrated and tested on a two week iteration cycle. Stories captured in an online Wiki are used to manage requirements.  Unit, functional, and stress tests are highly automated and run continuously.  Developers write the majority of unit and system level tests. All changes are peer reviewed. Face-to-face cross functional interaction is embraced.  Reducing code complexity is valued and recognized.</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Data deduplication is complicated technology. When it works as designed, it saves capacity and money. When it fails, it can corrupt data. Storage system vendors looking to add deduplication technology to an existing product must be absolutely sure that the deduplication algorithm will not fail. Rigorous design processes and continuous testing are needed to ensure that the deduplication implementation is bug free.</p>
<p>While a more rigorous review is recommended for organizations considering a partnership with Permabit, ESG Lab is very impressed with the maturity and stability of Permabit’s software development and QA processes.</p></td>
</tr>
</tbody>
</table>
<h2>Scalability and Fault Tolerance</h2>
<p>Permabit’s global deduplication algorithm is designed to run within a grid for maximum scalability and fault tolerance.   Permabit builds systems that can scale to 4.6 petabytes today (that’s 4,600 terabytes).  Its Enterprise Archive product is continuously tested with deduplication services running on a grid of three or more servers.  A typical entry-level grid at a customer site, comprised of access and storage nodes, has deduplication services running on 11 nodes. The largest grid deployed in production at a customer site is 38 nodes.</p>
<p>ESG Lab performed a series of tests on a nine node Permabit Enterprise Archive system to determine whether deduplication advisory services continue running after multiple hardware failures.  A long running directory level file copy operation was started. As shown in Figure 10, a node was powered off and a drive was removed on a different node. The system remained available and the copy operation completed without error.</p>
<div class="graph_top">Figure 10. Validating Permabit Fault Tolerance</div>
<p><img class="aligncenter size-full wp-image-16919" title="PermabitAlbireoF10" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoF10.png" alt="" width="514" height="368" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>ESG Lab’s experience with nearly all of the vendors offering data deduplication solutions indicates that scaling a fault tolerant pool of global deduplication is a difficult task that can take years to complete. ESG Lab has confirmed that the global deduplication technology at the heart of the Permabit Albireo SDK has been deployed on a grid of up to 38 servers in production environments. An error injection test by ESG Lab proved that deduplication services remain available after both a drive and a server failure.</p></td>
</tr>
</tbody>
</table>
<h1>ESG Lab Validation Highlights</h1>
<ul>
<li>An open-source user-space file system was modified to use Albireo deduplication advisory services using six API calls and only 52 lines of code.</li>
<li>Synchronous and asynchronous APIs were used to implement inline and post-process data deduplication, respectively.</li>
<li>The Permabit SDK identified potential capacity savings ranging from 33% to 97% for real-world applications including office productivity files, virtual server images, and e-mail backups.</li>
<li>Fast and resource efficient deduplication indexing was confirmed.  Less than 3.5 bytes of memory per index entry was recorded.  Ingest and deduplication detection running at speeds of up to 124 MB/sec was recorded using approximately half of a 3GHz Intel Xeon processor core.</li>
<li>Permabit’s agile software development processes were audited.</li>
<li>Systems in the lab and in the field confirm that Permabit global deduplication advisory services have been deployed on grids of up to 38 servers.</li>
<li>An error injection test confirmed that the Albireo deduplication services running within a field-proven Permabit Enterprise Archive solution remain available after both a drive and a server failure.</li>
</ul>
<h1>Issues to Consider</h1>
<ul>
<li>The Permabit Albireo SDK detects duplicate data, but it does not actually remove it. Removing duplicates and maintaining pointers to duplicate data is implemented within the storage system using the Permabit SDK. Data structures that map and keep track of duplicate data references are needed to take advantage of the Permabit SDK. This is a trivial consideration for NAS systems, which use an inode map to keep track of data on disk. For modern block-based disk arrays, this service is often available, but where it is not, the complexity and resources associated with Albireo integration may increase.</li>
<li>An Albireo-enabled storage solution continues operating even if Albireo becomes unavailable.  If and when Albireo becomes unavailable, the system is unable to detect new duplicates but data access is unaffected.  This is due to the fact that Albireo provides deduplication advice and the storage system uses that advice to maintain data integrity.</li>
<li>While the Permabit architecture has been designed to use any hashing algorithm, Permabit has been using the SHA-256 algorithm for years. While the 256-bit SHA hashing algorithm virtually eliminates the risk of deduplication-induced data corruption due to a hash collision, it does add CPU overhead compared to less rigorous algorithms with shorter digests (e.g., the 128-bit MD5).</li>
<li>The API integration and capacity savings results presented in this report were collected using relatively simple test programs running on a single server.    Estimating the effort required to evaluate, architect, and implement a solution using a production storage system is beyond the scope of this report. Similarly, estimating the savings that can be achieved with your customer’s data is beyond the scope of this report.  Testing in your lab, with your storage system, and with your data is strongly recommended.</li>
</ul>
<h1>The Bigger Truth</h1>
<p>One of ESG Lab’s first projects was a 2004 validation of a disk-based backup appliance with built-in data deduplication from Data Domain.<a href="#_ftn9">[9]</a> Since then, data deduplication has evolved into the hottest, most paradigm shifting technology to hit the storage industry since the UC Berkeley RAID papers were published in 1989.  Like RAID, data deduplication quickly permeated the storage market due to its outstanding value proposition.  Storage administrators struggling to finish backups within shrinking windows were able to reduce the capacity required to retain backups on disk by 90% or more.   The value of this new technology was clearly compelling: data deduplication reduced the cost of disk-based backups, putting it on equal, or better, footing than tape.   Backups that finish within a shrinking window and quick ad-hoc restores from disk had suddenly become economically feasible.</p>
<p>In recent years, data deduplication has begun to permeate the storage industry.  A number of startups, including Diligent, Sepaton, and Exagrid, followed Data Domain into the disk-based backup appliance market.    Since then, all of the major systems vendors have added deduplicating backup appliances to their portfolios.  More recently, all of the major backup software vendors have added deduplication to their offerings.  Content addressable disk-based solutions with embedded data deduplication technology were introduced in the archive market.  Permabit was among the first vendors to enter this growing market.  Deduplication has been used to reduce WAN traffic within primary and secondary storage replication solutions. And last, but not least, data deduplication is beginning to take hold in the primary storage market, with NetApp and Microsoft leading the way (Deduplication for FAS and Single Instance Store within Windows Storage Server, respectively).  In ESG’s opinion, it’s simply a matter of time before data deduplication gains wide market acceptance in the nearline archive and online primary storage markets.</p>
<p>Data deduplication technology has driven a number of strategic acquisitions.  ADIC purchased deduplication technology from Rocksoft for $63M (ADIC has since been acquired by Quantum).  IBM acquired Diligent in a deal rumored to be worth between $160M and $200M.  Those acquisitions were dwarfed by EMC’s recent acquisition of Data Domain, where a bidding war with NetApp drove the value of the deal up to $2.1B in cash.</p>
<p>So what’s the big deal with deduplication? It’s actually rather simple: deduplication reduces storage capacity requirements by up to 99% for primary data and backups stored on disk. In other words, IT managers can squeeze up to a hundred times more out of each dollar they spend on disk capacity. Even as data deduplication becomes more of a feature than a product, the value is clearly compelling. While one could argue that the hype and valuations of deduplication solutions in the backup arena have gotten a bit ahead of the market, it’s clear to ESG that we haven’t seen the peak in the archive and primary storage markets yet.</p>
<p>The deduplication market has begun to mature in recent years.  As the feature becomes more of a check-off item within backup solutions, vendors are leveraging differences in architectures and implementations to grow market share.  Aside from the usual delineations based on price and performance, vendors are competing based on the finer differences between deduplication solutions:  object vs. block-based, inline vs. post-process, fixed vs. variable length, and global vs. islands of deduplication.  Permabit has extracted the core deduplication technology from a field-proven archiving solution with a goal of delivering a deduplication algorithm that can be used to architect solutions with any of these attributes in mind. In other words, instead of arguing the merits of one implementation vs. another, Permabit enables a vendor to implement multiple alternatives and just say yes.</p>
<p>ESG Lab has confirmed that the Permabit Albireo deduplication advisory services work as advertised. Inline and post-process deduplication support was added to a user space file system with only six Albireo function calls and 52 lines of code. The capacity of real-world data sets was reduced by between 33% and 97%. The patented deduplication lookup and indexing algorithm was fast and efficient. Permabit deduplication was observed running on more than one server for a scalable global pool of deduplication and fault tolerance. ESG Lab saw no interruption in access when errors were injected on a nine node Permabit Enterprise Archive cluster.</p>
<p>ESG Lab’s experience with nearly all of the vendors that have brought data deduplication solutions to the market indicates that correctly implementing data deduplication is a difficult task that requires man years of effort.   Performance, resource efficiency, and scalability have proven to be particularly challenging for a number of vendors.  On top of the technical challenges, this relatively new and valuable technology has a growing number of patent portfolios that need to be navigated.</p>
<p>Speaking of patents, Permabit has been awarded a total of 16 patents covering diverse areas in data protection and archive, and many more filings are pending in similar areas. The growing portfolio includes patents in the areas of hash-based deduplication for scalable file and object data storage, encrypted deduplication, memory-based snapshots, and many other features of Permabit’s product line. ESG Lab was particularly impressed with the resource efficient two-stage indexing method described in Patent 7457813, Storage System for Random Blocks of Data. Highlights of that well-claimed patent are summarized in the Resource Utilization section of this report.</p>
<p>ESG Lab is confident that the flexibility provided by the Albireo SDK is unique in the industry. ESG Lab has confirmed that Albireo can be used with file or block storage. It provides deduplication services at the object or sub-file level. It supports an inline and post-processing programming model with minimal performance and resource impact. Running over a grid, it can be used to create a global pool of deduplication with predictably scalable performance and rock solid reliability. It provides deduplication capacity savings that are far greater than can be achieved with compression. Stream support can be used to provide content-aware data deduplication for objects that are misaligned with the block boundaries of an underlying file system. This allows data stored in container formats (e.g., TAR and ZIP files) to be intelligently deduplicated.</p>
<p>Last, but not least, the Permabit Albireo SDK was designed with quick and easy integration in mind. Based on hands-on experience with the Permabit Albireo SDK, ESG Lab believes that Permabit deduplication can be tested within an existing storage system in a matter of weeks.  Given the growing size of the market for capacity reduction and the high cost of developing a deduplication solution, organizations considering the merits of adding data deduplication to an existing storage system should seriously consider a test drive of Permabit’s field proven, patent protected deduplication algorithm.</p>
<h1>Appendix</h1>
<div class="graph_top">Table 4: The ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-16923" title="PermabitAlbireoT4" src="http://www.enterprisestrategygroup.com/media/wordpress/2010/06/PermabitAlbireoT4.png" alt="" width="632" height="83" /></p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Source: ESG Research Report, <em>Data Protection Market Trends</em>, January 2008.</p>
<p><a name="_ftn2">[2]</a> Source: ESG Research Report, <em>Enterprise Storage System Survey</em>, November 2008.</p>
<p><a name="_ftn3">[3]</a> <a href="http://fuse.sourceforge.net" target="_blank">fuse.sourceforge.net</a></p>
<p><a name="_ftn4">[4]</a> US Patent 7457813, Storage System for Random Blocks of Data, Nov 25, 2008.</p>
<p><a name="_ftn5">[5]</a> Confirmed via a review of traces of code execution through the in memory index code path.</p>
<p><a name="_ftn6">[6]</a> Confirmed via memory usage comparisons presented later in this section.</p>
<p><a name="_ftn7">[7]</a> Depending on the size of the chunks that are used for deduplication and whether Permabit memory constrained mode is in use.  Memory constrained mode was not tested by ESG Lab.</p>
<p><a name="_ftn8">[8]</a> Agilemanifesto.org</p>
<p><a name="_ftn9">[9]</a> See: ESG Lab Report, <em>The Data Domain DD200 Restorer</em>, February 2004.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/06/permabit-albireo-empowering-unified-deduplication/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Permabit&#8217;s Primary Dedupe has Me Thinking</title>
		<link>http://www.enterprisestrategygroup.com/2010/06/permabits-primary-dedupe-has-me-thinking/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/06/permabits-primary-dedupe-has-me-thinking/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 13:58:37 +0000</pubDate>
		<dc:creator>kevin</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Brian Babineau]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Public Cloud Computing Infrastructure and Services]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[box.net]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Permabit]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=16842</guid>
		<description><![CDATA[Yesterday, Permabit released its Albireo data optimization software–a deduplication solution designed to be embedded into primary storage systems.  Right now, a majority of primary storage systems do not have dedupe capabilities and, because of lengthy product cycles, it is hard to determine when they will.  Now, hold that thought for one second. I spend a majority [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, <a href="http://www.permabit.com/" target="_blank">Permabit</a> released its Albireo data optimization software–a deduplication solution  designed to be embedded into primary storage systems.  Right now, a majority of  primary storage systems do not have dedupe capabilities and, because of lengthy  product cycles, it is hard to determine when they will.  Now, hold that thought  for one second.</p>
<p>I spend a majority of my time researching the information management software market–solutions that deliver search, access, and business process management / automation (including electronic discovery and retention management) capabilities. As I was driving home from SFO the other night I saw a billboard for <a href="http://www.box.net/" target="_blank">Box.net</a> – a “cloud” content management provider and alternative to SharePoint. The billboard highlighted “two software updates per month versus two years”–a direct shot at <a href="http://www.microsoft.com/en/us/default.aspx" target="_blank">Microsoft</a>, which has pretty much standardized on biennial product releases.</p>
<p>Let me connect my two disparate thoughts. One of the primary reasons that “cloud-based” software or “Software-as-a-Service” threatens major “on-premise” applications is agility without risk. Upgrades are delivered automatically and may be rolled out a few times per month, as Box.net proclaims. With on-premise software, a single upgrade can take months and, if anything goes wrong, it could take much longer and cost much more. Now, there really isn’t an equivalent in the systems business–especially in storage. However, one could argue that Permabit’s approach–delivering embedded software that runs on x86 architecture / standard operating systems and that will ultimately serve as a single feature of the system–could be the next best thing. Let’s say you build primary storage systems and you want to add dedupe, but your next product release is scheduled 12 months from now. The issue is that you have already completed this future system design and, unless you make substantial changes, the next possible opportunity to add dedupe is a product that will be delivered three years from now (two years after the next scheduled release). Alternatively, you could leverage embedded software to minimize the R&amp;D investment and get the capability to market in the next product cycle. In other words, getting new features into a system using a third-party embedded technology developer is the way hardware systems become more “cloud-esque”. The bullets below explain the benefits:</p>
<ul>
<li>Users can still buy tangible assets–but can get upgrades via code loads from  their favorite vendors.</li>
<li>System manufacturers can add features more rapidly in the middle of planned  product cycles.</li>
<li>System manufacturers can focus R&amp;D resources on getting the next solution to market more quickly, potentially accelerating product cycles.</li>
</ul>
<p>The bottom line is that, while there are new R&amp;D methods (agile development methodologies), there are other ways for developers to get features into hardware solutions much more quickly and thus get new capabilities into users’ hands faster. Sometimes those other means may not be apparent unless you study the software marketplace.</p>
<p>Read Brian&#8217;s other blog entries at <a href="http://www.itbulletins.com/" target="_blank">IT BULLETins</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/06/permabits-primary-dedupe-has-me-thinking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deduping Upstream</title>
		<link>http://www.enterprisestrategygroup.com/2010/06/deduping-upstream/</link>
		<comments>http://www.enterprisestrategygroup.com/2010/06/deduping-upstream/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 01:34:31 +0000</pubDate>
		<dc:creator>kevin</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[Data Domain]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[Ocarina]]></category>
		<category><![CDATA[Permabit]]></category>
		<category><![CDATA[primary storage]]></category>
		<category><![CDATA[secondary storage]]></category>
		<category><![CDATA[Storwize]]></category>
		<category><![CDATA[Symantec]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=16839</guid>
		<description><![CDATA[While much of the hype around dedupe has been focused on data deduplication for secondary disk storage, the technology is rapidly evolving and being applied at earlier points in the data path—upstream from backup storage systems.  Several backup software vendors, including EMC Avamar, have been differentiating their solutions with source-side deduplication.  Deduplication and compression are [...]]]></description>
			<content:encoded><![CDATA[<p>While much of the hype around dedupe has been focused on data deduplication for secondary disk storage, the technology is rapidly evolving and being applied at earlier points in the data path, upstream from backup storage systems. Several backup software vendors, including <a href="http://www.emc.com/" target="_blank">EMC</a> Avamar, have been differentiating their solutions with source-side deduplication. Deduplication and compression are being applied to primary storage by EMC, <a href="http://www.netapp.com/">NetApp</a>, <a href="http://www.storwize.com/" target="_blank">Storwize</a> and <a href="http://ocarinanetworks.com/" target="_blank">Ocarina</a> to reduce the footprint of primary data. Now <a href="http://www.permabit.com/" target="_blank">Permabit</a> has launched data reduction software that primary storage systems vendors can embed, potentially rivaling other vendors’ positions here.</p>
<p>It’s definitely been interesting to watch capacity optimization technology  (deduplication and compression) evolve.  The most notable trend is the move  upstream.  Just last month, EMC announced its “Boost” option for <a href="http://www.datadomain.com/" target="_blank">Data Domain</a>, which enabled  Data Domain to move a portion of its deduplication processing up to the media  server in NetBackup OST environments—thereby distributing the workload and  gaining some network and performance efficiency in the process.  In a similar  move, <a href="http://www.symantec.com/" target="_blank">Symantec</a> extended its  deduplication processing to client systems (from the media server) to gain  end-to-end efficiency in the backup process.</p>
<p>These shifts shouldn’t come as a surprise.  Vendors (and end-users) realize  that there are big benefits to “stopping the avalanche at the top of the  mountain.”  They’re also probably drooling over NetApp’s success with its  storage efficiency efforts—headlined with NetApp deduplication in FAS arrays.   Just last month, NetApp posted seriously good financial results; its focus on  server virtualization initiatives, where redundancy and proliferation create an  optimal use case for deduplication, is probably a contributor to these  results.</p>
<p>Permabit, long a provider of deduplication technology through its archive  solution, has packaged its technology up for wide consumption.  Offering its  Albireo High Performance Data Optimization Software via an SDK allows the  primary dedupe “have nots” to embed Permabit’s time-tested deduplication  technology into their primary storage offerings.</p>
<p>The objection every vendor moving the process upstream has to manage will be  performance vs. cost.  Implementing deduplication to save bandwidth and capacity  makes sense … but not if the added processing impacts application performance.</p>
<p>Read Lauren&#8217;s other blog entries at <a href="http://www.dataprotectionperspectives.com/" target="_blank">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2010/06/deduping-upstream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
