<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Enterprise Strategy Group X Data Reduction Software</title>
	<atom:link href="http://www.enterprisestrategygroup.com/category/by-coverage-area/information-and-risk-management/data-protection/data-reduction-software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.enterprisestrategygroup.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Mon, 06 Feb 2012 21:55:58 +0000</lastBuildDate>
	<language></language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>CommVault Simpana now offering &#8220;One Pass&#8221;</title>
		<link>http://www.enterprisestrategygroup.com/2012/02/commvault-simpana-now-offering-one-pass/</link>
		<comments>http://www.enterprisestrategygroup.com/2012/02/commvault-simpana-now-offering-one-pass/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 14:15:00 +0000</pubDate>
		<dc:creator>Jason Buffington</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Digital Archiving Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Jason Buffington]]></category>
		<category><![CDATA[Technical Optimist]]></category>
		<category><![CDATA[backup-to-disk]]></category>
		<category><![CDATA[CommVault]]></category>
		<category><![CDATA[Simpana]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=28264</guid>
		<description><![CDATA[Today, CommVault is holding a virtual event to announce some of its latest innovations for the Simpana 9.0 product. I had the opportunity to do some early hands-on testing of a few of the new capabilities during an ESG Lab Review &#8212; including its new &#8220;OnePass&#8221; technology and its ability to integrate with Scale-out NAS. [...]]]></description>
			<content:encoded><![CDATA[<p>Today, CommVault is holding a virtual event to announce some of its latest innovations for the Simpana 9.0 product. I had the opportunity to do some early hands-on testing of a few of the new capabilities during an ESG Lab Review &#8212; including its new &#8220;OnePass&#8221; technology and its ability to integrate with Scale-out NAS.</p>
<blockquote><p><a title="ESG Lab Report on CommVault Simpana 9 OnePass" href="http://www.enterprisestrategygroup.com/2012/02/lab-review-commvault-simpana-9-%e2%80%9conepass%e2%80%9d/" target="_blank">Click here</a> to read the new <em>ESG Lab Report on CommVault Simpana 9.0 &#8220;OnePass&#8221;</em></p>
<p><a title="ESG Analyst Brief on CommVault Simpana" href="http://www.enterprisestrategygroup.com/2012/02/building-a-strategic-archive-with-commvault-simpana-software" target="_blank">Click here</a> to read a new <em>ESG Analyst Brief on CommVault Simpana 9</em></p></blockquote>
<p>With data growing at ever increasing rates, more data sets are simply becoming &#8220;too big&#8221; to back up &#8212; at least not in the traditional sense.  To help combat this, Archive is becoming more and more the steady-partner to Backup, whereby once something is adequately backed up, dormant data can be archived off &#8212; making future backups better.</p>
<p>That all sounds like steps in the right direction, but let&#8217;s take a look using a &#8220;Good, Better, Best&#8221; perspective for how these come together:</p>
<table border="0" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr>
<td width="42" valign="top"></td>
<td width="558" valign="top"><strong>Good &gt;</strong> Some IT environments are now doing Archive and Backup (and Storage Resource Monitoring), which is solving their tactical backup window and retention challenges &#8212; but they are using multiple point products; with each niche technology installing its own agent on the production servers, its own management console, and creating its own I/O/CPU impact on every production server.</td>
</tr>
<tr>
<td width="42" valign="top"></td>
<td width="558" valign="top"></td>
</tr>
<tr>
<td width="42" valign="top"></td>
<td width="558" valign="top"><strong>Better &gt;</strong> Some data protection vendors have either built or bought complementary archiving and/or SRM functionality. Often this eases buying and evaluation cycles, as well as support resolution. But the multiple agents, back-ends, management interfaces, and I/O/CPU impact on the production environments still apply.</td>
</tr>
<tr>
<td width="42" valign="top"></td>
<td width="558" valign="top"></td>
</tr>
<tr>
<td width="42" valign="top"></td>
<td width="558" valign="top"><strong>Best &gt;</strong> <em>One</em> agent &#8230; <em>One</em> back-end … <em>One</em> console … and <span style="font-size: xx-small;">(most importantly)</span> <em>One</em> CPU/I/O stream on each production server.</td>
</tr>
</tbody>
</table>
<p>In other words &#8212; <em>One Pass on the data</em>, which <span style="font-size: xx-small;">(not coincidentally)</span> is the name of Simpana&#8217;s new feature.</p>
<p><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="CommVault_compare_OnePass_workflows_v3" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CommVault_compare_OnePass_workflows_v3.png" border="0" alt="CommVault_compare_OnePass_workflows_v3" width="474" height="211" /></p>
<p>CommVault may not be the only vendor to have ever converged its software’s methodologies, but it is now on a <em>very</em> short list of vendors who are addressing multiple data management problems with a truly unified solution through an elegant architecture.  And most impressively, they did it while not even asking for new licensing or deployment methods.  That&#8217;s right, existing Simpana 9.0 customers can take advantage of this by simply applying the most recent quarterly software update and then doing their normal agent update process.  After that, two simple checkboxes in the Simpana management console will enable the unified &#8220;OnePass&#8221; behavior within the Simpana system.  (<em>check out <a title="ESG Lab Report on CommVault Simpana &quot;OnePass&quot;" href="http://www.enterprisestrategygroup.com/2012/02/lab-review-commvault-simpana-9-%e2%80%9conepass%e2%80%9d/" target="_blank">the ESG Lab Report</a> on all of this</em>)</p>
<p>While I would love to say that consolidating the three workflows of Backup, Archiving, and SRM into one process gives you 3X return for your backup window, there are too many variables to make that claim, including file types and sizes, amount of redundancy, archiving retention rules, etc. But by only traversing the disk system once (instead of once for each of the three processes), every Simpana customer should see an appreciable improvement in backup window SLA compliance, as well as the less quantifiable but more appreciable reduced I/O impact on production disks and networks and CPU &#8212; all of which will free the production environment to do fewer backup tasks and more production work.</p>
<p>As always, thanks for reading.</p>
<blockquote><p><strong><a title="Earlier ESG coverage of CommVault Simpana" href="http://www.enterprisestrategygroup.com/?s=commvault+simpana" target="_blank">Click here</a></strong> to read earlier ESG coverage of CommVault Simpana</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2012/02/commvault-simpana-now-offering-one-pass/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lab Review: CommVault Simpana 9 “OnePass” Including Integration with HP X9000 Scale-out NAS</title>
		<link>http://www.enterprisestrategygroup.com/2012/02/lab-review-commvault-simpana-9-%e2%80%9conepass%e2%80%9d/</link>
		<comments>http://www.enterprisestrategygroup.com/2012/02/lab-review-commvault-simpana-9-%e2%80%9conepass%e2%80%9d/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 14:00:39 +0000</pubDate>
		<dc:creator>Jason Buffington</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Digital Archiving Software]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Jason Buffington]]></category>
		<category><![CDATA[Lab Reports]]></category>
		<category><![CDATA[CommVault]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[OnePass]]></category>
		<category><![CDATA[scale-out NAS]]></category>
		<category><![CDATA[Simpana]]></category>
		<category><![CDATA[X9000]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=28240</guid>
		<description><![CDATA[This ESG Lab Review documents hands-on testing of Simpana 9 software from CommVault, specifically its “OnePass” data change gathering and retention mechanisms as well as its integration with HP X9000 (IBRIX) scale-out NAS. The Challenges Companies of all sizes continue to struggle with the various aspects of data protection. A great deal of attention is [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">This ESG Lab Review documents hands-on testing of Simpana 9 software from <a href="http://www.commvault.com/">CommVault</a>, specifically its “OnePass” data change gathering and retention mechanisms as well as its integration with <a href="http://www.hp.com/">HP</a> X9000 (IBRIX) scale-out NAS.</div>
<h1>The Challenges</h1>
<p>Companies of all sizes continue to struggle with the various aspects of data protection. A great deal of attention is paid to solving not only traditional backup/restore, but also adding archiving and storage resource management to their infrastructures. Along with improving backups of virtualization platforms, laptops, and key workloads, ESG research<a href="#_ftn1">[1]</a> found that IT end-users planning to implement new data protection initiatives had other goals as well:</p>
<ul>
<li>19% plan to implement data archiving</li>
<li>19% plan to implement data deduplication</li>
<li>18% plan to re-architect their backup processes</li>
<li>13% plan to implement reporting of backup/storage</li>
</ul>
<p>Users attempting to address diverse backup, archive, and reporting needs often employ technologies from multiple vendors—each with their own agent technologies on individual production servers, as well as their own server back-ends and management interfaces. Each point solution performs its own operations on every production server, including traversing the disk, consuming memory/CPU cycles, and contributing to network traffic.</p>
<h1>The Solution: CommVault Simpana 9.0 with “OnePass”</h1>
<p>CommVault customers running Simpana software have already learned to appreciate something better than a myriad of point solutions. Simpana software’s common platform delivers backup, archive, search and storage resource management administered from a single console. While built on a single software code base, Simpana software modules have previously utilized separate processes and index databases to run archive jobs, followed by backup and, finally, reporting.</p>
<p>Throughout 2011, CommVault regularly added incremental features to its Simpana 9.0 platform—one of which is a new operating methodology referred to as &#8220;OnePass,&#8221; which enables backup, archiving, and analytical reporting from a single traversal of the file system. By only reading and/or moving data once, redundant backup, archive, and reporting processes are eliminated to speed operations, reduce storage costs, and simplify management.</p>
<h3>ESG Lab Testing</h3>
<p>ESG Lab tested the new OnePass functionality at a shared CommVault and HP test facility located in Denver, Colorado. The ESG Lab test bed consisted of a typical Simpana software configuration of one CommServe and two MediaAgents, each configured to protect three HP X9720 scale-out NAS nodes sharing a single file system, as seen in Figure 1.</p>
<div class="graph_top">Figure 1. The ESG Lab Test Bed: CommVault and HP Scale-out NAS</div>
<p><img class="alignnone size-full wp-image-28243" title="CVSimpanaLabf1" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CVSimpanaLabf1.png" alt="" width="650" height="288" /><br />
The test bed was provided by HP to assess Simpana 9.0’s ability to protect a high-volume of unstructured data.</p>
<p>ESG Lab investigated how CommVault consolidated data protection methodologies using the OnePass architecture. The left side of Figure 2 shows the typical IO patterns of three related data management workflows, including traditional backup, file-archival for reducing disk consumption, and reporting services. The right side of Figure 2 shows the combined workflow of the OnePass-enabled agent in Simpana 9.0.</p>
<div class="graph_top">Figure 2. Comparing Three Traditional Data Protection Workflows to “OnePass” within CommVault Simpana 9.0</div>
<p><img class="alignnone size-full wp-image-28244" title="CVSimpanaLabf2" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CVSimpanaLabf2.png" alt="" width="650" height="262" /><br />
Figure 2 shows how “OnePass” traverses the production storage only once, thereby eliminating significant IO redundancies on the primary server, which should dramatically reduce backup windows and the IO penalties associated with data protection and management tasks.</p>
<p>In a traditional environment using three data management tools, ideally with some level of integration or at least reporting, one might:</p>
<ol>
<li>Perform a traditional backup for data recoverability using traditional incremental methods.</li>
<li>After the backup is complete and therefore recoverable (just in case), determine if any files are candidates for archive (hierarchical) management. These files should be &#8220;stubbed&#8221; to save space, meaning that the original file is replaced with a “stub” pointer referring back to the original file held in near-line storage. This ensures that the actual contents are able to be retrieved transparently when the file is accessed.</li>
<li>With the backup finished and the appropriate files migrated to near-line storage, update the reporting system for usage and capacity.</li>
</ol>
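<p>The stubbing step in the list above can be pictured with a short Python sketch. This is a toy model under stated assumptions, not how any archiving product implements it: the function names stub_file, is_stub, and recall_file, the JSON stub format, and the directory standing in for near-line storage are all hypothetical. A real product would rely on a file-system driver to make the recall transparent to the user.</p>

```python
import json
import shutil
import tempfile
from pathlib import Path


def stub_file(path: Path, archive_dir: Path) -> Path:
    """Move a file to a (hypothetical) near-line store and leave a stub behind."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / path.name
    original_size = path.stat().st_size
    shutil.move(str(path), str(archived))
    # The stub is a small pointer recording where the real contents now live.
    stub = {"stub": True, "archived_at": str(archived), "size": original_size}
    path.write_text(json.dumps(stub))
    return archived


def is_stub(path: Path) -> bool:
    """Return True if the file holds the illustrative JSON stub format."""
    try:
        data = json.loads(path.read_text())
    except (ValueError, UnicodeDecodeError):
        return False
    return isinstance(data, dict) and data.get("stub") is True


def recall_file(path: Path) -> None:
    """Replace a stub with the original contents held in the near-line store."""
    stub = json.loads(path.read_text())
    path.unlink()
    shutil.move(stub["archived_at"], str(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        production = Path(tmp) / "production" / "report.bin"
        production.parent.mkdir()
        production.write_bytes(b"x" * 4096)
        stub_file(production, Path(tmp) / "nearline")
        print(is_stub(production))   # the file is now a small pointer
        recall_file(production)
        print(is_stub(production))   # the original contents are back
```

<p>In this sketch the recall is an explicit call; the report describes retrieval that happens automatically when the file is accessed.</p>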
<p>In the case of Simpana OnePass functionality, the operating methodology is similar … yet optimized:</p>
<ol>
<li>The agent conducts a backup of changed files.</li>
<li>With the backup changes successfully committed on the media server, the same agent then assesses the files as candidates for archival, and, if so, stubs the file.
<ul>
<li>No additional file system traversal is necessary because it was done during the backup.</li>
<li>No additional disk &#8220;read&#8221; or network &#8220;send&#8221; operations are performed during stubbing, as would be required by a separate archival product. The archival process knows that the backup process already read the file and sent it during the backup operation—so it already exists within the Simpana unified storage pool.</li>
<li>Either way, the archival routines within the OnePass agent simply perform the stubbing operation of replacing the actual file with a stub—after which the file-system driver will handle retrieval requests in case the file is accessed.</li>
</ul>
</li>
<li>With the backup complete and the appropriate files archived, the reporting mechanism updates its information.  Again, this occurs without any incremental disk traversal or network operations because Simpana OnePass uses a common index and reporting mechanism from a single collection.</li>
</ol>
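<p>The difference between the two workflows above can be sketched in a few lines of Python. This is a simplified illustration, not CommVault code: the function names, the use of modification time as the test for both "changed" and "dormant" files, and the visit counter are assumptions made for the example. It also collapses the backup, archive, and reporting decisions into one loop, whereas the report describes the archive assessment happening after the backup is committed.</p>

```python
import os
from pathlib import Path


def walk_files(root):
    """Yield every regular file under root; each call is one full traversal."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield Path(dirpath) / name


def three_pass(root, last_backup, archive_cutoff):
    """Backup, archive, and reporting as separate jobs: three traversals."""
    visited = 0
    backup = []
    for path in walk_files(root):            # pass 1: backup
        visited += 1
        if path.stat().st_mtime > last_backup:
            backup.append(path)
    archive = []
    for path in walk_files(root):            # pass 2: archive candidates
        visited += 1
        if archive_cutoff > path.stat().st_mtime:
            archive.append(path)
    total_bytes = 0
    for path in walk_files(root):            # pass 3: capacity reporting
        visited += 1
        total_bytes += path.stat().st_size
    return backup, archive, total_bytes, visited


def one_pass(root, last_backup, archive_cutoff):
    """One traversal feeding the backup, archive, and reporting decisions."""
    backup, archive, total_bytes, visited = [], [], 0, 0
    for path in walk_files(root):
        info = path.stat()                   # a single metadata read per file
        visited += 1
        if info.st_mtime > last_backup:
            backup.append(path)              # changed since the last backup
        if archive_cutoff > info.st_mtime:
            archive.append(path)             # dormant long enough to archive
        total_bytes += info.st_size          # reporting reuses the same read
    return backup, archive, total_bytes, visited
```

<p>Both functions return the same backup list, archive list, and byte total for a given directory tree; only the number of file visits differs, with the three-job version touching every file three times.</p>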
<p>ESG Lab tested the unified OnePass operating model by first conducting separate backups, archives, and report generation using Simpana 9.0 without the OnePass methodology at work. The files were spread across six nodes of an HP NAS and were backed up in parallel by one of the two Simpana media server nodes seen in Figure 1. After the initial testing, ESG Lab audited the results of a similar prolonged test provided by CommVault.</p>
<p>ESG Lab found that the overall backup time was reduced anywhere from 30% to 200% based on three key factors: data types and sizes, amount of redundancy among stored files (e.g., versioning), and archival retention settings that will vary by company. At the low end, even a 30% time savings may mean the difference between compliance with backup window SLAs or not. At the high end, the incremental nature of these backup processes, coupled with nearly transparent archival and SRM functionality, may make the entire backup tax nearly vanish for some production environments.</p>
<p>While less quantifiable, ESG Lab noted that by 1) only traversing the file system once, and 2) offloading the analysis processes to the Simpana MediaServer seen in Figure 2, an appreciable amount of disk IO and CPU processing should be relieved from the production server(s). This means that the production platforms should spend far fewer resources on data protection/management, reserving more IO and CPU for production purposes.</p>
<p>ESG Lab was impressed by how simple the process was to enable OnePass for Simpana customers. As is typical, the actual agent software components are upgradable through either a push from the Simpana administration console or an .MSI through the customer’s typical software deployment tool. The software can be deployed at any point even if the OnePass functionality is not immediately enabled.</p>
<p>Figure 3 shows how enabling the Archive or SRM reporting functions within the unified agent (i.e., enabling &#8220;OnePass&#8221;) is simply a matter of two checkboxes within the backup configuration in the Simpana administration console.</p>
<div class="graph_top">Figure 3. Enabling “OnePass” via Two Checkboxes within the Simpana File System Agent</div>
<p><img class="alignnone size-full wp-image-28245" title="CVSimpanaLabf3" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CVSimpanaLabf3.png" alt="" width="602" height="264" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="678" valign="top">
<h1>Why This Matters</h1>
<p>Most IT professionals instinctively hope for a unified data protection approach. Historically, they looked for a single backup solution that protected the range of devices in their environments. With continually growing data sets, systems are often becoming “too big” to back up with traditional methods, so solutions for archival and reporting are becoming equally sought after. And while those are good goals, the reality of running at least three different data protection, retention, and analysis agents and processes on a production server is highly undesirable if it means managing multiple tools, supporting many agents, and continually switching between tools due to various financial, environmental, or workload-specific constraints.</p>
<p>ESG Lab found that, with its most recent innovations in the 9.0 Simpana platform (which could arguably be called R2), CommVault seems to have achieved something that most suite-based or pseudo-integrated platform products strive for and that so many backup administrators with multiple products have longed for: not just interoperability across data protection and management processes, but actual unification with a single agent per production platform, running truly combined processes to reduce its disk/network/CPU footprint while still accomplishing multiple protection and management goals.</p></td>
</tr>
</tbody>
</table>
<h1>Simpana Archive Integration with Scale-out NAS</h1>
<p>&#8220;OnePass&#8221; is not the only innovation recently delivered for the Simpana 9.0 customer base. Along with backing up large file systems, CommVault now also offers its archival capabilities as the near-line extension of scale-out NAS platforms, including the HP X9000 (IBRIX) product family.</p>
<p>By integrating the Simpana software’s archival ability with scale-out NAS, CommVault software is able to offer an additional tier of near‑line storage, enabling organizations to leverage a wider range of storage options at a better price point.</p>
<h3>ESG Lab Testing</h3>
<p>ESG Lab initially treated the HP X9720 platforms as the production server farm being backed up by Simpana. By reconfiguring the test environment, ESG Lab was also able to test Simpana archival storage as a near-line expansion of a scale-out NAS appliance.</p>
<p>Figure 4 shows the reconfigured test bed with the production NAS being archived by the recent enhancements in Simpana 9.0, using the HP X9000 (IBRIX) platforms as a recent example.</p>
<div class="graph_top">Figure 4. Using Simpana software’s Archive as Near-Line Extended Storage for Scale-out NAS</div>
<p><img class="alignnone size-full wp-image-28246" title="CVSimpanaLabf4" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CVSimpanaLabf4.png" alt="" width="643" height="269" /><br />
In Figure 4, files accessed from the X9000 platforms can be either taken from their own storage pool or transparently retrieved from the Simpana Archive. While the user “sees” all files just as they would expect to within the NAS, those files may be within the primary storage of the scale-out file system or within the Simpana archival storage pool (using any storage that Simpana software supports).</p>
<p>While some NAS vendors provide their own &#8220;archival&#8221; capabilities through storage tiering and near-line capacity, it doesn’t always align with the &#8220;unified&#8221; data protection benefits described above unless 1) backup and reporting are also performed within the NAS/SAN and 2) the NAS/SAN platform is common across the entire corporate environment.  Using a software-based approach, customers may be able to leverage the unified data protection/management capabilities of CommVault software across a wide variety of production servers and NAS platforms consistently—and as a complement to any data management functions that may be offered by the NAS itself.</p>
<p>ESG Lab tested this by enabling the Simpana Linux file server agent on each of the HP X9000 NAS nodes. While many data management products purportedly present challenges when integrating with IBRIX platforms, CommVault is able to use its standard agent with the addition of a registry key on each IBRIX node.</p>
<p>After enabling the agent, ESG Lab tested the user experience by defining archival policies within Simpana software for various files and then retrieving them from an NFS client workstation.</p>
<p>Figure 5 shows two files used during testing of the archive integration with scale-out NAS:</p>
<ul>
<li>The top file listing shows the files were originally 100 MB.</li>
<li>The left statistic reveals each file consumes 102,512 KB.</li>
<li>The right statistic reports each file’s size as 104,857,600 bytes in the directory listing.</li>
<li>The middle of the screen reveals that the files were stubbed after archive—consuming only 20 KB each within the NAS, while still displaying 100 MB in the directory listing.</li>
<li>The last file listing shows that after accessing one of the files, it has been retrieved and thus consumes its regular capacity within the NAS while the other file remains archived until first access.</li>
</ul>
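<p>The figures in the list above can be checked with a little arithmetic, and the same distinction between listed size and consumed capacity can be inspected on any POSIX client. The Python sketch below is illustrative only: it assumes the "KB" values in Figure 5 are 1,024-byte units, and the helper names apparent_and_allocated and stub_savings are hypothetical.</p>

```python
import os

KIB = 1024  # assumption: the "KB" figures in the listing are 1,024-byte units

LISTED_BYTES = 104_857_600   # size shown in the directory listing
CONSUMED_KB = 102_512        # on-disk consumption before archiving
STUB_KB = 20                 # on-disk consumption after stubbing


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for a file on a POSIX system."""
    info = os.stat(path)
    # st_blocks is reported in 512-byte units on POSIX; it is absent on Windows.
    return info.st_size, info.st_blocks * 512


def stub_savings(full_kb, stub_kb):
    """Fraction of on-disk capacity released when a file is replaced by a stub."""
    return (full_kb - stub_kb) / full_kb


if __name__ == "__main__":
    print(LISTED_BYTES // KIB)                 # 102400 KB, i.e. exactly 100 MB
    print(CONSUMED_KB - LISTED_BYTES // KIB)   # 112 KB beyond the listed size
    print(round(stub_savings(CONSUMED_KB, STUB_KB) * 100, 2))  # 99.98
```

<p>Under that assumption, each stubbed file releases roughly 99.98% of the capacity it previously consumed on the NAS, while the directory listing continues to report 100 MB.</p>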
<div class="graph_top">Figure 5. NFS Client’s Experience in Retrieving Files from an Archive-Enabled NAS</div>
<p><img class="alignnone size-full wp-image-28247" title="CVSimpanaLabf5" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/CVSimpanaLabf5.png" alt="" width="618" height="227" /><br />
Note that while Figure 5 shows the attributes from an NFS perspective, Windows (CIFS) users would have a similar experience, where the actual consumption size is masked and the user perceives all files as being offered and stored on the HP NAS.</p>
<p>After enabling archival, ESG Lab configured recurring jobs to enable migration of data from the shared file system within the six IBRIX nodes to the Simpana ContentStore. Files that have been migrated are returned to satisfy file requests from a client workstation accessing the NFS shares on the X9000. ESG Lab observed no appreciable lag in performance or changes in the users’ experience as file requests were routed to the Simpana platform and transparently retrieved from the CommVault software-powered archive.</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="678" valign="top">
<h1>Why This Matters</h1>
<p>ESG research<a href="#_ftn2">[2]</a> shows that scale-out NAS is no longer just for extreme usage scenarios; it is becoming more and more mainstream. And while platforms like the HP X9000 (IBRIX) offer significant storage performance, they sometimes require proprietary data protection methods and often lack the extensibility to be protected by more typical third-party software solutions. CommVault and HP/IBRIX have partnered in such a way that a simple registry key enables the Simpana archive capability.</p>
<p>By combining the archive (and backup) capabilities of Simpana with the scale-out NAS functionality of HP&#8217;s X9000 series, CommVault customers can not only achieve their performance goals for NAS, but do so while managing costs and capacity through Simpana software&#8217;s archive ability.</p></td>
</tr>
</tbody>
</table>
<h1>ESG Lab Validation Highlights</h1>
<ul>
<li>ESG Lab examined and tested the combined methodology of “OnePass” and observed appreciably shorter overall data protection jobs, as well as reduced impact to the production servers due to the consolidated network and disk operations of “OnePass.”</li>
<li>ESG Lab observed how easy it was to enable Simpana software as an archive to a scale-out NAS, without perceivable changes to the end-users’ experience.</li>
</ul>
<h1>Issues to Consider</h1>
<ul>
<li>ESG Lab found that while it would be easy for an experienced Simpana operator to add the OnePass functionality to their environment, the Simpana administration console may appear complex to someone new. This is a reasonable result of a very mature ninth-generation codebase that continually adds new features and options based on feedback from over 15,000 customers.<a href="#_ftn3">[3]</a> Those considering converting to Simpana for its OnePass functionality, its other workload-specific capabilities, or its ability to provide an archival store for scale-out NAS should be prepared for a learning curve which can be offset by training.</li>
<li>While the HP X9000 is just one of the scale-out NAS platforms supported by the Simpana software archival function, customers will want to ensure that their specific platform is currently covered. With CommVault routinely producing updates and incremental functionality, those not directly supported today may be supported later in 2012.</li>
</ul>
<h1>The Bigger Truth</h1>
<p>Most environments struggle with a myriad of data protection and management technologies, perhaps because of workload-specific requirements, data center solutions that are less ideal in remote offices, or simply different data management goals (e.g., backup, archive, and reporting). For many, the sentiment has often been “<em>If there was a unified solution that did everything well, then we would all own it already</em>.” For others, the potential interoperability of suite-based software or simply complementary products from the same vendor has left customers disappointed as they discovered that each product operates as if it were the only tool that matters.</p>
<p>By simply enabling the “OnePass” capabilities within Simpana 9.0, CommVault customers can enjoy something that many others should find very enviable: a single agent that backs up, archives, and reports on each production server, with only one network stream and significantly optimized disk-IO impact. The result is something that appears so intuitive that it should be the standard to which other unified products aspire—where functions/technologies may have originally been developed or even acquired separately, but eventually become folded into a single agent talking to a unified back end.</p>
<p>Along with observing the before and after effects of “OnePass,” ESG Lab also tested integration of the archival capabilities of Simpana software with scale-out NAS, showing an appreciable benefit to customers with applicable platforms. Without changing the client experience or installing client-side software, even the most advanced NAS platforms can take advantage of an additional tier of storage through the near-line capabilities of Simpana.</p>
<p>If you are currently using a variety of data and management technologies for different purposes and have been disappointed by the lack of integration or coexistence supportability, then Simpana may be exactly what you have been looking for. While individual test results will vary, the fact that common disk reads and network operations are unified should be a valuable optimization method that all environments can take advantage of. Looking at the unified workflow of Simpana software’s OnePass methodology should make you ask, “<em>Why doesn’t everyone do it like that</em>?”</p>
<hr size="1" />
<p><a name="_ftn1">[1]</a> Source: ESG Research Report, <a href="../../../../../2010/04/2010-data-protection-trends/"><em>2010 Data Protection Trends</em></a>, April 2010.</p>
<p><a name="_ftn2">[2]</a> See: ESG Research Report, <a href="../../../../../2011/03/scale-out-storage-market-forecast-2010-2015/"><em>Scale-Out Storage Market Forecast 2010-2015</em></a>, December 2010.</p>
<p><a name="_ftn3">[3]</a> CommVault <a href="http://news.commvault.com/press/000692_CommVault_Reaches_15000_Customer_Milestone_on_One-Year_Anniversary_of_Simpana_9.asp">press release</a>, November 2011</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#D3D3D3">
<tbody>
<tr>
<td width="706" valign="top">
<h1>ESG Lab Reports</h1>
<p>The goal of ESG Lab reports is to educate IT professionals about emerging technologies and products in the storage, data management and information security industries. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable features and functions of products, show how they can be used to solve real customer problems, and identify any areas needing improvement. ESG Lab&#8217;s expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by CommVault.</p></td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2012/02/lab-review-commvault-simpana-9-%e2%80%9conepass%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Symantec Extends Appliance Series with NetBackup 5220</title>
		<link>http://www.enterprisestrategygroup.com/2012/01/symantec-extends-appliance-series-with-netbackup-5220/</link>
		<comments>http://www.enterprisestrategygroup.com/2012/01/symantec-extends-appliance-series-with-netbackup-5220/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 16:18:09 +0000</pubDate>
		<dc:creator>Lauren Whitehouse</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Briefs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[backup and recovery]]></category>
		<category><![CDATA[NetBackup]]></category>
		<category><![CDATA[Symantec]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=27739</guid>
		<description><![CDATA[Symantec’s initial focus with its appliance strategy was to deliver scalable deduplication via its NetBackup 5000 Series. In addition, the company introduced an all-in-one NetBackup 5200 appliance based on NetBackup 7. Symantec has extended the NetBackup 5200 Series appliance with the NetBackup 5220, expanding storage capacity and connectivity options. Overview Consolidation has been a big [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract"><a href="http://www.symantec.com/">Symantec</a>’s initial focus with its appliance strategy was to deliver scalable deduplication via its NetBackup 5000 Series. In addition, the company introduced an all-in-one NetBackup 5200 appliance based on NetBackup 7. Symantec has extended the NetBackup 5200 Series appliance with the NetBackup 5220, expanding storage capacity and connectivity options.</div>
<private_standard>
<h1>Overview</h1>
<p>Consolidation has been a big theme over the last few years. Data centers, servers, storage, and more are being combined for simplified management and cost savings. This theme is also being seen in data protection, with backup/recovery hardware and software components being united in appliance form factors. In recent ESG research,<a href="#_ftn1">[1]</a> survey respondents were polled regarding current and planned use of integrated computing platforms (see Figure 1). While adoption of integrated computing technology has been relatively tempered to date, ESG research reveals that organizations are now more likely to commit IT budget to the purchase of integrated solutions.</p>
<div class="graph_top">Figure 1. Interest in Integrated Computing Platforms, by Company Size</div>
<p><img class="alignnone size-full wp-image-27741" title="SymantecAppliancesf1" src="http://www.enterprisestrategygroup.com/media/wordpress/2012/01/SymantecAppliancesf1.png" alt="" width="651" height="376" /><br />
An integrated approach promises simplified management and faster provisioning—benefits that organizations with increasingly large and complex IT environments will appreciate. ESG research found that current and planned adopters cite benefits of the approach, including simplified management, reduced deployment time, better TCO, and improved interoperability, application performance, and service and support, as the main drivers for implementing converged infrastructure stacks.<a href="#_ftn2">[2]</a></p>
<p>When compared with software-only packaging for backup and recovery, a fully-integrated, all-in-one package has several advantages. Typically, backup software solutions are less expensive, but more time and technical acumen is required for setup. In addition to the software, other components of the &#8220;stack&#8221; need to be procured: the physical server and operating system it will run on, storage media, and networking components. Appliance-based solutions are pre-assembled with these components, offering a more plug-and-play installation and configuration experience. An all-in-one appliance approach removes the need to source individual components of a whole solution. In addition, the appliance vendor has pre-tested the configuration, security hardened it, and optimized it to perform for its backup software application, potentially reducing the user&#8217;s administrative overhead for maintaining the system. The approach reduces the degree of complexity and cost involved versus integrating disparate components in an ad hoc solution. Finally, since it&#8217;s sourced through a single vendor, interactions from purchase to support are streamlined significantly.</p>
<p>Symantec introduced appliance platforms for data deduplication in its NetBackup 5000 Series, and for the NetBackup backup engine in its NetBackup 5200 Series, in late 2010. Symantec more recently introduced the NetBackup 5220 appliance to deliver greater scalability and connectivity. The NetBackup 5220 appliance includes the company’s latest NetBackup 7 software and has integrated deduplication capabilities. It comes configured with 4 TB of storage capacity, which can be expanded to a maximum of 36 TB (with the addition of storage trays) or to a maximum of 192 TB (when combined with a NetBackup 5000 deduplication appliance). Capacity can be intermixed between deduplication and disk storage.</p>
<h1>Analysis</h1>
<p>Symantec is differentiated by its fully-integrated backup appliance and “deduplication everywhere” approach. The company offers integrated deduplication in its own backup software- and hardware-based solutions, as well as catalog-level integration with backup target devices of third-party vendors. Depending on the implementation, NetBackup 7 offers integrated source-, proxy- (i.e., NetBackup media server), and target-based deduplication, and inline or post-process configuration.</p>
<p>The newest member of the NetBackup 5200 Series Appliances is the NetBackup 5220 appliance. The NetBackup 5220 is a 2U rack-mountable form factor appliance that comes standard with 4 TB of usable capacity in a RAID-6 configuration. It can, however, be expanded up to 36 TB via an optional storage shelf, with future plans to expand both the physical and logical (deduplicated) capacities. When combined with NetBackup 5000 Series appliances (the NetBackup 5000 model or NetBackup 5020 model), the solution scales up to 192 TB of usable capacity. Connectivity options have been expanded in the new model to include six 1GbE and two 8Gb FC ports standard; and up to two 10GbE and up to six 8Gb FC ports as options. Support for the 8Gb FC interface allows for improved streaming of SAN clients and/or to replace virtual tape libraries (VTLs) (garnering the benefits of Fibre Channel connectivity but without the limitations inherent in traditional VTL interfaces). Further, the implementation removes the need for a separate master or media server since it can be deployed as either. The NetBackup 5200 Series appliances unite both physical and virtual machine protection. Deduplicating across both environments provides greater deduplication results and further reduces storage requirements.</p>
<p>The “backup in a box” NetBackup 5200 Series appliances are licensed on a flat rate per hardware appliance. As for the software component, the volume of data on the front side (i.e., production data) determines the capacity license required. In an interesting twist that is more in line with a software distribution model, NetBackup licenses (on maintenance) can be transferred to the appliance. As newer appliance models become available, NetBackup licenses are transferrable, providing cost savings and future proofing.</p>
<h1>The Bigger Truth</h1>
<p>IT organizations modernizing their data protection infrastructures may look to integrated computing platforms to streamline deployments and reduce costs. Symantec’s appliance approach is applicable in small- to large-enterprise environments, as well as remote/branch offices (ROBOs). The appliance form factor of NetBackup 5200 Series provides a new level of deployment simplicity for current and prospective NetBackup customers, while also providing predictability of performance. With the addition of the NetBackup 5220 appliance, Symantec now offers a new level of flexibility for configuring and deploying NetBackup.</p>
<p>From a management perspective, there is, however, some room for improvement. The initial configuration of the appliance is done via a Web interface. Ongoing, day-to-day backup and recovery operations are managed and monitored via the NetBackup console. When multiple appliances are in use, centralized monitoring and reporting of backup activity occurs through the NetBackup OpsCenter Web console, but appliance hardware status is not rolled up into a centralized view. Given Symantec’s relative newcomer status as a hardware provider, this limitation will likely be addressed in short order.</p>
<p>Smaller organizations typically have fewer resources to integrate the disparate components of the backup infrastructure stack. However, ideal candidates for NetBackup 5200 Series appliance adoption are likely the larger organizations more typical of NetBackup’s installed base, especially as current customers upgrade to version 7 and/or take advantage of NetBackup’s key features supporting server virtualization deployments. For this reason, ESG expects Symantec’s appliance offerings to gain serious consideration and adoption in the near term.</p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Source: ESG Research Brief, <a href="../../../../../2011/03/esg-research-brief-integrated-computing-trends/"><em>Integrated Computing Trends</em></a>, March 2011.</p>
<p><a name="_ftn2">[2]</a> Ibid.</p>
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2012/01/symantec-extends-appliance-series-with-netbackup-5220/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SNW Report: Dedupe Feedback from the Front Line</title>
		<link>http://www.enterprisestrategygroup.com/2011/10/snw-report-dedupe-feedback-from-the-front-line/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/10/snw-report-dedupe-feedback-from-the-front-line/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 14:52:19 +0000</pubDate>
		<dc:creator>Lauren Whitehouse</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[snw]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=25622</guid>
		<description><![CDATA[You can read Lauren&#8217;s other blog entries at Data Protection Perspectives.]]></description>
			<content:encoded><![CDATA[<p><!-- Start of Brightcove Player --></p>
<div style="display:none">
</div>
<p><!--<br />
By use of this code snippet, I agree to the Brightcove Publisher T and C found at https://accounts.brightcove.com/en/terms-and-conditions/.<br />
--></p>
<p><script type="text/javascript" src="http://admin.brightcove.com/js/BrightcoveExperiences.js"></script></p>
<p><object id="myExperience1211641092001" class="BrightcoveExperience"><param name="bgcolor" value="#FFFFFF" /><param name="width" value="486" /><param name="height" value="412" /><param name="playerID" value="53339150001" /><param name="playerKey" value="AQ~~,AAAADEwMNSE~,6RGpKmS-G-MJeoXI2D4HN4DB-Yc5PyXV" /><param name="isVid" value="true" /><param name="dynamicStreaming" value="true" /><param name="@videoPlayer" value="1211641092001" /></object></p>
<p><!--<br />
This script tag will cause the Brightcove Players defined above it to be created as soon<br />
as the line is read by the browser. If you wish to have the player instantiated only after<br />
the rest of the HTML is processed and the page load is complete, remove the line.<br />
--><br />
<script type="text/javascript">brightcove.createExperiences();</script></p>
<p><!-- End of Brightcove Player --></p>
<p>You can read Lauren&#8217;s other blog entries at <a href="http://www.dataprotectionperspectives.com/">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/10/snw-report-dedupe-feedback-from-the-front-line/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Virtualization and Copied Data</title>
		<link>http://www.enterprisestrategygroup.com/2011/09/data-virtualization-and-copied-data/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/09/data-virtualization-and-copied-data/#comments</comments>
		<pubDate>Thu, 29 Sep 2011 22:15:18 +0000</pubDate>
		<dc:creator>Steve Duplessie</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[Data Center Strategy and Best Practices]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[IT Operations]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Management Automation]]></category>
		<category><![CDATA[Steve Duplessie]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[Virtualization Management]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data virtualization]]></category>
		<category><![CDATA[Server Virtualization]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=25303</guid>
		<description><![CDATA[Video follow-up to my blog on data virtualization. You can read Steve&#8217;s other blog entries at The Bigger Truth.]]></description>
			<content:encoded><![CDATA[<p>Video follow-up to my blog on <a href="http://www.thebiggertruth.com/2011/09/treat-the-cause-not-the-symptom-virtualize-data/" target="_blank">data virtualization</a>.</p>
<p><!-- Start of Brightcove Player --></p>
<div style="display:none">
</div>
<p><!--<br />
By use of this code snippet, I agree to the Brightcove Publisher T and C<br />
found at https://accounts.brightcove.com/en/terms-and-conditions/.<br />
--></p>
<p><script type="text/javascript" src="http://admin.brightcove.com/js/BrightcoveExperiences.js"></script></p>
<p><object id="myExperience1183837868001" class="BrightcoveExperience"><param name="bgcolor" value="#FFFFFF" /><param name="width" value="486" /><param name="height" value="412" /><param name="playerID" value="53339150001" /><param name="playerKey" value="AQ~~,AAAADEwMNSE~,6RGpKmS-G-MJeoXI2D4HN4DB-Yc5PyXV" /><param name="isVid" value="true" /><param name="dynamicStreaming" value="true" /><param name="@videoPlayer" value="1183837868001" /></object></p>
<p><!--<br />
This script tag will cause the Brightcove Players defined above it to be created as soon<br />
as the line is read by the browser. If you wish to have the player instantiated only after<br />
the rest of the HTML is processed and the page load is complete, remove the line.<br />
--><br />
<script type="text/javascript">brightcove.createExperiences();</script></p>
<p><!-- End of Brightcove Player --></p>
<p>You can read Steve&#8217;s other blog entries at <a href="http://www.thebiggertruth.com/">The Bigger Truth</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/09/data-virtualization-and-copied-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ESG Research Webinar Recording: ROBO Technology Trends</title>
		<link>http://www.enterprisestrategygroup.com/2011/09/esg-research-webinar-recording-robo-technology-trends/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/09/esg-research-webinar-recording-robo-technology-trends/#comments</comments>
		<pubDate>Fri, 16 Sep 2011 13:56:16 +0000</pubDate>
		<dc:creator>Bob Laliberte</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Bob Laliberte]]></category>
		<category><![CDATA[Data Center Optimization]]></category>
		<category><![CDATA[Data Center Strategy and Best Practices]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Featured Section]]></category>
		<category><![CDATA[IT Operations]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Kristine Kao]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[Videos]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=25001</guid>
		<description><![CDATA[The top ROBO IT priorities are clear: Improve the end-user experience and bolster information security. A typical enterprise with 5,000 or more employees has an average of 394 physical locations. Many of these are remote and branch offices (ROBOs), which all require connectivity and secure access to corporate applications, data, and IT services. This has [...]]]></description>
			<content:encoded><![CDATA[<div id="media"><img class="alignnone size-full wp-image-25005" title="ROBO Webinar Header" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/09/ROBO-Webinar-Header.png" alt="" width="654" height="222" /></div>
<div>
<p>The top ROBO IT priorities are clear:</p>
<p>Improve the end-user experience and bolster information security.</p>
<p>A typical enterprise with 5,000 or more employees has an average of 394 physical locations.  Many of these are remote and branch offices (ROBOs), which all require connectivity and secure access to corporate applications, data, and IT services.  This has presented a perennial challenge to IT professionals everywhere as they look to provide a consistent user experience for ROBO employees while ensuring adequate data security and compliance and maximizing IT efficiency.</p>
<p>In this exclusive clients-only webinar, ESG senior analysts <a href="../../../../../bob-laliberte/" target="_blank">Bob Laliberte</a> and <a href="../../../../../lauren-whitehouse/" target="_blank">Lauren Whitehouse</a> present new research on current trends related to ROBO IT challenges and priorities, current and planned application delivery models, and how the usage of cloud computing services such as SaaS is shaping ROBO plans. Specific technologies covered in this webcast will include networking/WAN optimization, information security, server and storage infrastructure, as well as data protection strategies.</p>
</div>
<div>You can view the recorded webinar below.  The recording is available only to ESG subscription clients, and you must be logged in to view it.</div>
<p><br /></p>
<private_premium>
<p><!-- Start of Brightcove Player --></p>
<div style="display:none">
</div>
<p><!-- By use of this code snippet, I agree to the Brightcove Publisher T and C  found at https://accounts.brightcove.com/en/terms-and-conditions/.  --></p>
<p><script type="text/javascript" src="http://admin.brightcove.com/js/BrightcoveExperiences.js"></script></p>
<p><object id="myExperience1162113768001" class="BrightcoveExperience"><param name="bgcolor" value="#FFFFFF" /><param name="width" value="650" /><param name="height" value="551" /><param name="playerID" value="1016854257001" /><param name="playerKey" value="AQ~~,AAAADEwMNSE~,6RGpKmS-G-NgWAoJ_th9FNBToRF_gJXO" /><param name="isVid" value="true" /><param name="dynamicStreaming" value="true" /><param name="@videoPlayer" value="1162113768001" /></object></p>
<p><!--  This script tag will cause the Brightcove Players defined above it to be created as soon as the line is read by the browser. If you wish to have the player instantiated only after the rest of the HTML is processed and the page load is complete, remove the line. --><br />
<script type="text/javascript">brightcove.createExperiences();</script></p>
<p><!-- End of Brightcove Player --></p>
<h1>Related Reports</h1>
<p>ESG Research Report, <a href="http://www.enterprisestrategygroup.com/2011/07/remote-officebranch-office-technology-trends/" target="_blank">Remote Office/Branch Office Technology Trends</a></p>
<p>ESG Research Brief, <a href="http://www.enterprisestrategygroup.com/2011/08/esg-research-brief-remotebranch-offices-an-extremely-vulnerable-target-for-security-attacks/" target="_blank">Remote/Branch Offices – An Extremely Vulnerable Target for Security Attacks</a></p>
<p>ESG Research Brief, <a href="http://www.enterprisestrategygroup.com/2011/09/esg-research-brief-reference-research-remotebranch-office-trends/" target="_blank">Reference Research – Remote/Branch Office Trends</a></p>
<p>ESG Research Brief, <a href="http://www.enterprisestrategygroup.com/2011/09/esg-research-brief-reference-research-robo-data-storage-trends/" target="_blank">Reference Research – ROBO Data Storage Trends</a></p>
<p>ESG Research Brief, <a href="http://www.enterprisestrategygroup.com/2011/09/esg-research-brief-reference-research-robo-server-infrastructure-trends/" target="_blank">Reference Research – ROBO Server Infrastructure Trends</a></p>
</private_premium>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/09/esg-research-webinar-recording-robo-technology-trends/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Permabit goes into dedupe hyperdrive &#8211; The Register</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/permabit-goes-into-dedupe-hyperdrive-the-register/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/permabit-goes-into-dedupe-hyperdrive-the-register/#comments</comments>
		<pubDate>Tue, 16 Aug 2011 20:03:32 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[In The News]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Albireo]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Permabit]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=24124</guid>
		<description><![CDATA[ESG has validated Permabit&#8217;s claims, with Steve Duplessie, founder and senior analyst, saying: &#8220;I don&#8217;t see any other real alternatives for OEMs to be able to quickly get to market with lightning fast dedupe capabilities for primary, secondary, or really any-dary storage. Albireo rocks.&#8221; via Permabit goes into dedupe hyperdrive &#8211; The Register.]]></description>
			<content:encoded><![CDATA[<p>ESG has validated Permabit&#8217;s claims, with Steve Duplessie, founder and senior analyst, saying: &#8220;I don&#8217;t see any other real alternatives for OEMs to be able to quickly get to market with lightning fast dedupe capabilities for primary, secondary, or really any-dary storage. Albireo rocks.&#8221;</p>
<p>via <a href="http://www.theregister.co.uk/2011/08/16/permabit_albireo_speedup/" target="_blank">Permabit goes into dedupe hyperdrive &#8211; The Register</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/permabit-goes-into-dedupe-hyperdrive-the-register/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Permabit Albireo: Empowering Unified Deduplication</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/permabit-albireo-empowering-unified-deduplication-2/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/permabit-albireo-empowering-unified-deduplication-2/#comments</comments>
		<pubDate>Mon, 15 Aug 2011 13:01:32 +0000</pubDate>
		<dc:creator>Brian Garrett</dc:creator>
				<category><![CDATA[Brian Garrett]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lab Reports]]></category>
		<category><![CDATA[Albireo]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[Permabit]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23964</guid>
		<description><![CDATA[Permabit’s field-proven data deduplication engine is now available as a software library and software development kit (SDK) which enables unified data deduplication.  This updated ESG Lab Validation report documents the results of hands-on testing of the Permabit Albireo SDK which focused on: ease of integration into existing solutions, capacity savings that can be achieved with [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract"><a href="http://www.permabit.com/" target="_blank">Permabit</a>’s field-proven data deduplication engine is now available as a software library and software development kit (SDK) which enables unified data deduplication.  This updated ESG Lab Validation report documents the results of hands-on testing of the Permabit Albireo SDK which focused on: ease of integration into existing solutions, capacity savings that can be achieved with real-world data, resource usage, and fault tolerance.  This report also examines the progress that has been made over the past 18 months with a focus on grid scalability and performance improvements.</div>
<h1>Introduction</h1>
<h2>Background</h2>
<p>Back in 2001, when Data Domain was founded and EMC purchased Belgian startup FilePool for its Centera product line, few in the industry had heard of data deduplication. Fewer still were aware that Permabit, founded in 2000, was hard at work developing scalable data deduplication technology. Today, it’s hard to find anyone in the storage industry who hasn’t heard about data deduplication.</p>
<p>ESG research indicates that there has been a large jump in deduplication adoption in the data protection market. As shown in Figure 1, adoption in 2008 was 11%, and grew to 38% of respondents in 2010 with more concentration in enterprises (45%) than in midmarket organizations (29%).<a href="#_ftn1">[1]</a> Further, the number of organizations that have adopted, or plan on adopting, deduplication has grown to 78%. ESG expects that interest and adoption in deduplication will increase significantly over the next five years as it reaches mainstream market adoption.</p>
<div class="graph_top">Figure 1. Widespread Interest in Data Deduplication</div>
<p><img class="aligncenter size-full wp-image-23967" title="PermabitAlbireoUpdateF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF1.png" alt="" width="610" height="321" />Interest outside of the backup and recovery market is growing as well. In primary storage, data deduplication has been added to network attached storage (NAS) systems from a number of emerging and market leading storage vendors (e.g., NetApp FAS, EMC VNX). In fact, ESG research indicates that data reduction technologies are quickly rising on the list of most important features and attributes for storage solutions. This survey of decision makers within enterprise-class organizations indicates that 27% would not purchase a scale-out storage solution without data reduction and 51% would strongly prefer a solution with this attribute.<a href="#_ftn2">[2]</a> Given the dramatic data reduction benefits that have been realized with existing data deduplication technologies, ESG believes that it’s only a matter of time before deduplication becomes a core component of every tier of storage including primary, backup, archive, block, file, unified, solid state disk, and cloud, <em>and</em> moves up the stack into operating systems, applications, and databases.</p>
<h2>Introducing Permabit Deduplication Technology</h2>
<p>Permabit’s field-proven data deduplication engine is available as a software development kit (SDK), called Albireo, which enables unified data deduplication. The Albireo SDK is unique in its ability to provide a wide variety of deduplication services. The Albireo SDK can be used to:</p>
<ul>
<li>Reduce capacity within file, block, or unified storage systems (e.g., NAS appliances or FC disk arrays).</li>
<li>Process objects at the sub-file or single instance storage (SIS) level.</li>
<li>Remove duplicates in real time (inline), later after data has arrived (post-process), or with a hybrid approach that Permabit refers to as “parallel.”</li>
<li>Scale from a single storage controller to a cluster of storage controllers or a cluster of deduplication appliances communicating with a storage solution over an industry standard Ethernet interface.</li>
<li>Provide deduplication services that are “end-to-end and everywhere” in the IT stack—from applications and databases running on physical and virtual hosts through operating systems and hypervisors to storage solutions.</li>
</ul>
<div class="graph_top">Figure 2. End-to-End and Everywhere Deduplication with Permabit Albireo</div>
<p><img class="aligncenter size-full wp-image-23968" title="PermabitAlbireoUpdateF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF2.png" alt="" width="462" height="313" />Permabit was founded in 2000 by MIT engineers with a goal of developing a scalable enterprise-class archive product with built-in data deduplication. That product, now known as the Permabit Enterprise Archive, is shown toward the left in Figure 3. Industry standard servers are arranged in a grid, with each server acting as either an access or storage node. The storage nodes are packed full of high capacity SATA drives and presented as a network attached file system (NAS). Servers can be added to the grid for increased performance and capacity.</p>
<div class="graph_top">Figure 3. Introducing Permabit Albireo Deduplication Advisory Services</div>
<p><img class="aligncenter size-full wp-image-23969" title="PermabitAlbireoUpdateF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF3.png" alt="" width="578" height="257" />In the Permabit Enterprise Archive product, a global pool with deduplication technology is implemented within a grid of servers. Permabit has extracted this core deduplication technology to create the Albireo SDK. Albireo deduplication advisory services are accessed through an application programming interface (API) provided by Permabit. A storage system with Albireo running within a single controller attached to a number of drive enclosures is shown toward the right in Figure 3.</p>
<p>Albireo indexes chunks of data and provides advisory notifications when duplicates are detected. The software running in the storage system decides whether or not to take the advice. If the advice is taken, it is the responsibility of the storage system software to update references to duplicate data. The Permabit Albireo API supports both block and stream-based access methods. The stream-based API provides content-aware segmentation to optimize deduplication based on file type. Duplicate advisory services can be provided synchronously, or asynchronously using a registered callback. Taken together, the Albireo SDK provides a flexible and powerful array of deduplication services.</p>
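<p>To make the advisory model concrete, the following is a minimal Python sketch and not the Albireo API. The class <code>AdvisoryIndex</code>, its methods <code>post_sync</code>, <code>post_async</code>, and <code>drain</code>, and the location strings are hypothetical names chosen for illustration. The sketch assumes whole chunks are fingerprinted with SHA-256 and that the calling storage system owns every reference update.</p>

```python
import hashlib


class AdvisoryIndex:
    """Toy duplicate advisor (hypothetical; not the Albireo API).

    It only reports where an identical chunk was first seen. Acting on
    that advice is left entirely to the caller.
    """

    def __init__(self):
        self._seen = {}      # SHA-256 digest -> location of first copy
        self._pending = []   # queued (chunk, location, callback) requests

    def _advise(self, chunk, location):
        digest = hashlib.sha256(chunk).digest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = location  # first sighting: remember it
        return existing

    def post_sync(self, chunk, location):
        """Synchronous mode: the advice is the return value."""
        return self._advise(chunk, location)

    def post_async(self, chunk, location, callback):
        """Asynchronous mode: queue the request and return immediately."""
        self._pending.append((chunk, location, callback))

    def drain(self):
        """Process queued requests, delivering advice to each callback."""
        pending, self._pending = self._pending, []
        for chunk, location, callback in pending:
            callback(location, self._advise(chunk, location))


# Caller side: the storage system decides whether to take the advice.
remapped = {}


def on_advice(location, existing):
    if existing is not None:
        remapped[location] = existing  # caller updates its own references


index = AdvisoryIndex()
first = index.post_sync(b"payload", "block-0")    # new data, no advice
second = index.post_sync(b"payload", "block-1")   # duplicate of block-0
index.post_async(b"payload", "block-2", on_advice)
index.drain()                                     # callback fires here
```

<p>In this sketch the queue is drained on the caller's thread to keep the example deterministic; a production integration would more plausibly deliver callbacks from a worker thread.</p>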
<h1>ESG Lab Validation</h1>
<p>ESG Lab initially evaluated the Albireo SDK during two days of hands-on testing at Permabit headquarters in Cambridge, Massachusetts in late 2009 and performed an audit of resource and performance improvements in 2011. The evaluation began with an overview of how Albireo deduplication advisory services are used within an existing storage system. As shown in Figure 4, the software running within a storage system uses an API call to send new data and addressing information to Albireo. Albireo ingests the data and computes a SHA-256 hash. A two-stage lookup (memory and, if needed, disk) is performed to see if the data has already been stored. If the data is a duplicate, the API returns the location of the pre-existing data. If the data is new, Albireo remembers the SHA-256 hash and the chunk’s location so that it can advise of duplicate data in the future.</p>
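<p>The hash-then-lookup flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Permabit's implementation: the class name <code>TwoStageIndex</code>, the <code>memory_slots</code> parameter, and the least-recently-used eviction policy are all hypothetical, and a plain dictionary stands in for the on-disk index.</p>

```python
import hashlib
from collections import OrderedDict


class TwoStageIndex:
    """SHA-256 chunk index with a small memory tier in front of a larger one.

    Hypothetical sketch: self.disk is a plain dict standing in for an
    on-disk structure, and the memory tier uses simple LRU eviction.
    """

    def __init__(self, memory_slots=2):
        self.memory = OrderedDict()   # digest -> location, most recent last
        self.disk = {}                # stand-in for the on-disk index
        self.memory_slots = memory_slots
        self.disk_lookups = 0         # counts how often stage 2 was needed

    def _remember(self, digest, location):
        self.memory[digest] = location
        self.memory.move_to_end(digest)
        while len(self.memory) > self.memory_slots:
            self.memory.popitem(last=False)   # evict least recently used

    def advise(self, chunk, location):
        """Return the location of an existing copy, or None if the chunk is new."""
        digest = hashlib.sha256(chunk).digest()
        found = self.memory.get(digest)        # stage 1: memory
        if found is None:
            self.disk_lookups += 1
            found = self.disk.get(digest)      # stage 2: disk, only if needed
        if found is None:
            self.disk[digest] = location       # new data: record hash and location
            self._remember(digest, location)
            return None
        self._remember(digest, found)          # keep hot entries in memory
        return found


idx = TwoStageIndex(memory_slots=2)
r1 = idx.advise(b"a", "L0")   # new chunk
r2 = idx.advise(b"a", "L1")   # duplicate, answered from memory
idx.advise(b"b", "L2")
idx.advise(b"c", "L3")        # pushes the entry for b"a" out of memory
r3 = idx.advise(b"a", "L4")   # duplicate, answered from the disk tier
```

<p>The point of the two tiers is that recently seen data is answered from memory, while the second stage is consulted only on a memory miss.</p>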
<div class="graph_top">Figure 4. Permabit Albireo Deduplication   Advisory Services</div>
<p><img class="aligncenter size-full wp-image-23970" title="PermabitAlbireoUpdateF4" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF4.png" alt="" width="384" height="340" />ESG Lab evaluated the Albireo SDK using a pair of programs developed by Permabit:</p>
<ul>
<li><strong>PBSCAN:</strong> This utility uses Albireo advisory services to determine the savings that can be achieved with deduplication. The program is routinely used at prospective customer sites to determine the benefits of deduplication with real-world data sets.</li>
<li><strong>DD2FS:</strong> An open-source user-space file system was modified to illustrate the ease of integrating Albireo into an existing file system.</li>
</ul>
<p>The test program was implemented as a single process with the Albireo engine running as a separate service. The pbscan utility was implemented with multiple threads working in parallel. As shown in Figure 5, the test bed used for this phase of the evaluation consisted of a single quad-core 3 GHz Intel Xeon processor with 4 GB of RAM. For test purposes only, and to make Albireo’s resource utilization easier to discern, all but one of the processor cores were disabled in the BIOS.</p>
<div class="graph_top">Figure 5. The ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-23971" title="PermabitAlbireoUpdateF5" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF5.png" alt="" width="480" height="377" />While ESG Lab testing was performed with a single processor core, it should be noted that the extreme scalability of the Albireo engine has been proven in production customer environments. Later in this report, we’ll take a look at how Permabit’s underlying deduplication technology is routinely deployed within a grid of multi-core servers.</p>
<h2>Capacity Savings</h2>
<p>ESG Lab used the Albireo-enabled pbscan utility and real-world application data collected from Permabit’s production IT servers to evaluate the capacity savings that can be achieved with Permabit deduplication advisory services. The results for two data sets are summarized in Figure 6 and Table 1.</p>
<div class="graph_top">Figure 6. Capacity Savings</div>
<p><img class="aligncenter size-full wp-image-23972" title="PermabitAlbireoUpdateF6" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF6.png" alt="" width="509" height="283" /></p>
<div class="graph_top">Table 1. Capacity Savings</div>
<p><img class="aligncenter size-full wp-image-23980" title="PermabitAlbireoUpdateT1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateT1.png" alt="" width="618" height="99" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>A 43 GB set of common office productivity files (e.g., documents, spreadsheets, presentations) was reduced by 32.67%.</li>
<li>Four VMware virtual server images with a total capacity of 157 GB were reduced to only 4.3 GB. VMware virtual server images are often highly redundant, especially when each virtual machine is running the same guest operating system as was the case for this test. In this example, 97% of the capacity required for the VMware images can be saved with Albireo-enabled deduplication.</li>
</ul>
<p>The experiment was repeated for a pair of Microsoft Hyper-V images and a week’s worth of Microsoft Exchange backup images. One of the backup images was a full backup; the other images were incremental. The results for all of the data types evaluated by ESG Lab are summarized in Figure 7 and Table 2. Note that the savings are shown as a deduplication ratio instead of a percentage of capacity saved.</p>
<div class="graph_top">Figure 7. Deduplication Ratios</div>
<p><img class="aligncenter size-full wp-image-23973" title="PermabitAlbireoUpdateF7" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF7.png" alt="" width="534" height="301" /></p>
<div class="graph_top">Table 2. Deduplication Ratios</div>
<p><img class="aligncenter size-full wp-image-23981" title="PermabitAlbireoUpdateT2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateT2.png" alt="" width="614" height="170" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>A pair of Microsoft Hyper-V images was reduced by a factor of 2.1:1.</li>
<li>A single week’s worth of Microsoft Exchange backups (weekly full, daily incremental) was reduced from 2,203 GB to 501 GB, providing an excellent deduplication ratio of 4.4:1.</li>
<li>A deduplication ratio of 36.2:1 was recorded for four VMware images.</li>
</ul>
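<p>Because this report quotes savings both as a percentage and as a ratio, a short conversion sketch may help when comparing the two. The function names are illustrative; the inputs are the rounded figures quoted in the text, so results can differ slightly from the values in the tables.</p>

```python
def dedup_ratio(original_gb: float, stored_gb: float) -> float:
    """Deduplication ratio N (as in N:1) from the before and after capacities."""
    return original_gb / stored_gb


def percent_saved(ratio: float) -> float:
    """Percentage of capacity saved for a given N:1 ratio."""
    return (1.0 - 1.0 / ratio) * 100.0


def ratio_from_percent(percent: float) -> float:
    """N:1 ratio that corresponds to a given percentage of capacity saved."""
    return 1.0 / (1.0 - percent / 100.0)


vmware = dedup_ratio(157, 4.3)
print(f"VMware sizes as quoted: {vmware:.1f}:1, {percent_saved(vmware):.1f}% saved")
print(f"36.2:1 corresponds to {percent_saved(36.2):.1f}% saved")
print(f"2.1:1 corresponds to {percent_saved(2.1):.1f}% saved")
print(f"32.67% saved corresponds to {ratio_from_percent(32.67):.2f}:1")
```

<p>The rounded VMware capacities give a ratio slightly above the reported 36.2:1, presumably because the sizes quoted in the text are rounded. Either way, the saving is about 97%.</p>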
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Data proliferation is a challenge for IT professionals   within organizations of all sizes. By eliminating redundant data, data   deduplication can significantly reduce capacity requirements, which reduces   the cost of storing and protecting data.</p>
<p>ESG Lab testing with real-world application data has confirmed that Permabit Albireo deduplication advisory services can be used to reduce capacity requirements for primary storage arrays, disk-based archives, and backups. ESG Lab recorded an outstanding deduplication rate of 97% (36.2:1) for four VMware virtual server images.</p></td>
</tr>
</tbody>
</table>
<h2>Resource Efficiency</h2>
<p>Identifying duplicate data is a resource-intensive operation. First, data needs to be fed into a deduplication engine. Moving a lot of data can consume a lot of bandwidth. Next, the engine needs an algorithm that can quickly and accurately identify duplicates. Most deduplication solutions use cryptographic hashing functions to identify duplicates, but hashing a lot of data can consume a lot of CPU horsepower. Last, but not least, the deduplication solution needs to maintain an index of previously processed data to find and keep track of duplicates. Most deduplication solutions use a two-stage lookup, with the first lookup occurring in memory and the second occurring on disk. Indexing lots of possibly duplicate data can consume a significant amount of memory. Each of these resource issues can affect the overall performance of a deduplication solution.</p>
<p>Permabit supports two modes of indexing in the latest version of Albireo. Traditional dense indexing is a full chunk index approach that keeps all index entries available in memory to determine if that data chunk has previously been seen. At scale, this approach limits the efficiency of indexing and degrades performance. If the amount of memory is limited, then the ability to scale out is also limited by the amount of available RAM. Sparse indexing improves memory efficiency using a sampling technique that exploits the inherent locality within data streams. It addresses large scale (e.g., hundreds of terabytes) data chunk lookup bottleneck problems as it avoids the limitations of traditional dense indexing. Permabit has developed a hybrid approach using both sparse and dense indexing with a goal of increasing overall deduplication efficiency by 10X or more with only 0.1 bytes of RAM required for each chunk of indexed data.</p>
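<p>To make the idea of sampling concrete, the following is a generic sketch of a sampling-based sparse index. It is not Permabit's algorithm, whose details are not given in this report. The assumptions are a one-in-four sampling rule based on the first byte of the fingerprint, and a Python list standing in for per-segment fingerprint sets kept on disk. All names are hypothetical.</p>

```python
import hashlib


def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


class SampledIndex:
    """Generic sampling-based sparse index; illustrative only."""

    def __init__(self, sample_every: int = 4):
        self.sample_every = sample_every
        self.hooks = {}      # RAM: sampled fingerprint -> segment id
        self.segments = []   # stands in for disk: full fingerprint set per segment

    def _is_hook(self, fp: bytes) -> bool:
        # Keep roughly one fingerprint in sample_every, chosen by its own bits.
        return fp[0] % self.sample_every == 0

    def ingest(self, chunks) -> int:
        """Store one segment (a run of adjacent chunks); return the duplicate count."""
        fps = [fingerprint(c) for c in chunks]
        # Sampled fingerprints point at earlier segments that are likely to overlap.
        candidates = {self.hooks[fp] for fp in fps if fp in self.hooks}
        known = set()
        for seg_id in candidates:
            known |= self.segments[seg_id]
        duplicates = sum(1 for fp in fps if fp in known)
        seg_id = len(self.segments)
        self.segments.append(set(fps))
        for fp in fps:
            if self._is_hook(fp):
                self.hooks.setdefault(fp, seg_id)
        return duplicates


segment = [f"chunk-{i}".encode() * 100 for i in range(64)]
index = SampledIndex()
first_pass = index.ingest(segment)    # nothing has been seen yet
second_pass = index.ingest(segment)   # the same run of chunks arrives again
print(first_pass, second_pass, len(index.hooks))
```

<p>Only the sampled fingerprints stay in RAM, yet every chunk of the repeated segment is found, because one sampled hit is enough to pull in the fingerprints of its neighbors. This is the locality that the paragraph above refers to.</p>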
<p>ESG Lab’s analysis of CPU and memory efficiency began with a review of Permabit’s patented two-stage deduplication detection algorithm.<a href="#_ftn3">[3]</a> The patent describes a highly efficient two-stage index residing in memory and on disk. This novel approach uses a combination of bit sampling and byte differencing to provide a first stage memory lookup that executes very quickly, consumes very little memory, and has a very low probability of false positives.</p>
<p>A 42 GB set of office productivity files (documents, spreadsheets, presentations, PDFs, etc.) was processed by an Albireo-enabled deduplication utility. The utility was used to quantify the speed and resource overhead of Permabit deduplication advisory services. The utility opened and passed all of the data within each file to the Albireo API. The Albireo API was used to identify and keep track of chunks of data in a Permabit index of duplicate candidates.</p>
<p>Linux utilities were used to record CPU and memory utilization with Albireo services running on a single Intel Xeon processor. Trace output was used to record the average latency for each Albireo deduplication fingerprint API lookup. In addition, trace data was used to record the end-to-end latency associated with Albireo-enabled deduplication including the SHA-256 ingest, the Albireo fingerprint API, the first pass memory index, and a second order index operation for likely duplicate candidates. The results are summarized in Table 3 and Figure 8.</p>
<div class="graph_top">Table 3. Albireo Resource Usage</div>
<p><img class="aligncenter size-full wp-image-23982" title="PermabitAlbireoUpdateT3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateT3.png" alt="" width="626" height="99" /></p>
<div class="graph_top">Figure 8. End-to-End CPU Utilization</div>
<p><img class="aligncenter size-full wp-image-23974" title="PermabitAlbireoUpdateF8" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF8.png" alt="" width="353" height="315" /></p>
<h3>What the Numbers Mean</h3>
<ul>
<li>The Albireo fingerprint API delivered fast in-memory deduplication lookups that completed in 9 to 17 microseconds.<a href="#_ftn4">[4]</a></li>
<li>A SHA-256 hash and index with a 4 KB chunk size incurred only 43 microseconds of latency. That’s less than 1% of the overhead associated with a typical disk access latency of 5 milliseconds.</li>
<li>Albireo consumed less than 3.5 bytes of RAM per index entry with traditional dense indexing and 0.1 bytes per entry with sparse indexing.</li>
<li>A sparse index overhead of 0.1 bytes and a 4 KB chunk size allows deduplication up to 40 TB of data with 1 GB of RAM and up to 2.56 PB of data with 64 GB of RAM.</li>
<li>Albireo-enabled deduplication, including SHA-256 hashing, consumed approximately half (28% to 59%) of a 3 GHz Xeon processor core.</li>
<li>A larger chunk size consumed slightly more CPU. This is due to the fact that bigger chunks of data were passing through the CPU-intensive SHA-256 algorithm.</li>
</ul>
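<p>The capacity figures in the list above follow from simple arithmetic, sketched here. The sketch assumes binary units (1 KB = 1,024 bytes, and so on), which is what reproduces the 40 TB figure exactly. The 3.5-byte dense figure is treated as an upper bound, as the report states "less than 3.5 bytes".</p>

```python
from fractions import Fraction

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

SPARSE = Fraction(1, 10)   # 0.1 bytes of RAM per index entry
DENSE = Fraction(7, 2)     # 3.5 bytes of RAM per index entry (upper bound)


def indexable_bytes(ram_bytes: int, bytes_per_entry: Fraction, chunk_bytes: int) -> Fraction:
    """Data that can be tracked when every chunk costs bytes_per_entry of RAM."""
    entries = ram_bytes / bytes_per_entry
    return entries * chunk_bytes


for ram_gib in (1, 64):
    sparse_tib = indexable_bytes(ram_gib * GIB, SPARSE, 4 * KIB) / TIB
    dense_tib = indexable_bytes(ram_gib * GIB, DENSE, 4 * KIB) / TIB
    print(f"{ram_gib} GiB RAM: sparse {float(sparse_tib):.0f} TiB, dense {float(dense_tib):.2f} TiB")

# 43 microseconds of hashing and indexing against a 5 ms (5,000 microsecond) disk access.
latency_share = Fraction(43, 5000)
print(f"hash and index latency is {float(latency_share):.2%} of a disk access")
```

<p>With 64 GB of RAM the sparse result is 2,560 TB, which the report expresses as 2.56 PB by counting 1,000 TB per PB.</p>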
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="685" valign="top">
<h1>Why This Matters</h1>
<p>Data deduplication is a resource-intensive operation that can have a dramatic impact on the overall cost and performance of a storage solution. ESG Lab is confident that resource-efficient Permabit deduplication advisory services can be used to architect a cost-effective solution that provides a virtually limitless pool of globally deduplicated capacity using industry standard server hardware.</p></td>
</tr>
</tbody>
</table>
<h2>Scalability and Performance</h2>
<p>Permabit’s global deduplication algorithm is designed to run within a grid for maximum performance and capacity scalability. Permabit currently supports up to 4.6 PB (4,600 TB) in a single pool of capacity. The Permabit Archive product is continuously tested with Albireo deduplication services running on a grid of three or more servers. A typical entry-level grid at a customer site, comprised of access and storage nodes, has deduplication services running on 11 nodes. The largest grid deployed in production at a customer site is 38 nodes.</p>
<p>The performance scalability of a scale-out deduplication solution is dependent on its ability to handle two resource intensive tasks:  hashing and index lookups. The Albireo architecture was designed to maximize the efficiency and scalability of both. The Albireo architecture can support cryptographic hashing (e.g. SHA-256) distributed over multiple CPUs or offloaded to special-purpose hardware, and delivers scalable index lookup performance using an engine that can be deployed across multiple CPUs within a single server or a grid of Ethernet connected servers.</p>
<p>ESG Lab audited the results of nightly regression tests to assess the performance scalability of Albireo index lookups.  A Permabit developed utility was used to process data that was deterministically unique (i.e., no duplicates) during this test. This type of data was used with a goal of maximizing the stress on the Albireo deduplication advisory engine.</p>
<p>The performance of the Albireo fingerprint API was tested with 4 KB chunks on a single server. The number of chunks processed per second was multiplied by the chunk size to calculate the effective throughput of the Albireo deduplication advisory engine. Similar calculations were performed to assess throughput when processing 64 KB and 128 KB chunks of data. As shown in Figure 9, increasing the chunk size increased deduplication index lookup throughput to 39.74 GB/sec on a single server with four Intel Xeon processor cores.</p>
<div class="graph_top">Figure 9. Albireo Deduplication Advisory   API Performance  Analysis</div>
<p><img class="aligncenter size-full wp-image-23975" title="PermabitAlbireoUpdateF9" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF9.png" alt="" width="579" height="391" />These results highlight the efficiency and speed of the deduplication engine running on a single server, and they demonstrate the value of supporting a programmable chunk size. A programmable chunk size lets an integrator choose between maximizing deduplication rates (smaller chunks) and minimizing index overhead while maximizing the throughput of the deduplication engine (larger chunks). It can also be used to simplify integration with an existing architecture.</p>
<p>Similar tests were performed as nodes were added to an Albireo grid. As shown in Figure 10, performance scaled in near linear fashion to 412 GB/sec for a sixteen-node grid processing 128 KB chunks of data.</p>
<div class="graph_top">Figure 10. Albireo Deduplication Advisory   API Performance  Analysis</div>
<p><img class="aligncenter size-full wp-image-23976" title="PermabitAlbireoUpdateF10" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF10.png" alt="" width="593" height="351" /></p>
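<p>The throughput figures in this section are derived by multiplying the number of chunks processed per second by the chunk size. A minimal sketch of that arithmetic follows. It assumes decimal gigabytes (10^9 bytes) and 1,024-byte kilobytes, neither of which the report states, and the function names are illustrative.</p>

```python
KIB = 1024
GB = 10 ** 9   # assumption: the report's GB/sec figures use decimal gigabytes


def effective_throughput_gb(chunks_per_sec: float, chunk_bytes: int) -> float:
    """Effective advisory throughput in GB/sec: lookups per second times chunk size."""
    return chunks_per_sec * chunk_bytes / GB


def implied_lookup_rate(throughput_gb_per_sec: float, chunk_bytes: int) -> float:
    """Lookups per second implied by a throughput figure at a given chunk size."""
    return throughput_gb_per_sec * GB / chunk_bytes


grid_rate = implied_lookup_rate(412, 128 * KIB)   # sixteen-node grid, 128 KB chunks
print(f"implied grid lookup rate: {grid_rate:,.0f} lookups/sec")
print(f"average per node: {grid_rate / 16:,.0f} lookups/sec")

# The same lookup rate applied to 4 KB chunks moves 32 times less data.
small_chunk_gb = effective_throughput_gb(grid_rate, 4 * KIB)
print(f"same rate at 4 KB chunks: {small_chunk_gb:.3f} GB/sec")
```

<p>Holding the lookup rate fixed across chunk sizes is an assumption of this sketch rather than something measured. It is meant only to show why larger chunks raise effective throughput while smaller chunks favor deduplication granularity.</p>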
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>If a deduplication solution can’t scale to   provide the capacity or performance required to meet the needs of the   business, then costs are incurred and capacity savings are lost as multiple   islands of deduplication are deployed within an organization. These islands   also increase the ongoing costs associated with deploying and maintaining a   deduplication solution.</p>
<p>ESG has confirmed that the algorithms   accessed via the core Albireo fingerprint API are extremely fast, efficient,   and scalable. A sixteen node grid of servers, each with a quad-core Intel   Xeon processor processing Albireo fingerprint API calls, delivered an outstanding   deduplication lookup rate of 412 GB/sec.</p>
<p>ESG Lab’s experience with nearly all of the vendors offering data deduplication solutions indicates that scaling a large pool of global deduplication that resides within multiple storage controllers or servers is a difficult task that can take years to complete. ESG Lab has confirmed that the field-proven deduplication technology at the heart of the Permabit Albireo SDK has been deployed on a grid of up to 38 servers in production environments and validated on a grid of 96 servers in Permabit’s test lab.</p></td>
</tr>
</tbody>
</table>
<h2>Fault Tolerance</h2>
<p>ESG Lab performed a series of tests on a nine node Permabit Enterprise Archive system to determine whether Albireo deduplication advisory services continue running after multiple hardware failures. A long running directory level file copy operation was started. As shown in Figure 11, a node was powered off and a drive was removed on a different node. The system remained available and the copy operation completed without error.</p>
<div class="graph_top">Figure 11. Validating Permabit Fault Tolerance</div>
<p><img class="aligncenter size-full wp-image-23977" title="PermabitAlbireoUpdateF11" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF11.png" alt="" width="497" height="373" /></p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Errors   within a deduplication solution can cause application downtime or data   corruption. These impact end-user satisfaction and productivity—and, in the   worst case, lead to a loss of revenue. As the size of a scale-out   deduplication system grows, the risk and potential impact of downtime   increases dramatically.</p>
<p>ESG Lab has confirmed that the scale-out Albireo architecture running on multiple server nodes can be used to deliver fault-tolerant scale-out deduplication services. An error injection test by ESG Lab proved that deduplication services remain available after a drive and a server failure.</p></td>
</tr>
</tbody>
</table>
<h2>Ease of Integration</h2>
<p>ESG Lab evaluated the ease of adding deduplication to an existing storage solution using the FUSE open-source user-space file system framework.<a href="#_ftn5">[5]</a> The FUSE file system is built over the native ext3 Linux file system. There’s no need to patch or recompile the kernel. Permabit created a file system using FUSE named dd2fs. The dd2fs file system was modified to use Albireo to identify and eliminate duplicates. Six Albireo API calls and 52 lines of supporting code were added to the 1,563 line FUSE demonstration program. Synchronous (inline) and asynchronous (post-processing) deduplication modes were demonstrated.</p>
<p>As shown in Figure 12, the <em>update_block</em> function passes data to be written to the Albireo <em>uds_index_block</em> API. A return value of 1 indicates that the block is a duplicate and that it can share storage with the canonical chunk. After a bit of bookkeeping and error checking, the storage associated with the duplicate block is freed. Other than this key function call, the majority of the Albireo-related code changes were isolated to initialization, shutdown, and handling callbacks when operating in asynch/post-process mode.</p>
<div class="graph_top">Figure 12. Integrating Albireo</div>
<p><img class="aligncenter size-full wp-image-23978" title="PermabitAlbireoUpdateF12" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF12.png" alt="" width="555" height="321" /><br />
The FUSE demo was run to see Albireo-enabled deduplication in action. As shown in Figure 13, a 64 KB file full of random data was written to an empty Albireo-enabled file system. A copy of the file with a new name was added.  The Linux <em>df</em> and <em>du</em> utilities were used to verify that the user’s view of the file system included the space consumed by both files, yet only a single file’s worth of disk capacity was consumed. The Linux <em>md5sum </em>utility was used to verify that the files were the same. As each file was deleted, the file system capacity and underlying disk capacity were checked.</p>
<div class="graph_top">Figure 13. Validating Albireo-enabled   Deduplication</div>
<p><img class="aligncenter size-full wp-image-23979" title="PermabitAlbireoUpdateF13" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateF13.png" alt="" width="587" height="416" /></p>
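<p>The behavior demonstrated with dd2fs can be approximated with a toy block store. This sketch is not the dd2fs code and does not call the Albireo API: the <em>DedupStore</em> class, its in-memory dictionary standing in for the advisory index, and the 4 KB block size are all assumptions made for illustration. It shows the storage side of the contract, where the store acts on duplicate advice by sharing a block and frees it only when the last reference is gone.</p>

```python
import hashlib
import os


class DedupStore:
    """Toy block store that acts on duplicate advice by sharing blocks."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.advice = {}    # SHA-256 digest -> block id (stand-in for the advisory index)
        self.blocks = {}    # block id -> [data, reference count]
        self.files = {}     # file name -> list of block ids
        self.next_id = 0

    def write(self, name: str, data: bytes) -> None:
        if name in self.files:
            self.delete(name)   # overwriting releases the old blocks first
        ids = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).digest()
            block_id = self.advice.get(digest)
            if block_id is not None and self.blocks[block_id][0] == block:
                # Advice taken: share the canonical block instead of storing a copy.
                self.blocks[block_id][1] += 1
            else:
                block_id = self.next_id
                self.next_id += 1
                self.blocks[block_id] = [block, 1]
                self.advice[digest] = block_id
            ids.append(block_id)
        self.files[name] = ids

    def delete(self, name: str) -> None:
        for block_id in self.files.pop(name):
            entry = self.blocks[block_id]
            entry[1] -= 1
            if entry[1] == 0:
                # Last reference gone: release the block and its fingerprint.
                digest = hashlib.sha256(entry[0]).digest()
                del self.blocks[block_id]
                if self.advice.get(digest) == block_id:
                    del self.advice[digest]

    def logical_bytes(self) -> int:
        """Capacity as the user sees it (every file counted in full)."""
        return sum(len(self.blocks[b][0]) for ids in self.files.values() for b in ids)

    def physical_bytes(self) -> int:
        """Capacity actually consumed by unique blocks."""
        return sum(len(data) for data, _refs in self.blocks.values())


store = DedupStore()
payload = os.urandom(64 * 1024)          # 64 KB of random data, as in the demo
store.write("original.bin", payload)
store.write("copy.bin", payload)
after_copy = (store.logical_bytes(), store.physical_bytes())
store.delete("original.bin")
after_first_delete = (store.logical_bytes(), store.physical_bytes())
store.delete("copy.bin")
after_second_delete = (store.logical_bytes(), store.physical_bytes())
print(after_copy, after_first_delete, after_second_delete)
```

<p>After the copy, the logical view counts both files while the physical view holds a single file's worth of blocks. The data survives the first delete and is released by the second, which matches the sequence checked with <em>df</em> and <em>du</em> in the demo.</p>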
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Deduplication is a valuable technology that can be frustratingly hard to develop and debug. The field-proven and patented deduplication technology at the core of the Permabit Albireo API incorporates over 25 man years of development. Integrating Albireo using six well-documented API calls consumed only 52 lines of code. ESG Lab is confident that an experienced storage systems architect working alongside a Permabit engineer can complete a proof of concept integration in two weeks or less.</p></td>
</tr>
</tbody>
</table>
<h2>Maturity</h2>
<p>ESG Lab performed a high level assessment of Permabit’s software development processes to understand the maturity of the Albireo SDK. The bulk of the code within the Albireo SDK has been used within Permabit’s shipping products for more than seven years. Permabit has been using agile software development processes for more than eight years.</p>
<p>Agile software development refers to a group of <a title="Software development methodologies" href="http://en.wikipedia.org/wiki/Software_development_methodologies">software development methodologies</a> based on iterative development where requirements and solutions evolve through collaboration between self-organizing <a title="Cross-functional team" href="http://en.wikipedia.org/wiki/Cross-functional_team">cross-functional teams</a>. The term was coined in the year 2001 with the formulation of the <a title="Agile Manifesto" href="http://en.wikipedia.org/wiki/Agile_Manifesto">Agile Manifesto</a>.<a href="#_ftn6">[6]</a> Agile methods generally promote a disciplined project management process that encourages frequent inspection and adaptation; a leadership philosophy that encourages teamwork, self-organization, and accountability; a set of engineering best practices that allow for rapid delivery of high-quality software; and a business approach that aligns development with customer needs and company goals.</p>
<p>The code at the core of Albireo SDK is continuously integrated and tested on a two week iteration cycle. Stories captured in an online Wiki are used to manage requirements. Unit, functional, and stress tests are highly automated and run continuously. Developers write the majority of unit and system level tests. All changes are peer reviewed. Face-to-face cross functional interaction is embraced. Reducing code complexity is valued and recognized.</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>Why This Matters</h1>
<p>Data deduplication is complicated technology. When it   works as designed, it saves capacity and money. When it fails, it can corrupt   data. Storage system vendors looking to add deduplication technology to an   existing product must be absolutely sure that their deduplication algorithm   will not fail. Rigorous design processes and continuous testing are needed to   ensure that the deduplication implementation is bug free.</p>
<p>While a more rigorous review is recommended for organizations considering a partnership with Permabit, ESG Lab is very impressed with the maturity and stability of Permabit’s software development and QA processes.</p></td>
</tr>
</tbody>
</table>
<h1>ESG Lab Validation Highlights</h1>
<ul>
<li>The Permabit SDK identified potential capacity savings ranging from 33% to 97% for real-world applications including office productivity files, virtual server images, and e-mail backups.</li>
<li>Fast and resource-efficient deduplication indexing was confirmed. Memory usage of less than 0.1 bytes per index entry and deduplication advisory throughput of 412 GB/sec were measured on a 16 node quad-core Xeon grid.</li>
<li>An open-source user-space file system was modified to use Albireo deduplication advisory services using six API calls and only 52 lines of code.</li>
<li>Synchronous and asynchronous APIs were used to implement inline and post-process data deduplication, respectively.</li>
<li>Permabit’s agile software development processes were audited.</li>
<li>Systems in the lab and in the field confirm that Permabit global deduplication advisory services have been deployed on grids of up to 38 servers and validated on grids of 96 servers in Permabit’s test lab.</li>
<li>An error injection test confirmed that the Albireo deduplication services running within a field-proven Permabit Enterprise Archive solution remain available after both a drive and a server failure.</li>
</ul>
<h1>Issues to Consider</h1>
<ul>
<li>The Permabit Albireo SDK detects duplicate data, but it does not actually remove it. Removing duplicates and maintaining pointers to duplicate data is implemented within the storage system that uses the Permabit SDK. Data structures, which map and keep track of duplicate data references, are needed to take advantage of the Permabit SDK. This is a trivial consideration for NAS systems which use an inode map to keep track of data on disk. For modern block-based disk arrays this service is often available for thin provisioning or virtualization, but it may be an issue that impacts the complexity and resources associated with Albireo integration.</li>
<li>An Albireo-enabled storage solution can continue operating even if Albireo becomes unavailable. If and when Albireo becomes unavailable the system is unable to detect new duplicates, but data access is unaffected. This is due to the fact that Albireo only provides deduplication advice, and the storage system uses that advice to eliminate duplicates while maintaining data integrity.</li>
<li>While the Permabit architecture has been designed to use any hashing algorithm, Permabit has been using the SHA-256 algorithm for years. While the 256-bit SHA2 hashing algorithm virtually eliminates the risk of deduplication induced data corruption due to a hash collision, it does add CPU overhead compared to less rigorous 128-bit algorithms (e.g., MD5).</li>
<li>The API integration and capacity savings results presented in this report were collected using relatively simple test programs running on a single server. Estimating the effort required to evaluate, architect, and implement a solution using a production storage system is beyond the scope of this report. Similarly, estimating the savings that can be achieved with your customer’s data is beyond the scope of this report.  Testing in your lab, with your storage solution, and with your data is strongly recommended.</li>
</ul>
<h1>The Bigger Truth</h1>
<p>One of ESG Lab’s first projects was a 2004 validation of a disk-based backup appliance with built-in data deduplication from Data Domain.<a href="#_ftn7">[7]</a> Since then, data deduplication has evolved into the hottest, most paradigm-shifting technology to hit the storage industry since the UC Berkeley RAID papers were published in 1988. Like RAID, data deduplication quickly permeated the storage market due to its outstanding value proposition. Storage administrators struggling to finish backups within shrinking windows were able to reduce the capacity required to retain backups on disk by 90% or more. The value of this new technology was clearly compelling: data deduplication reduced the cost of disk-based backups, putting it on equal, or better, footing than tape. Backups that finish within a shrinking window and quick ad-hoc restores from disk had suddenly become economically feasible.</p>
<p>In recent years, data deduplication has begun to permeate the storage industry. A number of startups, including Diligent, Sepaton, and Exagrid, followed Data Domain into the disk-based backup appliance market. Since then, all of the major systems vendors have added deduplicating backup appliances to their portfolios. More recently, all of the major backup software vendors have added deduplication to their offerings. Content addressable disk-based solutions with embedded data deduplication technology were introduced in the archive market. Permabit was among the first vendors to enter this growing market. Deduplication has been used to reduce WAN traffic within primary and secondary storage replication solutions. And last, but not least, data deduplication is beginning to take hold in the primary storage market, with NetApp and Microsoft leading the way (Deduplication for FAS and Single Instance Store within Windows Storage Server, respectively).  Deduplication within primary storage solutions has proven to be extremely effective in virtual server and virtual desktop deployments that tend to have lots of data in common.  In ESG’s opinion, it’s simply a matter of time before data deduplication gains wide market acceptance throughout the primary storage market.</p>
<p>Data deduplication technology has driven a number of strategic acquisitions. ADIC purchased deduplication technology from Rocksoft for $63M (ADIC has since been acquired by Quantum). IBM acquired Diligent in a deal rumored to be worth between $160M and $200M. Those acquisitions were dwarfed by EMC’s acquisition of Data Domain, where a bidding war with NetApp drove the value of the deal up to $2.1B in cash.</p>
<p>So what’s the big deal with deduplication? It’s actually rather simple: deduplication can reduce storage capacity requirements by up to 99%. In other words, IT managers can squeeze up to a hundred times more out of each dollar they spend on disk capacity. Even as data deduplication becomes more of a feature than a product, the value is clearly compelling. While one could argue that the hype and valuations of deduplication solutions in the backup arena have gotten a bit ahead of the market, it’s clear to ESG that we haven’t seen the peak in the archive and primary storage markets yet.</p>
<p>The deduplication market has begun to mature in recent years. As the feature becomes more of a check-off item, vendors are leveraging differences in architectures and implementations to grow market share. Aside from the usual delineations based on price and performance, vendors are competing based on the finer differences between deduplication solutions: object vs. block-based, inline vs. post-process, fixed vs. variable length, and global vs. islands of deduplication. Permabit has extracted the core deduplication technology from a field-proven archiving solution with a goal of delivering a deduplication algorithm that can be used to architect solutions with any of these attributes in mind. In other words, instead of arguing the merits of one implementation versus another, Permabit enables a vendor to implement multiple alternatives and just say “yes”.</p>
<p>ESG Lab has confirmed that Permabit Albireo deduplication advisory services work as advertised. Inline and post-process deduplication support was added to a user-space file system with only six Albireo function calls and 52 lines of code. The capacity of real-world data sets was reduced by between 33% and 97%. The patented deduplication lookup and indexing algorithm was fast and efficient. Permabit deduplication was observed running on more than one server for a scalable global pool of deduplication and fault tolerance. ESG Lab saw no interruption in access when errors were injected on a nine node Permabit Enterprise Archive grid.</p>
<p>ESG Lab’s experience with nearly all of the vendors that have brought data deduplication solutions to the market indicates that correctly implementing data deduplication is a difficult task that requires man-years of effort.   Performance, resource efficiency, and scalability have proven to be particularly challenging for a number of vendors. On top of the technical challenges, this relatively new and valuable technology has a growing number of patent portfolios that need to be navigated.</p>
<p>Speaking of patents, Permabit has been awarded a total of 26 patents covering diverse areas in data protection and archive and many more filings are pending in similar areas. The growing portfolio includes patents in the areas of hash-based deduplication for scalable file and object data storage, encrypted deduplication, memory-based snapshots, and many other features of Permabit’s product line. ESG Lab was particularly impressed with the resource efficient two-stage indexing method described in Patent 7,457,813, Storage System for Random Blocks of Data. Highlights of that well-claimed patent are summarized in the resource efficiency section of this report.</p>
<p>ESG Lab is confident that the flexibility provided by the Albireo SDK is unique in the industry. ESG Lab has confirmed that Albireo can be used with file or block storage. It provides deduplication services at the object or sub-file level. It supports inline and post-process programming models with minimal performance and resource impact. Running over a grid, it can be used to create a global pool of deduplication with predictably scalable performance and rock-solid reliability. It provides deduplication capacity savings that are far greater than can be achieved with compression. Stream support can be used to provide content-aware data deduplication for objects that are misaligned with the block boundaries of an underlying file system. This allows data stored in container formats (e.g., TAR and ZIP files) to be intelligently deduplicated.</p>
<p>Last, but not least, the Permabit Albireo SDK was designed with quick and easy integration in mind. Based on hands-on experience with the Permabit Albireo SDK, ESG Lab believes that Permabit deduplication can be tested within an existing storage system in a matter of weeks. Given the growing size of the market for capacity reduction and the high cost of developing a deduplication solution, organizations considering the merits of adding data deduplication to an existing storage solution should seriously consider a test drive of Permabit’s field-proven, patent-protected deduplication algorithm.</p>
<h1>Appendix</h1>
<div class="graph_top">Table 4. The ESG Lab Test Bed</div>
<p><img class="aligncenter size-full wp-image-23983" title="PermabitAlbireoUpdateT4" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/PermabitAlbireoUpdateT4.png" alt="" width="621" height="69" /></p>
<hr size="1" />
<p><a name="_ftn1">[1]</a> Source: ESG Research Report, <a href="../../../../../2010/04/2010-data-protection-trends/" target="_blank"><em>2010 Data Protection Trends</em></a>, April 2010.</p>
<p><a name="_ftn2">[2]</a> Source: ESG Research Report, <a href="../../../../../2010/12/scale-out-storage-market-trends/" target="_blank"><em>Scale-out Storage Market Trends</em></a>, December 2010.</p>
<p><a name="_ftn3">[3]</a> US Patent 7,457,813, Storage System for Random Blocks of Data, Nov 25, 2008.</p>
<p><a name="_ftn4">[4]</a> Confirmed via a review of traces of code execution through the in-memory index code path.</p>
<p><a name="_ftn5">[5]</a> fuse.sourceforge.net</p>
<p><a name="_ftn6">[6]</a> agilemanifesto.org</p>
<p><a name="_ftn7">[7]</a> See: ESG Lab Report, <em>The Data Domain DD200 Restorer</em>, February 2004.</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#D3D3D3">
<tbody>
<tr>
<td width="706" valign="top">
<h1>ESG Lab Reports</h1>
<p>The goal of ESG Lab reports is to educate IT professionals about emerging technologies and products in the storage, data management and information security industries. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable features and functions of products, show how they can be used to solve real customer problems, and identify any areas needing improvement. ESG Lab&#8217;s expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by Permabit.</p></td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/permabit-albireo-empowering-unified-deduplication-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The state of Backup Dedupe &#8211; Storage Technology Magazine</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/the-state-of-backup-dedupe-storage-technology-magazine-page-1/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/the-state-of-backup-dedupe-storage-technology-magazine-page-1/#comments</comments>
		<pubDate>Tue, 09 Aug 2011 17:01:04 +0000</pubDate>
		<dc:creator>Garrett Doherty</dc:creator>
				<category><![CDATA[Backup and Recovery Software]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[In The News]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[deduplication]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23958</guid>
		<description><![CDATA[In a relatively short time, data deduplication has revolutionized disk-based backup, but the technology is still evolving with new applications and more choices than ever. by Lauren Whitehouse via The state of backup dedupe &#8211; Storage Technology Magazine.]]></description>
			<content:encoded><![CDATA[<p>In a relatively short time, data deduplication has revolutionized disk-based backup, but the technology is still evolving with new applications and more choices than ever.</p>
<p>by Lauren Whitehouse</p>
<p>via <a href="http://searchstorage.techtarget.com/magazineContent/The-state-of-backup-dedupe" target="_blank">The state of backup dedupe &#8211; Storage Technology Magazine</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/the-state-of-backup-dedupe-storage-technology-magazine-page-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quantum’s DXi Accent Rivals EMC’s Data Domain Boost</title>
		<link>http://www.enterprisestrategygroup.com/2011/07/quantum%e2%80%99s-dxi-accent-rivals-emc%e2%80%99s-data-domain-boost/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/07/quantum%e2%80%99s-dxi-accent-rivals-emc%e2%80%99s-data-domain-boost/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 22:16:51 +0000</pubDate>
		<dc:creator>Lauren Whitehouse</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Protection Software & Services]]></category>
		<category><![CDATA[Data Reduction Software]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Lauren Whitehouse]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[Data Domain]]></category>
		<category><![CDATA[Data Domain Boost]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[DXi Accent]]></category>
		<category><![CDATA[DXi Series]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[NetBackup]]></category>
		<category><![CDATA[Quantum]]></category>
		<category><![CDATA[remote office/branch office]]></category>
		<category><![CDATA[Symantec]]></category>
		<category><![CDATA[Symantec OST]]></category>
		<category><![CDATA[target media]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23723</guid>
		<description><![CDATA[About a year ago I posted a blog about deduplicating data “upstream” from the backup target. In it, I referenced EMC’s introduction of Data Domain Boost. In a move that rivals EMC Data Domain, Quantum announced its newest addition to the DXi portfolio: DXi Accent, which offloads some deduplication processing to the media server. Quantum [...]]]></description>
			<content:encoded><![CDATA[<p>About a year ago I posted a <a href="http://www.dataprotectionperspectives.com/2010/06/deduping-upstream/" target="_blank">blog</a> about deduplicating data “upstream” from the backup target. In it, I referenced <a href="http://www.emc.com/" target="_blank">EMC</a>’s introduction of Data Domain Boost. In a move that rivals EMC Data Domain, <a href="http://www.quantum.com/" target="_blank">Quantum</a> announced its newest addition to the DXi portfolio: DXi Accent, which offloads some deduplication processing to the media server.</p>
<p>Quantum DXi Accent is a no-cost feature add-on to the company’s popular DXi portfolio of appliances for enterprise, midmarket, remote/branch office and small business backup environments, beginning with the DXi6700 Series. DXi Accent installs on the backup media server (in this case, it’s limited to <a href="http://www.symantec.com/" target="_blank">Symantec</a> NetBackup media servers and requires the use of Symantec’s OST).</p>
<p>DXi Accent, like Data Domain Boost, executes the deduplication process earlier in the backup data path. At the media server, DXi Accent chunks the backup data stream into blocks, calculates a hash for each block, and sends only unique data to be stored. This reduces traffic between the media server and the backup target and accelerates the backup process.</p>
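<p>The chunk, hash, and send-only-unique flow described above can be illustrated with a minimal sketch. It is not Quantum’s or EMC’s implementation: the fixed 4 KB chunk size, the SHA-256 hash, and the <code>BackupTarget</code>, <code>backup</code>, and <code>restore</code> names are all assumptions made for illustration, and an in-memory dictionary stands in for the appliance’s index.</p>

```python
import hashlib


class BackupTarget:
    """Hypothetical stand-in for a deduplicating backup appliance."""

    def __init__(self):
        self.store = {}          # hash digest mapped to chunk bytes
        self.bytes_received = 0  # payload bytes sent by the media server

    def missing(self, digests):
        """Return the digests the target does not hold yet."""
        return {d for d in digests if d not in self.store}

    def put(self, digest, chunk):
        self.store[digest] = chunk
        self.bytes_received += len(chunk)


def backup(stream, target, chunk_size=4096):
    """Media-server side: chunk, hash, and send only chunks the target lacks.

    Returns the ordered list of digests (the "recipe") needed for a restore.
    Fixed-size chunking is an assumption; real products may chunk differently.
    """
    chunks = [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(recipe)  # one round trip of hashes, no payload
    for digest, chunk in zip(recipe, chunks):
        if digest in needed:
            target.put(digest, chunk)
            needed.discard(digest)   # a chunk repeated in the stream is sent once
    return recipe


def restore(recipe, target):
    """Rebuild the original stream from the stored chunks."""
    return b"".join(target.store[d] for d in recipe)
```

<p>In this sketch, backing up a 12 KB stream made of two identical 4 KB chunks and one different chunk moves only 8 KB to the target, and repeating the same backup moves no chunk data at all; only the hashes cross the wire.</p>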
<p>Those evaluating Quantum versus EMC should look at a few differences between DXi Accent and Data Domain Boost:</p>
<ul>
<li>Using DXi Accent is not an “all or nothing” scenario as with Data Domain Boost. With DXi Accent, users have a choice to enable or disable the feature on a media server-by-media server basis.</li>
<li>DXi Accent can be used on a LAN or WAN, while Data Domain Boost is limited to a LAN. Remote backup performed over a WAN from a media server (with DXi Accent but no local DXi appliance) to a central location (where the DXi appliance is located) is supported.</li>
<li>Quantum’s strategy of bundling high-value features (device-to-device replication, direct tape creation, Symantec OST support, and now DXi Accent) gives Quantum a price advantage over Data Domain, which takes an a la carte licensing approach.</li>
</ul>
<p>Quantum continues to demonstrate its moxie in the target backup segment. The Quantum DXi portfolio of disk-based backup appliances with deduplication is surely worth a look.</p>
<p>You can read Lauren&#8217;s other blog entries at <a href="http://www.dataprotectionperspectives.com/" target="_blank">Data Protection Perspectives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/07/quantum%e2%80%99s-dxi-accent-rivals-emc%e2%80%99s-data-domain-boost/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

