Enterprise Strategy Group | Getting to the bigger truth.™

Lab Reports: ESG Lab Review – Deduplication with Highly Efficient Storage: A Best Practice
The current popularity of deduplication should not blind users to the fact that it is only a contributory factor to overall storage efficiency; in this respect, best practices demand that users understand different ‘dedupe’ tools and integrate them into their storage efficiency efforts. Nexsan’s combination of deduplication with its ‘Highly Efficient Storage’ platform supports a best practice approach to holistic storage efficiency.

The Market Backdrop to Deduplication

Challenges, Realities, and why Dedupe?

Deduplication is sweeping rapidly through the IT world. Mostly, this is a good thing. After all, what’s not to like about it? The answer is ‘not much.’ However, as with previous dramatic embraces of new technology tools, there are times when its sheer popularity generates a kind of semi-blindness in users, or at least a willingness to believe that all deduplication is the same and is inherently and equally good in any and all implementations. The reality, and a focus of this paper, is that there are different types of deduplication with variable potential. In addition, the ‘rush to dedupe’ can lead users to forget why it is a good thing at all. It may seem contrary to conventional wisdom, but there is nothing inherently good about having less data; the business value comes from what that reduced overall capacity requirement means.

Figure 1. IT Spending: Cost Reduction and Cost Containment Remain Critical

Simply, it means lower costs. As Figure 1 shows, recent ESG research[1] confirms that cost management, whether actually reducing or merely containing expenses, remains critical. And that is the direct business value of deduplication. The same research also found that ‘managing data growth’ was number five on a list of well over 20 ‘top IT priorities’ through to at least mid-2011. Couple this with the continued business focus on cost management, and the popularity of a tool such as deduplication, which can address these two crucial requirements simultaneously, is easy to understand.

In addition to the direct budget savings, there are other IT operational benefits implicit in having less data: good examples are reduced waste (perhaps of power, as part of a wider corporate ‘green’ initiative) and reduced complexity, as well as less time, bandwidth, space, and power consumed by backups and DR. In other words, removing duplicate data also removes duplicate resource use and duplicate IT hassle. Viewed in this light, deduplication is an efficiency tool, meaning that users should both check the efficacy of the various deduplication approaches and options and see deduplication as one weapon in a larger battle. Maximum benefit will invariably be derived from an overall storage efficiency focus, not merely from adding deduplication (however good) to otherwise less efficient infrastructure and approaches; this, in a nutshell, is the best practice referenced in the title of this paper.

This paper examines all this with a specific focus on what Nexsan can offer, both in terms of deduplication itself and in how deduplication fits with the company’s overall ‘HES’ (Highly Efficient Storage) platforms. In addition to market perspective and insight, this paper also includes a specific ESG Lab technical evaluation of Nexsan’s deduplication approach. All this raises the question: what are the varying approaches to deduplication?

Varying Deduplication Approaches

While all deduplication solutions operate with the same goal in mind—reducing the amount of data stored on disk by eliminating data redundancy—the process can actually be implemented in many ways using many different technologies.  For example, some options use an agent on the client side to deduplicate the data before it is sent to storage, which can result in servers getting bogged down since the deduplication facility uses the system’s CPU and memory.  Other implementations perform deduplication directly in the storage target, deduplicating the data before sending it to disk. But there are further variations: such ‘target side deduplication’ can run in a storage array or a separate appliance, and can be used to reduce capacity for primary data as well as backup data. The access requirements for backups and primary data are very different and the deduplication methodologies used for these approaches are often so different that the same vendor may choose to offer completely different deduplication architectures for primary data and backup data. Since deduplication is a CPU-intensive task, it can also have an adverse impact on performance when implemented directly in the storage system.

In terms of timing, there are two basic approaches to carrying out deduplication:

  • Inline: deduplicating data while it is being backed up
  • Post-process: waiting until after the backup has finished

Users should care about which approach is used because all deduplication incurs some overhead: new data must be compared to previously stored data to find and eliminate redundancy. If deduplication occurs during ingest, that overhead occurs during the backup window and may affect the performance of the backup. If deduplication is delayed until after the backup is finished, the overhead is incurred outside the backup window and the performance of the backup is unaffected, but a non-deduplicated ‘landing area’ must be available to receive data prior to deduplication.

ESG Lab has tested numerous deduplication solutions using all current methodologies and architectures[2] and has concluded that deduplication ratios are broadly consistent across similar implementations and are affected more by end-users’ backup policies and data types than by a particular implementation. Because the ratios are similar across multiple vendors’ implementations (in other words, the raw algorithms have a similar ability to spot and deal with duplicate data), users should examine other criteria, such as power and cooling efficiency and system performance, to determine which deduplication solution is best for them. As mentioned already, deduplication is only a partial means to an end, not an end in itself; furthermore, it can easily be sub-optimal if applied without consideration for all the other contributory factors that make for an efficient storage infrastructure and process. Even so, it is, of course, still important to have as good a deduplication implementation as possible.
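The overhead noted above, comparing new data against previously stored data, can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 fingerprints purely for illustration; the chunk size, the function names, and the in-memory store are hypothetical and are not a description of how Nexsan or any other vendor implements deduplication.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns the ordered list of chunk digests needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        # This lookup is the "compare to previously stored data" step:
        # a chunk whose digest is already known is not stored again.
        if digest not in store:
            store[digest] = chunk
        recipe.append(digest)
    return recipe


def reconstitute(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its ordered list of digests."""
    return b"".join(store[digest] for digest in recipe)


store = {}
backup_1 = b"A" * 8192 + b"B" * 4096
backup_2 = b"A" * 8192 + b"C" * 4096  # mostly identical to backup_1

recipe_1 = deduplicate(backup_1, store)
recipe_2 = deduplicate(backup_2, store)

logical_bytes = len(backup_1) + len(backup_2)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(f"{logical_bytes / stored_bytes:.1f}:1")  # prints 2.0:1
```

In an inline design the lookup runs while data is being ingested; in a post-process design the same work runs later against the landing area. The sketch does not model that timing difference.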

The Evolution of Deduplication: from ‘1.0’ to ‘2.0’

Deduplication has always claimed and promised efficiency, and what we can call ‘Deduplication 1.0’ does indeed deliver against that promise; however, what a ‘vanilla’ 1.0 tool delivers is data capacity efficiency alone. Users can reduce costs by reducing the amount of storage required to protect and store backup data over time, yet end-users have also indicated to ESG that this promise has not always paid off for them holistically: in some cases they have ended up with higher costs after factoring in the higher cost of the deduplication tool and storage. What users should seek, therefore, is what we can call ‘Deduplication 2.0.’ In this implementation approach, advanced deduplication is integrated with reliable, easy-to-use, high-density, high-performance storage with advanced power efficiency technology in order to deliver cost of ownership benefits that can far exceed those of deduplication (version 1.0) alone. Although this Lab Review does not include precise TCO measurements, the impact of a ‘2.0’ deduplication can be significant, as Figure 2 indicates. For example, the 2.0 offering from Nexsan includes power savings that can reach 60% compared with a traditional approach, and floor space density can similarly improve by up to half. The next sections review the Nexsan offering in more detail, with actual results for operation, ease of use, and deduplication ratios.

Figure 2. The IT & Business Values of ‘Deduplication 2.0’

Nexsan’s Highly Efficient DeDupe SG

Nexsan DeDupe SG couples dedicated deduplication hardware with Nexsan’s Highly Efficient Storage (HES) hardware to minimize deduplication’s impact on backup windows. Users can apply deduplication selectively, excluding individual files, folders, or entire backup sets, while Nexsan’s AutoMAID technology reduces power and cooling requirements when drives are not in use.

Figure 3. Nexsan DeDupe SG


As seen in Figure 3, Nexsan’s DeDupe SG is offered in options that scale from 4 TB usable to 72 TB usable. Assuming an average deduplication ratio of 20:1 over time, Nexsan’s solutions can scale from 80 TB to 1.4 PB of protected data. Nexsan’s DeDupe SG offers features that increase efficiency, such as:
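The protected-capacity figures above follow from multiplying usable capacity by the assumed deduplication ratio. A short worked example, in which the function name is hypothetical and the 20:1 ratio is the paper's stated assumption rather than a measured value:

```python
ASSUMED_RATIO = 20  # the 20:1 average ratio assumed in the text


def protected_capacity_tb(usable_tb: float, ratio: float = ASSUMED_RATIO) -> float:
    """Protected (pre-deduplication) data that fits in a given usable capacity."""
    return usable_tb * ratio


smallest_tb = protected_capacity_tb(4)   # 4 TB usable  -> 80 TB protected
largest_tb = protected_capacity_tb(72)   # 72 TB usable -> 1,440 TB protected
largest_pb = largest_tb / 1000           # 1.44 PB with decimal units (1 PB = 1000 TB)
```

The 1,440 TB result is what the text rounds to 1.4 PB. Actual protected capacity depends on the ratio achieved, which varies with backup policies and data types.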

  • High performance backups. Deduplication hardware is separate from the regular storage hardware, minimizing the impact of deduplication on backups and restores.
  • Hosted Backup option. Allows backup software to be directly installed on the DeDupe SG node, reducing LAN traffic and backup server hardware requirements.
  • Selective deduplication. Users can exclude files, folders, or backup sets of non-redundant data.
  • Ultra-green: Nexsan’s ‘AutoMAID.’ Automated power reduction technology provides multiple levels of power efficiency based on a simple time-based data access policy with no impact to storage performance.
  • WAN efficient Remote Replication. Nexsan DeDupe SG can replicate deduplicated data for remote office or offsite DR protection by only sending unique blocks and indexes over the wire.
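The replication feature in the last bullet, sending only unique blocks and indexes, can be sketched as follows. This is an illustration of the general idea only: the function name, the use of SHA-256 digests, and the sample data are assumptions, not Nexsan's replication protocol.

```python
import hashlib


def plan_replication(chunks: list, remote_digests: set) -> tuple:
    """Decide what must cross the WAN for one backup.

    Returns (blocks_to_send, index): the blocks the remote site lacks,
    plus the ordered digest list the remote side needs to rebuild the data.
    """
    index = []
    blocks_to_send = {}
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        index.append(digest)
        # Skip blocks the remote already holds or that are already queued.
        if digest not in remote_digests and digest not in blocks_to_send:
            blocks_to_send[digest] = chunk
    return blocks_to_send, index


local_chunks = [b"alpha", b"beta", b"alpha", b"gamma"]
remote_digests = {hashlib.sha256(b"alpha").hexdigest()}  # remote already has "alpha"

blocks_to_send, index = plan_replication(local_chunks, remote_digests)
```

Here four blocks are described by the index, but only two ("beta" and "gamma") need to be transmitted.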

Real World: Nexsan ‘Deduplication 2.0’

ESG Lab recently tested the Nexsan DeDupe SG’s deduplication and AutoMAID capabilities in a simulated backup environment to evaluate the capacity and power/cooling savings offered by the platform.[3] ESG Lab logged into an un-configured Nexsan DeDupe SG system and ran through the initial configuration: first setting an IP address, then running through a short web-based configuration wizard. After configuring site-specific network settings, ESG Lab created a volume to act as a landing area for backups and ran the new share wizard to share the volume out to systems. The Nexsan DeDupe SG was ready to begin accepting data less than 20 minutes after the first keystroke.

As seen in Figure 4, deduplication and AutoMAID were left at the default settings: data would be automatically deduplicated immediately after being copied to the network share and AutoMAID was set to park the drive heads after two minutes of inactivity, slow rotation after 10 minutes, then stop the drives completely after 15 minutes.

Figure 4. Nexsan Deduplication and AutoMAID configuration

Both deduplication and AutoMAID can be controlled by simple, yet powerful schedulers that enable administrators to precisely control when deduplication and AutoMAID should take place.

Next, approximately 4 GB of data (large text files) was copied to the network share created in the previous step. The copy operation completed in about 90 seconds; when ESG Lab then examined the DeDupe SG status screen, deduplication had already completed. The DeDupe SG user interface provided detailed statistics, indicating that the 4.2 GB of primary data had been deduplicated down to 743.2 MB, for an initial deduplication ratio of 5.7:1, and that the deduplication took 61 seconds in total.
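The reported ratio can be checked from the two sizes shown in the user interface. The source does not state whether the interface counts 1 GB as 1000 MB or 1024 MB, so the sketch below computes both; the function name is hypothetical.

```python
def dedup_ratio(original_mb: float, stored_mb: float) -> float:
    """Deduplication ratio expressed as original size divided by stored size."""
    return original_mb / stored_mb


stored_mb = 743.2
ratio_decimal = dedup_ratio(4.2 * 1000, stored_mb)  # assumes 1 GB = 1000 MB
ratio_binary = dedup_ratio(4.2 * 1024, stored_mb)   # assumes 1 GB = 1024 MB

# Fraction of capacity saved, using the decimal-unit ratio
space_saved = 1 - 1 / ratio_decimal
```

Both conventions give a value between 5.6 and 5.8, consistent with the reported 5.7:1, and correspond to roughly 82% less capacity consumed for this first copy.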

Why This Matters

ESG has repeatedly heard from end-users that tape-based backup is inherently unreliable and fraught with excessive hard (tape media) and soft (management and administration) costs. As application data continues to grow and backup windows continue to shrink, IT managers are increasingly adopting backup-to-disk technologies to get backups done more quickly. In fact, ESG’s research indicates that the need to reduce backup times is the number one challenge IT managers reported facing with their data protection processes and technologies.[4]

Meeting backup windows, along with the cost of storing backed up data, has led many users to turn to disk-based solutions offering data deduplication technology. Deduplication changes the economics of disk-based data protection by not only making backing up to disk much more cost-effective, but also allowing users to keep data on disk longer (i.e., extend retention periods).

With dedicated hardware separating data-intensive backup and restore operations from CPU-intensive data deduplication operations, ESG Lab observed that Nexsan’s DeDupe SG provides excellent backup and restore performance with deduplicated data.

Due to the high cost of WAN bandwidth, organizations with large amounts of backup data have found that replicating backup data to a remote site can be costly and impractical.  This has forced users to rely on physically transporting tapes to an offsite location for disaster recovery.  Finding and restoring applications from tape after a disaster is difficult, error-prone, and time-consuming. By only moving unique data, Nexsan DeDupe SG can provide an affordable, WAN-optimized alternative to tape for offsite archival and disaster recovery.

ESG Lab also found that configuring a Nexsan DeDupe SG was straightforward and intuitive. Twenty minutes after beginning the configuration, the system was ready to accept data.

Finally, ESG monitored the system and watched as the drives progressed through the three stages of AutoMAID: after two minutes, the drive heads parked; after ten minutes, the drives spun down to 4,000 RPM; and then finally, after 15 minutes, the drives spun down completely. ESG then copied a file back from the system and watched as just the drives containing that file spun up. The remaining drives in the system remained idle.
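The three stages observed above amount to a simple time-based policy keyed on how long a drive has been idle. A minimal sketch using the default thresholds from the test (2, 10, and 15 minutes); the function name, the state labels, and the treatment of each threshold as inclusive are assumptions, not Nexsan's implementation.

```python
def automaid_state(idle_minutes: float) -> str:
    """Map a drive's idle time to the power-saving stage described in the text."""
    if idle_minutes >= 15:
        return "spun down"            # drives stopped completely
    if idle_minutes >= 10:
        return "slowed to 4,000 RPM"  # reduced rotation speed
    if idle_minutes >= 2:
        return "heads parked"
    return "active"


for minutes in (0, 2, 10, 15):
    print(minutes, automaid_state(minutes))
```

In the test, an access to a file returned only the drives holding that file to the active state, which in this sketch corresponds to resetting the idle time for those drives alone.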

Figure 5 shows what an environment with a typical backup schedule would look like using Nexsan DeDupe SG. Drives remain spun down most of the time, spinning up only for backup and restore operations. Data is automatically deduplicated and reconstituted as needed, with no administrative action required.

Figure 5. Nexsan Deduplication and AutoMAID in action

Why This Matters

Energy costs are rising steadily and some metropolitan data centers are already maxed out. ESG research indicates that reducing power consumption in the data center is a key initiative for many organizations. ESG Lab has confirmed that Nexsan AutoMAID technology dramatically reduces power and cooling requirements, and that Nexsan DeDupe SG data deduplication magnifies these savings while reducing the amount of capacity needed to retain backups on disk. Nexsan DeDupe SG not only improves energy efficiency; it also reduces cooling requirements and extends the lifespan of disk drives.

This is valuable because if a customer completes backups in a typical backup window (between two and eight hours on average), a significant amount of energy is wasted by keeping the target storage system online while it waits for the next data feed or request. Disk drives continue to spin and must be cooled, even if data is not being written. This presents a great opportunity for IT to cut energy costs, and one that many organizations plan to take advantage of: nearly 42% of those recently surveyed by ESG said that they planned to maintain or increase their investments in more power-efficient storage hardware.[5] Customers can lower power consumption and support green initiatives with the Nexsan DeDupe SG, as all its systems support AutoMAID to idle or shut down inactive drives with no performance penalty while reducing the system’s overall carbon footprint. Additionally, customers do not have to install any software on backup application servers or other hosts connected to the Nexsan DeDupe SG to run AutoMAID.

The Bigger Truth

Not for the first or last time in the IT business, terminology and semantics can actually create issues; deduplication is not one standard thing. Other examples are ‘SSD’ and ‘thin provisioning’—useful concepts, but not singularly defined tools. Deduplication is thus a concept that can be applied in a number of ways and that will usually help to provide IT and business value. However, the extent of that value depends not just on the quality of the deduplication itself but on a user’s broader storage efficiency context. To use an analogy—you wouldn’t just specify or buy a carburetor (or even an engine) when you’re running a vehicle—what you look for is a complete ‘fit-for-purpose’ vehicle. Of course you want a great carburetor and engine within it, but the real value comes from the whole (the sum of the parts) and not just the parts themselves.

The ‘best practice’ in the title of this paper boils down to precisely that: users should take a broad view of storage efficiency and understand that it is an ecosystem—one that is only as strong as its weakest link. Naturally, whether it’s a car or a storage system being considered, anyone will want to deal with a vendor that has done its homework on the user’s behalf. Nexsan has a very capable deduplication approach that compares favorably to its industry peers, which is a good thing (and the ESG Lab review shows its ease and value), but the company also offers a broader HES (Highly Efficient Storage) infrastructure approach, which is a better thing.

Nexsan’s HES delivers business value by minimizing the use of power, cost, and space for given capacity and performance workloads. Added to its powerful base deduplication tool, Nexsan’s HES allows users to move from a traditional ‘deduplication 1.0’ implementation and gain the added IT and business benefits of ‘deduplication 2.0.’ Combining this sort of all-encompassing approach with user processes that further emphasize data efficiency extends the best practice; after all, the inefficient storage of efficiently deduped data (or even un-needed, but deduped data) is still inefficient!

The bottom line is one of yin and yang: great deduplication on top of an inefficient storage system is sub-optimal, yet less effective deduplication on top of an efficient storage system is also less than ideal. Each aspect must be addressed and integrated in order to maximize IT and business value; this is something that Nexsan both understands and is doing.


[1] Source: ESG Research Report, 2010 IT Spending Intentions Survey (report to be published in January 2010).

[2] http://www.enterprisestrategygroup.com/category/lab-reports/

[3] Due to limitations in the quantity and power of the clients available for testing, this examination was not designed to test the absolute performance limits of the Nexsan DeDupe SG.

[4] Source: ESG Research Report, Data Protection Market Trends, January 2008.

[5] Source: ESG Research Report, 2009 Data Center Spending Intentions Survey, March 2009.

