Enterprise Strategy Group | Getting to the bigger truth.TM

Briefs: NEC HYDRAstor
Published on Monday, September 21st, 2009 at 1:21 pm
Categories: Backup and Recovery Software | Briefs | Data Protection Software & Services | Digital Archiving Software | IT Infrastructure | Information Management Software & Services | Information and Risk Management | Storage
Authors: Lauren Whitehouse | Terri McClure
With its latest HYDRAstor release, NEC has rounded out the platform with a number of features, including Write Once Read Many (WORM), quota management, granular resiliency levels, and in-flight encryption. While this release incorporates other new features and enhancements as well, the addition of WORM capability considerably strengthens its position as a disk-based long-term archive platform. If NEC can ramp up its sales and marketing efforts to build better data center awareness, HYDRAstor could become a major player in both disk-based archiving and scale-out NAS.

Overview

NEC’s HYDRAstor is a hidden gem.  It is a Tier 2 scale-out disk storage system suitable for data archiving and backup.  HYDRAstor is what NEC calls a “self-evolving platform”: new nodes can be added to scale the system, and multiple generations of HYDRAstor nodes can co-exist in the cluster while maintaining a single system image.  This is an important point: given government regulations that require some archive data to be retained for multiple decades, data often outlasts the media it is stored on.  With HYDRAstor’s self-evolving architecture, nodes can be added to or removed from the cluster while maintaining data access, integrity, and, if configured properly, data protection levels that would withstand multiple failures.  Over time, the system can be upgraded to newer technology without any disruption to data access.

A key requirement for long-term archive is immutability: the ability to lock down files and ensure they cannot be deleted or tampered with for a set period of time.  For digital media, this is accomplished through write once, read many (WORM) technologies.  Having archive data readily available, as it is when stored in a disk-based archive, is key for litigation and e-discovery support.  But unless it can be proven the data is authentic and has not been tampered with, it is worthless; WORM helps ensure data is authentic.  NEC added WORM technology in its most recent release of HYDRAstor, taking a major adoption inhibitor off the table. Most competitive disk archive platforms on the market today offer WORM capability and, while HYDRAstor is a compelling platform, it needed WORM to be a fully viable option for long-term archive.

NEC is rolling out a number of additional features in the latest platform that make it a strong choice for an online archive and disk-based backup solution, such as in-flight encryption to ensure data cannot be read if intercepted; quota management for tracking of capacity usage and chargeback;  multiple resiliency levels that allow users to align protection levels with the value of data, all within a single system; enhanced application-aware deduplication, incorporating support for Tivoli Storage Manager (TSM) and EMC NetWorker; and a performance boost—NEC claims up to a 67% performance boost based on a code upgrade that reduces and speeds inter-node communication.  With the rich feature set HYDRAstor brings to the table, there is no reason HYDRAstor could not be extended to general purpose NAS. ESG would not be surprised to see NEC move in that direction over time.

Analysis

There has been a lot of focus on clustered or scale-out storage lately: HP acquired IBRIX; LSI acquired OnStor and announced it will offer scale-out systems over time; and HDS is extending its relationship with BlueArc to include its mid-tier solution.  There is good reason for all the attention.  We’re storing more digital information than we could have dreamed about back when traditional scale-up architectures were invented.  Every electronic device we use, from our phones on up, has evolved into a content capture device.  File data is growing fast, with new formats like high definition video and richer graphics.  According to ESG research, archived file-based data will exceed 69 exabytes by 2012, dwarfing the amount of database and e-mail data stored.[1] Scale-up systems, designed to add a limited amount of capacity to a single or dual storage controller, quickly hit performance or capacity ceilings as capacity is added.  New discrete systems must then be added, managed as separate silos of information, housed, powered, and cooled, which quickly escalates operating costs.

Scale-out, the ability to independently scale and tune bandwidth, processing, and storage capacity, all while managing the file system and a single shared storage pool, is becoming the new backbone of file-based storage solutions. In this new file-based, content-rich era, scale-out architectures have a lot to bring to the table.  Scale-out NAS technology is designed from the ground up for large and rapidly growing file environments that outpace the scale and performance abilities of traditional scale-up systems.

Scale-out NAS is gaining traction.  ESG conducted a survey of North American and Western European storage professionals at enterprise-level organizations (i.e., 1,000 or more employees) and found considerable interest in scale-out NAS solutions.  In fact, 11% of respondents indicated their organizations are already using the technology to some degree and another 75% have imminent plans for or interest in deploying it. Among current scale-out NAS users, faster provisioning; improved scalability, availability, and performance; and simplified management were all cited as key considerations in those organizations’ initial implementation decisions (see Figure 1).

Figure 1. Market Drivers for Scale-Out NAS Solutions

With cost containment a top-of-mind issue for almost every business, IT staffs must spend their allotted budgets more judiciously than ever.  It follows, therefore, that organizations still in the process of evaluating scale-out NAS would be more aware of and attracted to its total potential financial impact, which may result in both capital and operational savings.  Specifically, the cost-related benefits that scale-out NAS technology offers include:

  • Low entry cost: The entry cost for scale-out systems varies depending on the minimum configurations supported, with most systems starting as small as two nodes and scaling out from there.
  • Just-in-time scalability: With clustered scale-out systems, capacity and/or performance resources can be added as needed.  Because of their modular nature, there is no requirement to buy (and subsequently power and cool) frames, power supplies, and mostly empty cabinets in advance of storage capacity.
  • Riding the commodity curve: As users defer purchases of frames, processors, or disks, they can benefit from the ongoing decline in component prices over time.
  • Higher utilization rates: Better utilization means deferred purchases of new capacity.  Since all of the NAS heads in scale-out systems can address the entire pool of useable capacity in a given cluster, there is no stranded capacity.
  • Operational savings: A single point of management allows scale-out NAS deployments to increase capacity without adding IT headcount.

That’s where HYDRAstor comes in.  HYDRAstor is an efficient, cost-effective, highly scalable, self-managing, self-optimizing, scale-out NAS system targeted at providing a disk-based platform for long-term active archive and disk-based backup.  HYDRAstor has a two-tier inline processing architecture made up of Accelerator Nodes and Storage Nodes.  Both tiers carry their own processing power (dual quad-core Xeon CPUs), and processing occurs inline across both tiers before data is committed to disk.  This architecture allows users to scale capacity without hitting the performance ceilings seen in scale-up architectures.  Adding more Accelerator Nodes increases bandwidth, so as capacity is added, there is a corresponding increase in processing power and bandwidth for better overall throughput.

This architecture allows users to set ratios of processing power and throughput to storage capacity to appropriately align performance to application requirements.  For higher throughput, have a higher ratio of Accelerator Nodes to Storage Nodes (for example, one Accelerator Node for every two Storage Nodes).  For bulk capacity requiring lower throughput, have a lower ratio (for example, one Accelerator Node to every four Storage Nodes).

HYDRAstor’s scale-out design allows it to scale to massive capacity, beyond most backup and archive targets in the industry today, yet maintain a single system image.  That means as capacity is added, you still only have one system to manage.  Contrast this to scale-up systems that require adding a second, third, or fourth (and so on) system to add capacity, giving users lots of systems to manage.  There are implications to this design beyond manageability.  Because of its scale-out architecture, HYDRAstor has advantages that drive further efficiencies, such as inline global deduplication and multiple concurrent high availability levels.  Each deserves a deeper look.

Inline Global Deduplication

HYDRAstor performs an inline deduplication process, meaning deduplication is completed before the data is written to disk on the HYDRAstor system.  NEC leverages the HYDRAstor two-tier grid architecture to its advantage to perform deduplication.  Each Storage Node has a portion of the overall hash table maintained in memory and is responsible for deduplicating data chunks within a specific range of hash values.  As data is written to HYDRAstor, the Accelerator Node breaks it into variable length chunks and a hash value is created for each.  The hash value is then sent to the Storage Node that owns the range of hash values that the new chunk fits within.  If the hash value is a duplicate, only reference information is stored.  If it is unique, the new data chunk is stored across multiple disk drives spanning multiple Storage Nodes for maximum resiliency.  Everything is performed in memory, ensuring only unique compressed data is written to disk.  Because each Storage Node has 24 GB of RAM and two quad-core CPUs, as capacity is increased, so is the aggregate memory space and processing power available for processing data across the global distributed hash table.
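
The routing step described above can be illustrated with a minimal Python sketch. It is not NEC's implementation: the `StorageNode` class, the `owner` and `write` functions, the use of SHA-1, and the equal-width hash ranges are all assumptions made for illustration, and fixed-size chunks stand in for the variable-length chunking the brief describes. Compression and the spreading of chunk data across drives are left out.

```python
import hashlib


class StorageNode:
    """Hypothetical node that owns one slice of the hash space."""

    def __init__(self):
        # In-memory portion of the distributed hash table:
        # chunk digest -> number of references to that chunk.
        self.index = {}

    def store(self, digest):
        """Record a chunk; return True if it was new, False if a duplicate."""
        if digest in self.index:
            self.index[digest] += 1  # duplicate: keep only a reference
            return False
        self.index[digest] = 1  # unique: the chunk would be written to disk
        return True


def owner(digest, num_nodes):
    """Map a digest onto one of num_nodes equal-width hash ranges (assumed)."""
    prefix = int.from_bytes(digest[:4], "big")
    return prefix * num_nodes // 2**32


def write(data, nodes, chunk_size=4096):
    """Chunk the data, route each digest to its owning node, count results.

    Fixed-size chunking is a simplification of variable-length chunking.
    """
    unique = duplicate = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha1(chunk).digest()  # hash choice is an assumption
        node = nodes[owner(digest, len(nodes))]
        if node.store(digest):
            unique += 1
        else:
            duplicate += 1
    return unique, duplicate


nodes = [StorageNode() for _ in range(4)]
payload = b"A" * 4096 * 3 + b"B" * 4096
print(write(payload, nodes))  # three identical chunks plus one different one
print(write(payload, nodes))  # the same data written a second time
```

Because every digest has exactly one owning node, a duplicate is detected no matter which stream or share it arrives on, which is what makes the deduplication global. Adding nodes to the list narrows each node's range, so the lookup work per node shrinks as the system grows.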

HYDRAstor’s grid architecture and inline deduplication process reduce the performance impact[2] of deduplication.  Inline deduplication, when done effectively, can actually speed up performance, since duplicate data does not have to be committed to disk; only data reference information is written.  The net result is that users are able to significantly reduce backend capacity requirements for both their archive and backup data. This, too, is a differentiator. ESG knows of no other scale-out storage vendor currently capable of deduplicating both backup and archive data on a shared storage platform.  HYDRAstor uses application-aware deduplication to further optimize the deduplication level by filtering application metadata away from the user data, where duplicate data is more likely to exist, maximizing the opportunity for deduplication and enabling cross-application deduplication.

In general, because of the scalability and scale-out nature of HYDRAstor, users can realize significant deduplication efficiencies versus scale-up systems.  In scale-up systems, each system is a deduplication silo—there is no ability to deduplicate across systems.  In HYDRAstor’s scale-out solution, deduplication is driven across all nodes and file systems, creating one large deduplicated storage pool that can be leveraged as an active archive, backup target, or even for general purpose file storage.

High Availability

In terms of data protection and disaster recovery, HYDRAstor offers a unique capability called Distributed Resilient Data (DRD), which allows users to set, or “dial-in,” protection levels based on the type or criticality of the data.  The more critical the data, the higher the resiliency level; and the less critical the data, the lower the resiliency level. The resiliency level corresponds to the number of concurrent failures the data can survive.  A resiliency level of “2” protects against two drive or Storage Node failures, while a resiliency of “3” protects against three drive or Storage Node failures. Users can set their resiliency levels to protect against two, three, or up to six concurrent failures.

To put this in perspective, NEC claims that a resiliency level of “3” provides three times the protection of RAID-5 (RAID-5 protects against one drive loss) with about the same storage capacity overhead. Along the same lines, the HYDRAstor architecture allows IT to “dial-up” resiliency levels, enabling data to survive the loss of nodes.
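
A short Python calculation shows how a claim of this kind can be checked. The brief does not say how DRD encodes data, so the sketch assumes an erasure-coding style layout in which each chunk is split into a fixed number of fragments and the resiliency level equals the number of redundant fragments. The fragment count of 12 and the RAID-5 group size of five drives are illustrative assumptions, not NEC figures.

```python
def fragment_overhead(total_fragments, resiliency_level):
    """Share of raw capacity spent on redundancy (assumed fragment model)."""
    if not 0 < resiliency_level < total_fragments:
        raise ValueError("resiliency level must be between 1 and total - 1")
    return resiliency_level / total_fragments


def raid5_overhead(drives_in_group):
    """RAID-5 spends one drive's worth of capacity per group on parity."""
    return 1 / drives_in_group


TOTAL_FRAGMENTS = 12  # assumption for illustration only
RAID5_DRIVES = 5      # assumption: a 4+1 RAID-5 group

print(f"RAID-5 ({RAID5_DRIVES} drives): survives 1 failure, "
      f"overhead {raid5_overhead(RAID5_DRIVES):.0%}")
for level in (2, 3, 6):
    print(f"Resiliency {level}: survives {level} failures, "
          f"overhead {fragment_overhead(TOTAL_FRAGMENTS, level):.0%}")
```

Under these assumptions, a resiliency level of 3 costs a quarter of raw capacity against a fifth for the RAID-5 group, which is in line with the "about the same overhead" comparison while tolerating three failures instead of one.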

While all this redundancy is nice, HYDRAstor really puts the grid technology to work to significantly reduce rebuild time.  Unlike conventional RAID, HYDRAstor does not perform a 1:1 drive rebuild.  It just rebuilds the data that was on the failed drive or node.  And data is not rebuilt to a single drive; it is reconstructed across available space in multiple drives using data fragments spanning multiple nodes. In fact, if an entire Storage Node fails, the data from those drives is rebuilt across the remaining Storage Nodes. Because HYDRAstor leverages available space across the existing drives in the grid, there is no requirement for idle “spare” drives.  NEC is exploiting not just the massive parallelism opportunity presented by having multiple drives in a system, but the parallelism opportunity presented by having a grid-based architecture. The combination of the two results in a very fast rebuild and little performance impact in the event of a failure. Thanks to the fast rebuild, users have reduced risk of being exposed to multiple drive failures because of lengthy rebuild times.
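
The effect of spreading a rebuild across the grid can be estimated with an idealized Python model. The function name `rebuild_hours` and every figure below (drive size, fill level, per-drive write rate, number of participating drives) are hypothetical, and the model ignores network, read, and throttling costs, so it shows only the shape of the advantage, not measured HYDRAstor behavior.

```python
def rebuild_hours(data_tb, per_drive_mb_s, writers):
    """Idealized time to rewrite data_tb when `writers` drives share the work."""
    total_mb = data_tb * 1_000_000  # decimal terabytes to megabytes
    seconds = total_mb / (per_drive_mb_s * writers)
    return seconds / 3600


# Hypothetical scenario: a 1 TB drive that is half full fails.
DRIVE_TB = 1.0
USED_TB = 0.5
WRITE_MB_S = 50  # assumed sustained rebuild rate per drive

# Conventional 1:1 rebuild: the whole drive is rewritten to a single spare.
conventional = rebuild_hours(DRIVE_TB, WRITE_MB_S, writers=1)

# Grid rebuild: only the used data is rewritten, spread over 40 drives.
distributed = rebuild_hours(USED_TB, WRITE_MB_S, writers=40)

print(f"1:1 rebuild:  {conventional:.2f} hours")
print(f"grid rebuild: {distributed:.2f} hours")
```

Two factors compound in this model: less data has to be rewritten, and many drives absorb the writes in parallel. The shorter the rebuild, the shorter the window in which a further failure could hit data that is not yet fully protected.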

While HYDRAstor’s flexible high availability architecture is by itself a strong solution, the latest release adds even more flexibility to the system with the ability to intermix multiple resiliency levels allowing users to align protection levels with the value of data—all within a single system.

NEC’s To-Do List

By incorporating WORM, NEC filled a major gap in the platform.  HYDRAstor is now an exceptionally compelling archive solution.  While HYDRAstor brings a lot to the table as an archive solution, it would be a much more compelling backup solution if tape emulation were incorporated.  HYDRAstor is a standards-based solution that supports NFS and CIFS, but most companies have backup software and processes designed to back up to tape.  While the backup can be changed to back up to an NFS- or CIFS-based disk target, and most major backup applications do support this approach, it requires modifying the backup to set disks as targets. This is a fairly easy process in a small environment, but in a large enterprise—where HYDRAstor is a compelling solution thanks to its ability to scale—there are often multiple versions and types of backup software deployed, so making the changes could be onerous.  If NEC adds tape emulation for some of the more popular drives on the market, it would take away that objection.

The second major to-do for NEC is to amp up the marketing volume.  NEC is not a name that routinely comes up in the data center outside of Japan, where it’s based.  While NEC is a $43 billion company, more than 75% of its revenue comes from Japan.  Outside of Japan, the HYDRAstor business unit is like a startup, though an extremely stable and well funded one.  It needs to take advantage of the deep pockets on the mother ship to invest in the sales organization (both channel and direct) and in marketing awareness required to build data center name recognition for both NEC and HYDRAstor brands.

The Bigger Truth

At its heart, HYDRAstor is a large, self managing storage pool that can be carved up to support many different application performance and protection profiles.  While it does act as a global storage pool, it also offers users the ability to create smaller sub-pools (in the form of file shares) that can be customized to meet user requirements while still benefitting from participating in the global storage pool for features like deduplication, essentially eliminating the deduplication stovepipes so common in other platforms.  Customization of shares allows users to set multiple protection levels and deduplication policies all within a single system, aligning storage services to application requirements.  The key point here is that it is a self-managing, self-provisioning system.  It takes all kinds of tasks that are manual in many systems today, like load balancing and provisioning, and automates them.  Management is done based on the file system or file share. The user can establish characteristics for a particular data stream (file-share-based), but does not need to carve out any particular amount of capacity: it is dynamically allocated via NEC’s DynamicStor, the operating system embedded in HYDRAstor.  Capacity management is where things like quotas come in if the user wants to cap the amount of capacity that any particular application or file system consumes.  Basically, the system manages the disks and allocation, the user manages the data and quotas.  The beauty of the architecture is that HYDRAstor represents a fundamental shift from requiring users to manage storage to having them manage information.

Reference information makes up roughly 80% of the data in the data center, and experiences little or no activity.  Yet much of it needs to remain accessible for business support or in the event of litigation.  It makes sense to migrate that data off to a secondary archive platform like HYDRAstor.  The net result is much shorter Tier 1 backup windows, as less data is being backed up, and a reduction in both CAPEX and OPEX.  CAPEX is reduced by deferring purchase of Tier 1 storage, OPEX through the benefits of deploying a scale-out storage system.  HYDRAstor offers even greater OPEX savings thanks to its compression and deduplication engines.

ESG has long been impressed by HYDRAstor—it appears to be a solid, feature-rich platform.  And adding WORM significantly strengthens its position in the archive space.  If NEC can execute on the sales and marketing front, it could certainly give some of the better known disk-based archive vendors, and NAS vendors, a run for their money.


[1] Source:  ESG Research Report: 2007 File Archiving Survey, December 2007.

[2] Data deduplication can be done in various ways and at various points of origin. It can be done inline during the backup process, or it can be done post-process, after the backup data has been ingested by the backup target. While each approach has pros and cons in terms of performance, capacity, cost, etc., ESG believes the benefits of data deduplication, in particular the potential disk cost savings, are significant enough to warrant the technology’s adoption in either case. That said, end-users should weigh the pros and cons of the available technologies to determine the right solution for their organizations.
