<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Enterprise Strategy Group &#187; Market Reports</title>
	<atom:link href="http://www.enterprisestrategygroup.com/category/content-types/reports/market-reports/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.enterprisestrategygroup.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Wed, 08 Feb 2012 01:33:54 +0000</lastBuildDate>
	<language></language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>The Relevance and Value of a “Storage Hypervisor:” Virtualized Management for More Than Just Servers</title>
		<link>http://www.enterprisestrategygroup.com/2011/10/the-relevance-and-value-of-a-%e2%80%9cstorage-hypervisor%e2%80%9d-virtualized-management-for-more-than-just-servers/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/10/the-relevance-and-value-of-a-%e2%80%9cstorage-hypervisor%e2%80%9d-virtualized-management-for-more-than-just-servers/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 12:54:51 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Server Virtualization]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[storage virtualization]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[storage hypervisor]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=25309</guid>
		<description><![CDATA[The evolution of IT is reaching at least one true pinnacle of efficiency with server virtualization: the transformation from the “one application, one server” paradigm to running multiple applications on different operating systems on a single physical machine. While new server strategies have transformed operations, storage implementations can often hinder progress. This is not wholly [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">The evolution of IT is reaching at least one true pinnacle of efficiency with server virtualization: the transformation from the “one application, one server” paradigm to running multiple applications on different operating systems on a single physical machine. While new server strategies have transformed operations, storage implementations can often hinder progress. This is not wholly surprising. Storage architectures, implementations, and management techniques originally came from the monolithic mainframe era and, while huge improvements have been made, the underlying concepts upon which the storage architecture was originally designed are cracking under the weight of progress. Storage needs to improve its résumé to match the job openings in server virtualization.</div>
<private_standard>
<h1>Overview</h1>
<p>The need for economic and operational efficiencies transcends all organizations. Commercial businesses, non-profits, government departments, and large and small enterprises are all striving to make the most, and more, of what they have. Economic conditions don’t matter either. In economically challenging times, tight budgets dictate efficiency in order to get tasks accomplished with limited funds. But in boom times, efficiency is equally important to maximizing value in terms of profit, time, or resources while executing on objectives. This is borne out by ESG research: ESG asked senior IT professionals in enterprise and midmarket organizations in North America and Western Europe to identify the most important considerations for justifying IT investments in 2009-2011. As Figure 1 shows, the top two priorities continue to be reducing operational expenditures and business process improvement.<a href="#_ftn1">[1]</a> Clearly, investments are made in efficiency-focused endeavors.</p>
<div class="graph_top">Figure 1. Most Important Considerations for Justifying 2011 IT Investments, Three-year Trend</div>
<p><img class="aligncenter size-full wp-image-25312" title="StorageHypervisorF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/09/StorageHypervisorF1.png" alt="" width="617" height="404" /></p>
<h2>Efficiency and Hypervisors</h2>
<p>The evolution of IT is reaching at least one true pinnacle of efficiency with server virtualization: the transformation from the “one application, one server” paradigm to running multiple applications on different operating systems on a single physical machine. This is possible because of a specialized software layer that separates functionality from specific hardware: the hypervisor. The term was coined to indicate a software layer that resides at a higher level than a simple “supervisor” that controls hardware. It is a layer of abstraction between the physical hardware and guest operating systems. It can do the heavy lifting that makes it possible to run UNIX on a Windows machine. In a purely physical server environment, the operating system is intimately aware of underlying hardware such as processors, drivers, BIOS, etc. In a virtualized server environment, hypervisors such as <a href="http://www.vmware.com/" target="_blank">VMware</a> vSphere, <a href="http://www.microsoft.com/" target="_blank">Microsoft</a> Hyper-V, <a href="http://www.citrix.com/" target="_blank">Citrix</a> XenServer, and <a href="http://www.ibm.com/" target="_blank">IBM</a> PowerVM get in between these physical realities and the virtual machines (VMs) to provide a consistent platform regardless of the VM operating system. Since this is so accepted for servers and provides proven value to users, mightn’t the same hold true for storage?</p>
<h3>The Storage Hypervisor</h3>
<p>A storage hypervisor provides a similar layer of abstraction between the physical storage resources and the applications using them. Here’s how Wikipedia defines it:</p>
<p><em>The storage hypervisor, a centrally managed supervisory software program, provides a comprehensive set of storage control and monitoring functions that operate as a transparent virtual layer across consolidated [storage hardware] pools to improve their availability, speed, and utilization.</em><a href="#_ftn2">[2]</a></p>
<p>This sounds so straightforward, almost innocuous, but it subsumes some very important attributes and is dramatically different from the way that most storage is managed today. For instance, a storage hypervisor is designed to be agnostic to, and to accommodate arrays from, different manufacturers that may be using different disk tiers (SSD, SAS, SATA) and storage network protocols (Fibre Channel, FCoE, iSCSI). Arrays that would be incompatible in a traditional physical environment are suddenly able to work together.</p>
<p>However, just because a definition exists doesn’t mean the complete reality exists (although a few vendors are beginning to name storage virtualization products this way, and there are green shoots of a storage hypervisor spring appearing). Some early storage virtualization products that deliver some of this functionality are array-based and don’t support multi-vendor storage; some are appliance-based and may support some other vendor’s storage. Some storage virtualization products are software-only, some are software built into an external controller, and a few are network appliances. These are definitely a start, and given the resounding success of hypervisors delivering server virtualization, doesn’t it make sense to take storage virtualization in the same direction?</p>
<h1>What Do Users Need?</h1>
<p>The challenges that IT managers face today are daunting and well documented, and as such will not be repeated in depth here. First, there is a constantly increasing need for additional resources to support growing data volumes based on both natural application growth and new workloads (think social media and big data). In many cases, organizations try to handle this by simply throwing more hardware at the problem, resulting in massively under-utilized assets. Next, operational processes have not caught up with technology innovations, so as IT service delivery becomes more agile (and cloud enters the fray, for instance), administrators struggle to manage with the same old, inflexible processes. And, of course, budget constraints are always an issue, even more so in recent years.</p>
<p>In addition, as virtual servers and cloud computing are improving provisioning and providing a higher level of services, users are beginning to expect “instant IT.” In the old days (let’s say 5 years ago, or <em>now</em> for those who have not taken the server virtualization plunge!), a user who wanted to launch a new application to support a business process expected that it would take a while (weeks or months) to get through the normal channels. But with fast, easy provisioning made possible by virtualization, IT can spin up a new VM in minutes.</p>
<h2>The New Normal</h2>
<p>Server virtualization is revolutionizing IT. It has enabled levels of efficiency (for which, read cost savings) never before dreamed of: reducing physical server needs by 50%, 60%, or even 80% and freeing up staff members from many management tasks. In the past, many business decisions were made based on what IT was practically able to do. Now, IT is more able to adjust to serve the business.</p>
<p>Of particular note is the fact that virtualization benefits actually <em>increase</em> as virtual deployments grow and mature. This was documented in ESG research which led to the creation of ESG’s Server Virtualization Maturity Model.<a href="#_ftn3">[3]</a> ESG was able to separate respondents into three groups based on the extent of their virtualization deployments: advanced (25% of those surveyed), progressing (53%), and basic (22%). The criteria used were:</p>
<ul>
<li>Scope of deployment, measured by the percent of servers virtualized</li>
<li>Virtual production ratio, measured by the percent of VMs running in the production environment</li>
<li>Efficiency, measured by the virtual-to-physical server consolidation ratio</li>
<li>Workload penetration, measured by the deployment of multiple virtual workloads, particularly mission critical ones</li>
</ul>
<p>ESG research indicates that while lower capital and operational costs and greater IT efficiency accompany all deployments, only the advanced implementations are actually becoming dynamic IT environments and vastly improving application provisioning, maintenance, availability, and backup/recovery processes. The benefits increase as virtualization experience expands. This reality will soon become the new normal in IT.</p>
<h2>Storage Hinders Progress</h2>
<p>Sounds good, does it not? But it doesn’t look quite so rosy from the storage end of things. While new server strategies have transformed operations, storage implementations can often hinder progress. This is not wholly surprising. Storage architectures, implementations, and management techniques originally came from the monolithic mainframe era and, while huge improvements have been made, the underlying concepts upon which the storage architecture was originally designed are cracking under the weight of progress. Scale-up silos of proprietary disk were designed to be physically managed and mapped to individual servers, but that is no longer how the processing side of IT works.</p>
<p>Server virtualization users know this already. As Figure 2 shows, when asked which storage developments would enable wider server virtualization usage, at least 25% of respondents mentioned each of the following aspects: faster storage provisioning, more scalable storage infrastructure to support rapid VM growth, and increased storage virtualization.</p>
<div class="graph_top">Figure 2. Storage Developments That Would Enable Wider Server Virtualization Usage</div>
<p><img class="aligncenter size-full wp-image-25311" title="StorageHypervisorF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/09/StorageHypervisorF2.png" alt="" width="612" height="318" /></p>
<h2>Server Virtualization Pressures Storage</h2>
<p>The virtualization of servers impacts storage and storage decisions, and not in a good way. First, equipment needs: servers with direct-attached storage can’t move VMs to other servers because the data is only available on that server. What good is the ability to move your virtual workloads if the data can’t come, too? Networked storage is required to make that happen, and consequently virtual implementations require investments in more and better storage. In addition, storage (and networks) must handle higher IO and throughput densities because now multiple applications are sharing and better utilizing a single server’s capabilities. And when it comes to backup, that problem gets worse. The upshot, which is not often mentioned, is that the cost of upgrading storage to handle server virtualization can negate a significant portion of the savings that virtualization enables. Oops!</p>
<p>Servers are basically cheap and interoperable today, and the hypervisor provides the functionality to make applications work on any server. Server virtualization is hardware agnostic and, except for some speeds and feeds, servers aren’t really differentiated by functionality. On the other hand, storage is expensive, complex, proprietary, and incompatible; vendors sell a lot of it based not just on capacity, but on features like snapshots, replication capability, storage tiering, etc. that are software-based but built into the array. As a result, buying and deploying servers is a pretty easy process, while buying and deploying storage is not. It’s a mismatch of virtual capabilities on the server side and primarily physical capabilities on the storage side. Storage can be a ball and chain keeping IT shops in the 20<sup>th</sup> century instead of accommodating the 21<sup>st</sup> century.</p>
<p>Storage needs to improve its résumé to match the job openings in server virtualization. Does it make sense to make these improvements within arrays? Or, having seen the impact that server hypervisor functionality has made, would a storage hypervisor make more sense? Clearly, it makes conceptual sense—both operational and financial—to complete the abstraction; to separate the storage functionality from the physical storage system by creating a storage hypervisor. Then, IT organizations could use whatever arrays they like, even supposedly incompatible ones. Put snapshots, tiering, load balancing, and the like in the hypervisor, and IT can easily move data between arrays, scale up or down to accommodate business needs, and deliver high service levels to users while minimizing cost and waste. It would make virtual storage as easy to provision and manage as virtual servers.</p>
<h1>A Storage Hypervisor Vision</h1>
<p>So, the concept of a storage hypervisor is generically good and appealing. But what should it actually look like, and what would the specific benefits be? It should start with the same kind of core components that server virtualization uses: a secure, scalable storage virtualization platform (like VMware vSphere and others in the server space) to consolidate workloads, and a management platform (VMware vCenter, etc.) to simplify and automate infrastructure tasks, provisioning, service delivery, compliance, data protection, etc. As with server virtualization, this would enable IT administrators without specific storage expertise to manage storage tasks. Below are some of the key capabilities that a storage hypervisor would provide:</p>
<ul>
<li><strong>A common management platform to aggregate capacity from any hardware platform. </strong>This would be regardless of vendor or disk type and could be shared out to different servers. This capability would let IT select storage based on price and service regardless of vendor without worrying about the choice’s impact on processes and functionality.</li>
<li><strong>Storage provisioning and management automation. </strong>This would relieve administrators of manually creating array groups, allocating LUNs, partitioning volumes, creating RAID sets, and performing other complicated tasks designed to provide the capacity required for each application at the agreed performance and availability levels. Today, provisioning is often little more than an educated guessing game that requires estimating how much capacity an application will need today, next week, and next year. Over-provisioning is a common strategy for ensuring that applications don’t run out of capacity, but it is extremely inefficient as wasted resources sit idle. In contrast, a storage hypervisor would use built-in intelligence to provision storage for applications and servers, automatically selecting a combination of disks to achieve performance, availability, and cost objectives. It would also rearrange volumes to maximize performance and efficiency.</li>
<li><strong>Transparent data mobility across storage tiers and arrays without shifting or losing functionality</strong>. Like a server hypervisor, a storage hypervisor must provide its own functionality, not count on what the underlying arrays can do, in order to be vendor-neutral. For example, a storage hypervisor would free administrators from having to map virtual volumes to physical volumes to retain vendor-specific snapshots. In addition, this would reduce the total cost of ownership by eliminating array-specific software licenses to provide features such as multi-pathing, thin provisioning, automated tiering, etc.</li>
<li><strong>The ability to pool storage resources from various arrays and vendors, and across data centers, to be accessed from anywhere.</strong> This would enable the creation of a single virtual data center made up of geographically distributed locations. If not yet quite a data center without walls, at least the walls can be over the horizon! Most storage virtualization solutions are restricted to the data center by physical array limitations. Server clustering allows distributed servers to depend on each other for high availability and transparent movement of workloads, but storage at the site is the boundary. If that boundary is eliminated and data centers 50 km apart can serve that function for each other, then multi-data-center failover is possible without expensive add-on mirroring and automation products. This makes disaster recovery, failover, and high availability much simpler to achieve. Now you can think of having a highly available, virtual storage cloud that is stretched across geographic boundaries.</li>
<li><strong>Similar capital and operational cost benefits as the server hypervisor</strong>. This would minimize the number of arrays required, allowing IT to leverage price competition, and simplify operations. IT would benefit from “array meritocracy.”</li>
</ul>
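<p>The automated provisioning described in the list above can be illustrated with a small sketch. It is not taken from any vendor’s product: the <code>Pool</code> and <code>Request</code> types, the pool names, and every capacity, performance, and price figure are hypothetical, and a real storage hypervisor would weigh many more factors (availability, RAID level, current load). The sketch only shows the core idea of choosing the cheapest pool, from any vendor, that meets an application’s stated objectives.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """One pool of capacity exposed by an array (all fields are illustrative)."""
    name: str
    vendor: str
    tier: str            # e.g. "SSD", "SAS", "SATA"
    free_gb: int
    iops_per_gb: float   # rough performance density of the tier
    cost_per_gb: float   # hypothetical price, in arbitrary currency units


@dataclass
class Request:
    """What an application owner asks for, expressed as objectives."""
    app: str
    size_gb: int
    min_iops: float


def place(request: Request, pools: List[Pool]) -> Optional[Pool]:
    """Pick the cheapest pool, from any vendor, that meets the objectives."""
    candidates = [
        p for p in pools
        if p.free_gb >= request.size_gb
        and p.iops_per_gb * request.size_gb >= request.min_iops
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: p.cost_per_gb)
    best.free_gb -= request.size_gb  # reserve the capacity in the chosen pool
    return best


# Three hypothetical pools from three different vendors.
pools = [
    Pool("array-a-ssd", "Vendor A", "SSD", 2_000, 50.0, 10.0),
    Pool("array-b-sas", "Vendor B", "SAS", 20_000, 2.0, 2.0),
    Pool("array-c-sata", "Vendor C", "SATA", 100_000, 0.5, 0.5),
]

oltp = place(Request("order-entry", 500, 10_000), pools)
archive = place(Request("mail-archive", 5_000, 1_000), pools)
print(oltp.name, archive.name)
```

<p>In this sketch the performance-sensitive request lands on the SSD pool and the large, undemanding one lands on the cheapest SATA pool, without the requester naming a vendor or an array.</p>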
<h2>What to Look For</h2>
<p>Some key capabilities are required in any viable storage hypervisor. IT departments depend on storage for more than the high level functions described above; the actual methods of achieving these objectives currently lie in array-based software features, so they would need to be included within the hypervisor. These include:</p>
<ul>
<li>Application integration with snapshots for backup and cloning, with options for when to snap, how long to retain snapshots, and when to back up offsite.</li>
<li>Integrated snapshot recovery management.</li>
<li>Automated storage tiering to balance performance and capacity requirements.</li>
<li>Thin provisioning for greater efficiency.</li>
<li>Deep integration with server hypervisor capabilities. This would include virtual array integration and data protection (such as integration with tools like VMware vStorage APIs for array integration) and management platform plug-ins. This would let the server hypervisor speed operations while consuming less processing power, memory, and storage network bandwidth.</li>
<li>Synchronous or asynchronous mirroring across sites, using virtualization to allow primary site tier-1 storage from vendor A to, for instance, mirror at a secondary site to tier-2 storage from vendor B.</li>
<li>Integration with site switching automation (examples are VMware Site Recovery Manager and IBM Tivoli System Automation) for complete server, storage, and network failover. Taken to its logical conclusion, this enables virtually any location, regardless of distance or infrastructure type, to provide high availability and failover services in case of disaster.</li>
<li>Intuitive management features such as providing visibility across the entire SAN topology with drill-down on individual components and integrated virtual volume performance analysis to speed problem resolution.</li>
</ul>
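<p>Of the capabilities listed above, thin provisioning is the most direct answer to the over-provisioning waste described earlier, and it is simple enough to sketch. The classes below are hypothetical and count capacity in abstract “extents”; they are not modeled on any particular array or product. The point is only that a volume advertises its full size up front but consumes shared physical capacity on first write.</p>

```python
class Pool:
    """Shared physical capacity, counted in fixed-size extents (sizes are illustrative)."""

    def __init__(self, total_extents: int):
        self.total_extents = total_extents
        self.used_extents = 0

    def allocate(self) -> None:
        if self.used_extents >= self.total_extents:
            raise RuntimeError("pool exhausted: add capacity or reclaim space")
        self.used_extents += 1


class ThinVolume:
    """Advertises a virtual size but draws extents from the pool only on first write."""

    def __init__(self, virtual_extents: int, pool: Pool):
        self.virtual_extents = virtual_extents
        self.pool = pool
        self.mapped = set()  # virtual extent numbers that have physical backing

    def write(self, extent: int) -> None:
        if extent not in range(self.virtual_extents):
            raise IndexError("write beyond the advertised volume size")
        if extent not in self.mapped:
            self.pool.allocate()  # capacity is consumed here, not at volume creation
            self.mapped.add(extent)


pool = Pool(total_extents=100)
# Two volumes that together advertise 160 extents against 100 physical ones.
vol_a = ThinVolume(virtual_extents=80, pool=pool)
vol_b = ThinVolume(virtual_extents=80, pool=pool)

for e in range(10):
    vol_a.write(e)
vol_b.write(0)
vol_b.write(0)  # rewriting an already mapped extent consumes nothing more

print(pool.used_extents, "of", pool.total_extents, "extents in use")
```

<p>The trade-off is visible in <code>Pool.allocate</code>: because the volumes are oversubscribed, the pool can run out before any single volume is full, which is why thin provisioning is normally paired with capacity monitoring.</p>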
<p>A few vendors have already begun thinking in these terms; there are the industry giants (IBM with its SAN Volume Controller, <a href="http://www.emc.com/" target="_blank">EMC</a> with VPLEX, and <a href="http://www.hds.com/" target="_blank">HDS</a> with its Universal Storage Platform-V) as well as some smaller software-only players such as <a href="http://www.datacore.com/" target="_blank">DataCore</a>. As the capabilities of these various approaches not only expand but also become better known and understood, the end-user opportunity for improving efficiency and simplifying operations is simply monumental.</p>
<h1>The Bigger Truth</h1>
<p>The concept of a storage hypervisor is not just semantics. It is not just another way to market something that already exists or to ride the wave of a currently trendy IT term. A storage hypervisor has substantial, actual operational and business value. While some storage virtualization capabilities are available currently, they typically have limitations. Organizations have now experienced a good taste of the benefits of server virtualization with its hypervisor-based architecture and, in many cases, the results have been truly impressive: dramatic savings in both CAPEX and OPEX, vastly improved flexibility and mobility, faster provisioning of resources and ultimately of services delivered to the business, and advances in data protection. The storage hypervisor is a natural next step and it can provide a similar leap forward.</p>
<p>The combination of benefits from server <em>and</em> storage hypervisors can provide the sort of truly efficient infrastructure utility that has been promised for more than a decade. Both capabilities are needed to achieve flexible IT pools. As long as IT organizations have to manually intervene to ensure accurate, optimized, and smooth operations, the “promised land” has definitely not been reached. Task automation is the foundation for any infrastructure that is truly in service to the business. Storage virtualization itself is a start, and has become endemic because of the inherent value and functionality it provides, but it is ultimately only virtualizing a component and not a full system. By comparison, a competent storage hypervisor is the route to a fully flexible storage infrastructure that can help create a cross-site “storage cloud” with pooled resources accessible from anywhere, while providing a new approach to high availability and workload flexibility. It can extend freedom of choice regarding storage and enable easy re-purposing of arrays for investment protection. This is one of those areas where technological possibility has now caught up with logical business desire. Storage hypervisors are logical, vital, and just plain sensible. Naturally, their impact will be earliest and greatest in large, complex IT environments, but, as with server hypervisors, the benefits will cascade broadly across the industry over time.</p>
<p>It’s somewhat ironic that the first virtual operating systems (SVS and MVS for IBM’s System/370 in 1972) were for mainframes; four decades later, it’s possible to argue that server virtualization is creating “software mainframes.” As we put IT back together again, it is crucial that we consolidate the management of storage just as much as anything else.</p>
<hr size="1" />
<p><a name="_ftn1">[1]</a> Source: ESG Research Report, <em><a href="../../../../../2011/01/2011-it-spending-intentions-survey/" target="_blank">2011 IT Spending Intentions Survey</a></em>, January 2011.</p>
<p><a name="_ftn2">[2]</a> <a href="http://en.wikipedia.org/wiki/Storage_hypervisor" target="_blank">http://en.wikipedia.org/wiki/Storage_hypervisor</a></p>
<p><a name="_ftn3">[3]</a> Source: ESG Research Report, <em><a href="../../../../../2010/11/the-evolution-of-server-virtualization/" target="_blank">The Evolution of Server Virtualization</a></em>, November 2010.</p>
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/10/the-relevance-and-value-of-a-%e2%80%9cstorage-hypervisor%e2%80%9d-virtualized-management-for-more-than-just-servers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Classifying Database Data</title>
		<link>http://www.enterprisestrategygroup.com/2011/09/classifying-database-data/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/09/classifying-database-data/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 16:44:34 +0000</pubDate>
		<dc:creator>Julie Lockner</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Julie Lockner]]></category>
		<category><![CDATA[Market Reports]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=25252</guid>
		<description><![CDATA[Organizations are also looking to gain control of the sheer volume of data flooding their data centers. Facing stringent compliance regulations, organizations in many industries are retaining massive amounts of data, and are becoming acutely aware of the problems they will encounter as they attempt to manage such vast quantities of information. Until an organization [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">Organizations are also looking to gain control of the sheer volume of data flooding their data centers. Facing stringent compliance regulations, organizations in many industries are retaining massive amounts of data, and are becoming acutely aware of the problems they will encounter as they attempt to manage such vast quantities of information. Until an organization knows exactly what kind of data it has stored in its database and where the data resides, it cannot manage the data or apply cost-cutting technologies such as virtualization and tiered storage solutions to it. Data classification can help.</div>
<private_standard>
<h1>Introduction</h1>
<p>For years, an organization’s storage management strategy consisted of nothing more than using data’s <em>age</em> as its primary means of classification. Older data was summarily deleted or transferred to alternative methods of storage (e.g., tapes), regardless of its business value. As a result, large stores of corporate knowledge were misplaced or lost. However, facing both internal and external pressures to increase data security, and retain data for longer periods of time to meet compliance requirements, more and more organizations are rethinking their existing data classification strategies.</p>
<p>One of the key drivers in this evolution of data classification is the desire to spend IT dollars wisely, optimize storage management, and minimize storage costs wherever possible. Storing all data at the same level or tier provides little value as unused/underused data consumes valuable storage space. Identifying the various “types” of data stored sets an organization up to store that data in the most cost-effective way possible.</p>
<p>Organizations are also looking to gain control of the sheer volume of data flooding their data centers. Facing stringent compliance regulations, organizations in many industries are retaining massive amounts of data, and are becoming acutely aware of the problems they will encounter as they attempt to manage such vast quantities of information. In response to a recent ESG survey, respondents stated that managing data growth and more efficiently managing the data that already exists in their databases were among their top challenges, pointing to the immense size of the databases they now manage.<a href="#_ftn1">[1]</a></p>
<div class="graph_top">Figure 1. Total Database Data</div>
<p><img class="aligncenter size-full wp-image-25253" title="DataClassificationF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/09/DataClassificationF1.png" alt="" width="605" height="351" /></p>
<p>Unfortunately, while the amount of data continues to multiply by astounding amounts, few organizations have sound data classification processes in place to help them fully identify and optimize this data. Such growth is a double-edged sword; while the additional data may, in some cases, provide enormous opportunity for activities such as data mining, until an organization knows exactly what <em>kind</em> of data it has stored in its database and where the data resides, it cannot manage the data or apply cost-cutting technologies such as virtualization and tiered storage solutions to it. Therefore, the first step in any data optimization process is classification.</p>
<h1>Why Classify Corporate Data?</h1>
<p>First and foremost, a data classification initiative enables an organization to understand its data. This, in turn, enables activities such as identifying and deleting unnecessary data, and implementing the most cost-effective storage hierarchy for data that is of value to the organization. It also ensures that important data remains available to its users.</p>
<p>For most organizations, the main factor driving a data classification initiative is the need to optimize data storage to its fullest. Companies don’t want to spend their IT budgets on storing data that no longer serves any purpose, and they are well aware that it is cost-prohibitive to store all data under the auspices of a “tier-1 only” infrastructure. Higher performing tier-1 storage systems cost more per gigabyte. Companies want to be sure that these high performers are leveraged optimally. Organizations also want to ensure that their valuable intellectual assets are well-protected, easily accessible, and stored in accordance with all compliance requirements.</p>
<h2>Initiating a Data Classification Program</h2>
<p>There are a number of ways to initiate a data classification program. Classifying the entire database is one approach; however, as organizations consolidate multiple business processes into the same database application, they may discover that the same database stores sensitive information in one table and public data in another. Classifying at the database level could then result in unnecessary, inappropriate, or excessive action being taken against data in the future. In such situations, organizations must resort to other means for classifying their data.</p>
<p>For those seeking to manage data growth, identifying a classification schema that collects retention and end-user access requirements will help IT deploy an ILM technical strategy and architecture.<a href="#_ftn2">[2]</a> For organizations interested in improving information security compliance, a data classification schema that maps out sensitivity to data fields in a database will help focus their database security solutions and initiatives on the highest risk elements.<a href="#_ftn3">[3]</a></p>
<p>Regardless of the goal, the first step in the data classification process should always be to position the task as a companywide effort, with participation from every unit (e.g., business owners, IT, audit, DBAs, developers, legal, and records management) and a common understanding of the various classification levels to be implemented.</p>
<p>When it comes to the actual classification of data, it is also vital that those classifying it can identify all uses for, and users of, that data, and can pinpoint any possible dependencies the data may have with other stored information. Those involved with data classification must also be aware of business policies, and be able to classify the data under the agreed-upon schema.</p>
<p>One byproduct of the data classification process should be a metadata repository that lists the databases, the applications that each database supports, the tables and business objects, and the retention requirements and information security policies that apply to them. This metadata repository can serve as the baseline set of information for the implementation of a data governance program.<a href="#_ftn4">[4]</a></p>
<h1>Key Steps in the Data Classification Process</h1>
<p>Ideally, data classification should be done proactively during application deployment or development, and include a process that enables end-users to easily classify new data as they create it. If data classification occurs reactively, the process should target costly or at-risk applications and databases first.</p>
<p>Regardless of the approach taken, a few key steps must be followed:</p>
<ul>
<li>For each database, list the tables, sorted by schema and by size (including all indexes). The resulting list should highlight where the bulk of the data resides. For retroactive initiatives, a good approach is to initially focus only on tables that are 5 GB in size or larger, with the exception of tables that are known to contain regulated data.</li>
<li>Next, assign a business object and information owner to the various components of the filtered list. For example, the general ledger might be assigned to Finance, while sales orders might be owned by Sales. Information owners can then be assigned the task of identifying the various types of data, who (if anyone) will be using it, and any specific SLAs or other requirements that pertain to it. This will enable a valid classification to be assigned. Information owners should also take into account that, for data they must keep, end-user access requirements may change. Data may be heavily accessed during peak times only. Such changes in usage patterns allow IT teams to take advantage of tiered storage options using mixed price/performance storage media.</li>
<li>If necessary, refer to documentation from developers or the source application’s ISV to gather a complete list of the tables that comprise the business object. For packaged applications, such as Oracle E-Business Suite, PeopleSoft, or Siebel, leveraging a tool can significantly speed up the business object-to-database table classification.</li>
<li>After all tables are grouped by business object, each business object should be checked against the organization’s record retention schedule (RRS) to determine how long it needs to be retained. In some cases, large volumes of data may reside in a table that does not map to any business object in the RRS. This data could be working data or interface data, transient in nature, or log data. In this case, ask the respective business teams whether the data is used for any valid purpose. If so, define operational retention periods for this information as well.</li>
<li>Once the bulk of the data has been classified and assigned both a legal and operational retention period, an appropriate storage solution can be designed according to the classifications now assigned to the data.</li>
</ul>
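<p>The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the <code>Table</code> record, the mapping dictionaries, and the function names are hypothetical, and in practice the table sizes would come from the database catalog and the retention periods from the organization’s own RRS.</p>

```python
from dataclasses import dataclass

SIZE_THRESHOLD_GB = 5.0  # starting threshold suggested for retroactive initiatives


@dataclass
class Table:
    """Hypothetical inventory record for one database table."""
    schema: str
    name: str
    size_gb: float          # table plus all of its indexes
    regulated: bool = False  # known to contain regulated data


def shortlist(tables):
    """Sort by schema, then largest first; keep large or regulated tables."""
    ordered = sorted(tables, key=lambda t: (t.schema, -t.size_gb))
    return [t for t in ordered if t.regulated or t.size_gb >= SIZE_THRESHOLD_GB]


def classify(tables, object_map, owners, retention_years):
    """Attach business object, owner, and retention period to each table.

    object_map:      (schema, table name) -> business object
    owners:          business object -> information owner
    retention_years: business object -> retention period from the RRS
    Tables that map to no business object, or to one missing from the
    RRS, are flagged so the business teams can be asked about them.
    """
    rows = []
    for t in shortlist(tables):
        obj = object_map.get((t.schema, t.name))
        rows.append({
            "table": f"{t.schema}.{t.name}",
            "size_gb": t.size_gb,
            "business_object": obj,
            "owner": owners.get(obj),
            "retention_years": retention_years.get(obj),
            "needs_review": obj is None or obj not in retention_years,
        })
    return rows
```

<p>Calling <code>classify</code> with an inventory of tables returns one row per shortlisted table; rows marked <code>needs_review</code> correspond to the working, interface, or log data that does not map to any business object in the RRS.</p>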
<h1>The Bigger Truth</h1>
<p>Organizations are continuously seeking new ways to spend their storage management dollars wisely. Data classification, which enables them to “layer” data in cost-appropriate storage and service “tiers,” is an ideal way to both cut costs and regain control of the massive amounts of data under management. A carefully planned, structured approach to data classification is vital to getting the most out of such an initiative.</p>
<p>With the right team comprised of representatives from all disciplines, a comprehensive viewpoint can be incorporated into the data classification taxonomy. This is needed to effectively set up categories with regard to value and relevance aligned with the business while getting a sense of the quantity of data in each category. With this base information, enterprise architects can work more effectively with capacity planners in defining and scoping the technology needed while more accurately forecasting and budgeting technology spend by data category.</p>
<p>A corporate data governance framework can provide the communication guidelines, tools, processes, and organizational structure to facilitate proper management of data in accordance with company policy to result in more optimized investments in information technology. Data classification is the foundational component that ensures data governance can be implemented and executed for enduring success.</p>
<hr size="1" /><a name="_ftn1">[1]</a> Source: ESG Research Report, <em>2011</em> <em>Data Management Survey, </em>to be published September 2011.</p>
<p><a name="_ftn2">[2]</a> See: ESG Market Landscape Report, <a title="Permanent Link to Managing Database Growth – Optimizing Database Application Lifecycles" href="../../../../../2011/04/market-landscape-managing-database-growth-optimizing-database-application-lifecycles/" target="_blank"><em>Managing Database Growth – Optimizing Database Application Lifecycles</em></a>, April 2011.</p>
<p><a name="_ftn3">[3]</a> See: ESG Market Landscape Report,<em> </em><a href="../../../../../2011/09/database-security-keeping-data-safe/" target="_blank"><em>Database Security: Keeping Data Safe</em></a><em>,</em> September 2011.</p>
<p><a name="_ftn4">[4]</a> See: ESG Market Report, <a href="../../../../../2011/06/structured-data-reference-model-managing-databases-in-the-efficient-data-center/" target="_blank">Structured Data Reference Model: Managing Databases in the Efficient Data Center</a>, June 2011.</p>
<p>See also: ESG Brief, <em>Oracle ASM: A Frontrunner in Storage Management Solutions</em>, September 2011.
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/09/classifying-database-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Rise and Logic of the All-flash Storage Array: Solid State Takes Center Stage</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/the-rise-and-logic-of-the-all-flash-storage-array-solid-state-takes-center-stage/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/the-rise-and-logic-of-the-all-flash-storage-array-solid-state-takes-center-stage/#comments</comments>
		<pubDate>Tue, 23 Aug 2011 13:05:30 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[Block Based Disk Storage Systems]]></category>
		<category><![CDATA[HDDs, SSDs, and Other Storage System Components]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[Solid State]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=24301</guid>
		<description><![CDATA[There’s an old saying about having your cake and eating it too. It’s something to which we all aspire or gravitate: to avoid compromise. If you can get more goodness rather than less, why wouldn’t you? That’s the essence of the proposition of the all-flash array, a storage system based entirely on solid state storage. [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">There’s an old saying about having your cake and eating it too. It’s something to which we all aspire or gravitate: to avoid compromise. If you can get more goodness rather than less, why wouldn’t you? That’s the essence of the proposition of the all-flash array, a storage system based entirely on solid state storage.</div>
<private_standard>
<h1>Introduction: Putting the Storage World into a <em>Non</em>-spin</h1>
<h2>The Underlying Idea</h2>
<p>There’s an old saying about having your cake and eating it too. It’s something to which we all aspire or gravitate: to avoid compromise. If you can get more goodness rather than less, why wouldn’t you? That’s the essence of the proposition of the all-flash array, a storage system based entirely on solid state storage. Of course, many IT buyers might immediately say, “Sure, I’d love to have solid state for all my storage, but it’d be way too expensive.” That is understandable, but also a position that is about to be turned upside-down. It’s also the inspiration behind this paper: the storage world is on the brink of a large and significant change. It will not be a complete overnight change, but the reality of all-flash storage is here. These systems can bestow all the well-known value of solid state storage <em>without</em> added complexity, lack of functionality, or increased expenditure. A natural human tendency toward skepticism is to be expected, but <em>if </em>all the above were possible—that is, fully-solid-state storage arrays that are easy to use, fully featured, and do <em>not</em> carry a price premium over current performance spinning disk arrays—why wouldn’t you at least consider it? That is the magnitude of what is happening: IT managers are about to be able to have their storage cake and eat it too!</p>
<h2>Some Perspective</h2>
<p>Spinning disk technology has been with us since IBM introduced the RAMAC in 1956 and it is essentially unchanged today; sure, the platter sizes, rotational speeds, and materials, among other things, have changed, but the basic technology has not been altered for over five decades. Consequently, as the IT world has developed, the storage industry <em>and </em>users have worked hard to make hard disk drives (HDDs) manageable and useful. These efforts have essentially been attempts to mask the two key pains associated with HDD-based systems: their inability to deliver sufficient raw performance, and their inability to do more (in terms of working with faster CPUs and integrating with virtualized systems) as the IT world progresses. The impacts of these pains are seen in a range of operational issues that motivate change:</p>
<ul>
<li><strong>“Ponzi” Storage: </strong>HDDs have many inherent restrictions and limitations; industry attempts to ameliorate these have created what might be called “Ponzi” storage where complexity is layered upon complexity in order to try to squeeze more blood out of the [HDD] stone. But this requires many unnatural acts as well.</li>
<li><strong>Unnatural Acts: </strong>HDDs are mechanical devices; each head and spindle can only do a set amount of work. Unfortunately, this means that things like short-stroking (only using a small percentage of a disk’s capacity in order to gain the IO that an extra spindle can serve) are necessary. Poor resource utilization is, of course, just another way of saying wasted money. Let’s say, for instance, that raw disk costs $10/GB; if you only use 25% of the available space in order to optimize performance, then the cost per GB is actually $40! And this is before the additional costs associated with wasted space, power, and cooling.</li>
<li><strong>Increasing Pains: </strong>From both a capacity and a performance viewpoint, the pain of HDDs is only increasing. Hence, the imperative for change is growing, too:
<ul>
<li>The rate at which storage capacity demand is increasing has accelerated over the last decade, and, contrary to the first few decades of computing, it is rising faster than the price-per-capacity is declining, which is not sustainable. Bigger disks help capacity, but not performance.</li>
<li>Random IO needs are actually increasing, driven by such things as multi-core processors, virtualization, and cloud implementations. This, in turn, only increases disk contention issues and therefore exacerbates the need for more over-provisioning and short-stroking. All of this leads to more money wasted on disks, power and space. Using less of a disk’s capacity helps performance but not economic value.</li>
<li>Putting these things together creates enormous strains in storage infrastructures: “access-ability” has not come close to keeping pace with the increase in HDD capacities. It’s been a bit like adding seats in a theater or bus without adding extra doors: it will inevitably lead to a slower audience or passenger arrival and departure process! Indeed, queuing dynamics will ensure that the impact is worse than linear as the crush (whether audience, passenger, or data) produces deteriorating operational impacts. The immutable laws of physics mean that contention (even perhaps a scuffle that only introduces extra delays!) is inevitable.</li>
</ul>
</li>
</ul>
<table style="height: 130px;" border="1" cellspacing="3" cellpadding="5" width="315" bgcolor="#fff5de">
<tbody>
<tr>
<td width="295" valign="top">
<h2>Adding Capacity without Performance</h2>
<p>It’s been a bit like adding seats in a theater or bus without adding extra doors: it will inevitably lead to a slower audience or passenger arrival and departure process!</p></td>
</tr>
</tbody>
</table>
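<p>The “worse than linear” effect of queuing can be illustrated with the textbook single-server (M/M/1) queue, in which mean response time is the service time divided by one minus the utilization. The sketch below is an idealization chosen for illustration, not a model of any particular array, and the 5 ms service time is an assumed figure.</p>

```python
SERVICE_MS = 5.0  # assumed average service time of one disk IO (illustrative)


def mm1_response_ms(service_ms, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not (1.0 > utilization >= 0.0):
        raise ValueError("utilization must be at least 0 and below 1")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.25, 0.50, 0.75, 0.90):
        r = mm1_response_ms(SERVICE_MS, rho)
        print(f"utilization {rho:.0%}: mean response {r:.1f} ms")
```

<p>Under these assumptions, moving from 25% to 50% utilization raises the mean response time from roughly 6.7 ms to 10 ms, while moving from 50% to 90% raises it fivefold, to about 50 ms: the same door has to serve an ever larger crowd.</p>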
<p>Against this backdrop, the industry has developed a number of “band-aid” innovations that range from thin provisioning and deduplication (aimed at the capacity utilization issue) to automated tiering and solid-state caching (to alleviate performance woes). These are steps in the right direction, but they do not represent the quantum leap that all-flash storage arrays do.</p>
<p>Flash storage is the current iteration of a technology, solid state, that has been around for decades, although its earlier limited scale and extremely high cost restricted it to supporting and niche roles. That looks to be about to change. Ironically, although solid state has been around in some form for decades, many users perceive it as new, and vendors spent the first year or two of this new era of solid state deployment (kicked off by its integration into enterprise storage systems in 2007) having to defend some of its attributes, such as the need to manage endurance and its differing read and write capabilities. Yet it’s worth remembering that a range of things we do with HDDs today (just think of RAID, for instance) are only there to ameliorate and cover up the inadequacies of spinning disks. All that said, most IT managers are well-disposed to flash (people essentially “get it”) but simply think it’s too expensive and therefore out of reach. It’s been put in the Ferrari bracket: something many of us would love to have, but we know that very high performance comes with a very high price and severe operational restrictions (like only two seats and no trunk space). Many vendors are now focused on bringing the Ferrari of solid state into the mainstream storage world, making it affordable with room for the family and shopping and still offering high performance.</p>
<p>This brings us back to the earlier question: what if this really could be? What if the obvious goodness of solid state (especially performance and utilization) came without all the downsides, especially cost? What if flash really could be a complete storage system rather than an HDD turbo-boost? To many, this may seem simultaneously preposterous and astoundingly appealing. As we investigate the reality and the IT values of this new approach, we suggest checking any preconceptions and prejudices at the door. As Oscar Wilde said “An idea that is not a little dangerous is unworthy of being called an idea at all.” Perhaps a storage system without anything that spins is an idea whose time has arrived? We have already seen signs of all-flash storage, but implementations have been mainly limited to relatively small capacity PCIe cards or flash appliances. The big and dramatic change this paper examines is the arrival of the all-flash enterprise storage array.</p>
<p>One final point should be made before getting to the details: this paper is not at all suggesting that we’re even close to, or even realistically contemplating, the day when the Smithsonian is the only place to see spinning disks. However, it is both feasible and logical to believe that the purposes for which spinning disks are used are going to change, and that we’re rapidly going to move to a situation where leading edge organizations are increasingly likely to serve most of their active IO from solid state storage (perhaps solid state tiers and caches today, but moving to all-flash storage arrays tomorrow). We are already accustomed to the differences between performance HDDs and capacity HDDs, and the former seem to have a terminal disease even if the actual expiration date is not precisely known yet. It is in that space that all-flash arrays make sense, whereas capacity disks look to have many years left to live.</p>
<h1>How and Why All-flash Storage Arrays Make Sense</h1>
<h2>Prior Barriers</h2>
<p>Far and away the most significant barrier to all-flash storage arrays being deployed before now has been the <strong><span style="text-decoration: underline;">financial cost</span></strong>, both real and perceived. Because of its combination of high storage value (in terms of positive performance impact) and high price, flash has always been seen to be about fixing particular issues and/or addressing point applications. More recently, the technology has been employed more horizontally for its generic infrastructure value as a tiering and caching tool. And yet, logically, if the industry could go further, with the caveat that such an implementation would only ever gain market traction IF it can be made affordable (or, at least if the delta is small, worth affording for other reasons), that would be preferable. Other areas of early concern for flash vendors have been <strong><span style="text-decoration: underline;">issues with reliability and longevity</span></strong> or endurance. Some of the concerns were valid, but there was also misunderstanding and FUD around them because these are issues that can be managed around (via wear leveling, over-provisioning, cell management, etc.). And, as mentioned already, spinning disks have plenty of pains and drawbacks themselves, not least of which is that HDDs have actually deteriorated in terms of effective performance (IO per GB) as disk capacities have grown (as illustrated in the theater/bus analogies).</p>
<p>Another significant restriction on the adoption of all-flash storage arrays has been a lack of all the necessary <strong><span style="text-decoration: underline;">enterprise storage functions </span></strong>that mid- and higher-end IT organizations look for in their storage subsystems, which would typically include:</p>
<ul>
<li>High Availability/Resilience: via clustered controllers and redundant components with automated failover</li>
<li>Virtualized storage software functions: such things as thin provisioning, snapshots, etc.</li>
<li>Scalability: preferably this should be non-disruptive, and capable of ranging from tens to hundreds of terabytes</li>
<li>Integration: standard storage protocols like FC, iSCSI, NFS, etc.</li>
</ul>
<p>Generally, these barriers have been, or are being, overcome. Although prospective users of all-flash storage arrays will no doubt demand proof points (both in an operational and business sense), the work that ESG has seen from vendors in this emerging space shows that all-flash storage arrays are [going to be] coming to the market fully loaded with the necessary functions and <em>can </em>also make financial sense when compared to high performance HDD-based systems.</p>
<h2>Earlier and Alternative Approaches</h2>
<p>The prima facie attractions of all-flash storage arrays have not precluded other less extensive implementations of solid state. From the first niche-application-focused solid state drives back in the 1980s and onwards, there has been some form of <strong><span style="text-decoration: underline;">solid state appliance</span></strong> on the market invariably aimed at addressing very specific application-performance challenges.</p>
<p>A broader implementation model that has been gaining popularity lately, and that can provide real benefits, is to use solid state as a <strong><span style="text-decoration: underline;">cache and/or tiering tool</span></strong>. The Achilles’ heel of such implementations is the difficulty of predicting what data should be moved where and when, which is why most HSM-type implementations like this have either not worked well or not delivered optimal results. While there are improved software tools available, the approach is still clearly a compromise because it is inherently an odds game: users (or their software!) are trying to “guesstimate” the optimum placement of all data at all times. Since some sub-optimal decisions are bound to be made (you can’t always know before it’s too late), applications often have to be designed on the assumption that the guesstimate might be wrong and that they will at times get disk response/latency rather than flash response/latency; because you don’t know which sort of response you’ll get, you need to design for disk. To use another analogy, think of betting at a craps table: you can guarantee winning if you cover every bet, but then you’ve ruined the payback and original purpose. While performance can be improved, the implementation ratios advocated by vendors (5% flash and 95% disk is typical) can never achieve performance as good as an all-flash solution. No matter how generously you calculate the cache hit ratio, it’s likely that 25-50% of IO will be coming from disk, and that means that your worst-case latency will be the same as that of a disk-based system, slowing the application.<a href="#_ftn1">[1]</a></p>
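<p>The cache arithmetic can be made concrete with a short sketch. The latency figures below (0.2 ms for flash, 5 ms for disk) are assumptions chosen purely for illustration, and the function name is hypothetical; only the 25-50% miss range comes from the discussion above.</p>

```python
FLASH_MS = 0.2  # assumed flash read latency (illustrative only)
DISK_MS = 5.0   # assumed disk read latency (illustrative only)


def hybrid_latency_ms(hit_ratio, flash_ms=FLASH_MS, disk_ms=DISK_MS):
    """Return (mean, worst-case) read latency for a flash cache in front of disk."""
    if not (1.0 >= hit_ratio >= 0.0):
        raise ValueError("hit_ratio must be between 0 and 1")
    mean = hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms
    # Any miss at all means some IOs still wait for the disk.
    worst = disk_ms if 1.0 > hit_ratio else flash_ms
    return mean, worst


if __name__ == "__main__":
    for miss in (0.25, 0.50):
        mean, worst = hybrid_latency_ms(1.0 - miss)
        print(f"{miss:.0%} of IO from disk: mean {mean:.2f} ms, worst case {worst:.1f} ms")
```

<p>With these assumed figures, a 75% hit ratio gives a mean of about 1.4 ms, seven times the all-flash figure, and the worst case remains the full disk latency, which is what an application must be designed around.</p>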
<p>The latest alternative approach is the use of <strong><span style="text-decoration: underline;">PCIe-based solid state</span></strong> in servers. As with all solid state, this can definitely provide good benefits, although one way or another it is still usually a matter of cherry-picking the data or applications that are deemed worthy of the solid state boost. Far more restrictive is the fact that this approach is not physically shareable across servers, which not only severely restricts or precludes the delivery of high availability, but makes all sorts of standard enterprise operations, such as backup, very challenging. Clearly, this approach doesn’t help for write-centric workloads as all IOs have to go to disk. Many users have spent years trying to escape DAS and the need to manage multiple islands of storage; this approach can undo such progress.</p>
<p>This all brings us back to the recurring question: if an all-flash approach can do everything that these other alternatives can <em>and</em> do it economically (which means at or below the cost of a tiered methodology), then why would users want to bother with the other, inherently more restrictive, approaches? Tiering is popular right now, and it has relevance and value, <em>but </em>it is clearly (and by definition) also a halfway house. It acknowledges that flash is better while restricting its usage because of the price. If you could just jump to the logical end-game without having to juggle and balance the cost-benefit-analysis, why wouldn’t you?<a href="#_ftn2">[2]</a></p>
<h2>Technological Enablers</h2>
<p>Aside from the price aspect itself, it is also apparent that a range of technology changes and improvements must have occurred or been optimized in order for all-flash storage arrays to be a realistic proposition. Some of the areas that make this new approach possible, and which different all-flash vendors [will] have to some degree are:</p>
<ul>
<li> Flash media improvements
<ul>
<li>Sheer data handling and processing speed of all aspects of the flash media and controller.</li>
<li>Higher flash media capacities and lower prices (the combination permits significant scalability).</li>
</ul>
</li>
<li>Flash management improvements
<ul>
<li>Enhanced ability to manage longevity, endurance, and reliability.</li>
<li>On flash, the location of data does not matter. With HDDs, where you put data matters a lot.</li>
<li>Application usability is crucial. In other words, users can’t be expected to tweak their existing applications or to need an army of “white coats” to make everything work.</li>
</ul>
</li>
<li>Flash system attributes
<ul>
<li>Full storage virtualization for an all-flash storage array is, of course, a prerequisite since it is the key to providing many crucial and valuable storage services.</li>
<li>Software advances are important, both for optimizing the amount of data that is actually stored (minimizing it via compression and deduplication for example) as well as for ensuring ease of adoption and use (maximizing this via integrated management tools).
<ul>
<li>Compression, deduplication, and advanced thin provisioning are crucial elements for all-flash arrays to hit the price point that enables them to overcome the economic adoption hurdle. Even though thin provisioning may seem old hat, it is often underutilized in traditional HDD systems (whether by the manufacturer or user) because of understandable concerns over performance: more data on a drive is great, but as you increase capacity utilization, you strain performance and may need to either restrict its use and/or reserve spindles for IOPS (again, this can be best illustrated by the theater analogy).</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2>Pricing Possibilities</h2>
<p>The price/value for an all-flash array still has to make sense. However, it is important to learn to analyze the prices as more than just cost per gigabyte or at least to understand that raw GBs alone do not necessarily matter. After all, if a user can (for instance) use inline compression and deduplication in an all-flash array, then less actual back-end physical storage will be needed to store a given amount of front-end capacity. As mentioned earlier, if a user is paying $10 for a gigabyte of regular HDD and short stroking to use only 25% of the available capacity, then the real price per gigabyte is actually $40. The logic of this is clear, yet users are so used to evaluating storage purchases simply based on cost per gigabyte that such easy, and significant, calculations are sometimes overlooked or hard to accept. It’s a bit like having someone say they would not (“no way … never”) pay $25 for a gallon of a new fuel that’s fully compatible with their existing vehicle because “it’s just too much” without finding out how many miles per gallon per dollar their vehicle could achieve on that new fuel. Years and decades of learned behavior are hard to set aside, but the fact is that old tools and old thoughts cannot be used to measure new approaches. The relevant measure is <em>effective cost</em> for a given workload or performance, and the all-flash-array innovators have the ability here to send some shock-waves around the industry incumbents.</p>
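<p>The effective-cost arithmetic can be written down as a small sketch. The HDD figures ($10 per raw GB, 25% of capacity used) are the ones from the example above; the flash figures are hypothetical placeholders, not vendor prices, and the function name is likewise an assumption for illustration.</p>

```python
def effective_cost_per_gb(raw_cost_per_gb, usable_fraction=1.0, reduction_ratio=1.0):
    """Cost per GB of data actually stored.

    usable_fraction: share of raw capacity that is really used
                     (0.25 for a disk short-stroked to 25%).
    reduction_ratio: logical GB stored per physical GB after
                     compression and deduplication (1.0 means none).
    """
    if not (1.0 >= usable_fraction > 0.0):
        raise ValueError("usable_fraction must be above 0 and at most 1")
    if not (reduction_ratio >= 1.0):
        raise ValueError("reduction_ratio must be at least 1")
    return raw_cost_per_gb / (usable_fraction * reduction_ratio)


if __name__ == "__main__":
    # Example from the text: $10/GB raw disk, only 25% of capacity used.
    hdd = effective_cost_per_gb(10.0, usable_fraction=0.25)
    # Hypothetical flash configuration, for illustration only.
    flash = effective_cost_per_gb(30.0, usable_fraction=0.8, reduction_ratio=5.0)
    print(f"short-stroked HDD: ${hdd:.2f} per effective GB")
    print(f"hypothetical flash: ${flash:.2f} per effective GB")
```

<p>The point of the sketch is the shape of the calculation rather than the specific numbers: a medium with a higher raw price can still come out cheaper per effective gigabyte once utilization and data reduction are taken into account.</p>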
<p>Beyond straightforward storage costs, there are a couple of important additional TCO advantages for flash:</p>
<ul>
<li><strong>Power and space consumption:</strong> Flash consumption is lower than that of HDD arrays by an order of magnitude or more. Since it is common to find that it costs as much to rack, power, and cool an HDD array over three years as it does to buy it in the first place, this is a huge consideration in favor of flash systems.</li>
<li><strong>Server consolidation:</strong> If (as is possible) flash delivers enough performance to enable enough consolidation to reduce, say, Oracle nodes by 50% (both hardware and licenses), then the financial savings can make the math on the storage itself almost insignificant.</li>
</ul>
<p>While it’s tempting to think that technology rules the IT space, far more often it is the financial aspects that really dominate. Certainly, every once in a while, a game changing technology emerges that addresses a critical business issue such as time or cost. Think of the virtualization of servers as a good example of delivering massive cost improvements, or data deduplication as a good example of beating time. In truth, while time was certainly the IT driver for the value of backup deduplication, the underlying business driver was financial. Why? Because prior to deduplicating backups, it wasn’t as if backups <em>could not</em> be completed in time; it was simply that the technology needed to complete them was so outlandishly expensive. Technologies can be IT game-changers, but better economics are invariably the root value.<a href="#_ftn3">[3]</a></p>
<h2>New Needs</h2>
<p>New IT approaches also drive the move to flash. Take server virtualization, for example, which severely stresses the performance of storage systems. And, as Figure 1 shows, the move to virtualization has a long way to go yet.</p>
<div class="graph_top">Figure 1. Percentage of x86 Servers That Have Been Virtualized</div>
<p><img class="aligncenter size-full wp-image-24305" title="AllFlashF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/AllFlashF1.png" alt="" width="618" height="313" />Figure 1 shows that the proportion of users with more than 50% of their x86 servers virtualized is expected to jump from 14% in 2010 to 38% in 2012. Such consolidated compute capacity inherently drives increased performance demands into storage systems, and the effect is exacerbated because VMs also tend to drive randomization of IO. Put starkly, even with most organizations having well under 50% of their servers virtualized, storage has already become a critical choke point for VM scalability. And, to make matters worse (unless you are an all-flash array vendor), the odds are that those early-virtualized servers support some of the less important applications, since most users have taken a conservative approach to the prioritization of what gets virtualized. So we might take an educated guess that perhaps less than 20% of the IO in an organization is virtualized to date, and that is already causing a storage issue! If the remaining server virtualization represents the other 80% of IO, then it is hard to imagine serving that successfully without flash.</p>
<h1>Buyer Advice: What’s Possible &amp; What to Look For</h1>
<p>Prospective users of new all-flash storage arrays should remember that this is not just about a nicer, “washes whiter” solid state disk drive. This is an ALL-flash storage system that replaces rather than enhances a performance HDD-based system. Performance is important, but this is about more than just performance.</p>
<p>A good way to think of things is in terms of means and ends. Previous SSD incarnations tended to view performance as an end in itself, but with all-flash storage arrays it is important to think of the performance as also being a means to achieve other ends. As we move along the natural continuum to using more and more silicon-based storage, leading edge IT operations will naturally want to consider maximizing their competitive advantage by moving into the new systems relatively early. For those users considering the purchase of an all-flash storage array as more become available through 2011 and 2012, what sort of things should they consider?</p>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h2>Things to Consider</h2>
<ul>
<li>Look for a purpose-built system rather than   something that is retrofitted.
<ul>
<li>Flash is a different animal from HDD: it requires fluidity as flash-based data will intentionally move around the system. Having everything run within individual SSDs can overwork them and lead to less effective utilization, so a purpose-built management layer is crucial.<a href="#_ftn4">[4]</a></li>
<li>Fine-grained virtualization is imperative for optimum management of functions such as thin provisioning, deletion management, reclamation, deduplication, and others.</li>
<li>Given   that data reduction is an important tool to reduce the physical amount of   flash needed for a given workload, check that this operation can be done   without adding any significant latency.</li>
</ul>
</li>
<li>Seek proof from system validations and user   experiences.
<ul>
<li>With the efficiency of data   reduction being such a crucial aspect of the economic efficacy of any   particular system, check the vendors’ proven capabilities in this area. Not   just in general terms but, for instance, what data reduction is achieved for   database or virtualization workloads?</li>
</ul>
</li>
<li>Since you will be purchasing a full storage   system rather than an add-on performance device, scalability will be crucial;   you will clearly want to know that it has the ability to grow substantially   to meet likely needs.</li>
<li>Ask about the overall cost: what is the real $/GB,   the $ per usable/effective GB? And are such things as HA, RAID, metadata, and   flash management, etc. all factored into the effective cost calculation?</li>
<li>Ask about the performance implications for your applications. What performance can be expected? Is a certain cache hit/miss ratio expected? What if that changes? Many benchmarks don’t properly showcase the benefits and issues of flash, so ask the vendor(s) for real-world details on your applications.</li>
<li>Compare system and operational needs with the type of storage necessary. Full flash systems have different characteristics from hybrid flash/HDD systems; each can have its place but, for instance, trying to run really high performance requirements through a hybrid system can stress the (intentionally) smaller amount of flash that is present, which can in turn negatively impact endurance and overall system performance. It’s a bit like having all the traffic for a busy freeway going through a certain junction and therefore causing jams and wearing out the pavement.</li>
<li>Check for full levels of HA. If you’re considering an all-flash storage array, you are almost certainly going to be replacing or adding to a high-performance HDD system that is running mission-critical and business-crucial applications. You are therefore most likely to already have clustered controllers that share existing resources, and you would want no less for the all-flash array. How are failures handled? How is capacity expansion handled?</li>
<li>“Regular” system compatibility with FC, iSCSI,   and NFS (as needed) should be verified in order to ensure easy adoption and   implementation.</li>
</ul>
</td>
</tr>
</tbody>
</table>
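<p>The cost question in the list above can be made concrete with a little arithmetic. The following is a minimal sketch, not a vendor formula: the function name, its parameters, and every figure in the usage note are hypothetical, and it assumes that overheads (HA, RAID, metadata, flash management) can be summarized as a single usable fraction and data reduction as a single ratio.</p>

```python
def cost_per_gb(system_price, raw_gb, usable_fraction, reduction_ratio):
    """Return raw, usable, and effective cost per GB for one storage system.

    usable_fraction: share of raw capacity left after RAID, HA, metadata,
                     and flash-management overheads (hypothetical input).
    reduction_ratio: data reduction achieved for the workload, e.g. 4.0
                     for 4:1 (hypothetical input; workload dependent).
    """
    if not (raw_gb > 0 and 1 >= usable_fraction > 0 and reduction_ratio >= 1):
        raise ValueError("capacity, usable fraction, or reduction ratio out of range")

    usable_gb = raw_gb * usable_fraction        # capacity after overheads
    effective_gb = usable_gb * reduction_ratio  # capacity after data reduction

    return {
        "raw": system_price / raw_gb,
        "usable": system_price / usable_gb,
        "effective": system_price / effective_gb,
    }
```

<p>With made-up inputs of a $200,000 system, 10,000 GB raw, 80% usable, and 4:1 data reduction, the sketch yields $20 per raw GB, $25 per usable GB, and $6.25 per effective GB. The gap between the first and last figures is why the data reduction ratio should be validated against your own workloads.</p>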
<h1>Operational and Financial Reality</h1>
<h2>Costs and Performance</h2>
<p>There has always been a group of users and applications that are ready to pay for performance. Some 13 years after IBM introduced the RAMAC drive at just under $8K per megabyte, it came out with the high-performance 2301 drum which commanded nearly three times the cost per megabyte because it could offer a then-startling average access time of less than nine milliseconds. A decade later, StorageTek offered the first commercial DRAM-based solid state drive, which, more than 20 years after the RAMAC, could still command nearly $9K per megabyte. EMC, quickly followed by others, ushered in a new era of solid state use in 2007 when it started to add a limited number of flash drives as turbo boosts to its HDD systems.</p>
<p>All-flash is the next step, where solid state performance is not just an end, but also a means to deliver other things, including good economic value. Flash can, of course, easily compete with HDDs in terms of IO throughput; what’s intriguing is that flash can also compete in terms of TCO because it can be utilized better, has dramatically better OPEX structures, and delivers its performance without the operational inefficiencies that are endemic to achieving comparable performance with spinning disks. All-flash storage arrays can also help to improve ROI, notably by driving the necessary performance to preclude storage from being an obstacle to server consolidation; such consolidation in turn leads to fewer CPU and server software licenses.</p>
<h2>Real World Practicalities</h2>
<p>With flash still on a steep product roadmap, there is a greater likelihood that, compared to HDDs, it can deliver regular and dramatic cost/capacity improvements while at least maintaining performance levels (something HDDs are physically incapable of doing). And maintaining performance is a significant point: in all-flash implementations, consistent high speed is really more relevant and important than pure top speed since the products will be targeted at replacing high performance regular disk systems. In other words, one might think of comparing a racing car (point product solid state) with an everyday car (all-flash array): race cars are very good at what they do, but that’s all. They are not particularly cost effective when it comes to transporting people over distance and they are certainly not designed for everyday use. For flash to be a success as a mainstream “vehicle,” it needs to fit into the sports sedan rather than the Formula One class where the target is not top speed or performance at any cost, but is instead consistent high speed cruising, in-town agility, safety, and plenty of room, all at an affordable price. Such long term, consistent high performance is itself demanded by many of the major drivers and initiatives within IT as we saw earlier in the paper.</p>
<h2>The Argument for Solid State</h2>
<p>It is interesting to step back to get a broader perspective; after all, nowhere is there any rule or law that says that storage must spin. It is simply that Winchester drives were the best option until now. Indeed, many in the industry will explain that the HDD was viewed as something of a stopgap and not expected to have a particularly long life. For instance, “They talked about spinning disks going away in three years when I worked on IBM’s 62GV in 1973.” <a href="#_ftn5">[5]</a></p>
<p>Ironically, it is the biggest vendors of enterprise storage (all of whom have embraced solid state in some form or fashion) that are making the argument for solid state in general and by extension for all-flash storage arrays specifically. They are all showing by their actions that the best place from which to serve IO is silicon. In other words, all the major IT vendors are implicitly saying that all-flash is the way to go for storage: this is because their use of some flash shows they know that there <em>is</em> ultimately a better place for [some] data to go. They are, however, suggesting that not all data should go there because it’s too expensive (a cynic might also say it has something to do with protecting their margins elsewhere with spinning disks). Some, EMC and Oracle for instance, have made big plays and statements about their long term commitment to enhancing solid state and increasing its role in storage.</p>
<p>But, what if you could get all your data to that better place without spending extra or losing anything? Perhaps even gaining operational value? What if you could have your storage cake and eat it too?</p>
<h1>The Bigger Truth</h1>
<p>All-flash storage arrays are here, and they are going to impact the market; the only question is how fast it happens. That is a function of the price point. Why? Beyond that price hurdle, the move to flash is simply too logical and attractive. If the all-flash price is under or even just close to that of performance spinning disk, the question simply becomes “Why wouldn’t you make the change?” All else being equal, why wouldn’t you want a MacBook Air (another revolutionary product enabled by flash) rather than a regular laptop?</p>
<p>There’s more to the enterprise use of all-flash storage arrays than just TCO and OPEX, however; making the financial aspect work simply removes a reason to <em>not </em>consider the change. There are also positive reasons <em>to consider </em>these all-flash storage systems as their availability increases. Most notably, consider the bifurcation of performance and capacity (which current technology has forced together). Think of two storage hierarchies replacing the one that we typically think of: there should be an IO-, or work-, focused hierarchy (solid state) and a pure capacity, or long-term-retention, hierarchy (fat cheap disk, or tape). Moreover, there will be a broader change coming to IT if solid state takes over storage: databases, file systems, and virtualization infrastructure will all evolve to take better advantage of flash. Although that’s a long term view, it is made more likely because of the short term ease of adopting all-flash arrays, which can be plug-compatible with current infrastructures, as well as being equally or more reliable, with better OPEX and, best and most eye-opening of all, CAPEX that is competitive with traditional disk.</p>
<p>Let’s face it: the end game for storage is not spinning rust, however clever we’ve gotten at doing it! The way we’ve been running disk farms is a sheer waste of space both on the disks themselves and in the physical data center at large. It would be laughable or criminal if it were not for the fact that it’s been the only practical option for decades. Churchill once said that “Democracy is the worst form of government there is … except for all the others.” And that’s how disk storage has been, too. Of course, things won’t change overnight; it will still take time for many IT users to get their heads around the fact that these all-flash devices are full mainstream, general-purpose storage arrays (and not an appliance or an add-on). But, assuming the promised financial points can be achieved, the emotional attachments to HDDs and simplistic $/raw GB analyses will fade.</p>
<p>It looks like there will be a number of new players in this space pretty quickly. So what of the traditional storage incumbents? Disk-centric vendors could soon face an innovator’s dilemma in that they have centuries of man-hours of engineering invested in spinning disk, and markets and margins to protect. History shows that the smarter ones will already be working on their [contingency] plans and will adapt to all-flash on the basis that, unpleasant as it might be, it is better to do so than to wait to be relegated to storage also-rans. It is easy to get into hyperbole, but all-flash arrays genuinely look to have paradigm shift potential. As such, it is clear that IT purchasers should absolutely consider an all-flash array in their next performance storage decision.</p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Applications often need to be designed around the expected worst case, i.e., getting 70-90% cache hits may not change the application design the way that getting 99.9% cache hits would, particularly as the hit ratio varies with the workload.</p>
<p><a name="_ftn2">[2]</a> Without too much depth on this, there are always challenges with hybrid approaches; for instance, because of cache efficiency, disks favor large chunk sizes (keep the map in DRAM, contiguous access on disk) but solid-state favors small chunk sizes (don&#8217;t waste the flash for cold data adjacent to hot data). By trying to balance the needs of disk and flash, you end up with an implementation that is not ideal for either.</p>
<p><a name="_ftn3">[3]</a> Source: This paragraph is adapted from a companion ESG Market Report,<em> </em><a href="../../../../../2011/05/how-economics-alter-the-storage-landscape-the-financial-leverage-of-storing-less-will-win/" target="_blank"><em>How Economics Alter the Storage Landscape</em></a>, May 2011.</p>
<p><a name="_ftn4">[4]</a> To add some technical detail, look for things such as a data layout that does not update in place, auto-refreshes (for data aging) and has efficient garbage collection and wear leveling. A data layout that aligns with flash page boundaries will avoid write amplification issues.</p>
<p><a name="_ftn5">[5]</a> Source: Chris Pollard in a comment on The Business of Storage Blog (“Storage…with an emphasis on age”), February 19, 2011.</p>
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/the-rise-and-logic-of-the-all-flash-storage-array-solid-state-takes-center-stage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Defining Tier-1 Storage in the Modern Data Center</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/defining-tier-1-storage-in-the-modern-data-center/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/defining-tier-1-storage-in-the-modern-data-center/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 13:03:55 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[Block Based Disk Storage Systems]]></category>
		<category><![CDATA[File-based Disk Storage Systems and File System Software]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[tier-1]]></category>
		<category><![CDATA[tiered storage]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=24139</guid>
		<description><![CDATA[The purpose of this paper is to define what constitutes “tier-1” storage in the modern IT world and in the data centers and services that support it. The term seems to be implicitly understood when discussing traditional environments, but what constitutes the norm in data centers has evolved rapidly over the last few years. What [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">The purpose of   this paper is to define what constitutes “tier-1” storage in the modern IT   world and in the data centers and services that support it. The term seems to   be implicitly understood when discussing traditional environments, but what   constitutes the norm in data centers has evolved rapidly over the last few   years. What was a simple environment just a few years ago with mainframes or   a few large servers to be supported has evolved into a complex web of virtual   machines, clouds, and expanding user expectations. These three factors   demand, and also create, flexibility, but they do so in a way that pushes a   lack of predictability upon the storage infrastructure.</div>
<private_standard>
<table border="1" cellspacing="3" cellpadding="5" bgcolor="#fff5de">
<tbody>
<tr>
<td width="706" valign="top">
<h1>The Purpose &amp;   Nature of this Paper</h1>
<h2>Storage in the New IT World</h2>
<p>The purpose of   this paper is to define what constitutes “tier-1” storage in the modern IT   world and in the data centers and services that support it. The term seems to   be implicitly understood when discussing traditional environments, but what   constitutes the norm in data centers has evolved rapidly over the last few   years. What was a simple environment just a few years ago with mainframes or   a few large servers to be supported has evolved into a complex web of virtual   machines, clouds, and expanding user expectations. These three factors   demand, and also create, flexibility, but they do so in a way that pushes a   lack of predictability upon the storage infrastructure.</p>
<h2>Part of a Broader Categorization Initiative</h2>
<p>Of course, the   impacts of the structural, application, and environmental changes are not   just felt in the storage infrastructure. This paper is just one of a series   in which ESG will attempt to define high-level criteria, attributes, and   categorization for the infrastructure required to operate a “next-generation   data center” be it on premises, in the cloud, or as some hybrid   implementation.<a href="#_ftn1">[1]</a> These categorization exercises are necessary   to help delineate those vendors and technologies that are only suited for past/current   data center requirements from those that have what it takes to support a true   next-generation data center.</p>
<h2>Partake in the Effort</h2>
<p>ESG’s objective is to run the overall undertaking with the support of the industry, as these categorizations are certainly not specific to ESG and, as such, input from other professionals (users, vendors, commentators) can only be beneficial. Each document will frame the debate and, for lack of anything better, outline a proposed methodology and approach to the definitional conundrum. ESG will act as a clearing house and compositor and will update this paper (and the others as they become available) regularly. Please address comments and suggestions to the author (contact information available on the ESG website), as dialogue is encouraged to continually improve the definitions.</p>
<h2>A Line in the Sand</h2>
<p>Clearly, having tier-1 abilities in anything should be a function of delivering the requisite product functions that meet necessary attributes. This paper is a “line in the sand” for what those criteria should be for tier-1 storage. It does not purport to be <em>the </em>answer, although it may be! It is, however, <em>an </em>answer; a “straw horse” of definition based upon numerous industry conversations. Within each criterion, there will also be some variation of completeness of delivery and ESG has also provided guidelines for the levels of maturity within each criterion where possible: basic, moderate, and advanced. The intent is to use both the general criteria and the specific maturity/completeness as an objective gauge by which the whole industry can measure the suitability of products for the next-generation data center.</p></td>
</tr>
</tbody>
</table>
<h1>Tier-1 Storage in the New Data Center</h1>
<h2>The Beauty of Storage Tiers</h2>
<p>Human beauty, so the saying goes, is in the eye of the beholder. It defies description, but we know it when we see it. Is that really so? Recent scientific investigation has attempted to deconstruct what defines beauty and there are rigorous mathematical ratios (such as the dimension of facial features, length of nose compared to width of eyebrows, etc.) that can be shown to be standard across subjects that multiple beholders perceive as beautiful. Now, of course this work and these relationships do not mean that all beautiful people look alike. But it does show that they have certain attributes that could be considered to be a “core beauty foundation” upon which their own individual facial nuances are layered.</p>
<p>What on earth does this have to do with tier-1 storage? The parallels are extremely pertinent. Tier-1 storage is something that people nod sagely about, and they know what it is when they are confronted by it. But, when asked for a definition, things get somewhat hazier. After all, one user’s tier-1 storage system may be another’s data dump depending on situations, budgets, applications, and industries. But, as with beauty, does this have to be the case? Surely there must be some absolutes for which nothing but the best will do? Is there a set of core tier-1 storage system attributes? As with the beautiful people, it would <em>not</em> mean that all tier-1 storage systems are the same, but rather it <em>would</em> mean that all such storage has certain prescribed attributes.</p>
<p>As we enter a new era of virtualized and “clouded” IT (with its burgeoning complexity, yet strident focus on flexibility and economic value) a term like “tier-1 storage” would be useful, provided it had actual meaning (as opposed to just a positive marketing nicety) attached to it. This is for two reasons:</p>
<ol>
<li>There is a wave of recent, emerging, and brand-new storage providers (themselves built on waves of storage virtualization, scale-up and scale-out architectures, and [relatively] new technologies such as solid state) that might well be tier-1 storage, but which are implicitly precluded from joining the self-anointed “tier-1 club” today by assumption rather than fact.</li>
<li>It is very easy to throw around a descriptor such as “tier-1” without a definition. It is hard for tier-1 providers to push back on those who would unfairly seek their mantle.</li>
</ol>
<p>A description allows everyone, existing and new vendors alike, to have a level semantic and technical field upon which to be measured, replacing glib assumption with specific assertion. Now, that would be a thing of beauty, wouldn’t it?</p>
<h2>The Storage “Tier-archy:” What and Why?</h2>
<p>Describing some storage as tier-1 is, by definition, to acknowledge that there are at least two tiers of storage. In reality, there are more, of course. The goal of this paper is not to focus to any great extent on the merits of those various tiers or of tiering in general, but rather to define the criteria necessary for tier-1 storage in a modern data center. Some brief references are necessary as background, but that is all.</p>
<p>For the purpose of this report, tier-1 does not necessarily mean the highest tier that a user needs or can afford since their particular needs might be met by some other technology even if they refer to it as “tier-1.” Instead, tier-1 storage is storage designed for mission-critical applications in extremely high performing, extremely highly available, and extremely well-protected data environments. Put another way, the tier-1 storage that this paper seeks to define is about <strong>quality tiering</strong> (which is desired or dogmatically required for an application) as opposed to <strong>forced or economic tiering</strong> (born of lesser needs or pragmatic choice).</p>
<h2>New IT World = New Storage Needs</h2>
<p>Traditionally, the top tier of storage has been very much about performance and reliability: IOPS and “five nines.” Of course, these requirements do not go away in a new world of virtualization and clouds! What makes the “new tier-1” for the “new data center” such a tough challenge is that it requires vendors to combine traditional attributes with exceptional agility and efficiency. As we’ve moved through the various eras of IT, from mainframe, to distributed, to internet-focused, to virtualized/cloud, the storage infrastructure to support each type of IT had to change, too. Just adding capacity to a “string” of tier-1 mainframe disk in the 1980s or 1990s could require weeks of planning and multiple visits to the change management meetings. Today, that sort of thing had better happen automatically and non-disruptively. We used to employ armies of specialists who did capacity planning and implementation. There were even guidelines as to how many terabytes each individual might conceivably manage. Today, we manage petabyte systems and expect dynamic provisioning tools and automated Quality of Service (QoS) software to manage things for us. Today, predictability and order are replaced with flexibility and the unexpected, and yet somehow the storage infrastructure has to flex and adapt accordingly. Virtualization, flexible business needs, and all sorts of clouds have made for extremely active, unpredictable, and variable demands upon storage. With so many changes in terms of what’s expected from a tier-1 storage system, we cannot afford to assume modern capabilities based on old criteria.</p>
<h2>High Level Storage Needs<a href="#_ftn2"><strong>[2]</strong></a></h2>
<h3>How We Got Here</h3>
<p>There are clearly some immediate areas within storage that need to be addressed for it to be a compliant and valuable contributor to this “new IT.” In order to understand exactly why, a little history is needed.</p>
<p>Commercial computing took hold when one single infrastructure stack executed one specific application for one specific purpose. The original mainframe was a glorified calculator. Centralized computing was predictable and controllable, albeit expensive. But it could be managed: one processor system and one IO subsystem.</p>
<p>Decentralized (or distributed) computing was developed largely to try to solve the economic challenges of centralized computing (essentially CAPEX) and yielded low-cost, commodity servers which we promptly plugged into proprietary, large, expensive, monolithic storage boxes. Servers became cheaper and more interoperable while storage has remained proprietary and expensive. In the old days, the server was the thing that cost all the money. You picked your server by your OS. You picked your OS by your application. Storage was a “peripheral.”</p>
<p>Today, servers are cheap and interoperable while storage is outlandishly expensive, complex, incompatible, and difficult. In many respects, it is the last bastion of IT awkwardness: the peripheral tail wagging the purposeful dog!</p>
<h3>Where to Next?</h3>
<p>Let’s take for granted that we want to virtualize (which could include clouds as an element) in general because it can bring efficiencies in asset utilization, take advantage of the commoditization of hardware, leverage common infrastructures, provide seamless mobility options, etc. If we can do all this, then we are set up to drive the next (higher) level of value where we can then aspire to provide infrastructure that:</p>
<ul>
<li><strong>Self-optimizes:</strong> boxes that tune/reconfigure themselves for the workloads that are presented and change as those requirements change.</li>
<li><strong>Self-heals:</strong> infrastructure deals with fault scenarios autonomously, remapping/rebuilding itself so that the application is not affected.</li>
<li><strong>Scales dynamically:</strong> up or down, in or out; infrastructure that extends—virtually—to whatever requirements the workload(s) presents.</li>
<li><strong>Self-manages:</strong> adapts to changing scenarios based on policy and enforces those policies via automation.</li>
</ul>
<h2>A Note on Finances</h2>
<p>One thing that has not changed, and indeed which is becoming more crucial by the day because of the well-known gap between the demand for IT and the resources to supply it, is the need for all storage to make financial sense. We may couch our conversation in terms of the “right data” and the “right place,” but those are really symptoms; the underlying cause is the simple fact that no one can afford to have everything in main memory, which is what would occur (putting things like backup, replication, and volatility to one side) if storage were free. A tier or level of storage can be built from a wide variety of storage products and the number of choices has never been greater. Choices range from ultra-high capacity, low cost, lower performance storage devices to subsystems with advanced data management functionality, scalable capacity, and very high levels of performance and data protection.</p>
<p>The need for financial efficiency is clear. At the same time, the needs of the “new data center” (that is, some mix of clouds, virtualization, maybe ITaaS, and definitely burgeoning needs and expectations) are for extreme flexibility and agility. Balancing the two (and, of course, doing so with high levels of reliability and availability) is the job of the IT organization and ESG research (see Figure 1) shows that the needs for efficiency (cost control) and agility (business process improvement) are indeed the top two business initiatives that impact IT spending decisions.</p>
<div class="graph_top">Figure 1. Business Initiatives That Will Impact IT Spending Decisions, Three-year Trend</div>
<p><img class="aligncenter size-full wp-image-24141" title="Tier1StorageF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/Tier1StorageF1.png" alt="" width="617" height="443" /></p>
<h1>New Data Center Tier-1 Storage Criteria Defined</h1>
<p>For the purposes of brevity, we shall assume that the general concepts of tiers and tiering are understood. The trade-off decisions among price, performance, availability and so on for various applications and users have been made; data classification has been done (where it can be today, since automated and policy-based solutions that tier based on age and access patterns are taking over in the new data center); and now it’s time to evaluate which tier-1 storage to use for your mission critical needs.</p>
<p>As a side note, the next-generation storage infrastructure model will eventually need to be folded into the historical tiering model, or indeed supplant it. We need to take the manual labor out of managing storage. Effective policy-based classification and tiering require a system that can accommodate the automation and orchestration of resources; you can’t effectively tier with the stove-piped, fixed systems of old. As we shall see, this calls for tiers that can expand dynamically, absorb and return capacity, etc., in order to be truly effective in the new world.</p>
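<p>To illustrate what tiering “based on age and access patterns” can look like when expressed as policy rather than manual labor, here is a minimal sketch. It is an illustration only: the policy fields, tier names, and thresholds are all hypothetical and are not drawn from any product or from ESG research.</p>

```python
from dataclasses import dataclass


@dataclass
class TierPolicy:
    name: str
    max_idle_days: int      # data idle longer than this does not qualify
    min_daily_reads: float  # data read less often than this does not qualify


# Hypothetical policies, ordered from the highest tier to the lowest.
POLICIES = [
    TierPolicy("tier-1", max_idle_days=7, min_daily_reads=100.0),
    TierPolicy("tier-2", max_idle_days=90, min_daily_reads=1.0),
]
FALLBACK_TIER = "archive"  # hypothetical capacity/retention tier


def choose_tier(idle_days, daily_reads, policies=POLICIES):
    """Return the first (highest) tier whose age and access rules are met."""
    for policy in policies:
        recent_enough = policy.max_idle_days >= idle_days
        busy_enough = daily_reads >= policy.min_daily_reads
        if recent_enough and busy_enough:
            return policy.name
    return FALLBACK_TIER
```

<p>Because the rules live in data rather than in an administrator’s spreadsheet, they can be re-evaluated continuously as access patterns change, which is the kind of automation the next-generation model assumes.</p>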
<p>Here are the criteria that should exist for a tier-1 next-generation storage infrastructure designed to address the demands of a virtualized, clouded, variable world, with flexible and unpredictable storage demands. A few general notes apply to the criteria:</p>
<ol>
<li>Basic (traditional) attributes such as high performance, high reliability, and an easy-to-use management GUI are taken as givens.</li>
<li>Here, the focus is on the criteria that separate and distinguish the “new” tier-1 storage (whether block or file at the system view) from the traditional.</li>
<li>The sub-bullets shown under each criterion are the maturity levels (where applicable and where they have been defined to date).</li>
<li>There is a deliberately pedantic aspect to the terminology and, at times, a very specific delineation of functions. This is simply so that aspects that might all “hang together” automatically in one vendor’s implementation are not inadvertently assumed to do the same in all products. For instance, configuration and provisioning might be integrated into a setup wizard, but that is not always the case; plus it helps to highlight the distinction of the new criteria from the traditional.</li>
<li>A less technically precise, but far more colloquially appealing, description of the main impact/benefit follows each of the criteria; a kind of “Cliffs Notes” version to glean the key points.</li>
</ol>
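<p>One way to apply the criteria and maturity levels as an objective gauge is to record, per criterion, the level a product reaches and compare that with the level an environment requires. The sketch below is a hypothetical illustration: the numeric encoding of basic/moderate/advanced and the function name are assumptions, not part of ESG’s methodology.</p>

```python
from enum import IntEnum


class Maturity(IntEnum):
    # Hypothetical numeric encoding of the three maturity levels.
    BASIC = 1
    MODERATE = 2
    ADVANCED = 3


def shortfalls(product, required):
    """Return the criteria where a product is unrated or below the required level.

    Both arguments map a criterion name to a Maturity level; the result maps
    each failing criterion to a (level the product has, level needed) pair.
    """
    gaps = {}
    for criterion, needed in required.items():
        have = product.get(criterion)
        if have is None or needed > have:
            gaps[criterion] = (have, needed)
    return gaps
```

<p>For example, a product rated advanced for Self-Configuration but only basic for Secure Multi-tenancy would show a single shortfall against a requirement of moderate and advanced, respectively.</p>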
<ul>
<li><strong>Self-Configuration </strong>
<ul>
<li><strong>This is about various levels of automated setup, and is designed to speed and simplify system adoption and growth. No more spreadsheets and whiteboards for system administrators to figure things out! </strong></li>
</ul>
</li>
</ul>
<ol>
<li>Automatic initialization.</li>
<li>Automatic configuration of new disk/controller/cache resources.</li>
<li>Auto load balancing and optimization of new resources.</li>
</ol>
<ul>
<li><strong>Dynamic Volume/File Provisioning</strong>
<ul>
<li><strong>Flexible provisioning (by attribute and/or over time) accelerates and simplifies one of the most common storage administration tasks. The lower maturity levels are very common while the true full dynamic ability drives enhanced optimization and is operationally very valuable. </strong></li>
</ul>
</li>
</ul>
<ol>
<li>Can configure and present basic LUNs/volumes/files.</li>
<li>Ability to provision resources dynamically, without pre-planning, based upon policies/SLAs which can have differing performance and/or economic attributes.</li>
<li>Ability to dynamically and non-disruptively reconfigure/reprovision storage system resources to address new service level requirements.</li>
</ol>
<ul>
<li><strong>Self-optimizing and Tuning </strong>
<ul>
<li><strong>Think of this as an engine management system that automatically helps to balance $/IOPS, $/GB and service levels in a user environment.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>At the basic level, all things are treated identically.</li>
<li>Automatic tiering is based on pre-determined schedules which may include some on-the-fly reaction and some meritocracy such as time-based auto-tiering.</li>
<li>Can dynamically and perpetually configure/reconfigure the storage system to optimize the use of available resources based on QoS, performance and/or economic parameters.</li>
</ol>
<ul>
<li><strong>Sub-LUN Optimization (of Capacity and Other Resources) </strong>
<ul>
<li><strong>An enhanced, more granular management technique that improves the ability to balance $/IOPS, service levels, and $/GB with the aim of delivering satisfactory service levels at the lowest cost. Multiple workloads with varying QoS levels can thereby reside on one storage system while retaining optimum efficiency.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Multiple [sub] LUNs can share a single disk.</li>
<li>Multiple [sub] LUNs of varying QoS can share a single disk, reducing the need to provision and manage separate pools for each QoS.</li>
<li>Sub-LUN/volume elements are perpetually and granularly optimized, each with QoS, capabilities, and autonomous attributes derived from the master volume. In other words, multiple sub-LUN elements that have different QoS needs can share common disk resources without prescribed pools or reservations.</li>
</ol>
<ul>
<li><strong>Secure Multi-tenancy </strong>
<ul>
<li><strong>This capability is about securing the visibility to, and interaction with, data belonging to any particular user on a shared system.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Basic level is just physical domains.</li>
<li>Permission-based access to [virtual] independent storage domains securely implemented on shared physical assets.</li>
<li>Dynamic support for varying QoS across virtual domains.</li>
</ol>
<ul>
<li><strong>Scale-out Infrastructure </strong>
<ul>
<li><strong>This is the ability to deploy new resources fast and as needed, and thereby to grow flexibly. A scale-out infrastructure is obviously one of the keys to supporting new, dynamic, and unpredictable capacity demands.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Basic level is limited to two controllers.</li>
<li>Ability to add controllers, cache, and storage devices horizontally.</li>
<li>Ability to add performance and capacity resources independently and dynamically, and for the system to automatically absorb and reconfigure new assets as part of the resource pool in real-time.</li>
</ol>
<ul>
<li><strong>Scalable Multi-tenancy</strong>
<ul>
<li><strong>The combination of scaling and multi-tenancy provides additional benefits in terms of greater consolidation per physical asset, which can reduce hardware sprawl (and hence both CAPEX and associated OPEX) and management.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Basic level only supports two controllers per system.</li>
<li>Ability to scale transactions or workloads (themselves being transactional and/or sequential) by adding device, controller, cache and/or IO capabilities. This can be characterized as predictable scaling.</li>
<li>Dynamically integrate new assets into the overall resource pool and redistribute/reconfigure to automatically optimize the performance/SLAs of diverse and high-service-level workloads based on QoS policies. This can be characterized as the dynamic ability to address and manage unpredictable workload changes.</li>
</ol>
<ul>
<li><strong>High Availability (HA) and Planned Resilience </strong>
<ul>
<li><strong>The benefit here is clear: it is about having tools to preclude the escalating negative impacts of prolonged downtime, which is especially harmful to heavily trafficked shared storage such as that found in highly virtualized and cloud environments.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Failure resilience, which maintains QoS even if major system components (cache, controller, etc.) fail.</li>
<li>Multi-site (anything more than two) remote data replication is crucial to ensuring a rapid return to service should an entire array or site fail (and even tier-1 systems can do so).</li>
<li>“Planned Resilience” concerns minimizing the real operational impact of issues. A “graceful” failure and a rapid return to service are more important than the number of “9s.” Whether it is a cloud or a “regular” data center operation, the idea of staying down for days is untenable.</li>
</ol>
<ul>
<li><strong>Optimized Asset Utilization</strong>
<ul>
<li><strong>The main intent and benefit here is financial impact, including TCO and ROI; better utilization means buying less hardware and also aligns with a pay-as-you-go model, which suits many modern budget cycles well.</strong></li>
</ul>
</li>
</ul>
<ol>
<li>Thin provisioning, so that real capacity is only consumed when data is actually written.</li>
<li>Ability to convert under-utilized (“fat”) assets to optimized (“thin”) utilization; other capacity reduction tools such as deduplication and compression can also apply.</li>
<li>Reclamation, dynamically and perpetually, releases under-utilized assets back to the overall pool.</li>
</ol>
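<p>The first and third maturity levels above (capacity consumed only on write, and reclamation back to the pool) can be illustrated with a toy model. This is a minimal sketch rather than any vendor’s implementation: the <code>ThinVolume</code> class, the block-level granularity, and the dictionary used as a shared pool are all assumptions made purely for illustration.</p>

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical model, not a real array API)."""

    def __init__(self, logical_blocks, pool):
        self.logical_blocks = logical_blocks  # capacity promised to the host
        self.pool = pool                      # shared pool of free physical blocks
        self.written = set()                  # logical blocks backed by real capacity

    def write(self, block):
        if not 0 <= block < self.logical_blocks:
            raise IndexError("block outside the volume's logical size")
        if block not in self.written:
            # Real capacity is consumed only on the first write to a block.
            if self.pool["free"] == 0:
                raise RuntimeError("shared pool exhausted")
            self.pool["free"] -= 1
            self.written.add(block)

    def reclaim(self, block):
        """Release a block that is no longer needed back to the shared pool."""
        if block in self.written:
            self.written.remove(block)
            self.pool["free"] += 1


pool = {"free": 100}
vol = ThinVolume(logical_blocks=1000, pool=pool)  # 1000 promised, 100 physical
for b in (0, 1, 2, 2):                            # rewriting block 2 costs nothing extra
    vol.write(b)
used_after_writes = 100 - pool["free"]
vol.reclaim(1)
used_after_reclaim = 100 - pool["free"]
print(used_after_writes, used_after_reclaim)
```

<p>In this sketch the host sees a 1,000-block volume while the pool only gives up capacity for blocks that have actually been written, which is the utilization benefit described above.</p>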
<ul>
<li><strong>Virtualized and Federated Resource Pooling </strong>
<ul>
<li><strong>This extends the ability to pool resources and serve workloads/applications across systems and distance, enabling additional economic improvement, flexibility, and even security. </strong></li>
</ul>
</li>
</ul>
<ol>
<li>Simply applies to LUNs/volumes.</li>
<li>All LUNs or volumes can span all resources within the system.</li>
<li>Virtual storage instances can move transparently and non-disruptively across physical systems within the broader/federated pool, and do so while retaining their QoS/policy/SLA attributes.</li>
</ol>
<p>The goal is to achieve automated operational flexibility and scalability (a.k.a., IT agility) that is combined with the optimum use and re-use of all the resources (a.k.a., business efficiency). When used for mission-critical applications (those demanding top-notch performance and RAS), the criteria above constitute the range of elements that can be combined to produce varying levels of tier-1 storage in new world data centers.</p>
<h1>The Bigger Truth</h1>
<p>Often a paragon of progress in general, the IT industry can be surprisingly conservative or assumptive at times. A set of clear criteria for the prerequisite attributes of tier-1 storage in the new era of computing is a prime example. We don’t have an agreed set of such criteria even though this would help vendors and users alike. Instead, we typically either assert “tier-1ness” (if you’re a vendor) or claim to “know it when you see it” (if you’re an IT user).</p>
<p>With so much change occurring in IT and data centers (virtualization, clouds, and a necessary fixation on economic efficiency), this is neither a sensible nor a sustainable state of affairs. You would never think to measure the attributes of a car by the expectations or design specs of the mid-1970s, nor would you seek to procure wireless service for your iPhone or BlackBerry based on the dial-up Internet standards of even a decade ago. Things have changed. And yet those sorts of approaches are how we still tend to judge the uber-premium, absolutely positively high-end, tier-1 storage. We not only need criteria, we need the appropriate criteria for the emerging data center, where uncertainty reigns supreme for storage demands and tier-1 storage had better be able to cope and respond so that users can meet their IT and business needs. Hopefully this paper will be cause for the industry to stop and to think. Hopefully, it provides at least some of the answer.</p>
<hr size="1" /><a name="_ftn1">[1]</a> Note: ESG has already worked to produce some standard criteria and attributes across the three main categories of infrastructure (servers, networks, and storage) and would be pleased to discuss and develop them with any interested parties.</p>
<p><a name="_ftn2">[2]</a> Note: this section is an extract from the ESG Market Report, <a href="../../../../../2011/01/the-future-of-storage-in-a-virtualized-data-center/" target="_blank"><em>The Future of Storage in a Virtualized Data Center</em></a>, January 2011.
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/defining-tier-1-storage-in-the-modern-data-center/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Application Retirement Options</title>
		<link>http://www.enterprisestrategygroup.com/2011/08/application-retirement-options/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/08/application-retirement-options/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 15:51:44 +0000</pubDate>
		<dc:creator>Julie Lockner</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Julie Lockner]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Application Retirement]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=24144</guid>
		<description><![CDATA[As new applications and technologies flood the marketplace, organizations are hard-pressed to keep up with the rapidly-changing technological landscape.  Companies often find themselves adding newer, more sophisticated applications to their growing arsenal of tools, while continuing to retain redundant or unused (legacy) applications for fear of losing access to the data associated with them.  [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">As new applications and technologies flood the marketplace, organizations are hard-pressed to keep up with the rapidly-changing technological landscape.  Companies often find themselves adding newer, more sophisticated applications to their growing arsenal of tools, while continuing to retain redundant or unused (<em>legacy</em>) applications for fear of losing access to the data associated with them.  While these applications have long since outlived their original purpose, they continue to absorb valuable time and resources, impact overall organizational agility, and add unnecessary financial burdens to already-strained IT budgets.</p>
<p>For most organizations, the solution to the unnecessary financial drain on data centers is obvious—retire unused applications and implement a solution that retains access to the data held within these applications.  But, before such an effort can be undertaken, it is necessary to identify those applications that are suitable for retirement, analyze the needs of various entities who must access the data stored in those applications, and develop a comprehensive plan for ensuring that the data in the applications can be accessed and leveraged as necessary, while retaining the original data in order to meet compliance standards.</p>
</div>
<private_premium>
<h1>Introduction</h1>
<p>As new applications and technologies flood the marketplace, organizations are hard-pressed to keep up with the rapidly-changing technological landscape.  Companies often find themselves adding newer, more sophisticated applications to their growing arsenal of tools, while continuing to retain redundant or unused (<em>legacy</em>) applications for fear of losing access to the data associated with them.  While these applications have long since outlived their original purpose, they continue to absorb valuable time and resources, impact overall organizational agility, and add unnecessary financial burdens to already-strained IT budgets.</p>
<p>According to a recent poll, ESG found that 86% of all enterprise organizations currently run legacy applications within their data centers (see Figure 1).<a href="#_ftn1">[1]</a></p>
<div class="graph_top">Figure 1. Number of Legacy Applications Running in Organizations&#8217; IT Environments</div>
<p><img class="alignnone size-full wp-image-24147" title="AppRetirementf1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/AppRetirementf1.png" alt="" width="644" height="409" /><br />
When those same organizations were asked if they have plans to decommission (retire) these business applications, 50% said yes (see Figure 2).</p>
<div class="graph_top">Figure 2. Percentage of Organizations Planning to Retire Applications</div>
<p><img class="alignnone size-full wp-image-24148" title="AppRetirementf2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/08/AppRetirementf2.png" alt="" width="621" height="422" /><br />
For most organizations, the solution to the unnecessary financial drain on data centers is obvious—retire unused applications and implement a solution that retains access to the data held within these applications.  But, before such an effort can be undertaken, it is necessary to identify those applications that are suitable for retirement, analyze the needs of various entities who must access the data stored in those applications, and develop a comprehensive plan for ensuring that the data in the applications can be accessed and leveraged as necessary, while retaining the original data in order to meet compliance standards.</p>
<h1>Identifying Legacy Applications for Retirement</h1>
<p>A legacy application can be thought of as any application in a data center that is no longer necessary to support day-to-day operations.  This could mean an application that does not support any current business activities, or one whose data has been transferred to a newer application while the older application lies dormant, causing unnecessary strain on resources, storage, and manpower.  While most organizations are well aware of the dangers involved in allowing unused legacy applications to continue to clog organizational databases, the question is how do organizations arrive at this situation in the first place?</p>
<h2>How Do Applications Become ‘Legacy’ Applications?</h2>
<p>Applications become ‘legacy’ applications for a variety of reasons, although this evolution typically occurs due to:</p>
<ul>
<li>Application replacements or upgrades.</li>
<li>Consolidation projects.</li>
<li>Mergers or acquisitions.</li>
</ul>
<h3>Application Replacements</h3>
<p>As applications are replaced by newer, more robust ones, older, legacy systems are often no longer considered mission-critical, and are left unmanaged and/or abandoned, introducing a new source of risk and financial strain on IT budgets.  Over time, ownership/assigned responsibility for the management of these applications becomes ambiguous, and the dormant, ‘orphaned’ applications are left to absorb valuable time and resources in the form of expensive backup, file management, and storage costs.  Although the data within the application may need to be retained for regulatory or compliance purposes, the application itself is redundant.</p>
<h3>Consolidation Projects</h3>
<p>As part of broader initiatives, many organizations are consolidating data centers and the applications that reside within them. It is not unusual for organizations to extend consolidation projects to combine individual business processes and existing IT systems into enterprise-wide applications such as enterprise resource planning (ERP) and customer relationship management (CRM) systems.  While these initiatives offer enhanced capabilities to an organization, without careful planning, they can backfire in terms of actual cost savings.  If data is simply duplicated from one application to a new, more robust application while leaving the original application intact, the end result can double the maintenance workload, thereby defeating the purpose of the entire consolidation initiative.</p>
<h3>Mergers and Acquisitions</h3>
<p>Legacy applications are often created as the result of corporate mergers and acquisitions, as organizations seek to resolve application redundancies and standardize organizational procedures.  Although such an approach is necessary in order to create a unified corporate identity, merged organizations often find it difficult to let go of the ‘old way’ of doing things and, as a consequence, continue to maintain ‘backup’ systems and applications as a safeguard against losing any of their original company data.  Again, this duplication of data adds even more strain on already-taxed data centers.</p>
<h2>Why Do Organizations Retain Outdated Applications?</h2>
<p>Any time an organization continues to maintain applications that are no longer used in day-to-day operations, the underlying factor is usually the fear of losing access to important data by closing down the legacy application.  In earlier years, that was a valid fear: when some older applications were shut down, the data contained in those applications was no longer accessible.  Formats were incompatible with newer applications, and valuable data was lost.  Today, however, hundreds of technology companies develop faster, easier ways of accessing valuable data through industry-standard reporting tools.  Regardless of the original application in which the data was created, these reporting tools enable users to access and view it in a user-friendly manner.</p>
<h1>Planning for Application Retirement</h1>
<p>Application retirement solutions offer a cost-effective way to migrate data from archaic or unsupported formats to a supported format, while maintaining business context and some level of end-user access.  These solutions are similar to database archiving solutions in that they have options for how database data is stored in an archive, but differ in that the original, legacy application is replaced by a comparable user interface and can be completely retired.</p>
<p>When defining an application retirement project, it is vital that all business users (end-users, IT operations, and legal/compliance) provide input in order to choose the most appropriate retirement solution.</p>
<h2>Gathering End User Data Access Requirements</h2>
<p>When determining your approach for retiring an application, several key questions must be asked of the end-users who must access this data:</p>
<ul>
<li><strong>How long does the data need to be retained to support business/operational processes? </strong>The longer the retention period, the more the solution architecture needs to account for potentially significant data volumes and technology upgrades or obsolescence.  This will help analysts determine whether to keep the data online in a database or move it to an archive, and which media option (online, nearline, or offline) to use.</li>
<li><strong>Does the application data need to be accessed in the context of the original business application? </strong>If data has to be kept for an extended period of time, does it need to be accessed with the original application, and is the format relevant? Consider the implications if the application is retired or upgraded. Will the data still need to be viewed? If so, this may have an impact on whether your organization has the option of archiving the data to a non-database format.</li>
</ul>
<h2>Technology Obsolescence</h2>
<p>When determining the approach for retiring an application, several key questions must be asked of the IT groups who must provide infrastructure for this data.  It is also important to understand the underlying technology containing the stored information and to consider that physical media, operating systems, and proprietary platforms have a definitive shelf life.</p>
<ul>
<li><strong>How often will the data be accessed over time, and what are the related performance expectations? </strong>Migrating or archiving less frequently accessed database data to lower cost storage infrastructure is an excellent option for reducing the overall cost of managing the data. If the number of users and the frequency of access to the archive data will be relatively high, IO will need to be factored in when selecting the target architecture for the archive data. It is typical to limit access to the archive data to a smaller set of super users or administrators. If few users will be accessing the data and the number of times the data will be accessed is limited, it is a good candidate for lower cost storage with lower performance characteristics. Even if access is expected to be minimal-to-none with practically no users, the data may need to be accessed during e-discovery, in which case online access or searchability may drive the target architecture.</li>
</ul>
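<p>The decision logic described above can be sketched as a small function. This is only an illustration: the function name, the threshold values, and the wording of the three outcomes are hypothetical, and real thresholds would come from the organization’s own access requirements.</p>

```python
def suggest_archive_target(users, accesses_per_month, needs_ediscovery_search,
                           max_light_users=10, max_light_accesses=100):
    """Suggest a target class for archive data.

    The thresholds are illustrative assumptions, not ESG recommendations.
    """
    # Many users or frequent access: IO must be factored into the architecture.
    if users > max_light_users or accesses_per_month > max_light_accesses:
        return "performance-oriented storage (factor in IO)"
    # Little access, but e-discovery may still require online search.
    if needs_ediscovery_search:
        return "lower cost storage that stays online and searchable"
    # Few users and limited access: the cheapest tier is a good candidate.
    return "lower cost, lower performance storage"


heavy = suggest_archive_target(200, 5000, needs_ediscovery_search=False)
legal = suggest_archive_target(2, 1, needs_ediscovery_search=True)
cold = suggest_archive_target(2, 1, needs_ediscovery_search=False)
print(heavy, legal, cold, sep="\n")
```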
<h2>Gathering Records Retention Regulations/ Compliance Requirements</h2>
<p>Legal and Records Management departments have a stake in application retirement projects.  Retaining application data beyond legal requirements introduces just as much risk as failing to retain data that is legally required.</p>
<ul>
<li><strong>What compliance/regulation policies are driving extended retention periods for the legacy application? </strong>Corporate records managers often maintain a records retention schedule that lists the corporate record types, the information owners and custodians, the regulations that set the retention statutes, and the retention period definition.  In any application retirement project, it is essential that data that needs to be retained is not deleted, and that data that does not legally need to be retained is deleted. When both the business stakeholders and IT staff have a solid understanding of the regulations or compliance requirements, policies and procedures for retiring legacy applications can be defined in a way that makes it easier for architects to design a solution that can be standardized across the enterprise. Without this common approach, organizations may miss out on bigger cost-cutting opportunities and introduce unnecessary complexity, which opposes the overall goal of an application retirement project.</li>
</ul>
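<p>A records retention schedule lends itself to a simple, auditable rule: retain until the period expires, then delete. The sketch below illustrates that rule under stated assumptions. The record types, the retention periods, and the <code>disposition</code> function are hypothetical, and a real schedule would also need to account for anything the Legal department adds.</p>

```python
from datetime import date

# Hypothetical retention schedule: record type -> retention period in years.
RETENTION_YEARS = {"invoice": 7, "hr_record": 5}


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(record_type, created, today):
    """Classify a record as 'retain', 'delete', or 'review' (type not in schedule)."""
    years = RETENTION_YEARS.get(record_type)
    if years is None:
        return "review"  # never delete data whose retention rule is unknown
    return "retain" if today < add_years(created, years) else "delete"


today = date(2011, 8, 18)
expired = disposition("invoice", date(2004, 8, 1), today)
active = disposition("invoice", date(2005, 8, 1), today)
unknown = disposition("email", date(2000, 1, 1), today)
print(expired, active, unknown)
```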
<h1>Choosing the Appropriate Application Retirement Solution</h1>
<p>Based on the usage/retention requirements provided by the business user community, your organization can pinpoint the most appropriate application retirement solution.  Several approaches can be used, and different vendors offer different archive data storage options, including:</p>
<ul>
<li>Migrating the legacy data to a separate database.</li>
<li>Storing the data in an online proprietary file.</li>
<li>Storing the data in an open standard file.</li>
</ul>
<h2>Migrating to a Separate Database</h2>
<p>Archiving legacy data to a common online database repository allows organizations to manage this data by using existing skilled DBA resources and technology available in the data center. End-user access to the data is minimally impacted, and data retains its structure and format when moved from the legacy source to a target database. This is the most commonly deployed approach in application retirement projects.<a href="#_ftn2">[2]</a></p>
<p>This approach requires understanding the data model, creating an exact replica of the data in the target database environment, and either creating a new application to access the data or leveraging an existing reporting tool.  The level of effort needs to be carefully considered.</p>
<p>With this solution, organizations should consider that they will still need to manage the legacy data in the target database using data center technology.  It may require purchasing another database license, or additional storage and compute resources.  Because the data is stored in a database, the database file itself is not in a read-only format.  The data can be stored in a read-only table, but the physical storage prevents writing to write-once, read-many (WORM) type media.  If the regulatory requirements specify WORM-type media storage, this approach is not an option.</p>
<h2>Storing Data in a Proprietary File</h2>
<p>This option entails storing legacy data in an online accessible, secure archive proprietary file format.  This option potentially offers significant storage reduction through high data compression rates. Most vendors with a file-based archive solution provide open-standard database connectivity drivers, such as ODBC/JDBC, allowing end-users to maintain access to the data using existing reporting tools.  Once data is archived to the file format, the legacy application and database can be retired, eliminating the cost and maintenance previously required.</p>
<p>This approach also requires an understanding of the data model, a process for copying the data to the archive file, and a new or existing application to access the data.  Because the data is stored in a closed file, regulatory requirements that specify WORM-type storage media can be met using this option.</p>
<p>Read and write performance varies among solution providers that archive database data to a file.  End-user access requirements should be considered and weighed when evaluating this solution approach.  Additionally, access performance may deteriorate significantly if the data is stored on slower storage media.  Administrators will need to extend database security policies to data residing in a file and, because the file format is proprietary, there is an added cost associated with vendor maintenance contracts for data access driver updates.</p>
<h2>Storing Data in an Open Standard File</h2>
<p>To eliminate any technology dependency, another approach involves archiving legacy data to an open standard format for long-term readability that is vendor-agnostic.  Formats such as comma-separated values (CSV) and Extensible Markup Language (XML) are commonly used for this reason.  With the plethora of tools available in the market that can read CSV and XML file documents, organizations can sunset archaic, legacy systems and access the archived data using open-standard technology.</p>
<p>In some cases, such as with XML, the storage requirements may increase significantly.  When archiving to XML files, each data element is tagged with metadata, and this overhead should be considered when the data volumes are large.  However, organizations can take advantage of new application development environments that allow the rapid deployment of end-user reporting interfaces with minimal effort.</p>
<p>Another consideration is that access performance may deteriorate significantly, and administrators will need to extend security policies to data residing in a file.</p>
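<p>The storage overhead of per-element tagging can be estimated before committing to a format. The following sketch serializes the same records as CSV and as XML and compares the sizes; the sample records and field names are invented for illustration, and the actual ratio will depend on the real schema and data.</p>

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical legacy records; the field names are illustrative only.
fields = ["order_id", "customer", "amount"]
rows = [{"order_id": str(i), "customer": f"C{i:04d}", "amount": "19.99"}
        for i in range(1000)]

# CSV: the field names appear once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
csv_bytes = len(buf.getvalue().encode("utf-8"))

# XML: every value is wrapped in its own opening and closing tag.
root = ET.Element("orders")
for row in rows:
    record = ET.SubElement(root, "order")
    for name in fields:
        ET.SubElement(record, name).text = row[name]
xml_bytes = len(ET.tostring(root, encoding="utf-8"))

print(f"CSV: {csv_bytes} bytes, XML: {xml_bytes} bytes, "
      f"ratio: {xml_bytes / csv_bytes:.1f}x")
```

<p>Running a comparison like this against a representative extract of the legacy data gives a rough sizing figure to weigh against the readability benefits of XML.</p>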
<h2>Additional Application Retirement Options</h2>
<p>The benefit IT organizations seek from retiring legacy applications is to reduce or eliminate the cost and effort of maintaining systems that are no longer needed to support day-to-day activities.  The options listed above are just a subset of what has been implemented to achieve this goal.  Below is a list of other options that organizations have deployed with varying success:</p>
<ul>
<li><strong>Data will be archived to tape.</strong></li>
</ul>
<p><strong>Pros:</strong> Minimal maintenance cost will be associated with storing the legacy data for extended timeframes.</p>
<p><strong>Cons:</strong> End-user access is limited and data migration should be planned if the retention period extends beyond the tape media’s shelf life.</p>
<ul>
<li><strong>The entire application and data set will be archived as a virtual machine.</strong></li>
</ul>
<p><strong>Pros:</strong> Data can be accessed in the native application, retaining business context by simply starting the virtual server application.</p>
<p><strong>Cons:</strong> This approach is limited to operating systems supported by the virtual server vendor technology.</p>
<ul>
<li><strong>Shrink wrap the physical hardware and store it in a closet.</strong></li>
</ul>
<p><strong>Pros:</strong> Data can be accessed in the native application, retaining business context by starting the server hardware and software application.</p>
<p><strong>Cons:</strong> This approach depends on the working condition of the physical hardware and requires a specialized skillset in the legacy technology platform.</p>
<ul>
<li><strong>Migrate to a Software as a Service Model.</strong></li>
</ul>
<p><strong>Pros:</strong> Data will be made available when needed, eliminating the cost and risk of physically storing the data on premises.</p>
<p><strong>Cons:</strong> This approach introduces a dependency on a third party to maintain technology and skillsets required to keep the data available based on pre-defined service level agreements that have a corresponding cost associated with the effort.</p>
<h1>The Bigger Truth</h1>
<p>Organizations today rapidly change technologies, and applications can quickly become ‘legacy’ systems (systems that no longer support day-to-day business activities) during events such as mergers and acquisitions, application upgrades and migrations, and data center modernization initiatives.  However, because of compliance and regulatory standards that govern many organizations, these unused legacy applications often are allowed to remain within the organization’s system, occupying valuable storage space and tying up resources that could be better utilized in other areas, while providing no intrinsic value to the organization.</p>
<p>Rather than continuing to maintain these unused legacy applications, organizations are now realizing the financial viability of deploying application retirement solutions as soon as any system is no longer used in day-to-day operations. These solutions enable them to compress and archive the related data and retire the unneeded applications.</p>
<p>Numerous hardware, software, and service providers now offer entire business suites devoted solely to retiring unused applications while enabling easy access to the related data via standard reporting tools. Such an approach means that end-users gain significant efficiencies by having both old and new data available, ensures that the data remains in the proper format to support ongoing business processes, and fulfills regulatory and compliance requirements.</p>
<p>However, in order to select the most appropriate means of retiring an application, the application’s business users (end-users, IT, and legal/compliance teams) must work together to accurately categorize this data, define data access and retention/compliance requirements, and identify any relevant SLAs.  Armed with this information, they can make an informed decision as to the type of application retirement solution most appropriate for the organization’s unique needs.</p>
<hr size="1" /><a name="_ftn1">[1]</a> Source:  ESG Research Report, <a href="../../../../../2011/01/2011-it-spending-intentions-survey/"><em>2011 IT Spending Intentions Survey</em></a>, January 2011. All research references are taken from this report unless otherwise stated.</p>
<p><a name="_ftn2">[2]</a> Source: ESG Research, <em>Application Retirement</em>, Report coming August 2011<br />
</private_premium>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/08/application-retirement-options/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Logic and Value of Tiers of High Performance Storage</title>
		<link>http://www.enterprisestrategygroup.com/2011/06/the-logic-and-value-of-tiers-of-high-performance-storage/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/06/the-logic-and-value-of-tiers-of-high-performance-storage/#comments</comments>
		<pubDate>Wed, 29 Jun 2011 14:47:06 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[IT Operations]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[tiering]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23229</guid>
		<description><![CDATA[This paper makes the case for extending storage tiering into the “enhanced performance” IO space; the sense of such a move might make one wonder why it hasn’t been done before. Tiering: What, Why, and Where In the well known and often discussed “storage hierarchy,” we’ve effectively assumed that all really high performance data is [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">This paper makes the case for extending storage tiering into the “enhanced performance” IO space; the sense of such a move might make one wonder why it hasn’t been done before.</div>
<private_standard>
<h1>Tiering: What, Why, and Where</h1>
<p>In the well-known and often discussed “storage hierarchy,” we’ve effectively assumed that all really high performance data is born equal. We happily angst over the precise placement of “regular” production data onto various forms of spinning disks because <em>we acknowledge that a) different data has different needs and b) different applications and businesses and times require varying attributes from a storage system. </em>And yet, somehow (and for no good reason), if data is deemed to be ultra important in terms of requiring higher performance or extra IO throughput, we are content just to throw it onto homogeneous SSD and/or to throw extra disks at the problem, wasting capacity (which means wasting money) and potentially suffering unwanted latency.</p>
<p>These practices need to change with the constant change in IT demand. Performance-oriented data is no more homogeneous than “regular” production data; it’s often the same data, just at a different stage of its life.<a href="#_ftn1">[1]</a> We wouldn’t treat all people the same at a certain age, nor should we do that with data. What constitutes high performance data (and therefore the tools we use to serve it) will flex, just as it does across the rest of the hierarchy. If you were choosing a car, you wouldn’t want the choice to be just a Lamborghini or a minivan; nor should storage have a single, static performance tier. It’s bad for different types of data/applications and bad for the budget. Granularity and appropriateness are the watchwords; as the demands and expectations on IT stretch its ability to deliver, all the major industry waves (such as cloud and virtualization) are adding more pressure to storage. Consequently, performance IO (like any IO) needs to be served by its own hierarchy.</p>
<h2>The What and Why of Storage Tiering</h2>
<p>The cold hard fact about storage tiers—or storage tiering<a href="#_ftn2">[2]</a>—is that they are simply a means to an end and not an end in themselves. The reason tiered storage is becoming more crucial (indeed, a prerequisite for many organizations) is a simple matter of economics. After all, if storage were free, then everything would logically be stored on the fastest, best devices available. Capacity would be unlimited and cost no issue. Yes, there are some practical issues around connectivity, protection, and access, but you get the point: the only reason we have long sought the nirvana of an effective storage hierarchy is that storage isn’t free and, consequently, we need to make choices and allocate data to appropriate levels and types (i.e., costs) of storage. Indeed, the situation has been exacerbated because the gradual improvement in disk performance has trailed that of CPUs and networking by orders of magnitude over the last few decades: we are putting ever more capacity on disks that are less than ideally suited to efficiently serving that data back to applications.</p>
<p>Because of this, and also because the rate of storage growth is outstripping the rate of its price decline, tiering and storage economics are back on the agenda. That is still not because they are inherently desired, but because of what they can deliver: namely, lower costs, better business outcomes, and a pragmatic route to cope with massive storage growth. These are the “economics of necessity” and <em>they apply just as much to the optimum placement of performance IO as they do to the maximum packing of pure capacity data</em> elsewhere in the storage hierarchy (itself served by numerous technologies, a “capacity hierarchy” if you will, including large SATA drives, deduplication appliances, removable disk, and tape). Of course, once you have a good range of media options for the tiers as well as the tools to make effective tiering feasible (and the economics compelling), users can also start looking at the “economics of opportunity.” In other words, if tiering is something that can be <em>embraced rather than avoided</em>, then additional value can be sought and overall efficiency can be increased. Tiered storage can move from something that users do because they <em>have</em> to (as little as they can get away with) to something they do because they <em>want</em> to. Prima facie, it’s obvious what high performance storage can do for demanding applications (typically those that drive high IO to handle extremely frequent transactions and/or are ultra sensitive to latency); however, what’s also invariably true is that a judicious amount of high IO and performance capacity can do wonders for driving better storage economics. Put simply, we can use extremely high performance storage (that is, solid state storage in some form or fashion) to manage costs. And with performance demands growing dramatically, driven by an online, integrated, and consolidating IT world supporting ever more IO-voracious and latency-sensitive applications, it is simply a waste of money not to use as wide a range of storage tiers as appropriate.</p>
<h2>Why Tier? The Academic Answer</h2>
<p>The storage hierarchy has been discussed for years and can be succinctly summarized as having the right data on the right device at the right time for both the data type and lifecycle requirements. This delivers value as follows:</p>
<ul>
<li>Having the right data on the right storage device at the right time matches application needs</li>
<li>Better utilization of all storage assets (which can, logically, also lower floor space needs and power needs)</li>
<li>Reduced management resources (despite, and because of, better alignment of data to tiers)</li>
</ul>
<p>All of this can help reduce costs, which—whatever else is going on in IT—remains the number one business initiative impacting IT investments, as the ESG research in Figure 1 shows.<a href="#_ftn3">[3]</a></p>
<div class="graph_top">Figure 1. Business Initiatives That Will Impact IT Spending Decisions, Three-year Trend</div>
<p><img class="aligncenter size-full wp-image-23232" title="TieredStorageF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/TieredStorageF1.png" alt="" width="620" height="443" />Part of the attraction of high IO performance tiers is that they can help with not only the number one issue (overall costs), but also the number two issue (business process improvement). Standard tools and implementations cannot deliver this dual value. By targeting the <span style="text-decoration: underline;">critical applications</span> in any given business, tiered performance storage has the potential to not only address but also to be beneficially associated with positive business outcomes. These might range from improving banking customer satisfaction, to lowering the cost of service in retail, to discovering more oil, to improving the speed and depth of data mining, to having more simulations for better weather forecasts, and to the faster testing of new products.</p>
<h2>Why Tier? The Real World Answer</h2>
<p>In today’s competitive business environment, customer satisfaction directly impacts overall success; very often, those levels of customer satisfaction could be improved if applications (online service, retail, delivery of analytics, etc.) ran faster. So far, so good. But what’s interesting is that tiered storage can make a significant contribution to increasing storage efficiency, which can in turn drive improved application performance. How? For instance, IT managers can tier their storage for efficiency by adding cheaper, higher capacity storage where high performing storage is not required; obviously, not all applications need high-end storage. The judicious use of high-performance storage tiers can therefore be a part of driving efficiency throughout the storage infrastructure. As the ESG research in Figure 2 shows, respondents clearly recognize that tiered storage is a crucial factor in improving storage efficiency. Moreover, tiers of performance-oriented storage can also contribute in other ways, such as reducing overall storage capacity by precluding the wasteful over-provisioning that is the inevitable collateral damage of the traditional methods used to fight the performance war.</p>
<div class="graph_top">Figure 2. Technologies Contributing to Improved Storage Efficiency</div>
<p><img class="aligncenter size-full wp-image-23233" title="TieredStorageF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/TieredStorageF2.png" alt="" width="615" height="401" /></p>
<h2>Where in the Hierarchy Should Tiers Exist?</h2>
<p>As is plain by now, it is logical to tier everywhere (wherever the value of doing so outweighs the cost) and yet our ability to draw a pretty hierarchical triangle has done us the disservice of allowing us to think we have been doing so. The next section explains how the self-deception has been perpetrated and how today’s tools are beginning to make true tiering something that users should demand because it’s finally becoming possible <em>not just for “regular” data, but for the performance data, too.</em> Tiering up and down the storage hierarchy is simple common sense and yet most focus to date has been at the “mid” levels, ignoring the high performance and high capacity parts of the data spectrum.<a href="#_ftn4"><strong>[4]</strong></a> In other words, most users could do with more tiers rather than fewer, especially at the top and bottom of the storage pyramid, where, respectively, performance and capacity needs are the greatest and, served optimally, can have the most impact on IT and business performance.</p>
<h1>Storage Tiering</h1>
<h2>History and Opportunity</h2>
<p>Figure 3 shows very simply, yet starkly, how we arrived where we are; it also shows why there is a need for more complete tiering at all levels of the data hierarchy.</p>
<div class="graph_top">Figure 3. The Evolution of the Storage Hierarchy</div>
<p><img class="aligncenter size-full wp-image-23231" title="TieredStorageF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/TieredStorageF3.png" alt="" width="606" height="426" /></p>
<ul>
<li>Picture 1 portrays the “perfect story” spun for decades; however, it has never properly existed and would actually be better represented …</li>
<li>… by the sort of wedding-cake diagram shown in Picture 2 in which the “steps” from one storage tier to another can be seen to be large. In the real world, such moves (such as that from disk to tape) were for decades also awkward, largely manual, and, hence, to be avoided as much as possible.</li>
<li>As shown in Picture 3, the moves were pretty much one way, almost what could be called a storage “lower-archy!”</li>
<li>Over the years, we have also added more gradations for active data (Picture 4) in terms of differing types of disk drive (speed, capacity, caching, etc.), but the top and bottom layers of the data “cake” have been pretty much left alone: solid state disks and tape.</li>
<li>What we need to do as an industry is to extend additional layers (see Picture 5) in these places, too.<a href="#_ftn5"><strong>[5]</strong></a></li>
<li>What you end up with is a gradual smoothing of the layer cake from top to bottom so that it begins to look (as Picture 6 shows) rather like the original vision we had decades prior. In some instances, it will be abundantly clear (for applications that have known, extreme IO performance or latency-sensitive needs) that certain data should reside permanently on an ultra-performance tier. Another option is to employ some form of tiering software that moves the most deserving or demanding data at any given time to the highest available storage tier. Both options have their place; the former is very specific and absolutely guarantees the best service for certain data; the latter is a more generic approach that generates a lesser degree of performance improvement across a wider amount of data. Even this distinction embodies the essence of why performance storage tiers should be employed.</li>
<li>The final element is shown in Picture 7 whereby technologies such as deduplication, thin provisioning, and solid state/caching can be applied to shrink the pyramid for a given workload.</li>
</ul>
<h1>Tiered High Performance Storage: Market Value</h1>
<p>Implementing a better approach to tiering performance data has important benefits for users:</p>
<ul>
<li><strong>Cost advantages</strong> come from greater flexibility to match specific performance to specific application needs and, more generally, from precluding the wasteful over-provisioning of today’s band-aid methods</li>
<li><strong>Management ease</strong> can come in a variety of flavors, each of which has merit; each user should evaluate which would best suit their environment, either overall or perhaps on an application-by-application basis:
<ul>
<li>One approach is to have a variety of tiers that are used transiently—either via migration or caching depending on need. This is a generic ecosystem approach that provides some improvement across all the storage and applications.</li>
<li>Another method is to have data allocated to a tier; in other words, matching a workload to an appropriate level of performance storage. This application approach delivers the certainty and consistency of maximum prescribed/available performance for that application.</li>
</ul>
</li>
</ul>
<p>Each of these approaches can have varying media within it, delivering a genuine hierarchy of performance IO layers with a range of latency and throughput characteristics. In terms of the media, current popular offerings are DRAM (protected against volatility) and various types of NAND flash, with hybrid offerings as an intriguing possibility to mix, match, and automate the placement of data within a performance tier. DRAM has the lowest latency and exceptional endurance, so that it might best serve write-intensive or latency-sensitive applications such as OLTP or DBMS; meanwhile, NAND flash (itself available in different types of capacity and performance, much as with spinning disks) is typically more suited to read-intensive applications.</p>
<p>In all industries, from finance (e.g., trading and banking) and telcos (e.g., billing, CRM and usage) to manufacturing (e.g., BI and ERP) and web hosting (e.g., rendering and analytics), there is a need for enhanced performance for some application or other.</p>
<p><strong>A Note on Virtualization: </strong>The potential for positive impact (both in an IT [SLA] and a business [ROI/TCO] sense) is obvious because, as previously mentioned, the need for performance is growing rapidly. And the current trend toward virtualization is only driving this need further and faster. ESG research<a href="#_ftn6"><sup>[6]</sup></a> found that performance issues were the third biggest storage challenge related to server virtualization for those users at the “basic” level (early stage) of virtualization; while those users in the more mature (“progressing” and “advanced”) stages of adoption have performance <em>relatively </em>under more control (in terms of it being a challenge), they have <em>absolutely </em>more need for it. Basic server virtualization users have only deployed mission-critical applications (those that would typically need performance IO) in 13% of cases, a number that grows to 37% of progressing users and 75% of advanced users. And server virtualization in general stresses storage needs overall, increasing pressure to make the storage environment as efficient as possible. Performance IO tiering could help here, as most users could uncover and release free capacity if they handled their most crucial IO better. It is no surprise that when ESG has researched users’ proclivity for tiering, it ranks higher among organizations that also cite cost reduction as a major factor impacting their storage spending.</p>
<h1>Designing Tiered High Performance Storage</h1>
<p>So far, we have observed two things:</p>
<ol>
<li>A tiered approach for highly demanding performance data makes sense in terms of the hierarchy and the simple fact that data isn’t all the same in its highly active phase, any more than it is once archived.</li>
<li>The sheer size and growth of overall IT performance expectations and burgeoning rich content make the ability to add better service levels and better economics a business imperative.</li>
</ol>
<p>So, with these two things determined, what should users be looking for—and indeed demanding from—vendors in a tiered performance-storage offering?</p>
<ul>
<li>First and foremost, the solution should be easy to use and encompass multiple tiers. This provides a range of price and performance (latency, throughput), which is the essence of tiering.</li>
<li>An implementation that does not require tuning between layers is preferable and will avoid hotspots.</li>
<li>A host of practical best-practice aspects should be sought:
<ul>
<li>What is the data protection regimen?</li>
<li>High availability options are necessary for this level of application, especially when opting for the fixed tier approach.</li>
<li>Easy and extensive scalability is clearly advantageous.</li>
<li>Self-management and single-pane-of-glass control will minimize administrative effort and complexity.</li>
</ul>
</li>
</ul>
<h1>The Bigger Truth</h1>
<p>A lengthy closing argument here could be construed as “over-egging the pudding.” Given the scale of the performance-hungry data universe and the need for all users to find economic efficiency wherever they can, <em>having a tiered IO high-performance data layer just makes plain common sense. </em>The only small caveat is that the industry must deliver (and users must demand) it in a manner that is easy to implement.</p>
<p>While we all “get” the idea of data lifecycles and the various demands of different applications, trying to implement and serve them without multiple storage tiers is sub-optimal at best and financially reckless at worst. And as service demands, budget constraints, and competitive pressures increase, tiered performance layers in the storage hierarchy are moving from a common-sense IT improvement to an absolute business imperative. Hopefully, users and the vendor community are paying attention.</p>
<hr size="1" /><a name="_ftn1">[1]</a> This is a generalization and does not hold true for all cases, such as with databases where the metadata is accessed constantly and the actual data not necessarily so.</p>
<p><a name="_ftn2">[2]</a> NOTE &#8211; This paper is focused on the considerable logic and value of having varying levels of high-performance IO on varying tiers of media—all the way from cached disk, through various types of flash drives, and up to DRAM-based devices. Sometimes the data placed on such tiers is temporary (automatic storage tiering) and sometimes it is permanent (this is the concept of matching certain data to the right type of solid state storage). Both approaches are valid: the key point is that storage tiers/tiering should be focused on the needs of the data and applications and therefore should be about a lot more than just FC versus SATA versus tape.</p>
<p><a name="_ftn3">[3]</a> Source: ESG Research Report, <em><span style="text-decoration: underline;">2011 IT Spending Intentions Survey</span></em>, January 2011.</p>
<p><a name="_ftn4">[4]</a> While this paper concentrates on tiering at the high performance level, a companion ESG Market Report was published to address the corollary need for tiering at the archival level: ESG Market Report, <a href="../../../../../2011/04/the-logic-and-value-of-a-tiered-archive-tiering-across-more-of-the-storage-hierarchy/" target="_blank"><em>The Logic and Value of a Tiered Archive</em></a>, April 2011.</p>
<p><a name="_ftn5">[5]</a> Ibid.</p>
<p><a name="_ftn6">[6]</a> Source: ESG Research Brief, <a href="../../../../../?p=18937" target="_blank"><em>Introducing the ESG Server Virtualization Maturity Model</em></a>, November 2010.</p>
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/06/the-logic-and-value-of-tiers-of-high-performance-storage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Structured Data Reference Model: Managing Databases in the Efficient Data Center</title>
		<link>http://www.enterprisestrategygroup.com/2011/06/structured-data-reference-model-managing-databases-in-the-efficient-data-center/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/06/structured-data-reference-model-managing-databases-in-the-efficient-data-center/#comments</comments>
		<pubDate>Wed, 29 Jun 2011 13:35:09 +0000</pubDate>
		<dc:creator>Julie Lockner</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Julie Lockner]]></category>
		<category><![CDATA[Market Reports]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23206</guid>
		<description><![CDATA[In today’s 24/7 global economy, business requires that IT deliver systems that are online and available all of the time. This “Now” culture is pushing real-time operational systems to their limits. While IT organizations need to invest in systems that meet business needs, they also need to be conscious of the fact that while budgets [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">In today’s 24/7 global economy, business requires that IT deliver systems that are online and available all of the time. This “Now” culture is pushing real-time operational systems to their limits. While IT organizations need to invest in systems that meet business needs, they also need to be conscious of the fact that while budgets are growing, they may not be growing as fast as requirements. Management is looking to eliminate waste in key business processes as a way to cut cost out of inefficient operations.</div>
<private_standard>
<h1>Business Process Improvement: A Key Priority</h1>
<p>Business process improvement (BPI) aims to reduce waste or variation in a business process, which, clearly, can result in better resource utilization. According to recent ESG research, BPI is high on the priority list for organizations of all sizes: 33% of respondents believe that BPI will have the greatest impact on their organization’s spending decisions over the next 12-18 months<a href="#_ftn1">[1]</a> (see Figure 1).</p>
<div class="graph_top">Figure 1. Business Initiatives that will Have the Greatest Impact on IT Spending</div>
<p><img class="aligncenter size-full wp-image-23209" title="StructuredDataF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF1.png" alt="" width="626" height="462" />One such process that could stand to be improved upon is the management of structured data held within databases. Many organizations use multiple databases across various departments to maintain and process customer data, financial information, and employee records, and to leverage data for customer relationship management, enterprise resource planning, and business intelligence. The inconsistent data management strategies that result are a common malady that can have a significant impact on infrastructure costs. With cost reduction initiatives often superseding business priorities, addressing database management can drive significant efficiencies into and cost out of the data center.</p>
<p>Effective data management can ensure data is properly leveraged while minimizing costs. Many businesses focused on transforming IT into a service bureau are held back by an inability to efficiently manage database applications. They need a comprehensive view of current challenges and an understanding of the solutions available as well as the frameworks that would allow them to address these challenges methodically and cost effectively. Data governance and a structured data reference model can help. As part of an overall data governance initiative, applying common best practices can help an organization achieve its goal of improving process efficiency while controlling costs.</p>
<h2>Data Governance</h2>
<p>As data is acquired, transacted, and acted upon to make informed business decisions, it must be stored properly, secured in accordance with corporate and industry requirements, protected in ways that deliver both high availability and business continuity, retained for varying periods of time, and ultimately archived and/or deleted. A corporate data governance framework can provide the communication guidelines, tools, processes, and organizational structure to facilitate proper management of data in accordance with company policy (see Figure 2).</p>
<div class="graph_top">Figure 2. ESG’s Data Governance Framework</div>
<p><img class="aligncenter size-full wp-image-23210" title="StructuredDataF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF2.png" alt="" width="596" height="334" />Business requirements provide the context for data creation, access, and use. Business processes, key performance indicators (KPIs), and employee roles and responsibilities are essentially business strategy translated into actionable tasks and items.</p>
<p>IT infrastructure is the technical component, composed of servers (processing), networks (distributing), and storage (residence), that facilitates information management at the electronic layer. Business applications play a key role in turning data into information. Data management encompasses IT management and control of the structure in which data is processed and delivered.</p>
<p>In essence, data and IT governance become the Rosetta Stone enabling translation between the needs of the business and information technology requirements. With the proper tools, processes, and a governing body in place to ensure data is managed in accordance with corporate policies and goals, alignment between business and IT can be achieved.</p>
<h1>The Structured Data Reference Model</h1>
<p>The Structured Data Reference Model provides a high-level depiction of the common applications and supporting IT systems that have an impact on database data (see Figure 3). While this may vary from organization to organization and from application to application, generally speaking, four data lifecycles are initiated when data is created: transaction processing, reporting and analytics, backup or disaster recovery, and application testing and development.</p>
<div class="graph_top">Figure 3. “My Data” and the ESG Structured Data Reference Model</div>
<p><img class="aligncenter size-full wp-image-23211" title="StructuredDataF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF3.png" alt="" width="614" height="361" />This illustration only represents a single data flow or business process; organizations typically have multiple business processes supported by multiple database applications with interchanging data flows. To appreciate the challenges IT departments face in building the right database architecture and developing maintenance processes as new applications age, it is helpful to understand two key aspects:</p>
<ul>
<li>The lifecycle of the data held in a database</li>
<li>Infrastructure requirements as the database changes over time</li>
</ul>
<p>A good way to get a sense of the complete application data lifecycle is to follow a structured data set from beginning to end. This report will describe an e-commerce transaction example using a data set called “My Data.”</p>
<h2>My Data Lifecycle</h2>
<p>We follow My Data as it is created, managed, copied, protected, used for multiple purposes, archived, and deleted. Along the way, this paper describes what happens to the data, what IT must be concerned with, and common approaches to addressing those challenges based on the desire to run an efficient data center.</p>
<p>When data is initially created and stored in the transaction processing application database (see Figure 4), it may be secured, updated, audited, corrupted and then restored, destroyed and then recovered, reported upon, and recorded (becoming “read-only” data). As it ages, the data may be archived, deleted, or left to accumulate in the production database, taking up space.</p>
<div class="graph_top">Figure 4. Transaction Processing Database Data</div>
<p><img class="aligncenter size-full wp-image-23212" title="StructuredDataF4" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF4.png" alt="" width="610" height="258" />Copies of production data exist in the reporting and analytics database and in backup and D/R copies, as well as in each and every copy made for test and development purposes. Copies of My Data used for business intelligence (BI) in decision support processes require technology to copy the data from the source transaction processing applications to a data warehouse and/or analytical database application.</p>
<div class="graph_top">Figure 5. Business Intelligence Database Data</div>
<p><img class="aligncenter size-full wp-image-23213" title="StructuredDataF5" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF5.png" alt="" width="608" height="252" />Depending on the format of the source data and the target BI system, My Data may need to be secured, processed, transformed, integrated, aggregated, and summarized for standard reports or dashboards. If My Data is going to be used for data analytics, it may be loaded into an analytics platform where it will be checked for patterns, analyzed, and modeled; the output of that process may be a new marketing campaign or customer retention program that is rolled into a production online transaction processing application.</p>
<p>All of this refers to production applications; these systems will most likely have a data protection plan as part of best practices. Backup and disaster recovery are designed to restore corrupted data or recover destroyed data (see Figure 6). My Data can be backed up, replicated, secured or encrypted, stored on lower cost media such as tape, compressed, archived, and eventually deleted.</p>
<div class="graph_top">Figure 6. Data Protection: Backup and Disaster Recovery Copies</div>
<p><img class="aligncenter size-full wp-image-23214" title="StructuredDataF6" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF6.png" alt="" width="620" height="240" />Many organizations retain several backup copies in addition to the remote location copy. As database data volumes grow, so does the size of each and every backup copy. For every production database application, best practices assume that development organizations will have a copy of each environment for test and development purposes (see Figure 7).</p>
<div class="graph_top">Figure 7. Test and Development Copies</div>
<p><img class="aligncenter size-full wp-image-23215" title="StructuredDataF7" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF7.png" alt="" width="607" height="246" />Each of these test and development copies may contain full copies or subsets of My Data. Test data may be secured or masked, deployed on tiered infrastructure, restored to a point in time, or deleted.</p>
<p>Organizations looking to deploy tiered IT services or provide IT-as-a-Service can use this framework as a starting point for classifying and aligning business needs with infrastructure capability. Within each lifecycle, data may have its own unique performance, reliability, and availability requirements that change over time.</p>
<h1>Key IT Focus Areas</h1>
<p>Organizations have experienced issues and challenges associated with key data events, processes, and applications over the course of the data lifecycle. The balance of this report will cover each area of concern, describe the problem statement, and review common and emerging solutions that address these challenges.</p>
<h2>Data Creation</h2>
<p>The data lifecycle begins at creation, usually with a defined and measurable business process; for example, an online shopper confirming the purchase of an item. Information is gathered and stored to record the customer’s order, move inventory, ship the item, process payment, etc. This data, which, again, we call “My Data,” may include personally identifiable information (PII) such as the customer’s name, billing and shipping addresses, credit card number, item selected, and shipping method. As soon as an organization assumes the responsibility of storing PII, it may be subject to regulatory compliance requirements focused on protecting this data. This drives many service level requirements that translate directly to high security IT features such as data encryption.</p>
<p>My Data is created and securely stored in a production database. This data is of the highest value—it initiates a purchase from the online vendor and therefore must be stored in a secure, high performance, highly available database environment. The environment must be secure to avoid exposing the PII and leaving the vendor vulnerable to legal action. It must be highly available and high performing because customers expect swift transactions and may abandon their purchases if they encounter sluggish response times or process failures. Fast transaction times also enable more transactions per minute, maximizing revenue.</p>
<div class="graph_top">Figure 8. The Lifecycle of My Data: Creation</div>
<p><img class="aligncenter size-full wp-image-23216" title="StructuredDataF8" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF8.png" alt="" width="598" height="155" />My Data may be accessed frequently once it is created. For instance, the online vendor may run a report on open orders that have not shipped, orders placed by customers in various regions, etc. My Data will be used in these reports and distributed to business users as needed. The response time of those reports may be demanding as any disruption to response time could be detrimental to the business process.</p>
<h3>Key IT Concerns</h3>
<p><strong>Security. </strong>When My Data is created, the IT department must be concerned first and foremost with security. IT must ensure the production system is not vulnerable to attack or failure. Highly publicized infrastructure security breaches (in some cases resulting in total business failure) are pushing organizations to take steps to secure their databases, according to upcoming ESG research on database security solutions.<a href="#_ftn2">[2]</a> Many organizations are looking for insight into their database environments so they can identify vulnerabilities, assess data management risks, and ultimately add to or upgrade the infrastructure accordingly.</p>
<p><strong>Growth. </strong>As more data is created and stored in the database, IT must be aware of the impact database growth has on its ability to support the business. If the database outgrows the supporting infrastructure, it can compromise performance, availability, and IT’s ability to maintain SLAs. Queries may take too long and reports may not be completed in time. Adding compute resources or assigning valuable skilled database administrators to constantly tune an application is not always the answer to performance problems associated with data growth. Plus, growth challenges are multiplied as databases are copied for protection, business intelligence, and reporting.<br />
IT needs to proactively monitor and manage databases if it is to stay on top of data growth challenges. Because managing data growth is a top IT priority, it’s no surprise that there are many solutions available to help reduce the amount of data being stored or data’s storage footprint. Some increase storage efficiency, classify data for archiving, and/or automatically move data to lower tiers.<a href="#_ftn3">[3]</a></p>
<h2>Data Protection</h2>
<p>Depending on the criticality of the business process, the next activity that may impact My Data is replication for disaster recovery (DR) and business continuity. This occurs immediately after the data has been created so that any issue, whether it is a system failure or user error, will not result in data loss. According to ESG research, when asked which applications would be considered IT’s top priority from a data protection perspective, databases ranked the highest.<a href="#_ftn4">[4]</a></p>
<p>For data to be recoverable in the event of a disaster, it must be replicated to a separate location so that a site-wide disaster such as a fire or flood does not destroy every copy. Replication is done either synchronously or asynchronously, depending on tolerance for time to recovery and point of recovery: as data is written to the production database, it is either written to the disaster recovery site at the same time (synchronously) or written with a lag (asynchronously). DR sites commonly use security settings and infrastructure tiers that are equivalent to the production site so that if the production server fails, the DR site can immediately take over processing data and deliver the same service levels; this supports fast recovery and high availability.</p>
<div class="graph_top">Figure 9. My Data is Replicated for Disaster Recovery</div>
<p><img class="aligncenter size-full wp-image-23217" title="StructuredDataF9" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF9.png" alt="" width="597" height="180" />Another important data protection strategy is backup, which differs from disaster recovery and business continuity in that copies of data are made to be used to restore the original in the event of data loss. While disaster recovery and business continuity methods are selected for instant failover or fast recovery, backup methods are generally selected based on their ability to either create point-in-time snapshots of an application or provide long-term retention at the lowest possible cost. In the event of a local system failure or user error, backup copies can be used to restore the database to a point before the problem occurred. Some sites use tape-based backup, which is low cost, but restores are more complicated and time-consuming. Others back up to disk in order to speed restore time, but at greater expense. Organizations may choose to leverage tier-3 infrastructure for backup data to minimize costs over time. This is where the data service catalog is helpful—it enables the business to communicate to IT its expectations for recovery in the case of a failure or disaster event.</p>
<h3>Key IT Concerns</h3>
<p><strong>Service Levels. </strong>Data protection service levels are determined by the amount of data that can be lost without significant business impact (the recovery point objective, or RPO) and the amount of sustainable downtime (the recovery time objective, or RTO). Business and IT managers must come together to decide on acceptable service levels, which in turn determine DR and backup methods and strategies. Without the business’s input, IT may over- or under-protect data, which could result in spending more than necessary on software licenses and infrastructure or incurring unnecessary risk.</p>
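<p>As a rough illustration of how RPO and RTO can be checked against an actual outage, the sketch below compares the data loss window and the downtime with the agreed targets. The function name, the timestamps, and the 15-minute and one-hour targets are hypothetical values chosen for the example, not figures from ESG research.</p>

```python
from datetime import datetime, timedelta

def meets_service_level(last_replicated, failure_time, restored_time, rpo, rto):
    """Compare one outage against hypothetical RPO and RTO targets."""
    # Work written after the last replicated change is lost.
    data_loss_window = failure_time - last_replicated
    # Time during which the service was unavailable.
    downtime = restored_time - failure_time
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": rpo >= data_loss_window,
        "rto_met": rto >= downtime,
    }

# Illustrative numbers only: asynchronous replication running 10 minutes behind,
# and a recovery that took two hours.
failure = datetime(2011, 6, 1, 12, 0)
result = meets_service_level(
    last_replicated=failure - timedelta(minutes=10),
    failure_time=failure,
    restored_time=failure + timedelta(hours=2),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(result["rpo_met"], result["rto_met"])
```

<p>In this example the replication lag stays inside the RPO, but the recovery time exceeds the RTO, which would suggest revisiting the failover method rather than the replication method.</p>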
<p><strong>Replication Method</strong>. Several methods are available for synchronous replication, including:</p>
<ul>
<li><em>Disk- or block-based replication</em> &#8211; leverages storage technology to move every changed disk block to the DR site using the SAN.</li>
<li><em>Network- or host-based replication</em> &#8211; leverages file-based attributes to determine when a change has been made to the database and therefore must be replicated across the network.<a href="#_ftn5">[5]</a></li>
<li><em>Application-based replication</em> &#8211; uses intelligence built into the database server that identifies when the database record has changed and copies that record to another networked database.</li>
</ul>
<p>When selecting the replication method, organizations consider cost; the performance impact to servers, storage, and networks that can interrupt other production activities; the amount of time it takes to fully restore data from the disaster recovery site back to the production database; and the level of management effort required to accomplish these tasks. Additionally, when looking at a database application technology stack, it is important to ensure that the replication approach is application-aware. If data is replicated from one server to another using a block-based approach and the application server data is replicated using a file-based approach, the two processes must be coordinated to maintain consistency between the replicated sites. Similarly, if one set of data is replicated synchronously and another asynchronously, additional measures must be in place to avoid data loss. All of this factors into an organization’s ability to improve business processes while minimizing expense.<a href="#_ftn6">[6]</a></p>
<p><strong>Backup Schedules and Tools. </strong>Database backups are retained for varying lengths of time, in some cases indefinitely. Most organizations run weekly full backups and daily incremental backups. Organizations must evaluate their environments and select the appropriate backup method; options include software-only solutions, backup appliances, backup to the cloud,<a href="#_ftn7">[7]</a> and features such as compression and deduplication designed to reduce the total data footprint.</p>
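<p>To make the weekly full plus daily incremental schedule concrete, the following sketch lists the backups that would be needed to restore a database to a given day. It assumes the full backup runs on Sunday and that each incremental captures changes since the previous backup; both are assumptions for illustration, and <code>restore_chain</code> is a hypothetical helper, not part of any backup product.</p>

```python
from datetime import date, timedelta

def restore_chain(target, full_backup_weekday=6):
    """Return the backups needed to restore to `target`.

    Assumes one full backup per week (Sunday by default, weekday 6)
    and one incremental backup on every other day.
    """
    days_since_full = (target.weekday() - full_backup_weekday) % 7
    full_day = target - timedelta(days=days_since_full)
    chain = [("full", full_day)]
    # Every incremental taken after the full backup must be applied in order.
    for offset in range(1, days_since_full + 1):
        chain.append(("incremental", full_day + timedelta(days=offset)))
    return chain

chain = restore_chain(date(2011, 6, 8))  # a Wednesday
for kind, day in chain:
    print(kind, day.isoformat())
```

<p>The later in the week a failure occurs, the longer the chain becomes, which is one reason restore time, and not only backup time, belongs in the service level discussion.</p>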
<h2>Data Warehouses and Business Intelligence</h2>
<p>Soon after My Data has been replicated for disaster recovery, it is often copied to a data warehouse for business intelligence and reporting. Tools such as extract, transform, and load (ETL) or enterprise application integration (EAI) are used to copy, manipulate (if necessary), and enter My Data into the data warehouse or other analytical platform. As more organizations recognize the value of operational or real-time business intelligence, these systems become somewhat mission-critical. It is not unusual to see business requirements drive the need for tier-1 service levels focusing on security, protection, and high availability. Some organizations may justify the need to apply disaster recovery strategies to their data warehouses as well. As these data warehouses/analytical data volumes grow to enormous sizes, they may be referred to as “big data” sets.</p>
<div class="graph_top">Figure 10. My Data is Replicated to a Data Warehouse for Business Intelligence &amp; Reporting</div>
<p><img class="aligncenter size-full wp-image-23218" title="StructuredDataF10" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF10.png" alt="" width="602" height="150" />The ability to run analyses and reports on My Data presents the opportunity to learn more about customers’ habits, patterns, and desires, allowing organizations to tailor sales and marketing efforts accordingly. This can maximize revenue while using data for multiple purposes. In many cases, information is summarized or only portions of the data are copied for analysis. Some analytical tools copy an entire database into the data warehouse, whereas others only copy changed data. Online analytics are run against My Data, including sales forecasts, sales region performance metrics, etc. Reports (generally marked as “read only”) are generated and distributed from the data warehouse to interested parties across the organization.</p>
<h3>Key IT Concerns</h3>
<p><strong>Integration.</strong> When data is replicated for reporting or analytics, business analysts and data architects need to understand how new data will merge with existing data sets. Replicating and integrating data for analysis can be a daunting task when new data sources are added on a regular basis, when the data types vary, when the data volumes are significant, and when the data needs to be analyzed closer to real time. When data integration needs span several applications and business processes, controlling master data becomes even more crucial. As data integration requirements become more complex, architects need to ensure that the data integration solutions they choose do not sacrifice business agility.</p>
<p><strong>Quality.</strong> As more data sources come online and need to be integrated into an analytics platform, the odds of data quality issues multiply. The cliché, “garbage in, garbage out,” is unreservedly applicable in a decision support system. Organizations need to make sure that decisions are being made on data that is accurate and trustworthy.</p>
<p><strong>Growth.</strong> As mentioned previously, the expansion of the production database is multiplied by the number of copies created for tasks such as analysis and reporting. IT must keep an eye on database size and take action accordingly.</p>
<h2>“Big Data” Analytics</h2>
<p>As new data sets come online or existing data sets exceed traditional analytical platform capabilities, organizations may need to consider next generation analytics platforms designed to handle data analytics at massive scale. These “big data” systems are commonly deployed in combination with an existing data warehouse and business intelligence framework. Data integration tools may be used to copy and load My Data into an analytical database platform; operations such as regression analysis and predictive analysis can identify patterns and provide insight from masses of data. These analyses can then be used to implement real-time triggers, such as initiating a fraud detection event, identifying an opportunity for marketing in a particular region, or targeting product advertising to population segments.</p>
<div class="graph_top">Figure 11. My Data is Replicated for Big Data Analytics</div>
<p><img class="aligncenter size-full wp-image-23219" title="StructuredDataF11" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF11.png" alt="" width="615" height="187" />Going back to the e-commerce example, the online vendor might want to capture individual customer click stream data and detailed purchase history in order to develop a predictive shopping cart abandonment model. That model would then be operationalized in the e-commerce application by strategically placing ads to incent customers to follow through on a purchase. For these analytics, traditional systems may be unable to scale to the levels required and still remain both practical and cost effective; deteriorating application performance can occur with large data sets, but making additional compute and storage investments may not be possible due to budget constraints. Big data analytics systems are designed to address this type of complex analysis quickly and at scale, but require an initial material investment. Cloud-based solutions are available for organizations that want to mitigate the costs and risks of deploying on-site infrastructure for these large data sets.<a href="#_ftn8">[8]</a></p>
<h3>Key IT Concerns</h3>
<p><strong>Big Data Systems. </strong>The impetus to implement big data systems often arises due to performance and data capacity challenges that occur when traditional systems strain under the burden of massive growth and more detailed analytics. Customers must weigh the costs and benefits of big data systems and consider implementation options. Do they want an architecture based on symmetric multiprocessing (SMP) using shared resources or “shared-nothing” massively parallel processing (MPP) nodes? Are they better off with proprietary systems or commodity components? Equally important, customers need to ensure that they have the right skills in-house if they are to take advantage of these systems for business benefit. Architects should become familiar with emerging technologies and open source solutions, but it would be wise to tread carefully in this nascent market.</p>
<p><strong>New Data Types.</strong> With new data sources comes the possibility of new data structures to be analyzed. Unstructured content, such as web-log click stream data, does not lend itself to relational database analytical techniques; Map/Reduce frameworks may offer an alternative approach that is more effective. However, if every new data source brings a new data type, architects need to be concerned about the flexibility of their analytics platforms. Deploying a new platform for every new data type is not practical.</p>
<h2>Database Activity Monitoring</h2>
<p>As part of a standard business process, after data is created, there may be a period of time during which it can be altered or changed. Examples of this might include changes to an online order before it ships, updates to a registered user’s profile data, or modifications to stored credit card information. These changes need to be monitored and consistently replicated to DR or backup copies.</p>
<p>In some situations, changes to specific data may require an audit history log. Using the e-commerce example, customers may change their stored default credit card number because they are switching to a new card or replacing an expired card. When the customer alters their credit card information, an audit trigger may be flagged on the database credit card table to capture when the change was made, who made it, and what it was.</p>
<div class="graph_top">Figure 12. My Data is Updated and Audited</div>
<p><img class="aligncenter size-full wp-image-23220" title="StructuredDataF12" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF12.png" alt="" width="609" height="127" />The change history is logged and can be archived to adhere to corporate policies or regulatory requirements.</p>
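<p>The audit trigger described above can be sketched with SQLite, which ships with Python. The table names, the <code>app_session</code> table used to record who is making the change, and the decision to log only the last four digits are all assumptions made for this example; a production database would use its own auditing features and user context. The card numbers are well-known test values, not real accounts.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credit_card (
    customer_id INTEGER PRIMARY KEY,
    card_number TEXT NOT NULL
);
-- Hypothetical one-row table holding the identity of the current application user.
CREATE TABLE app_session (user_name TEXT);
CREATE TABLE credit_card_audit (
    customer_id INTEGER,
    old_last4   TEXT,
    new_last4   TEXT,
    changed_by  TEXT,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Capture when the change was made, who made it, and what it was.
CREATE TRIGGER audit_card_update
AFTER UPDATE OF card_number ON credit_card
BEGIN
    INSERT INTO credit_card_audit (customer_id, old_last4, new_last4, changed_by)
    VALUES (OLD.customer_id,
            substr(OLD.card_number, -4),
            substr(NEW.card_number, -4),
            (SELECT user_name FROM app_session));
END;
""")

conn.execute("INSERT INTO app_session VALUES ('customer_42')")
conn.execute("INSERT INTO credit_card VALUES (42, '4111111111111111')")
conn.execute(
    "UPDATE credit_card SET card_number = '5500005555555559' WHERE customer_id = 42"
)
rows = conn.execute(
    "SELECT customer_id, old_last4, new_last4, changed_by FROM credit_card_audit"
).fetchall()
print(rows)
```

<p>Because the trigger runs inside the same transaction as the update, it adds work to every change on the audited table, which is the performance trade-off to weigh when choosing what to audit.</p>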
<h3>Key IT Concerns</h3>
<p><strong>Security and Protection.</strong> When certain information is classified as sensitive data or has auditing requirements associated with its history, developers should consider a solution that captures all necessary information without impacting application performance. Auditing adds processing overhead, which may severely impact production performance.</p>
<p><strong>Litigation or Regulatory Response. </strong>Data that is mandated for retention under regulatory guidelines or that is likely to be considered evidence in litigation warrants special consideration for access, documentation, preservation, and potential retrieval and production.</p>
<h2>Marking Data as “Read Only”</h2>
<p>Many business processes are complete when no more changes to the data can be made and the business transaction is recorded. For example, when the online shopper completes the order, the credit card has been charged, and the product has been shipped, the application marks the order status as “closed” to indicate that My Data can no longer be altered. My Data is now marked as “read only.”</p>
<p>Marking data as read only is important to classification of application data. Until this point in the data lifecycle, all changes made to My Data are replicated and reported on in real time. Once the order is closed, no more changes will be made and, therefore, different policies can be applied. For example, My Data no longer needs to be repeatedly backed up, replicated for disaster recovery, or re-loaded into the data warehouse.</p>
<h3>Key IT Concerns</h3>
<p><strong>Operational Efficiency. </strong>When the business can clearly articulate the point at which data becomes read only, IT can leverage that knowledge to implement significant operational efficiencies. For example, in the case of database backups, policies can be applied to no longer perform backups on data that doesn’t change. This can reduce the amount of time it takes to complete backup processes and minimize storage capacity needs.</p>
<p><strong>Data Growth. </strong>As read-only data accumulates in production and backup copies, storage capacity requirements grow. Compression or deduplication solutions offer a promise of reduced storage footprint (if this is an option being evaluated, be sure to realistically assess data access performance expectations). These solutions may not perform well in application environments requiring high I/O throughput.</p>
<h2>Data Tiering</h2>
<p>Information lifecycle management (ILM) strategies aim to align IT infrastructure with the business value of information as it changes over time to optimize resources and assets at the lowest possible cost. ILM is a key component of efficient, effective data governance.</p>
<p>Tiering matches business requirements with service levels that translate into infrastructure with different cost aspects. For example, tier-1 infrastructure characteristics (including equipment and configurations) offer the highest service levels (such as performance and availability), but this premium category usually comes at a higher price point. As data ages or is copied for various purposes, the business requirements that drive infrastructure choices may change. Organizations can drive cost out of the data center while optimizing the use of premium service assets by leveraging multiple tiers of service. These tiers are typically presented in a data services catalog that can be used as a tool to translate business requirements into IT services with corresponding price points. A data services catalog is an important component in a data governance program.</p>
<p>The sorted colored boxes in the reference model (see Figure 13) indicate high level classes of service that demonstrate one way to implement a tiered services framework based on the activities with which a data set is associated.</p>
<div class="graph_top">Figure 13. Tiering Database Data</div>
<p><img class="aligncenter size-full wp-image-23221" title="StructuredDataF13" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF13.png" alt="" width="618" height="270" />Tiers are usually defined by server processor power, storage disk speeds, service level agreements (SLAs) such as promised percentages of uptime and data accessibility, RAID configurations, replication methods, etc. As Figure 13 illustrates, data requiring tier-1 services may not have the same capacity requirements as tier-2 and -3 data. Organizations that have a good handle on their data requirements can optimize their tiered-storage or server capacity, offering performance when needed at the right price point. Requirements for tiered data services can change as time passes, as data flows through a business process, or as project-based activities require access to a certain set of data for a period of time. Understanding the business-related dynamics for data usage gives IT a significant advantage when designing tools such as a data services catalog and architecting a tiered services deployment model.</p>
<p>Tiering can be accomplished in many ways. Data can be moved to less expensive servers with slower processing speeds or to storage environments with slower disks, fewer advanced features, less automation, etc. Data can also be moved to a lower SLA tier, such as moving from RAID 1 (where data is protected with mirroring, doubling disk requirements) to RAID 5 (where data is protected with parity striping to use fewer disk resources). Generally speaking, any environment that has slower performance or reduced data availability can constitute an infrastructure tier; these lower tiers may or may not maintain the same high level of security and auditing as production tiers. Moving data from one tier to another can be accomplished either through the database application at the file level or within the storage array independently. Those options depend on the type of storage environment that is deployed.</p>
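<p>The capacity difference between the two RAID levels mentioned above can be worked through with simple arithmetic. This is an idealized sketch: it ignores hot spares, formatting overhead, and vendor-specific layouts, and the eight 1 TB disks are an assumed configuration rather than a recommendation.</p>

```python
def usable_capacity_tb(disk_count, disk_size_tb, raid_level):
    """Usable capacity for idealized RAID 1 and RAID 5 layouts."""
    if raid_level == 1:
        # Mirrored pairs: every block is stored twice, so half the raw capacity is usable.
        return disk_count * disk_size_tb / 2
    if raid_level == 5:
        # Single parity: one disk's worth of capacity is consumed by parity.
        if 3 > disk_count:
            raise ValueError("RAID 5 needs at least three disks")
        return (disk_count - 1) * disk_size_tb
    raise ValueError("only RAID 1 and RAID 5 are modeled here")

raid1 = usable_capacity_tb(8, 1.0, raid_level=1)
raid5 = usable_capacity_tb(8, 1.0, raid_level=5)
print(raid1, raid5)
```

<p>With the same eight disks, mirroring yields 4 TB of usable space and parity striping yields 7 TB, which is why moving aged data from RAID 1 to RAID 5 frees capacity at the cost of write performance and rebuild behavior.</p>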
<h3>Key IT Concerns</h3>
<p><strong>Tiering Options.</strong> It is important to have a clear understanding of data’s usage at this stage. If it is eligible for archiving, data can be moved out of the production environment and into an archive environment. But how to know what data is eligible? Data classification tools can provide guidance and recommendations to help define policies for database tiering and archiving. There are also many database archiving options available from applications and through third parties. These provide the ability to selectively identify, relocate, delete, or archive database data in the context of the business application; end-user access can be maintained either through the native application or with a comparable user interface. Data can be archived in various types of target repositories depending on an organization’s specific needs.<a href="#_ftn9">[9]</a></p>
<p>There are also “in-database” tiering solutions. These complement native database table structures and partitioning features by isolating data within the database object as “hot” or “cold” based on policies. Some of these solutions automatically move hot data to one partition and cold to another; others automatically move a partition to a different location in the underlying storage infrastructure to maximize performance or optimize capacity.<a href="#_ftn10">[10]</a> There are also storage-based solutions that seamlessly move database blocks between different disk media based on policies.<a href="#_ftn11">[11]</a></p>
<p>While tiering data offers significant performance and cost benefits, it is important to select a strategy that will work in context of the business application. Not doing so could introduce the opposite effect: reduced performance and poorly utilized resources.</p>
<h2>Copying Data for Test and Development</h2>
<p>To support the production environment, IT makes copies of My Data (and subsets of My Data) for patching, test, development, training, and sandbox prototypes. This allows them to improve processes, test upgrades before full implementation, train new staff, etc. using data that is as close as possible to live production data. In fact, most organizations make an average of six copies of data. This means that if My Data (including sensitive data such as addresses and credit card numbers) was in production, it is included in all six data copies. Additionally, the method used to create non-production copies can impact how long it takes to deploy new application functionality.</p>
<div class="graph_top">Figure 14. My Data is Copied for Testing</div>
<p><img class="aligncenter size-full wp-image-23222" title="StructuredDataF14" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF14.png" alt="" width="614" height="224" /></p>
<h3>Key IT Concerns</h3>
<p><strong>Security. </strong>With multiple copies of sensitive data in the data center, security is of the utmost importance; this process represents the biggest insider theft threat. However, many test and development environments are built on tier-2 infrastructure with relaxed security. Database administrators (DBAs) need access to data and system accounts in order to complete tests on the copies, opening up a security threat. Sensitive data should not be moved to insecure areas; if My Data is not removed before that transition, it should be masked.</p>
<p><strong>Resources. </strong>Many organizations make full copies even when they don’t need them, consuming vast amounts of storage resources. One way to save money is to consider the types of copies needed, using full copies only when necessary.</p>
<p>The method of creating non-production copies is another important decision, often based on the time it takes to create them as well as the resources consumed by each copy. Can some of the copies be made with data snapshots which use pointers to production data and therefore far less storage? Can IT leverage cloning and snapshot functionality within the storage array to offload the copy process from network and server resources? With these options, IT can often maintain production performance and availability while minimizing copy time and impact to CPU resources and network bandwidth.</p>
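<p>A back-of-the-envelope comparison shows why pointer-based snapshots are attractive for non-production copies. The sketch assumes a 500 GB production database, six copies, and that each snapshot ends up storing 5 percent of the production blocks as changes; these figures and the function name are hypothetical, and real snapshot overhead depends on the array and the workload.</p>

```python
def copy_footprint_gb(production_gb, copies, changed_percent=None):
    """Estimate storage consumed by non-production copies.

    changed_percent=None models full physical copies. Otherwise each copy
    is treated as a pointer-based snapshot that stores only the given
    percentage of blocks (an idealized assumption).
    """
    if changed_percent is None:
        return production_gb * copies
    return production_gb * copies * changed_percent / 100

full_copies = copy_footprint_gb(500, copies=6)
snapshots = copy_footprint_gb(500, copies=6, changed_percent=5)
print(full_copies, snapshots)
```

<p>Under these assumptions, six full copies consume 3,000 GB while six snapshots consume 150 GB, so the choice of copy method can matter more than the number of copies.</p>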
<p>Finally, IT must have a plan to ensure there is adequate storage capacity to handle the copies needed and to expand them easily when necessary. Some storage solutions expand easily, quickly, and online while others require disruptive downtime. In addition, IT should match storage features with the activities to be performed, utilizing array automation when possible to optimize resources.</p>
<p><strong>Virtualization.</strong> Virtualization technologies promise dramatic resource utilization and efficiency benefits. IT organizations need to have a solid understanding of the vendor support issues in a virtualized server environment before standardizing on a particular platform.<a href="#_ftn12">[12]</a></p>
<h2>Masking and Testing Data Sets</h2>
<p>Masking is a method of obfuscating the sensitive portions of My Data such as names, addresses, credit card numbers, and data related to corporate financial and human resource departments. For example, name and address information can be randomized to hide the customer’s identity and credit card information can be changed or scrambled in such a way that protects the card number identity while passing credit card number validations. Masked data is used for functional tests for software patches and upgrades, performance tests to evaluate new hardware implementations, staff training, etc. Copies are refreshed by restoring backups (from tape or disk) or with sophisticated network- or storage-based replication techniques. Creating copies of an application can be a time-consuming and complex process, as can restoring from tape-based backups. Restoring from disk backup and leveraging storage and network replication methods are generally much faster and have dramatically increased an organization’s ability to make use of its data copies.</p>
<div class="graph_top">Figure 15. My Copied Data is Masked and Tested</div>
<p><img class="aligncenter size-full wp-image-23223" title="StructuredDataF15" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF15.png" alt="" width="605" height="212" /></p>
<h3>Key IT Concerns</h3>
<p><strong>Security. </strong>Again, the key concern here is ensuring that sensitive data is protected in the non-production environment. An organization that leaves sensitive customer information vulnerable exposes itself to severe legal ramifications. As more organizations outsource test and development activities to offshore, off-premises facilities, the risk is magnified. Many organizations manage development activities in less secure environments and therefore may lack visibility into who accessed what data and what they did with it. These realities emphasize the need for masking or protection.</p>
<p><strong>Retaining Functionality. </strong>When data is masked, there is a risk that its relational integrity may be compromised. The application data model may have interdependencies based on actual data values—if data is masked without considering this, an application copy could be rendered useless. Additionally, if data is masked in such a way that it impacts the test validity, application quality could be at risk. Some data masking solutions offer prepackaged application metadata support while others provide rich data substitution algorithms. When evaluating data masking solutions, be sure to understand the nature of the tests being conducted in relation to the application data model.</p>
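<p>One common way to keep relational integrity intact is to mask key values deterministically, so that the same input always produces the same masked output in every table. The sketch below is only an illustration under that assumption: the key, the table layouts, and the <em>mask_key</em> helper are hypothetical, and it uses a keyed hash (HMAC) rather than any particular vendor's substitution algorithm.</p>

```python
import hashlib
import hmac

# Hypothetical secret used only for this example; a real deployment would
# manage the key outside the code.
SECRET_KEY = b"example-only-key"


def mask_key(value):
    """Deterministically map a customer ID to a masked surrogate."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST-" + digest[:12]


# Two hypothetical tables that are related through customer_id.
customers = [{"customer_id": "C1001", "name": "Alice Example"}]
orders = [
    {"order_id": 1, "customer_id": "C1001"},
    {"order_id": 2, "customer_id": "C1001"},
]

masked_customers = [
    {"customer_id": mask_key(row["customer_id"]), "name": "MASKED"}
    for row in customers
]
masked_orders = [
    {"order_id": row["order_id"], "customer_id": mask_key(row["customer_id"])}
    for row in orders
]
```

<p>Because both tables are masked with the same function, every masked order still joins to its masked customer record, while the original identifier no longer appears in the copy.</p>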
<h2>Archiving and Deletion</h2>
<p>Now that the organization has extracted the maximum value out of My Data at all stages of the lifecycle, the various data sets can be archived on tier-3 infrastructure. In the case of the e-commerce example, once the order has been closed, the secure production data can be archived, as can the audit logs of data changes and updates. The backup data, business intelligence data and reports, and data copies used for testing, development, and training can all be sent to an archive.</p>
<p>Data is generally archived once it has aged and becomes less valuable. In addition, industry and corporate governance regulations can mandate archiving and retention periods. Other regulations mandate data deletion according to event triggers such as reaching the maximum retention period. Eventually, production, backup, business intelligence, and test and development data sets may be deleted to free up storage capacity.</p>
<div class="graph_top">Figure 16. My Data is Archived or Deleted</div>
<p><img class="aligncenter size-full wp-image-23224" title="StructuredDataF16" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF16.png" alt="" width="621" height="275" /></p>
<h3>Key IT Concerns</h3>
<p><strong>Determining When to Archive and Delete.</strong> While corporate and regulatory mandates are good guidelines, organizations are often aided by classification tools to help them decide when to archive based on business process needs and data usage. Existing or impending problems stemming from enormous data growth often trigger this type of data management. Classification tools can capture and report on data in each database table, row, or column and measure user access over time. These tools are often available as part of a consulting assessment or included in a broader software solution.</p>
<p>Similarly, application retirement solutions offer a cost-effective way to migrate data from archaic or unsupported formats to more modern tools while maintaining business context and some level of user access. These solutions are helpful when a database application is no longer used to support an ongoing business process but still contains data subject to legal or regulatory retention requirements.</p>
<p><strong>E-discovery. </strong>Data not subject to regulatory scrutiny or current litigation should be examined for applying records management policy, including timeframes and guidelines for its retention and even potential decommissioning. Since all electronically-stored data is potentially subject to litigation and regulatory requests can be broader than mandated retention for compliance, this data should be evaluated in terms of its risk and business value to determine whether it should be retained or deleted defensibly to avoid having to produce it in the future.</p>
<h1>The Bigger Truth</h1>
<p>Deploying a Structured Data Reference Model is an essential part of an enterprise-wide data governance initiative. When the focus is on IT optimization or running an efficient data center, ILM strategies are integral to the data governance framework. By mapping data service tiers to data value, organizations can minimize costs while ensuring the right levels of availability, protection, and performance at each lifecycle stage.</p>
<p>Once the underlying reasons for each step in the chain are understood, it is often helpful to apply a consistent model to all data within the database application. For example, an employee’s personal data contained in the human resources database includes information about salary, their Social Security number, medical history, etc. As mentioned, copies of this data are made for backup, for data warehousing and analysis, and for testing and development. All copies—whether on disk or tape, or in a data warehouse or archive—should be handled with the same care as data in the production database.</p>
<p>Similar consistency should be applied to all database applications across an organization’s departments. As Figure 17 shows, once a model is defined, it can be leveraged across multiple departments for enterprise resource planning, customer relationship management, business intelligence, etc. The consistency of this model is another way to ensure that data is treated in a way that protects it from loss, corruption, or theft, protecting the organization from legal vulnerabilities.</p>
<div class="graph_top">Figure 17. Deploying a Consistent Structured Data Reference Model</div>
<p><img class="aligncenter size-full wp-image-23225" title="StructuredDataF17" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/StructuredDataF17.png" alt="" width="601" height="270" />Once organizations utilize a common structured data management model across all business processes and underlying applications, deploying and enforcing data governance initiatives enterprise-wide becomes a much easier pill to swallow. Following My Data through its lifecycle reveals how critical structured information assets are to business decisions, productivity, and ultimately profit. Understanding the technology considerations related to data from “cradle to grave” can enable organizations to improve this business process cost effectively.</p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Source: ESG Research Report, <a href="../../../../../2011/01/2011-it-spending-intentions-survey/" target="_blank"><em>2011 IT Spending Intentions Survey</em></a>, January 2011.</p>
<p><a name="_ftn2">[2]</a> For some common approaches to securing valuable data assets, see the upcoming ESG Market Landscape Report, <em>Securing Databases</em>.</p>
<p><a name="_ftn3">[3]</a> For additional detail on these solutions, see: ESG Market Landscape Report, <a title="Permanent Link to Managing Database Growth – Optimizing Database Application Lifecycles" href="../../../../../2011/04/market-landscape-managing-database-growth-optimizing-database-application-lifecycles/" target="_blank"><em>Managing Database Growth – Optimizing Database Application Lifecycles</em></a><em>, </em>April 2011.</p>
<p><a name="_ftn4">[4]</a> See: ESG Research Report, <a href="../../../../../2010/04/2010-data-protection-trends/" target="_blank"><em>2010 Data Protection Trends</em></a>, April 2010.</p>
<p><a name="_ftn5">[5]</a> For additional information, see the upcoming ESG Market Landscape Report, <em>Host and Array-Based Replication</em>.</p>
<p><a name="_ftn6">[6]</a> Ibid.</p>
<p><a name="_ftn7">[7]</a> See: ESG Market Landscape Report, <a href="../../../../../2011/02/data-protection-backup-as-a-service/" target="_blank"><em>Data Protection: Backup-as-a-Service</em></a>, February 2011.</p>
<p><a name="_ftn8">[8]</a> For more information, see ESG Research Brief, <a href="../../../../../2011/04/esg-research-brief-big-data-and-the-cloud/" target="_blank"><em>Big Data and the Cloud</em></a>, April 2011.</p>
<p><a name="_ftn9">[9]</a> For additional information, see the ESG Market Landscape Report, <a href="../../../../../2011/04/market-landscape-managing-database-growth-optimizing-database-application-lifecycles/" target="_blank"><em>Managing Database Growth &#8211; Optimizing Database Application Lifecycles</em></a>, April 2011.</p>
<p><a name="_ftn10">[10]</a> Ibid.</p>
<p><a name="_ftn11">[11]</a> For additional information, see ESG Market Report, <a href="../../../../../2011/04/the-logic-and-value-of-a-tiered-archive-tiering-across-more-of-the-storage-hierarchy/" target="_blank"><em>The Logic and Value of a Tiered Archive: Tiering Across More of the Storage Hierarchy</em></a>, April 2011.</p>
<p><a name="_ftn12">[12]</a> For additional information, see ESG Research Brief, <a href="../../../../../2011/03/esg-research-brief-application-virtualization-challenges/" target="_blank"><em>Application Virtualization Challenges</em></a>, March 2011.</p>
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/06/structured-data-reference-model-managing-databases-in-the-efficient-data-center/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting Ready for (Big) Data Integration</title>
		<link>http://www.enterprisestrategygroup.com/2011/06/getting-ready-for-big-data-integration/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/06/getting-ready-for-big-data-integration/#comments</comments>
		<pubDate>Fri, 17 Jun 2011 19:16:39 +0000</pubDate>
		<dc:creator>Julie Lockner</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Julie Lockner]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=23011</guid>
		<description><![CDATA[Traditional data integration methods are struggling to keep up with demand for actionable information from diverse and growing data sources. Vendors should focus not just on addressing the needs of “big data,” but on data integration platform innovations that provide high performance, agility, and end-user capabilities in today’s service-oriented IT paradigm. Overview The constant refrain [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">Traditional data integration methods are struggling to keep up with demand for actionable information from diverse and growing data sources. Vendors should focus not just on addressing the needs of “big data,” but on data integration platform innovations that provide high performance, agility, and end-user capabilities in today’s service-oriented IT paradigm.</div>
<private_premium>
<h1>Overview</h1>
<p>The constant refrain of data growth accompanies just about every IT process. From garden-variety business volume growth to the so-called “big data” sets created by multi-petabyte data warehouses, social media, scientific research, cloud-based applications, and e-commerce, IT faces a barrage of challenges simply because there is so much data to handle. Adding to the strain is the expectation of real-time information; users want “always on” applications and data access, leaving IT with no time for off-peak batch processing. Considering these (and other) challenges, it’s no surprise that managing data growth is the number two priority for IT organizations over the next 12-18 months (see Figure 1).<a href="#_ftn1">[1]</a></p>
<div class="graph_top">Figure 1. Top 10 IT Priorities for 2011</div>
<p><img class="aligncenter size-full wp-image-23014" title="BigDataIntF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/BigDataIntF1.png" alt="" width="612" height="391" />When asked to identify the areas of information management expected to see the most significant levels of investment over the next 12-18 months, nearly one-third of respondents pointed to data integration solutions (see Figure 2).</p>
<div class="graph_top">Figure 2. Most Significant Information Management Investments over the Next 12-18 months</div>
<p><img class="aligncenter size-full wp-image-23015" title="BigDataIntF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/BigDataIntF2.png" alt="" width="608" height="412" />This massive growth—along with an increasing need for real-time, integrated information—places extreme demand on current infrastructure and business processes. On the business side, organizations are hoping to take advantage of all this data to more swiftly identify and respond to customer behavior, escalate online processing, learn from astronomical and scientific research, and even examine the human genome. IT is charged with figuring out how to collect these masses of data and make them usable in real time. It’s a daunting challenge.</p>
<p>If real-time scalability problems weren’t enough, the data types being generated are becoming more diverse. As these trends converge, the next “problem child” for IT is likely to be in data integration—pulling together large, disparate data sets in real time. Current data integration platforms, built for an older generation of data challenges, will limit IT’s ability to support the business. In order to keep up, organizations are beginning to look at next-generation data integration techniques and platforms.</p>
<h1>Traditional Data Integration</h1>
<p>The purpose of data integration is to take data from various application-focused business processes and assemble it in a unified way to make it available for downstream activities, analytics, and reporting. Different formats and processing architectures, specifically tuned for unique workloads, make this amalgamation complicated. For example, the systems needed to report on financial and transaction processing data are different from those required for initial processing; data integration platforms provide the translation mechanism between data types and application architectures. It is a critical process, enabling organizations to take data from different applications and silos and turn it into actionable information. Different data types, database structures, and organizational schemata make integration complex.</p>
<p>The two most common data integration methods are Extract, Transform, and Load (ETL), used predominantly in data warehousing applications, and Enterprise Application Integration (EAI).</p>
<h2>ETL</h2>
<p>With ETL, data is obtained from database applications (Extract), converted into a common compatible format (Transform), and then inserted (Load) into a reporting database or data warehouse. This process requires dedicated compute, network, and storage resources. It was designed around data center batch processes to run at off-peak times, such as overnight. Some organizations implement it manually by writing SQL statements and Perl scripts to extract and transform the data and batch process it for importing. Others use GUI-based, point-and-click tools created by vendors such as <a href="http://www.informatica.com/" target="_blank">Informatica</a> and <a href="http://www.ibm.com/" target="_blank">IBM</a>, dramatically simplifying the process; these tools usually include additional functionality for data quality checking, compliance and audit policies, event processing, and integration of partner and supplier data. Networked and federated databases provide methods of integrating data that changes often, but they still depend on ETL processes.</p>
<div class="graph_top">Figure 3. Traditional Extract, Transform, and Load Solution</div>
<p><img class="aligncenter size-full wp-image-23016" title="BigDataIntF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/BigDataIntF3.png" alt="" width="594" height="335" /></p>
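<p>The three ETL steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes two in-memory SQLite databases standing in for the operational system and the data warehouse, and the table and column names (<em>orders</em>, <em>fact_orders</em>, <em>amount_text</em>) are hypothetical.</p>

```python
import sqlite3


def extract(source):
    """Extract: read raw order rows from the operational database."""
    return source.execute(
        "SELECT order_id, amount_text, country FROM orders ORDER BY order_id"
    ).fetchall()


def transform(rows):
    """Transform: convert amounts to integer cents and normalize country codes."""
    return [
        (order_id, round(float(amount_text) * 100), country.strip().upper())
        for order_id, amount_text, country in rows
    ]


def load(warehouse, rows):
    """Load: insert the cleaned rows into the reporting table."""
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, amount_cents, country) VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()
    return len(rows)


# Hypothetical source system with inconsistently formatted data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, amount_text TEXT, country TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "19.99", " us "), (2, "5.00", "De")],
)

# Hypothetical warehouse with a single reporting table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)"
)

loaded = load(warehouse, transform(extract(source)))
```

<p>In a traditional deployment, a job like this would run as a scheduled batch during an off-peak window, which is why growing data volumes and shrinking windows put pressure on the approach.</p>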
<h2>EAI</h2>
<p>Disparate applications that require more real-time, dynamic integration as part of a tightly integrated business process may warrant the use of EAI solutions. This technique, usually tied to transaction-focused applications such as enterprise resource planning (ERP), customer relationship management (CRM), and supply chain management (SCM), operates on-demand rather than in batch processes and leverages message-based communication protocols including SOAP, SMTP, and AMF. Recently, EAI capabilities have been packaged as integrated Service Oriented Architecture (SOA) features deployed on middleware servers; examples include <a href="http://www.tibco.com/" target="_blank">Tibco Software</a> and <a href="http://www.oracle.com/" target="_blank">Oracle</a> Fusion Middleware. Because these solutions are application-centric, they often require custom development, making them difficult to modify. Rather than performing batches of data transformations, many EAI solutions incorporate services to transform data to a common format; application-specific connectors can be licensed to standardize the data. Benefits of using this message-based data integration are the ability to transform data in real time and to do so without disrupting the application platform.</p>
<div class="graph_top">Figure 4. Traditional Enterprise Application Integration Solution</div>
<p><img class="aligncenter size-full wp-image-23017" title="BigDataIntF4" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/BigDataIntF4.png" alt="" width="607" height="290" /></p>
<h1>Challenges with Current Approaches</h1>
<p>The pressures of massive data growth and real-time data needs are straining traditional data integration techniques. Various challenges are forcing organizations to search for new ways to make data accessible and usable.</p>
<ul>
<li><strong>Data growth</strong>. Volumes continue to grow; it’s simply a fact of life. Terabytes, petabytes, and even zettabytes are being described as “big data”—a term with no specific definition, but a description with which IT shops are intimately acquainted as they bear the brunt of its impact. As data grows, it becomes difficult to pull together disparate data into a unified whole. Many data integration platforms are unable to scale to the extent needed to support large data volumes. Whether the integration platform is batch- or message-oriented, data growth is straining these systems.</li>
<li><strong>Time</strong>. Trying to accomplish batch processing in a shrinking, off-peak IT window is like drinking from a fire hose. Running batch ETL on mountains of data from multiple transaction processing systems creates disruptive bottlenecks. Data often cannot be transformed in time, reports take too long, and service level agreements are missed. This is unacceptable in a 24X7 IT environment that must support operations around the globe. Organizations are being forced to deploy faster networks, more processing power, and larger storage devices—and are often still unable to perform timely analytics. While message-based approaches are designed to address real-time needs, they do not apply well to analytics applications.</li>
<li><strong>Data types</strong>. Today’s business environments seek to leverage more and more diverse data types. It’s not just about bringing together floating point numbers, integers, and currency data into a common unit for querying; businesses also recognize real value in unstructured content. But how do you include data from documents, e-mail, blogs, videos, and audio files not in formats conducive to analytics? Writing scripts for this data for ingest into data warehouses is not only difficult, but also extremely time-consuming and hard to scale. New data types can be incorporated more easily into message-based platforms than into batch processes required for analytics.</li>
<li><strong>Data quality</strong>. Data quality problems are often exacerbated when new data sources are introduced into a business process without proper attention. Performing data quality checks <em>en masse</em> can slow the ability to leverage new data sources quickly. The time it takes for current data integration processes to complete can actually prevent the business from leveraging data for the purposes for which it was gathered; at the same time, poor data quality may render the data suspect at best and useless at worst.</li>
<li><strong>Self-service</strong>. Business users have high expectations. This is due in part to the ubiquity of the Internet, making information instantly available. Another driver is the “consumerization” of IT: today’s IT organizations are working to deliver user-focused services and building the required infrastructure to support that in the background. As IT becomes a service, users are expecting faster, easier, even self-administered data services; current data integration platforms were not built for those requirements.</li>
</ul>
<h1>Data Integration Must Evolve</h1>
<p>Growth in data volume and type is not likely to abate any time soon—nor is demand for real-time information and 24X7 operations. Even if organizations upgrade their infrastructures, they will be unable to accomplish their objectives with current data integration techniques. In the new world of server virtualization, cloud computing, and IT-as-a-service, it is doubtful that retrofitting old technologies will succeed. New data integration strategies are beginning to emerge and they will become essential to providing quality data in real time. Those strategies that focus on information-centric data integration techniques will likely establish the platform.</p>
<div class="graph_top">Figure 5. The Evolution of Data Integration</div>
<p><img class="aligncenter size-full wp-image-23013" title="BigDataIntF5" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/06/BigDataIntF5.png" alt="" width="606" height="295" />The following are some examples of emerging data integration techniques.</p>
<ul>
<li><strong>Moving data integration closer to the source.</strong> Instead of extracting data and squeezing it into an architecture that doesn’t fit current paradigms, new solutions are beginning to bring the analytics to the data. This can not only speed the data integration process, but also enable organizations to more easily incorporate disparate data types and sources.<br />
One method of moving analytics to the source is to incorporate new analytics platforms. Some solutions move the “extract” process directly into massively parallel processing (MPP) architectures and transform the data there. These MPP architectures are designed to process huge volumes of data efficiently in real time. Instead of extract, transform, load, the process is extract, load, transform.</li>
<li><strong>New frameworks.</strong> Organizations have the opportunity today to incorporate diverse data types, such as combining customer transaction data with interaction data from social media, to enable new insights into customer needs and behaviors. The open-source Apache Hadoop project incorporates both a distributed file system (HDFS) and a MapReduce processing framework. It is an example of a framework that simplifies the data processing component by using commodity clustered or grid-based compute nodes for structured (HBase) or unstructured (HDFS) data. Dividing up processing tasks and executing them in parallel enables fast, affordable processing of large data sets.</li>
<li><strong>Data integration accelerators. </strong>Complex data transformation processes present a unique challenge for organizations dealing with new data, new frameworks, and data growth. Common approaches are to increase the compute power available for the transform and sort tasks, and to increase storage capacity and expand staging areas to conduct transformations and sorts in phases. These additional resources are costly, particularly when extra licenses are CPU- and/or capacity-based. Alternative approaches, such as <a href="http://www.syncsort.com/" target="_blank">SyncSort</a>’s data integration accelerators, offer technology that plugs into existing infrastructures and “turbo-charges” sorting and complex transformation processes without source and target upgrades.</li>
<li><strong>Flexible deployment options. </strong>As mentioned previously, users are coming to expect to access data on their own, without IT assistance, in real time. As a result, new layers of abstraction are being built around data integration processes to hide the complexity of back-end processing and simply make data available to users as needed. Cloud-based, self-service models can relieve some of the IT burden and put the power in the hands of the user who knows the business context of a data need. Data integration platforms must be able to integrate various types of data in the cloud, along with data on premises and in remote locations. Companies such as <a href="http://www.snaplogic.com/" target="_blank">SnapLogic</a> have built solutions to easily integrate data from on-premises, software-as-a-service, web-based, and social-network applications. <a href="http://www.dell.com/" target="_blank">Dell</a>’s recently-acquired Boomi platform offers a similar kind of data “mashup” built for on-demand, cloud-based integration. These types of services are often used for specific tasks that do not justify an on-premises deployment.</li>
<li><strong>Solution simplicity. </strong>The pace of change in business and IT is not slowing; businesses must be agile so they can adapt to changing markets without losing ground to the competition. Solution providers must keep up with changing technologies and create solutions accordingly. For example, new platforms are being developed that offer open-source, integrated suites of capabilities such as ETL, online analytical processing (OLAP), querying, reporting, interactive analysis, dashboards, and data mining. This one-stop data integration and business intelligence platform can be easier to deploy and manage, reduce costs, enable organizations to more quickly respond to changes, and speed time to value.</li>
</ul>
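<p>To make the map/reduce idea from the list above more concrete, the following sketch counts words across chunks of text. It is a single-machine analogy written in plain Python, not Hadoop code: a thread pool stands in for the cluster's compute nodes, and all function names are hypothetical.</p>

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run the map step on each chunk in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


# Each inner list plays the role of a data block stored on a different node.
chunks = [["big data big"], ["data integration"]]
counts = word_count(chunks)
```

<p>Because each chunk is mapped independently, adding more workers (or, in a real cluster, more commodity nodes) lets the same logic scale to larger data sets.</p>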
<h1>Call to Action</h1>
<p>Time is of the essence. The rapid pace of change and the accelerating adoption of cloud-based services suggest that data integration providers must make adjustments quickly. For many, this will require changing their way of thinking and focusing on the proper objectives.</p>
<h2>Thinking Differently</h2>
<p>Businesses are looking for ways to leverage new data sources to improve analytics and business intelligence. These new data sources are being introduced at a rapid pace and IT organizations are applying agile software development approaches to business intelligence (BI) projects to keep up with the demand. Deployment options such as self-service and cloud-based BI are accelerating the expectations of time-to-value for these data sources.</p>
<p>The very nature of self-service and cloud deployments opens up opportunities for global BI deployments—good for the business, but further complicating data integration. In multi-national business processes, integrated data sources may have no boundaries regarding time zones, languages, operating systems, or platforms. What was once considered state-of-the-art data integration technology is rapidly becoming what the automotive industry might call a classic: while the majority of auto engines are still gas-powered, hybrid and electric engines are becoming more common and their accelerated adoption is forcing gas stations to consider providing electric recharge ports. This change may be disruptive, but it serves a widely accepted objective. The same is true for data integration: “classic” techniques may not be able to adapt to new IT paradigms and, while adopting new methods is not always easy, solution providers must do so to serve the clear business need.</p>
<p>The goals of a data integration strategy should align with corporate priorities to improve business process efficiencies while controlling costs, but developers and architects should not need to reengineer data integration methods with every new application or data source. An agile, information-centric data integration strategy should remove that process as a barrier to success.</p>
<h1>The Bigger Truth</h1>
<p>Data management experts and analysts are often heard talking about “big data,” and vendors are marketing “big data” solutions. While it seems to be a useful construct, vendors focused <em>only</em> on that phenomenon are short-sighted. Addressing data volume growth using old-style platforms and techniques will ultimately fail because growth is only part of the problem. Data management vendors must innovate and deliver next generation platforms that deal with not only volume growth, but also diverse data types, real-time demands, and changing user expectations.</p>
<p>These trends are stressing current data integration platforms and infrastructures built for a simpler world. The current ETL-focused data integration process is expected to remain viable for some time, but organizations that take this opportunity to build next-generation data integration based on today’s needs and requirements are likely to gain a competitive advantage. Businesses aren’t going to stop finding new ways to use their data—but they may well abandon processes and platforms that inhibit their ability to gain the most leverage from their data sources.</p>
<p>Organizations are looking to get more value out of their various data types in real time in order to get products to market faster, serve customers better, and take advantage of massively growing volumes before the data ages. At the same time, IT is proceeding toward greater efficiency, IT-as-a-service, and using IT as a competitive advantage. Data integration providers must be in step with these emerging data center requirements; users won’t put up with restricted data leverage limited by traditional data integration processes.</p>
<hr size="1" /><a name="_ftn1">[1]</a> Source: ESG Research Report, <a href="../../../../../2011/01/2011-it-spending-intentions-survey/" target="_blank"><em>2011 IT Spending Intentions Survey</em></a>, January 2011. All research references are taken from this report unless otherwise stated.<br />
<br />
</private_premium>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/06/getting-ready-for-big-data-integration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Economics Alter the Storage Landscape: The Financial Leverage of Storing Less Will Win</title>
		<link>http://www.enterprisestrategygroup.com/2011/05/how-economics-alter-the-storage-landscape-the-financial-leverage-of-storing-less-will-win/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/05/how-economics-alter-the-storage-landscape-the-financial-leverage-of-storing-less-will-win/#comments</comments>
		<pubDate>Thu, 12 May 2011 18:31:29 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[IT Operations]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Steve Duplessie]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[deduplication]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=22332</guid>
		<description><![CDATA[While it’s tempting to think that technology rules the IT space, the truth is that far more often, it is the financial aspects that really dominate. Sometimes, that’s obvious, other times it’s less so. Technologies can be IT game-changers, but better economics are invariably the root value. Introduction IT is About More than Technology While it’s [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">While it’s tempting to think that technology rules the IT space, the truth is that far more often, it is the financial aspects that really dominate. Sometimes, that’s obvious, other times it’s less so. Technologies can be IT game-changers, but better economics are invariably the root value.</div>
<private_standard>
<h1>Introduction</h1>
<h2>IT is About More than Technology</h2>
<p>While it’s tempting to think that technology rules the IT space, the truth is that far more often, it is the financial aspects that really dominate. Sometimes, that’s obvious, other times it’s less so. Certainly, every once in a while, a “game changing” technology emerges that addresses a critical business issue such as time or cost. Think of the virtualization of servers as a good example of delivering massive cost improvements or data deduplication in the backup market as a good example of beating time. In truth, while time was certainly the IT driver for the value of backup deduplication, the underlying business driver was financial. Why? Because—prior to deduplicating backups—it wasn’t that backups could not be completed in time; it was that the technology needed to complete them was outlandishly expensive. Technologies can be IT game-changers, but better economics are invariably the root value.</p>
<p>Today, we are facing another major challenge that is contextually similar to the server sprawl or incomplete backups that, respectively, spawned the success of server virtualization and backup deduplication. Storage demand continues to grow rapidly at a rate that exceeds its relative price decline. It is not a sustainable model: data growth trends are driving the need for another game-changing event. Data growth is outstripping IT CAPEX budgets, available IT space, management capabilities, and OPEX thresholds, and the pressure to manage more data in cloud environments only makes the challenge more acute. We are approaching—some users are even at—a breaking point. As a result, data storage must undergo a massive leap in efficiency or the business advantages that could be realized from all that data will be lost.</p>
<p>A sea change is about to occur in primary storage, driven by the obvious mismatch between demand for capacity and the ability to supply that demand at a reasonable cost. There are only two variables here: since the price of storage isn’t declining sufficiently fast, the only other option is for the amount of data that actually gets stored to decrease. Doing this while still serving demand for up-front capacity means that the necessary sea change for storage is (once again) spelled D-E-D-U-P-L-I-C-A-T-I-O-N. However, this is not the technology that was designed for backup. Deduplication for primary storage has a different set of challenges—mainly mitigating or precluding performance impacts, and dealing with massive data sets. It also offers different opportunities, not only reducing traditional data center needs, but also adding an operational and economic enablement to cloud implementations. The storage vendor that gets this right could be the next game changer that will most probably gain similarly dominant market shares, much like other recent market game changers such as VMware and Data Domain.</p>
<h2>Market Functions</h2>
<p>Why were those companies game changers? Because of the problems they solved, markets that were previously unavailable—or simply did not exist before—suddenly became valid with significant opportunities. For example, not being able to complete a backup within a 24-hour period is a time problem, while server deployment costs and the subsequent underutilization that results in out-of-control expenses are a cost problem. In both examples, users/buyers invariably looked to current suppliers to provide a “fix” to the issues. The typical quick fix is to make incremental improvements to core products, providing the bare minimum incremental “feature.” Vendors continue to support and extend that bare minimum feature set until such time that the solution simply doesn’t work anymore—which is when the user/buyer will first truly look for an alternative.</p>
<p>Unfortunately, history demonstrates that, regardless of how much better (cheaper/faster) an alternative company’s product/technology/solution is versus the incumbent supplier’s, most buyers will almost always stick with the incumbent <em>until the pain is too great or they outright fail</em>. It’s the “Devil you know” applied to business, the “no-one ever got fired for buying vendor X” mantra. That is the power of incumbency and it’s why massive cost savings are so important: only overwhelming improvement will overcome natural inertia and conservatism. Huge, “slap in the face” economic improvements demand attention.</p>
<p>Indeed, ESG research has found that users report that cost reduction is the prime business initiative that impacts their storage spending, as Figure 1 shows.<a href="#_ftn1">[1]</a></p>
<div class="graph_top">Figure 1. Business Initiatives That Will Impact IT Spending Decisions, 2011</div>
<p><img class="aligncenter size-full wp-image-22334" title="StorageEconF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/05/StorageEconF1.png" alt="" width="604" height="451" />It’s also worth noting that the same research has, for the last few years, shown a greater relative emphasis on bringing down OPEX as opposed to CAPEX; no doubt as a realization that the lifetime costs of storage (space, power, and management) can be much greater than the initial purchase price.</p>
<p>Remember, also, that we only have storage tiers because storage isn’t free. Different types of storage exist only because of cost differentials; after all, if storage were free, we would all naturally put everything on the fastest devices possible. Only because storage has a price—indeed, a range of prices—do we think about where to put different data. We often disguise that as a conversation about performance, but that’s a symptom rather than a cause. The cause is money. By the way, just to extend this to its logical conclusion, even if storage were free, it still would not be free to operate. As mentioned, reducing OPEX is the most important aspect users mention in terms of overall cost reduction. As such, we need to be driving down operational costs as much, or more than, the capital costs.</p>
<p>The industry has done many clever things to address the escalating cost of storing data: dynamic tiering, thin provisioning, and the like. These methods provide only incremental cost improvements. What’s needed is a much more effective tool for addressing the storage growth problem.</p>
<p>So, how are we to address this challenge? What’s needed is what might be called “data optimization economics,” a potential kill shot that has the ability to address the root cause—the massive growth of data that need to be physically stored—and thereby produce the next great leap forward for IT economics.</p>
<h2>“The Law of 10X”</h2>
<p>For whatever reason, history tells us that when we are in a “good enough” market state, the only vendor combatants to successfully alter the status quo (and thereby steal appreciable share from the incumbents) are those that offer solutions that are 10X “better” than the competition. You can argue the degree of improvement—it may be 7X or 12X—but the point is that it is an immediate and extreme change that results in improved financial value; way more than could have been expected under the standard regime of incremental change. This is the only way that market tectonics truly shift. While being able to provide such a leap forward does not guarantee success, not doing so does seem to relegate those would-be players to a life in the niches of the market, rarely providing them an opportunity for dramatic market penetration or riches.</p>
<p>Furthermore, all “10Xs” are not created equal. Simply having a 10X “something” is by no means a guarantee that the market will adopt an offer on a broad scale. For the last 20+ years in storage, most systems have performed at a level that’s “good enough” and, as such, simply having a system that is 10X faster would only be interesting to whichever niche within the market craves additional performance. But, where the “something” is a financial play that yields a 10X improvement, more of the market will listen. After all, most of the market is focused on cost reduction to some extent. Simply said, a 10X improvement does guarantee widespread adoption when it is driving hard capital cost and operational cost (both hard and soft) savings. It cannot be ignored.</p>
<h2>Recent Proof Points</h2>
<p>Let’s look at a couple of examples of the “Law of 10X” in the real world. Each of these is an example of the “data optimization economics” already mentioned and each also achieved first-mover-advantage. That means reaping massive rewards (market share, selling price, etc.) for the vendor in lock-step with the massive benefits enjoyed by IT users.</p>
<ul>
<li><strong>Data Domain: </strong>Data Domain saw that the issue was not any intrinsic need for deduplication for better backup per se; instead, it solved the business issues of <em>time and cost. </em>Users could not complete backups in 24 hours (at least not without ludicrous costs) and so Data Domain delivered backup to disk, solving the time issue. It delivered on the “Law of 10X” and solved the broader economic problem with deduplication that drove the <em>effective </em>cost of disk much closer to that of tape. A huge percentage of users now buy deduplication <em>technologies</em> for their <em>financial </em>value—they save them money! As a first-mover, Data Domain took 90% or more of the value generated while it was independent and now, post acquisition, EMC probably still takes 90% of that value.</li>
<li><strong>VMware: </strong>Server virtualization is undoubtedly the biggest wave in IT at present. Yet it was not an overnight success; initially, it was something of a “cool toy” used by a small cadre of propeller heads, QA teams, and the like who could avoid buying 10X the equipment to get their projects and work done. The epiphany moment came when the economic—as opposed to technological—value of VMware was realized to be a mainstream market solution because the footprint and cost issues of server sprawl had become paramount in data centers. Virtual machines have no additional footprint and cost a great deal less than physical servers: 10X utilization with no footprint gain was an economic home run.</li>
</ul>
<h1>What Will Happen?</h1>
<h2>The Current Situation</h2>
<p>Despite all the consolidation and marketing noise, the enterprise storage market has been settled, some might say stagnant, for a while. In this status quo, a “big picture view” reveals that:</p>
<ul>
<li>Everything is fast enough for the most part and where it’s not, there is investment in very expensive, albeit cost-effective, flash/solid state storage devices to at least address the niche need.</li>
<li>Enterprise storage remains up around 100X the cost of consumer storage; there are many valid reasons for this, but it is a significant gating factor on businesses’ ability to keep up with growing capacity demands.</li>
<li>As the world heads into the zettabyte era, relentless data growth—and the commensurate cost growth that accompanies it—is the major challenge in IT.</li>
</ul>
<p>The collective impact of these points is that data optimization—and the massively favorable economic impact of such optimization—is hugely important in storage today. As such, it is a massive business opportunity. However, it’s not about deduplication technology (wonderful as it may be): it’s about the ability to deal with rampant data growth <em>economically</em>. The technology is the enabler, but the real business issue is to be able to keep up with data growth demand at an affordable cost, not just in terms of CAPEX but from an OPEX perspective, too. Given that a range of high-level industry projections show annual data creation being in the tens of zettabytes over the next decade and with the knowledge that typically only 20% or 30% of that data is unique, the impact of data optimization could preclude the necessity to physically store tens of zettabytes of duplicate data. It’s a huge number, a huge potential impact, and a huge market opportunity.</p>
<p>Data Domain (and its followers) taught the market how deduplication (data optimization) technology could be applied to disk to deliver huge (and impossible to ignore) financial leverage; but to this point (other than a couple of relatively immature, although impactful, offerings by NetApp and Oracle), it has only been applied to the very “end tier” of the data lifecycle (i.e., backup) which represents only a tiny portion of the total available disk-based storage market. The same thing needs to happen across <em>all</em> the other storage tiers and when it does, the 10X criterion will be readily met and the market impact will be enormous.</p>
<h2>What Has to Happen</h2>
<p>It’s simple: someone will figure out how to apply technologies to gain 10X economic advantage, without performance impact and at scale across the breadth of the primary storage market. Taking a page from the Data Domain playbook, someone (or someone else!) will do the same thing to the higher levels of the primary storage market (tier 0, and beyond), thus making the most expensive parts of the primary storage market far more attainable for those who need it by, for instance, even applying deduplication or compression to solid state and thereby extending its value. SSD and flash are today’s hot topics in terms of helping address the highest performance needs of users, but they remain priced in the stratosphere. Compression and deduplication technologies that can be applied to this part of the market could alter the economic dynamics of performance by more than 10X—maybe even an order of magnitude more.</p>
<p>As ever, first-mover advantage is available to any company that understands the “Law of 10X” and makes the land-grab while other market players are talking about science projects and roadmaps or simply trying to maintain the status quo.</p>
<h2>Who Will Do It … and Why?</h2>
<p>The “who” is hard to know. Given the dynamics of the primary storage market, it could be an existing big player (the likes of HP, EMC, IBM, Dell, and now even Oracle), a large pure-play (NetApp or HDS) or possibly one of the mid/smaller vendors (such as Nexsan, Xiotech, BlueArc, Coraid, or nascent “next-gen” technology providers). Market coverage and traction favor the first two categories. It will come down to who wants to merely defend their market position compared to who wishes to aggressively grow their market position at their competitors’ expense. The first-mover advantage is available to be claimed; what’s most likely is for disruptive primary data optimization on a broad scale to be claimed by a “major” in combination with a “next-gen” technology player, with others then scrambling to catch up as the market adopts and embraces this next “aha” technology and financial “no-brainer.”</p>
<p>This is, after all, how markets have invariably operated. Once a market ignites, the first-mover, rarely facing stiff price competition, tends to take 90% of the value generated in that market until such time that the market morphs (i.e., the problem that was solved is changed or disappears) or matures to a point where the solution is so commoditized that only a price leader can sustain a presence.</p>
<p>The opportunity for whichever vendor decides to grab the reins in order to “Data Domain/VMware” the primary data optimization market is enormous. The storage business represents tens of billions of dollars in annual spend—and it is growing. With the demand for data storage capacities soaring and yet with cost control paramount for users, there is a genuine opportunity for primary data optimization to cause a significant shift in storage vendor market shares. Even a few percentage points translate to multi-billion dollar annual revenues and almost certainly significant market valuation.</p>
<h1>The Bigger Truth</h1>
<p>This opportunity—to optimize the enormous amounts of data being stored—is based on a problem whose time has clearly come. It has massive financial implications (The Law of 10X) and it is relevant and unavoidable across the entire storage marketplace. It is a logical IT need with an abundantly clear financial requirement. It will happen. The questions are: who will be the big winner, who will lose, and when will it happen? The economics are compelling and the business impact is even more seductive, which means that the “when” is likely to be very soon.</p>
<p>The obvious need, the financial value to users, and the impact on the market are all nearly impossible to overstate. Economics always wins and, in this case, the winning economics have a realistic chance to alter the market landscape, awarding the first-mover huge top line and bottom line gains. It will change everything and upset (or at least re-arrange!) a $20B or more annual spend. It will definitely create net new value distribution in the market and it’s entirely conceivable that some of the biggest incumbents—those that look unassailable today—will get hurt (in terms of sales, market share, and valuation) unless they move very quickly for leadership.</p>
<p>Optimizing data volumes for financial benefit is not some passing need; for sure, it is categorically relevant to all the key IT moves today including virtualization, cloud, “big-data,” and next generation OSs, which are all driving increased data storage needs. Further, it will remain relevant in all conceivable future data storage and management scenarios. There is no forecast on earth that has IT and data demands shrinking; we have to find ways to manage and contain that growth, which is why this is such a crucial move. As we increase sharing, mobility, functionality, raw demands, and general IT expectations, the underlying and continuing truth is that money matters. And with data demands growing exponentially, the extent to which money matters in the storage world is growing exponentially, too.</p>
<p>There’s a multi-billion dollar win that will accrue to whoever addresses it first—and well.</p>
<hr size="1" /><a name="_ftn1">[1]</a> Source: ESG Research Report,<em> <a href="http://www.enterprisestrategygroup.com/2011/01/2011-it-spending-intentions-survey/" target="_blank"><span style="text-decoration: underline;">2011 IT Spending Intentions Survey</span></a></em>, January 2011.
</private_standard>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/05/how-economics-alter-the-storage-landscape-the-financial-leverage-of-storing-less-will-win/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The Logic and Value of a Tiered Archive: Tiering Across More of the Storage Hierarchy</title>
		<link>http://www.enterprisestrategygroup.com/2011/04/the-logic-and-value-of-a-tiered-archive-tiering-across-more-of-the-storage-hierarchy/</link>
		<comments>http://www.enterprisestrategygroup.com/2011/04/the-logic-and-value-of-a-tiered-archive-tiering-across-more-of-the-storage-hierarchy/#comments</comments>
		<pubDate>Mon, 18 Apr 2011 14:13:21 +0000</pubDate>
		<dc:creator>Mark Peters</dc:creator>
				<category><![CDATA[Digital Archiving Software]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[Information Management Software & Services]]></category>
		<category><![CDATA[Information and Risk Management]]></category>
		<category><![CDATA[Mark Peters]]></category>
		<category><![CDATA[Market Reports]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[archiving]]></category>
		<category><![CDATA[tiering]]></category>

		<guid isPermaLink="false">http://www.enterprisestrategygroup.com/?p=21855</guid>
		<description><![CDATA[This paper makes the case for extending storage tiering into the archive data space; the sense of such a move would make one wonder why it hasn’t been done before. Yet, of course, the obvious isn’t always so obvious until it is pointed out! Tiering: What, Why, and Where In the well known and often [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">This paper makes the case for extending storage tiering into the archive data space; the sense of such a move would make one wonder why it hasn’t been done before. Yet, of course, the obvious isn’t always so obvious until it is pointed out!</div>
<h1>Tiering: What, Why, and Where</h1>
<p>In the well known and often discussed “Storage Hierarchy,” we’ve effectively thrown a huge percentage of aging data off an “archive cliff” … which is odd when you consider that archive data represents a huge, and growing, proportion of the overall data out there. We happily angst over the precise placement of production data onto various forms of spinning disks (with and without cache, solid state, or physical variations to tweak performance and capacity) <em>because we acknowledge that different data has different needs and different applications and businesses and times require varying attributes from a storage system. And yet, somehow—and for no good reason—once data becomes less active and/or reaches its archival stage, we are content to just throw it over the equivalent of an IT wall into a big homogenous pot. </em></p>
<p>This needs to change. Archive data is no more homogenous than production data. After all, it’s the same data, just at a different stage of its life; we wouldn’t treat all people the same at a certain age—nor should we with data. We don’t want the equivalent of a data “Logan’s Run,” killing it all off at a certain point; nor are we going to force activity on older people that are not suited or accustomed to it!</p>
<h2>The What and Why of Storage Tiering</h2>
<p>Before we get too sentimental about the data equivalent of locking grandparents in long-term retirement facilities without visitors or distractions regardless of their prior lives or current needs and abilities, let’s turn to some cold hard facts about storage tiering. It is simply a means to an end—and not an end in itself. The reason tiered storage is becoming more crucial (indeed, a prerequisite for many organizations) is a simple matter of economics. After all, if storage were free, then everything would logically be stored on the fastest devices available. Capacity would be unlimited and cost no issue. Yes, there are some practical issues around connectivity, protection, and access, but you get the point: the only reason we have long sought the nirvana of an effective storage hierarchy is that storage isn’t free and, consequently, we need to make choices and allocate data to appropriate levels and types (i.e., costs) of storage.</p>
<p>Now, with the rate of storage growth outstripping the rate of its price decline, tiering and storage economics are back on the agenda—still not because they are inherently desired, but because of what they can deliver: namely, lower costs, better business outcomes, and a pragmatic route to cope with massive storage growth. These are the “economics of necessity.” Of course, once good tiers and tools that make effective tiering feasible (and the economics compelling) exist, then users can also start looking at the “economics of opportunity”—in other words, if tiering is something that can be <em>embraced rather than avoided</em>, then additional value can be sought and overall efficiency can be increased. Tiered storage can move from something that users do because they <em>have</em> to (as little as they can get away with) to something they do because they <em>want</em> to.</p>
<p>For active production data, we have, at least conceptually, “got this”—though full adoption is still in progress. But for the wider archival layer, it has apparently escaped our attention and is definitely needed. It is needed because of the continuing massive growth of capacity requirements (much of it fixed content and unstructured data), combined with a need to store many things for longer and longer. Most IT operations find that an increasing percentage of their budgets goes to storage—and that’s despite the improvements that “counter-growth” tools like thin provisioning and deduplication have provided. However, these advantages are soon wiped out by the fact that most backup is now done (at least as a first stage, to meet tighter backup windows and recovery needs) to disk, which increases demand for a resource that is relatively expensive compared to tape. In addition, having only one archive tier makes it hard to match necessary service levels to available budgets. For instance, whether extra capacity demands are driven by desire (replication) or demand (regulations) might well influence the service level you crave and the budget you are prepared to allocate. And that’s before we consider the applications, business stage, relative success, and so on.</p>
<p>To summarize, tiering is the manifestation of an economic need: performance is a reaction—or symptom—rather than a cause, despite many assertions to the contrary. That said, whether or not we understand our semantics and motivations, to not tier archive data as efficiently as we do active data is a crime and a simple waste of money. Worse still, many users compound the transgression by not even having one non-tiered archive level and instead having a whole range of individual single-tier archives; they are then piling management overhead, waste, and additional risk onto their economic inefficiency. The situation is only getting worse as data volumes grow and retention requirements intensify.</p>
<h2>Why Tier? The Academic Answer</h2>
<p>The “Storage Hierarchy” has been discussed for years and is simply described as putting the right data on the right device at the right time for both data type and for that data’s lifecycle requirements. Thus, data moves from one level to another as needed. And it’s the “movement” and “as needed” parts of that equation that have proven toughest to provide. Indeed, what we’ve been searching for is not a Storage Hierarchy per se, as that’s always existed. Instead, what’s been needed is to add a dynamic, automated aspect to it. The value that this can deliver is:</p>
<ul>
<li>Having the right data (only) on the right storage device (always) at the right time</li>
<li>Better utilization of all storage assets (which also means lowering floor space and power expenditure)</li>
<li>Reduced management resources (despite, and because of, better alignment of data to tiers)</li>
</ul>
<p>All of this together can help reduce costs, which—whatever else is going on in IT—remains the number one business initiative impacting IT investments, as the ESG research in Figure 1 shows.<a href="#_ftn1">[1]</a></p>
<div class="graph_top">Figure 1. Business Initiatives That Will Impact IT Spending Decisions, Three-year Trend</div>
<p><img class="aligncenter size-full wp-image-21859" title="ArchiveTierF1" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/04/ArchiveTierF1.png" alt="" width="620" height="439" /></p>
<h2>Why Tier? The Real World Answer</h2>
<p>While the points in the previous section are all excellent, important, and true, they are a little “dry.” They’re not the way busy IT professionals focus their minds every minute; so let’s take a more real world look at the pains and inefficiencies that will be more familiar in many data centers. What’s really behind the need to tier is the growing wave of fixed (or static, or persistent, or cold) data; whatever the preferred descriptor, there’s no denying that there’s an enormous amount of it, whether it’s older e-mail files, scanned images, rich media, copies of personnel quarterly reviews from summer 2006, or the CEO’s all-hands presentation from 2002! Despite the fact that this data isn’t changing—and much of it isn’t accessed frequently—it is <span style="text-decoration: underline;">nonetheless sitting on expensive disk</span> subsystems designed to provide a much richer suite of capabilities and performance than is appropriate for this data. This would be bad enough, but the situation is further compounded by two other issues:</p>
<ol>
<li>As this data grows, the easy option is to simply throw another expensive, over-qualified disk shelf or rack at the problem; while it’s understandable to do this rather than address tiers and archiving, it’s also a significant factor in driving up the overall percentage of IT spend that is sunk into storage (market estimates vary, but a good rule of thumb is that it has doubled in the last decade). This path of least resistance approach is clearly not sustainable and needs to be replaced by a path of redemption (or at least of common business sense) whereby appropriate data—even primary data that is older and less accessed (the definition of archived data)—is moved off of the most expensive media.</li>
<li>Inappropriate data that is kept and allowed to grow on main disk systems is, naturally, treated with the same care as the appropriate, highly-active data on those systems; it gets backed up frequently (daily, weekly, whatever, despite the fact it hasn’t changed) to expensive disk and/or tape devices. There are expensive administrators managing it (even though, again, it’s just sitting there!) and then users wonder why their backup windows are shrinking and their administrators are tearing their collective hair out.</li>
</ol>
<p>It has been said that the definition of insanity is to keep doing the same thing and expect a different outcome; hoping that this vicious cycle resolves itself without changing anything is clearly insane.</p>
<h2>Where in the Hierarchy Should Tiers Exist?</h2>
<p>As is plain by now, it is logical to tier everywhere and yet our ability to draw a pretty hierarchical triangle has done us the disservice of allowing us to think we have been doing so. The next section explains how the self-deception has been perpetrated and how today’s tools are beginning to make true tiering something that users should demand because it’s finally becoming possible <em>not just for the active data, but for the archive data, too.</em> Tiering up and down the storage hierarchy is simple common sense and yet most focus to date has been at the “mid” levels, ignoring the high performance and high capacity parts of the data spectrum.<a href="#_ftn2">[2]</a></p>
<h1>Storage Tiering</h1>
<h2>History and Opportunity</h2>
<p>Figure 2 shows very simply, yet starkly, how we arrived where we are; it also shows why there is a need for more complete tiering at all levels of the data hierarchy.</p>
<p>In the graphic, Picture 1 portrays the “perfect story” that we’ve spun for decades; however, it never properly existed and would actually be better represented by the sort of wedding-cake shown in Picture 2 in which the “steps” from one storage tier to another can be seen to be large. In the real world, such moves (such as that from disk to tape) were for years also awkward, largely manual, and, hence, to be avoided as much as possible. As shown in Picture 3, the moves were pretty much one way, almost a storage “lower-archy!” Over the years, we have added more gradations for active data (Picture 4) in terms of differing types of disk drive (speed, capacity, caching, etc.), but the top and bottom layers of the data “cake” have been pretty much left alone: solid state disks and tape. What we need to do as an industry is to extend additional layers (see Picture 5) in these places, too. What you end up with is a gradual smoothing of the layer cake from top to bottom so that it begins to look (as Picture 6 shows) rather like the original vision we had decades prior. The “magic” comes from automated tools (the arrow in the picture) that allow data to be moved both up and down the hierarchy to truly get it in the right place at the right time (and to fulfill the promise of things like HSM and ILM). The final element is shown in Picture 7 whereby technologies such as deduplication, thin provisioning, and solid state/caching can be applied to shrink the pyramid for a given workload.</p>
<div class="graph_top">Figure 2. The Evolution of the Storage Hierarchy</div>
<p><img class="aligncenter size-full wp-image-21860" title="ArchiveTierF2" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/04/ArchiveTierF2.png" alt="" width="610" height="523" /></p>
<h1>Tiered Archive Market Value</h1>
<p>A better approach to tiering archived data has many benefits for users:</p>
<ul>
<li>Flexibility</li>
<li>Cost advantages</li>
<li>Management ease (assuming it is automated)</li>
</ul>
<p>When we look from a market perspective, the potential for positive impact (both in an IT [SLA] and a business [ROI/TCO] sense) is obvious because the capacity of digital archive data is huge and growing rapidly, as the ESG research in Figure 3 shows.<a href="#_ftn3">[3]</a></p>
<div class="graph_top">Figure 3. Total Worldwide Digital Archive Capacity, 2010-2015</div>
<p><img class="aligncenter size-full wp-image-21858" title="ArchiveTierF3" src="http://www.enterprisestrategygroup.com/media/wordpress/2011/04/ArchiveTierF3.png" alt="" width="629" height="346" /></p>
<p>There is growth across all media types and, hence, opportunities for many technologies and approaches to play a role in a tiered archive implementation: disk (internal or external), tape, removable disk, optical, and cloud resources.</p>
<p>When ESG has researched users’ proclivity for tiering, it has found that it ranks higher among organizations that also cite cost reduction as a major factor impacting their storage spending. Indeed, a tiered archive can also have a positive impact on some other common business initiatives; for instance, well-managed archive tiers will almost certainly place more data on more power-efficient storage devices.</p>
<h1>Designing a Tiered Archive</h1>
<p>So far we know two things:</p>
<ol>
<li>A tiered archive approach makes sense in terms of the hierarchy and the simple fact that data isn’t all the same once it becomes less active (any more than it was when it was more active).</li>
<li>The sheer size and growth of the digital archive space makes the ability to add better service levels and better economics a business imperative.</li>
</ol>
<p>So, with these two things determined, what should users be looking for—and indeed demanding from—vendors in a tiered archive offering?</p>
<ul>
<li>First and foremost, the solution should be automated and, rather obviously, encompass multiple archive tiers, moving data from a nearline-type existence to increasingly lower performance (but still relatively easily available) platforms. This provides the range of price and performance that is the essence of tiering, so maximized “intelligence” is crucial: that means dynamic, policy-based data movement and management.</li>
<li>Genuinely complete hierarchical pools that can control data from “cradle to grave” are, of course, nirvana. Until we have them, it would be excellent if a solution could at least have a foot in each camp; in other words, cover more than one layer of the hierarchy by also controlling and managing some level of online/active data devices.</li>
<li>An ability to control even removed and offsite data would be an advantage especially if combined with some form of automated management tool to check on the viability of media and data.</li>
<li>A host of practical aspects:
<ul>
<li>Data protection is needed. Don’t forget this is archived, primary copy data, so some form of backup is highly likely to be required. Because DR and backup recovery times vary—by application, business, and budget richness, for example—a range of integrated options can only add value to the overall solution.</li>
<li>High availability options will be good for some data. This is still data that may be needed (if it is not, then it should be deleted; and if it is suddenly promoted in the storage hierarchy, you’ll want to be able to ensure availability).</li>
<li>Easy and extensive scalability are required.</li>
<li>Self-management and single-pane-of-glass control will minimize administrative effort and complexity.</li>
</ul>
</li>
</ul>
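<p>To make “dynamic, policy-based data movement” a little more concrete, the following is a minimal sketch rather than a description of any vendor’s product. The tier names, the idle-day thresholds, and the functions <code>target_tier</code> and <code>plan_move</code> are all hypothetical, chosen purely for illustration; a real policy would likely weigh many more attributes than time since last access.</p>

```python
from dataclasses import dataclass


# Hypothetical archive tiers, ordered from fastest/most expensive to
# slowest/cheapest. Names and thresholds are illustrative assumptions,
# not ESG figures or any product's defaults.
@dataclass(frozen=True)
class Tier:
    name: str
    min_idle_days: int  # data idle at least this long may live here


TIERS = [
    Tier("nearline_disk", 0),
    Tier("capacity_disk", 90),
    Tier("tape", 365),
]


def target_tier(idle_days: int) -> Tier:
    """Return the lowest tier whose idle threshold the data has reached."""
    chosen = TIERS[0]
    for tier in TIERS:
        if idle_days >= tier.min_idle_days:
            chosen = tier
    return chosen


def plan_move(current: str, idle_days: int):
    """Return (source, destination) if the policy calls for a move, else None.

    The same rule demotes cold data and promotes data that has become
    active again, so movement works both down and up the hierarchy.
    """
    dest = target_tier(idle_days).name
    return None if dest == current else (current, dest)
```

<p>Under these assumed thresholds, <code>plan_move("nearline_disk", 400)</code> would propose a demotion to tape, while <code>plan_move("tape", 5)</code> would propose promoting recently touched data back to nearline disk. The point of the sketch is only that a single declarative policy, evaluated automatically, can drive movement in both directions.</p>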
<p>It’s worth noting that the underlying bedrock for all this to occur is storage virtualization; that is what permits the migration and management that generates the value. It also makes any tiered archive compliant with what is a major component in any modern data center. Virtualization is one of the few tools with the potential to balance spiraling needs and expectations with constrained resources.</p>
<h1>The Bigger Truth</h1>
<p>A lengthy closing argument here could be construed as “over-egging the pudding.” Given the scale of the archived data universe and the need for all users to find economic efficiency wherever they can, <em>having a tiered archive data layer just makes plain common sense. </em>The only small caveat is that the industry must deliver (and users must demand) it in a manner that is easy to implement and automated.</p>
<p>We all “get” the idea of data lifecycles but without a tiered archive approach, we are not provisioning accordingly for a huge proportion of the world’s data: instead, we continue to place inappropriate data on very expensive disk systems and support, grow, and protect it as if it were critical and/or highly referenced. That is crazy and financially reckless, which in itself one might argue is sufficient reason to change it. Yet as we look forward, the gap between burgeoning archive data volumes and the ability to afford to provision them is only going to expand; tiered archiving is moving from a common sense improvement to an absolute imperative. We hope the industry is paying attention. It may not be the most glamorous or sexy factor in IT and data management, but, like the aging human population or dwindling oil stocks relative to demand, it is a challenge we cannot and should not ignore.</p>
<hr size="1" /><p><a name="_ftn1">[1]</a> Source: ESG Research Report, <a href="../../../../../2011/01/2011-it-spending-intentions-survey/" target="_blank"><em>2011 IT Spending Intentions Survey</em></a>, January 2011.</p>
<p><a name="_ftn2">[2]</a> While this paper concentrates on tiering at the archive level, a companion ESG Market Report is to be published—planned for June 2011—that will address the needs for tiering at the extremely high performance level.</p>
<p><a name="_ftn3">[3]</a> Source: ESG Research Report, <a title="Permanent Link to Scale-out Storage Market Forecast 2010-2015" href="../../../../../2011/03/scale-out-storage-market-forecast-2010-2015/" target="_blank"><em>Scale-out Storage Market Forecast 2010-2015</em></a>, March 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.enterprisestrategygroup.com/2011/04/the-logic-and-value-of-a-tiered-archive-tiering-across-more-of-the-storage-hierarchy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

