How to do storage virtualisation right
- 11 September, 2007 22:00
When Roland Etcheverry joined chemical company Champion Technologies two years ago, he looked around and realised he needed to remake the company's storage environment. He had done this twice before at other companies, so he knew he wanted a storage area network (SAN) to tie the various locations to the corporate data centre, as well as to a separate disaster recovery site, each with about 7TB of capacity. He also knew he wanted to utilise storage virtualisation.
At its most basic, storage virtualisation makes scores of separate hard drives look like one big storage pool. IT staffers spend less time managing storage devices, since some chores can be centralised. Virtualisation also increases storage efficiency, letting files be stored wherever there is room rather than leaving some drives underutilised. And IT can add or replace drives without requiring downtime to reconfigure the network and affected servers: the virtualisation software does that for you. Backup and mirroring are also much faster because only changed data needs to be copied; this eliminates the need for scheduled storage management downtime, Etcheverry notes.
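To make the pooling idea concrete, here is a minimal sketch in Python of how a virtualisation layer can present several physical drives as one pool. The class and method names (`Drive`, `StoragePool`, `store`) are invented for illustration and match no vendor's actual software; real products work at the block level, not the file level shown here.

```python
class Drive:
    """One physical drive with a fixed capacity (sizes in GB)."""
    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.used_gb = 0

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


class StoragePool:
    """Presents many drives as one pool: data lands wherever space exists."""
    def __init__(self, drives):
        self.drives = drives
        self.catalog = {}  # file name -> name of the drive holding it

    @property
    def free_gb(self):
        # Users see only the total, never the individual drives
        return sum(d.free_gb for d in self.drives)

    def store(self, filename, size_gb):
        # One simple placement policy: pick the drive with the most free space
        drive = max(self.drives, key=lambda d: d.free_gb)
        if drive.free_gb < size_gb:
            raise IOError("pool exhausted")
        drive.used_gb += size_gb
        self.catalog[filename] = drive.name


pool = StoragePool([Drive("array-a", 100), Drive("array-b", 50)])
pool.store("db-backup.img", 60)  # lands on array-a; the caller never chooses
print(pool.free_gb)              # 90 GB free across the whole pool
```

The point of the sketch is the indirection: callers address the pool, and the mapping layer decides which physical drive holds what, which is also what lets drives be added or swapped without reconfiguring the servers above.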
Better yet, he will save money on future storage needs, because his FalconStor storage management software combines drives from multiple vendors as if they were one virtual drive, letting Etcheverry avoid getting locked in to the expensive, proprietary drives that array-based storage systems often require.
Although storage virtualisation technology is fairly new, it's quickly gaining traction in the enterprise. In 2006, 20 percent of 1,017 companies surveyed by Forrester Research had adopted storage virtualisation; by 2009, 50 percent of those enterprises expect to have done so. The percentages are even higher for companies with 20,000 or more employees, the survey notes: 34 percent of such firms had deployed storage virtualisation in 2006, a figure expected to climb to 67 percent by 2009.
But storage virtualisation requires a clear strategy, Etcheverry says. "A lot of people don't think much about storage, so they don't do the planning that can save costs," he says. Because storage virtualisation is a very different approach to managing data, those who don't think it through may miss several of the technology's key productivity and cost-savings advantages, concurs Nik Simpson, a storage analyst at the Burton Group.
Strategically, storage virtualisation brings the most value to resource-intensive storage management chores meant to protect data and keep it available in demanding environments. These chores include the following: replication to keep distributed databases synchronised; mirroring to keep a redundant copy of data available for use in case the primary copy becomes unavailable; backup to keep both current and historical data available in case it gets deleted but is needed later; and snapshots to copy the original portions of changed data and make it easier to go back to the original version. All these activities have become harder to accomplish using traditional storage management techniques as data volumes surge and time for backup chores decreases.
Because storage virtualisation technology used for these purposes copies just the individual parts of changed data, not entire files or even drive volumes as in traditional host-based storage architectures, these data-protection activities are faster and tax the network less. "You end up transferring 40 or 50 percent less, depending on the data you have," says Ashish Nadkarni, a principal consultant at the storage consultancy GlassHouse Technologies.
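The changed-data-only copying the article describes can be sketched as a block-level diff: divide the volume into fixed-size blocks, compare, and ship only the blocks that differ. This is an illustrative toy (a 4-byte block size and invented function names), not any product's replication protocol; real systems track dirty blocks as writes happen rather than rescanning.

```python
BLOCK = 4  # bytes per block; real systems use something like 4 KB


def blocks(data):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def delta_replicate(old, new):
    """Return the (index, block) pairs that changed - all a mirror needs."""
    changed = []
    for i, (a, b) in enumerate(zip(blocks(old), blocks(new))):
        if a != b:
            changed.append((i, b))
    return changed


primary_v1 = b"AAAABBBBCCCCDDDD"
primary_v2 = b"AAAAXXXXCCCCDDDD"  # only the second block changed

delta = delta_replicate(primary_v1, primary_v2)
print(delta)  # [(1, b'XXXX')] - 4 bytes cross the network instead of 16
```

Copying whole files or volumes, as in the traditional host-based approach, would resend all 16 bytes here; the block-level delta sends a quarter of that, which is the kind of saving Nadkarni describes.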
This efficiency lets a CIO contemplate continuous backup and replication, and enables quick moves to new equipment in case of hardware failure. "We can add new storage as needed and have data transferred in the background, without the users even knowing," says Ryan Engh, IT infrastructure manager at the investment firm Wasatch Advisors, which uses DataCore's virtualisation software.
Another advantage: "This prevents the states of the disaster recovery site and the production site from pulling apart," he says, a common problem in a traditional environment where the two data sets are usually out of synch because of the long replication times needed.
Moreover, the distributed nature of the data storage gives IT great flexibility in how data is stored, says Chris Walls, president of IT services at the healthcare data management firm PHNS, which uses IBM's virtualisation controller. "That control layer gives you the flexibility to put your data in a remote site, or even in multiple sites," he says, all invisible to users.
Understanding these capabilities, a CIO could thus introduce 24/7 availability and disaster recovery, perhaps as part of a global expansion strategy. That is precisely what Etcheverry is doing at Champion. "We now have a zero-window backup, and I can rebuild a drive image in almost real-time," he says.
Some enterprises have gained additional advantage from storage virtualisation by combining it with an older technology called thin provisioning, which presents a server with more storage capacity than is physically allocated to it. This is typically done to create one standard user volume configuration across all drives, so that when drives are replaced with larger ones, IT staff does not have to change the user-facing storage structure. By adding storage virtualisation, these standardised, thin-provisioned volumes can exceed the physical limit of any single drive; the excess is simply stored on another drive, without the user knowing. "This really eases configuration," says Wasatch's Engh. It also reduces IT's need to monitor individual drive usage; the virtualisation software or appliance simply gets more capacity where it can find it.
For example, the Epilepsy Project, a research group at the University of California at San Francisco, uses thin provisioning coupled with Network Appliance's storage virtualisation appliance. The project's analysis applications generate hundreds of gigabytes of temporary data while crunching the numbers. Rather than give every researcher the Windows maximum of 2TB of storage capacity for this occasional use, CIO Michael Williams gives each one about a quarter of that physical space, then uses thin provisioning. The appliance allocates the extra space for the analysis applications' temporary data only when it's really needed, essentially juggling the storage space among the researchers.
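The allocate-on-write behaviour behind thin provisioning can be sketched as follows. This is a hedged illustration of the general technique, not the NetApp appliance's actual logic; the class names and the specific capacities (ten 2TB virtual volumes over a smaller physical pool, loosely echoing the quarter-of-physical-space arrangement described above) are invented for the example.

```python
class PhysicalPool:
    """The real disk capacity behind all the virtual volumes (sizes in TB)."""
    def __init__(self, tb):
        self.free_tb = tb

    def allocate(self, tb):
        if tb > self.free_tb:
            raise IOError("physical pool exhausted")
        self.free_tb -= tb


class ThinVolume:
    """A volume the user's OS sees at full size; real space is drawn on write."""
    def __init__(self, virtual_tb, pool):
        self.virtual_tb = virtual_tb    # capacity reported to the user
        self.pool = pool
        self.allocated_tb = 0.0         # physical space actually consumed

    def write(self, tb):
        if self.allocated_tb + tb > self.virtual_tb:
            raise IOError("virtual volume full")
        self.pool.allocate(tb)          # physical allocation happens only here
        self.allocated_tb += tb


# Ten researchers each see a 2 TB volume (20 TB virtual in total),
# backed by only 5 TB of physical disk.
pool = PhysicalPool(5.0)
volumes = [ThinVolume(2.0, pool) for _ in range(10)]

volumes[0].write(1.5)   # one heavy analysis run claims real space on demand
print(pool.free_tb)     # 3.5 - unused promises cost nothing until written
```

The trade-off is visible in the sketch: the scheme works as long as researchers do not all fill their volumes at once, which is why real deployments monitor the physical pool and grow it before it runs out.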
Virtual storage tools
Storage virtualisation comes in several forms, starting with the most established, array-based virtualisation. Here, a vendor provides an expandable array, to which that vendor's drives can be added; management software virtualises the drives so they appear as a common pool of data. You're typically locked in to one vendor's hardware but don't have to worry about finger-pointing among vendors if something goes wrong, says Forrester Research analyst Andrew Reichman.
Providers of such arrays include Compellent, EMC, Hewlett-Packard, Hitachi Data Systems, Network Appliance (NetApp), Sun Microsystems and Xiotech. Reichman notes that several such array-based virtualisation products, including those from Hitachi (also sold by HP and Sun) and NetApp, also support third-party storage arrays. The Hitachi array is "the only option for the high end," he says, while the others are designed for relatively small storage systems of less than 75TB.
The newer option, network-based storage virtualisation, uses software or a network appliance to manage a variety of disk drives and other storage media. The media can come from multiple vendors, typically allowing for the purchase of lower-cost drives than the all-from-one-vendor options. This lets you use cheaper drives for non-mission-critical storage needs and allows you to reuse at least some storage you've accumulated over the years through mergers and acquisitions, says GlassHouse's Nadkarni.
The hard part
Storage virtualisation's newfound flexibility and control do carry risks. "The flexibility can be your worst nightmare...it's like giving razor blades to a child," says Wasatch's Engh. The issue that storage virtualisation introduces is complexity.
Although the tools keep track of where the files' various bits really are, IT staff not used to having the data distributed over various media might manage the disks the old-fashioned way, copying volumes with partial files rather than copying the files themselves for backup. Or when setting up virtualised storage networks, they might accidentally mix lower-performance drives into high-performance virtual servers, hindering overall performance in mission-critical applications, notes GlassHouse's Nadkarni.
Virtualisation tools aren't hard to use, but it's hard for storage engineers to stop thinking about data from a physical point of view, says PHNS's Walls. "Everything you thought you knew about storage management you need to not bring to the party," he adds.
Another issue is choosing the right form of storage virtualisation, network-based or array-based. The network-based virtualisation technology is delivered via server-based software, a network appliance, or an intelligent Fibre Channel switch, and it comes in two flavours: block-level and file-level. Array-based virtualisation is typically provided as part of the storage management software that comes with an array.
Array-based virtualisation is mature, says Burton Group's Simpson. But it's limited to storage attached directly to the array or allocated just to that array via a SAN; IT usually must buy array storage from the array vendor, creating expensive vendor lock-in.
Network-based storage virtualisation has been in existence just a few years and so has largely been offered by startups. It's the most flexible form of storage virtualisation, says Forrester's Reichman, and lets you manage almost all your storage resources, even offsite, as long as they are available via the SAN. Although these tools can theoretically act as a choke point on your SAN, in practice the vendors are good at preventing that problem, he notes.
Most network-based storage virtualisation products work at the block level, meaning they deal with groups of bits rather than whole files. While block-level network-based storage virtualisation is the most flexible option, the technology typically requires that an enterprise change its storage network switches and other network devices to ones that are compatible, Nadkarni notes. "But no one wants to shut down their SAN to do so," he says. Although you can add the technology incrementally, that just raises the complexity, since you now have some virtualised storage and some nonvirtualised storage, all of which need to be managed in parallel.
Thus, most organisations should consider adopting network-based storage virtualisation as part of a greater storage reengineering effort, he advises.
In both cases, all the setup work happened in a nonproduction environment and could be tested thoroughly without affecting users. Once the two IT leaders were happy with their new systems, they then transferred the data over and brought them online. That meant there was only a single disruption to the storage environment that users noticed. "This was a one-time event," Walls notes.