In Part 2 of this series we're going to deploy a Virtual Storage Appliance (VSA) based on an open storage platform that uses Sun's Zettabyte File System (ZFS) as its underpinnings. We've been working with Nexenta's NexentaStor SAN operating system for some time now and will use it - with its web-based volume management - instead of deploying OpenSolaris and creating storage manually.
Part 2, Choosing a Virtual Storage Architecture
To get started on the VSA, we want to identify the key features and concepts that led us to choose NexentaStor over the myriad of other options. These are:
- NexentaStor is based on open storage concepts and licensing;
- NexentaStor comes in a "free" developer's version with 2TB of managed storage;
- NexentaStor developer's version includes snapshots, replication, CIFS, NFS and performance monitoring facilities;
- NexentaStor is available in a fully supported, commercially licensed variant with very affordable $/TB licensing costs;
- NexentaStor has proven extremely reliable and forgiving in the lab and in the field;
- Nexenta is a VMware Technology Alliance Partner with VMware-specific plug-ins (commercial product) that facilitate the production use of NexentaStor with little administrative input;
- Sun's ZFS (and hence NexentaStor) was designed for commodity hardware and makes good use of additional RAM for cache as well as SSDs for read and write caching;
- Sun's ZFS is designed to maximize end-to-end data integrity - a key point when ALL system components live in the storage domain (i.e. virtualized);
- Sun's ZFS employs several "simple but advanced" architectural concepts that maximize performance on commodity hardware, increasing IOPS and reducing latency.
While the performance features of NexentaStor/ZFS are well outside the capabilities of an inexpensive "all-in-one-box" lab, the concepts behind them are important enough to touch on briefly. Once understood, these concepts make ZFS a compelling architecture for virtualized workloads. Eric Sproul has a short slide deck on ZFS that's worth reviewing.
ZFS and Cache - DRAM, Disks and SSDs
Legacy SAN architectures are typically split into two elements: cache and disks. While not always monolithic, the cache in legacy storage is typically a single-purpose pool set aside to hold frequently accessed blocks of storage, allowing that data to be read from or written to RAM instead of disk. Such caches are generally very expensive to expand (when that is possible at all) and may only accommodate one specific cache function (i.e. read or write, not both). Storage vendors employ many strategies to "predict" what information should stay in cache and how to manage it to improve overall storage throughput.
[Figure: The new cache model used by ZFS allows main memory and fast SSDs to be used as read and write cache, reducing the need for large DRAM cache facilities.]
Like any modern system, available DRAM in a ZFS system - memory the SAN appliance's operating system is not directly using - can be apportioned to cache. The ZFS adaptive replacement cache, or ARC, allows frequently read blocks of data to be served from main memory at microsecond latencies. Normally, an ARC read miss would result in a read from disk (at millisecond latencies), but an additional cache layer - the second-level ARC, or L2ARC - can be employed using very fast SSDs to increase effective cache size (and drastically reduce ARC miss penalties) without resorting to significantly larger main memory configurations.
[Figure: The L2ARC in ZFS sits in between the ARC and disks, using fast storage to extend main memory caching. The L2ARC uses an evict-ahead policy to aggregate ARC entries and predictively push them out to flash, eliminating the latency associated with ARC cache eviction.]
In fact, the L2ARC is limited only by the DRAM (main memory) required for bookkeeping, at a ratio of about 50:1 for ZFS with an 8KB record size. This means that only about 10GB of additional DRAM would be required to add 512GB of L2ARC (e.g. four 128GB read-optimized SSDs in a striped configuration). Together with the ARC, the L2ARC allows a storage pool consisting of fewer disks to perform like a much larger array of disks where read operations are concerned.
[Figure: The L2ARC's evict-ahead policy aggregates ARC entries and predictively pushes them to L2ARC devices to eliminate ARC eviction latency. The L2ARC also buffers the ARC against processes that may force premature ARC eviction (e.g. a runaway application) or otherwise adversely affect performance.]
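For readers who want to see how this maps to the command line, the sketch below shows how read-optimized SSDs would typically be attached as L2ARC (cache) devices to an existing pool, along with the back-of-the-envelope DRAM bookkeeping math from above. The pool name "tank" and the device names are hypothetical - NexentaStor wraps these operations in its web GUI.

    # Hypothetical pool and device names; adjust for your hardware.
    # Attach two read-optimized SSDs as L2ARC (cache) devices. Cache
    # devices hold no unique pool data, so no redundancy is required.
    zpool add tank cache c2t0d0 c2t1d0

    # Confirm the cache vdevs and watch their fill and hit behavior.
    zpool status tank
    zpool iostat -v tank 5

    # Rough DRAM bookkeeping estimate (about 50:1 with 8KB records):
    #   512GB of L2ARC / 50 ~= 10GB of main memory for L2ARC headers.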
Next, the ZFS Intent-Log and write caching...
The ZFS Intent-Log: Write Caching
For synchronous write operations, ZFS employs a special device called the ZFS intent-log, or ZIL. It is the job of the ZIL to allow synchronous writes to be quickly written and acknowledged to the client before they are actually committed to the storage pool. Only small transactions are written to the ZIL, while larger writes are written directly to the main storage pool.
[Figure: The ZFS intent log (ZIL) allows synchronous writes to be quickly written and acknowledged to the client before the data is committed to the storage pool.]
The ZIL can be dealt with in one of four ways: (1) disabled, (2) embedded in the main storage pool, (3) directed to a dedicated log disk, or (4) directed to dedicated, write-optimized SSDs. Since the ZIL is only used for smaller synchronous write operations, its size (per storage pool) ranges from 64MB to about half the size of physical memory. Additionally, log device size is limited by the amount of data - driven by target throughput - that could potentially benefit from the ZIL (i.e. data written to the ZIL within two 5-second commit intervals). For instance, a single 2Gbps FC connection's worth of synchronous writes might require a maximum of about 2.5GB of ZIL.
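At the ZFS command line, dedicating a log device is a single pool operation. The sketch below - again with a hypothetical pool name and device names - adds a mirrored pair of write-optimized SSDs as a dedicated ZIL and repeats the sizing arithmetic used above.

    # Hypothetical device names; the log is mirrored because losing an
    # unmirrored slog device can cost the most recent synchronous writes.
    zpool add tank log mirror c3t0d0 c3t1d0
    zpool status tank

    # Back-of-the-envelope ZIL sizing for one 2Gbps FC link:
    #   ~250MB/s of synchronous writes x ~10s (two 5-second commit
    #   intervals) ~= 2.5GB of log device capacity.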
ZFS Employs Commodity Economies of Scale
Beyond the economies of scale of commodity computing components, the potential power savings of SSDs in place of massive disk arrays, and the I/O and latency benefits of the ARC, L2ARC and ZIL, ZFS does not require high-end RAID controllers to perform well. In fact, ZFS provides the maximum benefit when it directly manages all disks in the storage pool, with direct access to SATA, SAS and FC devices and no hardware RAID abstraction in between.
That is not to say that ZFS cannot use RAID for fault tolerance. On the contrary, ZFS provides four RAID levels depending on use case: striped with no redundancy (RAID0); mirrored disks (RAID1); striped mirror sets (RAID1+0); or striped with parity (RAID-Z). Disks can be added to a pool at any time at the same RAID level, and any additional storage created is immediately available for use. Because ZFS allocates new writes across all devices in a pool, data is gradually redistributed onto newly added devices as it is written or modified.
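Those RAID levels correspond directly to how vdevs are specified when a pool is created or expanded. A brief sketch, with hypothetical pool and disk names:

    # Striped, no redundancy (RAID0 equivalent)
    zpool create tank c1t0d0 c1t1d0

    # Striped mirror sets (RAID1+0 equivalent)
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # Striped with parity (single-parity RAID-Z)
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

    # Grow the pool later with another vdev of the same type; the new
    # capacity is available immediately and data rebalances onto the
    # new devices as it is written or modified.
    zpool add tank mirror c1t4d0 c1t5d0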
Next, why we chose NexentaStor...
About Nexenta
Nexenta is a VMware Technology Alliance Partner based in Mountain View, California. Founded in 2005, Nexenta is the leading provider of hardware-independent OpenStorage solutions. Nexenta's mantra over the last 12 months has been to "end vendor lock-in" associated with legacy storage platforms. NexentaStor - along with the open source NexentaCore operating system, based on ZFS and Debian - represents the company's sole product focus.
NexentaStor is a software-based NAS and SAN appliance. It is a fully featured NAS/SAN solution that has evolved from its roots as a leading disk-to-disk and second-tier storage solution into primary-tier use cases. The release of NexentaStor 2.0, including phone support, has accelerated this transition, as has the feedback and input of well over 10,000 NexentaStor users and the ongoing progress of the underlying OpenSolaris and Nexenta.org communities, each of which has hundreds of thousands of members.
NexentaStor is able to take virtually any data source (including legacy storage) and share it with complete flexibility. NexentaStor is built upon the ZFS file system, which means there are no practical limits to the number of snapshots or to file size when using NexentaStor. Nexenta has also added synchronous replication alongside ZFS-based asynchronous replication. Thin provisioning and compression improve capacity utilization. There is also no need to "short-stroke" drives to achieve performance, as explained below.
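NexentaStor drives these features from its web GUI, but underneath they are ordinary ZFS operations. A hedged sketch, with hypothetical dataset and volume names:

    # Near-instant snapshot of a file system (no practical limit on count)
    zfs snapshot tank/vmstore@nightly

    # Enable compression on a dataset to improve capacity utilization
    zfs set compression=on tank/vmstore

    # Thin-provisioned (sparse) 500GB zvol, e.g. for an iSCSI LUN
    zfs create -s -V 500g tank/lun0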
Today's processors can easily handle end-to-end checksums on every transaction; the processors that existed when legacy file systems were designed could not. Checksumming every transaction end to end means any source of data corruption can be detected. Moreover, if you are using NexentaStor software RAID, it can automatically correct that corruption.
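On a redundant pool, checksum verification and self-healing can also be exercised on demand; a short sketch (pool name hypothetical):

    # Walk every block in the pool, verifying checksums and repairing
    # bad copies from redundant data (mirror or RAID-Z)
    zpool scrub tank

    # Report per-device read, write and checksum error counts
    zpool status -v tank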
The underlying ZFS file system was built to exploit cache to improve read and write performance. By adding SSDs you can achieve a dramatic improvement in performance without increasing the number of expensive spinning disks, thereby saving money, footprint, and power and cooling. Other solutions require you to decide which data should be on the flash or SSDs. This can be quite challenging and will never be as efficient in a dynamic environment as the real time algorithms built into the ZFS file system.
Specifically, with NexentaStor you never run out of snapshots, whereas with legacy solutions you run out fairly quickly, requiring workarounds that take time and increase the risk of service disruption. In summary: over 3x the capacity, equivalent support thanks to Nexenta's partners, superior hardware, and superior software for over 75% less than legacy solutions.
SOLORI on NexentaStor
We started following NexentaStor's development in late 2008 and have been using it in the lab since version 1.0.6 and in limited production since 1.1.4. Since then, we've seen NexentaStor's feature roster grow well beyond the basic ZFS feature set, including:
- Simple Failover (HA);
- 10GE;
- ATA over Ethernet;
- Delorean;
- VM Datacenter;
- improvements to CIFS and iSCSI support;
- GUI improvements, including VLAN and 802.3ad support and improved analytics;
- COMSTAR support;
- Zvol auto-sync;
- HA Cluster (master/master);
- a developer/free edition capacity increase from 1TB to 2TB;
- the addition of a network of professional support and services for NexentaStor customers.
Now, with the advent of the 2.1 release, NexentaStor is showing real signs of maturity. Its growth as a product has been driven by improvements to ZFS and by Nexenta's commercial vision of open storage on commodity hardware, sold and serviced by a knowledgeable and vibrant partner network.
Beyond availability, perhaps the best improvements to NexentaStor have been in the support and licensing arena. The updated license in 2.1 allows capacity to be measured as total usable capacity (after formatting and redundancy groups), not including the ZIL, L2ARC and spare drives. Another good sign of the product's uptake and improved value is its increasing base price and its growing list of add-on modules. Still, at $1,400 retail for 8TB of managed storage, it's a relative bargain.
One of our most popular topics outside of virtualization has been the setup and use of FreeNAS and OpenFiler as low-cost storage platforms. Given our experience with both of these alternatives, we find NexentaStor Developer's Edition superior in terms of configurability and stability as an iSCSI or NFS host, and - with its simple-to-configure replication and snapshot services - it provides a better platform for low-cost continuity, replication and data integrity initiatives. The fact that Nexenta is a VMware Technology Partner makes the choice of Nexenta over the other "open storage" platforms a no-brainer.
Coming in our next installment, Part 3, we will create a NexentaStor VSA, learn how to provision iSCSI and NFS storage and get ready for our virtual ESX/ESXi installations...