Sunday, July 18, 2010

Quick-Take: NexentaStor AD Shares in 100% Virtual SMB

Here's a maintenance note for SMB environments attempting 100% virtualization and relying on SAN-based file shares to simplify backup and storage management: beware the chicken-and-egg scenario on restart before going home to capture much-needed Zzz's. If your domain controller is virtualized and its VMDK file lives on the SAN/NAS, you'll need to restart SMB services on the NexentaStor appliance before leaving the building.
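
For reference, the quickest way to bounce the CIFS/SMB service is from the appliance's expert-mode (bash) shell via SMF; the service name below assumes the stock OpenSolaris CIFS server that NexentaStor is built on:

root@NexentaStor:~# svcs smb/server             # check the CIFS service state
STATE          STIME    FMRI
online         22:10:05 svc:/network/smb/server:default
root@NexentaStor:~# svcadm restart smb/server   # restart so it can re-bind to AD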

Here's the scenario:

  1. An after-hours SAN upgrade in a non-HA environment (maybe Auto-CDP for BC/DR, but no active fail-over);

  2. Shutdown of SAN requires shutdown of all dependent VM's, including domain controllers (AD);

  3. End-user and/or maintenance plans are dependent on CIFS shares from SAN;

  4. Authentication of CIFS shares on NexentaStor is AD-based;


Here's the typical maintenance plan (detail omitted; a scripted sketch of the ordered shutdowns follows the list):

  1. Ordered shutdown of non-critical VM's (including UpdateManager, vMA, etc.);

  2. Ordered shutdown of application VM's;

  3. Ordered shutdown of resource VM's;

  4. Ordered shutdown of AD server VM's (minus one, see step 7);

  5. Migrate/vMotion remaining AD server and vCenter to a single ESX host;

  6. Ordered shutdown of ESX hosts (minus one, see step 8);

  7. vSphere Client: Log-out of vCenter;

  8. vSphere Client: Log-in to remaining ESX host;

  9. Ordered shutdown of vCenter;

  10. Ordered shutdown of remaining AD server;

  11. Ordered shutdown of remaining ESX host;

  12. Update SAN;

  13. Reboot SAN to update checkpoint;

  14. Test SAN update - restoring previous checkpoint if necessary;

  15. Power-on ESX host containing vCenter and AD server (see step 8);

  16. vSphere Client: Log-in to remaining ESX host;

  17. Power-on AD server (through to VMware Tools OK);

  18. Restart SMB service on NexentaStor;

  19. Power-on vCenter;

  20. vSphere Client: Log-in to vCenter;

  21. vSphere Client: Log-out of ESX host;

  22. Power-on remaining ESX hosts;

  23. Ordered power-on of remaining VM's;
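
For the ordered shutdowns in steps 1-4, the vSphere Client works fine, but the same can be scripted from the ESX service console. The sketch below is illustrative only - the datastore path and VM name are placeholders - and "stop trysoft" asks VMware Tools for a guest shutdown first, falling back to a hard power-off:

# list the VMs registered on this host (returns .vmx paths)
[root@esx01 ~]# vmware-cmd -l

# request a graceful, tools-assisted shutdown of one VM
[root@esx01 ~]# vmware-cmd /vmfs/volumes/datastore1/vma01/vma01.vmx stop trysoft

# confirm it is off before moving to the next tier of VM's
[root@esx01 ~]# vmware-cmd /vmfs/volumes/datastore1/vma01/vma01.vmx getstate
getstate() = off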


A couple of things to note in an AD environment:

  1. NexentaStor requires the use of AD-based DNS for AD integration;

  2. AD-based DNS will not be available at SAN re-boot if all DNS servers are virtual and only one SAN is involved;

  3. Lack of DNS resolution at re-boot will cause DNS-name-based NTP synchronization to fail (a quick post-reboot check for both is sketched after this list);

  4. NexentaStor SMB service will fail to properly initialize AD credentials;

  5. VMware 4.1 now pushes AD authentication all the way to ESX hosts, enabling better credential management and security but creating a potential AD dependency as well;

  6. Using the auto-startup order on the remaining ESX host for AD and vCenter could automate the process (steps 17 & 19); however, I prefer the "manual" approach after a SAN upgrade in case an upgrade failure is detected only after the ESX host is restarted (i.e. storage service interaction with NFS/iSCSI after the upgrade).
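
Before restarting SMB (step 18, or the svcadm example above), a quick sanity check from the appliance shell confirms AD-based DNS and time are back in order once the first AD server is up. The host name and addresses below are placeholders, output is abbreviated, and ntpdate is used only because it can query without setting the clock:

root@NexentaStor:~# getent hosts dc1.example.local    # is AD-based DNS resolving again?
10.0.1.10       dc1.example.local
root@NexentaStor:~# ntpdate -q 10.0.1.10              # query (don't set) NTP on the DC by IP
server 10.0.1.10, stratum 3, offset 0.001243, delay 0.02585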


SOLORI's Take: This is a great opportunity to re-think storage resources in the SMB as the linchpin to 100% virtualization.  Since most SMB's will have a tier-2 or backup NAS/SAN (auto-sync or auto-CDP) for off-rack backup, leveraging a shared LUN/volume from that SAN/NAS for a backup domain controller is a smart move. Since tier-2 SAN's may not have the IOPs to run ALL mission critical applications during the maintenance interval, the presence of at least one valid AD server will promote a quicker RTO, post-maintenance, than coming up cold. [This even works with DAS on the ESX host]. Solution - add the following and you can ignore step 15:

3a. Migrate always-on AD server to LUN/volume on tier-2 SAN/NAS;


24. Migrate always-on AD server from LUN/volume on tier-2 SAN/NAS back to tier-1;


Since even vSphere Essentials Plus now has vMotion (a much requested and timely addition), collapsing all remaining VM's to a single ESX host is a no-brainer. However, migrating the storage is another issue, one which cannot be resolved without either a shutdown of the VM (off-line storage migration) or an Enterprise/Enterprise Plus version of vSphere. That is why the migration of the AD server from tier-2 is reserved for last (step 24) - it will likely need to be shut down to migrate its storage between SAN/NAS appliances.

Friday, July 16, 2010

Quick-Take: vSphere 4, Now with SUSE Enterprise Linux, Gratis

Earlier this month VMware announced that it was expanding its partnership with Novell in order to offer a 1:1 CPU enablement license for SLES. Mike Norman's post at VirtualizationPractice.com discusses the potential "darker side" of the deal, which VMware presents this way:
VMware and Novell are expanding their technology partnership to make it easier for customers to use SLES operating system in vSphere environments with support offerings that will help your organization:

  • Reduce the cost of maintaining SLES in vSphere environments

  • Obtain direct technical support from VMware for both vSphere and SLES

  • Simplify your purchasing and deployment experience


In addition, VMware plans to standardize our virtual appliance-based products on SLES for VMware further simplifying the deployment and ongoing management of these solutions.

  • Customers will receive SLES with one (1) entitlement for a subscription to patches and updates per qualified VMware vSphere SKU. For example, if a customer were to buy 100 licenses of a qualified vSphere Enterprise Plus SKU, that customer would receive SLES with one hundred (100) entitlements for subscription to patches and updates.

  • Customers cannot install SLES with the accompanying patches and updates subscription entitled by a VMware purchase 1) directly on physical servers or 2) in virtual machines running on third party hypervisors.

  • Technical support for SLES with the accompanying patches and updates subscription entitled by a VMware purchase is not included and may be purchased separately from VMware starting in 3Q 2010.


- VMware Website, 6/2010



The part about standardization has been emphasized by us - not VMware - but it seems to be a good fit with VMware's recent acquisition of Zimbra (formerly owned by Yahoo!) and the release of vSphere 4.1 with "cloud scale" implications. That said, the latest version of the VMware Data Recovery appliance has been recast from RedHat to CentOS with AD integration, signaling that it will take some time for VMware to transition to Novell's SUSE Linux.


SOLORI's Take: Linux-based virtual appliances are a great way to extend features and control without increasing license costs. Kudos to VMware for hopping on-board the F/OSS train. Now where's my Linux-based vCenter with a Novell Directory Services for Windows alternative to Microsoft servers?

Thursday, July 15, 2010

ZFS Pool Import Fails After Power Outage

The early summer storms have taken their toll on Alabama, and UPS failures (and short-falls) have been popping up all over. Add consolidated, shared storage to the equation - JBOD's on separate power rails, limited UPS run-time, and/or no generator backup - and you've got a recipe for potential data loss; at least that's what we've been seeing recently.



Even with ZFS pools, data integrity in a power event cannot be guaranteed - especially when employing "desktop" drives and RAID controllers with RAM cache and no BBU (or perhaps a "bad storage admin" who has managed to disable the ZIL). When this happens, NexentaStor (and other ZFS-based storage devices) may even show all members of the ZFS pool as "ONLINE", as if they are awaiting proper import. However, when an import is attempted (either automatically on reboot or manually), the pool fails to import.
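
As an aside, the "bad storage admin" case is easy to rule in or out: on the OpenSolaris-era builds NexentaStor is based on, the ZIL kill-switch was the zil_disable tunable, which can be checked both persistently and in the running kernel (a value of 1 means the ZIL has been disabled):

root@NexentaStor:~# grep zil_disable /etc/system      # persistent setting, if present
set zfs:zil_disable = 1
root@NexentaStor:~# echo "zil_disable/D" | mdb -k     # value in the running kernel
zil_disable:
zil_disable:    1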




From the command line, the suspect pool's status might look like this:


root@NexentaStor:~# zpool import
pool: pool0
id: 710683863402427473
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        pool0        ONLINE
          mirror-0   ONLINE
            c1t12d0  ONLINE
            c1t13d0  ONLINE
          mirror-1   ONLINE
            c1t14d0  ONLINE
            c1t15d0  ONLINE

Looks good, but the import may fail like this:

root@NexentaStor:~# zpool import pool0
cannot import 'pool0': I/O error

Not good. This probably indicates that something is not right with the array. Let's try to force the import and see what happens.

This is the point where most people start to get nervous, the neck tightens up a bit, and they begin to flip through a mental calendar of backup schedules and catalog backup repositories - I know I do. It's the forced import, however, that makes most administrators really nervous:



root@NexentaStor:~# zpool import -f pool0
pool: pool0
id: 710683863402427473
status: The pool metadata is corrupted and the pool cannot be opened.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
cannot import 'pool0': I/O error


Really not good. Did it really suggest going to backup? Ouch!



In this case, something must have happened to corrupt the metadata - perhaps the non-BBU cache on the RAID device when power failed. Expensive lesson learned? Not yet. The ZFS file system still presents you with options, namely "acceptable data loss" for the period of time covered by the RAID controller's cache. Since ZFS writes data in transaction groups, and transaction groups normally commit at 20-30 second intervals, that RAID controller's lack of BBU puts some or all of the pending group at risk. Here's how to tell, by testing the forced import as if data loss were allowed:



root@NexentaStor:~# zpool import -nfF pool0
Would be able to return data to its state as of Fri May 7 10:14:32 2010.
Would discard approximately 30 seconds of transactions.


or

root@NexentaStor:~# zpool import -nfF pool0
WARNING: can't open objset for pool0

If the first output is acceptable, then proceeding without the "n" option will produce the desired effect, "rewinding" (read: ignoring) the last couple of transaction groups and importing the "truncated" pool. The import will report approximately how many seconds' worth of transactions cannot be restored. Depending on the bandwidth and utilization of your system, this could be very little data or several MB worth of transactions.
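
For completeness, a successful rewind (the same command, minus the "n") looks roughly like this - representative output, not a literal transcript - and a "zpool scrub pool0" afterwards is cheap insurance:

root@NexentaStor:~# zpool import -fF pool0
Pool pool0 returned to its state as of Fri May 7 10:14:32 2010.
Discarded approximately 30 seconds of transactions.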



What to do about the second output? The man page for "zpool import" from Sun/Oracle says the following:



zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile] [-D] [-f] [-R root] [-F [-n]] -a
Imports all pools found in the search directories. Identical to the previous command, except that all pools with a sufficient number of devices available are imported. Destroyed pools, pools that were previously destroyed with the “zpool destroy” command, will not be imported unless the -D option is specified.



-o mntopts
Comma-separated list of mount options to use when mounting datasets within the pool. See zfs(1M) for a description of dataset properties and mount options.

-o property=value
Sets the specified property on the imported pool. See the “Properties” section for more information on the available pool properties.

-c cachefile
Reads configuration from the given cachefile that was created with the “cachefile” pool property. This cachefile is used instead of searching for devices.

-d dir
Searches for devices or files in dir. The -d option can be specified multiple times. This option is incompatible with the -c option.

-D
Imports destroyed pools only. The -f option is also required.

-f
Forces import, even if the pool appears to be potentially active.

-F
Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.

-a
Searches for and imports all pools found.

-R root
Sets the “cachefile” property to “none” and the “altroot” property to “root”.

-n
Used with the -F recovery option. Determines whether a non-importable pool can be made importable again, but does not actually perform the pool recovery. For more details about pool recovery mode, see the -F option, above.


No real help here. What the documentation omits is the "-X" option. This option is valid only with the "-F" recovery mode and is not well documented; suffice it to say it is the last resort before acquiescing to real problem solving. When the standard recovery mode's "depth" of transaction rewind is not quite enough to get you over the hump, the "-X" option provides an "extended replay" - a scrub-like (read: potentially time-consuming) search back through the transaction groups until it arrives at the last reliable transaction group in the pool.
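
A minimal sketch of that last resort - as with the standard rewind, the dry-run "n" form is worth trying first where time allows, and anything newer than the recovery point it settles on is irretrievably lost:

root@NexentaStor:~# zpool import -nfFX pool0    # dry run; can take a long time on a large pool
root@NexentaStor:~# zpool import -fFX pool0     # extended rewind for real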

Lessons to be learned from this excursion into pool recovery are as follows:



  1. Enterprise SAS good; desktop SATA could be a trap

  2. Redundant Power + UPS + Generator = Protected; Anything else = Risk

  3. SAS/RAID Controller + Cache + BBU = Fast; SAS/RAID Controller + Cache - BBU = Train Wreck



The data integrity functions in ZFS are solid when used appropriately. When architecting your HOME/SOHO/SMB NAS appliance, pay attention to the hidden risks of "promised performance" that may walk you down the plank towards a tape-restore (or resume-writing) event. Better to leave the 5-15% performance benefit on the table, or to purchase adequate BBU/UPS/generator resources to sustain your system through worst-case events. In complex environments, a pending power loss can be properly mitigated through management supervisors and clever scripts that turn down resources in advance of total failure. How valuable is your data?
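
As a parting sketch of what those "clever scripts" might look like: most UPS daemons (apcupsd, NUT, etc.) can call an arbitrary hook when the battery runs low, and that hook only needs to quiesce the dependent hosts and then bring the appliance down cleanly before the power actually fails. Everything below - host names, key-based SSH access, and the grace period - is an assumption for illustration:

#!/bin/bash
# low-battery hook (illustrative) run on the NexentaStor appliance by the UPS daemon
# 1) ask each ESX host to shut down its registered guests via VMware Tools
for host in esx01 esx02; do
    ssh root@${host} 'vmware-cmd -l | while read vmx; do vmware-cmd "$vmx" stop trysoft; done'
done
# 2) allow the guests time to land, then power the appliance off cleanly
sleep 300
/usr/sbin/shutdown -y -g0 -i5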