Tuesday, November 16, 2010

Short-Take: New Oracle/Sun ZFS Goodies

I wanted to pass on some information posted by Joerg Moellenkamp at c0t0d0s0.org - some good news for Sun/ZFS users out there about Solaris 11 Express 2010.11 availability, links to details on the ZFS encryption features in Solaris 11 Express, and clarification of the "production use" guidelines. Here are the pull quotes from his postings:
"Darren (Moffat) wrote three really interesting articles about ZFS encryption: The first one is Introducing ZFS Crypto in Oracle Solaris 11 Express. This blog entry gives you a first overview how to use encryption for ZFS datasets. The second one..."

- Darren Moffat about ZFS encryption, c0t0d0s0.org, 11-16-2010


"There is a long section in the FAQ about licensing and production use: The OTN license just covers development, demo and testing use (Question 14) . However you can use Solaris 11 Express on your production system as well..."

- Solaris 11 Express for production use, c0t0d0s0.org, 11-16-2010


"A lot of changes found their way into the newest release of Solaris, the first release of Oracle Solaris Express 2010.11. The changes are summarized in a lengthy document, however..."

- What's new for the Administrator in Oracle Solaris Express 2010.11, c0t0d0s0.org, 11-15-2010



Follow the links to Joerg's blog for more details and links back to the source articles. Cheers!

In-the-Lab: NexentaStor vs. Grub

In this In-the-Lab segment we're going to look at how to recover from a failed ZFS version update, in case you became ambitious with your NexentaStor installation after the last Short-Take on ZFS/ZPOOL versions. If you used the "root shell" to make those changes, chances are grub is failing after reboot. If so, this blog can help, but before you read on, observe this necessary disclaimer:
NexentaStor is an appliance operating system, not a general purpose one. The accepted way to manage the system volume is through the NMC shell and NMV web interface. Using a "root shell" to configure the file system(s) is unsupported and may void your support agreement(s) and/or license(s).

That said, let's assume that you updated the syspool filesystem and zpool to the latest versions using the "root shell" instead of the NMC (i.e. following a system update where zfs and zpool warnings declare that your pool and filesystems are too old, etc.) In such a case, the resulting syspool will not be bootable until you update grub (this happens automagically when you use the NMC commands.) When this happens, you're greeted with the following boot prompt:
grub>

Grub is now telling you that it has no idea how to boot your NexentaStor OS. Chances are there are two things that will need to happen before your system boots again:

  1. Your boot archive will need updating, pointing to the latest checkpoint;

  2. Your master boot record (MBR) will need to have grub installed again.


We'll update both in the same recovery session to save time (this assumes you know, or have a rough idea about, your intended boot checkpoint - it is usually the highest-numbered rootfs-nmu-NNN checkpoint, where NNN is a three-digit number.) The first step is to load the recovery console. This could have been done from the "Safe Mode" boot menu option if grub were still active. However, since grub is blown away, we'll boot from the latest NexentaStor CD and select the recovery option from the menu.

Import the syspool


Then, we log in as "root" (empty password.) From this "root shell" we can import the existing syspool (i.e. the disks connected to active controllers) with the following command:
# zpool import -f syspool

Note the use of the "-f" flag to force the import of the pool. Chances are, the pool will not have been "destroyed" or "exported," so zpool will "think" the pool belongs to another system (your boot system, not the rescue system). As a precaution, zpool assumes that the pool is still "in use" by the "other system" and rejects the import to avoid "importing an imported pool" - which would be completely catastrophic.

With the syspool imported, we need to mount the correct (latest) checkpointed filesystem as our boot reference for grub, destroy the local zpool.cache file (in case the pool disks have been moved but are still all there), update the boot archive to correspond to the mounted checkpoint, and install grub to each disk in the pool (i.e. each mirror member).

List the Checkpoints


# zfs list -r syspool

From the resulting list, we'll pick our highest-numbered checkpoint; for the sake of this article let's say it's "rootfs-nmu-013" and mount it.

Mount the Checkpoint


# mkdir /tmp/syspool
# mount -F zfs syspool/rootfs-nmu-013 /tmp/syspool

Remove the ZPool Cache File


# cd /tmp/syspool/etc/zfs
# rm -f zpool.cache

Update the Boot Archive


# bootadm update-archive -R /tmp/syspool

Determine the Active Disks


# zpool status syspool

For the sake of this article, let's say the syspool was a three-way mirror and the zpool status returned the following:
  pool: syspool
 state: ONLINE
 scan: resilvered 8.64M in 0h0m with 0 errors on Tue Nov 16 12:34:40 2010
config:

        NAME           STATE     READ WRITE CKSUM
        syspool        ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c6t13d0s0  ONLINE       0     0     0
            c6t14d0s0  ONLINE       0     0     0
            c6t15d0s0  ONLINE       0     0     0

errors: No known data errors

This enumerates the three disk mirror as being composed of disks/slices c6t13d0s0, c6t14d0s0 and c6t15d0s0. We'll use that information for the grub installation.
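If the mirror is wide (or you just don't want to re-type device names), the member disks/slices can be pulled out of the `zpool status` output mechanically. This is a sketch of ours, not a Nexenta tool; here we parse a captured sample of the output above rather than calling zpool directly, and the cXtYdZsN device naming is from the example:

```shell
# Sketch: extract the leaf device/slice names from `zpool status` output so
# each can be passed to installgrub. Parses a captured sample for illustration.
status_sample='  pool: syspool
 state: ONLINE
config:

        NAME           STATE     READ WRITE CKSUM
        syspool        ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c6t13d0s0  ONLINE       0     0     0
            c6t14d0s0  ONLINE       0     0     0
            c6t15d0s0  ONLINE       0     0     0'

# Match only leaf device lines (cXtYdZsN) and print the device name column.
echo "$status_sample" | awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ { print $1 }'
```

On a live system you would feed `zpool status syspool` into the same awk filter, then loop over the result with installgrub.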

Install Grub to Each Mirror Disk


# cd /tmp/syspool/boot/grub
# installgrub -f -m stage1 stage2 /dev/rdsk/c6t13d0s0
# installgrub -f -m stage1 stage2 /dev/rdsk/c6t14d0s0
# installgrub -f -m stage1 stage2 /dev/rdsk/c6t15d0s0

Unmount and Reboot


# umount /tmp/syspool
# sync
# reboot

Now, the system should be restored to a bootable configuration based on the selected system checkpoint. A similar procedure can be found on Nexenta's site when using the "Safe Mode" boot option. If you follow that process, you'll quickly encounter an error - likely intentional and meant to elicit a call to support for help. See if you can spot the step...

Monday, November 15, 2010

Short-Take: ZFS and ZPOOL Versions

As features are added to ZFS, the ZFS (filesystem) code may change and/or the underlying ZFS pool code may change. When features are added, older versions of ZFS/ZPOOL will not be able to take advantage of them without the ZFS filesystem and/or pool being upgraded first.

Since ZFS filesystems exist inside of ZFS pools, the ZFS pool may need to be upgraded before a ZFS filesystem upgrade may take place. For instance, in ZFS pool version 24, support for system attributes was added. To allow ZFS filesystems to take advantage of these new attributes, ZFS filesystem version 5 (or higher) is required. The proper order is to bring the ZFS pool up to at least version 24, and then upgrade the ZFS filesystem(s) as needed.
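That pool-before-filesystem ordering can be sketched as a small guard. This is a sketch of ours, not a Sun or Nexenta tool; the version pairing (pool v24 and filesystem v5 both introduce system attributes) is taken from the version tables below:

```shell
# Sketch: refuse a filesystem upgrade to v5+ (system attributes) unless the
# containing pool is already at v24 or later. Function name and argument
# convention are ours, for illustration only.
fs_upgrade_ok() {
    pool_ver=$1     # current pool version, e.g. from `zpool get version <pool>`
    fs_target=$2    # desired filesystem version
    # Filesystem v5 (system attributes) depends on pool v24, per the tables.
    if [ "$fs_target" -ge 5 ] && [ "$pool_ver" -lt 24 ]; then
        echo "upgrade pool to v24 first"
    else
        echo "ok"
    fi
}

fs_upgrade_ok 22 5   # pool too old for filesystem v5
fs_upgrade_ok 24 5   # pool is new enough
```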

Systems running a newer version of ZFS (pool or filesystem) may "understand" an earlier version. However, older versions of ZFS will not be able to access ZFS streams from newer versions of ZFS.

For NexentaStor users, here are the current versions of the ZFS filesystem (see "zfs upgrade -v"):
VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifier (FUID)
 4   userquota, groupquota properties
 5   System attributes

For NexentaStor users, here are the current versions of the ZFS pool (see "zpool upgrade -v"):
VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
 23  Slim ZIL
 24  System attributes
 25  Improved scrub stats
 26  Improved snapshot deletion performance

As versions change, upgrading the ZFS pool and filesystem is possible using the respective upgrade command. To upgrade all imported ZFS pools, issue the following command as root:
zpool upgrade -a

Likewise, to upgrade the ZFS filesystem(s) inside the pool and all child filesystems, issue the following command as root:
zfs upgrade -r -a

The new ZFS features available to these pool and filesystem version(s) will now be available to the upgraded pools/filesystems.

Friday, November 12, 2010

Quick-Take: Is Your Marriage a Happy One?

I came across a recent post by Chad Sakac (VP, VMware Alliance at EMC) discussing how vendors drive customer specifications down from broader goals to individual features or implementation sets (I'm sure VCE was not in mind at the time.) When vendors insist on framing the "client argument" in terms of specific features and proprietary approaches, I have to agree that Chad is spot on. Here's why:

First, it helps when vendors move beyond the "simple thinking" of infrastructure elements as a grid of point solutions and toward an "organic marriage of tools" - often with overlapping qualities. Some marriages begin with specific goals, some develop them along the way, and others change course drastically and without much warning. The rigidness of point approaches rarely accommodates growth beyond the set of assumptions that created it in the first place. Likewise, the "laser focus" on specific features detracts from the overall goal: the present and future value of the solution.

When I married my wife, we both knew we wanted kids. Some of our friends married and "never" wanted kids, only to discover a child on the way and subsequent fulfillment through raising them. Still, others saw a bright future strained with incompatibility and the inevitable divorce. Such is the way with marriages.

Second, it takes vision to solve complex problems. Our church (Church of the Highlands in Birmingham, Alabama) takes a very cautious position on the union between souls: requiring that each new couple seeking a marriage give it the due consideration and compatibility testing necessary to have a real chance at a successful outcome. A lot of "problems" we would encounter were identified before we were married, and when they finally popped-up we knew how to identify and deal with them properly.

Couples that see "counseling" as too obtrusive (or unnecessary) have other options. While the initial investment of money is often equivalent, the return on investment is not so certain. Uncovering incompatibilities "after the sale" makes for a difficult and too often doomed outcome (hence, the 50% divorce rate.)

This same drama plays out in IT infrastructures where equally elaborate plans, goals and unexpected changes abound. You date (prospecting and trials), you marry (close) and are either fruitful (happy client), disappointed (unfulfilled promises) or divorce. Often, it's not the plan that failed but the failure to set/manage expectations and address problems that causes the split.

Our pastor could not promise that our marriage would last forever: our success is left to God and the two of us. But he did help us to make decisions that would give us a chance at a fruitful union. Likewise, no vendor can promise a flawless outcome (if they do, get a second opinion), but they can (and should) provide the necessary foundation for a successful marriage of the technology to the business problem.

Third, the value of good advice is not always obvious and never comes without risk. My wife and I were somewhat hesitant on counseling before marriage because we were "in love" and were happy to be blind to the "problems" we might face. Our church made it easy for us: no counseling, no marriage. Businesses can choose to plot a similar course for their clients with respect to their products (especially the complex ones): discuss the potential problems with the solution BEFORE the sale or there is no sale. Sometimes this takes a lot of guts - especially when the competition takes the route of oversimplification. Too often IT sales see identifying initial problems (with their own approach) as too high a risk and too great an obstacle to the sale.

Ultimately, when you give due consideration to the needs of the marriage, you have more options and are better equipped to handle the inevitable trials you will face. Whether it's an unexpected child on the way, or an unexpected up-tick in storage growth, having the tools in-hand to deal with the problem lessens its severity. The point is, being prepared is better than the assumption of perfection.

Finally, the focus has to be what YOUR SOLUTION can bring to the table: not how you think your competition will come-up short. In Chad's story, he's identified vendors disqualifying one another's solutions based on their (institutional) belief (or disbelief) in a particular feature or value proposition. That's all hollow marketing and puffery to me, and I agree completely with his conclusion: vendors need to concentrate on how their solution(s) provide present and future value to the customer and refrain from the "art" of narrowly framing their competitors.

Features don't solve problems: the people using them do. The presence (or absence) of a feature simply changes the approach (i.e. the fallacy of feature parity). As Chad said, it's the TOTALITY of the approach that derives value - and that goes way beyond individual features and products. It's clear to me that a lot of counseling takes place between Sakac's EMC team and their clients to reach those results. Great job, Chad, you've set a great example for your team!

Monday, November 8, 2010

Short-Take: vSphere Multi-core Virtual Machines

Virtual machines were once relegated to the second-class status of single-core vCPU configurations. To get multiple process threads, you had to add one "virtual CPU" for each thread. This approach, while functional, had potentially serious software licensing ramifications. The topic drew some attention on Jason Boche's blog back in July, 2010 with respect to vSphere 4.1.

With vSphere 4.0 U2 and vSphere 4.1 you have the option of using an advanced configuration setting to change the "virtual cores per socket," allowing thread-count needs to have a lesser impact on OS and application licensing. The advanced configuration parameter is "cpuid.coresPerSocket" (default 1); it acts as a divisor for the virtual hardware setting "CPUs," which must be an integral multiple of the "cpuid.coresPerSocket" value. More on the specifics and limitations of this setting can be found in "Chapter 7, Configuring Virtual Machines" (page 79) of the vSphere Virtual Machine Administrator Guide for vSphere 4.1. [Note: See also VMware KB1010184.]
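For illustration, here's roughly what the relevant entries would look like in a VM's .vmx file for a two-socket, tri-core layout (a hypothetical fragment of ours; the key names match the DICT entries that appear in the VM logs later in this post):

```
numvcpus = "6"
cpuid.coresPerSocket = "3"
```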

The value of "cpuid.coresPerSocket" is effectively ignored when "CPUs" is set to 1. If "cpuid.coresPerSocket" does not evenly divide "CPUs," the power-on operation will fail with the following message in the VI Client's task history:

[caption id="attachment_1726" align="aligncenter" width="328" caption="Virtual core count is imperfect divisor of CPUs"]Power-on-Fail[/caption]

If virtual machine logging is enabled, the following messages (only relevant items listed) will appear in the VM's log (Note: CPUs = 3, cpuid.coresPerSocket = 2):
Nov 08 14:17:43.676: vmx| DICT         virtualHW.version = 7
Nov 08 14:17:43.677: vmx| DICT                  numvcpus = 3
Nov 08 14:17:43.677: vmx| DICT      cpuid.coresPerSocket = 2
Nov 08 14:17:43.727: vmx| VMMon_ConfigMemSched: vmmon.numVCPUs=3
Nov 08 14:17:43.799: vmx| NumVCPUs 3
Nov 08 14:17:44.008: vmx| Msg_Post: Error
Nov 08 14:17:44.008: vmx| [msg.cpuid.asymmetricalCores] The number of VCPUs is not a multiple of the number of cores per socket of your VM, so it cannot be powered on.----------------------------------------
Nov 08 14:17:44.033: vmx| Module CPUID power on failed.
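The divisor rule behind that failure can be sketched as a quick pre-flight check before editing the VMX. This is a sketch of ours, not a VMware tool:

```shell
# Sketch: validate a proposed CPUs / cpuid.coresPerSocket pairing. Mirrors the
# power-on check described above: the VM boots only when coresPerSocket evenly
# divides the CPU count (or when CPUs = 1, where the setting is ignored).
check_cores() {
    cpus=$1   # the virtual hardware "CPUs" value
    cps=$2    # the cpuid.coresPerSocket value
    if [ "$cpus" -eq 1 ]; then
        echo "ok: coresPerSocket ignored when CPUs = 1"
    elif [ $(( cpus % cps )) -eq 0 ]; then
        echo "ok: $(( cpus / cps )) socket(s) x $cps core(s)"
    else
        echo "fail: $cpus vCPUs not a multiple of $cps cores/socket"
    fi
}

check_cores 3 2   # the failing combination from the log above
check_cores 6 3   # a 2P, tri-core layout
```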

While the configuration guide clearly states (as Jason Boche rightly pointed out in his blog):
The number of virtual CPUs must be divisible by the number of cores per socket. The coresPerSocket setting must be a power of two.

- Virtual Machine Configuration Guide, vSphere 4.1



We've found that "cpuid.coresPerSocket" simply needs to be a perfect divisor of the "CPUs" value. This tracks much better with prior versions of vSphere, where "odd-numbered" socket/CPU counts were allowed; by the same logic, odd numbers of cores-per-socket work provided the division of CPUs by coresPerSocket is integral. Suffice it to say, if the manual says "power of two" (1, 2, 4, 8, etc.) then those are likely the only "supported" configurations available. Any other configuration that "works" (i.e. 3, 5, 6, 7, etc.) will likely be unsupported by VMware in the event of a problem.

That said, odd values of "cpuid.coresPerSocket" do work just fine. Since SOLORI has a large number of AMD-only eco-systems, it is useful to test configurations that match the physical core count of the underlying processors (i.e. 2, 3, 4, 6, 8, 12). For instance, we were able to create a single, multi-core virtual CPU with 3 cores (CPUs = 3, cpuid.coresPerSocket = 3) and run Windows Server 2003 without incident:

[caption id="attachment_1727" align="aligncenter" width="403" caption="Windows Server 2003 with virtual 'tri-core' CPU"]Virtual Tri-core CPU[/caption]

It follows, then, that we were likewise able to run a 2P virtual machine with a total of 6-cores (3-per CPU) running the same installation of Windows Server 2003 (CPUs = 6, cpuid.coresPerSocket = 3):

[caption id="attachment_1728" align="aligncenter" width="403" caption="Virtual Dual-processor (2P), Tri-core (six cores total)"]2P Virtual Tri-core[/caption]

Here are the relevant vmware log messages associated with this 2P, six total core virtual machine boot-up:
Nov 08 14:54:21.892: vmx| DICT         virtualHW.version = 7
Nov 08 14:54:21.893: vmx| DICT                  numvcpus = 6
Nov 08 14:54:21.893: vmx| DICT      cpuid.coresPerSocket = 3
Nov 08 14:54:21.944: vmx| VMMon_ConfigMemSched: vmmon.numVCPUs=6
Nov 08 14:54:22.009: vmx| NumVCPUs 6
Nov 08 14:54:22.278: vmx| VMX_PowerOn: ModuleTable_PowerOn = 1
Nov 08 14:54:22.279: vcpu-0| VMMon_Start: vcpu-0: worldID=530748
Nov 08 14:54:22.456: vcpu-1| VMMon_Start: vcpu-1: worldID=530749
Nov 08 14:54:22.487: vcpu-2| VMMon_Start: vcpu-2: worldID=530750
Nov 08 14:54:22.489: vcpu-3| VMMon_Start: vcpu-3: worldID=530751
Nov 08 14:54:22.489: vcpu-4| VMMon_Start: vcpu-4: worldID=530752
Nov 08 14:54:22.491: vcpu-5| VMMon_Start: vcpu-5: worldID=530753

It's clear from the log that each virtual core spawns a new virtual machine monitor thread within the VMware kernel. Confirming the distribution of cores from the OS perspective is somewhat nebulous due to the mismatch of the CPU's ID (follows the physical CPU on the ESX host) and the "arbitrary" configuration set through the VI Client. CPU-z shows how this can be confusing:

[caption id="attachment_1729" align="aligncenter" width="407" caption="CPU#1 as described by CPU-z"]CPU-z output, 1 of 2[/caption]

[caption id="attachment_1730" align="aligncenter" width="406" caption="CPU#2 as described by CPU-z"]CPU-z CPU 2 of 2[/caption]

Note that CPU-z identifies the first 4 cores with what it calls "Processor #1" and the remaining 2 cores with "Processor #2" - this apparently arbitrary split is due to CPU-z's "knowledge" of the physical CPU layout. In (virtual) reality, this assessment by CPU-z is incorrect in terms of cores per CPU; however, it does properly demonstrate the existence of two (virtual) CPUs. Here's the same VM with a "cpuid.coresPerSocket" of 6 (again, not 1, 2, 4 or 8 as supported):

[caption id="attachment_1731" align="aligncenter" width="405" caption="CPU-z demonstrating a 1P, six-core virtual CPU"]Single 6-core (virtual) CPU[/caption]

Note again that CPU-z correctly identifies the underlying physical CPU as an Opteron 2376 (2.3GHz quad-core) but shows 6-cores, 6-threads as configured through VMware. Note also that the "grayed-out" selection for "Processor #1" demonstrates a single processor is enumerated in virtual hardware. [Note: VMware's KB1030067 demonstrates accepted ways of verifying cores per socket in a VM.]

How does this help with per-CPU licensing in a virtual world? It effectively evens the playing field between physical and virtual configurations. In the past (VI3 and early vSphere 4) multiple virtual threads were only possible through the use of additional virtual sockets. This paradigm did not track with OS licensing and CPU-socket-aware application licensing since the OS/applications would recognize the additional threads as CPU sockets in excess of the license count.

With virtual cores, the underlying CPU configuration (2P, 12 total cores, etc.) can be emulated to the virtual machine layer and deliver thread-count parity to the virtual machine. Since most per-CPU licenses speak to the physical hardware layer, this allows for parity between the ESX host CPU count and the virtual machine CPU count, regardless of the number of physical cores.

Also, in NUMA systems where core/socket/memory affinity is a potential performance issue, addressing physical/virtual parity is potentially important. This could have performance implications for AMD 2400/6100 and Intel 5600 systems where 6 and 12 cores/threads are delivered per physical CPU socket.