Here's the scenario:
- An after-hours SAN upgrade in a non-HA environment (perhaps Auto-CDP for BC/DR, but no active failover);
- Shutting down the SAN requires shutting down all dependent VMs, including the domain controllers (AD);
- End users and/or maintenance plans depend on CIFS shares served from the SAN;
- CIFS share authentication on NexentaStor is AD-based;
Here's the typical maintenance plan (detail omitted):
1. Ordered shutdown of non-critical VMs (including Update Manager, vMA, etc.);
2. Ordered shutdown of application VMs;
3. Ordered shutdown of resource VMs;
4. Ordered shutdown of AD server VMs (all but one, which is consolidated in step 5 and shut down in step 10);
5. Migrate/vMotion the remaining AD server and vCenter to a single ESX host;
6. Ordered shutdown of ESX hosts (all but the host from step 5, which is shut down in step 11);
7. vSphere Client: Log out of vCenter;
8. vSphere Client: Log in to the remaining ESX host;
9. Ordered shutdown of vCenter;
10. Ordered shutdown of the remaining AD server;
11. Ordered shutdown of the remaining ESX host;
12. Update the SAN;
13. Reboot the SAN into the updated checkpoint;
14. Test the SAN update, restoring the previous checkpoint if necessary;
15. Power on the ESX host containing vCenter and the AD server (see step 5);
16. vSphere Client: Log in to the remaining ESX host;
17. Power on the AD server (wait until VMware Tools reports OK);
18. Restart the SMB service on NexentaStor;
19. Power on vCenter;
20. vSphere Client: Log in to vCenter;
21. vSphere Client: Log out of the ESX host;
22. Power on the remaining ESX hosts;
23. Ordered power-on of the remaining VMs;
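At its core the plan is a dependency ordering: shut down dependents before the things they rely on, then power on in the reverse order. Below is a minimal Python (3.9+) sketch of that idea using the standard-library `graphlib`. The component names and the dependency map are hypothetical simplifications of the scenario above, not a VMware or NexentaStor API. Note the `nexenta-smb` entry: the SAN appliance itself needs nothing, but its SMB service needs AD, and the AD server's VM lives on the SAN, so the SMB service can only come up cleanly after AD. That is why the plan restarts it after the AD server is powered on.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the scenario: each component maps to the set of
# components that must be running before it can start. Not a real API.
DEPENDS_ON = {
    "san": set(),                          # tier-1 SAN/NAS appliance
    "esx-host": {"san"},                   # datastores live on the SAN
    "ad-server": {"esx-host"},             # DC VM (disks on SAN via the host)
    "nexenta-smb": {"san", "ad-server"},   # CIFS/SMB needs AD DNS + credentials
    "vcenter": {"esx-host", "ad-server"},
    "resource-vm": {"esx-host", "ad-server"},
    "application-vm": {"resource-vm"},
    "non-critical-vm": {"vcenter"},        # Update Manager, vMA, etc.
}


def power_on_order(deps):
    """Dependencies first: each component starts after everything it needs."""
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps):
    """Reverse of power-on: dependents stop before what they rely on."""
    return power_on_order(deps)[::-1]


if __name__ == "__main__":
    print("shutdown:", shutdown_order(DEPENDS_ON))
    print("power-on:", power_on_order(DEPENDS_ON))
```

Ties (for example `vcenter` versus `resource-vm`) can come out in either order; the sorter only guarantees that every dependency precedes its dependents, which is exactly the "ordered" part of each step.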
A couple of things to note in an AD environment:
- NexentaStor requires AD-based DNS for AD integration;
- AD-based DNS will not be available when the SAN reboots if all DNS servers are virtual and only one SAN is involved;
- Without DNS resolution at reboot, NTP synchronization against time servers specified by DNS name will fail;
- The NexentaStor SMB service will then fail to initialize its AD credentials properly;
- VMware vSphere 4.1 now extends AD authentication all the way to the ESX hosts, which improves credential management and security but also creates a potential AD dependency;
- Using the auto-startup order on the remaining ESX host for AD and vCenter could automate steps 17 and 19; however, I prefer the "manual" approach after a SAN upgrade in case an upgrade failure only surfaces after the ESX host restarts (e.g., storage-service interaction over NFS/iSCSI after the upgrade).
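Because the SMB service cannot pick up its AD credentials until AD-based DNS answers, a small pre-flight check that polls name resolution before the SMB restart can save a failed attempt. The following is a minimal Python sketch, assuming it runs from an admin host that uses the same AD DNS servers as the appliance (it does not run on NexentaStor itself). The function name and the parameters `attempts` and `delay` are illustrative choices, and the resolver and sleep functions are injectable so the logic can be tested offline.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    hostname: str,
    attempts: int = 30,
    delay: float = 10.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until `hostname` resolves; return its address or raise TimeoutError."""
    for attempt in range(1, attempts + 1):
        try:
            return resolve(hostname)
        except OSError:
            # socket.gaierror subclasses OSError: the name is not resolvable yet
            # (e.g. the AD/DNS server VM is still booting).
            if attempt < attempts:
                sleep(delay)
    raise TimeoutError(f"{hostname} did not resolve after {attempts} attempts")
```

Usage: call `wait_for_dns("dc1.example.local")` (a hypothetical DC name) once the AD server reports VMware Tools OK, and restart the SMB service on NexentaStor by hand only after it returns an address. The check confirms DNS only; it says nothing about time sync or the health of the domain itself.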
SOLORI's Take: This is a great opportunity to rethink storage resources in the SMB (small and medium business) space as the linchpin of 100% virtualization. Since most SMBs will have a tier-2 or backup NAS/SAN (auto-sync or auto-CDP) for off-rack backup, leveraging a shared LUN/volume from that SAN/NAS for a backup domain controller is a smart move. Tier-2 SANs may not have the IOPS to run ALL mission-critical applications during the maintenance interval, but keeping at least one valid AD server running will shorten the post-maintenance RTO compared with coming up cold. [This even works with DAS on the ESX host.] Solution: add the following steps and you can ignore step 15:
3a. Migrate always-on AD server to LUN/volume on tier-2 SAN/NAS;
24. Migrate always-on AD server from LUN/volume on tier-2 SAN/NAS back to tier-1;
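The effect of step 3a can be checked mechanically: AD stays available during the outage only if some AD server's disks live somewhere other than the SAN being upgraded. The following is a minimal Python sketch with a hypothetical inventory (`dc1`, `dc2`, `tier1`, `tier2` and the role labels are illustrative, not taken from any real environment). It ignores the ESX host side, so it assumes the host running the always-on AD server stays powered on.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VM:
    name: str
    role: str       # hypothetical labels: "ad", "vcenter", "app"
    datastore: str  # which SAN/NAS holds this VM's disks, e.g. "tier1"


def must_power_off(inventory, offline):
    """VMs whose disks live on the storage being upgraded cannot keep running."""
    return sorted(vm.name for vm in inventory if vm.datastore == offline)


def ad_available(inventory, offline):
    """True if at least one AD server's disks are NOT on the offline storage."""
    return any(vm.role == "ad" and vm.datastore != offline for vm in inventory)


def migrate(inventory, name, target):
    """Return a new inventory with VM `name` moved to `target` storage."""
    return [replace(vm, datastore=target) if vm.name == name else vm
            for vm in inventory]


before = [
    VM("dc1", "ad", "tier1"),
    VM("dc2", "ad", "tier1"),
    VM("vcenter", "vcenter", "tier1"),
    VM("app1", "app", "tier1"),
]
after = migrate(before, "dc2", "tier2")  # step 3a: always-on AD to tier-2

print(ad_available(before, "tier1"))   # False
print(ad_available(after, "tier1"))    # True
print(must_power_off(after, "tier1"))  # ['app1', 'dc1', 'vcenter']
```

Step 24 is simply `migrate(after, "dc2", "tier1")`, which restores the original placement.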
Since even vSphere Essentials Plus now has vMotion (a much-requested and timely addition), collapsing all remaining VMs onto a single ESX host is a no-brainer. Migrating the storage is another matter: it cannot be done without either shutting down the VM (off-line storage migration) or the Enterprise/Enterprise Plus edition of vSphere (Storage vMotion). That is why migrating the AD server back from tier-2 is reserved for last (step 24): it will likely need to be shut down to move its storage between SAN/NAS appliances.
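The edition constraint above reduces to a simple decision. Here is a hypothetical Python (3.9+) helper that turns it into runbook lines. Only the edition set comes from the paragraph above (vSphere 4.1 era). The function name and the wording of the steps are illustrative.

```python
# From the text: live storage migration (Storage vMotion) requires the
# Enterprise or Enterprise Plus edition; other editions must migrate cold.
LIVE_STORAGE_MIGRATION_EDITIONS = {"Enterprise", "Enterprise Plus"}


def storage_migration_steps(vm: str, edition: str, target: str) -> list[str]:
    """Return hypothetical runbook lines for moving `vm`'s disks to `target`."""
    if edition in LIVE_STORAGE_MIGRATION_EDITIONS:
        return [f"Storage vMotion {vm} to {target} (VM stays powered on)"]
    # Cold path: the VM is unavailable for the duration of the copy.
    return [
        f"Shut down {vm}",
        f"Migrate {vm} storage to {target} (off-line)",
        f"Power on {vm}",
    ]
```

The cold branch is why the move back to tier-1 comes last. By then the tier-1 domain controllers are already running, so taking the always-on AD server offline for the copy costs nothing.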