Initially, adding more cores to the mix is good for virtualization: it lets us scale more gracefully and confidently than hyper-threading does. While hyper-threading is reported to increase scheduling efficiency in vSphere, a hardware thread is not the equivalent of a core. Until Nehalem-EX is widely available and we can evaluate 4P hyper-threading performance in loaded virtual environments, I'm comfortable awarding hyper-threading a 5% performance bonus, all things being equal.
[Figure: AMD's Value Shift]

What's Coming?
That said, where is AMD going with Opteron in the near future and how will that affect Opteron-based eco-systems? At least one thing is clear: compatibility is assured and performance - at the same thermal footprint - will go up. So let's look at the ramifications of the new models/sockets and compare them to our well-known 2000/8000 series to glimpse the future.
The major difference is a fundamental shift from DDR2 to DDR3 on the new sockets. Like the Phenom II, Core i7 and Nehalem processors, the new Opterons will be DDR3 parts. Assuming DDR3 pricing continues to trend down and the promised increase in memory bandwidth is realized with HT3/DCA2 and the new Opterons, DDR3 will deliver solid performance in 4000 and 6000 configurations.
Opteron 6000: Socket G34
From the announcement, G34 is analogous to the familiar 8000-series line with one glaring exception: no 8P on the road-map. In the 2010-2011 time frame, we'll see 8-core, 12-core and 16-core variants, with a new platform being introduced in 2012. Meanwhile, the 6000-series will support 4 channels of "unbuffered" or "registered" DDR3 across up to 12 DIMMs per socket (3 banks by 4 channels). Assuming the 6000-series supports DDR3-1600, a 4-channel design would yield theoretical memory bandwidth in the 40-50GB/sec range per socket (about twice Istanbul's).
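As a rough check on that range, the theoretical peak can be sketched from the DDR3 transfer rate. This assumes a standard 64-bit (8-byte) channel, and the function name is purely illustrative; sustained bandwidth will land somewhere below these peaks.

```python
def peak_bandwidth_gbs(transfers_mts: int, channels: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/sec (decimal GB).

    transfers_mts      -- DDR transfer rate in MT/sec (1600 for DDR3-1600)
    channels           -- memory channels per socket
    bytes_per_transfer -- assumed 64-bit channel width, i.e. 8 bytes
    """
    return transfers_mts * bytes_per_transfer * channels / 1000


# Four DDR3-1600 channels per socket, as assumed in the text.
g34_peak = peak_bandwidth_gbs(1600, channels=4)
print(f"4-channel DDR3-1600 peak: {g34_peak:.1f} GB/sec")
```

The 4-channel peak works out to 51.2GB/sec, so the 40-50GB/sec range reads as a peak figure discounted for real-world efficiency.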
[Figure: AMD 2010-2013 Road-map]

With a maximum module density of 16GB, a 12-DIMM by 4-socket system could theoretically contain 768GB of DDR3 memory. In 2011, that equates to 12GB/core in a 4-way, 64-core server. At a 4:1 consolidation ratio (VMs per core) for typical workloads, that's 256 VM/host at 3GB/VM (4GB/VM with page sharing) and an average of 780MB/sec of memory bandwidth per VM. I think the math holds up pretty well against today's computing norms and trends.
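The arithmetic behind those figures can be sketched in a few lines. The function and field names are made up for illustration, and the 50GB/sec per-socket bandwidth is an assumption: it is the top of the range quoted earlier, and the value that comes closest to reproducing the 780MB/sec figure.

```python
def size_host(sockets, dimms_per_socket, gb_per_dimm, cores_per_socket,
              vms_per_core, socket_bw_gbs):
    """Return capacity and per-VM figures for one virtualization host."""
    memory_gb = sockets * dimms_per_socket * gb_per_dimm
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    return {
        "memory_gb": memory_gb,
        "gb_per_core": memory_gb / cores,
        "vms": vms,
        "gb_per_vm": memory_gb / vms,
        # Aggregate bandwidth of all sockets, shared evenly across the VMs.
        "mb_per_sec_per_vm": sockets * socket_bw_gbs * 1000 / vms,
    }


# 4-way G34 host: 12 x 16GB DIMMs and 16 cores per socket, 4 VMs per core.
g34 = size_host(sockets=4, dimms_per_socket=12, gb_per_dimm=16,
                cores_per_socket=16, vms_per_core=4, socket_bw_gbs=50)
for name, value in g34.items():
    print(f"{name}: {value}")
```

The same function can be re-run with a lower per-socket bandwidth or a different consolidation ratio to see how sensitive the per-VM numbers are to those assumptions.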
This configuration will lend itself to database consolidation and VDI initiatives. That's up to two 8-vCPU, 96GB VMs per socket for database consolidation, or around $2,000 per SQL server equivalent. For VDI, that's 64-128 active VDI users per chassis, or about $175 per VDI user equivalent (estimated). Again, the math scales pretty well.
Opteron 4000: Socket C32
Opteron 2000 users will be comfortable with the 4000-series road-map. In the 2010-2011 time frame, we'll see 4-core, 6-core and 8-core variants, again with a new platform being introduced in 2012. Like the 2000, the 4000-series will support 2 channels of "unbuffered" or "registered" memory, but will use DDR3 across only 4 DIMMs per socket (2 banks by 2 channels). Assuming DDR3-1600 support, a 2-channel design would yield theoretical memory bandwidth in the 20-25GB/sec range per socket (similar to Istanbul).
With a maximum module density of 16GB, a 4-DIMM by 2-socket system could theoretically contain only 128GB of DDR3 memory. This represents half of the available memory of 2000-series designs today (up to 16-DIMM per processor). In 2011, that equates to 8GB/core in a 2-way, 16-core server. At 3:1 consolidation ratios for typical workloads, that's 48 VM/host at 2.6GB/VM (3.5GB/VM with page sharing) and an average of 830MB/sec of memory bandwidth per VM.
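A quick sketch of the C32 numbers, including how the per-VM bandwidth moves across the 20-25GB/sec range. The variable names are illustrative, and reading the 3:1 ratio as VMs per core is an assumption that matches the 48 VM/host figure.

```python
SOCKETS, DIMMS_PER_SOCKET, GB_PER_DIMM = 2, 4, 16
CORES_PER_SOCKET, VMS_PER_CORE = 8, 3

memory_gb = SOCKETS * DIMMS_PER_SOCKET * GB_PER_DIMM  # total host memory
cores = SOCKETS * CORES_PER_SOCKET
vms = cores * VMS_PER_CORE
gb_per_vm = memory_gb / vms


def mb_per_sec_per_vm(socket_bw_gbs):
    """Aggregate bandwidth of both sockets, shared evenly across the VMs."""
    return SOCKETS * socket_bw_gbs * 1000 / vms


low, high = mb_per_sec_per_vm(20), mb_per_sec_per_vm(25)
print(f"{memory_gb}GB, {memory_gb / cores:.0f}GB/core, {vms} VMs, "
      f"{gb_per_vm:.2f}GB/VM, {low:.0f}-{high:.0f}MB/sec per VM")
```

The low end of the bandwidth range gives roughly 833MB/sec per VM, in line with the 830MB/sec quoted above; the high end would put it just over 1GB/sec.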
[Figure: AMD Power Bands]

For a cloud computing component (i.e., one node in a very large array of computing engines), CPU and memory footprint are not the only consolidation factors. With hundreds or thousands of systems in use, power efficiency becomes a serious factor in OPEX calculations. Given the 40W target of the EE-band of 4000-series processors, the opportunity to save 45% on OPEX (power and cooling) over traditional performance bands will be taken by many cloud providers. In the future, VM/W will be a key metric of the "cloud age." It would appear that the Opteron 4000-series will fit well into VM/W calculations.
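To make VM/W concrete, here is a minimal sketch that counts processor power only. It rests on several loud assumptions: the 45% saving is treated as a straight ratio of processor power, the baseline wattage is derived from that ratio rather than taken from any data sheet, and the VM count (48 per 2-socket host, from the sizing above) is held equal across both bands.

```python
def vms_per_watt(vms: int, sockets: int, watts_per_socket: float) -> float:
    """VMs hosted per watt of processor power.

    Ignores memory, disks, fans and power-supply losses.
    """
    return vms / (sockets * watts_per_socket)


EE_WATTS = 40.0                         # EE-band target from the text
BASELINE_WATTS = EE_WATTS / (1 - 0.45)  # hypothetical band implied by a 45% saving

ee = vms_per_watt(48, sockets=2, watts_per_socket=EE_WATTS)
baseline = vms_per_watt(48, sockets=2, watts_per_socket=BASELINE_WATTS)
print(f"EE band: {ee:.2f} VM/W, baseline: {baseline:.2f} VM/W, "
      f"ratio: {ee / baseline:.2f}x")
```

In practice the lower-power parts may not sustain the same consolidation ratio, so the real advantage depends on how many VMs an EE-band host can actually carry.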
Based on today's thinking, the math holds up pretty well for the 4000-series against today's cloud computing trends. The "sweet-spot" for consolidation-per-unit in cloud infrastructure will not trend out for some time. Why? As consolidation ratios rise, powering down a node affects a larger share of deployed VMs. Today, migrate-and-power-down is a valid strategy for reducing power consumption in "off-peak" times. In elastic clouds, this will likely continue to be a prime strategy for energy conservation.
In very large cloud paradigms, a mixture of "always-on" low-power nodes and controlled duty cycle "performance" nodes will likely prevail. As demand for high-performance applications wanes in the daily cycle, those processes could either be consolidated onto fewer nodes or retreat onto higher-ratio low-power nodes, allowing a significant percentage of "performance" nodes to be powered down. Cluster management will need to advance to accommodate these needs.
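A minimal sketch of that duty-cycle idea, assuming the always-on pool is filled first and only the overflow lands on performance nodes. Every number here is hypothetical, the function name is invented for illustration, and a real cluster manager would also weigh CPU, memory and migration cost rather than a flat VM count.

```python
import math


def performance_nodes_needed(active_vms, low_power_nodes,
                             vms_per_low_power_node, vms_per_performance_node):
    """Performance nodes that must stay on after the always-on pool is filled."""
    overflow = max(0, active_vms - low_power_nodes * vms_per_low_power_node)
    return math.ceil(overflow / vms_per_performance_node)


# Hypothetical cluster: 20 always-on nodes at 48 VMs each,
# plus performance nodes that carry 64 VMs each.
peak = performance_nodes_needed(1500, 20, 48, 64)
off_peak = performance_nodes_needed(1100, 20, 48, 64)
can_power_down = peak - off_peak
print(f"peak: {peak} nodes, off-peak: {off_peak} nodes, "
      f"power down: {can_power_down}")
```

Raising the per-node VM count for the low-power pool during off-peak hours would model the "higher-ratio" retreat described above and free up further performance nodes.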