Friday, October 31, 2008

Microsoft Azure Just Says NO to Multicore Apps in the Cloud


At their recent PDC’08, Microsoft unveiled their Azure Services Platform, Microsoft’s full-throttle venture into Cloud Computing. Apparently you shouldn’t bother with multithreading, since Azure doesn’t support multicore applications. It scales only “out,” using virtualization, as I said server code would generally do in IT Departments Should NOT Fear Multicore. I’ll give more details about that below; first, an aside about why this is important.

“Cloud Computing” is the most hyped buzzword in IT these days, all the way up to a multi-article special report in The Economist (recommended). So naturally its definition has been raped by the many people who apparently reason “Cloud computing is good. My stuff is good. Therefore my stuff is cloud computing.”

My definition, which I’m sure conflicts with many agendas: Cloud computing is hiring someone out on the web to host your computing, where “host your computing” can range across a wide spectrum: from providing raw iron (hardware provisioning), through providing building blocks of varying complexity, through providing standard commercial infrastructure, to providing the whole application. (See my cloud panel presentation at HPDC 2008.)

Clouds are good because they’re easy, cheap, fast, and can easily scale up and down. They’re easy because you don’t have to purchase anything to get started; just upload your stuff and go. They’re fast because you don’t have to go through your own procurement cycle, get the hardware, hire a sysadmin, upgrade your HVAC and power, etc. They’re cheap because you don’t have to shell out up front for hardware and licenses before getting going; you pay for what you use, when you use it. Finally, they scale because they’re on some monster compute center somewhere (think Google, Amazon, Microsoft – all cloud providers with acres of systems, and IBM’s getting in there too) that can populate servers and remove them very quickly – it’s their job, they’re good at it (or should be) – so if your app takes off and suddenly has huge requirements, you’re golden; and if your app tanks, all those servers can be given back. (This implicitly assumes “scale out,” not multicore, but that’s what everybody means by scale, anyway.)

It is possible, if you’re into such things, to have an interminable discussion with Grid Computing people about whether a cloud is a grid, grid includes cloud, a cloud is a grid with a simpler user interface, and so on. Foo. Similar discussions can revolve around terms like utility computing, SaaS (Software as a Service), PaaS (Platform as …), IaaS (Infrastructure …) and so on. Double foo – but with a nod to the antiquity of “utility computing.” Late 60s. Project MAC. Triassic computing.

Microsoft Azure Services slots directly into the spectrum of my definition at the “provide standard commercial infrastructure” point: Write your code using Microsoft .NET, Windows Live, and similar services; upload it to a Microsoft data center; and off you go. Its presentation is replete with assurances that people used to Microsoft’s development environment (.NET and related) can write the same kind of things for the Microsoft cloud. Code doesn’t port without change, since it will have to use different services – Azure’s storage services in particular look new, although SQL Services are there – but it’s the same kind of development process and code structure many people know and are comfortable with.

That sweeps up a tremendous number of potential cloud developers, and so in my estimation bodes very well for Microsoft doing a great hosting business over time. Microsoft definitely got out in front of the curve on this one. This assumes, of course, that the implementation works well enough. It’s all slideware right now, but a beta-ish Community Technology Preview platform is supposed to be available this fall (2008).

For more details, see Microsoft’s web site discussion and a rather well-written white paper.

So this is important, and big, and is likely to be widely used. Let’s get back to the multicore scaling issues.

That issue leaps out of the white paper with an illustration on page 13 (Figure 6) and a paragraph following. That wasn’t the intent of what was written, which was actually intended to show why you don’t have to manage or build your own Windows systems. But it suffices. Here’s the figure:


[Figure explanation: IIS is Microsoft’s web server (Internet Information Services) that receives web HTTP requests. The Web Role Instance is user code that initially processes that, and passes it off to the Worker Role Instance through queues via the Agents. This is all apparently standard .NET stuff (“apparently” because I can’t claim to be a .NET expert). So the two sets of VM boxes roughly correspond to the web tier (1st tier), with IIS instead of Apache, and application tier (2nd tier) in non-Microsoft lingo.]

Here’s the paragraph:

While this might change over time, Windows Azure’s initial release maintains a one-to-one relationship between a VM [virtual machine] and a physical processor core. Because of this, the performance of each application can be guaranteed—each Web role instance and Worker role instance has its own dedicated processor core. To increase an application’s performance, its owner can increase the number of running instances specified in the application’s configuration file. The Windows Azure fabric will then spin up new VMs, assign them to cores, and start running more instances of this application. The fabric also detects when a Web role or Worker role instance has failed, then starts a new one.

The scaling point is this: There’s a one-to-one relationship between a physical processor core and each of these VMs, therefore each role instance you write runs on one core. Period. “To increase an application’s performance, its owner can increase the number of running instances” each of which is a separate single-core virtual computer. This is classic scale out. It simply does not use multiple cores on any given piece of code.

There are weasel words up front about things possibly changing, but given the statement about how you increase performance, it’s clear that this refers to initially not sharing a single core among multiple VMs. That would be significantly cheaper, since most apps don’t use anywhere near 100% of even a single core’s performance; it’s more like 12%. Azure doesn’t share cores, at least initially, because they want to ensure performance isolation.

That’s very reasonable; performance isolation is a big reason people use separate servers (there are 5 or so other reasons). In a cloud megacenter, you don’t want your “instances” to be affected by another company’s stuff, possibly your competitor, suddenly pegging the meter. Sharing a core means relying on scheduler code to ensure that isolation, and, well, my experience of Windows systems doing that is somewhat spotty. The biggest benefit I’ve gotten out of dual core on my laptop is that when some application goes nuts and sucks up the CPU, I can still mouse around and kill it because the second core is not being used.

Why do this, when multicore is a known fact of life? I have a couple of speculations:

First, application developers in general shouldn’t have anything to do with parallelism, since it’s difficult, error-prone, and increases cost; developers who can do it don’t come cheap. That’s a lesson from multiple decades of commercial code development. Application developers haven’t dealt with parallelism since the early 1970s, with SMPs, where they wrote single-thread applications that ran under transaction monitors, instantiated just like Azure is planning (but not on VMs).

Second, it’s not just the applications that would have to be robustly multithreaded; there’s also the entire .NET and Azure services framework. That’s got to be multiple millions of lines of code. Making it all really rock for multicore – not just work right, but work fast – would be insanely expensive, get in the way of adding function that can be more easily charged for, and is likely unnecessary given the application developer point above.

Whatever the ultimate reasons, what this all means is that one of the largest providers of what will surely be one of the most used future programming platforms has just said NO! to multicore.

Bootnote: I’ve seen people on discussion lists pick up on the fact that “Azure” is the color of a cloudless sky. Cloudless. Hmmm. I’m more intrigued by its being a shade of blue: Awesome Azure replacing Big Blue?

3 comments:

Anonymous said...

In my mind, the aim in cloud apps would be to scale out over the number of users of your app. Multi-core code is a pain to write compared to single core code. We do that anyways in personal computers as the other core is lying free , which is not the case in the cloud.

Additionally, MS probably realised that most code is certainly not that parallelisable. Right now, you can think of your application code scaling to 3-4 cores because of some awesome thing you did - but MS is working with machines at 64 cores now (and that is a limit because Vista's core id has max range of 64) - no application is going to scale that high. If MS allowed multi-core code then it would be a major major pain in the ass to manage. single processor would need multiple process != number of cores etc. , then shutting down a VM and moving it somewhere else would require more checks

This solution they have taken is awesome - it is simple and covers 98% of the cases imho.

Greg Pfister said...

Thanks for the thoughts. I think we're actually agreeing. I didn't mean to imply that Microsoft did the wrong thing for this context.

The issue for me is how and why do you use multicore at all? (And if you don't, hasn't CPU performance effectively stopped increasing?)

This method effectively uses it as a cluster-on-a-chip, and as I said in a prior post, that's expected. It certainly works, but ignores lots of the complicated hardware chip vendors spend a lot of time and effort on.

Also, what if the operation being done is complicated -- like rendering a multiplayer game for display on a handheld? Probably for now that's in the other 2%.

Tim said...

the near term approach to keeping up with Moore's Law for cpus will come from multicores. multicores can only do multitasking, not multithreading.

This is 'a good thing'. As you note, threads are error prone and likely to bite the naive coder as she battles to implement what ought to be in the O/S. Multiprocess code is a lot easier to distribute whether across cores or machines, so it's where we're going to have to go. Quite right, too: coders should be focused on solving business problems not technical puzzles.

Post a Comment

Thanks for commenting!

Note: Only a member of this blog may post a comment.