Thursday, October 16, 2008

IT Departments Should NOT Fear Multicore

"Gartner analyst Carl Claunch ... argued Thursday that IT types should fear–yes fear–multicore technology."

This statement appears in "Does IT really need to fear multicore?" on ZDNet, to which I was alerted by Surendra and Sujana Byna's excellent Multicoreinfo.com. So this is a fourth-level indirection to the source: Here to Multicoreinfo.com to the summary by Larry Dignan in ZDNet of a talk by Gartner analyst Carl Claunch at Gartner's Symposium/ITxpo (whew).

I'm bringing up this family tree because I don't have access to the original, only to the ZDNet summary; and, as the title of this blog entry indicates, I don't agree. Unfortunately, I don't know for sure whether I'm disagreeing with Claunch of Gartner, or with Dignan's (ZDNet) summary rendition of what Claunch said. So Carl, if I've misinterpreted what you said, I apologize.

Let's start off quoting Claunch of Gartner (quote lifted from ZDNet): 
"We have had the easy fix for decades to deal with existing applications whose requirements have outgrown their current system — we refreshed the technology and the application ran better. This led to relatively long life cycles for applications, allowed operations to address application performance problems easily and required little advance notice of capacity shortfalls."

This is completely true. 

In contrast, with multicore, the simple process above becomes more complex. Of the five bullet points detailing that added complexity, three make sense to me: 
  • IT departments will need new coding skills;
  • IT departments will need new development tools;
  • the achieved performance increase will vary widely by application.
(I'll talk at the end about the ones that I consider flat-out wrong.)

Even granting those three, I think the overall message is entirely too alarmist.

Well. Given my past posts, you may be excused for thinking of bristly 300-lb hogs daintily hovering before tiny flowers, sipping nectar like hummingbirds; and of course the sky has fallen (thud!) (didn't even bounce). But I've not said multicore is a problem for server systems.

The issues brought up above are certainly problems if you have a single-thread production program and you need it to go faster. If that's your case, I'd be even more vehement about how you are in deep trouble, how much those skills are going to cost, etc.; see prior posts in this blog. Given the whole spectrum of business processing, I've no doubt such situations exist.
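
To make concrete what that recoding involves, here's a minimal sketch in Python; score_record and its workload are hypothetical stand-ins for real production code, not anyone's actual application. Even this toy raises the new questions: is the work safe to split, how many workers, and does the per-item cost beat the coordination overhead?

    # Minimal sketch: retrofitting parallelism onto a serial loop.
    # score_record is a hypothetical stand-in for per-record work.
    from multiprocessing import Pool

    def score_record(rec):
        # Pretend this is some CPU-heavy computation on one record.
        return sum(ord(c) for c in rec) % 97

    if __name__ == "__main__":
        records = ["rec-%d" % i for i in range(100000)]

        # Before: the classic single-thread version.
        serial = [score_record(r) for r in records]

        # After: the same work spread across cores (Python 3.3+ for
        # the with-statement form of Pool). This only wins if
        # score_record shares no hidden state, and if the per-item
        # cost outweighs the overhead of shipping work between
        # processes.
        with Pool(processes=4) as pool:
            parallel = pool.map(score_record, records)

        assert parallel == serial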

However: The vast majority of server workloads already scale out. They've been developed or recoded over the last decade or so to use multiple whole computers (clusters, farms) to provide greater performance. (Scale out is the opposite of scale up, which means using more CPUs within a single system.) The price-performance of commodity rack systems is just too staggeringly better than the alternatives for anybody to do otherwise.
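
As a deliberately tiny illustration of the pattern (Python; the host names are made up), scale-out grows capacity by lengthening a list of whole machines, where scale-up would swap one machine for a bigger multiprocessor:

    # Toy illustration of scale-out: capacity grows by adding whole
    # machines to the farm. Host names here are hypothetical; a real
    # farm would sit behind a hardware or software load balancer.
    from itertools import cycle

    farm = ["web01.example.com", "web02.example.com", "web03.example.com"]
    dispatcher = cycle(farm)  # round-robin across the farm

    def dispatch(request_id):
        host = next(dispatcher)
        return "request %d -> %s" % (request_id, host)

    for rid in range(6):
        print(dispatch(rid))

    # Scaling out = a longer farm list (and a rebuilt dispatcher).
    # Scaling up = replacing web01 with a bigger multiprocessor box.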

That server workloads scale out is not speculation. I and others have trudged through all the categories and sub-categories of International Data Corp's (IDC's) Server Workload taxonomy (see this report for a list of those categories, at the end, in Table 11), asking application experts whether their case scaled out: CRM? Check, scales out. Email? Check; got the diagrams. Data mining? Double-triple check, scales way out. And so on, through a dismayingly long list.

(Those are sample applications in IDC's categories "Business Processing," "Collaboration," and "Business Intelligence," respectively.)

In fact, there is so much scale out of commercial workloads that the issue has completely confused many in the cloud computing crowd: Are you talking about computing out in the Internet cloud somewhere, or are you talking about computing on a cloud of systems (a cluster)? Usual implicit answer: Both. They assume scale out so deeply that it's built into some of the names of cloud facilities, like Amazon's "Elastic Compute Cloud"; the claimed elasticity arises only if you scale out. But hey, everything scales out. That's the assumption. In fact, in most software discussions, "scales" is used to mean "scales out."

Now, a few of the server workloads in IDC's taxonomy don't scale out without issues. For example:
  • OLTP -- short transaction processing, like ATM transactions -- is problematic, despite the best efforts of Oracle, IBM, and others (IBM comes closer, or maybe Tandem). 
  • Batch depends a whole lot on what you're batching; if what you're batching is totally single-thread, it can be a big problem. 
Those two, however, amount to less than 10% of server revenue, and... hey, they hark back to old-fashioned mainframe days. Mainframes have been SMP (multicore) since the 1970s, so at least some adaptation has taken place; OLTP in particular scales like a banshee on multicore; all the top record-holders of the OLTP benchmark, TPC-C, are SMPs (= multiprocessors) (= multicores). Batch is not always terrible either, but that's a longer story.

So IT shops already scale out, a whole lot, and if they don't they probably scale up and directly use multicore already. Where does that lead us?

Well, if you already scale out, have I got a deal for you: Virtualization. 

Just partition that 16-core monster into 16 single-core pussycats, each running as a separate, independent system.
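
Real virtualization runs a full OS image per partition under a hypervisor, which I can't reproduce in a few lines; but the partitioning idea itself can be sketched. This Python toy (Linux-only; os.sched_setaffinity needs Python 3.3+; the service names are invented) pins one independent worker per core, so each behaves like the single-core system it stands in for:

    # Toy sketch of carving a multicore box into independent
    # single-core "systems" by pinning one worker per core.
    # Linux-only; os.sched_setaffinity appeared in Python 3.3.
    import os
    from multiprocessing import Process

    def run_service(core, name):
        os.sched_setaffinity(0, {core})  # confine this process to one core
        print("%s pinned to core %d (pid %d)" % (name, core, os.getpid()))
        # ...the service's own single-threaded main loop would go here...

    if __name__ == "__main__":
        cores = sorted(os.sched_getaffinity(0))  # cores available to us
        workers = [Process(target=run_service, args=(c, "svc-%d" % c))
                   for c in cores]
        for w in workers:
            w.start()
        for w in workers:
            w.join()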

This isn't completely painless. You will need a fair amount of memory per virtual system, though probably not much more than you would need anyway to exploit the claimed performance increase of the multicore CPUs. As for IO, that will hopefully be relieved when PCIe 3.0 starts deploying; it can be a problem now, on some systems. And virtualization adds yet more pain to systems management, which we definitely do not need; this is probably the worst aspect of the virtualization deal. But it beats parallelizing all your code, hands down, game over, which is what the article says Claunch says you have to do. I don't think so.

I suppose virtualization isn't the best long-term solution, simply because it seems it would be much cleaner to use a multicore system as one parallel thing. Partitioning it up seems artificial and inelegant. However, it at least buys some time for a beleaguered IT manager, and, well, interim solutions that work always last longer than one would anticipate.

Now, about the other bullet points appearing in the ZDNet article:
  • There are multiple core designs that will affect your software.
This differs from the existing situation how? The next generation of hardware has always had a design different from the previous one, requiring at least recompilation to tap the claimed performance. Moreover, with Intel apparently (we don't know for sure, but I'd bet on it) ditching its interconnect buses for point-to-point connections like AMD's, multicore designs are converging, not diverging.
  • Your existing software licensing deals become more complicated.
Just to pick the key example they used: Oracle has been licensing its products on multicore/multiprocessor systems since at least the mid-1980s. See comments above about breaking records. It's not the same as licensing on multiple separate systems, true, and if that's all you've ever dealt with in your IT career, you probably have some learning ahead of you. But that doesn't mean it's horrid. IT shops have dealt with it for decades.

Really, though, licensing points to a problem with my solution: Licensing across multiple virtual machines has been a quagmire of the first order. It's inching towards something more rational, but there are still problems. Fortunately, the particular case of Oracle (and DB2, and other databases) is irrelevant for the multicore case, since they scale up to many processors and have for a long time; so you don't need to play the virtualization game.

Overall, multicore is actually a pretty good fit for server workloads. That's one aspect of multicore that won't need the alarm bells ringing. If servers alone had the volumes to keep this train rolling, many of us would sleep better at night.

(By the way, apologies to all for neglecting this blog for a while. I've got a lot more to post; I just got tangled up for a while. Comments will be answered.)

((By the way)^2: If somebody knows of an equivalent of the IDC workload taxonomy but for clients, I'd be much obliged if they could give me a pointer to it.)

3 comments:

Steve Rogers said...

I think you're spot on concerning why IT shouldn't fear multicore. Where they fear it, it's because they've had innovation beaten out of them.

Anonymous said...

Greg, isn't there a concern that even server apps will ultimately be bound by resources of the machine (memory bandwidth, hard drive, network, ...) as opposed to the CPU itself?

Do you think that adding more cores to an 8-core web server would really increase the throughput?

On the other hand, I do agree that IT departments are less in trouble than regular users, simply because scaling is possible by adding more machines.

Igor

PS.: I noticed that your blog posts have dried up. I hope that's temporary because I enjoy reading your blog!

Greg Pfister said...

Igor,

You're of course correct. If the rest of a system -- memory bandwidth, IO -- isn't appropriately balanced with the CPU power for a workload set, you are toast. (That's a major issue with accelerators, by the way.)

However, developments like Intel's QuickPath and AMD's HyperTransport (plus Torrenza) well before it, as well as the coming PCIe 2.0, indicate that vendors are aware of the issues and working to at least maintain the ratios we have now. There are workloads for which those aren't the right ratio, but for the most part things are OK.

About the blog: Yes, the drying up is temporary, even though it's run stupidly long. I ran into some overall issues of direction and intent. I'll post about that soon (I hope), along with other topics.
