Wednesday, September 24, 2008

Clarifying the Black Hole

The announcement of this blog in LinkedIn’s High Performance and Supercomputing group produced some responses indicating that I need to clarify what this blog is about. Thanks for that discussion, folks. One of my (semi-?) writer’s block problems with this book is clearly explaining exactly what the issue is.

Here’s a whack at that, an overall outline of the situation as I see it. Everything in this outline requires greater depth of explanation, with solid numbers and references. This is just the top layer, and I’m really unsatisfied with it. Hang in there with me, prod me with questions if there are parts you are really itchy about, and I’ll get there.

First, I don’t mean to imply or say that there is some dastardly conspiracy to force the industry to use explicit parallelism. The basic problems are (1) physics and (2) the exhaustion of other possibilities. There simply isn’t any choice -- but no choice to do what? No other way to increase performance.

This industry has lived and breathed dramatically increasing year-to-year single-thread performance since the 1960s. The promise has always been: Do nothing, and your programs run 45% faster this time next year, on cheaper hardware. That’s an incredible situation, the only time it’s ever happened in history, and it’s a situation that’s the very bedrock of any number of industry business and technical assumptions. Now we are moving to a new paradigm: To actually achieve the additional performance available in new systems, programmers must do something. What they must do is a major effort, it’s not clear how broadly applicable it is, and it’s so difficult that few can master it. That, to put it mildly, is a major paradigm shift. (And it’s not being caused by some nefarious cabal; as I said, it’s physics.)
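
To make that promise concrete, here is a back-of-the-envelope sketch (the 45%-per-year figure is the one quoted above; the rest is just compounding): sustained 45% annual improvement roughly doubles single-thread performance every two years and delivers around a 40x gain over a decade, all without touching a line of code.

    # Back-of-the-envelope: what sustained 45%-per-year single-thread
    # improvement (the figure quoted above) compounds to over time.
    growth = 1.45  # 45% faster each year

    for years in (1, 2, 5, 10):
        speedup = growth ** years
        print(f"After {years:2d} year(s): about {speedup:.1f}x the original performance")

    # Prints roughly 1.4x, 2.1x, 6.4x, and 41.1x -- a doubling about every
    # two years, "for free," with no changes to the software.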

This is not a problem for server systems. Those have been broadly parallel – clusters / farms / warehouses – since the mid-90s, and server processing has arguably been parallelized since the mid-70s on mainframe SMPs. Transaction monitors, web server software like Apache, and other infrastructure have all enabled this parallelism (though the applications themselves still had to be there).

Unfortunately, servers alone don't produce the semiconductor volumes that keep an important fraction of this industry moving forward – a fraction, not all of the industry, but certainly the part that's arguably key, to say nothing of the loudest, and will make the biggest ruckus as it increasingly runs into trouble. The industry is going to be a very different place in the foreseeable future.

At this point, it’s reasonable to ask: if this is actually true, why aren’t the combined voices of the blogosphere, industry rags, and industry stock analysts all shouting it from the housetops? Why are these statements believable? Where’s the consensus? Are we in conspiracy-theory territory?

No, we’re not. I think we’re in the union of two territories. First and foremost, we’re in the “left hand doesn’t know what the right hand is doing” territory, greatly enhanced by the narrow range of people with the cross-domain expertise needed to fully understand the problem. There was a flare-up of concern that flashed through the blogosphere and the technical news outlets back in 2004, but it was focused (almost correctly) on crying “Moore’s Law Is Ending!” So it was squashed when the technorati high priests responded “No, Moore’s Law isn’t ending, you dummies” -- which it isn’t, in the original literal sense. But saying that is a fine case of not seeing the forest for the veins on the leaves of the trees, or, in this case, not seeing a chasm because you’re a botanist, not a geologist.

That is the main territory, or at least I hope so. But having lived in the industry for quite a while, I understand that there’s also a form of denial – or hope – going on: There have definitely been occasions, well-remembered by management and business leaders, where those excitable, barely comprehensible techno-nerds have cried wolf, said the sky was falling, and it didn’t. This produces a reaction like: Nah, this can’t really be that big a deal; it’s the wrong answer; obviously we should not panic, since things have always happened to make such issues just go away. Also, aren’t they really posturing for more funding?

Unfortunately, they’re not. And the truth of the situation is starting to come across, with some industry funding for parallel programming, and pleas for a Manhattan project on parallel programming showing up. For reasons detailed in my 101 Parallel Languages series of posts, I think those are looking under the wrong lamppost.

My choice of where to look is applications that are “embarrassingly parallel,” where there’s no search for the parallelism – it’s obvious – and little need for explicit parallel programming. There are a few possibilities there (I’m partial to virtual worlds), but I’m far from certain that they’ll be widespread enough to pick up the slack in the client space. So I fear we may be in for a significant, long-term downturn for companies whose business relies on the replacement cycle for client systems. This is not a pleasant prospect, since those companies are presently at the heart of the industry.
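
Just to pin down what I mean by “embarrassingly parallel,” here is a minimal sketch -- illustrative only, with a made-up work function, not any specific application from this post -- of the pattern: the work splits into completely independent pieces, so the parallelism is obvious and there is essentially nothing for the programmer to coordinate.

    # Minimal sketch of an "embarrassingly parallel" workload: each item is
    # processed independently, with no shared state and no search for the
    # parallelism. (Illustrative only; the work function is made up.)
    from multiprocessing import Pool

    def render_tile(tile_id):
        # Stand-in for any independent chunk of work, e.g. one tile of a
        # scene in a virtual world.
        return sum(i * i for i in range(10_000)) + tile_id

    if __name__ == "__main__":
        tiles = range(64)             # 64 completely independent pieces of work
        with Pool() as pool:          # one worker process per available core
            results = pool.map(render_tile, tiles)
        print(f"Rendered {len(results)} tiles in parallel")

The pool.map call is the entire “parallel programming” here; everything else is ordinary sequential code.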

6 comments:

Anonymous said...

This is fascinatin' stuff! One puzzlement on my part: I'm not sure how finding some applications that are embarrassingly parallel is going to help. Even if we find a bunch of such applications, people are still going to want to use all of their existing not-so-parallel applications, performance on those will continue not to climb, and the client industry will still undergo a sea change. At least that's my immediate thought.

Are you thinking that we can find some very important apps that people *already* use, and that turn out to be e'ly-p'able, and parallelize them so well that we can continue to feed the cycle that the industry is used to?

Or are you thinking that we can coax people into giving up some of those apps in favor of the e'ly-p'able ones that we find? I admit I'm a huge Virtual Worlds addict myself :) but that sounds like an awfully tall order.

I imagine I'm just overlooking a key part of your argument, but if I am, other folks may be also :).

And thanks for the deep and straightforward thinking! Keep it up...

Jim Cownie said...

I think you missed one more reason why this change is so hard for the software industry. It's not just that there was no need to tune your code, because the hardware would fix the problem for you next year; it's that spending the time to tune was actually a bad thing, because all the history shows that getting something out there and taking market share is the way to make your product the one everyone else has to compete with.

Note, too, that you're not alone in pointing this problem out. Herb Sutter has been saying it too: http://www.gotw.ca/publications/concurrency-ddj.htm

Greg Pfister said...

James, thanks for commenting -- the point that tuning was actively bad due to its cost raises things to another level.

And thanks for the pointer to Sutter's article. Near the end he says he'll have more to say about high-level programming models. Do you have a pointer to that? I've googled his name, and don't see it. I'm generally rather less sanguine than he... more about that in the blog.

Robert said...

Greg,

With this sentence:

"Unfortunately, servers alone don't produce the semiconductor volumes that keep an important fraction of this industry moving forward – a fraction, not all of the industry, but certainly the part that's arguably key, to say nothing of the loudest, and will make the biggest ruckus as it increasingly runs into trouble."

you tap dance around this "important fraction of the industry" without actually saying who they are. Who, precisely, are these people who can't easily parallelize, and really need continuing performance gains, and constitute a significant fraction of the computing market? Before we can find new solutions for them we need to know who they are and what they are trying to accomplish.

Arkadius said...

From my point of view, we are concentrating too much on the point that many applications on the client side of the equation aren't obviously parallel. Therefore we don't see how "satellites" could enhance their performance.

Take Word, for example; yes, writing your letter using Word may not profit from 1000s of cores. Not directly, that is.

But what about the FFT performance you can utilize using many-core technologies, and graph "traversals," to create an "intelligent" speech-to-text plug-in for it? Auto-generation of text (predictive typing and predictive auto-generation of phrases/sentences/whole texts) based on simultaneously scanning millions or billions of text documents?

What about pre-rendering of audio in music production? Time-stretching/pitch-shifting algorithms for dozens or hundreds of tracks?
What about, again, generation of "creative" data (see the Word example) using huge amounts of preselected data: notes, songs, elements like drum loops, musical phrases, styles of musicians (say, the timing of drummers, or guitar parts generated from the finger positions of specific players based on their physique), etc.?

What about projects like OpenCV, and using 3D cameras, which are breaking the $50 barrier, to create the next generation of user interfaces for the end user? Again, audio and video (what you say is often what you want to see) go hand in hand here and will increasingly rely on studying, categorizing, and interpolating/generating data in parallel.

Specific artificial intelligence is needed here, and most of it will be based on "creative" interpolation of data from existing, huge datasets.

Say I want to feed my application with 1 GB of data/sec and write 1 GB of data/sec.
I don't have an 8 Gbit/s internet connection, and even if I had it, I'd need roughly 5-6 modern SSD drives to "play" the data to main memory and another 5-6 in order to "record" it.

At the same time, I can get systems with 8 or 12 GB of RAM for a very attractive price, while those 10-12 SSD drives, which will get me ca. 1.5 GBs of "room" and I/O performance, will set me back (including a controller) 5-7k (!!!).
A top-notch PC with a 6-core CPU, 12 GB of RAM, 2 x GPU with 2 GB of memory (and roughly 1000 cores), and 8 TB of "standard" HD room will be way "south" of this. Something doesn't add up here. I need I/O, and for parallel applications on the client side, I need it FAST!
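
Spelling that arithmetic out (the per-drive throughput below is just my rough assumption for a current SSD, not a measured figure):

    # Rough arithmetic behind the drive counts above. The ~200 MB/s
    # sustained throughput per SSD is an assumption, not a benchmark.
    stream_gb_per_s = 1.0                    # want 1 GB/s read and 1 GB/s write
    ssd_mb_per_s = 200.0                     # assumed sustained rate per SSD

    net_gbit_needed = stream_gb_per_s * 8    # 1 GB/s = 8 Gbit/s of bandwidth
    ssds_per_direction = (stream_gb_per_s * 1000) / ssd_mb_per_s

    print(f"Network needed just to stream it: {net_gbit_needed:.0f} Gbit/s")
    print(f"SSDs needed: ~{ssds_per_direction:.0f} to read plus "
          f"~{ssds_per_direction:.0f} to write")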

We are looking for the elephant in the room, but we forget that once we find it (and for many computational problems regarding parallelism, we may find it sooner than we think), we have to feed the beast.

Finding ways to combine the end user's needs and parallelism? No problemo.

Greg Pfister said...

@Arkadius,

I don't think any of the cases you describe are really a killer app that drives client volumes.

Speech-to-text plugin for Word - First, Dragon already does it without parallelism; second, it's not usable in the usual cubicle maze with open tops and no sound isolation. (I don't mean not usable algorithmically, I mean socially.)

Music creation - narrow audience.

Finding the pic of Aunt Flo in the bikini - yes, people will want to do this, occasionally. At least MS thinks so, in commercials, but MS locates that function "in the cloud," whatever that means in this context. Killer -- no.

AI I'll believe when somebody can tell me what the I really is. The cases based on analysis of a jillion inputs (like Google's spelling check) have that analysis done elsewhere, on servers, and the compiled result doesn't need massive parallelism. Also, I'm a veteran of Japan's "5th Generation" whoohah, which coupled AI and parallelism just by saying "we need parallel to do this," and fell on its face.

Could there be a killer app needing parallelism? Sure, there could be. I'm a firm believer that people write programs pragmatically to run well on the hardware available, and that will be parallel. I just don't think anybody's found it yet.

Greg
