Thursday, November 26, 2009

Oh, for the Good Old Days to Come

I recently had a glorious flashback to 2004.

Remember how, back then, when you got a new computer you would be just slightly grinning for a few weeks because all your programs were suddenly so crisp and responsive? You hadn't realized your old machine had a rubbery-feeling delay responding to your clicks and key-presses until zip! You booted the new machine for the first time, and wow. It just felt good.

I hadn't realized how much I'd missed that. My last couple of upgrades have been OK. I've gotten a brighter screen, better graphics, lighter weight, and so on. They were worth it, intellectually at least. But the new system zip, the new system crispness of response – it just wasn't there.

I have to say I hadn't consciously noticed that lack because, basically, I mostly didn't need it. How much faster do you want a word processor to be, anyway? So I muddled along like everyone else, all our lives just a tad more drab than they used to be.

Of course, the culprit denying us this small pleasure has been the flattening of single-thread performance wrought by the half-death of Moore's Law. Used to be, after a couple of years' delay you would naturally get a system that ran 150% or 200% faster, so everything just went faster. All your programs were rejuvenated, and you noticed, instantly. A few weeks or so later you were of course used to it. But for a while, life was just a little bit better.

That hasn't happened for nigh unto five years now. Sure, we have more cores. I personally didn't get much use out of them. None of my regular programs perked up. But as I said, I really didn't notice, consciously.

So what happened to make me realize how deprived I – and everybody else – have been? The Second Life client.

I'd always been less than totally satisfied with how well SL ran on my system. It was usable. But it was rubbery. Click to walk or turn and it took just a little … time before responding. It wasn't enough to make things truly unpleasant (except when lots of folks were together, but that's another issue). But it was enough to be noticeably less than great. I just told myself, what the heck, it's not Quake but who cares, that's not what SL is about.

Then for reasons I'll explain in another post, I was motivated to reanimate my SL avatar. It hadn't seen any use for at least six months, so I was not at all surprised to find a new SL client required when I connected. I downloaded, installed, and cranked it up.

Ho. Ly. Crap.

The rubber was gone.

There were immediate, direct responses to everything I told it to do. I proceeded to spend much more time in SL than I originally intended, wandering around and visiting old haunts just because it was so pleasant. It was a major difference, on the order of the difference I used to encounter when using a brand-new system. It was like those good old days of CPU clock-cranking madness. The grin was back.

So was this "just" a new, better, software release? Well, of course it was that. But I wouldn't have bothered writing this post if I hadn't noticed two other things:

First, my CPU utilization meter was often pegged. Pegged, as in 100% utilization, where flooring just one of my two CPUs reads only 50%. When I looked a little deeper, I saw the one, single SL process was regularly over 50%. I've not looked at any of the SL documentation on this, but from that data I can pretty confidently say that this release of the SL client can make effective use of both cores simultaneously. It's the only program I've got with that property.
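The inference here is just arithmetic: on a dual-core machine, a total-utilization meter allots 100/2 = 50% per core, so a single process reading above 50% must be running on more than one core at once. A minimal sketch of that reasoning (the function is purely illustrative, not anything from the SL client or my meter's software):

```python
import math

def min_cores_in_use(process_cpu_percent, ncores):
    """Given one process's share of total-system CPU on an ncores machine,
    return the minimum number of cores it must be using simultaneously.
    Each fully-loaded core contributes 100/ncores percent of the total."""
    per_core = 100.0 / ncores
    # e.g. a process at 60% total on a 2-core box needs ceil(60/50) = 2 cores.
    return math.ceil(process_cpu_percent / per_core)

# The SL client regularly read over 50% on my 2-core system:
print(min_cores_in_use(60, 2))  # must be using both cores
```

The same arithmetic generalizes: on a 4-core box, anything over 25% total for one process implies more than one core in use.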

Second, my thighs started burning. Not literally. But that heat tells me when my discrete GPU gets cranking. So, this client was also exercising the GPU, to good effect.

Apparently, this SL client actually does exploit the theoretical performance improvements from graphics units and multiple cores that had been lying around unused in my system. I was, in effect, pole-vaulted about two system generations down the road – that's how long it's been since there was a discernible difference. The SL client is my first post-Moore client program.

All of this resonates for me with the recent SC09 (Supercomputing Conference 2009) keynote of Intel's Justin Rattner. Unfortunately it wasn't recorded by conference rules (boo!), but reports are that he told the crowd they were in a stagnant, let us not say decaying, business unless they got their butts behind pushing the 3D web. (UPDATE: Intel has posted video of Rattner's talk.)

Say What? No. For me, particularly following the SL experience above, this is not a "Say What?" moment. It makes perfect sense. Without a killer application, the chip volumes won't be there to keep down the costs of the higher-end chips used in non-boutique supercomputers. Asking that audience for a killer app, though, is like asking an industrial assembly-line designer for next year's toy fashion trends. Killer apps have to be client-side and used by the masses, or the volumes aren't there.

Hence, the 3D Web. This would take the kind of processing in the SL client, which can take advantage of multicore and great graphics processing, and put it in something that everybody uses every day: the browser. Get a new system, crank up the browser, and bang! you feel the difference immediately.

Only problem: Why does anybody need the web to be 3D? This is the same basic problem with virtual worlds: OK, here's a virtual world. You can run around and bump into people. What, exactly, do you do there? Chat? Bogus. That's more easily done, with more easily achieved breadth of interaction, on regular (2D) social networking sites. (Hence Google's virtual world failure.)

There are things that virtual worlds and a "3D web" can, potentially, excel at; but that's a topic for a later post.

In the meantime, I'll note that in a great crawl-first development, there are real plans to use graphics accelerators to speed up the regular old 2D web, by speeding up page rendering. Both Microsoft and Mozilla (IE & Firefox) are saying they'll bring accelerator-based speedups to browsers (see CNET and Bas Schouten's Mozilla blog) using Direct2D and DirectWrite to exploit specialized graphics hardware.

One could ask what good it is to render a Twitter page twice as fast. (That really was one of the quoted results.) What's the point? Asking that, however, would only prove that One doesn't Get It. You boot your new system, crank up the browser and bam! Everything you do there, and you do more and more there, has more zip. The web itself – the plain, old 2D web – feels more directly connected to your inputs, to your wishes; it feels more alive. Result?

The grin will be back. That's the point.

Sunday, November 8, 2009

Multicore vs. Cloud Computing

Multicore is the wave of the future. Cloud Computing is the wave of the future. Do they get along? My take: Eh. Sorta. There are the usual problems with parallel programming support, despite hubbub about parallel languages and runtimes and ever bigger multicores.

Multicore announcements in particular have been rampant recently. Not just the usual drumbeat from Intel and AMD; that continues, out to 8-way, 12-way, and onward to the future. Now more extreme systems are showing their heads, such as ScaleMP announcing vSMP for the Cloud, (and also for SMB), a way of gluing together X86 multicore systems into even larger shared-memory (NUMA) systems. 3Leaf is also doing essentially the same thing. Tilera just announced a 100-core chip product, beating Intel and others to the punch. Windows 7 has replaced the locking in XP, allowing it to "scale to 256 processors" – a statement that tells me (a) they probably did fix a bunch of stuff; and (b) they reserved one whole byte for the processor/thread ID. (Hope that's not cast in concrete for the future, or you'll have problems with your friendly neighborhood ScaleMP'd Tilera.)

So there's a collection of folks outside the Big Two processor vendors who see a whole lot of cores as good. Non-"commodity" systems – by the likes of IBM, Sun, and Fujitsu – have of course been seeing that for a long time, but the low-priced spread is arguably the most important in the market, and certainly is the only hardware basis I've seen for clouds.

What's in the clouds for multicore?

Amazon's instance types and pricing do take multicore into account: at the low end, a Small Linux instance is $0.085/hour for 1 core, nominally 1 GHz; at the high end, still Linux, an "Extra-large High CPU" instance is $0.68/hour for 8 cores, 2.5 GHz each. So, assuming perfect parallel scaling, that's about 20X the performance for 8X the price, a good deal. (I simplified 1 Amazon compute unit to 1 GHz. Amazon says it's a 1.0-1.2 GHz 2007 Opteron or Xeon.)
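The back-of-the-envelope behind that 20X-for-8X claim, using the prices above and my 1-GHz simplification (a sketch only – these were late-2009 on-demand rates, long since changed):

```python
# AWS on-demand Linux pricing, late 2009, as quoted in the post (simplified).
small   = {"price": 0.085, "cores": 1, "ghz": 1.0}  # Small instance
xl_hcpu = {"price": 0.68,  "cores": 8, "ghz": 2.5}  # Extra-large High-CPU

def total_ghz(inst):
    # Assume perfect parallel scaling: aggregate GHz across all cores.
    return inst["cores"] * inst["ghz"]

perf_ratio  = total_ghz(xl_hcpu) / total_ghz(small)  # (8 * 2.5) / (1 * 1.0)
price_ratio = xl_hcpu["price"] / small["price"]      # 0.68 / 0.085

print(perf_ratio, price_ratio)  # 20x the performance for 8x the price
```

The "perfect parallel scaling" assumption is doing all the work here, of course – which is exactly why the rest of this post cares whether cloud platforms let you write parallel code at all.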

Google App Engine (GAE) just charges per equivalent 1.2 GHz single CPU. What happens when you create a new thread in your Java code (now supported) is… well, you can't. Starting a new thread isn't supported. So GAE basically takes the same approach as Microsoft Azure, which treats multicore as an opportunity to exercise virtualization (not necessarily hardware-level virtualization, by my read), dividing multicores down into single core systems.

The difference between AWS, on the one hand, and GAE or Azure on the other, of course makes quite a bit of sense.

GAE and Azure are both PaaS (Platform as a Service) systems, providing an entire application platform, and the application is web serving, and the dominant computing models for web-serving are all based on throughput, not multicore turnaround.

AWS, in contrast, is IaaS (Infrastructure as a Service): You got the code? It's got the hardware. Just run it. That's any code you can fit into a virtual machine, including shared-memory parallel code, all the way up to big hulking database systems.

Do all God's chillun have to be writing their web code in Erlang or Haskell or Clojure (which turns into Java at runtime) or Ct or whatever before PaaS clouds start supporting shared-memory parallelism? But if PaaS is significant, doesn't it have to support E/H/C/Ct/etc. before those chillun will use them? Do we have a chicken-and-egg problem here? I think this just adds to the already long list of good reasons why parallel languages haven't taken off: PaaS clouds won't support them.

And in the meantime, there is of course a formidable barrier to hosting specialty code, like databases and at least some HPC codes, on PaaS. Hence the huge number of Amazon Solution Providers, including large-way SMP users such as Oracle, IBM DB2, and Pervasive, while Google has just a few third-party Python libraries so far.

PaaS is a good thing, but I think that sooner or later it will be forced, like everyone else, to stop ignoring the other wave of the future.


Postscript / Addendum / Meta-note: My apologies for the lack of blog updates recently; I've been busy on things that make money or get me housed. I've something like six posts stacked up to be written, so you can expect more soon.