Friday, May 28, 2010

No, Larrabee is Not Dead


We interrupt our series of posts on virtualization for a public service announcement: The numerous reports of Larrabee being dead are, at a minimum, greatly exaggerated. (Note: a significant addition was made after posting; see the section marked ADDED below.)

Larrabee is the highly-publicized erstwhile mega discrete graphics chip from Intel, the subject of flame wars with Nvidia's CEO, whose initial product introduction was cancelled last December because its performance wasn't yet competitive.

Now, a recent Technology @ Intel blog post about Intel graphics (An Update On Our Graphics-related Programs) has resulted in a flurry of "Larrabee is Dead!" postings. There's Anandtech (Intel Kills Larrabee GPU), Device Magazine (Intel Cancels Larrabee Project), PCWorld (Intel Cancels Larrabee), ZDNet (Intel officially (again) kills off Larrabee), the Inquirer (Larrabee will not be), and … I'll stop there, since a quick Google will find you many more.

In the minority is eWeek (Intel Clarifies Graphics Plans, Hints at HPC Project), taking a balanced "this is what the blog actually said" approach, with SemiAccurate (Larrabee alive and well) on the other side, considering those "dead Larrabee" posts a case of mass flakiness. I agree.

All the doom-sayers should actually listen to what Paul Otellini, Intel's CEO, said at Intel's Investor Meeting of May 11, 2010 – not to someone's interpretation of what he heard his cousin's dog say Otellini said, but to the words Otellini actually said. The full webcasts are archived and publicly available. In particular, listen to the last segment, Q&A, starting at time 1:39, when someone named Hans asked about Larrabee, commenting that it hadn't appeared in the prior presentations.

Here's my partial transcript of the response, and I urge you not to believe me. It's a pain to transcribe, and I'm sure I got something wrong. Go listen to the webcast. Some words of mine, and a comment or two, are inserted in brackets [like this].





"Everything you saw in the roadmap today does not have Larrabee built into it. … [our mainstream product will be] based on evolving our mainstream integrated graphics products… [but there will be a] sea change in the architecture with Sandy Bridge… by going onto the chip and moving from two generations behind silicon to [current] silicon you get… best of class [best for integrated graphics, which isn't saying much, but even that will be a huge change for Intel if it happens].
"… In terms of Larrabee, we did not stop the project. If we made any mistake with Larrabee, we probably should not have talked about something that was high risk and long term. We have not stopped the project. We have shipped STVs out. We're looking at how and when to bring it to market. It still has very very high promise in areas of throughput computing and in terms of a general reprogrammable graphics engine using small IA cores. We still like the idea. But we've taken the risk associated with a new architecture out of our roadmap over the next few years so we have the flexibility to stay competitive while still working on it."
Nothing in the tech blog entry says anything more than is said above. It's not in the roadmap, so when the blog says "We will not bring a discrete graphics product to market, at least in the short-term" – the statement that fired up the nay-sayers' posts – the blog is simply re-stating the official party line, an action which is doubtless far from an accident.

Also, please note that Otellini did say, twice, "we did not stop the project," and that they are looking at how and when to bring it to market. All of this is completely consistent with the position taken last January, which I reported in another post (The Problem with Larrabee) relating things I picked up in a talk by Tom Forsyth about "the future of Intel graphics despite what the press says." There has been no change.



**ADDED**

Contrast this with another case: InfiniBand. Intel realized, late in the game and after publicity, that its initial IB product would also be noncompetitive. There, the response was to just fold up the shop, completely. People were reassigned, and the organization ceased to be. (I was there when this all happened, as a significant, working, designing, writing, committee-leading IBM rep on the IB standard committee.) The situation is totally different for Larrabee; the shop is open and working, and high-level statements saying so have been repeated. This indicates a very different attitude, a continuing commitment to the technology.

With that kind of consistent high-level corporate backing, to say nothing of the large number of very talented individuals working on Larrabee, were they to kill it, there would be a bit more of a ruckus than a sentence or two in a tech blog post.

It must have been a very slow news day.

We now return you to our previously scheduled blog posts, which will resume after Memorial Day.

Thursday, May 27, 2010

How Hardware Virtualization Works (Part 2)




This is the second in a series of posts about how hardware virtualization works. See Part 1 to catch it from the start.

The Goal

The goal of hardware virtualization is to maintain, for all the code running in a virtual machine, the illusion that it is running on its own, private, stand-alone piece of hardware. What a provider is giving you is a lease on your own private computer, after all.

"All code" includes all applications, all middleware like databases or LAMP stacks, and crucially, your own operating system –including the ability to run different operating systems, like Windows and Linux, on the same hardware, simultaneously. Hence: Isolation of virtual machines from each other is key. Each should think it still "owns" all of its own hardware.

The illusion isn't always perfect. With sufficient diligence, operating system code can figure out that it isn't running on bare metal. Usually, however, that happens only when code is written specifically to find that out.




Trap and Map

The basic technique used is often referred to as "trap and map." Imagine you are a thread of computation in a virtual machine, running on one processor of a multiprocessor that is also running other virtual machines.

So off you go, pounding away, directly executing instructions on your own processor, running directly on bare hardware. There is no simulation or, at this point, software of any kind involved in what you are doing; you manipulate the real physical registers, use the real physical adders, floating-point units, cache, and so on. You are running asfastas thehardwarewillgo. Fastasyoucan. Poundingoncache, playingwithpointers, keepinghardwarepipelinesfull, until…

BAM!

You attempt to execute an instruction that would change the state of the physical machine in a way that would be visible to other virtual machines. (See the figure nearby.)




Just altering the value in your own register file doesn't do that, and neither does, for example, writing into your own section of memory. That's why you can do such things at full-bore hardware speed.

Suppose, however, you attempt to do something like set the real-time clock – the one master real time clock for the whole physical machine. Having that clock altered out from under other running virtual machines would not be very good at all for their health. You aren't allowed to do things like that.

So, BAM, you trap. You are wrenched out of user mode, or out of supervisor mode, up into a new higher privilege mode; call it hypervisor mode. There, the hypervisor looks at what you wanted to do – change the real-time clock – and looks in a bag of bits it keeps that holds the description of your virtual machine. In particular, it grabs the value showing the offset between the hardware real-time clock and your real-time clock, alters that offset appropriately, returns the appropriate settings to you, and gives you back control. Then you start runningasfastasyoucan again. If you later read the real-time clock, the analogous sequence happens, adding that stored offset to the value in the hardware real-time clock.
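To make that concrete, here is a minimal sketch, in C, of what such a clock-offset handler might look like. Everything here – the structure, the function name, the arguments – is invented for illustration; real hypervisors keep far more state and handle far more cases.

```c
/* Hypothetical per-VM state the hypervisor keeps -- part of the "bag of bits." */
struct vm_state {
    long long rtc_offset;   /* guest's clock = hardware clock + rtc_offset */
    /* ... saved registers, memory maps, I/O state, and so on ... */
};

/* Called in hypervisor mode after a guest traps on a privileged
 * real-time-clock instruction. Returns the value the guest should see. */
long long handle_rtc_trap(struct vm_state *vm, int is_write,
                          long long hw_clock, long long guest_value)
{
    if (is_write) {
        /* The guest tried to set "its" clock: record an offset instead of
         * touching the one shared hardware clock. */
        vm->rtc_offset = guest_value - hw_clock;
        return guest_value;
    }
    /* The guest read "its" clock: hardware time plus this guest's offset. */
    return hw_clock + vm->rtc_offset;
}
```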

Not every such operation is as simple as computing an offset, of course. For example, a client virtual machine's supervisor attempting to manipulate its virtual memory mapping is a rather more complicated case to deal with, a case that involves maintaining an additional layer of mapping (kept in the bag o' bits): a map from the "virtually real" memory space seen by the client virtual machine to the hardware's real memory space. All the mappings involved can be, and are, ultimately collapsed into a single mapping step, so execution directly uses the hardware that performs virtual memory mapping.
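Purely as an illustration of that collapsing (the names and structures here are mine, not any particular hypervisor's), composing the two translations into the single one the hardware MMU actually uses looks roughly like this:

```c
#include <stdint.h>
#include <stdbool.h>

/* Two translation layers:
 *   guest page table:  guest-virtual page  -> "virtually real" (guest-physical) page
 *   hypervisor map:    guest-physical page -> hardware (host-physical) page
 * Each is modeled here as a lookup function returning false on "no mapping." */
typedef bool (*lookup_fn)(uint64_t in_page, uint64_t *out_page);

/* Build one entry of the collapsed mapping the MMU will use directly:
 * guest-virtual page -> host-physical page, in a single step. */
bool shadow_map_entry(uint64_t guest_virt_page,
                      lookup_fn guest_page_table,  /* maintained by the guest OS   */
                      lookup_fn hypervisor_map,    /* maintained by the hypervisor */
                      uint64_t *host_phys_page)
{
    uint64_t guest_phys_page;

    if (!guest_page_table(guest_virt_page, &guest_phys_page))
        return false;                /* the guest would see a page fault     */
    if (!hypervisor_map(guest_phys_page, host_phys_page))
        return false;                /* hypervisor must first supply a frame */
    return true;                     /* one-step translation, full speed     */
}
```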




Concerning Efficiency

How often do you BAM? Unhelpfully, this is clearly application dependent. But the answer in practice, setting aside input/output for the moment, is: not often at all. It's usually a small fraction of the total time spent in the supervisor, which itself is usually a small fraction of the total run time. As a coarse guide, think in terms of overhead that is well under 5% – in other words, for most purposes, negligible. Programs that are I/O-intensive can see substantially higher numbers, though, unless you have access to the very latest in hardware virtualization support; then it's negligible again. A little more about that later.

I originally asked you to imagine you were a thread running on one processor of a multiprocessor. What happens when this isn't the case? You could be running on a uniprocessor, or, as is commonly the case, there could be more virtual machines than physical processors or processor hardware threads. For such cases, hypervisors implement a time-slicing scheduler that switches among the virtual machine clients. It's usually not as complex as the schedulers in modern operating systems, but it suffices. This might be pointed to as a source of overhead: you're only getting a fraction of the whole machine! But assuming we're talking about a commercial server, you were only using 12% or so of it anyway, so that's not a problem. A more serious problem arises when you have less real memory than all the virtual machines need; virtualization does not reduce aggregate memory requirements. But with enough memory, many virtual machines can be hosted on a single physical system with negligible degradation.
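For what it's worth, the core of such a time-slicing scheduler can be very small indeed. The toy round-robin sketch below is my own invention, not any shipping hypervisor's code; real schedulers add priorities, processor affinity, and accounting on top of this skeleton.

```c
#include <stddef.h>

struct vcpu {
    int id;
    int runnable;      /* nonzero if this virtual CPU has work to do */
};

/* On each timer tick, pick the next runnable virtual CPU after 'last'
 * and resume it on this physical processor; NULL means go idle. */
struct vcpu *pick_next(struct vcpu vcpus[], int count, int last)
{
    for (int i = 1; i <= count; i++) {
        struct vcpu *v = &vcpus[(last + i) % count];
        if (v->runnable)
            return v;
    }
    return NULL;       /* nothing runnable; idle until an interrupt arrives */
}
```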

The next post covers more of the techniques used to do this, getting around some hardware limitations (translate/trap/map) and efficiency issues (paravirtualization). (Link will be added when it is posted.)

Monday, May 24, 2010

How Hardware Virtualization Works (Part 1)


Zero.

Zilch. Nada. Nothing. Rien.

That's the best approximation to the intrinsic overhead for computer hardware virtualization, with the most modern hardware and adequate resources. Judging from comments and discussions I've seen, there are many people who don't understand this. It is possible to find many explanations of hardware virtualization all over the Internet and, of course, in computer science courses. Apparently, though, they don't stick, or aren't approachable enough. So I'll try to explain in this multi-part series of posts how this trick is pulled off.

This discussion is actually a paper that was published as a single piece in the proceedings of CloudViews – Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference, held May 20-21 in Porto, Portugal. I was planning to attend and present it as a talk, but unfortunately other things intervened and I could not attend.

Before talking about how hardware virtualization works, let's put it in context with cloud computing and other forms of virtualization.


Virtualization and Cloud Computing

Virtualization is not a mathematical prerequisite for cloud computing; there are cloud providers who do serve up whole physical servers on demand. However, it is very common, for two reasons:

First, it is an economic requirement. Cloud installations without virtualization are like corporate IT shops prior to virtualization; there, the average utilization of commodity and RISC/UNIX servers is about 12%. (While this seems insanely low, there is a lot of data supporting that number.) If a cloud provider could only hope for 12% utilization at best, when all servers were in use, it would have to charge a price above that of competitors who use virtualization. That can be a valid business model, with advantages (like somewhat greater consistency in what's provided) and customers who prefer it, but the majority of vendors have opted for the lower-price route.

Second, it is a management requirement. One of the key things virtualization does is reduce a running computer system to a big bag of bits, which can then be treated like any other bag o' bits. Examples: It can be filed, or archived; it can be restarted after being filed or archived; it can be moved to a different physical machine; and it can be used as a template to make clones, additional instances of the same running system, thus directly supporting one of the key features of cloud computing: elasticity, expansion on demand.

Notice that I claimed the above advantages for virtualization in general, not just the hardware virtualization that creates a virtual computer. Virtual computers, or "virtual machines," are used by Amazon AWS and other providers of Infrastructure as a Service (IaaS); they lease you your own complete virtual computers, on which you can load and run essentially anything you want.

In contrast, systems like Google App Engine and Microsoft Azure provide you with a complete, isolated, virtual programming platform – a Platform as a Service (PaaS). This removes some of the pain of use, like licensing, configuring and maintaining your own copy of an operating system, possibly a database system, and so on. However, it restricts you to their platform, with their choice of programming languages and services.

In addition, there are virtualization technologies that target a point intermediate between IaaS and PaaS, such as the containers implemented in Oracle Solaris, or the WPARs of IBM AIX. These provide independent virtual copies of the operating system within one actual instantiation of the operating system.

The advantages of virtualization apply to all the variations discussed above. And if you feel like stretching your brain, imagine using all of them at the same time. It's perfectly possible: .NET running within a container running on a virtual machine.

Here, however, I will only be discussing hardware virtualization, the implementation of virtual machines as done by VMware and many others. Also, within that area, I am only going to touch lightly on virtualization of input/output functions, primarily to keep this article a reasonable length.

So, on we go to the techniques used to virtualize processors and memory. See the next post, part 2 of this series. (Link to be added when that is posted.)

Monday, May 17, 2010

Living In It: A Tale of Learning in Second Life


I think I have experienced something that captures a unique advantage of virtual worlds, something that cannot be done in the one-dimensional text streams that lie across web 2.0 pages, and cannot even be done in real life.

It's not in the usual answers to a question I still hear: "What is Second Life [or another virtual world] good for?" I know all the standard answers: create your own new life, socialize, educate, introduce products, become a millionaire virtual land baroness, etc. I don't doubt those. They are real, and many have exploited them. But they don't grab me as something that is a unique advantage of a 3D world simulation.

Recently, however, I had an experience there that did grab me and I believe has significant implications for presentation techniques in virtual worlds. As I find the case with most things dealing with virtual worlds, I think this is best described as a narrative, the way I experienced it. So, here's the tale. 








First, a short aside: What has this got to do with the usual subjects for this blog, like multicore, accelerators, cloud computing… Oh, yes. Gotta have something to use all that stuff. Think killer app. End short aside.

Oops, second short aside: What’s below is based on my experience in a course by the Shapiro Negotiations Institute. While this is the only time I’ve run into the kind of thing I describe below, I’ve no information that their implementation is unique. I hope it is not.







Prolog

A while ago (this post got delayed more than six months by my moving into a new house), I happened on an invitation to take a free course in Second Life on negotiating skills, in exchange for commenting about the technology. Free! And something I always think I can learn more about. So I signed up.

The course was run by Mark Jankowski (avatar "Mark Wisenheim") of Shapiro Negotiations Institute. I'd taken another web-based negotiation course, given by a relocation program, so I had some direct experience to compare the Shapiro SL course against; and I've attended some seminars in SL. So I thought I knew roughly what to expect.

I dropped in at the SLURL (Second Life URL) for Shapiro's campus ahead of time to make sure I wouldn't have some hangup on the day of the course, and also to be nosy and poke around.

See the pictures scattered in this post; I later took Second Life snapshots of what I found.








Arriving, I found the usual collegiate-looking green quad that screams "education!", complete with trees, benches, and Georgian-looking buildings. There were also some red teleport boxes. Teleports are SL shortcuts: click, and you're elsewhere. So, I clicked.

I found myself in a baseball stadium skybox, looking out over the diamond. Interesting. OK, how do I get out? There was only one door, at the back, so I walked through and found myself falling down a black hole.








I landed in a Christmas holiday scene, with big Christmas trees, a large snowman, gifts, a gingerbread house, and snow on the ground. Well, hmm. What else is there?

Walking through a door in a snow-adorned mountain on one side of the scene, I went through some blackness again, but without falling, and into a symphony concert hall. There was wood paneling, a small raised amphitheater seating area for an audience, and a space outfitted for a midsize orchestra with curved ranks of musicians' stations, each with a chair and a music stand, facing a conductor's lectern.

Curiouser and curiouser.

The exit out the side wall led through blackness again to what I can only describe as an indoor pig farm, complete with pigs, and some strange signs. A dark wooded glen was appended nearby. I could find no further passage, so I clicked on a teleport pedestal – they were scattered everywhere but in the skybox – and found myself back in the quad.








Well, that was strange enough. Christmas with a symphony and a skybox? And pigs? I decided I really would show up for the course, if for no other reason than to see what that was all about.







The Course

On the day of the course, the group of us who signed up met in the quad with Mark. He explained that this would be a run-through of the first phase of their negotiation class, covering preparation, and ushered us through the teleport to the baseball skybox.

In the skybox, he described a negotiation they had done on behalf of a baseball player who was asking for more money than had ever previously been paid for a shortstop, his position. Preparation: They checked other infield positions, not just shortstop, found many paid higher, and, given that data, were able to convince the player's club to agree to his request. The message was not to be too narrow in checking out your own worth. We were then posed a question, voted by walking into circles designating our choices, had some discussion about it, and left through the back door.

Down the rabbit hole to Christmas-land we went, where we heard about negotiations between a large store and the Amalgamated Order of Real Bearded Santas. The order had a local monopoly, and was jacking up prices. [See clarification in comments below.] Preparation: They found that the store had, in the past, run its own Santa program. At Shapiro's suggestion, the store said it would pay this year, but would restart its own Santa program next year and not use AORBS. AORBS caved. Message: Don't limit yourself to the present; look to the past, and also use possible future actions as a lever. Voting dance again, and onward through the black corridor.








In the symphony hall, we heard the tale of a conductor who wanted them to negotiate for him to get a $1 raise. Preparation: Find out what was up with this nut. It turned out that he just wanted the organization to acknowledge that he was special, and getting $1 above union scale would do that. They got him a star on his dressing room door instead. He was delighted, and a potential mess with the union was avoided. Message: Understand what you really want from a negotiation; it may be different from what you are asking for, allowing movement within the negotiation.

Then everybody teleported back to the Quad, went into a conference room, and discussed what we thought. I gave the opinion that the settings probably took a huge amount of effort to construct; on general principles, they probably could have gotten 80% of the effect with 20% of the work. That got a polite nod, like "probably so." I also asked whether they thought Second Life was the best vehicle for this kind of thing. Mark acknowledged that, in particular, they had problems with setup. They actually bought laptops, preconfigured them with SL installed, and shipped them to executive clients for use in the sessions. That also hit me as a major investment. He said they of course got them back, and thought it was worth it when re-used across multiple clients.

That, I then thought, was it.













The Follow-Up

A week or so later, I received an email request for any follow-up comments. I was ready to provide some unsurprising replies when I realized something: everything from that course was still clear in my mind.

In comparison, when I thought back to it, I remembered almost nothing from the earlier webinar I'd attended. Furthermore, the "almost" part was costly: in a salary negotiation a while back, forgetting one principle from the webinar cost me about a factor of two in consulting fees. Now I'll remember it. Grrr.

Also, my memory of this course was strongly anchored by those settings. I'd think of the orchestra pit, and zang! that conductor sprang to mind, and the point of the session with him. Ditto for the Christmas place and the skybox. Everything I wrote above about the sessions is, in fact, direct from memory. I didn't take any notes during the course. This is unusual for me. I am normally terrible on details, remembering relatively little outside of my notes. So what happened?

Certainly the examples used were memorable, and Mark is an excellent speaker and teacher. But the webinar instructor was good too, and, as far as I recall, which isn't much, her examples were quite good, too. But that isn't what stands to attention in my mind when I think back on this SL course: It's the settings.

Somehow, the experience of walking around inside those places, navigating their geographies, being immersed in it, makes the content easy to bring up from memory.

This seems a bit like a mnemonic trick I've heard of: Imagine a house. Walk around through its rooms, and furnish it with the facts you want to remember. Maybe do something like imagining that the dining room table has Avogadro's number on it, shaped like an avocado. I don't know; I've never managed to use it successfully. But for this course I didn't work at it. It just happened.

I think this is related to how easily many people – probably not all; I don't know – can become immersed in a virtual world and identify with their avatars. For example, in real life I have a bit of vertigo. If I unexpectedly find myself near a sharp drop, or drive over a tall bridge, I feel queasy. In a virtual world, when I / my avatar goes near a cliff edge or over a flimsy bridge – same thing. My stomach goes into knots. It doesn't matter that the whole thing is a digital construct, not "real." My unconscious processes don't care. As far as they're concerned, I, me, this person, is living in that world, and they react accordingly.

So I didn't attend that presentation, I lived in it. No wonder I remember it.

Compare this with the usual process of hypnotizing chickens with PowerPoint. No, don't bother; there's no comparison. Perhaps there is one useful analogy, though, one that points up a disadvantage to this technique today.

I tend to think of each of the settings used – skybox, Christmas, symphony – as each being analogous to a PowerPoint slide: It's an aid you construct to focus your audience on an aspect of your topic. One big difference that stands out to me is that it's easy to whip up a slide. Building a pocket universe, like those scenes in this course, is just too dang hard for the average presenter, given today's tools. Sorry, Lindens (owners of SL), I know your world is chock full of user-created content, and I've attended all the New Citizens' courses on building, animating, and so on, and I get it. And people obviously do it. But it's just not on a level with slide creation. People hire SL contractors to build stuff for them! Maybe if there were thousands of standard clipart-like objects to use (free! or very cheap, as for slides), and hundreds of template pocket universes to start from (free! or nearly), the situation would be different. Then maybe you could get 80% of the effect with 20% of the work required today.

This train of thought makes slide presentations done in virtual worlds seem perverse. They are, I guess, but they're all over the place. From the ones I've attended in SL, I find them a good halfway point between a webinar and actual physical presence. I do know that while attending them in-world, I'm far less likely to do email or chat with others; not so with webinars, which seem a fertile field for multitasking. Oh, and it's not just because my avatar will slump over, asleep, if I don't operate it. (At least it doesn't snore. (I know, it could.)) Slides in SL also provide a way to transition from traditional web-based education to something in a virtual world. All your presentations can now be done in a corporate boardroom, or in a spaceship, or in a Tiki Hut on the edge of the sea, at sunset, with a virtual Mojito in the hand of every avatar. (I've never understood the point of virtual libations, except as decoration.)

But something much better is possible, something that appears to harness our hindbrains directly: living in a presentation. It cannot be done in two dimensions. You cannot even do it in real life. It's something virtual worlds are, uniquely, good for.

(By the way, no, I never did find out what the pig farm was about.)

Monday, May 3, 2010

All Hail the GPU! – a Tweetstream


I recently attended a talk at Colorado State University (CSU) by Sharon Glotzer, and tweeted what was going on in real time. Someone listening said he missed the start, and suggested I blog it. So, here is a nearly zero-effort blog post of my literal tweetstream, complete with hashtags.

I did add a few comments at the end, and some in the middle [marked like this].

Value add: This was part of the CSU Information Science and Technology (ISTeC) Distinguished Lecture Series. The slides and a video of the lecture will soon appear on the page summarizing all of the lectures. Keep scrolling down to the bottom. Many prior lectures are there, too.

Starting Tweetstream:

At CSU talk by Sharon Glotzer UMich Ann Arbor "All Hail the GPU: How the Video Game Industry is Transforming Molecular & Materials Selection [no hashtag because I hit the 140 character limit]

Right now everybody's waiting to get the projector working #HailGPU

At meetup b4 talk, She said they redid their code in CUDA and overnight got 100X speedup #HailGPU [talked to her for about 3 minutes]

Backup projector found & works, presentation starting soon, I hope. #HailGPU

None of her affiliations is CS or EE -- all in materials, fluids, etc. "Good to be talking about our new tools." #HailGPU

First code she ever wrote as a grad student was for the CM-2. 64K procs. #HailGPU

Image examples of game graphics 2003, 05, 08 - 03 looks like Second Life. #HailGPU

Also b4 talk, said when they moved to Fermi they got another 2X. Not bothered using CUDA; says "it's easy." #HailGPU

Just reviewing CPU vs. GPU arch now. #HailGPU

"Typical scientific apps running on GPUs are getting 75% of peak speed." Hoowha? #HailGPU [This is an almost impossibly large efficiency. Says more about her problems than about GPUs.]

"Huge infusion from DARPA to make GPGPUs" -- Huh? Again. Who? When? #HailGPU

"Today you can get 2GF for $500. That is ridiculous." #HailGPU [bold obviously added here, to better indicate what she said]

Answer to Q: Nvidia got huge funding from DARPA to develop GPGPU technology over last 5 years. #HailGPU [I didn't know that. It makes all kinds of sense.]

"If you've ever written MPI code, CUDA is easy. Summer school students do it productively. Docs 1st rate." #HailGPU [MPI? As in message-passing? Leads to CUDA, which is stream? Say what? Must be a statement unique to her problem domain.]

Who should use them? Folks with data-parallel problems. Yes, indeed. #HailGPU

She works on self-assembly of molecules. Like lipids self-assembled into membranes. #HailGPU

Her group doing materials that change (Terminator), multi-function & sensors (Iron Man), cloaking (illustration was a blank :-) #HailGPU [cloaking as in "Klingons"] [Bah.]

Said those kinds of things are "what the material science community is doing now." #HailGPU

Hm, not seeing tweets from anybody else. Is this thing working? // ra_livesey @gregpfister - it most certainly is, keep going [Just wanted some feedback; wasn't seeing anything else.]

Her prob, Molecular Dynamics, is F=ma across a bazillion particles a bazillion times. Yeah, data parallel. #HailGPU [The second bazillion is doing the first bazillion over a bazillion time steps.]

First generates neighbor list for each particle - what particles does each particle interact with? Mainly based on distance. #HailGPU

Response to Q: Says can reduce neighbor calc from N^2 to less (but not "Barnes-Hut"), but no slides for that. #HailGPU

Typically have ~100 neighbors per particle. #HailGPU [Aha! This is where a chunk of the 100X speedup comes from! For each molecule or whatever, do exactly the same code in simple SIMD parallel for all 100 neighbors, at the same time, just varying their locations. If they had 100 threads; I think they do, would have to check. !Added in edit to this post!]

Says get same perf on $1200 GPU workstation as on $20,000 cluster. (whole MD code HOOMD-Blue) #HailGPU [I think I may have the numbers slightly wrong here – may have been $40,000, etc. – but the spirit is right; see the slides and presentation for what she exactly said.]

Most people would rewrite code for 3X speedup. For 100X, do it yesterday. #HailGPU

Done work on "patchy nanotetrahedra" forming strands that bundle together spontaneously. #HailGPU

"Monte Carlo not so data parallel" (I don't agree.) #HailGPU

Used to be a guy at IBM who did molecular dynamics on mainframes with attached vector procs. It is easy to parallelize. #HailGPU [Very, very easy. See "bazillions" above. In addition, lots of floating-point computing at each individual F=ma calculation.]

Guy at IBM was Enrico something-or-other. Forget last name. #HailGPU [Unfortunately, the only things after "Enrico" that come to my mind are "Fermi" -- which I know is wrong -- and, for some unknown psychological reason, "vermicelli." Also um, wrong. But tasty.]

Worked on how water molecules interacted. Thought massively parallel was trash. #myenemy #HailGPU

Trying to design materials that, when you do something like turn on a light, change: become opaque, start flowing, etc. #HailGPU

Also studying Tetris as a "primitive model of complex patchy particles" Like crystal structures form. #HailGPU

Students named their software suite "Glotzilla". Uh huh. She doesn't object. Self-analysis code called Freud. #HailGPU

My general take: MD simulation is a field in need of massive compute capabilities, is pleasantly parallel, more FPUs=good. #HailGPU

Answer to post-talk Q: Her Monte Carlo affects state of the system, can't accept moves that isn't legal and affects others. Strange.

Limits of GPU usability relate to memory size. They can do 100K particles, with limited-range interaction. #HailGPU

So if you have a really large-scale problem, can't use GPUs without going off-chip and losing a LOT. #HailGPU

Talk over, insane volume of tweets will now cease. #HailGPU

End of Tweetstream.

I went to a lunch with her, but didn't get a chance to ask any meaningful questions. Well, this depends on what your definition of "meaningful" is; she grew up in NYC, and therefore thinks thin-crusted, soft, drippy pizza is the only pizza. As do I. But she folds it. Heresy! That muffles the flavor!

More (or less) seriously, molecular dynamics has always been an area in which it is really fairly simple to achieve tremendous parallel efficiency: many identical calculations (except for the data), lots of floating-point work for each calculation (electric charge force, van der Waals forces, etc.), and not a whole lot of different data required for each calculation. I have no doubt whatsoever that she gets 75% efficiency; I wouldn't be surprised at even better results. But I think it would be a mistake to think it's easy to extend such results outside that area. It was probably well worth DARPA's investment, though, in terms of the materials science enabled. I mean, cloaking? Really?
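To show what "fairly simple" means here, below is a toy pairwise-force loop of my own – not Glotzer's HOOMD-Blue code – with an invented Lennard-Jones-style interaction. Every (particle, neighbor) pair is an independent handful of floating-point operations, which is why mapping each particle (or each pair) to a GPU thread works so well.

```c
#include <stddef.h>

struct vec3 { float x, y, z; };

/* Toy molecular-dynamics force accumulation over a precomputed neighbor list.
 * Each particle i has n_neighbors[i] neighbors (~100 in her problems), stored
 * in neighbors[i * max_neighbors + k]. The outer loop is trivially parallel:
 * no particle's result depends on any other particle's result. */
void accumulate_forces(size_t n, const struct vec3 *pos,
                       const int *neighbors, const int *n_neighbors,
                       int max_neighbors, struct vec3 *force)
{
    for (size_t i = 0; i < n; i++) {                  /* one GPU thread per i */
        struct vec3 f = {0.0f, 0.0f, 0.0f};
        for (int k = 0; k < n_neighbors[i]; k++) {    /* ~100 neighbors each  */
            int j = neighbors[i * (size_t)max_neighbors + k];
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float dz = pos[i].z - pos[j].z;
            float r2 = dx * dx + dy * dy + dz * dz + 1e-6f;  /* avoid /0 */
            float inv_r6 = 1.0f / (r2 * r2 * r2);
            /* Lennard-Jones-like force coefficient with unit sigma/epsilon. */
            float coef = 24.0f * inv_r6 * (2.0f * inv_r6 - 1.0f) / r2;
            f.x += coef * dx;
            f.y += coef * dy;
            f.z += coef * dz;
        }
        force[i] = f;                                 /* F = ma comes next */
    }
}
```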