Computing Community Consortium Blog

The Multicore Challenge

August 26th, 2008 / in research horizons / by Peter Lee

Researchers working in areas spanning computer architecture, programming languages, operating systems, algorithms, and more have been thinking harder about the problem of parallel computing. Why has the age-old concept of parallelism become so “hot” today? To provide the first of an upcoming series of opinion pieces, we asked David Patterson, Professor of Computer Science at UC Berkeley, to give us his thoughts and his rationale for increased government funding to solve the multicore challenge.

Since the first commercial computer in 1950, the information technology industry has improved the cost-performance of computing by a factor of about 100 billion overall. For most of the last 20 years, architects used the rapidly increasing transistor speed and transistor budget made possible by silicon technology advances to double performance every 18 months. The implicit hardware/software contract was that increases in transistor count and power dissipation were OK as long as architects maintained the existing programming model. This contract led to innovations that were inefficient in transistors and power but that increased performance. The contract worked fine until we hit the limit on the power a chip can dissipate.
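
A back-of-envelope check on those rates (my arithmetic, not part of Patterson's text): 20 years of doubling every 18 months compounds to

\[ 2^{20/1.5} = 2^{13.3} \approx 10^{4}, \]

and sustaining roughly 37 doublings, about one every 19 months on average since 1950, gives \(2^{37} \approx 1.4 \times 10^{11}\), which is the order of the 100-billion overall figure.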

Computer architects were forced to find a new paradigm to sustain ever-increasing performance. The industry decided that the only viable option was to replace the single power-inefficient processor with several more efficient processors on the same chip. The whole microprocessor industry thus declared that its future was in parallel computing, with a doubling of the number of processors, or cores, each technology generation, which occurs every two years. This style of chip was labeled a multicore microprocessor. Hence, the leap to multicore is not based on a breakthrough in programming or architecture; it’s actually a retreat from the even harder task of building power-efficient, high-clock-rate, single-core chips.

Many startups tried commercializing multiple core hardware over the years. They all failed, as programmers accustomed to continuous improvements in sequential performance saw little need to explore parallelism. Convex, Encore, Floating Point Systems, INMOS, Kendall Square Research, MasPar, nCUBE, Sequent, and Thinking Machines are just the best-known members of the Dead Parallel Computer Society, whose ranks are legion. Given this sad history, there is plenty of reason for pessimism about the future of multicore. Quoting computing pioneer and Stanford President John Hennessy:

“…when we start talking about parallelism and ease of use of truly parallel computers, we’re talking about a problem that’s as hard as any that computer science has faced. … I would be panicked if I were in industry.”

Jeopardy for the IT industry means opportunity for the research community. If researchers meet the parallel challenge, the future of IT is rosy. If they don’t, it’s not. Failure could jeopardize both the IT field and the portions of the economy that depend upon rapidly improving information technology. It is also an opportunity for the leadership in IT to move from the US to wherever in the world someone invents the solution to make it easy to write efficient parallel software.

Given this current crisis, it's ironic that, starting in 2001, DARPA chose to decrease funding of academic computer systems research. Knowing what we know today, if we could go back in time we would have launched a Manhattan Project to bring together the best minds in applications, software architecture, programming languages and compilers, libraries, testing and correctness, operating systems, hardware architecture, and chip design to tackle this parallel challenge.

Since we don't have time travel, there is an even greater sense of urgency to get such an effort underway. Indeed, industry has recently stepped in to fund projects at three universities (Berkeley, Illinois, and Stanford), but it's unrealistic to expect industry to fund many more. It's also clear, given the urgency and importance to the industry and the nation, that we can't depend on just three academic projects to preserve the future of the US IT industry. We need the US Government to return to its historic role of bringing many more minds to bear on these important problems. To make real progress, we would need a long-term, multi-hundred-million-dollar-per-year program.

The consequences of not funding aren't a drop in Nobel Prizes or research breakthroughs; they are a decline in the US-led IT industry, a slowdown in portions of the US economy, and possibly the ceding of IT leadership to another part of the world where governments understand the potential economic impact of funding academic IT research on parallelism.

David Patterson

23 comments

  1. Dan Grossman says:

    I agree with all David’s points and would add this complementary one: To my knowledge, none of the researchers addressing the parallel-computing software challenge believe there is going to be one single savior technology that we’ve simply been overlooking. Rather, even our rosiest forecasts are that a dozen or two Great Ideas will work together and/or apply in different domains to make a real difference. That makes it even more important that we have the funding to support research in a variety of subfields, time horizons, geographic regions, etc.

  2. Alwyn Goodloe says:

    I too remember those cool computers of lore. Heck, I've even programmed a few of them back in the day. Hypercubes sure were cool, as was the programming language OCCAM. Unfortunately, almost no one was willing to write code for the new machines. I think Hennessy's statement about "ease of use" is really key in the effort. I worked for fourteen years in the IT industry, and most CIOs will probably say that recompiling the ole dusty deck is as much as they are willing to do. We probably need to say exactly what we mean by ease of use.

    I'm going to say that I cringe every time I hear an academic arguing for a new Manhattan Project for his particular area. Suppose DARPA or some other DOD agency did do a Manhattan Project. First, let's make one thing clear: it was a TOP SECRET project. No clearance, no work. Working on the Manhattan Project was an interruption of one's academic career to support the war. I believe it was Wheeler who received a letter from his brother, serving in the European theater of operations, saying "Hurry UP." His brother would be killed a few days later. Had the War Dept not viewed the effort as supporting combat operations, they wouldn't have gotten the resources to do it. Today, a huge effort directing a large part of the scientific community toward, say, detecting IEDs would be a better analogy, and it too would be top secret. So folks, let's put that one to rest for good and pick a better analogy.

  3. Jerry Callen says:

    I’m a former employee of several members of the Dead Parallel Computing Society. I don’t believe we died from lack of programming tools; rather, Moore’s Law rendered the performance improvements too ephemeral. As I type I’m looking at a CM-5 sandwich board; its performance was matched by a standard PC just 5 or 6 years after this board was SOTA – for WAY less money.

    Now we’re at a point where sequential processors can’t easily be made faster – really! – so parallelism actually, honestly matters. Many, many commercially interesting problems can be solved by data parallelism, and we have a growing set of data parallel tools available commercially. I’m still toiling away in the fields of commercial parallel processing, and I don’t see the need for any massive government investment here.
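
    For readers who haven't seen the style, below is a minimal sketch of this kind of data parallelism in plain Java: the same independent operation applied to separate chunks of an array, one task per core. The class name, chunking scheme, and thread-pool sizing are illustrative choices, not any particular commercial tool's API.

        import java.util.concurrent.*;

        // Minimal data-parallel sketch: scale every element of an array by
        // splitting the index range into one chunk per available core.
        public class DataParallelScale {
            public static void main(String[] args) throws Exception {
                final double[] data = new double[1000000];   // illustrative input
                final double factor = 2.5;

                int cores = Runtime.getRuntime().availableProcessors();
                ExecutorService pool = Executors.newFixedThreadPool(cores);
                int chunkSize = (data.length + cores - 1) / cores;

                final CountDownLatch done = new CountDownLatch(cores);
                for (int c = 0; c < cores; c++) {
                    final int lo = c * chunkSize;
                    final int hi = Math.min(lo + chunkSize, data.length);
                    pool.execute(new Runnable() {
                        public void run() {
                            for (int i = lo; i < hi; i++) {
                                data[i] *= factor;   // chunks never overlap, so no locking needed
                            }
                            done.countDown();
                        }
                    });
                }
                done.await();      // wait for every chunk to finish
                pool.shutdown();
            }
        }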

  4. It’s my experience that every decade or so, a packaging breakthrough allows some previously forgotten or abandoned approach to be resurrected at a lower price point, and all the previous lessons are forgotten.

    Here are a few of those lessons:
    1. Parallel programming is inherently hard, and tools and techniques claiming to avoid the problems never work as advertised
    2. Heterogeneous and asymmetric architectures are much harder to program effectively than homogeneous symmetric architectures
    3. Programmer-managed memory is much harder to use than system-managed (whether OS or hardware)
    4. Specialist instruction sets are much harder to use effectively than general-purpose ones

    I nearly fell out of my chair laughing when the Cell was launched. It contained all 4 of these errors in one design. Despite the commercial advantages of bringing out a game that fully exploited its features, as I recall, only 1 game available when the PlayStation 3 was launched came close.

    I hereby announce Machanick's corollary to Moore's Law: any rate of improvement in the number of transistors you can buy for your money will be matched by erroneous expectations that programmers will become smarter.

    Unfortunately there is no Moore’s Law for IQ.

    The only real practical advantage of multicore over discrete chip multiprocessors (aside from the packaging and cost advantages) is a significant reduction of interprocess communication costs — provided IPC is core-to-core, i.e., if you communicate through shared memory, you’d better make sure that the data is cached before the communication occurs. That makes the programming problem harder, not easier (see Lesson 3 above).

    Good luck with transactional memories and all the other cool new ideas. Ask yourself one question: do they make parallel programming easier, or do they add one more wrinkle for programmers to take care of — that may be different in the next generation or on a rival design?

    Putting huge numbers of cores on-chip is a losing game. The more you add, the smaller the fraction of the problem space you are addressing, and the harder you make programming. I would much rather up the size of on-chip caches to the extent that they effectively become the main memory, with off-chip accesses becoming page faults. Whether you go multicore or aggressive uniprocessor, off-chip memory is a major bottleneck.

    As Seymour Cray taught us, the thing to aim for is not peak throughput, but average throughput. 100 cores each running at 1% of its full speed because of programming inefficiencies, inherent nonparallelism of the workload, and bottlenecks in the memory system is hardly an advance on 2 to 4 cores each running at at least 50% of its full speed.
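
    Making that arithmetic explicit (my numbers, using the figures above): 100 x 0.01 = 1 core-equivalent of delivered throughput for the many-core chip, versus 4 x 0.5 = 2 core-equivalents for the small multicore.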

    In any case all of this misses the real excitement in the computing world: turn Moore’s Law on its head, and contemplate when something that cost $1,000,000 will cost $1. That’s the point where you can do something really exciting on a small, almost free device.

  5. JamesF says:

    “Rather, even our rosiest forecasts are that a dozen or two Great Ideas will work together and/or apply in different domains to make a real difference.”

    I definitely agree with Dan on this one. Here at Pervasive we are adding our egg to the dozen: DataRush, a flow-based, actor-oriented Java solution that works really well for problems that focus on large amounts of data manipulation. (A generic sketch of that style appears below.)
    JamesF
    PervasiveDataRush
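
    For readers unfamiliar with the flow-based/actor style, here is a minimal generic sketch of the idea in plain Java (an illustration of the style only, not DataRush's actual API): independent stages that share no state and communicate only through a queue, so each stage can run on its own core.

        import java.util.concurrent.*;

        // Generic two-stage dataflow sketch: a producer stage and a transform
        // stage are connected by a queue; neither touches the other's state.
        public class TinyPipeline {
            public static void main(String[] args) throws Exception {
                final BlockingQueue<Integer> channel = new LinkedBlockingQueue<Integer>();
                final Integer END = Integer.valueOf(-1);   // end-of-stream marker

                Thread producer = new Thread(new Runnable() {
                    public void run() {
                        try {
                            for (int i = 0; i < 10; i++) channel.put(i);  // emit records
                            channel.put(END);                             // signal completion
                        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    }
                });

                Thread transformer = new Thread(new Runnable() {
                    public void run() {
                        try {
                            while (true) {
                                Integer x = channel.take();
                                if (x == END) break;          // the exact marker object that was enqueued
                                System.out.println(x * x);    // per-record transformation
                            }
                        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    }
                });

                producer.start();
                transformer.start();
                producer.join();
                transformer.join();
            }
        }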

  6. marvin says:

    Another thing to note is that many of the approaches that address the multicore problem are applicable to the related problem of grid computing. The benefit of a shift in how applications are developed could be twofold, allowing actual USE of both multicore chips AND (locally or globally) distributed computers.

  7. Wolf Halton says:

    Would it be easier, do you think, to program applications to address a specific core, so that several processes that may not be connected could really use the multicore throughput potential? Timeslice in two dimensions?

  8. Wolf, what you are describing sounds like multitasking. Within limits, that is one of the better uses of multicore systems but the operating system can become a bottleneck.
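
    A sketch of what that kind of multitasking looks like in code, assuming the operating system is left to assign threads to cores (standard Java offers no portable way to pin a thread to a particular core). The job names and pool sizing are made up for illustration: unrelated tasks run concurrently on a pool with one worker per core, so the parallelism comes from task independence rather than from parallelizing any single task.

        import java.util.concurrent.*;

        // Coarse-grained multitasking on a multicore machine: unrelated jobs run
        // concurrently on a pool sized to the core count; the OS picks the cores.
        public class IndependentJobs {
            public static void main(String[] args) throws Exception {
                int cores = Runtime.getRuntime().availableProcessors();
                ExecutorService pool = Executors.newFixedThreadPool(cores);

                // Hypothetical unrelated jobs standing in for separate applications.
                Runnable[] jobs = {
                    new Runnable() { public void run() { System.out.println("indexing files"); } },
                    new Runnable() { public void run() { System.out.println("compressing logs"); } },
                    new Runnable() { public void run() { System.out.println("scanning mail"); } },
                };
                for (Runnable job : jobs) pool.submit(job);

                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.MINUTES);
            }
        }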

  9. I think the key thing is cost per unit of compute power, and volume drives cost down. The one place parallelism has really won is in graphics for gamers, a high-volume application that demands a low price point. (The customer is millions of teenagers, not one or two DARPA folks.) All sorts of non-graphics apps are now jumping on the GPU bandwagon.