The II Multicore and Parallel Computing miniconf is now over. A big thank you to all who attended the fantastic presentations. Congratulations to the excellent speakers, who delivered great talks punctually and professionally, and a big bravo to the LCA2011 Organisation, which smoothly overcame all the problems caused by the change of venue in the last week due to the floods in Brisbane.

A large (about 200 people) and well-qualified audience attended the miniconf. For a good part of the day we had Vint Cerf (VP of Google and “Father of the Internet”) in the audience, together with Linus Torvalds (Linux kernel Project Coordinator), Paul McKenney (IBM’s Linux CTO) and Dirk Hohndel (Intel’s Chief Linux and Open Source Technologist). But the most gratifying moments came after the miniconf and on the following days, when delegates approached me with comments like: “thank you for organising such a good miniconf”, “it was very interesting, a plus of the entire LCA”, “This miniconf just gets better”. These were the best rewards that I could expect.

Here are the presentations from the talks. I will endeavor to add photos from the miniconf in the coming days. Videos will also be available as soon as the general organisers of LCA2011 release them.

Verifying Parallel Software (Paul McKenney)

Integration of Intel’s TBB with Facebook’s HipHop (Open Parallel)

In Search of Transmission Capacity – a Multicore Dilemma (Vint Cerf)

Is Parallel Programming Hard? (Paul McKenney)

How To Build Large Scale Applications Using PHP (Sam Vilain)

Parallel Programming – An Overview of Non-Mainstream Languages (Lenz Gschwendtner)

Multicore vs. FPGA (John Williams)

Painless Parallelization With Gearman (Tim Uckun)

Discovering Inherent Parallelism in Sequential Programs (Wayne Kelly)


Welcome to the II Multicore and Parallel Computing miniconference, part of LCA2011. This edition will be held in Brisbane, Australia, on Tuesday 25 January 2011.

We are pleased to announce Vint Cerf and Paul McKenney as keynote speakers, heading a list of eight distinguished speakers on a full day packed with exciting and challenging presentations. The full schedule is now available.

Known as one of the “Fathers of the Internet”, Vinton Cerf is Chief Internet Evangelist and Vice President of Google, Inc. His one-hour talk will be “In Search of Transmission Capacity – a Multicore Dilemma”.

Paul E. McKenney has been coding for almost four decades, more than half of that on parallel hardware, where his work has earned him a reputation among some as a flaming heretic. He will prove it in two talks: “Is Parallel Programming Hard, And If So, Why?” and “Verifying Parallel Software: Can Theory Meet Practice?”. Paul is a Distinguished Engineer and Linux Chief Technology Officer at IBM.

Topics of the day include Lightning Talks about “How to speed up WordPress using Intel’s TBB and Facebook’s HipHop” and about Web Performance Optimization on “How to build large scale applications using PHP”.

Other talks will be “An overview of Non Mainstream Parallel Programming languages” by Lenz Gschwendtner, Open Parallel; “Multicore vs. FPGAs” by John Williams, PetaLogix; “Discovering Inherent Parallelism in Sequential Programs” by Wayne Kelly, Queensland University of Technology; and “Painless Parallelization with Gearman” by Tim Uckun, Enspiral.

The day will be summarised with a Panel on “Which Industries / Applications Need Parallelisation Today”, moderated by Nicolás Erdödy, Open Parallel.

Here is the Schedule complete with Abstracts and Outlines. The miniconference will be in room N515 (QUT Kelvin Grove Campus). Seating capacity: 250.

For more information, contact Nicolás Erdödy – Miniconference Organiser, at “MulticoreLCA (AT) gmail (DOT) com”

Vint G. Cerf – In Search of Transmission Capacity – a Multicore Dilemma


In this talk, I will explore at least one of the challenges that arise in multicore architectures and in large scale data centers in general: moving data out of processors and into the system.

While processing speeds have leveled off, more cores deliver more cycles, but moving data on and off chip, as well as in and out of multicore processors, is proving to be a problem. Can optical communication help? Changes in protocols? New machine architectures?



Paul E. McKenney – Is Parallel Programming Hard, And If So, Why?


Objective of presentation:

To advocate for the position that although parallel programming might be unfamiliar to many, it is inevitable and not as difficult as the doom-cryers would have you believe.  If done properly, work-a-day parallel programming requires perhaps 5% more training than does sequential programming.



In less than a decade, multicore hardware has made the transition from exotic to ubiquitous.  To those of you who fear and loathe parallel programming, I offer my condolences, but the settled fact of the matter is that parallel programming has irrevocably joined mainstream programming practice.  However, I can also offer some good news along with the bad.  The good news is that parallel programming is not all that much more difficult than sequential programming.  Of course, some additional bad news is that most people cannot deal even with sequential programming. This talk will discuss ways that we can not just cope, but actually thrive in this new multicore environment.


-Review of MIPS/clock-frequency trends.
-Parallel is here to stay: parallel hand-held battery anecdote.
-Pastiche of “parallel is inhumanly hard” quotes.
-But the universe is concurrent!!!  And people are too!!! (cartoon)  Additional examples.
-But just because concurrency is natural does not mean that concurrent programming is natural.  Especially given that -programming- does not seem to be natural!  Three obstacles: (1) theory of mind (2) common sense (3) fragmentary plans. (Auto-rental example — free upgrade due to being Linux kernel hacker.)

Most people are not going to undertake parallel programming, mostly because most people are not going to program, period!!!
Other topics from the blog series will be chosen randomly and capriciously, as there will be time for only a few:

  • The Great Multicore Software Crisis is upon us, but we can learn from the 1970s/1980s Great Software Crisis.
  • Embarrassing parallelism isn’t.
  • Parallel programmers can be trained.  Without experience and/or proper training, high IQ is a negative.
  • Darwinian selection favored fear and loathing of parallelism, but the fitness in the past does not necessarily imply fitness in the future.
  • Code is like water.  It is easy to do something with a cupful of water, but not so easy to do the same thing with the Pacific Ocean.
  • Past serial-only design and coding decisions cannot be wished away.
  • Parallelism introduces global constraints that are not as prevalent in sequential software.  A global view of parallel source code is critically important.
  • You get what you measure.  So be careful what you measure.
  • Amdahl’s Law is a great tool for evaluating scalability. Too bad that performance is what really matters.
  • Tools.  Parallel tools.  We need them.
  • Validating parallel programs requires special care.
  • Don’t ask the janitor to remodel your building. This caution also applies to software, despite the fact that your speaker is an exception to this rule.
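As a rough illustration of the Amdahl's Law bullet above, here is a minimal Python sketch of the formula, speedup = 1 / (s + (1 - s) / n), where s is the serial fraction of the work and n is the number of cores. The function name `amdahl_speedup` and the 5% serial fraction are assumptions chosen for illustration; they are not figures from the talk.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup for a fixed-size workload under Amdahl's Law."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part never shrinks; only the parallel part is divided by the core count.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative assumption: 5% of the work is inherently serial.
    for cores in (1, 2, 8, 64, 1024):
        print(f"{cores:5d} cores -> {amdahl_speedup(0.05, cores):6.2f}x")
```

With a 5% serial fraction the bound never reaches 20x however many cores are added. The formula describes scalability only: it says nothing about how fast the single-core baseline is, which is the point the bullet makes about performance.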

But there is hope: like the Great Software Crisis of the 1970s and 1980s, the Great Multicore Software Crisis will spawn the good, the fad, and the ugly.  The new ubiquitous multicore hardware will be available to countless millions of developers and users, which will fuel a huge burst of creativity that will shock and amaze all of us, especially those who still fear and loathe parallel software.  As in the past this creativity will tend to favor open-source projects: if two heads are better than one, just try ten thousand heads!

Target audience:

Parallel developers, sequential developers, academics, users, and most especially innocent bystanders caught in the crossfire.

Project homepage / blog:

Paul E. McKenney – Verifying Parallel Software: Can Theory Meet Practice?


Objective of presentation:

To advocate for a deeper theoretical toolchest for those who would validate parallel programs.



The advent of low-cost readily available multicore hardware has brought new urgency to the question of validating parallel software. The traditional validation approaches leverage linearizability, commutativity, lock freedom, and wait freedom, each of which has attractive theoretical properties, but which collectively suffer from the minor shortcoming of being unable to address common parallel-programming techniques. Given that these techniques include RCU, I have a vested interest in this debate. This talk will discuss what might be done to bridge this researcher-practitioner divide.




  • “So read_barrier_depends() stuff in Linux is also totally busted. (Just like refcounting, etc.)” (2005)
  • “And I don’t believe that the semantics of read_barrier_depends() are actually definable” (2006)
  • “And I think that does work for RCU, at least for conventional optimizations. But the more I think about, the less I’m convinced that it’s 100% reliable.” (2007)

-Hardware/physics issues with traditional validation approaches.

-Mechanical-engineering viewpoint: if there are many jobs, there just might be more than one tool required.

-A few new validation tools.


Target audience:

Parallel developers, sequential developers, academics, users, and most especially innocent bystanders caught in the crossfire.


Project homepage / blog:

Lenz Gschwendtner – Parallel Programming – An Overview of Non Mainstream Languages

The recent rise of functional programming languages is not only a search for alternatives to the established C / C++ and Java world but also a quest to simplify multicore programming.

Lenz will give an overview of the variety of functional programming languages that are open source and try to conquer the concurrent programming space. He will look at Erlang, Scala and F# but also venture into less prominent ones like the RoarVM.

Go with him on a journey through the current variety of options and watch out for your new favorite language on the way.

John Williams – Multicore vs. FPGAs

FPGAs, or Field Programmable Gate Arrays, are essentially arrays of primitive digital logic resources which can be configured and connected in a huge variety of ways, to implement arbitrary digital logic functions.

Starting out life as humble glue logic, FPGAs are now large and fast enough to implement sophisticated digital systems, across a wide spectrum of architectures and applications. From complex System-on-Chip designs and cryptographic processors to computational engines and massively parallel DSP and image processing machines, any application which can make use of the massive bit-level parallelism inherent in modern FPGA architectures is a candidate for development on this platform.

In spite of this potential, the use of FPGAs as computational accelerators to conventional computing architectures remains relatively limited. One reason is the so-called “productivity gap” between the resources offered by modern devices and designers’ ability to actually make use of them. Hardware circuits were traditionally designed at a very low level of abstraction, which is just not feasible for 1 million-plus gates. High level design tools and component based re-use are two approaches to tackle this issue.

Finally there are practical issues such as bandwidth mismatches between CPUs, system memory, backplanes and FPGA devices themselves.

Looking at the world of multicore, it is tempting to draw parallels. There is no shortage of multicore architectures, with impressive specifications at all scales from embedded through to supercomputer.

The problem seems to be in actually writing software to use them. Generations of software developers have grown up in a strict single-core, Von Neumann framework, and are now struggling with the parallel thinking required to make efficient use of multicore architectures.  This is the “productivity gap” all over again.

Other analogies exist – for example the problem of automatically generating (efficient) hardware circuits from high level specifications such as C code, is more or less identical to the problem of automatic parallelisation of single-threaded code for multicore execution.  In both communities, similar approaches have been proposed – for example functional programming and its side-effect free semantics offers a good mapping to both FPGA and multicore implementations.  Unfortunately, they remain relatively obscure.

In this talk I will attempt to bring the audience up to speed with the basics of FPGA-based computing, talk about some of the ongoing research on the FPGA side, and hopefully stimulate discussion on ways that the parallels between these technologies might be fruitfully exploited.

Tim Uckun – Painless Parallelization with Gearman

Use Gearman to write scalable applications without changing your current toolset or team.


Recently I was tasked with designing a solution for S4 USA called iTV Momentum. The scope of the project dictated distributed and parallel processing from day one. Since Enspiral and the client had decided to use a standard open source stack of Linux, Postgres, Apache, and Ruby/Rails, I chose to achieve parallelization and distributed processing with Gearman.

The application uses a swarm of 48 workers which co-ordinate using Gearman and the database. The infrastructure consists of Linux virtual hosts on Linode. New machines can be brought up at will to meet increased workloads or to redistribute work from one machine to another. The application could have been built with any language; we chose Ruby because we like Ruby and Enspiral has lots of Ruby talent on hand. Gearman was the technology that allowed us to scale Ruby and Rails. Gearman was our “Erlang in a box”.
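The worker-swarm pattern described above can be sketched in a few lines. This is only a single-process illustration in Python using the standard library: the in-memory `queue.Queue` stands in for the Gearman job server, threads stand in for the workers, and the names `run_swarm` and `handler` are hypothetical. It is not the Gearman API and not the code of the system described in the talk, which was written in Ruby.

```python
import queue
import threading


def run_swarm(jobs, handler, num_workers=4):
    """Run handler(payload) for every job on a pool of worker threads.

    Returns the outcomes in job order. If a handler raises, the exception
    object is stored as that job's outcome instead of being propagated.
    """
    jobs = list(jobs)
    job_queue = queue.Queue()        # stand-in for the job server
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = job_queue.get()
            if item is None:         # sentinel: no more work for this worker
                job_queue.task_done()
                return
            job_id, payload = item
            try:
                outcome = handler(payload)
            except Exception as exc:  # keep the worker alive on a failed job
                outcome = exc
            with results_lock:
                results[job_id] = outcome
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job_id, payload in enumerate(jobs):
        job_queue.put((job_id, payload))
    for _ in threads:
        job_queue.put(None)          # one sentinel per worker
    job_queue.join()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(jobs))]


if __name__ == "__main__":
    print(run_swarm(["alpha", "beta", "gamma"], str.upper, num_workers=2))
```

The design point this mirrors is that clients only submit jobs and workers only pull them, so adding capacity means starting more workers rather than changing the application code. In a real deployment the queue lives in a separate job server process and the workers run on different machines.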

The whole thing turned out great. The customer loves it, and I had great fun designing and developing the system. Ruby made the development fun, deployment is super simple using Capistrano, and Git provides the lubrication to make the whole machinery tick. The system proved itself recently when Linode had a problem and one of the servers did not come back up. I was able to redistribute the workers on that machine to others and the system kept going as normal.

I am happy to talk about any aspect of the system but the focus of the talk will be on Gearman. I will be talking about why we chose the technologies we used, what worked great and what could use some improvement. I hope my talk will provide a more practical and pragmatic way to do distributed processing using the language and framework of your choice.

Dr Wayne Kelly – Discovering Inherent Parallelism in Sequential Programs

Objective of presentation:

To explain some of the key challenges faced today by developers trying to parallelise sequential programs and to present some next generation, work in progress ideas for addressing these challenges.



Most parallel programs start life as sequential programs. We first code the algorithm, debug it to get it working, and then (maybe at a much later stage) try to parallelise it to improve performance. Trying to determine which parts of a sequential program can be safely run in parallel is often very challenging – especially when seeking to parallelise “coarse-grained” outer loops where the code inside the loop(s) contains many function calls, possibly to code that we didn’t write and don’t fully understand. Even compilers specifically designed to automatically parallelise such programs generally struggle to perform precise enough static dependence analysis to parallelise most real programs. The principal challenge is memory aliasing – knowing for sure whether arbitrary instructions might ever access the same memory locations. Unless a parallelising compiler can be certain that a dependence can never exist, it must conservatively decide not to parallelise that code.

At runtime, however, this memory aliasing uncertainty disappears – we know exactly which memory locations are accessed by each instruction. This talk presents the idea of dynamically instrumenting code so that all data and control dependencies that actually arise at runtime are logged and used to determine which parts of the program could have been safely executed in parallel. In other words, it’s a tool to help programmers better understand the flow of data and control within programs they’re trying to parallelise.
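To make the idea of logging dependencies at runtime concrete, here is a small Python sketch. It is a toy stand-in, not the tool from the talk: the real work instruments code inside the Mono JIT and tracks actual memory locations, whereas this sketch only traces accesses to one list with non-negative integer indices. All names (`TracedArray`, `loop_carried_dependences`, `trace_loop`, and the two example loop bodies) are hypothetical.

```python
class TracedArray:
    """List wrapper that logs which loop iteration reads or writes each index."""

    def __init__(self, values):
        self._values = list(values)
        self.iteration = None
        self.log = []  # entries are (iteration, "R" or "W", index)

    def __getitem__(self, index):
        self.log.append((self.iteration, "R", index))
        return self._values[index]

    def __setitem__(self, index, value):
        self.log.append((self.iteration, "W", index))
        self._values[index] = value


def loop_carried_dependences(log):
    """Return sorted (earlier_iter, later_iter, index, kind) tuples.

    A dependence is reported when two different iterations touch the same
    index and at least one of the two accesses is a write.
    """
    kinds = {"WR": "RAW", "RW": "WAR", "WW": "WAW"}
    deps = set()
    history = {}  # index -> list of (iteration, op) seen so far
    for iteration, op, index in log:
        for prev_iter, prev_op in history.get(index, []):
            if prev_iter != iteration and "W" in (prev_op, op):
                deps.add((prev_iter, iteration, index, kinds[prev_op + op]))
        history.setdefault(index, []).append((iteration, op))
    return sorted(deps)


def trace_loop(body, data, n):
    """Run body(array, i) for i in range(n) and report observed dependences."""
    traced = TracedArray(data)
    for i in range(n):
        traced.iteration = i
        body(traced, i)
    return loop_carried_dependences(traced.log)


def double_in_place(a, i):
    a[i] = a[i] * 2              # each iteration touches only its own element


def prefix_sum(a, i):
    if i > 0:
        a[i] = a[i] + a[i - 1]   # reads the element the previous iteration wrote


if __name__ == "__main__":
    print(trace_loop(double_in_place, [1, 2, 3], 3))
    print(trace_loop(prefix_sum, [1, 2, 3], 3))
```

An empty result means that no dependence was observed for this particular input, not that none can ever occur. That is why the talk frames the approach as an aid to the programmer's understanding and not as a proof of safety.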

The dynamic instrumentation is done as part of the JIT processing of the open source Mono.NET runtime environment on Linux. The dynamically determined data dependencies are exported in an XML format and then presented to the programmer as a graphical overlay on their source code in an IDE designed to serve as a parallel programmer’s workbench.



  • Exploiting inherent parallelism in sequential programs.
  • How data and control dependencies prevent parallelisation.
  • How to dynamically determine data and control dependencies.
  • Demo of dynamic data dependence collection using Mono.NET runtime on Linux.
  • Demo of graphical overlay of dependences in IDE.
  • Discussion of where to from here?


Target audience:

  • Developers wanting to understand some of the key challenges of parallelising sequential programs
  • Parallel developers seeking new tools/approaches to make their jobs easier
  • Those who like hacking JITs and assembly code


Vinton G. Cerf is Chief Internet Evangelist and Vice President of Engineering of Google, Inc.

With a PhD in Computer Science from UCLA and another 15 honoris causa degrees from seven countries, Vint is recognised worldwide for his contribution to the creation and development of the internet. With Bob Kahn he co-designed the TCP/IP protocol suite. An active evangelist, he has spoken at thousands of conferences since 1981, published 70+ papers since 1970 and made several guest appearances on TV series. He won the Turing Award and received medals from Presidents Clinton and George W. Bush, and also from Spain, Tunisia and Bulgaria. His career includes senior roles at DARPA (Principal Scientist), MCI (SVP), ICANN (Chairman) and WorldCom (SVP), and he is president of a corporation set up to commercialise Interplanetary Internet protocols. His full CV shows the evolution of technology for the last 45 years and gives a hint of the future. Cerf’s wine cellar is internet-enabled, sending him a text message when the temperature and humidity reach unfavorable levels… His job as Chief Internet Evangelist at Google is to convert the remaining 70% of world population that is not yet connected. Visit his corporate biography at Google and the Wikipedia page about him.