Combining high productivity with high performance on commodity hardware

Publication: Book/anthology/thesis/report, Ph.D. thesis, Research

Current advances in the natural sciences are increasingly dependent on the available computing power. At the same time, the increase in computing power is no longer based on faster cores, but on multiple cores and specialized hardware. As most scientific software is written for sequential processing, it cannot utilize this increase in hardware performance. Most existing scientific software is written in low-level languages such as C or FORTRAN, making it difficult to rewrite to run in parallel. As the brief history of the CELL-BE processor showed, writing solutions that are tied to a particular hardware platform is a risky investment.

To make this problem worse, the scientists who have the field expertise required to write the algorithms are not formally trained programmers. This usually leads to scientists writing buggy, inefficient, and hard-to-maintain programs. Occasionally a skilled programmer is hired, which improves the quality of the program but also increases its cost. This extra link also lengthens development iterations and may introduce other errors, as the programmer is not necessarily an expert in the field. Neither approach solves the issue of changing hardware platforms.

In this thesis, I explore different approaches for efficient execution of scientific kernels, and work towards a complete system that aims to handle all of the issues mentioned above. I present the work on the CELL-BE processor, which comprises a CSP-like library and a JIT-like compiler for translating CIL bytecode on the CELL-BE. I then introduce a bytecode converter that transforms simple loops in Java bytecode to GPGPU-capable code.

I then introduce NumCIL, a numeric library for the Common Intermediate Language. I then utilize the vector programming model from NumCIL and map it to the Bohrium framework. The result is a complete system that gives the user a choice of high-level languages with no explicit parallelism, yet seamlessly performs efficient execution on a number of hardware setups.
Publisher: The Niels Bohr Institute, Faculty of Science, University of Copenhagen
Number of pages: 135
Status: Published - 2013

ID: 88081954