As a member of the Parallel Programming Lab (PPL), I work on the Charm++ programming model and runtime system. Charm++ is an object-based parallel programming model, mainly used in the construction of high performance computing (HPC) applications (usually involving scientific simulations). I'm involved with a variety of efforts within PPL. However, my particular area of focus is enabling the execution of applications on heterogeneous systems through various programming model, runtime system, and build process enhancements/modifications.

In addition to working on Charm++ itself, I also work on one of the applications that Charm++ has been used to create, called NAMD. NAMD is used throughout the world to simulate biomolecular systems at the atomic scale using classical mechanics. Scientists have used NAMD for genome sequencing, understanding anesthetics, combating bird flu, understanding the brain, pinpointing the causes of Parkinson's and Alzheimer's, and much more.

Overview

My research interests focus mainly on programming model and runtime system support for heterogeneous systems. In other words, what role can (and should) programming languages and their associated runtime systems play in helping programmers create applications that target heterogeneous systems? Interest in heterogeneous systems, especially for computation-intensive codes, has grown in recent years since power limitations effectively ended the clock speed race. We keep the term "heterogeneous" as open as possible. It may range from systems with simple differences, such as the amount of RAM per node in a cluster, to more complex systems that include multiple host core architectures along with multiple accelerator technologies. Currently, the work focuses on supporting Cell and MIC; however, we are also looking into methods for extending support to GPGPU hardware.

Please note, since this research is in the context of the Charm++ programming model, the remainder of this discussion assumes that the reader is somewhat familiar with the Charm++ programming model. If this is not the case, please see the publications section below. The paper "Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell" has a good introduction to the research along with a brief high-level introduction to the Charm++ programming model.

We are extending the Charm++ programming model and modifying the Charm++ runtime system to support accelerator technologies and heterogeneous clusters in general. In short, we have introduced accelerated entry methods into the Charm++ programming model. Entry methods, in general, can be thought of as tasks. Accelerated entry methods are entry methods that may or may not execute on an accelerator. The underlying runtime system then takes care of automatically moving data, as required, to the core (host or accelerator) that is tasked with executing the entry method. The execution of entry methods, including accelerated entry methods, and all data movement occur asynchronously under the direction of the runtime system. Given the clear boundaries between entry methods, we have further modified the runtime system to handle some of the mundane details of executing an application on a heterogeneous system. For example, with knowledge of the data types, array lengths, and so on that make up the application's data, the runtime system can modify the data to correct for architecture differences, such as endianness, as data passes between cores. In addition to accelerated entry methods, we have also introduced accelerated blocks and a SIMD Instruction Abstraction. For more details on our research, please see the publications section for relevant papers.
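To make the endianness example concrete, the following is a minimal C++ sketch of how a runtime that knows the layout of a message could correct byte order as data passes between cores. It is not the Charm++ runtime's implementation: the `FieldLayout` struct and the `swapFieldBytes` and `correctEndianness` functions are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one field in a message buffer. A runtime that
// knows the data types and array lengths could build such a table itself.
struct FieldLayout {
    std::size_t offset;    // byte offset of the field within the buffer
    std::size_t elemSize;  // size of one element in bytes (e.g. 2, 4, or 8)
    std::size_t count;     // number of elements in the field
};

// Reverse the byte order of every element of one field, in place.
inline void swapFieldBytes(unsigned char* buffer, const FieldLayout& layout) {
    for (std::size_t i = 0; i < layout.count; ++i) {
        unsigned char* elem = buffer + layout.offset + i * layout.elemSize;
        std::reverse(elem, elem + layout.elemSize);
    }
}

// Correct a whole buffer that was produced on a core of the opposite
// endianness. Bytes not covered by any field are left untouched.
inline void correctEndianness(std::vector<unsigned char>& buffer,
                              const std::vector<FieldLayout>& fields) {
    for (const FieldLayout& field : fields) {
        // Each field must lie entirely inside the buffer.
        assert(field.offset + field.elemSize * field.count <= buffer.size());
        swapFieldBytes(buffer.data(), field);
    }
}
```

The point of the sketch is that the application code never performs the swap itself. Because the layout is available as data, the conversion can be applied only when the sending and receiving cores actually differ in byte order.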

When programming for accelerator technologies, it is quite common for programmers to have to include architecture-specific code within their application code. This not only increases the burden placed on programmers, who have to structure their application towards a specific type of core, but also decreases the portability of the code itself. Our modifications to the Charm++ programming model and runtime system help to divorce the application code from the architecture-specific details. However, it is clear that these architecture-specific details are important, especially when it comes to the performance of an application running on a given architecture. Thus, a balance must be struck between achieving good performance and easing the burden on the programmer.
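As a rough illustration of separating application code from architecture-specific details, the C++ sketch below hides a 4-wide floating point vector behind a small portable interface. This is not the SIMD Instruction Abstraction from our work: the `vec4f` type and the `vec4f_*` and `scaleAndAdd` names are assumptions made for this example, and the only real API used is the set of SSE intrinsics from `<xmmintrin.h>`, selected when the compiler defines `__SSE__`.

```cpp
#include <cstddef>

#if defined(__SSE__)
// Architecture-specific path: map the abstraction onto SSE intrinsics.
#include <xmmintrin.h>
typedef __m128 vec4f;
inline vec4f vec4f_load(const float* p) { return _mm_loadu_ps(p); }
inline void vec4f_store(float* p, vec4f a) { _mm_storeu_ps(p, a); }
inline vec4f vec4f_splat(float s) { return _mm_set1_ps(s); }
inline vec4f vec4f_add(vec4f a, vec4f b) { return _mm_add_ps(a, b); }
inline vec4f vec4f_mul(vec4f a, vec4f b) { return _mm_mul_ps(a, b); }
#else
// Portable fallback: plain scalar code with the same interface.
struct vec4f { float v[4]; };
inline vec4f vec4f_load(const float* p) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
    return r;
}
inline void vec4f_store(float* p, vec4f a) {
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
}
inline vec4f vec4f_splat(float s) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = s;
    return r;
}
inline vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}
inline vec4f vec4f_mul(vec4f a, vec4f b) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}
#endif

// Application code: y[i] = a * x[i] + y[i]. It is written once against the
// abstraction and contains no architecture-specific instructions.
inline void scaleAndAdd(float* y, const float* x, float a, std::size_t n) {
    const vec4f va = vec4f_splat(a);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4f r = vec4f_add(vec4f_mul(va, vec4f_load(x + i)), vec4f_load(y + i));
        vec4f_store(y + i, r);
    }
    for (; i < n; ++i) y[i] = a * x[i] + y[i];  // leftover elements
}
```

The balance mentioned above shows up here in a small way: `scaleAndAdd` stays portable, while the mapping onto the hardware's vector instructions is kept in one place where it can be tuned for each architecture.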

Perhaps more importantly, given a unified programming model and portable code, the runtime system can start doing some more interesting things on the programmer's behalf. One such activity is automatic dynamic load balancing. Given a heterogeneous application (that is, an application with multiple different calculations going on, with task variations within a given calculation), spreading the application's workload across the available cores, host and accelerator alike, may not be straightforward for the programmer to do (especially at compile time). The Charm++ load balancing framework already makes runtime measurements to load balance applications executing on homogeneous clusters. This research intends to extend that work to load balancing on heterogeneous systems by having the runtime system dynamically migrate work between the host cores and any available accelerators.
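The idea of spreading measured work across cores of different speeds can be sketched with a simple greedy heuristic in C++. This is only an illustration under stated assumptions, not a strategy taken from the Charm++ load balancing framework: the `Core` struct, its `speed` field (relative work units per second), and the `greedyAssign` function are hypothetical, and the per-task work values stand in for runtime measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model of a core, host or accelerator alike.
struct Core {
    double speed;         // relative speed in work units per second
    double assignedWork;  // total work assigned so far
};

// Greedy heuristic: take tasks from largest to smallest and give each one to
// the core that would finish it earliest. Returns, for each task index, the
// index of the core it was assigned to.
inline std::vector<std::size_t> greedyAssign(const std::vector<double>& taskWork,
                                             std::vector<Core>& cores) {
    assert(!cores.empty());

    // Visit tasks in order of decreasing measured work.
    std::vector<std::size_t> order(taskWork.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return taskWork[a] > taskWork[b];
                     });

    std::vector<std::size_t> assignment(taskWork.size());
    for (std::size_t t : order) {
        std::size_t best = 0;
        double bestFinish = (cores[0].assignedWork + taskWork[t]) / cores[0].speed;
        for (std::size_t c = 1; c < cores.size(); ++c) {
            double finish = (cores[c].assignedWork + taskWork[t]) / cores[c].speed;
            if (finish < bestFinish) {
                best = c;
                bestFinish = finish;
            }
        }
        cores[best].assignedWork += taskWork[t];
        assignment[t] = best;
    }
    return assignment;
}
```

With one host core of speed 1 and one accelerator of speed 4, five equal tasks of 4 work units each end up split so that the accelerator receives four tasks and the host receives one, which gives both the same estimated finish time. A real runtime would also have to account for data movement costs and for tasks that run better on one kind of core than another, which this sketch ignores.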

Publications

David M. Kunzman. “Runtime support for object-based message-driven parallel applications on heterogeneous clusters.” Diss. Department of Computer Science, University of Illinois at Urbana-Champaign, June 2012. (link)
David M. Kunzman and Laxmikant V. Kalé, Programming Heterogeneous Clusters with Accelerators using Object-Based Programming, Journal of Scientific Programming 19 (2011), no. 1, 47–62, IOS Press. (link)
Laxmikant V. Kalé, David M. Kunzman, and Lukasz Wesolowski, Accelerator Support in the Charm++ Programming Model, Scientific Computing with Multicore and Accelerators (Jakub Kurzak, David A. Bader, and Jack Dongarra, eds.), CRC Press (Taylor and Francis Group), December 2010.
David M. Kunzman and Laxmikant V. Kalé. Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell. In SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pages 1–12, New York, NY, USA, 2009. ACM. (SC'09 Best Student Paper Finalist) (link)
David M. Kunzman and Laxmikant V. Kalé. Model for Programming Heterogeneous Systems. In SAAHPC’09: Symposium on Application Accelerators in High Performance Computing, July 2009. (link to workshop agenda, presentation slides and paper listed)
David Kunzman. Charm++ on the Cell Processor. Master’s Thesis, Dept. of Computer Science, University of Illinois, 2006. http://charm.cs.uiuc.edu/papers/KunzmanMSThesis06.shtml. (link)
David Kunzman, Gengbin Zheng, Eric Bohm, and Laxmikant V. Kalé. Charm++, Offload API, and the Cell Processor. In Proceedings of the Workshop on Programming Models for Ubiquitous Parallelism, Seattle, WA, USA, September 2006. (link)

Posters

David Kunzman and Laxmikant V. Kalé. Programming Heterogeneous Systems. In the PhD Forum at the IEEE International Parallel and Distributed Processing Symposium, Anchorage, AK, USA, May, 2011. (PDF - 3.35MB)
Laxmikant V. Kalé, Celso Mendes, Gengbin Zheng, and David Kunzman. Building Petascale Applications with Charm++. In workshop on “Building Petascale Applications and Software Environments on the Teragrid,” Tempe, AZ, USA, December, 2007.
D. Kunzman, G. Zheng, E. Bohm, J. Phillips, and L. V. Kalé. Charm++ Simplifies Coding for the Cell Processor. In Super Computing 2006, Tampa Bay, FL, USA, November 2006. (PDF - 34MB)
D. Kunzman, G. Zheng, E. Bohm, and L. V. Kalé. Charm++ on Cell. In Edge Workshop, Chapel Hill, NC, USA, May 2006.
D. Kunzman, T. Wilmarth, and L. V. Kalé. Parallel VHDL Simulation. In WinterSim 2005, Orlando, FL, USA, December 2005. (PDF - 18MB)