Award Abstract # 0953100
CAREER: Autotuning Foundations for Exascale Computing

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: GEORGIA TECH RESEARCH CORP
Initial Amendment Date: April 21, 2010
Latest Amendment Date: June 1, 2014
Award Number: 0953100
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
 (703)292-7498
CCF
 Division of Computing and Communication Foundations
CSE
 Direct For Computer & Info Scie & Enginr
Start Date: April 15, 2010
End Date: March 31, 2015 (Estimated)
Total Intended Award Amount: $460,000.00
Total Awarded Amount to Date: $460,000.00
Funds Obligated to Date: FY 2010 = $120,437.00
FY 2011 = $81,830.00
FY 2012 = $83,830.00
FY 2013 = $85,890.00
FY 2014 = $88,013.00
History of Investigator:
  • Richard Vuduc (Principal Investigator)
    richie@cc.gatech.edu
Recipient Sponsored Research Office: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
(404)894-4819
Sponsor Congressional District: 05
Primary Place of Performance: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): EMW9FC8J3HN4
Parent UEI: EMW9FC8J3HN4
NSF Program(s): CAREER: FACULTY EARLY CAR DEV,
COMPILERS,
Software & Hardware Foundation
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 1187, 9215, 9218, HPCC
Program Element Code(s): 104500, 732900, 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The goal of this research is to discover novel foundational principles
for developing highly-efficient and reliable software that can achieve
sustainable performance on the exascale computing platforms expected by
2020. Such platforms will deliver three orders of magnitude more
performance than today's systems; harnessing this raw computational power could
revolutionize our modeling and understanding of critical phenomena in
areas like climate modeling, energy, medicine, sustainability,
cosmology, engineering design, and massive-scale data analytics. Yet,
developing software for exascale systems is a tremendous challenge
because the hardware is complex, and the most productive
“high-level” software development environments (e.g.,
programming languages and libraries) are not expected to be able to effectively exploit
these exascale systems.

The investigator aims to address this challenge by using automated
tuning (autotuning) to eliminate the low performance traditionally
associated with high-level programming models. This research (a)
develops new model-driven frameworks for tuning parallel algorithms and
data structures, going beyond existing techniques that focus on
low-level code tuning; and (b) studies autotuning for programs expressed
in high-level programming models, with the aim of eliminating the
performance gap. Concomitant with this research, the PI will create a
new practicum course: The HPC Garage. The HPC Garage physically
co-locates interdisciplinary teams in a social collaborative lab space;
the teams engage in a year-long competition, called the XD Prize, to
develop highly scalable algorithms and software for NSF TeraGrid’s
next-generation XD facilities. The HPC Garage also hosts summer interns
in Georgia Tech’s Computing Research Undergraduate Intern Summer
Experience (CRUISE) program, whose mission is to encourage students,
especially those from underrepresented groups, to pursue graduate
degrees in computing.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chandramowlishwaran, A.; Knobe, K.; Vuduc, R. "Applying the Concurrent Collections Programming Model to Asynchronous Parallel Dense Linear Algebra" ACM SIGPLAN Notices, v.45, 2010, p.345
Choi, J.W.; Singh, A.; Vuduc, R.W. "Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs" ACM SIGPLAN Notices, v.45, 2010, p.115
Lashuk, I.; Chandramowlishwaran, A.; Langston, H.; Nguyen, T.-A.; Sampath, R.; Shringarpure, A.; Vuduc, R.; Ying, L.; Zorin, D.; Biros, G. "A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures" Communications of the ACM, v.55, 2012, p.101-109
Lee, J.; Kim, H.; Vuduc, R. "When prefetching works, when it doesn't, and why" ACM Transactions on Architecture and Code Optimization (TACO), v.9(1), 2012, 10.1145/2133382.2133384
Sim, J.; Dasgupta, A.; Kim, H.; Vuduc, R. "A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications" ACM SIGPLAN Notices, v.47, 2012, p.11-21
Vuduc, R.; Czechowski, K. "What GPU Computing Means for High-End Systems" IEEE Micro, v.31, 2011, p.74

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of this CAREER project was to discover fundamental principles of designing automatically tuned algorithms and software for next-generation high-performance computing (or supercomputing) systems. In this context, “automatically tuned” (autotuned) refers to the idea of an algorithm or piece of software adjusting itself to run as efficiently as possible on a given machine. The motivation for autotuning is that supercomputer-class systems are becoming so complex that it is becoming harder to predict how efficiently an algorithm or piece of software will run. Consequently, high-performance software development is becoming more costly, as is the cost of running the machines, measured in the time, energy, and power needed to execute a program.
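To illustrate the autotuning idea in its simplest form (a toy sketch, not the project's actual software): an autotuner can empirically time several interchangeable implementations of the same computation on the target machine and keep whichever runs fastest there.

```python
import time

def loop_sum(xs):
    """A hand-written alternative to the built-in sum(); both variants
    compute the same result, but their speed differs by machine."""
    acc = 0
    for x in xs:
        acc += x
    return acc

def autotune(variants, sample_input, trials=3):
    """Return the name of the variant with the best (minimum) observed
    wall-clock time on sample_input."""
    scores = {}
    for name, fn in variants.items():
        best = float("inf")
        for _ in range(trials):
            t0 = time.perf_counter()
            fn(sample_input)
            best = min(best, time.perf_counter() - t0)
        scores[name] = best
    return min(scores, key=scores.get)

variants = {"builtin": sum, "loop": loop_sum}
winner = autotune(variants, list(range(100_000)))
```

A real autotuner searches a far larger space (tile sizes, thread counts, algorithmic variants) and may run offline at install time or online as the program executes; the exhaustive timing loop above is only the baseline that models, discussed below, try to avoid.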


The intellectual merit of this CAREER project has been to produce a number of research results that advance our understanding of how to build autotuned systems for supercomputers.


The first set of research results concerns a number of new principles for designing algorithms and improving the performance of software on current and future supercomputers that contain, as one of their key components, manycore co-processors. These principles highlight not just the benefits of using such co-processors, but also their limitations, which is critical to our understanding of such systems. One highlight of these results was the collaborative development of a blood flow simulation code that, in 2010, received the Gordon Bell Prize, an award for setting a new high-water mark in achieved supercomputer performance.


The second set of research results concerns new models for predicting how well a program will perform. Models are critical to developing autotuned systems in the following way: a “tunable” or “adaptive” program will have a number of parameters that need to be set correctly for a given machine, and models can narrow the set of parameter values that need to be considered.
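To make the pruning role of a model concrete, here is a hypothetical sketch (the machine constants and candidate configurations are invented for illustration): a simple bound-based time model scores every candidate, and only the most promising few are passed on for actual timing.

```python
def predicted_time(flops, bytes_moved, peak_flops, peak_bw):
    """Simple bound-based model: time is limited by whichever of
    compute throughput or memory bandwidth is the bottleneck."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

def prune(candidates, peak_flops, peak_bw, keep=2):
    """Rank candidate configurations by predicted time and keep only the
    best few, shrinking the set that must be timed empirically."""
    ranked = sorted(
        candidates,
        key=lambda c: predicted_time(c["flops"], c["bytes"], peak_flops, peak_bw),
    )
    return ranked[:keep]

# Invented candidates: larger tiles reuse more data, moving fewer bytes
# for the same amount of arithmetic.
candidates = [
    {"tile": 8,  "flops": 1e9, "bytes": 8e8},
    {"tile": 16, "flops": 1e9, "bytes": 4e8},
    {"tile": 32, "flops": 1e9, "bytes": 2e8},
]
shortlist = prune(candidates, peak_flops=1e11, peak_bw=1e10, keep=2)
```

Only the shortlisted configurations would then be run and timed on the real machine, which is how a model cuts the cost of the empirical search.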

One of these models, referred to as the energy roofline model, permits reasoning not just about the execution time of an algorithm or program, but also about its execution energy and power. These latter metrics are becoming as important as, or arguably more important than, time as the scale of supercomputers increases. A new NSF project to extend this model is underway.
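The general shape of such a model couples a time bound with an energy balance; the sketch below follows that form with invented machine constants (the eps terms are joules per flop and per byte, pi0 a constant power draw in watts) and is illustrative only, not the project's calibrated model.

```python
def roofline_time(W, Q, peak_flops, peak_bw):
    """Time bound for W flops and Q bytes: limited by either peak
    compute throughput or peak memory bandwidth."""
    return max(W / peak_flops, Q / peak_bw)

def roofline_energy(W, Q, T, eps_flop, eps_byte, pi0):
    """Energy model: energy spent on arithmetic, plus energy spent
    moving data, plus constant (idle/leakage) power integrated over
    the execution time T."""
    return W * eps_flop + Q * eps_byte + pi0 * T

# Invented constants for a hypothetical machine:
W, Q = 1e9, 4e8                                   # flops and bytes moved
T = roofline_time(W, Q, peak_flops=1e11, peak_bw=1e10)
E = roofline_energy(W, Q, T, eps_flop=1e-10, eps_byte=1e-9, pi0=50.0)
P = E / T                                         # average power (watts)
```

Having time, energy, and power as explicit functions of the same algorithmic quantities (W, Q) is what lets an autotuner optimize for whichever cost matters most on a given system.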


One culmination of these research results is the development of an autotuning framework for a class of computations known as sparse direct solvers. Such solvers lie at the heart of a number of computational science and engineering applications. As this project comes to a close, this prototype is being integrated into a widely used open source solver package.


Beyond its research outcomes, the broader impact of this CAREER project has been to advance several educational outcomes aimed at broadening interest in and use of high-performance computing. These outcomes include the development of a new massive open online course on high-performance computing, which is to become part of Georgia Tech’s experimental low-cost Online Masters in Computer Science degree program, as well as a new Research Experiences for Undergraduates (REU) sub-project that aims to bring high-performance computing to the Web.


Last Modified: 07/07/2015
Modified by: Richard W Vuduc
